# TimeSeriesScientist
**Repository Path**: ring24/TimeSeriesScientist
## Basic Information
- **Project Name**: TimeSeriesScientist
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-21
- **Last Updated**: 2025-11-21
## README
**TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis**

Haokun Zhao<sup>1,2</sup>★, Xiang Zhang<sup>3</sup>★, Jiaqi Wei<sup>4</sup>★, Yiwei Xu<sup>5</sup>, Yuting He<sup>6</sup>, Siqi Sun<sup>7</sup>, Chenyu You<sup>1</sup>

<sup>1</sup> Stony Brook University · <sup>2</sup> University of California, San Diego · <sup>3</sup> University of British Columbia · <sup>4</sup> Zhejiang University · <sup>5</sup> University of California, Los Angeles · <sup>6</sup> Case Western Reserve University · <sup>7</sup> Fudan University

★ Equal Contribution
## Abstract
> In this paper, we introduce **TimeSeriesScientist** (**TSci**), the first LLM-driven agentic framework for general time series forecasting.
>
> The framework comprises four specialized agents:
>
> - **Curator** – performs LLM-guided diagnostics augmented by external tools that reason over data statistics to choose targeted preprocessing;
> - **Planner** – narrows the hypothesis space of model choice by leveraging multi-modal diagnostics and self-planning over the input;
> - **Forecaster** – performs model fitting and validation and, based on the results, adaptively selects the best model configuration and ensemble strategy to make the final predictions;
> - **Reporter** – synthesizes the whole process into a comprehensive, transparent report.
>
> TSci transforms the forecasting workflow into a white-box system that is both interpretable and extensible across tasks.

## 📢 News
- **2025.10.02** Our paper is now available on [arXiv](https://arxiv.org/abs/2510.01538)! 📄
- **2025.10.01** Code released. TimeSeriesScientist is now available! 🎉🎉
## 🚀 Quick Start
### 1. Prerequisites
- Python 3.8 or higher
- OpenAI API key
- Sufficient disk space for results and models
### 2. Create Virtual Environment
```bash
# Create virtual environment (Python 3.8+ per the prerequisites)
conda create -n TSci python=3.8
conda activate TSci
```
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
### 4. Set Environment Variables
```bash
# Set your OpenAI API key
export OPENAI_API_KEY="your-openai-api-key-here"
# Optional: Set other environment variables
export GOOGLE_API_KEY="your-google-api-key-here"
```
## 🎯 Code Execution
### 1. Basic Usage
```bash
# Run with default configuration
python main.py
```
### 2. Custom Configuration
The custom dataset must be a `.csv` file with at least two columns: `"date"` and `"OT"`. The `"date"` column contains the timestamps, and the `"OT"` column records the target value at each corresponding timestamp. An example input file is provided at `dataset/HTTh1.csv`.
```bash
# Run with custom parameters
python main.py --data_path /your-customized-dataset \
--num_slices 10 \
--horizon 96 \
--llm_model gpt-4o
```
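The expected input format can also be generated programmatically. The sketch below builds a minimal compliant CSV with pandas; the file name and values are arbitrary:

```python
import pandas as pd

# Build a minimal dataset in the expected format: a "date" column of
# timestamps and an "OT" column with the target value at each timestamp.
dates = pd.date_range("2024-01-01", periods=8, freq="h")
df = pd.DataFrame({"date": dates, "OT": [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.4]})

# Write without the index so the file has exactly the two required columns.
df.to_csv("my_dataset.csv", index=False)
```

The resulting file can then be passed via `--data_path my_dataset.csv`.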
### 3. Debug Mode
```bash
# Run with debug and verbose output
python main.py --debug --verbose --data_path your_data.csv
```
## Agent Workflow
The system uses a sophisticated multi-agent workflow orchestrated by LangGraph. Here's how the agents work together:
### 1. PreprocessAgent
**Purpose**: Data cleaning and preparation
**Responsibilities**:
- Load and validate time series data
- Handle missing values using LLM-recommended strategies
- Detect and handle outliers using rolling window IQR
- Generate data quality reports
- Create preprocessing visualizations
**LLM Integration**: Uses LLM to analyze data quality and recommend preprocessing strategies
**Output**: Cleaned data, quality metrics, and preprocessing visualizations
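The rolling-window IQR step mentioned above can be sketched as follows; the window size, the 1.5×IQR fences, and clipping (rather than dropping) outliers are illustrative assumptions, not the project's actual parameters:

```python
import pandas as pd

def rolling_iqr_clip(series: pd.Series, window: int = 24, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] computed over a rolling window."""
    q1 = series.rolling(window, min_periods=1).quantile(0.25)
    q3 = series.rolling(window, min_periods=1).quantile(0.75)
    iqr = q3 - q1
    # Series.clip accepts per-element bounds aligned by index.
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

s = pd.Series([1.0, 1.1, 0.9, 50.0, 1.0, 1.2])
cleaned = rolling_iqr_clip(s, window=4)
```

Note that an extreme value inflates the IQR of its own window, so clipping dampens outliers rather than removing them entirely.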
### 2. AnalysisAgent
**Purpose**: Comprehensive data analysis and insights
**Responsibilities**:
- Analyze data characteristics (trend, seasonality, stationarity)
- Generate statistical summaries
- Identify patterns and anomalies
- Provide forecasting readiness assessment
- Create analysis visualizations
**LLM Integration**: Uses LLM to interpret data patterns and provide insights
**Output**: Analysis report with key findings and recommendations
### 3. ValidationAgent
**Purpose**: Model selection and hyperparameter optimization
**Responsibilities**:
- Select best models based on data characteristics
- Optimize hyperparameters for each selected model
- Evaluate models on validation data
- Rank models by performance
- Generate validation reports
**LLM Integration**: Uses LLM to select appropriate models and hyperparameters
**Output**: Selected models with optimized hyperparameters and validation scores
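The ranking step can be as simple as sorting candidates by a validation metric; the model names and scores below are hypothetical:

```python
def rank_models(val_scores):
    """Rank candidate models by validation MAE (lower is better)."""
    return sorted(val_scores, key=val_scores.get)

# Hypothetical validation MAE per candidate model.
scores = {"ARIMA": 0.42, "LSTM": 0.35, "Prophet": 0.51}
ranking = rank_models(scores)  # best model first
```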
### 4. ForecastAgent
**Purpose**: Generate predictions and ensemble forecasts
**Responsibilities**:
- Train models with optimized hyperparameters
- Generate individual model predictions
- Create ensemble predictions using multiple methods
- Calculate confidence intervals
- Generate forecast visualizations
- Compute performance metrics
**LLM Integration**: Uses LLM to decide ensemble integration strategies
**Output**: Individual and ensemble predictions, metrics, and visualizations
### 5. ReportAgent
**Purpose**: Generate comprehensive final reports
**Responsibilities**:
- Synthesize all experiment results
- Create executive summaries
- Generate actionable recommendations
- Produce final visualizations
- Save results in multiple formats
**LLM Integration**: Uses LLM to generate human-readable reports
**Output**: Comprehensive experiment report with insights and recommendations
## Workflow Execution
```
Data Input → PreprocessAgent → AnalysisAgent → ValidationAgent → ForecastAgent → ReportAgent → Final Output
    ↓              ↓                 ↓                 ↓                ↓              ↓
 Raw Data    Cleaned Data    Analysis Report   Selected Models    Predictions   Final Report
```
### Key Features of the Workflow:
1. **State Management**: Each agent receives and updates a shared state object
2. **Error Recovery**: Built-in retry mechanisms and fallback strategies
3. **Rate Limiting**: Automatic delays and retries for API rate limits
4. **Progress Monitoring**: Real-time progress tracking and timing information
5. **Result Aggregation**: Automatic aggregation of results across multiple data slices
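The shared-state pattern in feature 1 can be sketched as plain functions over a dict; the field names here are hypothetical, and in the real system the hand-off is orchestrated by LangGraph rather than a simple loop:

```python
from typing import Any, Dict

def preprocess(state: Dict[str, Any]) -> Dict[str, Any]:
    # Each agent reads what it needs and returns an updated copy of the state.
    cleaned = [x for x in state["raw_data"] if x is not None]
    return {**state, "cleaned_data": cleaned}

def analyze(state: Dict[str, Any]) -> Dict[str, Any]:
    return {**state, "analysis": {"n_points": len(state["cleaned_data"])}}

state: Dict[str, Any] = {"raw_data": [1.0, None, 2.0, 3.0]}
for agent in (preprocess, analyze):
    state = agent(state)
```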
## Project Structure
```
time_series_agent/
│
├── agents/                      # Agent modules
│   ├── preprocess_agent.py      # Data preprocessing agent
│   ├── analysis_agent.py        # Data analysis agent
│   ├── validation_agent.py      # Model validation agent
│   ├── forecast_agent.py        # Forecasting agent
│   ├── report_agent.py          # Report generation agent
│   └── memory.py                # Memory management module
│
├── graph/                       # Workflow orchestration
│   └── agent_graph.py           # LangGraph workflow definition
│
├── utils/                       # Utility modules
│   ├── data_utils.py            # Data utilities
│   ├── visualization_utils.py   # Visualization utilities
│   ├── file_utils.py            # File management utilities
│   └── model_library.py         # Model implementations
│
├── config/                      # Configuration
│   └── default_config.py        # Default configuration
│
├── main.py                      # Main entry point
├── requirements.txt             # Dependencies
└── README.md                    # This file
```
## Output Structure
```
results/
├── reports/                     # Generated reports
│   ├── complete_time_series_report_YYYYMMDD_HHMMSS.json
│   └── aggregated_forecast_results_YYYYMMDD_HHMMSS.json
├── preprocess/                  # Preprocessing outputs
│   ├── visualizations/
│   └── analysis_reports/
├── forecast/                    # Forecasting outputs
│   ├── visualizations/
│   └── prediction_results/
└── logs/                        # Execution logs
```
## Supported Models
### Statistical Models
- ARMA, ARIMA
- Exponential Smoothing
- TBATS, Theta
- Prophet
### Machine Learning Models
- Linear Regression
- Random Forest
- SVR (Support Vector Regression)
- Gradient Boosting
- XGBoost, LightGBM
### Deep Learning Models
- LSTM
- Neural Networks
- Transformer
### Ensemble Methods
- Simple Average
- Weighted Average
- Median
- Trimmed Mean
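The four ensemble methods listed above can be sketched with NumPy; the function signature below is illustrative, not the project's API:

```python
import numpy as np

def combine(preds, method="simple_average", weights=None, trim=0.1):
    """Combine predictions of shape (n_models, horizon) into one forecast."""
    preds = np.asarray(preds, dtype=float)
    if method == "simple_average":
        return preds.mean(axis=0)
    if method == "weighted_average":
        w = np.asarray(weights, dtype=float)
        return (w[:, None] * preds).sum(axis=0) / w.sum()
    if method == "median":
        return np.median(preds, axis=0)
    if method == "trimmed_mean":
        # Drop the k most extreme models at each horizon step, then average.
        k = int(trim * preds.shape[0])
        s = np.sort(preds, axis=0)
        return s[k:preds.shape[0] - k].mean(axis=0)
    raise ValueError(f"unknown method: {method}")
```

In TSci the choice among these strategies is itself delegated to the LLM in the Forecaster step.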
## Configuration Options
### Basic Configuration
```python
config = {
    "data_path": "../dataset/ETT-small/ETTh1.csv",
    "num_slices": 10,
    "input_length": 512,
    "horizon": 96,
    "k_models": 3,
    "llm_model": "gpt-4o",
    "debug": True,
}
```
### Advanced Configuration
```python
config = {
    # Data processing
    "preprocess": {
        "missing_value_strategy": "interpolate",
        "outlier_strategy": "clip",
        "normalization": False,
    },
    # Model selection
    "models": {
        "available_models": ["ARMA", "LSTM", "RandomForest"],
        "ensemble_method": "weighted_average",
        "hyperparameter_optimization": True,
    },
    # Visualization
    "visualization": {
        "figure_size": (12, 8),
        "save_format": "png",
        "show_plots": False,
    },
}
```
## Troubleshooting
### Common Issues
1. **API Rate Limiting**
- The system automatically handles rate limits with retries
- Increase delays between slices if needed
- Check your OpenAI API usage limits
2. **Memory Issues**
- Reduce `num_slices` parameter
- Use smaller `input_length`
- Monitor system memory usage
3. **Model Import Errors**
- Install missing dependencies: `pip install prophet tbats`
- Some models have optional dependencies
- Check the model library for requirements
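The automatic rate-limit handling described in item 1 typically amounts to exponential backoff with jitter. This is a generic sketch, not the project's actual retry code, and in practice you would catch the API's specific rate-limit exception rather than `Exception`:

```python
import random
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn() with exponential backoff plus jitter, as one might wrap an LLM call."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # in practice: catch the provider's RateLimitError
            if attempt == max_retries - 1:
                raise
            delay = base_delay * 2 ** attempt + random.uniform(0, 0.1)
            time.sleep(delay)
```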
### Debug Mode
Enable debug mode for detailed logging:
```bash
python main.py --debug --verbose
```
## Performance Optimization
### For Large Datasets
- Reduce `num_slices` to process fewer slices
- Use smaller `input_length` for faster processing
- Enable parallel processing in configuration
### For API Rate Limits
- Increase delays between agent calls
- Use fallback strategies when API fails
- Monitor API usage and adjust accordingly
## Contact
For questions, feedback, or collaboration opportunities, please contact:
**Email**: [haz118@ucsd.edu](mailto:haz118@ucsd.edu), [chenyu.you@stonybrook.edu](mailto:chenyu.you@stonybrook.edu)
## License
MIT License
## Citation
```
@misc{zhao2025timeseriesscientistgeneralpurposeaiagent,
      title={TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis},
      author={Haokun Zhao and Xiang Zhang and Jiaqi Wei and Yiwei Xu and Yuting He and Siqi Sun and Chenyu You},
      year={2025},
      eprint={2510.01538},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.01538},
}
```