# tslm

**Repository Path**: ring24/tslm

## Basic Information

- **Project Name**: tslm
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-28
- **Last Updated**: 2026-01-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# tsrlm: Time-Series Report Language Model (SFT + optional DPO)

This repo is a **minimal, engineering-first** skeleton for:
- reading your generated time-series→text JSONL,
- encoding time series with a PatchTST-style patch encoder (default),
- bridging encoder outputs into a Causal LLM via **prefix embeddings** (works with most HF causal LMs),
- running **SFT** training, and
- leaving clear interfaces for ablations (RevIN on/off, encoder swap, bridge swap, LLM swap, sliding windows, etc.).
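
To make the encoder step in the list above concrete, here is a minimal plain-Python sketch of what a PatchTST-style front end does to a univariate series before any learned layers: per-sample normalization (the normalization half of RevIN, without its learnable affine parameters or the later de-normalization) followed by cutting the series into overlapping patches. The function names and the values `patch_len=16` and `stride=8` are illustrative assumptions, not taken from this repo's code or configs.

```python
import math
from typing import List


def instance_normalize(values: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one series to zero mean / unit variance (RevIN-style, no affine)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = math.sqrt(var + eps)
    return [(v - mean) / scale for v in values]


def make_patches(values: List[float], patch_len: int = 16, stride: int = 8) -> List[List[float]]:
    """Cut a series into overlapping patches.

    Assumption of this sketch: trailing values that do not fill a whole
    patch are dropped (no end padding).
    """
    if len(values) < patch_len:
        raise ValueError("series is shorter than one patch")
    last_start = len(values) - patch_len
    return [values[s:s + patch_len] for s in range(0, last_start + 1, stride)]


series = [math.sin(0.1 * t) for t in range(256)]
patches = make_patches(instance_normalize(series))
print(len(patches), len(patches[0]))
```

With these settings a length-256 series yields 31 patches of 16 values each. In a full model each patch would then be linearly embedded and passed through the encoder.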

> Note: the cross-attention bridge and Chronos-2 encoder are included as *interfaces / stubs*.
> The prefix bridge + PatchTST encoder path is complete and intended as your first reproducible baseline.
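
The prefix-bridge idea can be illustrated without any deep-learning dependency. The sketch below assumes the encoder outputs have already been projected to the LLM hidden size, and shows the bookkeeping needed when those vectors are placed in front of the text-token embeddings: the attention mask is extended, and the labels for prefix positions are set to `-100`, the value that the loss in Hugging Face causal LMs ignores. `prepend_prefix` and its arguments are hypothetical names for illustration; the repo's actual wrapper may organize this differently.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # label value skipped by the loss in Hugging Face causal LMs


def prepend_prefix(
    prefix: List[List[float]],
    token_embeds: List[List[float]],
    attention_mask: List[int],
    labels: List[int],
) -> Tuple[List[List[float]], List[int], List[int]]:
    """Concatenate prefix vectors in front of token embeddings for one sample."""
    hidden = len(token_embeds[0])
    if any(len(p) != hidden for p in prefix):
        raise ValueError("prefix vectors must match the LLM hidden size")
    n = len(prefix)
    embeds = prefix + token_embeds
    mask = [1] * n + attention_mask           # prefix positions are always attended to
    new_labels = [IGNORE_INDEX] * n + labels  # no loss is computed on prefix positions
    return embeds, mask, new_labels


prefix = [[0.0] * 4 for _ in range(3)]  # 3 time-series prefix vectors, hidden size 4
tokens = [[1.0] * 4 for _ in range(2)]  # 2 text-token embeddings
embeds, mask, labels = prepend_prefix(prefix, tokens, [1, 1], [11, 12])
print(len(embeds), mask, labels)
```

Because the combined sequence is passed as embeddings rather than token ids, this approach only requires that the LLM accept an `inputs_embeds` argument, which is why it works with most HF causal LMs.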

## 1) Install

```bash
pip install -r requirements.txt
```

If you want LoRA:
```bash
pip install peft
```

## 2) Data format (expected)

We train from a JSONL file where each line is one sample.

Minimal fields:
```json
{
  "id": "UCR/XYZ/train/000123",
  "values": [0.1, 0.2, ...],          // or [[...],[...]] for multivariate
  "text": "Chinese description to generate…",
  "stats": {"mean": 0.0, "std": 1.0, "min": -1.2, "max": 2.3, "length": 256},
  "claims": [ {"type": "global_trend_label", "data": {"label":"down"}}, ... ]
}
```

See: `src/tsrlm/data/format.md`.
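
As a rough illustration of consuming this format, the sketch below reads JSONL line by line, checks for required keys, and wraps univariate `values` in an outer list so that both layouts are nested. Two assumptions are made here and are not confirmed by the format description: only `id`, `values`, and `text` are treated as mandatory, and nested `values` are passed through unchanged (whether the outer axis is channels or time steps is not specified above). `iter_samples` is a hypothetical helper, not the repo's dataset class.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO

REQUIRED_FIELDS = ("id", "values", "text")  # assumption: treated as mandatory here


def iter_samples(fh: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one validated sample per non-empty JSONL line."""
    for line_no, line in enumerate(fh, start=1):
        line = line.strip()
        if not line:
            continue
        sample = json.loads(line)
        missing = [k for k in REQUIRED_FIELDS if k not in sample]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        values = sample["values"]
        if not values:
            raise ValueError(f"line {line_no}: empty 'values'")
        # Univariate [v0, v1, ...] is wrapped as [[v0, v1, ...]] so both cases are nested.
        if not isinstance(values[0], list):
            sample["values"] = [values]
        yield sample


demo = io.StringIO(
    '{"id": "a", "values": [0.1, 0.2, 0.3], "text": "rising"}\n'
    '{"id": "b", "values": [[1, 2], [3, 4]], "text": "two series"}\n'
)
samples = list(iter_samples(demo))
print([(s["id"], len(s["values"]), len(s["values"][0])) for s in samples])
```

For a real file, replace the `io.StringIO` demo with `open(path, encoding="utf-8")`.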

## 3) Quick start (SFT)

```bash
python -m scripts.train_sft \
  --train_jsonl /path/to/train.jsonl \
  --eval_jsonl /path/to/val.jsonl \
  --llm_name_or_path Qwen/Qwen3-0.6B-Base \
  --output_dir runs/sft_qwen3_0p6b_patchtst_prefix
```

## 4) Project structure

- `src/tsrlm/data/`: dataset & collator
- `src/tsrlm/models/`: RevIN, PatchTST encoder, prefix bridge, model wrapper
- `scripts/`: training & evaluation entrypoints

## 5) What you should edit first

- `scripts/prepare_jsonl_adapter.py`: adapt it to convert **your current generator JSON** into the expected JSONL.
- `configs/*.yaml`: your ablation configs.
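
What an adapter needs to do depends on your generator's output, which is not described here. As a sketch only, the example below assumes a hypothetical generator record with keys `series` and `report`, and maps it to one JSONL line using the field names from the data-format section. The use of population standard deviation for `std` is also an assumption. Adjust both to match your data.

```python
import json
import statistics
from typing import Any, Dict, List


def compute_stats(values: List[float]) -> Dict[str, float]:
    """Build the `stats` object using the key names from the data-format example."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # assumption: population std
        "min": min(values),
        "max": max(values),
        "length": len(values),
    }


def adapt_record(raw: Dict[str, Any], sample_id: str) -> str:
    """Map one generator record (hypothetical keys `series`, `report`) to a JSONL line."""
    values = [float(v) for v in raw["series"]]
    sample = {
        "id": sample_id,
        "values": values,
        "text": raw["report"],
        "stats": compute_stats(values),
        "claims": raw.get("claims", []),
    }
    # ensure_ascii=False keeps non-ASCII report text readable in the output file.
    return json.dumps(sample, ensure_ascii=False)


line = adapt_record({"series": [1, 2, 3, 4], "report": "steady rise"}, "demo/000001")
print(line)
```

Writing one `adapt_record(...)` result per line to a UTF-8 text file gives a file in the shape the training script expects.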