# decision_tree

**Repository Path**: az13js/decision_tree

## Basic Information

- **Project Name**: decision_tree
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: WTFPL
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-05
- **Last Updated**: 2026-04-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

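# Decision Tree Learning Demo

A decision tree learning project implemented in pure Python, intended for learning and understanding the core principles of the decision tree algorithm.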

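## Features

- **Zero third-party dependencies**: the core algorithm uses only the Python standard library (`math`, `collections`)
- **ID3 algorithm**: selects features by Information Gain, computed from Shannon Entropy
- **Categorical features**: works with datasets whose features are discrete/categorical
- **Configurable maximum depth**: the `max_depth` parameter limits tree complexity to help prevent overfitting
- **Thorough test coverage**: 27 test cases covering the core functions, edge cases, and error handling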

## Quick Start

### Environment Setup

```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

### Running the Example

```bash
python main.py
```

Example output:

```
Decision Tree Structure:
----------------------------------------
outlook == overcast?
  Yes:   -> yes
  No:   humidity == normal?
    Yes:     wind == strong?
      ...

Predictions:
  ['sunny', 'hot', 'high', 'weak'] -> no
  ['overcast', 'cool', 'normal', 'strong'] -> yes

Training accuracy: 92.9%
```
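The `Training accuracy` line can be read as the fraction of training samples whose predicted label matches the true label. `main.py` is not reproduced here, so the sketch below is an assumption about how that figure is computed; `training_accuracy` is a hypothetical helper, and in practice `predictions` would come from something like `tree.predict_batch(data)`.

```python
def training_accuracy(predictions, labels):
    """Return the fraction of predictions that equal the true labels (hypothetical helper)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up example: 13 of 14 predictions are correct.
preds = ["yes"] * 13 + ["no"]
truth = ["yes"] * 14
print(f"Training accuracy: {training_accuracy(preds, truth):.1%}")  # Training accuracy: 92.9%
```

Formatting with `:.1%` rounds to one decimal place, which is how 13/14 (0.92857...) would display as `92.9%`.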

### Using Your Own Data

```python
from src.decision_tree import DecisionTree

# Training data (categorical features)
data = [
    ["sunny", "hot", "high", "weak"],
    ["sunny", "hot", "high", "strong"],
    ["overcast", "hot", "high", "weak"],
    ["rainy", "mild", "high", "weak"],
    # ... more samples
]
labels = ["no", "no", "yes", "yes"]  # one label per sample
feature_names = ["outlook", "temperature", "humidity", "wind"]

# Train the model
tree = DecisionTree(max_depth=5)
tree.fit(data, labels, feature_names)

# Predict a single sample
prediction = tree.predict(["sunny", "cool", "normal", "weak"])
print(prediction)  # -> "yes" or "no"

# Batch prediction
predictions = tree.predict_batch(data)

# Print the tree structure
tree.print_tree()
```
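The example output shows each internal node asking a `feature == value?` question with a Yes and a No branch. A prediction walks from the root, answers each question for the sample, and stops at a leaf. The sketch below illustrates that traversal; the dict-based node layout, the `predict_one` function, and the use of feature indices instead of names are hypothetical and not necessarily what `src/decision_tree.py` does.

```python
# Hypothetical node layout (for illustration only):
#   leaf:     {"label": "yes"}
#   internal: {"feature": <column index>, "value": <category>, "yes": <node>, "no": <node>}


def predict_one(node, sample):
    """Follow yes/no branches until a leaf is reached, then return its label."""
    while "label" not in node:
        answer = sample[node["feature"]] == node["value"]
        node = node["yes"] if answer else node["no"]
    return node["label"]


# Toy tree: the root question matches the first line of the example output
# ("outlook == overcast?"); everything below it is invented for illustration.
toy_tree = {
    "feature": 0,  # outlook
    "value": "overcast",
    "yes": {"label": "yes"},
    "no": {
        "feature": 2,  # humidity
        "value": "normal",
        "yes": {"label": "yes"},
        "no": {"label": "no"},
    },
}

print(predict_one(toy_tree, ["overcast", "cool", "normal", "strong"]))  # yes
print(predict_one(toy_tree, ["sunny", "hot", "high", "weak"]))  # no
```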

## Project Structure

```
.
├── README.md
├── requirements.txt
├── AGENTS.md
├── main.py                     # Demo script
├── src/
│   ├── __init__.py
│   └── decision_tree.py        # Core decision tree implementation
└── tests/
    └── test_decision_tree.py   # Test cases
```

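## Algorithm

This project implements the **ID3 (Iterative Dichotomiser 3)** decision tree algorithm:

1. **Shannon entropy**: measures the purity (uncertainty) of a set of samples
2. **Information gain**: the split that reduces entropy the most is chosen
3. **Recursive construction**: starting at the root, the best split is chosen recursively until all samples in a node belong to the same class or the maximum depth is reached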
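The three steps above can be sketched in a few dozen lines of standard-library Python. Entropy is H(S) = -Σ p_k · log2(p_k) over the class proportions p_k, and the information gain of a split is H(S) minus the size-weighted entropy of its branches. This is a minimal sketch, not the code in `src/decision_tree.py`: the names `entropy`, `information_gain`, and `build_tree` and the dict node layout are hypothetical. It also assumes binary `feature == value` questions, as the example output suggests, rather than textbook ID3's one-branch-per-value split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, feature, value):
    """Entropy reduction from splitting on the question 'row[feature] == value'."""
    yes = [y for row, y in zip(rows, labels) if row[feature] == value]
    no = [y for row, y in zip(rows, labels) if row[feature] != value]
    n = len(labels)
    weighted = (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
    return entropy(labels) - weighted


def build_tree(rows, labels, depth=0, max_depth=5):
    """Grow a binary tree; stop when a node is pure, no split helps, or max_depth is hit."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth >= max_depth:
        return {"label": majority}

    # Candidate questions: every (feature index, observed value) pair, sorted for determinism.
    candidates = sorted({(f, row[f]) for row in rows for f in range(len(row))})
    best_gain, best = 0.0, None
    for f, v in candidates:
        gain = information_gain(rows, labels, f, v)
        if gain > best_gain:
            best_gain, best = gain, (f, v)
    if best is None:  # no question reduces entropy, so make a leaf
        return {"label": majority}

    f, v = best
    yes_idx = [i for i, row in enumerate(rows) if row[f] == v]
    no_idx = [i for i, row in enumerate(rows) if row[f] != v]
    return {
        "feature": f,
        "value": v,
        "yes": build_tree([rows[i] for i in yes_idx], [labels[i] for i in yes_idx], depth + 1, max_depth),
        "no": build_tree([rows[i] for i in no_idx], [labels[i] for i in no_idx], depth + 1, max_depth),
    }


rows = [
    ["sunny", "hot", "high", "weak"],
    ["sunny", "hot", "high", "strong"],
    ["overcast", "hot", "high", "weak"],
    ["rainy", "mild", "high", "weak"],
]
labels = ["no", "no", "yes", "yes"]
print(build_tree(rows, labels))
# {'feature': 0, 'value': 'sunny', 'yes': {'label': 'no'}, 'no': {'label': 'yes'}}
```

On these four samples, asking `outlook == sunny?` separates the classes perfectly, so its gain equals the full 1.0 bit of starting entropy. As a further check, `entropy(["yes"] * 9 + ["no"] * 5)` comes out to about 0.940 bits.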

## Running Tests

```bash
pytest              # Run all tests
pytest -v           # Verbose output
pytest tests/test_decision_tree.py::TestEntropy  # Run a single test class
```

## Linting

```bash
ruff check .        # Lint the code
ruff check --fix .  # Auto-fix lint issues
ruff format .       # Format the code
```