# 字幕处理工具 (Subtitle Processing Tool)

**Repository Path**: oa-officeweb-crawler/subtitle-processing-tool

## Basic Information

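- **Project Name**: 字幕处理工具 (Subtitle Processing Tool)
- **Description**:
  - Strengths: fully reflects the project's functionality, covering processing, conversion, and other operations
  - Suitable for: feature-rich projects that emphasize breadth of functionality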
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-13
- **Last Updated**: 2026-05-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 字幕处理工具 (Subtitle Processing Tool)

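A comprehensive video processing system with advanced transcription, subtitle generation, and content analysis, featuring a Chinese-language interface and audio export.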

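## Features

### Core Features
- **Enhanced transcription**: high-accuracy speech-to-text with support for domain-specific vocabulary
- **Multi-format support**: SRT, VTT, ASS, JSON, and other subtitle formats
- **Content overview**: AI-driven content summarization and topic extraction
- **Batch processing**: efficient processing of multiple video files
- **Performance optimization**: memory management and resource monitoring
- **Audio export**: single and batch audio export in WAV and MP3 formats
- **Audio to text**: converts audio files to text and subtitles; supports MP3, WAV, FLAC, AAC, OGG, M4A, and other audio formats
- **Video compression**: single and batch video compression with adjustable resolution, bitrate, frame rate, and other parameters
- **Full-text export**: generates a `_content.txt` file containing the complete recognized text
- **Video recording and ripping**: supports a script loading mechanism, screen recording, and hardware-level capture
- **SZ to MP4**: converts SZ files to MP4 with synchronized audio and video, producing files that conform to industry-standard formats
- **SZ audio extraction**: extracts audio from SZ files in MP3, WAV, AAC, FLAC, and other audio formats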
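The full-text export can be illustrated with a short sketch. It is not the project's implementation: the function name `export_content_txt`, writing one segment per line, and UTF-8 encoding are assumptions; only the `_content.txt` suffix comes from this README.

```python
from pathlib import Path
from typing import Iterable


def export_content_txt(media_path: str, segment_texts: Iterable[str], output_dir: str) -> Path:
    """Write all recognized segment texts to <stem>_content.txt and return its path."""
    # Skip empty segments and strip surrounding whitespace from each one.
    lines = [text.strip() for text in segment_texts if text and text.strip()]
    out_path = Path(output_dir) / f"{Path(media_path).stem}_content.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Explicit UTF-8 keeps Chinese text intact regardless of the platform's default encoding.
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path
```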

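### Advanced Features
- **Noise reduction**: audio preprocessing to improve transcription accuracy
- **Model ensemble**: multiple Whisper models combined for higher accuracy
- **Sentiment analysis**: identifies the emotional tone of transcribed content
- **Topic extraction**: automatically identifies key themes and topics
- **Quality validation**: subtitle quality checks and validation
- **Real-time progress**: detailed progress tracking and status updates
- **Chinese interface**: a fully Chinese control panel, including all menus, buttons, prompts, and help documentation
- **Theme support**: switch between light and dark themes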

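## Installation

### Prerequisites
- Python 3.8 or later
- FFmpeg (for video/audio processing)
- CUDA-compatible GPU (optional, for faster processing)

### Dependencies

Install the required packages: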

```bash
pip install -r requirements.txt
```

### Optional Dependencies

For enhanced functionality, install the additional packages:

```bash
# Advanced NLP features
pip install jieba textblob scikit-learn

# GPU acceleration
pip install torch torchvision

# Enhanced audio processing
pip install pydub

# GUI
pip install PyQt5
```
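Because most of these packages are optional, it can help to check which ones are actually importable, and whether FFmpeg is on `PATH`, before launching the tool. A minimal standard-library sketch; the helper `check_environment` is hypothetical and not part of this project.

```python
import importlib.util
import shutil
from typing import Dict

# Import names of the optional packages above (scikit-learn imports as "sklearn").
OPTIONAL_MODULES = ["jieba", "textblob", "sklearn", "torch", "torchvision", "pydub", "PyQt5"]


def check_environment() -> Dict[str, bool]:
    """Report which optional modules are importable and whether ffmpeg/ffprobe are on PATH."""
    # find_spec locates a top-level module without importing it; None means not installed.
    status = {name: importlib.util.find_spec(name) is not None for name in OPTIONAL_MODULES}
    for tool in ("ffmpeg", "ffprobe"):
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12} {'OK' if ok else 'missing'}")
```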

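## Core Tech Stack

### Programming Language
- **Python 3.8+**: the project's primary development language, used for all core functionality and the GUI

### Main Frameworks and Libraries

#### 1. Speech Processing
- **Whisper**: OpenAI's speech recognition model, used for high-accuracy speech-to-text
  - **Use case**: core transcription, converting video/audio files to text
  - **Version**: latest stable release
  - **Implementation**: integrated via the `EnhancedTranscriber` class in `src/enhanced_transcription.py`

#### 2. Video/Audio Processing
- **FFmpeg**: a powerful multimedia processing tool
  - **Use case**: video-to-audio extraction, audio segmentation, format conversion, video compression
  - **Version**: 4.0+
  - **Implementation**: invoked from the command line for audio extraction, format conversion, and video compression
- **pydub**: a Python audio processing library
  - **Use case**: audio segmentation, noise handling
  - **Version**: 0.25.1+
  - **Implementation**: used for audio preprocessing and enhancement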
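As a rough illustration of the command-line FFmpeg usage described above, this is how an audio-extraction call might be assembled. It is a sketch, not the project's code: the function names are hypothetical, and the 16 kHz mono PCM target is an assumption chosen because Whisper resamples its input to 16 kHz mono anyway.

```python
import subprocess
from typing import List


def build_extract_audio_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,
        "-vn",                    # discard video
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(build_extract_audio_cmd(video_path, wav_path), check=True, capture_output=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces or Chinese characters.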

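#### 3. GUI
- **PyQt5**: Python bindings for Qt, used to build cross-platform GUI applications
  - **Use case**: user interface, video preview, progress display
  - **Version**: 5.15.0+
  - **Implementation**: the `MainWindow` class in `gui/enhanced_main_window.py`

#### 4. Text Processing and NLP
- **jieba**: a Chinese word segmentation library
  - **Use case**: Chinese text analysis, keyword extraction
  - **Version**: 0.42.1+
  - **Implementation**: segments and analyzes Chinese content
- **textblob**: a text processing library
  - **Use case**: sentiment analysis, text classification
  - **Version**: 0.17.1+
  - **Implementation**: performs sentiment analysis on transcribed content
- **scikit-learn**: a machine learning library
  - **Use case**: topic extraction, text clustering
  - **Version**: 1.0.0+
  - **Implementation**: used for content analysis and topic extraction

#### 5. Performance Optimization
- **torch**: the PyTorch deep learning framework
  - **Use case**: GPU acceleration, model inference
  - **Version**: 1.8.0+
  - **Implementation**: GPU acceleration for the Whisper models
- **memory_profiler**: a memory profiling tool
  - **Use case**: monitoring and optimizing memory usage
  - **Version**: 0.58.0+
  - **Implementation**: used for performance monitoring and memory optimization

### Development Tools
- **Git**: version control system
  - **Use case**: source code management, version control
  - **Version**: 2.0+
- **pytest**: testing framework
  - **Use case**: unit tests, integration tests
  - **Version**: 6.0.0+
- **black**: code formatter
  - **Use case**: consistent code style
  - **Version**: 21.0.0+
- **flake8**: code quality checker
  - **Use case**: style checking
  - **Version**: 4.0.0+
- **mypy**: type checker
  - **Use case**: static type checking
  - **Version**: 0.900+

### External Services and Tools
- **CUDA**: NVIDIA's parallel computing platform
  - **Use case**: GPU acceleration
  - **Version**: 10.2+
  - **Implementation**: used through PyTorch to accelerate model inference on the GPU
- **ffprobe**: FFmpeg's media information tool
  - **Use case**: analyzing video/audio files
  - **Version**: must match the FFmpeg version
  - **Implementation**: retrieves detailed information about media files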
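A sketch of how ffprobe can report media details as JSON. The `-v error`, `-show_format`, `-show_streams`, and `-of json` options are standard ffprobe flags; the helper names `probe`, `_first_stream`, and `summarize` are hypothetical and only illustrate parsing the output.

```python
import json
import subprocess
from typing import Any, Dict, List


def probe(path: str) -> Dict[str, Any]:
    """Run ffprobe and return its JSON output (container format + streams) as a dict."""
    cmd = ["ffprobe", "-v", "error", "-show_format", "-show_streams", "-of", "json", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def _first_stream(streams: List[Dict[str, Any]], kind: str) -> Dict[str, Any]:
    """Return the first stream whose codec_type is kind ('video' or 'audio'), or {}."""
    return next((s for s in streams if s.get("codec_type") == kind), {})


def summarize(info: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the duration (ffprobe reports it as a string) and the first video/audio codec."""
    streams = info.get("streams", [])
    return {
        "duration": float(info.get("format", {}).get("duration", 0.0)),
        "video_codec": _first_stream(streams, "video").get("codec_name"),
        "audio_codec": _first_stream(streams, "audio").get("codec_name"),
    }
```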

### Tech Stack Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Application layer                      │
├─────────────────────────────────────────────────────────────┤
│  GUI (PyQt5)      │  CLI tools        │  API (future)       │
├─────────────────────────────────────────────────────────────┤
│                         Core layer                          │
├─────────────────────────────────────────────────────────────┤
│  Transcription (Whisper)     │  Audio (FFmpeg/pydub)        │
│  Content analysis (NLP)      │  Subtitle format conversion  │
│  Batch processing            │  Performance optimization    │
│  Video compression (FFmpeg)  │  Audio export                │
├─────────────────────────────────────────────────────────────┤
│                         Base layer                          │
├─────────────────────────────────────────────────────────────┤
│  Python 3.8+                 │  Operating system            │
│  GPU acceleration (CUDA)     │  File system                 │
└─────────────────────────────────────────────────────────────┘
```

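### Dependency Version Management

The project manages dependency versions in `requirements.txt` to keep environments consistent:

```
# Core dependencies
openai-whisper  # OpenAI Whisper's PyPI name; the package called "whisper" is an unrelated project
numpy==1.21.0
pandas==1.3.0

# Audio processing
pydub==0.25.1

# GUI
PyQt5==5.15.4

# NLP
jieba==0.42.1
textblob==0.17.1
scikit-learn==1.0.2

# Performance optimization
torch==1.9.0
memory_profiler==0.58.0

# Testing
pytest==6.2.5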
```
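To spot drift between pinned versions and the active environment, a small check with `importlib.metadata` (Python 3.8+) is enough. This sketch is a hypothetical helper, not part of the project, and it only considers exact `name==version` lines; unpinned entries, ranges, and environment markers are ignored.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, ignoring comments, blank lines, and unpinned entries."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline and full-line comments
        if "==" in line:
            name, ver = line.split("==", 1)
            pins[name.strip()] = ver.strip()
    return pins


def compare_installed(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned, installed)} for each package whose installed version differs."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Usage: `compare_installed(parse_pins(open("requirements.txt", encoding="utf-8").read()))`.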

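### Rationale for Technology Choices

1. **Whisper**: chosen for its high accuracy in speech recognition, including good support for Chinese
2. **FFmpeg**: the industry-standard multimedia processing tool, powerful and stable
3. **PyQt5**: a cross-platform GUI framework with a rich set of UI components and a good user experience
4. **Python**: chosen for its rich ecosystem and wide adoption in AI/ML
5. **CUDA**: GPU acceleration that significantly speeds up processing, especially with large models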

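### Extensibility

The project is designed so that its tech stack can be extended:

- **Modular design**: core functionality is split into modules, so individual components can be replaced or upgraded
- **Plugin system**: supports adding new transcription models, subtitle formats, or analysis features
- **API design**: API interfaces are reserved for integration with other systems
- **Containerization**: supports Docker deployment for running in different environments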

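### Maintenance

- **Dependency management**: update dependency versions regularly for security and performance
- **Compatibility testing**: test across Python versions and operating systems
- **Documentation**: keep tech-stack documentation up to date
- **Performance monitoring**: continuously monitor system performance and tune the stack configuration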

## Quick Start

### Basic Usage

```python
from src.enhanced_transcription import EnhancedTranscriber
from src.enhanced_overview import EnhancedOverviewGenerator
from src.subtitle_formats import SubtitleFormatConverter

# Initialize the components
transcriber = EnhancedTranscriber(model_size="base")
overview_generator = EnhancedOverviewGenerator()
format_converter = SubtitleFormatConverter()

# Transcribe a video
result = transcriber.transcribe_segment("video.mp4")

# Generate an overview
overview = overview_generator.generate_overview(result.full_text)

# Convert to different formats
srt_content = format_converter.to_srt(result.segments)
vtt_content = format_converter.to_vtt(result.segments)
```

### GUI Application

Launch the enhanced GUI application:

```bash
python gui/enhanced_main_window.py
```

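### Video Compression

```python
from gui.enhanced_main_window import VideoCompressionThread, VideoBatchCompressionThread
from src.video_compression import get_optimal_compression_config

# Compression settings chosen from the input video's characteristics
config = get_optimal_compression_config("video.mp4")

# Compress a single video
thread = VideoCompressionThread(
    video_path="video.mp4",
    output_dir="outputs",
    config=config
)
thread.start()
thread.wait()

# Batch-compress videos
thread = VideoBatchCompressionThread(
    video_paths=["video1.mp4", "video2.mp4"],
    output_dir="outputs",
    config=config
)
thread.start()
thread.wait()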
```

### Audio Export

```python
from gui.enhanced_main_window import AudioExportThread, BatchAudioExportThread

# Export audio from a single video
thread = AudioExportThread(
    video_path="video.mp4",
    output_dir="outputs",
    format="wav"
)
thread.start()
thread.wait()

# Batch audio export
thread = BatchAudioExportThread(
    video_paths=["video1.mp4", "video2.mp4"],
    output_dir="outputs",
    format="mp3"
)
thread.start()
thread.wait()
```

### SZ to MP4

```python
from gui.enhanced_main_window import SZToMP4Thread

# Convert an SZ file to MP4
thread = SZToMP4Thread(
    sz_file_path="file.sz",
    output_dir="outputs",
    video_codec="libx264",
    audio_codec="aac"
)
thread.start()
thread.wait()
```

### Extracting Audio from SZ Files

```python
from gui.enhanced_main_window import SZToAudioThread

# Extract audio from an SZ file
thread = SZToAudioThread(
    sz_file_path="file.sz",
    output_dir="outputs",
    audio_format="mp3"
)
thread.start()
thread.wait()
```

### Batch Processing

```python
from src.batch_processor import BatchManager

# Initialize the batch manager
batch_manager = BatchManager()

# Process multiple files
job_id = batch_manager.quick_batch(
    input_files=["video1.mp4", "video2.mp4"],
    output_dir="outputs",
    formats=["srt", "vtt", "json"]
)

# Wait for completion
batch_manager.wait_for_completion(job_id)

# Get the results
results = batch_manager.get_job_results(job_id)
```

## Configuration

### Settings

Edit `config/config.py` to customize the default settings:

```python
# Whisper model configuration
WHISPER_MODEL_SIZE = "base"  # tiny, base, small, medium, large

# Audio processing
SILENCE_THRESHOLD = "-30dB"
MIN_SEGMENT_DURATION = 5.0

# Performance
MAX_WORKERS = 4
```
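A sketch of sanity-checking these settings before a run. The helper names are hypothetical, and the checks encode assumptions: the allowed model sizes are the five listed in the comment, and the silence threshold is expected to be a negative value written like `-30dB`.

```python
import re
from typing import List

VALID_MODEL_SIZES = {"tiny", "base", "small", "medium", "large"}


def parse_db(value: str) -> float:
    """Convert a threshold such as '-30dB' to the float -30.0."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*dB\s*", value)
    if not match:
        raise ValueError(f"not a dB value: {value!r}")
    return float(match.group(1))


def validate_config(model_size: str, silence_threshold: str,
                    min_segment_duration: float, max_workers: int) -> List[str]:
    """Return a list of problems; an empty list means the settings look sane."""
    problems = []
    if model_size not in VALID_MODEL_SIZES:
        problems.append(f"unknown WHISPER_MODEL_SIZE {model_size!r}")
    try:
        if parse_db(silence_threshold) > 0:
            problems.append("SILENCE_THRESHOLD should be negative (below full scale)")
    except ValueError as exc:
        problems.append(str(exc))
    if min_segment_duration <= 0:
        problems.append("MIN_SEGMENT_DURATION must be positive")
    if max_workers < 1:
        problems.append("MAX_WORKERS must be at least 1")
    return problems
```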

### Domain Vocabulary

Add domain-specific terms to improve transcription accuracy:

```python
DOMAIN_VOCABULARY = {
    "technology": {
        "zh": ["AI", "ML", "DL", "NLP", "CV"],
        "en": ["artificial intelligence", "machine learning", "deep learning"]
    }
}
```
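One common way to feed domain terms to Whisper is the `initial_prompt` argument of openai-whisper's `transcribe()`, which biases decoding toward the words it contains. This README does not say how `EnhancedTranscriber` uses `DOMAIN_VOCABULARY`, so treat this as an assumed approach; `build_initial_prompt` is a hypothetical helper.

```python
from typing import Dict, List

DOMAIN_VOCABULARY: Dict[str, Dict[str, List[str]]] = {
    "technology": {
        "zh": ["AI", "ML", "DL", "NLP", "CV"],
        "en": ["artificial intelligence", "machine learning", "deep learning"],
    }
}


def build_initial_prompt(domain: str, language: str,
                         vocabulary: Dict[str, Dict[str, List[str]]] = DOMAIN_VOCABULARY) -> str:
    """Join one domain's terms for one language into a prompt string ('' if there are none)."""
    return ", ".join(vocabulary.get(domain, {}).get(language, []))


# Usage with openai-whisper (downloads a model, so it is not run here):
# import whisper
# model = whisper.load_model("base")
# result = model.transcribe("audio.wav", language="zh",
#                           initial_prompt=build_initial_prompt("technology", "zh"))
```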

## Architecture

### Module Structure

```
video_subtitle_project/
    src/
        enhanced_transcription.py    # Advanced transcription engine
        enhanced_overview.py         # Content analysis and summarization
        subtitle_formats.py          # Multi-format subtitle support
        batch_processor.py           # Batch processing system
        performance_optimizer.py     # Performance monitoring
        video_subtitle.py            # Core video processing
        audio_segmentation.py        # Audio segmentation algorithms
        video_compression.py         # Video compression functionality
    gui/
        enhanced_main_window.py      # Modern GUI interface
    config/
        config.py                    # Configuration settings
    tests/
        test_enhanced_modules.py     # Comprehensive test suite
        test_video_compression.py    # Video compression tests
    docs/
        api_reference.md             # API documentation
        user_guide.md                # User guide
```

### Key Components

#### EnhancedTranscriber
- Multi-model ensemble for improved accuracy
- Noise reduction and audio preprocessing
- Domain-specific vocabulary enhancement
- Confidence scoring and quality metrics

#### SubtitleFormatConverter
- Support for SRT, VTT, ASS, JSON formats
- Automatic format detection
- Quality validation and error checking
- Metadata preservation
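Converting SRT to WebVTT mostly comes down to three changes: add the `WEBVTT` header, drop the numeric cue indices, and replace the comma before the milliseconds with a period. The sketch below shows that transformation; it is not `SubtitleFormatConverter`'s code, and it ignores SRT styling tags and VTT cue settings.

```python
import re

# SRT timestamps put a comma before the milliseconds; WebVTT uses a period.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header, drop cue numbers, fix timestamps."""
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # Drop the numeric cue index that SRT places before the timing line.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines or "-->" not in lines[0]:
            continue  # skip malformed blocks
        lines[0] = _SRT_TIME.sub(r"\1.\2", lines[0])
        cues.append("\n".join(lines))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```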

#### BatchProcessor
- Parallel processing of multiple files
- Progress tracking and error handling
- Resource management and optimization
- Result aggregation and reporting
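Parallel processing with per-file error handling, as described for `BatchProcessor`, can be sketched with `concurrent.futures`. `run_batch` and its callback signature are hypothetical. Threads are a reasonable fit when each task mostly waits on an FFmpeg subprocess or on I/O; CPU-bound Python work would usually call for `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Optional, Tuple


def run_batch(files: List[str], process: Callable[[str], str], max_workers: int = 4,
              on_progress: Optional[Callable[[int, int], None]] = None
              ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run process(file) concurrently; return (results, errors), both keyed by file path."""
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process, f): f for f in files}
        # as_completed yields futures as they finish, so progress reflects real completion order.
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one failing file must not abort the whole batch
                errors[path] = f"{type(exc).__name__}: {exc}"
            if on_progress:
                on_progress(done, len(files))
    return results, errors
```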

#### PerformanceOptimizer
- Real-time resource monitoring
- Memory management and cleanup
- CPU optimization and thread management
- Intelligent caching system

## API Reference

### EnhancedTranscriber

```python
from __future__ import annotations

from typing import Optional


class EnhancedTranscriber:
    def __init__(self, model_size: str = "base", domain: str = "general") -> None: ...
    def transcribe_segment(self, audio_path: str, language: Optional[str] = None) -> TranscriptionResult: ...
    def transcribe_with_chunks(self, audio_path: str, chunk_duration: float = 600.0) -> TranscriptionResult: ...
```
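`transcribe_with_chunks` defaults to a `chunk_duration` of 600 seconds. How the project actually splits audio is not documented here; the sketch below shows one plausible scheme, with an optional overlap (an assumption) so that words cut at a boundary appear whole in at least one chunk. `chunk_bounds` is a hypothetical helper.

```python
from typing import List, Tuple


def chunk_bounds(total_duration: float, chunk_duration: float = 600.0,
                 overlap: float = 0.0) -> List[Tuple[float, float]]:
    """Split [0, total_duration) into (start, end) windows of at most chunk_duration seconds.

    Consecutive windows share `overlap` seconds.
    """
    if chunk_duration <= 0 or not 0 <= overlap < chunk_duration:
        raise ValueError("need chunk_duration > 0 and 0 <= overlap < chunk_duration")
    bounds = []
    start = 0.0
    step = chunk_duration - overlap  # how far each window advances
    while start < total_duration:
        end = min(start + chunk_duration, total_duration)
        bounds.append((start, end))
        if end >= total_duration:
            break
        start += step
    return bounds
```

For a 25-minute file, `chunk_bounds(1500.0)` yields three windows: two full 600-second chunks and a final 300-second one.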

### SubtitleFormatConverter

```python
from __future__ import annotations

from typing import Any, Dict, List


class SubtitleFormatConverter:
    def convert(self, input_content: str, input_format: str, output_format: str) -> str: ...
    def parse_srt(self, content: str) -> List[SubtitleEntry]: ...
    def to_vtt(self, entries: List[SubtitleEntry]) -> str: ...
    def validate_entries(self, entries: List[SubtitleEntry]) -> Dict[str, Any]: ...
```
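This README does not specify what `validate_entries` checks. A hypothetical validator might flag cues whose end is not after their start, cues that overlap the previous one, and empty or overly long text. `Entry` stands in for `SubtitleEntry`, and the 42-character limit is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Entry:  # hypothetical stand-in for the project's SubtitleEntry
    start: float  # seconds
    end: float    # seconds
    text: str


def validate(entries: List[Entry], max_chars: int = 42) -> Dict[str, Any]:
    """Collect issues: non-positive durations, overlaps with the previous cue, empty or long text."""
    issues: List[str] = []
    for i, e in enumerate(entries):
        if e.end <= e.start:
            issues.append(f"#{i + 1}: end {e.end} is not after start {e.start}")
        if i > 0 and e.start < entries[i - 1].end:
            issues.append(f"#{i + 1}: overlaps previous cue")
        if not e.text.strip():
            issues.append(f"#{i + 1}: empty text")
        elif len(e.text) > max_chars:
            issues.append(f"#{i + 1}: {len(e.text)} chars exceeds {max_chars}")
    return {"valid": not issues, "count": len(entries), "issues": issues}
```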

### BatchProcessor

```python
from __future__ import annotations

from typing import Any, Callable, Dict, List, Optional


class BatchProcessor:
    def create_job(self, input_files: List[str], output_dir: str, settings: Dict) -> str: ...
    def start_job(self, job_id: str, progress_callback: Optional[Callable] = None) -> bool: ...
    def get_job_status(self, job_id: str) -> Dict[str, Any]: ...
    def get_job_results(self, job_id: str) -> List[BatchResult]: ...
```

### VideoCompression

```python
from src.video_compression import (
    VideoCompressionConfig,
    batch_compress_videos,
    compress_video,
    estimate_compression_size,
    get_optimal_compression_config,
    get_video_info,
)


def progress_func(*args):
    # Placeholder callback; the arguments passed to it are not documented in this README.
    print(*args)


# Create a compression configuration
config = VideoCompressionConfig(
    output_format="mp4",
    resolution="720p",
    bitrate="medium",
    framerate=30,
    crf=23,
    preset="medium",
    audio_bitrate="128k",
    codec="h265"  # h264 or h265
)

# Or get an optimized configuration (parameters chosen automatically from the video's characteristics)
config = get_optimal_compression_config("input.mp4")

# Compress a single video
error = compress_video("input.mp4", "output.mp4", config, progress_callback=progress_func)

# Batch-compress videos
successful, failed = batch_compress_videos(
    ["video1.mp4", "video2.mp4"], "output_dir", config, progress_callback=progress_func
)

# Get video information
info = get_video_info("video.mp4")

# Estimate the compressed size
size = estimate_compression_size("video.mp4", config)
```
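For reference, this is roughly the kind of FFmpeg command such a configuration could translate to. It is a sketch under assumptions: the mapping from `h264`/`h265` to the `libx264`/`libx265` encoders, the `720p` to `scale=-2:720` rule, and AAC audio are guesses, and the sketch uses CRF (quality-based) rate control rather than the `bitrate` preset. `build_compress_cmd` is hypothetical.

```python
from typing import List, Optional

# Target frame heights for resolution presets such as "720p".
HEIGHTS = {"480p": 480, "720p": 720, "1080p": 1080}
ENCODERS = {"h264": "libx264", "h265": "libx265"}


def build_compress_cmd(src: str, dst: str, codec: str = "h265", crf: int = 23,
                       preset: str = "medium", resolution: Optional[str] = "720p",
                       framerate: Optional[int] = 30, audio_bitrate: str = "128k") -> List[str]:
    """Assemble an ffmpeg command for CRF-based re-encoding."""
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", ENCODERS[codec],
           "-crf", str(crf), "-preset", preset]
    if resolution:
        # -2 keeps the aspect ratio and rounds the width to an even number,
        # which 4:2:0 output from libx264/libx265 requires.
        cmd += ["-vf", f"scale=-2:{HEIGHTS[resolution]}"]
    if framerate:
        cmd += ["-r", str(framerate)]
    if codec == "h265":
        cmd += ["-tag:v", "hvc1"]  # lets Apple players recognize HEVC inside MP4
    cmd += ["-c:a", "aac", "-b:a", audio_bitrate, dst]
    return cmd
```

Lower CRF values mean higher quality and larger files; 23 is libx264's default, and libx265 defaults to 28.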

## Performance Optimization

### Memory Management

The system includes automatic memory optimization:

```python
# memory_efficient is assumed to be exported by the same module
from src.performance_optimizer import memory_efficient, start_global_optimization

# Start automatic optimization
start_global_optimization()

# Use memory-efficient decorator
@memory_efficient(max_memory_mb=2048)
def process_large_video(video_path):
    # Processing logic
    pass
```
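A hypothetical re-implementation that shows what a decorator like `memory_efficient` can do with the standard library: measure the call's peak allocation with `tracemalloc`, run a garbage collection afterward, and warn when the peak exceeds the limit. It is not the code in `src.performance_optimizer`, and it only warns; it does not enforce the limit.

```python
import functools
import gc
import tracemalloc
import warnings


def memory_efficient(max_memory_mb: float):
    """Decorator factory: warn if a call's peak traced allocation exceeds max_memory_mb.

    tracemalloc only tracks memory allocated by the Python interpreter; memory used by
    FFmpeg subprocesses, native libraries, or the GPU is not counted. If tracing was
    already active, the peak may include allocations made before the call.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started_here = not tracemalloc.is_tracing()
            if started_here:
                tracemalloc.start()
            baseline, _ = tracemalloc.get_traced_memory()
            try:
                return func(*args, **kwargs)
            finally:
                _, peak = tracemalloc.get_traced_memory()
                if started_here:
                    tracemalloc.stop()
                gc.collect()  # release reference cycles left behind by the call
                used_mb = (peak - baseline) / (1024 * 1024)
                if used_mb > max_memory_mb:
                    warnings.warn(
                        f"{func.__name__} peaked at {used_mb:.1f} MB (limit {max_memory_mb} MB)",
                        ResourceWarning,
                        stacklevel=2,
                    )
        return wrapper
    return decorator
```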

### Caching

Enable intelligent caching for repeated operations:

```python
from src.performance_optimizer import cached_result  # assumed location of the decorator


@cached_result(ttl_seconds=3600, max_size=1000)
def transcribe_audio(audio_path):
    # Transcription logic
    pass
```
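A self-contained sketch of a TTL cache with an LRU size cap, illustrating what `cached_result(ttl_seconds, max_size)` describes. It is not the project's implementation; it assumes hashable arguments and is not thread-safe. If expiry is not needed, the standard library's `functools.lru_cache(maxsize=...)` already covers the size cap.

```python
import functools
import time
from collections import OrderedDict


def cached_result(ttl_seconds: float = 3600, max_size: int = 1000):
    """Memoize results for ttl_seconds, keeping at most max_size entries (least recently used evicted)."""
    def decorator(func):
        cache: "OrderedDict[tuple, tuple]" = OrderedDict()  # key -> (timestamp, value)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            hit = cache.get(key)
            if hit is not None and now - hit[0] < ttl_seconds:
                cache.move_to_end(key)  # mark as most recently used
                return hit[1]
            value = func(*args, **kwargs)  # miss or expired: recompute
            cache[key] = (now, value)
            cache.move_to_end(key)
            while len(cache) > max_size:
                cache.popitem(last=False)  # evict the least recently used entry
            return value

        wrapper.cache_clear = cache.clear
        return wrapper
    return decorator
```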

### Resource Monitoring

Monitor system resources in real-time:

```python
from src.performance_optimizer import get_global_optimizer

optimizer = get_global_optimizer()
report = optimizer.get_performance_report()

print(f"CPU Usage: {report['resources']['current']['cpu_percent']}%")
print(f"Memory Usage: {report['resources']['current']['memory_mb']} MB")
```

## Testing

Run the comprehensive test suite:

```bash
python -m pytest tests/ -v
```

### Smoke Test Suite

Run the comprehensive smoke test to validate core functionality:

```bash
python tests/comprehensive_smoke_test.py
```

The smoke test covers:
- Video to subtitle conversion
- Video compression
- SZ to MP4 conversion
- Audio to text conversion
- Command-line tools
- Configuration files
- Core module imports

### Test Coverage

- Unit tests for all core modules
- Integration tests for batch processing
- Performance benchmarks
- GUI testing with Qt Test framework
- Smoke tests for core functionality validation

### Test Categories

```bash
# Run unit tests only
python -m pytest tests/test_units/ -v

# Run integration tests
python -m pytest tests/test_integration/ -v

# Run performance tests
python -m pytest tests/test_performance/ -v

# Run video compression tests
python -m pytest tests/test_video_compression.py -v

# Run smoke tests
python tests/comprehensive_smoke_test.py

# Run all tests with coverage (the --cov options require the pytest-cov plugin)
python -m pytest tests/ --cov=src --cov-report=html
```

## Troubleshooting

### Common Issues

#### Low Transcription Accuracy
1. Ensure audio quality is good (clear speech, minimal background noise)
2. Use appropriate model size (larger models are more accurate)
3. Enable noise reduction in settings
4. Add domain-specific vocabulary

#### Memory Issues
1. Reduce batch size
2. Enable automatic memory optimization
3. Use smaller model size
4. Process files sequentially instead of in parallel

#### Slow Processing
1. Use GPU acceleration if available
2. Optimize chunk size for longer videos
3. Reduce number of parallel workers
4. Enable caching for repeated operations

#### GUI Issues
1. Ensure PyQt5 is properly installed
2. Check system display settings
3. Run with administrator privileges if needed
4. Update graphics drivers

### Error Messages

#### "Whisper model not found"
```bash
# Download model manually
python -c "import whisper; whisper.load_model('base')"
```

#### "FFmpeg not found"
```bash
# Install FFmpeg
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

#### "CUDA out of memory"
```python
from src.enhanced_transcription import EnhancedTranscriber

# Use a smaller model (e.g. "tiny" or "base") or reduce the batch size
transcriber = EnhancedTranscriber(model_size="base")
```

#### "Whisper and related dependencies are required"
```bash
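# Install the missing dependencies (OpenAI Whisper is published on PyPI as openai-whisper)
pip install openai-whisper langdetect

# Verify the installation
python -c "import whisper; print('Whisper installed successfully')"
python -c "import langdetect; print('langdetect installed successfully')"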
```

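#### "不支持的音频格式" (Unsupported audio format)
```bash
# Make sure the audio format is supported.
# Supported formats: .mp3, .wav, .flac, .aac, .ogg, .m4a

# Convert an unsupported format to MP3
ffmpeg -i input.unsupported output.mp3
```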
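The format check can be automated. In this sketch, `conversion_cmd_if_needed` (a hypothetical helper) returns `None` for a supported extension and otherwise builds the same FFmpeg conversion to MP3. MP3 output requires an FFmpeg build that includes the LAME encoder (libmp3lame), which most distribution packages do.

```python
from pathlib import Path
from typing import List, Optional

SUPPORTED_AUDIO = {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"}


def conversion_cmd_if_needed(path: str) -> Optional[List[str]]:
    """Return None if the extension is supported, else an ffmpeg command converting to MP3."""
    src = Path(path)
    if src.suffix.lower() in SUPPORTED_AUDIO:  # case-insensitive: ".M4A" counts as supported
        return None
    return ["ffmpeg", "-y", "-i", str(src), str(src.with_suffix(".mp3"))]
```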

## Contributing

### Development Setup

1. Clone the repository
2. Create virtual environment
3. Install dependencies
4. Run tests to verify setup

```bash
git clone <repository-url>
cd video_subtitle_project
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python -m pytest tests/
```

### Code Style

Follow PEP 8 style guidelines. Use the following tools:

```bash
# Format code
black src/ tests/

# Check style
flake8 src/ tests/

# Type checking
mypy src/
```

### Submitting Changes

1. Create feature branch
2. Add tests for new functionality
3. Ensure all tests pass
4. Update documentation
5. Submit pull request

## License

This project is licensed under the MIT License. See LICENSE file for details.

## Changelog

### Version 2.5.0
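- SZ to MP4: converts SZ files to MP4 with synchronized audio and video, producing files that conform to industry-standard formats
- SZ audio extraction: extracts audio from SZ files in MP3, WAV, AAC, FLAC, and other audio formats
- Enhanced error handling and fallback paths for greater stability and reliability
- Optimized FFmpeg command parameters to improve conversion and extraction quality
- Detailed functional tests and smoke tests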

### Version 2.4.0
- Comprehensive smoke test suite for core functionality validation
- Video compression optimization with intelligent parameter selection
- H.265 encoding support for better compression efficiency
- Compression ratio verification to ensure at least 10% size reduction
- Fixed subprocess module import issues in audio segmentation
- Improved error handling and stability

### Version 2.3.0
- Video recording and transcoding functionality
- Script loading mechanism for flexible content capture
- Screen recording with custom region support
- Hardware-level video capture solution
- TX meeting recording analysis and implementation
- Performance comparison between software and hardware capture

### Version 2.2.0
- Complete text export functionality
- Generate _content.txt files with full recognized text for videos and audio files
- Fixed validation warning for transcription results
- Improved memory usage tracking

### Version 2.1.0
- Video compression functionality with single and batch processing
- Multiple output formats support (MP4, AVI, MOV, MKV, WebM)
- Adjustable compression parameters (resolution, bitrate, framerate)
- Progress tracking and error handling for compression
- Size estimation before compression
- Updated GUI with video compression tab
- Comprehensive video compression tests

### Version 2.0.0
- Enhanced transcription accuracy with model ensemble
- Multi-format subtitle support
- Advanced content analysis and overview generation
- Batch processing capabilities
- Performance optimization and monitoring
- Modern GUI interface
- Comprehensive test suite

### Version 1.0.0
- Basic video transcription
- SRT subtitle format support
- Simple GUI interface
- Basic configuration options

## Support

For support and questions:

1. Check the troubleshooting section
2. Search existing issues on GitHub
3. Create new issue with detailed description
4. Include system information and error logs

## System Requirements

### Minimum Requirements
- Python 3.8+
- 4GB RAM
- 2 CPU cores
- 10GB free disk space

### Recommended Requirements
- Python 3.9+
- 16GB RAM
- 4+ CPU cores
- CUDA-capable GPU with 8GB+ VRAM
- 50GB free disk space
- SSD for better I/O performance

## Performance Benchmarks

### Transcription Speed
- Tiny model: ~10x real-time
- Base model: ~5x real-time
- Small model: ~2x real-time
- Medium model: ~1x real-time
- Large model: ~0.5x real-time
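These figures translate directly into rough processing-time estimates: divide the media length by the speed factor. For example, a 60-minute recording at the base model's ~5x real-time takes about 12 minutes. A minimal sketch with a hypothetical helper; actual speed depends heavily on hardware and on whether a GPU is used.

```python
# Approximate real-time factors from the list above (media seconds processed per wall-clock second).
SPEED_FACTOR = {"tiny": 10.0, "base": 5.0, "small": 2.0, "medium": 1.0, "large": 0.5}


def estimated_minutes(media_minutes: float, model: str) -> float:
    """Rough wall-clock transcription time: media length divided by the model's speed factor."""
    return media_minutes / SPEED_FACTOR[model]
```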

### Memory Usage
- Tiny model: ~500MB
- Base model: ~1GB
- Small model: ~2GB
- Medium model: ~4GB
- Large model: ~8GB

### Quality Metrics
- Base model accuracy: ~85%
- Small model accuracy: ~90%
- Medium model accuracy: ~93%
- Large model accuracy: ~95%

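## Future Development

### Planned Features
- Real-time streaming transcription
- Support for additional subtitle formats
- Advanced video analysis
- Cloud processing integration
- Mobile app
- API server implementation

### Research Areas
- Improved noise reduction algorithms
- Better speaker diarization
- Enhanced content understanding
- Improved multilingual support
- Custom model training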