# json-rag

**Repository Path**: wangpengabc/json-rag

## Basic Information

- **Project Name**: json-rag
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-17
- **Last Updated**: 2025-10-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# JSON RAG Integration

A tool for efficiently loading and integrating nested JSON data structures into RAG (Retrieval-Augmented Generation) systems, with enhanced entity tracking, relationship detection, and context preservation.

## Key Features

* **Advanced Query Understanding**:
  - Temporal patterns (exact dates, relative ranges, named periods)
  - Metric aggregations (average, maximum, minimum, sum, count)
  - Entity relationships (direct, semantic, and cross-file connections)
  - State transitions and system conditions
  - Hybrid search combining vector similarity, relationships, and filters

* **Smart Data Processing**:
  - Automatic entity detection and relationship mapping
  - Cross-file relationship detection and validation
  - Key-value pair extraction for filtered searches
  - Embedded metadata tracking
  - Batch processing with change detection

* **Archetype-Aware Processing**:
  - Pattern detection (entities, events, metrics, collections)
  - Archetype-based scoring and ranking
  - Relationship validation by archetype
  - Context-aware embedding generation
  - Archetype-specific traversal strategies

* **Hierarchical Data Management**:
  - Full JSON structure preservation
  - Parent-child relationship tracking
  - Cross-file relationship mapping
  - Contextual embedding with ancestry
  - Path-based chunk identification

* **Enhanced Retrieval**:
  - Vector similarity search using PGVector
  - Relationship-aware context assembly
  - Entity-aware result filtering
  - Cross-file context expansion
  - Confidence-based scoring and ranking

## Quick Start

1. Clone and install:

   ```bash
   git clone https://github.com/Mocksi/json-rag.git
   cd json-rag
   uv venv rag_env
   source rag_env/bin/activate  # Windows: .\rag_env\Scripts\activate
   uv pip install -r requirements.txt
   ```

2. Set up environment:

   ```bash
   # Create .env file with:
   OPENAI_API_KEY=your-key-here
   POSTGRES_DB=crowllector
   POSTGRES_USER=crowllector
   POSTGRES_PASSWORD=yourpassword
   POSTGRES_HOST=localhost
   POSTGRES_DB_PORT=5432
   ```

3. Initialize and run:

   ```bash
   python -m app.main --new  # Truncates all tables and starts fresh
   python -m app.main        # Normal operation
   ```

## Architecture

```
app/
├── analysis/               # Analysis and pattern detection
│   ├── archetype.py        # Pattern and archetype detection
│   └── relationships.py    # Cross-file relationship analysis
├── core/                   # Core system components
│   ├── config.py           # Configuration settings
│   └── models.py           # Data models
├── processing/             # Data processing modules
│   ├── json_parser.py      # JSON structure parsing
│   ├── parsing.py          # Document parsing and chunking
│   └── processor.py        # Data processing pipeline
├── retrieval/              # Query processing and retrieval
│   ├── embedding.py        # Vector embedding generation
│   └── retrieval.py        # Query pipeline and execution
├── storage/                # Data persistence
│   └── database.py         # PostgreSQL and vector storage
├── utils/                  # Utility modules
│   └── logging_config.py   # Logging configuration
├── __init__.py             # Package initialization
├── chat.py                 # Chat interface and interactions
└── main.py                 # Application entry point
```

The codebase is organized into logical modules:

- **analysis/**: Modules for analyzing data patterns, cross-file relationships, and user intent
- **core/**: Core system configuration and shared components
- **processing/**: Data processing and relationship detection modules
- **retrieval/**: Relationship-aware search and context assembly
- **storage/**: Database interaction and relationship persistence
- **utils/**: Shared utility functions and helpers

Each module is designed to be independent with clear responsibilities, while working together through well-defined interfaces.

## Installation Requirements

- Python 3.8 or higher
- PostgreSQL 12 or higher with PGVector extension
- OpenAI API key
- Required Python packages (see requirements.txt)

## Documentation

The codebase features comprehensive inline documentation:

- Detailed module-level docstrings explaining key concepts
- Function and class documentation with examples
- Type hints and parameter descriptions
- Usage examples and implementation notes

## Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on:

- Setting up your development environment
- Code style guidelines
- Pull request process
- Development workflow

## Code of Conduct

This project follows the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code. Please report unacceptable behavior.

## License

MIT License - see LICENSE file for details.

## Roadmap

- [x] Cross-file relationship detection
- [x] Archetype-aware retrieval
- [x] Relationship-based context expansion
- [x] Confidence scoring algorithm refinement
- [ ] State transition handling improvements
- [ ] Batch processing optimization
- [ ] Metric aggregation capabilities
- [ ] Entity filtering rules improvement
- [ ] Context assembly performance optimization
- [ ] Advanced archetype pattern detection

## Query Pipeline

The system implements a structured reasoning pipeline:

1. **Query Analysis**:
   - Determines required data types
   - Identifies needed operations (filtering, aggregation)
   - Detects relationships and constraints

2. **Plan Creation**:
   - Builds retrieval strategy
   - Plans processing operations
   - Determines result formatting

3. **Execution**:
   - Retrieves relevant chunks
   - Processes according to plan
   - Assembles coherent response

This systematic approach ensures consistent and reliable query handling while preserving context and relationships.
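
The three pipeline stages above can be illustrated with a small, self-contained sketch. This is not the code in `app/retrieval/retrieval.py`; the names `analyze_query`, `create_plan`, `execute_plan`, `QueryAnalysis`, `Plan`, and the keyword table are all hypothetical, and an in-memory list of dictionaries stands in for chunks that the real system would fetch from PostgreSQL and PGVector.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical keyword table; the real project may detect aggregations differently.
AGGREGATIONS = {
    "average": lambda values: sum(values) / len(values),
    "maximum": max,
    "minimum": min,
    "sum": sum,
    "count": len,
}


@dataclass
class QueryAnalysis:
    """Output of stage 1: the operations the query appears to need."""
    aggregation: str = ""  # a key of AGGREGATIONS, or "" if none was found
    filters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Plan:
    """Output of stage 2: ordered step names for stage 3 to run."""
    steps: List[str]
    analysis: QueryAnalysis


def analyze_query(query: str) -> QueryAnalysis:
    """Stage 1: pick out an aggregation keyword and key=value filters."""
    analysis = QueryAnalysis()
    for token in query.lower().split():  # note: filter values are lowercased too
        if token in AGGREGATIONS:
            analysis.aggregation = token
        elif "=" in token:
            key, value = token.split("=", 1)
            analysis.filters[key] = value
    return analysis


def create_plan(analysis: QueryAnalysis) -> Plan:
    """Stage 2: include only the steps the analysis calls for."""
    steps = ["retrieve"]
    if analysis.filters:
        steps.append("filter")
    if analysis.aggregation:
        steps.append("aggregate")
    steps.append("format")
    return Plan(steps=steps, analysis=analysis)


def execute_plan(plan: Plan, chunks: List[dict]) -> str:
    """Stage 3: run the planned steps over the chunks and format a response."""
    rows = list(chunks)  # "retrieve": a real system would query the vector store
    if "filter" in plan.steps:
        wanted = plan.analysis.filters
        rows = [r for r in rows if all(str(r.get(k)) == v for k, v in wanted.items())]
    if "aggregate" in plan.steps:
        if not rows:
            return "no matching chunks"
        # Assumes each chunk carries a numeric "value" field (illustrative only).
        result = AGGREGATIONS[plan.analysis.aggregation]([r["value"] for r in rows])
        return f"{plan.analysis.aggregation} = {result}"
    return f"{len(rows)} matching chunks"


# Example run over three toy chunks.
chunks = [
    {"region": "eu", "value": 10},
    {"region": "eu", "value": 30},
    {"region": "us", "value": 50},
]
plan = create_plan(analyze_query("average value region=eu"))
print(plan.steps)                  # ['retrieve', 'filter', 'aggregate', 'format']
print(execute_plan(plan, chunks))  # average = 20.0
```

Keeping the plan as plain data separate from execution means the same analysis can be inspected or logged before any retrieval happens, which is one plausible reason for splitting the pipeline into these stages.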