# sql_upsert **Repository Path**: pipboy492/sql_upsert ## Basic Information - **Project Name**: sql_upsert - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-03-17 - **Last Updated**: 2026-03-31 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # SQL Upsert Tool A Python command-line utility to execute SQL `INSERT` statements from a directory of `.sql` files against a MySQL/MariaDB database. It automatically converts standard `INSERT` statements into `INSERT ... ON DUPLICATE KEY UPDATE` to handle upserts (update if exists, insert if not). **New!** Now supports `.csv`, `.json`, `.xls`, and `.xlsx` files for data import. ## Features - **Automatic Upsert:** Transforms `INSERT INTO table (cols) VALUES ...` to `... ON DUPLICATE KEY UPDATE col=VALUES(col)...` - **Multi-Format Support:** Handles `.sql`, `.csv`, `.json`, `.xls`, and `.xlsx` files. - **Dynamic Schema Detection:** Automatically fetches table columns for `INSERT` statements without column lists or data files. - **Robust SQL Parsing:** Correctly handles semicolons within strings and comments. - **Transaction Support:** Each file is executed in a single transaction. If any error occurs, the entire file is rolled back. - **Dry Run Mode:** Preview operations without modifying the database. - **Multi-threaded Execution:** Process multiple files in parallel for faster execution. - **Progress Bar:** Shows real-time progress of file processing. - **Recursive Search:** Finds supported files in the specified directory and subdirectories. ## Supported Formats ### SQL (`.sql`) Standard SQL dumps or scripts. The tool automatically converts `INSERT` statements to `ON DUPLICATE KEY UPDATE`. ### CSV (`.csv`) / Excel (`.xls`, `.xlsx`) - **Table Name:** Derived from the filename (e.g., `users.xlsx` -> `users` table). - **Format:** First row must be the header with column names. - **Logic:** Each row generates an `INSERT ... ON DUPLICATE KEY UPDATE` statement. ### JSON (`.json`) - **Table Name:** Derived from the filename (e.g., `products.json` -> `products` table). - **Format:** Must be a JSON array of objects. ```json [ {"id": 1, "name": "Product A", "price": 100}, {"id": 2, "name": "Product B", "price": 200} ] ``` - **Logic:** Each object generates an `INSERT ... ON DUPLICATE KEY UPDATE` statement. ## Installation & Usage 1. **Run directly (Linux Binary):** ```bash ./dist/sql_upsert [--db ] [--dry-run] [--threads 4] ``` 2. **Run from Source:** ```bash pip install -r requirements.txt python sql_upsert.py [--db ] [--dry-run] [--threads 4] ``` ### Arguments - `ip`: Database Host IP - `port`: Database Port (e.g., 3306) - `username`: Database Username - `password`: Database Password - `data_directory`: Directory containing data files - `--db`: (Optional) Target Database Name. Highly recommended for data imports. - `--dry-run`: (Optional) Enable dry-run mode to preview SQL statements without execution. - `--threads`: (Optional) Number of concurrent threads for file processing (default: 4). ### Example ```bash # Execute with 8 threads ./dist/sql_upsert 127.0.0.1 3306 root mypassword ./data_files --db my_database --threads 8 # Dry run (preview only) ./dist/sql_upsert 127.0.0.1 3306 root mypassword ./data_files --db my_database --dry-run ``` ## How it works 1. The script scans the provided directory for `.sql`, `.csv`, `.json`, `.xls`, and `.xlsx` files. 2. For **SQL files**: - Parses statements, handling complex syntax. - Converts `INSERT` to Upsert logic, dynamically fetching schemas if needed. 3. For **CSV/JSON/Excel files**: - Reads data and generates `INSERT` statements mapping to the table named after the file. - Applies the same Upsert transformation. 4. Executes each file in a dedicated transaction (skipped in dry-run mode). 5. Uses a thread pool to process multiple files simultaneously. ## Limitations - **Database:** Designed for MySQL/MariaDB syntax. - **Data Files:** Assumes the filename matches the database table name exactly (case-sensitive depending on OS/DB settings).