# sql_upsert

**Repository Path**: pipboy492/sql_upsert

## Basic Information

- **Project Name**: sql_upsert
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-17
- **Last Updated**: 2026-03-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# SQL Upsert Tool

A Python command-line utility to execute SQL `INSERT` statements from a directory of `.sql` files against a MySQL/MariaDB database. It automatically converts standard `INSERT` statements into `INSERT ... ON DUPLICATE KEY UPDATE` to handle upserts (update if exists, insert if not).

**New!** Now supports `.csv`, `.json`, `.xls`, and `.xlsx` files for data import.

## Features

- **Automatic Upsert:** Transforms `INSERT INTO table (cols) VALUES ...` to `... ON DUPLICATE KEY UPDATE col=VALUES(col)...`
- **Multi-Format Support:** Handles `.sql`, `.csv`, `.json`, `.xls`, and `.xlsx` files.
- **Dynamic Schema Detection:** Automatically fetches table columns for `INSERT` statements without column lists or data files.
- **Robust SQL Parsing:** Correctly handles semicolons within strings and comments.
- **Transaction Support:** Each file is executed in a single transaction. If any error occurs, the entire file is rolled back.
- **Dry Run Mode:** Preview operations without modifying the database.
- **Multi-threaded Execution:** Process multiple files in parallel for faster execution.
- **Progress Bar:** Shows real-time progress of file processing.
- **Recursive Search:** Finds supported files in the specified directory and subdirectories.

## Supported Formats

### SQL (`.sql`)
Standard SQL dumps or scripts. The tool automatically converts `INSERT` statements to `ON DUPLICATE KEY UPDATE`.

### CSV (`.csv`) / Excel (`.xls`, `.xlsx`)
- **Table Name:** Derived from the filename (e.g., `users.xlsx` -> `users` table).
- **Format:** First row must be the header with column names.
- **Logic:** Each row generates an `INSERT ... ON DUPLICATE KEY UPDATE` statement.

### JSON (`.json`)
- **Table Name:** Derived from the filename (e.g., `products.json` -> `products` table).
- **Format:** Must be a JSON array of objects.
  ```json
  [
    {"id": 1, "name": "Product A", "price": 100},
    {"id": 2, "name": "Product B", "price": 200}
  ]
  ```
- **Logic:** Each object generates an `INSERT ... ON DUPLICATE KEY UPDATE` statement.

## Installation & Usage

1. **Run directly (Linux Binary):**
   ```bash
   ./dist/sql_upsert <ip> <port> <username> <password> <data_directory> [--db <database_name>] [--dry-run] [--threads 4]
   ```

2. **Run from Source:**
   ```bash
   pip install -r requirements.txt
   python sql_upsert.py <ip> <port> <username> <password> <data_directory> [--db <database_name>] [--dry-run] [--threads 4]
   ```

### Arguments

- `ip`: Database Host IP
- `port`: Database Port (e.g., 3306)
- `username`: Database Username
- `password`: Database Password
- `data_directory`: Directory containing data files
- `--db`: (Optional) Target Database Name. Highly recommended for data imports.
- `--dry-run`: (Optional) Enable dry-run mode to preview SQL statements without execution.
- `--threads`: (Optional) Number of concurrent threads for file processing (default: 4).

### Example

```bash
# Execute with 8 threads
./dist/sql_upsert 127.0.0.1 3306 root mypassword ./data_files --db my_database --threads 8

# Dry run (preview only)
./dist/sql_upsert 127.0.0.1 3306 root mypassword ./data_files --db my_database --dry-run
```

## How it works

1. The script scans the provided directory for `.sql`, `.csv`, `.json`, `.xls`, and `.xlsx` files.
2. For **SQL files**:
   - Parses statements, handling complex syntax.
   - Converts `INSERT` to Upsert logic, dynamically fetching schemas if needed.
3. For **CSV/JSON/Excel files**:
   - Reads data and generates `INSERT` statements mapping to the table named after the file.
   - Applies the same Upsert transformation.
4. Executes each file in a dedicated transaction (skipped in dry-run mode).
5. Uses a thread pool to process multiple files simultaneously.

## Limitations

- **Database:** Designed for MySQL/MariaDB syntax.
- **Data Files:** Assumes the filename matches the database table name exactly (case-sensitive depending on OS/DB settings).