# AudioLab

**Repository Path**: liyihao17/AudioLab

## Basic Information

- **Project Name**: AudioLab
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-23
- **Last Updated**: 2025-04-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# AudioLab

![AudioLab Logo](./res/audiolab_lg.png)

[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![CUDA](https://img.shields.io/badge/CUDA-cu121-brightgreen)](https://developer.nvidia.com/cuda-downloads)
[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen)](CONTRIBUTING.md)

> **Huge thanks to RunDiffusion for supporting this project!** 🎉

AudioLab is an open-source powerhouse for voice-cloning and audio separation, built with modularity and extensibility in mind. Whether you're an audio engineer, researcher, or just a curious tinkerer, AudioLab has you covered.

---

## 🌟 Features

### 🎵 Audio Processing Capabilities

- **🎼 Music Generation:** Create music from scratch or remix existing tracks using YuE.
- **🎵 Song Generation:** Create full-length songs with vocals and instrumentals using DiffRhythm.
- **🗣️ Zonos Text-to-Speech:** High-quality TTS with deep learning.
- **🎭 Orpheus TTS:** Real-time natural-sounding speech powered by large language models.
- **📢 Text-to-Speech:** Clone voices and generate natural-sounding speech with Coqui TTS.
- **🔊 Text-to-Audio:** Generate sound effects and ambient audio from text descriptions using Stable Audio.
- **🎛️ Audio Separation:** Isolate vocals, drums, bass, and other components from a track.
- **🎤 Vocal Isolation:** Distinguish lead vocals from background.
- **🔇 Noise Removal:** Get rid of echo, crowd noise, and unwanted sounds.
- **🧬 Voice Cloning:** Train high-quality voice models with just 30-60 minutes of data.
- **🚀 Audio Super Resolution:** Enhance and clean up audio.
- **🎚️ Remastering:** Apply spectral characteristics from a reference track.
- **🎵 Timbre Transfer:** Transform instrument sounds while preserving musical content using WaveTransfer.
- **🔄 Audio Conversion:** Convert between popular formats effortlessly.
- **📜 Export to DAW:** Easily create Ableton Live and Reaper projects from separated stems.

### 🤖 Automation Features

- **Auto-preprocessing** for voice model training.
- **Merge separated sources** back into a single file with ease.

---

## 🛠️ Pre-requisites

Before you dive in, make sure you have:

1. **Python 3.10** – *Because match statements exist, and fairseq is allergic to 3.11.*
2. **CUDA 12.4** – *Other versions? Maybe fine. Maybe not. Do you like surprises?*
3. **Virtual Environment** – *Strongly recommended to avoid dependency chaos.*
4. **Windows Users** – *You're in for an adventure! Zonos/Triton can be a pain. Make sure to install MSVC and add these paths to your environment variables:*

   ```plaintext
   C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64
   C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx86\x86
   ```

> **Note:** This project assumes basic Python knowledge. If you've never set up a virtual environment before... now's the time to learn!
🚀

---

## 🚑 Windows Troubleshooting

If dependencies refuse to install on Windows, try the following:

- Install **MSVC Build Tools**:
  - [VC Redist x64](https://aka.ms/vs/17/release/vc_redist.x64.exe)
  - [Build Tools](https://aka.ms/vs/17/release/vs_BuildTools.exe)
- Ensure **CUDA is correctly installed**:
  - Check version: `nvcc --version`
  - [Download CUDA 12.4](https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_551.61_windows.exe)
- DLL Errors? Try moving necessary DLLs from `/libs` to:

  ```plaintext
  .venv\lib\site-packages\pandas\_libs\window
  .venv\lib\site-packages\sklearn\.libs
  C:\Program Files\Python310\ (or wherever your Python is installed)
  ```

---

## 🚀 Installation

> **Heads up!** The `requirements.txt` is *not* complete on purpose. Use the setup scripts instead!

### 🛠 Steps

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/audiolab.git
   cd audiolab
   ```

2. Set up a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # Windows: venv\Scripts\activate
   ```

3. Run the setup script:

   ```bash
   ./setup.sh  # Windows: setup.bat
   ```

**Common Issues & Fixes:**

- Downgrade `pip` if installation fails:

  ```bash
  python -m pip install pip==24.0
  ```

- Install older CUDA drivers if needed: [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive)
- Install `fairseq` manually if necessary (quote the requirement so the shell does not treat `>` as an output redirect):

  ```bash
  pip install "fairseq>=0.12.2" --no-deps
  ```

---

## 🎛️ Running AudioLab

1. Activate your virtual environment:

   ```bash
   source venv/bin/activate  # Windows: venv\Scripts\activate.bat
   ```

2. Run the application:

   ```bash
   python main.py
   ```

3. Optional flags:
   - `--listen` → Bind to `0.0.0.0` for remote access.
   - `--port PORT` → Specify a custom port.

---

## 📸 Screenshots

| ![Screenshot 1](./res/img/ss1_zonos.png) | ![Screenshot 2](./res/img/ss2_tts.png) |
|---------------------------------|---------------------------------|
| ![Screenshot 3](./res/img/ss3_yue.png) | ![Screenshot 4](./res/img/ss4_process.png) |
| ![Screenshot 5](./res/img/ss4_train.png) | |

---

## 💻 Key Features

### Sound Forge: Text-to-Audio Generation

Generate high-quality sound effects, ambient audio, and musical samples from text descriptions:

- **🔊 Text Prompting:** Create sounds by describing them in natural language.
- **⏱️ Variable Duration:** Generate audio up to 47 seconds long.
- **🎛️ Full Control:** Adjust parameters like inference steps and guidance scale.
- **🎭 Negative Prompts:** Specify what to avoid in your generated audio.
- **🎲 Multiple Variations:** Generate different versions of the same prompt.

Example prompts:

- "A peaceful forest ambience with birds chirping and leaves rustling"
- "An electronic beat with pulsing bass at 120 BPM"
- "A sci-fi spaceship engine humming"

### DiffRhythm: Full-Length Song Generation

Create complete songs with vocals and instrumentals using state-of-the-art latent diffusion:

- **🎵 Complete Songs:** Generate full-length songs up to 4m45s.
- **🎤 Lyrics Support:** Add lyrics using LRC format with timestamps.
- **🎹 Style Control:** Define the musical style using text prompts or reference audio.
- **⚡ Blazingly Fast:** Efficient generation compared to other music models.
- **💾 Memory Efficient:** Chunked decoding option for consumer GPUs.

Example use cases:

- Create original songs in any genre with your own lyrics
- Generate background music for videos with specific moods
- Experiment with unique musical styles and vocal characteristics

### WaveTransfer: Instrument Timbre Transfer

Transform the sound characteristics of one instrument to another using diffusion models:

- **🎵 Preserve Musical Content:** Transform timbre while keeping the original musical composition intact.
- **🎸 Multi-instrument Support:** Transfer between any types of musical instruments.
- **🔄 Two-Step Process:** Easy-to-follow train-then-generate workflow for custom instruments.
- **⚙️ Flexible Configuration:** Adjust noise schedules and steps for different transfer qualities.
- **💾 Memory Optimization:** Use chunked processing for longer audio files.

Example applications:

- Transform a piano recording to sound like a guitar
- Create hybrid instruments with unique sound characteristics
- Convert acoustic instrument recordings to electronic sounds
- Experiment with novel timbres for music production

### Orpheus TTS: Real-time Speech Synthesis

Generate natural-sounding speech with LLM-powered text-to-speech capabilities:

- **⚡ Real-time Processing:** Instantaneous speech generation.
- **🗣️ Voice Cloning:** Create custom voice models from your recordings.
- **😀 Emotion Control:** Adjust speaking style for more expressive speech.
- **🌐 Multilingual Support:** Generate speech in multiple languages.
- **🎭 Style Variety:** Create different styles from a single voice model.

Example applications:

- Create audiobooks with natural narration
- Develop voice assistants with your own voice
- Generate voiceovers for videos and presentations
- Create accessible content for those with reading difficulties

### Transcribe: Advanced Speech-to-Text

Convert audio recordings to text with speaker identification and precise timing:

- **👥 Speaker Diarization:** Automatically identify and label different speakers.
- **⏱️ Word-Level Timestamps:** Create perfectly aligned text with audio timing.
- **🌍 Multilingual Support:** Transcribe content in multiple languages.
- **📊 Batch Processing:** Process multiple audio files in sequence.
- **📋 Multiple Output Formats:** Generate both JSON metadata and readable text.
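
To illustrate how JSON metadata and readable text can relate to each other, the sketch below turns a list of diarized segments into a speaker-labeled transcript. The JSON layout (`speaker`, `start`, `end`, `text` keys) and the helper names `format_timestamp` and `to_readable_text` are hypothetical; the files AudioLab actually writes may be structured differently.

```python
import json

# Hypothetical diarized-transcript layout; the real JSON schema may differ.
SAMPLE_JSON = """
[
  {"speaker": "SPEAKER_00", "start": 0.00, "end": 2.40, "text": "Welcome back to the show."},
  {"speaker": "SPEAKER_01", "start": 2.55, "end": 4.10, "text": "Thanks for having me."},
  {"speaker": "SPEAKER_01", "start": 4.20, "end": 6.00, "text": "Happy to be here."}
]
"""


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def to_readable_text(segments: list[dict]) -> str:
    """Merge consecutive segments from the same speaker into labeled lines."""
    lines: list[str] = []
    previous_speaker = None
    for seg in segments:
        if seg["speaker"] == previous_speaker:
            # Same speaker keeps talking: extend the current line.
            lines[-1] += " " + seg["text"]
        else:
            # Speaker change: start a new line stamped with the segment start.
            stamp = format_timestamp(seg["start"])
            lines.append(f"[{stamp}] {seg['speaker']}: {seg['text']}")
            previous_speaker = seg["speaker"]
    return "\n".join(lines)


segments = json.loads(SAMPLE_JSON)
readable = to_readable_text(segments)
print(readable)
```

Keeping the structured JSON as the source of truth and deriving the readable text from it means subtitles, searchable archives, and training data can all be generated from the same file.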

Example applications:

- Create subtitles for videos with speaker labels
- Transcribe interviews and meetings with speaker attribution
- Generate searchable archives of audio content
- Create training data for voice and speech models

### Process Tab: Audio Processing Pipeline

The heart of AudioLab with modular audio processing through a chain of wrappers:

- **🔊 Separate:** Split audio into vocals, drums, bass, and other instruments.
- **🎤 Clone:** Apply voice conversion with trained models.
- **⚡ Remaster:** Enhance audio based on reference tracks.
- **🔬 Super Resolution:** Improve audio detail and clarity.
- **🔀 Merge:** Mix separate audio tracks with complete control.
- **🔄 Convert:** Change audio formats with customizable settings.

Example workflows:

- Extract vocals → Apply voice clone → Merge with original instruments
- Split song → Enhance each component → Remix with new levels
- Remaster old recordings using modern reference tracks

### RVC Training: Voice Model Creation

Train custom voice models for voice conversion and cloning:

- **🎯 One-Click Process:** Simplified training with automatic preprocessing.
- **⚙️ Advanced Options:** Fine-tune training for specific voice characteristics.
- **📊 Training Visualization:** Monitor progress in real-time.
- **🔄 Model Management:** Organize and share your trained voice models.

Example applications:

- Create virtual versions of your own voice
- Develop character voices for games or animations
- Restore or enhance historical recordings

---

## 🤝 Acknowledgements

AudioLab is powered by some fantastic open-source projects:

- 🎵 [python-audio-separator](https://github.com/nomadkaraoke/python-audio-separator) – Core for audio separation.
- 🎚 [matchering](https://github.com/sergree/matchering) – Professional-grade remastering.
- 🔊 [versatile-audio-super-resolution](https://github.com/haoheliu/versatile_audio_super_resolution) – High-quality audio enhancement.
- 🎙 [Retrieval-based-Voice-Conversion-WebUI](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) – Voice cloning.
- 🎶 [MVSEP-MDX23](https://github.com/ZFTurbo/MVSEP-MDX23-music-separation-model) – Music separation.
- 📜 [WhisperX](https://github.com/m-bain/whisperX) – Audio transcription.
- 🗣 [Coqui TTS](https://github.com/coqui-ai/TTS) – State-of-the-art TTS.
- 🎼 [YuE](https://github.com/multimodal-art-projection/YuE) – Music generation.
- 🏆 [Zonos](https://github.com/Zyphra/Zonos) – High-quality TTS.
- 🔈 [Stable Audio](https://stability.ai/blog/stable-audio-open-1-0-free-text-to-audio-model) – Text-to-audio generation.
- 🎵 [DiffRhythm](https://github.com/ASLP-lab/DiffRhythm) – Full-length song generation with latent diffusion.
- 🗣️ [Orpheus-TTS](https://github.com/canopyai/Orpheus-TTS) – Real-time high-quality text-to-speech.
- 🎵 [WaveTransfer](https://github.com/tencent-ailab/bddm) – Instrument timbre transfer with diffusion.

---

## 🌟 Contribute

Want to help? Check out the [Contributing Guide](CONTRIBUTING.md)!

---

## 📜 License

Licensed under MIT. See [LICENSE](LICENSE) for details.

---

Made with ❤️ by the AudioLab team. (AKA D8ahazard)