# Salt **Repository Path**: ailearn_1/Salt ## Basic Information - **Project Name**: Salt - **Description**: No description available - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-04-17 - **Last Updated**: 2026-04-17 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Vikhr Salt: Speech And Language Transformer ![Vikhr Salt Logo](https://huggingface.co/Vikhrmodels/salt-116k/resolve/main/IMG_1304%20copy.png) Vikhr Salt is a multimodal model based on a pre-trained large language model, extended with new audio tokens to handle both TTS (text-to-speech) and ASR (automatic speech recognition) tasks. The model incorporates two variants for encoding audio—Encodec and SpeechTokenizer—and achieves stable training by fine-tuning precision settings. This approach allows Vikhr Salt to leverage pre-existing LLM knowledge while effectively generating and understanding speech, marking a step forward in multimodal learning. ## Model Authors Ksenia Sycheva, Konstantin Korolev, Aleksandr Nikolic ## Datasets - [TEDLIUM](https://huggingface.co/datasets/LIUM/tedlium) - [Librispeech](https://huggingface.co/datasets/openslr/librispeech_asr) ## How to run ### Preparing Data To tokenize data run [prepare_data.py](prepare_data.py). Configs for different tokenizers ([SpeechTokenizer](https://github.com/ZhangXInFD/SpeechTokenizer), [WavTokenizer](https://github.com/jishengpeng/WavTokenizer/), [FishTokenizer](https://github.com/fishaudio/fish-speech/)) are available in [this](configs/quantization) folder. ``` python prepare_data.py --config configs/quantization/.yaml ``` ### Training It is possible to configure tokenization for TTS and ASR differently: - different number of tokens - different tokenizers To do that specify type of quantizer and number of codebooks for both tasks. Examples of configs can be found [here](configs/asr_tts). Notes: 1. music/other non-speech data is only supported by [this](configs/quantization/quantization-wav-music.yaml) version of WavTokenizer 2. WavTokenizer has fixed number of codebooks = 1, for SpeechTokenizer values between 1 and 8 can be chosen for single gpu ``` source scripts/run_me.sh ``` for multi gpu+ds2 ``` source scripts/run_me_ds2.sh ```