Skip to content

subspecs/FishS2Sharp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FishS2Sharp

A proper C# wrapper for the Fish Audio S2 Pro repo s2.cpp.

What features are currently implemented:

  • Ability to transform text to voice using either only using a text input and/or combined with an reference audio(voice sample) file.
  • Ability to save the resulting audio to file and/or retreive it as an array of float samples in code(mono).
  • The library is modular, meaning you can load the model/clients separately, which allows you to re-use the model in other FishS2Client instances without reloading them again.
  • Audio references are binded and stored in FishAudioVoiceReference instances, and they are processed at the time you create/add/register them with RegisterVoiceReference(). This allows to process the reference voice samples for the model ONLY ONCE and re-use it without having to re-load it for every prompt. (NOTE: You can only use/synth. processed voice samples for the model you used to process them with.)
  • CUDA, Vulkan and Metal support, as well as CPU fallback. (As long you have ggml-cuda.dll/ggml-vulkan.dll/ggml-metal.dll in the same folder as this library and the s2.dll)
  • This library is compiled against the netstandard 2.1, that means you can use it both in .NET 5+ applications and the Unity game engine, allowing this to be used for games as well.

You still have to build s2.dll manually along with the ggml.

Occasionally I'll include pre-built versions of S2 along with the ggml dll's in Releases, but usually they'll only come with CUDA/VULKAN/CPU support since I don't use crapple(apple).

Model variants

GGUF files are available at rodrigomt/s2-pro-gguf on Hugging Face.

File Size Notes
s2-pro-f16.gguf 9.9 GB Full precision — reference quality — VRAM Usage: ~11.5GB
s2-pro-q8_0.gguf 5.6 GB Near-lossless — VRAM Usage: ~6.7GB
s2-pro-q6_k.gguf 4.5 GB Good quality/size balance — VRAM Usage: ~5.5GB
s2-pro-q5_k_m.gguf 4.0 GB Smaller with still-good quality — VRAM Usage: ~5.0GB
s2-pro-q4_k_m.gguf 3.6 GB Best compact variant — VRAM Usage: ~4.5GB
s2-pro-q3_k.gguf 3.0 GB Usable, but starts stretching short words — VRAM Usage: ~3.9GB
s2-pro-q2_k.gguf 2.6 GB Lowest-size experimental variant — VRAM Usage: ~3.3GB

All variants include both the transformer weights and the audio codec in a single file. The quantized variants above were regenerated with the codec tensors (c.*) kept in F16, so only the AR transformer is quantized.

Example usage:

internal class Program
{
    static void Main(string[] args)
    {
            string ModelFileName = "s2-pro-q4_k_m.gguf";
            string ModelFolder = "D:\\AI Models\\FishModels\\";
            string VoiceFolderDir = "";


            System.Console.WriteLine("Loading Model...");

            //First we load a shared model instance. This instance can be re-used in multiple FishS2Client instances.
            FishS2Sharp.FishModel SharedModel = new FishS2Sharp.FishModel(ModelFolder + ModelFileName, ModelFolder + "tokenizer.json", FishS2Sharp.GPUBackendTypes.Cuda);

            System.Console.WriteLine("Done Loading Model!");


            System.Console.WriteLine("Generating Voice...");

            //You can create re-usable instances of cloned voices. All you need is a 10-15s voice sample, and you need a transcript of what is spoken in the voice sample.
            //Just don't mix/match different VoiceReference's generated from one type of model and run inference/synthesize with another. 
            FishS2Sharp.FishAudioVoiceReference VoiceReference = new FishS2Sharp.FishAudioVoiceReference(SharedModel, "Mortal Combat Voice", VoiceFolderDir + "2.mp3",
                "Raiden! Shang Tsung! Kitana! Choose your destiny! Johnny Cage! Sonya Blade! Kano! Jax! Round One... FIGHT! Finish Him! FATALITY! Flawless Victory!");

            System.Console.WriteLine("Done Generating Voice!");


            //Create an FishS2Client instance. These are NOT thread safe, so you should use your own locking mechanism.
            //If you need multithreaded processing, use a new FishS2Client instance per thread.
            FishS2Sharp.FishS2Client Instance = new FishS2Sharp.FishS2Client(SharedModel);

            //Create default pipeline settings. You can change settings inside it like TopK/P, Temp, MaxTokens etc..
            FishS2Sharp.FishAudioParameters PipelineParameters = new FishS2Sharp.FishAudioParameters() { Temp = 0.8f };


            //Finally, we synthesize our chosen text to our specific(but optional) voice:
            System.Console.WriteLine("Generating TTS...");

            System.Diagnostics.Stopwatch Timer = new System.Diagnostics.Stopwatch(); Timer.Start();
            Instance.Synthesize("My name is Jeff and england is my city!", "D:\\Jeff.wav", PipelineParameters, VoiceReference);

            Timer.Stop(); System.Console.WriteLine("Generation Time: " + Timer.Elapsed.TotalSeconds.ToString("0.000") + "s"); Timer.Reset();

            //Cleanup this sample code.
            Instance.Dispose();
    }
}

Want to help me pay off my therapist? Might make updates more frequent!

About

A proper C# wrapper for the Fish Audio S2 Pro.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages