FishS2Sharp

A proper C# wrapper for the Fish Audio S2 Pro repo s2.cpp.

What features are currently implemented:

Ability to transform text to voice using either only using a text input and/or combined with an reference audio(voice sample) file.
Ability to save the resulting audio to file and/or retreive it as an array of float samples in code(mono).
The library is modular, meaning you can load the model/clients separately, which allows you to re-use the model in other FishS2Client instances without reloading them again.
Audio references are binded and stored in FishAudioVoiceReference instances, and they are processed at the time you create/add/register them with RegisterVoiceReference(). This allows to process the reference voice samples for the model ONLY ONCE and re-use it without having to re-load it for every prompt. (NOTE: You can only use/synth. processed voice samples for the model you used to process them with.)
CUDA, Vulkan and Metal support, as well as CPU fallback. (As long you have ggml-cuda.dll/ggml-vulkan.dll/ggml-metal.dll in the same folder as this library and the s2.dll)
This library is compiled against the netstandard 2.1, that means you can use it both in .NET 5+ applications and the Unity game engine, allowing this to be used for games as well.

You still have to build s2.dll manually along with the ggml.

Occasionally I'll include pre-built versions of S2 along with the ggml dll's in Releases, but usually they'll only come with CUDA/VULKAN/CPU support since I don't use crapple(apple).

Model variants

GGUF files are available at rodrigomt/s2-pro-gguf on Hugging Face.

File	Size	Notes
`s2-pro-f16.gguf`	9.9 GB	Full precision — reference quality — VRAM Usage: ~11.5GB
`s2-pro-q8_0.gguf`	5.6 GB	Near-lossless — VRAM Usage: ~6.7GB
`s2-pro-q6_k.gguf`	4.5 GB	Good quality/size balance — VRAM Usage: ~5.5GB
`s2-pro-q5_k_m.gguf`	4.0 GB	Smaller with still-good quality — VRAM Usage: ~5.0GB
`s2-pro-q4_k_m.gguf`	3.6 GB	Best compact variant — VRAM Usage: ~4.5GB
`s2-pro-q3_k.gguf`	3.0 GB	Usable, but starts stretching short words — VRAM Usage: ~3.9GB
`s2-pro-q2_k.gguf`	2.6 GB	Lowest-size experimental variant — VRAM Usage: ~3.3GB

All variants include both the transformer weights and the audio codec in a single file. The quantized variants above were regenerated with the codec tensors (c.*) kept in F16, so only the AR transformer is quantized.

Example usage:

internal class Program
{
    static void Main(string[] args)
    {
            string ModelFileName = "s2-pro-q4_k_m.gguf";
            string ModelFolder = "D:\\AI Models\\FishModels\\";
            string VoiceFolderDir = "";


            System.Console.WriteLine("Loading Model...");

            //First we load a shared model instance. This instance can be re-used in multiple FishS2Client instances.
            FishS2Sharp.FishModel SharedModel = new FishS2Sharp.FishModel(ModelFolder + ModelFileName, ModelFolder + "tokenizer.json", FishS2Sharp.GPUBackendTypes.Cuda);

            System.Console.WriteLine("Done Loading Model!");


            System.Console.WriteLine("Generating Voice...");

            //You can create re-usable instances of cloned voices. All you need is a 10-15s voice sample, and you need a transcript of what is spoken in the voice sample.
            //Just don't mix/match different VoiceReference's generated from one type of model and run inference/synthesize with another. 
            FishS2Sharp.FishAudioVoiceReference VoiceReference = new FishS2Sharp.FishAudioVoiceReference(SharedModel, "Mortal Combat Voice", VoiceFolderDir + "2.mp3",
                "Raiden! Shang Tsung! Kitana! Choose your destiny! Johnny Cage! Sonya Blade! Kano! Jax! Round One... FIGHT! Finish Him! FATALITY! Flawless Victory!");

            System.Console.WriteLine("Done Generating Voice!");


            //Create an FishS2Client instance. These are NOT thread safe, so you should use your own locking mechanism.
            //If you need multithreaded processing, use a new FishS2Client instance per thread.
            FishS2Sharp.FishS2Client Instance = new FishS2Sharp.FishS2Client(SharedModel);

            //Create default pipeline settings. You can change settings inside it like TopK/P, Temp, MaxTokens etc..
            FishS2Sharp.FishAudioParameters PipelineParameters = new FishS2Sharp.FishAudioParameters() { Temp = 0.8f };


            //Finally, we synthesize our chosen text to our specific(but optional) voice:
            System.Console.WriteLine("Generating TTS...");

            System.Diagnostics.Stopwatch Timer = new System.Diagnostics.Stopwatch(); Timer.Start();
            Instance.Synthesize("My name is Jeff and england is my city!", "D:\\Jeff.wav", PipelineParameters, VoiceReference);

            Timer.Stop(); System.Console.WriteLine("Generation Time: " + Timer.Elapsed.TotalSeconds.ToString("0.000") + "s"); Timer.Reset();

            //Cleanup this sample code.
            Instance.Dispose();
    }
}

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
FishS2Sharp		FishS2Sharp
.gitignore		.gitignore
FishS2Sharp.sln		FishS2Sharp.sln
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FishS2Sharp

Model variants

Want to help me pay off my therapist? Might make updates more frequent!

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

FishS2Sharp

Model variants

Want to help me pay off my therapist? Might make updates more frequent!

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages