NeuTTS Nano (English)

neuphonic/neutts-nano

published Nov 2025 · updated Feb 2026

NeuTTS Nano is a lightweight, on-device text-to-speech model with instant voice cloning, built for real-time speech synthesis on CPUs and edge devices.

est. price

~$0.0075

· estimated, set at launch

API providers

downloads / mo

113.3K

license

other

specs

Task	Text-to-Speech (TTS)
Architecture	Compact LM backbone + NeuCodec audio codec (single codebook)
Parameters	~116.8M active, ~228.7M total
License	NeuTTS Open License 1.0
Language	English only

about this model

NeuTTS Nano is an English-language text-to-speech (TTS) model built for on-device generation with instant voice cloning, combining a compact language model backbone with a neural audio codec. Gigarouter is onboarding this model as a managed, OpenAI‑compatible API (not yet live), so that developers can integrate speech synthesis without maintaining infrastructure.

Model Details

Active parameters: ~116.8 M (backbone only); total parameters: ~228.7 M (backbone + tied embeddings/head).
Context window: 2048 tokens (≈30 seconds of audio including the prompt).
Audio codec: NeuCodec, a single‑codebook codec achieving low‑bitrate, high‑quality audio.
Optimized for real‑time CPU inference: on a 2‑thread CPU the model generates speech about twice as fast as real time (roughly 1 second of audio per 0.5 seconds of compute).
Outputs are watermarked.
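
As a back-of-the-envelope illustration of the context window and speed figures above, the sketch below estimates how much new speech fits after a reference clip and how long it might take to generate. It assumes the roughly 30 seconds is shared between the reference audio and the generated audio, and it ignores the tokens used by the text itself; the function names are hypothetical and not part of any NeuTTS API.

```python
# Figures taken from the model details above.
CONTEXT_AUDIO_SECONDS = 30.0  # 2048 tokens is roughly 30 s of audio including the prompt
SPEEDUP_2_THREAD_CPU = 2.0    # about twice as fast as real time on a 2-thread CPU


def remaining_speech_seconds(reference_seconds):
    """Approximate seconds of new speech that fit after the reference clip.

    Assumption: text tokens are ignored, so this is an upper bound.
    """
    return max(0.0, CONTEXT_AUDIO_SECONDS - reference_seconds)


def estimated_compute_seconds(speech_seconds, speedup=SPEEDUP_2_THREAD_CPU):
    """Approximate wall-clock time to generate the given amount of speech."""
    return speech_seconds / speedup


budget = remaining_speech_seconds(10.0)
print(budget, estimated_compute_seconds(budget))
```

With a 10-second reference clip this leaves about 20 seconds of speech per request under these assumptions, so longer texts would need to be split into several requests.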

Throughput Benchmarks (Q4_0 Quantisation)

Token generation speed on four devices (tokens/s, CPU‑only unless noted):

Galaxy A25 5G: 45 t/s
AMD Ryzen 9 HX 370: 221 t/s
iMac M4 (16 GB): 195 t/s
NVIDIA RTX 4090 (GPU): 19,268 t/s
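
To relate these token rates to real-time playback, the sketch below divides each throughput figure by the number of codec tokens needed per second of audio. That rate is not stated on this page; the value of 50 tokens per second used here is an assumption and should be replaced with the codec's documented frame rate. The estimate also ignores prompt processing and codec decoding time, so it is an upper bound at best.

```python
# ASSUMPTION: codec tokens per second of audio. Not stated in the text above.
ASSUMED_CODEC_TOKENS_PER_SECOND = 50.0

# Throughput figures from the benchmark list above (Q4_0 quantisation).
THROUGHPUT_TOKENS_PER_SECOND = {
    "Galaxy A25 5G": 45,
    "AMD Ryzen 9 HX 370": 221,
    "iMac M4 (16 GB)": 195,
    "NVIDIA RTX 4090": 19268,
}


def realtime_speedup(tokens_per_second, codec_rate=ASSUMED_CODEC_TOKENS_PER_SECOND):
    """Seconds of audio generated per second of compute (above 1 is faster than real time)."""
    return tokens_per_second / codec_rate


for device, tps in THROUGHPUT_TOKENS_PER_SECOND.items():
    print(f"{device}: {realtime_speedup(tps):.2f}x real time")
```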

Comparison with NeuTTS-Air

Model	Active Params	Total Params	License
NeuTTS-Air	~360 M	~552 M	Apache 2.0
NeuTTS Nano	~120 M	~229 M	NeuTTS Open License 1.0

Voice Cloning

To clone a voice, provide a reference audio sample (mono, 16–44 kHz, 3–15 s, clean, .wav) and a text prompt. The model synthesises the given text in the style of that reference speaker.
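
The reference-audio requirements above can be checked before submitting a clip. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_reference_wav` is hypothetical, the upper sample-rate bound is read as the common 44.1 kHz rate (an assumption), and the "clean" requirement cannot be verified this way.

```python
import wave

MIN_RATE_HZ = 16_000
MAX_RATE_HZ = 44_100  # assumption: "44 kHz" interpreted as the common 44.1 kHz rate
MIN_SECONDS = 3.0
MAX_SECONDS = 15.0


def check_reference_wav(path):
    """Return a list of problems; an empty list means the clip fits the stated limits."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside 16-44.1 kHz")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.1f} s is outside 3-15 s")
    return problems
```

Calling `check_reference_wav("speaker.wav")` on a hypothetical file returns an empty list when the clip is mono, within the sample-rate range, and between 3 and 15 seconds long.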

best for

·Instant voice cloning from a few seconds of audio
·Real-time speech synthesis on laptop-class CPUs
·On-device voice agents and assistants
·Privacy-sensitive applications where audio must stay local

FAQ

What are the input and output formats for this model?

The model takes a reference audio sample (mono WAV, 3–15 seconds, 16–44 kHz) and a text string. It outputs synthesized speech as a 24 kHz WAV file.

What is the context window size?

The context window is 2048 tokens, corresponding to roughly 30 seconds of audio including the prompt.

How fast is this model on CPU?

In Q4_0 quantisation, throughput is 45 tokens/s on a Galaxy A25 5G, 221 tokens/s on an AMD Ryzen 9 HX 370, and 195 tokens/s on an iMac M4 (all CPU-only).

What license does NeuTTS Nano use?

It uses the NeuTTS Open License 1.0, which allows free non-commercial and commercial use with attribution. Check the full license text for details.

How can I call this model via an API?

Once the hosted API is live, use the Gigarouter OpenAI-compatible endpoint with an API key: pass the reference audio as a file and the text as a prompt to generate speech.

not yet live

We're benchmarking and onboarding NeuTTS Nano (English) as a hosted, OpenAI-compatible API.

related text-to-speech models

XTTS-v2

9.3M dl/mo

Qwen3-TTS-12Hz-1.7B-CustomVoice

2M dl/mo

Qwen3-TTS-12Hz-0.6B-CustomVoice