This model was published in HF papers on 2024-10-21 and contributed to Hugging Face Transformers on 2026-02-04.
Moonshine Streaming
Moonshine Streaming is a streaming variant of the Moonshine speech recognition model, optimized for real-time transcription with low latency. Like the original Moonshine, it is an encoder-decoder model that uses Rotary Position Embedding (RoPE) for handling variable-length speech efficiently. The streaming architecture includes sliding window attention in the encoder and a context adapter that enables incremental processing of audio chunks.
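To make the sliding window idea more concrete, the sketch below builds a generic sliding window attention mask in plain Python. It is an illustration of the general technique only: the helper name `sliding_window_mask` and the window sizes (`left=2`, `right=0`) are assumptions chosen for readability, not values or code taken from the Moonshine Streaming implementation.

```python
def sliding_window_mask(seq_len, left, right=0):
    """Return a seq_len x seq_len boolean mask.

    mask[i][j] is True when query frame i may attend to key frame j,
    i.e. when j lies within `left` frames before and `right` frames after i.
    """
    return [
        [(i - left) <= j <= (i + right) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Illustrative sizes only (hypothetical, not the model's configuration).
mask = sliding_window_mask(seq_len=6, left=2, right=0)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because each frame only looks at a bounded neighborhood, the cost per new frame stays constant as the audio grows, which is the property that makes this kind of attention attractive for streaming.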
Moonshine Streaming is available in three sizes: tiny, small, and medium, offering a trade-off between speed and accuracy. It is particularly well-suited for on-device streaming transcription and voice command applications.
You can find all the original Moonshine Streaming checkpoints under the Useful Sensors organization.
Tip
Moonshine Streaming processes raw audio waveforms directly without requiring mel-spectrogram preprocessing, making it efficient for real-time applications.
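Since the model consumes waveforms rather than spectrograms, an application mostly needs to hand over an array of samples. As a rough sketch, and assuming the audio source delivers little-endian 16-bit mono PCM bytes (a common capture format, but an assumption here), the hypothetical helper `pcm16_to_float` below converts such bytes into floating-point samples like those found in the dataset used in the example further down.

```python
import struct


def pcm16_to_float(pcm_bytes):
    """Decode little-endian 16-bit mono PCM into floats in [-1.0, 1.0)."""
    count = len(pcm_bytes) // 2  # two bytes per sample; ignore a trailing odd byte
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return [s / 32768.0 for s in samples]


# Four hand-made samples standing in for real microphone data.
raw = struct.pack("<4h", 0, 16384, -32768, 32767)
print(pcm16_to_float(raw))
```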
The examples below demonstrate how to transcribe speech into text, first with [Pipeline] and then with the [MoonshineStreamingForConditionalGeneration] model class.
# Transcribe with the high-level pipeline API.
from transformers import pipeline

pipe = pipeline(
    task="automatic-speech-recognition",
    model="UsefulSensors/moonshine-streaming-tiny",
    device=0,  # first GPU
)
pipe("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")

# Transcribe with the model class directly.
from datasets import load_dataset
from transformers import AutoProcessor, MoonshineStreamingForConditionalGeneration

processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-streaming-tiny")
model = MoonshineStreamingForConditionalGeneration.from_pretrained(
    "UsefulSensors/moonshine-streaming-tiny",
    device_map="auto",
    attn_implementation="sdpa",
)

# Load one short utterance from a small test dataset.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = ds[0]["audio"]

# Prepare the raw waveform and move the tensors to the model's device.
inputs = processor(audio_sample["array"], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=100)
transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
transcription
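For real-time use, audio usually arrives in small pieces rather than as one complete recording. The sketch below shows only the buffering side of that: a hypothetical helper, `iter_chunks`, that splits a waveform into fixed-duration chunks. It does not call any Moonshine Streaming API, and the 16 kHz sampling rate and 0.5 s chunk length are assumptions chosen for illustration.

```python
def iter_chunks(samples, sampling_rate=16_000, chunk_seconds=0.5):
    """Yield consecutive slices of `samples`, each at most `chunk_seconds` long."""
    chunk_size = int(sampling_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("chunk_seconds is too small for this sampling rate")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


# 1.25 s of silence at the assumed 16 kHz rate, standing in for live audio.
waveform = [0.0] * 20_000
chunks = list(iter_chunks(waveform))
print([len(c) for c in chunks])  # [8000, 8000, 4000]
```

The final chunk is shorter than the others, so downstream code should not assume a fixed chunk length.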
MoonshineStreamingProcessor
autodoc MoonshineStreamingProcessor
MoonshineStreamingEncoderConfig
autodoc MoonshineStreamingEncoderConfig
MoonshineStreamingConfig
autodoc MoonshineStreamingConfig
MoonshineStreamingModel
autodoc MoonshineStreamingModel - forward
MoonshineStreamingForConditionalGeneration
autodoc MoonshineStreamingForConditionalGeneration - forward - generate