This model was contributed to Hugging Face Transformers on 2023-07-17.

Bark

Overview

Bark is a transformer-based text-to-speech model proposed by Suno AI in suno-ai/bark.

Bark is made of 4 main models:

[BarkSemanticModel] (also referred to as the 'text' model): a causal auto-regressive transformer model that takes as input tokenized text, and predicts semantic text tokens that capture the meaning of the text.
[BarkCoarseModel] (also referred to as the 'coarse acoustics' model): a causal autoregressive transformer, that takes as input the results of the [BarkSemanticModel] model. It aims at predicting the first two audio codebooks necessary for EnCodec.
[BarkFineModel] (the 'fine acoustics' model): a non-causal autoencoder transformer that iteratively predicts the last codebooks based on the sum of the previous codebooks' embeddings.
Having predicted all the codebook channels, Bark uses the [EncodecModel] to decode the output audio array.

Note that each of the first three modules can take conditional speaker embeddings, which condition the output sound on a specific predefined voice.

This model was contributed by Yoach Lacombe (ylacombe) and Sanchit Gandhi (sanchit-gandhi). The original code can be found here.

Optimizing Bark

Bark can be optimized with just a few extra lines of code, which significantly reduces its memory footprint and accelerates inference.

Using half-precision

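You can speed up inference and reduce the memory footprint by 50% simply by loading the model in half-precision.

import torch
from transformers import BarkModel


# dtype=torch.float16 is what actually loads the weights in half-precision
# (older Transformers releases name this argument `torch_dtype`)
model = BarkModel.from_pretrained("suno/bark-small", dtype=torch.float16, device_map="auto")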
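The 50% figure follows from the storage size of each weight: float32 takes 4 bytes and float16 takes 2. Here is a rough back-of-the-envelope sketch of that arithmetic. The parameter count is a made-up placeholder rather than Bark's real size, the helper `weight_memory_mib` is hypothetical, and activations and buffers are ignored.

```python
# Bytes needed to store one weight in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Approximate memory taken by the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**2


# Hypothetical parameter count, for illustration only (not Bark's real size).
num_params = 100_000_000

fp32 = weight_memory_mib(num_params, "float32")
fp16 = weight_memory_mib(num_params, "float16")
print(f"float32: {fp32:.1f} MiB, float16: {fp16:.1f} MiB, ratio: {fp16 / fp32:.2f}")
```

The ratio is the same for any parameter count, so halving applies to the weights of every sub-model.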

Using CPU offload

As mentioned above, Bark is made up of 4 sub-models, which are called sequentially during audio generation. In other words, while one sub-model is in use, the other sub-models are idle.

If you're using a CUDA GPU or Intel XPU, a simple solution to benefit from an 80% reduction in memory footprint is to offload the submodels from device to CPU when they're idle. This operation is called CPU offloading. You can use it with one line of code as follows:

model.enable_cpu_offload()

Note that 🤗 Accelerate must be installed before using this feature. Here's how to install it.
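To see why offloading idle sub-models helps, compare the peak device memory when all sub-models stay resident with the peak when only the active one does. The sketch below is a toy accounting model, not how `enable_cpu_offload` is implemented, and the sub-model sizes are invented placeholders rather than measurements.

```python
# Hypothetical sub-model sizes in MiB (placeholders, not measured values).
sub_model_sizes = {
    "semantic": 300,
    "coarse_acoustics": 300,
    "fine_acoustics": 250,
    "codec": 60,
}

# Without offloading, every sub-model sits on the device for the whole run.
peak_all_resident = sum(sub_model_sizes.values())

# With offloading, sub-models run one after another and only the active
# one is on the device, so the peak is the largest single sub-model.
peak_with_offload = max(sub_model_sizes.values())

saving = 1 - peak_with_offload / peak_all_resident
print(f"peak without offload: {peak_all_resident} MiB")
print(f"peak with offload:    {peak_with_offload} MiB")
print(f"reduction in weight memory: {saving:.0%}")
```

The actual saving depends on the real sub-model sizes and on activation memory, which this sketch leaves out.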

Using Flash Attention 2

Flash Attention 2 is a faster, more memory-efficient implementation of the attention computation.

Installation

First, check whether your hardware is compatible with Flash Attention 2. The latest list of compatible hardware can be found in the official documentation. Next, install the latest version of Flash Attention 2:

pip install -U flash-attn --no-build-isolation

Usage

To load a model using Flash Attention 2, we can pass the attn_implementation="flash_attention_2" flag to .from_pretrained. We'll also load the model in half-precision (e.g. torch.float16), since it results in almost no degradation to audio quality but significantly lower memory usage and faster inference:

import torch
model = BarkModel.from_pretrained("suno/bark-small", dtype=torch.float16, attn_implementation="flash_attention_2", device_map="auto")
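If the same script has to run on machines with and without Flash Attention 2, one option is to choose the attention implementation at runtime. This is a minimal sketch under stated assumptions: `pick_attn_implementation` is a hypothetical helper, not part of Transformers, and it only checks that the `flash_attn` package is importable, not that the hardware is compatible.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if the flash_attn package is importable, else "eager"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


attn_implementation = pick_attn_implementation()
print(attn_implementation)
```

The returned string can then be passed as `attn_implementation=attn_implementation` to `.from_pretrained`.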

Performance comparison

The following diagram shows the latency for the native attention implementation (no optimisation) against Flash Attention 2. In all cases, we generate 400 semantic tokens on a 40GB A100 GPU with PyTorch 2.1:

To put this into perspective, on an NVIDIA A100 and when generating 400 semantic tokens with a batch size of 16, you can get 17 times the throughput and still be 2 seconds faster than generating sentences one by one with the native model implementation. In other words, all the samples will be generated 17 times faster.

Combining optimization techniques

You can combine optimization techniques, and use CPU offload, half-precision and Flash Attention 2 all at once.

import torch
from transformers import BarkModel


# load in fp16 and use Flash Attention 2
model = BarkModel.from_pretrained("suno/bark-small", dtype=torch.float16, attn_implementation="flash_attention_2", device_map="auto")

# enable CPU offload
model.enable_cpu_offload()

Find out more on inference optimization techniques here.

Usage tips

Suno offers a library of voice presets in a number of languages here. These presets are also available on the Hub here or here.

from transformers import AutoProcessor, BarkModel


processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark", device_map="auto")

voice_preset = "v2/en_speaker_6"

inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)

audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects.

# Multilingual speech - simplified Chinese
inputs = processor("惊人的！我会说中文")

# Multilingual speech - French - let's use a voice_preset as well
inputs = processor("Incroyable! Je peux générer du son.", voice_preset="fr_speaker_5")

# Bark can also generate music. You can help it out by adding music notes around your lyrics.
inputs = processor("♪ Hello, my dog is cute ♪")

audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

The model can also produce nonverbal communications like laughing, sighing and crying.

# Adding non-speech cues to the input text
inputs = processor("Hello uh [clears throat], my dog is cute [laughter]")

audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

To save the audio, take the sample rate from the model's generation config and write the array to disk with SciPy's WAV utility:

from scipy.io.wavfile import write as write_wav


# save audio to disk, but first take the sample rate from the model config
sample_rate = model.generation_config.sample_rate
write_wav("bark_generation.wav", sample_rate, audio_array)
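If SciPy is not available, the standard-library `wave` module can write a comparable file once the float samples are converted to 16-bit PCM. The following is a minimal sketch, assuming mono float samples in [-1, 1]. The sine tone stands in for the `audio_array` produced above, the 24 kHz rate is an assumed placeholder for `model.generation_config.sample_rate`, and `write_wav_pcm16` is a hypothetical helper.

```python
import math
import struct
import wave

sample_rate = 24_000  # assumed placeholder; read it from the generation config in practice

# Stand-in for the generated audio: one second of a 440 Hz tone in [-1, 1].
audio = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]


def write_wav_pcm16(path: str, samples, rate: int) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


write_wav_pcm16("bark_generation_pcm16.wav", audio, sample_rate)
```

Converting to 16-bit PCM keeps the file readable by players that do not handle floating-point WAV data.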

BarkConfig

autodoc BarkConfig - all

BarkProcessor

autodoc BarkProcessor - all - __call__

BarkModel

autodoc BarkModel - generate - enable_cpu_offload

BarkSemanticModel

autodoc BarkSemanticModel - forward

BarkCoarseModel

autodoc BarkCoarseModel - forward

BarkFineModel

autodoc BarkFineModel - forward

BarkCausalModel

autodoc BarkCausalModel - forward

BarkCoarseConfig

autodoc BarkCoarseConfig - all

BarkFineConfig

autodoc BarkFineConfig - all

BarkSemanticConfig

autodoc BarkSemanticConfig - all
