
陈赣 06f1fd69a6


first commit

2026-06-05 16:53:03 +08:00


CLVP

Overview

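The CLVP (Contrastive Language-Voice Pretrained Transformer) model was proposed in Better speech synthesis through scaling by James Betker.

The abstract from the paper is the following:

*In recent years, the field of image generation has been revolutionized by the application of autoregressive transformers and DDPMs. These approaches model the process of image generation as a step-wise probabilistic process and leverage large amounts of compute and data to learn the image distribution. This methodology of improving performance need not be confined to images. This paper describes a way to apply advances in the image generative domain to speech synthesis. The result is TorToise, an expressive, multi-voice text-to-speech system.*

This model was contributed by Susnato Dhar. The original code can be found here.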

Usage tips

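CLVP is an integral part of the Tortoise TTS model.
CLVP can be used to compare the different generated speech candidates with the provided text, and the best speech tokens are forwarded to the diffusion model.
For Tortoise usage, the [ClvpModelForConditionalGeneration.generate()] method is strongly recommended.
Note that, unlike other audio models that expect 16 kHz audio, the CLVP model expects the audio to be sampled at 22.05 kHz.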
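Because the sampling-rate tip is easy to miss, here is a minimal sketch of what moving a 16 kHz waveform onto the 22.05 kHz grid involves. It uses plain linear interpolation via NumPy's `np.interp`; `resample_linear` and `CLVP_SAMPLING_RATE` are hypothetical names introduced only for this illustration. Linear interpolation applies no anti-aliasing filter, so for real audio a proper resampler (for example `datasets.Audio(sampling_rate=22050)`, as used in the example below) is the safer choice.

```python
import numpy as np

# Hypothetical constant: the rate CLVP expects, per the usage tip above.
CLVP_SAMPLING_RATE = 22_050


def resample_linear(waveform: np.ndarray, orig_sr: int, target_sr: int = CLVP_SAMPLING_RATE) -> np.ndarray:
    """Crude linear-interpolation resampler (illustration only, no anti-aliasing)."""
    if orig_sr == target_sr:
        return waveform.astype(np.float32, copy=False)
    duration = len(waveform) / orig_sr
    n_out = int(round(duration * target_sr))
    # Timestamps (in seconds) of the input samples and of the desired output samples.
    t_in = np.arange(len(waveform)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    # np.interp clamps to the last input value past the end of t_in.
    return np.interp(t_out, t_in, waveform).astype(np.float32)


# One second of a 440 Hz tone "recorded" at 16 kHz.
sr_16k = 16_000
t = np.arange(sr_16k) / sr_16k
tone_16k = np.sin(2 * np.pi * 440 * t)

tone_22k = resample_linear(tone_16k, sr_16k)
print(tone_16k.shape, tone_22k.shape)  # (16000,) (22050,)
```

The output length scales by 22050 / 16000 = 1.378125, so one second of audio grows from 16,000 to 22,050 samples; feeding the 16 kHz array directly while declaring 22.05 kHz would instead make the speech play back too fast.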

Brief Explanation:

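The [ClvpTokenizer] tokenizes the text input, and the [ClvpFeatureExtractor] extracts the log mel-spectrogram from the desired audio.
The [ClvpConditioningEncoder] takes those text tokens and audio representations and converts them into embeddings conditioned on the text and audio.
The [ClvpForCausalLM] uses these embeddings to generate multiple speech candidates.
Each speech candidate is passed through the speech encoder ([ClvpEncoder]), which converts it into a vector representation, and the text encoder ([ClvpEncoder]) converts the text tokens into the same latent space.
Finally, each speech vector is compared with the text vector to determine which speech vector is most similar to it.
[ClvpModelForConditionalGeneration.generate()] combines all of the logic described above into a single method.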
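The final comparison step can be illustrated with a small NumPy sketch. It assumes CLIP-style scoring, in which the text and speech embeddings are L2-normalized and compared by cosine similarity; `rank_speech_candidates` and the toy 3-dimensional vectors are hypothetical and merely stand in for the real text and speech encoder outputs.

```python
import numpy as np


def rank_speech_candidates(text_embed: np.ndarray, speech_embeds: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return candidate indices, most to least similar to the text.

    text_embed: shape (dim,); speech_embeds: shape (num_candidates, dim).
    """
    # L2-normalize so that a dot product equals cosine similarity.
    text_unit = text_embed / np.linalg.norm(text_embed)
    speech_unit = speech_embeds / np.linalg.norm(speech_embeds, axis=-1, keepdims=True)
    similarities = speech_unit @ text_unit  # shape (num_candidates,)
    # Sort by descending similarity.
    return np.argsort(-similarities)


# Toy embeddings standing in for encoder outputs.
text_vec = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the text vector
    [2.0, 0.1, 0.0],   # nearly parallel to the text vector
    [-1.0, 0.0, 0.0],  # points the opposite way
])

order = rank_speech_candidates(text_vec, candidates)
print(order)  # [1 0 2]
best = candidates[order[0]]  # the candidate that would be forwarded
```

Returning the full ordering rather than only the argmax keeps the rest of the ranking available, which is convenient if more than one candidate should be kept for the next stage.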

Example:

>>> import datasets
>>> from transformers import ClvpProcessor, ClvpModelForConditionalGeneration

>>> # Define the Text and Load the Audio (We are taking an audio example from HuggingFace Hub using `datasets` library).
>>> text = "This is an example text."

>>> ds = datasets.load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> ds = ds.cast_column("audio", datasets.Audio(sampling_rate=22050))
>>> sample = ds[0]["audio"]

>>> # Define processor and model.
>>> processor = ClvpProcessor.from_pretrained("susnato/clvp_dev")
>>> model = ClvpModelForConditionalGeneration.from_pretrained("susnato/clvp_dev")

>>> # Generate processor output and model output.
>>> processor_output = processor(raw_speech=sample["array"], sampling_rate=sample["sampling_rate"], text=text, return_tensors="pt")
>>> generated_output = model.generate(**processor_output)

ClvpConfig

autodoc ClvpConfig

ClvpEncoderConfig

autodoc ClvpEncoderConfig

ClvpDecoderConfig

autodoc ClvpDecoderConfig

ClvpTokenizer

autodoc ClvpTokenizer - save_vocabulary

ClvpFeatureExtractor

autodoc ClvpFeatureExtractor - __call__

ClvpProcessor

autodoc ClvpProcessor - __call__ - decode - batch_decode

ClvpModelForConditionalGeneration

autodoc ClvpModelForConditionalGeneration - forward - generate - get_text_features - get_speech_features

ClvpForCausalLM

autodoc ClvpForCausalLM

ClvpModel

autodoc ClvpModel

ClvpEncoder

autodoc ClvpEncoder

ClvpDecoder

autodoc ClvpDecoder
