first commit
Some checks failed
Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled
Build documentation / build (push) Has been cancelled
Build documentation / build_other_lang (push) Has been cancelled
CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled
New model PR merged notification / Notify new model (push) Has been cancelled
PR CI / pr-ci (push) Has been cancelled
Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
Update Transformers metadata / build_and_package (push) Has been cancelled
Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled
Check Tiny Models / Check tiny models (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI - Flash Attn / Setup (push) Has been cancelled
Nvidia CI - Flash Attn / Model CI (push) Has been cancelled
Nvidia CI / Setup (push) Has been cancelled
Nvidia CI / Model CI (push) Has been cancelled
Nvidia CI / Torch pipeline CI (push) Has been cancelled
Nvidia CI / Example CI (push) Has been cancelled
Nvidia CI / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI / DeepSpeed CI (push) Has been cancelled
Nvidia CI / Quantization CI (push) Has been cancelled
Nvidia CI / Kernels CI (push) Has been cancelled
Doctests / Setup (push) Has been cancelled
Doctests / Call doctest jobs (push) Has been cancelled
Doctests / Send results to webhook (push) Has been cancelled
Extras Smoke Test / Get supported Python versions (push) Has been cancelled
Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled
Extras Smoke Test / Check Slack token availability (push) Has been cancelled
Extras Smoke Test / Notify failures to Slack (push) Has been cancelled
Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled
Stale Bot / Close Stale Issues (push) Has been cancelled
Some checks failed
Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled
Build documentation / build (push) Has been cancelled
Build documentation / build_other_lang (push) Has been cancelled
CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled
New model PR merged notification / Notify new model (push) Has been cancelled
PR CI / pr-ci (push) Has been cancelled
Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
Update Transformers metadata / build_and_package (push) Has been cancelled
Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled
Check Tiny Models / Check tiny models (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI - Flash Attn / Setup (push) Has been cancelled
Nvidia CI - Flash Attn / Model CI (push) Has been cancelled
Nvidia CI / Setup (push) Has been cancelled
Nvidia CI / Model CI (push) Has been cancelled
Nvidia CI / Torch pipeline CI (push) Has been cancelled
Nvidia CI / Example CI (push) Has been cancelled
Nvidia CI / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI / DeepSpeed CI (push) Has been cancelled
Nvidia CI / Quantization CI (push) Has been cancelled
Nvidia CI / Kernels CI (push) Has been cancelled
Doctests / Setup (push) Has been cancelled
Doctests / Call doctest jobs (push) Has been cancelled
Doctests / Send results to webhook (push) Has been cancelled
Extras Smoke Test / Get supported Python versions (push) Has been cancelled
Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled
Extras Smoke Test / Check Slack token availability (push) Has been cancelled
Extras Smoke Test / Notify failures to Slack (push) Has been cancelled
Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled
Stale Bot / Close Stale Issues (push) Has been cancelled
This commit is contained in:
103
docs/source/en/add_audio_processing_components.md
Normal file
103
docs/source/en/add_audio_processing_components.md
Normal file
@@ -0,0 +1,103 @@
|
||||
<!--Copyright 2026 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Add audio processing components
|
||||
|
||||
Audio models require a feature extractor which is accessible behind the [`AutoFeatureExtractor`] entry point.
|
||||
|
||||
> [!NOTE]
|
||||
> For the model and configuration steps, follow the [modular](./modular_transformers) guide first.
|
||||
|
||||
## Feature extractor
|
||||
|
||||
Add a feature extractor when the model consumes raw audio or audio-derived features.
|
||||
|
||||
Create `feature_extraction_<model_name>.py` in the model directory. Inherit from [`SequenceFeatureExtractor`] so the new class gets shared padding, truncation, saving, and loading behavior.
|
||||
|
||||
```py
|
||||
from ...feature_extraction_sequence_utils import SequenceFeatureExtractor
|
||||
|
||||
|
||||
class MyModelFeatureExtractor(SequenceFeatureExtractor):
|
||||
model_input_names = ["input_features", "attention_mask"]
|
||||
|
||||
def __init__(self, feature_size=80, sampling_rate=16000, padding_value=0.0, **kwargs):
|
||||
super().__init__(feature_size=feature_size, sampling_rate=sampling_rate, padding_value=padding_value, **kwargs)
|
||||
|
||||
def __call__(self, raw_speech, sampling_rate=None, **kwargs):
|
||||
if sampling_rate is not None and sampling_rate != self.sampling_rate:
|
||||
raise ValueError(f"`sampling_rate` must be {self.sampling_rate}, but got {sampling_rate}.")
|
||||
|
||||
# Convert raw_speech to model features here.
|
||||
...
|
||||
```
|
||||
|
||||
Keep the constructor small and serializable. Store every value needed to reproduce preprocessing as an instance attribute, and avoid storing runtime-only values such as open files, devices, or decoded audio arrays.
|
||||
|
||||
The `__call__` method must validate the input sampling rate when users pass `sampling_rate`. If the input rate differs from the model's expected rate, raise an error instead of silently resampling.
|
||||
|
||||
Save the feature extractor with the checkpoint by instantiating it in the conversion script and calling [`~FeatureExtractionMixin.save_pretrained`]. Do not manually create or edit preprocessing config files.
|
||||
|
||||
> [!TIP]
|
||||
> See [`Gemma4AudioFeatureExtractor`] for reference.
|
||||
|
||||
## Register the classes
|
||||
|
||||
Expose the new classes from the model package `__init__.py`. Follow the lazy import pattern used by nearby models and guard imports with the same optional dependencies required by the class.
|
||||
|
||||
Map the new class to the model config so [`AutoFeatureExtractor`] can load it. Add an entry to `FEATURE_EXTRACTOR_MAPPING_NAMES` in `src/transformers/models/auto/feature_extraction_auto.py`, following the pattern of nearby entries. Then verify the model type appears there under `FEATURE_EXTRACTOR_MAPPING_NAMES` for [`AutoFeatureExtractor`].
|
||||
|
||||
- `FEATURE_EXTRACTOR_MAPPING_NAMES` for [`AutoFeatureExtractor`]
|
||||
|
||||
## Testing
|
||||
|
||||
Add tests for each audio processing component in the model test directory. Feature extractor tests usually live in `tests/models/<model_name>/test_feature_extraction_<model_name>.py`.
|
||||
|
||||
For feature extractors that inherit from [`SequenceFeatureExtractor`], inherit from [`SequenceFeatureExtractionTestMixin`]. The mixin covers save and load behavior, padding, truncation, tensor conversion, and common feature extractor properties. Provide a tester object with `prepare_feat_extract_dict()` and `prepare_inputs_for_common()` so the mixin can instantiate the feature extractor and build short dummy audio inputs.
|
||||
|
||||
```py
|
||||
from ...test_sequence_feature_extraction_common import SequenceFeatureExtractionTestMixin
|
||||
|
||||
class MyModelFeatureExtractionTest(SequenceFeatureExtractionTestMixin, unittest.TestCase):
|
||||
feature_extraction_class = MyModelFeatureExtractor
|
||||
|
||||
def setUp(self):
|
||||
self.feat_extract_tester = MyModelFeatureExtractionTester(self)
|
||||
```
|
||||
|
||||
Add focused tests for model-specific behavior that the mixin doesn't know about. For audio feature extractors, that usually means checking the feature shape returned by `__call__`, validating that an incorrect `sampling_rate` raises an error, and checking any custom normalization or feature computation.
|
||||
|
||||
If the model also has a [`ProcessorMixin`] that wraps the feature extractor, add `tests/models/<model_name>/test_processing_<model_name>.py` and inherit from [`ProcessorTesterMixin`]. Set `processor_class` and override `_setup_<component>()` class methods for components that can't be constructed without arguments. Use `_setup_test_attributes()` to expose placeholder tokens used by the common processor tests.
|
||||
|
||||
```py
|
||||
from ...test_processing_common import ProcessorTesterMixin
|
||||
|
||||
class MyModelProcessorTest(ProcessorTesterMixin, unittest.TestCase):
|
||||
processor_class = MyModelProcessor
|
||||
|
||||
@classmethod
|
||||
def _setup_feature_extractor(cls):
|
||||
return cls._get_component_class_from_processor("feature_extractor")(sampling_rate=16000)
|
||||
|
||||
@classmethod
|
||||
def _setup_test_attributes(cls, processor):
|
||||
cls.audio_token = getattr(processor, "audio_token", "")
|
||||
```
|
||||
|
||||
## Next steps
|
||||
|
||||
- Read the [Auto-generating docstrings](./auto_docstring) guide to auto-generate consistent docstrings with `@auto_docstring`.
|
||||
- Read the [Feature extractors](./feature_extractors) guide for user-facing preprocessing behavior.
|
||||
Reference in New Issue
Block a user