first commit

2026-06-05 16:53:03 +08:00
commit 06f1fd69a6
6047 changed files with 1895387 additions and 0 deletions
--- a/docs/source/en/internal/audio_utils.md
+++ b/docs/source/en/internal/audio_utils.md
@@ -0,0 +1,39 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Utilities for `FeatureExtractors`
+
+This page lists all the utility functions that can be used by the audio [`FeatureExtractor`] in order to compute special features from a raw audio using common algorithms such as *Short Time Fourier Transform* or *log mel spectrogram*.
+
+Most of those are only useful if you are studying the code of the audio processors in the library.
+
+## Audio Transformations
+
+[[autodoc]] audio_utils.hertz_to_mel
+
+[[autodoc]] audio_utils.mel_to_hertz
+
+[[autodoc]] audio_utils.mel_filter_bank
+
+[[autodoc]] audio_utils.optimal_fft_length
+
+[[autodoc]] audio_utils.window_function
+
+[[autodoc]] audio_utils.spectrogram
+
+[[autodoc]] audio_utils.power_to_db
+
+[[autodoc]] audio_utils.amplitude_to_db
--- a/docs/source/en/internal/file_utils.md
+++ b/docs/source/en/internal/file_utils.md
@@ -0,0 +1,45 @@
+<!--Copyright 2021 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# General Utilities
+
+This page lists all of Transformers general utility functions that are found in the file `utils.py`.
+
+Most of those are only useful if you are studying the general code in the library.
+
+## Enums and namedtuples
+
+[[autodoc]] utils.ExplicitEnum
+
+[[autodoc]] utils.PaddingStrategy
+
+[[autodoc]] utils.TensorType
+
+## Special Decorators
+
+[[autodoc]] utils.add_start_docstrings
+
+[[autodoc]] utils.add_start_docstrings_to_model_forward
+
+[[autodoc]] utils.add_end_docstrings
+
+[[autodoc]] utils.add_code_sample_docstrings
+
+[[autodoc]] utils.replace_return_docstrings
+
+## Other Utilities
+
+[[autodoc]] utils._LazyModule
--- a/docs/source/en/internal/generation_utils.md
+++ b/docs/source/en/internal/generation_utils.md
@@ -0,0 +1,282 @@
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Utilities for Generation
+
+This page lists all the utility functions used by [`~generation.GenerationMixin.generate`].
+
+## Generate Outputs
+
+The output of [`~generation.GenerationMixin.generate`] is an instance of a subclass of
+[`~utils.ModelOutput`]. This output is a data structure containing all the information returned
+by [`~generation.GenerationMixin.generate`], but that can also be used as tuple or dictionary.
+
+Here's an example:
+
+```python
+from transformers import GPT2Tokenizer, GPT2LMHeadModel
+
+tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
+model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
+
+inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
+generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
+```
+
+The `generation_output` object is a [`~generation.GenerateDecoderOnlyOutput`], as we can
+see in the documentation of that class below, it means it has the following attributes:
+
+- `sequences`: the generated sequences of tokens
+- `scores` (optional): the prediction scores of the language modelling head, for each generation step
+- `hidden_states` (optional): the hidden states of the model, for each generation step
+- `attentions` (optional): the attention weights of the model, for each generation step
+
+Here we have the `scores` since we passed along `output_scores=True`, but we don't have `hidden_states` and
+`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.
+
+You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
+will get `None`. Here for instance `generation_output.scores` are all the generated prediction scores of the
+language modeling head, and `generation_output.attentions` is `None`.
+
+When using our `generation_output` object as a tuple, it only keeps the attributes that don't have `None` values.
+Here, for instance, it has two elements, `loss` then `logits`, so
+
+```python
+generation_output[:2]
+```
+
+will return the tuple `(generation_output.sequences, generation_output.scores)` for instance.
+
+When using our `generation_output` object as a dictionary, it only keeps the attributes that don't have `None`
+values. Here, for instance, it has two keys that are `sequences` and `scores`.
+
+We document here all output types.
+
+[[autodoc]] generation.GenerateDecoderOnlyOutput
+
+[[autodoc]] generation.GenerateEncoderDecoderOutput
+
+[[autodoc]] generation.GenerateBeamDecoderOnlyOutput
+
+[[autodoc]] generation.GenerateBeamEncoderDecoderOutput
+
+## LogitsProcessor
+
+A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
+generation.
+
+[[autodoc]] AlternatingCodebooksLogitsProcessor
+    - __call__
+
+[[autodoc]] ClassifierFreeGuidanceLogitsProcessor
+    - __call__
+
+[[autodoc]] EncoderNoRepeatNGramLogitsProcessor
+    - __call__
+
+[[autodoc]] EncoderRepetitionPenaltyLogitsProcessor
+    - __call__
+
+[[autodoc]] EpsilonLogitsWarper
+    - __call__
+
+[[autodoc]] EtaLogitsWarper
+    - __call__
+
+[[autodoc]] ExponentialDecayLengthPenalty
+    - __call__
+
+[[autodoc]] ForcedBOSTokenLogitsProcessor
+    - __call__
+
+[[autodoc]] ForcedEOSTokenLogitsProcessor
+    - __call__
+
+[[autodoc]] InfNanRemoveLogitsProcessor
+    - __call__
+
+[[autodoc]] LogitNormalization
+    - __call__
+
+[[autodoc]] LogitsProcessor
+    - __call__
+
+[[autodoc]] LogitsProcessorList
+    - __call__
+
+[[autodoc]] MinLengthLogitsProcessor
+    - __call__
+
+[[autodoc]] MinNewTokensLengthLogitsProcessor
+    - __call__
+
+[[autodoc]] MinPLogitsWarper
+    - __call__
+
+[[autodoc]] NoBadWordsLogitsProcessor
+    - __call__
+
+[[autodoc]] NoRepeatNGramLogitsProcessor
+    - __call__
+
+[[autodoc]] PrefixConstrainedLogitsProcessor
+    - __call__
+
+[[autodoc]] RepetitionPenaltyLogitsProcessor
+    - __call__
+
+[[autodoc]] SequenceBiasLogitsProcessor
+    - __call__
+
+[[autodoc]] SuppressTokensAtBeginLogitsProcessor
+    - __call__
+
+[[autodoc]] SuppressTokensLogitsProcessor
+    - __call__
+
+[[autodoc]] SynthIDTextWatermarkLogitsProcessor
+    - __call__
+
+[[autodoc]] TemperatureLogitsWarper
+    - __call__
+
+[[autodoc]] TopHLogitsWarper
+    - __call__
+
+[[autodoc]] TopKLogitsWarper
+    - __call__
+
+[[autodoc]] TopPLogitsWarper
+    - __call__
+
+[[autodoc]] TypicalLogitsWarper
+    - __call__
+
+[[autodoc]] UnbatchedClassifierFreeGuidanceLogitsProcessor
+    - __call__
+
+[[autodoc]] WhisperTimeStampLogitsProcessor
+    - __call__
+
+[[autodoc]] WatermarkLogitsProcessor
+    - __call__
+
+## StoppingCriteria
+
+A [`StoppingCriteria`] can be used to change when to stop generation (other than EOS token). Please note that this is exclusively available to our PyTorch implementations.
+
+[[autodoc]] StoppingCriteria
+    - __call__
+
+[[autodoc]] StoppingCriteriaList
+    - __call__
+
+[[autodoc]] MaxLengthCriteria
+    - __call__
+
+[[autodoc]] MaxTimeCriteria
+    - __call__
+
+[[autodoc]] StopStringCriteria
+    - __call__
+
+[[autodoc]] EosTokenCriteria
+    - __call__
+
+## Streamers
+
+[[autodoc]] TextStreamer
+
+[[autodoc]] TextIteratorStreamer
+
+[[autodoc]] AsyncTextIteratorStreamer
+
+## Caches
+
+[[autodoc]] CacheLayerMixin
+    - update
+    - get_seq_length
+    - get_mask_sizes
+    - get_max_cache_shape
+    - reset
+    - reorder_cache
+    - lazy_initialization
+
+[[autodoc]] DynamicLayer
+    - update
+    - lazy_initialization
+    - crop
+    - batch_repeat_interleave
+    - batch_select_indices
+
+[[autodoc]] StaticLayer
+    - update
+    - lazy_initialization
+
+[[autodoc]] StaticSlidingWindowLayer
+    - update
+    - lazy_initialization
+
+[[autodoc]] QuantoQuantizedLayer
+    - update
+    - lazy_initialization
+
+[[autodoc]] HQQQuantizedLayer
+    - update
+    - lazy_initialization
+
+[[autodoc]] Cache
+    - update
+    - early_initialization
+    - get_seq_length
+    - get_mask_sizes
+    - get_max_cache_shape
+    - reset
+    - reorder_cache
+    - crop
+    - batch_repeat_interleave
+    - batch_select_indices
+
+[[autodoc]] DynamicCache
+
+[[autodoc]] StaticCache
+
+[[autodoc]] QuantizedCache
+
+[[autodoc]] EncoderDecoderCache
+
+## Watermark Utils
+
+[[autodoc]] WatermarkingConfig
+    - __call__
+
+[[autodoc]] WatermarkDetector
+    - __call__
+
+[[autodoc]] BayesianDetectorConfig
+
+[[autodoc]] BayesianDetectorModel
+    - forward
+
+[[autodoc]] SynthIDTextWatermarkingConfig
+
+[[autodoc]] SynthIDTextWatermarkDetector
+    - __call__
+
+## Compile Utils
+
+[[autodoc]] CompileConfig
+    - __call__
--- a/docs/source/en/internal/image_processing_utils.md
+++ b/docs/source/en/internal/image_processing_utils.md
@@ -0,0 +1,48 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Utilities for Image Processors
+
+This page lists all the utility functions used by the image processors, mainly the functional
+transformations used to process the images.
+
+Most of those are only useful if you are studying the code of the image processors in the library.
+
+## Image Transformations
+
+[[autodoc]] image_transforms.center_crop
+
+[[autodoc]] image_transforms.center_to_corners_format
+
+[[autodoc]] image_transforms.corners_to_center_format
+
+[[autodoc]] image_transforms.id_to_rgb
+
+[[autodoc]] image_transforms.normalize
+
+[[autodoc]] image_transforms.pad
+
+[[autodoc]] image_transforms.rgb_to_id
+
+[[autodoc]] image_transforms.rescale
+
+[[autodoc]] image_transforms.resize
+
+[[autodoc]] image_transforms.to_pil_image
+
+## ImageProcessingMixin
+
+[[autodoc]] image_processing_utils.ImageProcessingMixin
--- a/docs/source/en/internal/import_utils.md
+++ b/docs/source/en/internal/import_utils.md
@@ -0,0 +1,101 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Import Utilities
+
+This page goes through the transformers utilities to enable lazy and fast object import.
+While we strive for minimal dependencies, some models have specific dependencies requirements that cannot be
+worked around. We don't want for all users of `transformers` to have to install those dependencies to use other models,
+we therefore mark those as soft dependencies rather than hard dependencies.
+
+The transformers toolkit is not made to error-out on import of a model that has a specific dependency; instead, an
+object for which you are lacking a dependency will error-out when calling any method on it. As an example, if
+`torchvision` isn't installed, the fast image processors will not be available.
+
+This object is still importable:
+
+```python
+>>> from transformers import DetrImageProcessor
+>>> print(DetrImageProcessor)
+<class 'DetrImageProcessor'>
+```
+
+However, no method can be called on that object:
+
+```python
+>>> DetrImageProcessor.from_pretrained()
+ImportError:
+DetrImageProcessor requires the Torchvision library but it was not found in your environment. Check out the instructions on the
+installation page: https://pytorch.org/get-started/locally/ and follow the ones that match your environment.
+Please note that you may need to restart your runtime after installation.
+```
+
+Let's see how to specify specific object dependencies.
+
+## Specifying Object Dependencies
+
+### Filename-based
+
+All objects under a given filename have an automatic dependency to the tool linked to the filename
+
+**PyTorch**: All files starting with `modeling_` have an automatic PyTorch dependency
+
+**Tokenizers**: All files starting with `tokenization_` and ending with `_fast` have an automatic `tokenizers` dependency
+
+**Vision**: All files starting with `image_processing_` have an automatic dependency to the `vision` dependency group;
+at the time of writing, this only contains the `pillow` dependency.
+
+**Vision + Torch + Torchvision**: All files starting with `image_processing_` and ending with `_fast` have an automatic
+dependency to `vision`, `torch`, and `torchvision`.
+
+All of these automatic dependencies are added on top of the explicit dependencies that are detailed below.
+
+### Explicit Object Dependencies
+
+We add a method called `requires` that is used to explicitly specify the dependencies of a given object. As an
+example, the `Trainer` class has two hard dependencies: `torch` and `accelerate`. Here is how we specify these
+required dependencies:
+
+```python
+from .utils.import_utils import requires
+
+@requires(backends=("torch", "accelerate"))
+class Trainer:
+    ...
+```
+
+Backends that can be added here are all the backends that are available in the `import_utils.py` module.
+
+Additionally, specific versions can be specified in each backend. For example, this is how you would specify
+a requirement on torch>=2.6 on the `Trainer` class:
+
+```python
+from .utils.import_utils import requires
+
+@requires(backends=("torch>=2.6", "accelerate"))
+class Trainer:
+    ...
+```
+
+You can specify the following operators: `==`, `>`, `>=`, `<`, `<=`, `!=`.
+
+## Methods
+
+[[autodoc]] utils.import_utils.define_import_structure
+
+[[autodoc]] utils.import_utils.requires
+
+[[autodoc]] utils.import_utils.requires_backends
--- a/docs/source/en/internal/model_debugging_utils.md
+++ b/docs/source/en/internal/model_debugging_utils.md
@@ -0,0 +1,439 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Model debugging toolboxes
+
+This page lists all the debugging and model adding tools used by the library, as well as the utility functions it
+provides for it.
+
+Most of those are only useful if you are adding new models in the library.
+
+## Model addition debuggers
+
+### Model addition debugger - context manager for model adders
+
+This context manager is a power user tool intended for model adders. It tracks all forward calls within a model forward
+and logs a slice of each input and output on a nested JSON. To note, this context manager enforces `torch.no_grad()`.
+
+### Rationale
+
+When porting models to transformers, even from python to python, model adders often have to do a lot of manual
+operations, involving saving and loading tensors, comparing dtypes, etc. This small tool can hopefully shave off some
+time.
+
+### Usage
+
+Add this context manager as follows to debug a model:
+
+```python
+import torch
+from PIL import Image
+import requests
+from transformers import LlavaProcessor, LlavaForConditionalGeneration
+from transformers.model_debugging_utils import model_addition_debugger_context
+torch.random.manual_seed(673)
+
+# load pretrained model and processor
+model_id = "llava-hf/llava-1.5-7b-hf"
+processor = LlavaProcessor.from_pretrained(model_id)
+model = LlavaForConditionalGeneration.from_pretrained(model_id)
+
+# create random image input
+random_image = Image.fromarray(torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8).numpy())
+
+# prompt
+prompt = "<image>Describe this image."
+
+# process inputs
+inputs = processor(text=prompt, images=random_image, return_tensors="pt")
+
+# call forward method (not .generate!)
+with model_addition_debugger_context(
+    model,
+    debug_path="optional_path_to_your_directory",
+    do_prune_layers=False # This will output ALL the layers of a model.
+):
+    output = model.forward(**inputs)
+
+```
+
+### Reading results
+
+The debugger generates two files from the forward call, both with the same base name, but ending either with
+`_SUMMARY.json` or with `_FULL_TENSORS.json`.
+
+The first one will contain a summary of each module's _input_ and _output_ tensor values and shapes.
+
+```json
+{
+  "module_path": "MolmoForConditionalGeneration",
+  "inputs": {
+    "args": [],
+    "kwargs": {
+      "input_ids": {
+        "shape": "torch.Size([1, 589])",
+        "dtype": "torch.int64"
+      },
+      "attention_mask": {
+        "shape": "torch.Size([1, 589])",
+        "dtype": "torch.int64"
+      },
+      "pixel_values": {
+        "shape": "torch.Size([1, 5, 576, 588])",
+        "dtype": "torch.float32",
+        "mean": "tensor(-8.9514e-01, device='cuda:0')",
+        "std": "tensor(9.2586e-01, device='cuda:0')",
+        "min": "tensor(-1.7923e+00, device='cuda:0')",
+        "max": "tensor(1.8899e+00, device='cuda:0')"
+    }
+  },
+  "children": [
+    {
+      "module_path": "MolmoForConditionalGeneration.language_model.model.embed_tokens",
+      "inputs": {
+        "args": [
+          {
+            "shape": "torch.Size([1, 589])",
+            "dtype": "torch.int64"
+          }
+        ]
+      },
+      "outputs": {
+        "shape": "torch.Size([1, 589, 3584])",
+        "dtype": "torch.float32",
+        "mean": "tensor(6.5460e-06, device='cuda:0')",
+        "std": "tensor(2.3807e-02, device='cuda:0')",
+        "min": "tensor(-3.3398e-01, device='cuda:0')",
+        "max": "tensor(3.9453e-01, device='cuda:0')"
+      }
+    },
+    {
+      "module_path": "MolmoForConditionalGeneration.vision_tower",
+      "inputs": {
+        "args": [
+          {
+            "shape": "torch.Size([5, 1, 576, 588])",
+            "dtype": "torch.float32",
+            "mean": "tensor(-8.9514e-01, device='cuda:0')",
+            "std": "tensor(9.2586e-01, device='cuda:0')",
+            "min": "tensor(-1.7923e+00, device='cuda:0')",
+            "max": "tensor(1.8899e+00, device='cuda:0')"
+          }
+        ],
+        "kwargs": {
+          "output_hidden_states": "True"
+        }
+      },
+      "children": [
+        { ... and so on
+```
+
+The `_FULL_TENSORS.json` file will display a full view of all tensors, which is useful for comparing two files.
+
+```json
+      "pixel_values": {
+        "shape": "torch.Size([1, 5, 576, 588])",
+        "dtype": "torch.float32",
+        "value": [
+          "tensor([[[[-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          ...,",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]],",
+          "",
+          "         [[-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          ...,",
+          "          [-1.4857e+00, -1.4820e+00, -1.2100e+00,  ..., -6.0979e-01, -5.9650e-01, -3.8527e-01],",
+          "          [-1.6755e+00, -1.7221e+00, -1.4518e+00,  ..., -7.5577e-01, -7.4658e-01, -5.5592e-01],",
+          "          [-7.9957e-01, -8.2162e-01, -5.7014e-01,  ..., -1.3689e+00, -1.3169e+00, -1.0678e+00]],",
+          "",
+          "         [[-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          ...,",
+          "          [-3.0322e-01, -5.0645e-01, -5.8436e-01,  ..., -6.2439e-01, -7.9160e-01, -8.1188e-01],",
+          "          [-4.4921e-01, -6.5653e-01, -7.2656e-01,  ..., -3.4702e-01, -5.2146e-01, -5.1326e-01],",
+          "          [-3.4702e-01, -5.3647e-01, -5.4170e-01,  ..., -1.0915e+00, -1.1968e+00, -1.0252e+00]],",
+          "",
+          "         [[-1.1207e+00, -1.2718e+00, -1.0678e+00,  ..., 1.2013e-01, -1.3126e-01, -1.7197e-01],",
+          "          [-6.9738e-01, -9.1166e-01, -8.5454e-01,  ..., -5.5050e-02, -2.8134e-01, -4.2793e-01],",
+          "          [-3.4702e-01, -5.5148e-01, -5.8436e-01,  ..., 1.9312e-01, -8.6235e-02, -2.1463e-01],",
+          "          ...,",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]],",
+          "",
+          "         [[-1.0039e+00, -9.5669e-01, -6.5546e-01,  ..., -1.4711e+00, -1.4219e+00, -1.1389e+00],",
+          "          [-1.0039e+00, -9.5669e-01, -6.5546e-01,  ..., -1.7193e+00, -1.6771e+00, -1.4091e+00],",
+          "          [-1.6317e+00, -1.6020e+00, -1.2669e+00,  ..., -1.2667e+00, -1.2268e+00, -8.9720e-01],",
+          "          ...,",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
+          "          [-1.7923e+00, -1.7521e+00, -1.4802e+00,  ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]]]], device='cuda:0')"
+        ],
+        "mean": "tensor(-8.9514e-01, device='cuda:0')",
+        "std": "tensor(9.2586e-01, device='cuda:0')",
+        "min": "tensor(-1.7923e+00, device='cuda:0')",
+        "max": "tensor(1.8899e+00, device='cuda:0')"
+      },
+```
+
+#### Saving tensors to disk
+
+Some model adders may benefit from logging full tensor values to disk to support, for example, numerical analysis
+across implementations.
+
+Set `use_repr=False` to write tensors to disk using [SafeTensors](https://huggingface.co/docs/safetensors/en/index).
+
+```python
+with model_addition_debugger_context(
+    model,
+    debug_path="optional_path_to_your_directory",
+    do_prune_layers=False,
+    use_repr=False,   # Defaults to True
+):
+    output = model.forward(**inputs)
+```
+
+When using `use_repr=False`, tensors are written to the same disk location as the `_SUMMARY.json` and
+`_FULL_TENSORS.json` files. The `value` property of entries in the `_FULL_TENSORS.json` file will contain a relative
+path reference to the associated `.safetensors` file. Each tensor is written to its own file as the `data` property of
+the state dictionary. File names are constructed using the `module_path` as a prefix with a few possible postfixes that
+are built recursively.
+
+* Module inputs are denoted with the `_inputs` and outputs by `_outputs`.
+* `list` and `tuple` instances, such as `args` or function return values, will be postfixed with `_{index}`.
+* `dict` instances will be postfixed with `_{key}`.
+
+### Comparing between implementations
+
+Once the forward passes of two models have been traced by the debugger, one can compare the `json` output files. See
+below: we can see slight differences between these two implementations' key projection layer. Inputs are mostly
+identical, but not quite. Looking through the file differences makes it easier to pinpoint which layer is wrong.
+
+![download-icon](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/files_difference_debugging.png)
+
+### Limitations and scope
+
+This feature will only work for torch-based models. Models relying heavily on external kernel calls may work, but trace will
+probably miss some things. Regardless, any python implementation that aims at mimicking another implementation can be
+traced once instead of reran N times with breakpoints.
+
+If you pass `do_prune_layers=False` to your model debugger, ALL the layers will be outputted to `json`. Else, only the
+first and last layer will be shown. This is useful when some layers (typically cross-attention) appear only after N
+layers.
+
+[[autodoc]] model_addition_debugger_context
+
+## Analyzer of skipped tests
+
+### Scan skipped tests - for model adders and maintainers
+
+This small util is a power user tool intended for model adders and maintainers. It lists all test methods
+existing in `test_modeling_common.py`, inherited by all model tester classes, and scans the repository to measure
+how many tests are being skipped and for which models.
+
+### Rationale
+
+When porting models to transformers, tests fail as they should, and sometimes `test_modeling_common` feels irreconcilable with the peculiarities of our brand new model. But how can we be sure we're not breaking everything by adding a seemingly innocent skip?
+
+This utility:
+
+- scans all test_modeling_common methods
+- looks for times where a method is skipped
+- returns a summary json you can load as a DataFrame/inspect
+
+**For instance test_inputs_embeds is skipped in a whooping 39% proportion at the time of writing this util.**
+
+![download-icon](https://huggingface.co/datasets/huggingface/documentation-images/resolve/f7f671f69b88ce4967e19179172c248958d35742/transformers/tests_skipped_visualisation.png)
+
+### Usage
+
+You can run the skipped test analyzer in two ways:
+
+#### Full scan (default)
+
+From the root of `transformers` repo, scans all common test methods and outputs the results to a JSON file (default: `all_tests_scan_result.json`).
+
+```bash
+python utils/scan_skipped_tests.py --output_dir path/to/output
+```
+
+- `--output_dir` (optional): Directory where the JSON results will be saved. Defaults to the current directory.
+
+**Example output:**
+
+```text
+🔬 Parsing 331 model test files once each...
+📝 Aggregating 224 tests...
+  (224/224) test_update_candidate_strategy_with_matches_1es_3d_is_nonecodet_schedule_fa_kwargs
+✅ Scan complete.
+
+📄 JSON saved to /home/pablo/git/transformers/all_tests_scan_result.json
+
+```
+
+And it will generate `all_tests_scan_result.json` file that you can inspect. The JSON is indexed by method name, and each entry follows this schema, indicating the origin as well (from `common`or `GenerationMixin`.)
+
+```json
+{
+  "<method_name>": {
+    "origin": "<test suite>"
+    "models_ran": ["<model_name>", ...],
+    "models_skipped": ["<model_name>", ...],
+    "skipped_proportion": <float>,
+    "reasons_skipped": ["<model_name>: <reason>",
+      ...
+    ]
+  },
+  ...
+}
+```
+
+Which you can visualise as above with e.g. `pandas`
+
+```python
+df = pd.read_json('all_tests_scan_result.json').T
+df.sort_values(by=['skipped_proportion'], ascending=False)
+
+```
+
+### Scan a single test method
+
+You can focus on a specific test method using `--test_method_name`:
+
+```bash
+python utils/scan_skipped_tests.py --test_method_name test_inputs_embeds --output_dir path/to/output
+```
+
+- `--test_method_name`: Name of the test method to scan (e.g., `test_inputs_embeds`).
+- `--output_dir` (optional): Directory where the JSON result will be saved.
+
+**Example output:**
+
+```bash
+$ python utils/scan_skipped_tests.py --test_method_name test_inputs_embeds
+
+🔬 Parsing 331 model test files once each...
+
+== test_inputs_embeds ==
+
+Ran    : 199/323
+Skipped : 124/323 (38.4%)
+ - aimv2: Aimv2 does not use inputs_embeds
+ - align: Inputs_embeds is tested in individual model tests
+ - altclip: Inputs_embeds is tested in individual model tests
+ - audio_spectrogram_transformer: AST does not use inputs_embeds
+ - beit: BEiT does not use inputs_embeds
+ - bit: Bit does not use inputs_embeds
+ - blip: Blip does not use inputs_embeds
+ - blip_2: Inputs_embeds is tested in individual model tests
+ - bridgetower:
+ - canine: CANINE does not have a get_input_embeddings() method.
+ - ...
+
+📄 JSON saved to /home/pablo/git/transformers/scan_test_inputs_embeds.json
+
+```
+
+## Modular model detector
+
+### Code similarity analyzer - for model adders
+
+This utility analyzes code similarities between model implementations to identify opportunities for modularization. It compares a new or existing modeling file against all models in the library using embedding-based and token-based similarity metrics.
+
+### Rationale
+
+When adding a new model to transformers, many components (attention layers, MLPs, outputs, etc.) may already exist in similar form in other models. Instead of implementing everything from scratch, model adders can identify which existing classes are similar and potentially reusable through modularization.
+
+The tool computes two similarity scores:
+
+- **Embedding score**: Uses semantic code embeddings (via `Qwen/Qwen3-Embedding-4B`) to detect functionally similar code even with different naming
+- **Jaccard score**: Measures token set overlap to identify structurally similar code patterns
+
+A score of 1.00 means the code is identical.
+
+### Usage
+
+From the root of the `transformers` repository:
+
+```bash
+python utils/modular_model_detector.py --modeling-file path/to/modeling_file.py
+```
+
+The tool will automatically download the pre-built index from the Hub (requires RAM/VRAM for the embedding model).
+
+**Example output:**
+
+```text
+Loading checkpoint shards: 100%|████████████████████| 2/2 [00:00<00:00, 33.62it/s]
+encoding 21 query definitions with Qwen/Qwen3-Embedding-4B (device=cuda, batch=16, max_length=4096)
+
+stuff.py::Beit3ImageTextMatchingOutput:
+embedding:
+    blip_2::Blip2ImageTextMatchingModelOutput (0.9994)
+    chinese_clip::ChineseCLIPOutput (0.9818)
+    owlvit::OwlViTOutput (0.9818)
+jaccard:
+    owlv2::Owlv2Output (0.9667)
+    metaclip_2::MetaClip2Output (0.9667)
+    altclip::AltCLIPOutput (0.9667)
+intersection:
+    blip::BlipOutput
+    owlvit::OwlViTOutput
+
+stuff.py::Beit3MLP:
+embedding:
+    efficientloftr::EfficientLoFTRMLP (0.9718)
+    seggpt::SegGptMlp (0.9650)
+jaccard:
+    chinese_clip::ChineseCLIPTextSelfOutput (0.5294)
+    bert::BertSelfOutput (0.5294)
+intersection:
+```
+
+The `intersection` field shows classes that appear in both top-5 results, indicating high confidence for modularization candidates.
+
+### Building a custom index
+
+To rebuild the index from your local codebase (useful after adding new models or using a different embedding model):
+
+```bash
+python utils/modular_model_detector.py --build
+```
+
+To push the rebuilt index to a Hub dataset:
+
+```bash
+python utils/modular_model_detector.py --build --push-new-index --hub-dataset your-org/your-dataset
+```
+
+### Options
+
+- `--modeling-file`: Path to the modeling file to analyze
+- `--build`: Build the code similarity index from all modeling files in `src/transformers/models/`
+- `--push-new-index`: After building, push the index to a Hub dataset (requires `--build`)
+- `--hub-dataset`: Hub dataset repository ID to pull/push the index (default: `hf-internal-testing/transformers_code_embeddings`)
+
+### Limitations
+
+This tool requires GPU/CPU resources to run the embedding model (`Qwen/Qwen3-Embedding-4B`). The pre-built index is downloaded from the Hub by default, which requires an internet connection on first use.
+
+Results are suggestions based on code similarity and should be manually reviewed before modularization. High similarity scores don't guarantee perfect compatibility.
--- a/docs/source/en/internal/modeling_utils.md
+++ b/docs/source/en/internal/modeling_utils.md
@@ -0,0 +1,67 @@
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Custom Layers and Utilities
+
+This page lists all the custom layers used by the library, as well as the utility functions and classes it provides for modeling.
+
+Most of those are only useful if you are studying the code of the models in the library.
+
+## WeightConverter
+
+[[autodoc]] WeightConverter
+
+### Conversion operations
+
+[[autodoc]] ConversionOps
+
+[[autodoc]] Chunk
+
+[[autodoc]] Concatenate
+
+[[autodoc]] MergeModulelist
+
+[[autodoc]] SplitModulelist
+
+[[autodoc]] PermuteForRope
+
+## Layers
+
+[[autodoc]] GradientCheckpointingLayer
+
+## Attention Functions
+
+[[autodoc]] AttentionInterface
+    - register
+
+## Attention Mask Functions
+
+[[autodoc]] AttentionMaskInterface
+    - register
+
+## Rotary Position Embedding Functions
+
+[[autodoc]] dynamic_rope_update
+
+## Pytorch custom modules
+
+[[autodoc]] pytorch_utils.Conv1D
+
+## PyTorch Helper Functions
+
+[[autodoc]] pytorch_utils.apply_chunking_to_forward
+
+[[autodoc]] pytorch_utils.prune_linear_layer
--- a/docs/source/en/internal/pipelines_utils.md
+++ b/docs/source/en/internal/pipelines_utils.md
@@ -0,0 +1,41 @@
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Utilities for pipelines
+
+This page lists all the utility functions the library provides for pipelines.
+
+Most of those are only useful if you are studying the code of the models in the library.
+
+## Argument handling
+
+[[autodoc]] pipelines.ArgumentHandler
+
+[[autodoc]] pipelines.ZeroShotClassificationArgumentHandler
+
+## Data format
+
+[[autodoc]] pipelines.PipelineDataFormat
+
+[[autodoc]] pipelines.CsvPipelineDataFormat
+
+[[autodoc]] pipelines.JsonPipelineDataFormat
+
+[[autodoc]] pipelines.PipedPipelineDataFormat
+
+## Utilities
+
+[[autodoc]] pipelines.PipelineException
--- a/docs/source/en/internal/rope_utils.md
+++ b/docs/source/en/internal/rope_utils.md
@@ -0,0 +1,83 @@
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Utilities for Rotary Embedding
+
+This page explains how the Rotary Embedding is computed and applied in Transformers and what types of RoPE are supported.
+
+## Overview
+
+Rotary Position Embeddings are a technique used to inject positional information into attention mechanisms without relying on explicit position encodings.  
+Instead of adding position vectors to token embeddings, RoPE rotates query and key vectors in the complex plane according to their positions enabling relative positional awareness and better extrapolation to unseen sequence lengths.
+
+The Transformers library provides a flexible and extensible implementation of various RoPE types defined in `[`~modeling_rope_utils.ROPE_VALIDATION_FUNCTIONS`]`, including both the default and scaled variants:
+
+| Rope Type | Description |
+|------------|-------------|
+| `"default"` | Standard rotary embedding as in LLaMA. |
+| `"linear"` | Linear-scaled RoPE which allows longer context windows. |
+| `"dynamic"` | NTK-aware scaling computed by rescaling frequency base (`θ`) for longer context. |
+| `"yarn"` | YaRN scaling variant providing smoother extrapolation and stability. |
+| `"longrope"` | [LongRoPE](https://github.com/microsoft/LongRoPE) scaling as in Phi-2 model series. |
+| `"llama3"` | RoPE scaling as in Llama3.1. |
+
+## Configuration in Model Configs
+
+To enable and customize rotary embeddings, add a `rope_parameters` field to your model’s configuration file (`config.json`). This field controls the RoPE behavior across model layers. Note that each RoPE variant defines its own set of expected keys and missing keys will raise an error. See the example below which creates a llama config with default RoPE parameters:
+
+```python
+from transformers import LlamaConfig
+
+config = LlamaConfig()
+config.rope_parameters = {
+    "rope_type": "default", # type of RoPE to use
+    # rope_theta is optional — omitting it uses the model’s default_theta (typically 10000.0)
+}
+
+# If we want to apply a scaled RoPE type, we need to pass extra parameters
+config.rope_parameters = {
+    "rope_type": "linear",
+    "rope_theta": 10000.0,  # can be omitted to fall back to default_theta
+    "factor": 8.0  # scale factor for context extension
+}
+```
+
+## Per-Layer-Type RoPE Configuration
+
+Some models such as Gemma-3 use different layer types with different attention mechanisms, i.e. "full attention" in some blocks and "sliding-window attention" in others. Transformers supports specifying distinct RoPE parameters per layer type for these models. In this case, `rope_parameters` should be a nested dictionary, where top-level keys correspond to `config.layer_types` and values are per-type RoPE parameters. During model initialization, each decoder layer will automatically look up the matching RoPE configuration based on its declared layer type.
+
+```python
+from transformers import Gemma3Config
+
+config = Gemma3Config()
+config.rope_parameters = {
+    "full_attention": {
+        "rope_type": "dynamic",
+        "rope_theta": 1000000.0,
+        "factor": 8.0,
+        "original_max_position_embeddings": 8096,
+    },
+    "sliding_attention": {
+        "rope_type": "default",
+        "rope_theta": 10000.0,
+    }
+}
+```
+
+## Utilities
+
+[[autodoc]] RopeParameters
+    - __call__
--- a/docs/source/en/internal/time_series_utils.md
+++ b/docs/source/en/internal/time_series_utils.md
@@ -0,0 +1,29 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Time Series Utilities
+
+This page lists all the utility functions and classes that can be used for Time Series based models.
+
+Most of those are only useful if you are studying the code of the time series models or you wish to add to the collection of distributional output classes.
+
+## Distributional Output
+
+[[autodoc]] time_series_utils.NormalOutput
+
+[[autodoc]] time_series_utils.StudentTOutput
+
+[[autodoc]] time_series_utils.NegativeBinomialOutput
--- a/docs/source/en/internal/tokenization_utils.md
+++ b/docs/source/en/internal/tokenization_utils.md
@@ -0,0 +1,37 @@
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Utilities for Tokenizers
+
+This page lists all the utility functions used by the tokenizers, mainly the class
+[`~tokenization_utils_base.PreTrainedTokenizerBase`] that implements the common methods between
+[`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`].
+
+Most of those are only useful if you are studying the code of the tokenizers in the library.
+
+## PreTrainedTokenizerBase
+
+[[autodoc]] tokenization_utils_base.PreTrainedTokenizerBase
+    - __call__
+    - all
+
+## Enums and namedtuples
+
+[[autodoc]] tokenization_utils_base.TruncationStrategy
+
+[[autodoc]] tokenization_utils_base.CharSpan
+
+[[autodoc]] tokenization_utils_base.TokenSpan
--- a/docs/source/en/internal/trainer_utils.md
+++ b/docs/source/en/internal/trainer_utils.md
@@ -0,0 +1,45 @@
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Utilities for Trainer
+
+This page lists all the utility functions used by [`Trainer`].
+
+Most of those are only useful if you are studying the code of the Trainer in the library.
+
+## Utilities
+
+[[autodoc]] EvalPrediction
+
+[[autodoc]] IntervalStrategy
+
+[[autodoc]] enable_full_determinism
+
+[[autodoc]] set_seed
+
+[[autodoc]] torch_distributed_zero_first
+
+## Callbacks internals
+
+[[autodoc]] trainer_callback.CallbackHandler
+
+## Trainer Argument Parser
+
+[[autodoc]] HfArgumentParser
+
+## Debug Utilities
+
+[[autodoc]] debug_utils.DebugUnderflowOverflow