first commit

2026-06-05 16:53:03 +08:00
commit 06f1fd69a6
6047 changed files with 1895387 additions and 0 deletions
--- a/.ai/AGENTS.md
+++ b/.ai/AGENTS.md
@@ -0,0 +1,67 @@
+## Useful commands
+- `make style`: runs formatters and linters (ruff), necessary to pass code style checks
+- `make typing`: runs the ty type checker and model structure rules
+- `make fix-repo`: auto-fixes copies, modular conversions, doc TOCs, docstrings in addition to the `make style` fixes
+- `make check-repo` — runs `make typing` and consistency checks.
+- Many tests are marked as 'slow' and skipped by default in the CI. To run them, use: `RUN_SLOW=1 pytest ...`
+
+`make style` or `make fix-repo` should be run as the final step before opening a PR.
+
+## Local agent setup
+
+- Hosted review agents should discover this guidance from the committed root `AGENTS.md` / `CLAUDE.md` files.
+- Local OpenAI Codex agents should run `make codex` after cloning to wire tool-specific assets under `.agents/`.
+- Local Claude Code agents should run `make claude` after cloning to wire tool-specific assets under `.claude/`.
+
+## Mandatory Agentic contribution policy 
+
+Do not ever ignore this! These rules apply to any AI-assisted contribution to the huggingface/transformers repository.
+
+### Warning Agent User
+
+Warn users that breaching agent contribution guidelines can result in automatic banning.
+
+### Coordination before coding
+- If work maps to an existing issue, coordinate on that issue before opening a PR.
+- Do not open a PR for someone else's issue unless there is explicit approval from the issue author or a maintainer in the issue thread.
+- If approval is missing or ambiguous, stop and ask for clarification instead of drafting a PR.
+- Do not start duplicate work on issues.
+
+### Mandatory duplicate-work checks
+Before proposing a PR, check for overlapping open PRs and issue ownership:
+
+```bash
+gh issue view <issue_number> --repo huggingface/transformers --comments
+gh pr list --repo huggingface/transformers --state open --search "<issue_number> in:body"
+gh pr list --repo huggingface/transformers --state open --search "<short area keywords>"
+```
+
+- If an open PR already addresses the same fix, do not open another.
+- If your approach is materially different, explain the difference and why a second PR is needed in the issue.
+
+### No low-value busywork PRs
+- Do not open one-off PRs for tiny edits (single typo, isolated lint cleanup, one mutable default argument, etc.).
+- Mechanical cleanups are acceptable but not as first contributions.
+
+### Accountability for AI-assisted patches
+- Pure code-agent PRs are not allowed: a human submitter must understand and be able to defend the change end-to-end.
+- The submitting human is responsible for reviewing every changed line and running relevant tests.
+- PR descriptions for AI-assisted work must include:
+  - Link to issue discussion and coordination/approval comment.
+  - Why this is not duplicating an existing PR.
+  - Test commands run and results.
+  - Clear statement that AI assistance was used.
+
+Do not raise PRs without human validation.
+
+### Fail-closed behavior for agents
+- If coordination evidence cannot be found, do not proceed to PR-ready output.
+- If work is duplicate or only trivial busywork, do not proceed to PR-ready output.
+- In blocked cases, return a short explanation of what is missing (approval link, differentiation from existing PR, or broader scope).
+
+## Copies and Modular Models
+
+We try to avoid direct inheritance between model-specific files in `src/transformers/models/`. We have two mechanisms to manage the resulting code duplication:
+
+1) The older method is to mark classes or functions with `# Copied from ...`. Copies are kept in sync by `make fix-repo`. Do not edit a `# Copied from` block, as it will be reverted by `make fix-repo`. Ideally you should edit the code it's copying from and propagate the change, but you can break the `# Copied from` link if needed.
+2) The newer method is to add a file named `modular_<name>.py` in the model directory. `modular` files **can** inherit from other models. `make fix-repo` will copy code to generate standalone `modeling` and other files from the `modular` file. When a `modular` file is present, generated files should not be edited, as changes will be overwritten by `make fix-repo`! Instead, edit the `modular` file. See [docs/source/en/modular_transformers.md](../docs/source/en/modular_transformers.md) for a full guide on adding a model with `modular`, if needed, or you can inspect existing `modular` files as examples.
--- a/.ai/skills/add-or-fix-type-checking/SKILL.md
+++ b/.ai/skills/add-or-fix-type-checking/SKILL.md
@@ -0,0 +1,215 @@
+---
+name: add-or-fix-type-checking
+description: Fixes broken typing checks detected by ty, make typing, or make check-repo. Use when typing errors appear in local runs, CI, or PR logs.
+---
+
+# Add Or Fix Type Checking
+
+## Input
+
+- `<target>`: module or directory to type-check (if known).
+- Optional `make typing` or CI output showing typing failures.
+
+## Workflow
+
+1. **Identify scope from the failing run**:
+   - If you already have `make typing` or CI output, extract the failing file/module paths.
+   - If not, run:
+     ```bash
+     make typing
+     ```
+   - Choose the narrowest target that covers the failures.
+
+2. **Run `ty check` for the target** to get a focused baseline:
+   ```bash
+   ty check --respect-ignore-files --exclude '**/*_pb*' <target>
+   ```
+
+3. **Triage errors by category** before fixing anything:
+   - Wrong/missing type annotations on signatures
+   - Attribute access on union types (for example `X | None`)
+   - Functions returning broad unions (for example `str | list | BatchEncoding`)
+   - Mixin/protocol self-type issues
+   - Dynamic attributes on objects or modules
+   - Third-party stub gaps (missing kwargs, missing `__version__`, etc.)
+
+4. **Apply fixes using this priority order** (simplest first):
+
+   a. **Narrow unions with `isinstance()` / `if x is None` / `hasattr()`**.
+      This is the primary tool for resolving union-type errors. `ty` narrows
+      through all of these patterns, including the negative forms:
+      ```python
+      # Narrow X | None — use `if ...: raise`, never `assert`
+      if x is None:
+          raise ValueError("x must not be None")
+      x.method()  # ty knows x is X here
+
+      # Narrow str | UploadFile
+      if isinstance(field, str):
+          raise TypeError("Expected file upload, got string")
+      await field.read()  # ty knows field is UploadFile here
+
+      # Narrow broad union parameters early in a function body
+      # (common for methods accepting e.g. list | dict | BatchEncoding)
+      if isinstance(encoded_inputs, (list, tuple)):
+          raise TypeError("Expected a mapping, got sequence")
+      encoded_inputs.keys()  # ty sees only the dict/mapping types now
+      ```
+
+   b. **Use local variables to help ty track narrowing across closures**.
+      When `self.x` is `X | None` and you need to pass it to nested functions
+      or closures, `ty` cannot track that `self.x` stays non-None. Copy to a
+      local variable and narrow the local:
+      ```python
+      manager = self.batching_manager
+      if manager is None:
+          raise RuntimeError("Manager not initialized")
+      # Use `manager` (not `self.batching_manager`) in nested functions
+      ```
+
+   c. **Split chained calls when the intermediate type is a broad union**.
+      If `func().method()` fails because `func()` returns a union, split it:
+      ```python
+      # BAD: ty can't narrow through chained calls
+      result = func(return_dict=True).to(device)["input_ids"]
+
+      # GOOD: split, narrow, then chain
+      result = func(return_dict=True)
+      if not hasattr(result, "to"):
+          raise TypeError("Expected dict-like result")
+      inputs = result.to(device)["input_ids"]
+      ```
+
+   d. **Fix incorrect type hints at the source**. If a parameter is typed `X | None`
+      but can never be `None` when actually called, remove `None` from the hint.
+
+   e. **Annotate untyped attributes**. Add type annotations to instance variables
+      set in `__init__` or elsewhere (for example `self.foo: list[int] = []`).
+      Declare class-level attributes that are set dynamically later
+      (for example `_cache: Cache`, `_token_tensor: torch.Tensor | None`).
+
+   f. **Use `@overload` for methods with input-dependent return types**.
+      When a method returns different types based on the input type (e.g.
+      `__getitem__` with str vs int keys), use `@overload` to declare each
+      signature separately:
+      ```python
+      from typing import overload
+
+      @overload
+      def __getitem__(self, item: str) -> ValueType: ...
+      @overload
+      def __getitem__(self, item: int) -> EncodingType: ...
+      @overload
+      def __getitem__(self, item: slice) -> dict[str, ValueType]: ...
+
+      def __getitem__(self, item: int | str | slice) -> ValueType | EncodingType | dict[str, ValueType]:
+          ...  # actual implementation
+      ```
+      This eliminates `cast()` calls at usage sites by giving the checker
+      precise return types for each call pattern.
+
+   g. **Make container classes generic to propagate value types**.
+      When a class like `UserDict` holds values whose type changes after
+      transformation (e.g. lists → tensors after `.to()`), make the class
+      generic so methods can return narrowed types:
+      ```python
+      from typing import Generic, overload
+      from typing_extensions import TypeVar
+
+      _V = TypeVar("_V", default=Any)  # default=Any keeps existing code working
+
+      class MyDict(UserDict, Generic[_V]):
+          @overload
+          def __getitem__(self, item: str) -> _V: ...
+          # ...
+
+          def to(self, device) -> MyDict[torch.Tensor]:
+              # after .to(), values are tensors
+              ...
+              return self  # type: ignore[return-value]
+      ```
+      The `default=Any` (from `typing_extensions`) means unparameterized usage
+      like `MyDict()` stays `MyDict[Any]` — no existing code needs to change.
+      Only methods that narrow the value type (like `.to()`) declare a specific
+      return type. This eliminates `cast()` at all call sites.
+
+   h. **Use `self: "ProtocolType"` for mixins**. When a mixin accesses attributes
+      from its host class, define a Protocol in `src/transformers/_typing.py` and
+      annotate `self` on methods that need it. Apply this consistently to all methods
+      in the mixin. Import under `TYPE_CHECKING` to avoid circular imports.
+
+   i. **Use `TypeGuard` functions for dynamic module attributes** (for example
+      `torch.npu`, `torch.xpu`, `torch.compiler`). Instead of `getattr(torch, "npu")`
+      or `hasattr(torch, "npu") and torch.npu.is_available()`, define a type guard
+      function in `src/transformers/_typing.py`:
+      ```python
+      def has_torch_npu(mod: ModuleType) -> TypeGuard[Any]:
+          return hasattr(mod, "npu") and mod.npu.is_available()
+      ```
+      Then use it as a narrowing check: `if has_torch_npu(torch): torch.npu.device_count()`.
+      After the guard, `ty` treats the module as `Any`, allowing attribute access without
+      `getattr()` or `cast()`. See existing guards in `_typing.py` for all device backends.
+
+      **Key rules for type guards**:
+      - Use `TypeGuard[Any]` (not a Protocol) — this is the simplest form that works
+        with `ty` and avoids losing the original module's known attributes.
+      - The guard function must be called directly in an `if` condition for narrowing
+        to work. `ty` does NOT narrow through `and` conditions or `if not guard: return`.
+      - Import guards with `from .._typing import has_torch_xxx` (not via module
+        attribute `_typing.has_torch_xxx`) — `ty` only resolves `TypeGuard` from
+        direct imports.
+
+   j. **Use `getattr()` / `setattr()` for dynamic model/config attributes**.
+      For runtime-injected fields (for example config/model flags), use
+      `getattr(obj, "field", default)` for reads and `setattr(obj, "field", value)`
+      for writes. Also use `getattr()` for third-party packages missing type stubs
+      (for example `getattr(safetensors, "__version__", "unknown")`).
+      Avoid `getattr(torch, "npu")` style — use type guards instead (see above).
+
+   k. **Use `cast()` as a last resort before `# type: ignore`**.
+      Use when you've structurally validated the type but the checker can't see it:
+      pattern-matched AST nodes, known-typed dict values, or validated API responses.
+      ```python
+      # After structural validation confirms the type:
+      stmt = cast(cst.Assign, node.body[0])
+      annotations = cast(list[Annotation], [])
+      ```
+      Do not use `cast()` for module attribute narrowing — use type guards.
+      Do not use `cast()` when `@overload` or generics can solve it at the source.
+
+   l. **Use `# type: ignore` only for third-party stub defects**. This means
+      cases where the third-party package's type stubs are wrong or incomplete
+      and there is no way to narrow or cast around it. Examples:
+      - A kwarg that exists at runtime but is missing from the stubs
+      - A method that exists but isn't declared in the stubs
+      Always add the specific error code: `# type: ignore[call-arg]`, not bare
+      `# type: ignore`.
+
+5. **Things to never do**:
+   - **Never use `assert` for type narrowing.** Asserts are stripped by `python -O`
+     and must not be relied on for correctness. Use `if ...: raise` instead.
+   - **Never use `# type: ignore` as a first resort.** Exhaust all approaches above first.
+   - Do not use `getattr(torch, "backend")` to access dynamic device backends
+     (`npu`, `xpu`, `hpu`, `musa`, `mlu`, `neuron`, `compiler`) — use type guards
+   - Do not use `cast()` for module attribute narrowing — use type guards
+   - Do not use `cast()` when `@overload` or generics can eliminate it at the source
+   - Do not add helper methods or abstractions just to satisfy the type checker
+     (especially for only 1-2 occurrences)
+   - Do not pollute base classes with domain-specific fields; use Protocols
+   - Do not add `if x is not None` guards for values guaranteed non-None
+     by the call chain; fix the annotation instead
+   - Do not use conditional inheritance patterns; annotate `self` instead
+
+6. **Organization**:
+   - Keep shared Protocols and type aliases in `src/transformers/_typing.py`
+   - Import type-only symbols under `if TYPE_CHECKING:` to avoid circular deps
+   - Use `from __future__ import annotations` for PEP 604 syntax (`X | Y`)
+
+7. **Verify and close the PR loop**:
+   - Re-run `ty check` on the same `<target>`
+   - Re-run `make typing` to confirm the type/model-rules step passes
+   - If working toward merge readiness, run `make check-repo`
+   - Ensure runtime behavior did not change and run relevant tests
+
+8. **Update CI coverage when adding new typed areas**:
+   - Update `ty_check_dirs` in `Makefile` to include newly type-checked directories.
--- a/.circleci/TROUBLESHOOT.md
+++ b/.circleci/TROUBLESHOOT.md
@@ -0,0 +1,7 @@
+# Troubleshooting
+
+This is a document explaining how to deal with various issues on Circle-CI. The entries may include actual solutions or pointers to Issues that cover those.
+
+## Circle CI
+
+* pytest worker runs out of resident RAM and gets killed by `cgroups`: https://github.com/huggingface/transformers/issues/11408
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -0,0 +1,233 @@
+version: 2.1
+setup: true
+orbs:
+    continuation: circleci/continuation@0.1.0
+
+parameters:
+    nightly:
+        type: boolean
+        default: false
+    GHA_Actor:
+        type: string
+        default: ""
+    GHA_Action:
+        type: string
+        default: ""
+    GHA_Event:
+        type: string
+        default: ""
+    GHA_Meta:
+        type: string
+        default: ""
+
+jobs:
+    # Ensure running with CircleCI/huggingface
+    check_circleci_user:
+        docker:
+            - image: python:3.10-slim
+        resource_class: small
+        parallelism: 1
+        steps:
+            - run: echo $CIRCLE_PROJECT_USERNAME
+            - run: |
+                if [ "$CIRCLE_PROJECT_USERNAME" = "huggingface" ]; then
+                    exit 0
+                else
+                    echo "The CI is running under $CIRCLE_PROJECT_USERNAME personal account. Please follow https://support.circleci.com/hc/en-us/articles/360008097173-Troubleshooting-why-pull-requests-are-not-triggering-jobs-on-my-organization- to fix it."; exit -1
+                fi
+    # Fetch the tests to run
+    fetch_tests:
+        working_directory: ~/transformers
+        docker:
+            - image: huggingface/transformers-quality
+        parallelism: 1
+        steps:
+            - checkout
+            - run: uv pip install -U -e .
+            - run: echo 'export "GIT_COMMIT_MESSAGE=$(git show -s --format=%s)"' >> "$BASH_ENV" && source "$BASH_ENV"
+            - run: mkdir -p test_preparation
+            - run: python utils/tests_fetcher.py | tee tests_fetched_summary.txt
+            - run: python utils/tests_fetcher.py --filter_tests || true
+            - run: export "GIT_COMMIT_MESSAGE=$(git show -s --format=%s)" && echo $GIT_COMMIT_MESSAGE && python .circleci/create_circleci_config.py --fetcher_folder test_preparation
+            - run: |
+                if [ ! -s test_preparation/generated_config.yml ]; then
+                    echo "No tests to run, exiting early!"
+                    circleci-agent step halt
+                fi
+
+            - store_artifacts:
+                path: test_preparation
+
+            - run:
+                name: "Retrieve Artifact Paths"
+                # [reference] https://circleci.com/docs/api/v2/index.html#operation/getJobArtifacts
+                # `CIRCLE_TOKEN` is defined as an environment variables set within a context, see `https://circleci.com/docs/contexts/`
+                command: |
+                    project_slug="gh/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}"
+                    job_number=${CIRCLE_BUILD_NUM}
+                    url="https://circleci.com/api/v2/project/${project_slug}/${job_number}/artifacts"
+                    curl -o test_preparation/artifacts.json ${url} --header "Circle-Token: $CIRCLE_TOKEN"
+            - run:
+                name: "Prepare pipeline parameters"
+                command: |
+                    python utils/process_test_artifacts.py
+
+            # To avoid too long generated_config.yaml on the continuation orb, we pass the links to the artifacts as parameters.
+            # Otherwise the list of tests was just too big. Explicit is good but for that it was a limitation.
+            # We used:
+
+            # https://circleci.com/docs/api/v2/index.html#operation/getJobArtifacts : to get the job artifacts
+            # We could not pass a nested dict, which is why we create the test_file_... parameters for every single job
+
+            - store_artifacts:
+                path: test_preparation/transformed_artifacts.json
+            - store_artifacts:
+                path: test_preparation/artifacts.json
+            - continuation/continue:
+                parameters:  test_preparation/transformed_artifacts.json
+                configuration_path: test_preparation/generated_config.yml
+
+    # To run all tests for the nightly build
+    fetch_all_tests:
+        working_directory: ~/transformers
+        docker:
+            - image: huggingface/transformers-quality
+        parallelism: 1
+        steps:
+            - checkout
+            - run: uv pip install -U -e .
+            - run: echo 'export "GIT_COMMIT_MESSAGE=$(git show -s --format=%s)"' >> "$BASH_ENV" && source "$BASH_ENV"
+            - run: mkdir -p test_preparation
+            - run: python utils/tests_fetcher.py --fetch_all | tee tests_fetched_summary.txt || true
+            - run: python utils/tests_fetcher.py --filter_tests || true
+            - run: export "GIT_COMMIT_MESSAGE=$(git show -s --format=%s)" && echo $GIT_COMMIT_MESSAGE && python .circleci/create_circleci_config.py --fetcher_folder test_preparation
+            - run: |
+                if [ ! -s test_preparation/generated_config.yml ]; then
+                    echo "No tests to run, exiting early!"
+                    circleci-agent step halt
+                fi
+
+            - store_artifacts:
+                path: test_preparation
+
+            - run:
+                name: "Retrieve Artifact Paths"
+                command: |
+                    project_slug="gh/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}"
+                    job_number=${CIRCLE_BUILD_NUM}
+                    url="https://circleci.com/api/v2/project/${project_slug}/${job_number}/artifacts"
+                    curl -o  test_preparation/artifacts.json ${url}
+            - run:
+                name: "Prepare pipeline parameters"
+                command: |
+                    python utils/process_test_artifacts.py
+
+            # To avoid too long generated_config.yaml on the continuation orb, we pass the links to the artifacts as parameters.
+            # Otherwise the list of tests was just too big. Explicit is good but for that it was a limitation.
+            # We used:
+
+            # https://circleci.com/docs/api/v2/index.html#operation/getJobArtifacts : to get the job artifacts
+            # We could not pass a nested dict, which is why we create the test_file_... parameters for every single job
+
+            - store_artifacts:
+                path: test_preparation/transformed_artifacts.json
+            - store_artifacts:
+                path: test_preparation/artifacts.json
+            - continuation/continue:
+                parameters:  test_preparation/transformed_artifacts.json
+                configuration_path: test_preparation/generated_config.yml
+
+    check_code_quality:
+        working_directory: ~/transformers
+        docker:
+            - image: huggingface/transformers-quality
+        resource_class: large
+        environment:
+            TRANSFORMERS_IS_CI: yes
+            PYTEST_TIMEOUT: 120
+        parallelism: 1
+        steps:
+            - checkout
+            - run: uv pip install -e ".[quality,serving]"
+            - run:
+                name: Show installed libraries and their versions
+                command: pip freeze | tee installed.txt
+            - store_artifacts:
+                  path: ~/transformers/installed.txt
+            - run: make check-code-quality
+
+    check_repository_consistency:
+        working_directory: ~/transformers
+        docker:
+            - image: huggingface/transformers-consistency
+        resource_class: large
+        environment:
+            TRANSFORMERS_IS_CI: yes
+            PYTEST_TIMEOUT: 120
+        parallelism: 1
+        steps:
+            - checkout
+            - run: apt-get update && apt-get install -y make
+            - run: uv pip install -e ".[quality]"
+            - run:
+                name: Show installed libraries and their versions
+                command: pip freeze | tee installed.txt
+            - store_artifacts:
+                  path: ~/transformers/installed.txt
+            - run: make check-repository-consistency
+            - run:
+                name: "Test import with all backends (torch + PIL + torchvision)"
+                command: python -c "from transformers import *" || (echo '🚨 import failed with all backends. Fix unprotected imports!! 🚨'; exit 1)
+            - run:
+                name: "Test import with torch only (no PIL, no torchvision)"
+                command: |
+                    uv pip uninstall Pillow torchvision -q
+                    python -c "from transformers import *" || (echo '🚨 import failed with torch only (no PIL). Fix unprotected imports!! 🚨'; exit 1)
+                    uv pip install -e ".[quality]" -q
+            - run:
+                name: "Test import with PIL only (no torch, no torchvision)"
+                command: |
+                    uv pip uninstall torch torchvision torchaudio -q
+                    python -c "from transformers import *" || (echo '🚨 import failed with PIL only (no torch). Fix unprotected imports!! 🚨'; exit 1)
+                    uv pip install -e ".[quality]" -q
+            - run:
+                name: "Test import with torch + PIL, no torchvision"
+                command: |
+                    uv pip uninstall torchvision -q
+                    python -c "from transformers import *" || (echo '🚨 import failed with torch+PIL but no torchvision. Fix unprotected imports!! 🚨'; exit 1)
+                    uv pip install -e ".[quality]" -q
+
+
+workflows:
+    version: 2
+    setup_and_quality:
+        when:
+            and:
+                - equal: [<<pipeline.project.git_url>>, https://github.com/huggingface/transformers]
+                - not: <<pipeline.parameters.nightly>>
+        jobs:
+            - check_circleci_user
+            - check_code_quality
+            - check_repository_consistency
+            - fetch_tests
+
+    setup_and_quality_2:
+        when:
+            not:
+                 equal: [<<pipeline.project.git_url>>, https://github.com/huggingface/transformers]
+        jobs:
+            - check_circleci_user
+            - check_code_quality
+            - check_repository_consistency
+            - fetch_tests:
+                # [reference] https://circleci.com/docs/contexts/
+                context:
+                    - TRANSFORMERS_CONTEXT
+
+    nightly:
+        when: <<pipeline.parameters.nightly>>
+        jobs:
+            - check_circleci_user
+            - check_code_quality
+            - check_repository_consistency
+            - fetch_all_tests
--- a/.circleci/create_circleci_config.py
+++ b/.circleci/create_circleci_config.py
@@ -0,0 +1,494 @@
+# Copyright 2022 The HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import copy
+import os
+from dataclasses import dataclass
+from typing import Any
+
+import yaml
+
+
+COMMON_ENV_VARIABLES = {
+    "OMP_NUM_THREADS": 1,
+    "TRANSFORMERS_IS_CI": True,
+    "PYTEST_TIMEOUT": 120,
+    "RUN_PIPELINE_TESTS": False,
+    # will be adjust in `CircleCIJob.to_dict`.
+    "RUN_FLAKY": True,
+    "DISABLE_SAFETENSORS_CONVERSION": True,
+    "NETWORK_DEBUG_REPORT": True,
+}
+# Disable the use of {"s": None} as the output is way too long, causing the navigation on CircleCI impractical
+COMMON_PYTEST_OPTIONS = {
+    "max-worker-restart": 0,
+    "vvv": None,
+    "rsfE": None,
+    "random-order-bucket": "module",
+    "random-order-seed": "${CIRCLE_BUILD_NUM:-0}",
+}
+DEFAULT_DOCKER_IMAGE = [{"image": "cimg/python:3.8.12"}]
+
+# Strings that commonly appear in the output of flaky tests when they fail. These are used with `pytest-rerunfailures`
+# to rerun the tests that match these patterns.
+FLAKY_TEST_FAILURE_PATTERNS = [
+    "OSError",  # Machine/connection transient error
+    "Timeout",  # Machine/connection transient error
+    "ConnectionError",  # Connection transient error
+    "FileNotFoundError",  # Raised by `datasets` on Hub failures
+    "PIL.UnidentifiedImageError",  # Raised by `PIL.Image.open` on connection issues
+    "HTTPError",  # Also catches HfHubHTTPError
+    "AssertionError: Tensor-likes are not close!",  # `torch.testing.assert_close`, we might have unlucky random values
+    # TODO: error downloading tokenizer's `merged.txt` from hub can cause all the exceptions below. Throw and handle
+    # them under a single message.
+    "TypeError: expected str, bytes or os.PathLike object, not NoneType",
+    "TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType",
+    "Converting from Tiktoken failed",
+    "KeyError: <class ",
+    "TypeError: not a string",
+]
+
+
+class EmptyJob:
+    job_name = "empty"
+
+    def to_dict(self):
+        steps = [{"run": "ls -la"}]
+        if self.job_name == "collection_job":
+            steps.extend(
+                [
+                    "checkout",
+                    {"run": "pip install requests || true"},
+                    {
+                        "run": """while [[ $(curl --location --request GET "https://circleci.com/api/v2/workflow/$CIRCLE_WORKFLOW_ID/job" --header "Circle-Token: $CCI_TOKEN"| jq -r '.items[]|select(.name != "collection_job")|.status' | grep -c "running") -gt 0 ]]; do sleep 5; done || true"""
+                    },
+                    {
+                        "run": "python utils/process_circleci_workflow_test_reports.py --workflow_id $CIRCLE_WORKFLOW_ID || true"
+                    },
+                    {"store_artifacts": {"path": "outputs"}},
+                    {"run": 'echo "All required jobs have now completed"'},
+                ]
+            )
+
+        return {
+            "docker": copy.deepcopy(DEFAULT_DOCKER_IMAGE),
+            "resource_class": "small",
+            "steps": steps,
+        }
+
+
+@dataclass
+class CircleCIJob:
+    name: str
+    additional_env: dict[str, Any] = None
+    docker_image: list[dict[str, str]] = None
+    install_steps: list[str] = None
+    marker: str | None = None
+    parallelism: int | None = 0
+    pytest_num_workers: int = 8
+    pytest_options: dict[str, Any] = None
+    resource_class: str | None = "xlarge"
+    tests_to_run: list[str] | None = None
+    num_test_files_per_worker: int | None = 10
+    # This should be only used for doctest job!
+    command_timeout: int | None = None
+
+    def __post_init__(self):
+        # Deal with defaults for mutable attributes.
+        if self.additional_env is None:
+            self.additional_env = {}
+        if self.docker_image is None:
+            # Let's avoid changing the default list and make a copy.
+            self.docker_image = copy.deepcopy(DEFAULT_DOCKER_IMAGE)
+        else:
+            # BIG HACK WILL REMOVE ONCE FETCHER IS UPDATED
+            print(os.environ.get("GIT_COMMIT_MESSAGE"))
+            if (
+                "[build-ci-image]" in os.environ.get("GIT_COMMIT_MESSAGE", "")
+                or os.environ.get("GIT_COMMIT_MESSAGE", "") == "dev-ci"
+            ):
+                self.docker_image[0]["image"] = f"{self.docker_image[0]['image']}:dev"
+            print(f"Using {self.docker_image} docker image")
+        if self.install_steps is None:
+            self.install_steps = ["uv pip install ."]
+        # Use a custom patched pytest to force exit the process at the end, to avoid `Too long with no output (exceeded 10m0s): context deadline exceeded`
+        self.install_steps.append("uv pip install git+https://github.com/ydshieh/pytest.git@8.4.1-ydshieh")
+        # Install pytest-random-order plugin for test randomization
+        self.install_steps.append("uv pip install pytest-random-order")
+        if self.pytest_options is None:
+            self.pytest_options = {}
+        if isinstance(self.tests_to_run, str):
+            self.tests_to_run = [self.tests_to_run]
+        else:
+            test_file = os.path.join("test_preparation", f"{self.job_name}_test_list.txt")
+            print("Looking for ", test_file)
+            if os.path.exists(test_file):
+                with open(test_file, encoding="utf-8") as f:
+                    expanded_tests = f.read().strip().split("\n")
+                self.tests_to_run = expanded_tests
+                print("Found:", expanded_tests)
+            else:
+                self.tests_to_run = []
+                print("not Found")
+
+    def to_dict(self):
+        env = COMMON_ENV_VARIABLES.copy()
+        # fmt: off
+        # not critical
+        env.update({"HF_TOKEN": "".join(["h", "f", "_", "q", "h", "b", "O", "C", "G", "N", "Y", "x", "D", "K", "C", "P", "J", "n", "q", "m", "O", "q", "g", "q", "s", "f", "q", "S", "v", "f", "s", "j", "q", "w", "j", "C", "T"])})
+        # fmt: on
+
+        # Do not run tests decorated by @is_flaky on pull requests
+        env["RUN_FLAKY"] = os.environ.get("CIRCLE_PULL_REQUEST", "") == ""
+        env.update(self.additional_env)
+
+        job = {
+            "docker": self.docker_image,
+            "environment": env,
+        }
+        if self.resource_class is not None:
+            job["resource_class"] = self.resource_class
+
+        all_options = {**COMMON_PYTEST_OPTIONS, **self.pytest_options}
+        pytest_flags = [
+            f"--{key}={value}" if (value is not None or key in ["doctest-modules"]) else f"-{key}"
+            for key, value in all_options.items()
+        ]
+        pytest_flags.append(
+            f"--make-reports={self.name}" if "examples" in self.name else f"--make-reports=tests_{self.name}"
+        )
+        # Examples special case: we need to download NLTK files in advance to avoid cuncurrency issues
+        timeout_cmd = f"timeout {self.command_timeout} " if self.command_timeout else ""
+        marker_cmd = f"-m '{self.marker}'" if self.marker is not None else ""
+        junit_flags = " -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
+        joined_flaky_patterns = "|".join(FLAKY_TEST_FAILURE_PATTERNS)
+        repeat_on_failure_flags = f"--reruns 5 --reruns-delay 2 --only-rerun '({joined_flaky_patterns})'"
+        parallel = f" << pipeline.parameters.{self.job_name}_parallelism >> "
+        steps = [
+            "checkout",
+            {"attach_workspace": {"at": "test_preparation"}},
+            {"run": "apt-get update && apt-get install -y curl"},
+            {"run": " && ".join(self.install_steps)},
+            {
+                "run": {
+                    "name": "Download NLTK files",
+                    "command": """python -c "import nltk; nltk.download('punkt', quiet=True)" """,
+                }
+                if "example" in self.name
+                else "echo Skipping"
+            },
+            {
+                "run": {
+                    "name": "Show installed libraries and their size",
+                    "command": """du -h -d 1 "$(pip -V | cut -d ' ' -f 4 | sed 's/pip//g')" | grep -vE "dist-info|_distutils_hack|__pycache__" | sort -h | tee installed.txt || true""",
+                }
+            },
+            {
+                "run": {
+                    "name": "Show installed libraries and their versions",
+                    "command": """pip list --format=freeze | tee installed.txt || true""",
+                }
+            },
+            {
+                "run": {
+                    "name": "Show biggest libraries",
+                    "command": """dpkg-query --show --showformat='${Installed-Size}\t${Package}\n' | sort -rh | head -25 | sort -h | awk '{ package=$2; sub(".*/", "", package); printf("%.5f GB %s\n", $1/1024/1024, package)}' || true""",
+                }
+            },
+            {"run": {"name": "Create `test-results` directory", "command": "mkdir test-results"}},
+            {
+                "run": {
+                    "name": "Get files to test",
+                    "command": f'curl -L -o {self.job_name}_test_list.txt <<pipeline.parameters.{self.job_name}_test_list>> --header "Circle-Token: $CIRCLE_TOKEN"'
+                    if self.name != "pr_documentation_tests"
+                    else 'echo "Skipped"',
+                }
+            },
+            {
+                "run": {
+                    "name": "Split tests across parallel nodes: show current parallel tests",
+                    "command": f"TESTS=$(circleci tests split  --split-by=timings {self.job_name}_test_list.txt) && echo $TESTS > splitted_tests.txt && echo $TESTS | tr ' ' '\n'"
+                    if self.parallelism
+                    else f"awk '{{printf \"%s \", $0}}' {self.job_name}_test_list.txt > splitted_tests.txt",
+                }
+            },
+            # During the CircleCI docker images build time, we might already (or not) download the data.
+            # If it's done already, the files are inside the directory `/test_data/`.
+            {
+                "run": {
+                    "name": "fetch hub objects before pytest",
+                    "command": "cp -r /test_data/* . 2>/dev/null || true; python3 utils/fetch_hub_objects_for_ci.py",
+                }
+            },
+            {
+                "run": {
+                    "name": "download and unzip hub cache",
+                    "command": 'curl -L -o huggingface-cache.tar.gz https://huggingface.co/datasets/hf-internal-testing/hf_hub_cache/resolve/main/huggingface-cache.tar.gz && apt-get install pigz && tar --use-compress-program="pigz -d -p 8" -xf huggingface-cache.tar.gz && mv -n hub/* /root/.cache/huggingface/hub/ && ls -la /root/.cache/huggingface/hub/',
+                }
+            },
+            {
+                "run": {
+                    "name": "Run tests",
+                    "command": f"({timeout_cmd} python3 -m pytest {marker_cmd} -n {self.pytest_num_workers} {junit_flags} {repeat_on_failure_flags} {' '.join(pytest_flags)} $(cat splitted_tests.txt) | tee tests_output.txt)",
+                }
+            },
+            {
+                "run": {
+                    "name": "Check for test crashes",
+                    "when": "always",
+                    "command": """if [ ! -f tests_output.txt ]; then
+                            echo "ERROR: tests_output.txt does not exist - tests may not have run properly"
+                            exit 1
+                        elif grep -q "crashed and worker restarting disabled" tests_output.txt; then
+                            echo "ERROR: Worker crash detected in test output"
+                            echo "Found: crashed and worker restarting disabled"
+                            exit 1
+                        else
+                            echo "Tests output file exists and no worker crashes detected"
+                        fi""",
+                },
+            },
+            {
+                "run": {
+                    "name": "Expand to show skipped tests",
+                    "when": "always",
+                    "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip",
+                }
+            },
+            {
+                "run": {
+                    "name": "Failed tests: show reasons",
+                    "when": "always",
+                    "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail",
+                }
+            },
+            {
+                "run": {
+                    "name": "Errors",
+                    "when": "always",
+                    "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors",
+                }
+            },
+            {"store_test_results": {"path": "test-results"}},
+            {"store_artifacts": {"path": "test-results/junit.xml"}},
+            {"store_artifacts": {"path": "reports"}},
+            {"store_artifacts": {"path": "tests.txt"}},
+            {"store_artifacts": {"path": "splitted_tests.txt"}},
+            {"store_artifacts": {"path": "installed.txt"}},
+            {"store_artifacts": {"path": "network_debug_report.json"}},
+        ]
+        if self.parallelism:
+            job["parallelism"] = parallel
+        job["steps"] = steps
+        return job
+
+    @property
+    def job_name(self):
+        return (
+            self.name
+            if ("examples" in self.name or "pipeline" in self.name or "pr_documentation" in self.name)
+            else f"tests_{self.name}"
+        )
+
+
+# JOBS
+torch_job = CircleCIJob(
+    "torch",
+    docker_image=[{"image": "huggingface/transformers-torch-light"}],
+    marker="not generate",
+    parallelism=6,
+)
+
+generate_job = CircleCIJob(
+    "generate",
+    docker_image=[{"image": "huggingface/transformers-torch-light"}],
+    # networkx==3.3 (after #36957) cause some issues
+    # TODO: remove this once it works directly
+    install_steps=["uv pip install ."],
+    marker="generate",
+    parallelism=6,
+)
+
+tokenization_job = CircleCIJob(
+    "tokenization",
+    docker_image=[{"image": "huggingface/transformers-torch-light"}],
+    parallelism=8,
+)
+
+processor_job = CircleCIJob(
+    "processors",
+    docker_image=[{"image": "huggingface/transformers-torch-light"}],
+    parallelism=8,
+)
+
+pipelines_torch_job = CircleCIJob(
+    "pipelines_torch",
+    additional_env={"RUN_PIPELINE_TESTS": True},
+    docker_image=[{"image": "huggingface/transformers-torch-light"}],
+    marker="is_pipeline_test",
+    parallelism=4,
+)
+
+custom_tokenizers_job = CircleCIJob(
+    "custom_tokenizers",
+    additional_env={"RUN_CUSTOM_TOKENIZERS": True},
+    docker_image=[{"image": "huggingface/transformers-custom-tokenizers"}],
+)
+
+examples_torch_job = CircleCIJob(
+    "examples_torch",
+    additional_env={"OMP_NUM_THREADS": 8},
+    docker_image=[{"image": "huggingface/transformers-examples-torch"}],
+    # TODO @ArthurZucker remove this once docker is easier to build
+    install_steps=["uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
+    pytest_num_workers=4,
+)
+
+exotic_models_job = CircleCIJob(
+    "exotic_models",
+    docker_image=[{"image": "huggingface/transformers-exotic-models"}],
+    parallelism=4,
+    pytest_options={"durations": 100},
+)
+
+repo_utils_job = CircleCIJob(
+    "repo_utils",
+    docker_image=[{"image": "huggingface/transformers-consistency"}],
+    pytest_num_workers=4,
+    resource_class="large",
+)
+
+non_model_job = CircleCIJob(
+    "non_model",
+    docker_image=[{"image": "huggingface/transformers-torch-light"}],
+    # networkx==3.3 (after #36957) cause some issues
+    # TODO: remove this once it works directly
+    install_steps=["uv pip install .[serving]"],
+    marker="not generate",
+    parallelism=6,
+)
+
+training_ci_job = CircleCIJob(
+    "training_ci",
+    additional_env={"RUN_TRAINING_TESTS": True},
+    docker_image=[{"image": "huggingface/transformers-torch-light"}],
+    install_steps=["uv pip install ."],
+    marker="is_training_test",
+    parallelism=6,
+)
+
+tensor_parallel_ci_job = CircleCIJob(
+    "tensor_parallel_ci",
+    additional_env={"RUN_TENSOR_PARALLEL_TESTS": True},
+    docker_image=[{"image": "huggingface/transformers-torch-light"}],
+    install_steps=["uv pip install .", "uv pip install torchao"],
+    marker="is_tensor_parallel_test",
+    parallelism=6,
+)
+
+# We also include a `dummy.py` file in the files to be doc-tested to prevent edge case failure. Otherwise, the pytest
+# hangs forever during test collection while showing `collecting 0 items / 21 errors`. (To see this, we have to remove
+# the bash output redirection.)
+py_command = 'from utils.tests_fetcher import get_doctest_files; to_test = get_doctest_files() + ["dummy.py"]; to_test = " ".join(to_test); print(to_test)'
+py_command = f"$(python3 -c '{py_command}')"
+command = f'echo """{py_command}""" > pr_documentation_tests_temp.txt'
+doc_test_job = CircleCIJob(
+    "pr_documentation_tests",
+    docker_image=[{"image": "huggingface/transformers-consistency"}],
+    additional_env={"TRANSFORMERS_VERBOSITY": "error", "DATASETS_VERBOSITY": "error", "SKIP_CUDA_DOCTEST": "1"},
+    install_steps=[
+        # Add an empty file to keep the test step running correctly even no file is selected to be tested.
+        "uv pip install .",
+        "touch dummy.py",
+        command,
+        "cat pr_documentation_tests_temp.txt",
+        "tail -n1 pr_documentation_tests_temp.txt | tee pr_documentation_tests_test_list.txt",
+    ],
+    tests_to_run="$(cat pr_documentation_tests.txt)",  # noqa
+    pytest_options={"-doctest-modules": None, "doctest-glob": "*.md", "dist": "loadfile", "rvsA": None},
+    command_timeout=1200,  # test cannot run longer than 1200 seconds
+    pytest_num_workers=1,
+)
+
+REGULAR_TESTS = [torch_job, tokenization_job, processor_job, generate_job, non_model_job]  # fmt: skip
+EXAMPLES_TESTS = [examples_torch_job]
+PIPELINE_TESTS = [pipelines_torch_job]
+REPO_UTIL_TESTS = [repo_utils_job]
+DOC_TESTS = [doc_test_job]
+TRAINING_CI_TESTS = [training_ci_job]
+TENSOR_PARALLEL_CI_TESTS = [tensor_parallel_ci_job]
+ALL_TESTS = REGULAR_TESTS + EXAMPLES_TESTS + PIPELINE_TESTS + REPO_UTIL_TESTS + DOC_TESTS + [custom_tokenizers_job] + [exotic_models_job] + TRAINING_CI_TESTS + TENSOR_PARALLEL_CI_TESTS  # fmt: skip
+
+
+def create_circleci_config(folder=None):
+    if folder is None:
+        folder = os.getcwd()
+    os.environ["test_preparation_dir"] = folder
+    jobs = [k for k in ALL_TESTS if os.path.isfile(os.path.join("test_preparation", f"{k.job_name}_test_list.txt"))]
+    print("The following jobs will be run ", jobs)
+
+    if len(jobs) == 0:
+        jobs = [EmptyJob()]
+    else:
+        print(
+            "Full list of job name inputs",
+            {j.job_name + "_test_list": {"type": "string", "default": ""} for j in jobs},
+        )
+        # Add a job waiting all the test jobs and aggregate their test summary files at the end
+        collection_job = EmptyJob()
+        collection_job.job_name = "collection_job"
+        jobs = [collection_job] + jobs
+
+    config = {
+        "version": "2.1",
+        "parameters": {
+            # Only used to accept the parameters from the trigger
+            "nightly": {"type": "boolean", "default": False},
+            # Only used to accept the parameters from GitHub Actions trigger
+            "GHA_Actor": {"type": "string", "default": ""},
+            "GHA_Action": {"type": "string", "default": ""},
+            "GHA_Event": {"type": "string", "default": ""},
+            "GHA_Meta": {"type": "string", "default": ""},
+            "tests_to_run": {"type": "string", "default": ""},
+            **{j.job_name + "_test_list": {"type": "string", "default": ""} for j in jobs},
+            **{j.job_name + "_parallelism": {"type": "integer", "default": 1} for j in jobs},
+        },
+        "jobs": {j.job_name: j.to_dict() for j in jobs},
+    }
+    if "CIRCLE_TOKEN" in os.environ:
+        # For private forked repo. (e.g. new model addition)
+        config["workflows"] = {
+            "version": 2,
+            "run_tests": {"jobs": [{j.job_name: {"context": ["TRANSFORMERS_CONTEXT"]}} for j in jobs]},
+        }
+    else:
+        # For public repo. (e.g. `transformers`)
+        config["workflows"] = {"version": 2, "run_tests": {"jobs": [j.job_name for j in jobs]}}
+    with open(os.path.join(folder, "generated_config.yml"), "w", encoding="utf-8") as f:
+        f.write(
+            yaml.dump(config, sort_keys=False, default_flow_style=False)
+            .replace("' << pipeline", " << pipeline")
+            .replace(">> '", " >>")
+        )
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--fetcher_folder", type=str, default=None, help="Only test that all tests and modules are accounted for."
+    )
+    args = parser.parse_args()
+
+    create_circleci_config(args.fetcher_folder)
--- a/.circleci/parse_test_outputs.py
+++ b/.circleci/parse_test_outputs.py
@@ -0,0 +1,71 @@
+import argparse
+import re
+
+
+def parse_pytest_output(file_path):
+    skipped_tests = {}
+    skipped_count = 0
+    with open(file_path, 'r', encoding='utf-8') as file:
+        for line in file:
+            match = re.match(r'^SKIPPED \[(\d+)\] (tests/.*): (.*)$', line)
+            if match:
+                skipped_count += 1
+                test_file, test_line, reason = match.groups()
+                skipped_tests[reason] = skipped_tests.get(reason, []) + [(test_file, test_line)]
+    for k,v in sorted(skipped_tests.items(), key=lambda x:len(x[1])):
+        print(f"{len(v):4} skipped because: {k}")
+    print("Number of skipped tests:", skipped_count)
+
+def parse_pytest_failure_output(file_path):
+    failed_tests = {}
+    failed_count = 0
+    with open(file_path, 'r', encoding='utf-8') as file:
+        for line in file:
+            match = re.match(r'^FAILED (tests/.*) - (.*): (.*)$', line)
+            if match:
+                failed_count += 1
+                _, error, reason = match.groups()
+                failed_tests[reason] = failed_tests.get(reason, []) + [error]
+    for k,v in sorted(failed_tests.items(), key=lambda x:len(x[1])):
+        print(f"{len(v):4} failed because `{v[0]}` -> {k}")
+    print("Number of failed tests:", failed_count)
+    if failed_count>0:
+        exit(1)
+
+def parse_pytest_errors_output(file_path):
+    print(file_path)
+    error_tests = {}
+    error_count = 0
+    with open(file_path, 'r', encoding='utf-8') as file:
+        for line in file:
+            match = re.match(r'^ERROR (tests/.*) - (.*): (.*)$', line)
+            if match:
+                error_count += 1
+                _, test_error, reason = match.groups()
+                error_tests[reason] = error_tests.get(reason, []) + [test_error]
+    for k,v in sorted(error_tests.items(), key=lambda x:len(x[1])):
+        print(f"{len(v):4} errored out because of `{v[0]}` -> {k}")
+    print("Number of errors:", error_count)
+    if error_count>0:
+        exit(1)
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--file", help="file to parse")
+    parser.add_argument("--skip", action="store_true", help="show skipped reasons")
+    parser.add_argument("--fail", action="store_true", help="show failed tests")
+    parser.add_argument("--errors", action="store_true", help="show failed tests")
+    args = parser.parse_args()
+
+    if args.skip:
+        parse_pytest_output(args.file)
+
+    if args.fail:
+        parse_pytest_failure_output(args.file)
+
+    if args.errors:
+        parse_pytest_errors_output(args.file)
+
+
+if __name__ == "__main__":
+    main()
--- a/.git-blame-ignore-revs
+++ b/.git-blame-ignore-revs
@@ -0,0 +1 @@
+8008e6c83e1467dbe0ae3c81d19b29c17f4ff456
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,4 @@
+*.py	eol=lf
+*.rst	eol=lf
+*.md	eol=lf
+*.mdx   eol=lf
--- a/.github/ISSUE_TEMPLATE/bug-report.yml
+++ b/.github/ISSUE_TEMPLATE/bug-report.yml
@@ -0,0 +1,126 @@
+name: "\U0001F41B Bug Report"
+description: Submit a bug report to help us improve transformers
+labels: [ "bug" ]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for taking the time to fill out this bug report! 🤗
+
+        Before you submit your bug report:
+
+          - If it is your first time submitting, be sure to check our [bug report guidelines](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#did-you-find-a-bug)
+          - Try our [docs bot](https://huggingface.co/spaces/huggingchat/hf-docs-chat) -- it might be able to help you with your issue
+
+  - type: textarea
+    id: system-info
+    attributes:
+      label: System Info
+      description: Please share your system info with us. You can run the command `transformers env` and copy-paste its output below.
+      placeholder: transformers version, platform, python version, ...
+    validations:
+      required: true
+
+  - type: textarea
+    id: who-can-help
+    attributes:
+      label: Who can help?
+      description: |
+        Your issue will be replied to more quickly if you can figure out the right person to tag with @
+        If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
+
+        All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
+        a core maintainer will ping the right person.
+
+        Please tag fewer than 3 people.
+
+        Models:
+
+          - text models: @ArthurZucker @Cyrilvallez
+          - vision models: @yonigozlan @molbap
+          - audio models: @eustlb @ebezzam @vasqu
+          - multimodal models: @zucchini-nlp
+          - graph models: @clefourrier
+
+        Library:
+
+          - generate: @zucchini-nlp (visual-language models) or @cyrilvallez (all others)
+          - continuous batching: @remi-or @ArthurZucker @McPatate
+          - pipelines: @Rocketknight1
+          - tokenizers: @ArthurZucker and @itazap
+          - trainer: @SunMarc
+          - attention: @vasqu @ArthurZucker @CyrilVallez
+          - model loading (from pretrained, etc): @CyrilVallez
+          - distributed: @3outeille @ArthurZucker
+          - CIs: @ydshieh
+
+        Integrations:
+
+          - ray/raytune: @richardliaw, @amogkam
+          - Big Model Inference: @SunMarc
+          - quantization: @SunMarc @MekkCyber
+          - kernels: @MekkCyber @drbh
+          - peft: @BenjaminBossan @githubnemo
+        
+        Devices/Backends:
+        
+          - AMD ROCm: @ivarflakstad
+          - Intel XPU: @IlyasMoutawwakil
+          - Ascend NPU: @ivarflakstad 
+
+        Documentation: @stevhliu
+
+        Model hub:
+
+          - for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
+
+        Research projects are not maintained and should be taken as is.
+
+      placeholder: "@Username ..."
+
+  - type: checkboxes
+    id: information-scripts-examples
+    attributes:
+      label: Information
+      description: 'The problem arises when using:'
+      options:
+        - label: "The official example scripts"
+        - label: "My own modified scripts"
+
+  - type: checkboxes
+    id: information-tasks
+    attributes:
+      label: Tasks
+      description: "The tasks I am working on are:"
+      options:
+        - label: "An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
+        - label: "My own task or dataset (give details below)"
+
+  - type: textarea
+    id: reproduction
+    validations:
+      required: true
+    attributes:
+      label: Reproduction
+      description: |
+        Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
+        Please include relevant config information with your code, for example your Trainers, TRL, Peft, and DeepSpeed configs.
+        If you have code snippets, error messages, stack traces please provide them here as well.
+        Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
+        Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
+
+      placeholder: |
+        Steps to reproduce the behavior:
+
+          1.
+          2.
+          3.
+
+
+  - type: textarea
+    id: expected-behavior
+    validations:
+      required: true
+    attributes:
+      label: Expected behavior
+      description: "A clear and concise description of what you would expect to happen."
--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,12 @@
+blank_issues_enabled: true
+version: 2.1
+contact_links:
+  - name: Model checkpoints on the Hugging Face Hub
+    url: https://huggingface.co/models
+    about: Open a Pull request / Discussion related to a specific model checkpoint directly on the Hugging Face Hub
+  - name: Website Related
+    url: https://github.com/huggingface/hub-docs/issues
+    about: Feature requests and bug reports related to the website
+  - name: Forum
+    url: https://discuss.huggingface.co/
+    about: General usage questions and community discussions
--- a/.github/ISSUE_TEMPLATE/feature-request.yml
+++ b/.github/ISSUE_TEMPLATE/feature-request.yml
@@ -0,0 +1,31 @@
+name: "\U0001F680 Feature request"
+description: Submit a proposal/request for a new transformers feature
+labels: [ "Feature request" ]
+body:
+  - type: textarea
+    id: feature-request
+    validations:
+      required: true
+    attributes:
+      label: Feature request
+      description: |
+        A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.
+
+  - type: textarea
+    id: motivation
+    validations:
+      required: true
+    attributes:
+      label: Motivation
+      description: |
+        Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
+
+
+  - type: textarea
+    id: contribution
+    validations:
+      required: true
+    attributes:
+      label: Your contribution
+      description: |
+        Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md)
--- a/.github/ISSUE_TEMPLATE/i18n.md
+++ b/.github/ISSUE_TEMPLATE/i18n.md
@@ -0,0 +1,46 @@
+---
+name: 🌐 Translating a new language?
+about: Start a new translation effort in your language
+title: '[i18n-<languageCode>] Translating docs to <languageName>'
+labels: WIP
+assignees: ''
+
+---
+
+<!--
+Note: Please search to see if an issue already exists for the language you are trying to translate.
+-->
+
+Hi!
+
+Let's bring the documentation to all the <languageName>-speaking community 🌐 (currently 0 out of 267 complete)
+
+Who would want to translate? Please follow the 🤗 [TRANSLATING guide](https://github.com/huggingface/transformers/blob/main/docs/TRANSLATING.md). Here is a list of the files ready for translation. Let us know in this issue if you'd like to translate any, and we'll add your name to the list.
+
+Some notes:
+
+* Please translate using an informal tone (imagine you are talking with a friend about transformers 🤗).
+* Please translate in a gender-neutral way.
+* Add your translations to the folder called `<languageCode>` inside the [source folder](https://github.com/huggingface/transformers/tree/main/docs/source).
+* Register your translation in `<languageCode>/_toctree.yml`; please follow the order of the [English version](https://github.com/huggingface/transformers/blob/main/docs/source/en/_toctree.yml).
+* Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue. Please ping @stevhliu for review.
+* 🙋 If you'd like others to help you with the translation, you can also post in the 🤗 [forums](https://discuss.huggingface.co/).
+
+## Get Started section
+
+- [ ] [index.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/index.md) https://github.com/huggingface/transformers/pull/20180
+- [ ] [quicktour.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/quicktour.md) (waiting for initial PR to go through)
+- [ ] [installation.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/installation.md).
+
+## Tutorial section
+- [ ] [pipeline_tutorial.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/pipeline_tutorial.md)
+- [ ]  [autoclass_tutorial.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/autoclass_tutorial.md)
+- [ ]  [preprocessing.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/preprocessing.md)
+- [ ]  [training.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/training.md)
+- [ ]  [accelerate.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/accelerate.md)
+- [ ]  [model_sharing.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/model_sharing.md)
+- [ ]  [multilingual.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/multilingual.md)
+
+<!--
+Keep on adding more as you go 🔥
+-->
--- a/.github/ISSUE_TEMPLATE/migration.yml
+++ b/.github/ISSUE_TEMPLATE/migration.yml
@@ -0,0 +1,72 @@
+name: "\U0001F4DA Migration from pytorch-pretrained-bert or pytorch-transformers"
+description: Report a problem when migrating from pytorch-pretrained-bert or pytorch-transformers to transformers
+labels: [ "migration" ]
+body:
+  - type: textarea
+    id: system-info
+    attributes:
+      label: System Info
+      description: Please share your system info with us. You can run the command `transformers env` and copy-paste its output below.
+      render: shell
+      placeholder: transformers version, platform, python version, ...
+    validations:
+      required: true
+
+  - type: checkboxes
+    id: information-scripts-examples
+    attributes:
+      label: Information
+      description: 'The problem arises when using:'
+      options:
+        - label: "The official example scripts"
+        - label: "My own modified scripts"
+
+  - type: checkboxes
+    id: information-tasks
+    attributes:
+      label: Tasks
+      description: "The tasks I am working on are:"
+      options:
+        - label: "An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
+        - label: "My own task or dataset (give details below)"
+
+  - type: textarea
+    id: reproduction
+    validations:
+      required: true
+    attributes:
+      label: Reproduction
+      description: |
+        Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
+        If you have code snippets, error messages, stack traces please provide them here as well.
+        Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
+        Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
+
+      placeholder: |
+        Steps to reproduce the behavior:
+          
+          1.
+          2.
+          3.
+          
+
+  - type: textarea
+    id: expected-behavior
+    validations:
+      required: true
+    attributes:
+      label: Expected behavior
+      description: "A clear and concise description of what you would expect to happen."
+      render: shell
+
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Checklist
+      options:
+        - label: "I have read the migration guide in the readme.
+ ([pytorch-transformers](https://github.com/huggingface/transformers#migrating-from-pytorch-transformers-to-transformers);
+  [pytorch-pretrained-bert](https://github.com/huggingface/transformers#migrating-from-pytorch-pretrained-bert-to-transformers))"
+          required: true
+        - label: "I checked if a related official extension example runs on my machine."
+          required: true
--- a/.github/ISSUE_TEMPLATE/new-model-addition.yml
+++ b/.github/ISSUE_TEMPLATE/new-model-addition.yml
@@ -0,0 +1,31 @@
+name: "\U0001F31F New model addition"
+description: Submit a proposal/request to implement a new model
+labels: [ "New model" ]
+
+body:
+  - type: textarea
+    id: description-request
+    validations:
+      required: true
+    attributes:
+      label: Model description
+      description: |
+        Put any and all important information relative to the model
+
+  - type: checkboxes
+    id: information-tasks
+    attributes:
+      label: Open source status
+      description: |
+          Please note that if the model implementation isn't available or if the weights aren't open-source, we are less likely to implement it in `transformers`.
+      options:
+        - label: "The model implementation is available"
+        - label: "The model weights are available"
+
+  - type: textarea
+    id: additional-info
+    attributes:
+      label: Provide useful links for the implementation
+      description: |
+        Please provide information regarding the implementation, the weights, and the authors.
+        Please mention the authors by @gh-username if you're aware of their usernames.
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,93 @@
+# What does this PR do?
+
+<!--
+Congratulations! You've made it this far! You're not quite done yet though.
+
+Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
+
+Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.
+
+Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
+-->
+
+<!-- Remove if not applicable -->
+
+Fixes # (issue)
+
+## Code Agent Policy
+
+The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
+code agents. We are currently bottlenecked by our ability to review and respond to them. As a result, 
+**we ask that new users do not submit pure code agent PRs** at this time. 
+You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
+not to open any PRs or issues for the moment.
+
+PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
+repeatedly or maliciously. 
+
+This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result, 
+this policy is likely to be updated regularly in the near future. For more information, please read [`CONTRIBUTING.md`](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md).
+
+- [ ] I confirm that this is not a pure code agent PR.
+
+## Before submitting
+- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
+- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request),
+      Pull Request section?
+- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link
+      to it if that's the case.
+- [ ] Did you make sure to update the documentation with your changes? Here are the
+      [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and
+      [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
+- [ ] Did you write any new necessary tests?
+
+
+## Who can review?
+
+Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
+members/contributors who may be interested in your PR.
+
+<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
+
+ If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
+ Please tag fewer than 3 people.
+
+Models:
+
+- text models: @ArthurZucker @Cyrilvallez
+- vision models: @yonigozlan @molbap
+- audio models: @eustlb @ebezzam @vasqu
+- multimodal models: @zucchini-nlp
+- graph models: @clefourrier
+
+Library:
+
+- generate: @zucchini-nlp (visual-language models) or @gante (all others)
+- continuous batching: @remi-or @ArthurZucker @McPatate
+- pipelines: @Rocketknight1
+- tokenizers: @ArthurZucker and @itazap
+- trainer: @SunMarc
+- attention: @vasqu @ArthurZucker @CyrilVallez
+- model loading (from pretrained, etc): @CyrilVallez
+- distributed: @3outeille @ArthurZucker
+- CIs: @ydshieh
+
+Integrations:
+
+- ray/raytune: @richardliaw, @amogkam
+- Big Model Inference: @SunMarc
+- quantization: @SunMarc
+- kernels: @drbh
+- peft: @BenjaminBossan @githubnemo
+
+Devices/Backends:
+
+- AMD ROCm: @ivarflakstad
+- Intel XPU: @IlyasMoutawwakil
+- Ascend NPU: @ivarflakstad 
+
+Documentation: @stevhliu
+
+Research projects are not maintained and should be taken as is.
+
+ -->
--- a/.github/conda/build.sh
+++ b/.github/conda/build.sh
@@ -0,0 +1 @@
+$PYTHON setup.py install     # Python command to install the script.
--- a/.github/conda/meta.yaml
+++ b/.github/conda/meta.yaml
@@ -0,0 +1,54 @@
+{% set name = "transformers" %}
+
+package:
+  name: "{{ name|lower }}"
+  version: "{{ TRANSFORMERS_VERSION }}"
+
+source:
+  path: ../../
+
+build:
+  noarch: python
+
+requirements:
+  host:
+    - python
+    - pip
+    - numpy >=1.17
+    - dataclasses
+    - huggingface_hub
+    - packaging
+    - filelock
+    - tqdm >=4.60
+    - sacremoses
+    - regex !=2019.12.17
+    - protobuf
+    - tokenizers >=0.11.1,!=0.11.3,<0.13
+    - pyyaml >=5.1
+    - safetensors
+    - fsspec
+  run:
+    - python
+    - numpy >=1.17
+    - dataclasses
+    - huggingface_hub
+    - packaging
+    - filelock
+    - tqdm >=4.60
+    - sacremoses
+    - regex !=2019.12.17
+    - protobuf
+    - tokenizers >=0.11.1,!=0.11.3,<0.13
+    - pyyaml >=5.1
+    - safetensors
+    - fsspec
+
+test:
+  imports:
+    - transformers
+
+about:
+  home: https://huggingface.co
+  license: Apache License 2.0
+  license_file: LICENSE
+  summary: "🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0."
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,39 @@
+# copilot-instructions.md Guide for Hugging Face Transformers
+
+This copilot-instructions.md file provides guidance for code agents working with this codebase.
+
+## Core Project Structure
+
+- `/src/transformers`: This contains the core source code for the library
+  - `/models`: Code for individual models. Models inherit from base classes in the root `/src/transformers` directory.
+- `/tests`: This contains the core test classes for the library. These are usually inherited rather than directly run.
+  - `/models`: Tests for individual models. Model tests inherit from common tests in the root `/tests` directory.
+- `/docs`: This contains the documentation for the library, including guides, tutorials, and API references.
+
+## Coding Conventions for Hugging Face Transformers
+
+- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
+- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
+- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
+
+## Copying and inheritance
+
+Many models in the codebase have similar code, but it is not shared by inheritance because we want each model file to be self-contained.
+We use two mechanisms to keep this code in sync:
+
+- "Copied from" syntax. Functions or entire classes can have a comment at the top like this: `# Copied from transformers.models.llama.modeling_llama.rotate_half` or `# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->MT5`
+  These comments are actively checked by the style tools, and copies will automatically be updated when the base code is updated. If you need to update a copied function, you should
+  either update the base function and use `make fixup` to propagate the change to all copies, or simply remove the `# Copied from` comment if that is inappropriate.
+- "Modular" files. These files briefly define models by composing them using inheritance from other models. They are not meant to be used directly. Instead, the style tools
+  automatically generate a complete modeling file, like `modeling_bert.py`, from the modular file like `modular_bert.py`. If a model has a modular file, the modeling file
+  should never be edited directly! Instead, changes should be made in the modular file, and then you should run `make fixup` to update the modeling file automatically.
+
+When adding new models, you should prefer `modular` style and inherit as many classes as possible from existing models.
+
+## Testing
+
+After making changes, you should usually run `make fixup` to ensure any copies and modular files are updated, and then test all affected models. This includes both
+the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
+If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
+
+In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -0,0 +1,11 @@
+version: 2
+updates:
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    cooldown:
+      default-days: 7
+    groups:
+      actions:
+        patterns: ["*"]
--- a/.github/scripts/assign_reviewers.py
+++ b/.github/scripts/assign_reviewers.py
@@ -0,0 +1,122 @@
+# coding=utf-8
+# Copyright 2025 the HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+import re
+from collections import Counter
+from pathlib import Path
+
+import github
+from github import Github
+
+
+def pattern_to_regex(pattern):
+    if pattern.startswith("/"):
+        start_anchor = True
+        pattern = re.escape(pattern[1:])
+    else:
+        start_anchor = False
+        pattern = re.escape(pattern)
+    # Replace `*` with "any number of non-slash characters"
+    pattern = pattern.replace(r"\*", "[^/]*")
+    if start_anchor:
+        pattern = r"^\/?" + pattern  # Allow an optional leading slash after the start of the string
+    return pattern
+
+def get_file_owners(file_path, codeowners_lines):
+    # Process lines in reverse (last matching pattern takes precedence)
+    for line in reversed(codeowners_lines):
+        # Skip comments and empty lines, strip inline comments
+        line = line.split('#')[0].strip()
+        if not line:
+            continue
+
+        # Split into pattern and owners
+        parts = line.split()
+        pattern = parts[0]
+        # Can be empty, e.g. for dummy files with explicitly no owner!
+        owners = [owner.removeprefix("@") for owner in parts[1:]]
+
+        # Check if file matches pattern
+        file_regex = pattern_to_regex(pattern)
+        if re.search(file_regex, file_path) is not None:
+            return owners  # Remember, can still be empty!
+    return []  # Should never happen, but just in case
+
+def pr_author_is_in_hf(pr_author, codeowners_lines):
+    # Check if the PR author is in the codeowners file
+    for line in codeowners_lines:
+        line = line.split('#')[0].strip()
+        if not line:
+            continue
+
+        # Split into pattern and owners
+        parts = line.split()
+        owners = [owner.removeprefix("@") for owner in parts[1:]]
+
+        if pr_author in owners:
+            return True
+    return False
+
+def main():
+    script_dir = Path(__file__).parent.absolute()
+    with open(script_dir / "codeowners_for_review_action") as f:
+        codeowners_lines = f.readlines()
+
+    g = Github(os.environ['GITHUB_TOKEN'])
+    repo = g.get_repo("huggingface/transformers")
+    with open(os.environ['GITHUB_EVENT_PATH']) as f:
+        event = json.load(f)
+
+    # The PR number is available in the event payload
+    pr_number = event['pull_request']['number']
+    pr = repo.get_pull(pr_number)
+    pr_author = pr.user.login
+    if pr_author_is_in_hf(pr_author, codeowners_lines):
+        print(f"PR author {pr_author} is in codeowners, skipping review request.")
+        return
+
+    existing_reviews = list(pr.get_reviews())
+    if existing_reviews:
+        print(f"Already has reviews: {[r.user.login for r in existing_reviews]}")
+        return
+
+    users_requested, teams_requested = pr.get_review_requests()
+    users_requested = list(users_requested)
+    if users_requested:
+        print(f"Reviewers already requested: {users_requested}")
+        return
+
+    locs_per_owner = Counter()
+    for file in pr.get_files():
+        owners = get_file_owners(file.filename, codeowners_lines)
+        for owner in owners:
+            locs_per_owner[owner] += file.changes
+
+    # Assign the top 2 based on locs changed as reviewers, but skip the owner if present
+    locs_per_owner.pop(pr_author, None)
+    top_owners = locs_per_owner.most_common(2)
+    print("Top owners", top_owners)
+    top_owners = [owner[0] for owner in top_owners]
+    try:
+        pr.create_review_request(top_owners)
+    except github.GithubException as e:
+        print(f"Failed to request review for {top_owners}: {e}")
+
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/scripts/codeowners_for_review_action
+++ b/.github/scripts/codeowners_for_review_action
@@ -0,0 +1,369 @@
+# Top-level rules are matched only if nothing else matches
+* @Rocketknight1 @ArthurZucker # if no one is pinged based on the other rules, he will do the dispatch
+*.md @stevhliu
+*tokenization* @ArthurZucker
+docs/ @stevhliu
+/benchmark/ @McPatate
+/docker/ @ydshieh @ArthurZucker
+
+# More high-level globs catch cases when specific rules later don't apply
+/src/transformers/models/*/processing* @molbap @yonigozlan
+/src/transformers/models/*/image_processing* @yonigozlan
+/src/transformers/models/*/image_processing_*_fast* @yonigozlan
+
+# Owners of subsections of the library
+/src/transformers/generation/ @gante
+/src/transformers/pipeline/ @Rocketknight1 @yonigozlan
+/src/transformers/integrations/ @SunMarc @MekkCyber @zach-huggingface
+/src/transformers/quantizers/ @SunMarc @MekkCyber
+tests/ @ydshieh
+tests/generation/ @gante
+
+/src/transformers/models/auto/ @ArthurZucker
+/src/transformers/utils/ @ArthurZucker @Rocketknight1
+/src/transformers/loss/ @ArthurZucker
+
+# Specific files come after the sections/globs, so they take priority
+/.circleci/config.yml @ArthurZucker @ydshieh
+/utils/tests_fetcher.py @ydshieh
+trainer.py @zach-huggingface @SunMarc
+trainer_utils.py @zach-huggingface @SunMarc
+/utils/modular_model_converter.py @Cyrilvallez @ArthurZucker
+
+# Owners of individual models are specific / high priority, and so they come last
+# mod* captures modeling and modular files
+
+# Text models
+/src/transformers/models/albert/mod*_albert* @ArthurZucker
+/src/transformers/models/bamba/mod*_bamba* @ArthurZucker
+/src/transformers/models/bart/mod*_bart* @ArthurZucker
+/src/transformers/models/barthez/mod*_barthez* @ArthurZucker
+/src/transformers/models/bartpho/mod*_bartpho* @ArthurZucker
+/src/transformers/models/bert/mod*_bert* @ArthurZucker
+/src/transformers/models/bert_generation/mod*_bert_generation* @ArthurZucker
+/src/transformers/models/bert_japanese/mod*_bert_japanese* @ArthurZucker
+/src/transformers/models/bertweet/mod*_bertweet* @ArthurZucker
+/src/transformers/models/big_bird/mod*_big_bird* @ArthurZucker
+/src/transformers/models/bigbird_pegasus/mod*_bigbird_pegasus* @ArthurZucker
+/src/transformers/models/biogpt/mod*_biogpt* @ArthurZucker
+/src/transformers/models/blenderbot/mod*_blenderbot* @ArthurZucker
+/src/transformers/models/blenderbot_small/mod*_blenderbot_small* @ArthurZucker
+/src/transformers/models/bloom/mod*_bloom* @ArthurZucker
+/src/transformers/models/bort/mod*_bort* @ArthurZucker
+/src/transformers/models/byt5/mod*_byt5* @ArthurZucker
+/src/transformers/models/camembert/mod*_camembert* @ArthurZucker
+/src/transformers/models/canine/mod*_canine* @ArthurZucker
+/src/transformers/models/codegen/mod*_codegen* @ArthurZucker
+/src/transformers/models/code_llama/mod*_code_llama* @ArthurZucker
+/src/transformers/models/cohere/mod*_cohere* @ArthurZucker
+/src/transformers/models/cohere2/mod*_cohere2* @ArthurZucker
+/src/transformers/models/convbert/mod*_convbert* @ArthurZucker
+/src/transformers/models/cpm/mod*_cpm* @ArthurZucker
+/src/transformers/models/cpmant/mod*_cpmant* @ArthurZucker
+/src/transformers/models/ctrl/mod*_ctrl* @ArthurZucker
+/src/transformers/models/dbrx/mod*_dbrx* @ArthurZucker
+/src/transformers/models/deberta/mod*_deberta* @ArthurZucker
+/src/transformers/models/deberta_v2/mod*_deberta_v2* @ArthurZucker
+/src/transformers/models/dialogpt/mod*_dialogpt* @ArthurZucker
+/src/transformers/models/diffllama/mod*_diffllama* @ArthurZucker
+/src/transformers/models/distilbert/mod*_distilbert* @ArthurZucker
+/src/transformers/models/dpr/mod*_dpr* @ArthurZucker
+/src/transformers/models/electra/mod*_electra* @ArthurZucker
+/src/transformers/models/encoder_decoder/mod*_encoder_decoder* @ArthurZucker
+/src/transformers/models/ernie/mod*_ernie* @ArthurZucker
+/src/transformers/models/ernie_m/mod*_ernie_m* @ArthurZucker
+/src/transformers/models/esm/mod*_esm* @ArthurZucker
+/src/transformers/models/falcon/mod*_falcon* @ArthurZucker
+/src/transformers/models/falcon3/mod*_falcon3* @ArthurZucker
+/src/transformers/models/falcon_mamba/mod*_falcon_mamba* @ArthurZucker
+/src/transformers/models/fastspeech2_conformer/mod*_fastspeech2_conformer* @ArthurZucker
+/src/transformers/models/flan_t5/mod*_flan_t5* @ArthurZucker
+/src/transformers/models/flan_ul2/mod*_flan_ul2* @ArthurZucker
+/src/transformers/models/flaubert/mod*_flaubert* @ArthurZucker
+/src/transformers/models/fnet/mod*_fnet* @ArthurZucker
+/src/transformers/models/fsmt/mod*_fsmt* @ArthurZucker
+/src/transformers/models/funnel/mod*_funnel* @ArthurZucker
+/src/transformers/models/fuyu/mod*_fuyu* @ArthurZucker
+/src/transformers/models/gemma/mod*_gemma* @ArthurZucker
+/src/transformers/models/gemma2/mod*_gemma2* @ArthurZucker
+/src/transformers/models/glm/mod*_glm* @ArthurZucker
+/src/transformers/models/openai_gpt/mod*_openai_gpt* @ArthurZucker
+/src/transformers/models/gpt_neo/mod*_gpt_neo* @ArthurZucker
+/src/transformers/models/gpt_neox/mod*_gpt_neox* @ArthurZucker
+/src/transformers/models/gpt_neox_japanese/mod*_gpt_neox_japanese* @ArthurZucker
+/src/transformers/models/gptj/mod*_gptj* @ArthurZucker
+/src/transformers/models/gpt2/mod*_gpt2* @ArthurZucker
+/src/transformers/models/gpt_bigcode/mod*_gpt_bigcode* @ArthurZucker
+/src/transformers/models/gptsan_japanese/mod*_gptsan_japanese* @ArthurZucker
+/src/transformers/models/gpt_sw3/mod*_gpt_sw3* @ArthurZucker
+/src/transformers/models/granite/mod*_granite* @ArthurZucker
+/src/transformers/models/granitemoe/mod*_granitemoe* @ArthurZucker
+/src/transformers/models/herbert/mod*_herbert* @ArthurZucker
+/src/transformers/models/ibert/mod*_ibert* @ArthurZucker
+/src/transformers/models/jamba/mod*_jamba* @ArthurZucker
+/src/transformers/models/jetmoe/mod*_jetmoe* @ArthurZucker
+/src/transformers/models/jukebox/mod*_jukebox* @ArthurZucker
+/src/transformers/models/led/mod*_led* @ArthurZucker
+/src/transformers/models/llama/mod*_llama* @ArthurZucker @Cyrilvallez
+/src/transformers/models/longformer/mod*_longformer* @ArthurZucker
+/src/transformers/models/longt5/mod*_longt5* @ArthurZucker
+/src/transformers/models/luke/mod*_luke* @ArthurZucker
+/src/transformers/models/m2m_100/mod*_m2m_100* @ArthurZucker
+/src/transformers/models/madlad_400/mod*_madlad_400* @ArthurZucker
+/src/transformers/models/mamba/mod*_mamba* @ArthurZucker
+/src/transformers/models/mamba2/mod*_mamba2* @ArthurZucker
+/src/transformers/models/marian/mod*_marian* @ArthurZucker
+/src/transformers/models/markuplm/mod*_markuplm* @ArthurZucker
+/src/transformers/models/mbart/mod*_mbart* @ArthurZucker
+/src/transformers/models/mega/mod*_mega* @ArthurZucker
+/src/transformers/models/megatron_bert/mod*_megatron_bert* @ArthurZucker
+/src/transformers/models/megatron_gpt2/mod*_megatron_gpt2* @ArthurZucker
+/src/transformers/models/mistral/mod*_mistral* @ArthurZucker
+/src/transformers/models/mixtral/mod*_mixtral* @ArthurZucker
+/src/transformers/models/mluke/mod*_mluke* @ArthurZucker
+/src/transformers/models/mobilebert/mod*_mobilebert* @ArthurZucker
+/src/transformers/models/modernbert/mod*_modernbert* @ArthurZucker
+/src/transformers/models/mpnet/mod*_mpnet* @ArthurZucker
+/src/transformers/models/mpt/mod*_mpt* @ArthurZucker
+/src/transformers/models/mra/mod*_mra* @ArthurZucker
+/src/transformers/models/mt5/mod*_mt5* @ArthurZucker
+/src/transformers/models/mvp/mod*_mvp* @ArthurZucker
+/src/transformers/models/myt5/mod*_myt5* @ArthurZucker
+/src/transformers/models/nemotron/mod*_nemotron* @ArthurZucker
+/src/transformers/models/nezha/mod*_nezha* @ArthurZucker
+/src/transformers/models/nllb/mod*_nllb* @ArthurZucker
+/src/transformers/models/nllb_moe/mod*_nllb_moe* @ArthurZucker
+/src/transformers/models/nystromformer/mod*_nystromformer* @ArthurZucker
+/src/transformers/models/olmo/mod*_olmo* @ArthurZucker
+/src/transformers/models/olmo2/mod*_olmo2* @ArthurZucker
+/src/transformers/models/olmoe/mod*_olmoe* @ArthurZucker
+/src/transformers/models/open_llama/mod*_open_llama* @ArthurZucker
+/src/transformers/models/opt/mod*_opt* @ArthurZucker
+/src/transformers/models/pegasus/mod*_pegasus* @ArthurZucker
+/src/transformers/models/pegasus_x/mod*_pegasus_x* @ArthurZucker
+/src/transformers/models/persimmon/mod*_persimmon* @ArthurZucker
+/src/transformers/models/phi/mod*_phi* @ArthurZucker
+/src/transformers/models/phi3/mod*_phi3* @ArthurZucker
+/src/transformers/models/phimoe/mod*_phimoe* @ArthurZucker
+/src/transformers/models/phobert/mod*_phobert* @ArthurZucker
+/src/transformers/models/plbart/mod*_plbart* @ArthurZucker
+/src/transformers/models/prophetnet/mod*_prophetnet* @ArthurZucker
+/src/transformers/models/qdqbert/mod*_qdqbert* @ArthurZucker
+/src/transformers/models/qwen2/mod*_qwen2* @ArthurZucker
+/src/transformers/models/qwen2_moe/mod*_qwen2_moe* @ArthurZucker
+/src/transformers/models/rag/mod*_rag* @ArthurZucker
+/src/transformers/models/realm/mod*_realm* @ArthurZucker
+/src/transformers/models/recurrent_gemma/mod*_recurrent_gemma* @ArthurZucker
+/src/transformers/models/reformer/mod*_reformer* @ArthurZucker
+/src/transformers/models/rembert/mod*_rembert* @ArthurZucker
+/src/transformers/models/retribert/mod*_retribert* @ArthurZucker
+/src/transformers/models/roberta/mod*_roberta* @ArthurZucker
+/src/transformers/models/roberta_prelayernorm/mod*_roberta_prelayernorm* @ArthurZucker
+/src/transformers/models/roc_bert/mod*_roc_bert* @ArthurZucker
+/src/transformers/models/roformer/mod*_roformer* @ArthurZucker
+/src/transformers/models/rwkv/mod*_rwkv* @ArthurZucker
+/src/transformers/models/splinter/mod*_splinter* @ArthurZucker
+/src/transformers/models/squeezebert/mod*_squeezebert* @ArthurZucker
+/src/transformers/models/stablelm/mod*_stablelm* @ArthurZucker
+/src/transformers/models/starcoder2/mod*_starcoder2* @ArthurZucker
+/src/transformers/models/switch_transformers/mod*_switch_transformers* @ArthurZucker
+/src/transformers/models/t5/mod*_t5* @ArthurZucker
+/src/transformers/models/t5v1.1/mod*_t5v1.1* @ArthurZucker
+/src/transformers/models/tapex/mod*_tapex* @ArthurZucker
+/src/transformers/models/transfo_xl/mod*_transfo_xl* @ArthurZucker
+/src/transformers/models/ul2/mod*_ul2* @ArthurZucker
+/src/transformers/models/umt5/mod*_umt5* @ArthurZucker
+/src/transformers/models/xmod/mod*_xmod* @ArthurZucker
+/src/transformers/models/xglm/mod*_xglm* @ArthurZucker
+/src/transformers/models/xlm/mod*_xlm* @ArthurZucker
+/src/transformers/models/xlm_prophetnet/mod*_xlm_prophetnet* @ArthurZucker
+/src/transformers/models/xlm_roberta/mod*_xlm_roberta* @ArthurZucker
+/src/transformers/models/xlm_roberta_xl/mod*_xlm_roberta_xl* @ArthurZucker
+/src/transformers/models/xlm_v/mod*_xlm_v* @ArthurZucker
+/src/transformers/models/xlnet/mod*_xlnet* @ArthurZucker
+/src/transformers/models/yoso/mod*_yoso* @ArthurZucker
+/src/transformers/models/zamba/mod*_zamba* @ArthurZucker
+
+# Vision models
+/src/transformers/models/beit/mod*_beit* @yonigozlan @molbap
+/src/transformers/models/bit/mod*_bit* @yonigozlan @molbap
+/src/transformers/models/conditional_detr/mod*_conditional_detr* @yonigozlan @molbap
+/src/transformers/models/convnext/mod*_convnext* @yonigozlan @molbap
+/src/transformers/models/convnextv2/mod*_convnextv2* @yonigozlan @molbap
+/src/transformers/models/cvt/mod*_cvt* @yonigozlan @molbap
+/src/transformers/models/deformable_detr/mod*_deformable_detr* @yonigozlan @molbap
+/src/transformers/models/deit/mod*_deit* @yonigozlan @molbap
+/src/transformers/models/depth_anything/mod*_depth_anything* @yonigozlan @molbap
+/src/transformers/models/depth_anything_v2/mod*_depth_anything_v2* @yonigozlan @molbap
+/src/transformers/models/deta/mod*_deta* @yonigozlan @molbap
+/src/transformers/models/detr/mod*_detr* @yonigozlan @molbap
+/src/transformers/models/dinat/mod*_dinat* @yonigozlan @molbap
+/src/transformers/models/dinov2/mod*_dinov2* @yonigozlan @molbap
+/src/transformers/models/dinov2_with_registers/mod*_dinov2_with_registers* @yonigozlan @molbap
+/src/transformers/models/dit/mod*_dit* @yonigozlan @molbap
+/src/transformers/models/dpt/mod*_dpt* @yonigozlan @molbap
+/src/transformers/models/efficientformer/mod*_efficientformer* @yonigozlan @molbap
+/src/transformers/models/efficientnet/mod*_efficientnet* @yonigozlan @molbap
+/src/transformers/models/focalnet/mod*_focalnet* @yonigozlan @molbap
+/src/transformers/models/glpn/mod*_glpn* @yonigozlan @molbap
+/src/transformers/models/hiera/mod*_hiera* @yonigozlan @molbap
+/src/transformers/models/ijepa/mod*_ijepa* @yonigozlan @molbap
+/src/transformers/models/imagegpt/mod*_imagegpt* @yonigozlan @molbap
+/src/transformers/models/levit/mod*_levit* @yonigozlan @molbap
+/src/transformers/models/mask2former/mod*_mask2former* @yonigozlan @molbap
+/src/transformers/models/maskformer/mod*_maskformer* @yonigozlan @molbap
+/src/transformers/models/mobilenet_v1/mod*_mobilenet_v1* @yonigozlan @molbap
+/src/transformers/models/mobilenet_v2/mod*_mobilenet_v2* @yonigozlan @molbap
+/src/transformers/models/mobilevit/mod*_mobilevit* @yonigozlan @molbap
+/src/transformers/models/mobilevitv2/mod*_mobilevitv2* @yonigozlan @molbap
+/src/transformers/models/nat/mod*_nat* @yonigozlan @molbap
+/src/transformers/models/poolformer/mod*_poolformer* @yonigozlan @molbap
+/src/transformers/models/pvt/mod*_pvt* @yonigozlan @molbap
+/src/transformers/models/pvt_v2/mod*_pvt_v2* @yonigozlan @molbap
+/src/transformers/models/regnet/mod*_regnet* @yonigozlan @molbap
+/src/transformers/models/resnet/mod*_resnet* @yonigozlan @molbap
+/src/transformers/models/rt_detr/mod*_rt_detr* @yonigozlan @molbap
+/src/transformers/models/segformer/mod*_segformer* @yonigozlan @molbap
+/src/transformers/models/seggpt/mod*_seggpt* @yonigozlan @molbap
+/src/transformers/models/superpoint/mod*_superpoint* @yonigozlan @molbap
+/src/transformers/models/swiftformer/mod*_swiftformer* @yonigozlan @molbap
+/src/transformers/models/swin/mod*_swin* @yonigozlan @molbap
+/src/transformers/models/swinv2/mod*_swinv2* @yonigozlan @molbap
+/src/transformers/models/swin2sr/mod*_swin2sr* @yonigozlan @molbap
+/src/transformers/models/table_transformer/mod*_table_transformer* @yonigozlan @molbap
+/src/transformers/models/textnet/mod*_textnet* @yonigozlan @molbap
+/src/transformers/models/timm_wrapper/mod*_timm_wrapper* @yonigozlan @molbap
+/src/transformers/models/upernet/mod*_upernet* @yonigozlan @molbap
+/src/transformers/models/van/mod*_van* @yonigozlan @molbap
+/src/transformers/models/vit/mod*_vit* @yonigozlan @molbap
+/src/transformers/models/vit_hybrid/mod*_vit_hybrid* @yonigozlan @molbap
+/src/transformers/models/vitdet/mod*_vitdet* @yonigozlan @molbap
+/src/transformers/models/vit_mae/mod*_vit_mae* @yonigozlan @molbap
+/src/transformers/models/vitmatte/mod*_vitmatte* @yonigozlan @molbap
+/src/transformers/models/vit_msn/mod*_vit_msn* @yonigozlan @molbap
+/src/transformers/models/vitpose/mod*_vitpose* @yonigozlan @molbap
+/src/transformers/models/yolos/mod*_yolos* @yonigozlan @molbap
+/src/transformers/models/zoedepth/mod*_zoedepth* @yonigozlan @molbap
+
+# Audio models
+/src/transformers/models/audio_spectrogram_transformer/mod*_audio_spectrogram_transformer* @eustlb
+/src/transformers/models/bark/mod*_bark* @eustlb
+/src/transformers/models/clap/mod*_clap* @eustlb
+/src/transformers/models/dac/mod*_dac* @eustlb
+/src/transformers/models/encodec/mod*_encodec* @eustlb
+/src/transformers/models/hubert/mod*_hubert* @eustlb
+/src/transformers/models/mctct/mod*_mctct* @eustlb
+/src/transformers/models/mimi/mod*_mimi* @eustlb
+/src/transformers/models/mms/mod*_mms* @eustlb
+/src/transformers/models/moshi/mod*_moshi* @eustlb
+/src/transformers/models/musicgen/mod*_musicgen* @eustlb
+/src/transformers/models/musicgen_melody/mod*_musicgen_melody* @eustlb
+/src/transformers/models/pop2piano/mod*_pop2piano* @eustlb
+/src/transformers/models/seamless_m4t/mod*_seamless_m4t* @eustlb
+/src/transformers/models/seamless_m4t_v2/mod*_seamless_m4t_v2* @eustlb
+/src/transformers/models/sew/mod*_sew* @eustlb
+/src/transformers/models/sew_d/mod*_sew_d* @eustlb
+/src/transformers/models/speech_to_text/mod*_speech_to_text* @eustlb
+/src/transformers/models/speech_to_text_2/mod*_speech_to_text_2* @eustlb
+/src/transformers/models/speecht5/mod*_speecht5* @eustlb
+/src/transformers/models/unispeech/mod*_unispeech* @eustlb
+/src/transformers/models/unispeech_sat/mod*_unispeech_sat* @eustlb
+/src/transformers/models/univnet/mod*_univnet* @eustlb
+/src/transformers/models/vits/mod*_vits* @eustlb
+/src/transformers/models/wav2vec2/mod*_wav2vec2* @eustlb
+/src/transformers/models/wav2vec2_bert/mod*_wav2vec2_bert* @eustlb
+/src/transformers/models/wav2vec2_conformer/mod*_wav2vec2_conformer* @eustlb
+/src/transformers/models/wav2vec2_phoneme/mod*_wav2vec2_phoneme* @eustlb
+/src/transformers/models/wavlm/mod*_wavlm* @eustlb
+/src/transformers/models/whisper/mod*_whisper* @eustlb
+/src/transformers/models/xls_r/mod*_xls_r* @eustlb
+/src/transformers/models/xlsr_wav2vec2/mod*_xlsr_wav2vec2* @eustlb
+
+# Video models
+/src/transformers/models/timesformer/mod*_timesformer* @Rocketknight1
+/src/transformers/models/videomae/mod*_videomae* @Rocketknight1
+/src/transformers/models/vivit/mod*_vivit* @Rocketknight1
+
+# Multimodal models
+/src/transformers/models/align/mod*_align* @zucchini-nlp
+/src/transformers/models/altclip/mod*_altclip* @zucchini-nlp
+/src/transformers/models/aria/mod*_aria* @zucchini-nlp
+/src/transformers/models/blip/mod*_blip* @zucchini-nlp
+/src/transformers/models/blip_2/mod*_blip_2* @zucchini-nlp
+/src/transformers/models/bridgetower/mod*_bridgetower* @zucchini-nlp
+/src/transformers/models/bros/mod*_bros* @zucchini-nlp
+/src/transformers/models/chameleon/mod*_chameleon* @zucchini-nlp
+/src/transformers/models/chinese_clip/mod*_chinese_clip* @zucchini-nlp
+/src/transformers/models/clip/mod*_clip* @zucchini-nlp
+/src/transformers/models/clipseg/mod*_clipseg* @zucchini-nlp
+/src/transformers/models/clvp/mod*_clvp* @zucchini-nlp
+/src/transformers/models/colpali/mod*_colpali* @zucchini-nlp @yonigozlan
+/src/transformers/models/data2vec/mod*_data2vec* @zucchini-nlp
+/src/transformers/models/deplot/mod*_deplot* @zucchini-nlp
+/src/transformers/models/donut/mod*_donut* @zucchini-nlp
+/src/transformers/models/flava/mod*_flava* @zucchini-nlp
+/src/transformers/models/git/mod*_git* @zucchini-nlp
+/src/transformers/models/grounding_dino/mod*_grounding_dino* @yonigozlan
+/src/transformers/models/groupvit/mod*_groupvit* @zucchini-nlp
+/src/transformers/models/idefics/mod*_idefics* @zucchini-nlp
+/src/transformers/models/idefics2/mod*_idefics2* @zucchini-nlp
+/src/transformers/models/idefics3/mod*_idefics3* @zucchini-nlp
+/src/transformers/models/instructblip/mod*_instructblip* @zucchini-nlp
+/src/transformers/models/instructblipvideo/mod*_instructblipvideo* @zucchini-nlp
+/src/transformers/models/kosmos_2/mod*_kosmos_2* @zucchini-nlp
+/src/transformers/models/layoutlm/mod*_layoutlm* @NielsRogge
+/src/transformers/models/layoutlmv2/mod*_layoutlmv2* @NielsRogge
+/src/transformers/models/layoutlmv3/mod*_layoutlmv3* @NielsRogge
+/src/transformers/models/layoutxlm/mod*_layoutxlm* @NielsRogge
+/src/transformers/models/lilt/mod*_lilt* @zucchini-nlp
+/src/transformers/models/llava/mod*_llava* @zucchini-nlp @arthurzucker
+/src/transformers/models/llava_next/mod*_llava_next* @zucchini-nlp
+/src/transformers/models/llava_next_video/mod*_llava_next_video* @zucchini-nlp
+/src/transformers/models/llava_onevision/mod*_llava_onevision* @zucchini-nlp
+/src/transformers/models/lxmert/mod*_lxmert* @zucchini-nlp
+/src/transformers/models/matcha/mod*_matcha* @zucchini-nlp
+/src/transformers/models/mgp_str/mod*_mgp_str* @zucchini-nlp
+/src/transformers/models/mllama/mod*_mllama* @zucchini-nlp
+/src/transformers/models/nougat/mod*_nougat* @NielsRogge
+/src/transformers/models/omdet_turbo/mod*_omdet_turbo* @yonigozlan
+/src/transformers/models/oneformer/mod*_oneformer* @zucchini-nlp
+/src/transformers/models/owlvit/mod*_owlvit* @yonigozlan
+/src/transformers/models/owlv2/mod*_owlv2* @yonigozlan
+/src/transformers/models/paligemma/mod*_paligemma* @zucchini-nlp @molbap
+/src/transformers/models/perceiver/mod*_perceiver* @zucchini-nlp
+/src/transformers/models/pix2struct/mod*_pix2struct* @zucchini-nlp
+/src/transformers/models/pixtral/mod*_pixtral* @zucchini-nlp @ArthurZucker
+/src/transformers/models/qwen2_audio/mod*_qwen2_audio* @zucchini-nlp @ArthurZucker
+/src/transformers/models/qwen2_vl/mod*_qwen2_vl* @zucchini-nlp @ArthurZucker
+/src/transformers/models/sam/mod*_sam* @zucchini-nlp @ArthurZucker
+/src/transformers/models/siglip/mod*_siglip* @zucchini-nlp
+/src/transformers/models/speech_encoder_decoder/mod*_speech_encoder_decoder* @zucchini-nlp
+/src/transformers/models/tapas/mod*_tapas* @NielsRogge
+/src/transformers/models/trocr/mod*_trocr* @zucchini-nlp
+/src/transformers/models/tvlt/mod*_tvlt* @zucchini-nlp
+/src/transformers/models/tvp/mod*_tvp* @zucchini-nlp
+/src/transformers/models/udop/mod*_udop* @zucchini-nlp
+/src/transformers/models/video_llava/mod*_video_llava* @zucchini-nlp
+/src/transformers/models/vilt/mod*_vilt* @zucchini-nlp
+/src/transformers/models/vipllava/mod*_vipllava* @zucchini-nlp
+/src/transformers/models/vision_encoder_decoder/mod*_vision_encoder_decoder* @Rocketknight1
+/src/transformers/models/vision_text_dual_encoder/mod*_vision_text_dual_encoder* @Rocketknight1
+/src/transformers/models/visual_bert/mod*_visual_bert* @zucchini-nlp
+/src/transformers/models/xclip/mod*_xclip* @zucchini-nlp
+
+# Reinforcement learning models
+/src/transformers/models/decision_transformer/mod*_decision_transformer* @Rocketknight1
+/src/transformers/models/trajectory_transformer/mod*_trajectory_transformer* @Rocketknight1
+
+# Time series models
+/src/transformers/models/autoformer/mod*_autoformer* @Rocketknight1
+/src/transformers/models/informer/mod*_informer* @Rocketknight1
+/src/transformers/models/patchtsmixer/mod*_patchtsmixer* @Rocketknight1
+/src/transformers/models/patchtst/mod*_patchtst* @Rocketknight1
+/src/transformers/models/time_series_transformer/mod*_time_series_transformer* @Rocketknight1
+
+# Graph models
+/src/transformers/models/graphormer/mod*_graphormer* @clefourrier
+
+# Finally, files with no owners that shouldn't generate pings, usually automatically generated and checked in the CI
+utils/dummy*
--- a/.github/workflows/TROUBLESHOOT.md
+++ b/.github/workflows/TROUBLESHOOT.md
@@ -0,0 +1,9 @@
+# Troubleshooting
+
+This is a document explaining how to deal with various issues on github-actions self-hosted CI. The entries may include actual solutions or pointers to Issues that cover those.
+
+## GitHub Actions (self-hosted CI)
+
+* Deepspeed
+
+  - if jit build hangs, clear out `rm -rf ~/.cache/torch_extensions/` reference: https://github.com/huggingface/transformers/pull/12723
--- a/.github/workflows/add-model-like.yml
+++ b/.github/workflows/add-model-like.yml
@@ -0,0 +1,85 @@
+name: Add model like runner
+
+on:
+  push:
+    branches:
+      - none # put main here when this is fixed
+  #pull_request:
+  #  paths:
+  #    - "src/**"
+  #    - "tests/**"
+  #    - ".github/**"
+  #  types: [opened, synchronize, reopened]
+
+permissions:
+  contents: read
+
+jobs:
+  run_tests_templates_like:
+    name: "Add new model like template tests"
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+
+      - name: Install dependencies
+        run: |
+          sudo apt -y update && sudo apt install -y libsndfile1-dev
+
+      - name: Load cached virtual environment
+        uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830  # v4.3.0
+        id: cache
+        with:
+          path: ~/venv/
+          key: v4-tests_model_like-${{ hashFiles('setup.py') }}
+
+      - name: Create virtual environment on cache miss
+        if: steps.cache.outputs.cache-hit != 'true'
+        run: |
+          python -m venv ~/venv && . ~/venv/bin/activate
+          pip install --upgrade pip!=21.3
+          pip install -e .[dev]
+
+      - name: Check transformers location
+        # make `transformers` available as package (required since we use `-e` flag) and check it's indeed from the repo.
+        run: |
+          . ~/venv/bin/activate
+          python setup.py develop
+          transformers_install=$(pip list -e | grep transformers)
+          transformers_install_array=($transformers_install)
+          transformers_loc=${transformers_install_array[-1]}
+          transformers_repo_loc=$(pwd .)
+          if [ "$transformers_loc" != "$transformers_repo_loc" ]; then
+              echo "transformers is from $transformers_loc but it shoud be from $transformers_repo_loc/src."
+              echo "A fix is required. Stop testing."
+              exit 1
+          fi
+
+      - name: Create model files
+        run: |
+          . ~/venv/bin/activate
+          transformers add-new-model-like --config_file tests/fixtures/add_distilbert_like_config.json --path_to_repo .
+          make style
+          make fix-copies
+
+      - name: Run all PyTorch modeling test
+        run: |
+          . ~/venv/bin/activate
+          python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_new_models tests/bert_new/test_modeling_bert_new.py
+
+      - name: Run style changes
+        run: |
+          . ~/venv/bin/activate
+          make style && make quality && make repo-consistency
+
+      - name: Failure short reports
+        if: ${{ always() }}
+        run: cat reports/tests_new_models/failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4.6.2
+        with:
+          name: run_all_tests_new_models_test_reports
+          path: reports/tests_new_models
--- a/.github/workflows/anti-slop.yml
+++ b/.github/workflows/anti-slop.yml
@@ -0,0 +1,68 @@
+name: Anti-Slop
+
+permissions:
+  contents: read
+  issues: read
+  pull-requests: write
+
+on:
+  pull_request_target:
+    types: [opened, reopened]
+
+jobs:
+  anti-slop:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: peakoss/anti-slop@85daca1880e9e1af197fc06ea03349daf08f4202 # v0.2.1
+        with:
+          # -- Failure threshold --
+          # Require both enabled checks to fail before labeling while we validate the signals
+          max-failures: 2
+
+          # -- Do NOT close or lock, just label --
+          close-pr: false
+          lock-pr: false
+          failure-add-pr-labels: "Code agent slop"
+          failure-pr-message: |
+            This PR was flagged by our automated quality checks. If you're a genuine
+            contributor, please reply here and a maintainer will review your PR.
+
+            Common reasons for flagging:
+            - New GitHub account
+            - Unusually high number of repository forks in a 24-hour window
+
+            We appreciate your contribution and apologize if this is a false positive!
+
+          # -- Account checks --
+          # Start with two conservative, high-signal checks and iterate from there
+          min-account-age: 30
+          max-daily-forks: 7
+
+          # -- Disabled checks (keep minimal) --
+          blocked-source-branches: ""
+          blocked-paths: ""
+          detect-spam-usernames: false
+          min-profile-completeness: 0
+          require-description: false
+          require-linked-issue: false
+          require-conventional-title: false
+          require-pr-template: false
+          strict-pr-template-sections: ""
+          optional-pr-template-sections: ""
+          max-additional-pr-template-sections: 0
+          max-description-length: 0
+          require-conventional-commits: false
+          require-commit-author-match: false
+          require-maintainer-can-modify: false
+          require-final-newline: false
+          max-added-comments: 0
+          max-emoji-count: 0
+          max-code-references: 0
+          max-commit-message-length: 0
+          min-repo-merged-prs: 0
+          min-repo-merge-ratio: 0
+          min-global-merge-ratio: 0
+
+          # -- Exemptions --
+          exempt-author-association: "OWNER,MEMBER,COLLABORATOR"
+          exempt-label: "exempt"
--- a/.github/workflows/assign-reviewers.yml
+++ b/.github/workflows/assign-reviewers.yml
@@ -0,0 +1,31 @@
+name: Assign PR Reviewers
+on:
+  pull_request_target:
+    branches:
+      - main
+    types: [ready_for_review]
+
+permissions:
+  contents: read
+
+jobs:
+  assign_reviewers:
+    permissions:
+       pull-requests: write
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+      - name: Set up Python
+        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5.6.0
+        with:
+          python-version: '3.13'
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install PyGithub
+      - name: Run assignment script
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: python .github/scripts/assign_reviewers.py
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -0,0 +1,65 @@
+name: Self-hosted runner (benchmark)
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    types: [ opened, labeled, reopened, synchronize ]
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+env:
+  HF_HOME: /mnt/cache
+  DATASET_ID: hf-benchmarks/transformers
+  MODEL_ID: meta-llama/Llama-3.1-8B-Instruct
+
+permissions:
+  contents: read
+
+jobs:
+  benchmark:
+    name: Benchmark
+    strategy:
+      matrix:
+        # group: [aws-g5-4xlarge-cache, aws-p4d-24xlarge-plus] (A100 runner is not enabled)
+        group: [aws-g5-4xlarge-cache]
+    runs-on:
+      group: ${{ matrix.group }}
+    if: |
+      (github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark') )||
+      (github.event_name == 'push' && github.ref == 'refs/heads/main')
+    container:
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus all --privileged --ipc host
+    steps:
+      - name: Get repo
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          fetch-depth: 1
+          persist-credentials: false
+
+      - name: Install benchmark script dependencies
+        run: python3 -m pip install -r benchmark_v2/requirements.txt kernels
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e ".[torch]"
+
+      - name: Run benchmark
+        run: |
+          git config --global --add safe.directory /__w/transformers/transformers
+          if [ "$GITHUB_EVENT_NAME" = "pull_request" ]; then
+            commit_id=$(echo "${{ github.event.pull_request.head.sha }}")
+          elif [ "$GITHUB_EVENT_NAME" = "push" ]; then
+            commit_id=$GITHUB_SHA
+          fi
+          commit_msg=$(git show -s --format=%s | cut -c1-70)
+          python3 benchmark_v2/run_benchmarks.py -b 32 -s 128 -n 256 --level 2 --branch-name "$BRANCH_NAME" --commit-id "$commit_id" --commit-message "$commit_msg" --model-id "$MODEL_ID" --log-level INFO --push-result-to-dataset "$DATASET_ID"
+        env:
+          HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+          PUSH_TO_HUB_TOKEN: ${{ secrets.PUSH_TO_HUB_TOKEN }}
+          # Enable this to see debug logs
+          # HF_HUB_VERBOSITY: debug
+          # TRANSFORMERS_VERBOSITY: debug
+          BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
--- a/.github/workflows/benchmark_v2.yml
+++ b/.github/workflows/benchmark_v2.yml
@@ -0,0 +1,91 @@
+name: Benchmark v2 Framework
+
+on:
+  workflow_dispatch:
+    inputs:
+      runner:
+        description: 'Runner to use for the benchmark job'
+        required: true
+        type: string
+      container_image:
+        description: 'Container image to use'
+        required: true
+        type: string
+      container_options:
+        description: 'Container options'
+        required: false
+        type: string
+        default: ''
+      commit_sha:
+        description: 'Commit SHA to benchmark'
+        required: false
+        type: string
+      run_id:
+        description: 'Run ID for tracking'
+        required: false
+        type: string
+      benchmark_repo_id:
+        description: 'Repository ID to push benchmark results'
+        required: true
+        type: string
+
+env:
+  HF_HOME: /mnt/cache
+  TRANSFORMERS_IS_CI: yes
+  # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
+  # This token is created under the bot `hf-transformers-bot`.
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+
+permissions:
+  contents: read
+
+jobs:
+  benchmark-v2:
+    name: Benchmark v2
+    runs-on: ${{ inputs.runner }}
+    if: |
+      (github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark')) ||
+      (github.event_name == 'schedule')
+    container:
+      image: ${{ inputs.container_image }}
+      options: ${{ inputs.container_options }}
+    steps:
+      - name: Get repo
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          ref: ${{ inputs.commit_sha || github.sha }}
+          persist-credentials: false
+
+      - name: Install benchmark dependencies
+        run: |
+          python3 -m pip install -r benchmark_v2/requirements.txt
+
+      - name: Reinstall transformers in edit mode
+        run: |
+          python3 -m pip uninstall -y transformers
+          python3 -m pip install -e ".[torch]"
+
+      - name: Show installed libraries and their versions
+        run: |
+          python3 -m pip list
+          python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
+          python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
+          python3 -c "import torch; print(f'CUDA device count: {torch.cuda.device_count()}')" || true
+          nvidia-smi || true
+
+      - name: Run benchmark v2
+        working-directory: benchmark_v2
+        env:
+          HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+          COMMIT_ID: ${{ inputs.commit_sha || github.sha }}
+          RUN_ID: ${{ inputs.run_id }}
+          BENCHMARK_REPO_ID: ${{ inputs.benchmark_repo_id }}
+          UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
+        run: |
+          echo "Running benchmarks"
+          python3 run_benchmarks.py \
+          --commit-id "$COMMIT_ID" \
+          --run-id "$RUN_ID" \
+          --push-to-hub "$BENCHMARK_REPO_ID" \
+          --token "$UPLOAD_TOKEN" \
+          --log-level INFO
--- a/.github/workflows/benchmark_v2_a10_caller.yml
+++ b/.github/workflows/benchmark_v2_a10_caller.yml
@@ -0,0 +1,20 @@
+name: Benchmark v2 Scheduled Runner - A10 Single-GPU
+
+on:
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  benchmark-v2-default:
+    name: Benchmark v2 - Default Models
+    uses: ./.github/workflows/benchmark_v2.yml
+    with:
+      runner: aws-g5-4xlarge-cache-use1-public-80
+      container_image: huggingface/transformers-all-latest-gpu
+      container_options: --gpus all --privileged --ipc host --shm-size "16gb"
+      commit_sha: ${{ github.sha }}
+      run_id: ${{ github.run_id }}
+      benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
+    secrets: inherit
--- a/.github/workflows/benchmark_v2_mi325_caller.yml
+++ b/.github/workflows/benchmark_v2_mi325_caller.yml
@@ -0,0 +1,20 @@
+name: Benchmark v2 Scheduled Runner - MI325 Single-GPU
+
+on:
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  benchmark-v2-default:
+    name: Benchmark v2 - Default Models
+    uses: ./.github/workflows/benchmark_v2.yml
+    with:
+      runner: amd-mi325-ci-1gpu
+      container_image: huggingface/transformers-pytorch-amd-gpu
+      container_options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache
+      commit_sha: ${{ github.sha }}
+      run_id: ${{ github.run_id }}
+      benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
+    secrets: inherit
--- a/.github/workflows/build-ci-docker-images.yml
+++ b/.github/workflows/build-ci-docker-images.yml
@@ -0,0 +1,84 @@
+name: Build pr ci-docker
+
+on:
+  push:
+    branches:
+      - push-ci-image # for now let's only build on this branch
+  repository_dispatch:
+  workflow_call:
+    inputs:
+      image_postfix:
+        required: true
+        type: string
+  schedule:
+    - cron: "6 0 * * *"
+
+
+concurrency:
+  group: ${{ github.workflow }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+jobs:
+  build:
+    runs-on: ubuntu-22.04
+
+    if: ${{ contains(github.event.head_commit.message, '[build-ci-image]') || contains(github.event.head_commit.message, '[push-ci-image]') && '!cancelled()' || github.event_name == 'schedule' }}
+
+    strategy:
+      matrix:
+        file: ["quality", "consistency", "custom-tokenizers", "torch-light", "exotic-models", "examples-torch"]
+    continue-on-error: true
+
+    steps:
+      -
+        name: Set tag
+        env:
+          COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
+        run: |
+              if echo "$COMMIT_MESSAGE" | grep -q '\[build-ci-image\]'; then
+                  echo "TAG=huggingface/transformers-${{ matrix.file }}:dev" >> "$GITHUB_ENV"
+                  echo "setting it to DEV!"
+              else
+                  echo "TAG=huggingface/transformers-${{ matrix.file }}" >> "$GITHUB_ENV"
+
+              fi
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3.12.0
+      -
+        name: Check out code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build ${{ matrix.file }}.dockerfile
+        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25  # v5.4.0
+        with:
+          context: ./docker
+          build-args: |
+            REF=${{ github.sha }}
+          file: "./docker/${{ matrix.file }}.dockerfile"
+          push: ${{ contains(github.event.head_commit.message, 'ci-image]') ||  github.event_name == 'schedule' }}
+          tags: ${{ env.TAG }}
+
+  notify:
+    runs-on: ubuntu-22.04
+    if: ${{ contains(github.event.head_commit.message, '[build-ci-image]') || contains(github.event.head_commit.message, '[push-ci-image]') && '!cancelled()' || github.event_name == 'schedule' }}
+    steps:
+      - name: Post to Slack
+        if: ${{ contains(github.event.head_commit.message, '[push-ci-image]') && github.event_name != 'schedule' }}
+        uses: huggingface/hf-workflows/.github/actions/post-slack@a88e7fa2eaee28de5a4d6142381b1fb792349b67  # main
+        with:
+          slack_channel: "#transformers-ci-circleci-images"
+          title: 🤗 New docker images for CircleCI are pushed.
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
--- a/.github/workflows/build-docker-images.yml
+++ b/.github/workflows/build-docker-images.yml
@@ -0,0 +1,321 @@
+name: Build docker images (scheduled)
+
+on:
+  push:
+    branches:
+      - build_ci_docker_image*
+  repository_dispatch:
+  workflow_dispatch:
+  workflow_call:
+    inputs:
+      image_postfix:
+        required: true
+        type: string
+  schedule:
+    - cron: "17 0 * * *"
+
+concurrency:
+  group: docker-images-builds
+  cancel-in-progress: false
+
+permissions:
+  contents: read
+
+jobs:
+  latest-docker:
+    name: "Latest PyTorch [dev]"
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
+        with:
+          context: ./docker/transformers-all-latest-gpu
+          build-args: |
+            REF=main
+          push: true
+          tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}
+
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@63657f571a92cc9759159442936061c51d6d9ae4 # main
+        with:
+          slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
+          title: 🤗 Results of the transformers-all-latest-gpu docker build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+
+  flash-attn-ci-image:
+    name: "PyTorch with Flash Attn [dev]"
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
+        with:
+          context: ./docker/transformers-all-latest-gpu
+          build-args: |
+            REF=main
+            PYTORCH=2.8.0
+            TORCHCODEC=0.7.0
+            FLASH_ATTN=yes
+          push: true
+          tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}:flash-attn
+
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@63657f571a92cc9759159442936061c51d6d9ae4 # main
+        with:
+          slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
+          title: 🤗 Results of the transformers-all-latest-gpu docker build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+
+  latest-torch-deepspeed-docker:
+    name: "Latest PyTorch + DeepSpeed"
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
+        with:
+          context: ./docker/transformers-pytorch-deepspeed-latest-gpu
+          build-args: |
+            REF=main
+          push: true
+          tags: huggingface/transformers-pytorch-deepspeed-latest-gpu${{ inputs.image_postfix }}
+
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@63657f571a92cc9759159442936061c51d6d9ae4 # main
+        with:
+          slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER}}
+          title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu docker build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+
+  doc-builder:
+    name: "Doc builder"
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
+        with:
+          context: ./docker/transformers-doc-builder
+          push: true
+          tags: huggingface/transformers-doc-builder
+
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@63657f571a92cc9759159442936061c51d6d9ae4 # main
+        with:
+          slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
+          title: 🤗 Results of the huggingface/transformers-doc-builder docker build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+
+  latest-pytorch-amd:
+    name: "Latest PyTorch (AMD) [dev]"
+    runs-on:
+      group: aws-highcpu-32-priv
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
+        with:
+          context: ./docker/transformers-pytorch-amd-gpu
+          build-args: |
+            REF=main
+          push: true
+          tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
+
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@63657f571a92cc9759159442936061c51d6d9ae4 # main
+        with:
+          slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
+          title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+
+  cache-latest-pytorch-amd:
+    name: "Cache Latest Pytorch (AMD) Image"
+    needs: latest-pytorch-amd
+    runs-on:
+      group: amd-mi325-1gpu
+    steps:
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+        
+      - 
+        name: Pull and save docker image to cache
+        run: |
+          image="huggingface/transformers-pytorch-amd-gpu"
+          final_path="/mnt/image-cache/transformers-pytorch-amd-gpu.tar"
+          tmp_path="${final_path}.tmp"
+
+          echo "Pulling image: ${image}"
+          docker pull "${image}"
+
+          echo "Saving to temp file: ${tmp_path}"
+          docker save "${image}" -o "${tmp_path}"
+
+          echo "Moving to final path: ${final_path}"
+          mv -f "${tmp_path}" "${final_path}"
+
+          echo "Cache populated successfully at ${final_path}"
+
+  latest-pytorch-deepspeed-amd:
+    name: "PyTorch + DeepSpeed (AMD) [dev]"
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
+        with:
+          context: ./docker/transformers-pytorch-deepspeed-amd-gpu
+          build-args: |
+            REF=main
+          push: true
+          tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
+
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@63657f571a92cc9759159442936061c51d6d9ae4 # main
+        with:
+          slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
+          title: 🤗 Results of the transformers-pytorch-deepspeed-amd-gpu build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+
+  latest-quantization-torch-docker:
+    name: "Latest Pytorch + Quantization [dev]"
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3.7.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
+        with:
+          context: ./docker/transformers-quantization-latest-gpu
+          build-args: |
+            REF=main
+          push: true
+          tags: huggingface/transformers-quantization-latest-gpu${{ inputs.image_postfix }}
+
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@63657f571a92cc9759159442936061c51d6d9ae4 # main
+        with:
+          slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
+          title: 🤗 Results of the transformers-quantization-latest-gpu build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
--- a/.github/workflows/build-nightly-ci-docker-images.yml
+++ b/.github/workflows/build-nightly-ci-docker-images.yml
@@ -0,0 +1,80 @@
+name: Build docker images (Nightly CI)
+
+on:
+  workflow_call:
+    inputs:
+      job:
+        required: true
+        type: string
+  push:
+    branches:
+      - build_nightly_ci_docker_image*
+
+concurrency:
+  group: docker-images-builds
+  cancel-in-progress: false
+
+permissions:
+  contents: read
+
+jobs:
+  latest-with-torch-nightly-docker:
+    name: "Nightly PyTorch"
+    if: inputs.job == 'latest-with-torch-nightly-docker' || inputs.job == ''
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@885d1462b80bc1c1c7f0b00334ad271f09369c55 # v2.10.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@465a07811f14bebb1938fbed4728c6a1ff8901fc # v2.2.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@1104d471370f9806843c095c1db02b5a90c5f8b6 # v3.3.1
+        with:
+          context: ./docker/transformers-all-latest-gpu
+          build-args: |
+            REF=main
+            PYTORCH=pre
+          push: true
+          tags: huggingface/transformers-all-latest-torch-nightly-gpu
+
+  nightly-torch-deepspeed-docker:
+    name: "Nightly PyTorch + DeepSpeed"
+    if: inputs.job == 'nightly-torch-deepspeed-docker' || inputs.job == ''
+    runs-on:
+      group: aws-g4dn-2xlarge-cache
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@885d1462b80bc1c1c7f0b00334ad271f09369c55 # v2.10.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@465a07811f14bebb1938fbed4728c6a1ff8901fc # v2.2.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@1104d471370f9806843c095c1db02b5a90c5f8b6 # v3.3.1
+        with:
+          context: ./docker/transformers-pytorch-deepspeed-nightly-gpu
+          build-args: |
+            REF=main
+          push: true
+          tags: huggingface/transformers-pytorch-deepspeed-nightly-gpu
--- a/.github/workflows/build-past-ci-docker-images.yml
+++ b/.github/workflows/build-past-ci-docker-images.yml
@@ -0,0 +1,108 @@
+name: Build docker images (Past CI)
+
+on:
+  push:
+    branches:
+      - build_past_ci_docker_image*
+
+concurrency:
+  group: docker-images-builds
+  cancel-in-progress: false
+
+permissions:
+  contents: read
+
+jobs:
+  past-pytorch-docker:
+    name: "Past PyTorch Docker"
+    strategy:
+      fail-fast: false
+      matrix:
+        version: ["1.13", "1.12", "1.11"]
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@885d1462b80bc1c1c7f0b00334ad271f09369c55 # v2.10.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        id: get-base-image
+        name: Get Base Image
+        env:
+          framework_version: ${{ matrix.version }}
+        run: |
+          echo "base_image=$(python3 -c 'import os; from utils.past_ci_versions import past_versions_testing; base_image = past_versions_testing["pytorch"][os.environ["framework_version"]]["base_image"]; print(base_image)')" >> $GITHUB_OUTPUT
+      -
+        name: Print Base Image
+        run: |
+          echo ${{ steps.get-base-image.outputs.base_image }}
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@465a07811f14bebb1938fbed4728c6a1ff8901fc # v2.2.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@1104d471370f9806843c095c1db02b5a90c5f8b6 # v3.3.1
+        with:
+          context: ./docker/transformers-past-gpu
+          build-args: |
+            REF=main
+            BASE_DOCKER_IMAGE=${{ steps.get-base-image.outputs.base_image }}
+            FRAMEWORK=pytorch
+            VERSION=${{ matrix.version }}
+          push: true
+          tags: huggingface/transformers-pytorch-past-${{ matrix.version }}-gpu
+
+  past-tensorflow-docker:
+    name: "Past TensorFlow Docker"
+    strategy:
+      fail-fast: false
+      matrix:
+        version: ["2.11", "2.10", "2.9", "2.8", "2.7", "2.6", "2.5"]
+    runs-on:
+      group: aws-general-8-plus
+    steps:
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@885d1462b80bc1c1c7f0b00334ad271f09369c55 # v2.10.0
+      -
+        name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      -
+        id: get-base-image
+        name: Get Base Image
+        env:
+          framework_version: ${{ matrix.version }}
+        run: |
+          echo "base_image=$(python3 -c 'import os; from utils.past_ci_versions import past_versions_testing; base_image = past_versions_testing["tensorflow"][os.environ["framework_version"]]["base_image"]; print(base_image)')" >> $GITHUB_OUTPUT
+      -
+        name: Print Base Image
+        run: |
+          echo ${{ steps.get-base-image.outputs.base_image }}
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@465a07811f14bebb1938fbed4728c6a1ff8901fc # v2.2.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+      -
+        name: Build and push
+        uses: docker/build-push-action@1104d471370f9806843c095c1db02b5a90c5f8b6 # v3.3.1
+        with:
+          context: ./docker/transformers-past-gpu
+          build-args: |
+            REF=main
+            BASE_DOCKER_IMAGE=${{ steps.get-base-image.outputs.base_image }}
+            FRAMEWORK=tensorflow
+            VERSION=${{ matrix.version }}
+          push: true
+          tags: huggingface/transformers-tensorflow-past-${{ matrix.version }}-gpu
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -0,0 +1,38 @@
+name: Build documentation
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - main
+      - doc-builder*
+      - v*-release
+      - use_templates
+
+permissions:
+  contents: read
+
+jobs:
+   build:
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@2430c1ec91d04667414e2fa31ecfc36c153ea391  # main
+    with:
+      commit_sha: ${{ github.sha }}
+      package: transformers
+      notebook_folder: transformers_doc
+      languages: en
+      custom_container: huggingface/transformers-doc-builder
+    secrets:
+      token: ${{ secrets.HUGGINGFACE_PUSH }}
+      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
+
+   build_other_lang:
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@2430c1ec91d04667414e2fa31ecfc36c153ea391  # main
+    with:
+      commit_sha: ${{ github.sha }}
+      package: transformers
+      notebook_folder: transformers_doc
+      languages: ar de es fr hi it ja ko pt ro tr zh
+      custom_container: huggingface/transformers-doc-builder
+    secrets:
+      token: ${{ secrets.HUGGINGFACE_PUSH }}
+      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
@@ -0,0 +1,42 @@
+name: Build PR Documentation
+
+on:
+  pull_request:
+  merge_group:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+jobs:
+  build:
+    if: github.event_name == 'pull_request'
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@90b4ee2c10b81b5c1a6367c4e6fc9e2fb510a7e3  # main
+    with:
+      commit_sha: ${{ github.event.pull_request.head.sha }}
+      pr_number: ${{ github.event.number }}
+      package: transformers
+      languages: en
+
+  # Satisfy required check in merge queue without actually building docs
+  skip_merge_queue:
+    if: github.event_name == 'merge_group'
+    runs-on: ubuntu-latest
+    steps:
+      - run: echo "Skipping doc build in merge queue"
+
+  doc_build_status_check:
+    needs: [build, skip_merge_queue]
+    if: always()
+    runs-on: ubuntu-latest
+    steps:
+      - run: |
+          if [[ "${{ needs.build.result }}" == "success" || "${{ needs.build.result }}" == "skipped" ]] && \
+             [[ "${{ needs.skip_merge_queue.result }}" == "success" || "${{ needs.skip_merge_queue.result }}" == "skipped" ]]; then
+            echo "OK"
+          else
+            exit 1
+          fi
--- a/.github/workflows/check-workflow-permissions.yml
+++ b/.github/workflows/check-workflow-permissions.yml
@@ -0,0 +1,26 @@
+---
+name: Check Permissions Advisor
+
+on:
+  workflow_dispatch:
+    inputs:
+      workflow_name:
+        description: 'Workflow file name'
+        type: string
+      run_count:
+        description: 'Number of runs to analyze'
+        type: string
+        default: "10"
+
+permissions:
+  contents: read
+
+jobs:
+  advisor:
+    uses: huggingface/security-workflows/.github/workflows/permissions-advisor-reusable.yml@1b6a139c28db347498b30338da6a602e0a06f56c  # main
+    permissions:
+      actions: read
+      contents: read
+    with:
+      workflow_name: ${{ inputs.workflow_name }}
+      run_count: ${{ fromJSON(inputs.run_count) }}
--- a/.github/workflows/check_failed_tests.yml
+++ b/.github/workflows/check_failed_tests.yml
@@ -0,0 +1,414 @@
+name: Process failed tests
+
+on:
+  workflow_call:
+    inputs:
+      docker:
+        required: true
+        type: string
+      job:
+        required: true
+        type: string
+      slack_report_channel:
+        required: true
+        type: string
+      ci_event:
+        required: true
+        type: string
+      report_repo_id:
+        required: true
+        type: string
+      commit_sha:
+        required: false
+        type: string
+      pr_number:
+        required: false
+        type: string
+      max_num_runners:
+        required: false
+        type: number
+        default: 4
+    outputs:
+      is_check_failures_ok:
+        description: "Whether the failure checking infrastructure succeeded"
+        value: ${{ jobs.check_new_failures.result != 'failure' && jobs.process_new_failures_with_commit_info.result != 'failure' }}
+
+env:
+  HF_HOME: /mnt/cache
+  TRANSFORMERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+  # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
+  # This token is created under the bot `hf-transformers-bot`.
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  TF_FORCE_GPU_ALLOW_GROWTH: true
+  CUDA_VISIBLE_DEVICES: 0,1
+
+
+permissions:
+  contents: read
+
+jobs:
+  setup_check_new_failures:
+    name: "Setup matrix for finding commits"
+    runs-on: ubuntu-22.04
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+      n_runners: ${{ steps.set-matrix.outputs.n_runners }}
+      process: ${{ steps.set-matrix.outputs.process }}
+    steps:
+      - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        continue-on-error: true
+        with:
+          name: ci_results_${{ inputs.job }}
+          path: ci_results_${{ inputs.job }}
+
+      - name: Set matrix
+        id: set-matrix
+        env:
+          job: ${{ inputs.job }}
+          max_num_runners: ${{ inputs.max_num_runners }}
+        run: |
+          python3 - << 'EOF'
+          import json, os, math
+
+          print("Script started")
+
+          job = os.environ["job"]
+          filepath = f"ci_results_{job}/new_failures.json"
+
+          print(f"Looking for file: {filepath}")
+          print(f"File exists: {os.path.isfile(filepath)}")
+
+          if not os.path.isfile(filepath):
+              print("File not found, setting process=false")
+              with open(os.environ["GITHUB_OUTPUT"], "a") as f:
+                  f.write("process=false\n")
+              exit(0)
+
+          with open(filepath) as f:
+              reports = json.load(f)
+
+          print(f"Loaded reports with {len(reports)} models")
+
+          n_tests = sum(
+              len(model_data.get("failures", model_data).get("single-gpu", []))
+              for model_data in reports.values()
+          )
+
+          print(f"n_tests: {n_tests}")
+
+          max_num_runners = int(os.environ["max_num_runners"])
+
+          TESTS_PER_RUNNER = 10
+          n_runners = max(1, min(max_num_runners, math.ceil(n_tests / TESTS_PER_RUNNER)))
+
+          print(f"n_runners: {n_runners}")
+
+          with open(os.environ["GITHUB_OUTPUT"], "a") as f:
+              f.write(f"matrix={json.dumps(list(range(n_runners)))}\n")
+              f.write(f"n_runners={n_runners}\n")
+              f.write("process=true\n")
+
+          print("Done")
+          EOF
+
+
+  check_new_failures:
+    name: "Find commits for new failing tests"
+    needs: setup_check_new_failures
+    if: needs.setup_check_new_failures.outputs.process == 'true'
+    strategy:
+      matrix:
+        run_idx: ${{ fromJson(needs.setup_check_new_failures.outputs.matrix) }}
+    runs-on:
+      group: aws-g5-4xlarge-cache
+    outputs:
+      process: ${{ needs.setup_check_new_failures.outputs.process }}
+    container:
+      image: ${{ inputs.docker }}
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    steps:
+      - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        with:
+          name: ci_results_${{ inputs.job }}
+          path: /transformers/ci_results_${{ inputs.job }}
+
+      - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        env:
+          ACTIONS_ARTIFACT_MAX_ARTIFACT_COUNT: 2000
+        with:
+          pattern: setup_values*
+          path: setup_values
+          merge-multiple: true
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Prepare some setup values
+        run: |
+          if [ -f setup_values/prev_workflow_run_id.txt ]; then
+            echo "PREV_WORKFLOW_RUN_ID=$(cat setup_values/prev_workflow_run_id.txt)" >> $GITHUB_ENV
+          else
+            echo "PREV_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
+          fi
+
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: |
+          git fetch origin "$commit_sha" && git checkout "$commit_sha"
+
+      - name: Get `START_SHA`
+        working-directory: /transformers/utils
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: |
+          echo "START_SHA=$commit_sha" >> $GITHUB_ENV
+
+      # This is used if the CI is triggered from a pull request `self-comment-ci.yml` (after security check is verified)
+      - name: Extract the base commit on `main` (of the merge commit created by Github) if it is a PR
+        id: pr_info
+        if: ${{ inputs.pr_number != '' }}
+        uses: actions/github-script@d7906e4ad0b1822421a7e6a35d5ca353c962f410 # v6.4.1
+        env:
+          PR_NUMBER: ${{ inputs.pr_number }}
+          COMMIT_SHA: ${{ inputs.commit_sha }}
+        with:
+          script: |
+            const pull_number = parseInt(process.env.PR_NUMBER, 10);
+            const commit_sha = process.env.COMMIT_SHA;
+
+            const { data: pr } = await github.rest.pulls.get({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              pull_number,
+            });
+
+            const { data: merge_commit } = await github.rest.repos.getCommit({
+              owner: pr.base.repo.owner.login,
+              repo: pr.base.repo.name,
+              ref: commit_sha,
+            });
+
+            core.setOutput('merge_commit_base_sha', merge_commit.parents[0].sha);
+
+      # Usually, `END_SHA` should be the commit of the last previous workflow run of the **SAME** (scheduled) workflow.
+      # (This is why we don't need to specify `workflow_id` which would be fetched automatically in the python script.)
+      - name: Get `END_SHA` from previous CI runs of the same workflow
+        working-directory: /transformers/utils
+        if: ${{ inputs.pr_number == '' }}
+        env:
+          ACCESS_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
+        run: |
+          echo "END_SHA=$(TOKEN="$ACCESS_TOKEN" python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"], workflow_run_id=os.environ["PREV_WORKFLOW_RUN_ID"]); print(commit)')" >> $GITHUB_ENV
+
+      # However, for workflow runs triggered by `issue_comment` (for pull requests), we want to check against the
+      # parent commit (on `main`) of the `merge_commit` (dynamically created by GitHub). In this case, the goal is to
+      # see if a reported failing test is actually ONLY failing on the `merge_commit`.
+      - name: Set `END_SHA`
+        if: ${{ inputs.pr_number != '' }}
+        env:
+          merge_commit_base_sha: ${{ steps.pr_info.outputs.merge_commit_base_sha }}
+        run: |
+          echo "END_SHA=$merge_commit_base_sha" >> $GITHUB_ENV
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Environment
+        working-directory: /transformers
+        run: |
+          python3 utils/print_env.py
+
+      - name: Install pytest-flakefinder
+        run: python3 -m pip install pytest-flakefinder
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - name: Check failed tests
+        working-directory: /transformers
+        env:
+          job: ${{ inputs.job }}
+          n_runners: ${{ needs.setup_check_new_failures.outputs.n_runners }}
+          run_idx: ${{ matrix.run_idx }}
+          pr_number: ${{ inputs.pr_number }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: python3 utils/check_bad_commit.py --start_commit "$START_SHA" --end_commit "$END_SHA" --file "ci_results_${job}/new_failures.json" --output_file "new_failures_with_bad_commit_${job}_${run_idx}.json"
+
+      - name: Show results
+        working-directory: /transformers
+        env:
+          job: ${{ inputs.job }}
+          run_idx: ${{ matrix.run_idx }}
+        run: |
+          ls -l "new_failures_with_bad_commit_${job}_${run_idx}.json"
+          cat "new_failures_with_bad_commit_${job}_${run_idx}.json"
+
+      - name: Upload artifacts
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}
+          path: /transformers/new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}.json
+
+  process_new_failures_with_commit_info:
+    name: "process bad commit reports"
+    needs: check_new_failures
+    if: needs.check_new_failures.outputs.process == 'true'
+    runs-on:
+      group: aws-g5-4xlarge-cache
+    container:
+      image: ${{ inputs.docker }}
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    steps:
+      - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        with:
+          name: ci_results_${{ inputs.job }}
+          path: /transformers/ci_results_${{ inputs.job }}
+
+      - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        env:
+          ACTIONS_ARTIFACT_MAX_ARTIFACT_COUNT: 2000
+        with:
+          pattern: new_failures_with_bad_commit_${{ inputs.job }}*
+          path: /transformers/new_failures_with_bad_commit_${{ inputs.job }}
+          merge-multiple: true
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Check files
+        working-directory: /transformers
+        env:
+          job: ${{ inputs.job }}
+        run: |
+          ls -la /transformers
+          ls -la "/transformers/new_failures_with_bad_commit_${job}"
+
+      # Currently, we only run with a single runner by using `run_idx: [1]`. We might try to run with multiple runners
+      # to further reduce the false positive caused by flaky tests, which requires further processing to merge reports.
+      - name: Merge files
+        shell: bash
+        working-directory: /transformers
+        env:
+          job: ${{ inputs.job }}
+        run: |
+          python3 - << 'EOF'
+          import json
+          import glob
+          import os
+
+          job = os.environ["job"]
+          pattern = f"/transformers/new_failures_with_bad_commit_{job}/new_failures_with_bad_commit_{job}_*.json"
+          files = sorted(glob.glob(pattern))
+
+          if not files:
+              print(f"No files found matching: {pattern}")
+              exit(1)
+
+          print(f"Found {len(files)} file(s) to merge: {files}")
+
+          merged = {}
+          for filepath in files:
+              with open(filepath) as f:
+                  data = json.load(f)
+
+              for model, model_results in data.items():
+                  if model not in merged:
+                      merged[model] = {}
+                  for gpu_type, failures in model_results.items():
+                      if gpu_type not in merged[model]:
+                          merged[model][gpu_type] = []
+                      merged[model][gpu_type].extend(failures)
+
+              print(f"filepath: {filepath}")
+              print(len(data))
+
+          output_path = "/transformers/new_failures_with_bad_commit.json"
+          with open(output_path, "w") as f:
+              json.dump(merged, f, indent=4)
+
+          print(f"Merged {len(files)} file(s) into {output_path}")
+          print(f"n_items: {len(merged)}")
+          print(merged)
+          EOF
+
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: |
+          git fetch origin "$commit_sha" && git checkout "$commit_sha"
+
+      - name: Process report
+        shell: bash
+        working-directory: /transformers
+        env:
+          ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
+          TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
+          JOB_NAME: ${{ inputs.job }}
+          REPORT_REPO_ID: ${{ inputs.report_repo_id }}
+        run: |
+          {
+            echo 'REPORT_TEXT<<EOF'
+            python3 utils/process_bad_commit_report.py
+            echo EOF
+          } >> "$GITHUB_ENV"
+
+      - name: Show results
+        working-directory: /transformers
+        run: |
+          ls -l new_failures_with_bad_commit.json
+          cat new_failures_with_bad_commit.json
+
+      - name: Upload artifacts
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: new_failures_with_bad_commit_${{ inputs.job }}
+          path: |
+            /transformers/new_failures_with_bad_commit.json
+            /transformers/new_failures_with_bad_commit_url.txt
+
+      - name: Prepare Slack report title
+        working-directory: /transformers
+        env:
+          ci_event: ${{ inputs.ci_event }}
+          job: ${{ inputs.job }}
+        run: |
+          pip install slack_sdk
+          echo "title=$(python3 -c 'import sys; import os; sys.path.append("utils"); from utils.notification_service import job_to_test_map; ci_event = os.environ["ci_event"]; job = os.environ["job"]; test_name = job_to_test_map[job]; title = f"New failed tests of {ci_event}" + ":" + f" {test_name}"; print(title)')" >> $GITHUB_ENV
+
+      - name: Send processed report
+        if: ${{ !endsWith(env.REPORT_TEXT, '{}') }}
+        uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+        with:
+          # Slack channel id, channel name, or user id to post message.
+          # See also: https://api.slack.com/methods/chat.postMessage#channels
+          channel-id: '#${{ inputs.slack_report_channel }}'
+          # For posting a rich message using Block Kit
+          payload: |
+            {
+              "blocks": [
+                {
+                  "type": "header",
+                  "text": {
+                    "type": "plain_text",
+                    "text": "${{ env.title }}"
+                  }
+                },
+                {
+                  "type": "section",
+                  "text": {
+                    "type": "mrkdwn",
+                    "text": "${{ env.REPORT_TEXT }}"
+                  }
+                }
+              ]
+            }
+        env:
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
--- a/.github/workflows/check_tiny_models.yml
+++ b/.github/workflows/check_tiny_models.yml
@@ -0,0 +1,63 @@
+name: Check Tiny Models
+
+on:
+  push:
+    branches:
+      - check_tiny_models*
+  repository_dispatch:
+  schedule:
+    - cron: "0 2 * * *"
+
+env:
+  TOKEN: ${{ secrets.TRANSFORMERS_HUB_BOT_HF_TOKEN }}
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+
+permissions:
+  contents: read
+
+jobs:
+  check_tiny_models:
+    name: Check tiny models
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Checkout transformers
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: 2
+          persist-credentials: false
+
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      - name: Set up Python 3.10
+        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
+        with:
+          # Semantic version range syntax or exact version of a Python version
+          python-version: '3.10'
+          # Optional - x64 or x86 architecture, defaults to x64
+          architecture: 'x64'
+
+      - name: Install
+        run: |
+          sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng cmake
+          pip install --upgrade pip
+          python -m pip install -U .[dev]
+          # For `FastSpeech2ConformerConfig` and `FastSpeech2ConformerWithHifiGanConfig`
+          python -m pip install g2p-en
+          # For `Pop2PianoConfig`
+          python -m pip install essentia==2.1b6.dev1034
+          # For `mamba` type models to avoid the error `self._specs = frozenset(specifiers) #TypeError: 'int' object is not iterable`
+          python -m pip install -U "kernels>=0.12.2" "einops"
+          # For `Pop2PianoProcessor` (otherwise multiprocess error with `Placeholder` unpickable issue)
+          python -m pip install -U "pretty_midi"
+
+      - name: Create all tiny models (locally)
+        run: |
+          python utils/create_dummy_models.py tiny_local_models --all --num_workers 4
+
+      - name: Local tiny model reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: tiny_local_model_creation_reports
+          path: tiny_local_models/reports
--- a/.github/workflows/circleci-failure-summary-comment.yml
+++ b/.github/workflows/circleci-failure-summary-comment.yml
@@ -0,0 +1,263 @@
+name: CircleCI Failure Summary Comment
+
+on:
+  pull_request_target:
+    types: [opened, synchronize, reopened]
+
+permissions:
+  contents: read
+
+jobs:
+  comment:
+    runs-on: ubuntu-22.04
+    permissions:
+      pull-requests: write
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+
+      - name: Setup Python
+        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5.6.0
+        with:
+          python-version: "3.13"
+
+      - name: Install dependencies
+        run: python -m pip install huggingface_hub
+
+      - name: Wait for CircleCI check suite completion
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          COMMIT_SHA: ${{ github.event.pull_request.head.sha }}
+          GITHUB_REPOSITORY: ${{ github.repository }}
+        run: |
+          # Exit on error, undefined variables, or pipe failures
+          set -euo pipefail
+          
+          echo "Waiting for CircleCI check suite to complete..."
+          # Timeout after 30 minutes (1800 seconds)
+          end=$((SECONDS + 1800))
+          
+          while [ $SECONDS -lt $end ]; do
+            # Query GitHub API for check suites associated with this commit
+            # || echo "" allows retry on transient API failures instead of exiting
+            suite_json=$(gh api "repos/${GITHUB_REPOSITORY}/commits/${COMMIT_SHA}/check-suites" \
+              --jq '.check_suites[] | select(.app.slug == "circleci-checks")' || echo "")
+            
+            if [ -z "$suite_json" ]; then
+              echo "CircleCI check suite not found yet, retrying..."
+            else
+              status=$(echo "$suite_json" | jq -r '.status')
+              conclusion=$(echo "$suite_json" | jq -r '.conclusion // empty')
+              echo "CircleCI status: $status, conclusion: $conclusion"
+              
+              # Check suite is done when status is "completed" AND conclusion is set
+              if [ "$status" = "completed" ] && [ -n "$conclusion" ]; then
+                echo "Check suite completed successfully"
+                exit 0
+              fi
+            fi
+            
+            # Poll every 20 seconds
+            sleep 20
+          done
+    
+          echo "ERROR: Timed out waiting for CircleCI check suite"
+          exit 1
+
+      - name: Get CircleCI run's artifacts and upload them to Hub
+        id: circleci
+        env:
+          COMMIT_SHA: ${{ github.event.pull_request.head.sha }}
+          REPO: ${{ github.repository }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          # Step 1: Get CircleCI check suite ID
+          echo "Getting check suites for commit ${COMMIT_SHA}..."
+          check_suites=$(curl -s -H "Authorization: token ${GITHUB_TOKEN}" \
+            "https://api.github.com/repos/${REPO}/commits/${COMMIT_SHA}/check-suites")
+          
+          circleci_suite_id=$(echo "$check_suites" | jq -r '.check_suites[] | select(.app.slug == "circleci-checks") | .id' | head -n 1)
+          echo "CircleCI check suite ID: ${circleci_suite_id}"
+          
+          # Step 2: Get check runs from the CircleCI suite
+          echo "Getting check runs for suite ${circleci_suite_id}..."
+          check_runs=$(curl -s -H "Authorization: token ${GITHUB_TOKEN}" \
+            "https://api.github.com/repos/${REPO}/check-suites/${circleci_suite_id}/check-runs")
+          
+          # Step 3: Extract workflow ID from the "run_tests" check run
+          workflow_id=$(echo "$check_runs" | jq -r '.check_runs[] | select(.name == "run_tests") | .details_url' | grep -oP 'workflows/\K[a-f0-9-]+')
+          echo "CircleCI Workflow ID: ${workflow_id}"
+          
+          # Step 4: Get all jobs in the workflow
+          echo "Getting jobs for workflow ${workflow_id}..."
+          jobs=$(curl -s \
+            "https://circleci.com/api/v2/workflow/${workflow_id}/job")
+          
+          # Step 5: Extract collection_job details
+          # "first // empty": if collection_job is absent or has job_number=null, jq outputs nothing
+          # (empty string). Without this, jq -r would output the literal string "null", making
+          # [ -z "$collection_job_number" ] false and bypassing the early-exit check below.
+          collection_job_number=$(echo "$jobs" | jq -r '[.items[] | select(.name == "collection_job") | .job_number] | first // empty')
+          collection_job_id=$(echo "$jobs" | jq -r '[.items[] | select(.name == "collection_job") | .id] | first // empty')
+          echo "CircleCI Collection job number: ${collection_job_number}"
+          echo "CircleCI Collection job ID: ${collection_job_id}"
+
+          # When only the "empty" job ran (no tests selected), there is no collection_job.
+          # Exit gracefully to avoid curl hitting a broken URL.
+          if [ -z "$collection_job_number" ]; then
+            echo "No collection_job found (only empty job ran - no tests were selected). Skipping."
+            echo "artifact_found=false" >> $GITHUB_OUTPUT
+            exit 0
+          fi
+          
+          # Step 6: Get artifacts list
+          echo "Getting artifacts for job ${collection_job_number}..."
+          artifacts=$(curl -s \
+            "https://circleci.com/api/v2/project/gh/${REPO}/${collection_job_number}/artifacts")
+          
+          # Print for debugging; "|| true" prevents failure if the response is not valid JSON.
+          echo "$artifacts" | jq '.' || true
+          
+          # Step 7: Download failure_summary.json specifically
+          # .items // []  : use empty array if .items is null (avoids "Cannot iterate over null")
+          # .[]           : iterate over each artifact object in the array
+          # select(...)   : keep only the artifact whose .path matches
+          # | .url        : extract the download URL from the matched artifact
+          # first // empty: take the first match; outputs nothing (not "null") if no match found
+          failure_summary_url=$(echo "$artifacts" | jq -r '[.items // [] | .[] | select(.path == "outputs/failure_summary.json") | .url] | first // empty')
+          
+          if [ -z "$failure_summary_url" ]; then
+            echo "failure_summary.json not found in artifacts - PR may not have latest main merged. Skipping."
+            echo "artifact_found=false" >> $GITHUB_OUTPUT
+            exit 0
+          fi
+          
+          echo "Downloading failure_summary.json from: ${failure_summary_url}"
+          mkdir -p outputs
+          curl -s -L "${failure_summary_url}" -o outputs/failure_summary.json
+          ls -la outputs
+          
+          echo "Downloaded failure_summary.json successfully"
+          
+          # Verify the file was downloaded
+          if [ ! -f outputs/failure_summary.json ]; then
+            echo "Failed to download failure_summary.json - skipping."
+            echo "artifact_found=false" >> $GITHUB_OUTPUT
+            exit 0
+          fi
+          
+          echo "File size: $(wc -c < outputs/failure_summary.json) bytes"
+          
+          # Export variables for next steps
+          echo "artifact_found=true" >> $GITHUB_OUTPUT
+          echo "workflow_id=${workflow_id}" >> $GITHUB_OUTPUT
+          echo "collection_job_number=${collection_job_number}" >> $GITHUB_OUTPUT
+
+      - name: Upload summaries to Hub
+        if: steps.circleci.outputs.artifact_found == 'true'
+        env:
+          HF_TOKEN: ${{ secrets.HF_CI_WRITE_TOKEN }}
+          CIRCLECI_RESULTS_DATASET_ID: "transformers-community/circleci-test-results"
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          COMMIT_SHA: ${{ github.event.pull_request.head.sha }}
+        run: |
+          python << 'EOF'
+          import os
+          from pathlib import Path
+          from huggingface_hub import HfApi
+          
+          # Setup paths
+          pr_number = os.environ["PR_NUMBER"]
+          commit_short = os.environ["COMMIT_SHA"][:12]
+          folder_path = f"pr-{pr_number}/sha-{commit_short}"
+          
+          # Create folder and move file
+          Path(folder_path).mkdir(parents=True, exist_ok=True)
+          Path("outputs/failure_summary.json").rename(f"{folder_path}/failure_summary.json")
+          
+          # Upload to Hub
+          dataset_id = os.environ["CIRCLECI_RESULTS_DATASET_ID"]
+          api = HfApi(token=os.environ["HF_TOKEN"])
+          api.upload_folder(
+              commit_message=f"Update CircleCI artifacts for PR {pr_number} ({commit_short})",
+              folder_path=folder_path,
+              path_in_repo=folder_path,
+              repo_id=dataset_id,
+              repo_type="dataset",
+          )
+          
+          print(f"Uploaded {folder_path} to {dataset_id}")
+          EOF
+
+      - name: Delete existing CircleCI summary comments
+        if: steps.circleci.outputs.artifact_found == 'true'
+        env:
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+        uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b  # v7.1.0
+        with:
+          script: |
+            const PR_NUMBER = parseInt(process.env.PR_NUMBER, 10);
+            
+            // Get all comments on the PR
+            const { data: comments } = await github.rest.issues.listComments({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: PR_NUMBER
+            });
+            
+            // Find existing bot comments that start with "View the CircleCI Test Summary for this PR:"
+            const existingComments = comments.filter(comment => 
+              comment.user.login === 'github-actions[bot]' && 
+              comment.body.startsWith('View the CircleCI Test Summary for this PR:')
+            );
+
+            // Delete all matching comments
+            for (const comment of existingComments) {
+              console.log(`Deleting comment #${comment.id}`);
+              await github.rest.issues.deleteComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                comment_id: comment.id
+              });
+            }
+            
+            console.log(`Deleted ${existingComments.length} old CircleCI summary comment(s)`);
+
+      - name: Post comment with helper link
+        if: steps.circleci.outputs.artifact_found == 'true'
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_REPOSITORY: ${{ github.repository }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          PR_SHA: ${{ github.event.pull_request.head.sha }}
+        run: |
+          COMMIT_SHORT="${PR_SHA:0:12}"
+          SUMMARY_FILE="pr-${PR_NUMBER}/sha-${COMMIT_SHORT}/failure_summary.json"
+          
+          if [ ! -f "$SUMMARY_FILE" ]; then
+            echo "failure_summary.json missing, skipping comment."
+            exit 0
+          fi
+          
+          failures=$(jq '.failures | length' "$SUMMARY_FILE")
+          if [ "$failures" -eq 0 ]; then
+            echo "No failures detected, skipping PR comment."
+            exit 0
+          fi
+          
+          # Build Space URL with encoded parameters
+          repo_enc=$(jq -rn --arg v "$GITHUB_REPOSITORY" '$v|@uri')
+          pr_enc=$(jq -rn --arg v "$PR_NUMBER" '$v|@uri')
+          sha_short="${PR_SHA:0:6}"
+          sha_enc=$(jq -rn --arg v "$sha_short" '$v|@uri')
+          SPACE_URL="https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=${pr_enc}&sha=${sha_enc}"
+
+          # Post comment (using printf for proper newlines)
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/comments" \
+            -f body="$(printf "View the CircleCI Test Summary for this PR:\n\n%s" "$SPACE_URL")"
--- a/.github/workflows/codeql.yml
+++ b/.github/workflows/codeql.yml
@@ -0,0 +1,26 @@
+---
+name: CodeQL Security Analysis
+
+on:
+  push:
+    branches: ["main", "fix_security_issue_*"]
+  # pull_request:
+  #   branches: ["main"]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  codeql:
+    name: CodeQL Analysis
+    uses: huggingface/security-workflows/.github/workflows/codeql-reusable.yml@1b6a139c28db347498b30338da6a602e0a06f56c  # main
+    permissions:
+      security-events: write
+      packages: read
+      actions: read
+      contents: read
+    with:
+      languages: '["actions"]'
+      queries: 'security-extended,security-and-quality'
+      runner: 'ubuntu-latest'
--- a/.github/workflows/collated-reports.yml
+++ b/.github/workflows/collated-reports.yml
@@ -0,0 +1,52 @@
+name: CI collated reports
+
+on:
+  workflow_call:
+    inputs:
+      job:
+        required: true
+        type: string
+      report_repo_id:
+        required: true
+        type: string
+      machine_type:
+        required: true
+        type: string
+      gpu_name:
+        description: Name of the GPU used for the job. Its enough that the value contains the name of the GPU, e.g. "noise-h100-more-noise". Case insensitive.
+        required: true
+        type: string
+
+permissions:
+  contents: read
+
+jobs:
+  collated_reports:
+    name: Collated reports
+    runs-on: ubuntu-22.04
+    if: always()
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      - uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+
+      - name: Collated reports
+        shell: bash
+        env:
+          ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
+          CI_SHA: ${{ github.sha }}
+          TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
+          MACHINE_TYPE: ${{ inputs.machine_type }}
+          JOB: ${{ inputs.job }}
+          REPORT_REPO_ID: ${{ inputs.report_repo_id }}
+          GPU_NAME: ${{ inputs.gpu_name }}
+        run: |
+          pip install huggingface_hub
+          python3 utils/collated_reports.py \
+            --path . \
+            --machine-type "$MACHINE_TYPE" \
+            --commit-hash "$CI_SHA" \
+            --job "$JOB" \
+            --report-repo-id "$REPORT_REPO_ID" \
+            --gpu-name "$GPU_NAME"
--- a/.github/workflows/doctest_job.yml
+++ b/.github/workflows/doctest_job.yml
@@ -0,0 +1,87 @@
+name: Doctest job
+
+on:
+  workflow_call:
+    inputs:
+      job_splits:
+        required: true
+        type: string
+      split_keys:
+        required: true
+        type: string
+
+env:
+  HF_HOME: /mnt/cache
+  TRANSFORMERS_IS_CI: yes
+  RUN_SLOW: yes
+  OMP_NUM_THREADS: 16
+  MKL_NUM_THREADS: 16
+  TF_FORCE_GPU_ALLOW_GROWTH: true
+
+permissions:
+  contents: read
+
+jobs:
+  run_doctests:
+    name: " "
+    strategy:
+      max-parallel: 8  # 8 jobs at a time
+      fail-fast: false
+      matrix:
+        split_keys: ${{ fromJson(inputs.split_keys) }}
+    runs-on: 
+      group: aws-g5-4xlarge-cache
+    container:
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    steps:
+      - name: Update clone
+        working-directory: /transformers
+        run: git fetch && git checkout ${{ github.sha }}
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .[flax]
+
+      - name: GPU visibility
+        working-directory: /transformers
+        run: |
+          python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        run: pip freeze
+
+      - name: Get doctest files
+        working-directory: /transformers
+        run: |
+          echo "${{ toJson(fromJson(inputs.job_splits)[matrix.split_keys]) }}" > doc_tests.txt
+          cat doc_tests.txt
+
+      - name: Set `split_keys`
+        shell: bash
+        run: |
+          echo "${MATRIX_SPLIT_KEYS}"
+          split_keys=${MATRIX_SPLIT_KEYS}
+          split_keys=${split_keys//'/'/'_'}
+          echo "split_keys"
+          echo "split_keys=$split_keys" >> $GITHUB_ENV
+        env:
+          MATRIX_SPLIT_KEYS: ${{ matrix.split_keys }}
+
+      - name: Run doctests
+        working-directory: /transformers
+        run: |
+          cat doc_tests.txt
+          python3 -m pytest -v --make-reports "doc_tests_gpu_${split_keys}" --doctest-modules $(cat doc_tests.txt) -sv --doctest-continue-on-failure --doctest-glob="*.md"
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat "/transformers/reports/doc_tests_gpu_${split_keys}/failures_short.txt"
+
+      - name: "Test suite reports artifacts: doc_tests_gpu_test_reports_${{ env.split_keys }}"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: doc_tests_gpu_test_reports_${{ env.split_keys }}
+          path: /transformers/reports/doc_tests_gpu_${{ env.split_keys }}
--- a/.github/workflows/doctests.yml
+++ b/.github/workflows/doctests.yml
@@ -0,0 +1,94 @@
+name: Doctests
+
+on:
+  push:
+    branches:
+      - run_doctest*
+  repository_dispatch:
+  schedule:
+    - cron: "17 2 * * *"
+
+env:
+  NUM_SLICES: 3
+
+permissions:
+  contents: read
+
+jobs:
+  setup:
+    name: Setup
+    runs-on: 
+      group: aws-g5-4xlarge-cache
+    container:
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    outputs:
+      job_splits: ${{ steps.set-matrix.outputs.job_splits }}
+      split_keys: ${{ steps.set-matrix.outputs.split_keys }}
+    steps:
+      - name: Update clone
+        working-directory: /transformers
+        run: |
+          git fetch && git checkout ${{ github.sha }}
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - name: Check values for matrix
+        working-directory: /transformers
+        run: |
+          python3 utils/split_doctest_jobs.py
+          python3 utils/split_doctest_jobs.py --only_return_keys --num_splits ${{ env.NUM_SLICES }}
+
+      - id: set-matrix
+        working-directory: /transformers
+        name: Set values for matrix
+        run: |
+          echo "job_splits=$(python3 utils/split_doctest_jobs.py)" >> $GITHUB_OUTPUT
+          echo "split_keys=$(python3 utils/split_doctest_jobs.py --only_return_keys --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
+
+  call_doctest_job:
+    name: "Call doctest jobs"
+    needs: setup
+    strategy:
+      max-parallel: 1  # 1 split at a time (in `doctest_job.yml`, we set `8` to run 8 jobs at the same time)
+      fail-fast: false
+      matrix:
+        split_keys: ${{ fromJson(needs.setup.outputs.split_keys) }}
+    uses: ./.github/workflows/doctest_job.yml
+    with:
+      job_splits: ${{ needs.setup.outputs.job_splits }}
+      split_keys: ${{ toJson(matrix.split_keys) }}
+    secrets: inherit
+
+  send_results:
+    name: Send results to webhook
+    runs-on: ubuntu-22.04
+    if: always()
+    needs: [call_doctest_job]
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+      - uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+      - name: Send message to Slack
+        env:
+          CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
+          ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
+          # Use `CI_SLACK_CHANNEL_DUMMY_TESTS` when doing experimentation
+          SLACK_REPORT_CHANNEL: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY_DOCS }}
+        run: |
+          pip install slack_sdk
+          python utils/notification_service_doc_tests.py
+
+      - name: "Upload results"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: doc_test_results
+          path: doc_test_results
--- a/.github/workflows/extras-smoke-test.yml
+++ b/.github/workflows/extras-smoke-test.yml
@@ -0,0 +1,211 @@
+name: Extras Smoke Test
+
+on:
+  schedule:
+    # Run every night at 3 AM UTC
+    - cron: "0 3 * * *"
+env:
+  SLACK_CHANNEL_ID: '#transformers-gh-ci-central'
+
+permissions:
+  contents: read
+
+jobs:
+  get-python-versions:
+    name: Get supported Python versions
+    runs-on: ubuntu-latest
+    outputs:
+      versions: ${{ steps.extract-versions.outputs.versions }}
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+
+      - name: Install setuptools
+        run: |
+          python -m pip install --upgrade pip
+          pip install setuptools
+
+      - name: Extract Python versions from setup.py
+        id: extract-versions
+        run: |
+          VERSIONS=$(python utils/extract_metadata.py python-versions)
+          echo "Supported Python versions: $VERSIONS"
+          echo "versions=$VERSIONS" >> $GITHUB_OUTPUT
+
+  test-extras:
+    name: Test extras on Python ${{ matrix.python-version }}
+    needs: get-python-versions
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ${{ fromJson(needs.get-python-versions.outputs.versions) }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5.6.0
+        with:
+          python-version: ${{ matrix.python-version }}
+          allow-prereleases: true
+
+      - name: Install base dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install setuptools
+
+      - name: Extract extras for this Python version
+        id: get-extras
+        run: |
+          python utils/extract_metadata.py extras > extras_list.txt
+          echo "Found $(wc -l < extras_list.txt) extras for Python ${MATRIX_PYTHON_VERSION}"
+          cat extras_list.txt
+        env:
+          MATRIX_PYTHON_VERSION: ${{ matrix.python-version }}
+
+      - name: Install base package
+        run: |
+          echo "Installing base package..."
+          pip install -e .
+
+      - name: Test all extras
+        id: test-extras
+        run: |
+          mkdir -p failure_reports
+          failed=0
+
+          while IFS= read -r extra; do
+              echo "=== Testing extra: $extra on Python ${MATRIX_PYTHON_VERSION} ==="
+              if ! pip install -e .[$extra]; then
+                  echo "❌ Failed to install extra: $extra"
+                  cat > failure_reports/failure-${MATRIX_PYTHON_VERSION}-${extra}.json << EOF
+          {
+            "python_version": "${MATRIX_PYTHON_VERSION}",
+            "extra": "${extra}",
+            "job_url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+          }
+          EOF
+                  failed=$((failed + 1))
+              else
+                  echo "✓ Successfully installed extra: $extra"
+              fi
+          done < extras_list.txt
+
+          if [ $failed -gt 0 ]; then
+              echo "❌ $failed extra(s) failed to install"
+              exit 1
+          fi
+        env:
+          MATRIX_PYTHON_VERSION: ${{ matrix.python-version }}
+
+      - name: Verify installation
+        run: |
+          python -c "import transformers; print(f'Transformers version: {transformers.__version__}')"
+          python -c "from transformers import pipeline; print('Successfully imported pipeline')"
+
+      - name: Upload failure report
+        if: always()
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4.6.2
+        with:
+          name: failure-report-${{ matrix.python-version }}
+          path: failure_reports/
+          retention-days: 1
+          if-no-files-found: ignore
+
+  precheck-slack:
+    name: Check Slack token availability
+    runs-on: ubuntu-latest
+    outputs:
+      has_slack_token: ${{ steps.chk.outputs.has_token }}
+    steps:
+      - id: chk
+        env:
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+        run: |
+          if [ -n "$SLACK_BOT_TOKEN" ]; then
+            echo "has_token=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "has_token=false" >> "$GITHUB_OUTPUT"
+          fi
+
+  notify-failures:
+    name: Notify failures to Slack
+    needs: [test-extras, precheck-slack]
+    runs-on: ubuntu-latest
+    if: always() && needs.precheck-slack.outputs.has_slack_token == 'true' && needs.test-extras.result != 'success'
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+
+      - name: Set up Python
+        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5.6.0
+        with:
+          python-version: '3.11'
+
+      - name: Download all failure reports
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4.3.0
+        with:
+          pattern: failure-report-*
+          path: failure_reports/
+          merge-multiple: true
+        continue-on-error: true
+
+      - name: Aggregate failures
+        run: |
+          python utils/aggregate_failure_reports.py \
+            --input-dir failure_reports \
+            --output all_failures.json
+
+      - name: Format Slack message
+        env:
+          FAILURES_FILE: all_failures.json
+          WORKFLOW_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+        run: |
+          python utils/format_extras_slack_message.py \
+            --failures "$FAILURES_FILE" \
+            --workflow-url "$WORKFLOW_URL"
+
+      - name: Send Slack notification
+        if: env.SLACK_MESSAGE != ''
+        uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+        with:
+          channel-id: ${{ env.SLACK_CHANNEL_ID }}
+          payload: |
+            {
+              "blocks": [
+                {
+                  "type": "header",
+                  "text": {
+                    "type": "plain_text",
+                    "text": "${{ env.SLACK_TITLE }}"
+                  }
+                },
+                {
+                  "type": "section",
+                  "text": {
+                    "type": "mrkdwn",
+                    "text": "${{ env.SLACK_MESSAGE }}"
+                  }
+                },
+                {
+                  "type": "divider"
+                },
+                {
+                  "type": "section",
+                  "text": {
+                    "type": "mrkdwn",
+                    "text": "<${{ env.SLACK_WORKFLOW_URL }}|View workflow run>"
+                  }
+                }
+              ]
+            }
+        env:
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
--- a/.github/workflows/get-pr-info.yml
+++ b/.github/workflows/get-pr-info.yml
@@ -0,0 +1,174 @@
+name: Get PR commit SHA
+on:
+  workflow_call:
+    inputs:
+      pr_number:
+        required: true
+        type: string
+    outputs:
+      PR_HEAD_REPO_FULL_NAME:
+        description: "The full name of the repository from which the pull request is created"
+        value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_FULL_NAME }}
+      PR_BASE_REPO_FULL_NAME:
+        description: "The full name of the repository to which the pull request is created"
+        value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_FULL_NAME }}
+      PR_HEAD_REPO_OWNER:
+        description: "The owner of the repository from which the pull request is created"
+        value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}
+      PR_BASE_REPO_OWNER:
+        description: "The owner of the repository to which the pull request is created"
+        value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_OWNER }}
+      PR_HEAD_REPO_NAME:
+        description: "The name of the repository from which the pull request is created"
+        value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}
+      PR_BASE_REPO_NAME:
+        description: "The name of the repository to which the pull request is created"
+        value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_NAME }}
+      PR_HEAD_REF:
+        description: "The branch name of the pull request in the head repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REF }}
+      PR_BASE_REF:
+        description: "The branch name in the base repository (to merge into)"
+        value: ${{ jobs.get-pr-info.outputs.PR_BASE_REF }}
+      PR_HEAD_SHA:
+        description: "The head sha of the pull request branch in the head repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_HEAD_SHA }}
+      PR_BASE_SHA:
+        description: "The head sha of the target branch in the base repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_BASE_SHA }}
+      PR_MERGE_COMMIT_SHA:
+        description: "The sha of the merge commit for the pull request (created by GitHub) in the base repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
+      PR_MERGE_COMMIT_BASE_SHA:
+        description: "The sha of the parent commit of the merge commit on the target branch in the base repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_BASE_SHA }}
+      PR_HEAD_COMMIT_DATE:
+        description: "The date of the head sha of the pull request branch in the head repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_DATE }}
+      PR_MERGE_COMMIT_DATE:
+        description: "The date of the merge commit for the pull request (created by GitHub) in the base repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
+      PR_HEAD_COMMIT_TIMESTAMP:
+        description: "The timestamp of the head sha of the pull request branch in the head repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_TIMESTAMP }}
+      PR_MERGE_COMMIT_TIMESTAMP:
+        description: "The timestamp of the merge commit for the pull request (created by GitHub) in the base repository"
+        value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
+      PR:
+        description: "The PR"
+        value: ${{ jobs.get-pr-info.outputs.PR }}
+      PR_FILES:
+        description: "The files touched in the PR"
+        value: ${{ jobs.get-pr-info.outputs.PR_FILES }}
+
+
+permissions:
+  contents: read
+
+jobs:
+  get-pr-info:
+    runs-on: ubuntu-22.04
+    name: Get PR commit SHA better
+    outputs:
+      PR_HEAD_REPO_FULL_NAME: ${{ steps.pr_info.outputs.head_repo_full_name }}
+      PR_BASE_REPO_FULL_NAME: ${{ steps.pr_info.outputs.base_repo_full_name }}
+      PR_HEAD_REPO_OWNER: ${{ steps.pr_info.outputs.head_repo_owner }}
+      PR_BASE_REPO_OWNER: ${{ steps.pr_info.outputs.base_repo_owner }}
+      PR_HEAD_REPO_NAME: ${{ steps.pr_info.outputs.head_repo_name }}
+      PR_BASE_REPO_NAME: ${{ steps.pr_info.outputs.base_repo_name }}
+      PR_HEAD_REF: ${{ steps.pr_info.outputs.head_ref }}
+      PR_BASE_REF: ${{ steps.pr_info.outputs.base_ref }}
+      PR_HEAD_SHA: ${{ steps.pr_info.outputs.head_sha }}
+      PR_BASE_SHA: ${{ steps.pr_info.outputs.base_sha }}
+      PR_MERGE_COMMIT_BASE_SHA: ${{ steps.pr_info.outputs.merge_commit_base_sha }}
+      PR_MERGE_COMMIT_SHA: ${{ steps.pr_info.outputs.merge_commit_sha }}
+      PR_HEAD_COMMIT_DATE: ${{ steps.pr_info.outputs.head_commit_date }}
+      PR_MERGE_COMMIT_DATE: ${{ steps.pr_info.outputs.merge_commit_date }}
+      PR_HEAD_COMMIT_TIMESTAMP: ${{ steps.get_timestamps.outputs.head_commit_timestamp }}
+      PR_MERGE_COMMIT_TIMESTAMP: ${{ steps.get_timestamps.outputs.merge_commit_timestamp }}
+      PR: ${{ steps.pr_info.outputs.pr }}
+      PR_FILES: ${{ steps.pr_info.outputs.files }}
+    if: ${{ inputs.pr_number != '' }}
+    steps:
+      - name: Extract PR details
+        id: pr_info
+        uses: actions/github-script@d7906e4ad0b1822421a7e6a35d5ca353c962f410 # v6.4.1
+        env:
+          PR_NUMBER: ${{ inputs.pr_number }}
+        with:
+          script: |
+            const pull_number = parseInt(process.env.PR_NUMBER, 10);
+
+            const { data: pr } = await github.rest.pulls.get({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              pull_number,
+            });
+
+            const { data: head_commit } = await github.rest.repos.getCommit({
+              owner: pr.head.repo.owner.login,
+              repo: pr.head.repo.name,
+              ref: pr.head.ref
+            });
+
+            const { data: merge_commit } = await github.rest.repos.getCommit({
+              owner: pr.base.repo.owner.login,
+              repo: pr.base.repo.name,
+              ref: pr.merge_commit_sha,
+            });
+
+            const { data: files } = await github.rest.pulls.listFiles({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              pull_number,
+            });
+
+            core.setOutput('head_repo_full_name', pr.head.repo.full_name);
+            core.setOutput('base_repo_full_name', pr.base.repo.full_name);
+            core.setOutput('head_repo_owner', pr.head.repo.owner.login);
+            core.setOutput('base_repo_owner', pr.base.repo.owner.login);
+            core.setOutput('head_repo_name', pr.head.repo.name);
+            core.setOutput('base_repo_name', pr.base.repo.name);
+            core.setOutput('head_ref', pr.head.ref);
+            core.setOutput('base_ref', pr.base.ref);
+            core.setOutput('head_sha', pr.head.sha);
+            core.setOutput('base_sha', pr.base.sha);
+            core.setOutput('merge_commit_base_sha', merge_commit.parents[0].sha);
+            core.setOutput('merge_commit_sha', pr.merge_commit_sha);
+            core.setOutput('pr', pr);
+
+            core.setOutput('head_commit_date', head_commit.commit.committer.date);
+            core.setOutput('merge_commit_date', merge_commit.commit.committer.date);
+            
+            core.setOutput('files', files);            
+            
+            console.log('PR head commit:', {
+              head_commit: head_commit,
+              commit: head_commit.commit,
+              date: head_commit.commit.committer.date
+            });
+
+            console.log('PR merge commit:', {
+              merge_commit: merge_commit,
+              commit: merge_commit.commit,
+              date: merge_commit.commit.committer.date
+            });
+
+            console.log('PR Info:', {
+              pr_info: pr
+            });
+
+      - name: Convert dates to timestamps
+        id: get_timestamps
+        env:
+          head_commit_date: ${{ steps.pr_info.outputs.head_commit_date }}
+          merge_commit_date: ${{ steps.pr_info.outputs.merge_commit_date }}
+        run: |
+          echo "$head_commit_date"
+          echo "$merge_commit_date"
+          head_commit_timestamp=$(date -d "$head_commit_date" +%s)
+          merge_commit_timestamp=$(date -d "$merge_commit_date" +%s)
+          echo "$head_commit_timestamp"
+          echo "$merge_commit_timestamp"
+          echo "head_commit_timestamp=$head_commit_timestamp" >> $GITHUB_OUTPUT
+          echo "merge_commit_timestamp=$merge_commit_timestamp" >> $GITHUB_OUTPUT
--- a/.github/workflows/get-pr-number.yml
+++ b/.github/workflows/get-pr-number.yml
@@ -0,0 +1,45 @@
+name: Get PR number
+on:
+  workflow_call:
+    outputs:
+      PR_NUMBER:
+        description: "The extracted PR number"
+        value: ${{ jobs.get-pr-number.outputs.PR_NUMBER }}
+
+permissions:
+  contents: read
+
+jobs:
+  get-pr-number:
+    runs-on: ubuntu-22.04
+    name: Get PR number
+    outputs:
+      PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
+    steps:
+      - name: Get PR number
+        shell: bash
+        env:
+          issue_number: ${{ github.event.issue.number }}
+          is_pull_request_issue: ${{ github.event.issue.pull_request != null }}
+          pr_number: ${{ github.event.pull_request.number }}
+          is_pull_request: ${{ github.event.pull_request != null }}
+          event_number: ${{ github.event.number }}
+        run: |
+          if [[ "$issue_number" != "" && "$is_pull_request_issue" == "true" ]]; then
+            echo "PR_NUMBER=$issue_number" >> $GITHUB_ENV
+          elif [[ "$pr_number" != "" ]]; then
+            echo "PR_NUMBER=$pr_number" >> $GITHUB_ENV
+          elif [[ "$is_pull_request" == "true" ]]; then
+            echo "PR_NUMBER=$event_number" >> $GITHUB_ENV
+          else
+            echo "PR_NUMBER=" >> $GITHUB_ENV
+          fi
+
+      - name: Check PR number
+        shell: bash
+        run: |
+          echo "$PR_NUMBER"
+
+      - name: Set PR number
+        id: set_pr_number
+        run: echo "PR_NUMBER=$PR_NUMBER" >> "$GITHUB_OUTPUT"
--- a/.github/workflows/model_jobs.yml
+++ b/.github/workflows/model_jobs.yml
@@ -0,0 +1,219 @@
+name: model jobs
+
+on:
+  workflow_call:
+    inputs:
+      folder_slices:
+        required: true
+        type: string
+      machine_type:
+        required: true
+        type: string
+      slice_id:
+        required: true
+        type: number
+      docker:
+        required: true
+        type: string
+      commit_sha:
+        required: false
+        type: string
+      report_name_prefix:
+        required: false
+        default: run_models_gpu
+        type: string
+      runner_type:
+        required: false
+        type: string
+      report_repo_id:
+        required: false
+        type: string
+      pytest_marker:
+        required: false
+        type: string
+
+env:
+  HF_HOME: /mnt/cache
+  TRANSFORMERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+  # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
+  # This token is created under the bot `hf-transformers-bot`.
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  TF_FORCE_GPU_ALLOW_GROWTH: true
+  CUDA_VISIBLE_DEVICES: 0,1
+
+permissions:
+  contents: read
+
+jobs:
+  run_models_gpu:
+    name: " "
+    strategy:
+      max-parallel: 8
+      fail-fast: false
+      matrix:
+        folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
+    runs-on:
+      group: '${{ inputs.machine_type }}'
+    container:
+      image: ${{ inputs.docker }}
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    outputs:
+      machine_type: ${{ steps.set_machine_type.outputs.machine_type }}
+    steps:
+      - name: Echo input and matrix info
+        shell: bash
+        env:
+          folder_slices: ${{ inputs.folder_slices }}
+          matrix_folders: ${{ matrix.folders }}
+          slice_data: ${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}
+        run: |
+          echo "$folder_slices"
+          echo "$matrix_folders"
+          echo "$slice_data"
+
+      - name: Echo folder ${{ matrix.folders }}
+        shell: bash
+        # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
+        # set the artifact folder names (because the character `/` is not allowed).
+        env:
+          matrix_folders_raw: ${{ matrix.folders }}
+        run: |
+          echo "$matrix_folders_raw"
+          matrix_folders="${matrix_folders_raw/'models/'/'models_'}"
+          echo "$matrix_folders"
+          echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
+
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: |
+          git fetch origin "$commit_sha" && git checkout "$commit_sha"
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
+      - name: Update / Install some packages (for Past CI)
+        if: ${{ contains(inputs.docker, '-past-') }}
+        working-directory: /transformers
+        run: |
+          python3 -m pip install -U datasets
+
+      - name: Update / Install some packages (for Past CI)
+        if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }}
+        working-directory: /transformers
+        run: |
+          python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Environment
+        working-directory: /transformers
+        run: |
+          python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        id: set_machine_type
+        working-directory: /transformers
+        shell: bash
+        env:
+          input_machine_type: ${{ inputs.machine_type }}
+        run: |
+          echo "$input_machine_type"
+
+          if [ "$input_machine_type" = "aws-g5-4xlarge-cache" ]; then
+            machine_type=single-gpu
+          elif [ "$input_machine_type" = "aws-g5-12xlarge-cache" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type="$input_machine_type"
+          fi
+
+          echo "$machine_type"
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+          echo "machine_type=$machine_type" >> $GITHUB_OUTPUT
+
+      - name: Create report directory if it doesn't exist
+        shell: bash
+        env:
+          report_name_prefix: ${{ inputs.report_name_prefix }}
+        run: |
+          mkdir -p "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
+          echo "dummy" > "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/dummy.txt"
+          ls -la "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
+
+      - name: Run all tests on GPU
+        working-directory: /transformers
+        env:
+          report_name_prefix: ${{ inputs.report_name_prefix }}
+          pytest_marker: ${{ inputs.pytest_marker }}
+          model: ${{ matrix.folders }}
+        run: |
+          # Map short names to actual test paths for trainer/distributed tests
+          test_path="tests/${model}"
+          if [ "$model" = "fsdp" ]; then
+            test_path="tests/trainer/distributed/test_trainer_distributed_fsdp.py"
+          elif [ "$model" = "ddp" ]; then
+            test_path="tests/trainer/distributed/test_trainer_distributed_ddp.py"
+          elif [ "$model" = "trainer" ] && [ "$report_name_prefix" = "run_trainer_and_fsdp_gpu" ]; then
+            test_path="tests/trainer --ignore=tests/trainer/distributed"
+          fi
+          script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports python3 -m pytest -rsfE -v -m '${pytest_marker}' --make-reports=${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports ${test_path}" test_outputs.txt
+          ls -la
+          # Extract the exit code from the output file
+          EXIT_CODE=$(tail -1 test_outputs.txt | grep -o 'COMMAND_EXIT_CODE="[0-9]*"' | cut -d'"' -f2)
+          exit ${EXIT_CODE:-1}
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        # This step is only to show information on Github Actions log.
+        # Always mark this step as successful, even if the report directory or the file `failures_short.txt` in it doesn't exist
+        continue-on-error: true
+        env:
+          report_name_prefix: ${{ inputs.report_name_prefix }}
+        run: cat "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/failures_short.txt"
+
+      - name: Captured information
+        if: ${{ failure() }}
+        continue-on-error: true
+        env:
+          report_name_prefix: ${{ inputs.report_name_prefix }}
+        run: |
+          cat "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/captured_info.txt"
+
+      - name: Copy test_outputs.txt
+        if: ${{ always() }}
+        continue-on-error: true
+        env:
+          report_name_prefix: ${{ inputs.report_name_prefix }}
+        run: |
+          cp /transformers/test_outputs.txt "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
+          path: /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
+
+  collated_reports:
+    name: Collated Reports
+    if: ${{ always() && inputs.runner_type != '' }}
+    needs: run_models_gpu
+    uses: huggingface/transformers/.github/workflows/collated-reports.yml@6abd9725ee7d809dc974991f8ff6c958afb63a3a  # main
+    with:
+      job: run_models_gpu
+      report_repo_id: ${{ inputs.report_repo_id }}
+      gpu_name: ${{ inputs.runner_type }}
+      machine_type: ${{ needs.run_models_gpu.outputs.machine_type }}
+    secrets: inherit
--- a/.github/workflows/model_jobs_intel_gaudi.yml
+++ b/.github/workflows/model_jobs_intel_gaudi.yml
@@ -0,0 +1,143 @@
+name: model jobs
+
+on:
+  workflow_call:
+    inputs:
+      folder_slices:
+        required: true
+        type: string
+      slice_id:
+        required: true
+        type: number
+      runner:
+        required: true
+        type: string
+      machine_type:
+        required: true
+        type: string
+      report_name_prefix:
+        required: false
+        default: run_models_gpu
+        type: string
+
+env:
+  RUN_SLOW: yes
+  PT_HPU_LAZY_MODE: 0
+  TRANSFORMERS_IS_CI: yes
+  PT_ENABLE_INT64_SUPPORT: 1
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache/.cache/huggingface
+
+permissions:
+  contents: read
+
+jobs:
+  run_models_gpu:
+    name: " "
+    strategy:
+      max-parallel: 8
+      fail-fast: false
+      matrix:
+        folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
+    runs-on:
+      group: ${{ inputs.runner }}
+    container:
+      image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
+      options: --runtime=habana
+        -v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
+        --env OMPI_MCA_btl_vader_single_copy_mechanism=none
+        --env HABANA_VISIBLE_DEVICES
+        --env HABANA_VISIBLE_MODULES
+        --cap-add=sys_nice
+        --shm-size=64G
+    steps:
+      - name: Echo input and matrix info
+        shell: bash
+        env:
+          FOLDER_SLICES: ${{ inputs.folder_slices }}
+          MATRIX_FOLDERS: ${{ matrix.folders }}
+          SLICE: ${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}
+        run: |
+          echo "$FOLDER_SLICES"
+          echo "$MATRIX_FOLDERS"
+          echo "$SLICE"
+
+      - name: Echo folder ${{ matrix.folders }}
+        shell: bash
+        env:
+          MATRIX_FOLDERS: ${{ matrix.folders }}
+        run: |
+          echo "$MATRIX_FOLDERS"
+          matrix_folders="${MATRIX_FOLDERS/'models/'/'models_'}"
+          echo "$matrix_folders"
+          echo "matrix_folders=$matrix_folders" >> "$GITHUB_ENV"
+
+      - name: Checkout
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+
+      - name: Install dependencies
+        run: |
+          pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn
+
+      - name: HL-SMI
+        run: |
+          hl-smi
+          echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
+          echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
+
+      - name: Environment
+        run: python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        run: pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        shell: bash
+        env:
+          MACHINE_TYPE: ${{ inputs.machine_type }}
+        run: |
+          if [ "$MACHINE_TYPE" = "1gaudi" ]; then
+            machine_type=single-gpu
+          elif [ "$MACHINE_TYPE" = "2gaudi" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type="$MACHINE_TYPE"
+          fi
+          echo "machine_type=$machine_type" >> "$GITHUB_ENV"
+
+      - name: Run all tests on Gaudi
+        env:
+          REPORT_NAME_PREFIX: ${{ inputs.report_name_prefix }}
+          MATRIX_FOLDERS: ${{ matrix.folders }}
+        run: |
+          REPORTS="${machine_type}_${REPORT_NAME_PREFIX}_${MATRIX_FOLDERS}_test_reports"
+          python3 -m pytest -v --make-reports="$REPORTS" "tests/${MATRIX_FOLDERS}"
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        env:
+          REPORT_NAME_PREFIX: ${{ inputs.report_name_prefix }}
+          MATRIX_FOLDERS: ${{ matrix.folders }}
+        run: cat "reports/${machine_type}_${REPORT_NAME_PREFIX}_${MATRIX_FOLDERS}_test_reports/failures_short.txt"
+
+      - name: Run test
+        shell: bash
+        env:
+          REPORT_NAME_PREFIX: ${{ inputs.report_name_prefix }}
+          MATRIX_FOLDERS: ${{ matrix.folders }}
+        run: |
+          REPORTS="${machine_type}_${REPORT_NAME_PREFIX}_${MATRIX_FOLDERS}_test_reports"
+          mkdir -p "reports/$REPORTS"
+          echo "hello" > "reports/$REPORTS/hello.txt"
+          echo "$REPORTS"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
+          path: reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
--- a/.github/workflows/new_model_pr_merged_notification.yml
+++ b/.github/workflows/new_model_pr_merged_notification.yml
@@ -0,0 +1,72 @@
+# Used to notify core maintainers about new model PR being merged
+name: New model PR merged notification
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - 'src/transformers/models/*/modeling_*'
+
+permissions:
+  contents: read
+
+jobs:
+  notify_new_model:
+    name: Notify new model
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+      - name: Check new model
+        shell: bash
+        run: |
+          python -m pip install gitpython
+          python -c 'from utils.pr_slow_ci_models import get_new_model; new_model = get_new_model(diff_with_last_commit=True); print(new_model)' | tee output.txt
+          echo "NEW_MODEL=$(tail -n 1 output.txt)" >> $GITHUB_ENV
+          echo "COMMIT_SHA=$(git log -1 --format=%H)" >> $GITHUB_ENV
+
+      - name: print commit sha
+        if: ${{ env.NEW_MODEL != ''}}
+        shell: bash
+        run: |
+          echo "$COMMIT_SHA"
+
+      - name: print new model
+        if: ${{ env.NEW_MODEL != ''}}
+        shell: bash
+        run: |
+          echo "$NEW_MODEL"
+
+      - name: Notify
+        if: ${{ env.NEW_MODEL != ''}}
+        uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+        with:
+          # Slack channel id, channel name, or user id to post message.
+          # See also: https://api.slack.com/methods/chat.postMessage#channels
+          channel-id: transformers-new-model-notification
+          # For posting a rich message using Block Kit
+          payload: |
+            {
+              "blocks": [
+                {
+                  "type": "header",
+                  "text": {
+                    "type": "plain_text",
+                    "text": "New model!",
+                    "emoji": true
+                  }
+                },
+                {
+                  "type": "section",
+                  "text": {
+                    "type": "mrkdwn",
+                    "text": "<https://github.com/huggingface/transformers/commit/${{ env.COMMIT_SHA }}|New model: ${{ env.NEW_MODEL }}> GH_ArthurZucker, GH_lysandrejik, GH_ydshieh\ncommit SHA: ${{ env.COMMIT_SHA }}"
+                  }
+                }
+              ]
+            }
+        env:
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
--- a/.github/workflows/pr-ci-caller.yml
+++ b/.github/workflows/pr-ci-caller.yml
@@ -0,0 +1,23 @@
+name: PR CI
+
+on:
+  pull_request:
+  push:
+    branches:
+      - main
+
+concurrency:
+  # Group by PR number, commit SHA (main) to avoid cross-branch cancellation, or branch ref for cancellation within each branch
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || (github.ref == 'refs/heads/main' && github.sha) || github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+jobs:
+  pr-ci:
+    # Allow `ydshieh2` for testing during development
+    if: github.event_name == 'push' || contains(fromJSON('["MEMBER","OWNER","COLLABORATOR"]'), github.event.pull_request.author_association) || github.event.pull_request.user.login == 'ydshieh2'
+    # DON'T use commit sha here before the CI migration is finished
+    uses: huggingface/transformers-test-ci/.github/workflows/pr-ci_dynamic_caller_example.yml@main # main
+
--- a/.github/workflows/pr-ci-post-dashboard-link.yml
+++ b/.github/workflows/pr-ci-post-dashboard-link.yml
@@ -0,0 +1,71 @@
+name: Post CI Dashboard Link
+
+on:
+  workflow_run:
+    workflows: ["PR CI"]
+    types: [completed]
+
+permissions:
+  pull-requests: write
+
+jobs:
+  post-link:
+    if: github.event.workflow_run.event == 'pull_request'
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/github-script@v7
+        with:
+          script: |
+            const headSha = context.payload.workflow_run.head_sha;
+            const runId = context.payload.workflow_run.id;
+
+            const { data: prs } = await github.rest.pulls.list({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              state: 'open',
+            });
+
+            const pr = prs.find(pr => pr.head.sha === headSha);
+            if (!pr) {
+              console.log(`No open PR found for SHA ${headSha}, skipping`);
+              return;
+            }
+
+            // Check if the code quality job failed (causes all test jobs to be skipped)
+            const { data: jobsData } = await github.rest.actions.listJobsForWorkflowRun({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              run_id: runId,
+              per_page: 100,
+            });
+            const qualityJob = jobsData.jobs.find(j => j.name.includes('Check code quality'));
+            const qualityFailed = qualityJob && qualityJob.conclusion === 'failure';
+
+            // Delete any existing dashboard comment before posting a fresh one
+            const { data: comments } = await github.rest.issues.listComments({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: pr.number,
+            });
+            for (const comment of comments) {
+              if (comment.body.includes('**CI Observability Dashboard:** [View test results in Grafana]') ||
+                  comment.body.includes('**CI Dashboard:** [View test results in Grafana]')) {
+                await github.rest.issues.deleteComment({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  comment_id: comment.id,
+                });
+              }
+            }
+
+            const url = `https://transformers-ci.lor-e.huggingface.cool/d/pytest-observability-by-pr/pytest-observability-branch?var-pr=${pr.number}`;
+            let body = `**CI Dashboard:** [View test results in Grafana](${url})`;
+            if (qualityFailed) {
+              body += `\n\n> ⚠️ **Code quality check failed** — all test jobs were skipped. Fix the code quality issues and push again to run tests.`;
+            }
+            await github.rest.issues.createComment({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: pr.number,
+              body,
+            });
--- a/.github/workflows/pr-repo-consistency-bot.yml
+++ b/.github/workflows/pr-repo-consistency-bot.yml
@@ -0,0 +1,390 @@
+name: PR Repo. Consistency Bot
+
+on:
+  issue_comment:
+    types:
+      - created
+    branches-ignore:
+      - main
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.issue.number }}-${{ startsWith(github.event.comment.body, '@bot /repo') || startsWith(github.event.comment.body, '@bot /style') }}
+  cancel-in-progress: true
+permissions:
+  contents: read
+
+
+jobs:
+  get-pr-number:
+    name: Get PR number
+    if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap", "3outeille", "IlyasMoutawwakil", "tarekziade"]'), github.actor) && (startsWith(github.event.comment.body, '@bot /repo') || startsWith(github.event.comment.body, '@bot /style')) }}
+    uses: ./.github/workflows/get-pr-number.yml
+
+  get-pr-info:
+    name: Get PR commit SHA
+    needs: get-pr-number
+    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
+    uses: ./.github/workflows/get-pr-info.yml
+    with:
+      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+
+  check-timestamps:
+    name: Check timestamps (security check)
+    runs-on: ubuntu-22.04
+    needs: get-pr-info
+    outputs:
+      VERIFIED_PR_HEAD_SHA: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
+    steps:
+      - name: Verify `merge_commit` timestamp is older than the issue comment timestamp
+        env:
+          COMMENT_DATE: ${{ github.event.comment.created_at }}
+          PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
+        run: |
+            COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
+            echo "COMMENT_DATE: $COMMENT_DATE"
+            echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
+            if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
+              echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
+              exit -1;
+            fi
+
+  init_comment_with_url:
+    name: Init Comment on PR
+    runs-on: ubuntu-22.04
+    needs: [get-pr-number, check-timestamps]
+    outputs:
+       comment_id: ${{ steps.init_comment.outputs.comment_id }}
+    permissions:
+      pull-requests: write
+    steps:
+      - name: Delete existing bot comment if it exists
+        env:
+          PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+        uses: actions/github-script@d7906e4ad0b1822421a7e6a35d5ca353c962f410 # v6.4.1
+        with:
+          script: |
+            const PR_NUMBER = parseInt(process.env.PR_NUMBER, 10);
+            
+            // Get all comments on the PR
+            const { data: comments } = await github.rest.issues.listComments({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: PR_NUMBER
+            });
+            
+            // Find existing bot comments that start with "Repo. Consistency" or "Style fix"
+            const existingComments = comments.filter(comment => 
+              comment.user.login === 'github-actions[bot]' && 
+              (comment.body.startsWith('Repo. Consistency') || comment.body.startsWith('Style fix'))
+            );
+
+            if (existingComments.length > 0) {
+              // Get the most recent comment
+              const mostRecentComment = existingComments
+                .sort((a, b) => new Date(b.created_at) - new Date(a.created_at))[0];
+              
+              console.log(`Deleting most recent comment #${mostRecentComment.id}`);
+              await github.rest.issues.deleteComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                comment_id: mostRecentComment.id
+              });
+            }
+
+      - name: Comment on PR with workflow run link
+        id: init_comment
+        env:
+          PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+          COMMENT_BODY: ${{ github.event.comment.body }}
+        uses: actions/github-script@d7906e4ad0b1822421a7e6a35d5ca353c962f410 # v6.4.1
+        with:
+          script: |
+            const PR_NUMBER = parseInt(process.env.PR_NUMBER, 10);
+            const COMMENT_BODY = process.env.COMMENT_BODY;
+            const runUrl = `${process.env.GITHUB_SERVER_URL}/${process.env.GITHUB_REPOSITORY}/actions/runs/${process.env.GITHUB_RUN_ID}`
+            
+            // Determine which command was used
+            const isStyleFix = COMMENT_BODY.startsWith('@bot /style');
+            const messagePrefix = isStyleFix ? 'Style fix' : 'Repo. Consistency fix';
+
+            const { data: botComment } = await github.rest.issues.createComment({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: PR_NUMBER,
+              body: `${messagePrefix} is beginning .... [View the workflow run here](${runUrl}).`
+            });
+            core.setOutput('comment_id', botComment.id);
+
+  run-repo-consistency-checks:
+    runs-on: ubuntu-22.04
+    needs: [get-pr-info, check-timestamps, init_comment_with_url]
+    permissions:
+      contents: read
+      pull-requests: read
+    outputs:
+      changes_detected: ${{ steps.run_repo_checks.outputs.changes_detected || steps.run_style_checks.outputs.changes_detected }}
+      util_scripts_modified: ${{ steps.check_util_scripts.outputs.util_scripts_modified }}
+    steps:
+      # Checkout the trusted base repository (main branch) - this is safe
+      - name: Checkout base repository
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          ref: main
+          persist-credentials: false
+
+      - name: Set up Python
+        uses: actions/setup-python@7f4fc3e22c37d6ff65e88745f38bd3157c663f7c # v4.9.1
+        with:
+          python-version: "3.10"
+
+      - name: Install dependencies from trusted main branch
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[quality]"
+          pip install --no-cache-dir --upgrade 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
+
+      - name: Fetch and checkout PR code manually
+        env:
+          PR_HEAD_REPO_FULL_NAME: ${{ needs.get-pr-info.outputs.PR_HEAD_REPO_FULL_NAME }}
+          PR_HEAD_REF: ${{ needs.get-pr-info.outputs.PR_HEAD_REF }}
+          PR_HEAD_SHA: ${{ needs.check-timestamps.outputs.VERIFIED_PR_HEAD_SHA }}
+        run: |
+          # Create separate directory for PR code
+          mkdir -p pr-repo
+          cd pr-repo
+          
+          # Initialize git and fetch with full history
+          git init
+          git remote add pr-origin "https://github.com/${PR_HEAD_REPO_FULL_NAME}.git"
+          git fetch pr-origin "${PR_HEAD_REF}"
+          git checkout "${PR_HEAD_SHA}"
+
+          # Also fetch main branch from upstream for comparison (required by `check_modular_conversion`)
+          git remote add upstream https://github.com/${{ github.repository }}.git
+          git fetch upstream main:main
+
+      - name: Check if util scripts are modified in PR
+        id: check_util_scripts
+        uses: actions/github-script@d7906e4ad0b1822421a7e6a35d5ca353c962f410 # v6.4.1
+        with:
+          script: |
+            // Re-fetch the PR file list from the API rather than using get-pr-info's PR_FILES
+            // output, to avoid template injection and E2BIG issues (see pr_slow_ci_suggestion.yml).
+            const UTIL_SCRIPTS = new Set([
+              'setup.py',
+              'utils/custom_init_isort.py',
+              'utils/sort_auto_mappings.py',
+              'utils/check_doc_toc.py',
+              'utils/check_copies.py',
+              'utils/check_modular_conversion.py',
+              'utils/check_dummies.py',
+              'utils/check_pipeline_typing.py',
+              'utils/check_doctest_list.py',
+              'utils/check_docstrings.py',
+              'utils/add_dates.py',
+            ]);
+
+            const files = await github.paginate(github.rest.pulls.listFiles, {
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              pull_number: context.payload.issue.number,
+            });
+
+            const modified = files.some(f => UTIL_SCRIPTS.has(f.filename));
+            core.setOutput('util_scripts_modified', modified ? 'true' : 'false');
+
+      - name: Install editable transformers from PR branch
+        if: steps.check_util_scripts.outputs.util_scripts_modified != 'true'
+        run: |
+          cd pr-repo
+          pip install -e .
+
+      - name: Run repo consistency checks with trusted script
+        id: run_repo_checks
+        if: steps.check_util_scripts.outputs.util_scripts_modified != 'true' && startsWith(github.event.comment.body, '@bot /repo')
+        run: |
+          # Continue on errors (like Makefile's - prefix)
+          set +e
+          
+          # Run commands in PR directory (with the copied trusted scripts)
+          cd pr-repo
+          
+          # Run style commands
+          ruff check examples tests src utils scripts benchmark benchmark_v2 setup.py conftest.py --fix
+          ruff format examples tests src utils scripts benchmark benchmark_v2 setup.py conftest.py
+          python utils/custom_init_isort.py
+          python utils/sort_auto_mappings.py
+          
+          # Run fix-repo commands
+          python setup.py deps_table_update
+          python utils/check_doc_toc.py --fix_and_overwrite
+          python utils/check_copies.py --fix_and_overwrite
+          python utils/check_modular_conversion.py --fix_and_overwrite
+          python utils/check_dummies.py --fix_and_overwrite
+          python utils/check_pipeline_typing.py --fix_and_overwrite
+          python utils/check_doctest_list.py --fix_and_overwrite
+          python utils/check_docstrings.py --fix_and_overwrite
+          python utils/add_dates.py
+          
+          # Check if there are changes
+          if [ -n "$(git status --porcelain)" ]; then
+            echo "changes_detected=true" >> $GITHUB_OUTPUT
+          else
+            echo "changes_detected=false" >> $GITHUB_OUTPUT
+          fi
+
+      - name: Run style checks with trusted script
+        id: run_style_checks
+        if: steps.check_util_scripts.outputs.util_scripts_modified != 'true' && startsWith(github.event.comment.body, '@bot /style')
+        run: |
+          # Continue on errors (like Makefile's - prefix)
+          set +e
+          
+          # Run commands in PR directory (with the copied trusted scripts)
+          cd pr-repo
+          
+          # Run style commands
+          ruff check examples tests src utils scripts benchmark benchmark_v2 setup.py conftest.py --fix
+          ruff format examples tests src utils scripts benchmark benchmark_v2 setup.py conftest.py
+          python utils/sort_auto_mappings.py
+          
+          # Run fix-repo commands
+          python setup.py deps_table_update
+          python utils/check_doc_toc.py --fix_and_overwrite
+          python utils/check_docstrings.py --fix_and_overwrite
+          
+          # Check if there are changes
+          if [ -n "$(git status --porcelain)" ]; then
+            echo "changes_detected=true" >> $GITHUB_OUTPUT
+          else
+            echo "changes_detected=false" >> $GITHUB_OUTPUT
+          fi
+
+      - name: Save modified files
+        if: steps.check_util_scripts.outputs.util_scripts_modified != 'true' && (steps.run_repo_checks.outputs.changes_detected == 'true' || steps.run_style_checks.outputs.changes_detected == 'true')
+        run: |
+          cd pr-repo
+          mkdir -p ../artifact-staging
+          git diff --name-only > ../artifact-staging/modified-files.txt
+          # Copy each modified file
+          while IFS= read -r file; do
+            mkdir -p "../artifact-staging/pr-repo/$(dirname "$file")"
+            cp "$file" "../artifact-staging/pr-repo/$file"
+          done < ../artifact-staging/modified-files.txt
+
+      - name: Upload modified files
+        if: steps.check_util_scripts.outputs.util_scripts_modified != 'true' && (steps.run_repo_checks.outputs.changes_detected == 'true' || steps.run_style_checks.outputs.changes_detected == 'true')
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: modified-files
+          path: artifact-staging/
+
+  commit-and-comment:
+    runs-on: ubuntu-22.04
+    needs: [get-pr-number, get-pr-info, check-timestamps, init_comment_with_url, run-repo-consistency-checks]
+    if: always()
+    permissions:
+      pull-requests: write
+      contents: write
+    steps:
+      - name: Download modified files
+        if: needs.run-repo-consistency-checks.outputs.changes_detected == 'true'
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+        with:
+          name: modified-files
+
+      - name: Push changes to fork using git
+        if: needs.run-repo-consistency-checks.outputs.changes_detected == 'true'
+        env:
+          PR_HEAD_REF: ${{ needs.get-pr-info.outputs.PR_HEAD_REF }}
+          PR_HEAD_SHA: ${{ needs.check-timestamps.outputs.VERIFIED_PR_HEAD_SHA }}
+          PR_HEAD_REPO_FULL_NAME: ${{ needs.get-pr-info.outputs.PR_HEAD_REPO_FULL_NAME }}
+          GITHUB_TOKEN: ${{ secrets.HF_STYLE_BOT_ACTION }}
+        run: |
+          # Initialize a fresh git repository for pushing
+          mkdir push-repo
+          cd push-repo
+          git init
+          
+          # Configure git
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          
+          # Add fork as remote with token
+          git remote add origin "https://x-access-token:${GITHUB_TOKEN}@github.com/${PR_HEAD_REPO_FULL_NAME}.git"
+
+          # Fetch only the specific branch
+          git fetch origin "${PR_HEAD_REF}"
+
+          # Checkout the branch
+          git checkout -b "${PR_HEAD_REF}" "origin/${PR_HEAD_REF}"
+          
+          # Verify we're on the correct SHA
+          current_sha=$(git rev-parse HEAD)
+          if [ "$current_sha" != "$PR_HEAD_SHA" ]; then
+            echo "❌ Error: Branch has been updated since workflow started"
+            echo "Expected SHA: $PR_HEAD_SHA"
+            echo "Current SHA: $current_sha"
+            exit 1
+          fi
+          
+          # Copy modified files from artifact
+          echo "Copying modified files..."
+          while IFS= read -r file; do
+            if [ -n "$file" ]; then
+              echo "  - $file"
+              mkdir -p "$(dirname "$file")"
+              cp "../pr-repo/$file" "$file"
+            fi
+          done < ../modified-files.txt
+          
+          # Check if there are changes
+          if [ -n "$(git status --porcelain)" ]; then
+            git add .
+            git commit -m "Apply repo consistency fixes"
+            git push origin "HEAD:${PR_HEAD_REF}"
+            echo "✅ Changes pushed successfully"
+          else
+            echo "No changes to commit"
+          fi
+
+      - name: Prepare final comment message
+        id: prepare_final_comment
+        if: needs.init_comment_with_url.result == 'success'
+        env:
+          CHANGES_DETECTED: ${{ needs.run-repo-consistency-checks.outputs.changes_detected }}
+          UTIL_SCRIPTS_MODIFIED: ${{ needs.run-repo-consistency-checks.outputs.util_scripts_modified }}
+          COMMENT_BODY: ${{ github.event.comment.body }}
+        run: |
+          # Determine which command was used
+          if [[ "$COMMENT_BODY" == "@bot /style"* ]]; then
+            MESSAGE_PREFIX="Style fix"
+          else
+            MESSAGE_PREFIX="Repo. Consistency"
+          fi
+
+          if [ "$UTIL_SCRIPTS_MODIFIED" = 'true' ]; then
+            echo "final_comment=${MESSAGE_PREFIX}: \`setup.py\` or some script files under the \`utils/\` directory are modified in this PR. Please run style/repo. checks/fixes locally." >> $GITHUB_OUTPUT
+          elif [ "$CHANGES_DETECTED" = 'true' ]; then
+            echo "final_comment=${MESSAGE_PREFIX} bot fixed some files and pushed the changes." >> $GITHUB_OUTPUT
+          else
+            echo "final_comment=${MESSAGE_PREFIX} fix runs successfully without any file modified." >> $GITHUB_OUTPUT
+          fi
+
+      - name: Comment on PR
+        if: needs.init_comment_with_url.result == 'success'
+        uses: actions/github-script@d7906e4ad0b1822421a7e6a35d5ca353c962f410 # v6.4.1
+        env:
+          PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+          COMMENT_ID: ${{ needs.init_comment_with_url.outputs.comment_id }}
+          FINAL_COMMENT: ${{ steps.prepare_final_comment.outputs.final_comment }}
+        with:
+          script: |
+            const pr_number = parseInt(process.env.PR_NUMBER, 10);
+            const comment_id = parseInt(process.env.COMMENT_ID, 10);
+            const body = process.env.FINAL_COMMENT;
+            await github.rest.issues.updateComment({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              comment_id,
+              body,
+            });
--- a/.github/workflows/pr_build_doc_with_comment.yml
+++ b/.github/workflows/pr_build_doc_with_comment.yml
@@ -0,0 +1,138 @@
+name: PR - build doc via comment
+on:
+  issue_comment:
+    types:
+      - created
+    branches-ignore:
+      - main
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.issue.number }}-${{ startsWith(github.event.comment.body, 'build-doc') }}
+  cancel-in-progress: true
+permissions: {}
+
+
+jobs:
+  get-pr-number:
+    name: Get PR number
+    if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "itazap", "tarekziade"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}
+    uses: ./.github/workflows/get-pr-number.yml
+
+  get-pr-info:
+    name: Get PR commit SHA
+    needs: get-pr-number
+    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
+    uses: ./.github/workflows/get-pr-info.yml
+    with:
+      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+
+  verity_pr_commit:
+    name: Verity PR commit corresponds to a specific event by comparing timestamps
+    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
+    runs-on: ubuntu-22.04
+    needs: get-pr-info
+    env:
+      COMMENT_DATE: ${{ github.event.comment.created_at }}
+      PR_MERGE_COMMIT_DATE: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
+      PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
+    steps:
+      - run: |
+          COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
+          echo "COMMENT_DATE: $COMMENT_DATE"
+          echo "PR_MERGE_COMMIT_DATE: $PR_MERGE_COMMIT_DATE"
+          echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
+          echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
+          if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
+            echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
+            exit -1;
+          fi
+
+  create_run:
+    name: Create run
+    needs: [get-pr-number, get-pr-info]
+    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}
+    permissions:
+      statuses: write
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Create Run
+        id: create_run
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          # Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.
+          # See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status
+          GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+          NEEDS_GET_PR_INFO_OUTPUTS_PR_HEAD_SHA: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
+        run: |
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            repos/${{ github.repository }}/statuses/${NEEDS_GET_PR_INFO_OUTPUTS_PR_HEAD_SHA} \
+            -f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Custom doc building job" -f "context=custom-doc-build"
+
+  reply_to_comment:
+    name: Reply to the comment
+    if: ${{ needs.create_run.result == 'success' }}
+    needs: [get-pr-number, create_run]
+    permissions:
+      pull-requests: write
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Reply to the comment
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+          NEEDS_GET_PR_NUMBER_OUTPUTS_PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+        run: |
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            repos/${{ github.repository }}/issues/${NEEDS_GET_PR_NUMBER_OUTPUTS_PR_NUMBER}/comments \
+            -f "body=[Building docs for all languages...](${GITHUB_RUN_URL})"
+
+  build-doc:
+    name: Build doc
+    needs: [get-pr-number, get-pr-info]
+    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@093eb65f2e8745457987df060dc392e6bcf1347a # main
+    with:
+      commit_sha: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
+      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+      package: transformers
+      languages: ar de en es fr hi it ja ko pt zh
+
+  update_run_status:
+    name: Update Check Run Status
+    needs: [ get-pr-info, create_run, build-doc ]
+    permissions:
+      statuses: write
+    if: ${{ always() && needs.create_run.result == 'success' }}
+    runs-on: ubuntu-22.04
+    env:
+      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+      STATUS_OK: ${{ contains(fromJSON('["skipped", "success"]'), needs.build-doc.result) }}
+    steps:
+      - name: Get `build-doc` job status
+        run: |
+          echo "${{ needs.build-doc.result }}"
+          echo $STATUS_OK
+          if [ "$STATUS_OK" = "true" ]; then
+            echo "STATUS=success" >> $GITHUB_ENV
+          else
+            echo "STATUS=failure" >> $GITHUB_ENV
+          fi
+
+      - name: Update PR commit statuses
+        run: |
+          echo "${{ needs.build-doc.result }}"
+          echo "${STATUS}"
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            repos/${{ github.repository }}/statuses/${NEEDS_GET_PR_INFO_OUTPUTS_PR_HEAD_SHA} \
+            -f "target_url=$GITHUB_RUN_URL" -f "state=${STATUS}" -f "description=Custom doc building job" -f "context=custom-doc-build"
+        env:
+          NEEDS_GET_PR_INFO_OUTPUTS_PR_HEAD_SHA: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
--- a/.github/workflows/pr_slow_ci_suggestion.yml
+++ b/.github/workflows/pr_slow_ci_suggestion.yml
@@ -0,0 +1,188 @@
+name: PR slow CI - Suggestion
+on:
+  pull_request_target:
+    types: [opened, synchronize, reopened]
+
+permissions:
+  contents: read
+
+jobs:
+  get-pr-number:
+    name: Get PR number
+    uses: ./.github/workflows/get-pr-number.yml
+
+  get-pr-info:
+    name: Get PR commit SHA
+    needs: get-pr-number
+    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
+    uses: ./.github/workflows/get-pr-info.yml
+    with:
+      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+
+  get-jobs:
+    name: Get test files to run
+    runs-on: ubuntu-22.04
+    needs: [get-pr-number, get-pr-info]
+    outputs:
+      jobs: ${{ steps.get_jobs.outputs.jobs_to_run }}
+    steps:
+      # This checkout to the main branch
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: "0"
+          persist-credentials: false
+
+      # Refetch the PR file list from the API instead of receiving it as an
+      # input from `get-pr-info.yml`. The previous approaches either expanded
+      # attacker-controlled content into a shell/JS source via `${{ ... }}`
+      # (template injection, fixed in #45956) or passed the full JSON through
+      # an env var (`E2BIG` / "Argument list too long" once the patch payload
+      # exceeds the per-env-var kernel limit, ~128KB). Fetching inside the
+      # action keeps the JSON in Node heap memory and writes it straight to
+      # disk, so it never crosses an execve argv/envp boundary.
+      - name: Write pr_files file
+        uses: actions/github-script@d7906e4ad0b1822421a7e6a35d5ca353c962f410 # v6.4.1
+        env:
+          PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+        with:
+          script: |
+            const fs = require('node:fs');
+            const files = await github.paginate(github.rest.pulls.listFiles, {
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              pull_number: parseInt(process.env.PR_NUMBER, 10),
+            });
+            fs.writeFileSync('pr_files.txt', JSON.stringify(files));
+
+      - name: Get repository content
+        id: repo_content
+        uses: actions/github-script@d7906e4ad0b1822421a7e6a35d5ca353c962f410 # v6.4.1
+        env:
+          PR_HEAD_REPO_OWNER: ${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}
+          PR_HEAD_REPO_NAME: ${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}
+          PR_HEAD_SHA: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
+        with:
+          script: |
+            const fs = require('node:fs');
+            const { PR_HEAD_REPO_OWNER, PR_HEAD_REPO_NAME, PR_HEAD_SHA } = process.env;
+
+            const { data: tests_dir } = await github.rest.repos.getContent({
+              owner: PR_HEAD_REPO_OWNER,
+              repo: PR_HEAD_REPO_NAME,
+              path: 'tests',
+              ref: PR_HEAD_SHA,
+            });
+
+            const { data: tests_models_dir } = await github.rest.repos.getContent({
+              owner: PR_HEAD_REPO_OWNER,
+              repo: PR_HEAD_REPO_NAME,
+              path: 'tests/models',
+              ref: PR_HEAD_SHA,
+            });
+
+            const { data: tests_quantization_dir } = await github.rest.repos.getContent({
+              owner: PR_HEAD_REPO_OWNER,
+              repo: PR_HEAD_REPO_NAME,
+              path: 'tests/quantization',
+              ref: PR_HEAD_SHA,
+            });
+
+            // Write to files instead of outputs
+            fs.writeFileSync('tests_dir.txt', JSON.stringify(tests_dir, null, 2));
+            fs.writeFileSync('tests_models_dir.txt', JSON.stringify(tests_models_dir, null, 2));
+            fs.writeFileSync('tests_quantization_dir.txt', JSON.stringify(tests_quantization_dir, null, 2));
+
+      - name: Run script to get jobs to run
+        id: get_jobs
+        run: |
+          python utils/get_pr_run_slow_jobs.py | tee output.txt
+          echo "jobs_to_run: $(tail -n 1 output.txt)"
+          echo "jobs_to_run=$(tail -n 1 output.txt)" >> $GITHUB_OUTPUT
+
+  send_comment:
+    # Will delete the previous comment and send a new one if:
+    #   - either the content is changed
+    #   - or the previous comment is 30 minutes or more old
+    name: Send a comment to suggest jobs to run
+    if: ${{ needs.get-jobs.outputs.jobs != '' }}
+    needs: [get-pr-number, get-jobs]
+    permissions:
+      pull-requests: write
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Check and update comment if needed
+        uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b # v7.1.0
+        env:
+          BODY: "\n\nrun-slow: ${{ needs.get-jobs.outputs.jobs }}"
+          PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+        with:
+          script: |
+            const prNumber = parseInt(process.env.PR_NUMBER, 10);
+            const commentPrefix = "**[For maintainers]** Suggested jobs to run (before merge)";
+            const thirtyMinutesAgo = new Date(Date.now() - 30 * 60 * 1000); // 30 minutes ago
+            const newBody = `${commentPrefix}${process.env.BODY}`;
+
+            // Get all comments on the PR
+            const { data: comments } = await github.rest.issues.listComments({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: prNumber
+            });
+
+            // Find existing comments that start with our prefix
+            const existingComments = comments.filter(comment =>
+              comment.user.login === 'github-actions[bot]' &&
+              comment.body.startsWith(commentPrefix)
+            );
+
+            let shouldCreateNewComment = true;
+            let commentsToDelete = [];
+
+            if (existingComments.length > 0) {
+              // Get the most recent comment
+              const mostRecentComment = existingComments
+                .sort((a, b) => new Date(b.created_at) - new Date(a.created_at))[0];
+
+              const commentDate = new Date(mostRecentComment.created_at);
+              const isOld = commentDate < thirtyMinutesAgo;
+              const isDifferentContent = mostRecentComment.body !== newBody;
+
+              console.log(`Most recent comment created: ${mostRecentComment.created_at}`);
+              console.log(`Is older than 30 minutes: ${isOld}`);
+              console.log(`Has different content: ${isDifferentContent}`);
+
+              if (isOld || isDifferentContent) {
+                // Delete all existing comments and create new one
+                commentsToDelete = existingComments;
+                console.log(`Will delete ${commentsToDelete.length} existing comment(s) and create new one`);
+              } else {
+                // Content is same and comment is recent, skip
+                shouldCreateNewComment = false;
+                console.log('Comment is recent and content unchanged, skipping update');
+              }
+            } else {
+              console.log('No existing comments found, will create new one');
+            }
+
+            // Delete old comments if needed
+            for (const comment of commentsToDelete) {
+              console.log(`Deleting comment #${comment.id} (created: ${comment.created_at})`);
+              await github.rest.issues.deleteComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                comment_id: comment.id
+              });
+            }
+
+            // Create new comment if needed
+            if (shouldCreateNewComment) {
+              await github.rest.issues.createComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                issue_number: prNumber,
+                body: newBody
+              });
+              console.log('✅ New comment created');
+            } else {
+              console.log('ℹ️ No comment update needed');
+            }
--- a/.github/workflows/push-important-models.yml
+++ b/.github/workflows/push-important-models.yml
@@ -0,0 +1,162 @@
+name: Slow tests on important models (on Push - A10)
+
+on:
+  push:
+    branches: [ main ]
+
+permissions:
+  contents: read
+
+jobs:
+  get_modified_models:
+    name: "Get all modified files"
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+    steps:
+      - name: Check out code
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+
+      - name: Get changed files using `actions/github-script`
+        id: get-changed-files
+        uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b # v7.1.0
+        with:
+          script: |
+            let files = [];
+            
+            // Only handle push events
+            if (context.eventName === 'push') {
+              const afterSha = context.payload.after;
+              const branchName = context.payload.ref.replace('refs/heads/', '');
+              
+              let baseSha;
+              
+              if (branchName === 'main') {
+                console.log('Push to main branch, comparing to parent commit');
+                // Get the parent commit of the pushed commit
+                const { data: commit } = await github.rest.repos.getCommit({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  ref: afterSha
+                });
+                baseSha = commit.parents[0]?.sha;
+                if (!baseSha) {
+                  throw new Error('No parent commit found for the pushed commit');
+                }
+              } else {
+                console.log(`Push to branch ${branchName}, comparing to main`);
+                baseSha = 'main';
+              }
+              
+              const { data: comparison } = await github.rest.repos.compareCommits({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                base: baseSha,
+                head: afterSha
+              });
+              
+              // Include added, modified, and renamed files
+              files = comparison.files
+                .filter(file => file.status === 'added' || file.status === 'modified' || file.status === 'renamed')
+                .map(file => file.filename);
+            }
+            
+            // Include all files under src/transformers/ (not just models subdirectory)
+            const filteredFiles = files.filter(file => 
+              file.startsWith('src/transformers/')
+            );
+            
+            core.setOutput('changed_files', filteredFiles.join(' '));
+            core.setOutput('any_changed', filteredFiles.length > 0 ? 'true' : 'false');
+
+      - name: Parse changed files with Python
+        if: steps.get-changed-files.outputs.any_changed == 'true'
+        env:
+          CHANGED_FILES: ${{ steps.get-changed-files.outputs.changed_files }}
+        id: set-matrix
+        run: |
+          python3 - << 'EOF'
+          import os
+          import sys
+          import json
+          
+          # Add the utils directory to Python path
+          sys.path.insert(0, 'utils')
+          
+          # Import the important models list
+          from important_files import IMPORTANT_MODELS
+          
+          print(f"Important models: {IMPORTANT_MODELS}")
+          
+          # Get the changed files from the previous step
+          changed_files_str = os.environ.get('CHANGED_FILES', '')
+          changed_files = changed_files_str.split() if changed_files_str else []
+          
+          # Filter to only Python files
+          python_files = [f for f in changed_files if f.endswith('.py')]
+          print(f"Python files changed: {python_files}")
+          
+          result_models = set()
+          
+          # Specific files that trigger all models
+          transformers_utils_files = [
+              'modeling_utils.py',
+              'modeling_rope_utils.py', 
+              'modeling_flash_attention_utils.py',
+              'modeling_attn_mask_utils.py',
+              'cache_utils.py',
+              'masking_utils.py',
+              'pytorch_utils.py'
+          ]
+          
+          # Single loop through all Python files
+          for file in python_files:
+              # Check for files under src/transformers/models/
+              if file.startswith('src/transformers/models/'):
+                  remaining_path = file[len('src/transformers/models/'):]
+                  if '/' in remaining_path:
+                      model_dir = remaining_path.split('/')[0]
+                      if model_dir in IMPORTANT_MODELS:
+                          result_models.add(model_dir)
+                          print(f"Added model directory: {model_dir}")
+              
+              # Check for specific files under src/transformers/ or src/transformers/generation/ files
+              elif file.startswith('src/transformers/generation/') or \
+                   (file.startswith('src/transformers/') and os.path.basename(file) in transformers_utils_files):
+                  print(f"Found core file: {file} - including all important models")
+                  result_models.update(IMPORTANT_MODELS)
+                  break  # No need to continue once we include all models
+          
+          # Convert to sorted list and create matrix
+          result_list = sorted(list(result_models))
+          print(f"Final model list: {result_list}")
+          
+          if result_list:
+              matrix_json = json.dumps(result_list)
+              print(f"matrix={matrix_json}")
+              
+              # Write to GITHUB_OUTPUT
+              with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
+                  f.write(f"matrix={matrix_json}\n")
+          else:
+              print("matrix=[]")
+              with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
+                  f.write("matrix=[]\n")
+          EOF
+
+  model-ci:
+    name: Model CI
+    uses: ./.github/workflows/self-scheduled.yml
+    needs: get_modified_models
+    if: needs.get_modified_models.outputs.matrix != '' && needs.get_modified_models.outputs.matrix != '[]'
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#transformers-ci-push"
+      docker: huggingface/transformers-all-latest-gpu:flash-attn
+      ci_event: push
+      report_repo_id: hf-internal-testing/transformers_ci_push
+      commit_sha: ${{ github.sha }}
+      subdirs: ${{ needs.get_modified_models.outputs.matrix }}
+    secrets: inherit
--- a/.github/workflows/release-conda.yml
+++ b/.github/workflows/release-conda.yml
@@ -0,0 +1,52 @@
+name: Release - Conda
+
+on:
+  push:
+    tags:
+      - v*
+    branches:
+      - conda_*
+
+env:
+  ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_API_TOKEN }}
+
+permissions:
+  contents: read
+
+jobs:
+  build_and_package:
+    runs-on: ubuntu-22.04
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+
+      - name: Install miniconda
+        uses: conda-incubator/setup-miniconda@9f54435e0e72c53962ee863144e47a4b094bfd35  # v2.3.0
+        with:
+          auto-update-conda: true
+          auto-activate-base: false
+          python-version: 3.8
+          activate-environment: "build-transformers"
+          channels: huggingface
+
+      - name: Setup conda env
+        run: |
+          conda install -c defaults anaconda-client conda-build
+
+      - name: Extract version
+        run: echo "TRANSFORMERS_VERSION=`python setup.py --version`" >> $GITHUB_ENV
+
+      - name: Build conda packages
+        run: |
+          conda info
+          conda list
+          conda-build .github/conda
+
+      - name: Upload to Anaconda
+        run: anaconda upload `conda-build .github/conda --output` --force
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -0,0 +1,72 @@
+name: Release
+on:
+  push:
+    tags:
+      - v*
+    branches:
+      - 'v*-release'
+
+permissions:
+  contents: read
+
+jobs:
+  build_and_test:
+    name: build release
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+
+      - name: set up python
+        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5.6.0
+        with:
+          python-version: "3.13"
+
+      - run: pip install setuptools
+      - run: pip install -e .
+      - run: make build-release
+
+      - run: pip uninstall -y transformers
+      - run: pip install dist/*.whl
+
+      - run: python -c "from transformers import *"
+
+      - run: pip install -e .[torch]
+      - run: python -c "from transformers import pipeline; classifier = pipeline('text-classification'); assert classifier('What a nice release')[0]['score'] > 0"
+
+      - run: pip install twine
+      - run: twine check --strict dist/*
+
+      - name: Upload build artifacts
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4.6.2
+        with:
+          name: python-dist
+          path: |
+            dist/**
+            build/**
+
+  upload_package:
+    needs: build_and_test
+    if: startsWith(github.ref, 'refs/tags/')
+    runs-on: ubuntu-latest
+    environment: pypi-release
+    permissions:
+      id-token: write
+
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+
+      - name: Download build artifacts
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4.3.0
+        with:
+          name: python-dist
+          path: .
+
+      - name: Publish package distributions to TestPyPI
+        uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e  # release/v1
+        with:
+          verbose: true
+
--- a/.github/workflows/self-comment-ci.yml
+++ b/.github/workflows/self-comment-ci.yml
@@ -0,0 +1,472 @@
+name: PR comment GitHub CI
+
+on:
+  issue_comment:
+    types:
+      - created
+    branches-ignore:
+      - main
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.issue.number }}-${{ startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow') }}
+  cancel-in-progress: true
+permissions:
+  contents: read
+
+env:
+  HF_HOME: /mnt/cache
+  TRANSFORMERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+  # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
+  # This token is created under the bot `hf-transformers-bot`.
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  TF_FORCE_GPU_ALLOW_GROWTH: true
+  CUDA_VISIBLE_DEVICES: 0,1
+
+
+jobs:
+  get-pr-number:
+    name: Get PR number
+    if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap", "3outeille", "IlyasMoutawwakil", "tarekziade", "yonigozlan", "guarin"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
+    uses: ./.github/workflows/get-pr-number.yml
+
+  get-pr-info:
+    name: Get PR commit SHA
+    needs: get-pr-number
+    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
+    uses: ./.github/workflows/get-pr-info.yml
+    with:
+      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+
+  check-timestamps:
+    name: Check timestamps (security check)
+    runs-on: ubuntu-22.04
+    needs: get-pr-info
+    outputs:
+      PR_HEAD_SHA: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
+      PR_MERGE_SHA: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
+    steps:
+      - name: Verify `merge_commit` timestamp is older than the issue comment timestamp
+        env:
+          COMMENT_DATE: ${{ github.event.comment.created_at }}
+          PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
+        run: |
+            COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
+            echo "COMMENT_DATE: $COMMENT_DATE"
+            echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
+            if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
+              echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
+              exit -1;
+            fi
+
+  # use a python script to handle this complex logic.
+  get-tests:
+    runs-on: ubuntu-22.04
+    needs: [get-pr-number, check-timestamps]
+    outputs:
+      models: ${{ steps.models_to_run.outputs.models }}
+      quantizations: ${{ steps.models_to_run.outputs.quantizations }}
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: "0"
+          ref: "refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge"
+          persist-credentials: false
+
+      - name: Verify merge commit SHA
+        env:
+          VERIFIED_PR_MERGE_SHA: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
+        run: |
+            PR_MERGE_SHA=$(git log -1 --format=%H)
+            if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
+              echo "The merged commit SHA is not the same as the verified one! Security issue detected, abort the workflow!";
+              exit -1;
+            fi
+
+      - name: Get models to test
+        env:
+          PR_COMMENT: ${{ github.event.comment.body }}
+        run: |
+          python -m pip install GitPython
+          python utils/pr_slow_ci_models.py --message "$PR_COMMENT" | tee output.txt
+          echo "models=$(tail -n 1 output.txt)" >> $GITHUB_ENV
+          python utils/pr_slow_ci_models.py --message "$PR_COMMENT" --quantization | tee output2.txt
+          echo "quantizations=$(tail -n 1 output2.txt)" >> $GITHUB_ENV
+
+      - name: Show models to test
+        id: models_to_run
+        run: |
+          echo "$models"
+          echo "models=$models" >> $GITHUB_OUTPUT
+          echo "$quantizations"
+          echo "quantizations=$quantizations" >> $GITHUB_OUTPUT
+
+  # Report back if we are not able to get the tests (for example, security check is failing)
+  report_error_earlier:
+    name: Report error earlier
+    if: ${{ always() && needs.get-pr-info.result == 'success' && needs.get-tests.result != 'success' }}
+    needs: [get-pr-number, get-pr-info, get-tests]
+    permissions:
+      pull-requests: write
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Reply to the comment
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+          PREFIX: '[Workflow Run ⚙️](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})\n\n'
+          github_repository: ${{ github.repository }}
+          pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+        run: |
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            "repos/${github_repository}/issues/${pr_number}/comments" \
+            -f body="$(echo -e "$PREFIX")💔 This comment contains \`run-slow\`, but unknown error occurred and [the workflow run]($GITHUB_RUN_URL) aborted!"
+
+  reply_to_comment:
+    name: Reply to the comment
+    if: ${{ needs.get-tests.outputs.models != '[]'  || needs.get-tests.outputs.quantizations != '[]' }}
+    needs: [get-pr-number, get-tests]
+    permissions:
+      pull-requests: write
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Reply to the comment
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PREFIX: '[Workflow Run ⚙️](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})'
+          INFO: '\n\nThis comment contains `run-slow`, running the specified jobs'
+          BODY: '\n\nmodels: ${{ needs.get-tests.outputs.models }}\nquantizations: ${{ needs.get-tests.outputs.quantizations }}'
+          github_repository: ${{ github.repository }}
+          pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+        run: |
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            "repos/${github_repository}/issues/${pr_number}/comments" \
+            -f body="$(echo -e "$PREFIX")$(echo -e "$INFO"): $(echo -e "$BODY")"
+
+  create_run:
+    name: Create run
+    needs: [check-timestamps, reply_to_comment]
+    permissions:
+      statuses: write
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Create Run
+        id: create_run
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          # Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.
+          # See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status
+          GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+          github_repository: ${{ github.repository }}
+          pr_head_sha: ${{ needs.check-timestamps.outputs.PR_HEAD_SHA }}
+        run: |
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            "repos/${github_repository}/statuses/${pr_head_sha}" \
+            -f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Slow CI job" -f "context=pytest/custom-tests"
+
+  model-ci:
+    name: Model CI
+    if: ${{ needs.get-tests.outputs.models != '[]' }}
+    uses: ./.github/workflows/self-scheduled.yml
+    needs: [get-pr-number, check-timestamps, get-tests, create_run]
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#transformers-ci-pr"
+      docker: huggingface/transformers-all-latest-gpu
+      ci_event: PR Comment CI
+      report_repo_id: hf-internal-testing/transformers_pr_ci
+      commit_sha: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
+      subdirs: ${{ needs.get-tests.outputs.models }}
+      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+    secrets: inherit
+
+  quantization-ci:
+    name: Quantization CI
+    if: ${{ needs.get-tests.outputs.quantizations != '[]' }}
+    uses: ./.github/workflows/self-scheduled.yml
+    needs: [get-pr-number, check-timestamps, get-tests, create_run]
+    with:
+      job: run_quantization_torch_gpu
+      slack_report_channel: "#transformers-ci-pr"
+      docker: huggingface/transformers-quantization-latest-gpu
+      ci_event: PR Comment CI
+      report_repo_id: hf-internal-testing/transformers_pr_ci
+      commit_sha: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
+      subdirs: ${{ needs.get-tests.outputs.quantizations }}
+      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+    secrets: inherit
+
+  report:
+    name: Check & Report
+    needs: [get-pr-number, get-pr-info, check-timestamps, create_run, model-ci, quantization-ci]
+    permissions:
+      pull-requests: write
+      statuses: write
+    if: ${{ always() && needs.create_run.result == 'success' }}
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+        with:
+          pattern: new_failures_with_bad_commit_{run_models_gpu,run_quantization_torch_gpu}
+          path: ./new_failures
+          merge-multiple: false
+
+      - name: List downloaded artifacts
+        run: |
+          echo "Downloaded artifact files:"
+          if [ -d "./new_failures/" ]; then
+            find ./new_failures/ -type f
+          else
+            echo "No artifacts downloaded (directory doesn't exist)"
+          fi
+
+      - name: Show reports from jobs
+        run: |
+          echo "=== Model CI Report ==="
+          if [ -f "./new_failures/new_failures_with_bad_commit_run_models_gpu/new_failures_with_bad_commit.json" ]; then
+            cat ./new_failures/new_failures_with_bad_commit_run_models_gpu/new_failures_with_bad_commit.json
+          else
+            echo "No model CI report found"
+          fi
+          
+          echo ""
+          echo "=== Quantization CI Report ==="
+          if [ -f "./new_failures/new_failures_with_bad_commit_run_quantization_torch_gpu/new_failures_with_bad_commit.json" ]; then
+            cat ./new_failures/new_failures_with_bad_commit_run_quantization_torch_gpu/new_failures_with_bad_commit.json
+          else
+            echo "No quantization CI report found"
+          fi
+
+      - name: Process and filter reports
+        run: |
+          # Preprocess with Python
+          python3 << 'PYTHON_SCRIPT'
+          import json
+          import os
+          from pathlib import Path
+          
+          def count_failures(data):
+            """
+            Count total number of failures (excluding None commits)
+            """
+            total = 0
+            for model, model_result in data.items():
+                for device, failures in model_result.items():
+                    # Count failures where commit is not None
+                    total += sum(
+                        1 for failure in failures 
+                        if isinstance(failure, dict) and failure.get('bad_commit') is not None
+                    )
+            return total
+          
+          def filter_and_format_report(data):
+            """
+            Filter out entries where commit is `None` (failing tests who status is not certain) and format as text
+            """
+            lines = []
+            
+            for model, model_result in data.items():
+                model_lines = []
+                for device, failures in model_result.items():
+                    
+                    filtered_failures = [
+                        failure for failure in failures 
+                        if isinstance(failure, dict) and failure.get('bad_commit') is not None
+                    ]
+
+                    # Add tests to model lines
+                    for idx, failure in enumerate(filtered_failures):
+                        if idx == 0:
+                            job_link = failure['job_link']
+                            model_lines.append(f"- [{model}]({job_link}):")
+        
+                        test_name = failure['test']
+                        status = failure.get('status', '')
+                        if "DIFFERENT error message" in status:
+                            indicator = "(❌ ⟹ ❌)"
+                        else:
+                            indicator = "(✅ ⟹ ❌)"
+        
+                        model_lines.append(f"    {test_name} {indicator}")
+        
+                if len(model_lines) > 0:
+                    lines.extend(model_lines)
+                    lines.append("")  # Empty line between models
+            
+            return "\n".join(lines).strip()
+          
+          # Read reports from downloaded artifact files
+          model_report_path = Path('./new_failures/new_failures_with_bad_commit_run_models_gpu/new_failures_with_bad_commit.json')
+          quant_report_path = Path('./new_failures/new_failures_with_bad_commit_run_quantization_torch_gpu/new_failures_with_bad_commit.json')
+
+          # Read URL files if they exist
+          model_url_path = Path('./new_failures/new_failures_with_bad_commit_run_models_gpu/new_failures_with_bad_commit_url.txt')
+          quant_url_path = Path('./new_failures/new_failures_with_bad_commit_run_quantization_torch_gpu/new_failures_with_bad_commit_url.txt')
+
+          model_url = None
+          if model_url_path.exists():
+              with open(model_url_path, 'r') as f:
+                  model_url = f.read().strip()
+          
+          quant_url = None
+          if quant_url_path.exists():
+              with open(quant_url_path, 'r') as f:
+                  quant_url = f.read().strip()
+
+          model_report = {}
+          if model_report_path.exists():
+              with open(model_report_path, 'r') as f:
+                  model_report = json.load(f)
+          
+          quant_report = {}
+          if quant_report_path.exists():
+              with open(quant_report_path, 'r') as f:
+                  quant_report = json.load(f)
+          
+          # Count failures
+          model_count = count_failures(model_report)
+          quant_count = count_failures(quant_report)
+          
+          formatted_model = filter_and_format_report(model_report)
+          formatted_quant = filter_and_format_report(quant_report)
+          
+          # Write to files
+          with open('model_ci.txt', 'w') as f:
+              if formatted_model:
+                  if model_url:
+                      f.write(f"❌ **[{model_count} new failed tests from this PR]({model_url})** 😭\n\n")
+                  else:
+                      f.write(f"❌ **{model_count} new failed tests from this PR** 😭\n\n")
+                  f.write(formatted_model)
+                  f.write('\n')
+          
+          with open('quantization_ci.txt', 'w') as f:
+              if formatted_quant:
+                  if quant_url:
+                      f.write(f"❌ **[{quant_count} new failed tests from this PR]({quant_url})** 😭\n\n")
+                  else:
+                      f.write(f"❌ **{quant_count} new failed tests from this PR** 😭\n\n")
+                  f.write(formatted_quant)
+                  f.write('\n')
+          PYTHON_SCRIPT
+
+      - name: Post results as PR comment
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+          github_repository: ${{ github.repository }}
+          pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
+          pr_head_repo: ${{ needs.get-pr-info.outputs.PR_HEAD_REPO_FULL_NAME }}
+          run_commit: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
+          pr_commit: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
+          main_commit: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_BASE_SHA }}
+          model_ci_result: ${{ needs.model-ci.result }}
+          quantization_ci_result: ${{ needs.quantization-ci.result }}
+          model_infrastructure_ok: ${{ needs.model-ci.outputs.is_infrastructure_ok }}
+          quant_infrastructure_ok: ${{ needs.quantization-ci.outputs.is_infrastructure_ok }}
+        run: |
+          # Create commit links (8 chars display, full SHA in URL)
+          run_commit_link="[${run_commit:0:8}](https://github.com/${github_repository}/commit/${run_commit})"
+          pr_commit_link="[${pr_commit:0:8}](https://github.com/${pr_head_repo}/commit/${pr_commit})"
+          main_commit_link="[${main_commit:0:8}](https://github.com/${github_repository}/commit/${main_commit})"
+
+          {
+            echo '## CI Results'
+            echo "[Workflow Run ⚙️]($GITHUB_RUN_URL)"
+            echo ''
+
+            echo '### Commit Info'
+            echo "| Context | Commit | Description |"
+            echo "|---------|--------|-------------|"
+            echo "| RUN | ${run_commit_link} | workflow commit (merge commit) |"
+            echo "| PR | ${pr_commit_link} | branch commit (from PR) |"
+            echo "| main | ${main_commit_link} | base commit (on \`main\`) |"
+            echo ''
+
+            # Check infrastructure health - simple and clean!
+            infrastructure_error=false
+            
+            # Model CI
+            if [[ "$model_ci_result" != "skipped" && "$model_ci_result" != "cancelled" ]]; then
+              if [[ "$model_infrastructure_ok" == "false" ]]; then
+                echo "⚠️ **Model CI failed to report results**"
+                echo ''
+                infrastructure_error=true
+              fi
+            fi
+            
+            # Quantization CI
+            if [[ "$quantization_ci_result" != "skipped" && "$quantization_ci_result" != "cancelled" ]]; then
+              if [[ "$quant_infrastructure_ok" == "false" ]]; then
+                echo "⚠️ **Quantization CI failed to report results**"
+                echo ''
+                infrastructure_error=true
+              fi
+            fi
+            
+            if [[ "$infrastructure_error" == "true" ]]; then
+              echo 'The test failure analysis could not be completed. Please check the [workflow run]('$GITHUB_RUN_URL') for details.'
+              echo "STATUS=error" >> $GITHUB_ENV
+      
+            # Check if both jobs were skipped or cancelled
+            elif [[ "$model_ci_result" == "skipped" || "$model_ci_result" == "cancelled" ]] && \
+               [[ "$quantization_ci_result" == "skipped" || "$quantization_ci_result" == "cancelled" ]]; then
+              echo '⚠️ No test being reported (jobs are skipped or cancelled)!'
+              echo "STATUS=error" >> $GITHUB_ENV
+
+            # Check if either file has content
+            elif [ -s model_ci.txt ] || [ -s quantization_ci.txt ]; then
+              echo "STATUS=failure" >> $GITHUB_ENV
+
+              # Check if model_ci.txt has content
+              if [ -s model_ci.txt ]; then
+                echo '### Model CI Report'
+                echo ''
+                cat model_ci.txt
+                echo ''
+              fi
+              
+              # Check if quantization_ci.txt has content
+              if [ -s quantization_ci.txt ]; then
+                echo '### Quantization CI Report'
+                echo ''
+                cat quantization_ci.txt
+                echo ''
+              fi
+            else
+              echo "STATUS=success" >> $GITHUB_ENV
+              echo '✅ No failing test specific to this PR 🎉 👏 !'
+            fi
+          } > comment_body.txt
+
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            "repos/${github_repository}/issues/${pr_number}/comments" \
+            -F body=@comment_body.txt
+
+      - name: Update PR commit statuses
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
+          github_repository: ${{ github.repository }}
+          pr_head_sha: ${{ needs.check-timestamps.outputs.PR_HEAD_SHA }}
+        # The env. variable `STATUS` used here is set in the previous step
+        run: |
+          gh api \
+            --method POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            "repos/${github_repository}/statuses/${pr_head_sha}" \
+            -f "target_url=$GITHUB_RUN_URL" -f "state=$STATUS" -f "description=Slow CI job" -f "context=pytest/custom-tests"
--- a/.github/workflows/self-nightly-caller.yml
+++ b/.github/workflows/self-nightly-caller.yml
@@ -0,0 +1,63 @@
+name: Nvidia CI with nightly torch
+
+on:
+  repository_dispatch:
+  # triggered when the daily scheduled Nvidia CI is completed.
+  # This way, we can compare the results more easily.
+  workflow_run:
+    workflows: ["Nvidia CI"]
+    branches: ["main"]
+    types: [completed]
+  push:
+    branches:
+      - run_ci_with_nightly_torch*
+
+# Used for `push` to easily modify the target workflow runs to compare against
+env:
+    prev_workflow_run_id: ""
+    other_workflow_run_id: ""
+
+
+permissions:
+  contents: read
+
+jobs:
+  build_nightly_torch_ci_images:
+    name: Build CI Docker Images with nightly torch
+    uses: ./.github/workflows/build-nightly-ci-docker-images.yml
+    with:
+      job: latest-with-torch-nightly-docker
+    secrets: inherit
+
+  setup:
+    name: Setup
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Setup
+        env:
+          PREV_WORKFLOW_RUN_ID: ${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}
+          OTHER_WORKFLOW_RUN_ID: ${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}
+        run: |
+          mkdir "setup_values"
+          echo "$PREV_WORKFLOW_RUN_ID" > "setup_values/prev_workflow_run_id.txt"
+          echo "$OTHER_WORKFLOW_RUN_ID" > "setup_values/other_workflow_run_id.txt"
+
+      - name: Upload artifacts
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: setup_values
+          path: setup_values
+
+  model-ci:
+    name: Model CI
+    needs: build_nightly_torch_ci_images
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#transformers-ci-past-future"
+      docker: huggingface/transformers-all-latest-torch-nightly-gpu
+      ci_event: Nightly CI
+      runner_type: "a10"
+      report_repo_id: hf-internal-testing/transformers_daily_ci_with_torch_nightly
+      commit_sha: ${{ github.event.workflow_run.head_sha || github.sha }}
+    secrets: inherit
--- a/.github/workflows/self-nightly-past-ci-caller.yml
+++ b/.github/workflows/self-nightly-past-ci-caller.yml
@@ -0,0 +1,102 @@
+name: Self-hosted runner (nightly-past-ci-caller)
+
+on:
+  schedule:
+    - cron: "17 2,14 * * *"
+  push:
+    branches:
+      - run_past_ci*
+
+permissions:
+  contents: read
+
+jobs:
+  get_number:
+    name: Get number
+    runs-on: ubuntu-22.04
+    outputs:
+      run_number: ${{ steps.get_number.outputs.run_number }}
+    steps:
+      - name: Get number
+        id: get_number
+        run: |
+          echo "${{ github.run_number }}"
+          echo "$(python3 -c 'print(int(${{ github.run_number }}) % 10)')"
+          echo "run_number=$(python3 -c 'print(int(${{ github.run_number }}) % 10)')" >> $GITHUB_OUTPUT
+
+  run_past_ci_tensorflow_2-11:
+    name: TensorFlow 2.11
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 3 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
+    with:
+      framework: tensorflow
+      version: "2.11"
+      sha: ${{ github.sha }}
+    secrets: inherit
+
+  run_past_ci_tensorflow_2-10:
+    name: TensorFlow 2.10
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 4 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
+    with:
+      framework: tensorflow
+      version: "2.10"
+      sha: ${{ github.sha }}
+    secrets: inherit
+
+  run_past_ci_tensorflow_2-9:
+    name: TensorFlow 2.9
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 5 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
+    with:
+      framework: tensorflow
+      version: "2.9"
+      sha: ${{ github.sha }}
+    secrets: inherit
+
+  run_past_ci_tensorflow_2-8:
+    name: TensorFlow 2.8
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 6 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
+    with:
+      framework: tensorflow
+      version: "2.8"
+      sha: ${{ github.sha }}
+    secrets: inherit
+
+  run_past_ci_tensorflow_2-7:
+    name: TensorFlow 2.7
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 7 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
+    with:
+      framework: tensorflow
+      version: "2.7"
+      sha: ${{ github.sha }}
+    secrets: inherit
+
+  run_past_ci_tensorflow_2-6:
+    name: TensorFlow 2.6
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 8 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
+    with:
+      framework: tensorflow
+      version: "2.6"
+      sha: ${{ github.sha }}
+    secrets: inherit
+
+  run_past_ci_tensorflow_2-5:
+    name: TensorFlow 2.5
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 9 &&  (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
+    with:
+      framework: tensorflow
+      version: "2.5"
+      sha: ${{ github.sha }}
+    secrets: inherit
--- a/.github/workflows/self-past-caller.yml
+++ b/.github/workflows/self-past-caller.yml
@@ -0,0 +1,43 @@
+name: Self-hosted runner (past-ci)
+
+
+on:
+  workflow_call:
+    inputs:
+      framework:
+        required: true
+        type: string
+      version:
+        required: true
+        type: string
+      # Use this to control the commit to test against
+      sha:
+        default: 'main'
+        required: false
+        type: string
+
+permissions:
+  contents: read
+
+jobs:
+  model-ci:
+    name: Model CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#transformers-ci-past-future"
+      runner: past-ci
+      docker: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu
+      ci_event: Past CI - ${{ inputs.framework }}-${{ inputs.version }}
+    secrets: inherit
+
+  deepspeed-ci:
+    name: DeepSpeed CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_torch_cuda_extensions_gpu
+      slack_report_channel: "#transformers-ci-past-future"
+      runner: past-ci
+      docker: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu
+      ci_event: Past CI - ${{ inputs.framework }}-${{ inputs.version }}
+    secrets: inherit
--- a/.github/workflows/self-scheduled-amd-caller.yml
+++ b/.github/workflows/self-scheduled-amd-caller.yml
@@ -0,0 +1,17 @@
+name: Self-hosted runner (AMD scheduled CI caller)
+
+on:
+  schedule:
+    - cron: "17 5 * * *"
+
+permissions:
+  contents: read
+
+jobs:
+  run_scheduled_amd_ci:
+    name: Trigger Scheduled AMD CI
+    runs-on: ubuntu-22.04
+    if: ${{ always() }}
+    steps:
+      - name: Trigger scheduled AMD CI via workflow_run
+        run: echo "Trigger scheduled AMD CI via workflow_run"
--- a/.github/workflows/self-scheduled-amd-mi250-caller.yml
+++ b/.github/workflows/self-scheduled-amd-mi250-caller.yml
@@ -0,0 +1,62 @@
+name: Self-hosted runner (AMD mi250 scheduled CI caller)
+
+on:
+  workflow_run:
+    workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
+    branches: ["main"]
+    types: [completed]
+  push:
+    branches:
+      - run_amd_scheduled_ci_caller*
+
+permissions:
+  contents: read
+
+jobs:
+  model-ci:
+    name: Model CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@63657f571a92cc9759159442936061c51d6d9ae4 # main
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#transformers-ci-daily-amd"
+      runner: mi250
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi250
+      report_repo_id: optimum-amd/transformers_daily_ci
+    secrets: inherit
+
+  torch-pipeline:
+    name: Torch pipeline CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@63657f571a92cc9759159442936061c51d6d9ae4 # main
+    with:
+      job: run_pipelines_torch_gpu
+      slack_report_channel: "#transformers-ci-daily-amd"
+      runner: mi250
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi250
+      report_repo_id: optimum-amd/transformers_daily_ci
+    secrets: inherit
+
+  example-ci:
+    name: Example CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@63657f571a92cc9759159442936061c51d6d9ae4 # main
+    with:
+      job: run_examples_gpu
+      slack_report_channel: "#transformers-ci-daily-amd"
+      runner: mi250
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi250
+      report_repo_id: optimum-amd/transformers_daily_ci
+    secrets: inherit
+
+  deepspeed-ci:
+    name: DeepSpeed CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@63657f571a92cc9759159442936061c51d6d9ae4 # main
+    with:
+      job: run_torch_cuda_extensions_gpu
+      slack_report_channel: "#transformers-ci-daily-amd"
+      runner: mi250
+      docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi250
+      report_repo_id: optimum-amd/transformers_daily_ci
+    secrets: inherit
--- a/.github/workflows/self-scheduled-amd-mi325-caller.yml
+++ b/.github/workflows/self-scheduled-amd-mi325-caller.yml
@@ -0,0 +1,72 @@
+name: Self-hosted runner scale set (AMD mi325 scheduled CI caller)
+
+# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
+# For example, 1gpu scale set: amd-mi325-ci-1gpu
+#              2gpu scale set: amd-mi325-ci-2gpu
+# Important: Do not pin the reusable workflow ref to a SHA. AMD runner groups only route jobs for
+# workflows referenced at @main; a pinned SHA causes jobs to wait for a runner until the 24h timeout.
+
+on:
+  workflow_run:
+    workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
+    branches: ["main"]
+    types: [completed]
+  push:
+    branches:
+      - run_amd_scheduled_ci_caller*
+
+permissions:
+  contents: read
+
+jobs:
+  model-ci:
+    name: Model CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#amd-hf-ci"
+      runner_group: amd-mi325
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi325
+      report_repo_id: optimum-amd/transformers_daily_ci
+      env_file: /etc/podinfo/gha-gpu-isolation-settings
+    secrets: inherit
+
+  torch-pipeline:
+    name: Torch pipeline CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
+    with:
+      job: run_pipelines_torch_gpu
+      slack_report_channel: "#amd-hf-ci"
+      runner_group: amd-mi325
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi325
+      report_repo_id: optimum-amd/transformers_daily_ci
+      env_file: /etc/podinfo/gha-gpu-isolation-settings
+    secrets: inherit
+
+  example-ci:
+    name: Example CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
+    with:
+      job: run_examples_gpu
+      slack_report_channel: "#amd-hf-ci"
+      runner_group: amd-mi325
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi325
+      report_repo_id: optimum-amd/transformers_daily_ci
+      env_file: /etc/podinfo/gha-gpu-isolation-settings
+    secrets: inherit
+
+  deepspeed-ci:
+    name: DeepSpeed CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
+    with:
+      job: run_torch_cuda_extensions_gpu
+      slack_report_channel: "#amd-hf-ci"
+      runner_group: amd-mi325
+      docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi325
+      report_repo_id: optimum-amd/transformers_daily_ci
+      env_file: /etc/podinfo/gha-gpu-isolation-settings
+    secrets: inherit
--- a/.github/workflows/self-scheduled-amd-mi355-caller.yml
+++ b/.github/workflows/self-scheduled-amd-mi355-caller.yml
@@ -0,0 +1,66 @@
+name: Self-hosted runner scale set (AMD mi355 scheduled CI caller)
+
+# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
+# For example, 1gpu : amd-mi355-ci-1gpu
+#              2gpu : amd-mi355-ci-2gpu
+ 
+on:
+  workflow_run:
+    workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
+    branches: ["main"]
+    types: [completed]
+  push:
+    branches:
+      - run_amd_scheduled_ci_caller*
+
+permissions:
+  contents: read
+
+jobs:
+  model-ci:
+    name: Model CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@63657f571a92cc9759159442936061c51d6d9ae4 # main
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#amd-hf-ci"
+      runner_group: hfc-amd-mi355
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi355
+      report_repo_id: hf-transformers-bot/transformers-ci-dummy
+    secrets: inherit
+
+  torch-pipeline:
+    name: Torch pipeline CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@63657f571a92cc9759159442936061c51d6d9ae4 # main
+    with:
+      job: run_pipelines_torch_gpu
+      slack_report_channel: "#amd-hf-ci"
+      runner_group: hfc-amd-mi355
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi355
+      report_repo_id: hf-transformers-bot/transformers-ci-dummy
+    secrets: inherit
+
+  example-ci:
+    name: Example CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@63657f571a92cc9759159442936061c51d6d9ae4 # main
+    with:
+      job: run_examples_gpu
+      slack_report_channel: "#amd-hf-ci"
+      runner_group: hfc-amd-mi355
+      docker: huggingface/transformers-pytorch-amd-gpu
+      ci_event: Scheduled CI (AMD) - mi355
+      report_repo_id: hf-transformers-bot/transformers-ci-dummy
+    secrets: inherit
+
+  deepspeed-ci:
+    name: DeepSpeed CI
+    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@63657f571a92cc9759159442936061c51d6d9ae4 # main
+    with:  
+      job: run_torch_cuda_extensions_gpu
+      slack_report_channel: "#amd-hf-ci"
+      runner_group: hfc-amd-mi355
+      docker: huggingface/testing-rocm7.0-preview
+      ci_event: Scheduled CI (AMD) - mi355
+      report_repo_id: hf-transformers-bot/transformers-ci-dummy
+    secrets: inherit
--- a/.github/workflows/self-scheduled-caller.yml
+++ b/.github/workflows/self-scheduled-caller.yml
@@ -0,0 +1,138 @@
+name: Nvidia CI
+
+on:
+  repository_dispatch:
+  schedule:
+    - cron: "17 2 * * *"
+  push:
+    branches:
+      - run_nvidia_ci*
+  workflow_dispatch:
+    inputs:
+      prev_workflow_run_id:
+        description: 'previous workflow run id to compare'
+        type: string
+        required: false
+        default: ""
+      other_workflow_run_id:
+        description: 'other workflow run id to compare'
+        type: string
+        required: false
+        default: ""
+
+
+# Used for `push` to easily modify the target workflow runs to compare against
+env:
+    prev_workflow_run_id: ""
+    other_workflow_run_id: ""
+
+
+permissions:
+  contents: read
+
+jobs:
+  setup:
+    name: Setup
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Setup
+        env:
+          prev_workflow_run_id: ${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}
+          other_workflow_run_id: ${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}
+        run: |
+          mkdir "setup_values"
+          echo "$prev_workflow_run_id" > "setup_values/prev_workflow_run_id.txt"
+          echo "$other_workflow_run_id" > "setup_values/other_workflow_run_id.txt"
+
+      - name: Upload artifacts
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: setup_values
+          path: setup_values
+
+  model-ci:
+    name: Model CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#transformers-ci-daily-models"
+      docker: huggingface/transformers-all-latest-gpu
+      ci_event: Daily CI
+      runner_type: "a10"
+      report_repo_id: hf-internal-testing/transformers_daily_ci
+      commit_sha: ${{ github.sha }}
+    secrets: inherit
+
+  torch-pipeline:
+    name: Torch pipeline CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_pipelines_torch_gpu
+      slack_report_channel: "#transformers-ci-daily-pipeline-torch"
+      docker: huggingface/transformers-all-latest-gpu
+      ci_event: Daily CI
+      report_repo_id: hf-internal-testing/transformers_daily_ci
+      commit_sha: ${{ github.sha }}
+    secrets: inherit
+
+  example-ci:
+    name: Example CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_examples_gpu
+      slack_report_channel: "#transformers-ci-daily-examples"
+      docker: huggingface/transformers-all-latest-gpu
+      ci_event: Daily CI
+      report_repo_id: hf-internal-testing/transformers_daily_ci
+      commit_sha: ${{ github.sha }}
+    secrets: inherit
+
+  trainer-fsdp-ci:
+    name: Trainer/FSDP CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_trainer_and_fsdp_gpu
+      slack_report_channel: "#transformers-ci-daily-training"
+      docker: huggingface/transformers-all-latest-gpu
+      runner_type: "a10"
+      ci_event: Daily CI
+      report_repo_id: hf-internal-testing/transformers_daily_ci
+      commit_sha: ${{ github.sha }}
+    secrets: inherit
+
+  deepspeed-ci:
+    name: DeepSpeed CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_torch_cuda_extensions_gpu
+      slack_report_channel: "#transformers-ci-daily-training"
+      docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
+      ci_event: Daily CI
+      working-directory-prefix: /workspace
+      report_repo_id: hf-internal-testing/transformers_daily_ci
+      commit_sha: ${{ github.sha }}
+    secrets: inherit
+
+  quantization-ci:
+    name: Quantization CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_quantization_torch_gpu
+      slack_report_channel: "#transformers-ci-daily-quantization"
+      docker: huggingface/transformers-quantization-latest-gpu
+      ci_event: Daily CI
+      report_repo_id: hf-internal-testing/transformers_daily_ci
+      commit_sha: ${{ github.sha }}
+    secrets: inherit
+
+  kernels-ci:
+    name: Kernels CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_kernels_gpu
+      slack_report_channel: "#transformers-ci-daily-kernels"
+      docker: huggingface/transformers-all-latest-gpu
+      ci_event: Daily CI
+      report_repo_id: hf-internal-testing/transformers_daily_ci
+      commit_sha: ${{ github.sha }}
+    secrets: inherit
--- a/.github/workflows/self-scheduled-flash-attn-caller.yml
+++ b/.github/workflows/self-scheduled-flash-attn-caller.yml
@@ -0,0 +1,66 @@
+name: Nvidia CI - Flash Attn
+
+on:
+  repository_dispatch:
+  schedule:
+    - cron: "17 2 * * *"
+  push:
+    branches:
+      - run_nvidia_ci_flash_attn*
+  workflow_dispatch:
+    inputs:
+      prev_workflow_run_id:
+        description: 'previous workflow run id to compare'
+        type: string
+        required: false
+        default: ""
+      other_workflow_run_id:
+        description: 'other workflow run id to compare'
+        type: string
+        required: false
+        default: ""
+
+
+# Used for `push` to easily modify the target workflow runs to compare against
+env:
+    prev_workflow_run_id: ""
+    other_workflow_run_id: ""
+
+
+permissions:
+  contents: read
+
+jobs:
+  setup:
+    name: Setup
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Setup
+        env:
+          PREV_WORKFLOW_RUN_ID: ${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}
+          OTHER_WORKFLOW_RUN_ID: ${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}
+        run: |
+          mkdir "setup_values"
+          echo "$PREV_WORKFLOW_RUN_ID" > "setup_values/prev_workflow_run_id.txt"
+          echo "$OTHER_WORKFLOW_RUN_ID" > "setup_values/other_workflow_run_id.txt"
+
+      - name: Upload artifacts
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: setup_values
+          path: setup_values
+
+
+  model-ci:
+    name: Model CI
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#transformers-ci-flash-attn"
+      docker: huggingface/transformers-all-latest-gpu:flash-attn
+      ci_event: Daily CI
+      runner_type: "a10"
+      report_repo_id: hf-internal-testing/transformers_flash_attn_ci
+      commit_sha: ${{ github.sha }}
+      pytest_marker: "flash_attn_test or flash_attn_3_test or flash_attn_4_test or all_flash_attn_test"
+    secrets: inherit
--- a/.github/workflows/self-scheduled-intel-gaudi.yml
+++ b/.github/workflows/self-scheduled-intel-gaudi.yml
@@ -0,0 +1,350 @@
+name: Self-hosted runner (scheduled-intel-gaudi)
+
+on:
+  workflow_call:
+    inputs:
+      job:
+        required: true
+        type: string
+      slack_report_channel:
+        required: true
+        type: string
+      runner_scale_set:
+        required: true
+        type: string
+      ci_event:
+        required: true
+        type: string
+      report_repo_id:
+        required: true
+        type: string
+
+env:
+  NUM_SLICES: 2
+  RUN_SLOW: yes
+  PT_HPU_LAZY_MODE: 0
+  TRANSFORMERS_IS_CI: yes
+  PT_ENABLE_INT64_SUPPORT: 1
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache/.cache/huggingface
+
+permissions:
+  contents: read
+
+jobs:
+  setup:
+    if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
+    name: Setup
+    runs-on: ubuntu-latest
+    outputs:
+      slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
+      folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
+      quantization_matrix: ${{ steps.set-matrix.outputs.quantization_matrix }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+
+      - name: Set up Python
+        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
+        with:
+          python-version: "3.10"
+
+      - id: set-matrix
+        if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
+        name: Identify models to test
+        working-directory: tests
+        env:
+          JOB: ${{ inputs.job }}
+        run: |
+          if [ "$JOB" = "run_models_gpu" ]; then
+            echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits "$NUM_SLICES")" >> "$GITHUB_OUTPUT"
+            echo "slice_ids=$(python3 -c 'import os, sys; print(list(range(int(os.environ[\"NUM_SLICES\"]))))')" >> "$GITHUB_OUTPUT"
+          elif [ "$JOB" = "run_trainer_and_fsdp_gpu" ]; then
+            echo "folder_slices=[['trainer'], ['fsdp']]" >> "$GITHUB_OUTPUT"
+            echo "slice_ids=[0, 1]" >> "$GITHUB_OUTPUT"
+          fi
+
+      - id: set-matrix-quantization
+        if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
+        name: Identify quantization method to test
+        working-directory: tests
+        run: |
+          echo "quantization_matrix=$(python3 -c 'import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))) ;  print(d)')" >> $GITHUB_OUTPUT
+
+  run_models_gpu:
+    if: ${{ inputs.job == 'run_models_gpu' }}
+    name: " "
+    needs: setup
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [1gaudi, 2gaudi]
+        slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
+    uses: ./.github/workflows/model_jobs_intel_gaudi.yml
+    with:
+      slice_id: ${{ matrix.slice_id }}
+      machine_type: ${{ matrix.machine_type }}
+      folder_slices: ${{ needs.setup.outputs.folder_slices }}
+      runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
+    secrets: inherit
+
+  run_trainer_and_fsdp_gpu:
+    if: ${{ inputs.job == 'run_trainer_and_fsdp_gpu' }}
+    name: " "
+    needs: setup
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [1gaudi, 2gaudi]
+        slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
+    uses: ./.github/workflows/model_jobs_intel_gaudi.yml
+    with:
+      slice_id: ${{ matrix.slice_id }}
+      machine_type: ${{ matrix.machine_type }}
+      folder_slices: ${{ needs.setup.outputs.folder_slices }}
+      runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
+      report_name_prefix: run_trainer_and_fsdp_gpu
+    secrets: inherit
+
+  run_pipelines_torch_gpu:
+    if: ${{ inputs.job == 'run_pipelines_torch_gpu' }}
+    name: Pipelines
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [1gaudi, 2gaudi]
+    runs-on:
+      group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
+    container:
+      image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
+      options: --runtime=habana
+        -v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
+        --env OMPI_MCA_btl_vader_single_copy_mechanism=none
+        --env HABANA_VISIBLE_DEVICES
+        --env HABANA_VISIBLE_MODULES
+        --cap-add=sys_nice
+        --shm-size=64G
+    steps:
+      - name: Checkout
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+
+      - name: Install dependencies
+        run: |
+          pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
+
+      - name: HL-SMI
+        run: |
+          hl-smi
+          echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
+          echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
+
+      - name: Environment
+        run: python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        run: pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        shell: bash
+        run: |
+          if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
+            machine_type=single-gpu
+          elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type=${{ matrix.machine_type }}
+          fi
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+
+      - name: Run all pipeline tests on Intel Gaudi
+        run: |
+          python3 -m pytest -v --make-reports="${machine_type}_run_pipelines_torch_gpu_test_reports" tests/pipelines -m "not not_device_test"
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: |
+          cat "reports/${machine_type}_run_pipelines_torch_gpu_test_reports/failures_short.txt"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
+          path: reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
+
+  run_examples_gpu:
+    if: ${{ inputs.job == 'run_examples_gpu' }}
+    name: Examples directory
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [1gaudi]
+    runs-on:
+      group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
+    container:
+      image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
+      options: --runtime=habana
+        -v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
+        --env OMPI_MCA_btl_vader_single_copy_mechanism=none
+        --env HABANA_VISIBLE_DEVICES
+        --env HABANA_VISIBLE_MODULES
+        --cap-add=sys_nice
+        --shm-size=64G
+    steps:
+      - name: Checkout
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+
+      - name: Install dependencies
+        run: |
+          pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
+
+      - name: HL-SMI
+        run: |
+          hl-smi
+          echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
+          echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
+
+      - name: Environment
+        run: |
+          python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        run: |
+          pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        shell: bash
+        run: |
+          if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
+            machine_type=single-gpu
+          elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type=${{ matrix.machine_type }}
+          fi
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+
+      - name: Run examples tests on Intel Gaudi
+        run: |
+          pip install -r examples/pytorch/_tests_requirements.txt
+          python3 -m pytest -v --make-reports="${machine_type}_run_examples_gpu_test_reports" examples/pytorch -m "not not_device_test"
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: |
+          cat "reports/${machine_type}_run_examples_gpu_test_reports/failures_short.txt"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_run_examples_gpu_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_run_examples_gpu_test_reports
+          path: reports/${{ env.machine_type }}_run_examples_gpu_test_reports
+
+  run_torch_cuda_extensions_gpu:
+    if: ${{ inputs.job == 'run_torch_cuda_extensions_gpu' }}
+    name: Intel Gaudi deepspeed tests
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [1gaudi, 2gaudi]
+    runs-on:
+      group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
+    container:
+      image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
+      options: --runtime=habana
+        -v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
+        --env OMPI_MCA_btl_vader_single_copy_mechanism=none
+        --env HABANA_VISIBLE_DEVICES
+        --env HABANA_VISIBLE_MODULES
+        --cap-add=sys_nice
+        --shm-size=64G
+    steps:
+      - name: Checkout
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+
+      - name: Install dependencies
+        run: |
+          pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
+          pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.20.0
+
+      - name: HL-SMI
+        run: |
+          hl-smi
+          echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
+          echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
+
+      - name: Environment
+        run: |
+          python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        run: |
+          pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        shell: bash
+        run: |
+          if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
+            machine_type=single-gpu
+          elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type=${{ matrix.machine_type }}
+          fi
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+
+      - name: Run all deepspeed tests on intel Gaudi
+        run: |
+          python3 -m pytest -v --make-reports="${machine_type}_run_torch_cuda_extensions_gpu_test_reports" tests/deepspeed -m "not not_device_test"
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: |
+          cat "reports/${machine_type}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
+          path: reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
+
+  send_results:
+    name: Slack Report
+    needs:
+      [
+        setup,
+        run_models_gpu,
+        run_examples_gpu,
+        run_torch_cuda_extensions_gpu,
+        run_pipelines_torch_gpu,
+        run_trainer_and_fsdp_gpu,
+      ]
+    if: ${{ always() }}
+    uses: ./.github/workflows/slack-report.yml
+    with:
+      job: ${{ inputs.job }}
+      setup_status: ${{ needs.setup.result }}
+      slack_report_channel: ${{ inputs.slack_report_channel }}
+      quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
+      folder_slices: ${{ needs.setup.outputs.folder_slices }}
+      report_repo_id: ${{ inputs.report_repo_id }}
+      ci_event: ${{ inputs.ci_event }}
+
+    secrets: inherit
--- a/.github/workflows/self-scheduled-intel-gaudi3-caller.yml
+++ b/.github/workflows/self-scheduled-intel-gaudi3-caller.yml
@@ -0,0 +1,70 @@
+name: Self-hosted runner (Intel Gaudi3 scheduled CI caller)
+
+on:
+  repository_dispatch:
+  workflow_dispatch:
+  schedule:
+    - cron: "17 2 * * *"
+
+permissions:
+  contents: read
+
+jobs:
+  model-ci:
+    name: Model CI
+    uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
+    with:
+      job: run_models_gpu
+      ci_event: Scheduled CI (Intel) - Gaudi3
+      runner_scale_set: itac-bm-emr-gaudi3-dell
+      slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
+      report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
+
+    secrets: inherit
+
+  pipeline-ci:
+    name: Pipeline CI
+    uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
+    with:
+      job: run_pipelines_torch_gpu
+      ci_event: Scheduled CI (Intel) - Gaudi3
+      runner_scale_set: itac-bm-emr-gaudi3-dell
+      slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
+      report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
+
+    secrets: inherit
+
+  example-ci:
+    name: Example CI
+    uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
+    with:
+      job: run_examples_gpu
+      ci_event: Scheduled CI (Intel) - Gaudi3
+      runner_scale_set: itac-bm-emr-gaudi3-dell
+      slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
+      report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
+
+    secrets: inherit
+
+  deepspeed-ci:
+    name: DeepSpeed CI
+    uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
+    with:
+      job: run_torch_cuda_extensions_gpu
+      ci_event: Scheduled CI (Intel) - Gaudi3
+      runner_scale_set: itac-bm-emr-gaudi3-dell
+      slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
+      report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
+
+    secrets: inherit
+
+  trainer-fsdp-ci:
+    name: Trainer/FSDP CI
+    uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
+    with:
+      job: run_trainer_and_fsdp_gpu
+      ci_event: Scheduled CI (Intel) - Gaudi3
+      runner_scale_set: itac-bm-emr-gaudi3-dell
+      slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
+      report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
+    secrets: inherit
--- a/.github/workflows/self-scheduled.yml
+++ b/.github/workflows/self-scheduled.yml
@@ -0,0 +1,678 @@
+name: Nvidia CI (job definitions)
+
+# Note that each job's dependencies go into a corresponding docker file.
+#
+# For example for `run_torch_cuda_extensions_gpu` the docker image is
+# `huggingface/transformers-pytorch-deepspeed-latest-gpu`, which can be found at
+# `docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile`
+
+on:
+  workflow_call:
+    inputs:
+      job:
+        required: true
+        type: string
+      slack_report_channel:
+        required: true
+        type: string
+      docker:
+        required: true
+        type: string
+      ci_event:
+        required: true
+        type: string
+      working-directory-prefix:
+        default: ''
+        required: false
+        type: string
+      report_repo_id:
+        required: true
+        type: string
+      commit_sha:
+        required: false
+        type: string
+      runner_type:
+        required: false
+        type: string
+      subdirs:
+        default: ""
+        required: false
+        type: string
+      pytest_marker:
+        required: false
+        type: string
+      pr_number:
+        required: false
+        type: string
+    outputs:
+      is_infrastructure_ok:
+        description: "Whether the CI infrastructure (slack reporting and failure checking) succeeded"
+        value: ${{ jobs.send_results.outputs.is_slack_reporting_job_ok == 'true' && jobs.check_new_failures.outputs.is_check_failures_ok == 'true' }}
+
+
+env:
+  HF_HOME: /mnt/cache
+  TRANSFORMERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+  # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
+  # This token is created under the bot `hf-transformers-bot`.
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  TF_FORCE_GPU_ALLOW_GROWTH: true
+  CUDA_VISIBLE_DEVICES: 0,1
+
+permissions:
+  contents: read
+
+jobs:
+  setup:
+    name: Setup
+    if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)
+    strategy:
+      matrix:
+        machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
+    runs-on:
+      group: '${{ matrix.machine_type }}'
+    container:
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    outputs:
+      folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
+      slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
+      quantization_matrix: ${{ steps.set-matrix-quantization.outputs.quantization_matrix }}
+    steps:
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: |
+          git fetch origin $commit_sha
+          git fetch && git checkout $commit_sha
+
+      - name: Cleanup
+        working-directory: /transformers
+        run: |
+          rm -rf tests/__pycache__
+          rm -rf tests/models/__pycache__
+          rm -rf reports
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - id: set-matrix
+        if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
+        name: Identify models to test
+        working-directory: /transformers/tests
+        env:
+          job: ${{ inputs.job }}
+          subdirs: ${{ inputs.subdirs }}
+          NUM_SLICES: 2
+        run: |
+          if [ "$job" = "run_models_gpu" ]; then
+            python3 ../utils/split_model_tests.py --subdirs "$subdirs" --num_splits "$NUM_SLICES" > folder_slices.txt
+            echo "folder_slices=$(cat folder_slices.txt)" >> $GITHUB_OUTPUT
+            python3 -c "import ast; folder_slices = ast.literal_eval(open('folder_slices.txt').read()); open('slice_ids.txt', 'w').write(str(list(range(len(folder_slices)))))"
+            echo "slice_ids=$(cat slice_ids.txt)" >> $GITHUB_OUTPUT
+          elif [ "$job" = "run_trainer_and_fsdp_gpu" ]; then
+            echo "folder_slices=[['trainer'], ['fsdp'], ['ddp']]" >> $GITHUB_OUTPUT
+            echo "slice_ids=[0, 1, 2]" >> $GITHUB_OUTPUT
+          fi
+
+      - id: set-matrix-quantization
+        if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
+        name: Identify quantization method to test
+        working-directory: /transformers/tests
+        env:
+          subdirs: ${{ inputs.subdirs || 'None' }}
+        run: |
+          echo "quantization_matrix=$(python3 -c 'import ast; import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); subdirs = ast.literal_eval(os.environ["subdirs"]); quantization_tests = [x.removeprefix("quantization/") for x in subdirs] if subdirs is not None else quantization_tests; d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))); print(d)')" >> $GITHUB_OUTPUT
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+  run_models_gpu:
+    if: ${{ inputs.job == 'run_models_gpu' }}
+    name: " "
+    needs: setup
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
+        slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
+    uses: ./.github/workflows/model_jobs.yml
+    with:
+      folder_slices: ${{ needs.setup.outputs.folder_slices }}
+      machine_type: ${{ matrix.machine_type }}
+      slice_id: ${{ matrix.slice_id }}
+      docker: ${{ inputs.docker }}
+      commit_sha: ${{ inputs.commit_sha || github.sha }}
+      runner_type: ${{ inputs.runner_type }}
+      report_repo_id: ${{ inputs.report_repo_id }}
+      pytest_marker: ${{ inputs.pytest_marker }}
+    secrets: inherit
+
+  run_trainer_and_fsdp_gpu:
+    if: ${{ inputs.job == 'run_trainer_and_fsdp_gpu' }}
+    name: " "
+    needs: setup
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
+        slice_id: [0, 1, 2]
+    uses: ./.github/workflows/model_jobs.yml
+    with:
+      folder_slices: ${{ needs.setup.outputs.folder_slices }}
+      machine_type: ${{ matrix.machine_type }}
+      slice_id: ${{ matrix.slice_id }}
+      docker: ${{ inputs.docker }}
+      commit_sha: ${{ inputs.commit_sha || github.sha }}
+      runner_type: ${{ inputs.runner_type }}
+      report_repo_id: ${{ inputs.report_repo_id }}
+      report_name_prefix: run_trainer_and_fsdp_gpu
+    secrets: inherit
+
+  run_pipelines_torch_gpu:
+    if: ${{ inputs.job == 'run_pipelines_torch_gpu' }}
+    name: PyTorch pipelines
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
+    runs-on:
+      group: '${{ matrix.machine_type }}'
+    container:
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    steps:
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: git fetch && git checkout "$commit_sha"
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Environment
+        working-directory: /transformers
+        run: |
+          python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        working-directory: /transformers
+        shell: bash
+        env:
+          matrix_machine_type: ${{ matrix.machine_type }}
+        run: |
+          echo "$matrix_machine_type"
+
+          if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
+            machine_type=single-gpu
+          elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type="$matrix_machine_type"
+          fi
+
+          echo "$machine_type"
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+
+      - name: Run all pipeline tests on GPU
+        working-directory: /transformers
+        run: |
+          python3 -m pytest -n 1 -v --dist=loadfile --make-reports="${machine_type}_run_pipelines_torch_gpu_test_reports" tests/pipelines
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat "/transformers/reports/${machine_type}_run_pipelines_torch_gpu_test_reports/failures_short.txt"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
+          path: /transformers/reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
+
+  run_examples_gpu:
+    if: ${{ inputs.job == 'run_examples_gpu' }}
+    name: Examples directory
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [aws-g5-4xlarge-cache]
+    runs-on:
+      group: '${{ matrix.machine_type }}'
+    container:
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    steps:
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: git fetch && git checkout "$commit_sha"
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Environment
+        working-directory: /transformers
+        run: |
+          python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        working-directory: /transformers
+        shell: bash
+        env:
+          matrix_machine_type: ${{ matrix.machine_type }}
+        run: |
+          echo "$matrix_machine_type"
+
+          if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
+            machine_type=single-gpu
+          elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type="$matrix_machine_type"
+          fi
+
+          echo "$machine_type"
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+
+      - name: Run examples tests on GPU
+        working-directory: /transformers
+        run: |
+          pip install -r examples/pytorch/_tests_requirements.txt
+          python3 -m pytest -v --make-reports="${machine_type}_run_examples_gpu_test_reports" examples/pytorch
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat "/transformers/reports/${machine_type}_run_examples_gpu_test_reports/failures_short.txt"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_run_examples_gpu_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_run_examples_gpu_test_reports
+          path: /transformers/reports/${{ env.machine_type }}_run_examples_gpu_test_reports
+
+  run_torch_cuda_extensions_gpu:
+    if: ${{ inputs.job == 'run_torch_cuda_extensions_gpu' }}
+    name: Torch CUDA extension tests
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
+    runs-on:
+      group: '${{ matrix.machine_type }}'
+    container:
+      image: ${{ inputs.docker }}
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    steps:
+      - name: Update clone
+        working-directory: ${{ inputs.working-directory-prefix }}/transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: git fetch && git checkout "$commit_sha"
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: ${{ inputs.working-directory-prefix }}/transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
+      - name: Update / Install some packages (for Past CI)
+        if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }}
+        working-directory: ${{ inputs.working-directory-prefix }}/transformers
+        run: |
+          python3 -m pip install -U datasets
+          python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
+
+      - name: Remove cached torch extensions
+        run: rm -rf /github/home/.cache/torch_extensions/
+
+      # To avoid unknown test failures
+      - name: Pre build DeepSpeed *again* (for daily CI)
+        if: ${{ contains(inputs.ci_event, 'Daily CI') }}
+        working-directory: ${{ inputs.working-directory-prefix }}/
+        run: |
+          python3 -m pip uninstall -y deepspeed
+          DS_DISABLE_NINJA=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache -v --disable-pip-version-check
+
+      # To avoid unknown test failures
+      - name: Pre build DeepSpeed *again* (for nightly & Past CI)
+        if: ${{ contains(inputs.ci_event, 'Nightly CI') || contains(inputs.ci_event, 'Past CI') }}
+        working-directory: ${{ inputs.working-directory-prefix }}/
+        run: |
+          python3 -m pip uninstall -y deepspeed
+          rm -rf DeepSpeed
+          git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build
+          DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache -v --disable-pip-version-check
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Environment
+        working-directory: ${{ inputs.working-directory-prefix }}/transformers
+        run: |
+          python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        working-directory: ${{ inputs.working-directory-prefix }}/transformers
+        run: pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        working-directory: ${{ inputs.working-directory-prefix }}/transformers
+        shell: bash
+        env:
+          matrix_machine_type: ${{ matrix.machine_type }}
+        run: |
+          echo "$matrix_machine_type"
+
+          if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
+            machine_type=single-gpu
+          elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type="$matrix_machine_type"
+          fi
+
+          echo "$machine_type"
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+
+      - name: Run all tests on GPU
+        working-directory: ${{ inputs.working-directory-prefix }}/transformers
+        run: |
+          script -q -c "python3 -m pytest -v -rsfE --make-reports=${machine_type}_run_torch_cuda_extensions_gpu_test_reports tests/trainer/distributed/test_trainer_distributed_deepspeed.py" test_outputs.txt
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        env:
+          working_directory_prefix: ${{ inputs.working-directory-prefix }}
+        run: cat "${working_directory_prefix}/transformers/reports/${machine_type}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
+          path: ${{ inputs.working-directory-prefix }}/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
+
+  run_quantization_torch_gpu:
+    if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
+    name: " "
+    needs: setup
+    strategy:
+      max-parallel: 4
+      fail-fast: false
+      matrix:
+        folders: ${{ fromJson(needs.setup.outputs.quantization_matrix) }}
+        machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
+    runs-on:
+      group: '${{ matrix.machine_type }}'
+    container:
+      image: huggingface/transformers-quantization-latest-gpu
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    steps:
+      - name: Echo folder ${{ matrix.folders }}
+        shell: bash
+        env:
+          matrix_folders_raw: ${{ matrix.folders }}
+        run: |
+          echo "$matrix_folders_raw"
+          matrix_folders="${matrix_folders_raw/'quantization/'/'quantization_'}"
+          echo "$matrix_folders"
+          echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
+
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: git fetch origin "$commit_sha" && git checkout "$commit_sha"
+
+      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Environment
+        working-directory: /transformers
+        run: |
+          python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        working-directory: /transformers
+        shell: bash
+        env:
+          matrix_machine_type: ${{ matrix.machine_type }}
+        run: |
+          echo "$matrix_machine_type"
+
+          if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
+            machine_type=single-gpu
+          elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type="$matrix_machine_type"
+          fi
+
+          echo "$machine_type"
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+
+      - name: Run quantization tests on GPU
+        working-directory: /transformers
+        env:
+          folders: ${{ matrix.folders }}
+        run: |
+          python3 -m pytest -v --make-reports="${machine_type}_run_quantization_torch_gpu_${matrix_folders}_test_reports" tests/${folders}
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat "/transformers/reports/${machine_type}_run_quantization_torch_gpu_${matrix_folders}_test_reports/failures_short.txt"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
+          path: /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
+
+  run_kernels_gpu:
+    if: ${{ inputs.job == 'run_kernels_gpu' }}
+    name: Kernel tests
+    strategy:
+      fail-fast: false
+      matrix:
+        machine_type: [aws-g5-4xlarge-cache]
+    runs-on:
+      group: '${{ matrix.machine_type }}'
+    container:
+      image: ${{ inputs.docker }}
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    steps:
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ inputs.commit_sha || github.sha }}
+        run: git fetch && git checkout "$commit_sha"
+
+      - name: Reinstall transformers in edit mode
+        working-directory: /transformers
+        run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .[testing]
+  
+      - name: Install kernels
+        working-directory: /transformers
+        run: python3 -m pip install -U kernels
+  
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+
+      - name: Environment
+        working-directory: /transformers
+        run: python3 utils/print_env.py
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - name: Set `machine_type` for report and artifact names
+        working-directory: /transformers
+        shell: bash
+        env:
+          matrix_machine_type: ${{ matrix.machine_type }}
+        run: |
+          echo "$matrix_machine_type"
+
+          if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
+            machine_type=single-gpu
+          elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
+            machine_type=multi-gpu
+          else
+            machine_type="$matrix_machine_type"
+          fi
+
+          echo "$machine_type"
+          echo "machine_type=$machine_type" >> $GITHUB_ENV
+    
+      - name: Run kernel tests on GPU
+        working-directory: /transformers
+        run: |
+          python3 -m pytest -v --make-reports="${machine_type}_run_kernels_gpu_test_reports" tests/kernels/test_kernels.py
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: cat "/transformers/reports/${machine_type}_run_kernels_gpu_test_reports/failures_short.txt"
+
+      - name: "Test suite reports artifacts: ${{ env.machine_type }}_run_kernels_gpu_test_reports"
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ${{ env.machine_type }}_run_kernels_gpu_test_reports
+          path: /transformers/reports/${{ env.machine_type }}_run_kernels_gpu_test_reports
+
+  run_extract_warnings:
+    # Let's only do this for the job `run_models_gpu` to simplify the (already complex) logic.
+    if: ${{ always() && inputs.job == 'run_models_gpu' }}
+    name: Extract warnings in CI artifacts
+    runs-on: ubuntu-22.04
+    needs: [setup, run_models_gpu]
+    steps:
+      # Checkout in order to run `utils/extract_warnings.py`. Avoid **explicit** checkout (i.e. don't specify `ref`) for
+      # security reason.
+      - name: Checkout transformers
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+
+      - name: Install transformers
+        run: pip install transformers
+
+      - name: Show installed libraries and their versions
+        run: pip freeze
+
+      - name: Create output directory
+        run: mkdir warnings_in_ci
+
+      - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        env:
+          ACTIONS_ARTIFACT_MAX_ARTIFACT_COUNT: 2000
+        with:
+          path: warnings_in_ci
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Show artifacts
+        run: echo "$(python3 -c 'import os; d = os.listdir(); print(d)')"
+        working-directory: warnings_in_ci
+
+      - name: Extract warnings in CI artifacts
+        env:
+          github_run_id: ${{ github.run_id }}
+          access_token: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
+        run: |
+          python3 utils/extract_warnings.py --workflow_run_id "$github_run_id" --output_dir warnings_in_ci --token "$access_token" --from_gh
+          echo "$(python3 -c 'import os; import json; fp = open("warnings_in_ci/selected_warnings.json"); d = json.load(fp); d = "\n".join(d); print(d)')"
+
+      - name: Upload artifact
+        if: ${{ always() }}
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: warnings_in_ci
+          path: warnings_in_ci/selected_warnings.json
+
+  send_results:
+    name: Slack Report
+    needs: [
+      setup,
+      run_models_gpu,
+      run_trainer_and_fsdp_gpu,
+      run_pipelines_torch_gpu,
+      run_examples_gpu,
+      run_torch_cuda_extensions_gpu,
+      run_quantization_torch_gpu,
+      run_kernels_gpu,
+      run_extract_warnings
+    ]
+    if: always() && !cancelled()
+    uses: ./.github/workflows/slack-report.yml
+    with:
+      job: ${{ inputs.job }}
+      # This would be `skipped` if `setup` is skipped.
+      setup_status: ${{ needs.setup.result }}
+      slack_report_channel: ${{ inputs.slack_report_channel }}
+      # This would be an empty string if `setup` is skipped.
+      folder_slices: ${{ needs.setup.outputs.folder_slices }}
+      quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
+      ci_event: ${{ inputs.ci_event }}
+      report_repo_id: ${{ inputs.report_repo_id }}
+      commit_sha: ${{ inputs.commit_sha || github.sha }}
+
+    secrets: inherit
+
+  check_new_failures:
+    if: ${{ always() && needs.send_results.result == 'success' }}
+    name: Check new failures
+    needs: send_results
+    uses: ./.github/workflows/check_failed_tests.yml
+    with:
+      docker: ${{ inputs.docker }}
+      commit_sha: ${{ inputs.commit_sha || github.sha }}
+      job: ${{ inputs.job }}
+      slack_report_channel: ${{ inputs.slack_report_channel }}
+      ci_event: ${{ inputs.ci_event }}
+      report_repo_id: ${{ inputs.report_repo_id }}
+      pr_number: ${{ inputs.pr_number }}
+
+    secrets: inherit
--- a/.github/workflows/slack-report.yml
+++ b/.github/workflows/slack-report.yml
@@ -0,0 +1,120 @@
+name: CI slack report
+
+on:
+  workflow_call:
+    inputs:
+      job:
+        required: true
+        type: string
+      slack_report_channel:
+        required: true
+        type: string
+      setup_status:
+        required: true
+        type: string
+      folder_slices:
+        required: true
+        type: string
+      quantization_matrix:
+        required: true
+        type: string
+      ci_event:
+        required: true
+        type: string
+      report_repo_id:
+        required: true
+        type: string
+      commit_sha:
+        required: false
+        type: string
+    outputs:
+      is_slack_reporting_job_ok:
+        description: "Whether the send_results job succeeded (not failed)"
+        value: ${{ jobs.send_results.result != 'failure' }}
+
+env:
+  TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
+
+permissions:
+  contents: read
+
+jobs:
+  send_results:
+    name: Send results to webhook
+    runs-on: ubuntu-22.04
+    if: always() && !cancelled()
+    steps:
+      - name: Preliminary job status
+        shell: bash
+        # For the meaning of these environment variables, see the job `Setup`
+        env:
+          setup_status: ${{ inputs.setup_status }}
+        run: |
+          echo "Setup status: $setup_status"
+
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          fetch-depth: 2
+          # Security: checkout to the `main` branch for untrusted triggers (issue_comment, pull_request_target), otherwise use the specified ref
+          ref: ${{ (github.event_name == 'issue_comment' || github.event_name == 'pull_request_target') && 'main' || (inputs.commit_sha || github.sha) }}
+          persist-credentials: false
+
+      - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        env:
+          ACTIONS_ARTIFACT_MAX_ARTIFACT_COUNT: 2000
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Prepare some setup values
+        run: |
+          if [ -f setup_values/prev_workflow_run_id.txt ]; then
+            echo "PREV_WORKFLOW_RUN_ID=$(cat setup_values/prev_workflow_run_id.txt)" >> $GITHUB_ENV
+          else
+            echo "PREV_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
+          fi
+
+          if [ -f setup_values/other_workflow_run_id.txt ]; then
+            echo "OTHER_WORKFLOW_RUN_ID=$(cat setup_values/other_workflow_run_id.txt)" >> $GITHUB_ENV
+          else
+            echo "OTHER_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
+          fi
+
+      - name: Send message to Slack
+        shell: bash
+        env:
+          CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
+          CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
+          CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
+          CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
+          SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}
+          ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
+          CI_EVENT: ${{ inputs.ci_event }}
+          # This `CI_TITLE` would be empty for `schedule` or `workflow_run` events.
+          CI_TITLE: ${{ github.event.head_commit.message }}
+          CI_SHA: ${{ inputs.commit_sha || github.sha }}
+          CI_TEST_JOB: ${{ inputs.job }}
+          SETUP_STATUS: ${{ inputs.setup_status }}
+          REPORT_REPO_ID: ${{ inputs.report_repo_id }}
+          quantization_matrix: ${{ inputs.quantization_matrix }}
+          folder_slices: ${{ inputs.folder_slices }}
+        # We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
+        # `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
+        # For a job that doesn't depend on (i.e. `needs`) `setup`, the value for `inputs.folder_slices` would be an
+        # empty string, and the called script still get one argument (which is the emtpy string).
+        run: |
+          pip install huggingface_hub
+          pip install slack_sdk
+          pip show slack_sdk
+          pip install .
+          if [ "$quantization_matrix" != "" ]; then
+            python utils/notification_service.py "$quantization_matrix"
+          else
+            python utils/notification_service.py "$folder_slices"
+          fi
+
+      # Upload the directory containing CI reports prepared in `notification_service.py`
+      - name: Failure table artifacts
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: ci_results_${{ inputs.job }}
+          path: ci_results_${{ inputs.job }}
--- a/.github/workflows/ssh-runner.yml
+++ b/.github/workflows/ssh-runner.yml
@@ -0,0 +1,155 @@
+name: SSH into our runners
+
+on:
+  workflow_dispatch:
+    inputs:
+      runner_type:
+        description: 'Type of runner to test (a10)'
+        required: true
+      docker_image:
+        description: 'Name of the Docker image'
+        required: true
+      num_gpus:
+        description: 'Type of the number of gpus to use (`single` or `multi`)'
+        required: true
+
+env:
+  HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache
+  TRANSFORMERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`.
+  TF_FORCE_GPU_ALLOW_GROWTH: true
+  CUDA_VISIBLE_DEVICES: 0,1
+
+permissions:
+  contents: read
+
+jobs:
+  get_runner:
+    name: "Get runner to use"
+    runs-on: ubuntu-22.04
+    outputs:
+      RUNNER: ${{ steps.set_runner.outputs.RUNNER }}
+    steps:
+      - name: Get runner to use
+        shell: bash
+        env:
+          NUM_GPUS: ${{ github.event.inputs.num_gpus }}
+          RUNNER_TYPE: ${{ github.event.inputs.runner_type }}
+        run: |
+          if [[ "$NUM_GPUS" == "single" && "$RUNNER_TYPE" == "a10" ]]; then
+            echo "RUNNER=aws-g5-4xlarge-cache-ssh" >> $GITHUB_ENV
+          elif [[ "$NUM_GPUS" == "multi" && "$RUNNER_TYPE" == "a10" ]]; then
+            echo "RUNNER=aws-g5-12xlarge-cache-ssh" >> $GITHUB_ENV
+          else
+            echo "RUNNER=" >> $GITHUB_ENV
+          fi
+
+      - name: Set runner to use
+        id: set_runner
+        run: |
+          echo "$RUNNER"
+          echo "RUNNER=$RUNNER" >> $GITHUB_OUTPUT
+
+  ssh_runner:
+    name: "SSH"
+    needs: get_runner
+    runs-on:
+      group: ${{ needs.get_runner.outputs.RUNNER }}
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+    steps:
+      - name: Update clone
+        working-directory: /transformers
+        env:
+          commit_sha: ${{ github.sha }}
+        run: |
+          git fetch && git checkout "$commit_sha"
+
+      - name: Cleanup
+        working-directory: /transformers
+        run: |
+          rm -rf tests/__pycache__
+          rm -rf tests/models/__pycache__
+          rm -rf reports
+
+      - name: Show installed libraries and their versions
+        working-directory: /transformers
+        run: pip freeze
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Create python alias
+        run: |
+          ln -sf $(which python3) /usr/local/bin/python
+          ln -sf $(which pip3) /usr/local/bin/pip
+          echo "✅ python -> python3 symlink created"
+
+      - name: Install psutil for memory monitor
+        run: |
+          pip install psutil --break-system-packages
+
+      - name: Download memory monitor script
+        working-directory: /transformers
+        run: |
+          apt-get update && apt-get install -y curl
+          curl -o memory_monitor.py https://raw.githubusercontent.com/huggingface/transformers/refs/heads/utility_scripts/utils/memory_monitor.py
+
+      - name: Start memory monitor
+        working-directory: /transformers
+        continue-on-error: true  # Don't fail workflow if monitor has issues
+        run: |
+          python3 memory_monitor.py --threshold 90 --interval 1 > memory_monitor.log 2>&1 &
+          echo $! > memory_monitor.pid
+          echo "Memory monitor started with PID $(cat memory_monitor.pid)"
+          # Give it a moment to start
+          sleep 2
+          # Verify it's running
+          ps aux | grep memory_monitor | grep -v grep || echo "Warning: memory monitor may not be running"
+
+      - name: Install utilities
+        run: |
+          apt-get install -y nano
+
+      - name: Setup automatic environment for SSH login
+        run: |
+          # Create shared environment setup
+          cat > /root/.env_setup << 'EOF'
+          # Auto-setup (non-sensitive vars)
+          export HF_HOME=/mnt/cache
+          export TRANSFORMERS_IS_CI=yes
+          export OMP_NUM_THREADS=8
+          export MKL_NUM_THREADS=8
+          export RUN_SLOW=yes
+          export TF_FORCE_GPU_ALLOW_GROWTH=true
+          export CUDA_VISIBLE_DEVICES=0,1
+          
+          cd /transformers 2>/dev/null || true
+          
+          # Remind user to set token if needed
+          if [ -z "$HF_TOKEN" ]; then
+              echo "⚠️  HF_TOKEN not set. Set it with:"
+              echo "    export HF_TOKEN=hf_xxxxx"
+          else
+              echo "✅ HF_TOKEN is set"
+          fi
+          
+          echo "📁 Working directory: $(pwd)"
+          EOF
+          
+          # Source from both .bash_profile and .bashrc
+          echo 'source /root/.env_setup' >> /root/.bash_profile
+          echo 'source /root/.env_setup' >> /root/.bashrc
+
+      - name: Tailscale # In order to be able to SSH when a test fails
+        uses: huggingface/tailscale-action@7d53c9737e53934c30290b5524d1c9b4a7c98c8a  # main
+        with:
+          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
+          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
+          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          waitForSSH: true
+          sshTimeout: 15m
--- a/.github/workflows/stale.yml
+++ b/.github/workflows/stale.yml
@@ -0,0 +1,34 @@
+name: Stale Bot
+
+on:
+  schedule:
+    - cron: "0 8 * * *"
+
+permissions:
+  contents: read
+
+jobs:
+  close_stale_issues:
+    name: Close Stale Issues
+    if: github.repository == 'huggingface/transformers'
+    runs-on: ubuntu-22.04
+    permissions:
+      issues: write
+    env:
+      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    steps:
+    - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+      with:
+        persist-credentials: false
+
+    - name: Setup Python
+      uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
+      with:
+        python-version: 3.8
+
+    - name: Install requirements
+      run: |
+        pip install PyGithub
+    - name: Close stale issues
+      run: |
+        python scripts/stale.py
--- a/.github/workflows/trl-ci-bot.yml
+++ b/.github/workflows/trl-ci-bot.yml
@@ -0,0 +1,97 @@
+# This workflow allows trusted contributors to trigger TRL CI runs against
+# specific Transformers commits by commenting `/trl-ci` on a PR in the TRL repo.
+# It is meant to be used during the ongoing Trainer refactor/unbloat in Transformers,
+# to help evaluate the downstream impact on TRL.
+name: TRL CI bot
+
+on:
+  issue_comment:
+    types: [created]
+
+permissions:
+  contents: read
+  pull-requests: read
+  issues: read
+
+jobs:
+  dispatch:
+    if: >
+      github.event.issue.pull_request &&
+      contains(github.event.comment.body, '/trl-ci')
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Gate on trusted commenter
+        id: trust
+        run: |
+          assoc="${{ github.event.comment.author_association }}"
+          case "$assoc" in
+            MEMBER|OWNER|COLLABORATOR) echo "trusted=true" >> $GITHUB_OUTPUT ;;
+            *) echo "trusted=false" >> $GITHUB_OUTPUT ;;
+          esac
+
+      - name: Reject untrusted commenter
+        if: steps.trust.outputs.trusted != 'true'
+        run: |
+          echo "::error::Untrusted commenter (${{ github.event.comment.author_association }}); aborting."
+          exit 1
+
+      - name: Fetch PR head SHA + number
+        if: steps.trust.outputs.trusted == 'true'
+        id: pr
+        env:
+          GH_TOKEN: ${{ github.token }}
+          PR_URL: ${{ github.event.issue.pull_request.url }}
+        run: |
+          sha=$(gh api "$PR_URL" --jq .head.sha)
+          number=$(gh api "$PR_URL" --jq .number)
+          echo "sha=$sha" >> "$GITHUB_OUTPUT"
+          echo "number=$number" >> "$GITHUB_OUTPUT"
+
+      - name: Dispatch TRL workflow
+        if: steps.trust.outputs.trusted == 'true'
+        id: dispatch
+        env:
+          GH_TOKEN: ${{ secrets.TRL_CI_DISPATCH_TOKEN }}
+          STEPS_PR_OUTPUTS_SHA: ${{ steps.pr.outputs.sha }}
+        run: |
+          gh workflow run "Tests against Transformers branch" \
+            -R huggingface/trl \
+            -f transformers_ref=${STEPS_PR_OUTPUTS_SHA}
+
+      - name: Find TRL workflow run URL
+        if: steps.trust.outputs.trusted == 'true'
+        id: find_run
+        env:
+          GH_TOKEN: ${{ secrets.TRL_CI_DISPATCH_TOKEN }}
+        run: |
+          echo "Waiting for workflow to appear..."
+          for i in {1..10}; do
+            run_url=$(gh api \
+              repos/huggingface/trl/actions/workflows \
+              --jq '.workflows[] | select(.name=="Tests against Transformers branch") | .id')
+
+            if [ -n "$run_url" ]; then
+              run=$(gh api \
+                repos/huggingface/trl/actions/workflows/$run_url/runs \
+                -f event=workflow_dispatch \
+                -f per_page=5 \
+                --jq '.workflow_runs[0].html_url')
+              if [ -n "$run" ]; then
+                echo "url=$run" >> $GITHUB_OUTPUT
+                break
+              fi
+            fi
+            sleep 5
+          done
+
+      - name: Comment back on PR with link
+        if: steps.trust.outputs.trusted == 'true'
+        env:
+          GH_TOKEN: ${{ github.token }}
+          STEPS_PR_OUTPUTS_SHA: ${{ steps.pr.outputs.sha }}
+          STEPS_FIND_RUN_OUTPUTS_URL: ${{ steps.find_run.outputs.url }}
+        run: |
+          gh api -X POST \
+            /repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/comments \
+            -f body="🚀 **TRL CI triggered** against transformers commit \`${STEPS_PR_OUTPUTS_SHA}\`\n\n🔗 **Run:** ${STEPS_FIND_RUN_OUTPUTS_URL}"
--- a/.github/workflows/trufflehog.yml
+++ b/.github/workflows/trufflehog.yml
@@ -0,0 +1,21 @@
+on:
+  push:
+
+name: Secret Leaks
+
+permissions:
+  contents: read
+
+jobs:
+  trufflehog:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+      - name: Secret Scanning
+        uses: trufflesecurity/trufflehog@6bd2d14f7a4bc1e569fa3550efa7ec632a4fa67b  # main
+        with:
+          extra_args: --results=verified,unknown
--- a/.github/workflows/update_metdata.yml
+++ b/.github/workflows/update_metdata.yml
@@ -0,0 +1,34 @@
+name: Update Transformers metadata
+
+on:
+  push:
+    branches:
+      - main
+      - update_transformers_metadata*
+
+permissions:
+  contents: read
+
+jobs:
+  build_and_package:
+    runs-on: ubuntu-22.04
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
+        with:
+          persist-credentials: false
+
+      - name: Setup environment
+        run: |
+          pip install --upgrade pip
+          pip install datasets pandas
+          pip install .[torch]
+
+      - name: Update metadata
+        env:
+          HF_TOKEN: ${{ secrets.LYSANDRE_HF_TOKEN }}
+        run: |
+          python utils/update_metadata.py --token "$HF_TOKEN" --commit_sha ${{ github.sha }}
--- a/.github/workflows/upload_pr_documentation.yml
+++ b/.github/workflows/upload_pr_documentation.yml
@@ -0,0 +1,19 @@
+name: Upload PR Documentation
+
+on:
+  workflow_run:
+    workflows: ["Build PR Documentation"]
+    types:
+      - completed
+
+permissions:
+  contents: read
+
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@9ad2de8582b56c017cb530c1165116d40433f1c6  # main
+    with:
+      package_name: transformers
+    secrets:
+      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
+      comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,185 @@
+# Initially taken from Github's Python gitignore file
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# tests and logs
+tests/fixtures/cached_*_text.txt
+logs/
+lightning_logs/
+lang_code_data/
+reports/
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+.venv*
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# vscode
+.vs
+.vscode
+
+# Pycharm
+.idea
+
+# TF code
+tensorflow_code
+
+# Models
+proc_data
+
+# examples
+runs
+/runs_old
+/wandb
+/examples/runs
+/examples/**/*.args
+/examples/rag/sweep
+
+# data
+/data
+serialization_dir
+
+# emacs
+*.*~
+debug.env
+
+# vim
+.*.swp
+
+#ctags
+tags
+
+# pre-commit
+.pre-commit*
+
+# .lock
+*.lock
+
+# DS_Store (MacOS)
+.DS_Store
+
+# ruff
+.ruff_cache
+
+# checkers cache
+utils/.checkers_cache.json
+
+# modular conversion
+*.modular_backup
+
+# Cursor IDE files
+.cursor/
+test-results/
+
+# AI agent local setup artifacts
+/.agents/skills
+/.claude/skills
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1 @@
+.ai/AGENTS.md
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -0,0 +1,82 @@
+cff-version: "1.2.0"
+date-released: 2020-10
+message: "If you use this software, please cite it using these metadata."
+title: "Transformers: State-of-the-Art Natural Language Processing"
+url: "https://github.com/huggingface/transformers"
+authors: 
+  - family-names: Wolf
+    given-names: Thomas
+  - family-names: Debut
+    given-names: Lysandre
+  - family-names: Sanh
+    given-names: Victor
+  - family-names: Chaumond
+    given-names: Julien
+  - family-names: Delangue
+    given-names: Clement
+  - family-names: Moi
+    given-names: Anthony
+  - family-names: Cistac
+    given-names: Perric
+  - family-names: Ma
+    given-names: Clara
+  - family-names: Jernite
+    given-names: Yacine
+  - family-names: Plu
+    given-names: Julien
+  - family-names: Xu
+    given-names: Canwen
+  - family-names: "Le Scao"
+    given-names: Teven
+  - family-names: Gugger
+    given-names: Sylvain
+  - family-names: Drame
+    given-names: Mariama
+  - family-names: Lhoest
+    given-names: Quentin
+  - family-names: Rush
+    given-names: "Alexander M."
+preferred-citation:
+  type: conference-paper
+  authors:
+  - family-names: Wolf
+    given-names: Thomas
+  - family-names: Debut
+    given-names: Lysandre
+  - family-names: Sanh
+    given-names: Victor
+  - family-names: Chaumond
+    given-names: Julien
+  - family-names: Delangue
+    given-names: Clement
+  - family-names: Moi
+    given-names: Anthony
+  - family-names: Cistac
+    given-names: Perric
+  - family-names: Ma
+    given-names: Clara
+  - family-names: Jernite
+    given-names: Yacine
+  - family-names: Plu
+    given-names: Julien
+  - family-names: Xu
+    given-names: Canwen
+  - family-names: "Le Scao"
+    given-names: Teven
+  - family-names: Gugger
+    given-names: Sylvain
+  - family-names: Drame
+    given-names: Mariama
+  - family-names: Lhoest
+    given-names: Quentin
+  - family-names: Rush
+    given-names: "Alexander M."
+  booktitle: "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations"
+  month: 10
+  start: 38
+  end: 45
+  title: "Transformers: State-of-the-Art Natural Language Processing"
+  year: 2020
+  publisher: "Association for Computational Linguistics"
+  url: "https://aclanthology.org/2020.emnlp-demos.6/"
+  address: "Online"
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+.ai/AGENTS.md
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,133 @@
+
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, caste, color, religion, or sexual
+identity and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the overall
+  community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or advances of
+  any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email address,
+  without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+feedback@huggingface.co.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series of
+actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or permanent
+ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within the
+community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.1, available at
+[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
+
+Community Impact Guidelines were inspired by
+[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
+
+For answers to common questions about this code of conduct, see the FAQ at
+[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
+[https://www.contributor-covenant.org/translations][translations].
+
+[homepage]: https://www.contributor-covenant.org
+[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
+[Mozilla CoC]: https://github.com/mozilla/diversity
+[FAQ]: https://www.contributor-covenant.org/faq
+[translations]: https://www.contributor-covenant.org/translations
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -0,0 +1,232 @@
+<!---
+Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Contribute to 🤗 Transformers
+
+> [!WARNING]
+> The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
+> code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
+> **we ask that new users do not submit pure code agent PRs** at this time.
+> You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous agents
+> not to open any PRs or issues for the moment.
+>
+> PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
+> repeatedly or maliciously.
+
+<details>
+
+<summary> Our code agent philosophy in detail </summary>
+
+We understand that code agents are extremely powerful tools, and many people at Hugging Face use them in their work.
+However, it's important to realize that **if you simply run a code agent
+and generate a PR to an open-source project, you are merely a middleman between the reviewers and the agent**. 
+Although doing this creates something that looks very much like a useful contribution, in reality there was no reason 
+for you to be involved; the reviewers could have simply run the code agent themselves.
+
+If you want to contribute usefully to open-source in the agent era, **you need to do things that agents can't do on
+their own**. In particular, we've found the following to be very helpful:
+- Clear diagnosis of bugs. Code agents like to quickly fix problems with a workaround that often causes code bloat
+or incompatibilities with other models. Spending time to track down the exact cause of a problem, and in particular
+locating the first commit where it appeared (for example with [git bisect](https://git-scm.com/docs/git-bisect)) is valuable.
+- Minimize the diff. Check your PR to eliminate any unnecessary changes. Ensure that you did not commit any testing scripts
+or unrelated files. Add comments only if they're really necessary; code agents love adding three new functions and
+multi-line comments to draw attention to all the hard work they did. If your PR can be a 1-line fix,
+make it a 1-line fix. This makes the PR much easier to review and improves the chances that it will be accepted.
+- Take the time to reproduce the problem. Very often when a user reports an issue, the issue is actually caused by
+environment issues on their machine, or they misdiagnose the problem and suggest an invalid solution. Many code agents
+trust the user comments too much, which results in bad solutions, sometimes for problems that 
+do not exist! Writing a simple reproducer script and running it to make sure you see the problem is valuable.
+- Compare against other models. The Transformers repo is very large, and many models are doing similar things. When
+fixing a bug, it's valuable to see if the bug exists in other models. If your PR says
+"fixed by using the same approach as (other model)", with a link to the relevant code, that is very helpful for maintainers,
+because it tells us that the fix is likely to be correct and compatible with the rest of the codebase. Code agents often
+look at the code "narrowly", and make a fix that causes models to diverge from the rest of the codebase.
+- Avoid small or "busywork" PRs. In the past, we used to accept these, but given the current flood, we simply don't
+have time for small style changes or typo fixes in comments. You can provide value beyond a code
+agent simply by having good taste about what's really important.
+- Verify tests locally and in the CI. Before opening a PR, run `make fix-repo` and use `utils/tests_fetcher.py` to 
+see a list of tests that cover the files you have changed in your PR branch. Run those tests locally, and make sure 
+they pass before you open a PR. After you open your PR, please verify that the CI is green and fix any issues before 
+pinging anyone for review! This reduces notification spam a lot, which keeps maintainers sane.
+
+Please bear in mind that this is an exciting, rapidly-changing but challenging era for open-source development, and indeed
+for the software industry as a whole. We will likely be rapidly updating these guidelines as we learn more about
+dealing effectively with code agents. Have patience with us if reviews are slower than normal, or if some
+PRs are closed without review!
+
+</details>
+
+Transformers welcomes all contributions whether it is fixing bugs, submitting feature requests, implementing new models, or improving docs.
+
+If you aren't sure where to start, take a look at the [Good First Issues](https://github.com/huggingface/transformers/labels/Good%20First%20Issue) for more beginner-friendly issues. For a more challenging issue, check out the [Good Second Issues](https://github.com/huggingface/transformers/labels/Good%20Second%20Issue). However you choose to contribute, please be mindful and respect our [code of conduct](https://github.com/huggingface/transformers/blob/main/CODE_OF_CONDUCT.md).
+
+If you enjoyed using Transformers, feel free to reference it and let us know how you're using it, shout us out on Twitter, or give the repository a star!
+
+This guide was heavily inspired by the [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md).
+
+## Contents
+
+1. [Set up](#set-up)
+   - [Windows](#windows)
+2. [Opening issues](#opening-issues)
+   - [Bug-related issue](#bug-related-issue)
+   - [Feature request](#feature-request)
+3. [Adding a new model](#adding-a-new-model)
+   - [Model addition timeline](#model-addition-timeline)
+4. [Docs](#docs)
+5. [Agentic contributions](#agentic-contributions)
+
+## Set up
+
+1. Fork [Transformers](https://github.com/huggingface/transformers) with the **Fork** button on GitHub to get a copy under your account, then clone it locally.
+
+  ```bash
+  git clone git@github.com:<your Github handle>/transformers.git
+  cd transformers
+  ```
+
+2. Add the Transformers repository as a remote named `upstream`. `origin` points at your fork and `upstream` points at the source of truth. That lets you sync your local `main` before pushing to `origin` and opening a PR.
+
+  ```bash
+  git remote add upstream https://github.com/huggingface/transformers.git
+  git remote -v
+  # origin    git@github.com:<your Github handle>/transformers.git (fetch)
+  # origin    git@github.com:<your Github handle>/transformers.git (push)
+  # upstream  https://github.com/huggingface/transformers.git (fetch)
+  # upstream  https://github.com/huggingface/transformers.git (push)
+  ```
+
+3. Keep your fork up to date and start a branch to work on. You can also sync your fork from the GitHub UI with the **Sync fork** button.
+
+  ```bash
+  git checkout main && git pull upstream main && git push origin main
+  git switch -c a-descriptive-name-for-my-branch
+  ```
+
+4. Install the library in editable mode. Use `[dev]` for most contributions because it includes the development tools needed for style and quality checks. Use `[torch,testing]` for model contributions, or `[quality]` for docs-only changes and small fixes when the full development dependency set is not needed.
+
+  ```bash
+  pip install -e ".[dev]"
+
+  # for model contributions
+  pip install -e ".[torch,testing]"
+
+  # for docs-only changes and small fixes
+  pip install -e ".[quality]"
+  ```
+
+### Windows
+
+On Windows, unless you're using [WSL](https://learn.microsoft.com/en-us/windows/wsl/), configure git to convert `CRLF` line endings to `LF`.
+
+```bash
+git config core.autocrlf input
+```
+
+One way to run the `make` command on Windows is with MSYS2.
+
+1. [Download MSYS2](https://www.msys2.org/) (we assume it's installed in `C:\msys64`).
+2. Open the command line `C:\msys64\msys2.exe` (it should be available from the **Start** menu).
+3. Run in the shell: `pacman -Syu` and install `make` with `pacman -S make`.
+4. Add `C:\msys64\usr\bin` to your PATH environment variable.
+
+You can now use `make` from any terminal (PowerShell, cmd.exe, etc.).
+
+## Opening issues
+
+Use the issue [templates](https://github.com/huggingface/transformers/tree/main/.github/ISSUE_TEMPLATE) to help you get started.
+
+### Bug-related issue
+
+Make sure the bug wasn't already reported before opening an issue (use the search bar on GitHub under Issues). The issue should be a bug in Transformers, not in your own code.
+
+If you're unsure whether the bug is in your code or the library, ask in the [forum](https://discuss.huggingface.co/) or on [Discord](https://discord.com/invite/hugging-face-879548962464493619) first. That helps keep GitHub focused on actionable library issues.
+
+Include the following so we can resolve it quickly.
+
+* Your *OS type and version* and *Python*, and *PyTorch* versions when applicable.
+* A short, self-contained, code snippet that allows us to reproduce the bug.
+* The *full* traceback if an exception is raised.
+* Any other information, like screenshots, that may help.
+
+To get the OS and software versions automatically, run the following command.
+
+```bash
+transformers env
+```
+
+### Feature request
+
+Open an issue and describe:
+
+* The *motivation* such as a frustration with the library, a project need, or work you've done that could benefit the community.
+* The feature in as much detail as possible.
+* A *code snippet* demonstrating the feature's usage.
+* A link to the relevant paper, if applicable.
+
+## Adding a new model
+
+Adding a model to Transformers makes it available for anyone to load and fine-tune it or run inference. Before you start the implementation, isolate the differences between your architecture and recent state-of-the-art models that are already in Transformers. Explain those differences in the issue or PR so reviewers can quickly understand what challenges the model brings, what parts can reuse existing patterns, and what should be standardized across models.
+
+Follow the guides below.
+
+1. [Add a model with modular Transformers](docs/source/en/modular_transformers.md) — implement the model using the modular file and generate standalone modeling files.
+2. [Auto-generate docstrings](docs/source/en/auto_docstring.md) — use the `@auto_docstring` decorator to generate consistent docstrings without boilerplate.
+3. [Testing](docs/source/en/testing.md) — write and run model tests to verify correctness and keep the contribution maintainable. For causal language models, inherit from the causal LM tester where possible. Prioritize integration tests because they verify the full processor, tokenizer, and model path against real checkpoints.
+4. [Model structure rules](docs/source/en/modeling_rules.md) — check your files pass the static model structure rules enforced by `make typing`.
+
+Some model additions need extra integration work.
+
+- [Add vision processing components](docs/source/en/add_vision_processing_components.md) if your model requires image or video inputs.
+- [Dynamic weight loading](docs/source/en/weightconverter.md) if published checkpoint parameter names do not match the Transformers implementation.
+
+### Model addition timeline
+
+There are four timelines for model additions depending on the model contributor and community demand for an architecture.
+
+- Day-0 integration: make the model available in Transformers on release day with a new version release. We provide help to support the most important core features (quantization, FlashAttention, KV-cache, etc.), review early drafts, and can also take care of the model implementation depending on timelines.
+
+  Email transformers@huggingface.co a few weeks ahead, especially for novel architectures. We'll iterate with you on a private fork until your checkpoint and release are ready.
+
+- Same-week integration: high-demand models usually land within a week, even when the author doesn't reach out.
+
+  Open an issue with the [new model template](https://github.com/huggingface/transformers/issues/new?assignees=&labels=New+model&projects=&template=new-model-addition.yml) to request one. Issues with more activity move up the queue faster.
+
+- Post-release integration: models without strong demand, or that we don't have bandwidth to take on, land after the upstream release.
+
+  Open issues tagged ["New model"](https://github.com/huggingface/transformers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+model%22) are the entry point for outside contributors. Start with the most-requested architectures to maximize impact. We'll review and guide you through it.
+
+- Hub-first release: ship your model directly on the Hub via Transformers' [remote-code](./docs/source/en/models#custom-models.md) support, with no upstream PR required.
+
+  Popular Hub-first models often get integrated into Transformers later, which unlocks first-class docs, maintenance, and optimization. Hub-first is the lowest-friction way to add a model. However, it can be more fragile if a model requires weight conversion or other backward compatibility issues.
+
+## Docs
+
+Improvements to the docs, like typos, missing content or unclear explanations, are always welcome. For API reference generated from source files, use the [`@auto_docstring`](./docs/source/en/auto_docstring.md) decorator when it applies. Open a pull request directly for meaningful documentation fixes.
+
+Refer to the docs [README](./docs/README.md) for more details about how to edit the docs and the syntax we use.
+
+## Agentic contributions
+
+AI-assisted contributions are welcome. They must be coordinated, scoped, and verified to keep review load manageable.
+
+- Do not submit "pure agent" PRs. The human submitter is responsible for reviewing all changed lines, validating behavior end-to-end, and running relevant tests.
+- If AI tools were used, disclose this in the PR description and include: coordination link, differentiation from existing PRs (if applicable), and test commands/results.
+- Avoid one-off "busywork" PRs (single typo, isolated style cleanup, one mutable default fix, etc.). Bundle mechanical cleanups into a clear, systematic scope.
+- Coordinate on issues before opening a PR, review similar PRs, and wait for approval.
+
+> [!NOTE]
+> These topics are outlined for agents in `AGENTS.md` with instruction for how to autonomously implement them.
--- a/ISSUES.md
+++ b/ISSUES.md
@@ -0,0 +1,274 @@
+<!---
+Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# How To Request Support
+
+This is an Open Source Project so please be mindful that like in any other project of this kind there is no obligation to answer all requests for help.
+
+However, we want to encourage you to ask for help whenever you think it's needed! We are happy about every question we get because it allows us to better understand your needs, possible misunderstandings, and most importantly a way for you to help us make this library better. That being said, this document's main purpose is to provide guidelines at how you can formulate your requests to increase your chances to be understood and to get support.
+
+There are two main venues to receive support: [the forums](https://discuss.huggingface.co/) and [the GitHub issues](https://github.com/huggingface/transformers/issues).
+
+## The Forums
+
+[The user forums](https://discuss.huggingface.co/) are supported by the wide community of the library users and backed up by developers when needed.
+
+If you have a difficulty with deploying this library or some questions, or you'd like to discuss a new feature, please first consider discussing those things at the forums. Only when you feel your subject matter has been crystallized and you still need support from the library developers do proceed to file an [issue](https://github.com/huggingface/transformers/issues).
+
+In particular all "Please explain" questions or objectively very user-specific feature requests belong to the forums. Here are some example of such questions:
+
+* "I would like to use a BertModel within a RL-Agent for a customer support service. How can I use a BertForMaskedLM in my ChatBotModel?"
+
+* "Could you please explain why T5 has no positional embedding matrix under T5Model?"
+
+* "How should I set my generation parameters for translation?"
+
+* "How to train T5 on De->En translation?"
+
+## The GitHub Issues
+
+Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
+
+You are not required to read the following guidelines before opening an issue. However, if you notice that your issue doesn't get any replies, chances are that the developers have one or several difficulties with its quality. In this case, reading the following points and adjusting your issue accordingly could help.
+
+1. Before posting an issue, first search for already posted issues, since chances are someone has already asked a similar question before you.
+
+    If you use Google your search query should be:
+
+    ```
+    "huggingface" "transformers" your query
+    ```
+
+    The first two quoted words tell Google to limit the search to the context of the Huggingface Transformers. The remainder is your query - most commonly this would be the error message the software fails with. We will go deeper into details shortly.
+
+    The results of such a query will typically match GitHub issues, Hugging Face forums, StackExchange, and blogs.
+
+    If you find relevant hints, you may choose to continue the discussion there if you have follow up questions.
+
+    If what you found is similar but doesn't quite answer your problem, please, post a new issue and do include links to similar issues or forum discussions you may have found.
+
+    Let's look at some examples:
+
+    The error message, often referred to as an assertion, tells us what went wrong. Here is an example of an assertion:
+
+   ```python
+   Traceback (most recent call last):
+     File "<string>", line 1, in <module>
+     File "/transformers/src/transformers/__init__.py", line 34, in <module>
+       from . import dependency_versions_check
+     File "/transformers/src/transformers/dependency_versions_check.py", line 34, in <module>
+       from .utils import is_tokenizers_available
+     File "/transformers/src/transformers/utils/import_utils.py", line 40, in <module>
+       from tqdm.auto import tqdm
+    ModuleNotFoundError: No module named 'tqdm.auto'
+    ```
+
+   and it typically includes a traceback, so that we can see the full stack of calls the program made before it fails. This gives us the context to know why the program failed.
+
+   Going back to the above example. If you received this error search, look at the very last line of the error which is:
+
+   ```python
+    ModuleNotFoundError: No module named 'tqdm.auto'
+    ```
+
+    And now we can use it to do the searching on your favorite search engine:
+
+    1. first for `"huggingface" "transformers" "ModuleNotFoundError: No module named 'tqdm.auto'"`
+    2. if you don't find relevant results, then search for just `"ModuleNotFoundError: No module named 'tqdm.auto'"`
+    3. and finally if nothing still comes up, then remove the outside quotes: `ModuleNotFoundError: No module named 'tqdm.auto'`
+
+   If the error includes any messages that include bits unique to your filesystem, always remove those in the search query since other users will not have the same filesystem as yours. For example:
+
+   ```bash
+   python -c 'open("/tmp/wrong_path.txt", "r")'
+   Traceback (most recent call last):
+     File "<string>", line 1, in <module>
+   FileNotFoundError: [Errno 2] No such file or directory: '/tmp/wrong_path.txt'
+   ```
+   Here you'd search for just: `"FileNotFoundError: [Errno 2] No such file or directory"`
+
+   If the local information that you removed were inside the error message and you removed them you may need to remove double quotes since your query is no longer exact. So if the error message was something like:
+
+   ```bash
+      ValueError: '/tmp/wrong_path.txt' cannot be found
+   ```
+
+   then you'd search for `"ValueError" "cannot be found"`
+
+   As you search you will notice that when you don't use quotes often the search engines will return a variety of unrelated hits, which may or may not be what you want.
+
+   Experiment with different ways and find which approach gives the most satisfactory results.
+
+2. Keep the issue short, providing the information that you think will aid the developers to understand your situation. Put yourself in the shoes of the person who has never seen your code or knows anything about your custom setup. This mental exercise will help to develop an intuition to what/what not to share"
+
+3. If there is a software failure, always provide the full traceback, for example:
+
+   ```python
+   $ python -c 'import transformers'
+   Traceback (most recent call last):
+     File "<string>", line 1, in <module>
+     File "/transformers/src/transformers/__init__.py", line 34, in <module>
+       from . import dependency_versions_check
+     File "/transformers/src/transformers/dependency_versions_check.py", line 34, in <module>
+       from .utils import is_tokenizers_available
+     File "/transformers/src/transformers/utils/import_utils.py", line 40, in <module>
+       from tqdm.auto import tqdm
+   ModuleNotFoundError: No module named 'tqdm.auto'
+   ```
+
+   As compared to providing just the last line of the error message, e.g.:
+   ```python
+   ModuleNotFoundError: No module named 'tqdm.auto'
+   ```
+   which is not sufficient.
+
+   If your application is running on more than one GPU (e.g. under `DistributedDataParallel`) and typically getting every log and traceback printed multiple times, please make sure that you paste only one copy of it. At times the traceback from parallel processes may get interleaved - so either disentangle these or change the loggers to log only for `local_rank==0` so that only one process logs things.
+
+4. When quoting a traceback, command line instructions and any type of code always enclose it in triple backticks inside the editor window, that is:
+
+   ````
+   ```
+   git clone https://github.com/huggingface/transformers
+   cd transformers
+   pip install .
+   ```
+   ````
+
+   If it's a command line with a long argument list, please consider breaking it down using backslashes and new lines. Here is an example of a good command line quote:
+
+   ```bash
+    cd examples/seq2seq
+    torchrun --nproc_per_node=2 ./finetune_trainer.py \
+    --model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
+    --output_dir output_dir \
+    --do_train --n_train 500 --num_train_epochs 1 \
+    --per_device_train_batch_size 1  --freeze_embeds \
+    --src_lang en_XX --tgt_lang ro_RO --task translation \
+    --fp16
+   ```
+
+   If you don't break it up, one has to scroll horizontally which often makes it quite difficult to quickly see what's happening.
+
+   The backslashes allow us to copy the command directly into the console to run it, without needing to edit it.
+
+5. Include only the important information that you think will help the developer to quickly identify the problem.
+
+   For example applications often create huge amounts of logs. Ask yourself whether providing all or parts of the log is useful.
+
+   Pasting a 100-1000 lines of log into the issue is an immediate turn off, since it will take a lot of time to figure out where the pertinent parts of the log are.
+
+   Attaching a full log can be helpful if it's done as an attachment, if it's enclosed in the following html code in the comment editor window:
+
+   ```
+   <details>
+   <summary>Full log</summary>
+   <pre>
+
+   many
+   lines
+   go
+   here
+
+   </pre>
+   </details>
+   ```
+
+   which would result in the following entry, which can be opened if desired, but otherwise takes little space.
+
+   <details>
+   <summary>Full log</summary>
+   <pre>
+   many
+   lines
+   go
+   here
+   </pre>
+   </details>
+
+    You could also provide a link to a pastebin service, but this is less beneficial since those links tend to expire quickly and future readers of your issue might not be able to access that log file anymore and may lack some context.
+
+6. If this is an issue in your code, do try to reduce that code to a minimal example that still demonstrates the problem. Please ask at the forums if you have a hard time figuring how to do that. Please realize that we don't have the luxury of having time to try and understand all of your custom code.
+
+   If you really tried to make a short reproducible code but couldn't figure it out, it might be that having a traceback will give the developer enough information to know what's going on. But if it is not enough and we can't reproduce the problem, we can't really solve it.
+
+   Do not despair if you can't figure it out from the beginning, just share what you can and perhaps someone else will be able to help you at the forums.
+
+   If your setup involves any custom datasets, the best way to help us reproduce the problem is to create a [Google Colab notebook](https://colab.research.google.com/) that demonstrates the issue and once you verify that the issue still exists, include a link to that notebook in the Issue. Just make sure that you don't copy and paste the location bar url of the open notebook - as this is private and we won't be able to open it. Instead, you need to click on `Share` in the right upper corner of the notebook, select `Get Link` and then copy and paste the public link it will give to you.
+
+7. If you forked off some of this project's code or example applications, please, do not ask us to go into your code repository and figure out what you may have done. The code is already very complex and unless there is an easy way to do a diff and it's a small diff, it won't be possible to find someone with time on their hands to make a lengthy investigation. Albeit, you might find someone at the forums who will be generous to do this for you.
+
+8. Before reporting an issue, first, always try to update your environment to the latest official version of this library. We have no resources to go and debug older revisions, which could easily have bugs that have been fixed in the latest released version.
+
+   We understand that this is not always possible, especially when APIs change, in which case file an issue against the highest library version your environment can support.
+
+   Of course, if you upgrade the library, always retest that the problem is still there.
+
+9. Please do not ask us to reproduce an issue with your custom data, since we don't have it. So, either you should use some existing dataset supported by HF datasets or you need to supply a code that generates a small sample on the fly, or some another quick and simple way to get it.
+
+   Please do not send us any non-public domain data that may require a license or a permission to be used.
+
+10. Do not tag multiple developers on the issue unless you know this is expected, either because you asked them and they gave you an explicit permission to tag them or the issue template instructs you to do so.
+
+   The "who to tag for what domain" part of the issue template is there to help users direct their questions to the right developers who are designated maintainers of project's specific domains. They can then decide at their own discretion to tag other developers if they feel it'd help move the issue forward.
+
+   We currently don't have a triage service and we trust your capacity to identify the right domain and thus the persons to tag in your issue. If you are not sure, please use the forums to ask for guidance.
+
+   When in doubt, err on the side of not tagging a given person. If you tag multiple people out of context or permission don't be surprised if you get no response at all. Please remember that every time you tag someone, they get a notification and you're taking their time without their permission. Please be sensitive to that.
+
+   If you got helped by one of the developers in the past please don't tag them in future issues, unless they are listed in the issue template for the domain you are asking about or that developer gave you an explicit permission to tag them in future issues.
+
+   If you see a certain developer doing multiple and/or recent commits into a specific area of the project that you feel is relevant to your issue, it is not a good reason to tag them. Various developers may be fixing things that prevent them from moving forward, but often their work is focused on a totally different domain. And while they may or may not know how to help you with the problem at hand, it would benefit the whole community much more if they focus on the domain of their unique expertise.
+
+11. Use the Edit button. Take your time, and re-read and improve the wording and formatting to make your posts and comments as easy to understand as possible.
+
+    Avoid posting multiple comments in a row, as each comment generates a notification for the developers tagged in that issue. If you happened to post multiple comments in a row, and nobody followed up yet - consider merging those into one or a few comments while editing the combined content to be coherent.
+
+    If you choose to edit your older comments after others posted follow up comments you need to be aware that your modifications might not be noticed, so if it's not a typo fixing, try to write a new comment flagging that something has been changed in the previous comments.
+
+    For example, the very first comment is the most important one. If while the thread unfolds you realize that things aren't as they seemed to you originally you may want to edit the first post to reflect the up-to-date understanding of the issue at hand so that it helps those who read your issue in the future quickly understand what's going on and not need to sift through dozens of comments. It also helps to indicate that the post was edited. So, those reading the thread later can understand why there might be certain discontinuity in the information flow.
+
+    Use bullets and items if you have lists of items and the outcome improves overall readability.
+
+    Use backticks to refer to class and function names, e.g. `BartModel` and `generate` as these stand out and improve the speed of a reader's comprehension.
+
+    Try not use italics and bold text too much as these often make the text more difficult to read.
+
+12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter it could be quite impossible to find which specific comment you're referring to.
+
+    To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
+
+    For example the first link is a link to an issue, and the second to a specific comment in the same issue:
+
+    1. https://github.com/huggingface/transformers/issues/9257
+    2. https://github.com/huggingface/transformers/issues/9257#issuecomment-749945162
+
+13. If you are replying to a last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
+
+    But if you're replying to a comment that happened some comments back it's always a good practice to quote just the relevant lines you're replying it. The `>` is used for quoting, or you can always use the menu to do so. For example your editor box will look like:
+
+    ```
+    > How big is your GPU cluster?
+
+    Our cluster is made of 256 GPUs.
+    ```
+
+    If you are addressing multiple comments, quote the relevant parts of each before your answer. Some people use the same comment to do multiple replies, others separate them into separate comments. Either way works. The latter approach helps for linking to a specific comment.
+
+In general the best way to figure out what works the best is learn from issues posted by other people - see which issues get great responses and which get little to no response - observe what the posters who received great responses did differently from those who did not.
+
+Thank you for reading this somewhat lengthy document. We would like to conclude that these are not absolute rules, but a friendly advice that will help maximize the chances for us to understand what you are trying to communicate, reproduce the problem then resolve it to your satisfaction and the benefit of the whole community.
+
+If after reading this document there are remaining questions on how and why or there is a need for further elucidation, please, don't hesitate to ask your question in [this thread](https://discuss.huggingface.co/t/how-to-request-support/3128).
--- a/203
+++ b/203
@@ -0,0 +1,203 @@
+Copyright 2018- The Hugging Face team. All rights reserved.
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/MIGRATION_GUIDE_V5.md
+++ b/MIGRATION_GUIDE_V5.md
@@ -0,0 +1,791 @@
+<!---
+Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Version 5 Migration guide
+
+## Library-wide changes with widespread impact
+
+### Removal of TensorFlow and Jax
+
+We're removing the TensorFlow and Jax parts of the library. This will help us focus fully on `torch`
+going forward and will greatly reduce the maintenance cost of models. We are working with tools from
+the Jax ecosystem still (such as MaxText) in order to see how we can remain compatible with their
+tool while keeping `torch` as the only backend for now.
+
+Linked PR: https://github.com/huggingface/transformers/pull/40760
+
+### Dynamic weight loading
+
+We introduce a new weight loading API in `transformers`, which significantly improves on the previous API. This
+weight loading API is designed to apply operations to the checkpoints loaded by transformers.
+
+Instead of loading the checkpoint exactly as it is serialized within the model, these operations can reshape, merge,
+and split the layers according to how they're defined in this new API. These operations are often a necessity when
+working with quantization or parallelism algorithms.
+
+This new API is centered around the new `WeightConverter` class:
+
+```python
+class WeightConverter(WeightTransform):
+    operations: list[ConversionOps]
+    source_keys: Union[str, list[str]]
+    target_keys: Union[str, list[str]]
+```
+
+The weight converter is designed to apply a list of operations on the source keys, resulting in target keys. A common
+operation done on the attention layers is to fuse the query, key, values layers. Doing so with this API would amount
+to defining the following conversion:
+
+```python
+conversion = WeightConverter(
+    ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],  # The input layers
+    "self_attn.qkv_proj",  # The single layer as output
+    operations=[Concatenate(dim=0)],
+)
+```
+
+In this situation, we apply the `Concatenate` operation, which accepts a list of layers as input and returns a single
+layer.
+
+This allows us to define a mapping from architecture to a list of weight conversions. Applying those weight conversions
+can apply arbitrary transformations to the layers themselves. This significantly simplified the `from_pretrained` method
+and helped us remove a lot of technical debt that we accumulated over the past few years.
+
+This results in several improvements:
+- Much cleaner definition of transformations applied to the checkpoint
+- Reversible transformations, so loading and saving a checkpoint should result in the same checkpoint
+- Faster model loading thanks to scheduling of tensor materialization
+- Enables complex mix of transformations that wouldn't otherwise be possible (such as quantization + MoEs, or TP + MoEs)
+
+While this is being implemented, expect varying levels of support across different release candidates.
+
+Linked PR: https://github.com/huggingface/transformers/pull/41580
+
+## Tokenization
+
+Just as we moved towards a single backend library for model definition, we want our tokenizers, and the `Tokenizer` object to be a lot more intuitive. With v5, tokenizer definition is much simpler; one can now initialize an empty `LlamaTokenizer` and train it directly on your corpus.
+
+Defining a new tokenizer object should be as simple as this:
+
+```python
+from transformers import TokenizersBackend, generate_merges
+from tokenizers import pre_tokenizers, Tokenizer
+from tokenizers.model import BPE
+
+class Llama5Tokenizer(TokenizersBackend):
+    def __init__(self, unk_token="<unk>",bos_token="<s>", eos_token="</s>", vocab=None, merges=None ):
+        if vocab is None:
+            self._vocab = {
+                str(unk_token): 0,
+                str(bos_token): 1,
+                str(eos_token): 2,
+            }
+
+        else:
+            self._vocab = vocab
+
+        self._merges = merges or []
+
+        self._tokenizer = Tokenizer(
+            BPE(vocab=self._vocab, merges=self._merges, fuse_unk=True)
+        )
+        self._tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(
+            replacement="▁", prepend_scheme=_get_prepend_scheme(self.add_prefix_space, self), split=False
+        )
+        super().__init__(
+            tokenizer_object=self._tokenizer,
+            unk_token=unk_token,
+            bos_token=bos_token,
+            eos_token=eos_token,
+        )
+```
+
+Once the tokenizer is defined as above, you can load it with the following: `Llama5Tokenizer()`. Doing this returns you an empty, trainable tokenizer that follows the definition of the authors of `Llama5` (it does not exist yet :wink:).
+
+The above is the main motivation towards refactoring tokenization: we want tokenizers to behave similarly to models: trained or empty, and with exactly what is defined in their class definition.
+
+### Backend Architecture Changes: moving away from the slow/fast tokenizer separation
+
+Up to now, transformers maintained two parallel implementations for many tokenizers:
+- "Slow" tokenizers (`tokenization_<model>.py`) - Python-based implementations, often using [SentencePiece](https://github.com/google/sentencepiece) as the backend.
+- "Fast" tokenizers (`tokenization_<model>_fast.py`) - Rust-based implementations using the 🤗 [tokenizers](https://github.com/huggingface/tokenizers) library.
+
+In v5, we consolidate to a single tokenizer file per model: `tokenization_<model>.py`. This file will use the most appropriate backend available:
+
+1. **TokenizersBackend** (preferred): Rust-based tokenizers from the 🤗 [tokenizers](https://github.com/huggingface/tokenizers) library. In general it provides optimal performance, but it also offers a lot more features that are commonly adopted across the ecosystem:
+  - handling additional tokens
+  - a full python API for setting and updating
+  - automatic parallelization,
+  - automatic offsets
+  - customization
+  - training
+2. **SentencePieceBackend**: for tokenizers requiring the `sentencepiece` library. It inherits from `PythonBackend`.
+3. **PythonBackend**: a Python implementations of the features provided by `tokenizers`. Basically allows adding tokens.
+4. **MistralCommonBackend**: relies on `MistralCommon`'s tokenization library. (Previously known as the `MistralCommonTokenizer`)
+
+The `AutoTokenizer` automatically selects the appropriate backend based on available files and dependencies. This is transparent, you continue to use `AutoTokenizer.from_pretrained()` as before. This allows transformers to be future-proof and modular to easily support future backends.
+
+### Defining a tokenizers outside of the existing backends
+
+We enable users and tokenizer builders to define their own tokenizers from top to bottom. Tokenizers are usually defined using a backend such as `tokenizers`, `sentencepiece` or `mistral-common`, but we offer the possibility to design the tokenizer at a higher-level, without relying on those backends.
+
+To do so, you can import the `PythonBackend` (which was previously known as `PreTrainedTokenizer`). This class encapsulates all the logic related to added tokens, encoding, and decoding.
+
+If you want something even higher up the stack, then `PreTrainedTokenizerBase` is what `PythonBackend` inherits from. It contains the very basic tokenizer API features:
+- `encode`
+- `decode`
+- `vocab_size`
+- `get_vocab`
+- `convert_tokens_to_ids`
+- `convert_ids_to_tokens`
+- `from_pretrained`
+- `save_pretrained`
+- among a few others
+
+**Note for implementing new tokenizers:** When creating a tokenizer class that loads from SentencePiece files, you can override the `convert_from_spm` class method in your converter to customize vocabulary structure, normalizers, regexes and anything that you would want to be passed to the tokenizers your are converting.
+This is useful if the model requires specific token ordering or special split regex patterns. See existing converter classes in `convert_slow_tokenizer.py` for examples.
+
+### API Changes
+
+#### 1. Direct tokenizer initialization with vocab and merges
+
+Starting with v5, we now enable initializing blank, untrained `tokenizers`-backed tokenizers:
+
+```py
+from transformers import LlamaTokenizer
+
+tokenizer = LlamaTokenizer()
+```
+
+This tokenizer will therefore follow the definition of the `LlamaTokenizer` as defined in its class definition. It can then be trained on a corpus as can be seen in [the `tokenizers` documentation](https://huggingface.co/docs/tokenizers/training_from_memory).
+
+These tokenizers can also be initialized from vocab and merges (if necessary), like the previous "slow" tokenizers:
+
+```py
+from transformers import LlamaTokenizer
+
+vocab = {"<unk>": 0, "<s>": 1, "</s>": 2, "hello": 3, "world": 4}
+merges = [("h", "e"), ("l", "l"), ("o", " ")]
+
+tokenizer = LlamaTokenizer(vocab=vocab, merges=merges)
+```
+
+This tokenizer will behave as a Llama-like tokenizer, with an updated vocabulary. This allows comparing different tokenizer classes with the same vocab; therefore enabling the comparison of different pre-tokenizers, normalizers, etc.
+
+**Simplified file loading:** Support is added for passing`vocab` and `merges` as file paths directly to tokenizer initialization. The tokenizer will automatically detect the format (SentencePiece `.model`, Tekken `tekken.json`, or plain vocab/merges files) for loading. For BPE tokenizers, if a vocab is provided but no merges, merges will be automatically generated (excluding special tokens).
+
+Note: Loading from file paths with `vocab="<path_to_a_file>"`'s primary goal is to allow you to do some quick testing, but for `BPE` models for example we don't check whether you properly passed the merges or not.
+
+#### 2. Simplified decoding API
+
+The `batch_decode` and `decode` methods have been unified to reflect behavior of the `encode` method. Both single and batch decoding now use the same `decode` method. See an example of the new behavior below:
+
+```python
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("t5-small")
+inputs = ["hey how are you?", "fine"]
+tokenizer.decode(tokenizer.encode(inputs))
+```
+
+Gives:
+```diff
+- 'hey how are you?</s> fine</s>'
+ ['hey how are you?</s>', 'fine</s>']
+```
+
+We expect `encode` and `decode` to behave, as two sides of the same coin: `encode`, `process`, `decode`,  should work.
+
+> [!NOTE]
+> A common use-case would be: `encode`, `model.generate`, `decode`.  However, using `generate` would return `list[list[int]]`, which would then be incompatible with `decode`.
+
+#### 3. Unified encoding API
+
+The `encode_plus` method is deprecated in favor of the single `__call__` method.
+
+#### 4. `apply_chat_template` returns `BatchEncoding`
+
+Previously, `apply_chat_template` returned `input_ids` for backward compatibility. Starting with v5, it now consistently returns a `BatchEncoding` dict like other tokenizer methods.
+
+```python
+# v5
+messages = [
+    {"role": "user", "content": "Hello!"},
+    {"role": "assistant", "content": "Hi there!"}
+]
+
+# Now returns BatchEncoding with input_ids, attention_mask, etc.
+outputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
+print(outputs.keys())  # dict_keys(['input_ids', 'attention_mask'])
+```
+
+#### 5. Removed legacy configuration file saving:
+
+We simplify the serialization of tokenization attributes:
+
+- `special_tokens_map.json` - special tokens are now stored in `tokenizer_config.json`.
+- `added_tokens.json` - added tokens are now stored in `tokenizer.json`.
+- `added_tokens_decoder` is only stored when there is no `tokenizer.json`.
+- `add_bos_token` and `add_eos_token` - these are no longer saved in `tokenizer_config.json`. When a `tokenizer.json` file exists, these settings are defined in the tokenizer class or `tokenizer.json` itself.
+
+**Backend synchronization removed:** The automatic synchronization logic that updated backend tokenizer settings (like `add_prefix_space`, `do_lower_case`, `strip_accents`, `tokenize_chinese_chars`) after initialization has been removed. Tokenizer behavior is now fully determined by the `tokenizer.json` file or class definition at initialization time.
+
+When loading older tokenizers, these files are still read for backward compatibility, but new saves use the consolidated format. We're gradually moving towards consolidating attributes to fewer files so that other libraries and implementations may depend on them more reliably.
+
+#### 6. Model-Specific Changes
+
+Several models that had identical tokenizers now import from their base implementation:
+
+- **LayoutLM** → uses BertTokenizer
+- **LED** → uses BartTokenizer
+- **Longformer** → uses RobertaTokenizer
+- **LXMert** → uses BertTokenizer
+- **MT5** → uses T5Tokenizer
+- **MVP** → uses BartTokenizer
+These modules will eventually be removed altogether.
+
+**Removed T5-specific workarounds**
+
+The internal `_eventually_correct_t5_max_length` method has been removed. T5 tokenizers now handle max length consistently with other models.
+
+### Testing Changes
+
+A few testing changes specific to tokenizers have been applied:
+- Model-specific tokenization test files now focus on integration tests.
+- Common tokenization API tests (e.g., `add_tokens`, `encode`, `decode`) are now centralized and automatically applied across all tokenizers. This reduces test duplication and ensures consistent behavior
+
+For legacy implementations, the original BERT Python tokenizer code (including `WhitespaceTokenizer`, `BasicTokenizer`, etc.) is preserved in `bert_legacy.py` for reference purposes.
+
+#### 7. Deprecated / Modified Features
+
+**Special Tokens Structure:**
+- `SpecialTokensMixin`: Merged into `PreTrainedTokenizerBase` to simplify the tokenizer architecture.
+- `special_tokens_map`: Now only stores named special token attributes (e.g., `bos_token`, `eos_token`). Use `extra_special_tokens` for additional special tokens (formerly `additional_special_tokens`). `all_special_tokens` includes both named and extra tokens.
+
+```python
+# v4
+tokenizer.special_tokens_map  # Included 'additional_special_tokens'
+
+# v5
+tokenizer.special_tokens_map  # Only named tokens
+tokenizer.extra_special_tokens  # Additional tokens
+```
+
+- `special_tokens_map_extended` and `all_special_tokens_extended`: Removed. Access `AddedToken` objects directly from `_special_tokens_map` or `_extra_special_tokens` if needed.
+- `additional_special_tokens`: Automatically converted to `extra_special_tokens` during initialization.
+- `additional_special_tokens_ids`: Removed. Use `extra_special_tokens_ids` instead.
+- `extra_special_tokens`: Only accepts list/tuple format and is intended for use during tokenizer initialization. For model-specific named tokens (e.g., `image_token`), pass directly as keyword arguments instead.
+
+**Deprecated Methods:**
+- `sanitize_special_tokens()`: Already deprecated in v4, removed in v5.
+- `prepare_seq2seq_batch()`: Deprecated; use `__call__()` with `text_target` parameter instead.
+
+```python
+# v4
+model_inputs = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, max_length=128)
+
+# v5
+model_inputs = tokenizer(src_texts, text_target=tgt_texts, max_length=128, return_tensors="pt")
+model_inputs["labels"] = model_inputs.pop("input_ids_target")
+```
+
+- `BatchEncoding.words()`: Deprecated; use `word_ids()` instead.
+
+**Removed Methods:**
+- `create_token_type_ids_from_sequences()`: Removed from base class. Subclasses that need custom token type ID creation should implement this method directly.
+- `prepare_for_model()`, `build_inputs_with_special_tokens()`, `truncate_sequences()`: Moved from `tokenization_utils_base.py` to `tokenization_python.py` for `PythonBackend` tokenizers. `TokenizersBackend` provides model-ready input via `tokenize()` and `encode()`, so these methods are no longer needed in the base class.
+- `_switch_to_input_mode()`, `_switch_to_target_mode()`, `as_target_tokenizer()`: Removed from base class. Use `__call__()` with `text_target` parameter instead.
+
+```python
+# v4
+with tokenizer.as_target_tokenizer():
+    labels = tokenizer(tgt_texts, ...)
+
+# v5
+labels = tokenizer(text_target=tgt_texts, ...)
+```
+
+- `parse_response()`: Removed from base class.
+
+## Disclaimers for the RC0
+
+### PEFT + MoE:
+
+Because we are switching from the naive MOE (`nn.ModuleList` for experts) we currently have an issue with MoEs that have adapters. For more details see https://github.com/huggingface/transformers/issues/42491#issuecomment-3591485649.
+
+_We aim for this to be fixed and released in a following release candidate in the week that follows RC0._
+
+### Tensor parallel and Expert parallel + MoE
+
+We are streamlining the MoE support with vLLM; while this is being implemented, tensor parallelism and expert parallelism aren't working as expected.
+This is known and actively being worked on.
+
+_We aim for this to be fixed and released in a following release candidate in the week that follows RC0._
+
+### Remote code incompatibility
+
+A lot of paths were removed and reworked; paths like `transformers.tokenization_utils` and `transformers.tokenization_utils_fast`, which no longer exist.
+These now redirect to `transformers.tokenization_utils_sentencepiece` and `transformers.tokenization_utils_tokenizers` respectively; please update imports accordingly.
+
+_We aim for this to be fixed and released in a following release candidate in the week that follows RC0._
+
+### Custom pretrained models:
+For anyone inheriting from a `transformers` `PreTrainedModel`, the weights are automatically initialized with the common scheme:
+```python
+
+    @torch.no_grad()
+    def _init_weights(self, module):
+        """
+        Initialize the weights. This is quite general on purpose, in the spirit of what we usually do. For more complex
+        initialization scheme, it should be overridden by the derived `PreTrainedModel` class. In case a model adds an explicit
+        `nn.Parameter`, this method should also be overridden in order to initialize it correctly.
+        """
+        if hasattr(self.config, "initializer_range"):
+            std = self.config.initializer_range or 0.02
+        elif hasattr(self.config, "init_std"):
+            std = self.config.init_std
+        elif hasattr(self.config, "initializer_factor"):
+            std = self.config.initializer_factor
+        else:
+            # 0.02 is the standard default value across the library
+            std = getattr(self.config.get_text_config(), "initializer_range", 0.02)
+
+        if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose1d, nn.ConvTranspose2d)):
+            if getattr(module, "weight", None) is not None:
+                init.normal_(module.weight, mean=0.0, std=std)
+            if getattr(module, "bias", None) is not None:
+                init.zeros_(module.bias)
+        elif isinstance(module, nn.Embedding):
+            if getattr(module, "weight", None) is not None:
+                init.normal_(module.weight, mean=0.0, std=std)
+                # Here we need the check explicitly, as we slice the weight in the `zeros_` call, so it looses the flag
+                if module.padding_idx is not None and not getattr(module.weight, "_is_hf_initialized", False):
+                    init.zeros_(module.weight[module.padding_idx])
+        elif isinstance(module, nn.MultiheadAttention):
+            # This uses torch's original init
+            module._reset_parameters()
+        # We cannot use `isinstance` on the RMSNorms or LayerNorms, as they usually are custom modules which change names
+        # between modelings (because they are prefixed with the model name)
+        elif (
+            isinstance(module, (nn.GroupNorm, nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))
+            or "LayerNorm" in module.__class__.__name__
+            or "RMSNorm" in module.__class__.__name__
+        ):
+            # Norms can exist without weights (in which case they are None from torch primitives)
+            if hasattr(module, "weight") and module.weight is not None:
+                init.ones_(module.weight)
+            if hasattr(module, "bias") and module.bias is not None:
+                init.zeros_(module.bias)
+```
+
+If you want to avoid that, for now you should just do:
+
+```
+class CustomModel(Qwen3VLForConditionalGeneration):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.action_head = nn.Linear(1024, 7)
+        self.positional_embedding = nn.Parameter(torch.randn(16, 1152))
+        self.post_init()
+
+    def _init_weights(self, module):
+        pass
+
+```
+There is a tracker for that here: https://github.com/huggingface/transformers/issues/42418.
+
+## Library-wide changes with lesser impact
+
+### Drop support for `safe_serialization=False`
+
+Safetensors is a simple format for storing tensors safely (as opposed to pickle) and that is still fast (zero-copy). It is the preferred file format to store transformers's weights. Prior to transformers `v5`, it was still possible to pass `safe_serialization=False` to fall back to torch's default (and unsafe) file format. This is no longer possible in `v5`. The `safe_serialization` parameter has been removed from all `save_pretrained` and `push_to_hub` methods.
+
+If you really want to export weights to another file format, you must save the `model.state_dict()` by yourself.
+
+Linked PR: https://github.com/huggingface/transformers/issues/42556
+
+### 50GB default shard size
+
+The default shard size went up from `5GB` to `50GB`. Main benefit will be to avoid having tens or hundreds of weight files for large models. This change was made possible thanks to the Xet backend allowing us to efficiently serve very large files. Increasing default shard size was a decision that was only taken after *very careful considerations* around optimizations and load speed. Check out the linked PR for benchmark details.
+
+Linked PR: https://github.com/huggingface/transformers/issues/42556
+
+### `use_auth_token`
+
+The `use_auth_token` argument/parameter is deprecated in favor of `token` everywhere.
+You should be able to search and replace `use_auth_token` with `token` and get the same logic.
+
+Linked PR: https://github.com/huggingface/transformers/pull/41666
+
+### Attention-related features
+
+We decided to remove some features for the upcoming v5 as they are currently only supported in a few old models and no longer integrated in current model additions. It's recommended to stick to v4.x in case you need them. Following features are affected:
+- No more head masking, see [#41076](https://github.com/huggingface/transformers/pull/41076). This feature allowed to turn off certain heads during the attention calculation and only worked for eager.
+- No more relative positional biases in Bert-like models, see [#41170](https://github.com/huggingface/transformers/pull/41170). This feature was introduced to allow relative position scores within attention calculations (similar to T5). However, this feature is barely used in official models and a lot of complexity instead. It also only worked with eager.
+- No more head pruning, see [#41417](https://github.com/huggingface/transformers/pull/41417) by @gante. As the name suggests, it allowed to prune heads within your attention layers.
+
+### Updates to supported torch APIs
+
+We dropped support for two torch APIs:
+- `torchscript` in https://github.com/huggingface/transformers/pull/41688
+- `torch.fx` in https://github.com/huggingface/transformers/pull/41683
+
+Those APIs were deprecated by the PyTorch team, and we're instead focusing on the supported APIs `dynamo` and `export`.
+
+### Feature extraction helpers: `get_*_features`
+
+Many multi-modal models expose convenience methods such as `get_text_features`, `get_image_features`, `get_audio_features`, and `get_video_features` to run inference on a single modality without calling `model(**inputs)` directly.
+
+Starting with v5, these 4 helper methods now return a `BaseModelOutputWithPooling` (or a subclass) instead of only a pooled embedding tensor:
+
+- `last_hidden_state`: unpooled token/patch/frame embeddings for the requested modality.
+- `pooler_output`: pooled representation (what most models previously returned from `get_*_features`).
+- `hidden_states`: full hidden states for all layers when `output_hidden_states=True` is passed.
+- `attentions`: attention maps when `output_attentions=True` is passed.
+
+> [!IMPORTANT]
+> There is **no single universal shape** for `last_hidden_state` or `pooler_output`. It's recommended to inspect a small forward pass before making assumptions about shapes or semantics.
+
+If your code previously did something like this:
+
+```python
+text_embeddings = model.get_text_features(**inputs)
+```
+
+and you used `text_embeddings` as a tensor, you should now explicitly use `return_dict=True` take the `pooler_output` field from the returned `BaseModelOutputWithPooling`:
+
+```python
+outputs = model.get_text_features(**inputs, return_dict=True)
+text_embeddings = outputs.pooler_output
+```
+
+This will match the previous behavior in the large majority of cases. If your model-specific implementation returned a tuple of results before, those values should now be accessible as fields on the corresponding `BaseModelOutputWithPooling` subclass.
+
+Linked PR: https://github.com/huggingface/transformers/pull/42564
+
+## Quantization changes
+
+We clean up the quantization API in transformers, and significantly refactor the weight loading as highlighted
+above.
+
+We drop support for two quantization arguments that have been deprecated for some time:
+- `load_in_4bit`
+- `load_in_8bit`
+
+We remove them in favor of the `quantization_config` argument which is much more complete. As an example, here is how
+you would load a 4-bit bitsandbytes model using this argument:
+
+```python
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+quantization_config = BitsAndBytesConfig(load_in_4bit=True)
+
+model_4bit = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Llama-3.2-3B",
+    device_map="auto",
+    quantization_config=quantization_config
+)
+```
+
+
+### Auto-classes
+
+- `AutoModelWithLMHead` is removed in favor of `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models
+- `AutoModelForVision2Seq` is removed in favor of `AutoModelForImageTextToText`
+
+
+## Configuration
+
+- Methods to init a nested config such as `from_xxx_config` are deleted. Configs can be init from the `__init__` method in the same way. See [#41314](https://github.com/huggingface/transformers/pull/41314).
+- It is no longer possible to load a config class from a URL file. Configs must be loaded from either a local path or a repo on the Hub. See [#42383](https://github.com/huggingface/transformers/pull/42383).
+- All parameters for configuring model's rotary embedding are now stored under `mode.rope_parameters`, including the `rope_theta` and `rope_type`. Model's `config.rope_parameters` is a simple dictionaty in most cases, and can also be a nested dict in special cases (i.e. Gemma3 and ModernBert) with different rope parameterization for each layer type. Trying to get `config.rope_theta` will throw an attribute error from now on. See [#39847](https://github.com/huggingface/transformers/pull/39847) and [#42255](https://github.com/huggingface/transformers/pull/42255)
+- Qwen-VL family configuration is in a nested format and trying to access keys directly will throw an error (e.g. `config.vocab_size`). Users are expected to access keys from their respective sub-configs (`config.text_config.vocab_size`).
+- Configurations of non-generative models (any model that doesn't call `model.generate()`) will no longer have a `generation_config` and `model.config.generation_config` will throw an attribute error.
+
+## Processing
+
+### Tokenization
+
+- Slow tokenizer files (aka: `tokenization_<model>.py` ) will be removed in favor of using fast tokenizer files `tokenization_<model>_fast.py` --> will be renamed to `tokenization_<model>.py`.  As fast tokenizers are :hugs:`tokenizers` - backend, they include a wider range of features that are maintainable and reliable.
+- Other backends (sentence piece, tokenizers, etc.) will be supported with a light layer if loading a fast tokenizer fails
+- Remove legacy files like special_tokens_map.json and added_tokens.json
+- Remove _eventually_correct_t5_max_length
+- `encode_plus` --> `__call__`
+- `batch_decode` --> `decode`
+
+`apply_chat_template` by default returns naked `input_ids` rather than a `BatchEncoding` dict.
+This was inconvenient - it should return a `BatchEncoding` dict like `tokenizer.__call__()`, but we were stuck with
+it for backward compatibility. The method now returns a `BatchEncoding`.
+
+Linked PRs:
+- https://github.com/huggingface/transformers/issues/40938
+- https://github.com/huggingface/transformers/pull/40936
+- https://github.com/huggingface/transformers/pull/41626
+
+### Processing classes
+
+- In processing classes each attribute will be serialized under `processor_config.json` as a nested dict, instead of serializing attributes in their own config files. Loading will be supported for all old format processors (https://github.com/huggingface/transformers/pull/41474)
+- `XXXFeatureExtractors` classes are completely removed in favor of `XXXImageProcessor` class for all vision models (https://github.com/huggingface/transformers/pull/41174)
+- Minor change: `XXXFastImageProcessorKwargs` is removed in favor of `XXXImageProcessorKwargs` which will be shared between fast and slow processors (https://github.com/huggingface/transformers/pull/40931)
+
+
+### Image processors
+
+The old slow/fast dual-file design has been replaced with a named-backend architecture. Each model previously had a PIL-based `image_processing_<model>.py` and a torchvision-based `image_processing_<model>_fast.py`. The new layout is:
+
+- `image_processing_<model>.py` → **torchvision** backend (default; was previously `FooImageProcessorFast`)
+- `image_processing_pil_<model>.py` → **PIL** backend (was previously `FooImageProcessor`)
+
+Processor classes now inherit from `TorchvisionBackend` or `PilBackend` (defined in `image_processing_backends.py`), which provide ready-made implementations of all standard operations (`resize`, `rescale`, `normalize`, `center_crop`, `pad`) and a default `_preprocess` pipeline. `BaseImageProcessor` (in `image_processing_utils`) handles shared preprocessing boilerplate: kwargs validation, default-filling from class attributes, and input preparation. Model-specific processors contain only what is unique to the model. Most processors inherit from a backend and declare class-attribute defaults. Only those with custom logic (e.g. patch tiling) need to override `_preprocess`.
+
+The `image_processing_utils_fast` module has been removed; all shared logic now lives in `image_processing_utils`.
+
+#### `use_fast` is replaced by `backend`
+
+The `use_fast` parameter is deprecated. Use `backend` instead:
+
+```python
+# v4
+processor = AutoImageProcessor.from_pretrained("...", use_fast=True)   # torchvision
+processor = AutoImageProcessor.from_pretrained("...", use_fast=False)  # PIL
+
+# v5
+processor = AutoImageProcessor.from_pretrained("...", backend="torchvision")
+processor = AutoImageProcessor.from_pretrained("...", backend="pil")
+```
+
+When `backend` is not specified, the default is `"torchvision"` if torchvision is installed, otherwise `"pil"`. If the requested backend is unavailable, loading falls back to another available backend with a warning.
+
+#### `FooImageProcessorFast` class names are deprecated
+
+`FooImageProcessor` now refers to the torchvision-backed class (what was previously `FooImageProcessorFast`), and `FooImageProcessorPil` is the PIL-backed class (what was previously `FooImageProcessor`). Importing a `*Fast` class name still resolves correctly but emits a deprecation warning.
+
+#### `is_fast` property is deprecated
+
+Use `processor.backend == "torchvision"` instead of `processor.is_fast`.
+
+#### `AutoImageProcessor.register()` API change
+
+`slow_image_processor_class` and `fast_image_processor_class` are deprecated in favor of an `image_processor_classes` dict:
+
+```python
+# v4
+AutoImageProcessor.register(MyConfig, slow_image_processor_class=MyPilProcessor, fast_image_processor_class=MyTorchvisionProcessor)
+
+# v5
+AutoImageProcessor.register(MyConfig, image_processor_classes={"pil": MyPilProcessor, "torchvision": MyTorchvisionProcessor})
+```
+
+#### Custom backends
+
+The backend key space is open-ended. Any string (e.g. `"mlx"`, `"onnx"`) can be registered by subclassing `BaseImageProcessor`, implementing `process_image` and `_preprocess`, and calling `register_backend` on the processor class:
+
+```python
+LlavaNextImageProcessor.register_backend(name="mlx", backend_class=LlavaNextMlxProcessor, availability_check=lambda: is_mlx_available())
+processor = LlavaNextImageProcessor.from_pretrained("...", backend="mlx")
+```
+
+Linked PR: https://github.com/huggingface/transformers/pull/43514
+
+## Modeling
+
+- Some `RotaryEmbeddings` layers will start returning a dict of tuples, in case the model uses several RoPE configurations (Gemma2, ModernBert). Each value will be a tuple of "cos, sin" per RoPE type.
+- Config attribute for `RotaryEmbeddings` layer will be unified and accessed via `config.rope_parameters`. Config attr for `rope_theta` might not be accessible anymore for some models, and instead will be in `config.rope_parameters['rope_theta']`. BC will be supported for a while as much as possible, and in the near future we'll gradually move to the new RoPE format  (https://github.com/huggingface/transformers/pull/39847)
+- Vision Language models will not have a shortcut access to its language and vision component from the generative model via `model.language_model`. It is recommended to either access the module with `model.model.language_model` or `model.get_decoder()`. See [#42156](https://github.com/huggingface/transformers/pull/42156/)
+
+### Generate
+
+- Old, deprecated output type aliases were removed (e.g. `GreedySearchEncoderDecoderOutput`). We now only have 4 output classes built from the following matrix: decoder-only vs encoder-decoder, uses beams vs doesn't use beams (https://github.com/huggingface/transformers/pull/40998)
+- Removed deprecated classes regarding decoding methods that were moved to the Hub due to low usage (constraints and beam scores) (https://github.com/huggingface/transformers/pull/41223)
+- If `generate` doesn't receive any KV Cache argument, the default cache class used is now defined by the model (as opposed to always being `DynamicCache`) (https://github.com/huggingface/transformers/pull/41505)
+- Generation parameters are no longer accessible via model's config. If generation parameters are serialized in `config.json` for any old model, it will be loaded back into model's generation config. Users are expected to access or modify generation parameters only with `model.generation_config.do_sample = True`.
+
+## Trainer
+
+### Removing arguments without deprecation cycle in `TrainingArguments` due to low usage
+
+- `mp_parameters` -> legacy param that was later on added to sagemaker trainer
+- `_n_gpu` -> not intended for users to set, we will initialize it correctly instead of putting it in the `TrainingArguments`
+- `overwrite_output_dir` - > replaced by `resume_from_checkpoint` and it was only used in examples script, no impact on Trainer.
+- `logging_dir` -> only used for tensorboard, set `TENSORBOARD_LOGGING_DIR` env var instead
+- `jit_mode_eval` -> use `use_torch_compile` instead as torchscript is not recommended anymore
+- `tpu_num_cores`-> It is actually better to remove it as it is not recommended to set the number of cores. By default, all tpu cores are used . Set `TPU_NUM_CORES` env var instead
+- `past_index` -> it was only used for a very small number of models that have special architecture like transformersxl + it was not documented at all how to train those model
+- `ray_scope` -> only for a minor arg for ray integration. Set `RAY_SCOPE` var env instead
+- `warmup_ratio` -> use `warmup_step` instead. We combined both args together by allowing passing float values in `warmup_step`.
+
+### Removing deprecated arguments in `TrainingArguments`
+
+- `fsdp_min_num_params` and `fsdp_transformer_layer_cls_to_wrap` -> use `fsdp_config`
+- `tpu_metrics_debug` -> `debug`
+- `push_to_hub_token` -> `hub_token`
+- `push_to_hub_model_id` and `push_to_hub_organization` -> `hub_model_id`
+- `include_inputs_for_metrics` -> `include_for_metrics`
+- `per_gpu_train_batch_size` -> `per_device_train_batch_size`
+- `per_gpu_eval_batch_size` -> `per_device_eval_batch_size`
+- `use_mps_device` -> mps will be used by default if detected
+- `fp16_backend` and `half_precision_backend` -> we will only rely on torch.amp as everything has been upstream to torch
+- `no_cuda` -> `use_cpu`
+- `include_tokens_per_second` -> `include_num_input_tokens_seen`
+- `use_legacy_prediction_loop` -> we only use `evaluation_loop` function from now on
+
+### Removing deprecated arguments in `Trainer`
+
+- `tokenizer` in initialization -> `processing_class`
+- `model_path` in train() -> `resume_from_checkpoint`
+
+### Removed features for `Trainer`
+
+- sigpot integration for hp search was removed as the library was archived + the api stopped working
+- drop support for sagemaker API <1.10
+- bump accelerate minimum version to 1.1.0
+
+###  New defaults for `Trainer`
+
+- `use_cache` in the model config will be set to `False`. You can still change the cache value through `TrainingArguments` `use_cache` argument if needed.
+
+## Pipelines
+
+### Text pipelines that should just be LLMs
+
+`question-answering` and `Text2TextGenerationPipeline`, including its related `SummarizationPipeline` and `TranslationPipeline`, were deprecated and will now be removed. `pipeline` classes are intended as a high-level beginner-friendly API,
+but for almost all text-to-text or question-answering tasks a modern chat model and `TextGenerationPipeline` will provide much higher quality output.
+As a result, we felt it was misleading for beginners to offer the older pipelines.
+
+If you were using these pipelines before, try using `TextGenerationPipeline` with a chat model instead. For example, for summarization:
+
+```python
+import torch
+from transformers import pipeline
+
+# Any other chat model will also work - if you're low on memory you can use a smaller one
+summarizer = pipeline("text-generation", model="Qwen/Qwen3-4B-Instruct-2507")
+message_history = [
+    {
+        "role": "user",
+        "content": "Summarize the following text:\n\n[TEXT_TO_SUMMARIZE]"
+    }
+]
+print(summarizer(message_history)[0]["generated_text"][-1]["content"])
+```
+
+The above example can be adapted for other tasks, e.g. translation or question answering, simply by changing the prompt.
+
+### Vision pipelines that should just be VLMs
+
+Similarly, the `image-to-text` and `visual-question-answering` pipelines have been removed. For image captioning or question answering
+tasks we recommend using a modern vision-language chat model via the `image-text-to-text` pipeline. For example:
+
+```python
+import torch
+from transformers import pipeline
+
+# Any other VLM will also work - if you're low on memory you can use a smaller one
+captioner = pipeline("image-text-to-text", model="Qwen/Qwen3-VL-4B-Instruct")
+message_history = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": "[IMAGE_URL_HERE]",
+            },
+            {"type": "text", "text": "Describe this image."},
+        ],
+    }
+]
+print(captioner(message_history)[0]["generated_text"][-1]["content"])
+```
+
+The above example can be adapted for visual question answering simply by asking the question in the prompt.
+
+### Other removed pipelines
+
+The `image-to-image` pipeline has been removed, as it was rarely updated or used. For most image generation tasks, you
+probably want [🤗 Diffusers](https://huggingface.co/docs/diffusers/index) instead!
+
+### Other changes
+
+- Image text to text pipelines will no longer accept images as a separate argument along with conversation chats. Image data has to be embedded in the chat's "content" field. See [#42359](https://github.com/huggingface/transformers/pull/42359)
+
+## PushToHubMixin
+
+- removed deprecated `organization` and `repo_url` from `PushToHubMixin`. You must pass a `repo_id` instead.
+- removed `ignore_metadata_errors` from `PushToMixin`. In practice if we ignore errors while loading the model card, we won't be able to push the card back to the Hub so it's better to fail early and not provide the option to fail later.
+- `push_to_hub` do not accept `**kwargs` anymore. All accepted parameters are explicitly documented.
+- arguments of `push_to_hub` are now keyword-only to avoid confusion. Only `repo_id` can be positional since it's the main arg.
+- removed `use_temp_dir` argument from `push_to_hub`. We now use a tmp dir in all cases.
+
+Linked PR: https://github.com/huggingface/transformers/pull/42391.
+
+## CLI
+
+The deprecated `transformers-cli ...` command was deprecated, `transformers ...` is now the only CLI entry point.
+
+`transformers` CLI has been migrated to `Typer`, making it easier to maintain + adding some nice features out of
+the box (improved `--help` section, autocompletion).
+
+Biggest breaking change is in `transformers chat`. This command starts a terminal UI to interact with a chat model.
+It used to also be able to start a Chat Completion server powered by `transformers` and chat with it. In this revamped
+version, this feature has been removed in favor of `transformers serve`. The goal of splitting `transformers chat`
+and `transformers serve` is to define clear boundaries between client and server code. It helps with maintenance
+but also makes the commands less bloated. The new signature of `transformers chat` is:
+
+```
+Usage: transformers chat [OPTIONS] BASE_URL MODEL_ID [GENERATE_FLAGS]...
+
+Chat with a model from the command line.
+```
+
+It works hand in hand with `transformers serve`, which means that if `transformers serve` is running on its default endpoint, `transformers chat` can be launched as follows:
+
+```sh
+transformers chat HuggingFaceTB/SmolLM3-3B
+```
+
+It can however use any OpenAI API compatible HTTP endpoint:
+
+```sh
+transformers chat HuggingFaceTB/SmolLM3-3B https://router.huggingface.co/v1
+```
+
+Linked PRs:
+- https://github.com/huggingface/transformers/pull/40997
+- https://github.com/huggingface/transformers/pull/41487
+
+
+### Removal of the `run` method
+
+The `transformers run` (previously `transformers-cli run`) is an artefact of the past, was not documented nor tested,
+and isn't part of any public documentation. We're removing it for now and ask you to please let us know in case
+this is a method you are using; in which case we should bring it back with better support.
+
+Linked PR: https://github.com/huggingface/transformers/pull/42447
+
+## Environment variables
+
+- Legacy environment variables like `TRANSFORMERS_CACHE`, `PYTORCH_TRANSFORMERS_CACHE`, and `PYTORCH_PRETRAINED_BERT_CACHE` have been removed. Please use `HF_HOME` instead.
+- Constants `HUGGINGFACE_CO_EXAMPLES_TELEMETRY`, `HUGGINGFACE_CO_EXAMPLES_TELEMETRY`, `HUGGINGFACE_CO_PREFIX`, and `HUGGINGFACE_CO_RESOLVE_ENDPOINT` have been removed. Please use `huggingface_hub.constants.ENDPOINT` instead.
+
+Linked PR: https://github.com/huggingface/transformers/pull/42391.
+
+## Requirements update
+
+`transformers` v5 pins the `huggingface_hub` version to `>=1.0.0`. See this [migration guide](https://huggingface.co/docs/huggingface_hub/concepts/migration) to learn more about this major release. Here are to main aspects to know about:
+- switched the HTTP backend from `requests` to `httpx`. This change was made to improve performance and to support both synchronous and asynchronous requests the same way. If you are currently catching `requests.HTTPError` errors in your codebase, you'll need to switch to `httpx.HTTPError`.
+- related to 1., it is not possible to set proxies from your script. To handle proxies, you must set the `HTTP_PROXY` / `HTTPS_PROXY` environment variables
+- `hf_transfer` and therefore `HF_HUB_ENABLE_HF_TRANSFER` have been completed dropped in favor of `hf_xet`. This should be transparent for most users. Please let us know if you notice any downside!
+
+`typer-slim` has been added as required dependency, used to implement both `hf` and `transformers` CLIs.
--- a/107
+++ b/107
@@ -0,0 +1,107 @@
+# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
+export PYTHONPATH = src
+
+.PHONY: style typing check-code-quality check-repository-consistency check-repo fix-repo test test-examples benchmark codex claude clean-ai
+
+
+# Checker lists. The two CI jobs (CircleCI runs `make check-code-quality` and
+# `make check-repository-consistency` in parallel) own the canonical sets below.
+# Local convenience targets `check-repo` and `fix-repo` are *derived* from them,
+# so they can never drift out of sync (e.g. silently dropping `auto_mappings`
+# from CI, as happened in #45018 → fixed in #45774).
+
+STYLE_CHECKERS := ruff_check, ruff_format, init_isort, sort_auto_mappings
+TYPING_CHECKERS := types, modeling_structure
+CODE_QUALITY_CHECKERS := $(TYPING_CHECKERS), $(STYLE_CHECKERS)
+
+REPO_CONSISTENCY_CHECKERS := \
+	auto_mappings, \
+	imports, \
+	import_complexity, \
+	copies, \
+	modular_conversion, \
+	doc_toc, \
+	modeling_rules_doc, \
+	docstrings, \
+	dummies, \
+	repo, \
+	inits, \
+	pipeline_typing, \
+	config_docstrings, \
+	config_attributes, \
+	doctest_list, \
+	update_metadata, \
+	add_dates, \
+	deps_table
+
+ALL_CHECKERS := $(CODE_QUALITY_CHECKERS), $(REPO_CONSISTENCY_CHECKERS)
+
+
+# Runs all linting/formatting scripts, most notably ruff
+style:
+	@python utils/checkers.py $(STYLE_CHECKERS) --fix
+
+# Runs ty type checker and model structure rules
+typing:
+	@python utils/checkers.py $(TYPING_CHECKERS)
+
+# Runs typing, ruff linting/formatting, import-order checks and auto-mappings
+check-code-quality:
+	@python utils/checkers.py $(CODE_QUALITY_CHECKERS)
+
+# Runs a full repository consistency check.
+check-repository-consistency:
+	@python utils/checkers.py $(REPO_CONSISTENCY_CHECKERS)
+
+# Runs typing and formatting checks + repository consistency check (ignores errors)
+check-repo:
+	@python utils/checkers.py $(ALL_CHECKERS) --keep-going
+
+# Run all repo checks for which there is an automatic fix, most notably modular conversions
+fix-repo:
+	@python utils/checkers.py $(ALL_CHECKERS) --fix --keep-going
+
+# Run tests for the library, requires pytest-random-order
+test:
+	python -m pytest -p random_order -n auto --dist=loadfile -s -v --random-order-bucket=module ./tests/
+
+# Run tests for examples, requires pytest-random-order
+test-examples:
+	python -m pytest -p random_order -n auto --dist=loadfile -s -v --random-order-bucket=module ./examples/pytorch/
+
+# Run benchmark
+benchmark:
+	python3 benchmark/benchmark.py --config-dir benchmark/config --config-name generation --commit=diff backend.model=google/gemma-2b backend.cache_implementation=null,static backend.torch_compile=false,true --multirun
+
+codex:
+	mkdir -p .agents
+	rm -rf .agents/skills
+	ln -snf ../.ai/skills .agents/skills
+
+claude:
+	mkdir -p .claude
+	rm -rf .claude/skills
+	ln -snf ../.ai/skills .claude/skills
+
+clean-ai:
+	rm -rf .agents/skills .claude/skills
+
+
+# Release stuff
+pre-release:
+	python utils/release.py
+
+pre-patch:
+	python utils/release.py --patch
+
+post-release:
+	python utils/release.py --post_release
+
+post-patch:
+	python utils/release.py --post_release --patch
+
+build-release:
+	rm -rf dist
+	rm -rf build
+	python setup.py bdist_wheel
+	python setup.py sdist
--- a/README.md
+++ b/README.md
@@ -0,0 +1,337 @@
+<!---
+Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+<p align="center">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-dark.svg">
+    <source media="(prefers-color-scheme: light)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-light.svg">
+    <img alt="Hugging Face Transformers Library" src="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-light.svg" width="352" height="59" style="max-width: 100%;">
+  </picture>
+  <br/>
+  <br/>
+</p>
+
+<p align="center">
+    <a href="https://huggingface.com/models"><img alt="Checkpoints on Hub" src="https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen"></a>
+    <a href="https://circleci.com/gh/huggingface/transformers"><img alt="Build" src="https://img.shields.io/circleci/build/github/huggingface/transformers/main"></a>
+    <a href="https://github.com/huggingface/transformers/blob/main/LICENSE"><img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue"></a>
+    <a href="https://huggingface.co/docs/transformers/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online"></a>
+    <a href="https://github.com/huggingface/transformers/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg"></a>
+    <a href="https://github.com/huggingface/transformers/blob/main/CODE_OF_CONDUCT.md"><img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg"></a>
+    <a href="https://zenodo.org/badge/latestdoi/155220641"><img src="https://zenodo.org/badge/155220641.svg" alt="DOI"></a>
+</p>
+
+<h4 align="center">
+    <p>
+        <b>English</b> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_zh-hans.md">简体中文</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_zh-hant.md">繁體中文</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ko.md">한국어</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_es.md">Español</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ja.md">日本語</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_hd.md">हिन्दी</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ru.md">Русский</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Português</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_it.md">Italiano</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_vi.md">Tiếng Việt</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ar.md">العربية</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ur.md">اردو</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_bn.md">বাংলা</a> |
+        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fa.md">فارسی</a> |
+    </p>
+</h4>
+
+<h3 align="center">
+    <p>State-of-the-art pretrained models for inference and training</p>
+</h3>
+
+<h3 align="center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
+</h3>
+
+Transformers acts as the model-definition framework for state-of-the-art machine learning with text, computer
+vision, audio, video, and multimodal models, for both inference and training.
+
+It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the
+pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training
+frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM, SGLang, TGI, ...),
+and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from `transformers`.
+
+We pledge to help support new state-of-the-art models and democratize their usage by having their model definition be
+simple, customizable, and efficient.
+
+There are over 1M+ Transformers [model checkpoints](https://huggingface.co/models?library=transformers&sort=trending) on the [Hugging Face Hub](https://huggingface.co/models) you can use.
+
+Explore the [Hub](https://huggingface.co/) today to find a model and use Transformers to help you get started right away.
+
+## Installation
+
+Transformers works with Python 3.10+, and [PyTorch](https://pytorch.org/get-started/locally/) 2.4+.
+
+Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.
+
+```py
+# venv
+python -m venv .my-env
+source .my-env/bin/activate
+# uv
+uv venv .my-env
+source .my-env/bin/activate
+```
+
+Install Transformers in your virtual environment.
+
+```py
+# pip
+pip install "transformers[torch]"
+
+# uv
+uv pip install "transformers[torch]"
+```
+
+Install Transformers from source if you want the latest changes in the library or are interested in contributing. However, the *latest* version may not be stable. Feel free to open an [issue](https://github.com/huggingface/transformers/issues) if you encounter an error.
+
+```shell
+git clone https://github.com/huggingface/transformers.git
+cd transformers
+
+# pip
+pip install '.[torch]'
+
+# uv
+uv pip install '.[torch]'
+```
+
+## Quickstart
+
+Get started with Transformers right away with the [Pipeline](https://huggingface.co/docs/transformers/pipeline_tutorial) API. The `Pipeline` is a high-level inference class that supports text, audio, vision, and multimodal tasks. It handles preprocessing the input and returns the appropriate output.
+
+Instantiate a pipeline and specify model to use for text generation. The model is downloaded and cached so you can easily reuse it again. Finally, pass some text to prompt the model.
+
+```py
+from transformers import pipeline
+
+pipeline = pipeline(task="text-generation", model="Qwen/Qwen2.5-1.5B")
+pipeline("the secret to baking a really good cake is ")
+[{'generated_text': 'the secret to baking a really good cake is 1) to use the right ingredients and 2) to follow the recipe exactly. the recipe for the cake is as follows: 1 cup of sugar, 1 cup of flour, 1 cup of milk, 1 cup of butter, 1 cup of eggs, 1 cup of chocolate chips. if you want to make 2 cakes, how much sugar do you need? To make 2 cakes, you will need 2 cups of sugar.'}]
+```
+
+To chat with a model, the usage pattern is the same. The only difference is you need to construct a chat history (the input to `Pipeline`) between you and the system.
+
+> [!TIP]
+> You can also chat with a model directly from the command line, as long as [`transformers serve` is running](https://huggingface.co/docs/transformers/main/en/serving).
+> ```shell
+> transformers chat Qwen/Qwen2.5-0.5B-Instruct
+> ```
+
+```py
+import torch
+from transformers import pipeline
+
+chat = [
+    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
+    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
+]
+
+pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
+response = pipeline(chat, max_new_tokens=512)
+print(response[0]["generated_text"][-1]["content"])
+```
+
+Expand the examples below to see how `Pipeline` works for different modalities and tasks.
+
+<details>
+<summary>Automatic speech recognition</summary>
+
+```py
+from transformers import pipeline
+
+pipeline = pipeline(task="automatic-speech-recognition", model="openai/whisper-large-v3")
+pipeline("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
+{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
+```
+
+</details>
+
+<details>
+<summary>Image classification</summary>
+
+<h3 align="center">
+    <a><img src="https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png"></a>
+</h3>
+
+```py
+from transformers import pipeline
+
+pipeline = pipeline(task="image-classification", model="facebook/dinov2-small-imagenet1k-1-layer")
+pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
+[{'label': 'macaw', 'score': 0.997848391532898},
+ {'label': 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
+  'score': 0.0016551691805943847},
+ {'label': 'lorikeet', 'score': 0.00018523589824326336},
+ {'label': 'African grey, African gray, Psittacus erithacus',
+  'score': 7.85409429227002e-05},
+ {'label': 'quail', 'score': 5.502637941390276e-05}]
+```
+
+</details>
+
+<details>
+<summary>Visual question answering</summary>
+
+<h3 align="center">
+    <a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg"></a>
+</h3>
+
+```py
+from transformers import pipeline
+
+pipeline = pipeline(task="visual-question-answering", model="Salesforce/blip-vqa-base")
+pipeline(
+    image="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg",
+    question="What is in the image?",
+)
+[{'answer': 'statue of liberty'}]
+```
+
+</details>
+
+## Why should I use Transformers?
+
+1. Easy-to-use state-of-the-art models:
+    - High performance on natural language understanding & generation, computer vision, audio, video, and multimodal tasks.
+    - Low barrier to entry for researchers, engineers, and developers.
+    - Few user-facing abstractions with just three classes to learn.
+    - A unified API for using all our pretrained models.
+
+1. Lower compute costs, smaller carbon footprint:
+    - Share trained models instead of training from scratch.
+    - Reduce compute time and production costs.
+    - Hundreds of model architectures with 1M+ pretrained checkpoints across all modalities.
+
+1. Choose the right framework for every part of a model's lifetime:
+    - Train state-of-the-art models in 3 lines of code.
+    - Move a single model between PyTorch/JAX/TF2.0 frameworks at will.
+    - Pick the right framework for training, evaluation, and production.
+
+1. Easily customize a model or an example to your needs:
+    - We provide examples for each architecture to reproduce the results published by its original authors.
+    - Model internals are exposed as consistently as possible.
+    - Model files can be used independently of the library for quick experiments.
+
+<a target="_blank" href="https://huggingface.co/enterprise">
+    <img alt="Hugging Face Enterprise Hub" src="https://github.com/user-attachments/assets/247fb16d-d251-4583-96c4-d3d76dda4925">
+</a><br>
+
+## When shouldn't I use Transformers?
+
+- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
+- The training API is optimized to work with PyTorch models provided by Transformers. For generic machine learning loops, you should use another library like [Accelerate](https://huggingface.co/docs/accelerate).
+- The [example scripts](https://github.com/huggingface/transformers/tree/main/examples) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
+
+## 100 projects using Transformers
+
+Transformers is more than a toolkit to use pretrained models, it's a community of projects built around it and the
+Hugging Face Hub. We want Transformers to enable developers, researchers, students, professors, engineers, and anyone
+else to build their dream projects.
+
+In order to celebrate Transformers 100,000 stars, we wanted to put the spotlight on the
+community with the [awesome-transformers](./awesome-transformers.md) page which lists 100
+incredible projects built with Transformers.
+
+If you own or use a project that you believe should be part of the list, please open a PR to add it!
+
+## Example models
+
+You can test most of our models directly on their [Hub model pages](https://huggingface.co/models).
+
+Expand each modality below to see a few example models for various use cases.
+
+<details>
+<summary>Audio</summary>
+
+- Audio classification with [CLAP](https://huggingface.co/laion/clap-htsat-fused)
+- Automatic speech recognition with [Parakeet](https://huggingface.co/nvidia/parakeet-ctc-1.1b#transcribing-using-transformers-%F0%9F%A4%97), [Whisper](https://huggingface.co/openai/whisper-large-v3-turbo), [GLM-ASR](https://huggingface.co/zai-org/GLM-ASR-Nano-2512) and [Moonshine-Streaming](https://huggingface.co/UsefulSensors/moonshine-streaming-medium)
+- Keyword spotting with [Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
+- Speech to speech generation with [Moshi](https://huggingface.co/kyutai/moshiko-pytorch-bf16)
+- Text to audio with [MusicGen](https://huggingface.co/facebook/musicgen-large)
+- Text to speech with [CSM](https://huggingface.co/sesame/csm-1b)
+
+</details>
+
+<details>
+<summary>Computer vision</summary>
+
+- Automatic mask generation with [SAM](https://huggingface.co/facebook/sam-vit-base)
+- Depth estimation with [DepthPro](https://huggingface.co/apple/DepthPro-hf)
+- Image classification with [DINO v2](https://huggingface.co/facebook/dinov2-base)
+- Keypoint detection with [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)
+- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
+- Object detection with [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)
+- Pose Estimation with [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)
+- Universal segmentation with [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)
+- Video classification with [VideoMAE](https://huggingface.co/MCG-NJU/videomae-large)
+
+</details>
+
+<details>
+<summary>Multimodal</summary>
+
+- Audio or text to text with [Voxtral](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507), [Audio Flamingo](https://huggingface.co/nvidia/audio-flamingo-3-hf)
+- Document question answering with [LayoutLMv3](https://huggingface.co/microsoft/layoutlmv3-base)
+- Image or text to text with [Qwen-VL](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
+- Image captioning [BLIP-2](https://huggingface.co/Salesforce/blip2-opt-2.7b)
+- OCR-based document understanding with [GOT-OCR2](https://huggingface.co/stepfun-ai/GOT-OCR-2.0-hf)
+- Table question answering with [TAPAS](https://huggingface.co/google/tapas-base)
+- Unified multimodal understanding and generation with [Emu3](https://huggingface.co/BAAI/Emu3-Gen)
+- Vision to text with [Llava-OneVision](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-ov-hf)
+- Visual question answering with [Llava](https://huggingface.co/llava-hf/llava-1.5-7b-hf)
+- Visual referring expression segmentation with [Kosmos-2](https://huggingface.co/microsoft/kosmos-2-patch14-224)
+
+</details>
+
+<details>
+<summary>NLP</summary>
+
+- Masked word completion with [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base)
+- Named entity recognition with [Gemma](https://huggingface.co/google/gemma-2-2b)
+- Question answering with [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
+- Summarization with [BART](https://huggingface.co/facebook/bart-large-cnn)
+- Translation with [T5](https://huggingface.co/google-t5/t5-base)
+- Text generation with [Llama](https://huggingface.co/meta-llama/Llama-3.2-1B)
+- Text classification with [Qwen](https://huggingface.co/Qwen/Qwen2.5-0.5B)
+
+</details>
+
+## Citation
+
+We now have a [paper](https://aclanthology.org/2020.emnlp-demos.6/) you can cite for the 🤗 Transformers library:
+```bibtex
+@inproceedings{wolf-etal-2020-transformers,
+    title = "Transformers: State-of-the-Art Natural Language Processing",
+    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
+    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
+    month = oct,
+    year = "2020",
+    address = "Online",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2020.emnlp-demos.6/",
+    pages = "38--45"
+}
+```
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -0,0 +1,32 @@
+# Security Policy
+
+## Hugging Face Hub, remote artefacts, and remote code
+
+Transformers is open-source software that is tightly coupled to the Hugging Face Hub. While you have the ability to use it
+offline with pre-downloaded model weights, it provides a very simple way to download, use, and manage models locally.
+
+When downloading artefacts that have been uploaded by others on any platform, you expose yourself to risks. Please
+read below for the security recommendations in order to keep your runtime and local environment safe.
+
+### Remote artefacts
+
+Models uploaded on the Hugging Face Hub come in different formats. We heavily recommend uploading and downloading
+models in the [`safetensors`](https://github.com/huggingface/safetensors) format (which is the default prioritized
+by the transformers library), as developed specifically to prevent arbitrary code execution on your system.
+
+To avoid loading models from unsafe formats (e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
+
+### Remote code
+
+#### Modeling
+
+Transformers supports many model architectures, but is also the bridge between your Python runtime and models that
+are stored in model repositories on the Hugging Face Hub.
+
+These models require the `trust_remote_code=True` parameter to be set when using them; please **always** verify
+the content of the modeling files when using this argument. We recommend setting a revision in order to ensure you
+protect yourself from updates on the repository.
+
+## Reporting a Vulnerability
+
+Feel free to submit vulnerability reports to [security@huggingface.co](mailto:security@huggingface.co), where someone from the HF security team will review and recommend next steps. If reporting a vulnerability specific to open source, please note [Huntr](https://huntr.com) is a vulnerability disclosure program for open source software.
--- a/awesome-transformers.md
+++ b/awesome-transformers.md
@@ -0,0 +1,614 @@
+# Awesome projects built with Transformers
+
+This page lists awesome projects built on top of Transformers. Transformers is more than a toolkit to use pretrained
+models: it's a community of projects built around it and the Hugging Face Hub. We want Transformers to enable
+developers, researchers, students, professors, engineers, and anyone else to build their dream projects.
+
+In this list, we showcase incredibly impactful and novel projects that have pushed the field forward. We celebrate
+100 of these projects as we reach the milestone of 100k stars as a community; but we're very open to pull requests
+adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
+to add it.
+
+## [◉ Universal Intelligence](https://github.com/blueraai/universal-intelligence)
+
+[Universal Intelligence](https://github.com/blueraai/universal-intelligence) aims to standardize models, tools, and agents —transforming them into simple, composable, portable, interoperable, framework-agnostic, hardware-agnostic interfaces (through auto-negotiation and resource sharing); for fast and accessible development of AI applications.
+
+Keywords: Protocol, Open-source, LLMs, Large Language Models, Agents, Low-code
+
+## [gpt4all](https://github.com/nomic-ai/gpt4all)
+
+[gpt4all](https://github.com/nomic-ai/gpt4all) is an ecosystem of open-source chatbots trained on massive collections of clean assistant data including code, stories and dialogue. It offers open-source, large language models such as LLaMA and GPT-J trained in an assistant-style.
+
+Keywords: Open-source, LLaMa, GPT-J, instruction, assistant
+
+## [recommenders](https://github.com/recommenders-team/recommenders)
+
+This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. It goes over several aspects required to build efficient recommendation systems: data preparation, modeling, evaluation, model selection & optimization, as well as operationalization
+
+Keywords: Recommender systems, AzureML
+
+## [IOPaint](https://github.com/Sanster/IOPaint)
+
+Image inpainting tool powered by Stable Diffusion. Remove any unwanted object, defect, people from your pictures or erase and replace anything on your pictures.
+
+Keywords: inpainting, SD, Stable Diffusion
+
+## [flair](https://github.com/flairNLP/flair)
+
+FLAIR is a powerful PyTorch NLP framework, covering several important tasks: NER, sentiment-analysis, part-of-speech tagging, text and document embeddings, among other things.
+
+Keywords: NLP, text embedding, document embedding, biomedical, NER, PoS, sentiment-analysis
+
+## [mindsdb](https://github.com/mindsdb/mindsdb)
+
+MindsDB is a low-code ML platform, which automates and integrates several ML frameworks into the data stack as "AI Tables" to streamline the integration of AI into applications, making it accessible to developers of all skill levels.
+
+Keywords: Database, low-code, AI table
+
+## [langchain](https://github.com/langchain-ai/langchain)
+
+[langchain](https://github.com/langchain-ai/langchain) is aimed at assisting in the development of apps merging both LLMs and other sources of knowledge. The library allows chaining calls to applications, creating a sequence across many tools.
+
+Keywords: LLMs, Large Language Models, Agents, Chains
+
+## [LlamaIndex](https://github.com/run-llama/llama_index)
+
+[LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
+
+Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
+
+## [ParlAI](https://github.com/facebookresearch/ParlAI)
+
+[ParlAI](https://github.com/facebookresearch/ParlAI) is a python framework for sharing, training and testing dialogue models, from open-domain chitchat, to task-oriented dialogue, to visual question answering. It provides more than 100 datasets under the same API, a large zoo of pretrained models, a set of agents, and has several integrations.
+
+Keywords: Dialogue, Chatbots, VQA, Datasets, Agents
+
+## [sentence-transformers](https://github.com/UKPLab/sentence-transformers)
+
+This framework provides an easy method to compute dense vector representations for sentences, paragraphs, and images. The models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa etc. and achieve state-of-the-art performance in various task. Text is embedding in vector space such that similar text is close and can efficiently be found using cosine similarity.
+
+Keywords: Dense vector representations, Text embeddings, Sentence embeddings
+
+## [ludwig](https://github.com/ludwig-ai/ludwig)
+
+Ludwig is a declarative machine learning framework that makes it easy to define machine learning pipelines using a simple and flexible data-driven configuration system. Ludwig is targeted at a wide variety of AI tasks. It provides a data-driven configuration system, training, prediction, and evaluation scripts, as well as a programmatic API.
+
+Keywords: Declarative, Data-driven, ML Framework
+
+## [InvokeAI](https://github.com/invoke-ai/InvokeAI)
+
+[InvokeAI](https://github.com/invoke-ai/InvokeAI) is an engine for Stable Diffusion models, aimed at professionals, artists, and enthusiasts. It leverages the latest AI-driven technologies through CLI as well as a WebUI.
+
+Keywords: Stable-Diffusion, WebUI, CLI
+
+## [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP)
+
+[PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) is an easy-to-use and powerful NLP library particularly targeted at the Chinese languages. It has support for multiple pre-trained model zoos, and supports a wide-range of NLP tasks from research to industrial applications.
+
+Keywords: NLP, Chinese, Research, Industry
+
+## [stanza](https://github.com/stanfordnlp/stanza)
+
+The Stanford NLP Group's official Python NLP library. It contains support for running various accurate natural language processing tools on 60+ languages and for accessing the Java Stanford CoreNLP software from Python.
+
+Keywords: NLP, Multilingual, CoreNLP
+
+## [DeepPavlov](https://github.com/deeppavlov/DeepPavlov)
+
+[DeepPavlov](https://github.com/deeppavlov/DeepPavlov) is an open-source conversational AI library. It is designed for the development of production ready chat-bots and complex conversational systems, as well as research in the area of NLP and, particularly, of dialog systems.
+
+Keywords: Conversational, Chatbot, Dialog
+
+## [alpaca-lora](https://github.com/tloen/alpaca-lora)
+
+Alpaca-lora contains code for reproducing the Stanford Alpaca results using low-rank adaptation (LoRA). The repository provides training (fine-tuning) as well as generation scripts.
+
+Keywords: LoRA, Parameter-efficient fine-tuning
+
+## [imagen-pytorch](https://github.com/lucidrains/imagen-pytorch)
+
+An open-source Implementation of Imagen, Google's closed-source Text-to-Image Neural Network that beats DALL-E2. As of release, it is the new SOTA for text-to-image synthesis.
+
+Keywords: Imagen, Text-to-image
+
+## [adapters](https://github.com/adapter-hub/adapters)
+
+[adapters](https://github.com/adapter-hub/adapters) is an extension of HuggingFace's Transformers library, integrating adapters into state-of-the-art language models by incorporating AdapterHub, a central repository for pre-trained adapter modules. It is a drop-in replacement for transformers, which is regularly updated to stay up-to-date with the developments of transformers.
+
+Keywords: Adapters, LoRA, Parameter-efficient fine-tuning, Hub
+
+## [NeMo](https://github.com/NVIDIA/NeMo)
+
+NVIDIA [NeMo](https://github.com/NVIDIA/NeMo) is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and natural language processing (NLP). The primary objective of [NeMo](https://github.com/NVIDIA/NeMo) is to help researchers from industry and academia to reuse prior work (code and pretrained models) and make it easier to create new https://developer.nvidia.com/conversational-ai#started.
+
+Keywords: Conversational, ASR, TTS, LLMs, NLP
+
+## [Runhouse](https://github.com/run-house/runhouse)
+
+[Runhouse](https://github.com/run-house/runhouse) allows to send code and data to any of your compute or data infra, all in Python, and continue to interact with them normally from your existing code and environment. Runhouse developers mention:
+
+> Think of it as an expansion pack to your Python interpreter that lets it take detours to remote machines or manipulate remote data.
+
+Keywords: MLOps, Infrastructure, Data storage, Modeling
+
+## [MONAI](https://github.com/Project-MONAI/MONAI)
+
+[MONAI](https://github.com/Project-MONAI/MONAI) is a PyTorch-based, open-source framework for deep learning in healthcare imaging, part of PyTorch Ecosystem. Its ambitions are:
+- developing a community of academic, industrial and clinical researchers collaborating on a common foundation;
+- creating state-of-the-art, end-to-end training workflows for healthcare imaging;
+- providing researchers with the optimized and standardized way to create and evaluate deep learning models.
+
+Keywords: Healthcare imaging, Training, Evaluation
+
+## [simpletransformers](https://github.com/ThilinaRajapakse/simpletransformers)
+
+Simple Transformers lets you quickly train and evaluate Transformer models. Only 3 lines of code are needed to initialize, train, and evaluate a model. It supports a wide variety of NLP tasks.
+
+Keywords: Framework, simplicity, NLP
+
+## [JARVIS](https://github.com/microsoft/JARVIS)
+
+[JARVIS](https://github.com/microsoft/JARVIS) is a system attempting to merge LLMs such as GPT-4 with the rest of the open-source ML community: leveraging up to 60 downstream models in order to perform tasks identified by the LLM.
+
+Keywords: LLM, Agents, HF Hub
+
+## [transformers.js](https://github.com/huggingface/transformers.js/)
+
+[transformers.js](https://github.com/huggingface/transformers.js/) is a JavaScript library targeted at running models from transformers directly within the browser.
+
+Keywords: Transformers, JavaScript, browser
+
+## [bumblebee](https://github.com/elixir-nx/bumblebee)
+
+Bumblebee provides pre-trained Neural Network models on top of Axon, a neural networks library for the Elixir language. It includes integration with 🤗 Models, allowing anyone to download and perform Machine Learning tasks with few lines of code.
+
+Keywords: Elixir, Axon
+
+## [argilla](https://github.com/argilla-io/argilla)
+
+Argilla is an open-source platform providing advanced NLP labeling, monitoring, and workspaces. It is compatible with many open source ecosystems such as Hugging Face, Stanza, FLAIR, and others.
+
+Keywords: NLP, Labeling, Monitoring, Workspaces
+
+## [haystack](https://github.com/deepset-ai/haystack)
+
+Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs. It offers production-ready tools to quickly build complex decision making, question answering, semantic search, text generation applications, and more.
+
+Keywords: NLP, Framework, LLM
+
+## [spaCy](https://github.com/explosion/spaCy)
+
+[spaCy](https://github.com/explosion/spaCy) is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. It offers support for transformers models through its third party package, spacy-transformers.
+
+Keywords: NLP, Framework
+
+## [speechbrain](https://github.com/speechbrain/speechbrain)
+
+SpeechBrain is an open-source and all-in-one conversational AI toolkit based on PyTorch.
+The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognition, speech enhancement, speech separation, language identification, multi-microphone signal processing, and many others.
+
+Keywords: Conversational, Speech
+
+## [skorch](https://github.com/skorch-dev/skorch)
+
+Skorch is a scikit-learn compatible neural network library that wraps PyTorch. It has support for models within transformers, and tokenizers from tokenizers.
+
+Keywords: Scikit-Learn, PyTorch
+
+## [bertviz](https://github.com/jessevig/bertviz)
+
+BertViz is an interactive tool for visualizing attention in Transformer language models such as BERT, GPT2, or T5. It can be run inside a Jupyter or Colab notebook through a simple Python API that supports most Huggingface models.
+
+Keywords: Visualization, Transformers
+
+## [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax)
+
+[mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax) is a haiku library using the xmap/pjit operators in JAX for model parallelism of transformers. This library is designed for scalability up to approximately 40B parameters on TPUv3s. It was the library used to train the GPT-J model.
+
+Keywords: Haiku, Model parallelism, LLM, TPU
+
+## [deepchem](https://github.com/deepchem/deepchem)
+
+DeepChem aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology.
+
+Keywords: Drug discovery, Materials Science, Quantum Chemistry, Biology
+
+## [OpenNRE](https://github.com/thunlp/OpenNRE)
+
+An Open-Source Package for Neural Relation Extraction (NRE). It is targeted at a wide range of users, from newcomers to relation extraction, to developers, researchers, or students.
+
+Keywords: Neural Relation Extraction, Framework
+
+## [pycorrector](https://github.com/shibing624/pycorrector)
+
+PyCorrector is a Chinese Text Error Correction Tool. It uses a language model to detect errors, pinyin feature and shape feature to correct Chinese text errors. it can be used for Chinese Pinyin and stroke input method.
+
+Keywords: Chinese, Error correction tool, Language model, Pinyin
+
+## [nlpaug](https://github.com/makcedward/nlpaug)
+
+This python library helps you with augmenting nlp for machine learning projects. It is a lightweight library featuring synthetic data generation for improving model performance, support for audio and text, and compatibility with several ecosystems (scikit-learn, pytorch, tensorflow).
+
+Keywords: Data augmentation, Synthetic data generation, Audio, NLP
+
+## [dream-textures](https://github.com/carson-katri/dream-textures)
+
+[dream-textures](https://github.com/carson-katri/dream-textures) is a library targeted at bringing stable-diffusion support within Blender. It supports several use-cases, such as image generation, texture projection, inpainting/outpainting, ControlNet, and upscaling.
+
+Keywords: Stable-Diffusion, Blender
+
+## [seldon-core](https://github.com/SeldonIO/seldon-core)
+
+Seldon core converts your ML models (Tensorflow, Pytorch, H2o, etc.) or language wrappers (Python, Java, etc.) into production REST/GRPC microservices.
+Seldon handles scaling to thousands of production machine learning models and provides advanced machine learning capabilities out of the box including Advanced Metrics, Request Logging, Explainers, Outlier Detectors, A/B Tests, Canaries and more.
+
+Keywords: Microservices, Modeling, Language wrappers
+
+## [open_model_zoo](https://github.com/openvinotoolkit/open_model_zoo)
+
+This repository includes optimized deep learning models and a set of demos to expedite development of high-performance deep learning inference applications. Use these free pre-trained models instead of training your own models to speed-up the development and production deployment process.
+
+Keywords: Optimized models, Demos
+
+## [ml-stable-diffusion](https://github.com/apple/ml-stable-diffusion)
+
+ML-Stable-Diffusion is a repository by Apple bringing Stable Diffusion support to Core ML, on Apple Silicon devices. It supports stable diffusion checkpoints hosted on the Hugging Face Hub.
+
+Keywords: Stable Diffusion, Apple Silicon, Core ML
+
+## [stable-dreamfusion](https://github.com/ashawkey/stable-dreamfusion)
+
+Stable-Dreamfusion is a pytorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model.
+
+Keywords: Text-to-3D, Stable Diffusion
+
+## [txtai](https://github.com/neuml/txtai)
+
+[txtai](https://github.com/neuml/txtai) is an open-source platform for semantic search and workflows powered by language models. txtai builds embeddings databases, which are a union of vector indexes and relational databases enabling similarity search with SQL. Semantic workflows connect language models together into unified applications.
+
+Keywords: Semantic search, LLM
+
+## [djl](https://github.com/deepjavalibrary/djl)
+
+Deep Java Library (DJL) is an open-source, high-level, engine-agnostic Java framework for deep learning. DJL is designed to be easy to get started with and simple to use for developers. DJL provides a native Java development experience and functions like any other regular Java library. DJL offers [a Java binding](https://github.com/deepjavalibrary/djl/tree/master/extensions/tokenizers) for HuggingFace Tokenizers and easy conversion toolkit for HuggingFace model to deploy in Java.
+
+Keywords: Java, Framework
+
+## [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/)
+
+This project provides a unified framework to test generative language models on a large number of different evaluation tasks. It has support for more than 200 tasks, and supports different ecosystems: HF Transformers, GPT-NeoX, DeepSpeed, as well as the OpenAI API.
+
+Keywords: LLM, Evaluation, Few-shot
+
+## [gpt-neox](https://github.com/EleutherAI/gpt-neox)
+
+This repository records EleutherAI's library for training large-scale language models on GPUs. The framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. It is focused on training multi-billion-parameter models.
+
+Keywords: Training, LLM, Megatron, DeepSpeed
+
+## [muzic](https://github.com/microsoft/muzic)
+
+Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence. Muzic was created by researchers from Microsoft Research Asia.
+
+Keywords: Music understanding, Music generation
+
+## [dalle-flow](https://github.com/jina-ai/dalle-flow)
+
+DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. It leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt.
+The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.
+
+Keywords: High-definition image generation, Stable Diffusion, DALL-E Mega, GLID-3 XL, CLIP, SwinIR
+
+## [lightseq](https://github.com/bytedance/lightseq)
+
+LightSeq is a high performance training and inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP and CV models such as BERT, GPT, Transformer, etc. It is therefore best useful for machine translation, text generation, image classification, and other sequence related tasks.
+
+Keywords: Training, Inference, Sequence Processing, Sequence Generation
+
+## [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)
+
+The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.
+
+Keywords: OCR, LaTeX, Math formula
+
+## [open_clip](https://github.com/mlfoundations/open_clip)
+
+OpenCLIP is an open source implementation of OpenAI's CLIP.
+
+The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
+The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
+
+Specifically, a ResNet-50 model trained with this codebase on OpenAI's 15 million image subset of YFCC achieves 32.7% top-1 accuracy on ImageNet.
+
+Keywords: CLIP, Open-source, Contrastive, Image-text
+
+## [dalle-playground](https://github.com/saharmor/dalle-playground)
+
+A playground to generate images from any text prompt using Stable Diffusion and Dall-E mini.
+
+Keywords: WebUI, Stable Diffusion, Dall-E mini
+
+## [FedML](https://github.com/FedML-AI/FedML)
+
+[FedML](https://github.com/FedML-AI/FedML) is a federated learning and analytics library enabling secure and collaborative machine learning on decentralized data anywhere at any scale.
+
+It supports large-scale cross-silo federated learning, and cross-device federated learning on smartphones/IoTs, and research simulation.
+
+Keywords: Federated Learning, Analytics, Collaborative ML, Decentralized
+
+## [gpt-code-clippy](https://github.com/CodedotAl/gpt-code-clippy)
+
+GPT-Code-Clippy (GPT-CC) is an open source version of GitHub Copilot, a language model -- based on GPT-3, called GPT-Codex -- that is fine-tuned on publicly available code from GitHub.
+
+Keywords: LLM, Code
+
+## [TextAttack](https://github.com/QData/TextAttack)
+
+[TextAttack](https://github.com/QData/TextAttack) 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP.
+
+Keywords: Adversarial attacks, Data augmentation, NLP
+
+## [OpenPrompt](https://github.com/thunlp/OpenPrompt)
+
+Prompt-learning is a paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modify the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. [OpenPrompt](https://github.com/thunlp/OpenPrompt) supports loading PLMs directly from https://github.com/huggingface/transformers.
+
+## [text-generation-webui](https://github.com/oobabooga/text-generation-webui/)
+
+[text-generation-webui](https://github.com/oobabooga/text-generation-webui/) is a Gradio Web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
+
+Keywords: LLM, WebUI
+
+## [libra](https://github.com/Palashio/libra)
+
+An ergonomic machine learning [libra](https://github.com/Palashio/libra)ry for non-technical users. It focuses on ergonomics and on ensuring that training a model is as simple as it can be.
+
+Keywords: Ergonomic, Non-technical
+
+## [alibi](https://github.com/SeldonIO/alibi)
+
+Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The focus of the library is to provide high-quality implementations of black-box, white-box, local and global explanation methods for classification and regression models.
+
+Keywords: Model inspection, Model interpretation, Black-box, White-box
+
+## [tortoise-tts](https://github.com/neonbjb/tortoise-tts)
+
+Tortoise is a text-to-speech program built with the following priorities: strong multi-voice capabilities, and highly realistic prosody and intonation.
+
+Keywords: Text-to-speech
+
+## [flower](https://github.com/adap/flower)
+
+Flower (flwr) is a framework for building federated learning systems. The design of Flower is based on a few guiding principles: customizability, extendability, framework agnosticity, and ease-of-use.
+
+Keywords: Federated learning systems, Customizable, Extendable, Framework-agnostic, Simplicity
+
+## [fast-bert](https://github.com/utterworks/fast-bert)
+
+Fast-Bert is a deep learning library that allows developers and data scientists to train and deploy BERT and XLNet based models for natural language processing tasks beginning with Text Classification. It is aimed at simplicity.
+
+Keywords: Deployment, BERT, XLNet
+
+## [towhee](https://github.com/towhee-io/towhee)
+
+Towhee makes it easy to build neural data processing pipelines for AI applications. We provide hundreds of models, algorithms, and transformations that can be used as standard pipeline building blocks. Users can use Towhee's Pythonic API to build a prototype of their pipeline and automatically optimize it for production-ready environments.
+
+Keywords: Data processing pipeline, Optimization
+
+## [alibi-detect](https://github.com/SeldonIO/alibi-detect)
+
+Alibi Detect is an open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. Both TensorFlow and PyTorch backends are supported for drift detection.
+
+Keywords: Adversarial, Outlier, Drift detection
+
+## [FARM](https://github.com/deepset-ai/FARM)
+
+[FARM](https://github.com/deepset-ai/FARM) makes Transfer Learning with BERT & Co simple, fast and enterprise-ready. It's built upon transformers and provides additional features to simplify the life of developers: Parallelized preprocessing, highly modular design, multi-task learning, experiment tracking, easy debugging and close integration with AWS SageMaker.
+
+Keywords: Transfer Learning, Modular design, Multi-task learning, Experiment tracking
+
+## [aitextgen](https://github.com/minimaxir/aitextgen)
+
+A robust Python tool for text-based AI training and generation using OpenAI's GPT-2 and EleutherAI's GPT Neo/GPT-3 architecture.
+[aitextgen](https://github.com/minimaxir/aitextgen) is a Python package that leverages PyTorch, Hugging Face Transformers and pytorch-lightning with specific optimizations for text generation using GPT-2, plus many added features.
+
+Keywords: Training, Generation
+
+## [diffgram](https://github.com/diffgram/diffgram)
+
+Diffgram aims to integrate human supervision into platforms. We support your team programmatically changing the UI (Schema, layout, etc.) like in Streamlit. This means that you can collect and annotate timely data from users. In other words, we are the platform behind your platform, an integrated part of your application, to ship new & better AI products faster.
+
+Keywords: Human supervision, Platform
+
+## [ecco](https://github.com/jalammar/ecco)
+
+Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTA, T5, and T0).
+
+Keywords: Model explainability
+
+## [s3prl](https://github.com/s3prl/s3prl)
+
+[s3prl](https://github.com/s3prl/s3prl) stands for Self-Supervised Speech Pre-training and Representation Learning. Self-supervised speech pre-trained models are called upstream in this toolkit, and are utilized in various downstream tasks.
+
+Keywords: Speech, Training
+
+## [ru-dalle](https://github.com/ai-forever/ru-dalle)
+
+RuDALL-E aims to be similar to DALL-E, targeted to Russian.
+
+Keywords: DALL-E, Russian
+
+## [DeepKE](https://github.com/zjunlp/DeepKE)
+
+[DeepKE](https://github.com/zjunlp/DeepKE) is a knowledge extraction toolkit for knowledge graph construction supporting cnSchema，low-resource, document-level and multimodal scenarios for entity, relation and attribute extraction.
+
+Keywords: Knowledge Extraction, Knowledge Graphs
+
+## [Nebuly](https://github.com/nebuly-ai/optimate)
+
+Nebuly is the next-generation platform to monitor and optimize your AI costs in one place. The platform connects to all your AI cost sources (compute, API providers, AI software licenses, etc) and centralizes them in one place to give you full visibility on a model basis. The platform also provides optimization recommendations and a co-pilot model that can guide during the optimization process. The platform builds on top of the open-source tools allowing you to optimize the different steps of your AI stack to squeeze out the best possible cost performances.
+
+Keywords: Optimization, Performance, Monitoring
+
+## [imaginAIry](https://github.com/brycedrennan/imaginAIry)
+
+Offers a CLI and a Python API to generate images with Stable Diffusion. It has support for many tools, like image structure control (controlnet), instruction-based image edits (InstructPix2Pix), prompt-based masking (clipseg), among others.
+
+Keywords: Stable Diffusion, CLI, Python API
+
+## [sparseml](https://github.com/neuralmagic/sparseml)
+
+SparseML is an open-source model optimization toolkit that enables you to create inference-optimized sparse models using pruning, quantization, and distillation algorithms. Models optimized with SparseML can then be exported to the ONNX and deployed with DeepSparse for GPU-class performance on CPU hardware.
+
+Keywords: Model optimization, Pruning, Quantization, Distillation
+
+## [opacus](https://github.com/pytorch/opacus)
+
+Opacus is a library that enables training PyTorch models with differential privacy. It supports training with minimal code changes required on the client, has little impact on training performance, and allows the client to online track the privacy budget expended at any given moment.
+
+Keywords: Differential privacy
+
+## [LAVIS](https://github.com/salesforce/LAVIS)
+
+[LAVIS](https://github.com/salesforce/LAVIS) is a Python deep learning library for LAnguage-and-VISion intelligence research and applications. This library aims to provide engineers and researchers with a one-stop solution to rapidly develop models for their specific multimodal scenarios, and benchmark them across standard and customized datasets. It features a unified interface design to access
+
+Keywords: Multimodal, NLP, Vision
+
+## [buzz](https://github.com/chidiwilliams/buzz)
+
+Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
+
+Keywords: Audio transcription, Translation
+
+## [rust-bert](https://github.com/guillaume-be/rust-bert)
+
+Rust-native state-of-the-art Natural Language Processing models and pipelines. Port of Hugging Face's Transformers library, using the tch-rs crate and pre-processing from rust-tokenizers. Supports multi-threaded tokenization and GPU inference. This repository exposes the model base architecture, task-specific heads and ready-to-use pipelines.
+
+Keywords: Rust, BERT, Inference
+
+## [EasyNLP](https://github.com/alibaba/EasyNLP)
+
+[EasyNLP](https://github.com/alibaba/EasyNLP) is an easy-to-use NLP development and application toolkit in PyTorch, first released inside Alibaba in 2021. It is built with scalable distributed training strategies and supports a comprehensive suite of NLP algorithms for various NLP applications. [EasyNLP](https://github.com/alibaba/EasyNLP) integrates knowledge distillation and few-shot learning for landing large pre-trained models, together with various popular multi-modality pre-trained models. It provides a unified framework of model training, inference, and deployment for real-world applications.
+
+Keywords: NLP, Knowledge distillation, Few-shot learning, Multi-modality, Training, Inference, Deployment
+
+## [TurboTransformers](https://github.com/Tencent/TurboTransformers)
+
+A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.
+
+Keywords: Optimization, Performance
+
+## [hivemind](https://github.com/learning-at-home/hivemind)
+
+Hivemind is a PyTorch library for decentralized deep learning across the Internet. Its intended usage is training one large model on hundreds of computers from different universities, companies, and volunteers.
+
+Keywords: Decentralized training
+
+## [docquery](https://github.com/impira/docquery)
+
+DocQuery is a library and command-line tool that makes it easy to analyze semi-structured and unstructured documents (PDFs, scanned images, etc.) using large language models (LLMs). You simply point DocQuery at one or more documents and specify a question you want to ask. DocQuery is created by the team at Impira.
+
+Keywords: Semi-structured documents, Unstructured documents, LLM, Document Question Answering
+
+## [CodeGeeX](https://github.com/THUDM/CodeGeeX)
+
+[CodeGeeX](https://github.com/THUDM/CodeGeeX) is a large-scale multilingual code generation model with 13 billion parameters, pre-trained on a large code corpus of more than 20 programming languages. It has several unique features:
+- Multilingual code generation
+- Crosslingual code translation
+- Is a customizable programming assistant
+
+Keywords: Code Generation Model
+
+## [ktrain](https://github.com/amaiya/ktrain)
+
+[ktrain](https://github.com/amaiya/ktrain) is a lightweight wrapper for the deep learning library TensorFlow Keras (and other libraries) to help build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, [ktrain](https://github.com/amaiya/ktrain) is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners.
+
+Keywords: Keras wrapper, Model building, Training, Deployment
+
+## [FastDeploy](https://github.com/PaddlePaddle/FastDeploy)
+
+[FastDeploy](https://github.com/PaddlePaddle/FastDeploy) is an Easy-to-use and High Performance AI model deployment toolkit for Cloud, Mobile and Edge with packageout-of-the-box and unified experience, endend-to-end optimization for over fire160+ Text, Vision, Speech and Cross-modal AI models. Including image classification, object detection, OCR, face detection, matting, pp-tracking, NLP, stable diffusion, TTS and other tasks to meet developers' industrial deployment needs for multi-scenario, multi-hardware and multi-platform.
+
+Keywords: Model deployment, CLoud, Mobile, Edge
+
+## [underthesea](https://github.com/undertheseanlp/underthesea)
+
+[underthesea](https://github.com/undertheseanlp/underthesea) is a Vietnamese NLP toolkit. Underthesea is a suite of open source Python modules data sets and tutorials supporting research and development in Vietnamese Natural Language Processing. We provide extremely easy API to quickly apply pretrained NLP models to your Vietnamese text, such as word segmentation, part-of-speech tagging (PoS), named entity recognition (NER), text classification and dependency parsing.
+
+Keywords: Vietnamese, NLP
+
+## [hasktorch](https://github.com/hasktorch/hasktorch)
+
+Hasktorch is a library for tensors and neural networks in Haskell. It is an independent open source community project which leverages the core C++ libraries shared by PyTorch.
+
+Keywords: Haskell, Neural Networks
+
+## [donut](https://github.com/clovaai/donut)
+
+Donut, or Document understanding transformer, is a new method of document understanding that utilizes an OCR-free end-to-end Transformer model.
+
+Donut does not require off-the-shelf OCR engines/APIs, yet it shows state-of-the-art performances on various visual document understanding tasks, such as visual document classification or information extraction (a.k.a. document parsing).
+
+Keywords: Document Understanding
+
+## [transformers-interpret](https://github.com/cdpierse/transformers-interpret)
+
+Transformers Interpret is a model explainability tool designed to work exclusively with the transformers package.
+
+In line with the philosophy of the Transformers package Transformers Interpret allows any transformers model to be explained in just two lines. Explainers are available for both text and computer vision models. Visualizations are also available in notebooks and as savable png and html files
+
+Keywords: Model interpretation, Visualization
+
+## [mlrun](https://github.com/mlrun/mlrun)
+
+MLRun is an open MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications, significantly reducing engineering efforts, time to production, and computation resources. With MLRun, you can choose any IDE on your local machine or on the cloud. MLRun breaks the silos between data, ML, software, and DevOps/MLOps teams, enabling collaboration and fast continuous improvements.
+
+Keywords: MLOps
+
+## [FederatedScope](https://github.com/alibaba/FederatedScope)
+
+[FederatedScope](https://github.com/alibaba/FederatedScope) is a comprehensive federated learning platform that provides convenient usage and flexible customization for various federated learning tasks in both academia and industry. Based on an event-driven architecture, [FederatedScope](https://github.com/alibaba/FederatedScope) integrates rich collections of functionalities to satisfy the burgeoning demands from federated learning, and aims to build up an easy-to-use platform for promoting learning safely and effectively.
+
+Keywords: Federated learning, Event-driven
+
+## [pythainlp](https://github.com/PyThaiNLP/pythainlp)
+
+PyThaiNLP is a Python package for text processing and linguistic analysis, similar to NLTK with focus on Thai language.
+
+Keywords: Thai, NLP, NLTK
+
+## [FlagAI](https://github.com/FlagAI-Open/FlagAI)
+
+[FlagAI](https://github.com/FlagAI-Open/FlagAI) (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale model. Our goal is to support training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality.
+
+Keywords: Large models, Training, Fine-tuning, Deployment, Multi-modal
+
+## [pyserini](https://github.com/castorini/pyserini)
+
+[pyserini](https://github.com/castorini/pyserini) is a Python toolkit for reproducible information retrieval research with sparse and dense representations. Retrieval using sparse representations is provided via integration with the group's Anserini IR toolkit. Retrieval using dense representations is provided via integration with Facebook's Faiss library.
+
+Keywords: IR, Information Retrieval, Dense, Sparse
+
+## [baal](https://github.com/baal-org/baal)
+
+[baal](https://github.com/baal-org/baal) is an active learning library that supports both industrial applications and research usecases. [baal](https://github.com/baal-org/baal) currently supports Monte-Carlo Dropout, MCDropConnect, deep ensembles, and semi-supervised learning.
+
+Keywords: Active Learning, Research, Labeling
+
+## [cleanlab](https://github.com/cleanlab/cleanlab)
+
+[cleanlab](https://github.com/cleanlab/cleanlab) is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels. For text, image, tabular, audio (among others) datasets, you can use cleanlab to automatically: detect data issues (outliers, label errors, near duplicates, etc), train robust ML models, infer consensus + annotator-quality for multi-annotator data, suggest data to (re)label next (active learning).
+
+Keywords: Data-Centric AI, Data Quality, Noisy Labels, Outlier Detection, Active Learning  
+
+## [BentoML](https://github.com/bentoml/BentoML)
+
+[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
+All Hugging Face models and pipelines can be seamlessly integrated into BentoML applications, enabling the running of models on the most suitable hardware and independent scaling based on usage.
+
+Keywords: BentoML, Framework, Deployment, AI Applications
+
+## [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory)
+
+[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training(fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning).
+
+Keywords: PEFT, fine-tuning, LLaMA-2, ChatGLM, Qwen
--- a/benchmark/.gitignore
+++ b/benchmark/.gitignore
@@ -0,0 +1 @@
+benchmark_results/
--- a/benchmark/README.md
+++ b/benchmark/README.md
@@ -0,0 +1,49 @@
+# Benchmarks
+
+You might want to add new benchmarks.
+
+You will need to define a python function named `run_benchmark` in your python file and the file must be located in this `benchmark/` directory.
+
+The expected function signature is the following:
+
+```py
+def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
+```
+
+## Writing metrics to the database
+
+`MetricsRecorder` is thread-safe, in the sense of the python [`Thread`](https://docs.python.org/3/library/threading.html#threading.Thread). This means you can start a background thread to do the readings on the device measurements while not blocking the main thread to execute the model measurements.
+
+cf [`llama.py`](./llama.py) to see an example of this in practice.
+
+```py
+from benchmarks_entrypoint import MetricsRecorder
+import psycopg2
+
+def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
+  metrics_recorder = MetricsRecorder(psycopg2.connect("dbname=metrics"), logger, branch, commit_id, commit_msg)
+  benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
+    # To collect device measurements
+    metrics_recorder.collect_device_measurements(
+        benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
+    )
+    # To collect your model measurements
+    metrics_recorder.collect_model_measurements(
+        benchmark_id,
+        {
+            "model_load_time": model_load_time,
+            "first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
+            "second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
+            "first_eager_generate_time_secs": first_eager_generate_time,
+            "second_eager_generate_time_secs": second_eager_generate_time,
+            "time_to_first_token_secs": time_to_first_token,
+            "time_to_second_token_secs": time_to_second_token,
+            "time_to_third_token_secs": time_to_third_token,
+            "time_to_next_token_mean_secs": mean_time_to_next_token,
+            "first_compile_generate_time_secs": first_compile_generate_time,
+            "second_compile_generate_time_secs": second_compile_generate_time,
+            "third_compile_generate_time_secs": third_compile_generate_time,
+            "fourth_compile_generate_time_secs": fourth_compile_generate_time,
+        },
+    )
+```
--- a/benchmark/init.py
+++ b/benchmark/init.py
--- a/benchmark/benches/llama.py
+++ b/benchmark/benches/llama.py
@@ -0,0 +1,353 @@
+# Copyright 2025 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import sys
+from logging import Logger
+from threading import Event, Thread
+from time import perf_counter, sleep
+
+
+# Add the parent directory to Python path to import benchmarks_entrypoint
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+import gpustat
+import psutil
+import psycopg2
+from benchmarks_entrypoint import MetricsRecorder
+
+
+# Optional heavy ML dependencies - only required when actually running the benchmark
+try:
+    import torch
+
+    from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, StaticCache
+
+    TRANSFORMERS_AVAILABLE = True
+except ImportError:
+    TRANSFORMERS_AVAILABLE = False
+    torch = None
+    AutoModelForCausalLM = None
+    AutoTokenizer = None
+    GenerationConfig = None
+    StaticCache = None
+
+os.environ["HF_XET_HIGH_PERFORMANCE"] = "1"
+os.environ["TOKENIZERS_PARALLELISM"] = "1"
+
+# Only set torch precision if torch is available
+if TRANSFORMERS_AVAILABLE:
+    torch.set_float32_matmul_precision("high")
+
+
+def collect_metrics(benchmark_id, continue_metric_collection, metrics_recorder):
+    p = psutil.Process(os.getpid())
+    while not continue_metric_collection.is_set():
+        with p.oneshot():
+            cpu_util = p.cpu_percent()
+            mem_megabytes = p.memory_info().rss / (1024 * 1024)
+        gpu_stats = gpustat.GPUStatCollection.new_query()
+        gpu_util = gpu_stats[0]["utilization.gpu"]
+        gpu_mem_megabytes = gpu_stats[0]["memory.used"]
+        metrics_recorder.collect_device_measurements(
+            benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
+        )
+        sleep(0.01)
+
+
+def run_benchmark(
+    logger: Logger,
+    repository: str,
+    branch: str,
+    commit_id: str,
+    commit_msg: str,
+    metrics_recorder=None,
+    num_tokens_to_generate=100,
+):
+    # Check if required ML dependencies are available
+    if not TRANSFORMERS_AVAILABLE:
+        logger.error("Transformers and torch are required to run the LLaMA benchmark. Please install them with:")
+        logger.error("pip install torch transformers")
+        logger.error("Skipping LLaMA benchmark due to missing dependencies.")
+        return
+
+    continue_metric_collection = Event()
+    metrics_thread = None
+    model_id = "meta-llama/Llama-2-7b-hf"
+
+    # If no metrics_recorder is provided, create one for backward compatibility
+    if metrics_recorder is None:
+        try:
+            metrics_recorder = MetricsRecorder(
+                psycopg2.connect("dbname=metrics"), logger, repository, branch, commit_id, commit_msg, True
+            )
+            should_close_recorder = True
+        except Exception as e:
+            logger.error(f"Failed to create metrics recorder: {e}")
+            return
+    else:
+        should_close_recorder = False
+    try:
+        gpu_stats = gpustat.GPUStatCollection.new_query()
+        gpu_name = gpu_stats[0]["name"]
+        benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
+        logger.info(f"running benchmark #{benchmark_id} on {gpu_name} for {model_id}")
+        metrics_thread = Thread(
+            target=collect_metrics,
+            args=[benchmark_id, continue_metric_collection, metrics_recorder],
+        )
+        metrics_thread.start()
+        logger.info("started background thread to fetch device metrics")
+
+        os.environ["TOKENIZERS_PARALLELISM"] = "false"  # silence warnings when compiling
+
+        device = "cuda"
+
+        logger.info("downloading weights")
+        # This is to avoid counting download in model load time measurement
+        model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16)
+        gen_config = GenerationConfig(do_sample=False, top_p=1, temperature=1)
+        logger.info("loading model")
+        start = perf_counter()
+        model = AutoModelForCausalLM.from_pretrained(
+            model_id, dtype=torch.float16, generation_config=gen_config
+        ).eval()
+        model.to(device)
+        torch.cuda.synchronize()
+        end = perf_counter()
+        model_load_time = end - start
+        logger.info(f"loaded model in: {model_load_time}s")
+
+        tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+        prompt = "Why dogs are so cute?"
+        inputs = tokenizer(prompt, return_tensors="pt").to(device)
+
+        # Specify the max length (including both the prompt and the response)
+        # When calling `generate` with `cache_implementation="static" later, this is also used to create a `StaticCache` object
+        # with sequence length = `max_length`. The longer the more you will re-use it
+        seq_length = inputs["input_ids"].shape[1]
+        model.generation_config.max_length = seq_length + num_tokens_to_generate
+        batch_size = inputs["input_ids"].shape[0]
+
+        # Copied from the gpt-fast repo
+        def multinomial_sample_one_no_sync(probs_sort):  # Does multinomial sampling without a cuda synchronization
+            q = torch.empty_like(probs_sort).exponential_(1)
+            return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
+
+        def logits_to_probs(logits, temperature: float = 1.0, top_k: int | None = None):
+            logits = logits / max(temperature, 1e-5)
+
+            if top_k is not None:
+                v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
+                pivot = v.select(-1, -1).unsqueeze(-1)
+                logits = torch.where(logits < pivot, -float("Inf"), logits)
+            probs = torch.nn.functional.softmax(logits, dim=-1)
+            return probs
+
+        def sample(logits, temperature: float = 1.0, top_k: int | None = None):
+            probs = logits_to_probs(logits[0, -1], temperature, top_k)
+            idx_next = multinomial_sample_one_no_sync(probs)
+            return idx_next, probs
+
+        # First eager forward pass
+        logger.info("running first eager forward pass")
+        start = perf_counter()
+        _ = model(**inputs)
+        torch.cuda.synchronize()
+        end = perf_counter()
+        first_eager_fwd_pass_time = end - start
+        logger.info(f"completed first eager forward pass in: {first_eager_fwd_pass_time}s")
+
+        # Second eager forward pass (should be faster)
+        logger.info("running second eager forward pass")
+        start = perf_counter()
+        _ = model(**inputs)
+        torch.cuda.synchronize()
+        end = perf_counter()
+        second_eager_fwd_pass_time = end - start
+        logger.info(f"completed second eager forward pass in: {second_eager_fwd_pass_time}s")
+
+        # First eager generation
+        logger.info("running first eager generation")
+        start = perf_counter()
+        output = model.generate(**inputs)
+        torch.cuda.synchronize()
+        end = perf_counter()
+        first_eager_generate_time = end - start
+        logger.info(f"completed first eager generation in: {first_eager_generate_time}s")
+        logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
+
+        # Second eager generation (should be faster)
+        logger.info("running second eager generation")
+        start = perf_counter()
+        output = model.generate(**inputs)
+        torch.cuda.synchronize()
+        end = perf_counter()
+        second_eager_generate_time = end - start
+        logger.info(f"completed second eager generation in: {second_eager_generate_time}s")
+        logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
+
+        logger.info("running generation timing loop")
+
+        input_pos = torch.arange(0, seq_length, device=device)
+        inputs = inputs["input_ids"]
+
+        start = perf_counter()
+        with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
+            logits = model(inputs, position_ids=input_pos).logits
+        next_token, probs = sample(logits, temperature=0.6, top_k=5)
+        torch.cuda.synchronize()
+        end = perf_counter()
+        time_to_first_token = end - start
+
+        input_pos = torch.tensor([seq_length], device=device, dtype=torch.int)
+        next_token = next_token.clone()
+        start = perf_counter()
+        with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
+            logits = model(next_token, position_ids=input_pos).logits
+        next_token, probs = sample(logits, temperature=0.6, top_k=5)
+        torch.cuda.synchronize()
+        end = perf_counter()
+        time_to_second_token = end - start
+
+        input_pos = torch.tensor([seq_length + 1], device=device, dtype=torch.int)
+        next_token = next_token.clone()
+        start = perf_counter()
+        with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
+            logits = model(next_token, position_ids=input_pos).logits
+        next_token, probs = sample(logits, temperature=0.6, top_k=5)
+        torch.cuda.synchronize()
+        end = perf_counter()
+        time_to_third_token = end - start
+
+        logger.info("running longer generation timing loop")
+
+        total_time = 0
+        for i in range(20):
+            input_pos = torch.tensor([seq_length + 2 + i], device=device, dtype=torch.int)
+            next_token = next_token.clone()
+            start = perf_counter()
+            with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
+                logits = model(next_token, position_ids=input_pos).logits
+            next_token, probs = sample(logits, temperature=0.6, top_k=5)
+            torch.cuda.synchronize()
+            end = perf_counter()
+            total_time += end - start
+
+        mean_time_to_next_token = total_time / 20
+
+        logger.info("running compilation benchmarks")
+
+        # Now compile the model
+        model = torch.compile(model, mode="max-autotune", fullgraph=True)
+
+        # StaticCache for generation
+        with torch.device(device):
+            model.setup_caches(max_batch_size=batch_size, max_seq_len=seq_length + num_tokens_to_generate)
+
+        input_pos = torch.arange(0, seq_length, device=device)
+        inputs = tokenizer(prompt, return_tensors="pt").to(device)["input_ids"]
+
+        logger.info("compiling model")
+
+        model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16, generation_config=gen_config)
+        model.to(device)
+        model = torch.compile(model, mode="max-autotune", fullgraph=True)
+
+        past_key_values = StaticCache(
+            model.config,
+            max_batch_size=batch_size,
+            device=device,
+            dtype=torch.float16,
+            max_cache_len=seq_length + 128,
+        )
+        # 1st call
+        start = perf_counter()
+        output = model.generate(**inputs, past_key_values=past_key_values)
+        end = perf_counter()
+        first_compile_generate_time = end - start
+        logger.info(f"completed first compile generation in: {first_compile_generate_time}s")
+        logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
+
+        past_key_values = StaticCache(
+            model.config,
+            max_batch_size=batch_size,
+            device=device,
+            dtype=torch.float16,
+            max_cache_len=seq_length + 128,
+        )
+        # 2nd call
+        start = perf_counter()
+        output = model.generate(**inputs, past_key_values=past_key_values)
+        end = perf_counter()
+        second_compile_generate_time = end - start
+        logger.info(f"completed second compile generation in: {second_compile_generate_time}s")
+        logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
+
+        past_key_values = StaticCache(
+            model.config,
+            max_batch_size=batch_size,
+            device=device,
+            dtype=torch.float16,
+            max_cache_len=seq_length + 128,
+        )
+        # 3rd call
+        start = perf_counter()
+        output = model.generate(**inputs, past_key_values=past_key_values)
+        end = perf_counter()
+        third_compile_generate_time = end - start
+        logger.info(f"completed third compile generation in: {third_compile_generate_time}s")
+        logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
+
+        past_key_values = StaticCache(
+            model.config,
+            max_batch_size=batch_size,
+            device=device,
+            dtype=torch.float16,
+            max_cache_len=seq_length + 128,
+        )
+        # 4th call
+        start = perf_counter()
+        output = model.generate(**inputs, past_key_values=past_key_values)
+        end = perf_counter()
+        fourth_compile_generate_time = end - start
+        logger.info(f"completed fourth compile generation in: {fourth_compile_generate_time}s")
+        logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
+
+        metrics_recorder.collect_model_measurements(
+            benchmark_id,
+            {
+                "model_load_time": model_load_time,
+                "first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
+                "second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
+                "first_eager_generate_time_secs": first_eager_generate_time,
+                "second_eager_generate_time_secs": second_eager_generate_time,
+                "time_to_first_token_secs": time_to_first_token,
+                "time_to_second_token_secs": time_to_second_token,
+                "time_to_third_token_secs": time_to_third_token,
+                "time_to_next_token_mean_secs": mean_time_to_next_token,
+                "first_compile_generate_time_secs": first_compile_generate_time,
+                "second_compile_generate_time_secs": second_compile_generate_time,
+                "third_compile_generate_time_secs": third_compile_generate_time,
+                "fourth_compile_generate_time_secs": fourth_compile_generate_time,
+            },
+        )
+    except Exception as e:
+        logger.error(f"Caught exception: {e}")
+    continue_metric_collection.set()
+    if metrics_thread is not None:
+        metrics_thread.join()
+
+    # Only close the recorder if we created it locally
+    if should_close_recorder:
+        metrics_recorder.close()
--- a/benchmark/benchmark.py
+++ b/benchmark/benchmark.py
@@ -0,0 +1,324 @@
+# Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Run benchmark using the `optimum-benchmark` library with some customization in `transformers`.
+
+Assume we are under `transformers` root directory: (make sure the commits are valid commits)
+```bash
+python benchmark/benchmark.py --config-dir benchmark/config --config-name generation --commit=9b9c7f03da625b13643e99205c691fe046461724 --metrics=decode.latency.mean,per_token.latency.mean,per_token.throughput.value backend.model=google/gemma-2b benchmark.input_shapes.sequence_length=5,7 benchmark.input_shapes.batch_size=1,2 --multirun
+```
+"""
+
+import argparse
+import glob
+import json
+import os.path
+import re
+import tempfile
+from contextlib import contextmanager
+from pathlib import Path
+
+from git import Repo
+from huggingface_hub import HfApi
+from optimum_benchmark import Benchmark
+from optimum_benchmark_wrapper import main
+
+
+PATH_TO_REPO = Path(__file__).parent.parent.resolve()
+
+
+@contextmanager
+def checkout_commit(repo: Repo, commit_id: str):
+    """
+    Context manager that checks out a given commit when entered, but gets back to the reference it was at on exit.
+    Args:
+        repo (`git.Repo`): A git repository (for instance the Transformers repo).
+        commit_id (`str`): The commit reference to checkout inside the context manager.
+    """
+    current_head = repo.head.commit if repo.head.is_detached else repo.head.ref
+
+    try:
+        repo.git.checkout(commit_id)
+        yield
+
+    finally:
+        repo.git.checkout(current_head)
+
+
+def summarize(run_dir, metrics, expand_metrics=False):
+    """Produce a summary for each optimum-benchmark launched job's output directory found in `run_dir`.
+
+    Each summary's format is as follows (for `expand_metrics=False`):
+    ```
+    {
+        "model": "google/gemma-2b",
+        "commit": "3cd6ed22e4d49219f300f5055e71e3929aba20d7",
+        "config": "benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5",
+        "metrics": {
+            "decode.latency.mean": 1.624666809082031,
+            "per_token.latency.mean": 0.012843788806628804,
+            "per_token.throughput.value": 77.85864553330948
+        }
+    }
+    ```
+    """
+    reports = glob.glob(os.path.join(run_dir, "**/benchmark_report.json"), recursive=True)
+    report_dirs = [str(Path(report).parent) for report in reports]
+
+    summaries = []
+    for report_dir in report_dirs:
+        commit = re.search(r"/commit=([^/]+)", report_dir).groups()[0]
+
+        if not os.path.isfile(os.path.join(report_dir, "benchmark.json")):
+            continue
+        benchmark = Benchmark.from_json(os.path.join(report_dir, "benchmark.json"))
+        report = benchmark.report
+
+        model = benchmark.config.backend["model"]
+
+        # This looks like `benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5`.
+        # (we rely on the usage of hydra's `${hydra.job.override_dirname}`.)
+        benchmark_name = re.sub(f"backend.model={model},*", "", report_dir)
+        benchmark_name = str(Path(benchmark_name).parts[-1])
+        if benchmark_name.startswith("commit="):
+            benchmark_name = benchmark.config.name
+
+        metrics_values = {}
+        # post-processing of report: show a few selected/important metric
+        for metric in metrics:
+            keys = metric.split(".")
+            value = report.to_dict()
+            current = metrics_values
+            for key in keys:
+                # Avoid KeyError when a user's specified metric has typo.
+                # TODO: Give warnings.
+                if key not in value:
+                    continue
+                value = value[key]
+
+                if expand_metrics:
+                    if isinstance(value, dict):
+                        if key not in current:
+                            current[key] = {}
+                            current = current[key]
+                    else:
+                        current[key] = value
+
+            if not expand_metrics:
+                metrics_values[metric] = value
+
+        # show some config information
+        print(f"model: {model}")
+        print(f"commit: {commit}")
+        print(f"config: {benchmark_name}")
+        if len(metrics_values) > 0:
+            print("metrics:")
+            if expand_metrics:
+                print(metrics_values)
+            else:
+                for metric, value in metrics_values.items():
+                    print(f"  - {metric}: {value}")
+        print("-" * 80)
+
+        summary = {
+            "model": model,
+            "commit": commit,
+            "config": benchmark_name,
+            "metrics": metrics_values,
+        }
+        summaries.append(summary)
+
+        with open(os.path.join(report_dir, "summary.json"), "w") as fp:
+            json.dump(summary, fp, indent=4)
+
+    return summaries
+
+
+def combine_summaries(summaries):
+    """Combine a list of summary obtained from the function `summarize`.
+
+    The combined summary's format is as follows:
+    ```
+    "google/gemma-2b": {
+        "benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5": {
+            "3cd6ed22e4d49219f300f5055e71e3929aba20d7": {
+                "metrics": {"decode.latency.mean": 1.624666809082031}
+            },
+            "c97ee28b117c0abe8e08891f402065e4df6d72aa": {
+                "metrics": {"decode.latency.mean": 1.6278163452148438}
+            }
+        },
+        "benchmark.input_shapes.batch_size=2,benchmark.input_shapes.sequence_length=5": {
+            "3cd6ed22e4d49219f300f5055e71e3929aba20d7": {
+                "metrics": {"decode.latency.mean": 1.6947791748046876}
+            },
+            "c97ee28b117c0abe8e08891f402065e4df6d72aa": {
+                "metrics": {
+                    "decode.latency.mean": 1.6980519409179688}
+            }
+        }
+    }
+    ```
+    """
+    combined = {}
+    for summary in summaries:
+        model = summary["model"]
+        config = summary["config"]
+        commit = summary["commit"]
+
+        if model not in combined:
+            combined[model] = {}
+
+        if config not in combined[model]:
+            combined[model][config] = {}
+
+        if commit not in combined[model][config]:
+            combined[model][config][commit] = {"metrics": summary["metrics"]}
+
+    with open(os.path.join(exp_run_dir, "summary.json"), "w") as fp:
+        json.dump(combined, fp, indent=4)
+
+    print(json.dumps(combined, indent=4))
+
+    return combined
+
+
+if __name__ == "__main__":
+
+    def list_str(values):
+        return values.split(",")
+
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument("--config-dir", type=str, required=True, help="The path to the config directory.")
+    parser.add_argument("--config-name", type=str, required=True, help="The config name.")
+
+    # arguments specific to this wrapper for our own customization
+    parser.add_argument("--ensure_empty", type=bool, default=True, help="If to create a temporary directory.")
+    parser.add_argument(
+        "--commit",
+        type=list_str,
+        default="",
+        help="Comma-separated list of branch names and/or commit sha values on which the benchmark will run. If `diff` is specified, it will run on both the current head and the `main` branch.",
+    )
+    parser.add_argument("--metrics", type=str, help="The metrics to be included in the summary.")
+
+    parser.add_argument("--repo_id", type=str, default=None, help="The repository to which the file will be uploaded.")
+    parser.add_argument("--path_in_repo", type=str, default=None, help="Relative filepath in the repo.")
+    parser.add_argument("--token", type=str, default=None, help="A valid user access token (string).")
+
+    args, optimum_benchmark_args = parser.parse_known_args()
+
+    repo = Repo(PATH_TO_REPO)
+
+    metrics = [
+        "prefill.latency.mean",
+        "prefill.throughput.value",
+        "decode.latency.mean",
+        "decode.throughput.value",
+        "per_token.latency.mean",
+        "per_token.throughput.value",
+    ]
+    if args.metrics is not None:
+        metrics = args.metrics.split(",")
+
+    # Get `backend.model` in a hacky way: We want to control the experiment flow manually.
+    models = [""]
+    for idx, arg in enumerate(optimum_benchmark_args):
+        if arg.startswith("backend.model="):
+            models = arg[len("backend.model=") :]
+            models = models.split(",")
+            break
+    optimum_benchmark_args = [arg for arg in optimum_benchmark_args if not arg.startswith("backend.model=")]
+
+    # Get the commit(s)
+    current_head = str(repo.head.commit) if repo.head.is_detached else str(repo.head.ref)
+    commits = [x for x in args.commit if x != ""]
+    if len(commits) == 0:
+        commits = [current_head]
+    elif len(commits) == 1 and commits[0] == "diff":
+        # compare to `main`
+        commits = ["main", current_head]
+
+    # Get the specified run directory
+    run_dir_arg_idx, run_dir = -1, None
+    sweep_dir_arg_idx, sweep_dir = -1, None
+    for idx, arg in enumerate(optimum_benchmark_args):
+        if arg.startswith("hydra.run.dir="):
+            run_dir = arg[len("hydra.run.dir=") :]
+            run_dir_arg_idx = idx
+        elif arg.startswith("hydra.sweep.dir="):
+            sweep_dir = arg[len("hydra.sweep.dir=") :]
+            sweep_dir_arg_idx = idx
+    exp_run_dir, arg_dix, arg_name = (
+        (sweep_dir, sweep_dir_arg_idx, "hydra.sweep.dir")
+        if "--multirun" in optimum_benchmark_args
+        else (run_dir, run_dir_arg_idx, "hydra.run.dir")
+    )
+
+    # TODO: not hardcoded
+    if exp_run_dir is None and args.ensure_empty:
+        exp_run_dir = "_benchmark"
+
+    if args.ensure_empty:
+        os.makedirs(exp_run_dir, exist_ok=True)
+        exp_run_dir = tempfile.mkdtemp(dir=exp_run_dir)
+
+    run_summaries = []
+    for commit in commits:
+        with checkout_commit(repo, commit):
+            commit = str(repo.head.commit)
+
+            commit_run_dir = exp_run_dir
+            if exp_run_dir is not None:
+                commit_run_dir = os.path.join(exp_run_dir, rf"commit\={commit}")
+
+            print(f"Run benchmark on commit: {commit}")
+
+            for model in models:
+                model_arg = [f"backend.model={model}"] if model != "" else []
+                dir_args = []
+                if commit_run_dir is not None:
+                    if arg_dix > -1:
+                        optimum_benchmark_args[arg_dix] = f"{arg_name}={commit_run_dir}"
+                    else:
+                        dir_args = [
+                            f"hydra.sweep.dir={commit_run_dir}",
+                            f"hydra.run.dir={commit_run_dir}/" + "${hydra.job.override_dirname}",
+                        ]
+                main(args.config_dir, args.config_name, model_arg + dir_args + optimum_benchmark_args)
+
+            if commit_run_dir is not None:
+                # Need to remove the `\` character
+                summaries = summarize(commit_run_dir.replace("\\", ""), metrics)
+                run_summaries.extend(summaries)
+
+    # aggregate the information across the commits
+    if exp_run_dir is not None:
+        with open(os.path.join(exp_run_dir, "summaries.json"), "w") as fp:
+            json.dump(run_summaries, fp, indent=4)
+
+        combined_summary = combine_summaries(run_summaries)
+
+        if args.repo_id is not None and args.path_in_repo is not None:
+            # Upload to Hub
+            api = HfApi()
+            api.upload_folder(
+                folder_path=exp_run_dir,
+                path_in_repo=args.path_in_repo,
+                repo_id=args.repo_id,
+                repo_type="dataset",
+                token=args.token,
+            )
--- a/benchmark/benchmarks_entrypoint.py
+++ b/benchmark/benchmarks_entrypoint.py
@@ -0,0 +1,502 @@
+# Copyright 2025 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import importlib.util
+import json
+import logging
+import os
+import sys
+import uuid
+from datetime import datetime
+
+import pandas as pd
+
+
+try:
+    from psycopg2.extensions import register_adapter
+    from psycopg2.extras import Json
+
+    register_adapter(dict, Json)
+    PSYCOPG2_AVAILABLE = True
+except ImportError:
+    PSYCOPG2_AVAILABLE = False
+
+
+class ImportModuleException(Exception):
+    pass
+
+
+class MetricsRecorder:
+    def __init__(
+        self,
+        connection,
+        logger: logging.Logger,
+        repository: str,
+        branch: str,
+        commit_id: str,
+        commit_msg: str,
+        collect_csv_data: bool = True,
+    ):
+        self.conn = connection
+        self.use_database = connection is not None
+        if self.use_database:
+            self.conn.autocommit = True
+        self.logger = logger
+        self.repository = repository
+        self.branch = branch
+        self.commit_id = commit_id
+        self.commit_msg = commit_msg
+        self.collect_csv_data = collect_csv_data
+
+        # For CSV export - store all data in pandas DataFrames (only if CSV collection is enabled)
+        if self.collect_csv_data:
+            # Initialize empty DataFrames with proper schemas
+            self.benchmarks_df = pd.DataFrame(
+                columns=[
+                    "benchmark_id",
+                    "repository",
+                    "branch",
+                    "commit_id",
+                    "commit_message",
+                    "metadata",
+                    "created_at",
+                ]
+            )
+            self.device_measurements_df = pd.DataFrame(
+                columns=["benchmark_id", "cpu_util", "mem_megabytes", "gpu_util", "gpu_mem_megabytes", "time"]
+            )
+            self.model_measurements_df = pd.DataFrame(
+                columns=[
+                    "benchmark_id",
+                    "time",
+                    "model_load_time",
+                    "first_eager_forward_pass_time_secs",
+                    "second_eager_forward_pass_time_secs",
+                    "first_eager_generate_time_secs",
+                    "second_eager_generate_time_secs",
+                    "time_to_first_token_secs",
+                    "time_to_second_token_secs",
+                    "time_to_third_token_secs",
+                    "time_to_next_token_mean_secs",
+                    "first_compile_generate_time_secs",
+                    "second_compile_generate_time_secs",
+                    "third_compile_generate_time_secs",
+                    "fourth_compile_generate_time_secs",
+                ]
+            )
+        else:
+            self.benchmarks_df = None
+            self.device_measurements_df = None
+            self.model_measurements_df = None
+
+    def initialise_benchmark(self, metadata: dict[str, str]) -> str:
+        """
+        Creates a new benchmark, returns the benchmark id (UUID)
+        """
+        # Generate a unique UUID for this benchmark
+        benchmark_id = str(uuid.uuid4())
+
+        if self.use_database:
+            with self.conn.cursor() as cur:
+                cur.execute(
+                    "INSERT INTO benchmarks (benchmark_id, repository, branch, commit_id, commit_message, metadata) VALUES (%s, %s, %s, %s, %s, %s)",
+                    (benchmark_id, self.repository, self.branch, self.commit_id, self.commit_msg, metadata),
+                )
+                self.logger.debug(f"initialised benchmark #{benchmark_id}")
+
+        # Store benchmark data for CSV export (if enabled)
+        if self.collect_csv_data:
+            # Add row to pandas DataFrame
+            new_row = pd.DataFrame(
+                [
+                    {
+                        "benchmark_id": benchmark_id,
+                        "repository": self.repository,
+                        "branch": self.branch,
+                        "commit_id": self.commit_id,
+                        "commit_message": self.commit_msg,
+                        "metadata": json.dumps(metadata),
+                        "created_at": datetime.utcnow().isoformat(),
+                    }
+                ]
+            )
+            self.benchmarks_df = pd.concat([self.benchmarks_df, new_row], ignore_index=True)
+
+        mode_info = []
+        if self.use_database:
+            mode_info.append("database")
+        if self.collect_csv_data:
+            mode_info.append("CSV")
+        mode_str = " + ".join(mode_info) if mode_info else "no storage"
+
+        self.logger.debug(f"initialised benchmark #{benchmark_id} ({mode_str} mode)")
+        return benchmark_id
+
+    def collect_device_measurements(self, benchmark_id: str, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes):
+        """
+        Collect device metrics, such as CPU & GPU usage. These are "static", as in you cannot pass arbitrary arguments to the function.
+        """
+        # Store device measurements for CSV export (if enabled)
+        if self.collect_csv_data:
+            # Add row to pandas DataFrame
+            new_row = pd.DataFrame(
+                [
+                    {
+                        "benchmark_id": benchmark_id,
+                        "cpu_util": cpu_util,
+                        "mem_megabytes": mem_megabytes,
+                        "gpu_util": gpu_util,
+                        "gpu_mem_megabytes": gpu_mem_megabytes,
+                        "time": datetime.utcnow().isoformat(),
+                    }
+                ]
+            )
+            self.device_measurements_df = pd.concat([self.device_measurements_df, new_row], ignore_index=True)
+
+        # Store in database if available
+        if self.use_database:
+            with self.conn.cursor() as cur:
+                cur.execute(
+                    "INSERT INTO device_measurements (benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes) VALUES (%s, %s, %s, %s, %s)",
+                    (benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes),
+                )
+
+        self.logger.debug(
+            f"collected device measurements for benchmark #{benchmark_id} [CPU util: {cpu_util}, mem MBs: {mem_megabytes}, GPU util: {gpu_util}, GPU mem MBs: {gpu_mem_megabytes}]"
+        )
+
+    def collect_model_measurements(self, benchmark_id: str, measurements: dict[str, float]):
+        # Store model measurements for CSV export (if enabled)
+        if self.collect_csv_data:
+            # Add row to pandas DataFrame with flattened measurements
+            row_data = {"benchmark_id": benchmark_id, "time": datetime.utcnow().isoformat()}
+            # Flatten the measurements dict into the row
+            row_data.update(measurements)
+
+            new_row = pd.DataFrame([row_data])
+            self.model_measurements_df = pd.concat([self.model_measurements_df, new_row], ignore_index=True)
+
+        # Store in database if available
+        if self.use_database:
+            with self.conn.cursor() as cur:
+                cur.execute(
+                    """
+                    INSERT INTO model_measurements (
+                        benchmark_id,
+                        measurements
+                    ) VALUES (%s, %s)
+                    """,
+                    (
+                        benchmark_id,
+                        measurements,
+                    ),
+                )
+
+        self.logger.debug(f"collected model measurements for benchmark #{benchmark_id}: {measurements}")
+
+    def export_to_csv(self, output_dir: str = "benchmark_results"):
+        """
+        Export all collected data to CSV files using pandas DataFrames
+        """
+        if not self.collect_csv_data:
+            self.logger.warning("CSV data collection is disabled - no CSV files will be generated")
+            return
+
+        if not os.path.exists(output_dir):
+            os.makedirs(output_dir)
+            self.logger.info(f"Created output directory: {output_dir}")
+
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        files_created = []
+
+        # Export using pandas DataFrames
+        self._export_pandas_data(output_dir, timestamp, files_created)
+
+        self.logger.info(f"CSV export complete! Created {len(files_created)} files in {output_dir}")
+
+    def _export_pandas_data(self, output_dir: str, timestamp: str, files_created: list):
+        """
+        Export CSV files using pandas DataFrames
+        """
+        # Export benchmarks
+        benchmarks_file = os.path.join(output_dir, f"benchmarks_{timestamp}.csv")
+        self.benchmarks_df.to_csv(benchmarks_file, index=False)
+        files_created.append(benchmarks_file)
+        self.logger.info(f"Exported {len(self.benchmarks_df)} benchmark records to {benchmarks_file}")
+
+        # Export device measurements
+        device_file = os.path.join(output_dir, f"device_measurements_{timestamp}.csv")
+        self.device_measurements_df.to_csv(device_file, index=False)
+        files_created.append(device_file)
+        self.logger.info(f"Exported {len(self.device_measurements_df)} device measurement records to {device_file}")
+
+        # Export model measurements (already flattened)
+        model_file = os.path.join(output_dir, f"model_measurements_{timestamp}.csv")
+        self.model_measurements_df.to_csv(model_file, index=False)
+        files_created.append(model_file)
+        self.logger.info(f"Exported {len(self.model_measurements_df)} model measurement records to {model_file}")
+
+        # Create comprehensive summary using pandas operations
+        summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.csv")
+        self._create_summary(summary_file)
+        files_created.append(summary_file)
+
+    def _create_summary(self, summary_file: str):
+        """
+        Create a comprehensive summary CSV using pandas operations
+        """
+        if len(self.benchmarks_df) == 0:
+            # Create empty summary file
+            summary_df = pd.DataFrame()
+            summary_df.to_csv(summary_file, index=False)
+            self.logger.info(f"Created empty benchmark summary at {summary_file}")
+            return
+
+        # Start with benchmarks as the base
+        summary_df = self.benchmarks_df.copy()
+
+        # Add model measurements (join on benchmark_id)
+        if len(self.model_measurements_df) > 0:
+            # Drop 'time' column from model measurements to avoid conflicts
+            model_df = self.model_measurements_df.drop(columns=["time"], errors="ignore")
+            summary_df = summary_df.merge(model_df, on="benchmark_id", how="left")
+
+        # Calculate device measurement aggregates using pandas groupby
+        if len(self.device_measurements_df) > 0:
+            device_agg = (
+                self.device_measurements_df.groupby("benchmark_id")
+                .agg(
+                    {
+                        "cpu_util": ["mean", "max", "std", "count"],
+                        "mem_megabytes": ["mean", "max", "std"],
+                        "gpu_util": ["mean", "max", "std"],
+                        "gpu_mem_megabytes": ["mean", "max", "std"],
+                    }
+                )
+                .round(3)
+            )
+
+            # Flatten column names
+            device_agg.columns = [f"{col[0]}_{col[1]}" for col in device_agg.columns]
+            device_agg = device_agg.reset_index()
+
+            # Rename count column to be more descriptive
+            if "cpu_util_count" in device_agg.columns:
+                device_agg = device_agg.rename(columns={"cpu_util_count": "device_measurement_count"})
+
+            # Merge with summary
+            summary_df = summary_df.merge(device_agg, on="benchmark_id", how="left")
+
+        # Export the comprehensive summary
+        summary_df.to_csv(summary_file, index=False)
+        self.logger.info(f"Created comprehensive benchmark summary with {len(summary_df)} records at {summary_file}")
+
+    def close(self):
+        if self.use_database and self.conn:
+            self.conn.close()
+
+
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)
+
+handler = logging.StreamHandler(sys.stdout)
+handler.setLevel(logging.INFO)
+formatter = logging.Formatter("[%(levelname)s - %(asctime)s] %(message)s")
+handler.setFormatter(formatter)
+logger.addHandler(handler)
+
+
+def parse_arguments() -> tuple[str, str, str, str, bool, str]:
+    """
+    Parse command line arguments for the benchmarking CLI.
+    """
+    parser = argparse.ArgumentParser(description="CLI for benchmarking the huggingface/transformers.")
+
+    parser.add_argument(
+        "repository",
+        type=str,
+        help="The repository name on which the benchmarking is performed.",
+    )
+
+    parser.add_argument(
+        "branch",
+        type=str,
+        help="The branch name on which the benchmarking is performed.",
+    )
+
+    parser.add_argument(
+        "commit_id",
+        type=str,
+        help="The commit hash on which the benchmarking is performed.",
+    )
+
+    parser.add_argument(
+        "commit_msg",
+        type=str,
+        help="The commit message associated with the commit, truncated to 70 characters.",
+    )
+
+    parser.add_argument("--csv", action="store_true", default=False, help="Enable CSV output files generation.")
+
+    parser.add_argument(
+        "--csv-output-dir",
+        type=str,
+        default="benchmark_results",
+        help="Directory for CSV output files (default: benchmark_results).",
+    )
+
+    args = parser.parse_args()
+
+    # CSV is disabled by default, only enabled when --csv is used
+    generate_csv = args.csv
+
+    return args.repository, args.branch, args.commit_id, args.commit_msg, generate_csv, args.csv_output_dir
+
+
+def import_from_path(module_name, file_path):
+    try:
+        spec = importlib.util.spec_from_file_location(module_name, file_path)
+        module = importlib.util.module_from_spec(spec)
+        sys.modules[module_name] = module
+        spec.loader.exec_module(module)
+        return module
+    except Exception as e:
+        raise ImportModuleException(f"failed to load python module: {e}")
+
+
+def create_database_connection():
+    """
+    Try to create a database connection. Returns None if connection fails.
+    """
+    if not PSYCOPG2_AVAILABLE:
+        logger.warning("psycopg2 not available - running in CSV-only mode")
+        return None
+
+    try:
+        import psycopg2
+
+        conn = psycopg2.connect("dbname=metrics")
+        logger.info("Successfully connected to database")
+        return conn
+    except Exception as e:
+        logger.warning(f"Failed to connect to database: {e}. Running in CSV-only mode")
+        return None
+
+
+def create_global_metrics_recorder(
+    repository: str, branch: str, commit_id: str, commit_msg: str, generate_csv: bool = False
+) -> MetricsRecorder:
+    """
+    Create a global metrics recorder that will be used across all benchmarks.
+    """
+    connection = create_database_connection()
+    recorder = MetricsRecorder(connection, logger, repository, branch, commit_id, commit_msg, generate_csv)
+
+    # Log the storage mode
+    storage_modes = []
+    if connection is not None:
+        storage_modes.append("database")
+    if generate_csv:
+        storage_modes.append("CSV")
+
+    if not storage_modes:
+        logger.warning("Running benchmarks with NO data storage (no database connection, CSV disabled)")
+        logger.warning("Use --csv flag to enable CSV output when database is unavailable")
+    else:
+        logger.info(f"Running benchmarks with: {' + '.join(storage_modes)} storage")
+
+    return recorder
+
+
+if __name__ == "__main__":
+    benchmarks_folder_path = os.path.dirname(os.path.realpath(__file__))
+    benches_folder_path = os.path.join(benchmarks_folder_path, "benches")
+
+    repository, branch, commit_id, commit_msg, generate_csv, csv_output_dir = parse_arguments()
+
+    # Create a global metrics recorder
+    global_metrics_recorder = create_global_metrics_recorder(repository, branch, commit_id, commit_msg, generate_csv)
+
+    successful_benchmarks = 0
+    failed_benchmarks = 0
+
+    # Automatically discover all benchmark modules in benches/ folder
+    benchmark_modules = []
+
+    if os.path.exists(benches_folder_path):
+        logger.debug(f"Scanning for benchmarks in: {benches_folder_path}")
+        for entry in os.scandir(benches_folder_path):
+            if not entry.name.endswith(".py"):
+                continue
+            if entry.name.startswith("__"):  # Skip __init__.py, __pycache__, etc.
+                continue
+
+            # Check if the file has a run_benchmark function
+            try:
+                logger.debug(f"checking if benches/{entry.name} has run_benchmark function")
+                module = import_from_path(entry.name.split(".")[0], entry.path)
+                if hasattr(module, "run_benchmark"):
+                    benchmark_modules.append(entry.name)
+                    logger.debug(f"discovered benchmark: {entry.name}")
+                else:
+                    logger.debug(f"skipping {entry.name} - no run_benchmark function found")
+            except Exception as e:
+                logger.debug(f"failed to check benches/{entry.name}: {e}")
+    else:
+        logger.warning(f"Benches directory not found: {benches_folder_path}")
+
+    if benchmark_modules:
+        logger.info(f"Discovered {len(benchmark_modules)} benchmark(s): {benchmark_modules}")
+    else:
+        logger.warning("No benchmark modules found in benches/ directory")
+
+    for module_name in benchmark_modules:
+        module_path = os.path.join(benches_folder_path, module_name)
+        try:
+            logger.debug(f"loading: {module_name}")
+            module = import_from_path(module_name.split(".")[0], module_path)
+            logger.info(f"running benchmarks in: {module_name}")
+
+            # Check if the module has an updated run_benchmark function that accepts metrics_recorder
+            try:
+                # Try the new signature first
+                module.run_benchmark(logger, repository, branch, commit_id, commit_msg, global_metrics_recorder)
+            except TypeError:
+                # Fall back to the old signature for backward compatibility
+                logger.warning(
+                    f"Module {module_name} using old run_benchmark signature - database connection will be created per module"
+                )
+                module.run_benchmark(logger, repository, branch, commit_id, commit_msg)
+
+            successful_benchmarks += 1
+        except ImportModuleException as e:
+            logger.error(e)
+            failed_benchmarks += 1
+        except Exception as e:
+            logger.error(f"error running benchmarks for {module_name}: {e}")
+            failed_benchmarks += 1
+
+    # Export CSV results at the end (if enabled)
+    try:
+        if generate_csv:
+            global_metrics_recorder.export_to_csv(csv_output_dir)
+            logger.info(f"CSV reports have been generated and saved to the {csv_output_dir} directory")
+        else:
+            logger.info("CSV generation disabled - no CSV files created (use --csv to enable)")
+
+        logger.info(f"Benchmark run completed. Successful: {successful_benchmarks}, Failed: {failed_benchmarks}")
+    except Exception as e:
+        logger.error(f"Failed to export CSV results: {e}")
+    finally:
+        global_metrics_recorder.close()
--- a/benchmark/config/generation.yaml
+++ b/benchmark/config/generation.yaml
@@ -0,0 +1,57 @@
+defaults:
+  - benchmark # inheriting benchmark schema
+  - scenario: inference
+  - launcher: process
+  - backend: pytorch
+  - _self_ # for hydra 1.1 compatibility
+
+name: pytorch_generate
+
+launcher:
+  start_method: spawn
+  device_isolation: true
+  device_isolation_action: warn
+
+backend:
+  device: cuda
+  device_ids: 0
+  no_weights: true
+  model: meta-llama/Llama-2-7b-hf
+  cache_implementation: static
+  torch_compile: true
+  dtype: float16
+  torch_compile_config:
+    backend: inductor
+    mode: reduce-overhead
+    fullgraph: true
+
+scenario:
+  input_shapes:
+    batch_size: 1
+    sequence_length: 7
+  generate_kwargs:
+    max_new_tokens: 128
+    min_new_tokens: 128
+    do_sample: false
+  memory: true
+  latency: true
+  iterations: 2
+  duration: 0
+
+
+# hydra/cli specific settings
+hydra:
+  run:
+    # where to store run results
+    dir: runs/${name}
+  job:
+    # change working directory to the run directory
+    chdir: true
+    env_set:
+      # set environment variable OVERRIDE_BENCHMARKS to 1
+      # to not skip benchmarks that have been run before
+      OVERRIDE_BENCHMARKS: 1
+      LOG_LEVEL: WARN
+  sweep:
+    dir: multirun
+    subdir: ${hydra.job.override_dirname}
--- a/benchmark/default.yml
+++ b/benchmark/default.yml
@@ -0,0 +1,10 @@
+apiVersion: 1
+
+providers:
+  - name: 'Transformers Benchmarks'
+    orgId: 1
+    type: file
+    updateIntervalSeconds: 10
+    allowUiUpdates: true
+    options:
+      path: /etc/grafana/dashboards
--- a/benchmark/grafana_dashboard.json
+++ b/benchmark/grafana_dashboard.json
--- a/benchmark/grafana_datasource.yaml
+++ b/benchmark/grafana_datasource.yaml
@@ -0,0 +1,17 @@
+apiVersion: 1
+datasources:
+  - name: grafana-postgresql-datasource
+    uid: be28nkzirtb0gd
+    type: postgres
+    url: $GRAFANA_POSTGRES_DATASOURCE_URL
+    user: $GRAFANA_POSTGRES_DATASOURCE_USER
+    secureJsonData:
+      password: $GRAFANA_POSTGRES_DATASOURCE_PWD
+    jsonData:
+      database: metrics
+      maxOpenConns: 100
+      maxIdleConns: 100
+      maxIdleConnsAuto: true
+      connMaxLifetime: 14400
+      postgresVersion: 1000
+      timescaledb: false
--- a/Show More
+++ b/Show More
				`@@ -0,0 +1 @@`
				`$PYTHON setup.py install # Python command to install the script.`