Files
transformers/docs/source/en/troubleshooting.md
陈赣 06f1fd69a6
Some checks failed
Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled
Build documentation / build (push) Has been cancelled
Build documentation / build_other_lang (push) Has been cancelled
CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled
New model PR merged notification / Notify new model (push) Has been cancelled
PR CI / pr-ci (push) Has been cancelled
Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
Update Transformers metadata / build_and_package (push) Has been cancelled
Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled
Check Tiny Models / Check tiny models (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI - Flash Attn / Setup (push) Has been cancelled
Nvidia CI - Flash Attn / Model CI (push) Has been cancelled
Nvidia CI / Setup (push) Has been cancelled
Nvidia CI / Model CI (push) Has been cancelled
Nvidia CI / Torch pipeline CI (push) Has been cancelled
Nvidia CI / Example CI (push) Has been cancelled
Nvidia CI / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI / DeepSpeed CI (push) Has been cancelled
Nvidia CI / Quantization CI (push) Has been cancelled
Nvidia CI / Kernels CI (push) Has been cancelled
Doctests / Setup (push) Has been cancelled
Doctests / Call doctest jobs (push) Has been cancelled
Doctests / Send results to webhook (push) Has been cancelled
Extras Smoke Test / Get supported Python versions (push) Has been cancelled
Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled
Extras Smoke Test / Check Slack token availability (push) Has been cancelled
Extras Smoke Test / Notify failures to Slack (push) Has been cancelled
Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled
Stale Bot / Close Stale Issues (push) Has been cancelled
first commit
2026-06-05 16:53:03 +08:00

7.7 KiB

Troubleshoot

Sometimes errors occur, but we are here to help! This guide covers some of the most common issues we've seen and how you can resolve them. However, this guide isn't meant to be a comprehensive collection of every 🤗 Transformers issue. For more help with troubleshooting your issue, try:

  1. Asking for help on the forums. There are specific categories you can post your question to, like Beginners or 🤗 Transformers. Make sure you write a good descriptive forum post with some reproducible code to maximize the likelihood that your problem is solved!
  1. Create an Issue on the 🤗 Transformers repository if it is a bug related to the library. Try to include as much information describing the bug as possible to help us better figure out what's wrong and how we can fix it.

  2. Check the Migration guide if you use an older version of 🤗 Transformers since some important changes have been introduced between versions.

For more details about troubleshooting and getting help, take a look at Chapter 8 of the Hugging Face course.

Firewalled environments

Some GPU instances on cloud and intranet setups are firewalled to external connections, resulting in a connection error. When your script attempts to download model weights or datasets, the download will hang and then timeout with the following message:

ValueError: Connection error, and we cannot find the requested files in the cached path.
Please try again or make sure your Internet connection is on.

In this case, you should try to run 🤗 Transformers on offline mode to avoid the connection error.

CUDA out of memory

Training large models with millions of parameters can be challenging without the appropriate hardware. A common error you may encounter when the GPU runs out of memory is:

CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.17 GiB total capacity; 9.70 GiB already allocated; 179.81 MiB free; 9.85 GiB reserved in total by PyTorch)

Here are some potential solutions you can try to lessen memory use:

Refer to the Performance guide for more details about memory-saving techniques.

ImportError

Another common error you may encounter, especially if it is a newly released model, is ImportError:

ImportError: cannot import name 'ImageGPTImageProcessor' from 'transformers' (unknown location)

For these error types, check to make sure you have the latest version of 🤗 Transformers installed to access the most recent models:

pip install transformers --upgrade

CUDA error: device-side assert triggered

Sometimes you may run into a generic CUDA error about an error in the device code.

RuntimeError: CUDA error: device-side assert triggered

You should try to run the code on a CPU first to get a more descriptive error message. Add the following environment variable to the beginning of your code to switch to a CPU:

>>> import os

>>> os.environ["CUDA_VISIBLE_DEVICES"] = ""

Another option is to get a better traceback from the GPU. Add the following environment variable to the beginning of your code to get the traceback to point to the source of the error:

>>> import os

>>> os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

Incorrect output when padding tokens aren't masked

In some cases, the output hidden_state may be incorrect if the input_ids include padding tokens. To demonstrate, load a model and tokenizer. You can access a model's pad_token_id to see its value. The pad_token_id may be None for some models, but you can always manually set it.

>>> from transformers import AutoModelForSequenceClassification
>>> import torch

>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
>>> model.config.pad_token_id
0

The following example shows the output without masking the padding tokens:

>>> input_ids = torch.tensor([[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]])
>>> output = model(input_ids)
>>> print(output.logits)
tensor([[ 0.0082, -0.2307],
        [ 0.1317, -0.1683]], grad_fn=<AddmmBackward0>)

Here is the actual output of the second sequence:

>>> input_ids = torch.tensor([[7592]])
>>> output = model(input_ids)
>>> print(output.logits)
tensor([[-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)

Most of the time, you should provide an attention_mask to your model to ignore the padding tokens to avoid this silent error. Now the output of the second sequence matches its actual output:

By default, the tokenizer creates an attention_mask for you based on your specific tokenizer's defaults.

>>> attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]])
>>> output = model(input_ids, attention_mask=attention_mask)
>>> print(output.logits)
tensor([[ 0.0082, -0.2307],
        [-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)

🤗 Transformers doesn't automatically create an attention_mask to mask a padding token if it is provided because:

  • Some models don't have a padding token.
  • For some use-cases, users want a model to attend to a padding token.

ValueError: Unrecognized configuration class XYZ for this kind of AutoModel

Generally, we recommend using the [AutoModel] class to load pretrained instances of models. This class can automatically infer and load the correct architecture from a given checkpoint based on the configuration. If you see this ValueError when loading a model from a checkpoint, this means the Auto class couldn't find a mapping from the configuration in the given checkpoint to the kind of model you are trying to load. Most commonly, this happens when a checkpoint doesn't support a given task. For instance, you'll see this error in the following example because there is no GPT2 for question answering:

>>> from transformers import AutoProcessor, AutoModelForQuestionAnswering

>>> processor = AutoProcessor.from_pretrained("openai-community/gpt2-medium")
>>> model = AutoModelForQuestionAnswering.from_pretrained("openai-community/gpt2-medium")
ValueError: Unrecognized configuration class <class 'transformers.models.gpt2.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForQuestionAnswering.
Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, ...