This model was published in HF papers on 2020-07-28 and contributed to Hugging Face Transformers on 2021-03-30.

PyTorch

BigBird

BigBird is a transformer model built to handle sequence lengths of up to 4096 tokens, compared to 512 for BERT. Traditional transformers struggle with long inputs because the cost of full self-attention grows quadratically with sequence length. BigBird addresses this with a sparse attention mechanism, so each token doesn't attend to every other token. Instead, it combines local attention, random attention, and a few global tokens to cover the whole input. This combination keeps the computation efficient while still capturing enough of the sequence to model it well. As a result, BigBird is well suited to tasks involving long documents, such as question answering, summarization, and genomic applications.
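
To make the mix of local, random, and global attention more concrete, here is a minimal pure-Python sketch of a block-level attention pattern. It is an illustration under stated assumptions, not the library's implementation: the function name `block_sparse_pattern`, the choice of the first blocks as global blocks, and the default values are all hypothetical.

```python
import random


def block_sparse_pattern(num_blocks, window=3, num_global=2, num_random=3, seed=0):
    """Return, for each query block, the sorted list of key blocks it attends to.

    Illustrative only: the layout below is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    global_blocks = set(range(num_global))  # assumption: the first blocks are global
    half = window // 2
    pattern = []
    for q in range(num_blocks):
        if q in global_blocks:
            # Global blocks attend to every block.
            pattern.append(list(range(num_blocks)))
            continue
        # Local (sliding window) attention: the block itself and its neighbours.
        keys = {k for k in range(q - half, q + half + 1) if 0 <= k < num_blocks}
        # Every block also attends to the global blocks.
        keys |= global_blocks
        # Random attention: a few extra blocks that are not already covered.
        candidates = [k for k in range(num_blocks) if k not in keys]
        keys |= set(rng.sample(candidates, min(num_random, len(candidates))))
        pattern.append(sorted(keys))
    return pattern


num_blocks = 64  # e.g. 4096 tokens split into hypothetical blocks of 64 tokens
pattern = block_sparse_pattern(num_blocks)
sparse_pairs = sum(len(keys) for keys in pattern)
full_pairs = num_blocks * num_blocks
print(f"block pairs scored: {sparse_pairs} sparse vs {full_pairs} full")
```

In this sketch every non-global block scores only a handful of key blocks regardless of how long the input is, which is why the cost grows roughly linearly with the number of blocks instead of quadratically.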

You can find all the original BigBird checkpoints under the Google organization.

Tip

Click on the BigBird models in the right sidebar for more examples of how to apply BigBird to different language tasks.

The examples below demonstrate how to predict the [MASK] token with [Pipeline] and [AutoModel].

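from transformers import pipeline


# device=0 assumes a GPU at index 0; omit the argument to run on CPU.
# The result is bound to a new name so the imported pipeline() function is not shadowed.
fill_mask = pipeline(
    task="fill-mask",
    model="google/bigbird-roberta-base",
    device=0
)
fill_mask("Plants create [MASK] through a process known as photosynthesis.")

import torch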

from transformers import AutoModelForMaskedLM, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained(
    "google/bigbird-roberta-base",
)
model = AutoModelForMaskedLM.from_pretrained(
    "google/bigbird-roberta-base",
    device_map="auto",
)
inputs = tokenizer("Plants create [MASK] through a process known as photosynthesis.", return_tensors="pt").to(model.device)

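# Run a forward pass without tracking gradients (inference only).
with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Locate the [MASK] position and take the highest-scoring vocabulary id there.
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)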

print(f"The predicted token is: {predicted_token}")

Notes

  • Inputs should be padded on the right because BigBird uses absolute position embeddings.
  • BigBird supports original_full and block_sparse attention. If the input sequence length is less than 1024, use original_full, since sparse patterns offer little benefit for shorter inputs.
  • The current implementation uses a window size of 3 blocks and 2 global blocks, supports only the ITC implementation, and doesn't support num_random_blocks=0.
  • The sequence length must be divisible by the block size.
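
The notes above can be sketched as two small helpers: one that picks an attention type from the sequence length, and one that right-pads a list of token ids to a multiple of the block size. Both helper names are hypothetical, and `block_size=64` and `pad_id=0` are illustrative assumptions; in practice, read these values from the model config and tokenizer.

```python
def choose_attention_type(seq_len, threshold=1024):
    """Pick an attention type using the rule of thumb from the notes."""
    return "original_full" if seq_len < threshold else "block_sparse"


def pad_right_to_block_multiple(token_ids, block_size=64, pad_id=0):
    """Right-pad token ids so that the length is divisible by block_size.

    block_size=64 and pad_id=0 are assumptions made for this sketch.
    """
    remainder = len(token_ids) % block_size
    if remainder == 0:
        return list(token_ids)
    # Padding goes on the right because position embeddings are absolute.
    return list(token_ids) + [pad_id] * (block_size - remainder)


token_ids = list(range(1, 1501))  # stand-in for 1500 real token ids
padded = pad_right_to_block_multiple(token_ids)
print(len(padded), choose_attention_type(len(padded)))
```

The string returned by `choose_attention_type` is meant to be passed as the `attention_type` argument when loading the model, for example `AutoModelForMaskedLM.from_pretrained("google/bigbird-roberta-base", attention_type="original_full")`.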

Resources

  • Read the BigBird blog post for more details about how its attention works.

BigBirdConfig

autodoc BigBirdConfig

BigBirdTokenizer

autodoc BigBirdTokenizer - get_special_tokens_mask - save_vocabulary

BigBirdTokenizerFast

autodoc BigBirdTokenizerFast

BigBird specific outputs

autodoc models.big_bird.modeling_big_bird.BigBirdForPreTrainingOutput

BigBirdModel

autodoc BigBirdModel - forward

BigBirdForPreTraining

autodoc BigBirdForPreTraining - forward

BigBirdForCausalLM

autodoc BigBirdForCausalLM - forward

BigBirdForMaskedLM

autodoc BigBirdForMaskedLM - forward

BigBirdForSequenceClassification

autodoc BigBirdForSequenceClassification - forward

BigBirdForMultipleChoice

autodoc BigBirdForMultipleChoice - forward

BigBirdForTokenClassification

autodoc BigBirdForTokenClassification - forward

BigBirdForQuestionAnswering

autodoc BigBirdForQuestionAnswering - forward