gavin/transformers

Fork 0

Files

陈赣 06f1fd69a6

Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled

Details

Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled

Details

Build documentation / build (push) Has been cancelled

Details

Build documentation / build_other_lang (push) Has been cancelled

Details

CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled

Details

New model PR merged notification / Notify new model (push) Has been cancelled

Details

PR CI / pr-ci (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled

Details

Secret Leaks / trufflehog (push) Has been cancelled

Details

Update Transformers metadata / build_and_package (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled

Details

Check Tiny Models / Check tiny models (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Setup (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Model CI (push) Has been cancelled

Details

Nvidia CI / Setup (push) Has been cancelled

Details

Nvidia CI / Model CI (push) Has been cancelled

Details

Nvidia CI / Torch pipeline CI (push) Has been cancelled

Details

Nvidia CI / Example CI (push) Has been cancelled

Details

Nvidia CI / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI / DeepSpeed CI (push) Has been cancelled

Details

Nvidia CI / Quantization CI (push) Has been cancelled

Details

Nvidia CI / Kernels CI (push) Has been cancelled

Details

Doctests / Setup (push) Has been cancelled

Details

Doctests / Call doctest jobs (push) Has been cancelled

Details

Doctests / Send results to webhook (push) Has been cancelled

Details

Extras Smoke Test / Get supported Python versions (push) Has been cancelled

Details

Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled

Details

Extras Smoke Test / Check Slack token availability (push) Has been cancelled

Details

Extras Smoke Test / Notify failures to Slack (push) Has been cancelled

Details

Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled

Details

Stale Bot / Close Stale Issues (push) Has been cancelled

Details

first commit

2026-06-05 16:53:03 +08:00

9.3 KiB

Raw Permalink Blame History

This model was published in HF papers on 2020-10-02 and contributed to Hugging Face Transformers on 2021-05-03.

LUKE

Overview

The LUKE model was proposed in LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda and Yuji Matsumoto. It is based on RoBERTa and adds entity embeddings as well as an entity-aware self-attention mechanism, which helps improve performance on various downstream tasks involving reasoning about entities such as named entity recognition, extractive and cloze-style question answering, entity typing, and relation classification.

The abstract from the paper is the following:

Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering).

This model was contributed by ikuyamada and nielsr. The original code can be found here.

Usage tips

This implementation is the same as [RobertaModel] with the addition of entity embeddings as well as an entity-aware self-attention mechanism, which improves performance on tasks involving reasoning about entities.
LUKE treats entities as input tokens; therefore, it takes entity_ids, entity_attention_mask, entity_token_type_ids and entity_position_ids as extra input. You can obtain those using [LukeTokenizer].
[LukeTokenizer] takes entities and entity_spans (character-based start and end positions of the entities in the input text) as extra input. entities typically consist of [MASK] entities or Wikipedia entities. The brief description when inputting these entities are as follows:
- Inputting [MASK] entities to compute entity representations: The [MASK] entity is used to mask entities to be predicted during pretraining. When LUKE receives the [MASK] entity, it tries to predict the original entity by gathering the information about the entity from the input text. Therefore, the [MASK] entity can be used to address downstream tasks requiring the information of entities in text such as entity typing, relation classification, and named entity recognition.
- Inputting Wikipedia entities to compute knowledge-enhanced token representations: LUKE learns rich information (or knowledge) about Wikipedia entities during pretraining and stores the information in its entity embedding. By using Wikipedia entities as input tokens, LUKE outputs token representations enriched by the information stored in the embeddings of these entities. This is particularly effective for tasks requiring real-world knowledge, such as question answering.
There are three head models for the former use case:
- [LukeForEntityClassification], for tasks to classify a single entity in an input text such as entity typing, e.g. the Open Entity dataset. This model places a linear head on top of the output entity representation.
- [LukeForEntityPairClassification], for tasks to classify the relationship between two entities such as relation classification, e.g. the TACRED dataset. This model places a linear head on top of the concatenated output representation of the pair of given entities.
- [LukeForEntitySpanClassification], for tasks to classify the sequence of entity spans, such as named entity recognition (NER). This model places a linear head on top of the output entity representations. You can address NER using this model by inputting all possible entity spans in the text to the model.
[LukeTokenizer] has a task argument, which enables you to easily create an input to these head models by specifying task="entity_classification", task="entity_pair_classification", or task="entity_span_classification". Please refer to the example code of each head models.

Usage example:

from transformers import LukeForEntityPairClassification, LukeModel, LukeTokenizer


model = LukeModel.from_pretrained("studio-ousia/luke-base", device_map="auto")
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
# Example 1: Computing the contextualized entity representation corresponding to the entity mention "Beyoncé"

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7)]  # character-based entity span corresponding to "Beyoncé"
inputs = tokenizer(text, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt").to(model.device)
outputs = model(**inputs)
word_last_hidden_state = outputs.last_hidden_state
entity_last_hidden_state = outputs.entity_last_hidden_state
# Example 2: Inputting Wikipedia entities to obtain enriched contextualized representations

entities = [
    "Beyoncé",
    "Los Angeles",
]  # Wikipedia entity titles corresponding to the entity mentions "Beyoncé" and "Los Angeles"
entity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
inputs = tokenizer(text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt").to(model.device)
outputs = model(**inputs)
word_last_hidden_state = outputs.last_hidden_state
entity_last_hidden_state = outputs.entity_last_hidden_state
# Example 3: Classifying the relationship between two entities using LukeForEntityPairClassification head model

model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred", device_map="auto")
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
entity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt").to(model.device)
outputs = model(**inputs)
logits = outputs.logits
predicted_class_idx = int(logits[0].argmax())
print("Predicted class:", model.config.id2label[predicted_class_idx])

Resources

LukeConfig

autodoc LukeConfig

LukeTokenizer

autodoc LukeTokenizer - call - save_vocabulary

LukeModel

autodoc LukeModel - forward

LukeForMaskedLM

autodoc LukeForMaskedLM - forward

LukeForEntityClassification

autodoc LukeForEntityClassification - forward

LukeForEntityPairClassification

autodoc LukeForEntityPairClassification - forward

LukeForEntitySpanClassification

autodoc LukeForEntitySpanClassification - forward

LukeForSequenceClassification

autodoc LukeForSequenceClassification - forward

LukeForMultipleChoice

autodoc LukeForMultipleChoice - forward

LukeForTokenClassification

autodoc LukeForTokenClassification - forward

LukeForQuestionAnswering

autodoc LukeForQuestionAnswering - forward

9.3 KiB Raw Permalink Blame History

LUKE

Overview

Usage tips

Resources

LukeConfig

LukeTokenizer

LukeModel

LukeForMaskedLM

LukeForEntityClassification

LukeForEntityPairClassification

LukeForEntitySpanClassification

LukeForSequenceClassification

LukeForMultipleChoice

LukeForTokenClassification

LukeForQuestionAnswering

9.3 KiB

Raw Permalink Blame History