gavin/transformers

Fork 0

Files

陈赣 06f1fd69a6

Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled

Details

Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled

Details

Build documentation / build (push) Has been cancelled

Details

Build documentation / build_other_lang (push) Has been cancelled

Details

CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled

Details

New model PR merged notification / Notify new model (push) Has been cancelled

Details

PR CI / pr-ci (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled

Details

Secret Leaks / trufflehog (push) Has been cancelled

Details

Update Transformers metadata / build_and_package (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled

Details

Check Tiny Models / Check tiny models (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Setup (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Model CI (push) Has been cancelled

Details

Nvidia CI / Setup (push) Has been cancelled

Details

Nvidia CI / Model CI (push) Has been cancelled

Details

Nvidia CI / Torch pipeline CI (push) Has been cancelled

Details

Nvidia CI / Example CI (push) Has been cancelled

Details

Nvidia CI / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI / DeepSpeed CI (push) Has been cancelled

Details

Nvidia CI / Quantization CI (push) Has been cancelled

Details

Nvidia CI / Kernels CI (push) Has been cancelled

Details

Doctests / Setup (push) Has been cancelled

Details

Doctests / Call doctest jobs (push) Has been cancelled

Details

Doctests / Send results to webhook (push) Has been cancelled

Details

Extras Smoke Test / Get supported Python versions (push) Has been cancelled

Details

Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled

Details

Extras Smoke Test / Check Slack token availability (push) Has been cancelled

Details

Extras Smoke Test / Notify failures to Slack (push) Has been cancelled

Details

Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled

Details

Stale Bot / Close Stale Issues (push) Has been cancelled

Details

first commit

2026-06-05 16:53:03 +08:00

7.4 KiB

Raw Blame History

This model was published in HF papers on 2022-04-18 and contributed to Hugging Face Transformers on 2022-05-24.

LayoutLMv3

Overview

The LayoutLMv3 model was proposed in LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).

The abstract from the paper is the following:

Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis.

LayoutLMv3 architecture. Taken from the original paper.

This model was contributed by nielsr. The original code can be found here.

Usage tips

In terms of data processing, LayoutLMv3 is identical to its predecessor LayoutLMv2, except that:
- images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece. Due to these differences in data preprocessing, one can use [LayoutLMv3Processor] which internally combines a [LayoutLMv3ImageProcessor] (for the image modality) and a [LayoutLMv3Tokenizer]/[LayoutLMv3TokenizerFast] (for the text modality) to prepare all data for the model.
Regarding usage of [LayoutLMv3Processor], we refer to the usage guide of its predecessor.

Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with LayoutLMv3. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

LayoutLMv3 is nearly identical to LayoutLMv2, so we've also included LayoutLMv2 resources you can adapt for LayoutLMv3 tasks. For these notebooks, take care to use [LayoutLMv2Processor] instead when preparing data for the model!

Demo notebooks for LayoutLMv3 can be found here.
Demo scripts can be found here.

[LayoutLMv2ForSequenceClassification] is supported by this notebook.
Text classification task guide

[LayoutLMv3ForTokenClassification] is supported by this example script and notebook.
A notebook for how to perform inference with [LayoutLMv2ForTokenClassification] and a notebook for how to perform inference when no labels are available with [LayoutLMv2ForTokenClassification].
A notebook for how to finetune [LayoutLMv2ForTokenClassification] with the 🤗 Trainer.
Token classification task guide

[LayoutLMv2ForQuestionAnswering] is supported by this notebook.
Question answering task guide

Document question answering

Document question answering task guide

LayoutLMv3Config

autodoc LayoutLMv3Config

LayoutLMv3ImageProcessor

autodoc LayoutLMv3ImageProcessor - preprocess

LayoutLMv3ImageProcessorPil

autodoc LayoutLMv3ImageProcessorPil - preprocess

LayoutLMv3Tokenizer

autodoc LayoutLMv3Tokenizer - call - save_vocabulary

LayoutLMv3TokenizerFast

autodoc LayoutLMv3TokenizerFast - call

LayoutLMv3Processor

autodoc LayoutLMv3Processor - call

LayoutLMv3Model

autodoc LayoutLMv3Model - forward

LayoutLMv3ForSequenceClassification

autodoc LayoutLMv3ForSequenceClassification - forward

LayoutLMv3ForTokenClassification

autodoc LayoutLMv3ForTokenClassification - forward

LayoutLMv3ForQuestionAnswering

autodoc LayoutLMv3ForQuestionAnswering - forward

7.4 KiB Raw Blame History

LayoutLMv3

Overview

Usage tips

Resources

LayoutLMv3Config

LayoutLMv3ImageProcessor

LayoutLMv3ImageProcessorPil

LayoutLMv3Tokenizer

LayoutLMv3TokenizerFast

LayoutLMv3Processor

LayoutLMv3Model

LayoutLMv3ForSequenceClassification

LayoutLMv3ForTokenClassification

LayoutLMv3ForQuestionAnswering

7.4 KiB

Raw Blame History