gavin/transformers

Fork 0

Files

陈赣 06f1fd69a6

Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled

Details

Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled

Details

Build documentation / build (push) Has been cancelled

Details

Build documentation / build_other_lang (push) Has been cancelled

Details

CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled

Details

New model PR merged notification / Notify new model (push) Has been cancelled

Details

PR CI / pr-ci (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled

Details

Secret Leaks / trufflehog (push) Has been cancelled

Details

Update Transformers metadata / build_and_package (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled

Details

Check Tiny Models / Check tiny models (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Setup (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Model CI (push) Has been cancelled

Details

Nvidia CI / Setup (push) Has been cancelled

Details

Nvidia CI / Model CI (push) Has been cancelled

Details

Nvidia CI / Torch pipeline CI (push) Has been cancelled

Details

Nvidia CI / Example CI (push) Has been cancelled

Details

Nvidia CI / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI / DeepSpeed CI (push) Has been cancelled

Details

Nvidia CI / Quantization CI (push) Has been cancelled

Details

Nvidia CI / Kernels CI (push) Has been cancelled

Details

Doctests / Setup (push) Has been cancelled

Details

Doctests / Call doctest jobs (push) Has been cancelled

Details

Doctests / Send results to webhook (push) Has been cancelled

Details

Extras Smoke Test / Get supported Python versions (push) Has been cancelled

Details

Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled

Details

Extras Smoke Test / Check Slack token availability (push) Has been cancelled

Details

Extras Smoke Test / Notify failures to Slack (push) Has been cancelled

Details

Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled

Details

Stale Bot / Close Stale Issues (push) Has been cancelled

Details

first commit

2026-06-05 16:53:03 +08:00

7.6 KiB

Raw Blame History

Processors

Transformers ライブラリでは、プロセッサは 2 つの異なる意味を持ちます。

Wav2Vec2 などのマルチモーダルモデルの入力を前処理するオブジェクト (音声とテキスト) または CLIP (テキストとビジョン)
古いバージョンのライブラリで GLUE または SQUAD のデータを前処理するために使用されていたオブジェクトは非推奨になりました。

マルチモーダルモデルでは、オブジェクトが複数のモダリティ (テキスト、視覚と音声）。これは、2 つ以上の処理オブジェクトをグループ化するプロセッサーと呼ばれるオブジェクトによって処理されます。トークナイザー (テキストモダリティ用)、画像プロセッサー (視覚用)、特徴抽出器 (オーディオ用) など。

これらのプロセッサは、保存およびロード機能を実装する次の基本クラスを継承します。

autodoc ProcessorMixin

Deprecated processors

すべてのプロセッサは、同じアーキテクチャに従っています。 [~data.processors.utils.DataProcessor]。プロセッサは次のリストを返します。 [~data.processors.utils.InputExample]。これら [~data.processors.utils.InputExample] は次のように変換できます。 [~data.processors.utils.Input features] をモデルにフィードします。

autodoc data.processors.utils.DataProcessor

autodoc data.processors.utils.InputExample

autodoc data.processors.utils.InputFeatures

GLUE

一般言語理解評価 (GLUE) は、既存の NLU タスクの多様なセットにわたるモデルのパフォーマンス。紙と同時発売された GLUE: A 自然言語理解のためのマルチタスクベンチマークおよび分析プラットフォーム

このライブラリは、MRPC、MNLI、MNLI (不一致)、CoLA、SST2、STSB、 QQP、QNLI、RTE、WNLI。

それらのプロセッサは次のとおりです。

[~data.processors.utils.MrpcProcessor]
[~data.processors.utils.MnliProcessor]
[~data.processors.utils.MnliMismatchedProcessor]
[~data.processors.utils.Sst2Processor]
[~data.processors.utils.StsbProcessor]
[~data.processors.utils.QqpProcessor]
[~data.processors.utils.QnliProcessor]
[~data.processors.utils.RteProcessor]
[~data.processors.utils.WnliProcessor]

さらに、次のメソッドを使用して、データファイルから値をロードし、それらをリストに変換することができます。 [~data.processors.utils.InputExample]。

autodoc data.processors.glue.glue_convert_examples_to_features

XNLI

クロスリンガル NLI コーパス (XNLI) は、言語を超えたテキスト表現の品質。 XNLI は、MultiNLI に基づくクラウドソースのデータセットです。テキストのペアには、15 個のテキスト含意アノテーションがラベル付けされています。さまざまな言語 (英語などの高リソース言語とスワヒリ語などの低リソース言語の両方を含む)。

論文 XNLI: Evaluating Cross-lingual Sentence Representations と同時にリリースされました。

このライブラリは、XNLI データをロードするプロセッサをホストします。

[~data.processors.utils.XnliProcessor]

テストセットにはゴールドラベルが付いているため、評価はテストセットで行われますのでご了承ください。

これらのプロセッサを使用する例は、run_xnli.py スクリプトに示されています。

SQuAD

The Stanford Question Answering Dataset (SQuAD) は、次のベンチマークです。質問応答に関するモデルのパフォーマンスを評価します。 v1.1 と v2.0 の 2 つのバージョンが利用可能です。最初のバージョン (v1.1) は、論文 SQuAD: 100,000+ question for Machine Comprehension of Text とともにリリースされました。 2 番目のバージョン (v2.0) は、論文 Know What You Don't と同時にリリースされました。知っておくべき: SQuAD の答えられない質問。

このライブラリは、次の 2 つのバージョンのそれぞれのプロセッサをホストします。

Processors

それらのプロセッサは次のとおりです。

[~data.processors.utils.SquadV1Processor]
[~data.processors.utils.SquadV2Processor]

どちらも抽象クラス [~data.processors.utils.SquadProcessor] を継承しています。

autodoc data.processors.squad.SquadProcessor - all

さらに、次のメソッドを使用して、SQuAD の例を次の形式に変換できます。モデルの入力として使用できる [~data.processors.utils.SquadFeatures]。

autodoc data.processors.squad.squad_convert_examples_to_features

これらのプロセッサと前述の方法は、データを含むファイルだけでなく、 tensorflow_datasets パッケージ。以下に例を示します。

Example usage

以下にプロセッサを使用した例と、データファイルを使用した変換方法を示します。

# Loading a V2 processor
processor = SquadV2Processor()
examples = processor.get_dev_examples(squad_v2_data_dir)

# Loading a V1 processor
processor = SquadV1Processor()
examples = processor.get_dev_examples(squad_v1_data_dir)

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    doc_stride=args.doc_stride,
    max_query_length=max_query_length,
    is_training=not evaluate,
)

tensorflow_datasets の使用は、データファイルを使用するのと同じくらい簡単です。

# tensorflow_datasets only handle Squad V1.
tfds_examples = tfds.load("squad")
examples = SquadV1Processor().get_examples_from_dataset(tfds_examples, evaluate=evaluate)

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    doc_stride=args.doc_stride,
    max_query_length=max_query_length,
    is_training=not evaluate,
)

これらのプロセッサを使用する別の例は、run_squad.py スクリプトに示されています。

7.6 KiB Raw Blame History Unescape Escape

Processors

Multi-modal processors

Deprecated processors

GLUE

XNLI

SQuAD

Processors

Example usage

7.6 KiB

Raw Blame History