first commit
Some checks failed
Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled
Build documentation / build (push) Has been cancelled
Build documentation / build_other_lang (push) Has been cancelled
CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled
New model PR merged notification / Notify new model (push) Has been cancelled
PR CI / pr-ci (push) Has been cancelled
Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
Update Transformers metadata / build_and_package (push) Has been cancelled
Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled
Check Tiny Models / Check tiny models (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI - Flash Attn / Setup (push) Has been cancelled
Nvidia CI - Flash Attn / Model CI (push) Has been cancelled
Nvidia CI / Setup (push) Has been cancelled
Nvidia CI / Model CI (push) Has been cancelled
Nvidia CI / Torch pipeline CI (push) Has been cancelled
Nvidia CI / Example CI (push) Has been cancelled
Nvidia CI / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI / DeepSpeed CI (push) Has been cancelled
Nvidia CI / Quantization CI (push) Has been cancelled
Nvidia CI / Kernels CI (push) Has been cancelled
Doctests / Setup (push) Has been cancelled
Doctests / Call doctest jobs (push) Has been cancelled
Doctests / Send results to webhook (push) Has been cancelled
Extras Smoke Test / Get supported Python versions (push) Has been cancelled
Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled
Extras Smoke Test / Check Slack token availability (push) Has been cancelled
Extras Smoke Test / Notify failures to Slack (push) Has been cancelled
Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled
Stale Bot / Close Stale Issues (push) Has been cancelled
Some checks failed
Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled
Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled
Build documentation / build (push) Has been cancelled
Build documentation / build_other_lang (push) Has been cancelled
CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled
New model PR merged notification / Notify new model (push) Has been cancelled
PR CI / pr-ci (push) Has been cancelled
Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
Update Transformers metadata / build_and_package (push) Has been cancelled
Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled
Check Tiny Models / Check tiny models (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled
Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI - Flash Attn / Setup (push) Has been cancelled
Nvidia CI - Flash Attn / Model CI (push) Has been cancelled
Nvidia CI / Setup (push) Has been cancelled
Nvidia CI / Model CI (push) Has been cancelled
Nvidia CI / Torch pipeline CI (push) Has been cancelled
Nvidia CI / Example CI (push) Has been cancelled
Nvidia CI / Trainer/FSDP CI (push) Has been cancelled
Nvidia CI / DeepSpeed CI (push) Has been cancelled
Nvidia CI / Quantization CI (push) Has been cancelled
Nvidia CI / Kernels CI (push) Has been cancelled
Doctests / Setup (push) Has been cancelled
Doctests / Call doctest jobs (push) Has been cancelled
Doctests / Send results to webhook (push) Has been cancelled
Extras Smoke Test / Get supported Python versions (push) Has been cancelled
Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled
Extras Smoke Test / Check Slack token availability (push) Has been cancelled
Extras Smoke Test / Notify failures to Slack (push) Has been cancelled
Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled
Stale Bot / Close Stale Issues (push) Has been cancelled
This commit is contained in:
284
docs/source/ar/tasks/multiple_choice.md
Normal file
284
docs/source/ar/tasks/multiple_choice.md
Normal file
@@ -0,0 +1,284 @@
|
||||
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# الاختيار من متعدد (Multiple choice)
|
||||
|
||||
[[open-in-colab]]
|
||||
|
||||
مهمة الاختيار من متعدد مشابهة لمهمة الإجابة على الأسئلة، ولكن مع توفير عدة إجابات محتملة مع سياق، ويُدرّب النموذج على تحديد الإجابة الصحيحة.
|
||||
|
||||
سيوضح لك هذا الدليل كيفية:
|
||||
|
||||
1. ضبط نموذج [BERT](https://huggingface.co/google-bert/bert-base-uncased) باستخدام الإعداد `regular` لمجموعة بيانات [SWAG](https://huggingface.co/datasets/swag) لاختيار الإجابة الأفضل من بين الخيارات المتعددة المتاحة مع السياق.
|
||||
2. استخدام النموذج المضبوط للاستدلال.
|
||||
|
||||
قبل البدء، تأكد من تثبيت جميع المكتبات الضرورية:
|
||||
|
||||
```bash
|
||||
pip install transformers datasets evaluate
|
||||
```
|
||||
|
||||
نشجعك على تسجيل الدخول إلى حساب Hugging Face الخاص بك حتى تتمكن من تحميل نموذجك ومشاركته مع المجتمع. عند المطالبة، أدخل الرمز المميز الخاص بك لتسجيل الدخول:
|
||||
|
||||
```py
|
||||
>>> from huggingface_hub import notebook_login
|
||||
|
||||
>>> notebook_login()
|
||||
```
|
||||
|
||||
## تحميل مجموعة بيانات SWAG
|
||||
|
||||
ابدأ بتحميل تهيئة `regular` لمجموعة بيانات SWAG من مكتبة 🤗 Datasets:
|
||||
|
||||
```py
|
||||
>>> from datasets import load_dataset
|
||||
|
||||
>>> swag = load_dataset("swag", "regular")
|
||||
```
|
||||
|
||||
ثم ألق نظرة على مثال:
|
||||
|
||||
```py
|
||||
>>> swag["train"][0]
|
||||
{'ending0': 'passes by walking down the street playing their instruments.',
|
||||
'ending1': 'has heard approaching them.',
|
||||
'ending2': "arrives and they're outside dancing and asleep.",
|
||||
'ending3': 'turns the lead singer watches the performance.',
|
||||
'fold-ind': '3416',
|
||||
'gold-source': 'gold',
|
||||
'label': 0,
|
||||
'sent1': 'Members of the procession walk down the street holding small horn brass instruments.',
|
||||
'sent2': 'A drum line',
|
||||
'startphrase': 'Members of the procession walk down the street holding small horn brass instruments. A drum line',
|
||||
'video-id': 'anetv_jkn6uvmqwh4'}
|
||||
```
|
||||
|
||||
على الرغم من أن الحقول تبدو كثيرة، إلا أنها في الواقع بسيطة جداً:
|
||||
|
||||
- `sent1` و `sent2`: يعرض هذان الحقلان بداية الجملة، وبدمجهما معًا، نحصل على حقل `startphrase`.
|
||||
- `ending`: يقترح نهاية محتملة للجملة، واحدة منها فقط هي الصحيحة.
|
||||
- `label`: يحدد نهاية الجملة الصحيحة.
|
||||
|
||||
## المعالجة المسبقة (Preprocess)
|
||||
|
||||
الخطوة التالية هي استدعاء مُجزئ BERT لمعالجة بدايات الجمل والنهايات الأربع المحتملة:
|
||||
|
||||
```py
|
||||
>>> from transformers import AutoTokenizer
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
|
||||
```
|
||||
|
||||
تحتاج دالة المعالجة المسبقة التي تريد إنشاءها إلى:
|
||||
|
||||
1. إنشاء أربع نسخ من حقل `sent1` ودمج كل منها مع `sent2` لإعادة إنشاء كيفية بدء الجملة.
|
||||
2. دمج `sent2` مع كل من نهايات الجمل الأربع المحتملة.
|
||||
3. تتجميع هاتين القائمتين لتتمكن من تجزئتهما، ثم إعادة ترتيبها بعد ذلك بحيث يكون لكل مثال حقول `input_ids` و `attention_mask` و `labels` مقابلة.
|
||||
|
||||
|
||||
```py
|
||||
>>> ending_names = ["ending0", "ending1", "ending2", "ending3"]
|
||||
|
||||
>>> def preprocess_function(examples):
|
||||
... first_sentences = [[context] * 4 for context in examples["sent1"]]
|
||||
... question_headers = examples["sent2"]
|
||||
... second_sentences = [
|
||||
... [f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)
|
||||
... ]
|
||||
|
||||
... first_sentences = sum(first_sentences, [])
|
||||
... second_sentences = sum(second_sentences, [])
|
||||
|
||||
... tokenized_examples = tokenizer(first_sentences, second_sentences, truncation=True)
|
||||
... return {k: [v[i : i + 4] for i in range(0, len(v), 4)] for k, v in tokenized_examples.items()}
|
||||
```
|
||||
|
||||
لتطبيق دالة المعالجة المسبقة على مجموعة البيانات بأكملها، استخدم طريقة [`~datasets.Dataset.map`] الخاصة بـ 🤗 Datasets. يمكنك تسريع دالة `map` عن طريق تعيين `batched=True` لمعالجة عناصر متعددة من مجموعة البيانات في وقت واحد:
|
||||
|
||||
```py
|
||||
tokenized_swag = swag.map(preprocess_function, batched=True)
|
||||
```
|
||||
|
||||
لا يحتوي 🤗 Transformers على مجمع بيانات للاختيار من متعدد، لذلك ستحتاج إلى تكييف [`DataCollatorWithPadding`] لإنشاء دفعة من الأمثلة. من الأكفأ إضافة حشو (padding) ديناميكي للجمل إلى أطول طول في دفعة أثناء التجميع، بدلاً من حشو مجموعة البيانات بأكملها إلى الحد الأقصى للطول.
|
||||
|
||||
يقوم `DataCollatorForMultipleChoice` بتجميع جميع مدخلات النموذج، ويطبق الحشو، ثم يعيد تجميع النتائج في شكلها الأصلي:
|
||||
|
||||
|
||||
```py
|
||||
>>> from dataclasses import dataclass
|
||||
>>> from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy
|
||||
>>> from typing import Optional, Union
|
||||
>>> import torch
|
||||
|
||||
>>> @dataclass
|
||||
... class DataCollatorForMultipleChoice:
|
||||
... """
|
||||
... Data collator that will dynamically pad the inputs for multiple choice received.
|
||||
... """
|
||||
|
||||
... tokenizer: PreTrainedTokenizerBase
|
||||
... padding: Union[bool, str, PaddingStrategy] = True
|
||||
... max_length: Optional[int] = None
|
||||
... pad_to_multiple_of: Optional[int] = None
|
||||
|
||||
... def __call__(self, features):
|
||||
... label_name = "label" if "label" in features[0].keys() else "labels"
|
||||
... labels = [feature.pop(label_name) for feature in features]
|
||||
... batch_size = len(features)
|
||||
... num_choices = len(features[0]["input_ids"])
|
||||
... flattened_features = [
|
||||
... [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
|
||||
... ]
|
||||
... flattened_features = sum(flattened_features, [])
|
||||
|
||||
... batch = self.tokenizer.pad(
|
||||
... flattened_features,
|
||||
... padding=self.padding,
|
||||
... max_length=self.max_length,
|
||||
... pad_to_multiple_of=self.pad_to_multiple_of,
|
||||
... return_tensors="pt",
|
||||
... )
|
||||
|
||||
... batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
|
||||
... batch["labels"] = torch.tensor(labels, dtype=torch.int64)
|
||||
... return batch
|
||||
```
|
||||
|
||||
## التقييم (Evaluate)
|
||||
|
||||
يُفضل غالبًا تضمين مقياس أثناء التدريب لتقييم أداء نموذجك. يمكنك تحميل طريقة تقييم بسرعة باستخدام مكتبة 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index). لهذه المهمة، قم بتحميل مقياس [الدقة](https://huggingface.co/spaces/evaluate-metric/accuracy) (انظر إلى [الجولة السريعة](https://huggingface.co/docs/evaluate/a_quick_tour) لـ 🤗 Evaluate لمعرفة المزيد حول كيفية تحميل المقياس وحسابه):
|
||||
|
||||
```py
|
||||
>>> import evaluate
|
||||
|
||||
>>> accuracy = evaluate.load("accuracy")
|
||||
```
|
||||
|
||||
ثم أنشئ دالة لتمرير التنبؤات والتسميات إلى [`~evaluate.EvaluationModule.compute`] لحساب الدقة:
|
||||
|
||||
```py
|
||||
>>> import numpy as np
|
||||
|
||||
>>> def compute_metrics(eval_pred):
|
||||
... predictions, labels = eval_pred
|
||||
... predictions = np.argmax(predictions, axis=1)
|
||||
... return accuracy.compute(predictions=predictions, references=labels)
|
||||
```
|
||||
|
||||
دالتك `compute_metrics` جاهزة الآن، وستعود إليها عند إعداد تدريبك.
|
||||
|
||||
## التدريب (Train)
|
||||
|
||||
|
||||
<Tip>
|
||||
|
||||
إذا لم تكن معتادًا على ضبط نموذج باستخدام [`Trainer`], فراجع الدرس الأساسي [هنا](../training#train-with-pytorch-trainer)!
|
||||
|
||||
</Tip>
|
||||
|
||||
أنت جاهز لبدء تدريب نموذجك الآن! قم بتحميل BERT باستخدام [`AutoModelForMultipleChoice`]:
|
||||
|
||||
```py
|
||||
>>> from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer
|
||||
|
||||
>>> model = AutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
|
||||
```
|
||||
|
||||
في هذه المرحلة، تبقى ثلاث خطوات فقط:
|
||||
|
||||
1. حدد معلمات التدريب الخاصة بك في [`TrainingArguments`]. المعلمة الوحيدة المطلوبة هي `output_dir` التي تحدد مكان حفظ نموذجك. ستدفع هذا النموذج إلى Hub عن طريق تعيين `push_to_hub=True` (يجب عليك تسجيل الدخول إلى Hugging Face لتحميل نموذجك). في نهاية كل حقبة، سيقوم [`Trainer`] بتقييم الدقة وحفظ نقطة فحص التدريب.
|
||||
2. مرر معلمات التدريب إلى [`Trainer`] جنبًا إلى جنب مع النموذج ومُجمِّع البيانات والمعالج ودالة تجميع البيانات ودالة `compute_metrics`.
|
||||
3. استدعي [`~Trainer.train`] لضبط نموذجك.
|
||||
|
||||
```py
|
||||
>>> training_args = TrainingArguments(
|
||||
... output_dir="my_awesome_swag_model",
|
||||
... eval_strategy="epoch",
|
||||
... save_strategy="epoch",
|
||||
... load_best_model_at_end=True,
|
||||
... learning_rate=5e-5,
|
||||
... per_device_train_batch_size=16,
|
||||
... per_device_eval_batch_size=16,
|
||||
... num_train_epochs=3,
|
||||
... weight_decay=0.01,
|
||||
... push_to_hub=True,
|
||||
... )
|
||||
|
||||
>>> trainer = Trainer(
|
||||
... model=model,
|
||||
... args=training_args,
|
||||
... train_dataset=tokenized_swag["train"],
|
||||
... eval_dataset=tokenized_swag["validation"],
|
||||
... processing_class=tokenizer,
|
||||
... data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
|
||||
... compute_metrics=compute_metrics,
|
||||
... )
|
||||
|
||||
>>> trainer.train()
|
||||
```
|
||||
|
||||
بمجرد اكتمال التدريب، شارك نموذجك مع Hub باستخدام طريقة [`~transformers.Trainer.push_to_hub`] حتى يتمكن الجميع من استخدام نموذجك:
|
||||
|
||||
```py
|
||||
>>> trainer.push_to_hub()
|
||||
```
|
||||
|
||||
<Tip>
|
||||
|
||||
للحصول على مثال أكثر تعمقًا حول كيفية ضبط نموذج للاختيار من متعدد، ألق نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)
|
||||
أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb) المقابل.
|
||||
|
||||
</Tip>
|
||||
|
||||
## الاستدلال (Inference)
|
||||
|
||||
رائع، الآن بعد أن قمت بضبط نموذج، يمكنك استخدامه للاستدلال!
|
||||
|
||||
قم بإنشاء نص واقتراح إجابتين محتملتين:
|
||||
|
||||
```py
|
||||
>>> prompt = "France has a bread law, Le Décret Pain, with strict rules on what is allowed in a traditional baguette."
|
||||
>>> candidate1 = "The law does not apply to croissants and brioche."
|
||||
>>> candidate2 = "The law applies to baguettes."
|
||||
```
|
||||
|
||||
قم بتحليل كل مطالبة وزوج إجابة مرشح وأعد تنسورات PyTorch. يجب عليك أيضًا إنشاء بعض `العلامات`:
|
||||
|
||||
```py
|
||||
>>> from transformers import AutoTokenizer
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_swag_model")
|
||||
>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="pt", padding=True)
|
||||
>>> labels = torch.tensor(0).unsqueeze(0)
|
||||
```
|
||||
|
||||
مرر مدخلاتك والعلامات إلى النموذج وأرجع`logits`:
|
||||
|
||||
```py
|
||||
>>> from transformers import AutoModelForMultipleChoice
|
||||
|
||||
>>> model = AutoModelForMultipleChoice.from_pretrained("username/my_awesome_swag_model")
|
||||
>>> outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
|
||||
>>> logits = outputs.logits
|
||||
```
|
||||
|
||||
استخرج الفئة ذات الاحتمالية الأكبر:
|
||||
|
||||
```py
|
||||
>>> predicted_class = logits.argmax().item()
|
||||
>>> predicted_class
|
||||
0
|
||||
```
|
||||
Reference in New Issue
Block a user