gavin/transformers

Fork 0

Files

陈赣 06f1fd69a6

Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled

Details

Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled

Details

Build documentation / build (push) Has been cancelled

Details

Build documentation / build_other_lang (push) Has been cancelled

Details

CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled

Details

New model PR merged notification / Notify new model (push) Has been cancelled

Details

PR CI / pr-ci (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled

Details

Secret Leaks / trufflehog (push) Has been cancelled

Details

Update Transformers metadata / build_and_package (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled

Details

Check Tiny Models / Check tiny models (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Setup (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Model CI (push) Has been cancelled

Details

Nvidia CI / Setup (push) Has been cancelled

Details

Nvidia CI / Model CI (push) Has been cancelled

Details

Nvidia CI / Torch pipeline CI (push) Has been cancelled

Details

Nvidia CI / Example CI (push) Has been cancelled

Details

Nvidia CI / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI / DeepSpeed CI (push) Has been cancelled

Details

Nvidia CI / Quantization CI (push) Has been cancelled

Details

Nvidia CI / Kernels CI (push) Has been cancelled

Details

Doctests / Setup (push) Has been cancelled

Details

Doctests / Call doctest jobs (push) Has been cancelled

Details

Doctests / Send results to webhook (push) Has been cancelled

Details

Extras Smoke Test / Get supported Python versions (push) Has been cancelled

Details

Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled

Details

Extras Smoke Test / Check Slack token availability (push) Has been cancelled

Details

Extras Smoke Test / Notify failures to Slack (push) Has been cancelled

Details

Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled

Details

Stale Bot / Close Stale Issues (push) Has been cancelled

Details

first commit

2026-06-05 16:53:03 +08:00

4.7 KiB

Raw Blame History

This model was contributed to Hugging Face Transformers on 2024-02-14.

StableLM

Overview

StableLM 3B 4E1T (blog post) was proposed in StableLM 3B 4E1T: Technical Report by Stability AI and is the first model in a series of multi-epoch pre-trained language models.

Model Details

StableLM 3B 4E1T is a decoder-only base language model pre-trained on 1 trillion tokens of diverse English and code datasets for four epochs. The model architecture is transformer-based with partial Rotary Position Embeddings, SwiGLU activation, LayerNorm, etc.

We also provide StableLM Zephyr 3B, an instruction fine-tuned version of the model that can be used for chat-based applications.

Usage Tips

The architecture is similar to LLaMA but with RoPE applied to 25% of head embedding dimensions, LayerNorm instead of RMSNorm, and optional QKV bias terms.
StableLM 3B 4E1T-based models uses the same tokenizer as [GPTNeoXTokenizerFast].

StableLM 3B 4E1T and StableLM Zephyr 3B can be found on the Huggingface Hub

The following code snippet demonstrates how to use StableLM 3B 4E1T for inference:

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed


set_seed(0)

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-3b-4e1t", device_map="auto")

model_inputs = tokenizer("The weather is always wonderful in", return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_length=32, do_sample=True)
responses = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
responses
['The weather is always wonderful in Costa Rica, which makes it a prime destination for retirees. That’s where the Pensionado program comes in, offering']

Combining StableLM and Flash Attention 2

First, make sure to install the latest version of Flash Attention v2.

pip install -U flash-attn --no-build-isolation

Also make sure that your hardware is compatible with Flash-Attention 2. Read more about it in the official documentation of the flash-attn repository. Note: you must load your model in half-precision (e.g. torch.bfloat16).

Now, to run the model with Flash Attention 2, refer to the snippet below:

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed


set_seed(0)

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-3b-4e1t", attn_implementation="flash_attention_2", device_map="auto")  # doctest: +SKIP

model_inputs = tokenizer("The weather is always wonderful in", return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_length=32, do_sample=True)  # doctest: +SKIP
responses = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)  # doctest: +SKIP
responses  # doctest: +SKIP
['The weather is always wonderful in Costa Rica, which makes it a prime destination for retirees. That’s where the Pensionado program comes in, offering']

StableLmConfig

autodoc StableLmConfig

StableLmModel

autodoc StableLmModel - forward

StableLmForCausalLM

autodoc StableLmForCausalLM - forward

StableLmForSequenceClassification

autodoc StableLmForSequenceClassification - forward

StableLmForTokenClassification

autodoc StableLmForTokenClassification - forward

4.7 KiB Raw Blame History Unescape Escape

StableLM

Overview

Model Details

Usage Tips

Combining StableLM and Flash Attention 2

StableLmConfig

StableLmModel

StableLmForCausalLM

StableLmForSequenceClassification

StableLmForTokenClassification

4.7 KiB

Raw Blame History