# SGLang [SGLang](https://docs.sglang.ai) is a low-latency, high-throughput inference engine for large language models (LLMs). It also includes a frontend language for building agentic workflows. Set `model_impl="transformers"` to load a Transformers modeling backend. ```py import sglang as sgl llm = sgl.Engine("meta-llama/Llama-3.2-1B-Instruct", model_impl="transformers") print(llm.generate(["The capital of France is"], {"max_new_tokens": 20})[0]) ``` Pass `--model-impl transformers` to the `sglang.launch_server` command for online serving. ```bash python3 -m sglang.launch_server \ --model-path meta-llama/Llama-3.2-1B-Instruct \ --model-impl transformers \ --host 0.0.0.0 \ --port 30000 ``` ## Transformers integration Setting `model_impl="transformers"` tells SGLang to skip its native model matching and use the Transformers model directly. 1. [`PreTrainedConfig.from_pretrained`] loads the model's `config.json` from the Hub or your Hugging Face cache. 2. [`AutoModel.from_config`] resolves the model class based on the config. 3. During loading, `_attn_implementation` is set to `"sglang"`. This routes attention calls through SGLang's RadixAttention kernels. 4. SGLang's parallel linear class replaces linear layers to support tensor parallelism. 5. The [load_weights](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/transformers.py#L277) function populates the model with weights from safetensors files. The model benefits from all SGLang optimizations while using the Transformers model structure. > [!WARNING] > Compatible models require `_supports_attention_backend=True` so SGLang can control attention execution. See the [Building a compatible model backend for inference](./transformers_as_backend#model-implementation) guide for details. ## Resources - [SGLang docs](https://docs.sglang.ai/supported_models/transformers_fallback.html) has more usage examples and tips for using Transformers as a backend. - [Transformers backend integration in SGLang](https://huggingface.co/blog/transformers-backend-sglang) blog post explains what this integration enables.