*This model was contributed to Hugging Face Transformers on 2026-05-28.*

# Mellum

Mellum is a code-focused Mixture-of-Experts language model developed by [JetBrains](https://www.jetbrains.com/). It is derived from the Qwen3-MoE architecture with per-layer-type RoPE and interleaved sliding window attention. The model has 12B total parameters with 2.5B active parameters per token, using 64 routed experts with 8 activated per token across 28 layers.

The example below demonstrates how to generate text with [`Pipeline`] or the [`AutoModelForCausalLM`] class.

```python
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="JetBrains/Mellum2-12B-A2.5B-Base",
)
pipe("def fibonacci(n):")
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("JetBrains/Mellum2-12B-A2.5B-Base")
model = AutoModelForCausalLM.from_pretrained(
    "JetBrains/Mellum2-12B-A2.5B-Base",
    device_map="auto",
)

input_ids = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
output = model.generate(**input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## MellumConfig

[[autodoc]] MellumConfig

## MellumModel

[[autodoc]] MellumModel
    - forward

## MellumForCausalLM

[[autodoc]] MellumForCausalLM
    - forward
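
## Notes

The description above states that each token activates 8 of 64 routed experts. As a rough illustration of what that means, here is a minimal standard-library sketch of top-k routing for a single token. It is not the Transformers implementation: `route_token` is a hypothetical helper, the random logits stand in for a real router output, and both the softmax-then-top-k order and the renormalisation of the kept weights are assumptions about a common MoE scheme rather than confirmed details of Mellum.

```python
import math
import random

NUM_EXPERTS = 64  # routed experts per MoE layer (from the model description)
TOP_K = 8         # experts activated per token (from the model description)


def route_token(router_logits, top_k=TOP_K):
    """Hypothetical helper: return (expert_index, weight) pairs for one token."""
    # Softmax over all experts, stabilised by subtracting the largest logit.
    max_logit = max(router_logits)
    exps = [math.exp(x - max_logit) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k most probable experts, highest probability first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Assumption: renormalise the kept probabilities so they sum to 1.
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


random.seed(0)
# Stand-in router logits; a real model computes these from the hidden state.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

print(len(selected))  # 8
print(f"{TOP_K / NUM_EXPERTS:.1%} of routed experts run per token")  # 12.5%
```

Only the selected experts' feed-forward blocks would run for that token, which is why the active parameter count (2.5B) is much smaller than the total (12B).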