*This model was contributed to Hugging Face Transformers on 2026-04-22.*

# Hy3-preview

## Overview

Hy3-preview is a large-scale Mixture-of-Experts (MoE) language model developed by the Tencent HunYuan team. It features a dense-MoE hybrid architecture with 192 routed experts and 1 always-active shared expert per MoE layer, achieving strong performance with efficient inference via sparse expert activation.

Key architectural features:

- **Dense-MoE hybrid**: The first layer uses a dense FFN; all subsequent layers use MoE with top-k routing (default k=8).
- **Shared experts**: Each MoE layer includes 1 shared expert that processes all tokens alongside the routed experts.
- **Sigmoid routing with expert-bias correction**: Tokens are routed via sigmoid scoring (not softmax) with a learned per-expert bias for load balancing.
- **QK-Norm**: Per-head RMSNorm applied to query and key projections before attention for improved training stability.

## Usage tips

- Load with `AutoModelForCausalLM`. The model requires multiple GPUs due to its size.
- Set `output_router_logits=True` in the config or forward call to collect per-layer MoE router logits. Note that this model does not compute an auxiliary load-balancing loss; `aux_loss` is always `None`.
- The model supports `gradient_checkpointing` to reduce memory during fine-tuning.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hy3-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

inputs = tokenizer("The future of artificial intelligence is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## HYV3Config

[[autodoc]] HYV3Config

## HYV3Model

[[autodoc]] HYV3Model
    - forward

## HYV3ForCausalLM

[[autodoc]] HYV3ForCausalLM
    - forward
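
## Routing sketch

The sigmoid routing with expert-bias correction described in the overview can be illustrated with a small standalone sketch. This is not the `HYV3` implementation: the function name `route_token`, the toy values, and two design details are assumptions made for illustration, namely that the per-expert bias only influences which experts are selected, and that the gate weights are the unbiased sigmoid scores renormalized over the selected experts.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(logits, expert_bias, top_k):
    """Hypothetical helper: pick top_k experts for one token.

    Returns (expert indices, gate weights).
    """
    # Each expert is scored independently with a sigmoid (no softmax over experts).
    scores = [sigmoid(z) for z in logits]
    # Assumption: the per-expert bias is added only for selection.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    # Assumption: gate weights use the unbiased scores, renormalized to sum to 1.
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]
    return chosen, weights


# Toy example with 4 experts and k=2 (the model itself uses 192 experts, default k=8).
logits = [2.0, 1.5, -1.0, 0.0]
expert_bias = [-1.0, 0.0, 0.0, 0.5]  # a negative bias discourages an overused expert
indices, weights = route_token(logits, expert_bias, top_k=2)
print(indices, weights)
```

In this toy input, expert 0 has the highest raw score but is skipped because its bias is strongly negative, which is the load-balancing effect the bias term is meant to provide.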