*This model was contributed to Hugging Face Transformers on 2025-07-21.*

# Ernie 4.5 Moe ## Overview The Ernie 4.5 Moe model was released in the [Ernie 4.5 Model Family](https://ernie.baidu.com/blog/posts/ernie4.5/) release by baidu. This family of models contains multiple different architectures and model sizes. This model in specific targets the base text model with mixture of experts (moe) - one with 21B total, 3B active parameters and another one with 300B total, 47B active parameters. It uses the standard [Llama](./llama) at its core combined with a specialized MoE based on [Mixtral](./mixtral) with additional shared experts. Other models from the family can be found at [Ernie 4.5](./ernie4_5) and [Ernie 4.5 VL MoE](./ernie4_5_vl_moe.md).

## Usage Tips ### Generate text ```python from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "baidu/ERNIE-4.5-21B-A3B-PT" # load the tokenizer and the model tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained( model_name, device_map="auto", ) # prepare the model input inputs = tokenizer("Hey, are you conscious? Can you talk to me?", return_tensors="pt").to(model.device) prompt = "Hey, are you conscious? Can you talk to me?" messages = [ {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True ) model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device) # conduct text completion generated_ids = model.generate( **model_inputs, max_new_tokens=32, ) output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() # decode the generated ids generate_text = tokenizer.decode(output_ids, skip_special_tokens=True) ``` ### Distributed Generation with Tensor Parallelism ```python from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "baidu/ERNIE-4.5-21B-A3B-PT" # load the tokenizer and the model tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained( model_name, device_map="auto", tp_plan="auto", ) # prepare the model input inputs = tokenizer("Hey, are you conscious? Can you talk to me?", return_tensors="pt").to(model.device) prompt = "Hey, are you conscious? Can you talk to me?" messages = [ {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True ) model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device) # conduct text completion generated_ids = model.generate( **model_inputs, max_new_tokens=32, ) output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() # decode the generated ids generate_text = tokenizer.decode(output_ids, skip_special_tokens=True) ``` ### Quantization with Bitsandbytes ```python from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig model_name = "baidu/ERNIE-4.5-21B-A3B-PT" # load the tokenizer and the model tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained( model_name, device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True), ) # prepare the model input inputs = tokenizer("Hey, are you conscious? Can you talk to me?", return_tensors="pt").to(model.device) prompt = "Hey, are you conscious? Can you talk to me?" messages = [ {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True ) model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device) # conduct text completion generated_ids = model.generate( **model_inputs, max_new_tokens=32, ) output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() # decode the generated ids generate_text = tokenizer.decode(output_ids, skip_special_tokens=True) ``` This model was contributed by [Anton Vlasjuk](https://huggingface.co/AntonV). The original code can be found [here](https://github.com/PaddlePaddle/ERNIE). ## Ernie4_5_MoeConfig [[autodoc]] Ernie4_5_MoeConfig ## Ernie4_5_MoeModel [[autodoc]] Ernie4_5_MoeModel - forward ## Ernie4_5_MoeForCausalLM [[autodoc]] Ernie4_5_MoeForCausalLM - forward - generate