*This model was contributed to Hugging Face Transformers on 2022-09-14.*
# GPT-NeoX-Japanese
GPT-NeoX-Japanese, a Japanese language model based on [GPT-NeoX](./gpt_neox).
Japanese uses three types of characters (hiragana, katakana, kanji) and has a huge vocabulary. This model uses [BPEEncoder V2](https://github.com/tanreinama/Japanese-BPEEncoder_V2), a sub-word tokenizer to handle the different characters.
The model also removes some bias parameters for better performance.
You can find all the original GPT-NeoX-Japanese checkpoints under the [ABEJA](https://huggingface.co/abeja/models?search=gpt-neo-x) organization.
> [!TIP]
> This model was contributed by [Shinya Otani](https://github.com/SO0529), [Takayoshi Makabe](https://github.com/spider-man-tm), [Anuj Arora](https://github.com/Anuj040), and [Kyo Hattori](https://github.com/go5paopao) from [ABEJA, Inc.](https://www.abejainc.com/).
>
> Click on the GPT-NeoX-Japanese models in the right sidebar for more examples of how to apply GPT-NeoX-Japanese to different language tasks.
The example below demonstrates how to generate text with [`Pipeline`] or the [`AutoModel`], and from the command line.
```python
from transformers import pipeline
pipeline = pipeline(task="text-generation",
model="abeja/gpt-neox-japanese-2.7b", device=0)
pipeline("人とAIが協調するためには、")
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("abeja/gpt-neox-japanese-2.7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("abeja/gpt-neox-japanese-2.7b")
input_ids = tokenizer("人とAIが協調するためには、", return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quantize the weights to 4-bits.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype="float16"
)
model = AutoModelForCausalLM.from_pretrained(
"abeja/gpt-neox-japanese-2.7b",
quantization_config=quantization_config,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("abeja/gpt-neox-japanese-2.7b")
input_ids = tokenizer.encode("人とAIが協調するためには、", return_tensors="pt").to(model.device)
output = model.generate(input_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Use the [AttentionMaskVisualizer](https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139) to better understand what tokens the model can and cannot attend to.
```python
from transformers.utils.attention_visualizer import AttentionMaskVisualizer
visualizer = AttentionMaskVisualizer("abeja/gpt-neox-japanese-2.7b")
visualizer("
What is shown in this image?")
```
## Resources
Refer to the [Training a better GPT model: Learnings from PaLM](https://medium.com/ml-abeja/training-a-better-gpt-2-93b157662ae4) blog post for more details about how ABEJA trained GPT-NeoX-Japanese.
## GPTNeoXJapaneseConfig
[[autodoc]] GPTNeoXJapaneseConfig
## GPTNeoXJapaneseTokenizer
[[autodoc]] GPTNeoXJapaneseTokenizer
## GPTNeoXJapaneseModel
[[autodoc]] GPTNeoXJapaneseModel
- forward
## GPTNeoXJapaneseForCausalLM
[[autodoc]] GPTNeoXJapaneseForCausalLM
- forward