gavin/transformers

Fork 0

Files

陈赣 06f1fd69a6

Self-hosted runner (nightly-past-ci-caller) / Get number (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.11 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.10 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.9 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.8 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.7 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.6 (push) Has been cancelled

Details

Self-hosted runner (nightly-past-ci-caller) / TensorFlow 2.5 (push) Has been cancelled

Details

Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled

Details

Build documentation / build (push) Has been cancelled

Details

Build documentation / build_other_lang (push) Has been cancelled

Details

CodeQL Security Analysis / CodeQL Analysis (push) Has been cancelled

Details

New model PR merged notification / Notify new model (push) Has been cancelled

Details

PR CI / pr-ci (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled

Details

Secret Leaks / trufflehog (push) Has been cancelled

Details

Update Transformers metadata / build_and_package (push) Has been cancelled

Details

Slow tests on important models (on Push - A10) / Model CI (push) Has been cancelled

Details

Check Tiny Models / Check tiny models (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Model CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Pipeline CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Example CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / DeepSpeed CI (push) Has been cancelled

Details

Self-hosted runner (Intel Gaudi3 scheduled CI caller) / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Setup (push) Has been cancelled

Details

Nvidia CI - Flash Attn / Model CI (push) Has been cancelled

Details

Nvidia CI / Setup (push) Has been cancelled

Details

Nvidia CI / Model CI (push) Has been cancelled

Details

Nvidia CI / Torch pipeline CI (push) Has been cancelled

Details

Nvidia CI / Example CI (push) Has been cancelled

Details

Nvidia CI / Trainer/FSDP CI (push) Has been cancelled

Details

Nvidia CI / DeepSpeed CI (push) Has been cancelled

Details

Nvidia CI / Quantization CI (push) Has been cancelled

Details

Nvidia CI / Kernels CI (push) Has been cancelled

Details

Doctests / Setup (push) Has been cancelled

Details

Doctests / Call doctest jobs (push) Has been cancelled

Details

Doctests / Send results to webhook (push) Has been cancelled

Details

Extras Smoke Test / Get supported Python versions (push) Has been cancelled

Details

Extras Smoke Test / Test extras on Python ${{ matrix.python-version }} (push) Has been cancelled

Details

Extras Smoke Test / Check Slack token availability (push) Has been cancelled

Details

Extras Smoke Test / Notify failures to Slack (push) Has been cancelled

Details

Self-hosted runner (AMD scheduled CI caller) / Trigger Scheduled AMD CI (push) Has been cancelled

Details

Stale Bot / Close Stale Issues (push) Has been cancelled

Details

first commit

2026-06-05 16:53:03 +08:00

4.1 KiB

Raw Permalink Blame History

This model was published in HF papers on 2024-06-18 and contributed to Hugging Face Transformers on 2025-04-09.

GLM-4-0414

Overview

The GLM family welcomes new members GLM-4-0414 series models.

The GLM-4-32B-0414 series models, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series. It also supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including substantial reasoning-type synthetic data. This lays the foundation for subsequent reinforcement learning extensions. In the post-training stage, we employed human preference alignment for dialogue scenarios. Additionally, using techniques like rejection sampling and reinforcement learning, we enhanced the model's performance in instruction following, engineering code, and function calling, thus strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in engineering code, Artifact generation, function calling, search-based Q&A, and report generation. In particular, on several benchmarks, such as code generation or specific Q&A tasks, GLM-4-32B-Base-0414 achieves comparable performance with those larger models like GPT-4o and DeepSeek-V3-0324 (671B).

GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities. This was developed based on GLM-4-32B-0414 through cold start, extended reinforcement learning, and further training on tasks including mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During training, we also introduced general reinforcement learning based on pairwise ranking feedback, which enhances the model's general capabilities.

GLM-Z1-Rumination-32B-0414 is a deep reasoning model with rumination capabilities (against OpenAI's Deep Research). Unlike typical deep thinking models, the rumination model is capable of deeper and longer thinking to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). Z1-Rumination is trained through scaling end-to-end reinforcement learning with responses graded by the ground truth answers or rubrics and can make use of search tools during its deep thinking process to handle complex tasks. The model shows significant improvements in research-style writing and complex tasks.

Finally, GLM-Z1-9B-0414 is a surprise. We employed all the aforementioned techniques to train a small model (9B). GLM-Z1-9B-0414 exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is top-ranked among all open-source models of the same size. Especially in resource-constrained scenarios, this model achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment.

Glm4Config

autodoc Glm4Config

Glm4Model

autodoc Glm4Model - forward

Glm4ForCausalLM

autodoc Glm4ForCausalLM - forward

Glm4ForSequenceClassification

autodoc Glm4ForSequenceClassification - forward

Glm4ForTokenClassification

autodoc Glm4ForTokenClassification - forward

4.1 KiB Raw Permalink Blame History

GLM-4-0414

Overview

Glm4Config

Glm4Model

Glm4ForCausalLM

Glm4ForSequenceClassification

Glm4ForTokenClassification

4.1 KiB

Raw Permalink Blame History