Zhipu AI Unveils Massive Open-Source LLM Beating Gemini 3 Pro
Zhipu AI launched the largest open-source large language model to date, trained exclusively on Huawei chips and outperforming Google's Gemini 3 Pro in benchmarks. Released under the MIT license, it allows unrestricted use and modification by developers worldwide. This marks a significant advancement in accessible AI technology from China.

For developers and technical buyers navigating the AI landscape, Zhipu AI's GLM-5 is a game-changer: a 744-billion-parameter open-source LLM that surpasses Google's Gemini 3 Pro on key benchmarks like SWE-bench, trained exclusively on Huawei's Ascend chips. It democratizes access to frontier-level AI without reliance on restricted hardware ecosystems, enabling cost-effective scaling for agentic applications and complex coding tasks in resource-constrained environments.
What Happened
On February 11, 2026, Zhipu AI, a leading Chinese AI firm, unveiled GLM-5, touted as the largest open-source large language model to date with 744 billion parameters in a Mixture-of-Experts (MoE) architecture activating 40 billion per inference. Trained on 28.5 trillion tokens using Huawei's Ascend 910B processors, it achieves state-of-the-art open-source performance in coding and agentic engineering, including a record-low hallucination rate and top scores on benchmarks like SWE-bench Verified (77.8%, edging out Gemini 3 Pro's 76.2%). Released under the permissive MIT license via Hugging Face, GLM-5 supports unrestricted commercial use, modification, and deployment, marking a bold push in accessible AI from China amid global chip tensions. The model builds on prior GLM iterations, emphasizing reliability for long-running agents and enhanced reasoning over "vibe coding."[source] [source] [source] Press coverage highlighted its competitive edge against Western models, with Zhipu's stock surging 30% post-announcement.[source] [source]
Why This Matters
Technically, GLM-5 empowers engineers to build sophisticated agentic systems—autonomous workflows handling multi-step tasks with reduced errors—without proprietary lock-in, leveraging its MoE efficiency for faster inference on diverse hardware. Developers gain a drop-in alternative for fine-tuning on domain-specific data, potentially slashing costs by sixfold compared to Claude or GPT equivalents, as noted in benchmarks. For technical buyers, it challenges NVIDIA dominance by validating Huawei chips for high-end training, opening doors for supply-chain diversification in geopolitically sensitive markets. Business-wise, this intensifies global competition, pressuring incumbents to accelerate open-source releases while enabling startups to prototype frontier AI affordably, fostering innovation in coding assistants, automated DevOps, and enterprise agents.[source]
Technical Deep-Dive
Architecture Changes and Improvements
Zhipu AI's GLM-5 represents a significant evolution in its GLM series, scaling to a 744 billion parameter Mixture-of-Experts (MoE) architecture with 40 billion active parameters during inference. This design activates only a subset of experts per token, enabling efficient scaling while maintaining high performance. Key innovations include DeepSeek Sparse Attention (DSA), which optimizes long-context handling by sparsifying attention computations, reducing deployment costs without sacrificing quality. The model supports a 128K token context window (extendable to 200K in some configurations) and incorporates advanced routing mechanisms like loss-free balance routing with sigmoid gates for expert selection.
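The sigmoid-gate routing described above can be illustrated in a few lines. This is a minimal sketch of generic sigmoid-gated top-k expert selection, not Zhipu's implementation; all sizes are placeholders:

```python
# Illustrative sigmoid-gated top-k MoE router; sizes are placeholders,
# not GLM-5's actual configuration.
import torch
import torch.nn as nn

class SigmoidTopKRouter(nn.Module):
    def __init__(self, hidden_size: int, num_experts: int, top_k: int):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (tokens, hidden). Sigmoid scores each expert independently,
        # unlike softmax routing where expert scores compete directly.
        scores = torch.sigmoid(self.gate(x))          # (tokens, num_experts)
        weights, expert_ids = scores.topk(self.top_k, dim=-1)
        # Renormalize the selected gates so per-token weights sum to 1.
        weights = weights / weights.sum(dim=-1, keepdim=True)
        return weights, expert_ids

router = SigmoidTopKRouter(hidden_size=4096, num_experts=64, top_k=8)
weights, expert_ids = router(torch.randn(16, 4096))   # 16 tokens, 8 experts each
```

Only the selected experts' feed-forward blocks run per token, which is how a 744B-parameter model can activate just 40B parameters per inference.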
Trained on 28.5 trillion tokens—up from 23T in GLM-4.5—the dataset emphasizes coding and reasoning data (7T specialized tokens). Architectural tweaks include deeper, narrower layers (more layers, fewer experts per MoE block), Grouped Query Attention (GQA) with partial RoPE embeddings, and 96 attention heads. These changes enhance reasoning depth and bilingual (English-Chinese) capabilities, positioning GLM-5 as an "agentic engineering" model optimized for multi-step tasks. Compared to predecessors, it introduces native "thinking mode" for complex reasoning and speculative decoding support (e.g., MTP, EAGLE) to boost inference speed by up to 2x [source](https://huggingface.co/zai-org/GLM-5).
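As a rough illustration of the partial-RoPE detail, the sketch below rotates only the first rotary_dim channels of each head (NeoX-style rotate-half). The sizes, including the 96 heads, are illustrative, and GLM-5's exact scheme may differ:

```python
# Partial RoPE sketch: rotary position embeddings applied to only the
# first rotary_dim channels of each head; remaining channels pass through.
import torch

def apply_partial_rope(q: torch.Tensor, positions: torch.Tensor,
                       rotary_dim: int, base: float = 10000.0) -> torch.Tensor:
    # q: (seq, heads, head_dim); only q[..., :rotary_dim] is rotated.
    rot, rest = q[..., :rotary_dim], q[..., rotary_dim:]
    half = rotary_dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = positions[:, None].float() * inv_freq[None, :]     # (seq, half)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = rot[..., :half], rot[..., half:]
    rotated = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return torch.cat([rotated, rest], dim=-1)

q = torch.randn(128, 96, 128)                    # (seq, heads, head_dim)
q_rope = apply_partial_rope(q, torch.arange(128), rotary_dim=64)
```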
Benchmark Performance Comparisons
GLM-5 sets new open-source benchmarks, particularly in coding and agentic tasks, often surpassing Google's Gemini 3 Pro. On SWE-bench Verified, it achieves 77.8%, edging out Gemini 3 Pro's 76.2% and demonstrating stronger software-engineering issue resolution. In BrowseComp (with context management), GLM-5 scores 75.9 versus Gemini's 59.2, highlighting better web navigation and tool integration. Tool-Decathlon yields 38.0 for GLM-5 (vs. 36.4), and on Vending Bench 2, a simulated e-commerce optimization task, GLM-5 finishes with a $4,432.12 profit while Gemini 3 Pro finishes with a $5,478.16 loss.
However, it trails in pure reasoning: GPQA-Diamond at 86.0 (vs. 91.9) and HLE at 30.5 (vs. 37.2), though with tools, HLE jumps to 50.4 (surpassing Gemini's 45.8). Other highs include AIME 2026: 92.7, IMOAnswerBench: 82.5, and τ²-Bench: 89.7. Developers on X praise its agentic focus, noting shifts from function-writing to autonomous project orchestration, with one calling it a "productivity ceiling lift" for long-duration tasks [source](https://huggingface.co/zai-org/GLM-5) [source](https://medium.com/@mlabonne/glm-5-chinas-first-public-ai-company-ships-a-frontier-model-a068cecb74e3) [post](https://x.com/CNBizInsider/status/2021770548708356355).
API Changes and Pricing
GLM-5 is accessible via Z.ai's API platform with OpenAI-compatible endpoints, supporting chat completions, tool-calling, and JSON mode. Key changes from GLM-4.x include enhanced auto-tool-choice and reasoning parsers (e.g., glm47 for tools, glm45 for step-by-step logic). Rate limits scale with subscriptions, up to 10K RPM for enterprise.
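A minimal sketch of calling the OpenAI-compatible surface; the base URL and model id below are assumptions patterned on Z.ai's docs, so verify them before use:

```python
# Hedged sketch: base_url and model id are assumptions, not verified values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",
    base_url="https://api.z.ai/api/paas/v4/",   # assumed Z.ai endpoint
)
resp = client.chat.completions.create(
    model="glm-5",                               # assumed model identifier
    messages=[{"role": "user",
               "content": "Return a JSON object with keys 'task' and 'status'."}],
    response_format={"type": "json_object"},     # JSON mode, if supported
)
print(resp.choices[0].message.content)
```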
Pricing is aggressive: $1 per million input tokens, $3.2 per million output tokens, with cached inputs at $0.2/M (limited-time free storage). This is ~6x cheaper than Anthropic's Claude, per Zhipu claims. Batch API halves costs to ¥2.5/M tokens (~$0.35). Open-source weights (MIT license) on Hugging Face enable free local use, but API tiers start at $0.8/M via providers like Atlas Cloud [source](https://docs.z.ai/guides/overview/pricing) [source](https://www.atlascloud.ai/models/zai-org/glm-5).
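A back-of-envelope comparison at the listed rates (the Claude figure is derived from Zhipu's ~6x claim rather than Anthropic's price sheet, and the usage volumes are hypothetical):

```python
# Hypothetical monthly usage priced at the listed GLM-5 API rates.
GLM5_IN, GLM5_OUT = 1.00, 3.20            # USD per million tokens
tokens_in, tokens_out = 5.0, 2.0          # millions of tokens per month
glm5 = tokens_in * GLM5_IN + tokens_out * GLM5_OUT
print(f"GLM-5: ${glm5:.2f}/mo vs ~${glm5 * 6:.2f}/mo at Zhipu's claimed 6x delta")
```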
Integration Considerations
For developers, GLM-5 integrates seamlessly with frameworks like LangChain and LiteLLM via ChatZhipuAI. Local inference requires high-end hardware: 8x H100/A100 GPUs with tensor parallelism (TP=8) due to size. FP8 quantization (zai-org/GLM-5-FP8) reduces memory to ~400GB.
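For the hosted path, the LangChain integration mentioned above is a few lines; a minimal sketch, assuming the model id "glm-5":

```python
# Hedged sketch of the ChatZhipuAI wrapper; the model id is an assumption.
from langchain_community.chat_models import ChatZhipuAI
from langchain_core.messages import HumanMessage

llm = ChatZhipuAI(
    model="glm-5",               # assumed model identifier
    api_key="YOUR_ZAI_API_KEY",
    temperature=0.2,
)
reply = llm.invoke([HumanMessage(content="Write a Python retry decorator.")])
print(reply.content)
```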
Example vLLM deployment:
```bash
docker pull vllm/vllm-openai:nightly
vllm serve zai-org/GLM-5-FP8 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85 \
  --tool-call-parser glm47 \
  --enable-auto-tool-choice
```
SGLang offers a similar setup, with EAGLE speculative decoding for 1.5-2x throughput. Challenges include high VRAM needs and Chinese-English tokenization quirks; teams should test bilingual prompts (see the sanity check below). Enterprise options include fine-tuning APIs and SOC2 compliance. X reactions highlight excitement for agent tools but note inference-scaling hurdles for non-datacenter users [source](https://huggingface.co/zai-org/GLM-5) [post](https://x.com/RuiDiaoX/status/2021627263796797645).
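To probe the bilingual tokenization quirks noted above, a quick sanity check against a locally served endpoint (port, path, and model name assume the vLLM command above):

```python
# Bilingual sanity check against a local OpenAI-compatible server
# (vLLM/SGLang); defaults assume vLLM's port 8000 and /v1 path.
from openai import OpenAI

local = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
for prompt in ["Explain tensor parallelism briefly.", "用一句话解释张量并行。"]:
    out = local.chat.completions.create(
        model="zai-org/GLM-5-FP8",
        messages=[{"role": "user", "content": prompt}],
    )
    print(out.choices[0].message.content)
```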
Developer & Community Reactions
What Developers Are Saying
Developers and AI engineers are buzzing about Zhipu AI's GLM-5, praising its open-source nature and competitive edge over models like Gemini 3 Pro. Rui Diao, an ex-Google engineer, highlighted its agentic focus: "GLM-5 is built for agents. ... In the τ²-Bench interactive tool evaluation, it scored 84.7, beating Claude Sonnet 4.5. ... GLM-5 isn't trying to win at chat. It's trying to win at work." [source](https://x.com/RuiDiaoX/status/2021627263796797645). Similarly, 0xSero, an OSS AI enthusiast, shared enthusiasm for the model's capabilities: "Man, what a model. I have not seen any mode[l] below 200B act like this. It's really doing a good job in a pretty novel environment. The ZAI team is doing a great job I'm very happy they still open source all this. It's very impressive." [source](https://x.com/0xSero/status/2013911569428586656). Ignis Rex noted the pricing disruption: "Zhipu AI's introduction of the $3/month GLM-4.7 model severely impacts OpenAI ... offering a viable, open-source alternative, which is a significant win for startups, small teams, and solo developers seeking cost control." [source](https://x.com/Ignis_Rex/status/2004389826905440325). These reactions underscore excitement for accessible, high-performance tools in coding and agent workflows.
Early Adopter Experiences
Technical users report strong real-world performance, particularly in coding and long-context tasks. QuestGlitch detailed early benchmarks: "Coding: verified SOTA for open models; approaching Claude Opus levels in real-world system engineering. Agents: built for 'agentic engineering' capable of 200–300 sequential tool calls." They also noted easy integration: "Compatible with Cursor, Claude Code, and Cline via MCP tools." [source](https://x.com/AIRevSpot/status/2021851721107185994). Koder, an AI/DevOps engineer, tested the model and affirmed: "Zhipu AI's GLM-5 just dropped: 744B params (40B active), 200K context window, MoE architecture, 77.8% on SWE-bench ... The #1 open-source model for coding and agentic tasks." [source](https://x.com/FreelanceHelper/status/2022200214271173082). gm8xx8 shared usage insights on GLM-4.6, a precursor: "stronger results on benchmarks + real-world apps (Claude Code, Cline, Roo, Kilo), incl. polished front-end generation ... 74 real-world coding evals show it leading domestic peers with ~30% lower token use." [source](https://x.com/gm8xx8/status/1972932103462400133). Adopters appreciate the efficiency and open-weight MIT license for rapid prototyping.
Concerns & Criticisms
While praise dominates, some technical users raise valid concerns about adoption barriers and long-term sustainability. MALATJI pointed to hype vs. reality: "Corporate adoption is trailing the hype. ... Legacy corporations are paralysed. They are stuck in endless committee meetings, fighting over compliance and HR policies." This suggests integration challenges for enterprises despite developer enthusiasm. [source](https://x.com/m_a_l_a_t_j_i/status/2021608405832462492). Broader community discussions, like Bindu Reddy's on open-source safety, echo worries about misuse: "bad actors will somehow modify these LLMs and create AGI! ... companies like OpenAI and Google ... have yet to create AGI from LLMs." Though not GLM-specific, it highlights scrutiny on open models' security. [source](https://x.com/bindureddy/status/1727103332794191947). Critics also note dependency on Huawei Ascend chips for training, potentially limiting global accessibility amid geopolitical tensions.
Strengths
- Superior coding performance, scoring 77.8 on SWE-bench Verified, outperforming Gemini 3 Pro (76.2) and enabling autonomous engineering tasks like long sequential tool calls. [VentureBeat](https://venturebeat.com/technology/z-ais-open-source-glm-5-achieves-record-low-hallucination-rate-and-leverages)
- Record-low hallucination rate on AA-Omniscience benchmark, enhancing reliability for technical applications in reasoning and agentic workflows. [Latent Space](https://www.latent.space/p/ainews-zai-glm-5-new-sota-open-weights)
- Open-source under MIT license with 744B MoE parameters (40B active), allowing free customization and deployment on non-NVIDIA hardware like Huawei Ascend chips, reducing vendor lock-in. [Medium](https://medium.com/@mlabonne/glm-5-chinas-first-public-ai-company-ships-a-frontier-model-a068cecb74e3)
Weaknesses & Limitations
- Imposed guardrails lead to excessive lecturing and policy enforcement, making the model borderline unusable for unrestricted creative or exploratory tasks. [Reddit](https://www.reddit.com/r/LocalLLaMA/comments/1r2gddi/zai_implemented_new_guardrails_in_the_glm5)
- Current serving infrastructure is slow and unreliable with constrained capacity, causing delays in API access and higher costs for production use. [X Post](https://x.com/bridgemindai/status/2021959118953611464)
- Geopolitical restrictions as a U.S. Entity List company limit hardware access and raise data privacy concerns, potentially complicating integration for Western buyers amid export controls. [Medium](https://medium.com/@mlabonne/glm-5-chinas-first-public-ai-company-ships-a-frontier-model-a068cecb74e3)
Opportunities for Technical Buyers
How technical teams can leverage this development:
- Fine-tune the 200K-context model for custom agentic systems in software engineering, accelerating multi-step automation without proprietary API dependencies (see the LoRA sketch after this list).
- Deploy cost-effectively on domestic Chinese hardware for edge AI in manufacturing or IoT, bypassing NVIDIA shortages and lowering inference costs to $0.11/M tokens.
- Integrate vision and reasoning capabilities into dev tools like Cursor or Cline for enhanced code generation and debugging, boosting productivity in hybrid workflows.
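As referenced in the first bullet, below is a minimal PEFT/LoRA sketch of what fine-tuning could look like. At this scale it realistically requires a multi-node cluster, and the target module names are assumptions about the checkpoint's layer naming:

```python
# Hedged LoRA sketch; target_modules names are assumptions, and loading
# a 744B checkpoint requires multi-node sharding in practice.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "zai-org/GLM-5"       # weights published on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "v_proj"])  # assumed layer names
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only the LoRA adapters train
```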
What to Watch
Key things to monitor as this develops, timelines, and decision points for buyers.
Monitor community adoption on Hugging Face and ModelScope for fine-tuned variants, expected within 1-2 months post-release (Feb 2026). Track independent benchmark verifications beyond Zhipu's claims, as real-world agent performance may vary; test via free API tiers now. Watch U.S.-China regulatory shifts, like Entity List expansions, which could restrict access by Q3 2026—ideal decision point for pilots is Q2 2026 to assess scalability before full commitment. Pricing stabilization and capacity expansions will signal production readiness by mid-2026.
Key Takeaways
- Zhipu AI's GLM-5 is a groundbreaking 744B-parameter Mixture-of-Experts (MoE) model with 40B active parameters, trained on 28.5 trillion tokens entirely on Huawei chips, marking a milestone in Chinese AI self-reliance.
- The open-source release outperforms Google's Gemini 3 Pro on key benchmarks like coding (SWE-bench) and agentic tasks, while approaching Anthropic's Claude Opus 4.5 in reasoning and engineering capabilities.
- GLM-5 excels in "Agentic Engineering," enabling more autonomous AI agents for complex workflows, with superior accuracy in open-source comparisons to models like Llama 3.1 and Mistral Large.
- As the largest open-source LLM to date, it democratizes access to frontier-level performance at a fraction of proprietary model costs, with FP8 and lower-bit quantization substantially reducing deployment hardware requirements (though local serving still demands datacenter-class GPUs).
- The launch propelled Zhipu to the world's most valuable LLM company by market cap, signaling intensified global competition and potential shifts in AI supply chains away from U.S. dominance.
Bottom Line
For technical buyers seeking cost-effective, customizable AI, act now: GLM-5 offers immediate value as a superior open-source alternative to Gemini 3 Pro, especially for coding, agents, and edge deployments. Don't wait—early adopters gain a competitive edge in building scalable applications without vendor lock-in. Ignore if your stack is deeply integrated with Western APIs. This matters most to developers in AI engineering, startups optimizing for inference costs, and enterprises in regions prioritizing data sovereignty, like Asia-Pacific markets.
Next Steps
- Download GLM-5 weights from Hugging Face (zai-org/GLM-5) and benchmark against your workflows using tools like LM-Eval (lm-evaluation-harness); a download sketch follows this list.
- Integrate via Zhipu’s API platform at z.ai for quick prototyping of agentic apps, starting with free tiers.
- Join Zhipu’s Discord or WeChat community for updates, fine-tuning guides, and collaborations on GLM-5 extensions.
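A minimal sketch of the first step above, pulling the published weights with huggingface_hub (the lm_eval invocation in the comment follows lm-evaluation-harness ≥0.4 conventions and may change):

```python
# Download the open weights, then point an eval harness at the local path.
from huggingface_hub import snapshot_download

local_path = snapshot_download("zai-org/GLM-5")   # several hundred GB
print(f"Weights cached at {local_path}")
# Then, e.g.:
#   lm_eval --model hf --model_args pretrained={local_path} --tasks gsm8k
```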
References (50 sources)
- https://x.com/i/status/2022958914166563306
- https://x.com/i/status/2022647109141578123
- https://x.com/i/status/2022955866400739371
- https://www.youtube.com/watch?v=ErIeuwqmWtQ
- https://x.com/i/status/2022748657796206937
- https://x.com/i/status/2022884043604529234
- https://x.com/i/status/2022959458884681897
- https://x.com/i/status/2022912615953940906
- https://jangwook.net/en/blog/en/ai-model-rush-february-2026
- https://x.com/i/status/2022959075026509857
- https://x.com/i/status/2022906619265454429
- https://www.reddit.com/r/ChatGPT/comments/1r58cmp/just_published_brain_pulse_ai_weekly_newsletter
- https://x.com/i/status/2022712072140644808
- https://x.com/i/status/2022956694247018583
- https://llm-stats.com/llm-updates
- https://x.com/i/status/2022959509140889750
- https://x.com/i/status/2022921403981615499
- https://x.com/i/status/2022580740006027508
- https://x.com/i/status/2022826878411620840
- https://x.com/i/status/2022958309305942276
- https://x.com/i/status/2022957254719516729
- https://x.com/i/status/2022931585415417970
- https://techcrunch.com/2025/11/13/googles-sima-2-agent-uses-gemini-to-reason-and-act-in-virtual-worl
- https://news.samsung.com/us/samsung-galaxy-unpacked-february-2026-next-ai-phone-makes-life-easier
- https://x.com/i/status/2022952176440483951
- https://www.reddit.com/r/LocalLLaMA/comments/1r14bqk/i_benchmarked_the_newest_40_ai_models_feb_2026
- https://www.theverge.com/news/682769/apple-wwdc-2025-biggest-announcements-ios-26
- https://x.com/i/status/2022921377813319852
- https://x.com/i/status/2022934474023964987
- https://x.com/i/status/2022932952645083421
- https://x.com/i/status/2022921118915481714
- https://x.com/i/status/2022830340553752590
- https://x.com/i/status/2022908134721986861
- https://x.com/i/status/2022957667925610832
- https://x.com/i/status/2022959137223749953
- https://x.com/i/status/2022601990384320584
- https://www.crescendo.ai/news/latest-ai-news-and-updates
- https://x.com/i/status/2022959046618452096
- https://x.com/i/status/2022938511473610877
- https://x.com/i/status/2008642654444429470
- https://x.com/i/status/2022794836600934585
- https://x.com/i/status/2022845941280055708
- https://whatever.scalzi.com/2026/02/14/10-thoughts-on-ai-february-2026-edition
- https://x.com/i/status/2022959300012904912
- https://www.theverge.com/2024/8/15/24220378/openai-advanced-voice-mode-uncanny-valley
- https://x.com/i/status/2022959270749233431
- https://www.youtube.com/watch?v=GsHA0eAcmaI
- https://x.com/i/status/2022956130406650071
- https://x.com/i/status/2022877352251056589
- https://x.com/i/status/2022959076620042297