Domestic GPU leader Moore Threads Technology Co., Ltd. (688795.SH) is rapidly expanding its ecosystem. On December 20, the company held its inaugural "MUSA Developer Conference" (MDC 2025) in Beijing, where founder, chairman, and CEO Zhang Jianzhong unveiled the fruit of five years of R&D: the next-generation full-function GPU architecture, "Huagang."
During his keynote, Zhang emphasized that "full functionality" is Moore Threads' technological foundation, framing the new architecture as an evolution in computing that lets a single GPU process a wide range of data types and formats. "Huagang" adopts a new instruction set that raises compute density by 50% and energy efficiency tenfold over its predecessor, with mass production slated for next year. Notably, it supports precisions from FP4 through FP64 and integrates a first-generation AI-generated rendering (AGR) pipeline alongside second-generation hardware ray-tracing acceleration.
Built on this architecture, Moore Threads announced two core chips: "Huashan," targeting AI training, inference, and converged supercomputing-AI ("hyper-intelligent fusion") workloads, and "Lushan," optimized for high-performance graphics rendering. "Huashan" features an asynchronous programming model with efficient thread synchronization and warp specialization, plus full-precision MMA (matrix multiply-accumulate) and MTFP8/6/4 mixed low-precision computing. "Lushan," meanwhile, improves task scheduling, delivering a 64x gain in AI performance and a 16x gain in geometry processing while fully supporting DirectX 12 Ultimate.
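Moore Threads has not published the details of its MTFP8/6/4 formats, but the general idea behind mixed low-precision MMA is well established: quantize the matrix operands to a compact format, then accumulate their products at higher precision. Below is a minimal NumPy sketch of that trade-off; the symmetric integer grid and bit-widths are illustrative stand-ins, not the company's actual formats.

```python
import numpy as np

def fake_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric fake-quantization to a 2^(bits-1) grid (illustrative only).

    Real MTFP8/6/4 formats are floating-point; this integer grid merely
    demonstrates the precision/accumulation trade-off.
    """
    levels = 2 ** (bits - 1) - 1        # e.g. 7 representable magnitudes at 4-bit
    scale = np.abs(x).max() / levels    # per-tensor scale factor
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale                    # dequantize back to float

def mixed_precision_mma(a: np.ndarray, b: np.ndarray, bits: int) -> np.ndarray:
    """Multiply low-precision operands, accumulate in FP32 (the 'MMA' idea)."""
    a_q = fake_quantize(a, bits).astype(np.float32)
    b_q = fake_quantize(b, bits).astype(np.float32)
    return a_q @ b_q                    # accumulation happens at full FP32

rng = np.random.default_rng(0)
a, b = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))
ref = a @ b
for bits in (8, 6, 4):
    err = np.abs(mixed_precision_mma(a, b, bits) - ref).mean()
    print(f"{bits}-bit operands -> mean abs error {err:.4f}")
```

Running the sketch shows why such formats come in tiers: each drop in operand width roughly doubles throughput on real hardware while the FP32 accumulator keeps the error growth contained.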
To bridge the hardware with large-scale applications, Moore Threads launched its "Kuae" 10,000-GPU AI cluster, which delivers 10 exaFLOPS of floating-point performance. In training benchmarks, the cluster reached 60% MFU (Model FLOPs Utilization) on dense models and 40% on MoE models, with effective training time above 90%. On inference, a key market focus, the company showcased a collaboration with SiliconFlow: its MTT S5000 GPUs sustained 4,000 tokens/s in prefill and 1,000 tokens/s in decode on the 671B-parameter DeepSeek R1 model, marking a breakthrough in system-level optimization for large-parameter models.
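For readers unfamiliar with the metric, MFU measures how much of a cluster's peak arithmetic throughput goes to useful model computation. A back-of-the-envelope sketch follows, using the standard estimate of roughly 6 FLOPs per parameter per training token; the model size and throughput here are purely hypothetical, since the article does not disclose Moore Threads' benchmark configuration.

```python
# Model FLOPs Utilization (MFU): achieved model FLOPs / cluster peak FLOPs.
# For transformer training, achieved FLOPs ~= 6 * parameters * tokens/sec.
def training_mfu(params: float, tokens_per_s: float, peak_flops: float) -> float:
    return 6.0 * params * tokens_per_s / peak_flops

# Hypothetical figures, NOT Moore Threads' measurements: on a 10-exaFLOPS
# cluster, a 671e9-parameter model would need ~1.49M tokens/s to hit 60% MFU.
peak = 10e18          # 10 exaFLOPS, per the article
params = 671e9        # DeepSeek-R1-scale parameter count, for illustration
target_mfu = 0.60
tokens_per_s = target_mfu * peak / (6.0 * params)
print(f"tokens/s needed for 60% MFU: {tokens_per_s:,.0f}")
```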
On the software front, Moore Threads upgraded its MUSA stack to version 5.0, with across-the-board performance improvements. The muDNN library now runs GEMM and FlashAttention kernels at over 98% efficiency, and communication efficiency reaches 97%. The company also outlined plans to open-source core components such as its acceleration libraries and system frameworks, and introduced the intermediate language MTX and the programming language muLang to lower the barrier for developers.
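Kernel-library "efficiency" figures like the 98% quoted above are conventionally computed as achieved FLOPs divided by the device's theoretical peak. A rough sketch of such a measurement is below; it uses stock PyTorch rather than the MUSA toolchain, and PEAK_TFLOPS is a placeholder you would set to your own device's rated peak.

```python
import time
import torch

PEAK_TFLOPS = 100.0  # placeholder: your device's theoretical peak, not a real spec

def gemm_efficiency(n: int = 4096, iters: int = 20) -> float:
    """Time an n x n x n GEMM and report achieved/peak throughput."""
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if dev == "cuda" else torch.float32
    a = torch.randn(n, n, dtype=dtype, device=dev)
    b = torch.randn(n, n, dtype=dtype, device=dev)
    torch.matmul(a, b)                  # warm-up / lazy initialization
    if dev == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    if dev == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - t0) / iters
    achieved_tflops = 2 * n**3 / elapsed / 1e12   # a GEMM costs ~2*n^3 FLOPs
    return achieved_tflops / PEAK_TFLOPS

print(f"measured efficiency vs. peak: {gemm_efficiency():.1%}")
```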
In a surprise to the market, Moore Threads also entered personal AI hardware with the MTT AIBOOK laptop (¥9,999 for the 32GB RAM + 1TB storage configuration), launching January 10, 2026. Powered by the company's "Yangtze" SoC, rated at 50 TOPS of AI performance, the device ships with an AI agent and a 2D avatar, "Xiaomai," supports instant digital-human generation, and comes preloaded with the Qwen3-8B model. With support for Windows, Linux, Android, and domestic operating systems, Moore Threads aims to extend its MUSA ecosystem from the cloud to the desktop.
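For context on what "preloading Qwen3-8B" means, the sketch below runs the openly released checkpoint with Hugging Face transformers. This is not the AIBOOK's bundled runtime, which Moore Threads has not documented publicly; it only illustrates local use of the same model.

```python
# Generic local inference with the open Qwen3-8B checkpoint via Hugging Face
# transformers -- NOT the AIBOOK's own (undocumented) software stack.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map needs `accelerate`
)

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```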
Zheng Weimin, an academician of the Chinese Academy of Engineering, stressed that "sovereign AI" hinges on independent computing power, algorithms, and ecosystems. While acknowledging the difficulty of building domestic 10,000-GPU systems, he emphasized that developer-friendly environments are what sustain long-term growth.
In the markets, Moore Threads' shares closed at ¥664.10 on December 19, down 5.9% on the day and 29.4% from their December 11 peak. Even so, the stock remains 481% above its IPO price, for a market capitalization of ¥312.15 billion.
As global computing shifts from raw scale to inference efficiency and ecosystem maturity, Moore Threads' "Huagang" architecture and full-stack "chip-edge-device-cloud" strategy signal its transition from hardware vendor to platform-level infrastructure provider. Benchmark results in cluster efficiency and DeepSeek inference bolster its position in the capital-intensive computing race.