Zili Wang
Shanghai, China
Recent Projects
Step3: Cost-Effective Multimodal Intelligence
Step3 is our cutting-edge multimodal reasoning model, built on a Mixture-of-Experts architecture with 321 billion total parameters and 38 billion activated per token. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision-language reasoning, mathematics, and code. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.
Links: GitHub | Project Page | Paper
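For intuition about the decoding-cost claim above, a back-of-the-envelope sketch (the 2-FLOPs-per-parameter-per-token rule of thumb and the dense baseline are illustrative assumptions, not figures from the Step3 report):

```python
# Back-of-the-envelope decoding cost: sparse MoE vs. a hypothetical dense model
# of the same total size. Assumption: ~2 FLOPs per parameter per generated token.
TOTAL_PARAMS = 321e9    # Step3 total parameters
ACTIVE_PARAMS = 38e9    # parameters activated per generated token

flops_moe = 2 * ACTIVE_PARAMS     # ~7.6e10 FLOPs per token
flops_dense = 2 * TOTAL_PARAMS    # ~6.4e11 FLOPs per token for a dense equivalent

print(f"MoE decode:   {flops_moe:.2e} FLOPs/token")
print(f"Dense decode: {flops_dense:.2e} FLOPs/token")
print(f"Per-token compute ratio: {flops_dense / flops_moe:.1f}x")
```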
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
OpenCoder is an open and reproducible family of code LLMs that includes 1.5B and 8B base and chat models and supports both English and Chinese. Trained from scratch on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, it reaches the performance of top-tier code LLMs. We provide not only the model weights and inference code, but also the reproducible training data, the complete data processing pipeline, rigorous ablation results, and detailed training protocols.
Links: GitHub | HuggingFace | Project Page | Paper
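A minimal sketch of the token budget implied by the mixture described above (the 2.5T total and 90/10 split are from the description; the source names and sampling helper are hypothetical):

```python
import random

# Pretraining mixture as described above: 2.5T tokens,
# 90% raw code and 10% code-related web data.
TOTAL_TOKENS = 2.5e12
MIXTURE = {
    "raw_code": 0.90,           # cleaned source-code files
    "code_related_web": 0.10,   # code-related web pages (docs, tutorials, Q&A)
}

# Token budget per source.
budget = {name: frac * TOTAL_TOKENS for name, frac in MIXTURE.items()}
print(budget)  # {'raw_code': 2.25e+12, 'code_related_web': 2.5e+11}

def sample_source(rng: random.Random) -> str:
    """Pick which corpus the next training document is drawn from."""
    names = list(MIXTURE)
    return rng.choices(names, weights=[MIXTURE[n] for n in names], k=1)[0]

print(sample_source(random.Random(0)))
```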
INF-34B: INF's Open-Source Large Language Models
INF-34B has 34 billion parameters and a 32K context window, and is trained on about 3.5T well-processed tokens from an English and Chinese bilingual corpus. Compared with open-source models of similar size, INF-34B not only delivers competitive results on the OpenCompass evaluation, but also shows strong potential in the finance and healthcare domains. In addition, the quantized INF-34B runs on graphics cards with 24 GB of VRAM with negligible accuracy loss, which facilitates commercial applications, especially in low-resource scenarios.
Links: GitHub | HuggingFace | Tech Report
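A rough memory estimate showing why a quantized 34B model fits on a 24 GB card (a sketch assuming standard 8-bit/4-bit weight quantization and ignoring activations and KV cache; the report's exact quantization scheme is not restated here):

```python
# Approximate weight memory for a 34B-parameter model at different precisions.
PARAMS = 34e9
GIB = 1024 ** 3

for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{precision}: ~{PARAMS * bytes_per_param / GIB:.0f} GiB of weights")

# fp16: ~63 GiB (needs multiple GPUs), int8: ~32 GiB,
# int4: ~16 GiB, leaving headroom for activations and KV cache on a 24 GB GPU.
```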
Selected Publications
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu. ICLR 2024 • arXiv • GitHub
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (Project Leader, Corresponding Author)
Zili Wang, et al. arXiv 2024 • arXiv • GitHub
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu. ACL 2024 • arXiv • GitHub
RefGPT: Reference → Truthful & Customized Dialogues Generation by GPTs and for GPTs
Dongjie Yang, Ruifeng Yuan, YuanTao Fan, YiFei Yang, Zili Wang, Shusen Wang, Hai Zhao. EMNLP 2023 • arXiv • GitHub
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Jianchen Wang, Sihang Jiang, Zhuozhi Xiong, Zihan Li, Qianyu He, Rui Xu, Wenhao Huang, Zili Wang, Shusen Wang, Weiguo Zheng, Hongwei Feng, Yanghua Xiao. AAAI 2024 • arXiv • GitHub
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu. arXiv • GitHub
A Closer Look into Mixture-of-Experts in Large Language Models
Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu. arXiv • GitHub
Evolving Large Language Model Assistant with Long-Term Conditional Memory
Ruifeng Yuan, Shichao Sun, Zili Wang, Ziqiang Cao, Wenjie Li. arXiv • GitHub