Zili Wang 

StepFun

Shanghai, China

Zili Wang is currently an LLM Researcher at StepFun (October 2024 - Present), focusing on large language model pretraining; in the Step3 project he is responsible for all data aspects and the code pretraining components, supervised by Xiangyu Zhang. Previously, he worked as an Algorithm Expert at INF Technology (September 2023 - September 2024), as an Algorithm Engineer at Xiaohongshu Inc. (March 2022 - September 2023) with Prof. Shusen Wang, and as a Research Assistant at The Hong Kong Polytechnic University (February 2020 - March 2022) with Prof. Wenjie Li.

Recent Projects

Step3: Cost-Effective Multimodal Intelligence

Step3 is our cutting-edge multimodal reasoning model, built on a large-scale Mixture-of-Experts architecture with 321 billion total parameters and 38 billion active parameters per token. It is designed end-to-end to minimize decoding cost while delivering top-tier performance in vision–language reasoning, mathematics, and code. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.

Links: GitHub | Project Page | Paper
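To make the parameter figures concrete, here is a back-of-the-envelope Python sketch of why activating only 38B of the 321B parameters cuts per-token decoding compute. The ~2 FLOPs-per-parameter rule of thumb is a standard approximation, not a figure from the Step3 report.

```python
# Rough decode-compute sketch for a Mixture-of-Experts model.
# The 321B total / 38B active figures come from the Step3 description above;
# the ~2 FLOPs-per-parameter-per-token rule is a common approximation,
# not a StepFun-published formula.

TOTAL_PARAMS = 321e9    # all experts combined
ACTIVE_PARAMS = 38e9    # parameters actually used for each decoded token

def decode_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2 FLOPs/param)."""
    return 2.0 * active_params

moe_cost = decode_flops_per_token(ACTIVE_PARAMS)
dense_cost = decode_flops_per_token(TOTAL_PARAMS)  # hypothetical equal-size dense model

print(f"MoE decode cost:        {moe_cost:.2e} FLOPs/token")
print(f"Equal-size dense cost:  {dense_cost:.2e} FLOPs/token")
print(f"Compute reduction:      {dense_cost / moe_cost:.1f}x")  # roughly 8.4x
```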

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

OpenCoder is an open and reproducible family of code LLMs comprising 1.5B and 8B base and chat models, supporting both English and Chinese. Trained from scratch on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, OpenCoder reaches the performance of top-tier code LLMs. We provide not only the model weights and inference code, but also the reproducible training data, the complete data-processing pipeline, rigorous ablation results, and detailed training protocols.

Links: GitHub | HuggingFace | Project Page | Paper
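For readers who want to try the chat models, the sketch below shows standard HuggingFace Transformers inference. The model ID is an assumption for illustration; check the HuggingFace link above for the exact published names.

```python
# Minimal inference sketch for an OpenCoder chat model using HuggingFace
# Transformers. The model ID is an assumption for illustration; see the
# project's HuggingFace page for the exact published names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "infly/OpenCoder-8B-Instruct"  # assumed ID; verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user",
             "content": "Write a Python function that checks whether a number is prime."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```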

INF-34B: INF's Open-Source Large Language Models

INF-34B has 34 billion parameters and a 32K context window, and is trained on roughly 3.5T well-processed tokens from an English and Chinese bilingual corpus. Compared with open-source models of comparable size, INF-34B not only delivers competitive performance on the OpenCompass evaluation, but also shows strong potential in the finance and healthcare domains. Moreover, the quantized INF-34B runs on graphics cards with 24GB of VRAM with negligible accuracy loss, which facilitates commercial applications, especially in low-resource scenarios.

Links: GitHub | HuggingFace | Tech Report
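The 24GB VRAM claim is easy to sanity-check with weight-memory arithmetic alone. The sketch below covers weights only and ignores the KV cache and runtime buffers, which grow with the 32K context; the quantization bit-width is an assumption, since the tech report linked above has the exact configuration.

```python
# Back-of-the-envelope weight-memory estimate for a 34B-parameter model at
# different precisions. Weights only; KV cache and runtime buffers (which
# grow with the 32K context window) are not included.

PARAMS = 34e9
GIB = 1024 ** 3

for bits in (16, 8, 4):
    weight_bytes = PARAMS * bits / 8
    print(f"{bits:>2}-bit weights: {weight_bytes / GIB:5.1f} GiB")

# 16-bit: ~63.3 GiB  -> needs multiple GPUs
#  8-bit: ~31.7 GiB  -> still over a 24 GB card
#  4-bit: ~15.8 GiB  -> fits 24 GB VRAM with headroom for the KV cache
```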

Selected Publications


MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu
ICLR 2024
arXiv • GitHub
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Zili Wang (project leader, corresponding author), et al.
arXiv 2024
arXiv • GitHub
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu
ACL 2024
arXiv • GitHub
RefGPT: Reference → Truthful & Customized Dialogues Generation by GPTs and for GPTs
Dongjie Yang, Ruifeng Yuan, Yuantao Fan, Yifei Yang, Zili Wang, Shusen Wang, Hai Zhao
EMNLP 2023
arXiv • GitHub
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Jianchen Wang, Sihang Jiang, Zhuozhi Xiong, Zihan Li, Qianyu He, Rui Xu, Wenhao Huang, Zili Wang, Shusen Wang, Weiguo Zheng, Hongwei Feng, Yanghua Xiao
AAAI 2024
arXiv • GitHub
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu
arXiv • GitHub
A Closer Look into Mixture-of-Experts in Large Language Models
Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu
arXiv • GitHub
Evolving Large Language Model Assistant with Long-Term Conditional Memory
Ruifeng Yuan, Shichao Sun, Zili Wang, Ziqiang Cao, Wenjie Li
arXiv • GitHub

Professional Services
