I am a second-year master’s student at Beihang University in the School of Computer Science and Engineering, working with Dr. Jian Yang and Prof. Zhoujun Li.

My research centres on Natural Language Processing (NLP) and Software Engineering (SE), with an emphasis on large language models (LLMs). I am particularly interested in the potential of these models and their role in advancing Artificial General Intelligence (AGI). I am currently exploring the following areas:

  • Code LLM: My research focuses on integrating executable code with LLMs or large multimodal models. The ultimate goal is to transform LLMs into compilers of natural language, enabling them to execute specific tasks through code or agents.
  • Code Self-Evolution: I am exploring the potential for code to self-evolve using LLMs. The ultimate aim is to enable software systems to autonomously update, optimize, and repair themselves without human intervention, thereby enhancing the efficiency and sustainability of software maintenance and development.

Looking ahead, I am actively seeking a Ph.D. position in NLP, SE, or LLMs, starting in Fall 2026. If you are interested in my research or are open to potential collaboration, please feel free to contact me.


🔥 News

  • 2025.04: 🎉🎉 BitsAI-CR paper was accepted by FSE 2025.
  • 2025.01: 🎉🎉 McEval paper was accepted by ICLR 2025.
  • 2024.12: 🎉🎉 xCoT paper was accepted by AAAI 2025.
  • 2024.09: 🎉🎉 RoleAgent paper was accepted by NeurIPS 2024.
  • 2024.05: 🎉🎉 UniCoder paper was accepted by ACL 2024.

📝 Publications & Papers in Preparation

BitsAI-CR: Automated Code Review via LLM in Practice.
Tao Sun, Jian Xu, Yuanpeng Li, Zhao Yan, Ge Zhang, Lintao Xie, Lu Geng, Zheng Wang, Yueyan Chen, Qin Lin, Wenbo Duan, Kaixin Sui.
FSE 2025 Industry Track | arXiv 2501.15134 | Covered by Synced / 机器之心, a top AI media outlet in China


UniCoder: Scaling Code Large Language Model via Universal Code.
Tao Sun*, Linzheng Chai*, Jian Yang*, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li (*=equal contribution).
ACL 2024 Main Conference | arXiv 2406.16441


BitsAI-P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark.
Tao Sun, Enhao Pan, Zhengkai Yang, Kaixin Sui, Jiajun Shi, Xianfu Cheng, Ge Zhang, Wenhao Huang, Tongliang Li, Jian Yang, Zhoujun Li.
ICLR 2025 Workshop | Under review at NeurIPS 2025


RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Script.
Jiaheng Liu*, Zehao Ni*, Haoran Que*, Tao Sun, Zekun Wang, Jian Yang, Jiakai Wang, Hongcheng Guo, Zhongyuan Peng, Ge Zhang, Jiayi Tian, Xingyuan Bu, Ke Xu, Wenge Rong, Junran Peng, Zhaoxiang Zhang (*=equal contribution).
NeurIPS 2024 Poster


SVIPTR: Fast and Efficient Scene Text Recognition with Vision Permutable Extractor.
Xianfu Cheng, Weixiao Zhou, Xiang Li, Jian Yang, Hang Zhang, Tao Sun, Wei Zhang, Yuying Mai, Tongliang Li, Xiaoming Chen, Zhoujun Li.
CIKM 2024


Neural Distinguishers on TinyJAMBU-128 and GIFT-64.
Tao Sun, Dongsu Shen, Saiqin Long, Qingyong Deng, Shiguo Wang.
ICONIP 2022 Oral


McEval: Massively Multilingual Code Evaluation.
Linzheng Chai*, Shukai Liu*, Jian Yang*, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liqun Yang, Sufeng Duan, Zhoujun Li (*=equal contribution).
ICLR 2025 | arXiv 2406.07436 | Covered by PaperWeekly, a top AI media outlet in China


xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning.
Linzheng Chai, Jian Yang, Tao Sun, Hongcheng Guo, Jiaheng Liu, Bing Wang, Xiannian Liang, Jiaqi Bai, Tongliang Li, Qiyao Peng, Zhoujun Li.
AAAI 2025 | arXiv 2401.07037


REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models.
Yinghao Zhu*, Changyu Ren*, Shiyun Xie, Shukai Liu, Hangyuan Ji, Zixiang Wang, Tao Sun, Long He, Zhoujun Li, Xi Zhu, Chengwei Pan (*=equal contribution).
arXiv 2402.07016


XFormParser: Semi-structured Form Parser with Multimodal and Multilingual Knowledge.
Xianfu Cheng*, Hang Zhang*, Jian Yang, Xiang Li, Weixiao Zhou, Kui Wu, Fei Liu, Wei Zhang, Tao Sun, Tongliang Li, Zhoujun Li (*=equal contribution).
arXiv 2405.17336


💻 Internships

  • 2024.09 — Present, ByteDance Inc., Beijing, China.
    • Position: Research Intern (Full-time, Onsite, Paid)
    • Participated in the development of the programming assistant for Cici (豆包编程助手, the Doubao programming assistant), China’s leading LLM application.
    • Spearheaded the development of code intent recognition systems and code models for Cici (豆包, Doubao).
    • Contributed to building the internal code review system.
  • 2024.03 - Present, Meituan Inc., China.
    • Position: Research Intern (Full-time, Onsite, Paid)
    • Duty: Conducted research and testing on Meituan’s in-house large-scale models, focusing primarily on improving the code expert model’s repository-level code completion and repair. Explored applications of large models in software engineering, including long-text handling, code planning, and testing.

💻 Projects

  • Collaborator: Windrecorder (GitHub 3215 Stars) - A memory search app that records everything on your screen, to let you rewind what you have seen, query through OCR text or image description, and get activity statistics.
  • Owner: myRime (GitHub 123 Stars) - A customized input method built on the Rime engine, suitable for use with Flypy Double Pinyin (Xiaohe Shuangpin), Luna Pinyin, IBus, Fcitx, Windows, and macOS.

🎖 Honors and Awards

  • 2024 China National Scholarship
  • 2024 The First Prize Scholarship (Awarded by Beihang University)
  • 2023 The Second Prize Scholarship (Awarded by Beihang University)
  • 2021 Pacemaker to Merit Student (Awarded by Xiangtan University, Top 2‰ in School)
  • 2022 The Jingdong Scholarship (Awarded by Jingdong Inc.)
  • 2020 & 2021 & 2022 The First Prize Scholarship (Awarded by Xiangtan University, Top 7% in School)
  • 2021.10 Bronze Medal of 2021 ICPC Asia Regional Contest (Awarded by ICPC Foundation)
  • 2021.06 Silver Medal of 2021 CCPC National Invitational Contest (Awarded by Committee for CCPC)
  • 2021.11 Bronze Medal of 2021 CCPC Guilin Site Contest (Awarded by Committee for CCPC)
  • 2021.10 First Prize in the China Undergraduate Mathematical Contest in Modeling, Hunan Division (Awarded by CSIAM)

📖 Education

  • 2023.09 - 2026.01 (expected), Beihang University (BUAA), M.S. in Computer Technology, School of Computer Science and Engineering
  • 2019.09 - 2023.06, Xiangtan University (XTU), B.E. in Computer Science and Technology, School of Computer Science, GPA: 3.695 / 4.0, Rank: 2 / 88

💬 Service

  • Teaching Assistant: Algorithm Training Team for ACM-ICPC, School of Computer Science, Xiangtan University.
  • Reviewer: CIKM 2024, ICONIP 2024