Jiajun Fan

I am a Computer Science Ph.D. student at UIUC, advised by Prof. Ge Liu. My research focuses on autonomous RL post-training for large generative models — making diffusion/flow models and multi-modal reasoning LLMs continuously self-improve with less and less human intervention. Previously, I pushed RL to superhuman performance: breaking 24 Atari world records and outperforming Agent57 with 500× less data.

🎓 Seeking research internship — Summer 2026.  [CV]  [Scholar]  [Email]
📰 Latest News
  • Jan 2026 · 2 papers accepted at ICLR 2026 — CESAR & SP-VLA. See you in Rio 🇧🇷
  • Sep 2025 · 2 papers accepted at NeurIPS 2025 — ADRPO & VarCon. See you in San Diego 🌊
  • Jun 2025 · Paper accepted at IEEE TPAMI: PRANCE.
  • Feb 2025 · Paper accepted at ICLR 2025: ORW-CFM-W2 (Flow Matching self-evolution).
  • Jan 2025 · Service: area reviewer for ICLR 2025, NeurIPS 2024, CVPR 2026, AAAI 2025, AISTATS 2025.
  • Aug 2024 🎓 Started Ph.D. at UIUC CS (GPA 4.0/4.0).
  • Jan 2023 · Oral (Top 5%): LBC at ICLR 2023, ranked 5/4176 — broke 24 Atari world records.
📄 Selected Publications

* = first/co-first author  ·  Full list on Google Scholar  /  Publications page

ICLR 2026
J. Fan*, R. Ren, J. Li, R. Pandey, P.G. Shivakumar, Y. Gu, A. Gandhe, G. Liu, I. Bulyko
CESAR: process-reward RL (GRPO) that resolves test-time inverse scaling in audio LLMs — without proper guidance, models produce hallucinatory reasoning; CESAR fixes that.
🏆 SOTA on MMAU Test-mini · Outperforms Gemini 2.5 Pro & GPT-4o Audio
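For readers unfamiliar with GRPO, here is a minimal sketch of its group-relative advantage, with a process-reward term folded in as the entry above describes; the function name, the beta weighting, and the toy numbers are illustrative assumptions, not CESAR's actual implementation.

```python
import torch

def grpo_advantages(outcome_r, process_r, beta=0.5, eps=1e-8):
    """Group-relative advantages for G sampled responses to one prompt.

    outcome_r, process_r: (G,) outcome and process rewards per response.
    beta: illustrative weight on the process-reward term (assumed value).
    """
    r = outcome_r + beta * process_r         # combined scalar reward
    return (r - r.mean()) / (r.std() + eps)  # normalize within the group

# Toy usage: 4 sampled answers to one audio question; the normalized
# advantages then feed the clipped policy-gradient update.
adv = grpo_advantages(torch.tensor([1.0, 0.0, 1.0, 0.0]),
                      torch.tensor([0.8, 0.2, 0.4, 0.9]))
```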
ICLR 2026
Y. Li, Y. Meng, Z. Sun, K. Ji, C. Tang, J. Fan, X. Ma, S.-T. Xia, Z. Wang, W. Zhu
SP-VLA: action-aware model scheduling + spatio-semantic token pruning for VLA acceleration.
⚡ 1.5× lossless speedup (LIBERO) · 2.4× speedup (SimplerEnv)
NeurIPS 2025
J. Fan*, T. Wei, C. Cheng, Y. Chen, G. Liu
ADRPO: sample-level adaptive KL — high-value samples get more freedom, poor samples get stronger constraint. Plug-and-play on top of any RLHF method.
🚀 2B SD3 surpasses 4.8B & 12B models · Generalizes to LLMs & audio reasoning
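A minimal sketch of the sample-level adaptive-KL idea stated above, assuming a simple sigmoid schedule; the schedule, names, and constants are mine, not the paper's exact rule.

```python
import torch

def adaptive_kl_coeff(advantages, base_beta=0.1, sensitivity=1.0):
    # Monotone-decreasing in advantage: high-value samples get a small
    # beta (freedom to move off the reference policy), poor samples a
    # large one. The 2x-sigmoid schedule is an illustrative choice.
    return 2.0 * base_beta * torch.sigmoid(-sensitivity * advantages)

def adaptive_rlhf_loss(logp, ref_logp, advantages):
    beta = adaptive_kl_coeff(advantages)  # per-sample KL coefficient
    kl = logp - ref_logp                  # per-sample KL estimate
    return (-advantages * logp + beta * kl).mean()
```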
NeurIPS 2025
Z. Wang, J. Fan, T. Nguyen, H. Ji, G. Liu
VarCon: supervised contrastive learning as variational inference — posterior-weighted ELBO replaces pairwise comparisons.
📊 SOTA 79.36% Top-1 on ImageNet-1K (ResNet-50)
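To make the variational reading concrete, this is the generic shape such an objective takes (an illustrative ELBO over the class variable, not VarCon's exact derivation), with z the embedding of input x:

```latex
\mathcal{L}(x) \;=\; \mathbb{E}_{q(c \mid z)}\!\big[\log p(z \mid c)\big] \;-\; \mathrm{KL}\big(q(c \mid z) \,\|\, p(c)\big)
```

The posterior q(c | z) weights each class's likelihood term, replacing the explicit enumeration of positive/negative pairs in standard supervised contrastive losses.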
ICLR 2025
J. Fan*, S. Shen, C. Cheng, Y. Chen, C. Liang, G. Liu
ORW-CFM-W2: first online RLHF for flow matching — no human data, no likelihood, no collapse. W2 regularization keeps generation diverse.
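A minimal sketch of the recipe the description suggests, assuming reward-weighted conditional flow matching plus a W2-style proximity penalty on the velocity field; shapes, names, and the lam value are assumptions, not the paper's objective.

```python
import torch

def orw_cfm_loss(v_theta, v_ref, x_t, t, u_target, reward, lam=0.1):
    """v_theta: fine-tuned velocity net; v_ref: frozen reference net.
    x_t, t: noisy samples and times; u_target: CFM regression target;
    reward: (B,) nonnegative per-sample weights from the reward model."""
    v = v_theta(x_t, t)
    # Online reward-weighted flow matching: no human data, no likelihood.
    cfm = (reward * ((v - u_target) ** 2).mean(dim=-1)).mean()
    with torch.no_grad():
        v_r = v_ref(x_t, t)
    # W2-style penalty: staying near the reference field preserves diversity.
    return cfm + lam * ((v - v_r) ** 2).mean()
```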
Preprint 2025
J. Fan*, C. Cheng, S. Shen, X. Zhou, G. Liu  ·  Under Review
AC-Flow: actor-critic with intermediate feedback for flow matching — reward shaping + dual-stability + Wasserstein regularization. Robust fine-tuning on SD3 without collapse.
TPAMI 2025
Y. Li, C. Tang, Y. Meng, J. Fan, Z. Chai, X. Ma, Z. Wang, W. Zhu
PRANCE: joint token optimization and structural channel pruning for adaptive ViT inference.
ICLR 2023 · Oral
J. Fan*, Y. Zhuang, Y. Liu, J. Hao, B. Wang, J. Zhu, H. Wang, S.-T. Xia
LBC: learnable hybrid behavior mapping + bandit meta-controller. Unified framework for exploration control in deep RL.
🏅 Ranked 5/4176 · 10,077% mean human score · 24 world records · 500× data efficiency
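A toy sketch of the bandit meta-controller idea (plain UCB1 over a discrete set of candidate behavior mappings; LBC's actual space and bandit are richer):

```python
import math

class UCBMetaController:
    """Chooses which candidate behavior mapping generates the next episode."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # episodes per behavior
        self.values = [0.0] * n_arms  # running mean return per behavior

    def select(self):
        t = sum(self.counts) + 1
        def ucb(a):
            if self.counts[a] == 0:
                return float("inf")   # try every behavior at least once
            return self.values[a] + math.sqrt(2 * math.log(t) / self.counts[a])
        return max(range(len(self.counts)), key=ucb)

    def update(self, arm, episode_return):
        self.counts[arm] += 1
        self.values[arm] += (episode_return - self.values[arm]) / self.counts[arm]
```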
ICML 2022
J. Fan*, C. Xiao
GDI: optimizing the data distribution is the key to superhuman RL efficiency. Unified framework for diverse RL algorithms.
📈 Agent57 beaten with 500× less data & 2× avg performance
🔬 Research Interests
🌊 RL Post-Training for Generative Models
Collapse-free online RLHF for flow/diffusion models. No human-collected data needed — the model rewards itself (ORW-CFM-W2, ADRPO, AC-Flow).

🧠 Reasoning in Multimodal LLMs
Process-reward RL for audio/visual LLMs — fixing test-time inverse scaling so reasoning actually helps, not hurts (CESAR).

🎮 Superhuman-Level Deep RL
Sample-efficient RL that exceeds human performance. Broke 24 Atari world records with 500× less data than prior SOTA (LBC, GDI).
⚡ Impact at a Glance
  • 9+ top-venue papers (ICLR · NeurIPS · ICML · TPAMI)
  • 24 Atari world records broken by LBC (ICLR'23 Oral)
  • 500× more data-efficient than Agent57
  • SOTA on MMAU audio reasoning, beating Gemini 2.5 Pro
  • 200+ Google Scholar citations
  • 4.0/4.0 GPA, UIUC Ph.D. in Computer Science
💡 Research Vision

Making AI Systems That Improve Themselves

Today's AI is frozen after training. I work to change that: AI that never stops getting better, with less and less human scaffolding at each step. The roadmap: eliminate human-collected data (ORW-CFM-W2) → remove manual KL tuning (ADRPO) → drop hand-crafted process rewards (CESAR) → fully autonomous self-critique (ongoing).

🏅 Awards & Academic Service

🎖 Selected Awards

  • National Scholarship ×2, Top 1% — Nankai Univ.
  • Ranked 1st / 83 in major — Nankai Univ.
  • Outstanding Graduates (Top 1%) — Nankai Univ.
  • Tang Lixin Scholarship (Top 1%)
  • GPA 4.0/4.0 — UIUC Ph.D.
  • GPA 3.97/4.0, Top 1.3% — Tsinghua M.Eng.

🔍 Reviewer

  • ICLR 2024–2026
  • NeurIPS 2022–2025
  • ICML 2023–2026
  • CVPR 2026
  • AAAI 2025 · AISTATS 2025 · KDD 2024
📬 Contact

Happy to discuss research, internships, or collaborations. Best reached by email.
📧 jiajunf3@illinois.edu  ·  🏛 Siebel Center for CS, UIUC  ·  CV (PDF)