I am a Computer Science Ph.D. student at UIUC, advised by Prof. Ge Liu. My research focuses on autonomous RL post-training for large generative models — making diffusion/flow models and multi-modal reasoning LLMs continuously self-improve with less and less human intervention. Previously, I pushed RL to superhuman performance: breaking 24 Atari world records and outperforming Agent57 with 500× less data.
- Jan 2026 · 2 papers accepted at ICLR 2026 — CESAR & SP-VLA. See you in Rio 🇧🇷
- Sep 2025 · 2 papers accepted at NeurIPS 2025 — ADRPO & VarCon. See you in San Diego 🌊
- Jun 2025 · Paper accepted at IEEE TPAMI: PRANCE.
- Feb 2025 · Paper accepted at ICLR 2025: ORW-CFM-W2 (Flow Matching self-evolution).
- Jan 2025 · Area reviewer: ICLR 2025, NeurIPS 2024, CVPR 2026, AAAI 2025, AISTATS 2025.
- Aug 2024 · 🎓 Started Ph.D. at UIUC CS (GPA 4.0/4.0).
- Jan 2023 · Oral (Top 5%): LBC at ICLR 2023, ranked 5/4176 — broke 24 Atari world records.
* = first/co-first author · Full list on Google Scholar / Publications page
ICLR 2026
J. Fan*, R. Ren, J. Li, R. Pandey, P.G. Shivakumar, Y. Gu, A. Gandhe, G. Liu, I. Bulyko
CESAR: process-reward RL (GRPO) resolving test-time inverse scaling in Audio LLMs — without proper guidance, models produce hallucinatory reasoning; CESAR's process rewards fix that.
🏆 SOTA on MMAU Test-mini · Outperforms Gemini 2.5 Pro & GPT-4o Audio
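For the curious, a minimal sketch of the group-relative advantage at the core of GRPO-style process-reward RL (the helper name is mine, not CESAR's API):

```python
# Sketch of a GRPO-style group-relative advantage (illustrative, not CESAR's code).
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize each response's (process) reward against the group of
    responses sampled for the same prompt; no learned value network needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# e.g., process rewards scored over four sampled reasoning traces
print(group_relative_advantages(np.array([0.9, 0.2, 0.5, 0.7])))
```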
ICLR 2026
Y. Li, Y. Meng, Z. Sun, K. Ji, C. Tang, J. Fan, X. Ma, S.-T. Xia, Z. Wang, W. Zhu
SP-VLA: action-aware model scheduling + spatio-semantic token pruning for VLA acceleration.
⚡ 1.5× lossless speedup (LIBERO) · 2.4× speedup (SimplerEnv)
NeurIPS 2025
J. Fan*, T. Wei, C. Cheng, Y. Chen, G. Liu
ADRPO: sample-level adaptive KL — high-value samples get more freedom, poor samples get stronger constraint. Plug-and-play on top of any RLHF method.
🚀 2B SD3 surpasses 4.8B & 12B models · Generalizes to LLMs & audio reasoning
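A hedged sketch of the sample-level adaptive-KL idea (my paraphrase; the schedule below is illustrative, not the paper's exact form):

```python
# Illustrative sample-level adaptive KL for RLHF (names and schedule are mine).
import torch

def adaptive_kl_coeffs(advantages: torch.Tensor,
                       beta_min: float = 0.01,
                       beta_max: float = 0.2) -> torch.Tensor:
    """One KL coefficient per sample: high-advantage samples get beta_min
    (freedom to leave the reference), low-advantage samples get beta_max."""
    score = torch.sigmoid(advantages)            # monotone in advantage, in (0, 1)
    return beta_max - (beta_max - beta_min) * score

def rlhf_loss(pg_loss: torch.Tensor,             # per-sample policy-gradient loss
              kl_to_ref: torch.Tensor,           # per-sample KL to reference model
              advantages: torch.Tensor) -> torch.Tensor:
    beta = adaptive_kl_coeffs(advantages)        # replaces one global KL knob
    return (pg_loss + beta * kl_to_ref).mean()
```

The point is that the KL strength becomes a per-sample function of value rather than a single hand-tuned hyperparameter, which is what makes it plug-and-play.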
NeurIPS 2025
Z. Wang, J. Fan, T. Nguyen, H. Ji, G. Liu
VarCon: supervised contrastive learning as variational inference — posterior-weighted ELBO replaces pairwise comparisons.
📊 SOTA 79.36% Top-1 on ImageNet-1K (ResNet-50)
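Schematically (my notation, not the paper's), the objective swaps pairwise contrastive terms for a class-posterior-weighted ELBO:

```latex
% Schematic posterior-weighted ELBO for an embedding z with class latent c.
% q(c|z) is the approximate posterior, p(c) the class prior. Notation is mine.
\mathcal{L}(z) \;=\; \mathbb{E}_{q(c \mid z)}\!\left[ \log p(z \mid c) \right]
\;-\; \mathrm{KL}\!\left( q(c \mid z) \,\|\, p(c) \right)
```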
ICLR 2025
J. Fan*, S. Shen, C. Cheng, Y. Chen, C. Liang, G. Liu
ORW-CFM-W2: first online RLHF for flow matching — no human data, no likelihood, no collapse. W2 regularization keeps generation diverse.
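A minimal sketch of the training signal, assuming a reward-weighted CFM loss plus a velocity-field surrogate for the W2 penalty (names and weighting are illustrative):

```python
# Hedged sketch of reward-weighted flow matching with a W2-style penalty
# (my reading of ORW-CFM-W2; the weighting and surrogate are illustrative).
import torch

def orw_cfm_w2_loss(v_theta, v_ref, x_t, t, u_target, reward, lam: float = 0.1):
    v = v_theta(x_t, t)                              # fine-tuned velocity field
    w = torch.softmax(reward, dim=0)                 # up-weight high-reward samples
    cfm = (w * ((v - u_target) ** 2).mean(dim=-1)).sum()
    # Penalizing drift from the frozen reference velocity field acts as a
    # tractable stand-in for a Wasserstein-2 constraint, discouraging collapse.
    w2 = ((v - v_ref(x_t, t)) ** 2).mean()
    return cfm + lam * w2
```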
Preprint 2025
J. Fan*, C. Cheng, S. Shen, X. Zhou, G. Liu · Under Review
AC-Flow: actor-critic with intermediate feedback for flow matching — reward shaping + dual-stability + Wasserstein regularization. Robust fine-tuning on SD3 without collapse.
TPAMI 2025
Y. Li, C. Tang, Y. Meng, J. Fan, Z. Chai, X. Ma, Z. Wang, W. Zhu · IEEE TPAMI
ICLR 2023 · Oral
J. Fan*, Y. Zhuang, Y. Liu, J. Hao, B. Wang, J. Zhu, H. Wang, S.-T. Xia
LBC: learnable hybrid behavior mapping + bandit meta-controller. Unified framework for exploration control in deep RL.
🏅 Ranked 5/4176 · 10,077% mean human score · 24 world records · 500× data efficiency
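To give the flavor of the meta-controller, here is a toy UCB bandit over a family of behavior policies (LBC's actual behavior mapping and bandit are more elaborate):

```python
# Toy bandit meta-controller choosing among behavior policies by episodic return
# (illustrative of LBC's exploration control, not the paper's exact algorithm).
import math

class UCBMetaController:
    def __init__(self, n_behaviors: int, c: float = 2.0):
        self.counts = [0] * n_behaviors
        self.values = [0.0] * n_behaviors
        self.c, self.t = c, 0

    def select(self) -> int:
        """UCB1: try every behavior once, then balance mean return vs. uncertainty."""
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        return max(range(len(self.counts)),
                   key=lambda i: self.values[i]
                   + self.c * math.sqrt(math.log(self.t) / self.counts[i]))

    def update(self, i: int, episode_return: float) -> None:
        """Incremental mean of returns serves as the bandit reward estimate."""
        self.counts[i] += 1
        self.values[i] += (episode_return - self.values[i]) / self.counts[i]
```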
ICML 2022
J. Fan*, C. Xiao
GDI: optimizing the data distribution is the key to superhuman RL efficiency. Unified framework for diverse RL algorithms.
📈 Agent57 beaten with 500× less data & 2× avg performance
🌊 RL Post-Training for Generative Models
Collapse-free online RLHF for flow/diffusion models. No human-collected data needed — the model rewards itself (ORW-CFM-W2, ADRPO, AC-Flow).
🧠 Reasoning in Multimodal LLMs
Process-reward RL for audio/visual LLMs — fixing test-time inverse scaling so reasoning actually helps, not hurts (CESAR).
🎮 Superhuman-Level Deep RL
Sample-efficient RL that exceeds human performance. Broke 24 Atari world records with 500× less data than prior SOTA (LBC, GDI).
- 9+ top-venue papers (ICLR · NeurIPS · ICML · TPAMI)
- 24 Atari world records broken by LBC (ICLR'23 Oral)
- 500× more data-efficient than Agent57
- SOTA on MMAU audio reasoning, beating Gemini 2.5 Pro
- 200+ Google Scholar citations
- 4.0/4.0 GPA, UIUC Ph.D. in Computer Science
Making AI Systems That Improve Themselves
Today's AI is frozen after training. I work to change that: AI that never stops getting better, with less and less human scaffolding at each step. The roadmap: eliminate human-collected data (ORW-CFM-W2) → remove manual KL tuning (ADRPO) → drop hand-crafted process rewards (CESAR) → fully autonomous self-critique (ongoing).
🎖 Selected Awards
- National Scholarship ×2, Top 1% — Nankai Univ.
- Ranked 1st of 83 in major — Nankai Univ.
- Outstanding Graduates (Top 1%) — Nankai Univ.
- Tang Lixin Scholarship (Top 1%)
- GPA 4.0/4.0 — UIUC Ph.D.
- GPA 3.97/4.0, Top 1.3% — Tsinghua M.Eng.
🔍 Reviewer
- ICLR 2024–2026
- NeurIPS 2022–2025
- ICML 2023–2026
- CVPR 2026
- AAAI 2025 · AISTATS 2025 · KDD 2024
Happy to discuss research, internships, or collaborations. Best reached by email.
📧 jiajunf3@illinois.edu · 🏛 Siebel Center for CS, UIUC · CV (PDF)