Accepted Papers
ICLR 2026 Workshop on Recursive Self-Improvement
110 Papers Accepted, Congrats!
Oral
4 papers
Spotlight
21 papers
- #5 Language Self-Play For Data-Free Training
- #6 SimpleMem: Efficient Lifelong Memory for LLM Agents
- #7 Towards Execution-Grounded Automated AI Research
- #8 Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
- #9 Can Language Models Discover Scaling Laws?
- #10 Self-Improving World Models via Asymmetric Forward-Inverse Consistency
- #11 Tiny Autoregressive Recursive Models
- #12 From Growing to Looping: A Unified View of Iterative Computation in LLMs
- #13 Lang-PINN: From Language to Physics-Informed Neural Networks via a Multi-Agent Framework
- #14 Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
- #15 ACE: Self-Evolving LLM Coding Framework Adversarial Unit Test Generation and Preference Optimization
- #16 Can Current Language Models Close the Discovery to Application Loop?
- #17 CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad
- #18 Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
- #19 VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model
- #20 Test-Time Self-Distillation
- #21 Self-Evolving Rubrics: Interpretable Instance-Level Criteria for Scalable RL
- #22 Anchored Self-Play for Code Repair
- #23 Interestingness as an Inductive Heuristic for Future Compression Progress
- #24 GASP: Guided Asymmetric Self-Play For Coding LLMs
- #25 Adaptive Meta-Curriculum for Test-Time Self-Improvement
Poster
75 papers
- #26 Self-Improving Clinical Reasoning via Textual Gradients
- #27 Federated Agent Reinforcement Learning
- #28 Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
- #29 OMEGA: Optimizing Machine learning by Evaluating Generated Algorithms
- #30 Intelligent Robot Manipulation Requires Self-Directed Learning
- #31 Correct Reasoning Paths Visit Shared Decision Pivots
- #32 RFTF: Reinforcement Fine-tuning for Vision-language-action Models with Temporal Feedback
- #33 VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
- #34 TangramSR: A Benchmark for Recursive Self-Improvement In Continuous Geometric Reasoning
- #35 SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
- #36 A Framework for Prompt Optimization and Translation Across Foundation Models
- #37 Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
- #38 LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
- #39 Contrastive Self-Refinement for Low-Cost Adaptation in Real-World Text-to-SQL
- #40 Simple Baselines are Competitive with Code Evolution
- #41 Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion
- #42 Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation
- #43 Unlocking Intrinsic Self-Reflection for LLM Preference Policy Optimization
- #44 TextBO: Bayesian Optimization in Language Space for Eval-Efficient Self-Improving AI
- #45 POLARIS: A Gödel Agent Framework for Small Language Models through Experience-Abstracted Policy Repair
- #46 Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks
- #47 Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search
- #48 Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
- #49 Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?
- #50 Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision
- #51 Theory-Driven Modeling and LLM-Guided Evolution for Power System Scheduling
- #52 Differentiable Evolutionary Reinforcement Learning
- #53 Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning
- #54 Unrolled Policy Iteration for Tiny Recursive Models
- #55 Reasoning Cache: Learning to Extrapolate to Long Lengths via Short-Length RL
- #56 ESDAE: Evaluating Synthetic Data for Agent Evaluation
- #57 Actor-Curator: Scalable Policy-driven Curriculum Learning for RL Post-Training
- #58 Reward Hacking in Self-Improving Code Agents
- #59 Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
- #60 Learning What to Learn: Curriculum Curation for Test-Time Agent Learning
- #61 Beyond Solving: A Closer Look at LLMs as Solution Verifiers
- #62 Aligned but Stereotypical? Understanding and Mitigating Social Bias in LLM-Driven Text-to-Image Models
- #63 Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis
- #64 CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction
- #65 AlphaApollo: A System for Deep Agentic Reasoning
- #66 Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space
- #67 Leveraging Suboptimal and Noisy Trajectories for Goal-Conditional Offline RL
- #68 AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness
- #69 One-Step Video Depth Estimation via Self-Distillation
- #70 Discover the distinguishing and effective reasoning patterns among LLMs via an LLM
- #71 Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- #72 Self-EvolveRec: Self-Evolving Recommender Systems with LLM-based Directional Feedback
- #73 Learning to Evolve: Scaling Open-Ended Discovery with Relative-Progress RL
- #74 Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction
- #75 Language-Guided Expertise Evolution for Protein Optimization
- #76 A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula
- #77 Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation
- #78 Dynamic Noise Preference Optimization: Self-Improvement of Large Language Models with Self-Synthetic Data
- #79 Vision-Guided Iterative Refinement for Frontend Code Generation
- #80 MAPPA: Scaling Multiagent Systems with Process Rewards
- #81 Residual Off-Policy RL for Finetuning Behavior Cloning Policies
- #82 SAGE: Self-play Adversarial Games Enhance Large Language Model Reasoning Capabilities
- #83 Soft Mellowmax Monte Carlo Planning
- #84 Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
- #85 Self-Improving VLM Judges Without Human Annotations
- #86 Duel-Evolve: Pairwise Preference Black-Box Optimization of LLM Responses
- #87 MimicAgent: Learning Quadruped Skills via Text-to-Trajectory Generation
- #88 Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison
- #89 CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning
- #90 Generative Recursive Reasoning Models
- #91 Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
- #92 Refining Large Language Models with Self-Generated Data Through Iterative Training
- #93 Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement
- #94 In-Context Adaptation
- #95 SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
- #96 Structure Enables Effective Self-Localization of Errors in LLMs
- #97 Self-Improvement via Fast Tree-search
- #98 Self-Adapting Agents for Automating Research Coding Workflows
- #99 Verifying the Verifiers: Failure Attribution for Agentic Benchmark Diagnostics and Training Data Curation
- #100 Just Enough Learning: GRPO-Guided Controllers for Hyperparameter Sweeps
Short Paper
10 papers
- #101 Real-Time Procedural Learning From Experience for AI Agents
- #102 Reference-Guided Machine Unlearning
- #103 Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls
- #104 Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models
- #105 Depth vs Recursion: Outperforming Transformers in Jigsaw Reconstruction
- #106 Test-Time Meta-Adaptation with Self-Synthesis
- #107 Federation over Text
- #108 TamperBench: A Systematic Framework to Stress-Test LLM Safety Under Fine-Tuning and Tampering
- #109 Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants
- #110 Orthogonal Gradient Projection for Continual LLM Unlearning