Scaling Latent Reasoning via Looped Language Models

Abstract

Modern LLMs are trained to think primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that build reasoning into pre-training through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. The Ouro 1.4B and 2.6B models match the performance of state-of-the-art LLMs of up to 12B parameters across diverse benchmarks, not through increased knowledge capacity but through superior knowledge manipulation. LoopLM produces reasoning traces that align more closely with final outputs than explicit CoT, highlighting LoopLM as a promising scaling direction for the reasoning era.
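To make the two core ideas in the abstract concrete, here is a minimal, hypothetical sketch of a looped language model: a shared transformer block applied repeatedly in latent space, paired with an entropy-regularized objective over a learned depth-allocation distribution. All module names, shapes, hyperparameters, and the exact form of the loss are illustrative assumptions, not the released Ouro implementation.

```python
# Minimal sketch of a looped language model (LoopLM).
# Assumptions: the exit head, loop count, and loss form are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoopedLM(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8,
                 n_shared_layers=4, max_loops=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One shared block of layers that is re-applied ("looped") in latent space.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.shared_block = nn.TransformerEncoder(layer, num_layers=n_shared_layers)
        # Hypothetical exit head: scores how much probability mass to place
        # on stopping after each loop iteration (learned depth allocation).
        self.exit_head = nn.Linear(d_model, 1)
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.max_loops = max_loops

    def forward(self, input_ids):
        h = self.embed(input_ids)               # [batch, seq, d_model]
        exit_scores, step_logits = [], []
        for _ in range(self.max_loops):
            h = self.shared_block(h)            # iterative computation in latent space
            exit_scores.append(self.exit_head(h).mean(dim=1))  # [batch, 1] per depth
            step_logits.append(self.lm_head(h))                # prediction at this depth
        # Depth-allocation distribution over loop counts, one per sequence.
        q = F.softmax(torch.cat(exit_scores, dim=-1), dim=-1)  # [batch, max_loops]
        return q, step_logits


def depth_regularized_loss(q, step_logits, targets, beta=0.01):
    """Expected LM loss over depths plus an entropy bonus on the depth
    distribution q. This is only a sketch of an entropy-regularized
    depth-allocation objective; Ouro's actual objective may differ."""
    per_depth = torch.stack([
        F.cross_entropy(logits.transpose(1, 2), targets, reduction="none").mean(dim=1)
        for logits in step_logits
    ], dim=-1)                                  # [batch, max_loops]
    expected_loss = (q * per_depth).sum(dim=-1).mean()
    entropy = -(q * (q + 1e-9).log()).sum(dim=-1).mean()
    return expected_loss - beta * entropy
```

Because the block weights are shared across iterations, extra "depth" costs compute rather than parameters, and the entropy term keeps the model from collapsing onto a single loop count too early in training.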

Type
Industry research project
Publication

Paper

Tianyu Zhang
Ph.D. Student in Machine Learning

My research interests include Algorithmic Game Theory, Agent-based Model Simulation, AI for Climate Change, Multi-agent Reinforcement Learning, Self-supervised Learning, and Domain Adaptation. I am still exploring and learning.