GR2 beats OneRec-Think by 2.4% in Recall@5
Researchers proposed the Generative Reasoning Re-ranker (GR2), an end-to-end framework that applies Large Language Models (LLMs) to recommendation systems. GR2 targets limitations of existing LLM-based recommenders in three ways: it focuses on the reranking phase, it draws out LLM reasoning through reinforcement learning (RL) and high-quality training data, and it encodes non-semantic item IDs into scalable semantic IDs. On two real-world datasets, GR2 surpassed the state-of-the-art OneRec-Think by 2.4% in Recall@5 and 1.3% in NDCG@5.
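Recall@5 and NDCG@5 are standard top-K ranking metrics. The sketch below computes them under binary relevance, which is an assumption: this summary does not say how the paper defines relevance, how many target items each test case has, or whether the 2.4% and 1.3% gains are absolute or relative. The function names are hypothetical.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of relevant items that appear in the top-k of a ranked list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of this ranking divided by the best possible DCG."""
    # A hit at 0-based position p (rank p + 1) contributes 1 / log2(p + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking puts all relevant items (up to k of them) at the top.
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical test case: one held-out target item, reranked into position 2.
ranked = ["i7", "i3", "i9", "i1", "i4", "i8"]
relevant = {"i3"}
print(recall_at_k(ranked, relevant, 5))          # 1.0
print(round(ndcg_at_k(ranked, relevant, 5), 4))  # 0.6309
```

With a single target per user, Recall@5 reduces to a hit rate, while NDCG@5 also rewards moving that target closer to the top. A reranker that reorders items already inside the top five can therefore improve NDCG@5 without changing Recall@5.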
Key Takeaways
- GR2 focuses on the reranking phase, which the paper says most LLM recommender work has overlooked.
- The framework converts non-semantic IDs into semantic IDs with a tokenizer that achieves at least 99% uniqueness.
- A larger LLM generates reasoning traces via prompting and rejection sampling; those traces are then used for supervised fine-tuning.
- GR2 adds Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) for scalable RL supervision with verifiable rewards.
- On two real-world datasets, GR2 beat OneRec-Think by 2.4% in Recall@5 and 1.3% in NDCG@5.
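This summary does not describe how GR2's tokenizer is built. As a hedged illustration of the general idea only, the sketch below uses residual k-means quantization, a swapped-in technique that is not necessarily the paper's: each item embedding gets a code at every level, the leftover residual is passed to the next level, and the resulting code tuple serves as the item's semantic ID. The `uniqueness` helper reports the ratio of distinct IDs to items, the kind of quantity a "≥99% uniqueness" figure would plausibly measure. All names and parameters here are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(v: Sequence[float], centroids: List[Vector]) -> int:
    return min(range(len(centroids)), key=lambda c: _sq_dist(v, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; an empty cluster keeps its previous centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            buckets[_nearest(p, centroids)].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def residual_semantic_ids(
    embeddings: List[Vector], levels: int, codebook_size: int, seed: int = 0
) -> List[Tuple[int, ...]]:
    """Assign each embedding a tuple of codes, one per level, by quantizing residuals."""
    residuals = [list(e) for e in embeddings]
    codes: List[List[int]] = [[] for _ in embeddings]
    for level in range(levels):
        centroids = kmeans(residuals, codebook_size, seed=seed + level)
        for i, r in enumerate(residuals):
            j = _nearest(r, centroids)
            codes[i].append(j)
            # Pass whatever this level failed to explain on to the next level.
            residuals[i] = [x - y for x, y in zip(r, centroids[j])]
    return [tuple(c) for c in codes]


def uniqueness(ids: Sequence[Tuple[int, ...]]) -> float:
    """Ratio of distinct semantic IDs to items (1.0 means no collisions)."""
    return len(set(ids)) / len(ids) if ids else 0.0


# Hypothetical usage on random stand-in embeddings; real ones would come from an item encoder.
rng = random.Random(42)
item_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
ids = residual_semantic_ids(item_embeddings, levels=3, codebook_size=8)
print(ids[:3], f"uniqueness={uniqueness(ids):.3f}")
```

Collisions rise as more items share a code path. Common ways to push uniqueness up are adding levels, enlarging codebooks, or appending a disambiguating suffix code to colliding items; which of these GR2 uses is not stated here.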
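Rejection sampling here means drawing several candidate reasoning traces per training example from the larger model and keeping only those that pass a check. The sketch below assumes the check is whether the teacher's top-ranked item matches the held-out target. The real acceptance criterion, prompt format, and teacher model are not given in this summary, so `Sample`, `rejection_sample`, the toy teacher, and the prompt and completion strings are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Sample:
    user_context: str      # e.g. a summary of the user's recent interactions
    candidates: List[str]  # items to rerank
    target: str            # held-out item the user actually engaged with


Trace = Tuple[str, List[str]]  # (reasoning text, ranked candidates)


def rejection_sample(
    samples: List[Sample],
    teacher: Callable[[Sample], Trace],
    num_tries: int = 4,
    top_k: int = 1,
) -> List[Dict[str, str]]:
    """Keep, per sample, the first teacher trace whose top-k ranking contains the target."""
    sft_data = []
    for s in samples:
        prompt = (
            f"User history: {s.user_context}\n"
            f"Candidates: {', '.join(s.candidates)}\n"
            "Think step by step, then rerank the candidates."
        )
        for _ in range(num_tries):
            reasoning, ranking = teacher(s)
            if s.target in ranking[:top_k]:  # verifiable acceptance check
                completion = f"{reasoning}\nRanking: {', '.join(ranking)}"
                sft_data.append({"prompt": prompt, "completion": completion})
                break  # samples with no accepted trace are dropped entirely
    return sft_data


def make_toy_teacher(seed: int = 0) -> Callable[[Sample], Trace]:
    """Stand-in for prompting a larger LLM: a random ranking with a canned rationale."""
    rng = random.Random(seed)

    def teacher(s: Sample) -> Trace:
        ranking = s.candidates[:]
        rng.shuffle(ranking)
        return f"{ranking[0]} best matches: {s.user_context}.", ranking

    return teacher


data = [
    Sample("sci-fi novels, space documentaries", ["dune", "cookbook", "cosmos"], "dune"),
    Sample("running shoes, GPS watch", ["yoga mat", "trail shoes", "novel"], "trail shoes"),
]
sft = rejection_sample(data, make_toy_teacher(seed=1), num_tries=8)
print(f"kept {len(sft)} of {len(data)} samples")
```

The resulting prompt/completion pairs are what a supervised fine-tuning step would consume; filtering on a checkable outcome is what keeps low-quality traces out of that data.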
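DAPO, as introduced in its own paper, modifies group-based policy optimization in several ways, two of which give it its name: a decoupled clip range that lets the importance ratio rise further than it can fall (eps_high > eps_low), and dynamic sampling, which drops prompt groups where every sampled response earns the same reward, since those groups produce zero advantage and no gradient. The sketch computes group-normalized advantages, the dynamic-sampling filter, and a token-level clipped surrogate in plain Python. The epsilon values are illustrative, rewards are taken as given because GR2's conditional verifiable reward is not specified in this summary, and all function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each response's reward against its group (responses to the same prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def keep_group(rewards: Sequence[float]) -> bool:
    """Dynamic sampling: a group whose rewards are all equal has zero advantage, so skip it."""
    return len(set(rewards)) > 1


def decoupled_clip_objective(
    logp_new: Sequence[Sequence[float]],
    logp_old: Sequence[Sequence[float]],
    advantages: Sequence[float],
    eps_low: float = 0.2,
    eps_high: float = 0.28,
) -> float:
    """Token-level clipped surrogate (to maximize) with clip range [1 - eps_low, 1 + eps_high].

    logp_new[i][t] and logp_old[i][t] are per-token log-probs of response i under the
    current and sampling policies; every token of response i shares advantages[i].
    """
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Average over all tokens in the group, so longer responses carry more weight.
    return total / n_tokens if n_tokens else 0.0


rewards = [1.0, 0.0, 0.0, 1.0]  # hypothetical: 1 if a reranked list passes a verifiable check
if keep_group(rewards):
    adv = group_advantages(rewards)
    logp_old = [[-1.2, -0.7], [-0.9], [-1.5, -0.4, -0.8], [-1.0]]
    logp_new = [[-1.0, -0.6], [-1.1], [-1.5, -0.5, -0.8], [-0.9]]
    print(round(decoupled_clip_objective(logp_new, logp_old, adv), 4))
```

Setting eps_high above eps_low leaves more room to raise the probability of low-probability tokens that earned a positive advantage, which the DAPO authors motivate as a guard against entropy collapse during RL training.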
Why It Matters
GR2 puts LLM reasoning directly into reranking, the stage that refines final recommendations, rather than stopping at retrieval or ranking. That matters because the paper frames reranking as underexplored in LLM-based recommenders and shows gains over OneRec-Think on two datasets. The broader signal for the field is the paper's attention to two practical problems: representing billions of item IDs and designing rewards for reranking. Next to watch: whether the tokenizer's ≥99% unique semantic IDs and the conditional verifiable rewards hold up beyond the two reported datasets.
Read full article at arxiv.org
