
GR2 beats OneRec-Think by 2.4% in Recall@5

Researchers have proposed the Generative Reasoning Re-ranker (GR2), an end-to-end framework that applies Large Language Models (LLMs) to recommendation systems. GR2 targets three limitations of existing LLM-based recommenders: it focuses on the reranking phase, it draws on LLM reasoning abilities through reinforcement learning (RL) and high-quality data, and it encodes non-semantic IDs into scalable semantic IDs. In experiments on two real-world datasets, GR2 surpassed the state-of-the-art OneRec-Think by 2.4% in Recall@5 and 1.3% in NDCG@5.
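For readers unfamiliar with the two metrics, here is a minimal sketch of how Recall@5 and NDCG@5 are commonly computed with binary relevance. It is an illustration only: the function names and the toy data are assumptions, and the summary does not describe the paper's exact evaluation protocol.

```python
import math

def recall_at_k(ranked_items, relevant_items, k=5):
    """Fraction of the relevant items that appear in the top-k positions."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k=5):
    """Binary-relevance NDCG@k: discounted gain divided by the ideal gain."""
    # A hit at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example (hypothetical data): the single relevant item sits in third place.
ranking = ["a", "b", "c", "d", "e", "f"]
print(recall_at_k(ranking, {"c"}))  # item is inside the top 5
print(ndcg_at_k(ranking, {"c"}))    # discounted because it is not in first place
```

Recall@5 only asks whether the right item made the top five, while NDCG@5 also rewards placing it higher, which is why the two metrics can move by different amounts.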

Key Takeaways

GR2 focuses on the reranking phase, which the paper says most LLM recommender work has overlooked.
The framework converts non-semantic IDs into semantic IDs with a tokenizer that achieves at least 99% uniqueness.
A larger LLM generates reasoning traces via prompting and rejection sampling; those traces are then used for supervised fine-tuning.
GR2 adds Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) for scalable RL supervision with verifiable rewards.
On two real-world datasets, GR2 beat OneRec-Think by 2.4% in Recall@5 and 1.3% in NDCG@5.
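The trace-generation step in the takeaways above can be pictured with a short rejection-sampling sketch. Everything here is an assumption made for illustration: the `generate` callable stands in for the larger LLM, the toy generator only shuffles candidates, and the acceptance rule (keep a trace only if the known target item lands in the top `k`) is a guess at what a filter might look like, not the paper's stated criterion.

```python
import random

def rejection_sample(generate, prompt, target_item, num_samples=8, k=1):
    """Draw several candidate traces and keep only those that pass the check.

    `generate` is a hypothetical callable returning (reasoning, ranking).
    A trace is accepted when `target_item` appears in the top-k of its ranking.
    """
    accepted = []
    for _ in range(num_samples):
        reasoning, ranking = generate(prompt)
        if target_item in ranking[:k]:
            accepted.append((reasoning, ranking))
    return accepted

def make_toy_generator(candidates, seed=0):
    """Stand-in for an LLM: returns a random ordering of the candidates."""
    rng = random.Random(seed)

    def generate(prompt):
        ranking = list(candidates)
        rng.shuffle(ranking)
        return f"toy reasoning for: {prompt}", ranking

    return generate

# Accepted traces would form the supervised fine-tuning set.
toy = make_toy_generator(["item_a", "item_b", "item_c"])
kept = rejection_sample(toy, "rerank these items", "item_a", num_samples=20)
print(len(kept), "of 20 sampled traces kept")
```

The appeal of this pattern is that the filter needs only a checkable outcome, not a human judgment of the reasoning itself.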

Why It Matters

GR2 puts LLM reasoning directly into reranking, the stage that refines final recommendations, rather than stopping at retrieval or ranking. That matters because the paper frames reranking as underexplored in LLM-based recommenders and shows gains over OneRec-Think on two datasets. The broader signal is the paper’s attention to scale, in the form of billions of item IDs, and to reward design built specifically for reranking. Next to watch: whether the tokenizer’s ≥99% unique semantic IDs and the conditional verifiable rewards hold up beyond the two reported datasets.
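To make the ≥99% uniqueness figure concrete, here is a minimal sketch of one way such a rate could be measured. It assumes a semantic ID is a tuple of codebook tokens and defines uniqueness as the share of items whose semantic ID is not shared with any other item; the summary does not give the paper's exact definition, and the item names and IDs below are made up.

```python
from collections import Counter

def uniqueness_rate(semantic_ids):
    """Share of items whose semantic ID collides with no other item.

    `semantic_ids` maps an item's original (non-semantic) ID to its
    semantic ID, represented here as a tuple of token indices.
    """
    if not semantic_ids:
        return 0.0
    counts = Counter(semantic_ids.values())
    unique = sum(1 for sid in semantic_ids.values() if counts[sid] == 1)
    return unique / len(semantic_ids)

# Toy catalogue (hypothetical): two of the four items share a semantic ID.
catalogue = {
    "item_1": (3, 17, 42),
    "item_2": (3, 17, 42),
    "item_3": (8, 2, 5),
    "item_4": (1, 1, 9),
}
print(uniqueness_rate(catalogue))  # two of four items are collision-free
```

Collisions matter because two items sharing one semantic ID cannot be told apart by a model that generates IDs token by token.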

Read full article at arxiv.org


