
ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems

December 1, 2025

Accepted to NAACL SRW 2025

Authors: Ishneet Singh, Ritvik Aggarwal, Ibrahim Allahverdiyev, Muhammad Taha

Retrieval-Augmented Generation (RAG) systems using large language models (LLMs) often generate inaccurate responses due to the retrieval of irrelevant or loosely related information. Existing methods, which operate at the document level, fail to effectively filter out such content. We propose ChunkRAG, a framework of LLM-driven chunk filtering that enhances RAG systems by evaluating and filtering retrieved information at the chunk level. Our approach employs semantic chunking to divide documents into coherent sections and uses LLM-based relevance scoring to assess each chunk's alignment with the user's query. By filtering out less pertinent chunks before the generation phase, we significantly reduce hallucinations and improve factual accuracy. Empirical evaluations on the PopQA, PubHealth, and Biography datasets indicate that ChunkRAG improves response accuracy over state-of-the-art RAG methods.
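The pipeline the abstract describes (chunk a document, score each chunk against the query, discard low-scoring chunks before generation) can be sketched as follows. This is an illustrative toy, not the authors' implementation: `semantic_chunks` here is a simple sentence grouper rather than a true semantic chunker, `score_relevance` is a word-overlap placeholder standing in for the paper's LLM-based relevance scoring, and the threshold value is an arbitrary choice for the example.

```python
def semantic_chunks(document, max_sentences=2):
    """Split a document into small sentence groups.
    Stand-in for semantic chunking; real systems group by embedding similarity."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [". ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]

def score_relevance(query, chunk):
    """Placeholder relevance score in [0, 1] via query-token overlap.
    In ChunkRAG this step would prompt an LLM to rate query-chunk alignment."""
    q_tokens = set(query.lower().split())
    c_tokens = set(chunk.lower().split())
    return len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0

def filter_chunks(query, document, threshold=0.4):
    """Keep only chunks whose relevance score meets the (hypothetical) threshold,
    so irrelevant content never reaches the generation phase."""
    return [chunk for chunk in semantic_chunks(document)
            if score_relevance(query, chunk) >= threshold]

doc = ("The Eiffel Tower is in Paris. It was completed in 1889. "
       "Bananas are rich in potassium. Potassium supports nerve function.")
kept = filter_chunks("Where is the Eiffel Tower", doc)
print(kept)  # only the Eiffel Tower chunk survives filtering
```

The point of the sketch is the control flow: filtering happens between retrieval and generation, at chunk rather than document granularity, so a document that is partly relevant contributes only its relevant chunks to the LLM's context.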

Begin Your Journey

The application takes 10 minutes and is reviewed on a rolling basis. We look for strong technical signal (projects, coursework, or competition results) and a genuine curiosity to do real research.

If admitted, you will join a structured pipeline with direct mentorship to take your work from ideation to top conference submission at venues like NeurIPS, ACL, and EMNLP.