AI-powered document understanding and conversational retrieval system built using LangChain, ChromaDB, FastAPI, Streamlit, and Groq LLMs.
Document Intelligence RAG Agent allows users to:
- Upload PDF documents
- Perform semantic search on document content
- Ask natural language questions
- Retrieve context-aware AI-generated answers
The system uses a Retrieval-Augmented Generation (RAG) pipeline: vector search retrieves the passages most relevant to a question, and a Large Language Model generates an answer grounded in that retrieved context.
- ✅ PDF Upload & Processing
- ✅ Semantic Document Search
- ✅ Conversational AI Q&A
- ✅ Vector Embeddings using Sentence Transformers
- ✅ ChromaDB Vector Store
- ✅ FastAPI Backend
- ✅ Streamlit Interactive UI
- ✅ Groq LLM Integration
- ✅ Modern Glassmorphism UI
- ✅ Dark-Themed AI Interface
| Technology | Purpose |
|---|---|
| Python | Core programming language |
| LangChain | RAG pipeline orchestration |
| ChromaDB | Vector database |
| Sentence Transformers | Embedding generation |
| FastAPI | Backend API |
| Streamlit | Frontend UI |
| Groq API | LLM inference |
| PyPDF | PDF text extraction |
PDF Upload
↓
Text Extraction
↓
Chunking
↓
Embeddings Generation
↓
ChromaDB Vector Storage
↓
Similarity Search
↓
Relevant Context Retrieval
↓
LLM Response Generation
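The chunking, embedding, and similarity-search stages above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the bag-of-words `embed` function is a toy stand-in for a Sentence Transformers model, the in-memory ranking in `top_k` stands in for a ChromaDB query, and all function names and default sizes here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; neighbours share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts (assumption: a real model returns dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The overlap between neighbouring chunks is a common way to avoid cutting a sentence's context in half at a chunk boundary; the sizes used here are illustrative, not the project's settings.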
rag-agent/
│
├── app/
│ ├── services/
│ ├── utils/
│ ├── uploads/
│
├── frontend/
│ └── frontend.py
│
├── .streamlit/
│ └── config.toml
│
├── requirements.txt
│
└── README.md
Clone the repository:
git clone <YOUR_GITHUB_REPO_URL>
cd rag-agent
Create a virtual environment:
python -m venv venv
Activate environment:
Windows: venv\Scripts\activate
macOS/Linux: source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Create a .env file:
GROQ_API_KEY=your_api_key_here
Start the backend:
uvicorn app.main:app --reload
Backend runs on:
http://127.0.0.1:8000
Start the frontend:
streamlit run frontend/frontend.py
Frontend runs on:
http://localhost:8501
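Before starting the backend, it can help to confirm that the .env file actually contains a real key rather than the placeholder. The sketch below is one way to do that with the standard library only; how the project itself loads .env is not shown in this README, and `parse_env` and `require_groq_key` are hypothetical helper names.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_groq_key(env: dict[str, str]) -> str:
    """Return the Groq key, or fail loudly if it is missing or a placeholder."""
    key = env.get("GROQ_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("GROQ_API_KEY is missing or still set to the placeholder")
    return key
```

For example, `require_groq_key(parse_env(open(".env").read()))` raises a clear error at startup instead of failing later on the first LLM request.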
Add your screenshot at README_images/app_preview.png, then embed it in this README with:
`![App Preview](README_images/app_preview.png)`
Typical workflow:
- Upload PDF document
- System extracts and indexes content
- Ask questions in chat
- AI retrieves relevant chunks
- LLM generates contextual answer
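The last two steps, retrieving chunks and generating a contextual answer, usually meet in a prompt that places the retrieved text ahead of the question. A minimal sketch of that assembly follows; the function name, the prompt wording, and the character budget are assumptions for illustration, not the project's actual prompt.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 4000) -> str:
    """Assemble a grounded prompt from retrieved chunks within a character budget."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        # Stop adding context once the budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n\n".join(context_parts) or "(no relevant context found)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

Numbering each chunk keeps the door open for source citations later, and the explicit instruction to admit insufficient context is a simple guard against answers that are not supported by the document.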
Planned improvements:
- Multi-document support
- Conversation memory
- Authentication system
- Cloud deployment
- Source citation highlighting
- OCR support for scanned PDFs
- Hybrid search (BM25 + Vector)
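For the hybrid search item, one common way to merge a keyword (BM25) ranking with a vector ranking is Reciprocal Rank Fusion (RRF). The sketch below shows that technique only as a possible direction; nothing in this README says the project will use RRF, and the document IDs and the constant `k = 60` (a conventional default) are illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["a", "b", "c"]      # hypothetical keyword-search result
vector_ranking = ["b", "c", "a"]    # hypothetical vector-search result
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

RRF works on ranks rather than raw scores, so it does not require BM25 scores and cosine similarities to be on a comparable scale.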
Example questions:
- Summarize this document
- What are the key findings?
- Explain the methodology section
- What technologies are discussed?
Built by Anurag Wanwe
This project is for educational and portfolio purposes.