A sophisticated graph-based Retrieval-Augmented Generation (RAG) system built with DSPy and locally hosted Ollama LLMs for processing and analyzing documents.
- PDF and Markdown document processing
- Graph-based knowledge representation
- Multi-modal question answering capabilities
- Support for different types of queries (general, mathematical, code, visual)
- Parallel processing for efficient document handling
- Table of Contents (TOC) extraction
- Customizable embedding generation
- Fully local operation
```bash
pip install -r requirements.txt
```
Required packages:
- dspy
- networkx
- numpy
- nltk
- rake-nltk
- pymupdf (fitz)
- pymupdf4llm
- ollama
- pandas
- tqdm
- pydantic
The system requires the following Ollama models to be installed:
- mistral-nemo:latest (General purpose)
- mathstral:latest (Mathematical computations)
- llava:latest (Visual processing)
- deepseek-coder-v2:latest (Code-related queries)
- mxbai-embed-large (Embeddings)
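With the standard Ollama CLI, each model can be pulled ahead of time:

```bash
ollama pull mistral-nemo:latest
ollama pull mathstral:latest
ollama pull llava:latest
ollama pull deepseek-coder-v2:latest
ollama pull mxbai-embed-large
```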
**Document Processing**
- PDF and Markdown file reading
- Table of Contents extraction (see the sketch below)
- Parallel chapter processing
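For example, TOC extraction falls out of PyMuPDF's outline API; a minimal sketch (the helper name is an assumption, not the project's `ReadPDF.py` interface):

```python
import fitz  # PyMuPDF

def extract_toc(pdf_path):
    # get_toc() yields [level, title, page] entries from the PDF outline
    with fitz.open(pdf_path) as doc:
        return doc.get_toc()
```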
**Graph Construction**
- Text segmentation (chapters, pages, paragraphs, sentences)
- Embedding generation
- Graph node and edge creation (see the sketch below)
- Keyword extraction using RAKE
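A minimal sketch of node and edge creation with networkx, assuming embeddings are cached on nodes as noted in the optimization list further down (`add_text_node` is illustrative, not the project's API):

```python
import networkx as nx
import ollama

def add_text_node(G, node_id, text, parent_id=None, model="mxbai-embed-large"):
    # Cache the embedding on the node so queries never recompute it
    emb = ollama.embeddings(model=model, prompt=text)["embedding"]
    G.add_node(node_id, text=text, embedding=emb)
    if parent_id is not None:
        # Link the chunk to its parent, e.g. sentence -> paragraph
        G.add_edge(parent_id, node_id)
```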
**Query Processing**
- Sub-question generation
- Relevant chunk retrieval (see the sketch below)
- Context-aware answer generation
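Retrieval then reduces to ranking cached node embeddings against the query embedding; a minimal sketch (`top_k_chunks` is illustrative, not the project's API):

```python
import numpy as np

def top_k_chunks(G, query_embedding, k=5):
    # Score every node that carries a cached embedding, keep the best k
    scored = []
    for node, data in G.nodes(data=True):
        emb = np.asarray(data.get("embedding", []), dtype=float)
        if emb.size == 0 or np.linalg.norm(emb) == 0:
            continue
        sim = float(np.dot(query_embedding, emb)
                    / (np.linalg.norm(query_embedding) * np.linalg.norm(emb)))
        scored.append((node, sim))
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]
```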
**Multi-Modal Support**
- General text processing
- Mathematical computations
- Code analysis
- Image understanding
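Given the model list above, each query mode presumably routes to its matching Ollama model; the dictionary below is an illustrative pairing, not code from the project:

```python
# How the four query modes line up with the required Ollama models
MODE_MODELS = {
    "gen": "mistral-nemo:latest",        # general text
    "mat": "mathstral:latest",           # mathematical computations
    "code": "deepseek-coder-v2:latest",  # code analysis
    "vis": "llava:latest",               # image understanding
}
```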
```python
from tools import load_graph
from QM import GraphRAG

# Load your graph
graph_path = "path/to/your/graph.gml"
G = load_graph(graph_path)

# Initialize GraphRAG
graph_rag = GraphRAG(graph=G)

# Ask a question
question = "Your question here"
answer = graph_rag.answer_query(query=question, mode="gen")
print(answer)
```
The system supports four different modes:
- `gen`: General text processing (default)
- `mat`: Mathematical computations
- `vis`: Visual processing (developer needed)
- `code`: Code-related queries
```python
# Example with different modes
math_answer = graph_rag.answer_query(query=question, mode="mat")
code_answer = graph_rag.answer_query(query=question, mode="code")
visual_answer = graph_rag.answer_query(query=question, mode="vis")
```
```python
from tools import process_pdfs_in_folder

# Process multiple PDFs
folder_path = "path/to/pdfs"
save_path = "path/to/save"
process_pdfs_in_folder(folder_path, save_path)
```
```python
from tools import save_graph, load_graph

# Save graph
save_graph(graph, "path/to/save/graph.gml")

# Load graph
loaded_graph = load_graph("path/to/graph.gml")
```
- `tools.py`: Core utilities and functions
- `Config.py`: Configuration and imports
- `QM.py`: Query processing and RAG implementation
- `ReadPDF.py`: PDF processing functionality
- `graphio.py`: Graph I/O operations
The system uses the mxbai-embed-large model for generating embeddings:

```python
import ollama

def get_embedding(text, model="mxbai-embed-large"):
    # Ask the local Ollama server for an embedding vector
    response = ollama.embeddings(model=model, prompt=text)
    return response["embedding"]
```
Relevance is determined using cosine similarity:

```python
import numpy as np

def calculate_cosine_similarity(chunk, query_embedding, embedding):
    # Guard against zero vectors to avoid division by zero
    if np.linalg.norm(query_embedding) == 0 or np.linalg.norm(embedding) == 0:
        return (chunk, 0)
    cosine_sim = np.dot(query_embedding, embedding) / (
        np.linalg.norm(query_embedding) * np.linalg.norm(embedding)
    )
    return (chunk, cosine_sim)
```
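For example, a single chunk could be scored against a query like this (the chunk ID and texts are placeholders):

```python
query_emb = np.array(get_embedding("How are documents segmented?"))
chunk_emb = np.array(get_embedding("Chapters are split into pages, paragraphs, and sentences."))
chunk_id, score = calculate_cosine_similarity("chunk-1", query_emb, chunk_emb)
print(f"{chunk_id}: {score:.3f}")
```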
The system implements several optimization techniques:
- Parallel processing for document handling
- Multi-threading for chapter processing (see the sketch after this list)
- Efficient graph storage and retrieval
- Caching of embeddings in graph nodes
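The chapter-level threading can be sketched with the standard library (the helper is an assumption, not the project's API):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chapters_parallel(chapters, process_chapter, max_workers=4):
    # Fan chapters out to worker threads; results come back in input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_chapter, chapters))
```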
Known limitations:
- Requires significant computational resources for large documents
- Depends on local Ollama model availability
- Graph size can grow large with extensive documents
- Processing time increases with document complexity
Planned improvements:
- Enhanced caching mechanisms
- Support for additional file formats
- Improved parallel processing
- Advanced context management
- Extended multi-modal capabilities
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.