Introduction to VeritasGraph
VeritasGraph is a production-ready, end-to-end framework for building advanced question-answering and summarization systems that operate entirely within your private infrastructure. It is architected to overcome the fundamental limitations of traditional vector-search-based Retrieval-Augmented Generation (RAG) by leveraging a knowledge graph to perform complex, multi-hop reasoning.
Baseline RAG systems excel at finding direct answers but falter when faced with questions that require connecting disparate information. VeritasGraph addresses this challenge directly, providing not just answers, but transparent, auditable reasoning paths with full source attribution for every generated claim, establishing a new standard for trust and reliability in enterprise AI.
The Architectural Blueprint
The VeritasGraph pipeline is a four-stage process that systematically transforms a corpus of raw, unstructured documents into a structured knowledge asset capable of sophisticated, attributable reasoning.
Stage 1: Knowledge Graph Construction
This initial stage transforms raw, unstructured documents into a structured, interconnected knowledge graph. The goal is to create a rich data foundation that enables complex reasoning, moving beyond simple keyword or vector search.
[Chart: Graph Composition]
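To make the stage concrete, the snippet below shows the kind of entity and relationship records the extraction step produces from a chunk of text. It is a minimal sketch: the field names and types are illustrative, not the indexer's exact schema.

# Illustrative sketch only: field names here are hypothetical, not the indexer's real schema.
chunk = "Acme Corp acquired Beta Labs in 2021. Beta Labs was founded by Dana Lee."

extracted_entities = [
    {"name": "Acme Corp", "type": "ORGANIZATION"},
    {"name": "Beta Labs", "type": "ORGANIZATION"},
    {"name": "Dana Lee", "type": "PERSON"},
]

extracted_relationships = [
    {"source": "Acme Corp", "target": "Beta Labs", "description": "acquired in 2021", "source_chunk": chunk},
    {"source": "Dana Lee", "target": "Beta Labs", "description": "founded", "source_chunk": chunk},
]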
Stage 2: Hybrid Retrieval
Instead of relying on a single method, this stage uses a hybrid approach to find the most relevant information. It combines the broad reach of semantic search with the precision of graph traversal to uncover connections that would otherwise be missed.
[Chart: Retrieval Effectiveness]
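The sketch below captures the intuition, assuming a `vector_index` whose `search()` returns entity IDs that are also graph node IDs, and a `graph` object with a `neighbors()` method. It is a conceptual illustration, not the project's actual retrieval code.

# Conceptual sketch: vector search supplies seed entities, the graph supplies
# their multi-hop neighborhood. `vector_index` and `graph` are assumed interfaces.
def hybrid_retrieve(query_embedding, vector_index, graph, hops=2, k=5):
    seeds = vector_index.search(query_embedding, k=k)   # broad semantic recall
    context = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):                               # precise graph expansion
        frontier = {n for node in frontier for n in graph.neighbors(node)} - context
        context |= frontier
    return context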
Stage 3: LoRA-Tuned Reasoning
Once the context is retrieved, a Large Language Model (LLM) synthesizes the final answer. The LLM is fine-tuned using Low-Rank Adaptation (LoRA), making it highly efficient and specialized for generating attributed, factual responses.
[Chart: Model Enhancement via LoRA]
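As a quick reminder of what LoRA does mathematically: the frozen weight matrix W is augmented with a trainable low-rank product BA, scaled by alpha/r. The toy example below illustrates only this update rule (matrix sizes are arbitrary); the actual fine-tuning script appears later as `offlinetraining.py`.

import torch

d, r, alpha = 4096, 16, 16               # toy hidden size; r and alpha match the LoraConfig used later
W = torch.randn(d, d)                    # pretrained weight, kept frozen
A = torch.randn(r, d) * 0.01             # trainable low-rank factor, randomly initialized
B = torch.zeros(d, r)                    # trainable low-rank factor, starts at zero so W_effective == W initially
W_effective = W + (alpha / r) * (B @ A)  # in LoRA, only A and B are trained; W never changes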
Stage 4: Attribution & Provenance
The final and most critical stage ensures trust and transparency. Every claim in the generated answer is linked back to its source documents and to the reasoning path taken through the graph, giving each statement a verifiable provenance trail.
[Chart: Components of a Final Response]
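A generated answer therefore carries its evidence with it. The structure below is a hypothetical illustration of such a payload; the field names are not the framework's exact output schema.

# Hypothetical example of an attributed answer; field names are illustrative.
attributed_response = {
    "answer": "Acme Corp indirectly controls Gamma Inc through its 2021 acquisition of Beta Labs.",
    "claims": [
        {
            "text": "Acme Corp acquired Beta Labs in 2021.",
            "sources": ["doc_017 / chunk 3"],
            "graph_path": ["Acme Corp", "ACQUIRED", "Beta Labs"],
        },
        {
            "text": "Beta Labs holds a controlling stake in Gamma Inc.",
            "sources": ["doc_042 / chunk 1"],
            "graph_path": ["Beta Labs", "CONTROLS", "Gamma Inc"],
        },
    ],
}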
Getting Started
Environment Setup
This guide uses Ollama with `llama3.1` for generation and `nomic-embed-text` for embeddings. If you use LM Studio for embeddings, run on native Windows rather than WSL to avoid connection issues.
Important: Fix Model Context Length
Ollama's default context length is 2048 tokens, which can truncate prompts and outputs during indexing. This guide therefore builds a model variant with a 12k context window. Note that changing the model name in `settings.yaml` will restart the entire indexing process.
1. Pull Required Models
# Terminal 1
ollama serve
# Terminal 2
ollama pull llama3.1
ollama pull nomic-embed-text
2. Build Model with Custom Context Length
ollama create llama3.1-12k -f ./Modelfile
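The command above expects a `Modelfile` in the working directory. If you need to write one yourself, a minimal version that simply raises the context window looks like this (the exact `num_ctx` value is an assumption; pick whatever fits your hardware and the ~12k target):

FROM llama3.1
PARAMETER num_ctx 12288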
GraphRAG Indexing Steps
1. Activate Conda Environment
conda create -n rag python=3.12
conda activate rag
2. Install GraphRAG
# Clone the project and navigate to the config directory
cd graphrag-ollama-config
# Navigate to the local graphrag fix and install
cd graphrag-ollama
pip install -e ./
3. Initialize and Configure
# Install dependencies
pip install sympy future ollama
# Initialize graphrag folder (can be skipped if using this repo's setup)
python -m graphrag.index --init --root .
# Create your .env file
cp .env.example .env
Move your input text files to the `./input/` directory and double-check parameters in `.env` and `settings.yaml`.
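The variable names in `.env` must match what `settings.yaml` and the UI code read. A typical local-Ollama configuration looks roughly like the following; the names and values are illustrative, so check them against the repo's `.env.example`:

GRAPHRAG_API_KEY=ollama                          # any non-empty string works for a local endpoint
GRAPHRAG_API_BASE=http://localhost:11434/v1
GRAPHRAG_LLM_MODEL=llama3.1-12k
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text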
4. Start Indexing
python -m graphrag.index --root .
Using the UI
1. Install Requirements
pip install -r requirements.txt
2. Run the Application
gradio app.py
Access the UI by visiting http://127.0.0.1:7860/ in your browser.
Code Reference
Gradio UI: `app.py`
import gradio as gr
import os
import asyncio
import pandas as pd
import tiktoken
from dotenv import load_dotenv
from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.input.loaders.dfs import store_entity_semantic_embeddings
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.query.question_gen.local_gen import LocalQuestionGen
from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
from graphrag.query.structured_search.global_search.search import GlobalSearch
from graphrag.query.structured_search.local_search.mixed_context import LocalSearchMixedContext
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.lancedb import LanceDBVectorStore
load_dotenv('.env')
join = os.path.join
PRESET_MAPPING = {
    "Default": {
        "community_level": 2,
        "response_type": "Multiple Paragraphs"
    },
    "Detailed": {
        "community_level": 4,
        "response_type": "Multi-Page Report"
    },
    "Quick": {
        "community_level": 1,
        "response_type": "Single Paragraph"
    },
    "Bullet": {
        "community_level": 2,
        "response_type": "List of 3-7 Points"
    },
    "Comprehensive": {
        "community_level": 5,
        "response_type": "Multi-Page Report"
    },
    "High-Level": {
        "community_level": 1,
        "response_type": "Single Page"
    },
    "Focused": {
        "community_level": 3,
        "response_type": "Multiple Paragraphs"
    }
}
async def global_search(query, input_dir, community_level=2, temperature=0.5, response_type="Multiple Paragraphs"):
    # These variable names must match the entries in your .env file.
    api_key = os.environ["GRAPHRAG_API_KEY"]
    llm_model = os.environ["GRAPHRAG_LLM_MODEL"]
    api_base = os.environ["GRAPHRAG_API_BASE"]
    llm = ChatOpenAI(
        api_key=api_key,
        api_base=api_base,
        model=llm_model,
        api_type=OpenaiApiType.OpenAI,
        max_retries=10,
    )
    token_encoder = tiktoken.get_encoding("cl100k_base")

    COMMUNITY_REPORT_TABLE = "create_final_community_reports"
    ENTITY_TABLE = "create_final_nodes"
    ENTITY_EMBEDDING_TABLE = "create_final_entities"

    entity_df = pd.read_parquet(join(input_dir, f"{ENTITY_TABLE}.parquet"))
    report_df = pd.read_parquet(join(input_dir, f"{COMMUNITY_REPORT_TABLE}.parquet"))
    entity_embedding_df = pd.read_parquet(join(input_dir, f"{ENTITY_EMBEDDING_TABLE}.parquet"))

    reports = read_indexer_reports(report_df, entity_df, community_level)
    entities = read_indexer_entities(entity_df, entity_embedding_df, community_level)
    context_builder = GlobalCommunityContext(
        community_reports=reports,
        entities=entities,
        token_encoder=token_encoder,
    )

    context_builder_params = {
        "use_community_summary": False,
        "shuffle_data": True,
        "include_community_rank": True,
        "min_community_rank": 0,
        "community_rank_name": "rank",
        "include_community_weight": True,
        "community_weight_name": "occurrence weight",
        "normalize_community_weight": True,
        "max_tokens": 4000,
        "context_name": "Reports",
    }

    map_llm_params = {
        "max_tokens": 1000,
        "temperature": temperature,
        "response_format": {"type": "json_object"},
    }

    reduce_llm_params = {
        "max_tokens": 2000,
        "temperature": temperature,
    }
    search_engine = GlobalSearch(
        llm=llm,
        context_builder=context_builder,
        token_encoder=token_encoder,
        max_data_tokens=5000,
        map_llm_params=map_llm_params,
        reduce_llm_params=reduce_llm_params,
        allow_general_knowledge=False,
        json_mode=True,
        context_builder_params=context_builder_params,
        concurrent_coroutines=1,
        response_type=response_type,
    )

    result = await search_engine.asearch(query)
    return result.response
#... (rest of app.py code)...
LoRA Fine-Tuning: `offlinetraining.py`
import torch
import os
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
# Model configuration
max_seq_length = 2048
model_path = r'D:\work\models\Meta-Llama-3.2-3B-Instruct'
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    # bfloat16 requires compute capability >= 8.0 (Ampere or newer); fall back to float16 otherwise
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    model_max_length=max_seq_length,
    padding_side="right"
)
# Configure LoRA
lora_config = LoraConfig(
    r=16,  # rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    bias="none",
    task_type="CAUSAL_LM"
)
# Apply PEFT
model = get_peft_model(model, lora_config)
# Define prompt template for formatting
llama31_prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{}<|eot_id|><|start_header_id|>user<|end_header_id|>
{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{}<|eot_id|>"""
def formatting_prompts_func(examples):
    fields = examples["conversations"]
    texts = []
    for convos in fields:
        instruction = convos[0]['value']
        input_text = convos[1]['value']
        output = convos[2]['value']
        text = llama31_prompt.format(instruction, input_text, output)
        texts.append(text)
    return {"text": texts}
# Load and process dataset
dataset = load_dataset("json", data_files={"train": "data.jsonl"}, split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)
# Configure training arguments
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs=3,
    learning_rate=2e-4,
    # Use bf16 on GPUs with compute capability >= 8.0, fp16 otherwise
    fp16=(torch.cuda.is_available() and not (torch.cuda.get_device_capability()[0] >= 8)),
    bf16=(torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8),
    logging_steps=1,
    optim="adamw_torch",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    report_to="none"
)
#... (rest of offlinetraining.py code)...
Project Philosophy & Roadmap
Philosophy
VeritasGraph is founded on the principle that the most powerful AI systems should also be the most transparent, secure, and controllable. The project's philosophy is a commitment to democratizing enterprise-grade AI, providing organizations with the tools to build their own sovereign knowledge assets. This stands in contrast to a reliance on opaque, proprietary, cloud-based APIs, empowering organizations to maintain full control over their data and the reasoning processes applied to it.
Roadmap
The project is under active development. Future enhancements are planned to expand its capabilities and ecosystem integration:
- **Expanded Database Support:** Integration with a wider range of graph databases and vector stores.
- **Advanced Graph Analytics:** Incorporation of community detection and summarization techniques.
- **Agentic Framework:** Development of an agentic layer that can perform more complex, multi-step reasoning tasks.
- **Visualization UI:** A web-based user interface for visualizing the knowledge graph and exploring attribution paths.