Introduction to VeritasGraph
VeritasGraph is a production-ready, end-to-end framework for building advanced question-answering and summarization systems that operate entirely within your private infrastructure. It is architected to overcome the fundamental limitations of traditional vector-search-based Retrieval-Augmented Generation (RAG) by leveraging a knowledge graph to perform complex, multi-hop reasoning.
Baseline RAG systems excel at finding direct answers but falter when faced with questions that require connecting disparate information. VeritasGraph addresses this challenge directly, providing not just answers, but transparent, auditable reasoning paths with full source attribution for every generated claim, establishing a new standard for trust and reliability in enterprise AI.
The Architectural Blueprint
The VeritasGraph pipeline is a four-stage process that systematically transforms a corpus of raw, unstructured documents into a structured knowledge asset capable of sophisticated, attributable reasoning.
Stage 1: Knowledge Graph Construction
This initial stage transforms raw, unstructured documents into a structured, interconnected knowledge graph. The goal is to create a rich data foundation that enables complex reasoning, moving beyond simple keyword or vector search.
[Chart: Graph Composition]
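To make the stage concrete, the snippet below shows the kind of entity and relationship records the extraction step produces from a chunk of text. It is a minimal sketch: the field names and types are illustrative, not the indexer's exact schema.

# Illustrative sketch only: field names here are hypothetical, not the indexer's real schema.
chunk = "Acme Corp acquired Beta Labs in 2021. Beta Labs was founded by Dana Lee."

extracted_entities = [
    {"name": "Acme Corp", "type": "ORGANIZATION"},
    {"name": "Beta Labs", "type": "ORGANIZATION"},
    {"name": "Dana Lee", "type": "PERSON"},
]

extracted_relationships = [
    {"source": "Acme Corp", "target": "Beta Labs", "description": "acquired in 2021", "source_chunk": chunk},
    {"source": "Dana Lee", "target": "Beta Labs", "description": "founded", "source_chunk": chunk},
]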
Stage 2: Hybrid Retrieval
Instead of relying on a single method, this stage uses a hybrid approach to find the most relevant information. It combines the broad reach of semantic search with the precision of graph traversal to uncover connections that would otherwise be missed.
[Chart: Retrieval Effectiveness]
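The sketch below captures the intuition, assuming a `vector_index` whose `search()` returns entity IDs that are also graph node IDs, and a `graph` object with a `neighbors()` method. It is a conceptual illustration, not the project's actual retrieval code.

# Conceptual sketch: vector search supplies seed entities, the graph supplies
# their multi-hop neighborhood. `vector_index` and `graph` are assumed interfaces.
def hybrid_retrieve(query_embedding, vector_index, graph, hops=2, k=5):
    seeds = vector_index.search(query_embedding, k=k)   # broad semantic recall
    context = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):                               # precise graph expansion
        frontier = {n for node in frontier for n in graph.neighbors(node)} - context
        context |= frontier
    return context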
Stage 3: LoRA-Tuned Reasoning
Once the context is retrieved, a Large Language Model (LLM) synthesizes the final answer. The LLM is fine-tuned using Low-Rank Adaptation (LoRA), making it highly efficient and specialized for generating attributed, factual responses.
[Chart: Model Enhancement via LoRA]
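As a quick reminder of what LoRA does mathematically: the frozen weight matrix W is augmented with a trainable low-rank product BA, scaled by alpha/r. The toy example below illustrates only this update rule (matrix sizes are arbitrary); the actual fine-tuning script appears later as `offlinetraining.py`.

import torch

d, r, alpha = 4096, 16, 16               # toy hidden size; r and alpha match the LoraConfig used later
W = torch.randn(d, d)                    # pretrained weight, kept frozen
A = torch.randn(r, d) * 0.01             # trainable low-rank factor, randomly initialized
B = torch.zeros(d, r)                    # trainable low-rank factor, starts at zero so W_effective == W initially
W_effective = W + (alpha / r) * (B @ A)  # in LoRA, only A and B are trained; W never changes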
Stage 4: Attribution & Provenance
The final and most critical stage ensures trust and transparency. Every claim in the generated answer is linked back to its source documents and to the reasoning path taken through the graph, giving each statement a verifiable provenance trail.
[Chart: Components of a Final Response]
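A generated answer therefore carries its evidence with it. The structure below is a hypothetical illustration of such a payload; the field names are not the framework's exact output schema.

# Hypothetical example of an attributed answer; field names are illustrative.
attributed_response = {
    "answer": "Acme Corp indirectly controls Gamma Inc through its 2021 acquisition of Beta Labs.",
    "claims": [
        {
            "text": "Acme Corp acquired Beta Labs in 2021.",
            "sources": ["doc_017 / chunk 3"],
            "graph_path": ["Acme Corp", "ACQUIRED", "Beta Labs"],
        },
        {
            "text": "Beta Labs holds a controlling stake in Gamma Inc.",
            "sources": ["doc_042 / chunk 1"],
            "graph_path": ["Beta Labs", "CONTROLS", "Gamma Inc"],
        },
    ],
}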
Getting Started
Environment Setup
This guide uses Ollama with `llama3.1` for generation and `nomic-embed-text` for embeddings. If you use LM Studio for embeddings, run on native Windows rather than WSL to avoid connection issues.
Important: Fix Model Context Length
Ollama's default context length is 2048 tokens, which can truncate prompts and outputs during indexing. This guide therefore builds a model variant with a 12k context window. Note that changing the model name in `settings.yaml` will restart the entire indexing process.
1. Pull Required Models
# Terminal 1
ollama serve
# Terminal 2
ollama pull llama3.1
ollama pull nomic-embed-text
2. Build Model with Custom Context Length
ollama create llama3.1-12k -f ./Modelfile
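The command above expects a `Modelfile` in the working directory. If you need to write one yourself, a minimal version that simply raises the context window looks like this (the exact `num_ctx` value is an assumption; pick whatever fits your hardware and the ~12k target):

FROM llama3.1
PARAMETER num_ctx 12288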
GraphRAG Indexing Steps
1. Activate Conda Environment
conda create -n rag python=3.12
conda activate rag
2. Install GraphRAG
# Clone the project and navigate to the config directory
cd graphrag-ollama-config
# Navigate to the local graphrag fix and install
cd graphrag-ollama
pip install -e ./
3. Initialize and Configure
# Install dependencies
pip install sympy future ollama
# Initialize graphrag folder (can be skipped if using this repo's setup)
python -m graphrag.index --init --root .
# Create your .env file
cp .env.example .env
Move your input text files to the `./input/` directory and double-check parameters in `.env` and `settings.yaml`.
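The variable names in `.env` must match what `settings.yaml` and the UI code read. A typical local-Ollama configuration looks roughly like the following; the names and values are illustrative, so check them against the repo's `.env.example`:

GRAPHRAG_API_KEY=ollama                          # any non-empty string works for a local endpoint
GRAPHRAG_API_BASE=http://localhost:11434/v1
GRAPHRAG_LLM_MODEL=llama3.1-12k
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text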
4. Start Indexing
python -m graphrag.index --root .
Using the UI
1. Install Requirements
pip install -r requirements.txt
2. Run the Application
gradio app.py
Access the UI by visiting http://127.0.0.1:7860/ in your browser.
Code Reference
Gradio UI: `app.py`
import gradio as gr
import os
import asyncio
import pandas as pd
import tiktoken
from dotenv import load_dotenv
from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.input.loaders.dfs import store_entity_semantic_embeddings
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.query.question_gen.local_gen import LocalQuestionGen
from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
from graphrag.query.structured_search.global_search.search import GlobalSearch
from graphrag.query.structured_search.local_search.mixed_context import LocalSearchMixedContext
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.lancedb import LanceDBVectorStore
load_dotenv('.env')
join = os.path.join
PRESET_MAPPING = {
    "Default": {
        "community_level": 2,
        "response_type": "Multiple Paragraphs"
    },
    "Detailed": {
        "community_level": 4,
        "response_type": "Multi-Page Report"
    },
    "Quick": {
        "community_level": 1,
        "response_type": "Single Paragraph"
    },
    "Bullet": {
        "community_level": 2,
        "response_type": "List of 3-7 Points"
    },
    "Comprehensive": {
        "community_level": 5,
        "response_type": "Multi-Page Report"
    },
    "High-Level": {
        "community_level": 1,
        "response_type": "Single Page"
    },
    "Focused": {
        "community_level": 3,
        "response_type": "Multiple Paragraphs"
    }
}
async def global_search(query, input_dir, community_level=2, temperature=0.5, response_type="Multiple Paragraphs"):
    # These variable names must match the entries in your .env file.
    api_key = os.environ["GRAPHRAG_API_KEY"]
    llm_model = os.environ["GRAPHRAG_LLM_MODEL"]
    api_base = os.environ["GRAPHRAG_API_BASE"]
    llm = ChatOpenAI(
        api_key=api_key,
        api_base=api_base,
        model=llm_model,
        api_type=OpenaiApiType.OpenAI,
        max_retries=10,
    )
    token_encoder = tiktoken.get_encoding("cl100k_base")

    COMMUNITY_REPORT_TABLE = "create_final_community_reports"
    ENTITY_TABLE = "create_final_nodes"
    ENTITY_EMBEDDING_TABLE = "create_final_entities"

    entity_df = pd.read_parquet(join(input_dir, f"{ENTITY_TABLE}.parquet"))
    report_df = pd.read_parquet(join(input_dir, f"{COMMUNITY_REPORT_TABLE}.parquet"))
    entity_embedding_df = pd.read_parquet(join(input_dir, f"{ENTITY_EMBEDDING_TABLE}.parquet"))

    reports = read_indexer_reports(report_df, entity_df, community_level)
    entities = read_indexer_entities(entity_df, entity_embedding_df, community_level)
    context_builder = GlobalCommunityContext(
        community_reports=reports,
        entities=entities,
        token_encoder=token_encoder,
    )

    context_builder_params = {
        "use_community_summary": False,
        "shuffle_data": True,
        "include_community_rank": True,
        "min_community_rank": 0,
        "community_rank_name": "rank",
        "include_community_weight": True,
        "community_weight_name": "occurrence weight",
        "normalize_community_weight": True,
        "max_tokens": 4000,
        "context_name": "Reports",
    }

    map_llm_params = {
        "max_tokens": 1000,
        "temperature": temperature,
        "response_format": {"type": "json_object"},
    }

    reduce_llm_params = {
        "max_tokens": 2000,
        "temperature": temperature,
    }
    search_engine = GlobalSearch(
        llm=llm,
        context_builder=context_builder,
        token_encoder=token_encoder,
        max_data_tokens=5000,
        map_llm_params=map_llm_params,
        reduce_llm_params=reduce_llm_params,
        allow_general_knowledge=False,
        json_mode=True,
        context_builder_params=context_builder_params,
        concurrent_coroutines=1,
        response_type=response_type,
    )

    result = await search_engine.asearch(query)
    return result.response
#... (rest of app.py code)...
LoRA Fine-Tuning: `offlinetraining.py`
import torch
import os
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
# Model configuration
max_seq_length = 2048
model_path = r'D:\work\models\Meta-Llama-3.2-3B-Instruct'
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    # bfloat16 requires compute capability >= 8.0 (Ampere or newer); fall back to float16 otherwise
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    model_max_length=max_seq_length,
    padding_side="right"
)
# Configure LoRA
lora_config = LoraConfig(
    r=16,  # rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    bias="none",
    task_type="CAUSAL_LM"
)
# Apply PEFT
model = get_peft_model(model, lora_config)
# Define prompt template for formatting
llama31_prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{}<|eot_id|><|start_header_id|>user<|end_header_id|>
{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{}<|eot_id|>"""
def formatting_prompts_func(examples):
    fields = examples["conversations"]
    texts = []
    for convos in fields:
        instruction = convos[0]['value']
        input_text = convos[1]['value']
        output = convos[2]['value']
        text = llama31_prompt.format(instruction, input_text, output)
        texts.append(text)
    return {"text": texts}
# Load and process dataset
dataset = load_dataset("json", data_files={"train": "data.jsonl"}, split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)
# Configure training arguments
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs=3,
    learning_rate=2e-4,
    # Use bf16 on GPUs with compute capability >= 8.0, fp16 otherwise
    fp16=(torch.cuda.is_available() and not (torch.cuda.get_device_capability()[0] >= 8)),
    bf16=(torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8),
    logging_steps=1,
    optim="adamw_torch",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    report_to="none"
)
#... (rest of offlinetraining.py code)...
Project Philosophy & Roadmap
Philosophy
VeritasGraph is founded on the principle that the most powerful AI systems should also be the most transparent, secure, and controllable. The project's philosophy is a commitment to democratizing enterprise-grade AI, providing organizations with the tools to build their own sovereign knowledge assets. This stands in contrast to a reliance on opaque, proprietary, cloud-based APIs, empowering organizations to maintain full control over their data and the reasoning processes applied to it.
Roadmap
The project is under active development. Future enhancements are planned to expand its capabilities and ecosystem integration:
- **Expanded Database Support:** Integration with a wider range of graph databases and vector stores.
- **Advanced Graph Analytics:** Incorporation of community detection and summarization techniques.
- **Agentic Framework:** Development of an agentic layer that can perform more complex, multi-step reasoning tasks.
- **Visualization UI:** A web-based user interface for visualizing the knowledge graph and exploring attribution paths.