Introduction to VeritasGraph

Q: How is GraphRAG different from standard vector RAG?

Standard vector RAG retrieves chunks by cosine similarity, which works for single-hop questions but fails when the answer requires connecting facts across documents. GraphRAG retrieves a connected subgraph through typed-edge traversal, optionally fused with tree (TOC) navigation and vector similarity. On multi-hop benchmarks like MuSiQue and HotpotQA, GraphRAG typically outperforms vector RAG by +18 to +34 absolute exact-match points.

Q: Is there an open-source knowledge graph framework on GitHub I can use today?

Yes. VeritasGraph is an MIT-licensed open-source knowledge graph and GraphRAG framework available at https://github.com/bibinprathap/VeritasGraph. It supports automated entity and relation extraction, hybrid graph + tree + vector retrieval, RDF / linked-data export, and verifiable attribution. Install with `pip install veritasgraph` and run `veritasgraph demo --mode=lite` to launch a working demo in two minutes.

Q: What are concrete knowledge graph examples I can copy?

Five common production patterns: (1) Movie or product recommendation graph, (2) Corporate ontology graph spanning HR, Finance, and Procurement, (3) Scientific literature citation graph, (4) Compliance / policy graph mapping regulations to controls to evidence, and (5) E-commerce product knowledge graph with attribute and compatibility relationships. Runnable examples for each are published in the VeritasGraph GitHub repository.

VeritasGraph is a production-ready, end-to-end framework for building advanced question-answering and summarization systems that operate entirely within your private infrastructure. It is architected to overcome the fundamental limitations of traditional vector-search-based Retrieval-Augmented Generation (RAG) by leveraging a knowledge graph to perform complex, multi-hop reasoning.

Baseline RAG systems excel at finding direct answers but falter when faced with questions that require connecting disparate information. VeritasGraph addresses this challenge directly, providing not just answers, but transparent, auditable reasoning paths with full source attribution for every generated claim, establishing a new standard for trust and reliability in enterprise AI.

The Architectural Blueprint

The VeritasGraph pipeline is a four-stage process that systematically transforms a corpus of raw, unstructured documents into a structured knowledge asset capable of sophisticated, attributable reasoning.

Stage 1: Knowledge Graph Construction

This initial stage transforms raw, unstructured documents into a structured, interconnected knowledge graph. The goal is to create a rich data foundation that enables complex reasoning, moving beyond simple keyword or vector search.

Input: Unstructured Documents (.txt, .pdf)

▼

Process: Document Chunking & LLM Extraction

▼

Output: Assembled Knowledge Graph (Nodes & Edges)

Graph Composition
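
The chunk-and-extract flow above can be sketched in a few lines. This is a minimal illustration rather than VeritasGraph's actual extraction code: the `Triple` type, the `build_graph` helper, and the hard-coded triples (standing in for the output of an LLM extraction pass) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted fact: (subject) -[relation]-> (object), plus its source chunk."""
    subject: str
    relation: str
    obj: str
    source_chunk: str


def build_graph(triples):
    """Assemble nodes and a typed adjacency list from extracted triples."""
    nodes = set()
    edges = defaultdict(list)  # subject -> [(relation, object, source_chunk)]
    for t in triples:
        nodes.update((t.subject, t.obj))
        edges[t.subject].append((t.relation, t.obj, t.source_chunk))
    return nodes, dict(edges)


# Hard-coded stand-in for what an LLM extraction pass might return.
triples = [
    Triple("Acme Corp", "ACQUIRED", "Widget Labs", "report.txt#chunk-3"),
    Triple("Widget Labs", "DEVELOPS", "WidgetOS", "press.txt#chunk-1"),
]
nodes, edges = build_graph(triples)
print(sorted(nodes))
print(edges["Acme Corp"])
```

Keeping the source chunk on every edge is what later makes attribution possible: any fact used in an answer can be traced back to the text it came from.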

Stage 2: Hybrid Retrieval

Instead of relying on a single method, this stage uses a hybrid approach to find the most relevant information. It combines the broad reach of semantic search with the precision of graph traversal to uncover connections that would otherwise be missed.

Input: User Query

▼

Process: Vector Search + Graph Traversal

▼

Output: Pruned & Ranked Contextual Facts
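
As a rough sketch of how vector search and graph traversal can be combined, the toy example below seeds retrieval with the most similar node and then expands along edges. It assumes two-dimensional embeddings and a dictionary-based graph; the function name `hybrid_retrieve` and its parameters are hypothetical and do not reflect the framework's real API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, node_vecs, edges, top_k=1, max_hops=2):
    """Seed with vector search, then expand along graph edges.

    Returns (node, hops_from_seed) pairs, nearest first.
    """
    # Step 1: vector search picks the entry-point nodes.
    ranked = sorted(node_vecs, key=lambda n: cosine(query_vec, node_vecs[n]), reverse=True)
    frontier = ranked[:top_k]
    hops = {n: 0 for n in frontier}
    # Step 2: breadth-first traversal collects connected nodes.
    for depth in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for _relation, neighbor in edges.get(node, []):
                if neighbor not in hops:
                    hops[neighbor] = depth
                    next_frontier.append(neighbor)
        frontier = next_frontier
    # Step 3: rank by graph distance from the seed (ties broken by name).
    return sorted(hops.items(), key=lambda kv: (kv[1], kv[0]))


node_vecs = {
    "Acme Corp": [0.9, 0.1],
    "Widget Labs": [0.2, 0.8],
    "WidgetOS": [0.1, 0.9],
}
edges = {
    "Acme Corp": [("ACQUIRED", "Widget Labs")],
    "Widget Labs": [("DEVELOPS", "WidgetOS")],
}
results = hybrid_retrieve([1.0, 0.0], node_vecs, edges)
print(results)
```

Here the query vector is close only to "Acme Corp", yet traversal still surfaces "WidgetOS" two hops away, which similarity search alone would have ranked last.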

Retrieval Effectiveness

Stage 3: LoRA-Tuned Reasoning

Once the context is retrieved, a Large Language Model (LLM) synthesizes the final answer. The LLM is fine-tuned using Low-Rank Adaptation (LoRA), making it highly efficient and specialized for generating attributed, factual responses.

Model Enhancement via LoRA
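
To give a feel for why LoRA is efficient, the arithmetic below compares one full weight matrix with the two low-rank matrices LoRA trains in its place. The 3072 x 3072 layer size is an illustrative assumption, not a value read from any model; the rank and alpha match the `LoraConfig` shown in the Code Reference section.

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


d_in = d_out = 3072  # illustrative layer size, not read from any model
r, lora_alpha = 16, 16

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, r)
scaling = lora_alpha / r  # multiplier applied to the low-rank update B @ A

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4f}  scaling: {scaling}")
```

Under these assumptions the adapter trains roughly 1% of the parameters of the layer it modifies, while the base weights stay frozen.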

Stage 4: Attribution & Provenance

The final and most critical stage ensures trust and transparency. Every claim in the generated answer is linked back to its source documents and the reasoning path taken through the graph, providing a verifiable proof.

Components of a Final Response
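
One way to picture an attributed response is as a list of claims, each carrying its sources and the graph path that produced it. The sketch below is a hypothetical data model (`Claim`, `AttributedAnswer`, and the sample identifiers are invented for illustration), not the structure VeritasGraph actually returns.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One sentence of the answer plus the evidence that supports it."""
    text: str
    sources: list            # source document / chunk identifiers
    reasoning_path: list     # (subject, relation, object) hops through the graph


@dataclass
class AttributedAnswer:
    claims: list = field(default_factory=list)

    def unsupported(self):
        """Claims with no source attached; these should be flagged or dropped."""
        return [c for c in self.claims if not c.sources]

    def render(self):
        """Answer text with bracketed citation markers after each claim."""
        return " ".join(f"{c.text} [{', '.join(c.sources)}]" for c in self.claims)


answer = AttributedAnswer(claims=[
    Claim(
        text="Acme Corp owns the company that develops WidgetOS.",
        sources=["report.txt#chunk-3", "press.txt#chunk-1"],
        reasoning_path=[
            ("Acme Corp", "ACQUIRED", "Widget Labs"),
            ("Widget Labs", "DEVELOPS", "WidgetOS"),
        ],
    ),
])
print(answer.render())
print(answer.unsupported())
```

A check like `unsupported()` is the kind of gate that lets a pipeline refuse to emit any claim it cannot trace to a source.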

Getting Started

Environment Setup

This guide uses Ollama with `llama3.1` for generation and `nomic-embed-text` for embeddings. If you use LM Studio for embeddings, running on Windows without WSL is recommended to avoid connection issues.

Important: Fix Model Context Length

Ollama's default context length is 2048 tokens, which can truncate model input and output during indexing. This guide uses a 12k context window. Note that changing the model in `settings.yaml` restarts the entire indexing process.

1. Pull Required Models


# Terminal 1
ollama serve

# Terminal 2
ollama pull llama3.1
ollama pull nomic-embed-text

2. Build Model with Custom Context Length


ollama create llama3.1-12k -f ./Modelfile
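
The contents of `./Modelfile` are not shown in this guide. The sketch below writes a plausible one using the standard Ollama `FROM` and `PARAMETER num_ctx` directives. It assumes that "12k" means 12 * 1024 = 12288 tokens and that the base model is `llama3.1`; the helper name `render_modelfile` is hypothetical, so adjust the values if the repository ships its own file.

```python
from pathlib import Path


def render_modelfile(base_model, num_ctx):
    """Return Modelfile text that derives a model with a larger context window."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


# Assumption: "12k" is taken to mean 12 * 1024 = 12288 tokens.
modelfile_text = render_modelfile("llama3.1", 12 * 1024)
Path("Modelfile").write_text(modelfile_text)
print(modelfile_text)
```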

GraphRAG Indexing Steps

1. Activate Conda Environment


conda create -n rag "python<3.12"
conda activate rag

2. Install GraphRAG


# Clone the project and navigate to the config directory
cd graphrag-ollama-config

# Navigate to the local graphrag fix and install
cd graphrag-ollama
pip install -e ./

3. Initialize and Configure


# Install dependencies
pip install sympy future ollama

# Initialize graphrag folder (can be skipped if using this repo's setup)
python -m graphrag.index --init --root .

# Create your .env file
cp .env.example .env

Move your input text files to the `./input/` directory and double-check parameters in `.env` and `settings.yaml`.
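
Before starting a long indexing run, a quick pre-flight check can catch a missing file early. The helper below is a hypothetical convenience script, not part of the repository, and it assumes the inputs are `.txt` files placed directly in `./input/`.

```python
from pathlib import Path


def preflight(root="."):
    """Return a list of human-readable problems; an empty list means ready to index."""
    root = Path(root)
    problems = []
    input_dir = root / "input"
    # Assumption: inputs are plain .txt files sitting directly in ./input/.
    if not input_dir.is_dir() or not any(input_dir.glob("*.txt")):
        problems.append("no .txt files found in ./input/")
    for name in (".env", "settings.yaml"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    return problems


for problem in preflight("."):
    print("WARNING:", problem)
```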

4. Start Indexing


python -m graphrag.index --root .

Using the UI

1. Install Requirements


pip install -r requirements.txt

2. Run the Application


gradio app.py

Access the UI by visiting http://127.0.0.1:7860/ in your browser.

Code Reference

Gradio UI: `app.py`


import gradio as gr
import os
import asyncio
import pandas as pd
import tiktoken
from dotenv import load_dotenv

from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
from graphrag.query.structured_search.global_search.search import GlobalSearch
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.input.loaders.dfs import (
    store_entity_semantic_embeddings,
)
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.question_gen.local_gen import LocalQuestionGen
from graphrag.query.structured_search.local_search.mixed_context import (
    LocalSearchMixedContext,
)
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.lancedb import LanceDBVectorStore

load_dotenv('.env')
join = os.path.join

PRESET_MAPPING = {
    "Default": {
        "community_level": 2,
        "response_type": "Multiple Paragraphs"
    },
    "Detailed": {
        "community_level": 4,
        "response_type": "Multi-Page Report"
    },
    "Quick": {
        "community_level": 1,
        "response_type": "Single Paragraph"
    },
    "Bullet": {
        "community_level": 2,
        "response_type": "List of 3-7 Points"
    },
    "Comprehensive": {
        "community_level": 5,
        "response_type": "Multi-Page Report"
    },
    "High-Level": {
        "community_level": 1,
        "response_type": "Single Page"
    },
    "Focused": {
        "community_level": 3,
        "response_type": "Multiple Paragraphs"
    }
}

async def global_search(query, input_dir, community_level=2, temperature=0.5, response_type="Multiple Paragraphs"):
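        # Assumed variable names; use whatever keys your .env file defines.
        api_key = os.environ["GRAPHRAG_API_KEY"]
        llm_model = os.environ["GRAPHRAG_LLM_MODEL"]
        api_base = os.environ["GRAPHRAG_API_BASE"]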

        llm = ChatOpenAI(
            api_key=api_key,
            api_base=api_base,
            model=llm_model,
            api_type=OpenaiApiType.OpenAI,  
            max_retries=10,
        )

        token_encoder = tiktoken.get_encoding("cl100k_base")

        COMMUNITY_REPORT_TABLE = "create_final_community_reports"
        ENTITY_TABLE = "create_final_nodes"
        ENTITY_EMBEDDING_TABLE = "create_final_entities"
        
        entity_df = pd.read_parquet(join(input_dir, f"{ENTITY_TABLE}.parquet"))
        report_df = pd.read_parquet(join(input_dir, f"{COMMUNITY_REPORT_TABLE}.parquet"))
        entity_embedding_df = pd.read_parquet(join(input_dir, f"{ENTITY_EMBEDDING_TABLE}.parquet"))

        reports = read_indexer_reports(report_df, entity_df, community_level)
        entities = read_indexer_entities(entity_df, entity_embedding_df, community_level)

        context_builder = GlobalCommunityContext(
            community_reports=reports,
            entities=entities,
            token_encoder=token_encoder,
        )

        context_builder_params = {
            "use_community_summary": False,
            "shuffle_data": True,
            "include_community_rank": True,
            "min_community_rank": 0,
            "community_rank_name": "rank",
            "include_community_weight": True,
            "community_weight_name": "occurrence weight",
            "normalize_community_weight": True,
            "max_tokens": 4000,
            "context_name": "Reports",
        }

        map_llm_params = {
            "max_tokens": 1000,
            "temperature": temperature,
            "response_format": {"type": "json_object"},
        }

        reduce_llm_params = {
            "max_tokens": 2000,
            "temperature": temperature,
        }

        search_engine = GlobalSearch(
            llm=llm,
            context_builder=context_builder,
            token_encoder=token_encoder,
            max_data_tokens=5000,
            map_llm_params=map_llm_params,
            reduce_llm_params=reduce_llm_params,
            allow_general_knowledge=False,
            json_mode=True,
            context_builder_params=context_builder_params,
            concurrent_coroutines=1,
            response_type=response_type,
        )

        result = await search_engine.asearch(query)
        return result.response

#... (rest of app.py code)...
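
The elided part of `app.py` is not reproduced here, but the way a preset could feed `global_search` can be sketched on its own. In the example below, `resolve_preset` and `stub_search` are hypothetical: the stub copies the signature of `global_search` and makes no LLM call, and the `./output` path is a placeholder.

```python
import asyncio

# Two entries copied from PRESET_MAPPING above, to keep the sketch self-contained.
PRESETS = {
    "Default": {"community_level": 2, "response_type": "Multiple Paragraphs"},
    "Quick": {"community_level": 1, "response_type": "Single Paragraph"},
}


def resolve_preset(name):
    """Look up a preset by name, falling back to "Default" for unknown names."""
    return PRESETS.get(name, PRESETS["Default"])


async def stub_search(query, input_dir, community_level=2, temperature=0.5,
                      response_type="Multiple Paragraphs"):
    """Stand-in with the same signature as global_search; makes no LLM call."""
    return f"[level={community_level}, {response_type}] {query}"


answer = asyncio.run(
    stub_search("What are the main themes?", "./output", **resolve_preset("Quick"))
)
print(answer)
```

Because each preset is a plain dictionary of keyword arguments, swapping the stub for the real coroutine should require no other change.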

LoRA Fine-Tuning: `offlinetraining.py`


import torch
import os
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Model configuration
max_seq_length = 2048
model_path = r'D:\work\models\Meta-Llama-3.2-3B-Instruct'

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    # get_device_capability() returns (major, minor); bfloat16 needs major >= 8
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    model_max_length=max_seq_length,
    padding_side="right"
)

# Configure LoRA
lora_config = LoraConfig(
    r=16,  # rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply PEFT
model = get_peft_model(model, lora_config)

# Define prompt template for formatting
llama31_prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{}<|eot_id|><|start_header_id|>user<|end_header_id|>

{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{}<|eot_id|>"""

def formatting_prompts_func(examples):
    fields = examples["conversations"]
    texts = []
    for convos in fields:
        instruction = convos[0]['value']
        input_text = convos[1]['value']
        output = convos[2]['value']
        text = llama31_prompt.format(instruction, input_text, output)
        texts.append(text)
    return {"text": texts}

# Load and process dataset
dataset = load_dataset("json", data_files={"train": "data.jsonl"}, split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)

# Configure training arguments
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=(torch.cuda.is_available() and not (torch.cuda.get_device_capability()[0] >= 8)),
    bf16=(torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8),
    logging_steps=1,
    optim="adamw_torch",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    report_to="none"
)

#... (rest of offlinetraining.py code)...
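
The formatting function reads three turns per record (system instruction, user input, assistant output) and takes the text of each from its `value` key. The sketch below shows what one formatted training example might look like. The record content is invented for illustration, and the template is repeated so the snippet runs on its own.

```python
# Same template as llama31_prompt above, repeated so the sketch runs on its own.
TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "{}<|eot_id|>"
)

# Hypothetical record; only the "value" key and the three-turn order come from the script.
record = {
    "conversations": [
        {"value": "Answer using only the provided context and cite sources."},
        {"value": "Context: Acme Corp acquired Widget Labs. Question: Who owns Widget Labs?"},
        {"value": "Acme Corp owns Widget Labs [report.txt]."},
    ]
}

system, user, assistant = (turn["value"] for turn in record["conversations"])
text = TEMPLATE.format(system, user, assistant)
print(text)
```

Each line of `data.jsonl` would then hold one such record, assuming the dataset follows this three-turn layout.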

Project Philosophy & Roadmap

Philosophy

VeritasGraph is founded on the principle that the most powerful AI systems should also be the most transparent, secure, and controllable. The project's philosophy is a commitment to democratizing enterprise-grade AI, providing organizations with the tools to build their own sovereign knowledge assets. This stands in contrast to a reliance on opaque, proprietary, cloud-based APIs, empowering organizations to maintain full control over their data and the reasoning processes applied to it.

Roadmap

The project is under active development. Future enhancements are planned to expand its capabilities and ecosystem integration:

**Expanded Database Support:** Integration with a wider range of graph databases and vector stores.
**Advanced Graph Analytics:** Incorporation of community detection and summarization techniques.
**Agentic Framework:** Development of an agentic layer that can perform more complex, multi-step reasoning tasks.
**Visualization UI:** A web-based user interface for visualizing the knowledge graph and exploring attribution paths.

Knowledge Graph for RAG — FAQ

Direct answers to the questions developers most often ask about knowledge graphs, GraphRAG, and using VeritasGraph in production.

What is a knowledge graph for RAG?

A knowledge graph for RAG (Retrieval-Augmented Generation) is a structured database of typed entities (nodes) and relationships (edges) that supplies an LLM with factual, relational context at query time. Compared to a pure vector database, a knowledge graph reduces hallucination, enables multi-hop reasoning, and produces verifiable citations because every generated claim can be traced back to a specific node, edge, and source span.

How is GraphRAG different from standard vector RAG?

Vector RAG retrieves chunks by cosine similarity — great for single-hop FAQ, brittle for connected facts. GraphRAG retrieves a connected subgraph through typed-edge traversal, optionally fused with tree (TOC) navigation and vector similarity. On multi-hop benchmarks like MuSiQue and HotpotQA, GraphRAG typically outperforms vector RAG by +18 to +34 absolute exact-match points.

Is there an open-source knowledge graph on GitHub I can use today?

Yes. VeritasGraph on GitHub is MIT-licensed and supports automated entity and relation extraction, hybrid graph + tree + vector retrieval, RDF / linked-data export, and verifiable attribution. Install with `pip install veritasgraph` and launch a working demo in two minutes via `veritasgraph demo --mode=lite`.

Can a knowledge graph for an LLM run fully on-premise?

Yes. VeritasGraph supports a 100% local deployment using Ollama or vLLM for inference and a native graph store (or Neo4j) for the graph backend. No data, embeddings, or prompts ever leave your infrastructure — the recommended deployment for HIPAA, SOC 2, and EU AI Act workloads.

Do I need Neo4j to build a knowledge graph for RAG?

No. VeritasGraph ships with a native graph backend that requires no external database. Neo4j, Memgraph, and TerminusDB are optional plugins for teams that already standardize on those graph databases.

What are concrete knowledge graph examples I can copy?

Five common production patterns: a movie or product recommendation graph, a corporate ontology graph spanning HR / Finance / Procurement, a scientific literature citation graph, a compliance / policy graph mapping regulations to controls to evidence, and an e-commerce product knowledge graph with attribute and compatibility relationships. Runnable examples for each are published in the VeritasGraph GitHub repository.
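
To make the compliance pattern concrete, here is a toy graph that maps a regulation to its controls and each control to its evidence. It is not one of the repository's published examples; the node names, edge types, and helper functions are all hypothetical.

```python
# Hypothetical compliance graph: regulation -> control -> evidence.
EDGES = [
    ("REG-1", "REQUIRES", "CTRL-access-review"),
    ("REG-1", "REQUIRES", "CTRL-encryption"),
    ("CTRL-access-review", "EVIDENCED_BY", "q3-access-audit.pdf"),
]


def follow(start, relation):
    """Targets reachable from `start` over one edge of the given type."""
    return [dst for src, rel, dst in EDGES if src == start and rel == relation]


def evidence_gaps(regulation):
    """Controls required by a regulation that have no evidence attached."""
    return [c for c in follow(regulation, "REQUIRES") if not follow(c, "EVIDENCED_BY")]


print(evidence_gaps("REG-1"))
```

A question such as "which required controls lack evidence?" is a two-hop query over typed edges, which is awkward to answer with similarity search alone.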

How does VeritasGraph compare to Google's Knowledge Graph?

Google's Knowledge Graph is a closed product that returns entity facts about public web concepts. VeritasGraph is the opposite: an open-source framework for building your own knowledge graph from your own private documents, fully under your control, queryable by your own LLMs.