Project

Canon — Agentic RAG Academic Correctness & Citation-First AI Platform

Production-grade, course-aware academic Q&A backend using LangGraph routing + confidence-gated RAG with citations, local LLM inference, and admin-controlled document governance.

2026-01-27

RAG · Agentic AI · LangGraph · FastAPI · PostgreSQL · FAISS · Ollama · Docker · Observability · Security · GitHub

Problem statement

Typical “AI tutor” chatbots hallucinate, mix courses, and can’t justify answers. For university-level coursework, I needed a system that:

  • Grounds answers strictly in official course documents
  • Returns citations (document + chunk) for auditability
  • Refuses when the material is insufficient
  • Separates admin-controlled ingestion from student usage to protect academic integrity

Architecture overview

Canon is a FastAPI backend with an agentic decision layer (LangGraph) that routes queries through either:

  • Course-aware RAG (preferred for academic correctness), or
  • Direct local LLM (for general questions, when policy allows)

Document chunks are embedded and stored in a persistent FAISS index; at query time, retrieved chunks are filtered by course metadata. Answers are generated by a local LLM served through Ollama and are returned with citations and a confidence label.

┌────────────┐
│  Student   │
└─────┬──────┘
      │ Question
      ▼
┌──────────────┐
│ FastAPI API  │
└─────┬────────┘
      ▼
┌─────────────────────────┐
│ Agent (LangGraph)       │
│  - Intent Classification│
│  - Policy Enforcement   │
└─────┬───────────┬───────┘
      │           │
      ▼           ▼
 Course RAG     Direct LLM
      │           │
      ▼           ▼
FAISS + PDFs   Ollama (Local)
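
The routing decision in the diagram can be sketched as a plain Python function. This is a minimal illustration, not Canon's actual LangGraph node: the `route` function, the course-code regex, and the `"refuse"` fallback when direct answers are disallowed are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical pattern for course codes such as "CS5589".
COURSE_CODE = re.compile(r"\b[A-Z]{2,4}\d{3,4}\b")


def route(question: str,
          course_code: Optional[str] = None,
          allow_direct_llm: bool = True) -> str:
    """Return "rag", "direct", or "refuse" for an incoming question."""
    # Policy override: any course reference forces the RAG path,
    # whether it arrives as a field or inside the question text.
    if course_code or COURSE_CODE.search(question):
        return "rag"
    # General questions reach the local LLM only when policy allows it.
    # Refusing otherwise is an assumption of this sketch.
    return "direct" if allow_direct_llm else "refuse"
```

A function of this shape could serve as the condition behind a conditional edge in a graph-based agent, with each returned label mapped to the corresponding node.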

What I built

  • Agentic routing (LangGraph) with policy overrides (course-referenced questions → force RAG)
  • Course-aware retrieval using metadata filters (department/course/document)
  • Admin-only ingestion for uploading official PDFs + tagging + activation control
  • Citation-first answers (document title + chunk index) for auditability
  • Confidence-gated responses (high / medium / low / none) with refusal behavior when evidence is insufficient
  • Document governance: versioning + “active document” enforcement to prevent silent changes
  • Platform hardening: structured logs, request IDs, health checks, and admin audit logs
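
To make the retrieval and gating items above concrete, here is a hedged sketch of how course filtering, active-document enforcement, and confidence labels might fit together. The hit dictionary fields (`course_code`, `active`, `score`), the threshold values, and the assumption that `score` is a similarity where higher is better are all illustrative and not taken from the Canon codebase.

```python
from typing import Dict, List

# Hypothetical similarity cut-offs; real values would be tuned per model.
THRESHOLDS = [("high", 0.80), ("medium", 0.65), ("low", 0.50)]


def confidence_label(best_score: float) -> str:
    """Map the best similarity score to high / medium / low / none."""
    for label, minimum in THRESHOLDS:
        if best_score >= minimum:
            return label
    return "none"


def select_evidence(hits: List[Dict], course_code: str) -> List[Dict]:
    """Keep only hits from active documents of the requested course."""
    return [h for h in hits
            if h["course_code"] == course_code and h["active"]]


def gate(hits: List[Dict], course_code: str) -> Dict:
    """Decide whether there is enough evidence to answer."""
    evidence = select_evidence(hits, course_code)
    best = max((h["score"] for h in evidence), default=0.0)
    label = confidence_label(best)
    if label == "none":
        # Refuse rather than answer without grounding.
        return {"confidence": "none", "citations": [], "refused": True}
    citations = [{"document": h["document"], "chunk": h["chunk"]}
                 for h in evidence]
    return {"confidence": label, "citations": citations, "refused": False}
```

Filtering before scoring matters here: a highly similar chunk from another course or from a deactivated document version never contributes to the confidence label.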

Technical decisions & tradeoffs

  • Local inference (Ollama) vs external APIs

    • Chosen for privacy, cost control, and local-first deployment
    • Tradeoff: answer quality depends on the local model + hardware
  • FAISS for vector search

    • Fast and simple for local persistence
    • Tradeoff: scaling to multi-node needs extra coordination (future: managed vector DB)
  • Confidence-gated RAG + refusals

    • Prioritized academic integrity over “always answer”
    • Tradeoff: requires solid document coverage and sometimes follow-up questions
  • Strict admin boundaries

    • Admin endpoints protected (API key / role gating); students only access query endpoints
    • Tradeoff: more setup, but prevents untrusted ingestion changes
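
The admin boundary can be illustrated with a small framework-agnostic check. This is a sketch under stated assumptions: the `x-admin-key` header name, the `/admin/` path prefix, and the status-code convention are hypothetical, and in a FastAPI app the same logic would more likely live in a dependency or middleware.

```python
import hmac
from typing import Mapping

ADMIN_HEADER = "x-admin-key"   # hypothetical header name
ADMIN_PREFIX = "/admin/"       # hypothetical path prefix for admin routes


def is_admin(headers: Mapping[str, str], expected_key: str) -> bool:
    """Check the supplied key against the configured one."""
    supplied = headers.get(ADMIN_HEADER, "")
    # An unset key must never grant access; compare_digest runs in
    # constant time to avoid leaking key contents through timing.
    return bool(expected_key) and hmac.compare_digest(
        supplied.encode("utf-8"), expected_key.encode("utf-8"))


def authorize(path: str, headers: Mapping[str, str], expected_key: str) -> int:
    """Return an HTTP status: 200 if allowed, 403 if forbidden."""
    if path.startswith(ADMIN_PREFIX) and not is_admin(headers, expected_key):
        return 403
    return 200
```

Header keys are assumed to be lower-cased before the lookup, which is how most ASGI servers expose them.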

Observability & reliability

  • Request correlation via request IDs
  • Structured logs suitable for production debugging
  • Readiness/health checks for containerized deployment
  • Designed for failure visibility (retrieval confidence + routing trace)
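
As an illustration of request correlation and structured logs, the sketch below uses only the standard library. The field names (`request_id`, `route`, `confidence`) and the `JsonFormatter` class are assumptions for the example rather than Canon's actual log schema.

```python
import contextvars
import json
import logging
import uuid

# Holds the ID of the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
            # Optional fields supplied via `extra=` on the log call.
            "route": getattr(record, "route", None),
            "confidence": getattr(record, "confidence", None),
        }
        return json.dumps(payload)


def new_request_id() -> str:
    """Generate a request ID and bind it to the current context."""
    rid = uuid.uuid4().hex
    request_id_var.set(rid)
    return rid
```

A handler configured with `JsonFormatter()` would then emit the routing decision and confidence alongside the request ID, for example via `logger.info("answered", extra={"route": "rag", "confidence": "high"})`.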

Lessons learned

  • Academic “correctness” is mostly grounding + refusal policies, not just model capability.
  • Metadata discipline (course mapping, active docs, versioning) matters as much as embeddings.
  • Observability (routing + retrieval trace + confidence) makes RAG systems much easier to debug.

How to run (local)

docker compose up --build
docker compose exec ollama ollama pull llama3

Example usage

POST /ask
{
  "question": "Explain entropy as used in this course",
  "course_code": "CS5589"
}

Response:

{
  "answer": "...",
  "source": "rag:CS5589",
  "citations": [
    { "document": "Lecture 3 – Entropy", "chunk": 14 }
  ]
}
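
For completeness, a client call to the endpoint above might look like the following standard-library sketch. The base URL and port are assumptions (the source does not state where the API listens), and `build_ask_request` and `format_citations` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed address of the local API


def build_ask_request(question: str, course_code: str) -> urllib.request.Request:
    """Build the POST /ask request without sending it."""
    body = json.dumps({"question": question,
                       "course_code": course_code}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_citations(response: dict) -> list:
    """Turn the citations array into display strings."""
    return [f'{c["document"]} (chunk {c["chunk"]})'
            for c in response.get("citations", [])]
```

Passing the request to `urllib.request.urlopen` and decoding the body with `json.load` would yield a dictionary that `format_citations` can render.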