Project
Canon — Agentic RAG Academic Correctness & Citation-First AI Platform
Production-grade, course-aware academic Q&A backend using LangGraph routing + confidence-gated RAG with citations, local LLM inference, and admin-controlled document governance.

Problem statement
Typical “AI tutor” chatbots hallucinate, mix courses, and can’t justify answers. For university-level coursework, I needed a system that:
- Grounds answers strictly in official course documents
- Returns citations (document + chunk) for auditability
- Refuses when the material is insufficient
- Separates admin-controlled ingestion from student usage to protect academic integrity
Architecture overview
Canon is a FastAPI backend with an agentic decision layer (LangGraph) that routes queries through either:
- Course-aware RAG (preferred for academic correctness), or
- Direct local LLM (for general questions, when policy allows)
Retrieved chunks are filtered using course metadata and stored in a persistent FAISS index. Answers are generated by local LLM inference via Ollama, returning citations and a confidence label.
```
┌────────────┐
│  Student   │
└─────┬──────┘
      │ Question
      ▼
┌──────────────┐
│ FastAPI API  │
└─────┬────────┘
      ▼
┌─────────────────────────┐
│    Agent (LangGraph)    │
│ - Intent Classification │
│ - Policy Enforcement    │
└─────┬───────────┬───────┘
      │           │
      ▼           ▼
 Course RAG   Direct LLM
      │           │
      ▼           ▼
FAISS + PDFs  Ollama (Local)
```
What I built
- Agentic routing (LangGraph) with policy overrides (course-referenced questions → force RAG)
- Course-aware retrieval using metadata filters (department/course/document)
- Admin-only ingestion for uploading official PDFs + tagging + activation control
- Citation-first answers (document title + chunk index) for auditability
- Confidence-gated responses (high / medium / low / none) with refusal behavior when evidence is insufficient
- Document governance: versioning + “active document” enforcement to prevent silent changes
- Platform hardening: structured logs, request IDs, health checks, and admin audit logs
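The confidence-gating behavior can be shown as a minimal sketch. The threshold values and function shape are assumptions for illustration, not the production implementation:

```python
def gate(similarity_scores: list[float],
         high: float = 0.80, medium: float = 0.65, floor: float = 0.50):
    """Map the best retrieval score to a confidence label.

    Returns (label, should_answer); below the floor the system refuses
    rather than answering without evidence. Thresholds are illustrative.
    """
    top = max(similarity_scores, default=0.0)
    if top >= high:
        return "high", True
    if top >= medium:
        return "medium", True
    if top >= floor:
        return "low", True
    return "none", False  # refuse: evidence insufficient
```

An empty retrieval result therefore maps to `("none", False)`, which is what triggers the refusal behavior described above.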
Technical decisions & tradeoffs
Local inference (Ollama) vs external APIs
- Chosen for privacy, cost control, and local-first deployment
- Tradeoff: answer quality depends on the local model + hardware

FAISS for vector search
- Fast and simple for local persistence
- Tradeoff: scaling to multi-node needs extra coordination (future: managed vector DB)

Confidence-gated RAG + refusals
- Prioritized academic integrity over “always answer”
- Tradeoff: requires solid document coverage and sometimes follow-up questions

Strict admin boundaries
- Admin endpoints protected (API key / role gating); students only access query endpoints
- Tradeoff: more setup, but prevents untrusted ingestion changes
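In production the similarity search runs against a persistent FAISS index, but the filter-by-metadata-then-rank shape of course-aware retrieval can be sketched with the standard library alone. The `Chunk` fields and the brute-force cosine ranking are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    document: str
    index: int
    course: str
    active: bool          # "active document" governance flag
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_emb: list[float], chunks: list[Chunk], course: str, k: int = 3):
    """Keep only active chunks of the requested course, then rank by similarity."""
    pool = [c for c in chunks if c.course == course and c.active]
    scored = sorted(((cosine(query_emb, c.embedding), c) for c in pool),
                    key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The key point is that course filtering and active-document enforcement happen before ranking, so a superseded document version can never be cited.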
Observability & reliability
- Request correlation via request IDs
- Structured logs suitable for production debugging
- Readiness/health checks for containerized deployment
- Designed for failure visibility (retrieval confidence + routing trace)
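Request correlation can be sketched with stdlib `contextvars`. This is a minimal illustration (the actual middleware lives in FastAPI, and the `X-Request-ID` header name is an assumption):

```python
import contextvars
import logging
import uuid

# One value per request context; "-" when no request is in flight.
request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Stamp every log record with the current request ID."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

def accept_request(headers: dict[str, str]) -> str:
    """Reuse an incoming X-Request-ID header or mint a new one."""
    rid = headers.get("x-request-id") or uuid.uuid4().hex
    request_id_var.set(rid)
    return rid
```

With the filter attached to the app's log handlers, every structured log line carries the same ID as the HTTP response, which is what makes routing and retrieval traces debuggable per request.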
Lessons learned
- Academic “correctness” is mostly grounding + refusal policies, not just model capability.
- Metadata discipline (course mapping, active docs, versioning) matters as much as embeddings.
- Observability (routing + retrieval trace + confidence) makes RAG systems much easier to debug.
How to run (local)
```shell
docker compose up --build
docker compose exec ollama ollama pull llama3
```
Example usage
POST /ask

Request:
```json
{
  "question": "Explain entropy as used in this course",
  "course_code": "CS5589"
}
```

Response:
```json
{
  "answer": "...",
  "source": "rag:CS5589",
  "citations": [
    { "document": "Lecture 3 – Entropy", "chunk": 14 }
  ]
}
```
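The same call can be made from a small stdlib client. The host and port are assumptions (a default local uvicorn setup); only the `/ask` payload shape comes from the example above:

```python
import json
from urllib import request

API_URL = "http://localhost:8000/ask"  # assumed default host/port

def build_ask(question: str, course_code: str) -> request.Request:
    """Build the POST /ask request shown above."""
    body = json.dumps({"question": question, "course_code": course_code}).encode()
    return request.Request(API_URL, data=body, method="POST",
                           headers={"Content-Type": "application/json"})

def ask(question: str, course_code: str) -> dict:
    """Send the request and return the cited answer as a dict."""
    with request.urlopen(build_ask(question, course_code)) as resp:
        return json.load(resp)
```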