RAG System

Project 01 · 2025 · Harsh Yadav

Enterprise Document Intelligence.

Document Intelligence Platform

An end-to-end RAG pipeline that ingests PDF, DOCX, and TXT documents, chunks and embeds their text into ChromaDB, and uses LLMs to generate contextual summaries and answer questions. It reduced manual document review time by an estimated 65%.

Year: 2025
Role: AI/ML Engineer
Type: Multi-Agent RAG
Status: Production-Ready
System Overview · Architecture
FastAPI · Celery · LangGraph · ChromaDB · Docker
65% Manual review time reduced
Faster than sync baseline
30% Lower API failure rate
[01] Overview

An intelligent platform that reads, understands, and answers questions about your documents.

Built to solve the bottleneck of enterprise document overload. The system accepts PDF, DOCX, and TXT uploads, asynchronously processes them through Celery task workers backed by Redis, generates semantic embeddings, and stores them in ChromaDB and pgvector for sub-second retrieval.
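The chunking stage that runs before embedding can be pictured with a minimal sketch like the one below. The helper name `chunk_text`, the word-based windows, and the default sizes are all assumptions made for illustration; the project's actual chunking strategy is not described here.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping word windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()  # splitting on whitespace also collapses blank runs
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(piece)
```

Overlapping windows keep a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some duplicate storage.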

A multi-agent LangGraph workflow coordinates summarization, Q&A, and grounding agents. It is designed to reduce hallucination by running retrieval at each reasoning step.
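The idea of retrieving at every step can be sketched in plain Python. This is not the LangGraph API and not the project's code: the keyword-overlap `retrieve` function stands in for a vector search, the "answer" is simply the top chunk rather than an LLM completion, and every name below is hypothetical.

```python
from typing import Any, Dict, List

State = Dict[str, Any]


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank chunks by words shared with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(c.lower().split())), c) for c in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [chunk for score, chunk in scored[:k] if score > 0]


def qa_step(state: State, corpus: List[str]) -> State:
    """Retrieve for the question, then draft an answer from the context."""
    context = retrieve(state["question"], corpus)
    # Stand-in for an LLM call: echo the best chunk, or refuse.
    state["answer"] = context[0] if context else "not found in documents"
    return state


def grounding_step(state: State, corpus: List[str]) -> State:
    """Retrieve again, using the draft answer as the query, and verify it."""
    evidence = retrieve(state["answer"], corpus)
    state["grounded"] = state["answer"] in evidence
    return state


def run(question: str, corpus: List[str]) -> State:
    state: State = {"question": question}
    for step in (qa_step, grounding_step):  # each step does its own retrieval
        state = step(state, corpus)
    return state


if __name__ == "__main__":
    docs = ["Invoices are due within 30 days.", "The office is closed on Sundays."]
    print(run("when are invoices due", docs))
```

The second retrieval is what makes the check meaningful: the draft answer has to be supported by the index on its own, rather than by whatever context the first step happened to fetch.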

Python · FastAPI · Celery · Redis · LangGraph · ChromaDB · MongoDB · Docker
Pipeline Architecture · Data Flow Design
[ Document Upload (PDF / DOCX / TXT) ]
              │
              ▼
[ FastAPI Gateway ] ── JWT Auth ── Pydantic Validation
              │
    ┌─────────┴──────────┐
    ▼                    ▼
[ Celery Worker ]   [ Direct Sync ]
  (Async Tasks)
    │
    ├──► [ Text Chunking & Cleaning ]
    │
    └──► [ Embedding Generation ]
              │
              ▼
[ ChromaDB + pgvector ] ←── Semantic Index
              │
              ▼
[ LangGraph Multi-Agent ]
    ├── Summarization Agent
    ├── Q&A Grounding Agent
    └── Hallucination Reduction Layer
              │
              ▼
[ JSON Response → Client ]
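The gateway's validate-then-route branch might look something like the sketch below. A standard-library dataclass stands in for the Pydantic model so the example runs without dependencies. The size threshold and the rule "small files go sync, large files go to the task queue" are assumptions; the diagram shows the two paths but not how the gateway chooses between them.

```python
from dataclasses import dataclass
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}  # formats named in the diagram
SYNC_MAX_BYTES = 256 * 1024  # assumed cutoff, not taken from the project


@dataclass(frozen=True)
class Upload:
    """Hypothetical upload descriptor, validated on construction."""
    filename: str
    size_bytes: int

    def __post_init__(self) -> None:
        ext = PurePath(self.filename).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"unsupported file type: {ext or 'none'}")
        if self.size_bytes <= 0:
            raise ValueError("file is empty")


def choose_route(upload: Upload) -> str:
    """Return 'sync' for small files, 'async' (task queue) otherwise."""
    return "sync" if upload.size_bytes <= SYNC_MAX_BYTES else "async"


if __name__ == "__main__":
    print(choose_route(Upload("notes.txt", 1024)))
    print(choose_route(Upload("report.PDF", 5_000_000)))
```

Rejecting bad input before any route is chosen keeps invalid files out of the worker queue entirely.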