Retrieval Augmented Generation for Document Engineering

Gregory M. Kapfhammer

December 1, 2025

Understanding RAG

  • What is RAG?
    • Combining document retrieval with text generation
    • Finding relevant information to support answers
    • Building context-aware document systems
    • Enhancing responses with retrieved knowledge
  • What are this week’s highlights?
    • A “from scratch” implementation of basic RAG concepts:
      • Document ingestion and preprocessing
      • Text chunking and organization
      • Simple vector-like representations
      • Retrieval and context building
      • Response generation with context

Key insights for prosegrammers

  • RAG combines retrieval and generation to build intelligent document systems that provide meaningful, contextualized answers
  • Simple implementations using basic Python can demonstrate core RAG concepts without requiring use of complex libraries
  • Understanding RAG fundamentals helps prosegrammers design better document engineering tools and pipelines
  • You can leverage these insights to build a more full-featured system using packages like SentenceTransformers and FAISS!

Course learning objectives

Learning Objectives for Document Engineering

  • CS-104-1: Explain processes such as software installation or design for a variety of technical and non-technical audiences ranging from inexperienced to expert.
  • CS-104-2: Use professional-grade integrated development environments (IDEs), command-line tools, and version control systems to compose, edit, and deploy well-structured, web-ready documents with industry-standard documentation tools.
  • CS-104-3: Build automated publishing pipelines to format, check, and ensure both the uniformity and quality of digital documents.
  • CS-104-4: Identify and apply appropriate conventions of a variety of technical communities, tools, and computer languages to produce industry-consistent diagrams, summaries, and descriptions of technical topics or processes.
  • Content aids in attainment of learning objectives CS-104-3 and CS-104-4!

Document ingestion and preprocessing

  • Document ingestion loads text data into a processing pipeline
    • Read files from the filesystem
    • Parse different text formats
    • Extract raw content for analysis
    • Foundation of all RAG systems
  • Why ingest documents?
    • Build knowledge base for retrieval
    • Prepare content for searching
    • Enable context-aware responses
    • Support question-answering systems

Reading documents from files

  • Simple document loading from string content
  • Normalize text by removing extra whitespace
  • Key insight: RAG starts with loading documents into memory
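
As a minimal sketch of this step (the function and file names are illustrative assumptions, not the course's reference implementation), loading a document can be as simple as reading a file into a string:

```python
from pathlib import Path

def load_document(path: str) -> str:
    """Read a text file from the filesystem and return its raw content."""
    return Path(path).read_text(encoding="utf-8")

# Example call (assumes a plain-text file named notes.txt exists):
# raw_text = load_document("notes.txt")
```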

Data cleaning and preprocessing

  • Remove extra whitespace and normalize formatting
  • Prepare text for consistent processing
  • Cleaning ensures uniform document handling
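
A small sketch of the cleaning step, assuming whitespace normalization is all that is needed:

```python
import re

def clean_text(text: str) -> str:
    """Collapse runs of whitespace and trim the ends for uniform handling."""
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("  Retrieval   Augmented\n\nGeneration  "))
# -> "Retrieval Augmented Generation"
```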

Divide text into chunks

  • Text chunking divides documents into smaller, manageable pieces
    • Split long documents into segments
    • Create context-sized pieces for retrieval
    • Balance chunk size for completeness
    • Enable efficient searching and matching
  • Why chunk documents?
    • Large documents overwhelm processing
    • Smaller chunks match queries better
    • Control context window size
    • Improve retrieval precision
    • Trade off efficiency against how faithfully each chunk represents the document's relevant content!

Simple sentence-based chunking

  • Split documents into sentence-level chunks
  • Each chunk becomes a searchable unit
  • Smaller chunks enable precise retrieval
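
One possible sentence-splitting sketch (a simple regular expression stands in for a full sentence tokenizer):

```python
import re

def chunk_by_sentence(text: str) -> list[str]:
    """Split text on sentence-ending punctuation; each sentence is one chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s]

print(chunk_by_sentence("RAG retrieves chunks. It then generates answers."))
# -> ['RAG retrieves chunks.', 'It then generates answers.']
```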

Fixed-size word chunking

  • Control chunk size by word count
  • Useful for consistent context windows
  • Trade-off between completeness and granularity
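
A hedged sketch of word-count chunking; the default chunk size of 50 words is an arbitrary choice:

```python
def chunk_by_words(text: str, size: int = 50) -> list[str]:
    """Group words into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

print(chunk_by_words("one two three four five six seven", size=3))
# -> ['one two three', 'four five six', 'seven']
```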

Vector-like entities

  • Vector representations encode text as numerical values
    • Transform text into comparable format
    • Enable similarity calculations
    • Foundation for semantic search
    • Real systems use embeddings from models
  • Simple representation approach
    • Word frequency as feature vector
    • Shared vocabulary across chunks
    • Basic similarity through overlap
    • Demonstrates core concept simply
  • In practice, use SentenceTransformers or a cloud-based LLM provider's embedding API!

Tools for vector embeddings

  • Popular embedding tools and packages:
    • SentenceTransformers: Pre-trained models for semantic embeddings
    • OpenAI Embeddings API: Cloud-based embedding generation
    • Hugging Face Transformers: Open-source embedding models
    • FAISS: Efficient similarity search for vectors
    • ChromaDB: Vector database for RAG systems
    • Pinecone: Managed vector database service
    • Qdrant: Vector search and storage solutions
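
For instance, assuming the sentence-transformers package is installed, a small embedding-and-similarity sketch might look like this (the model name is a commonly used default, not a course requirement):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # small, widely used model
chunks = ["RAG retrieves relevant chunks.", "Chunking splits documents."]
chunk_vectors = model.encode(chunks)              # one dense vector per chunk

query_vector = model.encode("How does retrieval work?")
print(util.cos_sim(query_vector, chunk_vectors))  # cosine similarity to each chunk
```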

Simple word frequency vectors

  • Treat the text as a word-frequency dictionary in which each word becomes a “feature dimension”
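
A minimal sketch of this representation using only the standard library (the function name is illustrative):

```python
from collections import Counter

def frequency_vector(text: str) -> dict[str, int]:
    """Map each lowercased word to its count; the words act as feature dimensions."""
    return dict(Counter(text.lower().split()))

print(frequency_vector("retrieval finds relevant chunks for retrieval"))
# -> {'retrieval': 2, 'finds': 1, 'relevant': 1, 'chunks': 1, 'for': 1}
```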

Computing similarity between chunks

  • Calculate overlap between word sets
  • Higher overlap means higher relevance
  • Basis for retrieval ranking
  • Sophisticated systems use cosine similarity or other metrics!
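
One simple way to score overlap is the Jaccard similarity of the two word sets, sketched here under the assumption that words are compared after lowercasing:

```python
def overlap_similarity(a: str, b: str) -> float:
    """Fraction of shared words between two texts (Jaccard similarity)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

print(overlap_similarity("chunking splits documents", "chunking splits long documents"))
# -> 0.75
```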

Relevant documents

  • Retrieval finds most relevant chunks for a query
    • Compare query against all chunks
    • Rank by similarity score
    • Select top matches
    • Core of RAG systems
  • Why retrieve documents?
    • Provide relevant context for answers
    • Find supporting information
    • Build knowledge-grounded responses
    • Enable question answering
    • Offer input to a local or cloud-based LLM

Building a simple retriever

  • Score all chunks against query
  • Sort by relevance score …
  • … and return the top matches! For this query, did the system pick the correct chunks?
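
A possible retriever sketch that builds on the overlap score above (the names and sample chunks are illustrative):

```python
def overlap_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two texts, as sketched earlier."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa and wb else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query, sort, and return the top_k matches."""
    scored = [(overlap_similarity(query, chunk), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

chunks = [
    "Chunking splits documents into smaller pieces.",
    "Retrieval ranks chunks by similarity to the query.",
    "Generation produces an answer from retrieved context.",
]
print(retrieve("how does retrieval rank chunks", chunks, top_k=2))
```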

Understanding relevance scores

  • Show explicit word matches
  • Explain why a chunk is relevant
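
A tiny sketch that surfaces the shared words behind a relevance score:

```python
def explain_match(query: str, chunk: str) -> set[str]:
    """Return the words shared by the query and a chunk to show why it was retrieved."""
    return set(query.lower().split()) & set(chunk.lower().split())

print(explain_match("how does retrieval work", "retrieval ranks chunks by similarity"))
# -> {'retrieval'}
```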

Combining retrieved context with queries

  • Context combination merges query with retrieved information
    • Build context from top chunks
    • Format for response generation
    • Maintain source attribution
    • Create comprehensive knowledge base
  • Why combine context?
    • Provide evidence for answers
    • Support factual responses
    • Enable source citation

Building context from retrieval

  • Format query with retrieved chunks and create structured context
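
A sketch of one possible context format (the exact layout is an assumption, not a fixed standard):

```python
def build_context(query: str, top_chunks: list[str]) -> str:
    """Combine the query with retrieved chunks into one structured string."""
    evidence = "\n".join(f"- {chunk}" for chunk in top_chunks)
    return f"Question: {query}\nContext:\n{evidence}"

print(build_context("How does retrieval work?",
                    ["Retrieval ranks chunks by similarity.",
                     "Top-ranked chunks become the answer's context."]))
```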

Formatting context for responses

  • Add source tracking to context and include relevance scores
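
Extending the idea above, a hedged sketch that carries source names and relevance scores along with each chunk:

```python
def format_context(results: list[tuple[float, str, str]]) -> str:
    """Format (score, source, chunk) triples with attribution and relevance scores."""
    return "\n".join(f"[{source} | score={score:.2f}] {chunk}"
                     for score, source, chunk in results)

print(format_context([(0.42, "guide.txt", "Retrieval ranks chunks by similarity."),
                      (0.18, "notes.txt", "Chunking splits documents into pieces.")]))
```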

Context-driven data

  • Response generation creates answers using retrieved context
    • Extract relevant information
    • Synthesize coherent responses
    • Maintain factual grounding
    • In practice, uses (large) language models
  • Simple generation approach
    • Template-based responses
    • Direct information extraction
    • Demonstrates concept flow
    • Real systems use LLMs for flexibility

Tools for response generation

  • Language models for generation:
    • OpenAI GPT: Cloud-based LLM for text generation
    • Anthropic Claude: Conversational AI with long context
    • Google Gemini: Multimodal generation capabilities
    • Hugging Face Models: Open-source LLMs like Llama
    • LangChain: Framework for building RAG applications
    • LlamaIndex: Data framework for LLM applications
  • For this course: We use template-based generation to demonstrate the concept without requiring external APIs or packages. This approach generates an interesting result, yet it is not general-purpose enough for a production RAG tool.

Simple template-based generation

  • Create response from top retrieved chunk
  • Simple template wraps information
  • Demonstrates basic generation concept
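
A minimal template-based sketch; the wording of the template is an arbitrary choice:

```python
def generate_response(query: str, top_chunk: str) -> str:
    """Wrap the single best chunk in a fixed template to form an answer."""
    return f"Based on the most relevant passage, '{query}' is answered by: {top_chunk}"

print(generate_response("How does retrieval work?",
                        "Retrieval ranks chunks by similarity to the query."))
```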

Better response with multiple sources

  • Synthesize information from multiple chunks and present a full answer
  • Better responses use more context; can you explain why these chunks match the query?
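
One way to sketch multi-source synthesis, again with illustrative names and template wording:

```python
def generate_from_sources(query: str, top_chunks: list[str]) -> str:
    """Concatenate several retrieved chunks into a fuller templated answer."""
    evidence = " ".join(top_chunks)
    return f"Answer to '{query}', drawing on {len(top_chunks)} passages: {evidence}"

print(generate_from_sources("What is chunking?",
                            ["Chunking splits documents into smaller pieces.",
                             "Smaller chunks make retrieval more precise."]))
```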

Complete RAG pipeline demonstration

  • End-to-end RAG system combines all components
    • Ingest and preprocess documents
    • Chunk text into searchable units
    • Create vector representations
    • Retrieve relevant chunks
    • Generate context-grounded responses
  • Real-world applications
    • Question answering systems
    • Document search assistants

Complete RAG system

  • Integrate all RAG components
  • Process multiple documents
  • Return context-aware response
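
Putting the pieces together, here is a compact end-to-end sketch in basic Python (the function names, sample documents, and templated answer are all illustrative assumptions):

```python
import re

def chunk(text: str) -> list[str]:
    """Normalize whitespace and split a document into sentence-level chunks."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]

def score(query: str, chunk_text: str) -> float:
    """Jaccard overlap between the word sets of the query and a chunk."""
    qa, cb = set(query.lower().split()), set(chunk_text.lower().split())
    return len(qa & cb) / len(qa | cb) if qa and cb else 0.0

def rag_answer(query: str, documents: list[str], top_k: int = 2) -> str:
    """Ingest, chunk, retrieve, and generate a context-grounded templated answer."""
    chunks = [c for doc in documents for c in chunk(doc)]
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]
    return f"Answer to '{query}': " + " ".join(ranked)

docs = ["Retrieval finds relevant chunks. Chunking splits documents into pieces.",
        "Generation produces answers grounded in retrieved context."]
print(rag_answer("How does retrieval find relevant chunks?", docs))
```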

RAG system with source attribution
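
  • Track which document each retrieved chunk came from and cite it in the answer

A hedged sketch of attribution (the dictionary of sample documents and the citation format are illustrative assumptions):

```python
def rag_answer_with_sources(query: str, documents: dict[str, str], top_k: int = 2) -> str:
    """Retrieve chunks, keep their source names, and cite each source in the answer."""
    def score(q: str, c: str) -> float:
        qa, cb = set(q.lower().split()), set(c.lower().split())
        return len(qa & cb) / len(qa | cb) if qa and cb else 0.0

    chunks = [(name, sentence)
              for name, text in documents.items()
              for sentence in text.split(". ") if sentence]
    ranked = sorted(chunks, key=lambda pair: score(query, pair[1]), reverse=True)[:top_k]
    evidence = " ".join(f"{sentence} [source: {name}]" for name, sentence in ranked)
    return f"Answer to '{query}': {evidence}"

docs = {"guide.txt": "Retrieval ranks chunks by similarity. Chunking splits documents.",
        "notes.txt": "Generation produces answers from retrieved context."}
print(rag_answer_with_sources("How does retrieval rank chunks?", docs))
```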

Enhancing RAG

  • Improving RAG systems:
    • Better chunking strategies:
      • Semantic chunking by topic
      • Overlapping chunks for context
      • Adaptive chunk sizes
    • Enhanced retrieval methods:
      • Advanced similarity metrics
      • Hybrid search combining keywords and vectors
      • Re-ranking for better results
    • Context optimization:
      • Chunk selection strategies
      • Context window management
      • Prompt engineering for generation
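
As one concrete example of the overlapping-chunk strategy listed above, a minimal sketch (the size and overlap values are arbitrary):

```python
def overlapping_chunks(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Create word chunks that share `overlap` words with the previous chunk."""
    words = text.split()
    step = max(size - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

print(overlapping_chunks("one two three four five six seven eight", size=4, overlap=2))
# -> ['one two three four', 'three four five six', 'five six seven eight', 'seven eight']
```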

Key takeaways for prosegrammers

  • Understand RAG components
    • Document ingestion prepares knowledge base
    • Chunking creates retrievable units
    • Vector representations enable similarity search
    • Retrieval finds relevant context
    • Generation produces grounded responses
  • Master retrieval concepts
    • Similarity scoring ranks relevance
    • Top-k selection balances context and precision
    • Source attribution maintains transparency
    • Retrieved context grounds generated answers
  • What are some practical ways in which you could integrate RAG into your document engineering tool? How will you extend the starting implementation presented this week? Make sure to listen to SE Radio Episode 690: Kacper Łukawski on Qdrant Vector Database!