Few-Shot Prompting, RAG and Agents

 

Summary

In this post, I explore my Kaggle notebook “Few‑Shot Prompting, RAG and Agents,” which demonstrates how to build a context‑aware conversational agent by combining three advanced AI techniques: few‑shot prompting to guide the LLM with in‑prompt examples, Retrieval‑Augmented Generation (RAG) to ground outputs in externally retrieved documents, and agent orchestration via LangGraph to manage a multi‑step workflow. The notebook ingests domain texts, embeds chunks with Google’s embeddings API, indexes them in a FAISS vector store, constructs dynamic prompts using LangChain’s ChatPromptTemplate and few‑shot templates, and finally wires everything together into a runnable graph that can decide when to retrieve context and when to generate answers.


1. Few‑Shot Prompting

Few‑shot prompting involves providing a handful of example input–output pairs directly in the prompt to steer the model’s behavior on new queries. By conditioning the LLM with 3–7 demonstrations, you can dramatically improve performance on specialized tasks without fine‑tuning. In the notebook, I curate exemplar Q&A exchanges that illustrate the desired style and structure, then merge them at inference time using LangChain’s few‑shot prompt templates. This approach sits between zero‑shot prompting—where no examples are given—and traditional fine‑tuning, offering a cost‑effective path to high‑quality outputs.
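Stripped of any framework, the idea is just demonstrations prepended to the new query. Here is a minimal sketch in plain Python, with made‑up example pairs standing in for the notebook's curated exchanges:

```python
# Hypothetical demonstration pairs; the notebook curates its own Q&A exchanges.
EXAMPLES = [
    {"question": "What is RAG?",
     "answer": "RAG retrieves relevant documents and adds them to the prompt before generation."},
    {"question": "What is few-shot prompting?",
     "answer": "Few-shot prompting steers a model with example input-output pairs in the prompt."},
]

def build_few_shot_prompt(query: str) -> str:
    """Prepend the demonstration Q&A pairs to the new query."""
    demos = "\n\n".join(f"Q: {ex['question']}\nA: {ex['answer']}" for ex in EXAMPLES)
    return f"{demos}\n\nQ: {query}\nA:"

prompt = build_few_shot_prompt("What is LangGraph?")
```

The model sees the demonstrations first and continues the pattern, which is why the style of the examples carries over to the answer.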

2. Retrieval‑Augmented Generation (RAG)

RAG enhances generation by first retrieving relevant passages from a document corpus and then augmenting the prompt with those passages before invoking the LLM. This grounds responses in factual context, reduces hallucinations, and lets the model draw on corpora far larger than its context window. In my notebook, I use Google’s generative embeddings to vectorize text chunks and index them in a FAISS store for efficient similarity search. At query time, the top‑k passages are concatenated into the prompt, ensuring that the model has access to the most pertinent information.

2.1. Semantic Embeddings Generation

Text chunks are embedded using Google’s generative AI embeddings (GoogleGenerativeAIEmbeddings) to capture semantic meaning in high‑dimensional vectors.
These embeddings enable efficient retrieval of semantically similar documents based on user queries.
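The retrieval mechanics can be illustrated without the embeddings API at all: rank documents by cosine similarity between vectors. The toy 3‑dimensional vectors below are made up for illustration; real embeddings from GoogleGenerativeAIEmbeddings have hundreds of dimensions.

```python
import math

# Toy "embeddings" (values invented for illustration only).
DOC_VECTORS = {
    "RAG grounds answers in retrieved passages.": [0.9, 0.1, 0.0],
    "FAISS indexes dense vectors for similarity search.": [0.1, 0.9, 0.0],
    "The weather in London is often rainy.": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, k=1):
    """Rank documents by cosine similarity to the query embedding."""
    ranked = sorted(DOC_VECTORS, key=lambda d: cosine(query_vec, DOC_VECTORS[d]), reverse=True)
    return ranked[:k]

# A query embedding close to the "RAG" document's vector:
top = retrieve([0.8, 0.2, 0.0])
```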

2.2. Vector Store & Similarity Search

Embeddings are indexed in a FAISS vector store for nearest‑neighbor search, enabling fast retrieval of the top‑k relevant passages.
FAISS, developed by Meta AI, is highly optimized for large‑scale similarity search and clustering of dense vectors.

2.3. Prompt Templating with LangChain

The notebook uses ChatPromptTemplate from LangChain to define flexible chat prompts, inserting context and user questions into a templated system message.
This abstraction makes it easy to maintain and update prompt structure across experiments.

2.4. Few‑Shot Examples via FewShotChatMessagePromptTemplate

Examples are wrapped in a FewShotChatMessagePromptTemplate, which formats and optionally selects a fixed list of demonstration exchanges to include in the prompt.
This utility helps inject structured few‑shot examples seamlessly alongside system and user messages.

2.5. Declarative Chain Composition (LCEL)

LangChain’s Expression Language (LCEL) allows composing the retrieval, prompting, and LLM invocation steps into a concise pipeline using the | operator.
This declarative style improves readability and modularity of the RAG workflow.

2.6. Graph Orchestration with LangGraph

The notebook constructs a LangGraph with nodes for input wrapping (user_input), RAG processing (rag), and response extraction (generation), then compiles it into a runnable graph.
LangGraph provides a stateful, visualizable execution plan for complex LLM pipelines.

2.7. Integration with Google Gemini LLM

Responses are generated by the ChatGoogleGenerativeAI client, which implements LangChain’s Runnable Interface and invokes Google’s Gemini series of chat models via .invoke().
This integration showcases using cutting‑edge commercial LLMs within an open‑source orchestration framework.

2.8. Grounding & Fallback Behavior

For queries outside the available context (e.g., “What is the weather in London?”), the agent gracefully admits lack of knowledge, demonstrating robust fallback logic inherent to the RAG‑augmented prompt design.
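This fallback comes from the grounding instruction in the system prompt rather than any special code path. A sketch of the instruction pattern (the notebook's exact wording may differ):

```python
# Grounding instruction that yields the "I don't know" fallback behavior.
SYSTEM_TEMPLATE = (
    "Answer ONLY from the context below. "
    "If the context does not contain the answer, reply \"I don't know.\"\n\n"
    "Context:\n{context}"
)

def grounded_system_message(context: str) -> str:
    return SYSTEM_TEMPLATE.format(context=context)

msg = grounded_system_message("FAISS indexes dense vectors.")
```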

3. Agent Orchestration with LangGraph

Instead of the older initialize_agent API, I leverage LangGraph to define an explicit graph of nodes representing each step: wrapping user input, running the RAG chain, and extracting the final response. LangGraph models workflows as directed graphs where each node transforms the shared state, offering clear visualization, branching, and state management. This makes it straightforward to debug, extend with additional steps (e.g., tool calls or memory), and deploy in production.

3.1. Define the Graph Structure

LangGraph represents workflows as directed graphs, where each node corresponds to a specific operation or decision point.

  • Input Processing: Handles user queries and prepares them for retrieval.

  • Retrieval: Uses a retriever (e.g., FAISS) to fetch relevant documents based on the processed query.

  • Generation: Employs a language model to generate responses using the retrieved documents as context.

  • Decision Making: Determines whether additional information is needed or if the response is sufficient.

3.2. Implement Nodes as Functions

Each node in the graph is implemented as a function that performs a specific task. For example:

  • Input Processing Node: Cleans and formats the user query.

  • Retrieval Node: Executes a similarity search using the retriever to find relevant documents.

  • Generation Node: Constructs a prompt with the retrieved documents and invokes the language model to generate a response.

  • Decision Node: Analyzes the generated response to decide if further retrieval or clarification is necessary.

3.3. Manage State Across Nodes

LangGraph maintains a state object that is passed between nodes, allowing for the sharing of information such as the user query, retrieved documents, and generated responses. This state management is crucial for maintaining context and enabling complex decision-making processes within the agent.

3.4. Compile and Execute the Graph

Once all nodes are defined and connected, the graph is compiled into an executable agent. This agent can then process user queries, traverse the defined workflow, and produce contextually relevant responses.

3.5. Visualize and Debug

LangGraph provides tools for visualizing the graph structure and debugging the workflow. This visualization aids in understanding the flow of data and decisions within the agent, making it easier to identify and resolve issues.

Integrations and Tools

  • GoogleGenerativeAIEmbeddings: Used to generate semantic embeddings for text chunks, capturing nuanced meaning in high-dimensional vectors.

  • FAISS: The Meta AI similarity‑search library, providing fast approximate nearest‑neighbor retrieval over millions of vectors.

  • LangChain Prompt Templates: ChatPromptTemplate constructs flexible chat prompts, while FewShotChatMessagePromptTemplate wraps few‑shot examples into the prompt.

  • ChatGoogleGenerativeAI: The LangChain client for Google’s Gemini LLMs, invoked via the Runnable Interface (.invoke()) to produce grounded responses.

  • LangGraph: Orchestrates the end‑to‑end flow, from input parsing to retrieval to generation, as a compiled graph.

Conclusion

My “Few‑Shot Prompting, RAG and Agents” notebook on Kaggle offers an end‑to‑end blueprint for building AI agents that are both grounded in external knowledge and guided by in‑context demonstrations. By combining few‑shot examples, semantic retrieval, and graph‑based orchestration, developers can rapidly prototype robust, domain‑specialized chatbots and tools—without the overhead of fine‑tuning large models. Experiment with your own documents, prompt designs, and graph structures to unlock new possibilities in conversational AI.

You can explore the full implementation and details in my Kaggle notebook: Few-Shot Prompting, RAG, and Agents.
