
In this tutorial, we demonstrate how to build a powerful and intelligent question-answering system by combining the strengths of the Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain framework. The pipeline combines real-time web search, semantic document caching via the Chroma vector store, and contextually relevant response generation with the Gemini model. These tools are integrated through LangChain's modular components, such as RunnableLambda, ChatPromptTemplate, ConversationBufferMemory, and GoogleGenerativeAIEmbeddings. It goes beyond simple question answering by introducing a hybrid retrieval mechanism that checks for cached embeddings before invoking fresh web searches. The retrieved documents are intelligently formatted, summarized, and passed through a structured LLM prompt, with attention to source attribution, user history, and confidence scoring. Key features such as advanced prompt engineering, sentiment and entity analysis, and dynamic vector store updates make this pipeline suitable for use cases like research assistance, domain-specific summarization, and intelligent agents.
!pip install -qU langchain-community tavily-python langchain-google-genai streamlit matplotlib pandas tiktoken chromadb langchain_core pydantic langchain
We install and upgrade a comprehensive set of libraries required to build an advanced AI search assistant. This includes tools for retrieval (tavily-python, chromadb), LLM integration (langchain-google-genai, langchain), data handling (pandas, pydantic), visualization (matplotlib, streamlit), and tokenization (tiktoken). These components form the core foundation for constructing a real-time, context-aware QA system.
import os
import getpass
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import json
import time
from typing import List, Dict, Any, Optional
from datetime import datetime
We import the essential Python libraries used throughout the notebook. This includes standard libraries for environment variables, secure input, time tracking, and data types (os, getpass, time, typing, datetime). Additionally, it brings in core data science tools like pandas, matplotlib, and numpy for data handling, visualization, and numerical computation, as well as json for parsing structured data.
if "TAVILY_API_KEY" not in os.environ:
os.environ["TAVILY_API_KEY"] = getpass.getpass("Enter Tavily API key: ")
if "GOOGLE_API_KEY" not in os.environ:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter Google API key: ")
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)
We securely initialize the API keys for Tavily and Google Gemini, prompting users only if the keys are not already set in the environment, ensuring safe and repeatable access to external services. It also configures a standardized logging setup using Python's logging module, which helps monitor the execution flow and capture debug or error messages throughout the notebook.
from langchain_community.retrievers import TavilySearchAPIRetriever
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.memory import ConversationBufferMemory
We import key components from the LangChain ecosystem and its integrations. It brings in the TavilySearchAPIRetriever for real-time web search, Chroma for vector storage, and the GoogleGenerativeAI modules for chat and embedding models. Core LangChain modules like ChatPromptTemplate, RunnableLambda, ConversationBufferMemory, and the output parsers enable flexible prompt construction, memory handling, and pipeline execution.
class SearchQueryError(Exception):
    """Exception raised for errors in the search query."""
    pass


def format_docs(docs):
    formatted_content = []
    for i, doc in enumerate(docs):
        metadata = doc.metadata
        source = metadata.get('source', 'Unknown source')
        title = metadata.get('title', 'Untitled')
        score = metadata.get('score', 0)
        formatted_content.append(
            f"Document {i+1} [Score: {score:.2f}]:\n"
            f"Title: {title}\n"
            f"Source: {source}\n"
            f"Content: {doc.page_content}\n"
        )
    return "\n\n".join(formatted_content)
We define two essential components for search and document handling. The SearchQueryError class creates a custom exception to manage invalid or failed search queries. The format_docs function processes a list of retrieved documents by extracting metadata such as the title, source, and relevance score and formatting them into a clean, readable string.
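As a quick illustration (a minimal sketch using made-up documents and metadata, not output from a real search), format_docs turns a list of Document objects into the numbered, score-annotated block that is later injected into the prompt:
# Hypothetical documents used only to illustrate the formatting helper.
sample_docs = [
    Document(
        page_content="Breath of the Wild launched alongside the Nintendo Switch in March 2017.",
        metadata={"source": "https://example.com/botw", "title": "BotW overview", "score": 0.91},
    ),
    Document(
        page_content="The game received widespread critical acclaim.",
        metadata={"source": "https://example.com/reviews", "title": "Critical reception", "score": 0.84},
    ),
]
print(format_docs(sample_docs))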
class SearchResultsParser:
    def parse(self, text):
        try:
            if isinstance(text, str):
                import re
                import json
                json_match = re.search(r'\{.*\}', text, re.DOTALL)
                if json_match:
                    json_str = json_match.group(0)
                    return json.loads(json_str)
                return {"answer": text, "sources": [], "confidence": 0.5}
            elif hasattr(text, 'content'):
                return {"answer": text.content, "sources": [], "confidence": 0.5}
            else:
                return {"answer": str(text), "sources": [], "confidence": 0.5}
        except Exception as e:
            logger.warning(f"Failed to parse JSON: {e}")
            return {"answer": str(text), "sources": [], "confidence": 0.5}
The SearchResultsParser class provides a robust method for extracting structured information from LLM responses. It attempts to parse a JSON-like string from the model output and falls back to a plain-text response format if parsing fails. It gracefully handles both string outputs and message objects, ensuring consistent downstream processing. In case of errors, it logs a warning and returns a fallback response containing the raw answer, empty sources, and a default confidence score, enhancing the system's fault tolerance.
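For example (a small hypothetical check against the parser defined above), a JSON-formatted reply and a plain-text reply are both normalized into the same dictionary shape:
parser_demo = SearchResultsParser()
# A JSON-style reply is parsed into a structured dict.
print(parser_demo.parse('{"answer": "It was released in 2017.", "sources": ["doc 1"], "confidence": 0.9}'))
# Plain text falls back to the default structure with confidence 0.5.
print(parser_demo.parse("It was released in 2017."))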
class EnhancedTavilyRetriever:
    def __init__(self, api_key=None, max_results=5, search_depth="advanced", include_domains=None, exclude_domains=None):
        self.api_key = api_key
        self.max_results = max_results
        self.search_depth = search_depth
        self.include_domains = include_domains or []
        self.exclude_domains = exclude_domains or []
        self.retriever = self._create_retriever()
        self.previous_searches = []

    def _create_retriever(self):
        try:
            return TavilySearchAPIRetriever(
                api_key=self.api_key,
                k=self.max_results,
                search_depth=self.search_depth,
                include_domains=self.include_domains,
                exclude_domains=self.exclude_domains
            )
        except Exception as e:
            logger.error(f"Failed to create Tavily retriever: {e}")
            raise

    def invoke(self, query, **kwargs):
        if not query or not query.strip():
            raise SearchQueryError("Empty search query")
        try:
            start_time = time.time()
            results = self.retriever.invoke(query, **kwargs)
            end_time = time.time()
            search_record = {
                "timestamp": datetime.now().isoformat(),
                "query": query,
                "num_results": len(results),
                "response_time": end_time - start_time
            }
            self.previous_searches.append(search_record)
            return results
        except Exception as e:
            logger.error(f"Search failed: {e}")
            raise SearchQueryError(f"Failed to perform search: {str(e)}")

    def get_search_history(self):
        return self.previous_searches
The EnhancedTavilyRetriever class is a custom wrapper around the TavilySearchAPIRetriever that adds flexibility, control, and traceability to search operations. It supports advanced features such as search depth, domain inclusion/exclusion filters, and a configurable number of results. The invoke method performs the web search and tracks each query's metadata (timestamp, response time, and result count), storing it for later analysis.
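As a brief usage sketch (the query is hypothetical, and a valid TAVILY_API_KEY must already be set in the environment), each call is recorded so latency and result counts can be inspected afterwards:
demo_retriever = EnhancedTavilyRetriever(max_results=3, search_depth="basic")
demo_docs = demo_retriever.invoke("retrieval augmented generation overview")  # hypothetical query
print(len(demo_docs), "documents retrieved")
for record in demo_retriever.get_search_history():
    print(f"{record['query']} -> {record['num_results']} results in {record['response_time']:.2f}s")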
class SearchCache:
    def __init__(self):
        self.embedding_function = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
        self.vector_store = None
        self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

    def add_documents(self, documents):
        if not documents:
            return
        try:
            if self.vector_store is None:
                self.vector_store = Chroma.from_documents(
                    documents=documents,
                    embedding=self.embedding_function
                )
            else:
                self.vector_store.add_documents(documents)
        except Exception as e:
            logger.error(f"Failed to add documents to cache: {e}")

    def search(self, query, k=3):
        if self.vector_store is None:
            return []
        try:
            return self.vector_store.similarity_search(query, k=k)
        except Exception as e:
            logger.error(f"Vector search failed: {e}")
            return []
The SearchCache class implements a semantic caching layer that stores and retrieves documents using vector embeddings for efficient similarity search. It uses GoogleGenerativeAIEmbeddings to convert documents into dense vectors and stores them in a Chroma vector database. The add_documents method initializes or updates the vector store, while the search method enables fast retrieval of the most relevant cached documents based on semantic similarity. This reduces redundant API calls and improves response times for repeated or related queries, serving as a lightweight hybrid memory layer in the AI assistant pipeline.
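A minimal sketch of the cache in isolation (the document content is hypothetical, and the embedding call requires a valid GOOGLE_API_KEY):
demo_cache = SearchCache()
demo_cache.add_documents([
    Document(
        page_content="Chroma stores dense embeddings and supports fast similarity search.",
        metadata={"source": "https://example.com/chroma", "title": "Chroma notes", "score": 0.9},
    )
])
hits = demo_cache.search("how does Chroma handle similarity search?", k=1)
print(len(hits), "cached document(s) matched")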
search_cache = SearchCache()
enhanced_retriever = EnhancedTavilyRetriever(max_results=5)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
system_template = """You are a research assistant that provides accurate answers based on the search results provided.
Follow these guidelines:
1. Only use the context provided to answer the question
2. If the context doesn't contain the answer, say "I don't have sufficient information to answer this question."
3. Cite your sources by referencing the document numbers
4. Don't make up information
5. Keep the answer concise but complete
Context: {context}
Chat History: {chat_history}
"""
system_message = SystemMessagePromptTemplate.from_template(system_template)
human_template = "Question: {question}"
human_message = HumanMessagePromptTemplate.from_template(human_template)
prompt = ChatPromptTemplate.from_messages([system_message, human_message])
We initialize the core components of the AI assistant: a semantic SearchCache, the EnhancedTavilyRetriever for web-based querying, and a ConversationBufferMemory to retain chat history across turns. It also defines a structured prompt using ChatPromptTemplate, guiding the LLM to act as a research assistant. The prompt enforces strict rules for factual accuracy, context usage, source citation, and concise answering, ensuring reliable and grounded responses.
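To see exactly what the LLM will receive, the assembled prompt can be rendered with placeholder values (a sketch with dummy inputs, not real search output):
rendered = prompt.format_messages(
    context="Document 1 [Score: 0.90]:\nTitle: Example\nSource: https://example.com\nContent: ...",  # dummy context
    chat_history=[],  # empty history for illustration
    question="When was Breath of the Wild released?",
)
for message in rendered:
    print(f"[{message.type}] {message.content[:80]}")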
def get_llm(model_name="gemini-2.0-flash-lite", temperature=0.2, response_mode="json"):
    try:
        return ChatGoogleGenerativeAI(
            model=model_name,
            temperature=temperature,
            convert_system_message_to_human=True,
            top_p=0.95,
            top_k=40,
            max_output_tokens=2048
        )
    except Exception as e:
        logger.error(f"Failed to initialize LLM: {e}")
        raise
output_parser = SearchResultsParser()
We define the get_llm function, which initializes a Google Gemini language model with configurable parameters such as the model name, temperature, and decoding settings (e.g., top_p, top_k, and max tokens). It ensures robustness by handling errors during model initialization. An instance of SearchResultsParser is also created to standardize and structure the LLM's raw responses, enabling consistent downstream processing of answers and metadata.
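As a quick standalone check (a sketch assuming the Google API key is set; the question is purely illustrative), the helper can be paired with the parser to normalize a single response:
demo_llm = get_llm(temperature=0)
raw_reply = demo_llm.invoke("In one sentence, what is retrieval-augmented generation?")
# The parser wraps the message content in the standard answer/sources/confidence structure.
print(output_parser.parse(raw_reply)["answer"])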
def plot_search_metrics(search_history):
    if not search_history:
        print("No search history available")
        return
    df = pd.DataFrame(search_history)
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.plot(range(len(df)), df['response_time'], marker="o")
    plt.title('Search Response Times')
    plt.xlabel('Search Index')
    plt.ylabel('Time (seconds)')
    plt.grid(True)
    plt.subplot(1, 2, 2)
    plt.bar(range(len(df)), df['num_results'])
    plt.title('Number of Results per Search')
    plt.xlabel('Search Index')
    plt.ylabel('Number of Results')
    plt.grid(True)
    plt.tight_layout()
    plt.show()
The plot_search_metrics function visualizes performance trends from past queries using Matplotlib. It converts the search history into a DataFrame and plots two subplots: one showing the response time per search and the other displaying the number of results returned. This helps analyze the system's efficiency and search quality over time, aiding developers in fine-tuning the retriever or identifying bottlenecks in real-world use.
def retrieve_with_fallback(query):
    cached_results = search_cache.search(query)
    if cached_results:
        logger.info(f"Retrieved {len(cached_results)} documents from cache")
        return cached_results
    logger.info("No cache hit, performing web search")
    search_results = enhanced_retriever.invoke(query)
    search_cache.add_documents(search_results)
    return search_results


def summarize_documents(documents, query):
    llm = get_llm(temperature=0)
    summarize_prompt = ChatPromptTemplate.from_template(
        """Create a concise summary of the following documents related to this query: {query}

        {documents}

        Provide a comprehensive summary that addresses the key points relevant to the query.
        """
    )
    chain = (
        {"documents": lambda docs: format_docs(docs), "query": lambda _: query}
        | summarize_prompt
        | llm
        | StrOutputParser()
    )
    return chain.invoke(documents)
These two functions enhance the assistant's intelligence and efficiency. The retrieve_with_fallback function implements a hybrid retrieval mechanism: it first attempts to fetch semantically relevant documents from the local Chroma cache and, if unsuccessful, falls back to a real-time Tavily web search, caching the new results for future use. Meanwhile, summarize_documents leverages a Gemini LLM, guided by a structured prompt, to generate concise summaries from the retrieved documents, ensuring relevance to the query. Together, they enable low-latency, informative, and context-aware responses.
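A quick illustration of the fallback behavior (hypothetical query; requires both API keys): the first call performs a web search and populates the cache, while an immediate repeat is served semantically from Chroma, which the existing log messages make visible:
docs_first = retrieve_with_fallback("breath of the wild release year")   # logs "No cache hit, performing web search"
docs_repeat = retrieve_with_fallback("breath of the wild release year")  # logs "Retrieved ... documents from cache"
print(len(docs_first), len(docs_repeat))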
def advanced_chain(query_engine="enhanced", model="gemini-1.5-pro", include_history=True):
    llm = get_llm(model_name=model)
    if query_engine == "enhanced":
        retriever = lambda query: retrieve_with_fallback(query)
    else:
        retriever = enhanced_retriever.invoke

    def chain_with_history(input_dict):
        query = input_dict["question"]
        chat_history = memory.load_memory_variables({})["chat_history"] if include_history else []
        docs = retriever(query)
        context = format_docs(docs)
        result = prompt.invoke({
            "context": context,
            "question": query,
            "chat_history": chat_history
        })
        response = llm.invoke(result)
        memory.save_context({"input": query}, {"output": response.content})
        return response

    return RunnableLambda(chain_with_history) | StrOutputParser()
The advanced_chain function defines a modular, end-to-end reasoning workflow for answering user queries using cached or real-time search. It initializes the specified Gemini model, selects the retrieval strategy (cache with fallback or direct search), constructs a response pipeline that incorporates chat history (if enabled), formats documents into context, and prompts the LLM using the system-guided template. The chain also records the interaction in memory and returns the final answer, parsed into clean text. This design enables flexible experimentation with models and retrieval strategies while maintaining conversational coherence.
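Because the builder is parameterized, variants can be assembled without touching the pipeline itself (a sketch; the model name is the one already used as the default earlier in this tutorial):
# Any engine value other than "enhanced" bypasses the cache and queries Tavily directly;
# history can also be switched off for stateless, one-shot questions.
direct_chain = advanced_chain(query_engine="direct", model="gemini-2.0-flash-lite", include_history=False)
# direct_chain.invoke({"question": "..."})  # hypothetical one-shot call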
qa_chain = advanced_chain()
def analyze_query(query):
    llm = get_llm(temperature=0)
    analysis_prompt = ChatPromptTemplate.from_template(
        """Analyze the following query and provide:
        1. Main topic
        2. Sentiment (positive, negative, neutral)
        3. Key entities mentioned
        4. Query type (factual, opinion, how-to, etc.)

        Query: {query}

        Return the analysis in JSON format with the following structure:
        {{
            "topic": "main topic",
            "sentiment": "sentiment",
            "entities": ["entity1", "entity2"],
            "type": "query type"
        }}
        """
    )
    chain = analysis_prompt | llm | RunnableLambda(output_parser.parse)
    return chain.invoke({"query": query})
print("Advanced Tavily-Gemini Implementation")
print("="*50)
query = "what year was breath of the wild released and what was its reception?"
print(f"Query: query")
We integrate the final components of the intelligent assistant. qa_chain is the assembled reasoning pipeline, ready to process user queries using retrieval, memory, and Gemini-based response generation. The analyze_query function performs a lightweight semantic analysis of a query, extracting the main topic, sentiment, entities, and query type using the Gemini model and a structured JSON prompt. The example query, about Breath of the Wild's release and reception, shows how the assistant is triggered and prepared for full-stack inference and semantic interpretation. The printed heading marks the start of interactive execution.
try:
    print("\nSearching for answer...")
    answer = qa_chain.invoke({"question": query})
    print("\nAnswer:")
    print(answer)

    print("\nAnalyzing query...")
    try:
        query_analysis = analyze_query(query)
        print("\nQuery Analysis:")
        print(json.dumps(query_analysis, indent=2))
    except Exception as e:
        print(f"Query analysis error (non-critical): {e}")
except Exception as e:
    print(f"Error in search: {e}")

history = enhanced_retriever.get_search_history()
print("\nSearch History:")
for i, h in enumerate(history):
    print(f"{i+1}. Query: {h['query']} - Results: {h['num_results']} - Time: {h['response_time']:.2f}s")
print("nAdvanced search with domain filtering:")
specialized_retriever = EnhancedTavilyRetriever(
max_results=3,
search_depth="advanced",
include_domains=["nintendo.com", "zelda.com"],
exclude_domains=["reddit.com", "twitter.com"]
)
try:
    specialized_results = specialized_retriever.invoke("breath of the wild sales")
    print(f"Found {len(specialized_results)} specialized results")

    summary = summarize_documents(specialized_results, "breath of the wild sales")
    print("\nSummary of specialized results:")
    print(summary)
except Exception as e:
    print(f"Error in specialized search: {e}")

print("\nSearch Metrics:")
plot_search_metrics(history)
We demonstrate the entire pipeline in action. It performs a search using qa_chain, displays the generated answer, and then analyzes the query for sentiment, topic, entities, and type. It also retrieves and prints each query's search history, response time, and result count. Additionally, it runs a domain-filtered search focused on Nintendo-related sites, summarizes the results, and visualizes search performance using plot_search_metrics, offering a comprehensive view of the assistant's capabilities in real-time use.
In conclusion, this tutorial gives users a comprehensive blueprint for creating a highly capable, context-aware, and scalable RAG system that bridges real-time web intelligence with conversational AI. The Tavily Search API lets users pull fresh, relevant content directly from the web. The Gemini LLM adds robust reasoning and summarization capabilities, while LangChain's abstraction layer allows seamless orchestration between memory, embeddings, and model outputs. The implementation includes advanced features such as domain-specific filtering, query analysis (sentiment, topic, and entity extraction), and fallback strategies using a semantic vector cache built with Chroma and GoogleGenerativeAIEmbeddings. In addition, structured logging, error handling, and analytics visualization provide transparency and diagnostics for real-world deployment.
Check out the Colab Notebook. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 90K+ ML SubReddit.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, reflecting its popularity among readers.