Agents
As I mentioned earlier in our discussion about LangChain tools, agents are essentially LLMs (large language models) that are prompted and equipped to work with a set of relevant tools and to decide which of those tools to use. To make this clearer, let me give you a scenario.
Imagine you’re at home and you notice that the handle of your door is broken. You have multiple options to fix it. If it's a small issue, you might grab a basic screwdriver and fix it yourself. But if it's a bigger problem, you might decide to use your phone to call someone to help. Or if it’s just a scratch, you might even choose to ignore it. While this is a simple example, it reflects how agents work: when given a problem, they choose the most appropriate tool based on the scenario.
In the same way, agents are provided with a set of tools. When a user asks an agent a question or gives a query, the agent (which is just an LLM, after all) evaluates the available tools and selects the most suitable one. As I said before, agents are built to think in this way—they decide which tools to use and how to apply them.
Let’s take an example to explain this in more detail. Imagine you give the agent a query, and it has three tools available: a Wikipedia tool, a database retriever tool, and a math function tool. When you ask a question like, "What is policy #33 in my lease?", the agent first looks at the descriptions and names of the tools to decide which one is most relevant. Since this is a policy-related question, the agent might decide to use the retriever tool first, because it's likely the best fit.
Once the agent uses the retriever tool, it fetches the relevant documents. The LLM, being intelligent in understanding text, will read the retrieved content. If the documents retrieved by the retriever don’t contain information about policy #33, the LLM will recognize that this information is not relevant and won’t use it as the final answer. This process of thinking through the tool's output is often called the thought process.
If the LLM doesn’t find anything useful from the retriever tool, it will decide to move on to another tool. It might then try the Wikipedia tool, but again, if it doesn’t find anything about policy #33, it will give up and inform the user that it couldn’t find the answer. This entire process is part of the decision-making loop where the agent goes through different tools, trying to find the right answer based on the descriptions and outputs.
Now, this decision-making process—where the agent chooses a tool, observes its output and decides if the answer is relevant—is what makes agents so powerful. While the LLM understands language and processes text, it is the agent that decides which tool to use and when to use it. The LLM performs its thought process by reflecting on the results and deciding if they answer the user’s query. If the first attempt with a tool doesn’t work, the LLM will try the next one until it either finds an answer or exhausts all available tools.
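To make that loop concrete, here is a minimal, framework-free sketch of the decide-act-observe cycle in plain Python. It is purely illustrative, not LangChain code: the pick_tool and is_relevant callables are hypothetical stand-ins for the judgments the LLM makes internally, and the tool execution is just a function call.

from typing import Callable, Optional

def answer_with_tools(
    query: str,
    tools: dict[str, Callable[[str], str]],
    pick_tool: Callable[[str, list[str]], Optional[str]],
    is_relevant: Callable[[str, str], bool],
    max_steps: int = 5,
) -> str:
    """Illustrative agent loop: pick a tool, run it, judge the output, repeat.

    In a real agent, pick_tool and is_relevant are the LLM's job; here they
    are injected as plain functions so the control flow is easy to see.
    """
    untried = list(tools)
    for _ in range(max_steps):
        name = pick_tool(query, untried)      # "which tool looks most promising?"
        if name is None:
            break                             # nothing left worth trying
        observation = tools[name](query)      # the executor actually runs the tool
        untried.remove(name)
        if is_relevant(query, observation):   # the "thought" step on the output
            return observation
    return "Sorry, I couldn't find an answer with the available tools."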
Why Do Agents Need Tools?
You might wonder why we need to convert retrievers into tools at all. After all, both retrievers and tools serve the same purpose of retrieving information. The reason is simple: agents only work with tools. An agent cannot directly use a retriever or any other function unless it is wrapped as a tool. The agent is designed to think and decide which tool to use, so it needs all resources, like retrievers or even Python functions, to be converted into tools.
For example, if you have an SQL retriever, you would need to convert it into a tool so that the agent can access and use it. This same logic applies to creating custom tools. You can convert any function, retriever, or resource into a tool for the agent to use.
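As a rough sketch of what that wrapping looks like in LangChain (the square_number function here is made up for illustration, and retriever is assumed to be an existing retriever object like the one we build later in this section):

from langchain.tools import tool
from langchain.tools.retriever import create_retriever_tool

# A retriever becomes a tool via create_retriever_tool;
# the name and description are what the agent reads when choosing.
lease_retriever_tool = create_retriever_tool(
    retriever,  # assumed: an existing retriever, e.g. from db.as_retriever()
    name="Lease_information_Retriever",
    description="Looks up policies and clauses in the lease documents.",
)

# Any plain Python function becomes a tool with the @tool decorator;
# its docstring doubles as the tool description.
@tool
def square_number(n: int) -> int:
    """Squares a number. Use this for simple math on a single integer."""
    return n * n

Once wrapped like this, both objects can go into the same tools list handed to the agent.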
Naming and Descriptions Matter
When designing these tools, naming and descriptions are critical. The LLM uses these descriptions to understand the purpose of each tool. If the tool descriptions are clear, the agent will likely pick the correct tool on the first attempt. Otherwise, the agent might loop through multiple tools before finding the right one. This is why providing clear names and descriptions for tools is essential for the agent to work efficiently.
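For example, compare a vague tool definition with a descriptive one (both hypothetical). The second gives the agent far more signal when it is choosing a tool:

from langchain.tools import tool

@tool
def lookup(q: str) -> str:
    """Looks stuff up."""
    # Vague: the agent cannot tell when this tool applies
    ...

@tool
def lease_policy_lookup(q: str) -> str:
    """Searches the tenant's lease agreement and returns the text of a
    specific policy or clause. Use this for any question about lease
    terms, deposits, or numbered policies (e.g. 'policy #33')."""
    # Descriptive: the agent knows exactly which queries belong here
    ...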
The Thought Process and Execution
From an external view, it might seem like the LLM is thinking or performing some sort of advanced logic. However, internally, it’s quite simple: the LLM follows a set of steps that you’ve prompted it to follow. It examines the outputs from tools, checks if they match the user query, and decides if further steps are needed. While the LLM handles the decision-making, the actual execution of tools (like fetching information from Wikipedia or retrieving documents) is handled by the agent’s execution system in the background.
In summary, the LLM decides, understands, and processes, while the tools themselves do the work of retrieving or calculating. This back-and-forth interaction between the LLM and the tools is what makes agents so powerful.
Example of Using Tools with an Agent
Now, I’ll show you a simple code example where I create multiple tools and ask the LLM a set of questions. This will help demonstrate how the LLM uses the tools to think through and answer the queries.
from langchain_community.tools.wikipedia.tool import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.yahoo_finance_news import YahooFinanceNewsTool
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
import os
from dotenv import load_dotenv

load_dotenv()

# Wikipedia tool
ap = WikipediaAPIWrapper(top_k_results=2, doc_content_chars_max=300)
Wikipedia_tool = WikipediaQueryRun(api_wrapper=ap)

# Yahoo Finance News tool
yahoo_news_tool = YahooFinanceNewsTool()

# Vector DB: reopen the persisted Chroma store with HuggingFace embeddings
persistent_directory = os.path.join('vh')
huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
db = Chroma(persist_directory=persistent_directory,
            embedding_function=huggingface_embeddings)
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)
retriever_tool = create_retriever_tool(
    retriever,
    name="Lease_information_Retriever",
    description="It is a retriever_tool that explains all the policies of the lease",
)

# Prompt template: tells the agent to loop over the tools until it finds an answer
prompt_template = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful assistant. I will provide you with a set of tools to operate on.
Your task is to help the user solve their query by utilizing these tools effectively.
When solving the query, you can choose one tool at a time, analyze the information retrieved by
this tool, and determine if it is suitable to answer the user's query. If the answer is not
satisfactory, you may go back and select another tool to try again. Continue this process until
you find a suitable answer. You can repeat this loop as many times as necessary, but ensure that
you deliver an efficient and accurate answer at the end. The goal is to solve the user's query
with the best possible solution, no matter how many iterations are required."""),
    ("human", "{query}"),
    MessagesPlaceholder("agent_scratchpad"),  # the agent's intermediate thought process
])

tools = [Wikipedia_tool, yahoo_news_tool, retriever_tool]
llm = ChatOpenAI(model="gpt-3.5-turbo")

agent = create_openai_tools_agent(llm, tools, prompt_template)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"query": "Tesla latest news"})
In this code, we're setting up an agent with a combination of tools, specifically the Wikipedia tool, Yahoo Finance News tool, and a custom retriever tool for lease document information. The core functionality of the agent is driven by the prompt we pass, which directs the agent to use these tools in an efficient way. The agent is instructed to try different tools to find the best possible answer to the user's query, and it continues to loop through the tools until it finds a suitable solution.
Here’s a breakdown of how the code works:
Wikipedia Tool: This tool pulls information from Wikipedia. It’s useful for fetching general knowledge.
Yahoo Finance News Tool: This tool retrieves the latest news about specific companies. It’s important to pass the company’s name as the query (e.g., "Tesla") rather than a descriptive sentence.
Custom Retriever Tool: This tool is set up to retrieve specific lease policy information from a vector database created using Chroma, which is embedded using the HuggingFace embeddings model. This is essentially fetching relevant content based on the user’s query.
Prompt: The prompt defines how the agent should operate. The agent is told to iterate through the tools, analyze the results, and determine whether the output is relevant to the user query. The prompt also includes an agent_scratchpad placeholder for the agent's intermediate reasoning steps, allowing it to record its thought process as it uses the tools.
Agent Creation: The create_openai_tools_agent function takes the LLM, tools, and prompt template to create the agent. This agent is then executed by the AgentExecutor, which ensures that each tool is used properly during the execution process.
Execution: The call agent_executor.invoke({"query": "Tesla latest news"}) runs the agent with the user's query about Tesla, and the agent uses the tools to find the most relevant answer.
The agent continues to fetch data from the tools and analyze it until it finds an appropriate response. Each tool plays a role, and the agent decides which tool to use based on the input query and the results fetched. The AgentExecutor ensures smooth execution of this entire process, supervising the agent's interaction with the tools.
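Before handing the tools to the agent, it can be worth invoking each one directly to see exactly what text the agent will observe. A quick sanity check, assuming the objects from the code above are in scope:

# Each tool can be called on its own; the string it returns is exactly
# the "observation" the agent will reason over.
print(Wikipedia_tool.invoke("Tesla"))
print(yahoo_news_tool.invoke("TSLA"))          # a ticker/company name, not a sentence
print(retriever_tool.invoke("policy #33"))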
There are different ways to create agents in LangChain, and a wide variety of agents are available to cater to different use cases. For example, I am currently using the ChatOpenAI Agent, which is primarily designed to interact with models like GPT (ChatGPT). This is just one use case, and LangChain offers various other agents to handle specific tasks.
For instance, you can also use the ReAct Agent (created using create_react_agent) when you need the LLM to interleave reasoning steps with tool actions while answering questions. This agent helps the LLM think through multiple steps before arriving at an answer.
Similarly, there are agents like the SQL Agent (created using create_sql_agent), which is useful when converting natural language queries into SQL queries to interact with databases.
There are several other agents like these, each suited for different purposes. Below, I will list a few agent creation functions, and you can check which agent best suits your requirements. In the upcoming examples, I will demonstrate the usage of one or two of these agents.
List of Agents and Their Use Cases:
ChatOpenAI Agent (create_openai_tools_agent): Use for interacting with OpenAI models like ChatGPT and solving queries using predefined tools.
ReAct Agent (create_react_agent): Combines reasoning and actions; ideal for multi-step problem-solving with tool-based interaction.
SQL Agent (create_sql_agent): Converts natural language queries into SQL queries to interact with structured databases.
Python Agent (create_python_agent): Executes Python code for calculations, data manipulation, or scripting tasks directly from the LLM.
Zero-Shot ReAct Agent (zero_shot_react_agent): Makes decisions without needing prior context; useful for simple, one-shot reasoning tasks with tools.
Self-Ask-with-Search Agent (create_self_ask_with_search_agent): Asks follow-up questions and searches for additional data to clarify answers before responding.
VectorStore Agent (create_vectorstore_agent): Retrieves information from a vector database using similarity searches on embeddings.
Multi-Action Agent (create_multi_action_agent): Executes multiple actions in parallel; useful when tasks require outputs from several tools or steps.
Custom Agent (create_custom_agent): Allows you to define a fully customized agent with specific tools, prompts, and behaviors for unique tasks.
These agents are designed for different purposes depending on the complexity and type of problem you're solving.
We’ll explore the different ways of creating agents using LangChain and why prompts are key in guiding an agent’s actions. At first glance, it might seem like the prompt is the main factor behind how an agent operates, but there’s more to it than that. While the prompt does play a central role in helping the agent decide which tools to use and how to process information, each type of agent in LangChain is internally coded to work in a unique way.
For example, if you use a create_react_agent, it's designed to simulate explicit reasoning steps, while a create_custom_agent follows a more straightforward approach. The difference lies in how each agent is structured internally, which is what makes them better suited for specific tasks. The prompt will guide the agent's behavior, but the internal structure of the agent helps it excel at certain functions.
I used to wonder: if the prompt is the heart of an agent’s decision-making, why do we need different agent types like ReAct agents, SQL agents, or custom agents? After digging deeper, I realized that it’s not just about how the prompt works but also how the agent is built internally. Each create agent function is optimized to handle different workflows. For example, the ReAct agent uses internal logic that’s focused on reasoning through multiple steps, making it different from simpler agents.
To better understand this, I tried crisscrossing prompts—using prompts made for one agent with another type of agent. Even though the agent ran the tasks, the results were different, proving that while the prompt is important, the agent’s structure plays a huge role too.
Apart from using custom prompts, LangChain also provides predefined prompts that are already designed to guide agents effectively. These are optimized for specific tasks, making the agent’s responses more reliable. In one of the earlier examples, I used a custom prompt, but in upcoming examples, I’ll show you how to use predefined prompts and how they change the agent’s behavior.
While the prompt is crucial for how the agent thinks, the internal setup of the agent complements that thinking. For example, the ReAct agent is built to reason through multiple actions, while the SQL agent is designed to handle natural language queries for databases. This makes each agent uniquely suited for its task, even though they all rely on prompts to function.
Tried crisscrossing by using a Llama model (via Groq) in create_openai_tools_agent:
from langchain_community.tools.wikipedia.tool import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.yahoo_finance_news import YahooFinanceNewsTool
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_groq import ChatGroq
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
import os
from dotenv import load_dotenv

load_dotenv()

# Same tools as before: Wikipedia, Yahoo Finance News, and the lease retriever
ap = WikipediaAPIWrapper(top_k_results=2, doc_content_chars_max=300)
Wikipedia_tool = WikipediaQueryRun(api_wrapper=ap)

yahoo_news_tool = YahooFinanceNewsTool()

persistent_directory = os.path.join('vh')
huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
db = Chroma(persist_directory=persistent_directory,
            embedding_function=huggingface_embeddings)
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)
retriever_tool = create_retriever_tool(
    retriever,
    name="Lease_information_Retriever",
    description="It is a retriever_tool that explains all the policies of the lease",
)

# Same prompt as before
prompt_template = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful assistant. I will provide you with a set of tools to operate on.
Your task is to help the user solve their query by utilizing these tools effectively.
When solving the query, you can choose one tool at a time, analyze the information retrieved by
this tool, and determine if it is suitable to answer the user's query. If the answer is not
satisfactory, you may go back and select another tool to try again. Continue this process until
you find a suitable answer. You can repeat this loop as many times as necessary, but ensure that
you deliver an efficient and accurate answer at the end. The goal is to solve the user's query
with the best possible solution, no matter how many iterations are required."""),
    ("human", "{query}"),
    MessagesPlaceholder("agent_scratchpad"),  # the agent's intermediate thought process
])

tools = [Wikipedia_tool, yahoo_news_tool, retriever_tool]

# The crisscross: a Llama 3 model served by Groq instead of an OpenAI model
groq_api_key = os.getenv("GROQ_API_KEY")
llm = ChatGroq(groq_api_key=groq_api_key, model_name="Llama3-8b-8192")

agent = create_openai_tools_agent(llm, tools, prompt_template)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"query": "Tesla latest news"})
As discussed earlier, I experimented with "crisscrossing" by using a Llama 3 model (served via Groq) instead of ChatGPT in the create_openai_tools_agent function, and surprisingly, it still worked. This is likely because ChatGroq exposes the same OpenAI-style tool-calling interface that create_openai_tools_agent relies on, and LangChain structures these agent constructors in a way that allows for some flexibility. However, despite this flexibility, each agent function is specifically designed for its purpose and may include optimizations or behaviors suited for certain models or workflows.
So, while it's possible to mix and match models and agents with some success, it's generally best to use the constructor that fits your model and workflow. These agents were created with specific goals in mind, and using them as intended will likely produce the best results. However, you can experiment and check the results if you're curious.
Problem Noticed:
One problem I recognized while testing the agent was its tendency to loop through the same tools repeatedly, particularly when the results didn’t change. For example, when I asked the LLM about Tesla, it kept bouncing between Wikipedia and Yahoo News, repeatedly fetching the same results without any new information. The process continued until the context window was filled, leading to unnecessary repetitions.
This issue arises because there’s no clear instruction in the prompt to prevent the agent from looping over the same tools when the information retrieved doesn’t change. To tackle this, I realized it’s crucial to modify the prompt to specify that if the tool retrieves the same content twice, there’s no point in continuing. Instead, the agent should seek alternate ways to gather relevant information.
Here are a couple of approaches I considered:
Stop After Repeated Results: If the agent goes through the same tool twice and retrieves the same information, it should stop using that tool and avoid regenerating the same content. The prompt can explicitly suggest avoiding further attempts with the same tool when no new data is found.
Reformulate the Query: If after two attempts the tool doesn’t return satisfactory results, the LLM could be prompted to reformulate the query or adjust the way it interacts with the tool. By changing the input query, the agent may trigger the tool in a different way, potentially retrieving more relevant information.
Combined Solutions
In addition to the ideas I came up with, there are several other methods to handle this looping issue effectively. Here's a complete list of strategies:
Limit Tool Iterations: Restrict the agent to a maximum of two attempts per tool. If the tool returns the same results twice, it should stop using that tool and try another option.
Dynamic Query Reformulation: If repeated attempts don’t yield the desired results, the agent can dynamically modify the query and try again with the same tool, improving the chances of getting a relevant answer.
Result Comparison Logic: Introduce logic to compare the retrieved results. If consecutive attempts produce the same content, the agent should stop using that tool and move on to others.
Confidence-Based Stopping: After each tool run, the agent can evaluate its confidence in the retrieved information. If it’s consistently low, it should stop using that tool and explore other options.
Threshold-Based Attempts: Set a hard limit on the total number of attempts across all tools. After, say, three iterations, the agent should return the best result it has found so far, preventing excessive looping.
Tool Preference Hierarchy: Use a priority system where the most relevant tool is tried first. If that fails, the agent moves down the hierarchy, ensuring it uses the most appropriate tools before moving on.
Time-Based Stopping: Set a time limit for tool execution. If the agent exceeds this time limit, it stops and returns the most relevant information gathered within that period.
Fallback Answer Strategy: If the agent fails to retrieve meaningful information after trying all tools, it can fall back on a general or pre-defined response, ensuring the user still gets a meaningful answer.
By implementing these strategies, we can prevent the agent from getting stuck in loops and optimize the process of gathering and presenting relevant information to the user. These methods can be incorporated into the prompt or agent logic to ensure efficient and accurate responses.
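Some of these controls do not even require prompt changes: LangChain's AgentExecutor exposes parameters for hard iteration and time limits. A minimal sketch, reusing the agent and tools defined earlier:

# Hard caps on looping, enforced by the executor rather than the prompt:
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=3,                 # threshold-based attempts: stop after 3 tool calls
    max_execution_time=30,            # time-based stopping: give up after ~30 seconds
    early_stopping_method="force",    # return a default "stopped" response at the limit
)

With those guardrails noted, the following example switches to the ReAct agent, which tackles the looping problem through its prompt structure: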
from langchain_community.tools.wikipedia.tool import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.yahoo_finance_news import YahooFinanceNewsTool
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
import os
from dotenv import load_dotenv

load_dotenv()

# Wikipedia tool
ap = WikipediaAPIWrapper(top_k_results=2, doc_content_chars_max=300)
Wikipedia_tool = WikipediaQueryRun(api_wrapper=ap)

# Yahoo Finance tool
yahoo_news_tool = YahooFinanceNewsTool()

# Vector DB
persistent_directory = os.path.join('vh')
huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
db = Chroma(persist_directory=persistent_directory,
            embedding_function=huggingface_embeddings)
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)
retriever_tool = create_retriever_tool(
    retriever,
    name="Lease_information_Retriever",
    description="It is a retriever_tool that explains all the policies of the lease",
)

# Predefined ReAct prompt from the LangChain Hub
prompt = hub.pull("hwchase17/react")

tools = [Wikipedia_tool, yahoo_news_tool, retriever_tool]
llm = ChatOpenAI(model="gpt-3.5-turbo")

agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "Tesla latest news"})
In this section, we're diving into how the ReAct agent functions using a predefined prompt from the LangChain Hub. We've seen that to use the ReAct agent, it's as simple as changing the import statement and pulling the appropriate prompt from the LangChain Hub. This particular agent follows a structured reasoning process: the name ReAct comes from "Reason + Act," because the agent alternates between reasoning steps and tool actions on its way to a final answer.
The key change here is using the ReAct prompt specifically designed for reasoning, decision-making, and answering queries in multiple steps. I’ll walk you through how this prompt is structured and how it enables the agent to think through the steps before arriving at an answer.
By examining the prompt, you'll gain insight into the core mechanism driving the agent. I will show you the prompt used for this agent so you can better understand how it's guiding the LLM to think, analyze, and answer queries step by step.
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
In this example, we’re using the ReAct agent with a structured prompt, which is quite different from the earlier approach where I had written a more free-form version of the process. This structured version is designed to break down the agent's decision-making into clear states—such as Thought, Action, and Observation—making it much more reliable and efficient in following through with the query steps.
Here’s a breakdown of the ReAct prompt:
- Thought: The agent reflects on what it needs to do.
- Action: The agent chooses a tool from the available ones.
- Action Input: The input given to the selected tool.
- Observation: The agent takes note of what the tool returns.
- This sequence can repeat until the agent concludes.
- Final Answer: Once the agent has gathered enough information, it gives the final answer.
This specific structure makes the agent more organized and ensures it follows a logical process before arriving at the final answer. It avoids unnecessary looping issues, which was a problem we encountered earlier when we tried to create our own agent. The structure inherently prevents the agent from getting stuck in a loop by making it check its thoughts and observations after each tool usage.
Even though this prompt works quite efficiently on its own, you can still customize it. For instance, if you want to add a control mechanism (like limiting how many times a tool can be used), you can easily do that by inserting additional rules in the prompt. This way, you can fine-tune how the agent interacts with tools to suit your specific needs.
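For instance, one low-effort way to add such a rule is to edit the template string pulled from the hub before building the agent. This is a sketch; the exact wording of the injected instruction is up to you:

from langchain import hub

prompt = hub.pull("hwchase17/react")

# Inject an extra rule just before the "Begin!" marker of the standard template
extra_rule = ("Never call the same tool with the same Action Input twice. "
              "If a tool returns the same result again, try a different tool "
              "or give your best final answer.\n\nBegin!")
prompt.template = prompt.template.replace("Begin!", extra_rule)

The modified prompt can then be passed to create_react_agent exactly as before.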