LLM Chronicles #6.4: LLM Agents with ReAct (Reason + Act)
In this episode, we'll cover LLM agents, focusing on the core research that helped improve LLMs' reasoning while allowing them to interact with the external world via tools. First, we will look at how Chain of Thought prompting improves an LLM's ability to solve problems by generating a series of reasoning steps in natural language. Then, we'll see how PAL (Program-Aided Language Models) builds on this by getting the LLM to generate executable programs as intermediate reasoning steps toward a solution. Finally, we'll look at ReAct, short for Reason and Act, another method built on Chain of Thought that gives LLMs access to tools for computation and interaction with the external world.
The ReAct paper provided a blueprint for building powerful agents, which frameworks like LangChain extend to enable agentic workflows. This makes it possible to leverage LLMs to create all sorts of autonomous agents, such as browsing and coding agents.
Chain of Thought Prompting
In the 2022 paper titled "Chain of Thought Prompting Elicits Reasoning in Large Language Models," researchers from Google introduced the concept of Chain of Thought prompting. Chain of Thought prompts the model to think in intermediate steps to solve a problem instead of just answering the question directly. This is achieved through in-context learning, where the LLM is first shown one or a few examples of reasoning step-by-step to produce an answer before it is given a question to answer. The authors show that the LLM is able to reproduce the step-by-step reasoning pattern in its answer, resulting in better performance on many tasks.
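The in-context learning setup described above can be sketched in a few lines. This is a minimal illustration, not code from the paper; the worked example is the well-known tennis-ball problem from the Chain of Thought paper, and the helper name is our own:

```python
# Minimal sketch of few-shot Chain of Thought prompting: the prompt contains
# a worked example with explicit reasoning steps before the new question,
# so the LLM imitates the step-by-step pattern in its answer.

COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked example, then leave 'A:' open for the model."""
    return COT_EXAMPLE + f"Q: {question}\nA:"

prompt = build_cot_prompt("A pack of 3 pens costs $2. How much do 12 pens cost?")
print(prompt)
```

The completed prompt would then be sent to the LLM, which tends to continue the pattern with its own chain of reasoning before stating the answer.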
In a follow-up paper titled "Large Language Models are Zero-Shot Reasoners," researchers from the University of Tokyo and Google found that simply adding the phrase "let's think step by step" to a prompt allows LLMs to perform Chain of Thought reasoning without needing to see examples first. This zero-shot Chain of Thought prompting showed great results, improving performance on various benchmarks.
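The zero-shot variant is even simpler to implement: no worked examples are needed, only the trigger phrase. A minimal sketch (the helper name is illustrative):

```python
# Zero-shot Chain of Thought: appending "Let's think step by step" to the
# prompt elicits step-by-step reasoning without any in-context examples.

def build_zero_shot_cot_prompt(question: str) -> str:
    """Wrap a question with the zero-shot CoT trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."

prompt = build_zero_shot_cot_prompt(
    "A shop sells pens at 3 for $2. How much do 12 pens cost?"
)
print(prompt)
```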
PAL (Program-Aided Language Models)
While Chain of Thought prompting greatly improves reasoning via step-by-step decomposition, LLMs often still make logical and arithmetic mistakes in their solutions even when the problem is decomposed into the correct steps. Thus, various methods like PAL and ReAct have been explored to improve on Chain of Thought.
In the 2022 paper titled "PAL: Program-Aided Language Models," researchers from Carnegie Mellon University introduced a method to enhance LLMs' problem-solving by combining them with a code interpreter such as Python. The aim of PAL is to have the LLM generate programs as intermediate reasoning steps while the actual computation is handled by a code interpreter. PAL is typically implemented using in-context learning to guide the LLM. The model is provided with samples of natural language problems and their corresponding Python solutions, often including comments in natural language to describe each step.
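The core idea can be sketched as follows. Here the "generated" program is hard-coded to stand in for a real LLM response, and `exec` stands in for a sandboxed interpreter; in PAL, the LLM writes the program and the interpreter does the arithmetic:

```python
# Hedged sketch of PAL: the LLM emits a small Python program as its reasoning,
# and the Python interpreter performs the actual computation.

# Hard-coded stand-in for an LLM-generated solution, with natural-language
# comments describing each step (as in the PAL prompt format).
GENERATED_PROGRAM = """
# Q: A shop sells pens at 3 for 2 dollars. How much do 12 pens cost?
packs = 12 / 3            # number of 3-pen packs needed
answer = packs * 2        # total cost in dollars
"""

def run_pal_program(program: str) -> float:
    """Execute the generated program and read back its 'answer' variable."""
    namespace: dict = {}
    exec(program, namespace)  # computation is offloaded to the interpreter
    return namespace["answer"]

print(run_pal_program(GENERATED_PROGRAM))  # 8.0
```

Because the interpreter handles the arithmetic, the LLM only needs to decompose the problem correctly; it cannot make a calculation slip in the final step.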
ReAct (Reason + Act)
Around the same time in 2022, while researchers were exploring ways to enhance LLMs with methods like PAL, others were also experimenting with improving and leveraging Chain of Thought prompting. This led to the development of the ReAct method, which stands for Reason and Act.
In the ReAct framework described in the paper titled "ReAct: Synergizing Reasoning and Acting in Large Language Models", Google researchers introduced an approach that uses Chain of Thought-style prompting to teach LLMs to use tools and perform actions for specific tasks. While PAL focuses on using code interpreters to solve problems, ReAct offers a more general approach. The LLM is provided with a flexible set of tools it can call upon to solve problems in an iterative fashion.
Operational Loop of ReAct
The operational loop of ReAct has three stages: thought, action, and observation. First, the model is shown the set of tools or actions it can perform, such as a calculator or a tool to search for information on the web, along with examples of how to use these tools and the reasoning pattern to follow: first produce a thought, then an action based on that thought. An action is essentially a call to a specific tool with the required parameters. After the in-context examples, the LLM is given the actual user's query to answer. The loop then proceeds as follows:
- Thought: The LLM follows the pattern in the examples and produces a thought (e.g., "I need to find the current exchange rate from USD to EUR").
- Action: Based on this thought, the LLM decides to call a tool, for instance the web search tool to find the current exchange rate.
- Observation: The output from the tool is returned to the LLM as an observation, which is then added to the LLM's context.
This loop continues with the updated context, allowing the LLM to generate a sequence of thoughts and actions iteratively until it decides that it has gathered all the necessary information and completes the task by calling the finish action with the answer for the user.
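The loop described above can be sketched with a scripted stand-in for the model. All names and the `Thought:`/`Action:` text format here are illustrative, and the scripted outputs replace real LLM calls:

```python
# Illustrative ReAct loop: a scripted "LLM" emits Thought/Action lines, the
# loop executes the named tool and feeds the result back as an Observation,
# until the model calls the finish action with the answer.
import re

def search_tool(query: str) -> str:
    """Stand-in for a real web-search tool."""
    return "1 USD = 0.92 EUR"

TOOLS = {"search": search_tool}

# Scripted model outputs replacing actual LLM calls.
SCRIPTED = iter([
    "Thought: I need the current USD to EUR rate.\nAction: search[USD to EUR rate]",
    "Thought: I now have the rate.\nAction: finish[1 USD = 0.92 EUR]",
])

def react_loop() -> str:
    context = ""
    while True:
        step = next(SCRIPTED)  # a real agent would call the LLM with context
        context += step + "\n"
        tool, arg = re.search(r"Action: (\w+)\[(.*)\]", step).groups()
        if tool == "finish":
            return arg  # task complete: return the answer to the user
        context += f"Observation: {TOOLS[tool](arg)}\n"

final_answer = react_loop()
print(final_answer)
```

Note how `context` accumulates every thought, action, and observation, so each new model call sees the full trace so far.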
Key Concepts in ReAct
- Reasoning Traces: ReAct works because the steps of thought, action, and observation are kept in the LLM’s context for each iteration of the loop, forming a reasoning trace. This trace allows the LLM to keep track of what it has done so far and to think about what to do next.
- Agent Executor Module: LLMs only take text as input and produce text as output, and they can’t directly call tools or functions. To enable them to use tools, an agent executor module (like LangChain’s agent executor) is used. This module parses the LLM's output and identifies when the LLM produced a call to a tool, executing the required tool with the specified parameters and feeding the output back into the LLM context as an observation.
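What the executor does with a single model output can be shown in isolation. This is a minimal sketch, not LangChain's actual implementation, and the `Action: tool[input]` format is assumed for illustration:

```python
# Minimal agent-executor step: parse the LLM's text output, detect a tool
# call, run the tool, and wrap the result as an Observation for the context.
import re

def parse_action(llm_output: str):
    """Return (tool_name, argument) if the output contains an Action line."""
    match = re.search(r"Action: (\w+)\[(.*)\]", llm_output)
    return (match.group(1), match.group(2)) if match else None

def execute_step(llm_output: str, tools: dict) -> str:
    action = parse_action(llm_output)
    if action is None:
        return ""  # nothing to execute in this output
    name, arg = action
    return f"Observation: {tools[name](arg)}"

obs = execute_step(
    "Thought: I should calculate 6 * 7.\nAction: calculator[6 * 7]",
    {"calculator": lambda expr: str(eval(expr))},
)
print(obs)  # Observation: 42
```

Everything the LLM "does" is therefore plain text; the executor is the component that turns that text into real tool invocations.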
Tools and Terminology
In the context of ReAct, a tool refers to something the LLM can use to perform actions. Tools can be categorized into three main types:
- Knowledge Access: Tools that access and retrieve information from search engines, databases, etc.
- Computation: Tools that perform calculations and interpret code.
- Interaction with External World: Tools that interact with external systems, like controlling smart devices or sending commands.
Specific Frameworks and Implementations
Unlike PAL, which is limited to generating code for a specific interpreter, ReAct allows the LLM to use various tools as needed. For example:
- Customer Assistant LLM Agents: Might be given tools to access the user's current orders, their open support tickets, etc.
- Browser Agents: The LLM is given access to a browser tab and can perform actions like clicking on links and inputting text.
- Coding Agents: Use a version of the ReAct loop called CodeAct, in which the agent can converse with humans, execute code, and use tools to manage files.
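For the customer-assistant case, equipping the agent might look like the following. The data, function names, and registry format are all invented for illustration:

```python
# Hypothetical toolset for a customer-assistant agent: each tool is a plain
# function plus a description the LLM sees when deciding which Action to take.

ORDERS = {"alice": ["order-1042"]}
TICKETS = {"alice": []}

def list_orders(user: str) -> str:
    return ", ".join(ORDERS.get(user, [])) or "no orders"

def list_tickets(user: str) -> str:
    return ", ".join(TICKETS.get(user, [])) or "no open tickets"

# The agent is shown the names and descriptions, then picks one per step.
TOOLS = {
    "list_orders": (list_orders, "Look up a user's current orders"),
    "list_tickets": (list_tickets, "Look up a user's open support tickets"),
}

result = TOOLS["list_orders"][0]("alice")
print(result)  # order-1042
```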
Considerations and Limitations
These agentic workflows are powerful but come with limitations. Notably, LLMs are vulnerable to cybersecurity issues such as jailbreaks and prompt injection attacks, which could trick LLM agents into disclosing confidential information or performing malicious actions. It's crucial to implement security measures to mitigate these risks.
Thank you for reading! Don’t forget to like, subscribe, and leave your comments below.
Keywords
- LLM Agents
- ReAct
- Chain of Thought Prompting
- Program-Aided Language Models
- Reasoning Traces
- Agent Executor
- Tools and Plugins
- Computation Tools
- Cybersecurity
FAQ
What is Chain of Thought prompting? Chain of Thought prompting is a method that encourages LLMs to think in intermediate steps to solve a problem instead of directly answering the question.
What is the difference between PAL and ReAct? PAL focuses on using code interpreters to solve problems by generating executable programs as intermediate reasoning steps, while ReAct offers a more general approach by allowing LLMs to use a variety of tools iteratively.
How does ReAct improve LLM problem-solving? ReAct improves problem-solving by providing a reasoning trace that keeps track of thoughts, actions, and observations iteratively. This allows the LLM to use tools and compute in a more organized and effective manner.
What are the main types of tools in the ReAct framework? The main types of tools are Knowledge Access tools, Computation tools, and tools that interact with the external world.
What is the role of an agent executor module in ReAct? An agent executor module enables the LLM to use tools by parsing its output and executing the required actions, thus feeding the observations back into the LLM context.