AgentChain uses Large Language Models (LLMs) to plan and orchestrate multiple Agents or Large Models (LMs) to accomplish sophisticated tasks. AgentChain is fully multimodal: it accepts text, image, audio, and tabular data as input and output.
- 🧠 LLMs as the brain: AgentChain leverages state-of-the-art Large Language Models to plan and make decisions based on users' natural language inputs. This makes AgentChain a versatile tool for a wide range of applications, such as executing tasks from natural language instructions, understanding data, and generating data.
- 🌟 Fully Multimodal IO: AgentChain is fully multimodal, accepting input and producing output in various modalities, such as text, image, audio, or video (coming soon). This makes it suitable for applications such as computer vision, speech recognition, and converting content from one modality to another.
- 🤝 Orchestrate Versatile Agents: AgentChain can orchestrate multiple agents to perform complex tasks. Using composability and a hierarchical structure of tools, AgentChain can intelligently choose which tools to use, and when, for a given task. This makes AgentChain a powerful tool for projects that require a complex combination of tools.
- 🔧 Customizable for Ad-hoc Needs: AgentChain can be customized to fit specific project requirements, making it adaptable to many kinds of applications. Specific requirements can be met by adding new agents that extend its capabilities (a distributed architecture is coming soon).
- Install requirements:
pip install -r requirements.txt
- Download model checkpoints:
bash download.sh
- Depending on which agents you need, export the following environment variables:
OPENAI_API_KEY={YOUR_OPENAI_API_KEY} # mandatory, since the LLM is central to this application
SERPAPI_API_KEY={YOUR_SERPAPI_API_KEY} # needed if you want the agent to be able to search the web
# These environment variables are needed in case you want the agent to be able to make phone calls
AWS_ACCESS_KEY_ID={YOUR_AWS_ACCESS_KEY_ID}
AWS_SECRET_ACCESS_KEY={YOUR_AWS_SECRET_ACCESS_KEY}
TWILIO_ACCOUNT_SID={YOUR_TWILIO_ACCOUNT_SID}
TWILIO_AUTH_TOKEN={YOUR_TWILIO_AUTH_TOKEN}
AWS_S3_BUCKET_NAME={YOUR_AWS_S3_BUCKET_NAME} # make sure to create an S3 bucket with public access
- Install the ffmpeg library (needed for Whisper). On Ubuntu:
sudo apt update && sudo apt install ffmpeg
- Run the main script:
python main.py
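Depending on how each agent reads its keys, a missing variable may only surface when that agent is first invoked, so a quick pre-flight check before `python main.py` can save a failed run. The Python sketch below is not part of AgentChain: the feature names (`core`, `web_search`, `phone_calls`) and the `missing_env_vars` helper are hypothetical, and the grouping simply follows the comments in the environment-variable list above.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical grouping (not part of AgentChain) mirroring the comments in the
# environment-variable list: the LLM key is always required, the others only
# for the agents that use them.
REQUIRED_VARS = {
    "core": ["OPENAI_API_KEY"],
    "web_search": ["SERPAPI_API_KEY"],
    "phone_calls": [
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "TWILIO_ACCOUNT_SID",
        "TWILIO_AUTH_TOKEN",
        "AWS_S3_BUCKET_NAME",
    ],
}


def missing_env_vars(features: Iterable[str],
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return variables needed by "core" plus `features` that are unset or empty."""
    env = os.environ if environ is None else environ
    wanted = ["core", *features]
    return [
        name
        for feature in wanted
        for name in REQUIRED_VARS[feature]
        if not env.get(name)
    ]
```

For example, `missing_env_vars(["web_search", "phone_calls"])` returns an empty list when everything needed for web search and phone calls is set; otherwise it lists exactly which variables still have to be exported.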
As of this commit, running AgentChain requires at least 29 GB of GPU memory. Make sure to assign GPU devices correctly in main.py. You can comment out some tools and models to reduce the GPU memory footprint, at the cost of fewer capabilities.
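To check the 29 GB requirement before launching, one option is to query `nvidia-smi`, which can report per-GPU memory in MiB. The sketch below assumes the requirement means 29 GiB of free memory summed over all visible GPUs; that sum is only meaningful if the models are spread across devices as configured in main.py, and each model must still fit on the GPU it is assigned to. The helper names are hypothetical.

```python
import subprocess
from typing import List, Tuple

# Figure quoted in the README; treated as GiB here because nvidia-smi reports
# MiB (an assumption, the README says "GB").
REQUIRED_GIB = 29

Gpu = Tuple[int, int, int]  # (index, total MiB, used MiB)


def query_nvidia_smi() -> str:
    """Run nvidia-smi and return CSV lines of 'index, total, used' in MiB."""
    return subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.total,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout


def parse_gpu_memory(csv_text: str) -> List[Gpu]:
    """Parse nvidia-smi CSV output into (index, total_mib, used_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, total, used = (int(field.strip()) for field in line.split(","))
        gpus.append((index, total, used))
    return gpus


def free_mib(gpus: List[Gpu]) -> int:
    """Total free memory across all GPUs, in MiB."""
    return sum(total - used for _, total, used in gpus)


def has_enough_memory(gpus: List[Gpu], required_gib: int = REQUIRED_GIB) -> bool:
    """Compare summed free memory against the requirement (1 GiB = 1024 MiB)."""
    return free_mib(gpus) >= required_gib * 1024
```

On a machine with an NVIDIA driver, `has_enough_memory(parse_gpu_memory(query_nvidia_smi()))` gives a quick yes/no; if it returns `False`, commenting out some tools, as suggested above, is the fallback.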
AgentChain demo 1: transcribing audio and visualizing the result as an image. A video of the AgentChain interface shows an uploaded audio file and the resulting generated image, which represents the audio content.
Demo1.sound.mp4
AgentChain demo 2: asking questions about an image. A video of the AgentChain interface shows an image and a question being asked about it, with the resulting answer displayed below.
Demo2.sound.mp4
AgentChain demo 3: question-answering on tabular data and making a phone call to report the results. A video of the AgentChain interface shows a table of data with a question being asked and the resulting answer displayed, followed by a phone call being made using the CommsAgent.
Demo3.sound.mp4
The content of this document mostly describes our vision and what we aim to achieve with AgentChain. Check the demos above to see what we have achieved so far.
AgentChain is a sophisticated system designed to solve general problems. It can orchestrate multiple agents to solve sub-problems. These agents are organized into groups, each with its own set of capabilities and functionalities. Here are some of the agent groups in AgentChain:
The SearchAgents group is responsible for gathering information from various sources, including search engines, online databases, and APIs. The agents in this group are highly skilled at retrieving up-to-date world knowledge. Some examples of agents in this group include the Google Search API, Bing API, Wikipedia API, and Serp.
The CommsAgents group is responsible for handling communication between different parties, such as sending emails, making phone calls, or messaging via various platforms. The agents in this group can integrate with a wide range of platforms. Some examples of agents in this group include TwilioCaller, TwilioEmailWriter, TwilioMessenger, and Slack.
The ToolsAgents group is responsible for performing various computational tasks, such as performing calculations, running scripts, or executing commands. The agents in this group can work with a wide range of programming languages and tools. Some examples of agents in this group include Math, Python REPL, and Terminal.
The MultiModalAgents group is responsible for handling input and output from various modalities, such as text, image, audio, or video (coming soon). The agents in this group can process and understand different modalities. Some examples of agents in this group include OpenAI Whisper, Blip2, Coqui, and StableDiffusion.
The ImageAgents group is responsible for processing and manipulating images, for tasks such as image enhancement, object detection, and image recognition. The agents in this group can perform complex operations on images. Some examples of agents in this group include Upscaler, ControlNet, and YOLO.
The DBAgents group is responsible for adding data to and fetching data from your database, for example to compute metrics or aggregations. The agents in this group interact with databases and enrich other agents with information from your database. Some examples of agents in this group include SQL, MongoDB, ElasticSearch, Qdrant, and Notion.
For a travel company promoting a new and exotic destination, it is crucial to have high-quality images that grab the attention of potential travelers. However, manually creating stunning images can be time-consuming and expensive. That's why the travel company wants to use AgentChain to automate image generation and create beautiful visuals with the help of various agents.
Here is how AgentChain can help by chaining different agents together:
- Use SearchAgent (Google Search API, Wikipedia API, Serp) to gather information and inspiration about the destination, such as the most popular landmarks, the local cuisine, and the unique features of the location.
- Use ImageAgent (Upscaler) to enhance the quality of images and make them more appealing by using state-of-the-art algorithms to increase the resolution and remove noise from the images.
- Use MultiModalAgent (Blip2) to generate descriptive captions for the images, providing more context and making the images more meaningful.
- Use CommsAgent (TwilioEmailWriter) to send the images to the target audience via email or other messaging platforms, attracting potential travelers with stunning visuals and promoting the new destination.
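The chain above can be sketched as a list of steps that share a context dictionary, each step reading what earlier steps produced and adding its own output. In AgentChain the LLM decides which agents to call and in what order; here the order is fixed, and every agent is a hypothetical stub (`search_destination`, `upscale_images`, `caption_images`, `email_campaign`) that returns placeholder values instead of calling real APIs.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


# Hypothetical stand-ins for the agents named above; real agents would call a
# search API, an upscaler, Blip2, and an email service respectively.
def search_destination(ctx: Context) -> Context:
    return {"facts": [f"landmarks of {ctx['destination']}"]}


def upscale_images(ctx: Context) -> Context:
    return {"images": [name + " (upscaled)" for name in ctx["images"]]}


def caption_images(ctx: Context) -> Context:
    return {"captions": [f"Caption for {img}" for img in ctx["images"]]}


def email_campaign(ctx: Context) -> Context:
    return {"sent": len(ctx["images"])}


def run_chain(steps: List[Step], ctx: Context) -> Context:
    """Run steps in order; each step's output is merged into a new context dict."""
    for step in steps:
        ctx = {**ctx, **step(ctx)}  # later keys overwrite earlier ones
    return ctx
```

With `run_chain([search_destination, upscale_images, caption_images, email_campaign], {"destination": "Example Island", "images": ["beach.png"]})`, the upscaled images replace the originals in the context, so the captions are computed from the enhanced versions, mirroring the order of the steps listed above.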
For an investment firm that manages a large portfolio of stocks, it is critical to stay up to date with the latest market trends and to analyze the performance of different stocks in order to make informed investment decisions. However, analyzing data from multiple sources can be time-consuming and error-prone. That's why the investment firm wants to use AgentChain to automate the analysis process and generate reports with the help of various agents.
Here is how AgentChain can help by chaining different agents together:
- Use ToolsAgent (Python REPL, TableQA) to analyze data from different sources (e.g., CSV files, stock market APIs) and perform calculations related to financial metrics such as earnings, dividends, and P/E ratios.
- Use SearchAgent (Bing API) to gather news and information related to the stocks in the portfolio, such as recent earnings reports, industry trends, and analyst ratings.
- Use NLPAgent (GPT) to create a summary and bullet points of the news and information gathered, providing insights into market sentiment and potential trends.
- Use CommsAgent (TwilioEmailWriter) to send a summary report of the analysis to the appropriate stakeholders, helping them make informed decisions about their investments.
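The financial metrics the ToolsAgent would compute are simple ratios: earnings per share (EPS) is net income divided by shares outstanding, P/E is share price divided by EPS, and dividend yield is the annual dividend per share divided by the share price. A minimal Python sketch follows; the function names are hypothetical, the EPS formula is simplified (it ignores preferred dividends and share-count averaging), and raising an error for non-positive EPS is a design choice, since P/E is commonly reported as not meaningful in that case.

```python
def earnings_per_share(net_income: float, shares_outstanding: float) -> float:
    """Simplified EPS: net income divided by shares outstanding."""
    if shares_outstanding <= 0:
        raise ValueError("shares_outstanding must be positive")
    return net_income / shares_outstanding


def pe_ratio(price: float, eps: float) -> float:
    """Price-to-earnings ratio: share price divided by earnings per share."""
    if eps <= 0:
        raise ValueError("P/E is not meaningful for zero or negative earnings")
    return price / eps


def dividend_yield(price: float, annual_dividend: float) -> float:
    """Annual dividend per share as a fraction of the share price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return annual_dividend / price
```

For a stock priced at 100.0 with EPS of 5.0, `pe_ratio` returns 20.0: investors pay 20 units of price for each unit of annual earnings.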
For an e-commerce site that wants to provide excellent customer service, it is crucial to have a chatbot that can handle customer inquiries and support requests in a timely and efficient manner. However, building a chatbot that can understand and respond to complex customer requests can be challenging. That's why the e-commerce site wants to use AgentChain to automate its chatbot and provide superior customer service with the help of various agents.
Here is how AgentChain can help by chaining different agents together:
- Use MultiModalAgent (Blip2, Whisper) to handle input from various modalities (text, image, audio), making it easier for customers to ask questions and make requests in a natural way.
- Use SearchAgent (Google Search API, Wikipedia API) or DBAgent to provide information about products or services, whether from in-house or public sources, such as specifications, pricing, and availability.
- Use CommsAgent (TwilioMessenger) to communicate with customers via messaging platforms, providing support and answering questions in real time.
- Use ToolsAgent (Math) to perform calculations related to discounts, taxes, or shipping costs, helping customers make informed decisions about their purchases.
- Use MultiModalAgent (Coqui) to generate natural-sounding spoken responses and hold more complex conversations, providing a personalized and engaging experience for customers.
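The discount, tax, and shipping arithmetic from the ToolsAgent (Math) bullet is easy to get wrong with binary floating point, so money is usually handled with decimal types. The sketch below uses Python's `decimal` module and assumes one common convention (discount applied first, tax charged on the discounted subtotal, shipping untaxed, amounts rounded half-up to the cent); actual rules vary by jurisdiction and store, and `order_total` is a hypothetical name.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


def order_total(subtotal: Decimal, discount_rate: Decimal,
                tax_rate: Decimal, shipping: Decimal) -> Decimal:
    """Apply the discount, tax the discounted amount, then add shipping."""
    # Round each intermediate amount to the cent, as a receipt would show it.
    discounted = (subtotal * (1 - discount_rate)).quantize(CENT, ROUND_HALF_UP)
    tax = (discounted * tax_rate).quantize(CENT, ROUND_HALF_UP)
    return discounted + tax + shipping
```

For a 100.00 subtotal with a 10% discount, 8% tax, and 5.00 shipping, the result is 90.00 + 7.20 + 5.00 = 102.20. Passing rates as strings such as `Decimal("0.08")` avoids importing float rounding errors into the calculation.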
Access to personal health assistance can be expensive and limited. It is essential to have a personal health assistant that can help individuals manage their health and well-being. However, providing personalized health advice and reminders can be challenging, especially for seniors. That's why AgentChain aims to automate the health assistant process and provide personalized support with the help of various agents.
Here is how AgentChain can help by chaining different agents together:
- Use DBAgent to handle input from various health monitoring devices (e.g., heart rate monitors, blood pressure monitors, sleep trackers), providing real-time health data and alerts to the health assistant.
- Use SearchAgent (Google Search API, Wikipedia API) or any other medical database to provide information about health topics and medications, such as side effects, dosage, and interactions.
- Use NLPAgent (GPT) to generate personalized recommendations for diet, exercise, and medication, taking into account the seniors' health goals and preferences.
- Use CommsAgent (TwilioCaller, TwilioMessenger) to give advice, send reminders, and provide alerts that help seniors stay on track with their health goals, improving their quality of life and reducing the need for emergency care.
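Reminders like those sent by the CommsAgent reduce to a schedule of times plus a message. This Python sketch computes evenly spaced reminder times and formats a text message; it is not AgentChain code, the helper names and message wording are hypothetical, and in practice the interval and count would come from the person's prescription or care plan rather than being hard-coded.

```python
from datetime import datetime, timedelta
from typing import List


def reminder_times(first_dose: datetime, every_hours: int, count: int) -> List[datetime]:
    """Return `count` reminder times spaced `every_hours` apart from `first_dose`."""
    if every_hours <= 0 or count < 0:
        raise ValueError("every_hours must be positive and count non-negative")
    step = timedelta(hours=every_hours)
    return [first_dose + i * step for i in range(count)]


def format_reminder(medication: str, when: datetime) -> str:
    """Build the text a messaging agent could send (wording is illustrative)."""
    return f"Reminder: take {medication} at {when:%H:%M} on {when:%Y-%m-%d}."
```

Starting at 08:00 with an 8-hour interval, three reminders fall at 08:00, 16:00, and 00:00 the next day; `timedelta` arithmetic handles the date rollover automatically.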
We appreciate the following open-source projects:
Hugging Face, LangChain, Stable Diffusion, ControlNet, InstructPix2Pix, CLIPSeg, BLIP, Microsoft