
vdutts7/cs186-ai-chat


CS186 AI Chatbot

CS186 AI Chatbot ~ trained on official course website

screen-recording.mp4

Table of Contents

    📝 About
    💻 How to build
    🚀 Next steps
    🔧 Tools used
    👤 Contact

📝 About

A more natural way to help students study for exams, review weekly content, and tailor their studying to their preferences, for example by generating problems similar to ones they have already seen. Trained on the weekly Notes. CS186 students, staff, and more generally anyone can clone this repo and adapt it to their liking.

UC Berkeley 🐻🔵🟡 • CS186: Introduction to Database Systems • Spring 2023


💻 How to build

Note: these instructions are written for macOS; adjust accordingly for Windows / Linux.

Initial setup

Clone the repo and install dependencies.

git clone https://github.com/vdutts7/cs186-ai-chat
cd cs186-ai-chat
pnpm install

Create a .env file and add your API keys (refer to .env.local.example for the template):

OPENAI_API_KEY=""
NEXT_PUBLIC_SUPABASE_URL=""
NEXT_PUBLIC_SUPABASE_ANON_KEY=""
SUPABASE_SERVICE_ROLE_KEY=""

Get API keys from your OpenAI account and from your Supabase project's API settings.

IMPORTANT: Verify that .gitignore lists .env so your keys are never committed.
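
A missing or blank key tends to surface later as a confusing runtime error, so it can help to check the four variables up front. The following is a minimal sketch, not part of this repo: the key names come from the .env template above, while findMissingEnvKeys and assertEnvConfigured are hypothetical helper names.

```typescript
// Key names taken from the .env template above.
const REQUIRED_ENV_KEYS = [
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
] as const;

type EnvSource = Record<string, string | undefined>;

// Hypothetical helper: returns the required keys that are unset or blank.
function findMissingEnvKeys(env: EnvSource): string[] {
  return REQUIRED_ENV_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical helper: throws one readable error listing everything missing.
function assertEnvConfigured(env: EnvSource): void {
  const missingKeys = findMissingEnvKeys(env);
  if (missingKeys.length > 0) {
    throw new Error(`Missing environment variables: ${missingKeys.join(", ")}`);
  }
}

// In a Node script this would typically be called as:
//   assertEnvConfigured(process.env);
```

Calling the check at the top of a script such as scrape-embed.ts would stop the run before any network requests are made.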

Prepare Supabase environment

I used Supabase as my vectorstore. Alternatives: Pinecone, Qdrant, Weaviate, Chroma, etc

You should have already created a Supabase project to get your API keys. Inside the project's SQL editor, create a new query and run the contents of schema.sql. You should now have a documents table with 4 columns.

Embedding and upserting

Inside the config folder is class-website-urls.ts. Modify it to your liking. The project is set up to handle pages that share a consistent HTML/CSS structure, which are scraped using cheerio, a jQuery-style HTML parsing package. Modify /utils/custom_web_loader.ts to control which elements of each page (chosen by CSS selector) have their text scraped.
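
When editing the URL list, duplicates and relative paths are easy to introduce and would waste embedding calls or break the scrape. Below is a minimal sketch of one way to sanity-check the list. It assumes the config is a plain array of URL strings, which may not match the actual file; the example.com addresses are placeholders and cleanUrlList is a hypothetical helper.

```typescript
// Assumed shape for /config/class-website-urls.ts: a plain list of page URLs.
// The example.com addresses are placeholders, not the real course pages.
const classWebsiteUrls: string[] = [
  "https://example.com/notes/week-01",
  "https://example.com/notes/week-02/",
  "https://example.com/notes/week-01", // duplicate, dropped below
  "notes/week-03", // not an absolute URL, dropped below
];

// Hypothetical helper: keeps absolute http(s) URLs, strips trailing slashes,
// and removes duplicates while preserving the original order.
function cleanUrlList(urls: string[]): string[] {
  const seen: Record<string, true> = {};
  const cleaned: string[] = [];
  for (const raw of urls) {
    const url = raw.trim().replace(/\/+$/, "");
    if (!/^https?:\/\//i.test(url)) continue;
    if (seen[url]) continue;
    seen[url] = true;
    cleaned.push(url);
  }
  return cleaned;
}

const urlsToScrape = cleanUrlList(classWebsiteUrls);
```

Here urlsToScrape ends up holding the two distinct week-01 and week-02 addresses.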

Manually run scrape-embed.ts from the scripts folder OR run the package script from terminal:

npm run scrape-embed

This is a one-time process and, depending on the size of the data, it can take up to a few minutes. Check the documents table in your Supabase project and you should see rows populated with the embeddings that were just created.

Technical explanation

The scrape-embed.ts script:

  • Retrieves URLs from /config/class-website-urls.ts and extracts the page text via cheerio, as specified in /utils/custom_web_loader.ts.
  • Vectorizes the extracted text using OpenAI's Embeddings API (text-embedding-ada-002). This produces 1536-dimensional vectors, which are well suited to cosine similarity search.
  • Upserts the embeddings into documents (Supabase vectorstore). The upsert operation inserts new rows and overwrites existing rows.
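
To make the last two steps concrete, here is a small in-memory sketch of upsert semantics and cosine similarity ranking. It is an illustration only: the real project delegates both to Supabase, the EmbeddedChunk shape is an assumption rather than the columns defined in schema.sql, and the vectors have 3 dimensions instead of 1536 to keep the example readable.

```typescript
// Assumed row shape for illustration; the real columns come from schema.sql.
interface EmbeddedChunk {
  id: string;
  content: string;
  embedding: number[]; // 1536 numbers for text-embedding-ada-002, 3 here
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Upsert: insert the row if its id is new, otherwise overwrite the stored row.
function upsertChunk(store: Record<string, EmbeddedChunk>, chunk: EmbeddedChunk): void {
  store[chunk.id] = chunk;
}

// Rank stored rows by similarity to a query embedding and keep the best k.
function topKSimilar(
  store: Record<string, EmbeddedChunk>,
  query: number[],
  k: number
): EmbeddedChunk[] {
  return Object.keys(store)
    .map((id) => store[id])
    .sort(
      (x, y) =>
        cosineSimilarity(y.embedding, query) - cosineSimilarity(x.embedding, query)
    )
    .slice(0, k);
}
```

Because upsertChunk keys on id, re-running the embedding step for an unchanged page replaces its row rather than adding a duplicate, which is the behavior the upsert step above relies on.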

visualized-flow-chart

Run app

npm run dev

Go to http://localhost:3000. You should be able to type and ask questions now. Done ✅

🚀 Next steps

Deploy

I used Vercel as this was a small project.

Alternatives: Heroku, Firebase, AWS Elastic Beanstalk, DigitalOcean, etc.

Customizations

UI/UX: change to your liking.

Bot behavior: edit the prompt template in /utils/makechain.ts to tune the bot's outputs and gain greater control over them.
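
As a rough idea of what a prompt template edit can look like, here is a minimal sketch. The template wording, the {context} and {question} placeholder names, and the fillTemplate helper are all assumptions for illustration; the actual prompt and its variables are whatever /utils/makechain.ts defines.

```typescript
// Hypothetical template text; the real prompt lives in /utils/makechain.ts.
const QA_PROMPT_TEMPLATE = [
  "You are a study assistant for CS186: Introduction to Database Systems.",
  "Answer using only the context below. If the answer is not in the context, say you are not sure.",
  "",
  "Context:",
  "{context}",
  "",
  "Question: {question}",
  "Answer:",
].join("\n");

// Hypothetical helper: replaces each {name} placeholder with its value and
// throws if the template references a value that was not supplied.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error(`No value provided for placeholder "${key}"`);
    }
    return values[key];
  });
}

const examplePrompt = fillTemplate(QA_PROMPT_TEMPLATE, {
  context: "A B+ tree keeps all data entries in its leaf nodes.",
  question: "Where does a B+ tree store data entries?",
});
```

Tightening the instructions at the top of the template (tone, answer length, what to do when the context is insufficient) is usually the quickest way to change how the bot responds.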

Data: change the URLs in /config/class-website-urls.ts to handle whatever pages you want.


🔧 Built With

Next.js, TypeScript, LangChain, OpenAI, cheerio, Supabase, Tailwind CSS, Vercel


👤 Contact

[email protected]

🔗 Project Link: https://github.com/vdutts7/cs186-ai-chat
