A more natural way to help students study for exams, review weekly content, and generate similar practice problems tailored to their preferences. Trained on the weekly notes. CS186 students, staff, and more generally anyone can clone this repo and adjust it to their liking.
UC Berkeley 🐻🔵🟡 • CS186: Introduction to Database Systems • Spring 2023
Note: these instructions are for macOS; adjust accordingly for Windows / Linux.
Clone the repo and install dependencies.
git clone https://github.com/vdutts7/cs186-ai-chat
cd cs186-ai-chat
pnpm install
Create a .env file and add your API keys (refer to .env.local.example for the template):
OPENAI_API_KEY=""
NEXT_PUBLIC_SUPABASE_URL=""
NEXT_PUBLIC_SUPABASE_ANON_KEY=""
SUPABASE_SERVICE_ROLE_KEY=""
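A forgotten or empty key usually only shows up later as a confusing runtime error, so it can help to check the environment up front. The following is a minimal sketch, not part of the repo: `missingEnvKeys` is a hypothetical helper, and the key names are simply the four listed in the template above.

```typescript
// The four keys listed in the .env template above.
const REQUIRED_KEYS = [
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
] as const;

// Hypothetical helper (an assumption, not code from the repo):
// returns the names of required keys that are missing or blank.
function missingEnvKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js script you could call `missingEnvKeys(process.env)` before doing any work and stop with a clear message if the returned list is not empty.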
Get API keys:
IMPORTANT: Verify that .gitignore contains .env.
I used Supabase as my vectorstore. Alternatives: Pinecone, Qdrant, Weaviate, Chroma, etc
You should have already created a Supabase project to get your API keys. Inside the project's SQL editor, create a new query and run schema.sql. You should now have a documents table with 4 columns.
Inside the config folder is class-website-urls.ts. Modify it to your liking. The project is set up to handle HTML pages in a consistent HTML/CSS format, which are then scraped using cheerio, a jQuery-like HTML parsing package. Modify /utils/custom_web_loader.ts to control which CSS elements of the webpages' text you want scraped.
Manually run scrape-embed.ts from the scripts folder OR run the package script from the terminal:
npm run scrape-embed
This is a one-time process and, depending on the size of the data, it can take up to a few minutes. Check the documents table in your Supabase project and you should see rows populated with the embeddings that were just created.
The scrape-embed.ts script:
- Retrieves URLs from /config/class-website-urls.ts and extracts the HTML/CSS data via cheerio as specified in /utils/custom_web_loader.ts
- Vectorizes the data and embeds it into a JSON object using OpenAI's Embeddings (text-embedding-ada-002). This produces several 1536-dimensional vectors optimized for cosine similarity searches.
- Upserts embeddings into documents (Supabase vectorstore). The upsert operation inserts new rows and overwrites existing rows.
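To make the "cosine similarity search" step more concrete, here is a small illustrative sketch. It is not the repo's code: in the real project the search presumably runs inside the vector store, and `StoredChunk`, `cosineSimilarity`, and `topMatches` are hypothetical names. The toy vectors in the usage note have 3 dimensions rather than 1536 so they stay readable.

```typescript
// Hypothetical shape of one stored row: the text plus its embedding vector.
interface StoredChunk {
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks whose embeddings are most similar to the query.
function topMatches(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding)
    )
    .slice(0, k);
}
```

For example, with chunks whose embeddings are `[1, 0, 0]`, `[0, 1, 0]`, and `[0.9, 0.1, 0]`, a query of `[1, 0, 0]` ranks the first and third chunks ahead of the second. A linear scan like this is only for illustration; a vector store would normally use an index for larger tables.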
npm run dev
Go to http://localhost:3000. You should be able to type and ask questions now. Done ✅
I used Vercel as this was a small project.
Alternatives: Heroku, Firebase, AWS Elastic Beanstalk, DigitalOcean, etc.
UI/UX: change to your liking.
Bot behavior: edit the prompt template in /utils/makechain.ts to fine-tune and gain greater control over the bot's outputs.
Data: change URLs to handle whatever pages you want
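As a rough illustration of what editing a prompt template involves, the sketch below fills placeholders in a template string. Everything here is an assumption for illustration: the template wording, the `{context}` and `{question}` placeholder names, and the `fillTemplate` helper are hypothetical, and the actual template in /utils/makechain.ts may look different.

```typescript
// Hypothetical template text; the real one in /utils/makechain.ts may differ.
const EXAMPLE_TEMPLATE = `You are a study assistant for CS186.
Answer using only the context below. If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Answer:`;

// Replaces each {name} placeholder with the matching value.
// Placeholders without a matching value are left untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}
```

Calling `fillTemplate(EXAMPLE_TEMPLATE, { context: "...", question: "..." })` returns the final prompt text. Tightening the instructions at the top of the template (tone, answer length, what to do when the context is insufficient) is where most of the control over the bot's outputs comes from.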
🔗 Project Link: https://github.com/vdutts7/cs186-ai-chat