This repository contains a collection of notebooks for gaining insights into presentation slides with multimodal AI models. The goal is to compare how well different models summarize the content of presentation slides. The slides are not passed to text-to-text models as extracted text; instead, they are passed as images to image-to-text (multimodal) models.
To access the models, a free service from GitHub is used.
Be aware that there are certain rate limits for each model!
As a first test PDF, I3D:bio's training material 'WhatIsOMERO.pdf' (Schmidt, C., Bortolomeazzi, M. et al., 2023) is used.
Make sure to generate a developer key / personal access token on GitHub and set it as an environment variable. You can generate the token on the GitHub website under your user settings. Afterwards, set it for your current session with the command that matches your shell (bash, PowerShell, or Windows Command Prompt, in that order):
export GITHUB_TOKEN="your-github-token-goes-here"
$Env:GITHUB_TOKEN = "your-github-token-goes-here"
set GITHUB_TOKEN=your-github-token-goes-here
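Before opening a notebook, it can help to confirm that the token is actually visible to Python. The following is a minimal sketch, not code taken from the notebooks; the helper name `get_github_token` is hypothetical. It strips surrounding whitespace because a stray space next to the `=` sign can end up inside the value in some shells.

```python
import os


def get_github_token(env=None):
    """Return the GitHub token from the environment or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    This helper is a hypothetical illustration, not part of the notebooks.
    """
    env = os.environ if env is None else env
    # Strip whitespace that may have slipped in when the variable was set.
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set. Set it in the same shell session "
            "from which the notebook server is started."
        )
    return token
```

Calling `get_github_token()` at the top of a notebook fails early with a readable message instead of an authentication error later on.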
Install the Azure AI Inference SDK:
pip install azure-ai-inference
Set up the model. The first notebook shows how this is done.
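For orientation, the setup might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the first notebook: the endpoint URL, the model name `gpt-4o`, the prompt text, and the helper names `image_to_data_url` and `summarize_slide` are all assumptions, so check the notebook for the values actually used. The slide is expected as image bytes, for example a PNG export of one PDF page.

```python
import base64

# Assumption: endpoint of the GitHub-hosted model service; verify in the notebook.
ENDPOINT = "https://models.inference.ai.azure.com"
# Assumption: any image-capable model offered by the service could be used here.
MODEL_NAME = "gpt-4o"


def image_to_data_url(image_bytes, image_format="png"):
    """Encode raw image bytes as a base64 data URL that can be sent in a message."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"


def summarize_slide(image_bytes, token, model=MODEL_NAME):
    """Send one slide image to the model and return its text summary.

    Imports are local so the encoding helper above works without the SDK.
    """
    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import (
        ImageContentItem,
        ImageUrl,
        TextContentItem,
        UserMessage,
    )
    from azure.core.credentials import AzureKeyCredential

    client = ChatCompletionsClient(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(token),
    )
    response = client.complete(
        model=model,
        messages=[
            UserMessage(
                content=[
                    # Hypothetical prompt; the notebooks may phrase it differently.
                    TextContentItem(text="Summarize the content of this slide."),
                    ImageContentItem(
                        image_url=ImageUrl(url=image_to_data_url(image_bytes))
                    ),
                ]
            )
        ],
    )
    return response.choices[0].message.content
```

Each call to `summarize_slide` counts against the rate limit of the chosen model, so looping over every page of a large PDF may need pauses between requests.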