Demo Whisper / GPT / DallE in the Gradio-powered Web apps

Whisper models allow you to transcribe and translate audio files, using their speech-to-text capabilities.

If you want to know more about the possibilities for Whisper visit this link To discover the languages supported visit this link - Actually the last version is V3 ( in open AI is still in V2 apis call )

There are difference between use Whisper by azure Ai speech or use whisper by Azure open Ai service for different feature visit this link to know the possilities

Extract from the documentation

Dall-E model allow you to transform text to image or prompt to image, so in simple terms it draws pictures based on your description

Cookbook for Dall-E3

We will display how to utilise Whisper models offline or consume them through an Azure endpoint (either from Azure OpenAI or Azure AI Speech resources).

You can access have the access to the portal Azure AI speech on this link

The Table of contents below is wrapped into a functional Web interface, powered by Gradio

Option 0 - Access to Whisper models in offline mode / local

Whisper model can be consumed offline. You may notice differences in its performance on the weaker local computers in comparison to an Azure based deployment. At the same time, this may serve certain scenarios where access to external resources is prohibited or not possible.

To instantiate Web app with offline Whisper functionality, please follow these steps:

Install gradio Python package. This will allow you to define and instantiate a Web app, that will run locally as a Web service.

pip install --upgrade gradio

Install openai-whisper Python package. It comes with a few pre-trained Whisper models of various sizes. E.g. "base" model may require ~1 Gb of RAM, while "large" one would expect ~10 Gb of RAM.

if you want to select other Whisper model inside the notebook you will have to change this line with the correct name

whisper.load_model("base")

pip install --upgrade openai-whisper

Note: You may also require installation of FFMpeg package to make this solution work on your local computer.

pip install ffpmpeg

Launch provided Python script for offline Web app.

python 0_Whisper_Offline.py

If successful, you should be able to access new Web app's interface at http://127.0.0.1:7860/ as shown below. You can now record your speech through the computer's microphone and transcribe it using one of selected Whisper models.

Option 1 - Access to Whisper models via Azure OpenAI endpoint

Whisper models are now available as a part of Azure OpenAI resource. To consume its API endpoint in your Gradio app, please follow these steps:

Create a deployment and Deploy Whisper in available Azure OpenAI region in *The Azure openAi studio
Go to the azure portal to select the deployment.
Copy API endpoint and key details.
Install gradio Python package. This will allow you to define and instantiate a Web app, that will run locally as a Web service.

pip install --upgrade gradio

Install openai Python package. This is the client SDK that your Web app will use to interact with Azure OpenAI endpoint or Open Ai model if you need it

pip install --upgrade openai

Launch provided Python script for a Web app, integrated with Azure OpenAI endpoint.

python 1_Whisper_AOAI_endpoint.py

If successful, you should be able to access new Web app's interface at http://127.0.0.1:7860/ as shown below. You can now record your speech through the computer's microphone and transcribe it using Whisper model enabled in Azure OpenAI.

Option 2 - Access to Whisper models via Azure AI Speech endpoint - URI access

🚧 don't work now

🚧
Whisper models are also available through Azure AI Speech. Using batch API (similar to what is described here), can increase audio file size limit up to 1 Gb.

pip install --upgrade azure-cognitiveservices-speech

You could find some information for using whisper models in Azure speech inside this page

This demo is based on this link

Option 3 - Whisper model / processing GPT and Dall-E generation

This last demo is the more advanced with diffrent tabs and possiblity to customize your choice in the bottom in the page

In your terminal in visual studio code or powershell / cmd

python 3_Whisper_process_dalle.py

You will have the information in terminal cmd

Running on local URL:  http://127.0.0.1:7860

You could open your webbrowser at the localhost adress shown http://127.0.0.1:7860

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
images		images
.gitignore		.gitignore
0_Whisper_Offline.py		0_Whisper_Offline.py
1_Whisper_AOAI_endpoint.py		1_Whisper_AOAI_endpoint.py
2_Whisper_AzureAISpeech_endpoint.py		2_Whisper_AzureAISpeech_endpoint.py
3_Whisper_process_dalle.py		3_Whisper_process_dalle.py
LICENSE		LICENSE
README.md		README.md
azure_env		azure_env
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Demo Whisper / GPT / DallE in the Gradio-powered Web apps

Table of contents:

Option 0 - Access to Whisper models in offline mode / local

Option 1 - Access to Whisper models via Azure OpenAI endpoint

Option 2 - Access to Whisper models via Azure AI Speech endpoint - URI access

Option 3 - Whisper model / processing GPT and Dall-E generation

About

Releases

Packages

Languages

License

olivMertens/AOAI-Whisper-Gradio

Folders and files

Latest commit

History

Repository files navigation

Demo Whisper / GPT / DallE in the Gradio-powered Web apps

Table of contents:

Option 0 - Access to Whisper models in offline mode / local

Option 1 - Access to Whisper models via Azure OpenAI endpoint

Option 2 - Access to Whisper models via Azure AI Speech endpoint - URI access

Option 3 - Whisper model / processing GPT and Dall-E generation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages