Creating an Audio to Text Converter with Databutton and OpenAI Whisper
A simple step-by-step walkthrough on creating an audio file uploader using Databutton, storing the audio files, and converting them to text using the OpenAI Whisper model.
✍🏼 Prompt : Can you create an audio file uploader where I can upload .mp3 files?
Databutton creates a UI component named AudioFileUploader.
The 'Upload' button still needs functionality behind it.
Create APIs (Python backends)
Functionalities
Store the audio file from the frontend in the database.
Process the audio file and transcribe it to text.
Storing the Audio File from the Frontend in the Database
✍🏼 Prompt : You will have an audio file in your frontend. Store that audio file in Databutton's storage and pass back a unique key which can later be used to fetch it from storage. Store the audio file in binary format using Databutton's SDK.
Note : We're currently using Databutton's default storage to store the audio files. Other storage services such as Firebase can also be used, and Firebase is recommended for storing audio files because of its scalability.
API : Store Uploaded Files ( Code )
from fastapi import APIRouter, UploadFile, File
from pydantic import BaseModel
import databutton as db
import uuid

# Router for endpoints
router = APIRouter()


class UploadAudioResponse(BaseModel):
    file_key: str


@router.post('/upload-audio', response_model=UploadAudioResponse)
def upload_audio(file: UploadFile = File(...)) -> UploadAudioResponse:
    # Generate a unique key for the file with .mp3 extension
    file_key = f"{uuid.uuid4()}.mp3"

    # Read the file content
    file_content = file.file.read()

    # Store the file in Databutton's storage in binary format
    db.storage.binary.put(file_key, file_content)

    # Return the unique key
    return UploadAudioResponse(file_key=file_key)
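The store-then-fetch pattern behind this endpoint can be tried locally without Databutton. The following is only a sketch: InMemoryBinaryStorage and store_audio are hypothetical names, and the class is a dict-backed stand-in that mimics the put/get calls used on db.storage.binary above, not Databutton's actual storage.

```python
import uuid


class InMemoryBinaryStorage:
    """Hypothetical stand-in for db.storage.binary, for local experiments only."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blobs[key] = value

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def store_audio(storage: InMemoryBinaryStorage, content: bytes) -> str:
    """Store raw audio bytes under a fresh UUID-based key and return that key."""
    file_key = f"{uuid.uuid4()}.mp3"
    storage.put(file_key, content)
    return file_key


# Round trip: the key returned at upload time is all that is needed to fetch the bytes.
storage = InMemoryBinaryStorage()
key = store_audio(storage, b"fake mp3 bytes")
fetched = storage.get(key)
```

Because each key comes from uuid.uuid4(), two uploads of the same file get different keys, so the frontend has to keep the returned key in order to request a transcription later.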
Process and Translate the Audio File
✍🏼 Prompt : I would like you to use the OpenAI Whisper model to perform transcription of an audio file. The audio file is stored in Databutton's storage. The file can be accessed via a unique key, which would be the input.
The output needs to be the transcription, as text, produced by the OpenAI model.
Next, Databutton will define the Pydantic models (input/output parameters) and ask for the OpenAI API key.
Once the API key is provided, Databutton proceeds to generate a functional API endpoint.
Error Handling and Debugging Prompts:
Databutton might need some additional guidance on how to handle the stored file and pass the file path in a supported format. Here's a suggested prompt:
Prompt : I would like you to fetch the data from storage using the Databutton SDK. Then use a temp path which will be passed to the OpenAI model. Also remember that the OpenAI SDK requires the temporary file to be opened in binary mode.
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
import databutton as db
from openai import OpenAI
import os
import tempfile

# Router for endpoints
router = APIRouter()

# Initialize OpenAI client
client = OpenAI(api_key=db.secrets.get("OPENAI_API_KEY"))


class TranscriptionRequest(BaseModel):
    storage_key: str


class TranscriptionResponse(BaseModel):
    transcription_text: str


@router.post("/transcribe-audio", response_model=TranscriptionResponse)
def transcribe_audio(request: TranscriptionRequest) -> TranscriptionResponse:
    try:
        # Retrieve the audio file from storage
        audio_file = db.storage.binary.get(request.storage_key)

        # Save the audio file to a temporary file with a recognized format
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_audio_file:
            temp_audio_file.write(audio_file)
            temp_audio_file_path = temp_audio_file.name

        # Print the temporary audio file path for debugging
        print(f"Temporary audio file path: {temp_audio_file_path}")

        # Open the temporary file in binary mode and pass the file object to the Whisper model
        with open(temp_audio_file_path, "rb") as file:
            transcription = client.audio.transcriptions.create(model="whisper-1", file=file)
        print(transcription)

        # Clean up the temporary file
        os.remove(temp_audio_file_path)

        # Extract the transcription text
        transcription_text = transcription.text
        print(f"Transcription: {transcription_text}")

        # Return the transcription text
        return TranscriptionResponse(transcription_text=transcription_text)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e)) from e
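In the endpoint above, os.remove runs only when the transcription call succeeds, so a failed request can leave the temporary file behind. One possible hardening, shown here as a sketch rather than as what Databutton generates, is to wrap the temp-file step in a context manager that always cleans up. The helper name temp_audio_path is hypothetical, and the Whisper call is replaced by a plain binary read so the example runs without an API key.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temp_audio_path(data: bytes, suffix: str = ".mp3") -> Iterator[str]:
    """Write bytes to a named temp file, yield its path, and always delete it."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        yield path
    finally:
        # Runs even if the code using the path raises an exception.
        os.remove(path)


with temp_audio_path(b"fake mp3 bytes") as path:
    # In the endpoint, this is where the file would be opened in binary
    # mode and passed to client.audio.transcriptions.create(...).
    with open(path, "rb") as audio:
        payload = audio.read()
    existed_inside = os.path.exists(path)

exists_after = os.path.exists(path)
```

The file is closed before its path is yielded, which matches the original flow of writing first and then reopening the file with "rb" for the OpenAI SDK.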
Add the API to the UI component
Integrating the "Store Uploaded Files" API
✍🏼 Prompt :
Can you integrate the #store_audio_file API?
This API gets triggered when the Upload button is pressed.
Adding a new button and integrating the process and translate API
✍🏼 Prompt :
Can you also add a button called Translate?
This Translate button will trigger the #process_audio_file API.
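The two buttons chain the two endpoints: Upload returns a file_key, and Translate has to send that same value back under the name storage_key, which is the field the transcription endpoint expects. Databutton generates the actual frontend wiring, so the following is only a Python sketch of the request the Translate button would need to produce. BASE_URL and build_transcribe_request are hypothetical; the path and field names are taken from the code listings above.

```python
import json
import urllib.request

# Hypothetical base URL; replace with wherever the app's API is served.
BASE_URL = "http://localhost:8000"


def build_transcribe_request(upload_response: dict) -> urllib.request.Request:
    """Build the POST request for /transcribe-audio from an upload response."""
    # The upload endpoint calls the key "file_key"; the transcription
    # endpoint expects the same value as "storage_key".
    body = json.dumps({"storage_key": upload_response["file_key"]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/transcribe-audio",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Nothing is sent here; the request object is only constructed.
request = build_transcribe_request({"file_key": "example-key.mp3"})
```

Passing the request to urllib.request.urlopen would send it, provided the API is actually running at the assumed address.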
Import the AudioFileUploader UI Component into the Home Page of the App
✍🏼 Prompt : Import the #AudioFileUploader component here in this main page
Further, the main home page of the app can be polished. Here's what the main UI code looks like.