Creating an Audio to Text Converter with Databutton and OpenAI Whisper
A simple step-by-step walkthrough on creating an audio file uploader using Databutton, storing the audio files, and converting them to text using the OpenAI Whisper model.
Create an Audio File Uploader
Create APIs (Python Backends)
Add the API to the UI Component
Create an Audio File Uploader
✍🏼 Prompt : Can you create an audio file uploader where I can upload .mp3 files
Databutton creates a UI component named AudioFileUploader .
The 'Upload' button still needs functionality wired to it.
Create APIs (Python backends)
Functionalities
Store the audio file from the frontend in the database.
Process and translate the audio file.
Storing the Audio file from the frontend to database
✍🏼 Prompt : You will have an audio file in your frontend. Store that audio file in Databutton's storage and return a unique key which can later be used to fetch it back from the storage. Store the audio file in binary format using Databutton's SDK.
Note : We're currently using Databutton's default storage to store the audio files. However, other storage services such as Firebase can also be used; Firebase is recommended for storing audio files at scale due to its scalability.
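Under the hood, db.storage.binary behaves like a simple key-value store: put writes bytes under a key, and get returns them. Here is a minimal sketch of that contract, using a plain dict as a stand-in for the SDK (the real db.storage.binary requires a running Databutton project, so the stand-in is an assumption for illustration only):

```python
import uuid

# Stand-in for db.storage.binary: a dict keyed by file name.
_binary_store: dict[str, bytes] = {}

def put(key: str, value: bytes) -> None:
    _binary_store[key] = value

def get(key: str) -> bytes:
    return _binary_store[key]

# Same key scheme as the upload endpoint: UUID4 plus extension,
# so concurrent uploads never collide.
file_key = f"{uuid.uuid4()}.mp3"
put(file_key, b"fake-mp3-bytes")
print(get(file_key) == b"fake-mp3-bytes")  # → True
```

The key doubles as the file name, which is why the upload endpoint appends the .mp3 extension to the UUID.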
API: Store Uploaded Files (Code)
from fastapi import APIRouter, UploadFile, File
from pydantic import BaseModel
import databutton as db
import uuid

# Router for endpoints
router = APIRouter()

class UploadAudioResponse(BaseModel):
    file_key: str

@router.post('/upload-audio', response_model=UploadAudioResponse)
def upload_audio(file: UploadFile = File(...)) -> UploadAudioResponse:
    # Generate a unique key for the file with a .mp3 extension
    file_key = f"{uuid.uuid4()}.mp3"
    # Read the file content
    file_content = file.file.read()
    # Store the file in Databutton's storage in binary format
    db.storage.binary.put(file_key, file_content)
    # Return the unique key
    return UploadAudioResponse(file_key=file_key)
Process and Translate the Audio File
✍🏼 Prompt : I would like you to use the OpenAI Whisper model and perform transcription of an audio file. The audio file is stored in Databutton's storage. The file can be accessed via a unique key, which will be the input.
The output needs to be the transcription as text, performed by the OpenAI model.
Next, Databutton will define the Pydantic models (input/output parameters) and ask for the OpenAI API key.
Once the API key is provided, Databutton proceeds to generate a functional API endpoint.
Error Handling and Debugging Prompts:
Databutton might need some additional guidance on how to handle the stored file and pass its path in the supported format. Here's a suggested prompt:
Prompt : I would like you to fetch the data from storage using the Databutton SDK. Then use a temp path which will be passed to the OpenAI LLM. Also remember that the OpenAI SDK requires the temporary file to be opened in binary mode
# Save the audio file to a temporary file with a recognized format
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_audio_file:
    temp_audio_file.write(audio_file)
    temp_audio_file_path = temp_audio_file.name

# Open the temporary file in binary mode and pass the file object to the Whisper model
with open(temp_audio_file_path, "rb") as file:
    transcription = client.audio.transcriptions.create(model="whisper-1", file=file)
    print(transcription)
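The temp-file pattern in this snippet (write the stored bytes to a named file with a recognized extension, then reopen it in binary mode) can be verified in isolation. This sketch substitutes dummy bytes for the stored audio and skips the Whisper call entirely:

```python
import os
import tempfile

audio_file = b"dummy mp3 payload"  # stands in for db.storage.binary.get(key)

# Write to a temp file with a recognized extension; delete=False keeps the
# file on disk after the context manager closes it.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_audio_file:
    temp_audio_file.write(audio_file)
    temp_audio_file_path = temp_audio_file.name

# Reopen in binary mode, which is the form the OpenAI SDK expects
# for the file object.
with open(temp_audio_file_path, "rb") as f:
    round_tripped = f.read()

os.remove(temp_audio_file_path)  # clean up, just as the endpoint does
print(round_tripped == audio_file)  # → True
```

The delete=False flag matters: without it, the file would be removed as soon as the first with block exits, before it can be reopened.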
API: Process and Translate Audio Files (Code)
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
import databutton as db
from openai import OpenAI
import os
import tempfile

# Router for endpoints
router = APIRouter()

# Initialize OpenAI client
client = OpenAI(api_key=db.secrets.get("OPENAI_API_KEY"))

class TranscriptionRequest(BaseModel):
    storage_key: str

class TranscriptionResponse(BaseModel):
    transcription_text: str

@router.post("/transcribe-audio", response_model=TranscriptionResponse)
def transcribe_audio(request: TranscriptionRequest) -> TranscriptionResponse:
    try:
        # Retrieve the audio file from storage
        audio_file = db.storage.binary.get(request.storage_key)
        # Save the audio file to a temporary file with a recognized format
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_audio_file:
            temp_audio_file.write(audio_file)
            temp_audio_file_path = temp_audio_file.name
        # Print the temporary audio file path for debugging
        print(f"Temporary audio file path: {temp_audio_file_path}")
        # Open the temporary file in binary mode and pass the file object to the Whisper model
        with open(temp_audio_file_path, "rb") as file:
            transcription = client.audio.transcriptions.create(model="whisper-1", file=file)
        # Clean up the temporary file
        os.remove(temp_audio_file_path)
        # Extract the transcription text
        transcription_text = transcription.text
        print(f"Transcription: {transcription_text}")
        # Return the transcription text
        return TranscriptionResponse(transcription_text=transcription_text)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e)) from e
Add the API to the UI component
Integrating the "Store Uploaded Files" API
✍🏼 Prompt :
Can you integrate the #store_audio_file API .
This API gets triggered when the Upload button is pressed
Adding a new button and integrating the process and translate API
✍🏼 Prompt :
Can you also add a button called Translate .
This Translate button will trigger the #process_audio_file API
Import the AudioFileUploader UI Component to Home Page of the App
✍🏼 Prompt : Import the #AudioFileUploader component here in this main page
Further, the main home page of the app can be polished. Here's what the main UI code looks like.