โœ๏ธCreating an Audio to Text Converter with Databutton and OpenAI Whisper

A simple step-by-step walkthrough on creating an audio file uploader using Databutton, storing the audio files, and converting them to text using the OpenAI Whisper model.

  1. Create an Audio File Uploader

  2. Create APIs (Python Backends)

  3. Add the API to the UI Component

Create an Audio File Uploader

โœ๐Ÿผ Prompt : Can you create an audio file uploader where I can upload .mp3 files

Databutton creates a UI component named AudioFileUploader.

The 'Upload' button still needs functionality behind it.

Create APIs (Python backends)

Functionalities

  • Store the audio file from the frontend in the database.

  • Process and translate the audio file.

Storing the Audio File from the Frontend in the Database

โœ๐Ÿผ Prompt : You will have an audio file in your frontend. Store that audio file over databutton's storage and pass an unique key which can be later used to fetch it back from the storage. Store the audio file in binary format using Databutton's SDK.

Note: We're currently using Databutton's default storage for the audio files. However, other storage services such as Firebase can also be used; Firebase is recommended for storing audio files at scale.
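Conceptually, Databutton's binary storage behaves like a key-value store: each file is saved under a unique key and fetched back with that same key. The stand-in below is an illustration only, not the real SDK (the actual calls are `db.storage.binary.put(key, data)` and `db.storage.binary.get(key)`):

```python
import uuid

# In-memory stand-in for a binary key-value store (illustration only;
# the real calls are db.storage.binary.put(key, data) / .get(key)).
_store: dict[str, bytes] = {}


def put_audio(data: bytes) -> str:
    """Store binary audio data under a fresh unique key and return the key."""
    key = f"{uuid.uuid4()}.mp3"
    _store[key] = data
    return key


def get_audio(key: str) -> bytes:
    """Fetch the binary data back using the key returned by put_audio."""
    return _store[key]
```

The unique key is what the frontend holds on to between the upload step and the transcription step.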

API: Store Uploaded Files (Code)
from fastapi import APIRouter, UploadFile, File
from pydantic import BaseModel
import databutton as db
import uuid

# Router for endpoints
router = APIRouter()

class UploadAudioResponse(BaseModel):
    file_key: str

@router.post('/upload-audio', response_model=UploadAudioResponse)
def upload_audio(file: UploadFile = File(...)) -> UploadAudioResponse:
    # Generate a unique key for the file with a .mp3 extension
    file_key = f"{uuid.uuid4()}.mp3"
    
    # Read the file content
    file_content = file.file.read()
    
    # Store the file in Databutton's storage in binary format
    db.storage.binary.put(file_key, file_content)
    
    # Return the unique key
    return UploadAudioResponse(file_key=file_key)

Process and Translate the Audio File

โœ๐Ÿผ Prompt : I would like you to use OpenAI whisper model and perfrom trsanscripion of an audio file. The audio file is stored in the Databutton's storage. The file can be accessed via an unique key which would be the input.

The output needs to be the transcription as a text performed by the OpenAI model.

Next, Databutton will define the Pydantic models (input/output parameters) and ask for the OpenAI API key.

Once the API key is provided, Databutton proceeds to generate a functional API endpoint.

Error Handling and Debugging Prompts:

Databutton might need some additional guidance on how to handle the stored file and pass the file path in a supported format. Here's a suggested prompt:

Prompt: I would like you to fetch the data from storage using the Databutton SDK. Then use a temp path which will be passed to the OpenAI LLM. Also remember that the OpenAI SDK requires opening the temporary file in binary mode.

Prompt breakdown and expected code generation

  • Retrieve the audio file from storage

audio_file = db.storage.binary.get(request.storage_key)
  • Save the audio file to a temporary file with a recognized format

with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_audio_file:
    temp_audio_file.write(audio_file)
    temp_audio_file_path = temp_audio_file.name
  • Open the temporary file in binary mode and pass the file object to the Whisper model

with open(temp_audio_file_path, "rb") as file:
    transcription = client.audio.transcriptions.create(model="whisper-1", file=file)
print(transcription)
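The tempfile round trip described in these steps can be exercised on its own, without the Whisper call: write the raw bytes to a named temporary file with a recognized suffix, reopen that path in binary mode (the file-object shape the OpenAI SDK expects), and clean up afterwards. A stdlib-only sketch:

```python
import os
import tempfile


def roundtrip_via_tempfile(audio_bytes: bytes) -> bytes:
    """Write bytes to a .mp3-suffixed temp file, reopen in binary mode, clean up."""
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        # Reopen in binary mode, as the OpenAI SDK requires for the file argument.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)  # always delete the temp file, even if reading fails
```

`delete=False` matters: the file is closed when the `with` block ends, and the path is reopened afterwards, which also works on platforms that disallow reopening an open temporary file.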
API: Process and Translate Audio Files (Code)
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
import databutton as db
from openai import OpenAI
import os
import tempfile

# Router for endpoints
router = APIRouter()

# Initialize OpenAI client
client = OpenAI(api_key=db.secrets.get("OPENAI_API_KEY"))


class TranscriptionRequest(BaseModel):
    storage_key: str


class TranscriptionResponse(BaseModel):
    transcription_text: str


@router.post("/transcribe-audio", response_model=TranscriptionResponse)
def transcribe_audio(request: TranscriptionRequest) -> TranscriptionResponse:
    try:
        # Retrieve the audio file from storage
        audio_file = db.storage.binary.get(request.storage_key)

        # Save the audio file to a temporary file with a recognized format
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_audio_file:
            temp_audio_file.write(audio_file)
            temp_audio_file_path = temp_audio_file.name

        # Print the temporary audio file path for debugging
        print(f"Temporary audio file path: {temp_audio_file_path}")
        # Open the temporary file in binary mode and pass the file object to the Whisper model
        with open(temp_audio_file_path, "rb") as file:
            transcription = client.audio.transcriptions.create(model="whisper-1", file=file)
        print(transcription)
        # Clean up the temporary file
        os.remove(temp_audio_file_path)

        # Extract the transcription text
        transcription_text = transcription.text
        print(f"Transcription: {transcription_text}")

        # Return the transcription text
        return TranscriptionResponse(transcription_text=transcription_text)

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e)) from e
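One hedge worth noting about the endpoint above: `os.remove` is only reached when transcription succeeds, so a failed Whisper call leaves the temp file behind. Wrapping the work in `try`/`finally` guarantees cleanup. The sketch below uses a placeholder `transcribe` callable standing in for the real `client.audio.transcriptions.create` call:

```python
import os
import tempfile


def transcribe_with_cleanup(audio_bytes: bytes, transcribe) -> str:
    """Run transcribe(path) on a temp copy of the audio; always delete the temp file.

    `transcribe` is a placeholder for the real Whisper call, e.g. a function
    that opens the path in "rb" mode and calls client.audio.transcriptions.create.
    """
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        return transcribe(path)
    finally:
        os.remove(path)  # runs even when transcribe() raises
```

The same `try`/`finally` shape can be folded into the endpoint so the `HTTPException` path no longer leaks temporary files.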

Add the API to the UI component

  • Integrating the "Store Uploaded Files" API

โœ๐Ÿผ Prompt :
Can you integrate #store_audio_file API . 
This api get's triggered when the Upload button is pressed
  • Adding a new button and integrating the process and translate API

โœ๐Ÿผ Prompt :
Can you also add a button called Translate ..

This Translate button will trigger #process_audio_file API

Import the AudioFileUploader UI Component to Home Page of the App

โœ๐Ÿผ Prompt : Import the #AudioFileUploader component here in this main page

Further, the main home page of the app can be polished. Here's what the main UI code looks like.

Final UI (Code)
import { Box, Container, Heading, VStack } from "@chakra-ui/react";

import { AudioFileUploader } from "../components/AudioFileUploader";

export default function App() {
  return (
    <VStack
      spacing={4}
      justify="center"
      align="center"
      height="100vh"
      bg="black"
    >
      <Heading as="h1" size="2xl" my={2} textAlign="center" color="#FFFFFF">
        Audio Converter
      </Heading>
      <Container maxW="container.md" centerContent overflowY="auto">
        <Box
          p={8}
          borderRadius="lg"
          boxShadow="lg"
          bgImage="url('https://images.unsplash.com/reserve/LJIZlzHgQ7WPSh5KVTCB_Typewriter.jpg?q=80&w=3096&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D')"
          bgSize="cover"
          bgPosition="center"
          bgColor="rgba(255, 255, 255, 0.8)"
          bgBlendMode="lighten"
        >
          <Heading
            as="h3"
            size="md"
            my={4}
            textAlign="center"
            color="black"
            fontWeight="normal"
          ></Heading>
          <AudioFileUploader />
        </Box>
      </Container>
    </VStack>
  );
}
