Transcribe Audio to Text with Deepgram, Node.js, and React

Tarun Jain · Published in Dev Genius · 4 min read · Apr 4, 2024

Intro

Deepgram is a leading provider of automatic speech recognition (ASR) technology. Its platform uses deep learning models to transcribe audio into text with high accuracy, and it is built to handle varied accents, languages, and audio qualities, which makes it suitable for a wide range of applications, from transcription services to voice-controlled apps.

Building the Node.js Backend —

Installing Dependencies: Use npm (Node Package Manager) to install the dependencies the backend needs. Note that cors is included because the Express server below enables it:

npm install express cors @deepgram/sdk dotenv multer
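The Deepgram client configured later reads its API key from an environment variable, so it helps to create a .env file up front. A minimal sketch (the key value is a placeholder, not a real credential):

```shell
# Create a .env file in the project root holding the Deepgram API key.
# DEEPGRAM_API_KEY is the variable name the backend code expects;
# replace the placeholder value with a key from your Deepgram console.
cat > .env <<'EOF'
DEEPGRAM_API_KEY=your_api_key_here
EOF

# Confirm the variable is present in the file
grep DEEPGRAM_API_KEY .env
```

Remember to add .env to .gitignore so the key is never committed.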

Setting up a Basic Express Server: In your Node.js application, create a basic Express server by importing Express and defining routes. For example:

import express from "express";
import cors from "cors";
import multer from "multer";

import getTranscript from "./deepgram.js";

const app = express();
const PORT = 3000;

// Allow the React dev server (a different origin) to call this API
app.use(cors());

const storage = multer.memoryStorage(); // Store uploaded files in memory
const upload = multer({ storage: storage });

// Accept a single "file" field and hand the request to the transcription handler
app.post("/getTranscript", upload.single("file"), getTranscript);

// Start the server
app.listen(PORT, () => {
  console.log(`Server is running on http://localhost:${PORT}`);
});

Integrating Deepgram’s API for Speech-to-Text: Use Deepgram’s SDK to integrate their API for speech-to-text conversion. First, create a Deepgram client using your API key:

import { createClient } from "@deepgram/sdk";
import dotenv from "dotenv";
dotenv.config();

const deepgramSDK = createClient(process.env.DEEPGRAM_API_KEY);

export { deepgramSDK };

Then, use the client to transcribe audio. The route handler below reads the uploaded file's buffer from multer and passes it to the SDK:

import { deepgramSDK } from "./config.js"; // The Deepgram client created above

const getTranscript = async (request, response) => {
  try {
    const fileData = request.file.buffer; // The uploaded 'file' from FormData

    // Transcription options; "sv" is Swedish — replace with your target language
    const options = {
      language: "sv",
      model: "enhanced",
    };

    // Transcribe the in-memory file using the Deepgram SDK
    const transcript = await deepgramSDK.listen.prerecorded.transcribeFile(
      fileData,
      options
    );

    // Send the full SDK result back as the response
    response.status(200).json({ transcript });
  } catch (error) {
    console.error("Error:", error.message);
    response
      .status(500)
      .json({ error: "An error occurred while transcribing the file." });
  }
};

export default getTranscript;
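The SDK call resolves to an object that nests the transcript text fairly deeply, and since the handler above returns that object verbatim, the client has to walk result.results.channels[0].alternatives[0].transcript. A minimal sketch of that extraction against a mocked response shape (the mock values and the extractText helper name are illustrative, not real API output):

```javascript
// Mocked shape of what the backend sends back as `transcript`
// (values are illustrative placeholders, not real Deepgram output).
const mockTranscript = {
  result: {
    results: {
      channels: [
        {
          alternatives: [{ transcript: "hej och välkommen", confidence: 0.97 }],
        },
      ],
    },
  },
};

// Walk the nested structure defensively, returning "" if any level is missing.
function extractText(transcript) {
  return (
    transcript?.result?.results?.channels?.[0]?.alternatives?.[0]?.transcript ??
    ""
  );
}

console.log(extractText(mockTranscript)); // "hej och välkommen"
console.log(extractText({})); // ""
```

Guarding each level with optional chaining keeps the frontend from crashing when transcription fails and the nested fields are absent.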

Building the React Frontend —

For the frontend, we use React to create a component that allows users to upload audio files and receive a transcript. Scaffold a new project with Vite:

# npm
npm init vite@latest my-react-app -- --template react

or

# yarn
yarn create vite my-react-app --template react

or

# pnpm
pnpm create vite my-react-app --template react

Next, install axios inside the new project (npm install axios), since the component uses it for the upload request, and update the App.jsx component in your React app:

import { useState } from "react";
import axios from "axios";
import "./index.css";

const App = () => {
  const [selectedFile, setSelectedFile] = useState(null);
  const [uploadProgress, setUploadProgress] = useState(0);
  const [transcript, setTranscript] = useState("");

  // Store the chosen file in state
  const handleFileUpload = (event) => {
    setSelectedFile(event.target.files[0]);
  };

  // Upload the file and request the transcript
  const handleUpload = async () => {
    if (!selectedFile) {
      alert("Please select a file!");
      return;
    }

    try {
      const formData = new FormData();
      formData.append("file", selectedFile);

      // Adjust the URL if your backend runs on a different host or port
      const response = await axios.post(
        "http://localhost:3000/getTranscript",
        formData,
        {
          headers: {
            "Content-Type": "multipart/form-data",
          },
          onUploadProgress: (progressEvent) => {
            const progress = Math.round(
              (progressEvent.loaded * 100) / progressEvent.total
            );
            setUploadProgress(progress);
          },
        }
      );

      console.log("API Response:", response.data);
      // Pull the transcript text out of the nested Deepgram response
      setTranscript(
        response.data.transcript.result.results.channels[0].alternatives[0]
          .transcript
      );
    } catch (error) {
      console.error("Error:", error.message);
      // Handle errors here
    }
  };

  return (
    <div className="upload-container">
      <div className="upload-input">
        <input type="file" onChange={handleFileUpload} />
        <span>
          <i className="fas fa-upload"></i> Choose File
        </span>
      </div>
      <button className="upload-btn" onClick={handleUpload}>
        Upload & Hit API
      </button>
      {uploadProgress > 0 && <p>Upload Progress: {uploadProgress}%</p>}
      {transcript && (
        <div className="transcript-box">
          <h3>Transcript:</h3>
          <p>{transcript}</p>
        </div>
      )}
    </div>
  );
};

export default App;
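The onUploadProgress handler above is plain percentage arithmetic; pulling it into a small helper makes the math explicit and guards against progressEvent.total being undefined, which Axios allows for some requests. A sketch, with computeProgress as a name of our own choosing:

```javascript
// Compute an integer upload percentage from bytes loaded vs. total.
// Returns 0 when the total size is unknown.
function computeProgress(loaded, total) {
  if (!total) return 0;
  return Math.round((loaded * 100) / total);
}

console.log(computeProgress(512, 2048)); // 25
console.log(computeProgress(2048, 2048)); // 100
console.log(computeProgress(100, undefined)); // 0
```

A helper like this is also easy to unit-test, unlike logic embedded in the Axios config object.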

Demo

[Screenshot: landing page]
[Screenshot: result after uploading a file, showing the transcript]

You can find the code for this project in this GitHub repository

Conclusion

In this blog post, we’ve learned how to integrate Deepgram’s speech-to-text API with a React frontend and a Node.js backend. By following these steps, you can create a powerful speech recognition application that can transcribe audio files into text.
