Transcribe Audio to Text with Deepgram, Node.js, and React

Tarun Jain · Published in Dev Genius · 4 min read · Apr 4, 2024

Intro

Deepgram is a leading provider of automatic speech recognition (ASR) technology. Its platform uses deep learning models to transcribe audio into text with high accuracy, and it is built to handle varied accents, languages, and audio qualities, which makes it suitable for a wide range of applications, from transcription services to voice-controlled apps.

Building the Node.js Backend —

Installing Dependencies: Use npm (Node Package Manager) to install the dependencies the backend needs. Note that cors is included because the Express server below enables it:

npm install express cors @deepgram/sdk dotenv multer
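The Deepgram client configured later reads its API key from an environment variable, so it helps to create a .env file up front. A minimal sketch (the key value is a placeholder, not a real credential):

```shell
# Create a .env file in the project root holding the Deepgram API key.
# DEEPGRAM_API_KEY is the variable name the backend code expects;
# replace the placeholder value with a key from your Deepgram console.
cat > .env <<'EOF'
DEEPGRAM_API_KEY=your_api_key_here
EOF

# Confirm the variable is present in the file
grep DEEPGRAM_API_KEY .env
```

Remember to add .env to .gitignore so the key is never committed.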

Setting up a Basic Express Server: In your Node.js application, create a basic Express server by importing Express and defining routes. For example:

import express from "express";
import cors from "cors";
import multer from "multer";

import getTranscript from "./deepgram.js";

const app = express();
const PORT = 3000;

// Allow the React dev server (a different origin) to call this API
app.use(cors());

const storage = multer.memoryStorage(); // Store uploaded files in memory
const upload = multer({ storage: storage });

// Accept a single "file" field and hand the request to the transcription handler
app.post("/getTranscript", upload.single("file"), getTranscript);

// Start the server
app.listen(PORT, () => {
  console.log(`Server is running on http://localhost:${PORT}`);
});

Integrating Deepgram’s API for Speech-to-Text: Use Deepgram’s SDK to integrate their API for speech-to-text conversion. First, create a Deepgram client using your API key:

import { createClient } from "@deepgram/sdk";
import dotenv from "dotenv";
dotenv.config();

const deepgramSDK = createClient(process.env.DEEPGRAM_API_KEY);

export { deepgramSDK };

Then, use the client to transcribe audio. The route handler below reads the uploaded file's buffer from multer and passes it to the SDK:

import { deepgramSDK } from "./config.js"; // The Deepgram client created above

const getTranscript = async (request, response) => {
  try {
    const fileData = request.file.buffer; // The uploaded 'file' from FormData

    // Transcription options; "sv" is Swedish — replace with your target language
    const options = {
      language: "sv",
      model: "enhanced",
    };

    // Transcribe the in-memory file using the Deepgram SDK
    const transcript = await deepgramSDK.listen.prerecorded.transcribeFile(
      fileData,
      options
    );

    // Send the full SDK result back as the response
    response.status(200).json({ transcript });
  } catch (error) {
    console.error("Error:", error.message);
    response
      .status(500)
      .json({ error: "An error occurred while transcribing the file." });
  }
};

export default getTranscript;
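The SDK call resolves to an object that nests the transcript text fairly deeply, and since the handler above returns that object verbatim, the client has to walk result.results.channels[0].alternatives[0].transcript. A minimal sketch of that extraction against a mocked response shape (the mock values and the extractText helper name are illustrative, not real API output):

```javascript
// Mocked shape of what the backend sends back as `transcript`
// (values are illustrative placeholders, not real Deepgram output).
const mockTranscript = {
  result: {
    results: {
      channels: [
        {
          alternatives: [{ transcript: "hej och välkommen", confidence: 0.97 }],
        },
      ],
    },
  },
};

// Walk the nested structure defensively, returning "" if any level is missing.
function extractText(transcript) {
  return (
    transcript?.result?.results?.channels?.[0]?.alternatives?.[0]?.transcript ??
    ""
  );
}

console.log(extractText(mockTranscript)); // "hej och välkommen"
console.log(extractText({})); // ""
```

Guarding each level with optional chaining keeps the frontend from crashing when transcription fails and the nested fields are absent.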

Building the React Frontend —

For the frontend, we use React to create a component that allows users to upload audio files and receive a transcript. Scaffold a new project with Vite:

# npm
npm init vite@latest my-react-app -- --template react

or

# yarn
yarn create vite my-react-app --template react

or

# pnpm
pnpm create vite my-react-app --template react

Next, install axios inside the new project (npm install axios), since the component uses it for the upload request, and update the App.jsx component in your React app:

import { useState } from "react";
import axios from "axios";
import "./index.css";

const App = () => {
  const [selectedFile, setSelectedFile] = useState(null);
  const [uploadProgress, setUploadProgress] = useState(0);
  const [transcript, setTranscript] = useState("");

  // Store the chosen file in state
  const handleFileUpload = (event) => {
    setSelectedFile(event.target.files[0]);
  };

  // Upload the file and request the transcript
  const handleUpload = async () => {
    if (!selectedFile) {
      alert("Please select a file!");
      return;
    }

    try {
      const formData = new FormData();
      formData.append("file", selectedFile);

      // Adjust the URL if your backend runs on a different host or port
      const response = await axios.post(
        "http://localhost:3000/getTranscript",
        formData,
        {
          headers: {
            "Content-Type": "multipart/form-data",
          },
          onUploadProgress: (progressEvent) => {
            const progress = Math.round(
              (progressEvent.loaded * 100) / progressEvent.total
            );
            setUploadProgress(progress);
          },
        }
      );

      console.log("API Response:", response.data);
      // Pull the transcript text out of the nested Deepgram response
      setTranscript(
        response.data.transcript.result.results.channels[0].alternatives[0]
          .transcript
      );
    } catch (error) {
      console.error("Error:", error.message);
      // Handle errors here
    }
  };

  return (
    <div className="upload-container">
      <div className="upload-input">
        <input type="file" onChange={handleFileUpload} />
        <span>
          <i className="fas fa-upload"></i> Choose File
        </span>
      </div>
      <button className="upload-btn" onClick={handleUpload}>
        Upload & Hit API
      </button>
      {uploadProgress > 0 && <p>Upload Progress: {uploadProgress}%</p>}
      {transcript && (
        <div className="transcript-box">
          <h3>Transcript:</h3>
          <p>{transcript}</p>
        </div>
      )}
    </div>
  );
};

export default App;
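The onUploadProgress handler above is plain percentage arithmetic; pulling it into a small helper makes the math explicit and guards against progressEvent.total being undefined, which Axios allows for some requests. A sketch, with computeProgress as a name of our own choosing:

```javascript
// Compute an integer upload percentage from bytes loaded vs. total.
// Returns 0 when the total size is unknown.
function computeProgress(loaded, total) {
  if (!total) return 0;
  return Math.round((loaded * 100) / total);
}

console.log(computeProgress(512, 2048)); // 25
console.log(computeProgress(2048, 2048)); // 100
console.log(computeProgress(100, undefined)); // 0
```

A helper like this is also easy to unit-test, unlike logic embedded in the Axios config object.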

Demo

[Screenshot: landing page]
[Screenshot: result after uploading a file, showing the transcript]

You can find the code for this project in this GitHub repository

Conclusion

In this blog post, we’ve learned how to integrate Deepgram’s speech-to-text API with a React frontend and a Node.js backend. By following these steps, you can create a powerful speech recognition application that can transcribe audio files into text.
