Multilingual Translator & Text-to-Speech Application for English Input

Introduction

This documentation provides an overview of the project, including goals, architecture, and implementation details. It covers the full development lifecycle from design and testing to deployment on Streamlit Community Cloud, highlighting key challenges and the solutions used to address them.

1. Project Overview

This web application translates English text into multiple languages and converts translated text into speech. It uses Streamlit for the interface, Google’s Gemini API for translation, and gTTS for audio generation. Users can type text or upload files (PDF, TXT, CSV, XLSX, PNG, JPG, JPEG) and receive translated text and downloadable audio.
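Here is a minimal sketch of the translate-then-speak step. It assumes the google-generativeai package (genai.configure, GenerativeModel, generate_content) and uses gemini-1.5-flash as an illustrative model name. The project may use a different Gemini client or model. The LANGUAGES mapping and the helpers build_prompt, translate, and synthesize are hypothetical. In line with Section 2, the prompt fixes the source language as English instead of detecting it.

```python
import io

# Hypothetical display-name -> language-code mapping; the app's real list is not shown here.
# The same codes are valid gTTS language codes.
LANGUAGES = {
    "French": "fr",
    "Spanish": "es",
    "German": "de",
    "Hindi": "hi",
    "Japanese": "ja",
}


def build_prompt(text: str, target_language: str) -> str:
    """Build a translation prompt; the source language is always English."""
    return (
        f"Translate the following English text into {target_language}. "
        "Return only the translated text.\n\n"
        f"{text}"
    )


def translate(text: str, target_language: str, api_key: str) -> str:
    """Ask Gemini for a translation (assumes the google-generativeai package)."""
    import google.generativeai as genai  # lazy import: only needed when translating

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
    response = model.generate_content(build_prompt(text, target_language))
    return response.text.strip()


def synthesize(text: str, target_language: str) -> bytes:
    """Convert translated text to MP3 bytes with gTTS, kept in memory."""
    from gtts import gTTS  # lazy import

    buffer = io.BytesIO()
    gTTS(text=text, lang=LANGUAGES[target_language]).write_to_fp(buffer)
    return buffer.getvalue()
```

In a Streamlit page, the returned bytes could go to st.audio(audio_bytes, format="audio/mpeg") for playback and to st.download_button(label, audio_bytes, file_name="translation.mp3", mime="audio/mpeg") for the download. Keeping the audio in a BytesIO buffer avoids writing temporary files on the server.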

App Workflow Diagram

2. Design Decision: English-Only Input

The system restricts input to English to reduce complexity, avoid unreliable language detection, and ensure consistent translation quality. Handling arbitrary input languages would require extensive exception handling and introduce unpredictable behavior.

3. Technology Stack

Technology Stack

4. Setup Instructions

4.1 Prerequisites

4.2 Clone the Repository


4.3 Create and Activate Virtual Environment

With the virtual environment activated, we are ready for development.

4.4 Install Dependencies


4.5 Configure Gemini API Key

  1. Generate a key in Google AI Studio by selecting “Get API key”.
  2. Create a file named .env in the root of your project directory (translate-and-speak/).
  3. Add your API key to this file as a KEY=value entry.
  4. The python-dotenv package loads this entry into the application's environment at runtime, so the key never has to be hard-coded in the source.
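Here is a minimal sketch of how the key might be loaded at startup. It assumes the variable is named GOOGLE_API_KEY, since the actual name in the project's .env file is not shown here. load_dotenv and find_dotenv are real python-dotenv functions. load_api_key is a hypothetical helper.

```python
import os


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Load the Gemini API key, reading a .env file first if python-dotenv is installed.

    GOOGLE_API_KEY is an assumed variable name; match whatever the .env file defines.
    """
    try:
        from dotenv import find_dotenv, load_dotenv

        # Search for .env starting from the working directory (the project root when
        # `streamlit run` is launched there). Variables that already exist are not overridden.
        load_dotenv(find_dotenv(usecwd=True))
    except ImportError:
        pass  # python-dotenv not installed: rely on variables already in the environment

    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to the .env file.")
    return key
```

Listing .env in .gitignore keeps the key out of the public repository, which is what makes this approach safer than hard-coding it.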

4.6 Launch the Application

Launching the application from PowerShell starts a local Streamlit server and opens the application in the default browser. By default, Streamlit serves the app at http://localhost:8501.

5. Key Features

5.1 Elegant and Responsive Interface

5.2 Dual Input Options

5.3 Smart File Parsing

The application uses a text extraction pipeline that handles diverse input formats and selects the appropriate method for retrieving content for translation. It extracts text directly from readable documents and falls back to OCR for image-based content.

Smart File Parsing Workflow

This workflow illustrates the sequence:

  1. User Input & Validation: The system first processes any direct text or uploaded file. It performs basic validation checks for empty input.
  2. It then determines if the uploaded file's format (PDF, TXT, CSV, XLSX, PNG, JPG, JPEG) is supported. Unsupported files lead to an error.
  3. For supported files, the system attempts to directly extract text (e.g., from text-based PDFs, TXT, CSV, XLSX).
  4. If direct text extraction yields no content (e.g., image-only PDFs, pure image files), an OCR process is initiated using easyocr to convert visual text into readable data.
  5. Once text is successfully extracted (either directly or via OCR), it proceeds to the translation and text-to-speech modules.
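The five steps above can be sketched as a single dispatch function. This is an illustration under stated assumptions, not the project's actual parser. It assumes pypdf for text-based PDFs and pandas (with openpyxl) for XLSX, both imported lazily. The OCR step is passed in as a function so any engine (EasyOCR in this project) can be plugged in. The names extract_text, extract_direct, and ParseError are hypothetical. In this sketch, the OCR fallback applies only to PDFs and images.

```python
import csv
import io
from pathlib import Path
from typing import Callable

# Formats accepted by the uploader, per the project overview.
SUPPORTED = {".pdf", ".txt", ".csv", ".xlsx", ".png", ".jpg", ".jpeg"}
# Formats that may hold their text only as pixels, so OCR is a sensible fallback.
OCR_CANDIDATES = {".pdf", ".png", ".jpg", ".jpeg"}


class ParseError(ValueError):
    """Raised for empty input, unsupported formats, or files with no text."""


def extract_direct(suffix: str, data: bytes) -> str:
    """Step 3: pull text straight out of text-based formats."""
    if suffix == ".txt":
        return data.decode("utf-8", errors="replace")
    if suffix == ".csv":
        rows = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
        return "\n".join(" ".join(row) for row in rows)
    if suffix == ".xlsx":
        import pandas as pd  # assumption: pandas + openpyxl read XLSX files

        sheets = pd.read_excel(io.BytesIO(data), sheet_name=None, header=None)
        return "\n".join(df.to_string(index=False, header=False) for df in sheets.values())
    if suffix == ".pdf":
        from pypdf import PdfReader  # assumption: pypdf for text-based PDFs

        reader = PdfReader(io.BytesIO(data))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    return ""  # image files have no directly extractable text


def extract_text(filename: str, data: bytes, ocr: Callable[[str, bytes], str]) -> str:
    """Steps 1-5: validate, check the format, extract directly, then fall back to OCR."""
    if not data:  # step 1: empty input
        raise ParseError("The uploaded file is empty.")
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED:  # step 2: unsupported format
        raise ParseError(f"Unsupported file type: {suffix or '(none)'}")

    text = extract_direct(suffix, data).strip()  # step 3: direct extraction
    if not text and suffix in OCR_CANDIDATES:  # step 4: OCR fallback
        text = ocr(suffix, data).strip()
    if not text:
        raise ParseError("No readable text was found in the file.")
    return text  # step 5: ready for translation and text-to-speech
```

For an image-only PDF, the injected OCR function would also need to render each page to an image first (for example with a library such as pdf2image). This sketch leaves that step out.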

5.4 Multilingual Translation Powered by Gemini

5.5 Instant Text-to-Speech

5.6 Robust Error Handling and User Feedback

In a user-facing application that involves file uploads, OCR, translation, and audio generation, clear communication is key. Streamlit's built-in messaging functions therefore play a vital role: they guide users, set expectations, and report errors gracefully. This keeps the experience smooth and reliable and lets users interact with confidence.

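The sketch below illustrates this pattern. Input checks return (level, message) pairs, and separate Streamlit calls display them. st.error, st.warning, st.info, st.success, and st.spinner are real Streamlit APIs. The helpers validate_request, show_messages, and run_with_feedback are hypothetical, and the message texts are placeholders.

```python
from typing import Callable, Optional


def validate_request(
    text: str, uploaded_name: Optional[str], target: Optional[str]
) -> list[tuple[str, str]]:
    """Return (level, message) pairs; an empty list means the request can proceed."""
    problems = []
    if not text.strip() and uploaded_name is None:
        problems.append(("warning", "Please enter some English text or upload a file."))
    if not target:
        problems.append(("info", "Select a target language to continue."))
    return problems


def show_messages(messages: list[tuple[str, str]]) -> None:
    """Display each message with the matching call: st.error, st.warning, st.info, st.success."""
    import streamlit as st  # lazy import keeps the validation logic testable without Streamlit

    for level, message in messages:
        getattr(st, level)(message)


def run_with_feedback(task: Callable[[], str]) -> Optional[str]:
    """Run one pipeline step inside a spinner and report success or failure."""
    import streamlit as st

    try:
        with st.spinner("Working on it..."):
            result = task()
    except Exception as exc:  # show a readable error box instead of a stack trace
        st.error(f"Something went wrong: {exc}")
        return None
    st.success("Done!")
    return result
```

A page might call show_messages(problems) and then st.stop() when the list is not empty. It could also wrap a pipeline step, such as the translation call, in run_with_feedback so failures reach the user as messages rather than crashes.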

6. Project Structure

The project's GitHub repository is organized as follows.


At its core is a Streamlit application that manages user interaction, file uploads, text extraction, translation, and speech synthesis. The repository also includes sample documents in various formats that can be used to test the application's OCR and translation capabilities.

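A key component of the system is the set of pre-trained machine learning models stored in the ml_models/easyocr directory. There are two models: a text detection model and a text recognition model. EasyOCR uses them together to locate text in an image and then read it.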
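Shipping the weights in the repository lets EasyOCR load them from disk instead of downloading them at runtime. Here is a minimal sketch, assuming the folder is named ml_models/easyocr and the weights are .pth files. model_storage_directory and download_enabled are real easyocr.Reader parameters. bundled_models, make_reader, and ocr_image are hypothetical helpers.

```python
from pathlib import Path

# Hypothetical location matching the repository's model folder.
MODEL_DIR = Path("ml_models") / "easyocr"


def bundled_models(model_dir: Path = MODEL_DIR) -> list[str]:
    """List the pre-trained weight files (.pth) shipped with the repository."""
    return sorted(p.name for p in model_dir.glob("*.pth"))


def make_reader(model_dir: Path = MODEL_DIR):
    """Create an English EasyOCR reader that loads local weights only."""
    if len(bundled_models(model_dir)) < 2:
        raise FileNotFoundError(f"Expected detection and recognition models in {model_dir}")
    import easyocr  # lazy import: heavy dependency (PyTorch)

    return easyocr.Reader(
        ["en"],  # input is English-only
        gpu=False,  # assumption: no GPU on the deployment host
        model_storage_directory=str(model_dir),
        download_enabled=False,  # fail fast instead of fetching weights at runtime
    )


def ocr_image(reader, image_bytes: bytes) -> str:
    """Run OCR and join the recognised lines; detail=0 returns plain strings."""
    return "\n".join(reader.readtext(image_bytes, detail=0))
```

Building the reader once, for example inside a function decorated with st.cache_resource, avoids reloading the models on every Streamlit rerun.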

The supporting script file_parser.py, stored in the utils folder, ensures that input text and documents are properly processed before translation. Configuration files such as requirements.txt and runtime.txt define the Python environment and dependencies, packages.txt lists the system-level dependencies that must be installed on the deployment platform, and the documentation files provide guidance on setup and usage.

7. Deployment to Streamlit Cloud

Because this lightweight application is built with Streamlit and its repository is public, it can be deployed easily on Streamlit Community Cloud.

7.1 Key Benefits of Streamlit Community Cloud

7.2 Configuration and Launch

First, we navigate to the Streamlit sign-in page and log in with a GitHub account.

7.2.1 Configure Deployment Settings

Click the “New application” button to open the “Deploy an application” dialog box. Select the GitHub repository that contains the codebase and the branch to deploy (typically main), then specify the path to the main script.

Streamlit Cloud Deployment Config

Streamlit Community Cloud does not directly support .env files for security reasons. Instead, environment variables such as API keys are stored securely with Streamlit Secrets. From the deployed application, open “Manage application” (the gear icon at the bottom right) and select “Secrets”. The Google API key can then be added in TOML format.
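In the Secrets editor, a root-level entry such as GOOGLE_API_KEY = "your-key" is valid TOML. The key name here is an assumption. The sketch below shows one way the same code could find the key both locally and on Community Cloud: it checks environment variables first and then falls back to st.secrets. resolve_api_key is a hypothetical helper.

```python
import os
from typing import Optional

KEY_NAME = "GOOGLE_API_KEY"  # assumption: the name used in .env and in Secrets


def resolve_api_key(name: str = KEY_NAME) -> Optional[str]:
    """Return the API key from the environment, else from Streamlit Secrets."""
    # Locally, python-dotenv copies .env entries into os.environ.
    # Streamlit may also expose root-level secrets as environment variables.
    value = os.getenv(name)
    if value:
        return value
    try:
        import streamlit as st

        # Reads .streamlit/secrets.toml locally, or the Secrets set on Community Cloud.
        return st.secrets.get(name)
    except Exception:  # Streamlit not installed, or no secrets configured
        return None
```

Because the lookup order is environment first, the same application file runs unchanged in both settings. No code path needs to know which platform it is on.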

Streamlit Cloud Deployment Config

8. Testing & Debugging

8.1 Local Deployment Testing

We executed a series of test cases covering every supported file format (PDF, TXT, CSV, XLSX, PNG, JPG, and JPEG) to verify the accuracy of the parsing logic and the robustness of the application. Key findings and resolutions from these tests include:

8.1.1 OCR Library Deployment Challenges

Initially, pytesseract (a Python wrapper for the Tesseract OCR engine) handled OCR in the local deployment. However, once the application was deployed to Streamlit Community Cloud, it consistently failed to extract, and therefore translate, text from documents that require OCR.