Translate and Speak: Multilingual for English Input 🌍🔊

Built with Streamlit, Gemini API, gTTS and OCR file support.

Translate English text into dozens of languages and hear it spoken aloud — all in one smooth, easy-to-use web app with audio download included.

Simply input your English text (or upload a file!), choose your target language, and receive both the translated text and a downloadable audio file.

✨ Features

Translate English text into 20+ languages
Generate spoken audio using gTTS
OCR support for PDF, images, and scanned documents
Downloadable audio output
Clean and intuitive Streamlit UI

🧭Getting Started

Follow these steps to set up and run the application in your local machine

1. Prerequisites

Python 3.10
Python package installer pip
Poppler Utilities required for PDF processing.
- macOS (Homebrew): brew install poppler
- Linux (apt-get): sudo apt-get install poppler-utils
- Windows: Download from here Ensure it's added to your PATH.

2. Clone the repository

Clone to your local machine in your desired project folder:

git clone https://github.com/mcadriaans/translate-and-speak.git

3. Set Up a Virtual Environment

It's best practice to use a virtual environment.

For Windows (cmd)

uv venv venv
.\venv\Scripts\activate

(If you encounter script execution issues, you might need to run:

Set-ExecutionPolicy RemoteSigned -Scope CurrentUser

For macOS/Linux

python3 -m venv venv
source venv/bin/activate

4. Install Dependencies

Install all necessary Python packages:

uv pip install -r requirements.txt

❗Note: EasyOCR models (craft_mlt_25k.pth, english_g2.pth) are included in the ml_models/easyocr directory. Ensure these are present for OCR functionality.

5. Configure Your Gemini API Key

This application requires a Google Gemini API key.

Generate a key from Google AI Studio
Create a file named .env in the root of your project directory.
Add your API key to this file:

GOOGLE_API_KEY = your_api_key_here

Replace your_api_key_here with you actual key.

6. Launch the Application

Activate the virtual environment, configure the API key and then run the application in cmd:

streamlit run translator_app.py

Your default web browser will open the application, ready for use.

🧱Technology Stack

Category	Tools Used
Web Framework	`streamlit`
AI/Translation	`google.generativeai` (Gemini 1.5 Flash)
Text-to-Speech	`gTTS` (Google Text-to-Speech)
OCR(Image Parsing)	`easyOCR`, `pdf2image`, `PIL`, `numpy`
File Handling	`io`, `tempfile`, `os`, `openpyxl`, `pandas`
Environment Management	`python-dotenv`
Language Detection	`langdetect`
Custom Utilities	`extract_text_from_file` (from `utils.file_parser`)

📄Detailed Documentation

For a comprehensive understanding of the project's design, architectural decisions, testing methodology, detailed feature explanations etc., refer to the documentation.pdf file in this repository.

📁Project Structure

translate-and-speak/                 # Root directory of the Translate & Speak application
├── assets/                          # Folder for static resources used by the app
    └── images   
│   └── sample_files/                # Contains example input files for testing OCR and translation (PDFs, DOCX, TXT)
├── ml_models/                       # Stores machine learning models used by the app
│   └── easyocr/                     # EasyOCR-specific model files for text detection and recognition
        └── craft_mlt_25k.pth        # Pretrained model for detecting text regions in images (CRAFT model)
        └── english_g2.pth           # Pretrained model for recognizing English characters (G2 model)
├── utils/                           # Utility scripts and helper functions to support core app logic
│   └── file_parser.py               # Handles file parsing and text extraction from uploaded documents
├── .gitignore                       # Tells Git which files/folders to ignore (e.g., .env, venv)
├── README.md                        # Main documentation file: explains what the app does and how to use it
├── packages.txt                     # Optional list of system-level packages needed for deployment platform
├── project_guide.pdf                # Comprehensive document covering design, setup, features, challenges, and deployment
├── requirements.txt                 # Lists all Python libraries required to run the app (used for pip install)
├── runtime.txt                      # Specifies the Python version for deployment platform
└──translator_speak_app.py           # The main Streamlit app script: manages UI, file upload, OCR, translation, and speech

⚠️Important Considerations and Limitations

Input Language: Currently, the application processes English text only for translation. This was a deliberate design choice for complexity management and consistent quality.
gTTS Language Support: gTTS has a limited set of supported languages for speech synthesis.
File Size: Maximum upload size is 200MB.
No Real-Time Preview: Translation and audio generation automatically occurs after text is submitted.
No GPU Support: If deploying on Streamlit Community Cloud. This can lead to slower inference times as computational demands grow with more complex/larger tasks.

🤝Contributing

Contributions are welcome! If you have suggestions, bug reports, or want to contribute code, please feel free to open an issue or submit a pull request.

📜License

This project is licensed under the MIT License.

🙋‍♀️ Author: Created with 💜 by Michéle