Built with Streamlit, Gemini API, gTTS and OCR file support.
Translate English text into dozens of languages and hear it spoken aloud — all in one smooth, easy-to-use web app with audio download included.
Simply input your English text (or upload a file!), choose your target language, and receive both the translated text and a downloadable audio file.
Follow these steps to set up and run the application on your local machine.
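Install Poppler, which pdf2image needs for PDF support.
For macOS
brew install poppler
For Linux (Debian/Ubuntu)
sudo apt-get install poppler-utils
Clone the repository into your desired project folder on your local machine: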
git clone https://github.com/mcadriaans/translate-and-speak.git
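To confirm that the Poppler install worked, you can check that its command-line tools are on your PATH. The snippet below is a minimal sketch, not part of the repository: the function name `find_poppler_tools` is hypothetical, and it assumes pdf2image relies on the `pdftoppm` and `pdfinfo` binaries that ship with Poppler.

```python
import shutil

# pdf2image shells out to Poppler's command-line tools, so they must be on PATH.
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def find_poppler_tools(tools=POPPLER_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_poppler_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If either tool is reported as not found, PDF uploads will likely fail until Poppler is installed and visible on PATH.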
It's best practice to use a virtual environment.
For Windows (cmd)
uv venv venv
.\venv\Scripts\activate
If you encounter script execution issues in PowerShell, you might need to run:
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
For macOS/Linux
python3 -m venv venv
source venv/bin/activate
Install all necessary Python packages:
uv pip install -r requirements.txt
❗Note: EasyOCR models (craft_mlt_25k.pth, english_g2.pth) are included in the ml_models/easyocr directory. Ensure these are present for OCR functionality.
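A quick way to verify the note above is to check for the two model files before starting the app. This is a minimal sketch under stated assumptions: `missing_models` is a hypothetical helper (not part of the repository), and it assumes you run it from the project root so that the relative path `ml_models/easyocr` resolves.

```python
from pathlib import Path

# File names taken from the note above.
REQUIRED_MODELS = ("craft_mlt_25k.pth", "english_g2.pth")


def missing_models(model_dir="ml_models/easyocr", required=REQUIRED_MODELS):
    """Return the required model file names that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in required if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing EasyOCR models:", ", ".join(missing))
    else:
        print("All EasyOCR models are present.")
```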
This application requires a Google Gemini API key.
Create a file named .env in the root of your project directory and add the following line:
GOOGLE_API_KEY = your_api_key_here
Replace your_api_key_here with your actual key.
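For reference, here is one way the key could be read at startup. It is a sketch, not the app's actual code: `get_api_key` is a hypothetical helper, and it assumes the key is loaded with `load_dotenv()` from python-dotenv (listed in the tech stack) and then read from the environment.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # keeps this sketch importable if the package is absent
    load_dotenv = None


def get_api_key(env=None):
    """Return the Gemini API key, or raise if it is missing or a placeholder."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # finds a .env file and loads it into os.environ
        env = os.environ
    key = (env.get("GOOGLE_API_KEY") or "").strip()
    if not key or key == "your_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file.")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the API client.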
Activate the virtual environment, configure the API key, and then run the application from your terminal:
streamlit run translator_app.py
Your default web browser will open the application, ready for use.
| Category | Tools Used |
|---|---|
| Web Framework | streamlit |
| AI/Translation | google.generativeai (Gemini 1.5 Flash) |
| Text-to-Speech | gTTS (Google Text-to-Speech) |
| OCR (Image Parsing) | easyocr, pdf2image, PIL, numpy |
| File Handling | io, tempfile, os, openpyxl, pandas |
| Environment Management | python-dotenv |
| Language Detection | langdetect |
| Custom Utilities | extract_text_from_file (from utils.file_parser) |
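To illustrate the Text-to-Speech row, the sketch below turns a piece of translated text into MP3 bytes with gTTS. It is an assumption about how such a step might look, not the app's actual implementation: `synthesize_speech` is a hypothetical name, the default language code `"fr"` is only an example, and the call needs network access because gTTS contacts Google's service.

```python
import io


def synthesize_speech(text, lang="fr"):
    """Return MP3 bytes for `text` spoken in `lang` (requires network access)."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    # Imported lazily so this module loads even if gTTS is not installed.
    from gtts import gTTS

    buffer = io.BytesIO()
    gTTS(text=text, lang=lang).write_to_fp(buffer)  # write MP3 data in memory
    return buffer.getvalue()
```

In a Streamlit app, bytes like these could be handed to `st.audio` for playback and to `st.download_button` for the downloadable file, which avoids writing a temporary file to disk.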
For a comprehensive understanding of the project's design, architectural decisions, testing methodology, and detailed feature explanations, refer to the documentation.pdf file in this repository.
translate-and-speak/ # Root directory of the Translate & Speak application
├── assets/ # Folder for static resources used by the app
│   ├── images/
│   └── sample_files/ # Contains example input files for testing OCR and translation (PDFs, DOCX, TXT)
├── ml_models/ # Stores machine learning models used by the app
│   └── easyocr/ # EasyOCR-specific model files for text detection and recognition
│       ├── craft_mlt_25k.pth # Pretrained model for detecting text regions in images (CRAFT model)
│       └── english_g2.pth # Pretrained model for recognizing English characters (G2 model)
├── utils/ # Utility scripts and helper functions to support core app logic
│   └── file_parser.py # Handles file parsing and text extraction from uploaded documents
├── .gitignore # Tells Git which files/folders to ignore (e.g., .env, venv)
├── README.md # Main documentation file: explains what the app does and how to use it
├── packages.txt # Optional list of system-level packages needed for the deployment platform
├── project_guide.pdf # Comprehensive document covering design, setup, features, challenges, and deployment
├── requirements.txt # Lists all Python libraries required to run the app (used for pip install)
├── runtime.txt # Specifies the Python version for the deployment platform
└── translator_speak_app.py # The main Streamlit app script: manages UI, file upload, OCR, translation, and speech
Contributions are welcome! If you have suggestions, bug reports, or want to contribute code, please feel free to open an issue or submit a pull request.
This project is licensed under the MIT License.
🙋‍♀️ Author: Created with 💜 by Michéle