About This Dataset
This dataset contains parallel sentence pairs in multiple languages with aligned translations. It is suited to machine translation, multilingual NLP, and cross-lingual model training.
Multiple Language Pairs
Parallel sentences in multiple languages with aligned translations for comprehensive training.
Aligned Translations
Precisely aligned sentence pairs ensuring accurate translation model training.
Training & Validation Sets
Pre-split datasets ready for immediate use in machine learning pipelines.
Ready for Translation Models
Format compatible with the Transformers library and with mBERT and mT5 models.
Technologies
TSV, JSON, Transformers, mBERT, mT5
How It Works - Complete Guide
Using the Translation Tool:
- Select Languages: Choose your source language (or use "Detect language" for auto-detection) and target language from the dropdown menus.
- Enter Text: Type or paste any word, phrase, or sentence in the left text box.
- Auto-Translation: Translation runs automatically as you type, after a 500 ms pause in typing (debounce) to avoid redundant requests.
- View Result: The translated text appears in the right box.
- Swap Languages: Click the swap button (↔) to reverse the translation direction.
- Copy Text: Use the copy buttons to copy input or output text to clipboard.
- Listen: Click the speaker icon to hear the translated text (text-to-speech).
- Clear: Use the X button to clear the input field.
Supported Languages: English, Spanish, French, German
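The 500 ms delay in the auto-translation step is a debounce: the translation runs only once typing has paused. The tool itself is written in vanilla JavaScript, so the Python sketch below only illustrates the idea. The `Debouncer` class and its method names are hypothetical and not taken from the project.

```python
import threading


class Debouncer:
    """Run `callback` only after `delay_seconds` pass with no new triggers."""

    def __init__(self, delay_seconds, callback):
        self.delay_seconds = delay_seconds
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        # Each new keystroke cancels the pending call and restarts the clock.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay_seconds, self.callback, args)
            self._timer.start()

    def wait(self):
        # Block until the most recently scheduled call has run.
        with self._lock:
            timer = self._timer
        if timer is not None:
            timer.join()


# Simulate fast typing: only the final text reaches the callback.
typed = []
debouncer = Debouncer(0.5, typed.append)
for partial in ["h", "he", "hel", "hello"]:
    debouncer.trigger(partial)
debouncer.wait()
print(typed)  # ['hello']
```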
Three-Tier Translation System:
Tier 1: Local Dictionary
First, the system checks the local dictionary with 1,983 translation entries. This works completely offline!
- Instant translation
- No internet required
- Exact phrase matching
Tier 2: Word-by-Word
If no exact match is found, the system translates word by word using the local dictionary.
- Better coverage
- Handles new combinations
- Still works offline
Tier 3: API Fallback
As a last resort, the system uses the MyMemory Translation API for real-time translation.
- Requires internet
- Handles any text
- Real-time translation
Translation Status: The footer shows which method was used (Local Dictionary, Word-by-word, or API).
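The three tiers can be sketched as a simple fallback chain. This is a minimal illustration, not the project's code: the `translate` function, the `EN_ES` sample entries, the rule that Tier 2 applies only when every word is known, and the `"Untranslated"` status are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory dictionary for one direction (English -> Spanish).
EN_ES: Dict[str, str] = {
    "good morning": "buenos días",
    "good": "bueno",
    "friend": "amigo",
}


def translate(
    text: str,
    dictionary: Dict[str, str],
    api_translate: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Return (translation, method), trying the cheapest tier first."""
    key = text.strip().lower()

    # Tier 1: exact phrase match in the local dictionary.
    if key in dictionary:
        return dictionary[key], "Local Dictionary"

    # Tier 2: word-by-word, assumed here to apply only if every word is known.
    words = key.split()
    if words and all(word in dictionary for word in words):
        return " ".join(dictionary[word] for word in words), "Word-by-word"

    # Tier 3: remote API fallback, if one was supplied.
    if api_translate is not None:
        return api_translate(text), "API"

    # Assumed behavior when no tier succeeds: return the input unchanged.
    return text, "Untranslated"


print(translate("Good morning", EN_ES))  # ('buenos días', 'Local Dictionary')
print(translate("good friend", EN_ES))   # ('bueno amigo', 'Word-by-word')
```

The second element of the returned tuple plays the role of the footer status described above.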
Comprehensive Offline Translation Dictionary
The local dictionary contains 1,983 translation entries covering:
Content Categories:
- Greetings & Common Phrases
- Numbers & Dates
- Days of Week & Months
- Food & Drinks
- Family & Relationships
- Colors & Descriptions
- Time & Places
- Actions & Verbs
- Technology Terms
- Travel & Transportation
- Business & Education
- And much more!
Language Pairs (12 translation directions across 6 bidirectional pairs):
- English ↔ Spanish
- English ↔ French
- English ↔ German
- Spanish ↔ French
- Spanish ↔ German
- French ↔ German
File Location: data/local_dictionary.json
Format: JSON with nested dictionaries for each language pair
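The exact schema is not documented here, so the following is only a guess at what "nested dictionaries for each language pair" could look like. The `"en-es"` style keys, the sample entries, and the `lookup` helper are hypothetical.

```python
import json

# Hypothetical layout: one inner dictionary per translation direction.
SAMPLE = """
{
  "en-es": {"hello": "hola", "thank you": "gracias"},
  "es-en": {"hola": "hello", "gracias": "thank you"}
}
"""


def lookup(dictionary, source, target, phrase):
    """Return the translation for `phrase`, or None if it is not listed."""
    pair = dictionary.get(f"{source}-{target}", {})
    return pair.get(phrase.lower())


# With the real file, json.load() on data/local_dictionary.json would
# replace this json.loads() call.
dictionary = json.loads(SAMPLE)
print(lookup(dictionary, "en", "es", "Hello"))  # hola
```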
Core Features:
- Real-time Translation: Translates as you type (500ms debounce)
- Auto Language Detection: Automatically detects source language
- Offline Support: Works without internet using local dictionary
- Word-by-Word Translation: Handles phrases not in exact dictionary
- Character Counter: Shows the current count against the 5,000-character limit (for example, 0/5000) and warns at the limit
- Copy to Clipboard: Easy copy buttons for input/output
- Text-to-Speech: Listen to translations in target language
- Language Swap: One-click language direction reversal
Advanced Features:
- Smart Matching: Handles punctuation and case variations
- Status Indicators: Shows translation source (Local/API)
- Error Handling: Graceful fallbacks if translation fails
- Responsive Design: Works on desktop, tablet, and mobile
- Google Translate UI: Familiar, user-friendly interface
- Toast Notifications: Visual feedback for user actions
- Loading States: Shows spinner during translation
- Keyboard Shortcuts: Ctrl+Enter to translate
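The "Smart Matching" feature is described only as handling punctuation and case variations. One plausible way to do that is to normalize text before the dictionary lookup, sketched below. The `normalize` function and its exact rules (trim surrounding punctuation, lowercase, collapse whitespace) are assumptions, not the project's implementation.

```python
import string


def normalize(phrase: str) -> str:
    """Lowercase, trim surrounding punctuation, and collapse whitespace."""
    # Include the inverted Spanish marks, which string.punctuation lacks.
    edge_chars = string.punctuation + "¿¡"
    stripped = phrase.strip().strip(edge_chars)
    return " ".join(stripped.lower().split())


print(normalize("Hello!"))            # hello
print(normalize("  ¿Cómo estás? "))   # cómo estás
print(normalize("Good   Morning."))   # good morning
```

Punctuation inside a phrase, such as the apostrophe in "don't", is left untouched so that contractions still match.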
Complete Dataset Structure:
| File | Format | Entries | Purpose |
|---|---|---|---|
| train.json / train.tsv | JSON / TSV | 50 | Training dataset with parallel sentences |
| validation.json / validation.tsv | JSON / TSV | 5 | Validation dataset for model testing |
| sample_data.json | JSON | 15 | Sample data for preview/demo |
| local_dictionary.json | JSON | 1,983 | Comprehensive offline translation dictionary |
Total Translation Entries: 2,053 (50 + 5 + 15 + 1,983)
Languages Covered: English, Spanish, French, German (4 languages, 12 translation directions)
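The counts above can be checked with a few lines of arithmetic: four languages give 6 unordered pairs and 12 ordered translation directions, and the per-file entry counts sum to the stated total. The variable names are illustrative only.

```python
from itertools import combinations, permutations

LANGUAGES = ["English", "Spanish", "French", "German"]

# Unordered pairs (English <-> Spanish counted once).
bidirectional = list(combinations(LANGUAGES, 2))
# Ordered pairs (English -> Spanish and Spanish -> English counted separately).
directed = list(permutations(LANGUAGES, 2))

# Entry counts as listed in the dataset structure table.
entries = {"train": 50, "validation": 5, "sample_data": 15, "local_dictionary": 1983}
total = sum(entries.values())

print(len(bidirectional), len(directed), total)  # 6 12 2053
```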
Technology Stack:
Frontend:
- HTML5: Semantic markup
- CSS3: Custom styling with Google Translate-inspired design
- JavaScript (ES6+): Vanilla JS, no frameworks
- Bootstrap 5: Responsive grid and components
- Font Awesome 6: Icons
Backend/Data:
- JSON: Data storage format
- TSV: Tab-separated values for easy processing
- Python 3: Data processing scripts
- MyMemory API: Free translation API fallback
- Web Speech API: Text-to-speech functionality
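For the API fallback, a request to MyMemory's public `get` endpoint might look like the sketch below. The endpoint, the `q` and `langpair` parameters, and the `responseData.translatedText` field follow MyMemory's public API; the helper names, the timeout, and the lack of error handling are simplifications for illustration, not the project's `translateWithAPI()` code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MYMEMORY_URL = "https://api.mymemory.translated.net/get"


def build_url(text: str, source: str, target: str) -> str:
    """Build the request URL; langpair uses the form 'en|es'."""
    query = urlencode({"q": text, "langpair": f"{source}|{target}"})
    return f"{MYMEMORY_URL}?{query}"


def parse_response(body: str) -> str:
    """Extract the translated text from a MyMemory JSON response body."""
    return json.loads(body)["responseData"]["translatedText"]


def translate_with_api(text: str, source: str, target: str, timeout: float = 10.0) -> str:
    """Perform the network call (requires internet access)."""
    with urlopen(build_url(text, source, target), timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))


print(build_url("good morning", "en", "es"))
```

Splitting URL construction and response parsing from the network call keeps both pieces testable offline.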
Key Functions:
- loadLocalDictionary(): Loads offline dictionary
- loadTranslationData(): Loads dataset translations
- translateText(): Main translation function
- translateWordByWord(): Word-by-word translation
- translateWithAPI(): API fallback translation
- detectLanguage(): Auto language detection
- handleInput(): Auto-translate on input
Tips for Best Results:
- Use Complete Sentences: Full sentences translate better than single words
- Check Status Indicator: See if translation came from local dictionary (faster) or API
- Offline Mode: Most common phrases work offline - no internet needed!
- Language Detection: Use "Detect language" if unsure of source language
- Character Limit: Maximum 5,000 characters per translation
- Copy & Paste: Easy copy buttons for both input and output
- Listen Feature: Use speaker icon to hear pronunciation
- Swap Languages: Quickly reverse translation direction
Common Use Cases:
- Learning new languages
- Quick phrase translation
- Understanding foreign text
- Travel communication
- Language practice
- Document translation
About This Project:
Project ID: 25
Category: Text Data
Difficulty: Advanced
Year: 2016
Technologies: TSV, JSON, Transformers, mBERT, mT5
Project Structure:
language-translation/
├── data/ # Dataset files (JSON, TSV)
├── scripts/ # Python processing scripts
├── examples/ # Usage examples
├── index.html # Main demo page
└── Documentation/ # README, SETUP, etc.
Available Scripts:
- process_data.py: Process and convert datasets
- convert_format.py: Convert between TSV and JSON
- analyze_dataset.py: Analyze dataset statistics
- download_translation_data.py: Download from public sources
- build_local_dictionary.py: Build local dictionary
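As an idea of what a TSV/JSON conversion like the one attributed to convert_format.py involves, here is a minimal round-trip sketch using only the standard library. It is not the project's script: the function names and the `id`/`en`/`es` column names in the usage line are assumptions.

```python
import csv
import io
import json


def tsv_to_json(tsv_text: str) -> str:
    """Convert TSV with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return json.dumps(rows, ensure_ascii=False, indent=2)


def json_to_tsv(json_text: str) -> str:
    """Convert a JSON array of flat objects back into TSV."""
    rows = json.loads(json_text)
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


tsv = "id\ten\tes\n1\thello\thola\n"
print(json_to_tsv(tsv_to_json(tsv)) == tsv)  # True
```

`ensure_ascii=False` keeps accented characters readable in the JSON output instead of escaping them.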
Created by: RSK World
Website: https://rskworld.in
Email: help@rskworld.in
Phone: +91 93305 39277
Dataset Preview
| ID | English | Spanish | French | German |
|---|---|---|---|---|
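Given rows with the columns shown in the preview header, source/target pairs for any one direction can be pulled out as below. The two sample rows are made up for illustration and are not taken from the dataset; `extract_pairs` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Illustrative rows only, using the column names from the preview header.
ROWS: List[Dict[str, str]] = [
    {"ID": "1", "English": "Hello", "Spanish": "Hola", "French": "Bonjour", "German": "Hallo"},
    {"ID": "2", "English": "Thank you", "Spanish": "Gracias", "French": "Merci", "German": "Danke"},
]


def extract_pairs(rows: List[Dict[str, str]], source: str, target: str) -> List[Tuple[str, str]]:
    """Return (source, target) sentence pairs, skipping rows with gaps."""
    return [
        (row[source], row[target])
        for row in rows
        if row.get(source) and row.get(target)
    ]


print(extract_pairs(ROWS, "English", "German"))
# [('Hello', 'Hallo'), ('Thank you', 'Danke')]
```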
Dataset Features
- Parallel sentences
- Multiple language pairs
- Aligned translations
- Training and validation sets
- Ready for translation models