multi-language-chatbot
Multi-language Chatbot - Python + Flask + OpenAI API + NLP + Translation + Language Detection + Cultural Adaptation
modules/language_detector.py
"""
Language Detection Module
Author: RSK World (https://rskworld.in)
Founder: Molla Samser
Designer & Tester: Rima Khatun
Contact: help@rskworld.in, +91 93305 39277
Year: 2026
Description: Advanced language detection with confidence scoring
"""

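import logging
import re

from langdetect import DetectorFactory, detect_langs
from langdetect.lang_detect_exception import LangDetectException

# langdetect is non-deterministic by default; a fixed seed makes results repeatable.
DetectorFactory.seed = 0

logger = logging.getLogger(__name__)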

class LanguageDetector:
    def __init__(self):
        self.confidence_threshold = 0.7
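        # Order matters: patterns are checked top to bottom, so kana ('ja') comes
        # before Han ideographs ('zh'); otherwise Japanese text containing kanji
        # would be reported as Chinese.
        self.language_patterns = {
            'hi': r'[\u0900-\u097F]',  # Hindi (Devanagari)
            'bn': r'[\u0980-\u09FF]',  # Bengali
            'ar': r'[\u0600-\u06FF]',  # Arabic
            'ja': r'[\u3040-\u309F\u30A0-\u30FF]',  # Japanese (hiragana, katakana)
            'zh': r'[\u4E00-\u9FFF]',  # Chinese (CJK unified ideographs)
            'ru': r'[\u0400-\u04FF]',  # Russian (Cyrillic)
            'th': r'[\u0E00-\u0E7F]',  # Thai
            'ko': r'[\uAC00-\uD7AF]',  # Korean (Hangul syllables)
        }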
        
    def detect(self, text):
        """
        Detect the language of the given text.
        Returns a language code (e.g., 'en', 'hi', 'bn').
        Falls back to 'en' when the text is empty or detection fails.
        """
        try:
            if not text or not text.strip():
                return 'en'

            text = text.strip()

            # Try script-based detection first. It also works for very
            # short input such as '你好', which is too short for langdetect.
            pattern_result = self._detect_by_pattern(text)
            if pattern_result:
                return pattern_result

            # Too little text for statistical detection
            if len(text) < 3:
                return 'en'

            # Use langdetect for general detection
            try:
                langs = detect_langs(text)
                if langs:
                    if langs[0].prob < self.confidence_threshold:
                        logger.debug(
                            "Low-confidence detection: %s (%.2f)",
                            langs[0].lang, langs[0].prob
                        )
                    # The top candidate is returned even at low confidence
                    return langs[0].lang
            except LangDetectException:
                pass

            # Fallback to basic detection
            return self._basic_detection(text)

        except Exception as e:
            logger.error("Language detection error: %s", e)
            return 'en'
    
    def _detect_by_pattern(self, text):
        """Detect language using Unicode character patterns"""
        for lang_code, pattern in self.language_patterns.items():
            matches = len(re.findall(pattern, text))
            # Require a significant portion of the text to match:
            # at least one character and at least 10% of all characters
            if matches and matches >= len(text) * 0.1:
                return lang_code
        return None
    
    def _basic_detection(self, text):
        """Basic language detection using simple keyword heuristics"""
        text_lower = text.lower()

        # Common words/phrases in different languages. 'no' is left out of the
        # Spanish and Italian lists because it is also an English word.
        language_indicators = {
            'hi': ['नमस्ते', 'धन्यवाद', 'कृपया', 'हाँ', 'नहीं'],
            'bn': ['হ্যালো', 'ধন্যবাদ', 'অনুগ্রহ', 'হ্যাঁ', 'না'],
            'es': ['hola', 'gracias', 'por favor', 'sí'],
            'fr': ['bonjour', 'merci', "s'il vous plaît", 'oui', 'non'],
            'de': ['hallo', 'danke', 'bitte', 'ja', 'nein'],
            'pt': ['olá', 'obrigado', 'por favor', 'sim', 'não'],
            'it': ['ciao', 'grazie', 'per favore', 'sì'],
            'ru': ['привет', 'спасибо', 'пожалуйста', 'да', 'нет'],
            'ja': ['こんにちは', 'ありがとう', 'お願いします', 'はい', 'いいえ'],
            'zh': ['你好', '谢谢', '请', '是', '不是'],
            'ar': ['مرحبا', 'شكرا', 'من فضلك', 'نعم', 'لا'],
        }

        # These languages separate words with spaces, so match whole words
        # only; otherwise 'ja' would fire inside 'jam' and 'non' inside 'none'.
        whole_word_langs = {'es', 'fr', 'de', 'pt', 'it', 'ru'}

        for lang_code, indicators in language_indicators.items():
            for indicator in indicators:
                if lang_code in whole_word_langs:
                    found = re.search(
                        rf'(?<!\w){re.escape(indicator)}(?!\w)', text_lower
                    )
                else:
                    found = indicator in text_lower
                if found:
                    return lang_code

        return 'en'  # Default to English
    
    def get_confidence(self):
        """
        Get confidence score of last detection.

        Placeholder: detect() does not store its score, so this always
        returns a fixed default of 0.85 rather than a measured value.
        """
        return 0.85
    
    def detect_multiple(self, text, top_k=3):
        """Detect multiple possible languages with confidence scores"""
        try:
            # Keep only the first top_k candidates returned by detect_langs
            return [
                {'language': lang.lang, 'confidence': lang.prob}
                for lang in detect_langs(text)[:top_k]
            ]
        except Exception as e:
            logger.error("Multiple language detection error: %s", e)
            return [{'language': 'en', 'confidence': 0.5}]
    
    def is_supported(self, language_code):
        """Check if language is supported"""
        supported_languages = {
            'en', 'hi', 'bn', 'es', 'fr', 'de', 'zh', 'ja', 'ar', 'pt', 'ru', 'it'
        }
        return language_code in supported_languages
    
    def get_language_name(self, language_code):
        """Get full language name from code"""
        language_names = {
            'en': 'English',
            'hi': 'हिन्दी (Hindi)',
            'bn': 'বাংলা (Bengali)',
            'es': 'Español (Spanish)',
            'fr': 'Français (French)',
            'de': 'Deutsch (German)',
            'zh': '中文 (Chinese)',
            'ja': '日本語 (Japanese)',
            'ar': 'العربية (Arabic)',
            'pt': 'Português (Portuguese)',
            'ru': 'Русский (Russian)',
            'it': 'Italiano (Italian)'
        }
        return language_names.get(language_code, 'Unknown')
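
The script-based step in `_detect_by_pattern` can be tried in isolation. The following is a minimal standalone sketch, not part of the module: `SCRIPT_PATTERNS`, `detect_script`, and `min_share` are hypothetical names chosen for illustration, and only three of the module's patterns are included. It assumes the same 10% rule and checks kana before Han ideographs.

```python
import re
from typing import Optional

# Hypothetical subset of the module's patterns. Order matters: kana is checked
# before Han ideographs so Japanese text containing kanji is not reported as Chinese.
SCRIPT_PATTERNS = {
    'ja': r'[\u3040-\u309F\u30A0-\u30FF]',  # hiragana, katakana
    'zh': r'[\u4E00-\u9FFF]',               # CJK unified ideographs
    'ru': r'[\u0400-\u04FF]',               # Cyrillic
}


def detect_script(text: str, min_share: float = 0.1) -> Optional[str]:
    """Return the first language whose script covers at least min_share of the text."""
    text = text.strip()
    if not text:
        return None
    for lang_code, pattern in SCRIPT_PATTERNS.items():
        # Share of characters (including spaces) that belong to this script
        share = len(re.findall(pattern, text)) / len(text)
        if share >= min_share:
            return lang_code
    return None
```

For example, `detect_script('你好世界')` yields `'zh'`, while a single ideograph inside an otherwise English sentence stays below the threshold and yields `None`.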
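
`get_confidence` in the module returns a fixed value because no score is recorded during detection. One possible way to record it is sketched below. This is an assumption about how it could be done, not the author's implementation: `ConfidenceTrackingDetector`, `rank_languages`, and `fake_ranker` are hypothetical names, and `fake_ranker` is a stand-in with made-up scores so the sketch runs without `langdetect` installed. In real use, `rank_languages` would wrap `detect_langs` and convert its results to `(code, probability)` pairs.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (language code, probability)


class ConfidenceTrackingDetector:
    """Hypothetical wrapper that remembers the score of its last detection."""

    def __init__(self, rank_languages: Callable[[str], List[Candidate]],
                 default_language: str = 'en'):
        # rank_languages stands in for a langdetect-based ranker; it is
        # expected to return candidates sorted by descending probability.
        self._rank_languages = rank_languages
        self._default_language = default_language
        self._last_confidence = 0.0

    def detect(self, text: str) -> str:
        candidates = self._rank_languages(text) if text and text.strip() else []
        if not candidates:
            # Nothing detected: report the default with zero confidence
            self._last_confidence = 0.0
            return self._default_language
        language, probability = candidates[0]
        self._last_confidence = probability
        return language

    def get_confidence(self) -> float:
        """Return the probability recorded by the most recent detect() call."""
        return self._last_confidence


def fake_ranker(text: str) -> List[Candidate]:
    # Illustrative stand-in with made-up scores, not real langdetect output.
    if 'hola' in text.lower():
        return [('es', 0.9), ('pt', 0.1)]
    return []


detector = ConfidenceTrackingDetector(fake_ranker)
```

Storing the score on the instance keeps the existing `detect()` / `get_confidence()` call pattern, at the cost of not being safe to share one detector across concurrent requests.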
