Speech Recognition Dataset - Audio AI + Speech-to-Text + Voice Recognition
scripts
  • __init__.py (848 B)
  • augmentation.py (13.9 KB)
  • evaluate_model.py (13.9 KB)
  • example_usage.py (5.3 KB)
  • generate_sample_audio.py (10.2 KB)
  • load_dataset.py (9 KB)
  • preprocess.py (8.5 KB)
  • train_model.py (9.5 KB)
  • transformer_model.py (14.9 KB)
scripts/generate_sample_audio.py
"""
============================================================================
Speech Recognition Dataset - Sample Audio Generator with Text-to-Speech
============================================================================

Project: Speech Recognition Dataset
Description: Audio speech recognition dataset with labeled speech samples 
             for training speech-to-text and voice recognition models.

============================================================================
DEVELOPER INFORMATION
============================================================================
Website: https://rskworld.in
Founded by: Molla Samser
Designer & Tester: Rima Khatun
Email: help@rskworld.in
Support: support@rskworld.in
Phone: +91 93305 39277
Address: Nutanhat, Mongolkote, Purba Burdwan, West Bengal, India, 713147

============================================================================
COPYRIGHT NOTICE
============================================================================
© 2026 RSK World. All rights reserved.
This dataset is provided for educational and research purposes.

============================================================================

This script generates sample audio files using Text-to-Speech (TTS).
It speaks the actual transcripts like "Hello, how are you today?"
"""

import numpy as np
import pandas as pd
from pathlib import Path
from scipy.io import wavfile
import os

# Try to import TTS libraries
TTS_ENGINE = None

try:
    import pyttsx3
    TTS_ENGINE = 'pyttsx3'
    print("[OK] Found pyttsx3 - Will generate real speech!")
except ImportError:
    print("[!] pyttsx3 not found")
    try:
        from gtts import gTTS
        TTS_ENGINE = 'gtts'
        print("[OK] Found gTTS - Will generate real speech (requires internet)")
    except ImportError:
        print("[!] gTTS not found")
        print("[!] Install TTS: pip install pyttsx3")
        print("[!] Or: pip install gtts")


class SpeechAudioGenerator:
    """
    Generate speech audio files using Text-to-Speech.
    Speaks the actual words like "Hello, how are you today?"
    """
    
    def __init__(self, output_dir='data/audio', sr=22050):
        """
        Initialize the speech generator.
        
        Args:
            output_dir: Directory to save audio files
            sr: Sample rate (22050 is common for speech)
        """
        self.output_dir = Path(output_dir)
        self.sr = sr
        self.output_dir.mkdir(parents=True, exist_ok=True)
        
        # Initialize TTS engine
        self.engine = None
        self.voices = []
        
        if TTS_ENGINE == 'pyttsx3':
            try:
                self.engine = pyttsx3.init()
                # Configure for natural speech
                self.engine.setProperty('rate', 150)  # Words per minute
                self.engine.setProperty('volume', 0.9)
                
                # Get available voices
                self.voices = self.engine.getProperty('voices') or []
                print(f"[OK] Initialized pyttsx3 with {len(self.voices)} voice(s)")
                
                # Show available voices
                for i, voice in enumerate(self.voices):
                    print(f"    Voice {i}: {voice.name}")
            except Exception as e:
                print(f"[!] Error initializing pyttsx3: {e}")
                self.engine = None
    
    def generate_speech_pyttsx3(self, text, output_path, voice_index=0):
        """
        Generate speech using pyttsx3 (SAPI5 on Windows, NSSpeechSynthesizer
        on macOS, eSpeak on Linux).
        
        Args:
            text: Text to speak
            output_path: Where to save the audio
            voice_index: Which voice to use
            
        Returns:
            True if successful
        """
        if not self.engine:
            return False
        
        try:
            # Select voice
            if self.voices and voice_index < len(self.voices):
                self.engine.setProperty('voice', self.voices[voice_index].id)
            
            # Generate speech and save to file
            self.engine.save_to_file(text, str(output_path))
            self.engine.runAndWait()
            
            # Verify file was created (accepts a str or a Path)
            output_path = Path(output_path)
            return output_path.exists() and output_path.stat().st_size > 0
            
        except Exception as e:
            print(f"[!] pyttsx3 error for '{text[:30]}...': {e}")
            return False
    
    def generate_speech_gtts(self, text, output_path):
        """
        Generate speech using Google Text-to-Speech.
        
        Args:
            text: Text to speak
            output_path: Target path; converted to WAV if pydub is installed,
                otherwise the audio is left as MP3
            
        Returns:
            True if successful
        """
        try:
            from gtts import gTTS
            
            tts = gTTS(text=text, lang='en', slow=False)
            
            # gTTS always writes MP3 data, so save beside the target path first
            output_path = Path(output_path)
            mp3_path = output_path.with_suffix('.mp3')
            tts.save(str(mp3_path))
            
            # Try to convert to WAV
            try:
                from pydub import AudioSegment
                audio = AudioSegment.from_mp3(str(mp3_path))
                audio = audio.set_frame_rate(self.sr)
                audio.export(str(output_path), format='wav')
                os.remove(mp3_path)
            except ImportError:
                # pydub is missing: keep the MP3 that gTTS already wrote
                print(f"    [!] pydub not found - kept {mp3_path.name}")
            
            return True
            
        except Exception as e:
            print(f"[!] gTTS error: {e}")
            return False
    
    def generate_fallback(self, text, output_path, duration):
        """
        Generate fallback audio when TTS is not available.
        Creates speech-like synthetic sounds.
        """
        num_samples = int(duration * self.sr)
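        # Sample times spaced exactly 1/sr apart (linspace would include the
        # endpoint and stretch the spacing slightly)
        t = np.arange(num_samples) / self.sr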
        
        # Fundamental frequency (pitch)
        f0 = 120
        
        # Generate harmonics
        audio = np.zeros(num_samples)
        for harmonic in range(1, 8):
            freq = f0 * harmonic
            amp = 1.0 / harmonic
            audio += amp * np.sin(2 * np.pi * freq * t)
        
        # Create envelope based on words
        words = text.split() if text else ['word']
        word_dur = duration / max(len(words), 1)
        
        envelope = np.zeros(num_samples)
        for i in range(len(words)):
            start = int(i * word_dur * self.sr)
            end = int((i + 0.7) * word_dur * self.sr)
            end = min(end, num_samples)
            if start < num_samples and end > start:
                length = end - start
                envelope[start:end] = np.hanning(length)
        
        audio *= envelope
        
        # Add subtle noise
        audio += np.random.randn(num_samples) * 0.03
        
        # Normalize
        if np.max(np.abs(audio)) > 0:
            audio = audio / np.max(np.abs(audio)) * 0.7
        
        # Save
        audio_int16 = (audio * 32767).astype(np.int16)
        wavfile.write(str(output_path), self.sr, audio_int16)
    
    def generate_from_metadata(self, metadata_path='data/metadata.csv'):
        """
        Generate speech audio files from metadata.
        
        Args:
            metadata_path: Path to CSV with transcripts
        """
        metadata = pd.read_csv(metadata_path)
        
        print("\n" + "=" * 60)
        print("GENERATING SPEECH AUDIO")
        print("=" * 60)
        print(f"Total files: {len(metadata)}")
        print(f"Output: {self.output_dir}")
        print(f"TTS Engine: {TTS_ENGINE or 'Fallback (synthetic)'}")
        print("=" * 60 + "\n")
        
        success_tts = 0
        success_fallback = 0
        
        for idx, row in metadata.iterrows():
            file_name = row['file_name']
            duration = row['duration']
            transcript = row['transcript']
            speaker = row.get('speaker', 'Speaker_001')
            
            output_path = self.output_dir / file_name
            
            # Progress
            print(f"[{idx+1}/{len(metadata)}] {file_name}")
            print(f"    Text: \"{transcript}\"")
            
            success = False
            
            # Try pyttsx3
            if TTS_ENGINE == 'pyttsx3' and self.engine:
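                # Vary voice based on speaker; tolerate missing or
                # non-numeric IDs (e.g. NaN or "Speaker_A") by using voice 0
                suffix = str(speaker).rsplit('_', 1)[-1]
                speaker_num = int(suffix) if suffix.isdecimal() else 0
                voice_idx = speaker_num % max(1, len(self.voices))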
                success = self.generate_speech_pyttsx3(transcript, output_path, voice_idx)
            
            # Try gTTS
            elif TTS_ENGINE == 'gtts':
                success = self.generate_speech_gtts(transcript, output_path)
            
            if success:
                print("    [OK] Generated with TTS")
                success_tts += 1
            else:
                # Fallback
                self.generate_fallback(transcript, output_path, duration)
                print("    [OK] Generated with fallback")
                success_fallback += 1
        
        # Summary
        print("\n" + "=" * 60)
        print("GENERATION COMPLETE")
        print("=" * 60)
        print(f"TTS generated:      {success_tts} files")
        print(f"Fallback generated: {success_fallback} files")
        print(f"Total:              {len(metadata)} files")
        print(f"Location:           {self.output_dir}")
        print("=" * 60)


def main():
    """Generate speech audio files."""
    print("\n" + "=" * 60)
    print("SPEECH RECOGNITION DATASET")
    print("Audio Generator with Text-to-Speech")
    print("RSK World - https://rskworld.in")
    print("=" * 60)
    
    # Check for TTS
    if not TTS_ENGINE:
        print("\n[!] No TTS engine found!")
        print("    To generate real speech, install pyttsx3:")
        print("    pip install pyttsx3")
        print("\n    Continuing with fallback audio...")
    
    generator = SpeechAudioGenerator(
        output_dir='data/audio',
        sr=22050  # Good quality for speech
    )
    
    generator.generate_from_metadata('data/metadata.csv')
    
    print("\n" + "=" * 60)
    print("(C) 2026 RSK World. All rights reserved.")
    print("=" * 60 + "\n")


if __name__ == '__main__':
    main()
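
The generator above reads `data/metadata.csv` and looks up the columns `file_name`, `duration`, `transcript`, and (optionally) `speaker`, but the CSV itself is not shown on this page. The following is a minimal sketch of one way to create a compatible file using only the standard library. The helper names `write_sample_metadata` and `read_metadata`, the sample file names, durations, and speaker IDs are all illustrative assumptions, not part of the dataset.

```python
import csv
from pathlib import Path

# Column names that generate_from_metadata() reads; 'speaker' is optional there.
FIELDNAMES = ["file_name", "duration", "transcript", "speaker"]

# Hypothetical sample rows: file names, durations, and speakers are made up.
SAMPLE_ROWS = [
    {"file_name": "sample_001.wav", "duration": 2.5,
     "transcript": "Hello, how are you today?", "speaker": "Speaker_001"},
    {"file_name": "sample_002.wav", "duration": 1.5,
     "transcript": "Good morning", "speaker": "Speaker_002"},
]


def write_sample_metadata(path="data/metadata.csv", rows=SAMPLE_ROWS):
    """Write a small metadata CSV (hypothetical helper) and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_metadata(path):
    """Read the CSV back as a list of dicts to confirm its layout."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `write_sample_metadata()` with no arguments writes to `data/metadata.csv`, which is the path `main()` passes to `generate_from_metadata`. The `csv` module quotes the transcript that contains a comma, so it should round-trip through `pd.read_csv` as a single field.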
scripts/__init__.py
"""
============================================================================
Speech Recognition Dataset - Scripts Package
============================================================================

Project: Speech Recognition Dataset
Website: https://rskworld.in
Founded by: Molla Samser
Designer & Tester: Rima Khatun
Email: help@rskworld.in
Support: support@rskworld.in
Phone: +91 93305 39277

============================================================================
COPYRIGHT NOTICE
============================================================================
© 2026 RSK World. All rights reserved.
This dataset is provided for educational and research purposes.

============================================================================
"""

__version__ = '1.0.0'
__author__ = 'Molla Samser - RSK World'
