Speech Recognition Dataset - Audio AI + Speech-to-Text + Voice Recognition
scripts
  • __init__.py (848 B)
  • augmentation.py (13.9 KB)
  • evaluate_model.py (13.9 KB)
  • example_usage.py (5.3 KB)
  • generate_sample_audio.py (10.2 KB)
  • load_dataset.py (9 KB)
  • preprocess.py (8.5 KB)
  • train_model.py (9.5 KB)
  • transformer_model.py (14.9 KB)
scripts/train_model.py
"""
============================================================================
Speech Recognition Dataset - Model Training Script
============================================================================

Project: Speech Recognition Dataset
Description: Audio speech recognition dataset with labeled speech samples 
             for training speech-to-text and voice recognition models.

============================================================================
DEVELOPER INFORMATION
============================================================================
Website: https://rskworld.in
Founded by: Molla Samser
Designer & Tester: Rima Khatun
Email: help@rskworld.in
Support: support@rskworld.in
Phone: +91 93305 39277
Address: Nutanhat, Mongolkote, Purba Burdwan, West Bengal, India, 713147

============================================================================
COPYRIGHT NOTICE
============================================================================
© 2026 RSK World. All rights reserved.
This dataset is provided for educational and research purposes.

============================================================================
"""

import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
import pickle

class SpeechRecognitionModel:
    """
    LSTM-based model for speech recognition
    """
    
    def __init__(self, feature_dir='data/features', model_dir='models'):
        """
        Initialize the model trainer
        
        Args:
            feature_dir: Directory containing extracted features
            model_dir: Directory to save trained models
        """
        self.feature_dir = Path(feature_dir)
        self.model_dir = Path(model_dir)
        self.model_dir.mkdir(parents=True, exist_ok=True)
        
        self.model = None
        self.label_encoder = LabelEncoder()
    
    def load_features(self, feature_name='mfcc'):
        """
        Load all features from the dataset
        
        Args:
            feature_name: Name of the feature to load
            
        Returns:
            X: Feature arrays
            y: Labels (speakers or transcripts)
            metadata: Metadata dataframe
        """
        metadata_path = self.feature_dir / 'features_metadata.csv'
        metadata = pd.read_csv(metadata_path)
        
        X = []
        y = []
        
        print(f"Loading {feature_name} features...")
        for idx, row in metadata.iterrows():
            file_id = row['id']
            feature_path = self.feature_dir / f"{file_id}_{feature_name}.npy"
            
            if feature_path.exists():
                feature = np.load(feature_path)
                X.append(feature)
                # Using speaker as label for speaker recognition
                # For speech-to-text, you would use transcript
                y.append(row['speaker'])
        
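        # Return X as a plain list: np.array(X, dtype=object) would collapse
        # into one N-D object array whenever all features share a shape
        return X, np.array(y), metadata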
    
    def pad_sequences(self, sequences, max_length=None):
        """
        Pad sequences to the same length
        
        Args:
            sequences: List of variable-length sequences
            max_length: Maximum length (if None, uses max sequence length)
            
        Returns:
            Padded sequences array
        """
        if max_length is None:
            max_length = max(len(seq) for seq in sequences)
        
        padded = []
        for seq in sequences:
            if len(seq) < max_length:
                # Pad with zeros
                pad_width = max_length - len(seq)
                padded_seq = np.pad(seq, ((0, pad_width), (0, 0)), mode='constant')
            else:
                # Truncate if longer
                padded_seq = seq[:max_length]
            padded.append(padded_seq)
        
        # Force a numeric dtype so Keras never receives an object array
        return np.asarray(padded, dtype=np.float32)
    
    def build_model(self, input_shape, num_classes):
        """
        Build LSTM model for speech recognition
        
        Args:
            input_shape: Shape of input data (timesteps, features)
            num_classes: Number of output classes
            
        Returns:
            Compiled Keras model
        """
        model = Sequential([
            Bidirectional(LSTM(128, return_sequences=True), input_shape=input_shape),
            Dropout(0.3),
            Bidirectional(LSTM(64, return_sequences=True)),
            Dropout(0.3),
            LSTM(32),
            Dropout(0.3),
            Dense(64, activation='relu'),
            Dropout(0.2),
            Dense(num_classes, activation='softmax')
        ])
        
        model.compile(
            optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )
        
        return model
    
    def train(self, X, y, test_size=0.2, validation_size=0.1, epochs=50, batch_size=32):
        """
        Train the model
        
        Args:
            X: Feature arrays
            y: Labels
            test_size: Proportion of data for testing
            validation_size: Proportion of training data for validation
            epochs: Number of training epochs
            batch_size: Batch size for training
        """
        # Encode labels
        y_encoded = self.label_encoder.fit_transform(y)
        num_classes = len(self.label_encoder.classes_)
        y_categorical = to_categorical(y_encoded, num_classes)
        
        # Pad sequences
        print("Padding sequences...")
        X_padded = self.pad_sequences(X)
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(
            X_padded, y_categorical, test_size=test_size, random_state=42, stratify=y_encoded
        )
        
        X_train, X_val, y_train, y_val = train_test_split(
            X_train, y_train, test_size=validation_size, random_state=42
        )
        
        # Build model
        input_shape = (X_padded.shape[1], X_padded.shape[2])
        self.model = self.build_model(input_shape, num_classes)
        
        print("\nModel Architecture:")
        self.model.summary()
        
        # Callbacks
        callbacks = [
            EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
            ModelCheckpoint(
                # str() keeps this working on Keras versions that expect a string path
                str(self.model_dir / 'best_model.h5'),
                monitor='val_accuracy',
                save_best_only=True,
                verbose=1
            )
        ]
        
        # Train model
        print("\nTraining model...")
        history = self.model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=epochs,
            batch_size=batch_size,
            callbacks=callbacks,
            verbose=1
        )
        
        # Evaluate on test set
        print("\nEvaluating on test set...")
        test_loss, test_accuracy = self.model.evaluate(X_test, y_test, verbose=0)
        print(f"Test Accuracy: {test_accuracy:.4f}")
        print(f"Test Loss: {test_loss:.4f}")
        
        # Save label encoder
        with open(self.model_dir / 'label_encoder.pkl', 'wb') as f:
            pickle.dump(self.label_encoder, f)
        
        # Save final model
        self.model.save(str(self.model_dir / 'final_model.h5'))
        
        print(f"\nModel saved to: {self.model_dir}")
        return history
    
    def predict(self, feature_array):
        """
        Make predictions on new audio features
        
        Args:
            feature_array: Feature array from audio file
            
        Returns:
            Predicted class and probabilities
        """
        if self.model is None:
            # Load saved model
            model_path = self.model_dir / 'best_model.h5'
            if model_path.exists():
                self.model = tf.keras.models.load_model(model_path)
            else:
                raise ValueError("Model not found. Please train the model first.")
        
        # Load label encoder
        with open(self.model_dir / 'label_encoder.pkl', 'rb') as f:
            self.label_encoder = pickle.load(f)
        
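        # Pad or truncate to the sequence length the model was trained on;
        # padding only to the sample's own length would not match the input shape
        X = self.pad_sequences([feature_array], max_length=self.model.input_shape[1])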
        
        # Predict
        probabilities = self.model.predict(X)[0]
        predicted_class_idx = np.argmax(probabilities)
        predicted_class = self.label_encoder.inverse_transform([predicted_class_idx])[0]
        
        return predicted_class, probabilities


def main():
    """Main function to train the model"""
    # Initialize model trainer
    trainer = SpeechRecognitionModel(
        feature_dir='data/features',
        model_dir='models'
    )
    
    # Load features
    X, y, metadata = trainer.load_features(feature_name='mfcc')
    
    print("\nDataset Info:")
    print(f"Total samples: {len(X)}")
    print(f"Number of classes: {len(np.unique(y))}")
    print(f"Feature shape (sample): {X[0].shape}")
    
    # Train model
    history = trainer.train(
        X, y,
        test_size=0.2,
        validation_size=0.1,
        epochs=50,
        batch_size=32
    )
    
    print("\nTraining completed successfully!")


if __name__ == '__main__':
    main()
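
The `pad_sequences` method in the script above zero-pads short feature matrices and truncates long ones along the time axis. Below is a minimal standalone sketch of that behaviour, assuming each feature is a 2-D array shaped `(timesteps, n_features)`. The helper name `pad_to_length` and the toy inputs are hypothetical and not part of the project.

```python
import numpy as np


def pad_to_length(sequences, max_length=None):
    """Zero-pad or truncate 2-D arrays along axis 0 to a common length."""
    if max_length is None:
        # Default to the longest sequence in the batch
        max_length = max(len(seq) for seq in sequences)

    padded = []
    for seq in sequences:
        seq = np.asarray(seq, dtype=np.float32)
        if len(seq) < max_length:
            # Append rows of zeros after the last real frame
            pad_rows = max_length - len(seq)
            seq = np.pad(seq, ((0, pad_rows), (0, 0)), mode="constant")
        else:
            # Drop frames beyond max_length
            seq = seq[:max_length]
        padded.append(seq)

    # Stack into one (batch, timesteps, n_features) float array
    return np.stack(padded)


# Two toy feature matrices with 3 and 5 frames of 2 coefficients each
batch = pad_to_length([np.ones((3, 2)), np.ones((5, 2))])
print(batch.shape)

# At inference time, pass the training length explicitly
clipped = pad_to_length([np.ones((5, 2))], max_length=4)
print(clipped.shape)
```

Passing `max_length` explicitly matters at prediction time: a single sample padded only to its own length will generally not match the sequence length the network was trained with.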

scripts/evaluate_model.py
"""
============================================================================
Speech Recognition Dataset - Model Evaluation Script
============================================================================

Project: Speech Recognition Dataset
Description: Audio speech recognition dataset with labeled speech samples 
             for training speech-to-text and voice recognition models.

============================================================================
DEVELOPER INFORMATION
============================================================================
Website: https://rskworld.in
Founded by: Molla Samser
Designer & Tester: Rima Khatun
Email: help@rskworld.in
Support: support@rskworld.in
Phone: +91 93305 39277
Address: Nutanhat, Mongolkote, Purba Burdwan, West Bengal, India, 713147

============================================================================
COPYRIGHT NOTICE
============================================================================
© 2026 RSK World. All rights reserved.
This dataset is provided for educational and research purposes.

============================================================================

This script provides comprehensive model evaluation including:
- Confusion matrix
- Classification report
- ROC curves
- Precision-Recall curves
- Error analysis
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from sklearn.metrics import (
    confusion_matrix, classification_report, accuracy_score,
    precision_recall_fscore_support, roc_curve, auc,
    precision_recall_curve, average_precision_score
)
from sklearn.preprocessing import LabelEncoder, label_binarize
import tensorflow as tf
import pickle
import json


class ModelEvaluator:
    """
    Comprehensive model evaluation for speech recognition.
    """
    
    def __init__(self, model_path, label_encoder_path, output_dir='evaluation'):
        """
        Initialize the evaluator.
        
        Args:
            model_path: Path to trained model
            label_encoder_path: Path to label encoder
            output_dir: Directory to save evaluation results
        """
        self.model_path = Path(model_path)
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        
        # Load model
        print(f"Loading model from {model_path}...")
        self.model = tf.keras.models.load_model(str(model_path))
        
        # Load label encoder
        with open(label_encoder_path, 'rb') as f:
            self.label_encoder = pickle.load(f)
        
        self.classes = self.label_encoder.classes_
        self.num_classes = len(self.classes)
    
    def evaluate(self, X_test, y_test):
        """
        Perform comprehensive evaluation.
        
        Args:
            X_test: Test features, padded to the model's input length
            y_test: Test labels (not encoded)
            
        Returns:
            Dictionary with evaluation metrics
        """
        # Encode labels
        y_true = self.label_encoder.transform(y_test)
        y_true_onehot = label_binarize(y_true, classes=range(self.num_classes))
        
        # Get predictions
        print("Making predictions...")
        y_pred_proba = self.model.predict(X_test)
        y_pred = np.argmax(y_pred_proba, axis=1)
        
        # Calculate metrics
        metrics = {}
        
        # Basic accuracy
        metrics['accuracy'] = accuracy_score(y_true, y_pred)
        print(f"\nAccuracy: {metrics['accuracy']:.4f}")
        
        # Precision, Recall, F1
        precision, recall, f1, support = precision_recall_fscore_support(
            y_true, y_pred, average='weighted'
        )
        metrics['precision'] = precision
        metrics['recall'] = recall
        metrics['f1_score'] = f1
        
        print(f"Precision: {precision:.4f}")
        print(f"Recall: {recall:.4f}")
        print(f"F1 Score: {f1:.4f}")
        
        # Per-class metrics
        metrics['per_class'] = precision_recall_fscore_support(
            y_true, y_pred, average=None
        )
        
        # Generate visualizations
        self._plot_confusion_matrix(y_true, y_pred)
        self._plot_roc_curves(y_true_onehot, y_pred_proba)
        self._plot_precision_recall_curves(y_true_onehot, y_pred_proba)
        self._generate_classification_report(y_true, y_pred)
        self._analyze_errors(X_test, y_test, y_true, y_pred, y_pred_proba)
        
        # Save metrics
        self._save_metrics(metrics)
        
        return metrics
    
    def _plot_confusion_matrix(self, y_true, y_pred):
        """Plot and save confusion matrix."""
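        # Fix the label set so the matrix always matches self.classes,
        # even if a class is missing from the test set
        cm = confusion_matrix(y_true, y_pred, labels=np.arange(self.num_classes))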
        
        plt.figure(figsize=(12, 10))
        sns.heatmap(
            cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=self.classes,
            yticklabels=self.classes
        )
        plt.title('Confusion Matrix', fontsize=16, fontweight='bold')
        plt.xlabel('Predicted', fontsize=12)
        plt.ylabel('Actual', fontsize=12)
        plt.tight_layout()
        plt.savefig(self.output_dir / 'confusion_matrix.png', dpi=300)
        plt.close()
        
        print(f"Confusion matrix saved to {self.output_dir / 'confusion_matrix.png'}")
        
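        # Normalized confusion matrix (rows with no true samples stay at zero
        # instead of producing NaN from a division by zero)
        row_sums = cm.sum(axis=1, keepdims=True)
        cm_normalized = np.divide(
            cm, row_sums, out=np.zeros(cm.shape, dtype=float), where=row_sums != 0
        )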
        
        plt.figure(figsize=(12, 10))
        sns.heatmap(
            cm_normalized, annot=True, fmt='.2f', cmap='Blues',
            xticklabels=self.classes,
            yticklabels=self.classes
        )
        plt.title('Normalized Confusion Matrix', fontsize=16, fontweight='bold')
        plt.xlabel('Predicted', fontsize=12)
        plt.ylabel('Actual', fontsize=12)
        plt.tight_layout()
        plt.savefig(self.output_dir / 'confusion_matrix_normalized.png', dpi=300)
        plt.close()
    
    def _plot_roc_curves(self, y_true_onehot, y_pred_proba):
        """Plot ROC curves for each class."""
        plt.figure(figsize=(12, 8))
        
        colors = plt.cm.rainbow(np.linspace(0, 1, self.num_classes))
        
        # Calculate ROC for each class
        all_fpr = []
        all_tpr = []
        all_auc = []
        
        for i, (class_name, color) in enumerate(zip(self.classes, colors)):
            if self.num_classes > 2:
                fpr, tpr, _ = roc_curve(y_true_onehot[:, i], y_pred_proba[:, i])
                roc_auc = auc(fpr, tpr)
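            else:
                # label_binarize returns a single column for two classes
                # (1 = second class), so flip it for the first class and
                # score each class with its own probability column
                y_bin = y_true_onehot.ravel() if i == 1 else 1 - y_true_onehot.ravel()
                fpr, tpr, _ = roc_curve(y_bin, y_pred_proba[:, i])
                roc_auc = auc(fpr, tpr)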
            
            all_fpr.append(fpr)
            all_tpr.append(tpr)
            all_auc.append(roc_auc)
            
            plt.plot(fpr, tpr, color=color, lw=2,
                    label=f'{class_name} (AUC = {roc_auc:.3f})')
        
        plt.plot([0, 1], [0, 1], 'k--', lw=2, label='Random')
        plt.xlim([0.0, 1.0])
        plt.ylim([0.0, 1.05])
        plt.xlabel('False Positive Rate', fontsize=12)
        plt.ylabel('True Positive Rate', fontsize=12)
        plt.title('ROC Curves', fontsize=16, fontweight='bold')
        plt.legend(loc='lower right', fontsize=10)
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.savefig(self.output_dir / 'roc_curves.png', dpi=300)
        plt.close()
        
        print(f"ROC curves saved to {self.output_dir / 'roc_curves.png'}")
        print(f"Average AUC: {np.mean(all_auc):.4f}")
    
    def _plot_precision_recall_curves(self, y_true_onehot, y_pred_proba):
        """Plot Precision-Recall curves."""
        plt.figure(figsize=(12, 8))
        
        colors = plt.cm.rainbow(np.linspace(0, 1, self.num_classes))
        
        for i, (class_name, color) in enumerate(zip(self.classes, colors)):
            if self.num_classes > 2:
                precision, recall, _ = precision_recall_curve(
                    y_true_onehot[:, i], y_pred_proba[:, i]
                )
                ap = average_precision_score(y_true_onehot[:, i], y_pred_proba[:, i])
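            else:
                # Binary case: one label column (1 = second class), so flip
                # it for the first class and use that class's probabilities
                y_bin = y_true_onehot.ravel() if i == 1 else 1 - y_true_onehot.ravel()
                precision, recall, _ = precision_recall_curve(y_bin, y_pred_proba[:, i])
                ap = average_precision_score(y_bin, y_pred_proba[:, i])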
            
            plt.plot(recall, precision, color=color, lw=2,
                    label=f'{class_name} (AP = {ap:.3f})')
        
        plt.xlabel('Recall', fontsize=12)
        plt.ylabel('Precision', fontsize=12)
        plt.title('Precision-Recall Curves', fontsize=16, fontweight='bold')
        plt.legend(loc='lower left', fontsize=10)
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.savefig(self.output_dir / 'precision_recall_curves.png', dpi=300)
        plt.close()
        
        print(f"Precision-Recall curves saved to {self.output_dir / 'precision_recall_curves.png'}")
    
    def _generate_classification_report(self, y_true, y_pred):
        """Generate and save classification report."""
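        # Pass labels explicitly so target_names lines up even when a class
        # is absent from the test set
        labels = np.arange(self.num_classes)
        report = classification_report(
            y_true, y_pred,
            labels=labels,
            target_names=self.classes,
            output_dict=True,
            zero_division=0
        )
        
        # Save as CSV
        report_df = pd.DataFrame(report).transpose()
        report_df.to_csv(self.output_dir / 'classification_report.csv')
        
        # Print report
        print("\nClassification Report:")
        print(classification_report(
            y_true, y_pred, labels=labels,
            target_names=self.classes, zero_division=0
        ))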
        
        # Plot per-class metrics
        plt.figure(figsize=(14, 6))
        
        metrics_df = report_df.iloc[:-3]  # Exclude avg rows
        x = range(len(metrics_df))
        width = 0.25
        
        plt.bar([i - width for i in x], metrics_df['precision'], width, label='Precision', alpha=0.8)
        plt.bar(x, metrics_df['recall'], width, label='Recall', alpha=0.8)
        plt.bar([i + width for i in x], metrics_df['f1-score'], width, label='F1-Score', alpha=0.8)
        
        plt.xlabel('Class', fontsize=12)
        plt.ylabel('Score', fontsize=12)
        plt.title('Per-Class Performance Metrics', fontsize=16, fontweight='bold')
        plt.xticks(x, metrics_df.index, rotation=45, ha='right')
        plt.legend()
        plt.grid(True, alpha=0.3, axis='y')
        plt.tight_layout()
        plt.savefig(self.output_dir / 'per_class_metrics.png', dpi=300)
        plt.close()
    
    def _analyze_errors(self, X_test, y_test, y_true, y_pred, y_pred_proba):
        """Analyze prediction errors."""
        # Find misclassified samples
        errors = y_true != y_pred
        error_indices = np.where(errors)[0]
        
        error_analysis = []
        for idx in error_indices:
            error_analysis.append({
                'index': int(idx),
                'true_label': self.classes[y_true[idx]],
                'predicted_label': self.classes[y_pred[idx]],
                'confidence': float(np.max(y_pred_proba[idx])),
                'true_label_prob': float(y_pred_proba[idx][y_true[idx]])
            })
        
        # Save error analysis
        error_df = pd.DataFrame(error_analysis)
        error_df.to_csv(self.output_dir / 'error_analysis.csv', index=False)
        
        print(f"\nTotal errors: {len(error_indices)} ({len(error_indices)/len(y_true)*100:.2f}%)")
        
        # Plot confidence distribution of errors
        if len(error_analysis) > 0:
            confidences = [e['confidence'] for e in error_analysis]
            
            plt.figure(figsize=(10, 6))
            plt.hist(confidences, bins=20, edgecolor='white', alpha=0.7)
            plt.xlabel('Prediction Confidence', fontsize=12)
            plt.ylabel('Count', fontsize=12)
            plt.title('Confidence Distribution of Misclassified Samples', fontsize=16, fontweight='bold')
            plt.grid(True, alpha=0.3)
            plt.tight_layout()
            plt.savefig(self.output_dir / 'error_confidence_distribution.png', dpi=300)
            plt.close()
        
        # Most common error pairs
        if len(error_df) > 0:
            error_pairs = error_df.groupby(['true_label', 'predicted_label']).size()
            error_pairs = error_pairs.sort_values(ascending=False)
            
            print("\nMost Common Error Pairs:")
            print(error_pairs.head(10))
    
    def _save_metrics(self, metrics):
        """Save metrics to JSON file."""
        # Convert numpy types to Python types
        def convert_to_native(obj):
            if isinstance(obj, np.ndarray):
                return obj.tolist()
            elif isinstance(obj, np.integer):
                return int(obj)
            elif isinstance(obj, np.floating):
                return float(obj)
            elif isinstance(obj, tuple):
                return [convert_to_native(item) for item in obj]
            return obj
        
        metrics_native = {k: convert_to_native(v) for k, v in metrics.items()}
        
        with open(self.output_dir / 'metrics.json', 'w') as f:
            json.dump(metrics_native, f, indent=2)
        
        print(f"\nMetrics saved to {self.output_dir / 'metrics.json'}")


def main():
    """Main function to evaluate a trained model."""
    # Load test data
    from load_dataset import SpeechRecognitionDataset
    
    # Initialize evaluator
    evaluator = ModelEvaluator(
        model_path='models/best_model.h5',
        label_encoder_path='models/label_encoder.pkl',
        output_dir='evaluation'
    )
    
    # Load test features (you would need to prepare these)
    print("Please prepare your test data (X_test, y_test) and run evaluation.")
    print("Example:")
    print("  evaluator.evaluate(X_test, y_test)")
    
    print("\nEvaluation script ready!")


if __name__ == '__main__':
    main()
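
The normalized confusion matrix in `_plot_confusion_matrix` divides each row by the number of true samples of that class. Below is a minimal NumPy-only sketch of that step, assuming integer-encoded labels in the range `0..num_classes-1`. The helper names `confusion_counts` and `normalize_rows` and the toy labels are hypothetical; the script itself uses `sklearn.metrics.confusion_matrix`.

```python
import numpy as np


def confusion_counts(y_true, y_pred, num_classes):
    """Count (true, predicted) pairs into a num_classes x num_classes matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly when the same cell is hit repeatedly
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def normalize_rows(cm):
    """Divide each row by its sum; rows with no samples stay all zero."""
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(
        cm, row_sums, out=np.zeros(cm.shape, dtype=float), where=row_sums != 0
    )


# Toy labels: class 2 never occurs, so its row sum is zero
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

cm = confusion_counts(y_true, y_pred, num_classes=3)
cm_normalized = normalize_rows(cm)
print(cm)
print(cm_normalized)
```

Each non-empty row of the normalized matrix sums to 1, so the diagonal can be read as per-class recall. Guarding the division avoids NaN cells when a class has no true samples in the test set.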

