Statistical Modeling with Statsmodels
  • __pycache__
  • data
  • examples
  • notebooks
  • .gitignore (458 B)
  • CHANGELOG.md (4 KB)
  • FEATURES.md (6.3 KB)
  • LICENSE (1.2 KB)
  • PROJECT_INFO.md (2.2 KB)
  • PROJECT_SUMMARY.md (4.2 KB)
  • README.md (7.4 KB)
  • RELEASE_NOTES_v1.0.0.md (6.5 KB)
  • UNIQUE_FEATURES.md (5.3 KB)
  • advanced_time_series.py (9.8 KB)
  • automated_reporting.py (8.3 KB)
  • bayesian_statistics.py (7.5 KB)
  • data_preprocessing.py (8.2 KB)
  • econometric_modeling.py (9.8 KB)
  • hypothesis_testing.py (12.5 KB)
  • index.html (10.8 KB)
  • model_evaluation.py (9.1 KB)
  • model_persistence.py (6.5 KB)
  • model_selection.py (9.7 KB)
  • panel_data_analysis.py (7.3 KB)
  • performance_benchmarking.py (7.3 KB)
  • regression_analysis.py (9 KB)
  • requirements.txt (361 B)
  • statistical_diagnostics.py (13.8 KB)
  • statsmodels-statistical.png (284 B)
  • time_series_analysis.py (10.3 KB)
  • visualization_utils.py (8.9 KB)
model_evaluation.py
"""
Model Evaluation and Cross-Validation Utilities

Author: RSK World
Website: https://rskworld.in
Email: help@rskworld.in
Phone: +91 93305 39277
"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, TimeSeriesSplit
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import warnings
warnings.filterwarnings('ignore')


class ModelEvaluation:
    """
    Model Evaluation and Cross-Validation Tools
    
    Author: RSK World
    Website: https://rskworld.in
    Email: help@rskworld.in
    Phone: +91 93305 39277
    """
    
    def __init__(self):
        self.cv_results = {}
    
    def cross_validate(self, X, y, model_func, cv_folds=5, scoring='mse'):
        """
        Perform k-fold cross-validation
        
        Parameters:
        -----------
        X : array-like
            Independent variables
        y : array-like
            Dependent variable
        model_func : callable
            Function that returns fitted model
        cv_folds : int
            Number of folds
        scoring : str
            Scoring metric ('mse', 'mae', 'r2')
        """
        kf = KFold(n_splits=cv_folds, shuffle=True, random_state=42)
        scores = []
        
        for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
            X_train, X_test = X[train_idx], X[test_idx]
            y_train, y_test = y[train_idx], y[test_idx]
            
            model = model_func(X_train, y_train)
            y_pred = model.predict(X_test)
            
            if scoring == 'mse':
                score = mean_squared_error(y_test, y_pred)
            elif scoring == 'mae':
                score = mean_absolute_error(y_test, y_pred)
            elif scoring == 'r2':
                score = r2_score(y_test, y_pred)
            else:
                raise ValueError("scoring must be 'mse', 'mae', or 'r2'")
            
            scores.append(score)
        
        results = {
            'mean': np.mean(scores),
            'std': np.std(scores),
            'scores': scores
        }
        
        print(f"Cross-Validation Results ({scoring.upper()}):")
        print(f"Mean: {results['mean']:.4f}")
        print(f"Std: {results['std']:.4f}")
        print(f"Individual scores: {[f'{s:.4f}' for s in scores]}")
        
        return results
    
    def time_series_cv(self, X, y, model_func, n_splits=5, scoring='mse'):
        """
        Time series cross-validation
        
        Parameters:
        -----------
        X : array-like
            Independent variables
        y : array-like
            Dependent variable
        model_func : callable
            Function that returns fitted model
        n_splits : int
            Number of splits
        scoring : str
            Scoring metric
        """
        tscv = TimeSeriesSplit(n_splits=n_splits)
        scores = []
        
        for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
            X_train, X_test = X[train_idx], X[test_idx]
            y_train, y_test = y[train_idx], y[test_idx]
            
            model = model_func(X_train, y_train)
            y_pred = model.predict(X_test)
            
            if scoring == 'mse':
                score = mean_squared_error(y_test, y_pred)
            elif scoring == 'mae':
                score = mean_absolute_error(y_test, y_pred)
            elif scoring == 'r2':
                score = r2_score(y_test, y_pred)
            else:
                raise ValueError("scoring must be 'mse', 'mae', or 'r2'")
            
            scores.append(score)
            print(f"Fold {fold + 1}: {scoring.upper()} = {score:.4f}")
        
        results = {
            'mean': np.mean(scores),
            'std': np.std(scores),
            'scores': scores
        }
        
        print(f"\nMean {scoring.upper()}: {results['mean']:.4f} ± {results['std']:.4f}")
        
        return results
    
    def calculate_metrics(self, y_true, y_pred):
        """
        Calculate multiple evaluation metrics
        
        Parameters:
        -----------
        y_true : array-like
            True values
        y_pred : array-like
            Predicted values
        """
        mse = mean_squared_error(y_true, y_pred)
        rmse = np.sqrt(mse)
        mae = mean_absolute_error(y_true, y_pred)
        r2 = r2_score(y_true, y_pred)
        
        # Mean Absolute Percentage Error (zero targets are skipped to avoid
        # division by zero; MAPE is undefined where y_true == 0)
        y_true_arr = np.asarray(y_true)
        y_pred_arr = np.asarray(y_pred)
        nonzero = y_true_arr != 0
        mape = np.mean(np.abs((y_true_arr[nonzero] - y_pred_arr[nonzero]) / y_true_arr[nonzero])) * 100
        
        metrics = {
            'MSE': mse,
            'RMSE': rmse,
            'MAE': mae,
            'R²': r2,
            'MAPE': mape
        }
        
        print("Model Evaluation Metrics:")
        print("=" * 50)
        for metric, value in metrics.items():
            print(f"{metric}: {value:.4f}")
        
        return metrics
    
    def plot_prediction_comparison(self, y_true, y_pred, title="Prediction Comparison"):
        """
        Plot actual vs predicted values
        
        Parameters:
        -----------
        y_true : array-like
            True values
        y_pred : array-like
            Predicted values
        title : str
            Plot title
        """
        fig, axes = plt.subplots(1, 2, figsize=(14, 5))
        
        # Scatter plot
        axes[0].scatter(y_true, y_pred, alpha=0.6)
        min_val = min(min(y_true), min(y_pred))
        max_val = max(max(y_true), max(y_pred))
        axes[0].plot([min_val, max_val], [min_val, max_val], 'r--', lw=2)
        axes[0].set_xlabel('Actual')
        axes[0].set_ylabel('Predicted')
        axes[0].set_title('Actual vs Predicted')
        axes[0].grid(True, alpha=0.3)
        
        # Residuals plot
        residuals = y_true - y_pred
        axes[1].scatter(y_pred, residuals, alpha=0.6)
        axes[1].axhline(y=0, color='r', linestyle='--')
        axes[1].set_xlabel('Predicted')
        axes[1].set_ylabel('Residuals')
        axes[1].set_title('Residuals Plot')
        axes[1].grid(True, alpha=0.3)
        
        plt.suptitle(title, fontsize=14, y=1.02)
        plt.tight_layout()
        plt.show()
    
    def learning_curve(self, X, y, model_func, train_sizes=None, cv=5):
        """
        Generate learning curve
        
        Parameters:
        -----------
        X : array-like
            Independent variables
        y : array-like
            Dependent variable
        model_func : callable
            Function that returns fitted model
        train_sizes : array-like
            Training set sizes
        cv : int
            Number of CV folds
        """
        if train_sizes is None:
            train_sizes = np.linspace(0.1, 1.0, 10)
        
        train_scores = []
        val_scores = []
        
        for size in train_sizes:
            n_samples = int(size * len(X))
            X_train = X[:n_samples]
            y_train = y[:n_samples]
            
            # Train score
            model = model_func(X_train, y_train)
            y_pred_train = model.predict(X_train)
            train_score = r2_score(y_train, y_pred_train)
            train_scores.append(train_score)
            
            # Validation score (using remaining data)
            if n_samples < len(X):
                X_val = X[n_samples:]
                y_val = y[n_samples:]
                y_pred_val = model.predict(X_val)
                val_score = r2_score(y_val, y_pred_val)
                val_scores.append(val_score)
            else:
                val_scores.append(train_score)
        
        plt.figure(figsize=(10, 6))
        plt.plot(train_sizes, train_scores, 'o-', label='Training Score', linewidth=2)
        plt.plot(train_sizes, val_scores, 'o-', label='Validation Score', linewidth=2)
        plt.xlabel('Training Set Size')
        plt.ylabel('R² Score')
        plt.title('Learning Curve')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()
        
        return train_scores, val_scores


if __name__ == "__main__":
    # Example usage
    print("Model Evaluation Example")
    print("=" * 70)
    
    from regression_analysis import LinearRegressionModel
    
    # Generate sample data
    np.random.seed(42)
    n = 100
    X = np.random.randn(n, 3)
    y = 2 + 1.5 * X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2] + np.random.randn(n) * 0.5
    
    # Create evaluation object
    evaluator = ModelEvaluation()
    
    # Define model function
    def create_model(X_train, y_train):
        model = LinearRegressionModel()
        model.fit(X_train, y_train)
        return model
    
    # Cross-validation
    cv_results = evaluator.cross_validate(X, y, create_model, cv_folds=5, scoring='mse')
    
    # Calculate metrics
    model = create_model(X, y)
    y_pred = model.predict(X)
    metrics = evaluator.calculate_metrics(y, y_pred)
    
    # Plot comparison
    evaluator.plot_prediction_comparison(y, y_pred)
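The `__main__` block exercises only `cross_validate`; the forward-chaining splitter behind `time_series_cv` follows the same fold pattern. A minimal standalone sketch of that pattern, using scikit-learn's `LinearRegression` as a stand-in for the project's `LinearRegressionModel` (an assumption; the real class lives in `regression_analysis.py`) on simulated trend-plus-seasonality data:

```python
# Sketch: forward-chaining (time-series) cross-validation, the same fold
# pattern ModelEvaluation.time_series_cv uses. LinearRegression here is a
# stand-in for the project's own model class.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
n = 120
t = np.arange(n)
# Simulated series: linear trend + monthly seasonality + noise
X = np.column_stack([t, np.sin(2 * np.pi * t / 12)])
y = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, n)

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Each fold trains on the past and tests on the next block of observations
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(f"Mean MSE: {np.mean(scores):.4f} +/- {np.std(scores):.4f}")
```

Unlike `KFold(shuffle=True)`, each fold's test block lies strictly after its training window, so no future information leaks into the fit.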

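`learning_curve` couples the score computation to `plt.show()`, which blocks in headless environments. The underlying computation can be reproduced without plotting; a sketch under the same simulated-data setup as the `__main__` block, again with scikit-learn's `LinearRegression` standing in for the project's model class (an assumption):

```python
# Sketch: the learning-curve score computation from
# ModelEvaluation.learning_curve, decoupled from matplotlib.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, 0.8, -0.5]) + rng.normal(0, 0.5, size=200)

train_scores, val_scores = [], []
for frac in np.linspace(0.2, 0.8, 4):
    n = int(frac * len(X))  # first n rows train, the remainder validates
    model = LinearRegression().fit(X[:n], y[:n])
    train_scores.append(r2_score(y[:n], model.predict(X[:n])))
    val_scores.append(r2_score(y[n:], model.predict(X[n:])))

print("train R2:", [round(s, 3) for s in train_scores])
print("val   R2:", [round(s, 3) for s in val_scores])
```

With well-specified linear data, both curves stay high and converge as the training fraction grows; a widening gap between them would signal overfitting.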

© 2026 RSK World. All rights reserved.