Statistical Modeling with Statsmodels
regression_analysis.py
"""
Linear and Generalized Linear Regression Analysis using Statsmodels

Author: RSK World
Website: https://rskworld.in
Email: help@rskworld.in
Phone: +91 93305 39277
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.api import OLS, add_constant
from statsmodels.genmod.generalized_linear_model import GLM
from statsmodels.genmod import families
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
import warnings
warnings.filterwarnings('ignore')


class LinearRegressionModel:
    """
    Linear Regression Model using Statsmodels OLS
    """
    
    def __init__(self):
        self.model = None
        self.results = None
        self.X = None
        self.y = None
        
    def fit(self, X, y, add_intercept=True):
        """
        Fit linear regression model
        
        Parameters:
        -----------
        X : array-like
            Independent variables
        y : array-like
            Dependent variable
        add_intercept : bool
            Whether to add intercept term
        """
        self.X = X
        self.y = y
        
        if add_intercept:
            X_with_const = add_constant(X)
        else:
            X_with_const = X
            
        self.model = OLS(y, X_with_const)
        self.results = self.model.fit()
        return self.results
    
    def summary(self):
        """Print model summary"""
        if self.results is not None:
            print(self.results.summary())
        else:
            print("Model not fitted yet. Call fit() first.")
    
    def predict(self, X_new, add_intercept=True):
        """
        Make predictions on new data
        
        Parameters:
        -----------
        X_new : array-like
            New independent variables
        add_intercept : bool
            Whether to add intercept term
        """
        if self.results is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        
        if add_intercept:
            X_new = add_constant(X_new, has_constant='add')
        
        return self.results.predict(X_new)
    
    def get_residuals(self):
        """Get model residuals"""
        if self.results is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        return self.results.resid
    
    def get_fitted_values(self):
        """Get fitted values"""
        if self.results is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        return self.results.fittedvalues
    
    def plot_residuals(self):
        """Plot residual analysis"""
        if self.results is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        
        residuals = self.get_residuals()
        fitted = self.get_fitted_values()
        
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))
        
        # Residuals vs Fitted
        axes[0, 0].scatter(fitted, residuals, alpha=0.6)
        axes[0, 0].axhline(y=0, color='r', linestyle='--')
        axes[0, 0].set_xlabel('Fitted Values')
        axes[0, 0].set_ylabel('Residuals')
        axes[0, 0].set_title('Residuals vs Fitted')
        axes[0, 0].grid(True, alpha=0.3)
        
        # Q-Q Plot
        from scipy import stats
        stats.probplot(residuals, dist="norm", plot=axes[0, 1])
        axes[0, 1].set_title('Q-Q Plot')
        axes[0, 1].grid(True, alpha=0.3)
        
        # Residuals Histogram
        axes[1, 0].hist(residuals, bins=30, edgecolor='black', alpha=0.7)
        axes[1, 0].set_xlabel('Residuals')
        axes[1, 0].set_ylabel('Frequency')
        axes[1, 0].set_title('Residuals Distribution')
        axes[1, 0].grid(True, alpha=0.3)
        
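        # Scale-Location Plot (uses internally studentized residuals so the
        # data match the "standardized" y-axis label)
        standardized = self.results.get_influence().resid_studentized_internal
        sqrt_abs_residuals = np.sqrt(np.abs(standardized))
        axes[1, 1].scatter(fitted, sqrt_abs_residuals, alpha=0.6)
        axes[1, 1].set_xlabel('Fitted Values')
        axes[1, 1].set_ylabel('√|Standardized Residuals|')
        axes[1, 1].set_title('Scale-Location Plot')
        axes[1, 1].grid(True, alpha=0.3)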
        
        plt.tight_layout()
        plt.show()
    
    def check_multicollinearity(self):
        """Check for multicollinearity using VIF"""
        if self.results is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        
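        # Always work on a DataFrame so column names exist even when X was
        # passed as a NumPy array (an ndarray has no .columns attribute).
        X_df = self.X if isinstance(self.X, pd.DataFrame) else pd.DataFrame(self.X)
        X_with_const = add_constant(X_df)
        X_values = X_with_const.to_numpy(dtype=float)
        vif_data = pd.DataFrame()
        vif_data["Variable"] = [str(col) for col in X_with_const.columns]
        vif_data["VIF"] = [variance_inflation_factor(X_values, i)
                          for i in range(X_values.shape[1])]
        
        print("\nVariance Inflation Factor (VIF):")
        print(vif_data)
        print("\nRule of thumb: VIF > 10 suggests problematic multicollinearity "
              "(the 'const' row can be ignored)")
        return vif_data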
    
    def check_heteroscedasticity(self):
        """Check for heteroscedasticity using Breusch-Pagan test"""
        if self.results is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        
        lm, lm_pvalue, fvalue, f_pvalue = het_breuschpagan(self.results.resid, 
                                                           self.results.model.exog)
        
        print("\nBreusch-Pagan Test for Heteroscedasticity:")
        print(f"LM Statistic: {lm:.4f}")
        print(f"LM p-value: {lm_pvalue:.4f}")
        print(f"F Statistic: {fvalue:.4f}")
        print(f"F p-value: {f_pvalue:.4f}")
        
        if f_pvalue < 0.05:
            print("Warning: Heteroscedasticity detected (p < 0.05)")
        else:
            print("No significant heteroscedasticity detected")
        
        return {'lm': lm, 'lm_pvalue': lm_pvalue, 'fvalue': fvalue, 'f_pvalue': f_pvalue}
    
    def check_autocorrelation(self):
        """Check for autocorrelation using Durbin-Watson test"""
        if self.results is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        
        dw = durbin_watson(self.results.resid)
        
        print("\nDurbin-Watson Test for Autocorrelation:")
        print(f"Durbin-Watson Statistic: {dw:.4f}")
        
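        # Rule-of-thumb bounds, not a formal test: DW is near 2 when the
        # residuals show no first-order autocorrelation.
        if dw < 1.5:
            print("Warning: Possible positive autocorrelation (DW < 1.5)")
        elif dw > 2.5:
            print("Warning: Possible negative autocorrelation (DW > 2.5)")
        else:
            print("No strong sign of first-order autocorrelation (1.5 <= DW <= 2.5)")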
        
        return dw
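

# --- Illustrative sketch (not part of the original module) -----------------
# check_multicollinearity() above delegates to statsmodels'
# variance_inflation_factor. The helper below is a hypothetical, NumPy-only
# sketch of the same idea: regress each column on the remaining columns
# (plus an intercept) and report VIF_j = 1 / (1 - R_j^2). The function name
# and signature are assumptions made for illustration. The values should
# agree with statsmodels for the non-constant columns when the design
# matrix includes an intercept.
def vif_by_hand_sketch(X):
    """Return a list of VIF values, one per column of X (no constant column)."""
    import numpy as np  # imported locally so the sketch is self-contained

    X = np.asarray(X, dtype=float)
    n_obs, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression: column j on an intercept plus the other columns
        design = np.column_stack([np.ones(n_obs), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # A perfectly collinear column gives R^2 = 1 and an infinite VIF
        vifs.append(float('inf') if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return vifs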


class GLMModel:
    """
    Generalized Linear Model using Statsmodels GLM
    """
    
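    def __init__(self, family=None):
        # Build the default family per instance rather than once at
        # function-definition time (avoids a shared default object).
        self.family = family if family is not None else families.Gaussian()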
        self.model = None
        self.results = None
        self.X = None
        self.y = None
    
    def fit(self, X, y, add_intercept=True):
        """
        Fit GLM model
        
        Parameters:
        -----------
        X : array-like
            Independent variables
        y : array-like
            Dependent variable
        add_intercept : bool
            Whether to add intercept term
        """
        self.X = X
        self.y = y
        
        if add_intercept:
            X_with_const = add_constant(X)
        else:
            X_with_const = X
        
        self.model = GLM(y, X_with_const, family=self.family)
        self.results = self.model.fit()
        return self.results
    
    def summary(self):
        """Print model summary"""
        if self.results is not None:
            print(self.results.summary())
        else:
            print("Model not fitted yet. Call fit() first.")
    
    def predict(self, X_new, add_intercept=True):
        """Make predictions on new data"""
        if self.results is None:
            raise ValueError("Model not fitted yet. Call fit() first.")
        
        if add_intercept:
            X_new = add_constant(X_new, has_constant='add')
        
        return self.results.predict(X_new)


if __name__ == "__main__":
    # Example usage
    print("Linear Regression Analysis Example")
    print("=" * 50)
    
    # Generate sample data
    np.random.seed(42)
    n = 100
    X = np.random.randn(n, 3)
    y = 2 + 1.5 * X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2] + np.random.randn(n) * 0.5
    
    # Create and fit model
    model = LinearRegressionModel()
    model.fit(X, y)
    
    # Print summary
    model.summary()
    
    # Diagnostic plots
    model.plot_residuals()
    
    # Check assumptions
    model.check_multicollinearity()
    model.check_heteroscedasticity()
    model.check_autocorrelation()
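

# --- Illustrative sketch (not part of the original module) -----------------
# The check_autocorrelation() call above reports the Durbin-Watson statistic.
# As a reading aid, the statistic is the sum of squared successive
# differences of the residuals divided by the sum of squared residuals:
#     DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
# The hypothetical helper below computes it in plain Python; the function
# name is an assumption for illustration. Values near 2 suggest no
# first-order autocorrelation, values near 0 positive autocorrelation,
# and values near 4 negative autocorrelation.
def durbin_watson_by_hand(residuals):
    """Compute the Durbin-Watson statistic for a sequence of residuals."""
    e = [float(r) for r in residuals]
    denominator = sum(r * r for r in e)
    if denominator == 0:
        raise ValueError("All residuals are zero; the statistic is undefined.")
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return numerator / denominator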

