Statistical Data Analysis with Seaborn - Complete Documentation | Seaborn Visualization | Statistical Visualization | Python | Data Science | EDA

Complete Documentation & Project Details for Statistical Data Analysis with Seaborn - Correlation Matrices, Distribution Plots, Box Plots, Violin Plots, Pair Plots, Heatmaps, Regression Plots, Q-Q Plots, Clustermaps, Residual Analysis, Ridge Plots, Time Series Analysis, Advanced Heatmaps. Perfect for Exploratory Data Analysis, Statistical Insights, Data Science Education, Research Projects, Portfolio Projects, and Teaching Statistical Concepts.

Statistical Data Analysis with Seaborn - Project Description | Seaborn Visualization | Statistical Visualization

This project creates comprehensive statistical visualizations with Seaborn for exploratory data analysis. It includes correlation matrices, distribution plots, box plots, violin plots, pair plots, heatmaps, regression plots, Q-Q plots, clustermaps, residual analysis, ridge plots, time series analysis, and advanced heatmaps. Perfect for exploratory data analysis, statistical insights, data science education, research projects, portfolio projects, and teaching statistical concepts. The system provides 16+ visualization types, high-resolution outputs (300 DPI), comprehensive statistical analysis, and professional-quality visualizations.

The statistical data analysis project features correlation matrix heatmaps with hierarchical clustering, distribution plots with KDE (Kernel Density Estimation), box plots and violin plots for quartile analysis and outlier detection, pair plots for multivariate analysis, regression plots with confidence intervals, Q-Q plots for normality testing, clustermaps with dendrograms, residual analysis for regression diagnostics, ridge plots for overlapping density distributions, time series analysis for temporal trends, and advanced heatmaps with pivot tables. Built with Python, Seaborn, Matplotlib, Pandas, NumPy, SciPy, Scikit-learn, and Jupyter Notebook for powerful statistical analysis and data visualization capabilities.

Statistical Data Analysis Screenshots | Seaborn Visualization Images | Statistical Visualization Examples

[Screenshot 1 of 4: Statistical Data Analysis with Seaborn, example visualization]

Statistical Data Analysis Core Features | Seaborn Visualization Features | Statistical Visualization Features

Correlation Analysis

  • Correlation matrix heatmaps
  • Hierarchical clustering
  • Pattern discovery
  • Variable relationships
  • Clustermap visualization
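The correlation values behind these heatmaps come from a plain pandas correlation matrix; a small self-contained example with synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=500)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=500),  # strongly driven by x
    "z": rng.normal(size=500),                     # independent noise
})

corr = df.corr()  # Pearson correlation matrix; this is what sns.heatmap() visualizes
print(corr.round(2))
```

Here "x" and "y" correlate strongly while "z" correlates with neither, which is exactly the kind of relationship a heatmap makes visible at a glance.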

Distribution Plots

  • Histograms with KDE
  • Density plots
  • Q-Q plots for normality
  • Category-wise distributions
  • Ridge plots (Joy plots)

Box & Violin Plots

  • Quartile analysis
  • Outlier detection
  • Distribution comparison
  • Category-wise analysis
  • Statistical summaries

Pair Plots

  • Multivariate analysis
  • Scatter plot matrices
  • Pairwise relationships
  • Regression lines
  • Category coloring

Regression Analysis

  • Regression plots
  • Confidence intervals
  • Residual analysis
  • Model validation
  • Diagnostic plots

High-Resolution Outputs

  • 300 DPI PNG images
  • Professional quality
  • Presentation-ready
  • Publication quality
  • Descriptive filenames
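The 300 DPI output convention is a single Matplotlib savefig argument; a minimal sketch (the filename here is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))  # 6 x 4 inches
ax.plot([1, 2, 3], [2, 4, 8])
ax.set_title("Example Plot")

# dpi=300 gives 1800 x 1200 pixels, suitable for print and publications
fig.savefig("example_plot_300dpi.png", dpi=300, bbox_inches="tight")
```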

Advanced Analytics Dashboard Features | Statistical Analysis Features | Data Transformation Tools

Date Range Presets

  • Last 7/30/90 days
  • Last 6 months / Last year
  • This month / This year
  • Custom date range
  • Quick filtering
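Each preset reduces to a (start, end) pair of timestamps; a sketch of how such bounds can be computed with pandas (a fixed "today" keeps the example reproducible):

```python
import pandas as pd

today = pd.Timestamp("2024-06-15")  # fixed reference date for reproducibility

presets = {
    "Last 7 days":   (today - pd.Timedelta(days=7), today),
    "Last 30 days":  (today - pd.Timedelta(days=30), today),
    "Last 6 months": (today - pd.DateOffset(months=6), today),
    "This month":    (today.replace(day=1), today),
    "This year":     (today.replace(month=1, day=1), today),
}

start, end = presets["Last 30 days"]
print(start.date(), "->", end.date())
```

The resulting bounds can be applied directly as a date-range filter on the DataFrame.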

Trend Analysis

  • Linear regression analysis
  • R-squared calculation
  • P-value statistics
  • Trend direction identification
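These four metrics fall out of a single call to scipy.stats.linregress; a sketch on synthetic trending data:

```python
import numpy as np
from scipy import stats

# Synthetic upward trend with noise
rng = np.random.default_rng(1)
days = np.arange(100)
revenue = 50 + 1.5 * days + rng.normal(scale=5, size=100)

result = stats.linregress(days, revenue)
r_squared = result.rvalue ** 2
trend = "upward" if result.slope > 0 else "downward"

print(f"slope={result.slope:.2f}, R^2={r_squared:.3f}, p={result.pvalue:.2e}, trend={trend}")
```

The slope sign gives the trend direction, R-squared the goodness of fit, and the p-value the significance of the trend.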

Data Transformation

  • Data normalization
  • Missing value handling
  • Duplicate removal
  • Data sampling
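All four transformations are one-liners in pandas; a small sketch with toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sales": [100.0, 150.0, np.nan, 150.0, 300.0],
    "Region": ["North", "South", "South", "South", "East"],
})

# Missing value handling: fill numeric gaps with the column median
df["Sales"] = df["Sales"].fillna(df["Sales"].median())

# Duplicate removal
df = df.drop_duplicates()

# Min-max normalization to [0, 1]
df["Sales_norm"] = (df["Sales"] - df["Sales"].min()) / (df["Sales"].max() - df["Sales"].min())

# Data sampling: random 50% subset (fixed seed for reproducibility)
sample = df.sample(frac=0.5, random_state=0)
```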

Advanced Statistics

  • Descriptive statistics
  • Distribution analysis
  • Outlier detection
  • Skewness & Kurtosis
  • Statistical measures
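A sketch of these measures on synthetic data with two injected outliers, using pandas and SciPy (the 1.5 * IQR rule shown here is one common outlier criterion):

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(7)
values = pd.Series(np.concatenate([rng.normal(100, 10, 500), [300.0, 320.0]]))

print(values.describe())                    # descriptive statistics
print("skewness:", stats.skew(values))      # > 0: right tail from the outliers
print("kurtosis:", stats.kurtosis(values))  # excess kurtosis

# Outlier detection with the 1.5 * IQR rule
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
print("outliers found:", len(outliers))
```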

Web Interface Features | Dashboard Features | Interactive Dashboard Capabilities

Feature | Description | Usage
Interactive Filtering | Filter data by Region, Product, Category, Date Range with presets | Use sidebar filters and date presets to filter data
Real-time Exploration | Interactive widgets update charts instantly | All charts and metrics update automatically as you interact
Data Export | Export filtered data to CSV, Excel, or JSON | Click Export buttons to download data or charts as PNG
Interactive Charts | 10+ chart types with hover details | Hover over charts to see detailed information
Data Table | View, search, filter, and sort data | Data table displays filtered data with pagination
Statistical Analysis | Advanced statistics and trend analysis | View Summary Stats, Advanced Stats, and Trend Analysis tabs

Technologies Used | Python Technologies | Data Science Stack | Statistical Analysis Tools

This Statistical Data Analysis with Seaborn project is built using modern statistical visualization and data science technologies. The core implementation uses Python 3.8+ as the primary programming language and Seaborn for creating statistical visualizations. The project includes Pandas for data manipulation, NumPy for numerical computing, Matplotlib for plotting, SciPy for statistical analysis, and Scikit-learn for machine learning. The statistical visualization project features correlation matrices, distribution plots, box plots, violin plots, pair plots, heatmaps, regression plots, Q-Q plots, clustermaps, and comprehensive statistical analysis tools for exploratory data analysis and data science applications.

The project uses the Seaborn library with Python and Jupyter Notebook, and supports the full visualization suite described above: correlation matrix heatmaps with hierarchical clustering, distribution plots with KDE, box and violin plots for quartile analysis and outlier detection, pair plots for multivariate analysis, regression plots with confidence intervals, Q-Q plots for normality testing, clustermaps with dendrograms, residual analysis for regression diagnostics, ridge plots for overlapping density distributions, time series analysis for temporal trends, and advanced heatmaps with pivot tables. The system produces high-resolution PNG outputs (300 DPI) and comprehensive statistical analysis for exploratory data analysis, statistical insights, data science education, and research projects.

Python 3.8+ Seaborn Pandas NumPy Matplotlib Statistical Analysis Jupyter Notebook SciPy Scikit-learn Exploratory Data Analysis

Installation & Usage Guide | How to Install Statistical Data Analysis | Project Setup Tutorial

Installation

Install all required dependencies for the Statistical Data Analysis with Seaborn project:

# Install all requirements
pip install -r requirements.txt

# Required packages:
# - numpy>=1.21.0
# - pandas>=1.3.0
# - matplotlib>=3.4.0
# - seaborn>=0.11.0
# - scipy>=1.7.0
# - scikit-learn>=1.0.0
# - jupyter>=1.0.0
# - notebook>=6.4.0

Running the Project

Start the statistical analysis project:

# Option 1: Jupyter Notebook (Recommended)
jupyter notebook statistical_analysis.ipynb

# Option 2: Python Script
python main.py

# Option 3: Generate example data first
python create_example_data.py
python main.py

# The notebook will open in your default browser
# Run all cells to generate all visualizations
# All plots are saved as high-resolution PNG images (300 DPI)

Using Your Own Data

Load your own CSV file for analysis:

# Load your own CSV file:
import pandas as pd
from visualization_utils import *

# Load your data
df = pd.read_csv('your_data.csv')

# Setup plot style
setup_plot_style()

# Create visualizations
create_correlation_heatmap(df)
create_distribution_plots(df, ['Column1', 'Column2', 'Column3'])
create_box_plots(df, ['Column1', 'Column2'], 'Category')
create_violin_plots(df, ['Column1', 'Column2'], 'Category')
create_pair_plot(df, ['Column1', 'Column2', 'Column3'], 'Category')

# All visualizations are saved as PNG images

Project Features

Explore the statistical visualization features:

# Visualization Features:
# 1. Correlation Matrix Heatmaps - Variable relationships
# 2. Distribution Plots - Histograms with KDE
# 3. Box & Violin Plots - Quartile analysis and outliers
# 4. Pair Plots - Multivariate analysis
# 5. Regression Plots - With confidence intervals
# 6. Q-Q Plots - Normality testing
# 7. Clustermap - Hierarchical clustering
# 8. Residual Analysis - Regression diagnostics
# 9. Ridge Plots - Overlapping densities
# 10. Time Series Analysis - Temporal trends
# 11. Advanced Heatmaps - Pivot tables
# 12. Categorical Plots - Count and bar plots
# 13. Facet Grids - Multi-panel visualizations
# 14. Statistical Summary - Comprehensive overview

# All visualizations are saved as 300 DPI PNG images

Configuration

Customize visualization settings in visualization_utils.py:

# Customize plot style in visualization_utils.py:
# - Figure size and DPI
# - Color schemes
# - Font settings
# - Style themes
# - Save paths

# Modify setup_plot_style() function:
# - Change figure size
# - Adjust DPI (default: 300)
# - Customize color palette
# - Set style theme

# Or modify individual plot functions:
# - Change color schemes
# - Adjust plot dimensions
# - Customize labels and titles
# - Modify save paths

Project Structure | Dashboard File Structure | Source Code Organization

streamlit-dashboard/
├── README.md # Main documentation
├── requirements.txt # Python dependencies
├── LICENSE # License file
├── RELEASE_NOTES.md # Release notes
├── PROJECT_INFO.md # Project information
├── FEATURES.md # Features documentation
│
├── Core Application
│ ├── app.py # Main Streamlit application
│ ├── config.py # Configuration settings
│ ├── utils.py # Utility functions
│ └── visualizations.py # Visualization functions
│
├── Data
│ └── sample_data.csv # Sample data (auto-generated)
│
├── Scripts
│ ├── run.bat # Windows run script
│ └── run.sh # Linux/Mac run script
│
├── .streamlit/
│ └── config.toml # Streamlit configuration
│
└── .gitignore # Git ignore file

Configuration Options | Dashboard Configuration | Customization Guide

Dashboard Configuration

Customize dashboard settings in app.py and .streamlit/config.toml:

# Streamlit Page Configuration (app.py)
st.set_page_config(
    page_title="Analytics Dashboard - RSK World",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Streamlit Configuration (.streamlit/config.toml)
[theme]
primaryColor = "#3498db"
backgroundColor = "#ffffff"
secondaryBackgroundColor = "#f0f2f6"
textColor = "#262730"
font = "sans serif"

[server]
port = 8501
address = "localhost"

# Chart Colors (in app.py)
COLOR_PRIMARY = '#3498db'
COLOR_SUCCESS = '#27ae60'
COLOR_DANGER = '#e74c3c'
COLOR_WARNING = '#f39c12'
COLOR_INFO = '#17a2b8'

Configuration Tips:

  • PORT: Server port. Default: 8501. Change in .streamlit/config.toml if port is already in use
  • ADDRESS: Server address. 'localhost' for local only, '0.0.0.0' allows network access
  • THEME: Customize colors and fonts in .streamlit/config.toml
  • LAYOUT: Change layout to 'centered' or 'wide' in st.set_page_config()
  • COLOR_*: Customize chart colors by modifying color constants in app.py
  • PAGE_TITLE/ICON: Modify in st.set_page_config() for custom branding

Data Format Requirements

Your CSV file can have flexible structure. Recommended columns for best experience:

# Recommended CSV columns (flexible):
# Date,Region,Product,Category,Sales,Revenue,Customers

# Example data:
Date,Region,Product,Category,Sales,Revenue,Customers
2023-01-01,North,Product A,Electronics,100,5000.0,50
2023-01-01,South,Product B,Clothing,50,1500.0,25
2023-01-02,East,Product C,Food,200,2000.0,100

# Column descriptions:
# - Date: Date in YYYY-MM-DD format (optional but recommended)
# - Region: Geographic region (text, optional)
# - Product: Product name (text, optional)
# - Category: Product category (text, optional)
# - Numeric columns: Any numeric columns for analysis (Sales, Revenue, etc.)

# Note: Dashboard automatically detects column types
# Works with any CSV structure - upload and explore!

Customizing Charts

Modify chart configurations in visualizations.py or app.py:

# Chart customization in app.py or visualizations.py:

# Change chart colors:
fig.update_traces(marker_color='#3498db')  # Bar chart color
fig.update_layout(colorway=['#3498db', '#27ae60', '#e74c3c'])

# Modify chart titles:
fig.update_layout(title='Your Custom Title')

# Adjust chart size:
fig.update_layout(height=400, width=800)

# Change color scales:
color_continuous_scale='Blues'    # For bar charts
color_continuous_scale='Viridis'  # For heatmaps

# Customize hover information:
fig.update_traces(hovertemplate='Value: $%{y:,.2f}<br>Date: %{x}')

Adding Custom Charts

Add new visualizations to the dashboard:

# Add new chart to app.py:

# 1. Add chart in main area:
st.plotly_chart(fig, use_container_width=True)

# 2. Create chart function:
def create_custom_chart(df):
    # Your chart logic here
    fig = px.bar(df, x='Column1', y='Column2', title='Your Custom Chart')
    return fig

# 3. Use in main app:
if st.checkbox('Show Custom Chart'):
    filtered_df = apply_filters(df)
    fig = create_custom_chart(filtered_df)
    st.plotly_chart(fig, use_container_width=True)

# Or add to visualizations.py for reuse

Detailed Architecture | Dashboard Architecture | System Architecture | Technical Architecture

Dashboard Architecture

1. Streamlit Framework:

  • Python-based web framework
  • Uses React.js for frontend components
  • Server-side rendering with Python scripts
  • Real-time updates via widget interactions
  • Interactive components (selectboxes, date inputs, buttons, file uploaders)

2. Data Processing Pipeline:

  • Pandas DataFrame for data manipulation
  • CSV file loading and parsing
  • Date parsing and filtering
  • Data aggregation and grouping
  • Real-time filtering based on user selections
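The pipeline steps above can be sketched with pandas alone (a small in-memory CSV stands in for sales_data.csv):

```python
import io
import pandas as pd

# A small CSV standing in for sales_data.csv
csv = io.StringIO(
    "Date,Region,Product,Revenue\n"
    "2023-01-01,North,Product A,5000\n"
    "2023-01-02,North,Product B,1500\n"
    "2023-01-02,South,Product A,2000\n"
)
df = pd.read_csv(csv, parse_dates=["Date"])  # loading + date parsing

# Filtering (what happens when a widget selection changes)
north = df[df["Region"] == "North"]

# Aggregation and grouping
by_region = df.groupby("Region")["Revenue"].sum()
print(by_region)
```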

3. Visualization Components:

  • Plotly Express for quick chart creation
  • Plotly Graph Objects for advanced customization
  • Interactive charts with hover tooltips
  • Responsive chart sizing
  • Multiple chart types (line, bar, pie, scatter, area, heatmap)

Streamlit Widget System

The dashboard uses Streamlit widgets for real-time updates:

# Streamlit Widget Structure:

# Widgets in sidebar or main area
region = st.selectbox('Select Region', options=['All', 'North', 'South'])
start_date = st.date_input('Start Date', value=datetime(2023, 1, 1))
end_date = st.date_input('End Date', value=datetime(2023, 12, 31))

# Filter data based on widget values
filtered_df = filter_data(df, region, start_date, end_date)

# Create visualization
fig = create_chart(filtered_df)

# Display chart (updates automatically when widgets change)
st.plotly_chart(fig, use_container_width=True)

# Streamlit Flow:
# 1. User interacts with widget (selectbox, date_input, etc.)
# 2. Script re-runs automatically
# 3. Data is filtered and processed
# 4. Chart is updated and displayed
# 5. Dashboard reflects changes in real-time

Data Filtering Logic

How the dashboard filters data based on user selections:

# Filter Function:
def filter_data(region, product, category, start_date, end_date):
    filtered_df = df.copy()

    # Apply region filter
    if region != 'All':
        filtered_df = filtered_df[filtered_df['Region'] == region]

    # Apply product filter
    if product != 'All':
        filtered_df = filtered_df[filtered_df['Product'] == product]

    # Apply category filter
    if category != 'All':
        filtered_df = filtered_df[filtered_df['Category'] == category]

    # Apply date range filter
    filtered_df = filtered_df[
        (filtered_df['Date'] >= start_date) &
        (filtered_df['Date'] <= end_date)
    ]

    return filtered_df

# All charts and KPIs use the same filtered data
# Ensures consistency across all dashboard components

Data Quality Metrics Calculation

How data quality metrics are calculated from filtered data:

# Data Quality Metrics:
filtered_df = filter_data(region, product, category, start_date, end_date)

# Total Rows and Columns
total_rows = len(filtered_df)
total_columns = len(filtered_df.columns)

# Missing Values
missing_count = filtered_df.isnull().sum()
missing_percentage = (missing_count / total_rows) * 100

# Duplicate Rows
duplicate_count = filtered_df.duplicated().sum()

# Column Types
numeric_cols = filtered_df.select_dtypes(include=[np.number]).columns.tolist()
text_cols = filtered_df.select_dtypes(include=['object']).columns.tolist()
date_cols = filtered_df.select_dtypes(include=['datetime']).columns.tolist()

# All metrics update automatically when filters change

Chart Types and Usage

Different chart types used in the dashboard:

  • Line Charts: Time series trends using px.line()
  • Bar Charts: Categorical comparisons using px.bar()
  • Pie Charts: Distribution visualization using px.pie()
  • Scatter Plots: Relationship analysis using px.scatter()
  • Area Charts: Cumulative trends using px.area()
  • Heatmaps: Correlation analysis using px.imshow()
  • Box Plots: Distribution and outliers using px.box()
  • Histograms: Frequency distribution using px.histogram()
  • Violin Plots: Distribution comparison using px.violin()
  • 3D Scatter Plots: Multi-dimensional analysis using px.scatter_3d()

Advanced Features Usage | Dashboard Features Guide | How to Use Analytics Dashboard

Using Filters Effectively

How to use the interactive filters for data analysis:

# Filter Usage Examples:

# 1. Filter by Region:
#    Select "North" from Region dropdown
#    All charts and KPIs update to show only North region data

# 2. Filter by Product:
#    Select "Product A" from Product dropdown
#    Dashboard shows data only for Product A

# 3. Filter by Category:
#    Select "Electronics" from Category dropdown
#    View sales data for Electronics category only

# 4. Filter by Date Range:
#    Select start date: 2023-01-01
#    Select end date: 2023-12-31
#    View data for the entire year 2023

# 5. Combined Filters:
#    Region: "North"
#    Product: "Product A"
#    Category: "Electronics"
#    Date Range: 2023-01-01 to 2023-12-31
#    View specific combination of filters

# All filters work together - combine multiple filters for detailed analysis

Data Export Usage

Export filtered data for further analysis:

# Export Data Steps:

# 1. Apply filters to get desired data subset
#    - Select Region, Product, Category, Date Range
#    - Optionally use search box for text search

# 2. Click "Export to CSV" button
#    - Downloads sales_data_export.csv
#    - Contains only filtered data
#    - Includes all columns: Date, Region, Product, Category, Quantity, Price, Revenue

# 3. Click "Export to Excel" button
#    - Downloads sales_data_export.xlsx
#    - Excel format for easy analysis
#    - Same filtered data as CSV

# 4. Use exported data in:
#    - Excel for pivot tables and analysis
#    - Python/Pandas for advanced analysis
#    - Other BI tools for reporting
#    - Share with team members

# Exported files respect all active filters and search terms

Understanding Chart Types

When to use different chart types for analysis:

# Chart Type Usage Guide:

# 1. Revenue Trend Chart (Line Chart)
#    - Use: Track revenue over time
#    - Shows: Daily revenue trends
#    - Best for: Identifying trends and patterns

# 2. Regional Performance (Bar Chart)
#    - Use: Compare revenue across regions
#    - Shows: Total revenue by region
#    - Best for: Geographic performance analysis

# 3. Product Performance (Bar Chart)
#    - Use: Compare revenue by product
#    - Shows: Total revenue per product
#    - Best for: Product ranking and analysis

# 4. Category Distribution (Pie Chart)
#    - Use: View revenue distribution
#    - Shows: Percentage of revenue by category
#    - Best for: Understanding category mix

# 5. Quantity vs Revenue (Scatter Plot)
#    - Use: Analyze quantity-revenue relationship
#    - Shows: Correlation between quantity and revenue
#    - Best for: Identifying pricing patterns

# 6. Monthly Comparison (Bar Chart)
#    - Use: Compare monthly revenue
#    - Shows: Revenue by month
#    - Best for: Month-over-month analysis

# 7. Cumulative Revenue (Area Chart)
#    - Use: Track cumulative growth
#    - Shows: Running total of revenue
#    - Best for: Growth trend visualization

# 8. Sales Heatmap
#    - Use: Identify sales patterns
#    - Shows: Revenue by day of week and month
#    - Best for: Finding peak sales periods

# 9. Year-over-Year Comparison
#    - Use: Compare annual performance
#    - Shows: Monthly revenue across years
#    - Best for: Yearly trend analysis

# 10. Top Performers
#     - Use: Identify best combinations
#     - Shows: Top 10 product-region pairs
#     - Best for: Strategic decision making

Search Functionality

Use the search box to quickly find specific data:

# Search Examples:

# 1. Search by Product Name:
#    Type: "Product A"
#    Results: All records containing "Product A" in Product column

# 2. Search by Region:
#    Type: "North"
#    Results: All records with "North" in Region column

# 3. Search by Category:
#    Type: "Electronics"
#    Results: All Electronics category records

# 4. Partial Search:
#    Type: "Prod"
#    Results: All products starting with "Prod" (Product A, Product B, etc.)

# 5. Case-Insensitive:
#    Type: "north" or "NORTH" or "North"
#    Results: All match regardless of case

# Search works across:
# - Product names
# - Region names
# - Category names

# Search is combined with active filters
# Results update all charts and KPIs in real-time

Data Table Features

Using the interactive data table for detailed analysis:

# Data Table Usage:

# 1. View Data Table:
#    Click "View Data Table" button
#    Table appears below charts
#    Shows all filtered data

# 2. Native Filtering:
#    Click filter icon in column header
#    Enter filter criteria
#    Table updates immediately

# 3. Sorting:
#    Click column header to sort
#    Click again to reverse sort
#    Sort by any column (Date, Region, Product, etc.)

# 4. Pagination:
#    Table shows 20 rows per page
#    Use pagination controls to navigate
#    View all data across multiple pages

# 5. Search in Table:
#    Use search box above table
#    Filters table rows in real-time
#    Works with column filters

# 6. Export from Table:
#    Apply filters in table
#    Use Export buttons to download
#    Exports current table view

# Table respects all dashboard filters
# Updates automatically when filters change

Printing Dashboard

Generate reports by printing the dashboard:

# Print Dashboard Steps:

# 1. Apply desired filters
#    - Set Region, Product, Category, Date Range
#    - Apply search if needed

# 2. Click "Print Dashboard" button
#    - Opens browser print dialog
#    - Shows print preview

# 3. Configure print settings:
#    - Select printer or "Save as PDF"
#    - Choose page orientation (Portrait/Landscape)
#    - Adjust margins if needed

# 4. Print or Save:
#    - Click Print to print on paper
#    - Or Save as PDF for digital sharing

# Print includes:
# - All KPI cards
# - All charts and visualizations
# - Current filter settings
# - Dashboard title and branding

# Use for:
# - Monthly/quarterly reports
# - Executive presentations
# - Team meetings
# - Documentation

Complete Dashboard Workflow | Step-by-Step Guide | Dashboard Tutorial

Step-by-Step Dashboard Setup

Step 1: Install Dependencies

# Install all required packages
pip install -r requirements.txt

# Required packages:
# - dash==2.14.1
# - plotly==5.18.0
# - pandas==2.1.3
# - numpy==1.26.2
# - dash-table==5.0.0
# - openpyxl==3.1.2

# Verify installation
python -c "import dash, plotly, pandas, numpy; print('All packages installed successfully')"

Step 2: Prepare Data

# Option 1: Use sample data (auto-generated)
# Dashboard will generate sample data if sales_data.csv doesn't exist
# No action needed - just run the dashboard

# Option 2: Use your own data
# Prepare CSV file with columns: Date, Region, Product, Category, Quantity, Price, Revenue
# Place file as sales_data.csv in project directory

# Option 3: Generate sample data manually
python generate_data.py

# Data format:
# Date,Region,Product,Category,Quantity,Price,Revenue
# 2023-01-01,North,Product A,Electronics,100,50.0,5000.0

Step 3: Run Dashboard

# Run the dashboard
python app.py

# Or use run scripts:
# Windows: run.bat
# Linux/Mac: ./run.sh

# Dashboard will start on:
# http://localhost:8501

# Open browser and navigate to the URL

Step 4: Use Dashboard Features

  • View KPI cards for key metrics (Revenue, Orders, AOV, Products, Growth Rate)
  • Apply filters (Region, Product, Category, Date Range) to analyze specific data
  • Explore 10 different chart types for comprehensive analysis
  • Use search box to find specific products, regions, or categories
  • Click "View Data Table" to see detailed data with filtering and sorting
  • Export filtered data to CSV or Excel for further analysis
  • Print dashboard for reports and presentations

Step 5: Customize Dashboard

# Customize dashboard in config.py:
# - Change dashboard title and subtitle
# - Modify data file path
# - Adjust server host and port
# - Change refresh interval
# - Update color schemes

# Modify app.py to:
# - Add new charts
# - Change chart configurations
# - Add new filters
# - Customize KPI calculations
# - Modify dashboard layout

Dashboard Customization Examples | Customization Guide | Code Examples

Adding Custom KPI Cards

Add new KPI cards to track additional metrics:

# Add new KPI card in app.py layout:
html.Div([
    html.H3(id='custom-kpi', style={'color': '#9b59b6', 'margin': '0'}),
    html.P('Custom Metric', style={'color': '#7f8c8d', 'margin': '5px 0 0 0'})
], className='kpi-card', style={
    'width': '18%', 'display': 'inline-block', 'padding': '20px',
    'margin': '10px', 'backgroundColor': '#ffffff', 'borderRadius': '10px',
    'boxShadow': '0 2px 4px rgba(0,0,0,0.1)', 'textAlign': 'center'
}),

# Add callback to update KPI:
@app.callback(
    Output('custom-kpi', 'children'),
    [Input('region-filter', 'value'),
     Input('product-filter', 'value'),
     Input('category-filter', 'value'),
     Input('date-range', 'start_date'),
     Input('date-range', 'end_date')]
)
def update_custom_kpi(region, product, category, start_date, end_date):
    filtered_df = filter_data(region, product, category, start_date, end_date)
    # Calculate your custom metric
    custom_metric = filtered_df['Revenue'].median()  # Example: median revenue
    return f'${custom_metric:,.2f}'

Creating Custom Charts

Add new chart types to the dashboard:

# Add custom chart in layout:
html.Div([
    dcc.Graph(id='custom-chart')
], style={'width': '48%', 'display': 'inline-block', 'margin': '10px'}),

# Create callback for custom chart:
@app.callback(
    Output('custom-chart', 'figure'),
    [Input('region-filter', 'value'),
     Input('product-filter', 'value'),
     Input('category-filter', 'value'),
     Input('date-range', 'start_date'),
     Input('date-range', 'end_date')]
)
def update_custom_chart(region, product, category, start_date, end_date):
    filtered_df = filter_data(region, product, category, start_date, end_date)

    # Your custom chart logic
    # Example: Box plot for revenue distribution
    fig = px.box(filtered_df, x='Region', y='Revenue',
                 title='Revenue Distribution by Region')
    fig.update_layout(
        plot_bgcolor='rgba(0,0,0,0)',
        paper_bgcolor='rgba(0,0,0,0)'
    )
    return fig

Modifying Data Source

Connect dashboard to different data sources:

# Option 1: Load from database
import sqlite3
import pandas as pd

def load_data_from_db():
    conn = sqlite3.connect('sales.db')
    df = pd.read_sql_query("SELECT * FROM sales", conn)
    conn.close()
    df['Date'] = pd.to_datetime(df['Date'])
    return df

df = load_data_from_db()

# Option 2: Load from API
import requests

def load_data_from_api():
    response = requests.get('https://api.example.com/sales')
    data = response.json()
    df = pd.DataFrame(data)
    df['Date'] = pd.to_datetime(df['Date'])
    return df

df = load_data_from_api()

# Option 3: Load from Excel
df = pd.read_excel('sales_data.xlsx')
df['Date'] = pd.to_datetime(df['Date'])

# Replace the df loading section in app.py with your data source

Changing Refresh Interval

Modify auto-refresh interval for real-time updates:

# Modify refresh interval in app.py:
# Current: 30 seconds (30000 milliseconds)
dcc.Interval(
    id='interval-component',
    interval=30000,  # Change this value
    n_intervals=0
)

# Examples:
# interval=10000  # 10 seconds (more frequent updates)
# interval=60000  # 60 seconds (less frequent updates)
# interval=5000   # 5 seconds (very frequent, may impact performance)
# interval=0      # Disable auto-refresh (manual refresh only)

# Or make it configurable in config.py:
from config import REFRESH_INTERVAL
dcc.Interval(
    id='interval-component',
    interval=REFRESH_INTERVAL,
    n_intervals=0
)

Customizing Chart Colors

Change color schemes for all charts:

# Customize colors in chart callbacks:

# Option 1: Use color constants from config.py
from config import COLOR_PRIMARY, COLOR_SUCCESS, COLOR_DANGER
fig.update_traces(marker_color=COLOR_PRIMARY)

# Option 2: Use color scales
fig = px.bar(data, x='Region', y='Revenue', color='Revenue',
             color_continuous_scale='Blues')  # or 'Greens', 'Reds', 'Viridis'

# Option 3: Custom color mapping
color_map = {'North': '#3498db', 'South': '#27ae60',
             'East': '#e74c3c', 'West': '#f39c12'}
fig.update_traces(marker_color=[color_map[r] for r in data['Region']])

# Option 4: Use Plotly color sequences
import plotly.express as px
fig.update_layout(colorway=px.colors.qualitative.Set3)

Dashboard Chart Types | Available Chart Types | Data Visualization Charts

Chart Type | Use Case | Data Required | Best For
Line Chart | Revenue trends over time | Date, Revenue | Time series analysis
Bar Chart | Regional/Product comparison | Category, Revenue | Comparing categories
Pie Chart | Category distribution | Category, Revenue | Proportion analysis
Scatter Plot | Quantity vs Revenue | Quantity, Revenue | Correlation analysis
Area Chart | Cumulative revenue | Date, Cumulative Revenue | Growth tracking
Heatmap | Sales patterns by day/month | Day of Week, Month, Revenue | Pattern identification
Grouped Bar | Year-over-year comparison | Year, Month, Revenue | Annual comparison
Horizontal Bar | Top performers | Product-Region, Revenue | Ranking analysis

Dataset Information | Data Format | CSV Format | Data Requirements

Data Format Requirements

The dashboard requires CSV format for sales data:

  • Required columns: Date, Region, Product, Category, Quantity, Price, Revenue
  • Date format: YYYY-MM-DD (e.g., 2023-01-01)
  • Numeric columns: Quantity, Price, Revenue must be numeric
  • Text columns: Region, Product, Category are text fields
  • Automatic data loading and parsing
  • Date parsing and validation
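The requirements above can be checked programmatically before the dashboard loads a file; a hedged sketch (the REQUIRED list mirrors the columns named above, and the in-memory CSV is illustrative):

```python
import io
import pandas as pd

REQUIRED = ["Date", "Region", "Product", "Category", "Quantity", "Price", "Revenue"]

csv = io.StringIO(
    "Date,Region,Product,Category,Quantity,Price,Revenue\n"
    "2023-01-01,North,Product A,Electronics,100,50.0,5000.0\n"
)
df = pd.read_csv(csv, parse_dates=["Date"])

# Check required columns are present
missing = [c for c in REQUIRED if c not in df.columns]
assert not missing, f"missing columns: {missing}"

# Check numeric columns really parsed as numbers
for col in ["Quantity", "Price", "Revenue"]:
    assert pd.api.types.is_numeric_dtype(df[col]), f"{col} is not numeric"

# Check dates parsed correctly (YYYY-MM-DD)
assert pd.api.types.is_datetime64_any_dtype(df["Date"])
```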

Sample Data Format

Your sales data CSV file should follow this structure:

# CSV file structure (sales_data.csv):
Date,Region,Product,Category,Quantity,Price,Revenue
2023-01-01,North,Product A,Electronics,100,50.0,5000.0
2023-01-01,South,Product B,Clothing,50,30.0,1500.0
2023-01-02,East,Product C,Food,200,10.0,2000.0
2023-01-02,West,Product D,Books,75,15.0,1125.0

# Column descriptions:
# - Date: Sales date (YYYY-MM-DD format)
# - Region: Geographic region (text: North, South, East, West, Central)
# - Product: Product name (text: Product A, Product B, etc.)
# - Category: Product category (text: Electronics, Clothing, Food, Books, Sports)
# - Quantity: Number of units sold (numeric)
# - Price: Unit price (numeric, decimal)
# - Revenue: Total revenue (numeric, can be Quantity * Price or pre-calculated)

Generating Sample Data

Use the included script to generate sample sales data:

# Generate sample data
python generate_data.py

# The script will:
# - Generate sales data from 2023-01-01 to 2024-12-31
# - Create data for 5 regions (North, South, East, West, Central)
# - Generate 5 products (Product A through Product E)
# - Assign random categories (Electronics, Clothing, Food, Books, Sports)
# - Calculate quantity, price, and revenue
# - Save to sales_data.csv

# Customize data generation:
# Edit generate_data.py to modify:
# - Date range
# - Number of regions
# - Number of products
# - Categories
# - Quantity and price ranges

Using Your Own Data

Replace sample data with your own sales data:

# Steps to use your own data:

# 1. Prepare your CSV file
#    - Ensure all required columns are present
#    - Date format: YYYY-MM-DD
#    - Numeric columns: Quantity, Price, Revenue
#    - Text columns: Region, Product, Category

# 2. Replace sales_data.csv
#    - Back up the existing sales_data.csv (if needed)
#    - Place your CSV file as sales_data.csv
#    - Or modify DATA_FILE in config.py to point to your file

# 3. Verify data format
#    - Open the CSV in Excel or a text editor
#    - Check the date format is correct
#    - Ensure no missing values in required columns
#    - Verify numeric columns contain numbers only

# 4. Run the dashboard
#    - The dashboard will automatically load your data
#    - All filters and charts will work with your data
#    - KPIs will be calculated from your data
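The "verify data format" step can be partly automated. This hedged sketch flags common format problems before you drop a file in; the `validate_sales_csv` helper is illustrative, with column names taken from the required schema:

```python
import pandas as pd

def validate_sales_csv(path):
    """Report common format problems in a sales CSV before using it."""
    problems = []
    df = pd.read_csv(path)
    # Dates must parse strictly as YYYY-MM-DD
    dates = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")
    if dates.isna().any():
        problems.append(f"{int(dates.isna().sum())} rows with invalid dates")
    # Numeric columns must contain numbers only
    for col in ["Quantity", "Price", "Revenue"]:
        bad = pd.to_numeric(df[col], errors="coerce").isna() & df[col].notna()
        if bad.any():
            problems.append(f"{int(bad.sum())} non-numeric values in {col}")
    # Text columns must not have missing values
    for col in ["Region", "Product", "Category"]:
        n = int(df[col].isna().sum())
        if n:
            problems.append(f"{n} missing values in {col}")
    return problems  # an empty list means the file looks usable
```

Run it once on your export; an empty result means the dashboard's loader should accept the file.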

Troubleshooting & Best Practices | Common Issues | Performance Optimization | Best Practices

Common Issues

  • Port Already in Use: Change port in .streamlit/config.toml (default: 8501). Or stop the process using the port: lsof -ti:8501 | xargs kill
  • Data File Not Found: Ensure sales_data.csv exists or modify DATA_FILE in config.py. Dashboard will generate sample data if file doesn't exist
  • Import Errors: Verify all dependencies are installed: pip install -r requirements.txt. Check Python version (3.8+)
  • Date Parsing Errors: Ensure dates are in YYYY-MM-DD format. Check CSV file for invalid date formats
  • Charts Not Updating: Check the browser console for errors. Verify the chart code reads the current filter state on each rerun
  • Slow Performance: Reduce data size, increase REFRESH_INTERVAL, or optimize data filtering logic
  • Memory Issues: Reduce data size, limit date range, or process data in chunks
  • Export Not Working: Ensure openpyxl is installed for Excel export: pip install openpyxl
  • Search Not Working: Verify the search input is connected to the filter function. Check that the filter logic reads the search widget's value
  • Filters Not Applying: Check filter callback functions. Verify filter_data() function is working correctly
  • KPIs Showing Zero: Check if data is loaded correctly. Verify date range includes data
  • Charts Empty: Verify data filtering is working. Check if filtered data has records
  • Dashboard Not Loading: Check if Streamlit is running. Verify port 8501 is accessible. Run: streamlit run app.py
  • CSS Not Loading: Clear browser cache. Check if all CSS files are properly linked
  • Data Table Not Showing: Verify the table is rendered (e.g., with st.dataframe). Check pagination settings
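Several of the issues above (missing data file, unparseable rows crashing the app) can be softened with a defensive loader. A sketch, assuming the fallback behavior described under "Data File Not Found"; the `load_or_generate` name and the sample ranges are illustrative:

```python
import os
import numpy as np
import pandas as pd

def load_or_generate(path="sales_data.csv", seed=42):
    """Load the CSV if present; otherwise generate small sample data."""
    if os.path.exists(path):
        try:
            return pd.read_csv(path, parse_dates=["Date"])
        except (ValueError, pd.errors.ParserError) as exc:
            print(f"Could not parse {path}: {exc}; using sample data instead")
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2023-01-01", periods=30, freq="D")
    df = pd.DataFrame({
        "Date": rng.choice(dates, size=100),
        "Region": rng.choice(["North", "South", "East", "West"], size=100),
        "Product": rng.choice(["Product A", "Product B"], size=100),
        "Category": rng.choice(["Electronics", "Clothing"], size=100),
        "Quantity": rng.integers(1, 200, size=100),
        "Price": rng.uniform(5, 100, size=100).round(2),
    })
    df["Revenue"] = df["Quantity"] * df["Price"]
    return df
```

With this pattern the dashboard always has something to render, so a missing or corrupt file degrades gracefully instead of producing an empty page.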

Performance Optimization Tips

  • Data Size: Limit date range or filter data before loading to reduce memory usage
  • Refresh Interval: Increase REFRESH_INTERVAL for less frequent updates (reduces server load)
  • Chart Optimization: Limit number of data points in charts. Use data aggregation for large datasets
  • Caching: Cache filtered data results to avoid repeated calculations
  • Lazy Loading: Load data only when needed. Use pagination for large datasets
  • Data Preprocessing: Pre-process and aggregate data before loading into dashboard
  • Database Connection: Use connection pooling for database connections
  • Server Configuration: Run Streamlit in headless mode behind a reverse proxy (e.g., nginx) and tune .streamlit/config.toml for production workloads
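The "Caching" tip can be implemented with Streamlit's `st.cache_data` on the loader, or at plain-Python level by memoizing filtered aggregates. A sketch of the latter; the dataset and `filtered_revenue` function are hypothetical:

```python
from functools import lru_cache

import pandas as pd

# Module-level dataset; the cache keys on the (hashable) filter arguments
_DF = pd.DataFrame({
    "Region": ["North", "South", "North", "East"],
    "Revenue": [100.0, 200.0, 300.0, 50.0],
})

@lru_cache(maxsize=128)
def filtered_revenue(region: str) -> float:
    """Cache per-region totals to avoid repeated filtering on every rerun."""
    return float(_DF.loc[_DF["Region"] == region, "Revenue"].sum())
```

In the dashboard itself, decorating the data loader with `@st.cache_data` gives the same per-argument memoization, invalidated when the arguments (or an optional `ttl`) change.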

Best Practices

  • Data Quality: Ensure data is clean, consistent, and properly formatted before loading
  • Date Format: Always use YYYY-MM-DD format for dates. Validate dates before loading
  • Numeric Columns: Ensure Quantity, Price, Revenue are numeric. Handle missing values appropriately
  • Data Size: For large datasets (100K+ rows), consider data aggregation or sampling
  • Refresh Interval: Use 30 seconds for real-time dashboards. Increase for less frequent updates
  • Error Handling: Add error handling in callbacks to prevent dashboard crashes
  • Data Validation: Validate data format and types before processing
  • User Experience: Add loading indicators for long-running operations
  • Responsive Design: Test dashboard on different screen sizes. Ensure mobile compatibility
  • Security: Validate user inputs. Sanitize data before displaying
  • Logging: Add logging for debugging. Monitor dashboard performance
  • Backup Data: Keep backups of your sales data. Version control your data files
  • Documentation: Document custom modifications. Keep track of configuration changes
  • Testing: Test with different data sizes and filter combinations
  • Production Deployment: Run Streamlit as a managed service (e.g., systemd) behind a reverse proxy. Configure proper error handling
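For the "Error Handling" and "Logging" practices above, one minimal pattern is a wrapper that logs any exception and returns a fallback value so a single bad filter does not crash the dashboard. The `safe` decorator and `risky_total` function are illustrative, not from the project:

```python
import logging

logging.basicConfig(
    filename="dashboard.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("dashboard")

def safe(fn, fallback=None):
    """Run fn, log any exception with a traceback, and return a fallback."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            log.exception("Error in %s", fn.__name__)
            return fallback
    return wrapper

@safe
def risky_total(values):
    # Divides by len(values): raises (and is caught) for an empty list
    return sum(values) / len(values)
```

A caller can then test the result for the fallback value and render a friendly "no data" message instead of a stack trace.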

Use Cases and Applications

  • Sales Performance Analysis: Track sales performance across regions, products, and time periods
  • Regional Comparison: Compare sales performance across different geographic regions
  • Product Analytics: Analyze product performance and identify top sellers
  • Revenue Tracking: Monitor revenue trends and growth over time
  • Business Intelligence: Create comprehensive BI dashboards for decision-making
  • Data Visualization: Visualize complex sales data in an interactive format
  • Reporting: Generate reports and presentations with current data
  • Trend Analysis: Identify sales trends and patterns over time
  • Performance Monitoring: Monitor KPIs and key metrics in real-time
  • Data Export: Export filtered data for further analysis in Excel or other tools

Performance Benchmarks

Expected performance for different data sizes:

Data Size | Rows | Load Time | Filter Time | Chart Render | Memory Usage
Small | 1K - 10K | < 1 second | < 100ms | < 500ms | < 50 MB
Medium | 10K - 100K | 1-3 seconds | 100-500ms | 500ms - 2s | 50-200 MB
Large | 100K - 1M | 3-10 seconds | 500ms - 2s | 2-5 seconds | 200-500 MB
Very Large | 1M+ | 10+ seconds | 2-5 seconds | 5-10 seconds | 500+ MB

Note: Performance depends on hardware, data complexity, and number of charts. Consider data aggregation for very large datasets.
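The note's suggestion to aggregate very large datasets can look like the sketch below: collapsing row-level sales to monthly totals before the dashboard loads them shrinks both memory use and chart render time. Column names follow the required schema; the `aggregate_monthly` helper is an assumption, not project code:

```python
import pandas as pd

def aggregate_monthly(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse row-level sales to monthly totals per region and category."""
    return (
        df.assign(Month=df["Date"].dt.to_period("M").dt.to_timestamp())
          .groupby(["Month", "Region", "Category"], as_index=False)
          .agg(Quantity=("Quantity", "sum"), Revenue=("Revenue", "sum"))
    )
```

A million daily rows typically reduce to a few thousand monthly rows, moving a "Very Large" dataset into the "Small" row of the benchmark table.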

System Requirements

Recommended system requirements for optimal performance:

Component | Minimum | Recommended | Optimal
Python | 3.8 | 3.9+ | 3.10+
RAM | 4 GB | 8 GB | 16 GB+
CPU | 2 cores | 4 cores | 8+ cores
Storage | 100 MB | 500 MB | 1 GB+
Browser | Chrome 90+ | Chrome 100+ | Latest

Note: Dashboard runs on CPU. No GPU required. Performance scales with data size and number of concurrent users.

Real-World Examples & Use Cases | Dashboard Use Cases | Analytics Use Cases | Business Use Cases

Example 1: E-commerce Analytics Dashboard

Complete setup for e-commerce data analytics:

# 1. Prepare e-commerce data
#    - Export data from your e-commerce platform (Shopify, WooCommerce, etc.)
#    - Format: Date, Region, Product, Category, Sales, Revenue, Customers

# 2. Upload data to the dashboard
#    - Use the CSV upload feature in the sidebar
#    - The dashboard automatically detects column types

# 3. Run the dashboard
streamlit run app.py

# 4. Analyze the data
#    - Filter by region to see geographic performance
#    - Filter by product to identify top sellers
#    - Use date presets to analyze specific periods
#    - View data quality metrics
#    - Export data for further analysis

# 5. Generate insights
#    - View advanced statistics
#    - Analyze trends with linear regression
#    - Export charts as PNG images
#    - Share the dashboard URL with team members

Example 2: Retail Store Performance

Monitor retail store sales across multiple locations:

# Use Case: Multi-store retail chain

# 1. Data structure:
#    - Region: Store locations (Store A, Store B, Store C, etc.)
#    - Product: Product SKUs or names
#    - Category: Product categories (Electronics, Clothing, Food, etc.)

# 2. Analysis workflow:
#    - Filter by Region to compare store performance
#    - Use Year-over-Year comparison for annual trends
#    - Use the Heatmap to identify peak sales days
#    - Use Top Performers to find the best product-store combinations

# 3. Key metrics:
#    - Total Revenue per store
#    - Average Order Value by location
#    - Product performance across stores
#    - Growth rate comparison

# 4. Reporting:
#    - Export store-specific data
#    - Generate monthly reports
#    - Share insights with store managers

Example 3: Product Category Analysis

Analyze sales performance by product category:

# Use Case: Category performance analysis

# 1. Filter by Category:
#    - Select "Electronics" to see electronics sales
#    - Compare with the "Clothing" category
#    - Analyze "Food" category trends

# 2. Key insights:
#    - Category Distribution (pie chart) shows revenue share
#    - Product Performance shows top products in the category
#    - Regional Performance shows category sales by region
#    - Cumulative Revenue tracks category growth

# 3. Strategic decisions:
#    - Identify underperforming categories
#    - Allocate resources to high-performing categories
#    - Plan inventory based on category trends
#    - Adjust marketing for specific categories

Example 4: Monthly Sales Reporting

Generate monthly sales reports and presentations:

# Use Case: Monthly reporting workflow

# 1. Set the date range:
#    - Start Date: first day of the month (e.g., 2024-01-01)
#    - End Date: last day of the month (e.g., 2024-01-31)

# 2. Apply filters:
#    - Select specific regions if needed
#    - Filter by product categories
#    - Use search for specific products

# 3. Review KPIs:
#    - Total Revenue for the month
#    - Total Orders placed
#    - Average Order Value
#    - Products Sold
#    - Growth Rate vs the previous month

# 4. Analyze charts:
#    - Revenue Trend shows daily performance
#    - Monthly Comparison shows month-over-month changes
#    - Category Distribution shows the category mix
#    - Top Performers identifies best sellers

# 5. Export and share:
#    - Export data to Excel for detailed analysis
#    - Print the dashboard for presentations
#    - Share insights with stakeholders

Example 5: Real-time Sales Monitoring

Monitor sales in real-time with auto-refresh:

# Use Case: Live sales monitoring

# 1. Configure auto-refresh:
#    - Set REFRESH_INTERVAL to 10000 (10 seconds)
#    - The dashboard updates automatically
#    - No manual refresh needed

# 2. Connect live data:
#    - Modify the data loading to fetch from an API
#    - Or connect to a database with real-time updates
#    - The dashboard will show the latest data

# 3. Monitor key metrics:
#    - Watch KPI cards update in real time
#    - Track revenue trends as they happen
#    - Monitor order counts live
#    - Observe product performance changes

# 4. Use cases:
#    - Sales team monitoring during promotions
#    - Real-time performance tracking
#    - Live dashboard displays in the office
#    - Executive dashboards for quick insights

Integration Examples | Database Integration | API Integration | Deployment Guide

Integration with Database

Connect dashboard to SQL database for live data:

# Connect to a SQL database in app.py
import sqlite3
import pandas as pd

def load_data_from_db():
    """Load sales data from a SQL database."""
    conn = sqlite3.connect('sales.db')
    query = """
        SELECT date AS Date,
               region AS Region,
               product AS Product,
               category AS Category,
               quantity AS Quantity,
               price AS Price,
               revenue AS Revenue
        FROM sales
        ORDER BY date
    """
    df = pd.read_sql_query(query, conn)
    conn.close()
    df['Date'] = pd.to_datetime(df['Date'])
    return df

# Replace CSV loading with database loading
df = load_data_from_db()

# For MySQL/PostgreSQL:
# import mysql.connector
# conn = mysql.connector.connect(
#     host='localhost',
#     user='username',
#     password='password',
#     database='sales_db'
# )

Integration with REST API

Load data from REST API endpoint:

# Load data from a REST API in app.py
import requests
import pandas as pd
import streamlit as st

@st.cache_data(ttl=60)  # re-fetch at most once per minute
def load_data_from_api():
    """Load sales data from a REST API."""
    response = requests.get(
        'https://api.example.com/sales',
        headers={'Authorization': 'Bearer YOUR_TOKEN'},
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()
    df = pd.DataFrame(data['sales'])
    df['Date'] = pd.to_datetime(df['Date'])
    return df

# Replace CSV loading with API loading
df = load_data_from_api()

# For near-real-time updates, lower the cache ttl so Streamlit
# re-fetches from the API on the next rerun after it expires.

Embedding Dashboard in Existing Website

Embed the dashboard in an existing web application:

# Option 1: Embed as an iframe
# In your HTML file:
<iframe src="http://localhost:8501" width="100%" height="800px" frameborder="0">
</iframe>

# Option 2: Run on a different port
# Modify .streamlit/config.toml:
[server]
port = 8502  # Use a different port
# Then access the dashboard at: http://localhost:8502

# Option 3: Use a reverse proxy (nginx)
# nginx configuration file (/etc/nginx/sites-available/dashboard):
server {
    listen 80;
    server_name your-domain.com;

    location /dashboard/ {
        proxy_pass http://localhost:8501/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Streamlit uses WebSockets; forward the upgrade headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
# Then restart nginx:
sudo systemctl restart nginx

# Option 4: Deploy to cloud platforms
# Heroku:
#   - Create a Procfile: web: streamlit run app.py --server.port=$PORT --server.address=0.0.0.0
#   - Deploy: git push heroku main
# AWS EC2:
#   - Run Streamlit as a systemd service
#   - Configure security groups for port 8501
# Azure:
#   - Deploy as a web app
#   - Set the startup command to run Streamlit
# All platforms:
#   - Configure environment variables
#   - Set up an SSL certificate for HTTPS

Production Deployment

Deploy dashboard to production environment:

# Production deployment with Streamlit

# 1. Streamlit runs directly (no gunicorn needed)
#    Streamlit is production-ready out of the box

# 2. Configure Streamlit for production (.streamlit/config.toml):
[server]
port = 8501
address = "0.0.0.0"
headless = true
enableCORS = false
enableXsrfProtection = true

# 3. Run with Streamlit
streamlit run app.py --server.port=8501 --server.address=0.0.0.0

# 4. Use a systemd service (Linux)
# Create /etc/systemd/system/dashboard.service:
[Unit]
Description=Analytics Dashboard
After=network.target

[Service]
User=www-data
WorkingDirectory=/path/to/dashboard
ExecStart=/usr/bin/streamlit run app.py --server.port=8501 --server.address=0.0.0.0
Restart=always

[Install]
WantedBy=multi-user.target

# 5. Enable and start the service
sudo systemctl enable dashboard
sudo systemctl start dashboard

# Alternative: Use Streamlit Cloud for easy deployment
# Push to GitHub and deploy at share.streamlit.io

Contact Information | Support | Get Help | Contact RSK World

Get in Touch

Developer: Molla Samser
Designer & Tester: Rima Khatun

rskworld.in
help@rskworld.in support@rskworld.in
+91 93305 39277

License | Open Source License | Project License

This project is for educational purposes only. See LICENSE file for more details.

About RSK World

Founded by Molla Samser, with Designer & Tester Rima Khatun, RSK World is your one-stop destination for free programming resources, source code, and development tools.

Founder: Molla Samser
Designer & Tester: Rima Khatun

Development

  • Game Development
  • Web Development
  • Mobile Development
  • AI Development
  • Development Tools

Legal

  • Terms & Conditions
  • Privacy Policy
  • Disclaimer

Contact Info

Nutanhat, Mongolkote
Purba Burdwan, West Bengal
India, 713147

+91 93305 39277

hello@rskworld.in
support@rskworld.in

© 2026 RSK World. All rights reserved.

Content used for educational purposes only. View Disclaimer