RSK World
dask-parallel
Parallel and distributed computing with Dask
notebooks/07_advanced_dataframes.ipynb
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Advanced Dask DataFrame Operations\n",
        "\n",
        "<!--\n",
        "Project: Dask Parallel Computing\n",
        "Author: Molla Samser\n",
        "Designer & Tester: Rima Khatun\n",
        "Website: https://rskworld.in\n",
        "Email: help@rskworld.in, support@rskworld.in\n",
        "Phone: +91 93305 39277\n",
        "-->\n",
        "\n",
        "This notebook demonstrates advanced DataFrame operations including joins, window functions, time series operations, and complex aggregations.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import dask.dataframe as dd\n",
        "import pandas as pd\n",
        "import numpy as np\n",
        "import time\n",
        "from datetime import datetime, timedelta\n",
        "\n",
        "print(\"Advanced Dask DataFrame Operations\")\n",
        "print(\"=\" * 50)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Creating Complex Datasets\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "# Create complex time series data\n",
        "n_rows = 1000000\n",
        "# Lowercase 'h' is the hourly alias; uppercase 'H' is deprecated in pandas 2.2+\n",
        "dates = pd.date_range('2020-01-01', periods=n_rows, freq='h')\n",
        "\n",
        "df1 = pd.DataFrame({\n",
        "    'id': range(n_rows),\n",
        "    'timestamp': dates,\n",
        "    'value': np.random.randn(n_rows),\n",
        "    'category': np.random.choice(['A', 'B', 'C', 'D', 'E'], n_rows),\n",
        "    'region': np.random.choice(['North', 'South', 'East', 'West'], n_rows),\n",
        "    'amount': np.random.uniform(100, 10000, n_rows)\n",
        "})\n",
        "\n",
        "df2 = pd.DataFrame({\n",
        "    'id': range(n_rows),\n",
        "    'metadata': np.random.choice(['Type1', 'Type2', 'Type3'], n_rows),\n",
        "    'status': np.random.choice(['Active', 'Inactive'], n_rows),\n",
        "    'score': np.random.randint(0, 100, n_rows)\n",
        "})\n",
        "\n",
        "# Save to CSV, creating the output directory if it does not exist yet\n",
        "os.makedirs('../data', exist_ok=True)\n",
        "df1.to_csv('../data/advanced_data_1.csv', index=False)\n",
        "df2.to_csv('../data/advanced_data_2.csv', index=False)\n",
        "\n",
        "print(f\"Created datasets with {n_rows} rows each\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Advanced Joins and Merges\n"
      ]
    },
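    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Load data with Dask; parse 'timestamp' as datetime so the later\n",
        "# time-based rolling and resampling cells get a datetime index\n",
        "ddf1 = dd.read_csv('../data/advanced_data_1.csv', parse_dates=['timestamp'])\n",
        "ddf2 = dd.read_csv('../data/advanced_data_2.csv')\n",
        "\n",
        "# Inner join (lazy: this line only builds the task graph)\n",
        "print(\"Performing inner join...\")\n",
        "start_time = time.time()\n",
        "joined = dd.merge(ddf1, ddf2, on='id', how='inner')\n",
        "# head() computes only the first partition of the result, not the full join\n",
        "joined_result = joined.head(10)\n",
        "end_time = time.time()\n",
        "\n",
        "print(f\"Join preview computed in {end_time - start_time:.2f} seconds\")\n",
        "# .shape is lazy on a Dask DataFrame, so count rows with len() (this runs the full join)\n",
        "print(f\"\\nJoined DataFrame shape: ({len(joined)}, {len(joined.columns)})\")\n",
        "print(\"\\nSample joined data:\")\n",
        "print(joined_result)\n"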
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Window Functions and Rolling Operations\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# set_index already sorts the data by the new index, so no separate sort step is needed\n",
        "ddf_sorted = ddf1.set_index('timestamp')\n",
        "\n",
        "# Rolling window operations\n",
        "print(\"Calculating rolling statistics...\")\n",
        "start_time = time.time()\n",
        "\n",
        "# Rolling mean over 24 hours ('h' replaces the deprecated 'H' alias)\n",
        "rolling_mean = ddf_sorted['value'].rolling(window='24h').mean()\n",
        "\n",
        "# Rolling sum over 7 days\n",
        "rolling_sum = ddf_sorted['amount'].rolling(window='7D').sum()\n",
        "\n",
        "# Compute results\n",
        "mean_result = rolling_mean.compute()\n",
        "sum_result = rolling_sum.compute()\n",
        "\n",
        "end_time = time.time()\n",
        "\n",
        "print(f\"Rolling operations completed in {end_time - start_time:.2f} seconds\")\n",
        "print(f\"\\nRolling mean sample (first 10):\\n{mean_result.head(10)}\")\n",
        "print(f\"\\nRolling sum sample (first 10):\\n{sum_result.head(10)}\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Complex Aggregations\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Multi-level groupby with multiple aggregations\n",
        "print(\"Performing complex aggregations...\")\n",
        "start_time = time.time()\n",
        "\n",
        "complex_agg = ddf1.groupby(['category', 'region']).agg({\n",
        "    'value': ['mean', 'std', 'min', 'max'],\n",
        "    'amount': ['sum', 'mean', 'count']\n",
        "}).compute()\n",
        "\n",
        "end_time = time.time()\n",
        "\n",
        "print(f\"Aggregation completed in {end_time - start_time:.2f} seconds\")\n",
        "print(f\"\\nComplex aggregation result:\")\n",
        "print(complex_agg.head(10))\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Time Series Resampling\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Resample time series data\n",
        "ddf_ts = ddf1.set_index('timestamp')\n",
        "\n",
        "# Resample to daily frequency\n",
        "print(\"Resampling to daily frequency...\")\n",
        "daily = ddf_ts.resample('1D').agg({\n",
        "    'value': 'mean',\n",
        "    'amount': 'sum',\n",
        "    'id': 'count'\n",
        "}).compute()\n",
        "\n",
        "print(f\"\\nDaily resampled data (first 10 days):\")\n",
        "print(daily.head(10))\n",
        "\n",
        "# Resample to weekly frequency\n",
        "weekly = ddf_ts.resample('1W').agg({\n",
        "    'value': ['mean', 'std'],\n",
        "    'amount': 'sum'\n",
        "}).compute()\n",
        "\n",
        "print(f\"\\nWeekly resampled data (first 5 weeks):\")\n",
        "print(weekly.head(5))\n"
      ]
    }
  ],
  "metadata": {
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
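
The rolling and resampling cells above depend on `timestamp` being a real datetime index. Dask's `rolling` and `resample` follow the pandas API, so the window semantics can be checked on a small pandas frame first. The following is a minimal sketch, not part of the original notebook: the `small` frame and its constant columns are hypothetical, chosen so the expected results are easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the notebook's df1: 72 hourly rows (3 full days).
# Constant columns make the expected window results easy to verify by hand.
small = pd.DataFrame({
    "id": np.arange(72),
    "timestamp": pd.date_range("2020-01-01", periods=72, freq="h").astype(str),
    "value": np.ones(72),
    "amount": np.full(72, 10.0),
})

# After a CSV round trip without parse_dates, timestamps come back as strings.
was_datetime = pd.api.types.is_datetime64_any_dtype(small["timestamp"])

# Parse them before indexing; time-based windows need a datetime index.
small["timestamp"] = pd.to_datetime(small["timestamp"])
small_ts = small.set_index("timestamp")

# 24-hour rolling sum: each window covers (t - 24h, t], so at most 24 hourly rows.
rolling_sum = small_ts["amount"].rolling(window="24h").sum()

# Daily resample with one aggregation per column, as in the notebook.
daily = small_ts.resample("1D").agg({"value": "mean", "amount": "sum", "id": "count"})

print(was_datetime)
print(rolling_sum.iloc[[0, 23, 71]].tolist())
print(daily)
```

With 10.0 per row, the rolling sum grows by 10 per hour until the window is full at 24 rows, and each daily bucket holds exactly 24 rows. If the same checks pass on a small sample, the larger Dask run is less likely to fail on dtype problems after a long computation.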
© 2026 RSK World. All rights reserved.
