dask-parallel
Parallel and distributed computing with Dask
notebooks/02_dask_dataframes.ipynb
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Dask DataFrames - Parallel DataFrame Computing\n",
        "\n",
        "<!--\n",
        "Project: Dask Parallel Computing\n",
        "Author: Molla Samser\n",
        "Designer & Tester: Rima Khatun\n",
        "Website: https://rskworld.in\n",
        "Email: help@rskworld.in, support@rskworld.in\n",
        "Phone: +91 93305 39277\n",
        "-->\n",
        "\n",
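        "This notebook demonstrates how to use Dask DataFrames for parallel processing of large datasets: reading a CSV lazily in partitions, running groupby aggregations, filtering rows, and comparing performance with pandas.\n"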
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import dask.dataframe as dd\n",
        "import pandas as pd\n",
        "import numpy as np\n",
        "import time\n",
        "\n",
        "print(\"Dask DataFrames Demo\")\n",
        "print(\"=\" * 50)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Creating Sample Data\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "# Create a sample dataset with 1,000,000 rows and 5 columns\n",
        "n_rows = 1_000_000\n",
        "np.random.seed(42)  # fixed seed so results are reproducible\n",
        "\n",
        "data = {\n",
        "    'id': range(n_rows),\n",
        "    'value1': np.random.randn(n_rows),\n",
        "    'value2': np.random.randn(n_rows),\n",
        "    'category': np.random.choice(['A', 'B', 'C', 'D'], n_rows),\n",
        "    'date': pd.date_range('2020-01-01', periods=n_rows, freq='1min')\n",
        "}\n",
        "\n",
        "# Save to CSV so Dask can read it back in partitions\n",
        "os.makedirs('../data', exist_ok=True)  # make sure the output directory exists\n",
        "df_pandas = pd.DataFrame(data)\n",
        "df_pandas.to_csv('../data/sample_data.csv', index=False)\n",
        "print(f\"Created CSV with {len(df_pandas)} rows\")\n"
      ]
    },
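    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A Dask DataFrame is a collection of smaller pandas DataFrames (partitions) split along the index. As a minimal sketch, assuming `df_pandas` from the cell above and an arbitrary `npartitions=8`, the next cell splits the in-memory data with `dd.from_pandas` and uses `map_partitions` to run a pandas function (here `len`) on every partition. The name `ddf_mem` is introduced only for this illustration.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Sketch: split the in-memory pandas DataFrame into partitions\n",
        "# (npartitions=8 is an arbitrary illustrative choice)\n",
        "ddf_mem = dd.from_pandas(df_pandas, npartitions=8)\n",
        "print(f\"Number of partitions: {ddf_mem.npartitions}\")\n",
        "\n",
        "# divisions are the index boundaries between partitions; they are known\n",
        "# here because from_pandas sorts the data by its index\n",
        "print(f\"First divisions: {ddf_mem.divisions[:3]}\")\n",
        "\n",
        "# map_partitions applies a pandas function to each partition:\n",
        "# here it counts the rows held by every partition\n",
        "rows_per_partition = ddf_mem.map_partitions(len).compute()\n",
        "print(f\"Rows per partition: {list(rows_per_partition)}\")\n"
      ]
    },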
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Reading Large CSV Files\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
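        "# Read CSV with Dask: the file is split into partitions and loaded lazily\n",
        "df_dask = dd.read_csv('../data/sample_data.csv', parse_dates=['date'])\n",
        "\n",
        "# The number of columns is known immediately, but the row count is lazy:\n",
        "# len() triggers a pass over the whole file to count the rows\n",
        "print(f\"Dask DataFrame shape: ({len(df_dask)}, {len(df_dask.columns)})\")\n",
        "print(f\"Number of partitions: {df_dask.npartitions}\")\n",
        "print(f\"Columns: {list(df_dask.columns)}\")\n",
        "print(\"\\nFirst few rows:\")\n",
        "print(df_dask.head())\n"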
      ]
    },
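    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`dd.read_csv` splits the file into byte ranges, one partition per range, and infers column dtypes from a sample at the start of the file rather than reading it all. The `blocksize` argument sets the size of each range: smaller blocks give more partitions (more parallelism, but also more tasks to schedule). A minimal sketch; `'16MB'` is an arbitrary value chosen for illustration, not a tuned recommendation, and `df_small_blocks` is a name introduced here.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Sketch: re-read the same CSV with a smaller block size\n",
        "# ('16MB' is an arbitrary illustrative value)\n",
        "df_small_blocks = dd.read_csv('../data/sample_data.csv', blocksize='16MB')\n",
        "print(f\"Partitions with blocksize='16MB': {df_small_blocks.npartitions}\")\n",
        "\n",
        "# dtypes were inferred from a sample of the file; no full read has happened yet\n",
        "print(df_small_blocks.dtypes)\n"
      ]
    },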
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## DataFrame Operations\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Groupby aggregation: Dask builds the task graph lazily and runs it on .compute().\n",
        "# The measured time therefore includes reading the CSV from disk.\n",
        "print(\"Performing groupby operation...\")\n",
        "start_time = time.time()\n",
        "\n",
        "grouped = df_dask.groupby('category').agg({\n",
        "    'value1': 'mean',\n",
        "    'value2': 'sum',\n",
        "    'id': 'count'\n",
        "}).compute()\n",
        "\n",
        "end_time = time.time()\n",
        "print(\"\\nGroupby result:\")\n",
        "print(grouped)\n",
        "print(f\"\\nComputation time: {end_time - start_time:.2f} seconds\")\n"
      ]
    },
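    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Each separate `.compute()` call runs its own pass over the CSV. When several results are needed from the same data, `dask.compute` can evaluate them in a single call, so tasks they share (such as reading the file partitions) run only once. A minimal sketch using the two numeric columns of this dataset; the variable names are introduced for illustration.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import dask\n",
        "\n",
        "# Build two lazy results; nothing is computed yet\n",
        "mean_value1 = df_dask['value1'].mean()\n",
        "sum_value2 = df_dask['value2'].sum()\n",
        "\n",
        "# Evaluate both in one call: the shared read_csv tasks run only once\n",
        "start_time = time.time()\n",
        "mean_v1, sum_v2 = dask.compute(mean_value1, sum_value2)\n",
        "elapsed = time.time() - start_time\n",
        "print(f\"Mean of value1: {mean_v1:.4f}\")\n",
        "print(f\"Sum of value2: {sum_v2:.2f}\")\n",
        "print(f\"Combined computation time: {elapsed:.2f} seconds\")\n"
      ]
    },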
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Filtering and Selection\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
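        "# Filter data (lazy: this only adds a step to the task graph)\n",
        "filtered = df_dask[df_dask['value1'] > 0]\n",
        "# len() triggers a computation over all partitions\n",
        "print(f\"Rows with value1 > 0: {len(filtered)}\")\n",
        "\n",
        "# Compute summary statistics (percentiles from Dask's describe() are approximate)\n",
        "stats = filtered[['value1', 'value2']].describe().compute()\n",
        "print(\"\\nStatistics:\")\n",
        "print(stats)\n"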
      ]
    },
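    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In the cell above, `filtered` is rebuilt from the CSV each time it is used (once for `len`, once for `describe`). `persist()` computes it once and keeps the resulting partitions in memory for later operations. A sketch, assuming the filtered rows fit in RAM; `filtered_mem` and the other variable names are introduced here for illustration.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Compute the filtered rows once and keep the partitions in memory\n",
        "# (assumes they fit in RAM; roughly half of the 1,000,000 rows here)\n",
        "filtered_mem = filtered.persist()\n",
        "\n",
        "# Later operations reuse the in-memory partitions instead of re-reading the CSV\n",
        "start_time = time.time()\n",
        "n_positive = len(filtered_mem)\n",
        "mean_value2 = filtered_mem['value2'].mean().compute()\n",
        "elapsed = time.time() - start_time\n",
        "print(f\"Rows with value1 > 0: {n_positive}\")\n",
        "print(f\"Mean of value2 where value1 > 0: {mean_value2:.4f}\")\n",
        "print(f\"Time for both operations after persist: {elapsed:.2f} seconds\")\n"
      ]
    },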
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Comparison with Pandas\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
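        "# Compare performance on a smaller, in-memory dataset\n",
        "small_df = df_pandas.head(100000)\n",
        "\n",
        "# Pandas\n",
        "start = time.time()\n",
        "pandas_result = small_df.groupby('category')['value1'].mean()\n",
        "pandas_time = time.time() - start\n",
        "\n",
        "# Dask (4 partitions); for data this small, task-scheduling overhead\n",
        "# usually makes Dask slower than plain pandas\n",
        "dask_small = dd.from_pandas(small_df, npartitions=4)\n",
        "start = time.time()\n",
        "dask_result = dask_small.groupby('category')['value1'].mean().compute()\n",
        "dask_time = time.time() - start\n",
        "\n",
        "print(f\"Pandas time: {pandas_time:.4f} seconds\")\n",
        "print(f\"Dask time: {dask_time:.4f} seconds\")\n",
        "\n",
        "# Sort both results by category and compare with a floating-point tolerance:\n",
        "# Dask combines per-partition sums, so the last digits can differ from pandas\n",
        "p, d = pandas_result.sort_index(), dask_result.sort_index()\n",
        "match = list(p.index) == list(d.index) and np.allclose(p.to_numpy(), d.to_numpy())\n",
        "print(f\"\\nResults match: {match}\")\n"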
      ]
    }
  ],
  "metadata": {
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}