RSK World
polars-fastdataframes
High-performance DataFrames with Polars
notebooks/02_lazy_evaluation.ipynb
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Lazy Evaluation and Optimization in Polars\n",
        "\n",
        "<!--\n",
        "Author: RSK World\n",
        "Website: https://rskworld.in\n",
        "Email: help@rskworld.in\n",
        "Phone: +91 93305 39277\n",
        "-->\n",
        "\n",
        "This notebook demonstrates Polars' lazy evaluation capabilities, which allow for query optimization and efficient processing of large datasets.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Author: RSK World\n",
        "# Website: https://rskworld.in\n",
        "# Email: help@rskworld.in\n",
        "# Phone: +91 93305 39277\n",
        "\n",
        "import polars as pl\n",
        "import numpy as np\n",
        "from datetime import datetime, timedelta\n",
        "\n",
        "print(\"Polars version:\", pl.__version__)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1. Understanding Lazy Evaluation\n",
        "\n",
        "Lazy evaluation means that operations are not executed immediately. Instead, Polars builds a query plan that is optimized before execution.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Create a sample DataFrame (seeded so the random columns are reproducible)\n",
        "np.random.seed(42)\n",
        "df = pl.DataFrame({\n",
        "    'id': range(1, 10001),\n",
        "    'category': np.random.choice(['A', 'B', 'C', 'D', 'E'], 10000),\n",
        "    'value1': np.random.randn(10000) * 100,\n",
        "    'value2': np.random.randn(10000) * 50,\n",
        "    'value3': np.random.randint(1, 1000, 10000)\n",
        "})\n",
        "\n",
        "print(\"DataFrame shape:\", df.shape)\n",
        "print(\"\\nFirst few rows:\")\n",
        "df.head()\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2. Creating a LazyFrame\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Convert DataFrame to LazyFrame\n",
        "lazy_df = df.lazy()\n",
        "\n",
        "print(\"Type:\", type(lazy_df))\n",
        "print(\"\\nLazyFrame operations are not executed until .collect() is called\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3. Building a Lazy Query\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Build a complex query (not executed yet)\n",
        "query = (lazy_df\n",
        "    .filter(pl.col('value1') > 50)\n",
        "    .filter(pl.col('value2') < 20)\n",
        "    .select(['id', 'category', 'value1', 'value2'])\n",
        "    .group_by('category')\n",
        "    .agg([\n",
        "        pl.col('value1').mean().alias('avg_value1'),\n",
        "        pl.col('value2').mean().alias('avg_value2'),\n",
        "        pl.len().alias('count')  # pl.count() is deprecated in recent Polars releases\n",
        "    ])\n",
        "    .sort('avg_value1', descending=True)\n",
        ")\n",
        "\n",
        "print(\"Query built but not executed yet!\")\n",
        "print(\"Type:\", type(query))\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4. Viewing the Query Plan\n",
        "\n",
        "Calling `.explain()` returns the optimized query plan as text without executing the query.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Show the query plan\n",
        "print(\"Query Plan:\")\n",
        "print(query.explain())\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 5. Executing the Query\n",
        "\n",
        "Calling `.collect()` runs the optimized query and returns a regular DataFrame.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Execute the query\n",
        "result = query.collect()\n",
        "print(\"Query executed!\")\n",
        "result\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 6. Query Optimization Benefits\n",
        "\n",
        "Because the whole query is known before anything runs, Polars can optimize it by:\n",
        "- Predicate pushdown (applying filters as early as possible)\n",
        "- Projection pushdown (reading only the columns the query needs)\n",
        "- Predicate combination (merging consecutive filters into one)\n",
        "- Join reordering\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Example: scanning a CSV lazily instead of reading it eagerly\n",
        "# scan_csv does not load the file up front, so filters and column\n",
        "# selections can be pushed down into the read itself\n",
        "try:\n",
        "    lazy_from_csv = pl.scan_csv('data/sample_data.csv')\n",
        "    print(\"LazyFrame from CSV created\")\n",
        "    print(\"\\nQuery plan:\")\n",
        "    print(lazy_from_csv.filter(pl.col('price') > 100).select(['name', 'price']).explain())\n",
        "except FileNotFoundError:\n",
        "    print(\"Sample data file not found. Run data_generator.py first.\")\n"
      ]
    }
  ],
  "metadata": {
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}