Text Classification Dataset - NLP + Multi-Class Classification + Machine Learning
text-classification
  • assets
  • data
  • models
  • notebooks
  • scripts
  • .gitignore (1.4 KB)
  • CHANGELOG.md (2.6 KB)
  • LICENSE (3.2 KB)
  • README.md (11.1 KB)
  • classifier.html (34.1 KB)
  • dashboard.html (41.4 KB)
  • explorer.html (41.4 KB)
  • index.html (28.4 KB)
  • requirements.txt (1.8 KB)
  • text-classification.svg (4.6 KB)
index.html
<!--
================================================================================
  Text Classification Dataset Project
================================================================================
  Project: Text Classification Dataset
  Category: Text Data / NLP
  Description: Multi-class text classification dataset with labeled documents 
               for news categorization, topic classification, and document analysis.
  
  Author: Molla Samser
  Designer & Tester: Rima Khatun
  Website: https://rskworld.in
  Email: help@rskworld.in | support@rskworld.in
  Phone: +91 93305 39277
  
  Copyright (c) 2026 RSK World - All Rights Reserved
  This content is provided for educational purposes only.
  
  Created: December 2026
  Last Modified: December 2026
================================================================================
-->
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="Text Classification Dataset - Multi-class text classification dataset with labeled documents for news categorization, topic classification, and document analysis. By RSK World.">
    <meta name="keywords" content="text classification, NLP, machine learning, dataset, news categorization, BERT, transformers, document classification, RSK World">
    <meta name="author" content="Molla Samser - RSK World">
    <meta name="robots" content="index, follow">
    
    <!-- Open Graph Meta Tags -->
    <meta property="og:title" content="Text Classification Dataset - RSK World">
    <meta property="og:description" content="Multi-class text classification dataset with labeled documents for NLP tasks.">
    <meta property="og:image" content="text-classification.png">
    <meta property="og:url" content="https://rskworld.in/text-classification/">
    <meta property="og:type" content="website">
    
    <!-- Twitter Card Meta Tags -->
    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:title" content="Text Classification Dataset - RSK World">
    <meta name="twitter:description" content="Multi-class text classification dataset for NLP and machine learning.">
    
    <title>Text Classification Dataset | RSK World</title>
    
    <!-- Favicon -->
    <link rel="icon" type="image/svg+xml" href="assets/favicon.svg">
    
    <!-- Google Fonts -->
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Playfair+Display:wght@400;500;600;700&family=Source+Sans+3:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
    
    <!-- Font Awesome -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css">
    
    <!-- Custom CSS -->
    <link rel="stylesheet" href="assets/css/style.css">
</head>
<body>
    <!-- Animated Background -->
    <div class="bg-animation">
        <div class="floating-shapes">
            <span></span><span></span><span></span><span></span><span></span>
            <span></span><span></span><span></span><span></span><span></span>
        </div>
    </div>

    <!-- Header -->
    <header class="header">
        <div class="container">
            <nav class="navbar">
                <a href="https://rskworld.in" class="logo">
                    <i class="fas fa-brain"></i>
                    <span>RSK<span class="highlight">World</span></span>
                </a>
                <ul class="nav-links">
                    <li><a href="#overview">Overview</a></li>
                    <li><a href="#features">Features</a></li>
                    <li><a href="#dataset">Dataset</a></li>
                    <li><a href="#usage">Usage</a></li>
                    <li><a href="#download">Download</a></li>
                    <li><a href="explorer.html"><i class="fas fa-search"></i> Explorer</a></li>
                    <li><a href="classifier.html"><i class="fas fa-robot"></i> Classifier</a></li>
                    <li><a href="dashboard.html" class="nav-highlight"><i class="fas fa-chart-line"></i> Dashboard</a></li>
                </ul>
                <button class="mobile-menu-btn" aria-label="Toggle Menu">
                    <span></span>
                    <span></span>
                    <span></span>
                </button>
            </nav>
        </div>
    </header>

    <!-- Hero Section -->
    <section class="hero">
        <div class="container">
            <div class="hero-content">
                <div class="hero-badge">
                    <i class="fas fa-file-alt"></i>
                    <span>Text Data</span>
                </div>
                <h1 class="hero-title">
                    Text Classification
                    <span class="gradient-text">Dataset</span>
                </h1>
                <p class="hero-description">
                    Multi-class text classification dataset with labeled documents for news categorization, 
                    topic classification, and document analysis. Perfect for NLP model training and research.
                </p>
                <div class="hero-stats">
                    <div class="stat-item">
                        <span class="stat-number" data-target="10000">0</span>
                        <span class="stat-label">Documents</span>
                    </div>
                    <div class="stat-item">
                        <span class="stat-number" data-target="6">0</span>
                        <span class="stat-label">Categories</span>
                    </div>
                    <div class="stat-item">
                        <span class="stat-number" data-target="3">0</span>
                        <span class="stat-label">Formats</span>
                    </div>
                </div>
                <div class="hero-actions">
                    <a href="#download" class="btn btn-primary">
                        <i class="fas fa-download"></i>
                        Download Dataset
                    </a>
                    <a href="#dataset" class="btn btn-outline">
                        <i class="fas fa-eye"></i>
                        Preview Data
                    </a>
                </div>
            </div>
            <div class="hero-visual">
                <div class="code-window">
                    <div class="window-header">
                        <div class="window-dots">
                            <span class="dot red"></span>
                            <span class="dot yellow"></span>
                            <span class="dot green"></span>
                        </div>
                        <span class="window-title">text_classifier.py</span>
                    </div>
                    <pre class="code-content"><code><span class="comment"># Text Classification with Transformers</span>
<span class="keyword">import</span> torch
<span class="keyword">from</span> transformers <span class="keyword">import</span> AutoTokenizer
<span class="keyword">from</span> transformers <span class="keyword">import</span> AutoModelForSequenceClassification

<span class="comment"># Categories (list index = integer label)</span>
categories = [
    <span class="string">"Technology"</span>, <span class="string">"Sports"</span>,
    <span class="string">"Politics"</span>, <span class="string">"Entertainment"</span>,
    <span class="string">"Business"</span>, <span class="string">"Science"</span>
]

<span class="comment"># Load pretrained encoder; the classification head is</span>
<span class="comment"># randomly initialized until fine-tuned on the dataset</span>
model_name = <span class="string">"bert-base-uncased"</span>
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=len(categories)
)
model.eval()

<span class="comment"># Classify text</span>
<span class="keyword">def</span> <span class="function">classify</span>(text):
    inputs = tokenizer(text, return_tensors=<span class="string">"pt"</span>, truncation=<span class="keyword">True</span>)
    <span class="keyword">with</span> torch.no_grad():
        logits = model(**inputs).logits
    <span class="keyword">return</span> categories[logits.argmax(dim=-1).item()]</code></pre>
                </div>
            </div>
        </div>
    </section>

    <!-- Overview Section -->
    <section id="overview" class="section overview">
        <div class="container">
            <div class="section-header">
                <span class="section-badge">About Dataset</span>
                <h2 class="section-title">Comprehensive Text Classification Resource</h2>
                <p class="section-subtitle">
                    This dataset contains 10,000 labeled documents across six categories: technology, sports, politics, 
                    entertainment, business, and science. It is intended for news categorization, topic classification, document analysis, and NLP model training.
                </p>
            </div>
            <div class="overview-grid">
                <div class="overview-card">
                    <div class="card-icon">
                        <i class="fas fa-newspaper"></i>
                    </div>
                    <h3>News Categorization</h3>
                    <p>Classify news articles into categories like technology, sports, politics, and more.</p>
                </div>
                <div class="overview-card">
                    <div class="card-icon">
                        <i class="fas fa-tags"></i>
                    </div>
                    <h3>Topic Classification</h3>
                    <p>Identify main topics and themes from unstructured text documents.</p>
                </div>
                <div class="overview-card">
                    <div class="card-icon">
                        <i class="fas fa-file-alt"></i>
                    </div>
                    <h3>Document Analysis</h3>
                    <p>Analyze and categorize large volumes of documents automatically.</p>
                </div>
                <div class="overview-card">
                    <div class="card-icon">
                        <i class="fas fa-robot"></i>
                    </div>
                    <h3>Model Training</h3>
                    <p>Train and fine-tune transformer models like BERT for text classification.</p>
                </div>
            </div>
        </div>
    </section>

    <!-- Features Section -->
    <section id="features" class="section features">
        <div class="container">
            <div class="section-header">
                <span class="section-badge">Features</span>
                <h2 class="section-title">Dataset Features</h2>
            </div>
            <div class="features-list">
                <div class="feature-item">
                    <div class="feature-icon">
                        <i class="fas fa-layer-group"></i>
                    </div>
                    <div class="feature-content">
                        <h3>Multiple Document Categories</h3>
                        <p>6 distinct categories covering technology, sports, politics, entertainment, business, and science topics.</p>
                    </div>
                </div>
                <div class="feature-item">
                    <div class="feature-icon">
                        <i class="fas fa-check-circle"></i>
                    </div>
                    <div class="feature-content">
                        <h3>Labeled Training Data</h3>
                        <p>All documents are professionally labeled with accurate category assignments for supervised learning.</p>
                    </div>
                </div>
                <div class="feature-item">
                    <div class="feature-icon">
                        <i class="fas fa-balance-scale"></i>
                    </div>
                    <div class="feature-content">
                        <h3>Train, Validation, and Test Splits</h3>
                        <p>Pre-split into training (70%), validation (15%), and test (15%) sets so that models are tuned and evaluated on separate data.</p>
                    </div>
                </div>
                <div class="feature-item">
                    <div class="feature-icon">
                        <i class="fas fa-cogs"></i>
                    </div>
                    <div class="feature-content">
                        <h3>Preprocessed Versions</h3>
                        <p>Includes cleaned, tokenized, and normalized versions ready for immediate use.</p>
                    </div>
                </div>
                <div class="feature-item">
                    <div class="feature-icon">
                        <i class="fas fa-microchip"></i>
                    </div>
                    <div class="feature-content">
                        <h3>Transformer Ready Format</h3>
                        <p>Formatted for direct use with BERT, RoBERTa, and other transformer architectures.</p>
                    </div>
                </div>
            </div>
        </div>
    </section>

    <!-- Technologies Section -->
    <section class="section technologies">
        <div class="container">
            <div class="section-header">
                <span class="section-badge">Technologies</span>
                <h2 class="section-title">Compatible Technologies</h2>
            </div>
            <div class="tech-grid">
                <div class="tech-card">
                    <i class="fas fa-file-csv"></i>
                    <span>CSV</span>
                </div>
                <div class="tech-card">
                    <i class="fas fa-file-alt"></i>
                    <span>TXT</span>
                </div>
                <div class="tech-card">
                    <i class="fas fa-code"></i>
                    <span>JSON</span>
                </div>
                <div class="tech-card">
                    <i class="fas fa-exchange-alt"></i>
                    <span>Transformers</span>
                </div>
                <div class="tech-card">
                    <i class="fas fa-brain"></i>
                    <span>BERT</span>
                </div>
            </div>
        </div>
    </section>

    <!-- Dataset Preview Section -->
    <section id="dataset" class="section dataset-preview">
        <div class="container">
            <div class="section-header">
                <span class="section-badge">Data Preview</span>
                <h2 class="section-title">Sample Dataset</h2>
            </div>
            <div class="preview-tabs">
                <button class="tab-btn active" data-tab="csv">CSV Format</button>
                <button class="tab-btn" data-tab="json">JSON Format</button>
                <button class="tab-btn" data-tab="stats">Statistics</button>
            </div>
            <div class="preview-content">
                <div class="tab-content active" id="csv">
                    <div class="data-table-wrapper">
                        <table class="data-table">
                            <thead>
                                <tr>
                                    <th>ID</th>
                                    <th>Text</th>
                                    <th>Category</th>
                                    <th>Label</th>
                                </tr>
                            </thead>
                            <tbody>
                                <tr>
                                    <td>1</td>
                                    <td>Apple announces new iPhone with revolutionary AI features...</td>
                                    <td><span class="category-badge tech">Technology</span></td>
                                    <td>0</td>
                                </tr>
                                <tr>
                                    <td>2</td>
                                    <td>Manchester United wins Premier League title after dramatic finish...</td>
                                    <td><span class="category-badge sports">Sports</span></td>
                                    <td>1</td>
                                </tr>
                                <tr>
                                    <td>3</td>
                                    <td>Senate passes new climate bill with bipartisan support...</td>
                                    <td><span class="category-badge politics">Politics</span></td>
                                    <td>2</td>
                                </tr>
                                <tr>
                                    <td>4</td>
                                    <td>Marvel releases trailer for upcoming superhero movie...</td>
                                    <td><span class="category-badge entertainment">Entertainment</span></td>
                                    <td>3</td>
                                </tr>
                                <tr>
                                    <td>5</td>
                                    <td>Stock market reaches all-time high amid economic recovery...</td>
                                    <td><span class="category-badge business">Business</span></td>
                                    <td>4</td>
                                </tr>
                                <tr>
                                    <td>6</td>
                                    <td>NASA discovers new exoplanet potentially habitable...</td>
                                    <td><span class="category-badge science">Science</span></td>
                                    <td>5</td>
                                </tr>
                            </tbody>
                        </table>
                    </div>
                </div>
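                <!--
                  A minimal sketch of reading rows shaped like the preview table above into
                  parallel text and label lists. The header names (id, text, category, label),
                  the in-memory sample and the helper name load_xy are assumptions for
                  illustration; check the header of the real CSV file before relying on them.

```python
import csv
import io

# Category order taken from the preview table: list index = integer label.
CATEGORIES = ["Technology", "Sports", "Politics",
              "Entertainment", "Business", "Science"]

# Two in-memory rows standing in for the real file (assumed header names).
SAMPLE_CSV = """id,text,category,label
1,"Apple announces new iPhone with revolutionary AI features...",Technology,0
2,"Manchester United wins Premier League title after dramatic finish...",Sports,1
"""


def load_xy(handle):
    """Return parallel lists of texts and integer labels."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row["text"])
        labels.append(int(row["label"]))
    return texts, labels


texts, labels = load_xy(io.StringIO(SAMPLE_CSV))
# Map integer labels back to readable category names.
names = [CATEGORIES[i] for i in labels]
```

                  For a file on disk, pass open(path, newline="", encoding="utf-8") instead of
                  the StringIO object; the path itself depends on how the archive is unpacked.
                -->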
                <div class="tab-content" id="json">
                    <div class="code-window">
                        <div class="window-header">
                            <div class="window-dots">
                                <span class="dot red"></span>
                                <span class="dot yellow"></span>
                                <span class="dot green"></span>
                            </div>
                            <span class="window-title">dataset.json</span>
                        </div>
                        <pre class="code-content json-preview"><code>{
  <span class="json-key">"dataset"</span>: {
    <span class="json-key">"name"</span>: <span class="json-string">"Text Classification Dataset"</span>,
    <span class="json-key">"version"</span>: <span class="json-string">"1.0.0"</span>,
    <span class="json-key">"total_samples"</span>: <span class="json-number">10000</span>,
    <span class="json-key">"categories"</span>: [
      <span class="json-string">"Technology"</span>,
      <span class="json-string">"Sports"</span>,
      <span class="json-string">"Politics"</span>,
      <span class="json-string">"Entertainment"</span>,
      <span class="json-string">"Business"</span>,
      <span class="json-string">"Science"</span>
    ]
  },
  <span class="json-key">"samples"</span>: [
    {
      <span class="json-key">"id"</span>: <span class="json-number">1</span>,
      <span class="json-key">"text"</span>: <span class="json-string">"Apple announces new iPhone..."</span>,
      <span class="json-key">"category"</span>: <span class="json-string">"Technology"</span>,
      <span class="json-key">"label"</span>: <span class="json-number">0</span>
    }
  ]
}</code></pre>
                    </div>
                </div>
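                <!--
                  A minimal sketch of loading a document shaped like the dataset.json preview
                  above and checking that each sample's label indexes its category. The
                  in-memory string and the helper name validate are assumptions; the real
                  file may carry extra fields.

```python
import json
from collections import Counter

# In-memory document mirroring the dataset.json preview (one sample only).
RAW = """
{
  "dataset": {
    "name": "Text Classification Dataset",
    "version": "1.0.0",
    "total_samples": 10000,
    "categories": ["Technology", "Sports", "Politics",
                   "Entertainment", "Business", "Science"]
  },
  "samples": [
    {"id": 1, "text": "Apple announces new iPhone...",
     "category": "Technology", "label": 0}
  ]
}
"""


def validate(doc):
    """Return per-category counts; raise if a label does not index its category."""
    categories = doc["dataset"]["categories"]
    counts = Counter()
    for sample in doc["samples"]:
        if categories[sample["label"]] != sample["category"]:
            raise ValueError(f"sample {sample['id']}: label does not match category")
        counts[sample["category"]] += 1
    return counts


doc = json.loads(RAW)
counts = validate(doc)
```

                  The returned counts are also a quick way to inspect class balance once the
                  full sample list is loaded.
                -->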
                <div class="tab-content" id="stats">
                    <div class="stats-grid">
                        <div class="stat-card">
                            <div class="stat-chart">
                                <canvas id="categoryChart"></canvas>
                            </div>
                        </div>
                        <div class="stat-card">
                            <h3>Dataset Statistics</h3>
                            <ul class="stats-list">
                                <li><span>Total Documents:</span> <strong>10,000</strong></li>
                                <li><span>Training Set:</span> <strong>7,000 (70%)</strong></li>
                                <li><span>Validation Set:</span> <strong>1,500 (15%)</strong></li>
                                <li><span>Test Set:</span> <strong>1,500 (15%)</strong></li>
                                <li><span>Avg. Document Length:</span> <strong>256 tokens</strong></li>
                                <li><span>Vocabulary Size:</span> <strong>50,000+</strong></li>
                            </ul>
                        </div>
                    </div>
                </div>
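                <!--
                  The 70/15/15 proportions listed above (7,000 / 1,500 / 1,500 of 10,000
                  documents) can be reproduced on any labeled list with a stratified split.
                  This is a sketch only: how the shipped splits were actually produced is not
                  stated on this page, and the function name, seed and balanced toy labels
                  are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test, preserving label proportions."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains after train and val goes to the test set.
        test += indices[n_train + n_val:]
    return train, val, test


# Toy labels: 6 classes x 100 documents each (class balance is an assumption).
labels = [i % 6 for i in range(600)]
train, val, test = stratified_split(labels)
```

                  Splitting per class keeps every category represented in each subset, which
                  matters more as classes become imbalanced.
                -->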
            </div>
        </div>
    </section>

    <!-- Usage Section -->
    <section id="usage" class="section usage">
        <div class="container">
            <div class="section-header">
                <span class="section-badge">How to Use</span>
                <h2 class="section-title">Quick Start Guide</h2>
            </div>
            <div class="usage-steps">
                <div class="step">
                    <div class="step-number">01</div>
                    <div class="step-content">
                        <h3>Download Dataset</h3>
                        <p>Download the dataset in your preferred format (CSV, JSON, or TXT).</p>
                        <div class="code-snippet">
                            <code>wget https://rskworld.in/datasets/text-classification.zip</code>
                        </div>
                    </div>
                </div>
                <div class="step">
                    <div class="step-number">02</div>
                    <div class="step-content">
                        <h3>Load Data</h3>
                        <p>Load the dataset using pandas or your preferred library.</p>
                        <div class="code-snippet">
                            <code>import pandas as pd<br>df = pd.read_csv('train.csv')</code>
                        </div>
                    </div>
                </div>
                <div class="step">
                    <div class="step-number">03</div>
                    <div class="step-content">
                        <h3>Preprocess</h3>
                        <p>Apply tokenization and preprocessing using provided scripts.</p>
                        <div class="code-snippet">
                            <code>from preprocessing import TextPreprocessor<br>preprocessor = TextPreprocessor()</code>
                        </div>
                    </div>
                </div>
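                <!--
                  The preprocessing module and its TextPreprocessor class are referenced in
                  the step above but not shown on this page. The class below is a
                  hypothetical stand-in that sketches one common cleaning recipe (lowercase,
                  strip ASCII punctuation, split on whitespace); the shipped script may
                  expose different methods and options.

```python
import re
import string


class TextPreprocessor:
    """Hypothetical stand-in: lowercase, strip punctuation, split on whitespace."""

    def __init__(self, lowercase=True):
        self.lowercase = lowercase
        # Translation table that deletes ASCII punctuation characters.
        self._strip = str.maketrans("", "", string.punctuation)

    def clean(self, text):
        if self.lowercase:
            text = text.lower()
        text = text.translate(self._strip)
        # Collapse runs of whitespace left behind by removed characters.
        return re.sub(r"\s+", " ", text).strip()

    def tokenize(self, text):
        return self.clean(text).split()


preprocessor = TextPreprocessor()
tokens = preprocessor.tokenize("NASA discovers new exoplanet, potentially habitable...")
```

                  Transformer tokenizers usually expect raw text, so a step like this is
                  mainly relevant for bag-of-words style models.
                -->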
                <div class="step">
                    <div class="step-number">04</div>
                    <div class="step-content">
                        <h3>Train Model</h3>
                        <p>Train your classification model using transformers or sklearn.</p>
                        <div class="code-snippet">
                            <code>model.fit(X_train, y_train)<br>predictions = model.predict(X_test)</code>
                        </div>
                    </div>
                </div>
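                <!--
                  The fit/predict pattern in the step above can be tried without installing
                  anything. The sketch below swaps in a small pure-Python multinomial Naive
                  Bayes in place of a scikit-learn or transformers model; the class name and
                  the toy sentences are assumptions, and it is meant to illustrate the
                  interface rather than to be a competitive classifier.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, label):
        total = sum(self.word_counts[label].values())
        # Log prior plus smoothed log likelihood of each token.
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for token in tokens:
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, texts):
        predictions = []
        for text in texts:
            tokens = text.lower().split()
            predictions.append(max(self.class_counts,
                                   key=lambda label: self._log_score(tokens, label)))
        return predictions


X_train = ["new phone chip released", "team wins league title",
           "phone software update", "striker scores in title match"]
y_train = [0, 1, 0, 1]  # 0 = Technology, 1 = Sports
X_test = ["phone chip update", "league match tonight"]

model = NaiveBayesClassifier().fit(X_train, y_train)
predictions = model.predict(X_test)
```

                  Because the class follows the same fit/predict convention, it can later be
                  replaced by a scikit-learn estimator without changing the calling code.
                -->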
            </div>
        </div>
    </section>

    <!-- Download Section -->
    <section id="download" class="section download">
        <div class="container">
            <div class="section-header">
                <span class="section-badge">Download</span>
                <h2 class="section-title">Get the Dataset</h2>
            </div>
            <div class="download-options">
                <div class="download-card featured">
                    <div class="card-badge">Recommended</div>
                    <div class="card-icon">
                        <i class="fas fa-file-archive"></i>
                    </div>
                    <h3>Complete Package</h3>
                    <p>All formats, preprocessed data, and Python scripts included.</p>
                    <ul class="download-features">
                        <li><i class="fas fa-check"></i> CSV, JSON, TXT formats</li>
                        <li><i class="fas fa-check"></i> Train/Val/Test splits</li>
                        <li><i class="fas fa-check"></i> Preprocessing scripts</li>
                        <li><i class="fas fa-check"></i> Sample notebooks</li>
                    </ul>
                    <a href="text-classification.zip" class="btn btn-primary" download>
                        <i class="fas fa-download"></i>
                        Download ZIP (45 MB)
                    </a>
                </div>
                <div class="download-card">
                    <div class="card-icon">
                        <i class="fas fa-file-csv"></i>
                    </div>
                    <h3>CSV Only</h3>
                    <p>Raw dataset in CSV format.</p>
                    <a href="data/csv/full_dataset.csv" class="btn btn-outline" download>
                        <i class="fas fa-download"></i>
                        Download CSV
                    </a>
                </div>
                <div class="download-card">
                    <div class="card-icon">
                        <i class="fas fa-code"></i>
                    </div>
                    <h3>JSON Only</h3>
                    <p>Dataset in JSON format.</p>
                    <a href="data/json/full_dataset.json" class="btn btn-outline" download>
                        <i class="fas fa-download"></i>
                        Download JSON
                    </a>
                </div>
            </div>
        </div>
    </section>

    <!-- Footer -->
    <footer class="footer">
        <div class="container">
            <div class="footer-content">
                <div class="footer-brand">
                    <a href="https://rskworld.in" class="logo">
                        <i class="fas fa-brain"></i>
                        <span>RSK<span class="highlight">World</span></span>
                    </a>
                    <p>Your one-stop destination for free programming resources, source code, and development tools.</p>
                    <div class="social-links">
                        <a href="#" aria-label="Facebook"><i class="fab fa-facebook-f"></i></a>
                        <a href="#" aria-label="Twitter"><i class="fab fa-twitter"></i></a>
                        <a href="#" aria-label="Instagram"><i class="fab fa-instagram"></i></a>
                        <a href="#" aria-label="GitHub"><i class="fab fa-github"></i></a>
                        <a href="#" aria-label="LinkedIn"><i class="fab fa-linkedin-in"></i></a>
                    </div>
                </div>
                <div class="footer-links">
                    <div class="footer-column">
                        <h4>Quick Links</h4>
                        <ul>
                            <li><a href="https://rskworld.in">Home</a></li>
                            <li><a href="https://rskworld.in/about.php">About</a></li>
                            <li><a href="https://rskworld.in/contact.php">Contact</a></li>
                        </ul>
                    </div>
                    <div class="footer-column">
                        <h4>Resources</h4>
                        <ul>
                            <li><a href="#">Documentation</a></li>
                            <li><a href="#">Tutorials</a></li>
                            <li><a href="#">API Reference</a></li>
                        </ul>
                    </div>
                    <div class="footer-column">
                        <h4>Contact</h4>
                        <ul>
                            <li><i class="fas fa-envelope"></i> help@rskworld.in</li>
                            <li><i class="fas fa-phone"></i> +91 93305 39277</li>
                            <li><i class="fas fa-globe"></i> rskworld.in</li>
                        </ul>
                    </div>
                </div>
            </div>
            <div class="footer-bottom">
                <p>&copy; 2026 RSK World. All Rights Reserved. | Founded by <strong>Molla Samser</strong> | Designer & Tester: <strong>Rima Khatun</strong></p>
                <p>Content used for educational purposes only.</p>
            </div>
        </div>
    </footer>

    <!-- Back to Top Button -->
    <button class="back-to-top" aria-label="Back to Top">
        <i class="fas fa-arrow-up"></i>
    </button>

    <!-- Chart.js -->
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
    
    <!-- Custom JavaScript -->
    <script src="assets/js/main.js"></script>
</body>
</html>

