Sentiment Analysis Dataset

A sentiment analysis dataset with 50,000+ labeled text samples across three sentiment classes (Positive, Neutral, Negative), drawn from product reviews, social media posts, and customer feedback. The data is pre-split into training (80%) and test (20%) sets and is available in CSV, JSON, and TXT formats. It ships with Python scripts for unlimited synthetic data generation, preprocessing with lemmatization and stemming, multi-method sentiment analysis (Lexicon, VADER, TextBlob), data visualization with word clouds, and ML model training (Naive Bayes, SVM, Logistic Regression, Random Forest). It is compatible with NLTK, spaCy, scikit-learn, and other popular NLP frameworks, and includes an interactive demo website with a Chart.js analytics dashboard. It is suited to sentiment classification, brand monitoring, customer review analysis, and NLP education projects.


This project provides a Sentiment Analysis dataset for NLP, text classification, and machine learning applications. The dataset includes 50,000+ labeled text samples across 3 sentiment classes: Positive, Neutral, and Negative. The samples come from product reviews, social media posts, and customer feedback, and are pre-split into Training (80%) and Test (20%) sets. The data is available in CSV, JSON, and TXT formats. Five Python scripts are included: generate_data.py for unlimited synthetic data generation, preprocess_data.py for text cleaning with lemmatization and stemming, analyze_sentiment.py for multi-method sentiment analysis (Lexicon, VADER, TextBlob), visualize_data.py for charts and word clouds, and train_model.py for ML model training (Naive Bayes, SVM, Logistic Regression, Random Forest). The dataset is compatible with NLTK, spaCy, TextBlob, VADER, and scikit-learn. The package also includes an interactive demo website with Chart.js analytics, a README.md, a RELEASE_NOTES.md version history, and an MIT License. It is intended for data scientists, researchers, students, and developers working on sentiment classification, brand monitoring, customer review analysis, and NLP education projects.


Dataset Overview

Complete sentiment analysis dataset with 50,000+ labeled text samples across 3 sentiment classes for NLP and machine learning.

  • 50,000+ labeled text samples
  • 3 sentiment classes
  • Positive, Neutral, Negative
  • Product reviews included
  • Social media posts included
  • Customer feedback samples
  • Pre-split: Train (80%), Test (20%)
  • Average text length: 142 characters
  • Roughly balanced class distribution (about 30-37% per class)
  • Clean, high-quality labels
  • Perfect for NLP & ML training

Dataset Structure & Files

Well-organized folder structure with training and test splits plus comprehensive preprocessed data files.

  • data/sentiment_data.csv - Main dataset
  • data/sentiment_data.json - JSON format
  • data/sentiment_data.txt - Plain text format
  • data/train_data.csv - Training set (80%)
  • data/test_data.csv - Test set (20%)
  • preprocessed/cleaned_data.csv - Cleaned text
  • preprocessed/tokenized_data.json - Tokenized data
  • scripts/ - Python utilities
  • Consistent naming convention
  • Easy to load with pandas/NLTK
  • Ready for immediate use
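
The page does not say how the 80/20 split was produced. As a minimal sketch, assuming each record is a dict with a `label` field (a hypothetical column name, not confirmed by the dataset), a stratified split that keeps every class at the same 80/20 ratio could look like this:

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", test_ratio=0.2, seed=42):
    """Split rows into train/test so each label keeps roughly the same ratio."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows standing in for the real data (10 samples per class).
rows = [
    {"text": f"sample {i}", "label": label}
    for label in ("positive", "neutral", "negative")
    for i in range(10)
]
train_rows, test_rows = stratified_split(rows)
print(len(train_rows), len(test_rows))  # 24 6
```

Applied to 50,000 samples, the same ratio gives about 40,000 training and 10,000 test rows.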

Machine Learning Training

Complete training pipeline with support for 4 ML algorithms and cross-validation for reliable evaluation.

  • Naive Bayes classifier
  • Support Vector Machine (SVM)
  • Logistic Regression
  • Random Forest classifier
  • TF-IDF vectorization
  • Count vectorization
  • Cross-validation support
  • Model checkpointing
  • Performance metrics report
  • Best model auto-selection
  • Model export & persistence
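
The internals of train_model.py are not shown on this page. The sketch below is only one way the listed pieces (TF-IDF vectorization, the four classifiers, cross-validation, and best-model selection) could fit together with scikit-learn. The toy texts, the hyperparameters, and the selection rule (highest mean cross-validated accuracy) are all assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data standing in for the real training set (4 samples per class).
texts = [
    "great product love it", "excellent quality very happy",
    "amazing service great value", "love the fast delivery",
    "terrible product broke fast", "very bad quality disappointed",
    "awful service never again", "hate the slow delivery",
    "package arrived on monday", "the box contains a manual",
    "order number was confirmed", "item ships in a box",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

candidates = {
    "naive_bayes": MultinomialNB(),
    "svm": LinearSVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score each classifier behind the same TF-IDF step with 3-fold cross-validation.
scores = {}
for name, clf in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(pipeline, texts, labels, cv=3).mean()

# Assumed selection rule: keep the model with the best mean accuracy.
best_name = max(scores, key=scores.get)
best_model = make_pipeline(TfidfVectorizer(), candidates[best_name])
best_model.fit(texts, labels)
print(best_name, best_model.predict(["love this great product"]))
```

Swapping `TfidfVectorizer` for `CountVectorizer` gives the count-based variant, and a fitted pipeline can be saved with `joblib.dump` for later reuse.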

Multiple File Formats

Dataset available in multiple formats for maximum compatibility with different NLP tools and frameworks.

  • CSV format (.csv files)
  • JSON format with metadata
  • Plain text format (.txt)
  • Pandas DataFrame ready
  • NLTK corpus compatible
  • spaCy pipeline ready
  • Easy format conversion
  • Unicode text support
  • UTF-8 encoding
  • Comment lines for metadata
  • Header row included
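
The exact column names and comment syntax are not documented here. As a sketch, assuming metadata lines start with `#` and the header names the columns `text`, `sentiment`, and `source` (all three names are hypothetical), a UTF-8 CSV with comment lines can be read and converted to JSON using only the standard library:

```python
import csv
import io
import json

# In-memory stand-in for a file such as data/sentiment_data.csv.
raw_csv = """# dataset: sentiment analysis (hypothetical metadata line)
# encoding: utf-8
text,sentiment,source
"Great phone, battery lasts two days",positive,product_review
Café visit was okay,neutral,customer_feedback
"""


def read_commented_csv(handle):
    """Read a CSV with a header row, skipping lines that start with '#'."""
    # Note: this simple filter would also drop a data line that begins with '#'.
    data_lines = (line for line in handle if not line.lstrip().startswith("#"))
    return list(csv.DictReader(data_lines))


records = read_commented_csv(io.StringIO(raw_csv))

# ensure_ascii=False keeps non-ASCII characters readable in the JSON output.
as_json = json.dumps(records, ensure_ascii=False, indent=2)
print(as_json)
```

For a file on disk, `open(path, encoding="utf-8", newline="")` can be passed in place of the `io.StringIO` object.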

Analysis & Visualization

Comprehensive analysis tools with multiple sentiment analysis methods and visualization capabilities.

  • Lexicon-based analysis
  • VADER sentiment analyzer
  • TextBlob integration
  • Ensemble analysis method
  • Sentiment distribution charts
  • Word frequency analysis
  • Word cloud generation
  • Text length histogram
  • Source distribution pie chart
  • HTML report generation
  • Export visualization images
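
How analyze_sentiment.py implements its lexicon and ensemble methods is not described on this page. The sketch below illustrates the general idea under stated assumptions: a tiny hand-made word list (hypothetical, not the dataset's lexicon) and a majority vote that falls back to neutral on ties. In practice the other votes would come from VADER and TextBlob; here they are passed in as plain strings.

```python
import re
from collections import Counter

# Tiny illustrative lexicon; a real one would be much larger.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "broken"}


def lexicon_label(text):
    """Label text by counting positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def ensemble_label(votes):
    """Majority vote over method outputs; ties fall back to 'neutral'."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


print(lexicon_label("I love this great phone"))            # positive
print(lexicon_label("Terrible, it arrived broken"))        # negative
print(ensemble_label(["positive", "positive", "neutral"]))  # positive
print(ensemble_label(["positive", "negative", "neutral"]))  # neutral
```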

Compatible Frameworks

Works with the major Python NLP and machine learning libraries listed below.

  • NLTK (Natural Language Toolkit)
  • spaCy industrial NLP
  • TextBlob sentiment analysis
  • VADER sentiment analyzer
  • scikit-learn ML library
  • pandas data manipulation
  • matplotlib visualization
  • wordcloud generation
  • TensorFlow/Keras ready
  • PyTorch compatible
  • Jupyter Notebook support

What You Get

Complete package with all files needed for professional sentiment analysis projects.

  • 50,000+ labeled text samples
  • 5 Python utility scripts
  • generate_data.py - Data generator
  • preprocess_data.py - Text preprocessor
  • analyze_sentiment.py - Sentiment analyzer
  • visualize_data.py - Data visualizer
  • train_model.py - Model trainer
  • Interactive demo website
  • Comprehensive README.md
  • RELEASE_NOTES.md version history
  • MIT License included

Interactive Demo Website

Beautiful demo website with dataset explorer, filtering, analytics dashboard, and step-by-step guide.

  • Modern animated design
  • Dataset preview explorer
  • Filter by sentiment class
  • Real-time sample browsing
  • Sentiment card visualization
  • Chart.js analytics dashboard
  • Dataset statistics display
  • Step-by-step usage guide
  • Python scripts documentation
  • Dark theme with gradients
  • Fully responsive layout

Python Scripts Included

Professional Python scripts for data generation, preprocessing, analysis, visualization, and model training.

  • generate_data.py - Unlimited data generation
  • preprocess_data.py - Text cleaning & tokenization
  • analyze_sentiment.py - Multi-method analysis
  • visualize_data.py - Charts & word clouds
  • train_model.py - ML model training
  • requirements.txt - Dependencies list
  • Command-line interface
  • Interactive mode support
  • Well-documented code
  • Type hints included
  • Easy to customize
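
The code of preprocess_data.py is not reproduced here. As a rough sketch of the text cleaning and tokenization step it is described as performing, assuming that URLs, @mentions, and punctuation should be removed (the cleaning rules are an assumption), a standard-library version could look like this. Lemmatization and stemming are left out; those would typically be delegated to NLTK or spaCy.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_RE = re.compile(r"[^\w\s']")  # keeps letters (incl. Unicode), digits, apostrophes


def clean_text(text):
    """Lowercase, drop URLs, @mentions and punctuation, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into whitespace-separated tokens."""
    return clean_text(text).split()


print(tokenize("Loved it!!! Thanks @shop_support https://example.com/order"))
# ['loved', 'it', 'thanks']
```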

Sentiment Classes

3 distinct sentiment classes covering the full spectrum of text sentiment for comprehensive classification.

  • Positive - Happy, satisfied, enthusiastic (18,500+ samples)
  • Neutral - Objective, balanced, informational (15,200+ samples)
  • Negative - Unhappy, disappointed, critical (16,300+ samples)
  • Product review sentiments
  • Social media post sentiments
  • Customer feedback sentiments
  • Clear labeling criteria
  • Human-verified labels
  • Balanced distribution option
  • Easy to extend classes
  • Total: 50,000+ samples
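
The class counts above are lower bounds ("18,500+" and so on), so the shares computed below are approximate. The `downsample` helper is a hypothetical illustration of what a balanced distribution option could do. It is not the package's own code, and the `label` field name is an assumption.

```python
import random
from collections import Counter

# Class counts as listed above (lower bounds).
class_counts = {"positive": 18_500, "neutral": 15_200, "negative": 16_300}
total = sum(class_counts.values())
shares = {label: round(100 * n / total, 1) for label, n in class_counts.items()}
print(total, shares)
# 50000 {'positive': 37.0, 'neutral': 30.4, 'negative': 32.6}


def downsample(rows, label_key="label", seed=0):
    """Randomly trim every class down to the size of the smallest class."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced


# Toy rows with an uneven class mix: 5 positive, 3 neutral, 4 negative.
rows = (
    [{"text": f"p{i}", "label": "positive"} for i in range(5)]
    + [{"text": f"n{i}", "label": "neutral"} for i in range(3)]
    + [{"text": f"x{i}", "label": "negative"} for i in range(4)]
)
balanced = downsample(rows)
print(Counter(row["label"] for row in balanced))
```

With the counts listed above, downsampling to the smallest class (Neutral, 15,200) would leave 3 × 15,200 = 45,600 samples.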

Credits & Acknowledgments

This dataset is provided for educational and research purposes. Core technologies and libraries are credited below.

  • Python 3.8+ - Programming language (PSF License)
  • NLTK - Natural Language Toolkit (Apache 2.0)
  • spaCy - Industrial NLP (MIT License)
  • TextBlob - Sentiment Analysis (MIT License)
  • VADER - Sentiment Analysis (MIT License)
  • scikit-learn - Machine Learning (BSD License)
  • matplotlib - Data Visualization (Matplotlib License, PSF-based)
  • RSK World - Dataset creator and provider
  • GitHub Repository - Source code and releases
  • Author: Molla Samser | Designer: Rima Khatun
  • MIT License - Free for learning & research

Support & Contact

For commercial use, custom datasets, or integration help, please contact us.

  • Email: help@rskworld.in
  • Phone: +91 93305 39277
  • Website: RSKWORLD.in
  • Location: Nutanhat, Mongolkote, West Bengal, India
  • Author: Molla Samser
  • Designer & Tester: Rima Khatun
  • GitHub: Coming Soon
  • Sentiment Analysis Dataset Documentation
  • Technical Support Available
  • Custom Dataset Requests Welcome

© 2026 RSK World. All rights reserved.