README.md

# NLP Text Analysis Bot

- **Developer**: RSK World
- **Website**: https://rskworld.in
- **Email**: help@rskworld.in
- **Phone**: +91 93305 39277
- **Year**: 2026

## Overview

An NLP chatbot for text analysis. It combines text preprocessing, sentiment detection, named entity recognition, and semantic understanding in a single pipeline, exposed through a Flask web interface and a JSON API.

## Features

### Core Features
- **Text Preprocessing**: Cleaning, tokenization, stopword removal, and lemmatization
- **Sentiment Analysis**: Multi-method sentiment detection using VADER and transformer models
- **Entity Recognition**: Named entity extraction using spaCy
- **Semantic Understanding**: Keyword extraction, topic identification, and phrase analysis
- **NLP Pipeline**: Complete end-to-end text analysis workflow
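
The multi-method sentiment step can be sketched as follows. This is a minimal illustration of the logic in `sentiment_analysis.py`, not the module itself: VADER's compound score is mapped to a label with a ±0.05 threshold, and a transformer verdict replaces it when one is available. The function names and the `source` key are hypothetical, added here only to make the result easier to read.

```python
from typing import Optional


def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def combine(vader_compound: float, transformer: Optional[dict] = None) -> dict:
    """Start from the VADER verdict; prefer the transformer verdict if present."""
    result = {
        "label": vader_label(vader_compound),
        "score": vader_compound,
        "source": "vader",  # hypothetical key, not in the original module
    }
    if transformer:
        # The transformer score is a confidence in [0, 1], so the meaning
        # of "score" changes when this branch runs.
        result.update(
            label=transformer["label"],
            score=transformer["score"],
            source="transformer",
        )
    return result


print(combine(0.62))
print(combine(0.62, {"label": "neutral", "score": 0.71}))
```

Because the two scores live on different scales, callers that compare scores across texts may want to check which method produced them.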

### Advanced Features
- **Text Summarization**: Extractive and abstractive summarization using transformer models
- **Language Detection**: Automatic language detection with confidence scores
- **Text Classification**: Zero-shot text classification into multiple categories
- **Emotion Detection**: Advanced emotion analysis beyond basic sentiment (joy, sadness, anger, fear, etc.)
- **Readability Analysis**: Multiple readability metrics (Flesch, SMOG, Coleman-Liau, etc.)
- **Advanced Keyword Extraction**: TF-IDF based keyword extraction with n-grams
- **Part-of-Speech Tagging**: Complete POS analysis with distribution statistics
- **Text Similarity**: Calculate similarity between texts using multiple methods
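
As a rough illustration of the text similarity feature, the sketch below computes cosine similarity over raw term counts using only the standard library. It is a simplified stand-in: the project lists scikit-learn for TF-IDF and similarity, and this example applies no IDF weighting. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count word-like tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(text1: str, text2: str) -> float:
    """Cosine of the angle between the two term-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text1), term_counts(text2)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    # An empty text has a zero-length vector; report no similarity.
    return dot / norm if norm else 0.0


print(cosine_similarity("the cat sat", "the dog sat"))
```

Texts that share no tokens score 0.0 and identical texts score approximately 1.0. TF-IDF weighting would additionally down-weight words that are common across documents.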

## Technologies

- **NLTK**: Natural Language Toolkit for text processing
- **spaCy**: Advanced NLP library for entity recognition and semantic analysis
- **Python**: Core programming language
- **Transformers**: Hugging Face transformers for advanced NLP tasks
- **Flask**: Web framework for API and interface
- **scikit-learn**: Machine learning library for TF-IDF and similarity calculations
- **langdetect**: Language detection library
- **textstat**: Readability analysis library

## Installation

1. **Clone or download the project**

2. **Install Python dependencies:**
```bash
pip install -r requirements.txt
```

3. **Download spaCy English model:**
```bash
python -m spacy download en_core_web_sm
```

4. **Download NLTK data (optional: missing data is fetched automatically, but you can pre-download it):**
```python
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('vader_lexicon')
nltk.download('averaged_perceptron_tagger')
```
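
To avoid repeated downloads, a script can check for each resource before fetching it. The sketch below is one way to do that. The `ensure_resources` helper and the `RESOURCES` mapping are hypothetical, and the lookup paths reflect my understanding of NLTK's data layout (for example, the VADER lexicon under `sentiment/vader_lexicon.zip`), so verify them against your NLTK version.

```python
from typing import Callable, Dict, List, Optional

# Package name -> path passed to nltk.data.find(). Paths are assumptions
# about NLTK's data directory layout.
RESOURCES: Dict[str, str] = {
    "punkt": "tokenizers/punkt",
    "stopwords": "corpora/stopwords",
    "vader_lexicon": "sentiment/vader_lexicon.zip",
}


def ensure_resources(
    resources: Dict[str, str],
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Download only the resources that are missing; return their names."""
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        find = find or nltk.data.find
        download = download or (lambda name: nltk.download(name, quiet=True))

    fetched = []
    for name, path in resources.items():
        try:
            find(path)  # raises LookupError when the resource is absent
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched
```

With NLTK installed, `ensure_resources(RESOURCES)` returns the names it had to download, or an empty list when everything is already present.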

## Usage

### Running the Web Application

1. **Start the Flask server:**
```bash
python app.py
```

2. **Open your browser and navigate to:**
```
http://localhost:5000
```

3. **Enter text in the input field and click "Analyze Text"**

### Using the API

**Endpoint:** `POST /api/analyze`

**Request:**
```json
{
  "text": "Your text to analyze here"
}
```

**Response:**
```json
{
  "original_text": "...",
  "preprocessing": {...},
  "sentiment": {...},
  "entities": {...},
  "semantic": {...},
  "summary": {...}
}
```
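
The request above can be sent from Python with the standard library alone. This is a minimal client sketch, assuming the server runs at `http://localhost:5000` as in the previous section and accepts a JSON body with a `Content-Type: application/json` header (the header is an assumption). The function names are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(
    text: str, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/analyze."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(
    text: str, base_url: str = "http://localhost:5000", timeout: float = 60.0
) -> dict:
    """Send the request and decode the JSON response."""
    request = build_analyze_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the Flask server running, `analyze("Your text to analyze here")["sentiment"]` should return the sentiment portion of the response. The generous timeout allows for transformer models loading on the first request.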

### Using as a Python Module

```python
from nlp_pipeline import NLPPipeline

# Initialize pipeline
pipeline = NLPPipeline()

# Analyze text
results = pipeline.analyze("Your text here")

# Access results
print(results['sentiment'])
print(results['entities'])
print(results['semantic'])
```
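
The sentiment entry can be condensed into a single line for logging. The sketch below assumes `results['sentiment']` has the shape returned by `SentimentAnalyzer.analyze` in `sentiment_analysis.py` (top-level `label` and `score`, a `vader` dict, and an optional `transformer` dict). The helper name is hypothetical and the sample values are made up for illustration, not real model output.

```python
def describe_sentiment(sentiment: dict) -> str:
    """Summarize a sentiment result dict in one line."""
    # The top-level label and score come from the transformer when present.
    source = "transformer" if "transformer" in sentiment else "vader"
    line = f"{sentiment['label']} ({source} score {sentiment['score']:.2f})"
    vader = sentiment.get("vader")
    if vader:
        line += f", VADER compound {vader['compound']:+.2f}"
    return line


# Illustrative values only.
sample = {
    "label": "positive",
    "score": 0.93,
    "vader": {"label": "positive", "compound": 0.64,
              "positive": 0.5, "neutral": 0.5, "negative": 0.0},
    "transformer": {"label": "positive", "score": 0.93},
}
print(describe_sentiment(sample))
```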

## Project Structure

```
nlp-text-analysis-bot/
├── app.py                      # Flask web application
├── nlp_pipeline.py             # Main NLP pipeline orchestrator
├── text_preprocessing.py       # Text cleaning and preprocessing
├── sentiment_analysis.py       # Sentiment analysis module
├── entity_recognition.py       # Named entity recognition
├── semantic_understanding.py   # Semantic analysis module
├── templates/
│   └── index.html              # Web interface
├── requirements.txt            # Python dependencies
└── README.md                   # This file
```

## API Endpoints

- `GET /` - Web interface
- `POST /api/analyze` - Complete text analysis endpoint (all features)
- `POST /api/similarity` - Text similarity comparison endpoint
- `GET /api/health` - Health check endpoint

### API Usage Examples

**Text Analysis** (`POST /api/analyze`):
```json
{
  "text": "Your text to analyze here"
}
```

**Text Similarity** (`POST /api/similarity`):
```json
{
  "text1": "First text",
  "text2": "Second text"
}
```

## Example Analysis Output

The comprehensive analysis provides:
- **Text Statistics**: Word count, sentence count, vocabulary richness
- **Language Detection**: Detected language with confidence scores
- **Sentiment Scores**: Overall sentiment with detailed breakdown
- **Emotion Detection**: Primary emotion and emotion distribution
- **Named Entities**: People, organizations, locations, etc.
- **Text Classification**: Category classification with confidence
- **Text Summarization**: Extractive or abstractive summary
- **Readability Metrics**: Multiple readability scores and grade levels
- **Advanced Keywords**: TF-IDF keywords, bigrams, and trigrams
- **POS Analysis**: Part-of-speech distribution and statistics
- **Keywords & Topics**: Main themes and important terms
- **Preprocessing Details**: Cleaned text and tokenization results
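
The text statistics item can be approximated with the standard library. This sketch assumes "vocabulary richness" means the type-token ratio (unique words divided by total words). The project may define it differently, and its tokenization likely uses NLTK rather than the regular expressions used here. The function name and output keys are hypothetical.

```python
import re


def text_statistics(text: str) -> dict:
    """Compute word count, sentence count, and type-token ratio."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Naive sentence split on terminal punctuation; abbreviations will
    # be over-split.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "vocabulary_richness": len(set(words)) / len(words) if words else 0.0,
    }


print(text_statistics("The cat sat. The cat ran!"))
```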

## Requirements

- Python 3.8+
- 4GB+ RAM recommended (for transformer models)
- Internet connection (for downloading models on first run)

## License

This project is provided by RSK World for educational and development purposes.

## Support

For support, contact:
- **Website**: https://rskworld.in
- **Email**: help@rskworld.in
- **Phone**: +91 93305 39277

---

**Developed by RSK World - 2026**

sentiment_analysis.py

"""
Sentiment Analysis Module
Analyzes sentiment using NLTK and transformers

Developer: RSK World
Website: https://rskworld.in
Email: help@rskworld.in
Phone: +91 93305 39277
Year: 2026
"""

from transformers import pipeline
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk

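# Download the VADER lexicon only if it is not already installed.
# nltk.data.find() needs the full resource path, not just the package name.
try:
    nltk.data.find('sentiment/vader_lexicon.zip')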
except LookupError:
    nltk.download('vader_lexicon', quiet=True)

class SentimentAnalyzer:
    """
    Sentiment analysis class using multiple methods
    Developer: RSK World - https://rskworld.in
    """
    
    def __init__(self):
        """Initialize sentiment analyzers"""
        # VADER sentiment analyzer (NLTK)
        self.vader_analyzer = SentimentIntensityAnalyzer()
        
        # Transformer-based sentiment analyzer
        try:
            self.transformer_analyzer = pipeline(
                "sentiment-analysis",
                model="cardiffnlp/twitter-roberta-base-sentiment-latest",
                device=-1  # Use CPU
            )
        except Exception as e:
            print(f"Warning: Could not load transformer model: {e}")
            self.transformer_analyzer = None
    
    def analyze_vader(self, text):
        """
        Analyze sentiment using VADER
        
        Args:
            text (str): Input text
            
        Returns:
            dict: VADER sentiment scores
        """
        scores = self.vader_analyzer.polarity_scores(text)
        
        # Determine label
        if scores['compound'] >= 0.05:
            label = 'positive'
        elif scores['compound'] <= -0.05:
            label = 'negative'
        else:
            label = 'neutral'
        
        return {
            'label': label,
            'compound': scores['compound'],
            'positive': scores['pos'],
            'neutral': scores['neu'],
            'negative': scores['neg']
        }
    
    def analyze_transformer(self, text):
        """
        Analyze sentiment using transformer model
        
        Args:
            text (str): Input text
            
        Returns:
            dict: Transformer sentiment results or None
        """
        if self.transformer_analyzer is None:
            return None
        
        try:
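            # Truncate by characters as a rough guard against over-long input.
            # Note: the model's limit is 512 tokens, not 512 characters.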
            max_length = 512
            if len(text) > max_length:
                text = text[:max_length]
            
            result = self.transformer_analyzer(text)[0]
            
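            # Map generic LABEL_n ids to names. Labels that are already
            # named skip this mapping and are simply lowercased below.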
            label_mapping = {
                'LABEL_0': 'negative',
                'LABEL_1': 'neutral',
                'LABEL_2': 'positive'
            }
            
            label = label_mapping.get(result['label'], result['label'].lower())
            
            return {
                'label': label,
                'score': result['score']
            }
        except Exception as e:
            print(f"Error in transformer analysis: {e}")
            return None
    
    def analyze(self, text):
        """
        Complete sentiment analysis using multiple methods
        
        Args:
            text (str): Input text
            
        Returns:
            dict: Combined sentiment analysis results
        """
        vader_result = self.analyze_vader(text)
        transformer_result = self.analyze_transformer(text)
        
        # Combine results
        result = {
            'vader': vader_result,
            'label': vader_result['label'],
            'score': vader_result['compound']
        }
        
        if transformer_result:
            result['transformer'] = transformer_result
            # Prefer the transformer verdict when available. 'score' then
            # holds the model's confidence in [0, 1], not VADER's compound
            # score in [-1, 1].
            result['label'] = transformer_result['label']
            result['score'] = transformer_result['score']
        
        return result
