# NLP Text Analysis Bot

- **Developer**: RSK World
- **Website**: https://rskworld.in
- **Email**: help@rskworld.in
- **Phone**: +91 93305 39277
- **Year**: 2026

## Overview

NLP Text Analysis Bot is a Natural Language Processing (NLP) chatbot that analyzes text through preprocessing, sentiment detection, entity recognition, and semantic understanding.

## Features

### Core Features
- **Text Preprocessing**: Cleaning, tokenization, stopword removal, and lemmatization
- **Sentiment Analysis**: Multi-method sentiment detection using VADER and transformer models
- **Entity Recognition**: Named entity extraction using spaCy
- **Semantic Understanding**: Keyword extraction, topic identification, and phrase analysis
- **NLP Pipeline**: Complete end-to-end text analysis workflow

### Advanced Features
- **Text Summarization**: Extractive and abstractive summarization using transformer models
- **Language Detection**: Automatic language detection with confidence scores
- **Text Classification**: Zero-shot text classification into multiple categories
- **Emotion Detection**: Advanced emotion analysis beyond basic sentiment (joy, sadness, anger, fear, etc.)
- **Readability Analysis**: Multiple readability metrics (Flesch, SMOG, Coleman-Liau, etc.)
- **Advanced Keyword Extraction**: TF-IDF based keyword extraction with n-grams
- **Part-of-Speech Tagging**: Complete POS analysis with distribution statistics
- **Text Similarity**: Calculate similarity between texts using multiple methods
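
To make the text similarity feature above more concrete, here is a minimal sketch of one common method: cosine similarity over raw term counts. It uses only the standard library and is not taken from this project's code. The function names `tokenize` and `cosine_similarity` are hypothetical, and the project itself may use TF-IDF weighting via scikit-learn or other methods.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text1, text2):
    """Cosine similarity between the term-count vectors of two texts (0.0 to 1.0)."""
    counts1, counts2 = Counter(tokenize(text1)), Counter(tokenize(text2))
    # Dot product over the words that appear in both texts.
    dot = sum(counts1[w] * counts2[w] for w in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # at least one text has no tokens
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    print(cosine_similarity("The cat sat on the mat", "The cat lay on the mat"))
```

Identical texts score 1.0 and texts with no words in common score 0.0. Raw counts treat every word as equally important, which is the main reason TF-IDF weighting is often preferred in practice.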

## Technologies

- **NLTK**: Natural Language Toolkit for text processing
- **spaCy**: Advanced NLP library for entity recognition and semantic analysis
- **Python**: Core programming language
- **Transformers**: Hugging Face transformers for advanced NLP tasks
- **Flask**: Web framework for API and interface
- **scikit-learn**: Machine learning library for TF-IDF and similarity calculations
- **langdetect**: Language detection library
- **textstat**: Readability analysis library

## Installation

1. **Clone or download the project**

2. **Install Python dependencies:**
```bash
pip install -r requirements.txt
```

3. **Download spaCy English model:**
```bash
python -m spacy download en_core_web_sm
```

4. **Download NLTK data (handled automatically; run the following only to download it manually):**
```python
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('vader_lexicon')
nltk.download('averaged_perceptron_tagger')
```

## Usage

### Running the Web Application

1. **Start the Flask server:**
```bash
python app.py
```

2. **Open your browser and navigate to:**
```
http://localhost:5000
```

3. **Enter text in the input field and click "Analyze Text"**

### Using the API

**Endpoint:** `POST /api/analyze`

**Request:**
```json
{
  "text": "Your text to analyze here"
}
```

**Response:**
```json
{
  "original_text": "...",
  "preprocessing": {...},
  "sentiment": {...},
  "entities": {...},
  "semantic": {...},
  "summary": {...}
}
```
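
The request above can also be sent from Python. The following is a minimal client sketch using only the standard library. It assumes the server is running at `http://localhost:5000` and accepts a JSON body with a `Content-Type: application/json` header. The helper names `build_request` and `post_json` are hypothetical and not part of this project.

```python
import json
import urllib.request

# Assumption: the default address from "Running the Web Application".
BASE_URL = "http://localhost:5000"


def build_request(path, payload, base_url=BASE_URL):
    """Build a JSON POST request for an API path such as /api/analyze."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=BASE_URL, timeout=60):
    """Send the request and decode the JSON response body into a dict."""
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the Flask server to be running):
# result = post_json("/api/analyze", {"text": "Your text to analyze here"})
# print(result["sentiment"])
```

The same helper should work for `POST /api/similarity` by passing a payload with `text1` and `text2`. The timeout is set generously because transformer models can be slow on the first request.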

### Using as a Python Module

```python
from nlp_pipeline import NLPPipeline

# Initialize pipeline
pipeline = NLPPipeline()

# Analyze text
results = pipeline.analyze("Your text here")

# Access results
print(results['sentiment'])
print(results['entities'])
print(results['semantic'])
```

## Project Structure

```
nlp-text-analysis-bot/
├── app.py                      # Flask web application
├── nlp_pipeline.py             # Main NLP pipeline orchestrator
├── text_preprocessing.py       # Text cleaning and preprocessing
├── sentiment_analysis.py       # Sentiment analysis module
├── entity_recognition.py       # Named entity recognition
├── semantic_understanding.py   # Semantic analysis module
├── templates/
│   └── index.html              # Web interface
├── requirements.txt            # Python dependencies
└── README.md                   # This file
```

## API Endpoints

- `GET /` - Web interface
- `POST /api/analyze` - Complete text analysis endpoint (all features)
- `POST /api/similarity` - Text similarity comparison endpoint
- `GET /api/health` - Health check endpoint

### API Usage Examples

**Text Analysis:** `POST /api/analyze`
```json
{
  "text": "Your text to analyze here"
}
```

**Text Similarity:** `POST /api/similarity`
```json
{
  "text1": "First text",
  "text2": "Second text"
}
```

## Example Analysis Output

The comprehensive analysis provides:
- **Text Statistics**: Word count, sentence count, vocabulary richness
- **Language Detection**: Detected language with confidence scores
- **Sentiment Scores**: Overall sentiment with detailed breakdown
- **Emotion Detection**: Primary emotion and emotion distribution
- **Named Entities**: People, organizations, locations, etc.
- **Text Classification**: Category classification with confidence
- **Text Summarization**: Extractive or abstractive summary
- **Readability Metrics**: Multiple readability scores and grade levels
- **Advanced Keywords**: TF-IDF keywords, bigrams, and trigrams
- **POS Analysis**: Part-of-speech distribution and statistics
- **Keywords & Topics**: Main themes and important terms
- **Preprocessing Details**: Cleaned text and tokenization results
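
As an illustration of the text statistics listed above, the sketch below computes word count, sentence count, and vocabulary richness with the standard library. It is not this project's implementation. The function name `text_statistics`, the dictionary keys, the naive punctuation-based sentence split, and the definition of vocabulary richness as a type-token ratio (unique words divided by total words) are all assumptions.

```python
import re


def text_statistics(text):
    """Return word count, sentence count, and type-token ratio for a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Naive sentence split: a run of '.', '!' or '?' ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        # Type-token ratio: unique words divided by total words.
        "vocabulary_richness": len(set(words)) / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    print(text_statistics("The cat sat. The dog ran!"))
```

A tokenizer-based approach such as NLTK's sentence and word tokenizers handles abbreviations and punctuation more reliably than these regular expressions.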

## Requirements

- Python 3.8+
- 4 GB+ RAM recommended (for transformer models)
- Internet connection (for downloading models on first run)

## License

This project is provided by RSK World for educational and development purposes.

## Support

For support, contact:
- **Website**: https://rskworld.in
- **Email**: help@rskworld.in
- **Phone**: +91 93305 39277

---

**Developed by RSK World - 2026**
