NLP Text Analysis Bot - Python + NLP + Flask + Machine Learning + Text Analysis + AI
advanced_keywords.py
"""
Advanced Keyword Extraction Module
Extracts keywords with TF-IDF scoring and key phrases with n-gram frequency counts

Developer: RSK World
Website: https://rskworld.in
Email: help@rskworld.in
Phone: +91 93305 39277
Year: 2026
"""

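import logging
from collections import Counter

from sklearn.base import clone
from sklearn.feature_extraction.text import TfidfVectorizer

from text_preprocessing import TextPreprocessor

logger = logging.getLogger(__name__)


class AdvancedKeywordExtractor:
    """
    Advanced keyword extraction using TF-IDF
    Developer: RSK World - https://rskworld.in
    """

    def __init__(self):
        """Initialize the preprocessor and the TF-IDF vectorizer template"""
        self.preprocessor = TextPreprocessor()
        # Used as a template only: each call fits a clone, so an extractor
        # shared across threads (e.g. by a Flask app) never has its
        # vocabulary refitted in the middle of another extraction.
        self.tfidf_vectorizer = TfidfVectorizer(
            max_features=100,
            stop_words='english',
            ngram_range=(1, 2),  # Unigrams and bigrams
            min_df=1
        )

    def extract_tfidf_keywords(self, text, top_n=10):
        """
        Extract keywords using TF-IDF

        Note: the vectorizer is fitted on this single text, so every term has
        the same document frequency and the IDF factor is constant. The scores
        are therefore L2-normalized term frequencies; fit on a larger corpus
        if real IDF weighting is needed.

        Args:
            text (str): Input text
            top_n (int): Number of top keywords to return

        Returns:
            list: Dicts with 'keyword' and 'score', highest score first
        """
        try:
            # Preprocess text
            preprocessed = self.preprocessor.preprocess(text)
            cleaned_text = preprocessed['cleaned_text']

            if not cleaned_text or len(cleaned_text.split()) < 2:
                return []

            # Fit a fresh copy of the configured vectorizer for this call
            vectorizer = clone(self.tfidf_vectorizer)
            tfidf_matrix = vectorizer.fit_transform([cleaned_text])
            feature_names = vectorizer.get_feature_names_out()

            # One document in, so the matrix has a single row of scores
            scores = tfidf_matrix.toarray()[0]

            # Pair each term with its score and sort, highest first
            keywords = [
                {'keyword': term, 'score': float(score)}
                for term, score in zip(feature_names, scores)
            ]
            keywords.sort(key=lambda kw: kw['score'], reverse=True)

            return keywords[:top_n]
        except Exception:
            # e.g. ValueError ("empty vocabulary") when only stop words remain
            logger.exception("Error in TF-IDF extraction")
            return []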
    
    def extract_ngrams(self, text, n=2, top_n=10):
        """
        Extract n-grams from text
        
        Args:
            text (str): Input text
            n (int): N-gram size (1=unigrams, 2=bigrams, etc.)
            top_n (int): Number of top n-grams to return
            
        Returns:
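            list: Dicts with 'ngram' and 'frequency', most frequent first
        """
        if n < 1:
            raise ValueError("n must be a positive integer")

        preprocessed = self.preprocessor.preprocess(text)
        tokens = preprocessed['filtered_tokens']

        if len(tokens) < n:
            return []

        # Slide a window of n tokens over the filtered token list. If stop
        # words were removed during preprocessing, a window can join words
        # that were not adjacent in the original text.
        ngrams = [
            ' '.join(tokens[i:i + n])
            for i in range(len(tokens) - n + 1)
        ]

        # Count frequencies and keep the most common
        ngram_freq = Counter(ngrams)

        return [
            {'ngram': ngram, 'frequency': count}
            for ngram, count in ngram_freq.most_common(top_n)
        ]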
    
    def extract_key_phrases(self, text, top_n=10):
        """
        Extract key phrases using multiple methods
        
        Args:
            text (str): Input text
            top_n (int): Number of phrases to return
            
        Returns:
            dict: Key phrases from different methods
        """
        tfidf_keywords = self.extract_tfidf_keywords(text, top_n)
        bigrams = self.extract_ngrams(text, n=2, top_n=top_n)
        trigrams = self.extract_ngrams(text, n=3, top_n=top_n)
        
        return {
            'tfidf_keywords': tfidf_keywords,
            'bigrams': bigrams,
            'trigrams': trigrams,
            'top_keywords': [kw['keyword'] for kw in tfidf_keywords[:5]]
        }
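
Because `extract_tfidf_keywords` fits the vectorizer on a single text, every term's document frequency equals the corpus size (one). scikit-learn's default smoothed IDF, ln((1 + n) / (1 + df)) + 1, is then exactly 1 for every term, so the "TF-IDF" scores are just L2-normalized term counts. The pure-Python sketch below reproduces that default formula (raw counts, smoothed IDF, L2 norm) so the effect of adding a background corpus is visible. The helper names `smoothed_idf` and `tfidf_keywords`, the sample texts and the background corpus are illustrative assumptions, not part of this module; with scikit-learn itself, the equivalent would be to fit the vectorizer on a reference corpus and call `transform` on the new text.

```python
import math
from collections import Counter


def smoothed_idf(term, corpus):
    """Smoothed IDF as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = sum(1 for doc in corpus if term in doc)
    return math.log((1 + n_docs) / (1 + df)) + 1


def tfidf_keywords(doc_tokens, corpus_tokens, top_n=5):
    """Score one document's terms against a corpus (hypothetical helper).

    doc_tokens: token list for the text being analysed.
    corpus_tokens: list of token lists; it should include the document itself.
    Returns (term, score) pairs, highest first, L2-normalized like
    TfidfVectorizer's default norm='l2'.
    """
    corpus_sets = [set(tokens) for tokens in corpus_tokens]
    counts = Counter(doc_tokens)  # raw term frequency
    raw = {t: c * smoothed_idf(t, corpus_sets) for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    scored = [(t, v / norm) for t, v in raw.items()]
    # Highest score first; ties broken alphabetically for stable output
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical sample texts, already tokenized and stop-word filtered
doc = "model training data model evaluation data".split()
background = [
    "data pipeline data storage".split(),
    "data cleaning data validation".split(),
]

# Corpus = the document alone (the situation in extract_tfidf_keywords):
# every IDF is 1, so scores are normalized counts and "data" ties "model".
alone = tfidf_keywords(doc, [doc])

# Corpus = document + background: "data" occurs in every text, so its IDF
# stays at 1 while terms unique to the document get ln(2) + 1, about 1.69.
with_background = tfidf_keywords(doc, [doc] + background)

print(alone)
print(with_background)
```

With only the document in the corpus, "data" and "model" tie because both occur twice; once the background texts are added, "data" keeps the minimum IDF of 1 while the document-specific terms are weighted up, so "model" ranks first.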

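`extract_ngrams` slides its window over `filtered_tokens`. Assuming, as the name suggests, that `TextPreprocessor` drops stop words from that list (the preprocessing module is not shown here), a bigram can join two words that were separated by a stop word in the original sentence. The sketch below contrasts that approach with an alternative that slides over the unfiltered tokens and skips any window containing a stop word. The stop-word list, the function names `naive_ngrams` and `contiguous_ngrams`, and the sample sentence are hypothetical; this is a different technique from the one the module uses, shown only for comparison.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real pipeline would use
# a full list such as NLTK's or scikit-learn's).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def naive_ngrams(tokens, n=2, top_n=10):
    """Remove stop words first, then slide the window (like extract_ngrams)."""
    filtered = [t for t in tokens if t not in STOP_WORDS]
    grams = (" ".join(filtered[i:i + n]) for i in range(len(filtered) - n + 1))
    return [{"ngram": g, "frequency": c} for g, c in Counter(grams).most_common(top_n)]


def contiguous_ngrams(tokens, n=2, top_n=10):
    """Slide the window over the original tokens and skip any window that
    contains a stop word, so every n-gram was adjacent in the source text."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if any(t in STOP_WORDS for t in window):
            continue
        counts[" ".join(window)] += 1
    return [{"ngram": g, "frequency": c} for g, c in counts.most_common(top_n)]


tokens = "machine learning is fun and machine learning is useful".split()

print(naive_ngrams(tokens))
# includes "learning fun" and "fun machine", which never occur in the text
print(contiguous_ngrams(tokens))
# only "machine learning" (frequency 2) survives
```

The trade-off is recall: skipping windows that touch a stop word also discards phrases such as "state of the art", so the filtered approach may still be preferable when approximate co-occurrence is good enough.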




© 2026 RSK World. All rights reserved.

Content used for educational purposes only.