NLP Text Analysis Bot - Python + NLP + Flask + Machine Learning + Text Analysis + AI
text_similarity.py
"""
Text Similarity Module
Calculates similarity between texts

Developer: RSK World
Website: https://rskworld.in
Email: help@rskworld.in
Phone: +91 93305 39277
Year: 2026
"""

import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity as sklearn_cosine_similarity
from text_preprocessing import TextPreprocessor

class TextSimilarityCalculator:
    """
    Text similarity calculation class
    Developer: RSK World - https://rskworld.in
    """
    
    def __init__(self):
        """Initialize similarity calculator"""
        try:
            self.nlp = spacy.load("en_core_web_sm")
        except OSError:
            print("Warning: spaCy model not found for similarity calculation")
            self.nlp = None
        
        self.preprocessor = TextPreprocessor()
        self.tfidf_vectorizer = TfidfVectorizer()
    
    def cosine_similarity(self, text1, text2):
        """
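        Calculate cosine similarity between the TF-IDF vectors of two texts.
        Returns 0.0 if vectorization fails (for example, when both texts
        are empty after preprocessing).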
        
        Args:
            text1 (str): First text
            text2 (str): Second text
            
        Returns:
            float: Similarity score (0-1)
        """
        try:
            # Preprocess texts
            preprocessed1 = self.preprocessor.preprocess(text1)
            preprocessed2 = self.preprocessor.preprocess(text2)
            
            # Calculate TF-IDF vectors
            tfidf_matrix = self.tfidf_vectorizer.fit_transform([
                preprocessed1['cleaned_text'],
                preprocessed2['cleaned_text']
            ])
            
            # Calculate cosine similarity
            similarity = sklearn_cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]
            
            return float(similarity)
        except Exception as e:
            print(f"Error in cosine similarity: {e}")
            return 0.0
    
    def semantic_similarity(self, text1, text2):
        """
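        Calculate semantic similarity using spaCy

        Falls back to TF-IDF cosine similarity when no spaCy model is loaded.
        Note: en_core_web_sm ships without static word vectors, so
        Doc.similarity() is not based on word vectors with this model and
        may be less reliable than with a medium or large model.

        Args:
            text1 (str): First text
            text2 (str): Second text

        Returns:
            float: Similarity score (usually 0-1; as a cosine of document
                vectors it can be slightly negative)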
        """
        if self.nlp is None:
            return self.cosine_similarity(text1, text2)
        
        try:
            doc1 = self.nlp(text1)
            doc2 = self.nlp(text2)
            
            return float(doc1.similarity(doc2))
        except Exception as e:
            print(f"Error in semantic similarity: {e}")
            return 0.0
    
    def jaccard_similarity(self, text1, text2):
        """
        Calculate Jaccard similarity between texts
        
        Args:
            text1 (str): First text
            text2 (str): Second text
            
        Returns:
            float: Similarity score (0-1)
        """
        try:
            preprocessed1 = self.preprocessor.preprocess(text1)
            preprocessed2 = self.preprocessor.preprocess(text2)
            
            set1 = set(preprocessed1['filtered_tokens'])
            set2 = set(preprocessed2['filtered_tokens'])
            
            intersection = len(set1.intersection(set2))
            union = len(set1.union(set2))
            
            if union == 0:
                return 0.0
            
            return float(intersection / union)
        except Exception as e:
            print(f"Error in Jaccard similarity: {e}")
            return 0.0
    
    def calculate_all_similarities(self, text1, text2):
        """
        Calculate all similarity metrics
        
        Args:
            text1 (str): First text
            text2 (str): Second text
            
        Returns:
            dict: All similarity scores
        """
        # Compute each metric once and reuse it for the average
        cosine = self.cosine_similarity(text1, text2)
        semantic = self.semantic_similarity(text1, text2)
        jaccard = self.jaccard_similarity(text1, text2)

        return {
            'cosine_similarity': cosine,
            'semantic_similarity': semantic,
            'jaccard_similarity': jaccard,
            # Unweighted mean of the three scores
            'average_similarity': (cosine + semantic + jaccard) / 3
        }
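
The cosine step in cosine_similarity can be illustrated without scikit-learn. The sketch below is not the module's implementation: it swaps TF-IDF weights for raw term counts and uses plain whitespace tokenization instead of TextPreprocessor, so the scores will differ from the module's. The function name count_cosine is hypothetical.

```python
import math
from collections import Counter


def count_cosine(text_a, text_b):
    """Cosine similarity over raw term counts (a stand-in for TF-IDF)."""
    # Hypothetical tokenizer: lowercase and split on whitespace
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product over the terms the two texts share
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)

    # Euclidean norms of the two count vectors
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has a zero vector; report no similarity
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


score = count_cosine("the cat sat", "the cat ran")
```

Here the two texts share two of three terms, each with count 1, so the dot product is 2 and both norms are sqrt(3), giving a score of 2/3. TF-IDF changes the weights on each term but not the cosine formula itself.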

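
The Jaccard score in jaccard_similarity is the size of the token-set intersection divided by the size of the union. A minimal worked example follows, assuming whitespace-split tokens in place of the 'filtered_tokens' that TextPreprocessor produces (its filtering rules are not shown here, so real scores may differ). The function name jaccard is hypothetical.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two token lists: |A & B| / |A | B|."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    # Two empty inputs have an empty union; report no similarity
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Assumed tokenization: plain whitespace split, no stop-word filtering
tokens_1 = "the cat sat on the mat".split()
tokens_2 = "the cat lay on the rug".split()
score = jaccard(tokens_1, tokens_2)
```

The two sets share {the, cat, on} out of seven distinct tokens, so the score is 3/7. Because sets discard repeats, word frequency has no effect on this metric, which is one reason it can disagree with the TF-IDF cosine score.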