Overview
This dataset contains labeled video clips organized into multiple categories for video classification tasks. It is suited to video understanding, video categorization, and deep learning on video.
Core Features
Multiple Categories
Videos organized across multiple categories for comprehensive classification tasks.
Labeled Clips
All video clips are properly labeled and organized for easy access.
Train/Test/Validation Sets
Pre-organized training, test, and validation sets for immediate use in machine learning projects.
Frame Extraction
Built-in utilities for extracting frames from videos at specified intervals.
Ready for Models
Preprocessed and ready to use with popular video classification models.
OpenCV & FFmpeg
Comprehensive tools using OpenCV and FFmpeg for video processing.
🚀 Advanced Features
Video Data Augmentation
Automatically augment video frames with flipping, rotation, brightness, and contrast adjustments to increase dataset diversity and improve model generalization.
Intelligent Key Frame Extraction
Extract key frames by uniform sampling, scene change detection, or random selection, so that a small number of frames can represent each clip.
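As a rough illustration of two of these methods, the sketch below picks evenly spaced frame indices and flags scene changes by thresholding the mean absolute pixel difference between consecutive frames. It assumes frames are already decoded into flat sequences of 0-255 grayscale values. The function names and the threshold of 30 are hypothetical and are not taken from this project's utilities.

```python
from typing import List, Sequence


def uniform_indices(total_frames: int, count: int) -> List[int]:
    """Pick `count` frame indices spread evenly across the clip."""
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)
    return [(i * total_frames) // count for i in range(count)]


def scene_change_indices(
    frames: Sequence[Sequence[int]], threshold: float = 30.0
) -> List[int]:
    """Keep frame 0 plus every frame whose mean absolute pixel difference
    from the previous frame exceeds `threshold` (an assumed value)."""
    if not frames:
        return []
    keep = [0]
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            keep.append(i)
    return keep


print(uniform_indices(100, 4))

# Toy clip: two dark frames, two bright frames, then a dark frame again.
dark, bright = [10] * 16, [200] * 16
print(scene_change_indices([dark, dark, bright, bright, dark]))
```

The threshold would need tuning per dataset, since noisy or fast-moving footage produces large differences without a real scene change.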
Batch Processing
Process multiple videos efficiently in batches with memory optimization and progress tracking for large-scale datasets.
Video Quality Analysis
Automatically analyze video quality metrics, including sharpness, brightness, and resolution, and generate quality scores for dataset curation.
Video Summary Generation
Create concise summary videos from long videos by extracting and combining key frames for quick preview and analysis.
Comprehensive Dataset Reports
Generate detailed analytics reports with statistics, category distributions, and quality metrics for dataset management.
✨ Unique Features
Duplicate Video Detection
Automatically detect duplicate or similar videos using perceptual hashing to maintain dataset quality and avoid redundancy.
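The description does not say which perceptual hash is used. As one possible illustration, an average hash (aHash) over a small grayscale thumbnail can be compared by Hamming distance. The sketch assumes a frame has already been decoded and downscaled to an 8x8 grid of 0-255 values. `average_hash`, `hamming_distance`, `looks_duplicate`, and the 5-bit threshold are hypothetical and are not part of the dataset's utilities.

```python
from typing import List

Grid = List[List[int]]  # small grayscale thumbnail, values 0-255


def average_hash(thumb: Grid) -> int:
    """Return an integer whose bits mark pixels brighter than the mean."""
    pixels = [p for row in thumb for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def looks_duplicate(a: Grid, b: Grid, max_bits: int = 5) -> bool:
    """Treat two thumbnails as near-duplicates if few hash bits differ."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_bits


# Toy data: a left-dark/right-bright frame, a slightly brighter copy,
# and a horizontally mirrored copy.
frame = [[10] * 4 + [200] * 4 for _ in range(8)]
brighter = [[p + 5 for p in row] for row in frame]
mirrored = [row[::-1] for row in frame]

print(looks_duplicate(frame, brighter))
print(looks_duplicate(frame, mirrored))
```

Because the hash depends only on which pixels are above the mean, a uniform brightness shift leaves it unchanged, while a mirrored frame flips every bit in this toy example.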
Smart Video Splitting
Intelligently split long videos into shorter segments with configurable duration and overlap for better training data preparation.
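To make the interaction between duration and overlap concrete, here is a minimal sketch that computes the (start, end) time of each segment. `segment_windows` and its parameter names are hypothetical; the project's own splitting script may use different options and may handle the final short segment differently.

```python
from typing import List, Tuple


def segment_windows(
    total_s: float, segment_s: float, overlap_s: float = 0.0
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds covering a video of length total_s."""
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    if not 0 <= overlap_s < segment_s:
        raise ValueError("overlap_s must be in [0, segment_s)")
    step = segment_s - overlap_s  # how far each new segment advances
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + segment_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break  # the last window already reaches the end of the video
        start += step
    return windows


# A 25-second video cut into 10-second segments that overlap by 2 seconds.
print(segment_windows(25.0, 10.0, overlap_s=2.0))
```

In this sketch the final window is simply truncated at the end of the video; dropping or padding short tails are equally reasonable choices.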
Auto Thumbnail Generation
Automatically generate high-quality thumbnails from videos using multiple methods: middle frame, first frame, or best quality frame selection.
Video Montage Creation
Create montage videos from multiple sources arranged in a customizable grid layout for visualization and presentation.
Dataset Balance Analysis
Analyze how evenly videos are distributed across categories and get recommendations for rebalancing the dataset before training.
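A simple form of this analysis can be sketched with per-category counts. The example below reports the ratio between the largest and smallest category and lists categories that fall below a fraction of the largest one. `balance_report` and the 0.5 tolerance are assumptions for illustration, not the project's actual metric.

```python
from collections import Counter
from typing import Dict, Iterable


def balance_report(labels: Iterable[str], tolerance: float = 0.5) -> Dict[str, object]:
    """Summarize class counts and flag categories far below the largest one."""
    counts = Counter(labels)
    if not counts:
        return {"counts": {}, "imbalance_ratio": None, "underrepresented": []}
    largest = max(counts.values())
    smallest = min(counts.values())
    # A category is flagged when it has fewer than tolerance * largest videos.
    under = sorted(c for c, n in counts.items() if n < tolerance * largest)
    return {
        "counts": dict(counts),
        "imbalance_ratio": largest / smallest,
        "underrepresented": under,
    }


# One label per video, e.g. taken from the category folder names.
labels = ["action"] * 40 + ["comedy"] * 35 + ["drama"] * 10
report = balance_report(labels)
print(report["imbalance_ratio"], report["underrepresented"])
```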
Auto-Categorization (ML-Ready)
Framework for automatic video categorization using machine learning models with easy integration points.
📖 How to Use - Step by Step Guide
Step 1: Install Dependencies
First, install all required Python packages:
pip install -r requirements.txt
This installs OpenCV, NumPy, and other essential libraries for video processing.
Step 2: Create Directory Structure
Set up the folder structure for organizing your videos:
python scripts/download_sample_data.py --create-structure
This creates the raw_videos/ directory with one folder per category (action, comedy, drama, sports, etc.).
Step 3: Add Your Videos
Place your video files in the appropriate category folders:
raw_videos/
├── action/
│ └── your_video.mp4
├── comedy/
│ └── your_video.mp4
└── ...
Tip: You can also use the interactive mode: python scripts/add_videos.py --interactive
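Before moving on, it can help to confirm that each category folder holds the files you expect. The sketch below walks the layout shown above using only the standard library. `list_videos_by_category` and the extension set are hypothetical and independent of the project's own utilities.

```python
from pathlib import Path
from typing import Dict, List

# Assumed set of video extensions; adjust to match your files.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}


def list_videos_by_category(root: str) -> Dict[str, List[str]]:
    """Map each category folder under `root` to its sorted video file names."""
    result: Dict[str, List[str]] = {}
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue
        result[category_dir.name] = sorted(
            f.name
            for f in category_dir.iterdir()
            if f.is_file() and f.suffix.lower() in VIDEO_EXTENSIONS
        )
    return result


if __name__ == "__main__":
    root = Path("raw_videos")
    if root.is_dir():
        for category, files in list_videos_by_category(str(root)).items():
            print(f"{category}: {len(files)} video(s)")
```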
Step 4: Organize Dataset
Automatically split videos into train (70%), test (20%), and validation (10%) sets:
python scripts/organize_dataset.py --input raw_videos --output data
This organizes your videos into the data/train/, data/test/, and data/validation/ directories.
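The 70/20/10 split can be pictured as a seeded shuffle followed by two cuts. The sketch below works on a flat list of file names; `split_dataset`, its defaults, and the seed are assumptions, and scripts/organize_dataset.py may behave differently (for example, by splitting each category separately).

```python
import random
from typing import Dict, List, Sequence


def split_dataset(
    items: Sequence[str], train: float = 0.7, test: float = 0.2, seed: int = 42
) -> Dict[str, List[str]]:
    """Shuffle `items` reproducibly and cut them into train/test/validation."""
    if train < 0 or test < 0 or train + test > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_test = min(round(len(shuffled) * test), len(shuffled) - n_train)
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        "validation": shuffled[n_train + n_test:],  # whatever remains
    }


videos = [f"clip_{i:02d}.mp4" for i in range(10)]
splits = split_dataset(videos)
print({name: len(files) for name, files in splits.items()})
```

Applying the same function to each category's files in turn would keep the class proportions similar across the three sets.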
Step 5: Process Videos (Optional)
Resize and normalize videos for consistent format:
python scripts/process_videos.py --input raw_videos --output data/train
This ensures all videos have uniform resolution (224x224) and format.
Step 6: Extract Frames (Optional)
Extract frames from videos for frame-based models:
python scripts/extract_frames.py --input data/train --output frames/train
This extracts frames at 1 frame per second (configurable in config.yaml).
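Sampling at a fixed rate amounts to keeping every Nth source frame, where N is the video's frame rate divided by the target rate. The sketch below computes which frame indices would be kept; `frames_to_extract` is a hypothetical helper, not the logic of scripts/extract_frames.py.

```python
from typing import List


def frames_to_extract(
    total_frames: int, video_fps: float, frames_per_second: float = 1.0
) -> List[int]:
    """Return indices of the frames to keep when sampling at `frames_per_second`."""
    if video_fps <= 0 or frames_per_second <= 0:
        raise ValueError("frame rates must be positive")
    # Source frames between two kept frames; never less than one frame.
    step = max(1.0, video_fps / frames_per_second)
    indices = []
    position = 0.0
    while round(position) < total_frames:
        indices.append(round(position))
        position += step
    return indices


# A 3-second clip at 30 fps, sampled at 1 frame per second.
print(frames_to_extract(total_frames=90, video_fps=30.0))
```

With OpenCV, each index could then be read by seeking with `cv2.CAP_PROP_POS_FRAMES` on a `cv2.VideoCapture` before calling `read()`, although reading sequentially and skipping frames is often more reliable for some codecs.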
Step 7: Verify Dataset
Check your dataset statistics and metadata:
python scripts/create_sample_metadata.py --summary
This generates a comprehensive report of your dataset including video counts per category.
Step 8: Use the Dataset
Start using your dataset in Python:
from utils.dataset_utils import get_videos_by_category
# Get videos by category
videos = get_videos_by_category('data/train')
print(videos)
See examples/video_loader_example.py for complete usage examples.
Quick Start Commands
For experienced users, here's a quick reference:
# Install dependencies
pip install -r requirements.txt
# Create structure
python scripts/download_sample_data.py --create-structure
# Organize dataset
python scripts/organize_dataset.py --input raw_videos --output data
# Extract frames
python scripts/extract_frames.py --input data/train --output frames/train
Documentation
For detailed usage instructions, examples, and API documentation, please refer to: