A comprehensive pipeline that automatically processes video content in your Logseq graph, extracting metadata, subtitles, and generating semantic tags to enhance your knowledge management workflow.
The Video Processing Pipeline transforms your Logseq graph by:
- 🔍 Scanning all pages and journal entries for video URLs
- 📹 Extracting video metadata (title, author, duration, platform)
- 📝 Extracting subtitles/transcripts from YouTube videos
- ✨ Enhancing blocks with
{{video URL}}syntax for better display - 🏷️ Analyzing content to generate relevant tags
- 📄 Creating tagged pages that connect videos by topic
- 🔗 Adding source references and timestamps
- YouTube - Full metadata + subtitle extraction
- Vimeo - Metadata via oEmbed API
- TikTok - Basic metadata (limited)
- Twitch - Videos and clips metadata
- Dailymotion - Metadata via oEmbed API
# Basic dry run (safe - no files modified)
python scripts/video_processor_cli.py /path/to/logseq/graph --dry-run
# Full processing
python scripts/video_processor_cli.py /path/to/logseq/graph
# With YouTube API key for enhanced subtitle extraction
python scripts/video_processor_cli.py /path/to/logseq/graph --youtube-api-key YOUR_API_KEY
# Custom configuration
python scripts/video_processor_cli.py /path/to/logseq/graph \
--tag-prefix "topic" \
--max-tags 3 \
--min-subtitle-length 200from logseq_py.pipeline.video_processor import VideoProcessingPipeline
# Configure pipeline
config = {
'dry_run': False, # Set to True for testing
'youtube_api_key': 'YOUR_API_KEY', # Optional but recommended
'tag_prefix': 'video-topic',
'max_tags_per_video': 5,
'backup_enabled': True
}
# Run pipeline
pipeline = VideoProcessingPipeline('/path/to/logseq/graph', config)
result = pipeline.run()
# Check results
if result['success']:
print(f"Enhanced {result['stats']['videos_enhanced']} videos")
print(f"Created {result['stats']['pages_created']} tag pages")| Option | Default | Description |
|---|---|---|
dry_run |
False |
Preview changes without modifying files |
youtube_api_key |
None |
YouTube Data API key for enhanced features |
tag_prefix |
video-topic |
Prefix for generated tag pages |
min_subtitle_length |
100 |
Minimum subtitle length to process |
max_tags_per_video |
5 |
Maximum tags extracted per video |
backup_enabled |
True |
Create backup before processing |
pip install logseq-python # Base functionality# For YouTube subtitle extraction
pip install youtube-transcript-api
# For advanced NLP (future feature)
pip install spacy- Check out this great tutorial: https://www.youtube.com/watch?v=abc123
- Another video: https://vimeo.com/123456- Check out this great tutorial: {{video https://www.youtube.com/watch?v=abc123}}
**Python Programming Fundamentals**
By: Code Academy
Tags: #video-topic-programming #video-topic-education #video-topic-python
- Another video: {{video https://vimeo.com/123456}}
**Design Principles**
By: Creative Studio
Tags: #video-topic-design #video-topic-artThe pipeline creates pages like video-topic-programming.md:
type:: video-topic
tag:: programming
created:: 2024-10-21
video-count:: 3
# Videos tagged with: programming
This page contains all videos related to the topic: **programming**
Found in 3 video(s) from your Logseq graph.
## 1. Python Programming Fundamentals
**Source Page:** [[Daily Notes]]
**Video URL:** https://www.youtube.com/watch?v=abc123
**Processed:** 2024-10-21
### Content Preview
> Welcome to this comprehensive Python tutorial. Today we'll cover variables, functions, loops, and object-oriented programming concepts...
## 2. JavaScript Basics
**Source Page:** [[Learning Resources]]
**Video URL:** https://www.youtube.com/watch?v=def456
**Processed:** 2024-10-21With a YouTube Data API key, you get:
- Video descriptions and full metadata
- Channel information
- View counts, like counts, publish dates
- Complete subtitle/transcript extraction
- Video tags and categories
Getting a YouTube API Key:
- Go to Google Cloud Console
- Create a project or select existing
- Enable YouTube Data API v3
- Create credentials (API key)
- Use with
--youtube-api-keyparameter
The pipeline analyzes video content using multiple methods:
- Topic Detection - Recognizes common video categories
- Keyword Extraction - Identifies frequently mentioned terms
- Entity Recognition - Finds proper nouns and names
- Semantic Ranking - Scores tags by relevance
Categories automatically detected:
- Technology, Science, Education
- Business, Health, Entertainment
- News, Lifestyle, and more
- Automatic Backups - Created before any modifications
- Dry Run Mode - Preview all changes safely
- Error Recovery - Graceful handling of API failures
- Logging - Detailed logs for debugging
- Problem: Videos scattered across notes, hard to find related content
- Solution: Automatic tagging creates topic-based collections
- Result: Easy discovery of all Python tutorials, design videos, etc.
- Problem: Raw URLs don't show video context in Logseq
- Solution: Enhanced blocks show titles, authors, duration
- Result: Rich video previews directly in your notes
- Problem: Difficult to see connections between video content
- Solution: Tag pages reveal content relationships
- Result: Discover learning paths and content clusters
- Problem: Can't search within video content
- Solution: Subtitle extraction makes video content searchable
- Result: Find specific topics within your video library
The pipeline scans:
- ✅ Main pages (
*.mdin root directory) - ✅ Journal entries (
journals/*.md) - ✅ All block levels and nested content
- ✅ Mixed content blocks (text + video URLs)
It preserves:
- ✅ Original block structure and formatting
- ✅ Existing tags and properties
- ✅ Block relationships and hierarchy
- ✅ Non-video content unchanged
| Graph Size | Processing Time | Memory Usage |
|---|---|---|
| Small (100 pages) | 30 seconds | ~50MB |
| Medium (1000 pages) | 5 minutes | ~100MB |
| Large (10000+ pages) | 30+ minutes | ~500MB |
Optimization Tips:
- Use
--dry-runfirst to estimate scope - Process during low-activity periods
- Consider batch processing for huge graphs
- YouTube API key significantly speeds subtitle extraction
- Always backup your graph before processing
- Test with
--dry-runon important data - Start small with a few pages to verify results
- YouTube API has daily quotas (10,000 units/day default)
- Each video processes ~3-5 API units
- Without API key, basic extraction still works
- Rate limiting is handled automatically
- Not all YouTube videos have subtitles
- Auto-generated captions may have errors
- Some videos restrict subtitle access
- Pipeline gracefully handles missing subtitles
"No videos found" but videos exist in graph:
- Check URL formats are supported
- Verify URLs aren't in code blocks or comments
- Enable DEBUG logging to see scanning process
Subtitle extraction fails:
- Install
youtube-transcript-api:pip install youtube-transcript-api - Some videos don't have available subtitles
- Check if video is public and accessible
API key errors:
- Verify API key is correct and active
- Check YouTube Data API v3 is enabled
- Ensure API quotas aren't exceeded
Performance issues:
- Use
--dry-runto estimate scope first - Process smaller batches if graph is huge
- Check available disk space for backups
python scripts/video_processor_cli.py /path/to/graph \
--dry-run --log-level DEBUG --log-file debug.logPlanned features:
- Multi-language support for subtitles
- Advanced NLP with spaCy/transformers
- Timestamp-based tagging for long videos
- Integration with more platforms (Instagram, LinkedIn)
- AI-powered summarization of video content
- Automated playlist generation based on tags
The video processing pipeline is part of the logseq-python project. Contributions welcome:
- Report bugs and suggest improvements
- Add support for new video platforms
- Improve text analysis algorithms
- Enhance documentation and examples
- Video Title Extraction - Core video utilities
- Pipeline Architecture - General pipeline framework
- API Reference - Complete API documentation
Here's what happens when you run the pipeline:
🚀 Starting Logseq Video Processing Pipeline
📁 Graph: /Users/you/logseq-graph
🔑 YouTube API key provided - Enhanced subtitle extraction enabled
🏷️ Tag prefix: video-topic
📊 Max tags per video: 5
2024-10-21 10:00:01 - INFO - Starting video processing pipeline
2024-10-21 10:00:02 - INFO - Found 45 blocks with video content
2024-10-21 10:00:03 - INFO - Processing video: https://www.youtube.com/watch?v=abc123
2024-10-21 10:00:04 - INFO - Extracted 2,340 chars of subtitles and 4 tags
2024-10-21 10:00:05 - INFO - Enhanced block with 1 videos
2024-10-21 10:00:06 - INFO - Created tag page: video-topic-programming
...
============================================================
📹 VIDEO PROCESSING REPORT
============================================================
✅ SUCCESS
📁 Graph: /Users/you/logseq-graph
🕐 Time: 2024-10-21 10:05:23
📊 STATISTICS:
• Blocks processed: 1,234
• Videos found: 45
• Videos enhanced: 45
• Subtitles extracted: 28
• Tags created: 12
• Pages created: 12
============================================================
🎉 Processing completed successfully!Your Logseq graph is now enhanced with rich video content, automatic tagging, and discoverable video collections! 🎬