How it works:
- Extracts frames using change detection (not just every Nth frame), with periodic capture for slow-evolving content like screen shares; the selection and filtering steps are sketched after this list
- Filters out webcam/people-only frames automatically via face detection
- Transcribes audio with the OpenAI Whisper API or local Whisper (no API key needed for the local path; sketched below)
- Sends frames to vision models to identify and recreate diagrams as Mermaid code (sketched below)
- Builds a knowledge graph (entities + relationships) from the transcript, as sketched below
- Extracts key points, action items, and cross-references between visual and spoken content
- Generates a structured report with everything linked together
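
The frame-selection idea can be pictured with a minimal sketch, assuming OpenCV: frame differencing for change detection, a periodic fallback for slow-moving content, and a Haar-cascade face check to drop webcam-only frames. The thresholds, function names, and the cascade-based filter are illustrative, not planopticon's actual implementation.

```python
import cv2
import numpy as np

FACE_MODEL = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def looks_people_only(frame, min_face_ratio=0.10):
    """Heuristic: a large detected face and little else suggests a webcam-only frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = FACE_MODEL.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    face_area = sum(w * h for _, _, w, h in faces)
    return face_area / (frame.shape[0] * frame.shape[1]) > min_face_ratio

def candidate_frames(path, diff_threshold=12.0, periodic_secs=30.0):
    """Yield (timestamp, frame) when the scene changes or the periodic interval elapses."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    prev, last_kept, idx = None, -periodic_secs, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = idx / fps
        idx += 1
        gray = cv2.cvtColor(cv2.resize(frame, (320, 180)), cv2.COLOR_BGR2GRAY)
        changed = prev is not None and np.mean(cv2.absdiff(gray, prev)) > diff_threshold
        periodic = t - last_kept >= periodic_secs   # catches slow-evolving screen shares
        prev = gray
        if (changed or periodic) and not looks_people_only(frame):
            last_kept = t
            yield t, frame
    cap.release()
```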
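Local transcription is roughly this, using the openai-whisper package; planopticon's own wrapper, model size, and diarization step may differ.

```python
import whisper  # pip install openai-whisper; requires ffmpeg on PATH

model = whisper.load_model("base")          # runs locally, no API key needed
result = model.transcribe("meeting.mp4")    # ffmpeg extracts the audio track from the video
for seg in result["segments"]:
    print(f"[{seg['start']:7.1f}s] {seg['text'].strip()}")
```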
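The diagram step boils down to handing a kept frame to a vision model and asking for Mermaid. A sketch against the OpenAI Python SDK; the model name, prompt, and helper function are illustrative, and the real tool routes across whichever providers it discovers.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def frame_to_mermaid(png_path: str) -> str:
    """Ask a vision model to recreate a whiteboard/slide diagram as Mermaid code."""
    with open(png_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative; planopticon picks from the discovered models
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "If this frame contains a diagram, recreate it as Mermaid code. "
                         "Reply with only the Mermaid block, or NONE if there is no diagram."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```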
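And the knowledge graph is, conceptually, subject/relation/object triples pulled from the transcript and assembled into a graph. A sketch assuming networkx; the extraction step and the example triples are placeholders, not output from the tool.

```python
import networkx as nx

def build_graph(triples):
    """triples: iterable of (subject, relation, object) strings, e.g. from an LLM pass over the transcript."""
    g = nx.DiGraph()
    for subj, rel, obj in triples:
        g.add_edge(subj, obj, relation=rel)
    return g

g = build_graph([
    ("billing service", "publishes to", "invoice queue"),   # illustrative triples only
    ("invoice queue", "consumed by", "reporting job"),
])
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
```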
Supports OpenAI, Anthropic, and Gemini as providers; it auto-discovers the available models and routes each task to the best one. Checkpointing and resume mean long analyses survive failures (a minimal sketch of the idea follows).
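
One way to picture the checkpoint/resume behaviour: each stage writes its result to disk and is skipped on the next run if that file exists. The stage names, paths, and JSON format here are assumptions, not planopticon's actual layout.

```python
import json
from pathlib import Path

def run_stage(name, fn, checkpoint_dir="./output/.checkpoints"):
    """Run a pipeline stage once; on re-runs, load its saved result instead."""
    path = Path(checkpoint_dir) / f"{name}.json"
    if path.exists():
        return json.loads(path.read_text())   # resume: skip already-completed work
    result = fn()                             # result must be JSON-serializable
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))       # checkpoint before moving on
    return result
```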

Install and run:

pip install planopticon
planopticon analyze -i meeting.mp4 -o ./output

Also supports batch processing of entire folders and pulling videos from Google Drive or Dropbox.

Example: a 90-minute training session yielded 122 frames extracted (from thousands of candidates), 6 diagrams recreated, a full transcript with speaker diarization, a 540-node knowledge graph, and a comprehensive report, all in about 25 minutes.
Python 3.10+, MIT licensed. Docs at https://planopticon.dev.