How it works:
- Extracts frames using change detection (not just every Nth frame), with periodic capture for slow-evolving content like screen shares; the selection and filtering steps are sketched after this list
- Filters out webcam/people-only frames automatically via face detection
- Transcribes audio with the OpenAI Whisper API or local Whisper (no API key needed for the local path; sketched below)
- Sends frames to vision models to identify and recreate diagrams as Mermaid code (sketched below)
- Builds a knowledge graph (entities + relationships) from the transcript, as sketched below
- Extracts key points, action items, and cross-references between visual and spoken content
- Generates a structured report with everything linked together
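
The frame-selection idea can be pictured with a minimal sketch, assuming OpenCV: frame differencing for change detection, a periodic fallback for slow-moving content, and a Haar-cascade face check to drop webcam-only frames. The thresholds, function names, and the cascade-based filter are illustrative, not planopticon's actual implementation.

```python
import cv2
import numpy as np

FACE_MODEL = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def looks_people_only(frame, min_face_ratio=0.10):
    """Heuristic: a large detected face and little else suggests a webcam-only frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = FACE_MODEL.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    face_area = sum(w * h for _, _, w, h in faces)
    return face_area / (frame.shape[0] * frame.shape[1]) > min_face_ratio

def candidate_frames(path, diff_threshold=12.0, periodic_secs=30.0):
    """Yield (timestamp, frame) when the scene changes or the periodic interval elapses."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    prev, last_kept, idx = None, -periodic_secs, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = idx / fps
        idx += 1
        gray = cv2.cvtColor(cv2.resize(frame, (320, 180)), cv2.COLOR_BGR2GRAY)
        changed = prev is not None and np.mean(cv2.absdiff(gray, prev)) > diff_threshold
        periodic = t - last_kept >= periodic_secs   # catches slow-evolving screen shares
        prev = gray
        if (changed or periodic) and not looks_people_only(frame):
            last_kept = t
            yield t, frame
    cap.release()
```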
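Local transcription is roughly this, using the openai-whisper package; planopticon's own wrapper, model size, and diarization step may differ.

```python
import whisper  # pip install openai-whisper; requires ffmpeg on PATH

model = whisper.load_model("base")          # runs locally, no API key needed
result = model.transcribe("meeting.mp4")    # ffmpeg extracts the audio track from the video
for seg in result["segments"]:
    print(f"[{seg['start']:7.1f}s] {seg['text'].strip()}")
```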
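The diagram step boils down to handing a kept frame to a vision model and asking for Mermaid. A sketch against the OpenAI Python SDK; the model name, prompt, and helper function are illustrative, and the real tool routes across whichever providers it discovers.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def frame_to_mermaid(png_path: str) -> str:
    """Ask a vision model to recreate a whiteboard/slide diagram as Mermaid code."""
    with open(png_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative; planopticon picks from the discovered models
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "If this frame contains a diagram, recreate it as Mermaid code. "
                         "Reply with only the Mermaid block, or NONE if there is no diagram."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```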
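And the knowledge graph is, conceptually, subject/relation/object triples pulled from the transcript and assembled into a graph. A sketch assuming networkx; the extraction step and the example triples are placeholders, not output from the tool.

```python
import networkx as nx

def build_graph(triples):
    """triples: iterable of (subject, relation, object) strings, e.g. from an LLM pass over the transcript."""
    g = nx.DiGraph()
    for subj, rel, obj in triples:
        g.add_edge(subj, obj, relation=rel)
    return g

g = build_graph([
    ("billing service", "publishes to", "invoice queue"),   # illustrative triples only
    ("invoice queue", "consumed by", "reporting job"),
])
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
```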
Supports OpenAI, Anthropic, and Gemini as providers; it auto-discovers the available models and routes each task to the best one. Checkpointing and resume mean long analyses survive failures (a minimal sketch of the idea follows).
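
One way to picture the checkpoint/resume behaviour: each stage writes its result to disk and is skipped on the next run if that file exists. The stage names, paths, and JSON format here are assumptions, not planopticon's actual layout.

```python
import json
from pathlib import Path

def run_stage(name, fn, checkpoint_dir="./output/.checkpoints"):
    """Run a pipeline stage once; on re-runs, load its saved result instead."""
    path = Path(checkpoint_dir) / f"{name}.json"
    if path.exists():
        return json.loads(path.read_text())   # resume: skip already-completed work
    result = fn()                             # result must be JSON-serializable
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))       # checkpoint before moving on
    return result
```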

Install and run:

pip install planopticon
planopticon analyze -i meeting.mp4 -o ./output

Also supports batch processing of entire folders and pulling videos from Google Drive or Dropbox.

Example: a 90-minute training session yielded 122 frames extracted (from thousands of candidates), 6 diagrams recreated, a full transcript with speaker diarization, a 540-node knowledge graph, and a comprehensive report, all in about 25 minutes.
Python 3.10+, MIT licensed. Docs at https://planopticon.dev.