Jul 13, 2025
Here's the thing: I can't sit still for three hours.
A friend sent me this Andrew Huberman podcast last weekend. "You'll love this," they said. And they were probably right. But three hours? I tried. Really tried. Made it about 20 minutes before I started cleaning my desk. Then checking email. Then I lost track completely.
Sound familiar?
Let's talk about what's actually happening here. Long-form content is everywhere now. Podcasts routinely run 2-3 hours. The content is often incredible. Deep conversations, real insights, the kind of stuff that changes how you think.
But the format? It's broken for people like me.
Think about it:
I kept thinking: what if I could just... read it?
So I did what any reasonable programmer would do. I spent a weekend building a tool instead of just listening to the podcast.
I call it video-to-transcript. Here's what it does:
1. Takes any YouTube URL
2. Downloads the video (thanks, yt-dlp)
3. Extracts just the audio (ffmpeg)
4. Sends it to Deepgram for transcription
5. Figures out who's actually speaking (not just "Speaker 0")
6. Generates a beautiful EPUB for your e-reader (pandoc)
One command. That's it. Point it at any YouTube URL and wait a few minutes.
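For the curious, the three local steps in that list (download, audio extraction, EPUB generation) boil down to three external commands. Here is a rough Python sketch of how they might be wired together. It is not the tool's actual code: the `local_steps` and `run_step` names, the `work/` directory, the file extensions, and the exact flags are all assumptions, and the transcription and speaker-naming steps are left out.

```python
import subprocess
from pathlib import Path


def local_steps(url: str, video_id: str, title: str,
                workdir: Path = Path("work")) -> dict[str, list[str]]:
    """Build the command lines for the steps that run on your own machine.

    Hypothetical layout: every artifact lives in workdir and is named
    after the video ID.
    """
    video = workdir / f"{video_id}.mkv"
    audio = workdir / f"{video_id}.mp3"
    transcript = workdir / f"{video_id}.md"   # written by the transcription step
    epub = workdir / f"{video_id}.epub"
    return {
        # --remux-video mkv gives the download a predictable container;
        # -o is yt-dlp's output template.
        "download": [
            "yt-dlp", "--remux-video", "mkv",
            "-o", str(workdir / f"{video_id}.%(ext)s"), url,
        ],
        # -vn drops the video stream, -y overwrites an existing output file.
        "extract_audio": ["ffmpeg", "-y", "-i", str(video), "-vn", str(audio)],
        # pandoc picks the EPUB writer from the output extension.
        "make_epub": [
            "pandoc", str(transcript), "-o", str(epub),
            "--metadata", f"title={title}",
        ],
    }


def run_step(command: list[str]) -> None:
    """Run one external command and raise if it exits with an error."""
    subprocess.run(command, check=True)
```

Building the commands as plain lists, separately from running them, makes it easy to print them for debugging before anything touches the network.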
Let me share the bits that made me smile while building this.
It's smart about caching. Download a 3-hour video once, and it never downloads again. Already transcribed something? It remembers. This matters when you're tweaking the output format for the fifteenth time at 1am.
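The caching idea can be as simple as checking whether the output file already exists before doing the expensive work. A minimal sketch, assuming a file-per-step layout; `cached_step` is a hypothetical helper, not necessarily how the tool does it.

```python
from pathlib import Path
from typing import Callable


def cached_step(output: Path, produce: Callable[[Path], None]) -> bool:
    """Run produce() only if output is missing or empty.

    Returns True if the work was done, False on a cache hit.
    """
    if output.exists() and output.stat().st_size > 0:
        return False  # already downloaded / transcribed: skip the work

    output.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted run never leaves
    # a half-written file that later looks like a valid cache entry.
    tmp = output.with_name(output.name + ".part")
    produce(tmp)
    tmp.replace(output)
    return True
```

Each pipeline step then becomes a `cached_step(path, function)` call, and re-running the whole thing after a formatting tweak only redoes the steps whose outputs you deleted.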
It actually knows who's talking. Deepgram's diarization tells you when the speaker changes, but only as a number: Speaker 0, Speaker 1. It can't tell you who they are. So I added a bit of AI magic. The tool reads the video description and figures out "Oh, Speaker 0 is Andrew, Speaker 1 is Lori." Simple, but it transforms the reading experience.
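Once you have that number-to-name mapping, applying it is the easy part. Here is a sketch that groups consecutive words by speaker and swaps in the names. It assumes each word is a dict with `speaker`, `word`, and optionally `punctuated_word` keys, which is how I understand Deepgram's diarized output to look. The `label_turns` function and the bold-name Markdown format are my own illustration, and the mapping itself is taken as given.

```python
from itertools import groupby


def label_turns(words: list[dict], names: dict[int, str]) -> str:
    """Turn a flat list of diarized words into named speaker turns."""
    turns = []
    # groupby merges runs of consecutive words from the same speaker.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w.get("punctuated_word") or w["word"] for w in group)
        # Fall back to the generic label if a speaker was not identified.
        name = names.get(speaker, f"Speaker {speaker}")
        turns.append(f"**{name}:** {text}")
    return "\n\n".join(turns)


words = [
    {"word": "welcome", "punctuated_word": "Welcome", "speaker": 0},
    {"word": "back", "punctuated_word": "back.", "speaker": 0},
    {"word": "thanks", "punctuated_word": "Thanks.", "speaker": 1},
]
transcript = label_turns(words, {0: "Andrew", 1: "Lori"})
```

Any speaker missing from the mapping keeps its "Speaker N" label, so a wrong or incomplete guess degrades gracefully instead of crashing.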
Everything has predictable names. No random file soup. Every file is named after the video ID, so you always know where things are.
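Naming by video ID might look something like the sketch below: pull the ID out of the URL, then derive every path from it. The `video_id` and `artifact_paths` helpers, the `work/` directory, and the extensions are assumptions for illustration, and only the two most common YouTube URL shapes are handled.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> str:
    """Extract the video ID from a watch?v=... or youtu.be/... URL."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        candidate = parsed.path.lstrip("/")
    else:
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    if not candidate:
        raise ValueError(f"no video ID found in {url!r}")
    return candidate


def artifact_paths(vid: str, root: Path = Path("work")) -> dict[str, Path]:
    """Map each pipeline artifact to a file named after the video ID."""
    extensions = {
        "video": "mkv",
        "audio": "mp3",
        "transcript": "json",
        "epub": "epub",
    }
    return {kind: root / f"{vid}.{ext}" for kind, ext in extensions.items()}
```

With this in place, every step can find its inputs from the URL alone, which is also what makes the caching check trivial.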
That 3-hour Huberman podcast? It's now a 50,000-word book on my Kindle.
The transcript reads like this:
Clean. Readable. Perfect for highlighting.
Now I can:
The tool works great for my use case. But I keep thinking about what else it could do:
Could it detect chapter breaks automatically? What about podcasts with three or four speakers? Integration with Readwise or Kindle highlights?
For now though, I'm happy. I've got a stack of three-hour podcasts waiting for me. Except now they're books.
Sometimes the best interface for consuming information is still just plain text. Who knew?
Now if you'll excuse me, I have a 3-hour podcast to read. Finally.