CAMERON WESTLAND

From Three-Hour Podcasts to Kindle-Ready Text

Jul 13, 2025

Here's the thing: I can't sit still for three hours.

A friend sent me this Andrew Huberman podcast last weekend. "You'll love this," they said. And they were probably right. But three hours? I tried. Really tried. Made it about 20 minutes before I started cleaning my desk. Then checking email. Then I lost track completely.

Sound familiar?

The Real Problem

Let's talk about what's actually happening here. Long-form content is everywhere now. Podcasts routinely run 2-3 hours. The content is often incredible. Deep conversations, real insights, the kind of stuff that changes how you think.

But the format? It's broken for people like me.

Think about it:

  • You can't skim a podcast
  • You can't highlight the good parts
  • You can't read it at 2am when you can't sleep
  • You definitely can't read it on your Kindle at the beach

I kept thinking: what if I could just... read it?

Building My Way Out

So I did what any reasonable programmer would do. I spent a weekend building a tool instead of just listening to the podcast.

I call it video-to-transcript. Here's what it does:

1. Takes any YouTube URL

2. Downloads the video (thanks, yt-dlp)

3. Extracts just the audio (ffmpeg)

4. Sends it to Deepgram for transcription

5. Figures out who's actually speaking (not just "Speaker 0")

6. Generates a beautiful EPUB for your e-reader (pandoc)

One command. That's it. Just run this with any YouTube URL and wait a few minutes.

The Magic Parts

Let me share the bits that made me smile while building this.

It's smart about caching. Download a 3-hour video once, and it never downloads again. Already transcribed something? It remembers. This matters when you're tweaking the output format for the fifteenth time at 1am.

It actually knows who's talking. Deepgram tells you someone is speaking, but not who. So I added a bit of AI magic. The tool reads the video description and figures out "Oh, Speaker 0 is Andrew, Speaker 1 is Lori." Simple, but it transforms the reading experience.

Everything has predictable names. No random file soup. Everything is named after the video ID. You always know where things are.

What It Actually Looks Like

That 3-hour Huberman podcast? It's now a 50,000-word book on my Kindle.

The transcript reads like this:

Clean. Readable. Perfect for highlighting.

Now I can:

  • Read at lunch
  • Highlight the interesting parts
  • Jump around between topics
  • Actually finish the thing

What's Next?

The tool works great for my use case. But I keep thinking about what else it could do:

Could it detect chapter breaks automatically? What about podcasts with three or four speakers? Integration with Readwise or Kindle highlights?

For now though, I'm happy. I've got a stack of three-hour podcasts waiting for me. Except now they're books.

Sometimes the best interface for consuming information is still just plain text. Who knew?

Now if you'll excuse me, I have a 3-hour podcast to read. Finally.