Podcast Transcription
Free Podcast Transcript Generator
Turn any episode into clean text. Drop your MP3, MP4, or M4A file and get a transcript in seconds. No account, no per-minute fee.
MP3 · MP4 · WAV · M4A · OGG · WEBM · FLAC · Max 25MB · Max 30 min (60 min · Sign in)
How it works
Drop the episode
Upload your MP3, MP4, M4A, WAV, or FLAC. Riverside cloud recordings, Descript exports, raw Zoom audio, and anchor.fm files are all fine.
AI does the work
Whisper large-v3 runs on our backend and converts the audio to text. Most 30-minute episodes finish in under a minute.
Copy, download, or edit
Grab the text as plain TXT, SRT for subtitles, or paste it straight into your show notes editor. Tweak wrong words inline before exporting.
Why podcasters use Mictoo
Long episodes are fine
Up to 60 minutes per file once you sign in (free). For a two-part split, we keep timestamps relative so you can stitch the SRT files back together without doing math.
Accents and crosstalk hold up
Whisper large-v3 is the strongest open speech model we know of for non-native English. If you co-host with someone in Berlin or São Paulo, the transcript will not turn into mush.
Music beds do not break it
Our pipeline runs voice activity detection before transcription. Long instrumental intros are usually tagged as silence rather than turned into invented words. Sponsor reads with bed music underneath still come through clean.
No subscription
A lot of podcasters transcribe one or two episodes a month. Paying 15 dollars for a monthly seat is wasteful for that. Drop the file when you need it. We make money on ads today, and a Pro tier for heavy users is planned.
Your audio is not stored
Files stream straight to the speech provider, get transcribed, then go away. We do not keep your episodes, and the providers we use (Groq, OpenAI) do not train on API data.
AI summary for free after every episode
Each transcript comes with a GPT-generated summary and key points, exactly the raw material you need for show notes and chapter markers. Competitors typically charge 15-20 dollars a month for the same feature. We don't.
What podcasters actually do with the transcript
Show notes and blog posts
Paste the transcript into your CMS, mark the chapters, drop in links, ship the blog post. A 45-minute episode gives you 6000 to 8000 words of source material. Three or four lightly edited blog posts out of one recording.
Episode quote cards for social
Scan the transcript for the line that landed and turn it into a graphic. Much faster than scrubbing the audio file at 1.5x looking for the timestamp.
Searchable archive for your back catalog
Run your old episodes through the transcriber one file at a time (batch uploads are planned for the Pro tier) and you suddenly have a Ctrl+F across years of conversations. Useful when a guest comes back and you want to remember what you talked about last time.
YouTube auto-captions replacement
YouTube's auto-captions are mediocre for podcasts with two voices and any music. Upload a Mictoo SRT instead. Better punctuation, fewer wrong words, better accessibility.
Accessibility transcript link
A lot of podcasters add a "Read the transcript" link in their RSS show notes. That helps deaf and hard-of-hearing listeners, and it helps search engines find your content.
Pro tips for cleaner podcast transcripts
Strip the music intro and outro first
Whisper is good at ignoring music, but a 90-second instrumental cold open sometimes triggers phantom words. If your intro is the same every episode, just trim the first 1:30 in Audacity before upload. Saves a few minutes of cleanup later.
Export at 64 kbps mono if your raw file is huge
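Voice does not need stereo, and 64 kbps is plenty for speech. A 60-minute episode at 64 kbps mono is around 29 MB, comfortably under the 60 MB signed-in limit. A two-hour episode comes to around 58 MB, which squeezes under the size limit but still exceeds the 60-minute duration cap, so split it after converting. Use ffmpeg: ffmpeg -i episode.wav -ac 1 -b:a 64k episode.mp3.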
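The size arithmetic is easy to check yourself. Here is a minimal sketch, assuming a constant bitrate and ignoring container overhead. The function names are made up for illustration, and the limits passed in are the signed-in ones quoted on this page. Sizes are in MB of 10^6 bytes, so 57.6 MB is roughly 55 MiB.

```python
def estimated_size_mb(duration_seconds: float, bitrate_kbps: float) -> float:
    """Approximate size in MB (10^6 bytes) of constant-bitrate audio."""
    bits = duration_seconds * bitrate_kbps * 1000
    return bits / 8 / 1_000_000


def fits_limits(duration_seconds: float, bitrate_kbps: float,
                max_mb: float, max_minutes: float) -> bool:
    """True only if the file is under BOTH the size cap and the duration cap."""
    size_ok = estimated_size_mb(duration_seconds, bitrate_kbps) <= max_mb
    duration_ok = duration_seconds <= max_minutes * 60
    return size_ok and duration_ok


one_hour = 60 * 60
two_hours = 2 * one_hour

print(estimated_size_mb(one_hour, 64))    # 28.8
print(estimated_size_mb(two_hours, 64))   # 57.6

# Signed-in limits from this page: 60 MB and 60 minutes.
print(fits_limits(one_hour, 64, max_mb=60, max_minutes=60))   # True
print(fits_limits(two_hours, 64, max_mb=60, max_minutes=60))  # False (too long)
```

The second check is the one that catches people out: a file can be small enough and still be too long.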
For interviews with bad guest audio, transcribe each track separately
If you record on Riverside or SquadCast and have separate audio per speaker, upload each track on its own. Whisper has an easier time with one voice at a time. You get cleaner attribution and fewer dropped words during crosstalk.
Set the language explicitly for short episodes
Auto-detect samples the first chunk of audio. If the episode starts with a one-word cold open or a laugh, detection can land on the wrong language. For anything under 5 minutes, pick the language manually.
Punctuation will be imperfect. Fix the first 10 lines, then leave the rest
Whisper gets most punctuation right, but it sometimes misses semicolons and quoted speech. For show notes, the first 10 lines matter (people skim). Past that, ship it.
Use SRT export even if you do not need subtitles
SRT gives you timestamps every few seconds. Even if you are pasting into a blog post, those timestamps help you jump back to the audio to verify a quote. We have a free SRT generator on this site.
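If you would rather not scroll through the SRT by hand, a few lines of script can find the timestamp for a quote. This is a rough sketch that assumes a standard SRT layout (cue number, a "start --> end" line, then the text). The function names and the sample cues are invented for illustration.

```python
import re

CUE_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_srt(text: str) -> list[tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, cue_text) for every cue."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIME.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is the cue text.
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues


def find_quote(cues, phrase: str) -> list[tuple[float, str]]:
    """Case-insensitive search; returns (start_seconds, cue_text) matches."""
    needle = phrase.lower()
    return [(start, text) for start, _end, text in cues if needle in text.lower()]


sample = """1
00:00:01,000 --> 00:00:04,000
Welcome back to the show.

2
00:12:30,500 --> 00:12:34,000
That was the line that landed.
"""

print(find_quote(parse_srt(sample), "line that landed"))
# [(750.5, 'That was the line that landed.')]
```

One caveat: a quote that spans two cues will not match, so search for a short, distinctive fragment.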
Frequently asked questions
Can I transcribe a 2-hour episode?
Yes, but split it first. Our per-file cap is 30 minutes free, or 60 minutes once you sign in. For a 2-hour episode, split it into two or three parts if you are signed in (at least four on the free tier) and transcribe each. Our audio splitter guide walks through how to do it in 60 seconds with ffmpeg or Audacity.
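To see how the split works out, here is a small planning sketch. It is not the method from the splitter guide, just one way to do it: it picks the fewest near-equal parts that stay under a cap you choose, then builds ffmpeg argument lists without running them. The helper names and the 55-minute cap (a little headroom under the signed-in limit) are assumptions. The flags -ss, -t, and -c copy are standard ffmpeg options.

```python
from pathlib import Path


def split_plan(total_seconds: int, max_part_seconds: int) -> list[tuple[int, int]]:
    """Return (start, length) pairs, in whole seconds, that cover the episode
    in the fewest near-equal parts no longer than max_part_seconds."""
    if total_seconds <= 0 or max_part_seconds <= 0:
        raise ValueError("durations must be positive")
    parts = -(-total_seconds // max_part_seconds)  # ceiling division
    base, extra = divmod(total_seconds, parts)
    plan, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread leftover seconds
        plan.append((start, length))
        start += length
    return plan


def ffmpeg_commands(src: str, plan: list[tuple[int, int]]) -> list[list[str]]:
    """Build one ffmpeg argument list per part (nothing is executed here)."""
    path = Path(src)
    commands = []
    for number, (start, length) in enumerate(plan, 1):
        out = f"{path.stem}_part{number}{path.suffix}"
        commands.append(["ffmpeg", "-ss", str(start), "-t", str(length),
                         "-i", src, "-c", "copy", out])
    return commands


plan = split_plan(2 * 60 * 60, 55 * 60)  # 2-hour episode, 55-minute cap
for command in ffmpeg_commands("episode.mp3", plan):
    print(" ".join(command))
```

With those numbers you get three 40-minute parts. Because -c copy cuts without re-encoding, the cut points can be off by a fraction of a second, which is usually fine for speech.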
Do I get speaker labels (host vs guest)?
Not automatically right now. Whisper itself does not do speaker diarization. If you have separate tracks per speaker (common in Riverside, SquadCast, Zencastr), upload each one separately and label them yourself in the final transcript. We are looking at adding diarization, but only when we can do it well.
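If you go the separate-tracks route, the labeling step can be scripted. The sketch below assumes you already have timed cues per speaker (for example, parsed from each track's SRT) and that all tracks share the same time zero, which is typical for synced multitrack exports but worth checking. The Cue type, function names, and sample lines are made up for illustration.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    start: float  # seconds from the start of the recording
    end: float
    text: str


def merge_tracks(tracks: dict[str, list[Cue]]) -> list[tuple[float, str, str]]:
    """Interleave every speaker's cues into one list ordered by start time."""
    labelled = [
        (cue.start, speaker, cue.text)
        for speaker, cues in tracks.items()
        for cue in cues
    ]
    labelled.sort(key=lambda item: item[0])
    return labelled


def render(merged: list[tuple[float, str, str]]) -> str:
    """Produce 'Speaker: text' lines, joining consecutive cues by one speaker."""
    lines: list[str] = []
    previous = None
    for _start, speaker, text in merged:
        if speaker == previous:
            lines[-1] += " " + text
        else:
            lines.append(f"{speaker}: {text}")
            previous = speaker
    return "\n".join(lines)


tracks = {
    "Host": [Cue(0.0, 2.0, "Welcome back."), Cue(5.0, 7.0, "How was Berlin?")],
    "Guest": [Cue(2.5, 4.5, "Thanks for having me."), Cue(7.5, 9.0, "Cold.")],
}
print(render(merge_tracks(tracks)))
```

Ordering by start time alone is a simplification. During heavy crosstalk the lines will interleave in short fragments, so expect to tidy those passages by hand.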
How does it handle accents and bilingual podcasts?
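The original Whisper models were trained on 680,000 hours of multilingual audio, and large-v3 was trained on a larger set still (roughly 1 million hours of weakly labeled audio plus 4 million hours of pseudo-labeled audio, according to its model card). Non-native English, regional accents, and code-switching all work better than with smaller models. For a podcast that switches between English and Spanish mid-episode, pick "Auto-detect" as the language and Whisper will usually follow along.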
What audio formats do you support for podcasts?
MP3, M4A, WAV, FLAC, OGG, WEBM, and AAC. Plus video files like MP4 and MOV (we extract the audio). If your podcast host gives you a download in any of these, you are set. AIFF and ALAC are not supported directly; convert to WAV first.
Is there a per-episode word limit?
No word limit. The only limits are file size (25 MB free, 60 MB signed in) and duration (30 min free, 60 min signed in). A typical 60-minute episode produces around 9000 to 11000 words.
How accurate is podcast transcription compared to human transcribers?
For clean studio audio, Whisper large-v3 typically lands at 5 to 10 percent word error rate. Human transcribers are around 3 to 5 percent. For most show notes and blog repurposing work, AI is good enough. For court testimony or academic citation, hire a human.
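Word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. If you want to measure it on your own show, here is a minimal sketch using word-level edit distance. It only lowercases and splits on whitespace, so strip punctuation from both texts first. The function name and sample sentences are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous table row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        current = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion (word missing from transcript)
                current[j - 1] + 1,      # insertion (extra word in transcript)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


reference = "we record every episode on tuesday morning with fresh coffee"
transcript = "we record every episode on tuesday morning with french coffee"
print(word_error_rate(reference, transcript))  # 0.1
```

One wrong word out of ten gives 10 percent. Hand-correct a two-minute sample of an episode, use that as the reference, and you get a rough number for your own audio.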
Will my episode be stored on your servers?
No. We pipe the audio straight to the transcription provider (Groq, with OpenAI as backup). They process it and we discard it. We never write your podcast file to our database or our object storage.
Can I download as SRT for subtitles?
Yes. After transcription, hit the SRT download button. Use it directly in YouTube Studio, Premiere Pro, DaVinci Resolve, or any video editor.
Do you charge per minute?
No. Transcription on Mictoo is free. We are funded by ads at the moment, with a paid Pro tier coming later for users who need longer files or batch uploads.
My episode has explicit language. Will it get censored?
No filtering. The transcript reflects exactly what was said. If you want to edit profanity for a clean version, do that yourself after download.
Can I edit the transcript before downloading?
Yes. There is a basic editor in the result view. Fix any wrong words, then download the edited version as TXT or SRT.
Is podcast transcription on Mictoo compliant with GDPR?
We do not store the audio or the transcript on our servers after you leave the page. We are based in Europe, and our providers (Groq US, OpenAI US) have DPAs in place. For specific compliance questions, see our Privacy Policy or email info@mictoo.com.
Ready to transcribe?
Scroll up and drop your file. Transcript ready in about a minute.