Timestamped Transcription
Free Time-Coded Transcripts
Get accurate timestamps for every line or word in your audio. Jump back to exact moments, cite specific quotes, build chapter markers. Free, no signup.
Drop your file here
or click to browse
MP3 · MP4 · WAV · M4A · OGG · WEBM · FLAC · Max 25MB · Max 30 min (60 min · Sign in)
How it works
Drop the file
MP3, M4A, MP4, WAV, FLAC, OGG, WEBM, AAC. We handle audio and video formats.
AI transcribes and timestamps
Whisper large-v3 generates the transcript with millisecond-precision timestamps for each segment (and optionally each word).
Pick your granularity and download
Choose sentence-level timestamps (most common) or word-level (for precise alignment work). Download as TXT with timestamps inline, SRT for subtitle workflows, or copy to clipboard.
Why Mictoo for timestamped transcription
Timestamps to the millisecond
Whisper outputs timestamps with millisecond precision. More precise than any common video frame rate, more than enough for citation work.
Sentence-level by default, word-level when you need it
Sentence-level keeps transcripts readable. Word-level is overkill for note-taking but essential for video editing and music alignment.
Free
No per-minute meter. No "timestamps cost extra" tier. Same price as plain transcription (free).
SRT export for video workflows
Timestamps in SRT format work directly in Premiere, DaVinci, CapCut, and YouTube Studio as subtitle tracks.
Inline timestamps in TXT for citation
Plain text with [00:01:23] markers at the start of each segment. Easy to paste into research notes, blog posts, or journalism drafts.
No file is stored
The audio streams to the transcription provider and is discarded. Nothing kept on our servers.
What people use timestamped transcripts for
Journalism and citation
Quoting a source from an interview? Put the timestamp next to the quote in your notes. When the editor or fact-checker asks "where did they say that exactly", you have the answer in two seconds.
Podcast chapter markers
Generate the transcript, scan for natural section breaks, copy the timestamps into your podcast host chapter feature. Modern players show chapters in the playback bar.
Video editing rough cuts
Get the transcript, mark the lines you want to keep, find them on the timeline by timestamp. "Paper editing" is much faster than scrubbing.
Academic research and qualitative coding
Researchers using NVivo, Atlas.ti, or MAXQDA tag transcript segments with codes. Timestamps let you jump back to the audio for the exact moment when coding ambiguous passages.
Music alignment for sing-along videos
Word-level timestamps for karaoke-style or lyric-video projects. Each word lights up at the exact moment it is sung.
Pro tips for timestamped transcription
Sentence-level timestamps work for 95 percent of use cases
Unless you are doing music alignment or per-word video captioning, sentence-level is what you want. More readable, easier to edit.
Word-level timestamps blow up file size and complexity
A word-level SRT for a 30-minute talk has thousands of entries. Use it only when you actually need per-word precision.
For podcasts, generate chapter markers from natural breaks
Look at the transcript for topic transitions, agenda changes, or guest intros. Copy those timestamps into your podcast host as chapter markers.
For journalism, save the timestamp with every quote you might use
Future you, 3 weeks later, will not remember which interview a quote came from, let alone where in the interview. The timestamp solves that.
SRT timestamps are zero-padded, TXT timestamps are not
SRT uses 00:01:23,456. TXT typically uses [1:23]. If you are pasting into a CMS that expects one format, convert before pasting.
For video editing, the timestamp in our SRT lines up against the audio in the original file
If you re-export your video at a different frame rate, the timestamps still match because they are in absolute time (milliseconds), not frames.
Timestamps drift on bad audio
When Whisper hallucinates words during music or silence, the timestamps for those phantom words are best-guess. Real speech timestamps remain accurate. Just trust speech sections, ignore music sections.
Frequently asked questions
What is the difference between sentence-level and word-level timestamps?
Sentence-level: one timestamp per line of text (usually a sentence). Word-level: one timestamp per word. Sentence-level is readable and good for citation, podcasting, and most video work. Word-level is for music alignment, karaoke videos, and per-word caption animations.
How precise are the timestamps?
Whisper outputs timestamps in milliseconds. They line up correctly at every common video frame rate (24, 25, 29.97, 30, 50, 60 fps) without offset.
Will timestamps drift over a long file?
Rare. Whisper aligns timestamps to the actual audio, so they stay accurate even for 60-minute files. Sub-second drift can happen on the last few segments of very long files. If you notice it, adjust manually.
Can I get a TXT file with inline timestamps, like [00:01:23] before each line?
Yes. Download as TXT and we include sentence-level timestamps inline. Format: [00:01:23] Sentence text here.
Does the SRT include timestamps?
Yes, that is the entire point of SRT format. Each subtitle entry has a start and end timestamp.
How does this compare to YouTube auto-captions with timestamps?
YouTube auto-captions have timestamps but no punctuation and lower accuracy. Ours have full punctuation, better accuracy, and standard SRT output that works in any video editor.
Can I jump to a specific timestamp in the audio from the transcript?
In our result view, click any timestamp to seek the audio player to that moment. After download, you would need a separate audio player to do this.
Will the timestamps work in Premiere or DaVinci Resolve?
Yes. Import the SRT into the timeline. Captions appear at the correct moments automatically.
What languages are supported for timestamped transcription?
The same 50+ languages as plain transcription. Timestamps come automatically with every transcript regardless of language.
Is the audio stored?
No. The file streams to the transcription provider and is discarded after processing.
Can I use word-level timestamps to make a karaoke video?
Yes, but you will need video software that can render per-word highlighting from a JSON or SRT format. Some tools (Premiere, After Effects, specialized karaoke software) support this directly.
How long does it take to generate timestamped transcripts?
The same as plain transcription, about 1 to 2 percent of audio length. Timestamps come automatically, no extra processing time.
Ready to transcribe?
Scroll up and drop your file. Transcript ready in about a minute.
↑ Back to the uploader