Question 1

What is the difference between sentence-level and word-level timestamps?

Accepted Answer

Sentence-level: one timestamp per line of text (usually a sentence). Word-level: one timestamp per word. Sentence-level is readable and good for citation, podcasting, and most video work. Word-level is for music alignment, karaoke videos, and per-word caption animations.
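
As a rough illustration of how the two granularities relate, the sketch below collapses word-level entries into sentence-level ones. The `Word` and `Line` types and the split-on-end-punctuation rule are assumptions made for this example, not the format our export uses.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float


@dataclass
class Line:
    text: str
    start: float
    end: float


def group_into_sentences(words: list[Word]) -> list[Line]:
    """Collapse word-level timestamps into one entry per sentence."""
    lines: list[Line] = []
    current: list[Word] = []
    for word in words:
        current.append(word)
        # Assumed rule: a sentence ends at ".", "?" or "!".
        if word.text.endswith((".", "?", "!")):
            lines.append(_to_line(current))
            current = []
    if current:  # trailing words without closing punctuation
        lines.append(_to_line(current))
    return lines


def _to_line(words: list[Word]) -> Line:
    text = " ".join(w.text for w in words)
    return Line(text, words[0].start, words[-1].end)
```

Each sentence-level entry keeps only the start of its first word and the end of its last word, which is why sentence-level output is so much smaller than word-level output.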

Question 2

How precise are the timestamps?

Accepted Answer

Whisper outputs timestamps in milliseconds. That is finer than a single frame at every common video frame rate (24, 25, 29.97, 30, 50, 60 fps): even at 60 fps one frame lasts about 16.7 ms, so captions line up without a visible offset.
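
To make the frame-rate arithmetic concrete, here is a minimal sketch that maps a millisecond timestamp to a frame index. `ms_to_frame` is a hypothetical helper, not part of our product; it assumes non-negative timestamps, rounds down to the frame on screen at that moment, and treats 29.97 fps as exactly 30000/1001.

```python
from fractions import Fraction

# 29.97 fps is shorthand for the NTSC rate 30000/1001.
NTSC_29_97 = Fraction(30000, 1001)


def ms_to_frame(ms: int, fps) -> int:
    """Return the zero-based index of the frame showing at `ms` milliseconds."""
    rate = Fraction(fps)
    # frame = floor(ms / 1000 * fps), done in integers to avoid float error
    return (ms * rate.numerator) // (1000 * rate.denominator)


# A caption starting at 00:01:23.000 on a 24 fps timeline:
frame_24 = ms_to_frame(83_000, 24)  # 83 s * 24 fps = frame 1992
```

Exact fractions avoid the slow rounding error that builds up if 29.97 is treated as a plain decimal.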

Question 3

Will timestamps drift over a long file?

Accepted Answer

Rarely. Whisper aligns timestamps to the actual audio, so they stay accurate even for 60-minute files. Sub-second drift can appear in the last few segments of very long files. If you notice it, adjust those timestamps manually.

Question 4

Can I get a TXT file with inline timestamps, like [00:01:23] before each line?

Accepted Answer

Yes. The TXT download includes sentence-level timestamps inline, one at the start of each line. Format: [00:01:23] Sentence text here.
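
If you want to produce or read lines in this shape yourself, a small sketch follows. `format_inline` and `parse_inline` are hypothetical helpers written for this example; the only thing taken from our export is the `[HH:MM:SS] text` layout shown above.

```python
import re

_INLINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] (.*)$")


def format_inline(seconds: float, text: str) -> str:
    """Render one transcript line as '[HH:MM:SS] text' (fractions dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}] {text}"


def parse_inline(line: str) -> tuple[int, str]:
    """Split '[HH:MM:SS] text' into (seconds, text)."""
    match = _INLINE.match(line)
    if match is None:
        raise ValueError(f"not an inline-timestamped line: {line!r}")
    hours, minutes, secs, text = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(secs), text
```

Parsing a line back into seconds is handy for citation work, where you want to sort or filter quotes by their position in the recording.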

Question 5

Does the SRT include timestamps?

Accepted Answer

Yes, that is the entire point of SRT format. Each subtitle entry has a start and end timestamp.
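
For reference, an SRT entry is a sequence number, a `start --> end` line with `HH:MM:SS,mmm` timestamps, and the caption text. The sketch below builds one entry; `srt_time` and `srt_entry` are hypothetical helper names, and the sample caption is made up.

```python
def srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, e.g. 00:01:23,450."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    # SRT uses a comma, not a dot, before the milliseconds.
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_entry(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Build one subtitle entry; join entries with a blank line between them."""
    return f"{index}\n{srt_time(start_ms)} --> {srt_time(end_ms)}\n{text}\n"


print(srt_entry(1, 83_450, 86_000, "Sentence text here."))
```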

Question 6

How does this compare to YouTube auto-captions with timestamps?

Accepted Answer

YouTube auto-captions have timestamps but no punctuation and lower accuracy. Ours have full punctuation, better accuracy, and standard SRT output that works in any video editor.

Question 7

Can I jump to a specific timestamp in the audio from the transcript?

Accepted Answer

In our result view, click any timestamp to seek the audio player to that moment. After download, you would need a separate audio player to do this.

Question 8

Will the timestamps work in Premiere or DaVinci Resolve?

Accepted Answer

Yes. Import the SRT into the timeline. Captions appear at the correct moments automatically.

Question 9

What languages are supported for timestamped transcription?

Accepted Answer

The same 50+ languages as plain transcription. Timestamps come automatically with every transcript regardless of language.

Question 10

Is the audio stored?

Accepted Answer

No. The file streams to the transcription provider and is discarded after processing.

Question 11

Can I use word-level timestamps to make a karaoke video?

Accepted Answer

Yes, but you will need video software that can render per-word highlighting from a JSON or SRT file. Some tools (Premiere, After Effects, specialized karaoke software) support this directly.
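
The core of per-word highlighting is finding which word is active at the current playback position. Here is a minimal sketch of that lookup; the `word` / `start` / `end` keys are an assumed shape for word-level data rather than our documented JSON schema, and `active_word_index` is a hypothetical helper.

```python
from bisect import bisect_right

# Assumed word-level data: times in seconds, sorted by start time.
words = [
    {"word": "Twinkle", "start": 0.0, "end": 0.5},
    {"word": "twinkle", "start": 0.5, "end": 1.0},
    {"word": "little", "start": 1.2, "end": 1.8},
]


def active_word_index(words, position):
    """Return the index of the word being sung at `position`, or None in a gap."""
    starts = [w["start"] for w in words]
    # Last word whose start is at or before the playback position.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < words[i]["end"]:
        return i
    return None
```

A renderer would call this once per frame and highlight the returned word. For long songs, build the `starts` list once instead of on every call.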

Question 12

How long does it take to generate timestamped transcripts?

Accepted Answer

The same as plain transcription: about 1 to 2 percent of the audio length. Timestamps come automatically and add no extra processing time.
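
As a quick worked example of that estimate, the sketch below turns the 1 to 2 percent figure into a time range. `estimated_processing_seconds` is a hypothetical helper, and the result is a rough guide rather than a guarantee.

```python
def estimated_processing_seconds(audio_seconds: float) -> tuple[float, float]:
    """Return (low, high) processing-time estimates at 1% and 2% of audio length."""
    return audio_seconds / 100, audio_seconds * 2 / 100


# A 60-minute recording is 3600 seconds of audio.
low, high = estimated_processing_seconds(60 * 60)
print(f"roughly {low:.0f} to {high:.0f} seconds")
```

For a 60-minute file this works out to roughly 36 to 72 seconds.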

Timestamped Transcription
Free Time-Coded Transcripts

How it works

Drop the file

AI transcribes and timestamps

Pick your granularity and download

Why Mictoo for timestamped transcription

Timestamps to the millisecond

Sentence-level by default, word-level when you need it

Free

SRT export for video workflows

Inline timestamps in TXT for citation

No file is stored

What people use timestamped transcripts for

Journalism and citation

Podcast chapter markers

Video editing rough cuts

Academic research and qualitative coding

Music alignment for sing-along videos

Pro tips for timestamped transcription

Sentence-level timestamps work for 95 percent of use cases

Word-level timestamps blow up file size and complexity

For podcasts, generate chapter markers from natural breaks

For journalism, save the timestamp with every quote you might use

SRT timestamps are zero-padded, TXT timestamps are not

For video editing, the timestamps in our SRT line up with the audio in the original file

Timestamps drift on bad audio

Frequently asked questions

