mictoo
SRT · VTT · Free · No signup

Free SRT Generator
For YouTube, Premiere, Final Cut, DaVinci, CapCut

Upload audio or video, get a clean .srt file with Whisper-quality timestamps in seconds. Ready to drop into your YouTube upload, NLE timeline, or web player. VTT also available.

AI summary · Translate, 28 langs · OpenAI Whisper

MP3 · MP4 · WAV · M4A · OGG · WEBM · FLAC  ·  Max 25MB  ·  Max 30 min (60 min · Sign in)

Got a bigger file? See how to compress.

Got a longer recording? See how to split.

SRT (SubRip Subtitle) is the most widely supported subtitle format in 2026. YouTube accepts it on every upload. Every major video editor (Premiere, Final Cut, DaVinci Resolve, CapCut, Vegas Pro) imports it as a captions track. Web video players accept either SRT or VTT, and Mictoo generates both from the same transcription.

This page is dedicated to the subtitle workflow: get a clean .srt file as the primary deliverable, with timing tight enough for caption display and formatting that drops straight into the destination without manual cleanup.

If you need the transcript as plain text or a doc, the audio to text page is the right entry. For video files specifically, see video to text. For YouTube URLs, see YouTube to text.

How it works

🎬

Upload audio or video

Audio (MP3, M4A, WAV, FLAC, OGG, AAC) or video (MP4, MOV, WebM, MKV, AVI). Or paste a YouTube URL. We extract the audio track if it is a video file. Free tier up to 60 MB per upload.

⏱️

Whisper produces timestamped segments

Word-level timestamps grouped into caption-sized segments (2-7 seconds each, max 84 characters per segment, which is two lines of up to 42 characters). Cuts land on sentence and clause boundaries so subtitles read naturally rather than breaking mid-phrase.

📥

Download .srt or .vtt

SRT for video editors and YouTube uploads. VTT for HTML5 video and WebVTT-compliant players. Both formats hold the same content; pick whichever your destination needs.

Why Mictoo for SRT generation

Whisper-quality captions, not auto-transcribed slop

YouTube auto-captions and many quick caption tools use older, weaker ASR. Mictoo uses Whisper large-v3, the same model used by professional caption services. Proper nouns, technical terms, and accented speakers transcribe noticeably better.

Tight timing aligned to caption display conventions

Caption segments are sized 2-7 seconds and break on natural pause boundaries. Lines stay under 42 characters where possible (the BBC/Netflix caption standard). Reads cleanly on screen without manual re-segmentation.

SRT or VTT, your choice

Same source transcription, two output formats. SRT is the universal video editor format. VTT is the W3C standard for HTML5 video and the only format YouTube returns for the WebVTT-style "user captions" API. We provide both.

Free with no signup and no watermark

The .srt file is plain text with no watermark, no "generated by" tag, no embedded ad. The transcript inside is yours to use however you like, with no attribution requirement.

Translation to 50+ languages for multi-language subtitles

After the original SRT generates, translate the transcript to another language and download a translated SRT. Useful for creating subtitle tracks in multiple languages from one source recording.

When you need an SRT specifically

YouTube uploads with manual captions

YouTube auto-captions are visibly worse than Whisper. Generate the SRT here, upload as "manual captions" alongside the video. Cleaner text, fewer cringe-worthy proper-name errors.

Premiere Pro / Final Cut / DaVinci Resolve captions

Import the .srt as a captions track in your NLE. Adjust timing or styling inside the editor, burn into the video or export as a separate captions file alongside the master.

CapCut and mobile video editor subtitles

CapCut, InShot, and similar mobile editors import SRT subtitles. Generate the SRT here on a laptop or desktop, transfer to the phone, drop into the project as a subtitle track.

HTML5 video on your own website

For self-hosted video, use the VTT download (the spec-defined format for HTML5 <track> elements). Drop into your video player and the captions render natively in browsers.

Accessibility compliance (WCAG / ADA)

Captions are a baseline accessibility requirement for video content. Generate accurate SRT captions for every published video to meet WCAG and similar accessibility standards.

Translated subtitle tracks for international audience

Generate original-language SRT, translate to target language, download translated SRT. Drop both into your video as alternative subtitle tracks for viewers in different languages.

SRT generation tips

1

For YouTube, upload manually rather than auto-translate

YouTube can auto-translate captions, but the result is inferior to running Mictoo translation first and uploading the translated SRT manually. Auto-translate works from YouTube auto-captions (already weaker), compounding errors.

2

For NLEs that support styled subtitles, plain SRT is fine

SRT does not encode styling (color, font, position). Most NLEs (Premiere, Final Cut, DaVinci Resolve, CapCut) apply their own styling on import. You set color and font inside the editor; the SRT just provides text and timing.

3

For shorter or longer captions, edit segment lengths post-generation

Whisper segments default to 2-7 seconds. If you need shorter (TikTok/Reels style, often 1-2 seconds) or longer (lecture replay, 8-15 seconds), open the .srt in any text editor and adjust the timestamps directly. The SubRip format is simple enough to edit by hand.

4

Validate the SRT before uploading to YouTube

YouTube silently rejects subtly malformed SRT. Run the downloaded file through an SRT validator (free web tools) to catch missing blank lines between segments or malformed timestamps before the upload.
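
If you would rather check the file locally, here is a minimal sketch in Python. It assumes the rules that most often break an upload: sequential numbering, HH:MM:SS,mmm timestamps, and a blank line between segments. The name validate_srt is a hypothetical helper, not part of Mictoo or YouTube, and it does not reproduce YouTube's own validation.

```python
import re

# One timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING_RE = re.compile(
    r"(\d{2}):([0-5]\d):([0-5]\d),(\d{3}) --> "
    r"(\d{2}):([0-5]\d):([0-5]\d),(\d{3})"
)


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def validate_srt(text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: list[str] = []
    # Drop a leading byte order mark and treat CRLF and LF the same way.
    normalized = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", normalized)

    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"block {expected}: expected index, timing, and text lines")
            continue
        if lines[0].strip() != str(expected):
            problems.append(f"block {expected}: index is {lines[0]!r}, expected {expected}")
        match = TIMING_RE.fullmatch(lines[1].strip())
        if not match:
            problems.append(f"block {expected}: malformed timing line {lines[1]!r}")
            continue
        fields = match.groups()
        if _to_ms(*fields[:4]) >= _to_ms(*fields[4:]):
            problems.append(f"block {expected}: start time is not before end time")
        # A timing line inside the caption text usually means a missing blank line.
        if any(TIMING_RE.fullmatch(line.strip()) for line in lines[2:]):
            problems.append(f"block {expected}: second timing line found (missing blank line?)")
    return problems
```

Read the file as UTF-8, pass the text to validate_srt, and fix anything it reports before uploading.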

What an SRT file actually contains

SubRip (.srt) is one of the simplest possible subtitle formats: a plain text file with numbered segments, each containing a start timestamp, an end timestamp, and one or two lines of caption text. Two consecutive segments look like this:

1
00:00:01,000 --> 00:00:03,500
Welcome to the show.

2
00:00:03,600 --> 00:00:06,200
Today we talk about
the subtitle generator workflow.

That is the whole format. No styling, no positioning, no font specification. The simplicity is why it works everywhere: parsing SRT is trivial enough that even indie video tools implement it without fuss.
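
To show how little there is to it, here is a minimal Python sketch that writes the format described above. The helper names format_timestamp and to_srt are illustrative, and the (start, end, text) tuple layout is an assumption for the example, not Mictoo's internal representation.

```python
def format_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[tuple[int, int, str]]) -> str:
    """Serialize (start_ms, end_ms, text) cues as SubRip text."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Each block ends with a newline, so joining on a newline leaves
    # exactly one blank line between consecutive segments.
    return "\n".join(blocks)
```

Passing the two cues from the sample above, with times given in milliseconds, reproduces that sample.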

SRT vs VTT: what is the actual difference

VTT (WebVTT) is the W3C standard for HTML5 video captions. It adds optional styling (positioning, colors, classes) and metadata. For the basic case (text with timestamps), VTT is almost the same as SRT: it starts with a WEBVTT header line, and timestamps use a period instead of a comma before the milliseconds.

Practical choice: use SRT if your target is a video editor or YouTube. Use VTT if your target is HTML5 video on your own website (the standard HTML <track> element expects VTT). Mictoo offers both downloads from the same source transcription.
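
For the basic case, converting between the two is mostly mechanical. Here is a rough Python sketch, assuming plain SRT input with no styling tags. The name srt_to_vtt is a hypothetical helper, not how Mictoo produces its VTT download.

```python
import re

# Matches a full SRT timing line and captures the parts around each comma.
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert basic SRT text to WebVTT: add the header, swap comma for period."""
    lines = srt_text.replace("\r\n", "\n").strip().split("\n")
    out = ["WEBVTT", ""]  # header line followed by a blank line
    for line in lines:
        match = _TIMING.fullmatch(line.strip())
        if match:
            start_hms, start_ms, end_hms, end_ms = match.groups()
            line = f"{start_hms}.{start_ms} --> {end_hms}.{end_ms}"
        # Segment numbers are kept; WebVTT accepts them as cue identifiers.
        out.append(line)
    return "\n".join(out) + "\n"
```

Only lines that consist entirely of a timing pattern are rewritten, so commas inside caption text are left alone.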

How Whisper produces caption timing

Whisper outputs word-level timestamps for the whole transcription. We group consecutive words into caption segments using a few rules: keep segments under ~84 characters (so they fit on two lines of typical caption display), break at sentence and clause boundaries where possible, keep individual segments between 2 and 7 seconds. The resulting segments read naturally on screen rather than ending mid-clause.
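
The grouping rules above can be sketched roughly as follows. This is an illustrative greedy pass in Python, not Mictoo's actual implementation: the Word and Segment types, the group_words name, and the choice of punctuation marks that count as a clause boundary are all assumptions made for the example.

```python
from dataclasses import dataclass

MAX_CHARS = 84       # two lines of typical caption display
MIN_SECONDS = 2.0    # do not end a segment at punctuation before this
MAX_SECONDS = 7.0    # never let a segment run longer than this
BREAK_CHARS = (".", "?", "!", ",", ";", ":")  # assumed boundary marks


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Segment:
    text: str
    start: float
    end: float


def group_words(words: list[Word]) -> list[Segment]:
    """Greedily group timestamped words into caption-sized segments."""
    segments: list[Segment] = []
    current: list[Word] = []

    def flush() -> None:
        if current:
            text = " ".join(w.text for w in current)
            segments.append(Segment(text, current[0].start, current[-1].end))
            current.clear()

    for word in words:
        if current:
            length = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            duration = word.end - current[0].start
            # Hard limits: close the segment before it overflows.
            if length > MAX_CHARS or duration > MAX_SECONDS:
                flush()
        current.append(word)
        duration = current[-1].end - current[0].start
        # Soft rule: end at punctuation once the segment is long enough.
        if word.text.endswith(BREAK_CHARS) and duration >= MIN_SECONDS:
            flush()
    flush()
    return segments
```

A greedy pass like this respects the upper limits but cannot guarantee the 2-second minimum: the last segment, or one cut short by a hard limit, can come out shorter.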

Timestamp accuracy is typically within 100-300 ms of the actual word boundaries, which is comfortable for caption display (viewers tolerate small drift, especially when captions appear slightly before the speech).

Why "burned-in" captions are different

SRT files are external captions: the .srt file lives alongside the video, and the player or editor renders the text on top. Burned-in captions are pixels baked into the video frames during render. Burned-in captions cannot be turned off, cannot be translated, cannot be re-edited. External captions (SRT or VTT) can be toggled, replaced with translated versions, or edited without re-rendering.

For most use cases (YouTube, web video, NLE projects), external SRT captions are preferred for the flexibility. For platforms that do not support uploadable captions (some social platforms, downloaded video for offline viewing), burn the captions in during the video editor export, using the SRT as the source for caption text.

Common SRT pitfalls and how to avoid them

Missing blank line between segments: SRT requires a single blank line between numbered segments. Some tools omit it and the file silently fails to parse in strict players. Mictoo emits properly formatted SRT with blank lines.

Wrong line ending convention (CRLF vs LF): SRT has no formal specification, and most parsers tolerate either. YouTube and most NLEs handle both. Some older Windows-only tools require CRLF. Mictoo emits LF by default; convert with a text editor if your target tool needs CRLF.

Encoding: SRT files should be UTF-8 for non-ASCII characters (accented letters, non-Latin scripts, emoji). Mictoo emits UTF-8. If you see "garbled accents" in your destination tool, it is reading the file as Latin-1 or Windows-1252 instead of UTF-8.
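
If a text editor is not handy, the line-ending and encoding fixes above can also be scripted. Here is a minimal Python sketch: normalize_srt_bytes is a hypothetical helper, the Windows-1252 fallback is an assumption about where a mis-encoded file most likely came from, and the optional byte order mark is there because some tools appear to rely on it to detect UTF-8.

```python
def normalize_srt_bytes(raw: bytes, crlf: bool = False, bom: bool = False) -> bytes:
    """Return the SRT content as UTF-8 with consistent line endings."""
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: a file that is not valid UTF-8 is Windows-1252.
        text = raw.decode("cp1252", errors="replace")

    # Collapse every line ending to LF first, then expand to CRLF if requested.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if crlf:
        text = text.replace("\n", "\r\n")

    return text.encode("utf-8-sig" if bom else "utf-8")
```

Read the file in binary mode, pass the bytes through normalize_srt_bytes, and write the result back in binary mode so Python does not translate the line endings a second time.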

Frequently asked questions

What input formats does the SRT generator accept?

Audio (MP3, M4A, WAV, FLAC, OGG, AAC) or video (MP4, MOV, WebM, MKV, AVI). For video, we extract the audio track on our side. You can also paste a YouTube URL instead of uploading. Free tier up to 60 MB per file.

What is the difference between SRT and VTT?

SRT is the universal subtitle format for video editors and YouTube. VTT is the W3C standard for HTML5 video on the web. Content is nearly identical; VTT supports optional styling and positioning. Mictoo provides both from the same source transcription.

Can I upload the SRT directly to YouTube?

Yes. In YouTube Studio, open the video, go to Subtitles, click Add, pick "Upload file", choose "With timing", and select the .srt. The captions appear within a few minutes. Higher quality than the YouTube auto-captions.

Will the SRT import into Premiere or Final Cut?

Yes. Both Premiere Pro and Final Cut Pro import SRT as a captions track. In Premiere: File › Import › select .srt. In Final Cut: drag the .srt onto the timeline. DaVinci Resolve and CapCut work similarly.

Are the timestamps frame-accurate?

They are within 100-300 ms of word boundaries, which is comfortable for caption display. For frame-accurate sync (broadcast captioning standard), edit the timestamps inside your NLE after import. For YouTube, web video, and most production use, the default timing is tight enough.

Can I generate SRT in languages other than English?

Yes. Whisper large-v3 supports 50+ languages for transcription. For short files, set the language manually in the dropdown for cleaner first-pass detection. The .srt output uses UTF-8 encoding so non-Latin scripts render correctly.

Can I generate translated SRT for multilingual subtitles?

Yes. Generate the original-language SRT first. Click Translate, pick a target language, and download the translated SRT. Useful for adding alternative subtitle tracks in multiple languages from one source recording.

Does the SRT contain speaker labels?

No. Whisper does not currently distinguish speakers in the transcript. Speaker diarisation is on our Pro tier roadmap. For now, captions are continuous text without "Speaker 1: ..." prefixes.

What if YouTube rejects my uploaded SRT?

Usually a formatting issue. Open the .srt in a text editor and check that segments are numbered sequentially, separated by blank lines, with timestamps in HH:MM:SS,mmm format. If the file looks fine to you but YouTube still rejects it, run it through a free SRT validator.

Will my uploaded audio or video be saved?

No. The file streams to the transcription provider, gets processed once, and is dropped from memory. We do not store the audio or video. The text transcript and SRT are downloaded directly to your browser and never written to disk on our side.

Generate clean SRT subtitles in seconds

Upload audio or video, paste a YouTube URL. Get .srt or .vtt ready for YouTube, your NLE, or your web player.

Generate SRT now