WebM · Loom · OBS · Screen recording

WebM to Text
Screen recordings into searchable transcripts

Loom and OBS exports, Chrome screen captures, browser MediaRecorder downloads. We strip the video, transcribe the audio, return a transcript you can search and quote.

AI summary · Translate, 28 langs · OpenAI Whisper

MP3 · MP4 · WAV · M4A · OGG · WEBM · FLAC · Max 25 MB · Max 30 min (60 min when signed in)

WebM is the format your browser writes when you record audio or video natively. Loom uses it. OBS can. Chrome extensions for screen capture default to it. The audio inside is almost always Opus, which transcribes well, but the video track makes the file big and complicates upload.

Drop the WebM in. We extract the audio server-side and discard the video. The transcript comes back with timestamps that match your original recording, so a quote pasted back into Loom or your video editor still lines up.

Free for files up to 60 MB. Screen recordings often exceed that; see the deep-dive below for an ffmpeg one-liner that extracts audio only and produces a tiny file in seconds.

How it works

📹

Upload your WebM

Loom downloads, OBS exports, Chrome screen recordings, browser MediaRecorder dumps. Audio-only or audio-plus-video both work.

✂️

We pull just the audio

On the server, we demux the WebM to extract the Opus or Vorbis audio track. The video portion (VP8, VP9, or AV1) is ignored entirely. Whisper only ever sees the audio.

🎯

Timestamped transcript ready

Get TXT for the plain text, SRT or VTT to drop straight into your video editor as captions, DOCX for documentation, JSON for word-level alignment.
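
To make the SRT output concrete, here is a minimal Python sketch of how timestamped segments map onto SRT cues. The Segment class and the function names are hypothetical, introduced only for illustration; this is not Mictoo's actual code. It only shows the SRT timestamp layout (HH:MM:SS,mmm) counted from the start of the recording.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed phrase (hypothetical structure for this sketch)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        cues.append(f"{index}\n{time_line}\n{seg.text.strip()}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([
        Segment(0.0, 2.5, "Welcome to the walkthrough."),
        Segment(2.5, 6.0, "First, open the settings page."),
    ]))
```

VTT is close to the same layout: the file starts with a WEBVTT header line and the millisecond separator is a dot instead of a comma.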

Why use Mictoo for WebM recordings

No need to extract audio yourself first

Some transcription tools demand an audio-only file (MP3 or WAV). They will reject a video WebM and tell you to extract the audio in some other app first. We do that step for you, on the server, before transcription.

Timestamps match your original recording

When you download the SRT or VTT, the timestamps line up with your original WebM video timeline. Drop the subtitle file straight into Loom, YouTube, Premiere, DaVinci Resolve, or Final Cut without any time-offset work.

Works with any WebM audio codec

WebM audio is almost always Opus (modern) and occasionally Vorbis (older files). Both decode automatically. The video codec (VP8, VP9, AV1) does not matter for transcription because we discard it.
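
If you want to check which codecs your own WebM contains before uploading, ffprobe (shipped with ffmpeg) can list them as JSON. The sketch below is a local convenience under that assumption, not a description of Mictoo's server code; codecs_by_type is a hypothetical helper name and the sample JSON is illustrative.

```python
import json
from typing import Dict, List

# Command whose JSON output the function below expects (append your file name):
#   ffprobe -v error -show_entries stream=codec_type,codec_name -of json input.webm
FFPROBE_ARGS = [
    "ffprobe", "-v", "error",
    "-show_entries", "stream=codec_type,codec_name",
    "-of", "json",
]


def codecs_by_type(ffprobe_json: str) -> Dict[str, List[str]]:
    """Group codec names by stream type ("audio", "video", ...)."""
    streams = json.loads(ffprobe_json).get("streams", [])
    grouped: Dict[str, List[str]] = {}
    for stream in streams:
        kind = stream.get("codec_type", "unknown")
        grouped.setdefault(kind, []).append(stream.get("codec_name", "unknown"))
    return grouped


if __name__ == "__main__":
    # Illustrative output for a typical screen recording: VP9 video, Opus audio.
    sample = (
        '{"streams": ['
        '{"codec_name": "vp9", "codec_type": "video"}, '
        '{"codec_name": "opus", "codec_type": "audio"}]}'
    )
    print(codecs_by_type(sample))
```

To run it against a real file, pass FFPROBE_ARGS plus the file path to subprocess.run with capture_output=True and feed the captured stdout to codecs_by_type.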

Loom-style recordings transcribe cleanly

Loom records voice from your microphone at the front of the audio mix. That kind of close-mic clean voice is exactly what Whisper handles best, so transcripts of Loom recordings are usually very accurate.

The video stays on your computer

We do not save or share the video portion. The audio is decoded once for transcription and dropped. Your original WebM on your drive is untouched.

Where WebM files come from

Loom screen recordings

Loom is the dominant async-video tool, especially for SaaS teams. When you click Download on a Loom recording, the file is a WebM. Transcribe it for searchable text of your tutorial, bug report, or design walkthrough.

OBS Studio screencasts

OBS defaults to MP4 but is commonly configured to record WebM (especially on Linux). Useful for streamers, tutorial creators, and game developers who want a text version of their commentary.

Browser MediaRecorder API exports

When a web app records audio or video in your browser (interview tool, school assignment recorder, language-learning app), the output is almost always WebM. Drop the downloaded file here.

Chrome and Firefox screen-recorder extensions

Most browser screen-recorder extensions write WebM because it is the format the browser can produce natively. Smaller and faster than running a separate app like OBS.

Riverside, Riverfm, and async-podcast tools

Several async-podcast platforms produce WebM for the local backup recording. Useful when the cloud upload failed and you have only the local-file fallback.

Google Meet recordings (in some cases)

Google Meet usually saves to MP4, but in some configurations (browser-side fallback, specific Workspace setups) the export is WebM. Both work here.

WebM tips that save bandwidth

For large screen recordings, extract audio with ffmpeg first

A 30-minute Loom recording is often 200+ MB because of the video. Extracting just the audio brings it down to roughly 10 to 15 MB, which uploads in seconds. One-liner: ffmpeg -i input.webm -vn -c:a copy audio.webm (-vn drops the video, -c:a copy keeps the audio stream as-is with no re-encoding). Upload audio.webm here.

If the audio codec is Opus, you can also convert to .ogg for compatibility

ffmpeg -i input.webm -vn -c:a copy audio.ogg also works without re-encoding, and the .ogg file may open in more audio players than an audio-only .webm. Mictoo accepts both.
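
If you extract audio often, the two one-liners above can be wrapped in a small Python helper. This is a sketch that assumes ffmpeg is installed and on your PATH; audio_only_command and the "-audio" output naming are hypothetical choices for illustration. The flags are the same ones used above: -vn drops the video and -c:a copy copies the audio stream without re-encoding.

```python
import shlex
from pathlib import Path
from typing import List


def audio_only_command(source: str, container: str = "webm") -> List[str]:
    """Build the ffmpeg argument list that copies the audio track out of a WebM.

    container is "webm" or "ogg"; both can hold Opus or Vorbis, so the audio
    stream is copied as-is.
    """
    if container not in ("webm", "ogg"):
        raise ValueError("container must be 'webm' or 'ogg'")
    src = Path(source)
    # Hypothetical naming scheme: input.webm -> input-audio.webm (or .ogg)
    target = src.with_name(f"{src.stem}-audio.{container}")
    return ["ffmpeg", "-i", str(src), "-vn", "-c:a", "copy", str(target)]


if __name__ == "__main__":
    cmd = audio_only_command("input.webm")
    # Print the command; to actually run it, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.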

Microphone placement matters more than codec

Screen recordings often have great visuals but terrible audio because the microphone was 1 meter away from the speaker. For important recordings, use a headset mic or a close mic. Cleaner audio means more accurate transcripts, regardless of WebM vs MP4 vs anything else.

If you need the video separately, keep your WebM around

We only return the text transcript. If you also want to clip the video, do that in your video editor against the original WebM file. The SRT timestamps refer to that original timeline, so they line up as long as you edit against the same file.

What WebM actually is

WebM is a container format Google released in 2010 as an explicitly open, royalty-free alternative to MP4. It is built on a subset of the Matroska (MKV) container, restricted to specific video codecs (VP8, VP9, more recently AV1) and audio codecs (Vorbis originally, Opus mostly today). The point was to give browsers a format they could play natively without licensing MPEG patents.

It worked. Today every major browser ships with native WebM playback, and the format is the default for video APIs built into the browser itself. When your web app records audio or video using MediaRecorder, WebM is what comes out.

Why audio-only WebM is rare in the wild

Most people get a WebM file from a screen recording, a video call, or a video upload. Audio-only WebM exists (some podcast tools use it) but is much less common. So in practice, when you arrive at this page with a WebM in hand, it almost always has video inside, which is why we lead with "we strip the video".

The audio codec inside is the same Opus codec Telegram uses for voice messages: small, clear, voice-friendly. Whisper handles Opus at 64 to 96 kbps stereo (typical for screen recordings) very well. Your transcript quality depends on microphone placement and room noise, not on Opus vs anything else.

Stripping the video saves a lot of bandwidth

A typical Loom recording at 720p uses 80 to 95% of its bytes for video and only 5 to 20% for audio. So a 200 MB Loom screen recording usually has only 10 to 40 MB of actual audio. The ffmpeg one-liner in the tips section above extracts that audio without re-encoding, in seconds, on any laptop. Drop the extracted audio file here and the upload completes much faster than uploading the original 200 MB video.
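
The audio share can be checked with simple arithmetic: size is bitrate times duration, divided by 8 to turn bits into bytes. The short Python sketch below applies that to the 64 to 96 kbps Opus range mentioned above for a 30-minute recording; audio_size_mb is a hypothetical helper name, and it assumes a constant bitrate and ignores container overhead.

```python
def audio_size_mb(bitrate_kbps: float, minutes: float) -> float:
    """Approximate audio size in MB (1 MB = 1,000,000 bytes) at a constant bitrate."""
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000


if __name__ == "__main__":
    recording_mb = 200  # size of the full screen recording used in the text
    for kbps in (64, 96):
        size = audio_size_mb(kbps, 30)
        share = size / recording_mb * 100
        print(f"{kbps} kbps x 30 min = {size:.1f} MB ({share:.1f}% of {recording_mb} MB)")
```

At 64 kbps this gives about 14.4 MB and at 96 kbps about 21.6 MB, which sits inside the 5 to 20% range quoted for a 200 MB recording.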

WebM vs MP4 for the same recording

Both work in Mictoo. WebM uses Opus audio (slightly more efficient at the same bitrate), MP4 uses AAC audio (better tool support across legacy software). Transcript quality is identical between the two if the source recording quality is the same. The choice between them comes down to what your recording tool happens to export by default.

WebM vs other formats for screen recordings

All four work in Mictoo. WebM is what your browser writes natively, but you can transcribe any of these.

WebM

Container: Matroska subset
Audio codec: Opus (or Vorbis)
Source: Loom, OBS, browser
Strip video: Yes, server-side

MP4

Container: MP4
Audio codec: AAC
Source: Phones, cameras, most editors
Strip video: Yes, server-side

OGG

Container: OGG
Audio codec: Opus, Vorbis, or FLAC
Source: Telegram voice, Linux apps
Strip video: N/A (audio-only)

MP3

Container: None (codec only)
Audio codec: MP3
Source: Podcasts, general audio
Strip video: N/A (audio-only)

Frequently asked questions

My Loom recording is a WebM. Can I transcribe it directly?

Yes. Download the recording from Loom (it will be a .webm), drop it in here. We extract the audio server-side, transcribe with Whisper, and return the transcript. No need to install any audio extraction tool yourself.

Will the timestamps match my original Loom or video editor timeline?

Yes. SRT and VTT timestamps reference the original WebM timeline, starting at 00:00:00. Drop the SRT into Loom, YouTube, Premiere, DaVinci Resolve, or Final Cut and it lines up automatically.

My WebM is too big (200+ MB). What is the fastest fix?

Extract audio only with ffmpeg, no re-encoding required: ffmpeg -i input.webm -vn -c:a copy audio.webm. The audio-only file is typically 5-20% of the size of the full video. Upload audio.webm here and transcription proceeds normally.

Will OBS WebM recordings work?

Yes, both audio-only and video-plus-audio OBS WebMs. OBS sometimes uses Vorbis for audio (older config) or Opus (newer config). We detect and decode both. If your OBS WebM is over 60 MB, extract audio with the ffmpeg one-liner first.

What audio codecs inside WebM do you support?

We support both Opus (the modern default, used by almost all new WebM files) and Vorbis (the older codec, still present in some older recordings). Detection is automatic from the file headers, so you do not need to specify which one.
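
For the curious, here is a rough Python sketch of what header-based detection can look like. It is a byte-scan heuristic, not a real Matroska parser and not necessarily how Mictoo does it. It relies on two facts about the container: every Matroska/WebM file begins with the EBML magic bytes 1A 45 DF A3, and audio tracks carry the codec ID strings A_OPUS or A_VORBIS. The function name and the 64 KB read size are assumptions for illustration.

```python
from typing import Optional

EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # first four bytes of a Matroska/WebM file
AUDIO_CODEC_IDS = {b"A_OPUS": "opus", b"A_VORBIS": "vorbis"}


def sniff_audio_codec(head: bytes) -> Optional[str]:
    """Guess the audio codec from the first bytes of a WebM file.

    Heuristic only: it searches for the codec ID string instead of parsing
    the track headers, and returns None if neither ID appears in head.
    """
    if not head.startswith(EBML_MAGIC):
        raise ValueError("not an EBML (Matroska/WebM) file")
    for codec_id, name in AUDIO_CODEC_IDS.items():
        if codec_id in head:
            return name
    return None


def sniff_file(path: str, read_bytes: int = 64 * 1024) -> Optional[str]:
    """Read the start of a file and sniff its audio codec."""
    with open(path, "rb") as handle:
        return sniff_audio_codec(handle.read(read_bytes))
```

The track headers usually sit near the start of the file, but that is not guaranteed, so a None result from this sketch does not prove the file has no audio.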

What about the video codec (VP8, VP9, AV1) inside WebM?

Irrelevant for transcription. We discard the video track entirely before transcription. Whisper only sees the audio. So you can upload WebM with any video codec without affecting the transcript at all.

Can I upload a recording from a browser app like Riverside?

Yes. Browser-based recording apps (Riverside, Riverfm, Squadcast in some configs) output WebM as the local-backup file. Drop it here for a clean transcript.

My WebM has only audio, no video. Will it still work?

Yes. Audio-only WebM is rarer in the wild but absolutely supported. We handle the file exactly the same way, just without the video-stripping step.

Will the transcript include speaker labels for a multi-person recording?

Not automatically. Whisper does not separate speakers in mixed-audio WebMs. If your recording app gave you separate per-speaker tracks (Riverside, for example), transcribe each speaker track separately and label by hand. Speaker diarization is on the roadmap for the Pro tier.
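
If you do transcribe per-speaker tracks separately, the hand-labelling step can be partly scripted. The Python sketch below interleaves the segments from each track by start time and prefixes each line with a speaker name. It assumes all tracks start at the same moment (as synchronized per-speaker exports normally do); the Segment layout and merge_speaker_tracks are hypothetical names for illustration, and the sample lines are made up.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def merge_speaker_tracks(tracks: Dict[str, List[Segment]]) -> List[Tuple[float, str, str]]:
    """Interleave per-speaker segments into one list ordered by start time.

    Returns (start_seconds, speaker, text) tuples.
    """
    merged = [
        (start, speaker, text)
        for speaker, segments in tracks.items()
        for start, _end, text in segments
    ]
    merged.sort(key=lambda item: item[0])
    return merged


if __name__ == "__main__":
    tracks = {
        "Host": [(0.0, 3.0, "Thanks for joining."), (7.5, 10.0, "Tell me more.")],
        "Guest": [(3.2, 7.0, "Happy to be here.")],
    }
    for start, speaker, text in merge_speaker_tracks(tracks):
        print(f"[{start:6.1f}s] {speaker}: {text}")
```

Overlapping speech simply appears as adjacent lines in start-time order; this sketch does not try to resolve crosstalk.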

Does the video portion go anywhere when I upload?

No. We demux the WebM on the server, send only the audio to the transcription model, and drop the rest from memory. We do not save the video, we do not analyse the video frames, we do not share anything with third parties.

Can I get the transcript synced to the video for captions?

Yes. Download as SRT or VTT and drop it into your video editor or directly into the Loom playback page (if it supports subtitle import). The timestamps reference the original audio timeline, so the captions stay aligned with the video.

How long does a 30-minute WebM screen recording take to transcribe?

If you upload the full video (200+ MB), upload time dominates: usually 30-60 seconds on a typical home connection plus 30-50 seconds of transcription. If you extract audio first (10-15 MB), the whole thing finishes inside 30 seconds end to end.
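
The upload part of that estimate is plain arithmetic: file size in megabits divided by your upload speed. The Python sketch below uses a 40 Mbps uplink, which is an assumed figure for illustration only; real home connections vary widely, so plug in your own speed.

```python
def upload_seconds(size_mb: float, uplink_mbps: float) -> float:
    """Approximate upload time: megabytes * 8 gives megabits, divided by Mbps."""
    if uplink_mbps <= 0:
        raise ValueError("uplink_mbps must be positive")
    return size_mb * 8 / uplink_mbps


if __name__ == "__main__":
    uplink = 40  # assumed upload speed in Mbps, not a measured value
    for label, size_mb in (("full video", 200), ("audio only", 15)):
        print(f"{label}: {size_mb} MB takes about {upload_seconds(size_mb, uplink):.0f} s to upload")
```

Under that assumption the 200 MB video takes about 40 seconds to upload and the 15 MB audio file about 3 seconds, which matches the ranges quoted above.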