mictoo
WAV · PCM · BWF · Free

WAV to Text
Transcribe any WAV in seconds

Drop a WAV from your DAW, field recorder, or interview rig. We turn it into an editable transcript with timestamps and exports for TXT, SRT, VTT, and DOCX.

AI summary · Translate, 28 langs · OpenAI Whisper

MP3 · MP4 · WAV · M4A · OGG · WEBM · FLAC  ·  Max 25MB  ·  Max 30 min (60 min · Sign in)

Got a bigger file? See how to compress.

Got a longer recording? See how to split.

WAV files preserve the original audio without compression, which is exactly why your recorder, DAW, or studio rig probably saved one. That same property is why WAVs get huge fast. Mictoo accepts them directly so you do not have to convert before transcribing.

Drop the file in, get back an editable transcript with timestamps, an AI summary, and one-click exports to TXT, SRT, VTT, or DOCX. Useful for interviews, podcast show notes, lecture archives, field-recording logs, and DAW-bounce captions.

Free for files up to 60 MB. For longer studio bounces or multi-hour lectures, see how to compress audio or how to split audio before uploading.

How it works

📂

Upload your WAV

PCM from 8-bit to 32-bit float, mono or stereo, sample rates from 8 kHz to 192 kHz. Broadcast Wave (BWF) from Sound Devices, Zaxcom, and Tascam pro recorders works the same way.

AI transcribes the speech

Whisper large-v3 reads the audio and converts speech to text. A 30-minute file usually finishes inside one minute. Upload speed is the bottleneck for large WAVs.

📋

Edit and export

Fix wrong words inline, then download TXT, SRT, VTT, or DOCX. Copy to clipboard if you just need the text. AI summary appears next to the transcript automatically.

Why use Mictoo for WAV files

Direct WAV transcription, no manual conversion

Some free transcribers reject WAV and tell you to convert to MP3 first. Mictoo accepts standard PCM WAV directly, including 24-bit and 32-bit float. One less step in your workflow.

PCM and Broadcast Wave (BWF) both work

BWF files from professional field recorders carry timecode and scene metadata in extra chunks. We read the audio, ignore the metadata chunks, and never write your original file back. Your timecode stays intact on your drive.

Sample rates and bit depths we actually handle

PCM 8-bit, 16-bit, 24-bit, 32-bit integer, and 32-bit float. Mono and stereo. Sample rates from 8 kHz to 192 kHz. Multi-channel WAVs get downmixed automatically before transcription.
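If you are not sure what a given WAV contains, a short script can report it before you upload. The following is a minimal sketch using Python's standard `wave` module; the function name `describe_wav` is hypothetical, and it says nothing about how Mictoo inspects files on its side.

```python
import wave


def describe_wav(path_or_file):
    """Return basic format facts for an integer-PCM WAV file.

    Note: the standard-library wave module reads integer PCM only.
    A 32-bit float WAV raises wave.Error here, even though it is a valid file.
    """
    with wave.open(path_or_file, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

Calling `describe_wav("interview.wav")` (a placeholder file name) returns a dictionary with the channel count, bit depth, sample rate, and duration in seconds.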

Useful exports out of the box

Download as TXT for plain text, SRT or VTT for subtitles aligned to your timestamps, or DOCX for ready-to-edit Word documents. Copy to clipboard when you just want to paste somewhere.

Practical guidance for large WAV files

WAV is uncompressed, so files get big quickly. When yours is over our 60 MB cap, we tell you up front and walk you through the standard ffmpeg or Audacity recipe to bring it down without losing transcript quality.

Where WAV files come from

Interviews

Reporters and researchers capture interviews on handheld recorders (Zoom H5, H6, Tascam DR-40X) that default to WAV. The transcript becomes the source of pull quotes, citations, and the article draft.

Podcasts

When you bounce a finished episode in Logic, Reaper, or Pro Tools, the master is usually 24-bit WAV. Upload that WAV (not the MP3 you publish) for the cleanest transcript, which becomes your show notes and SEO-friendly episode page.

Lectures

Teachers recording into Audacity with a USB mic end up with mono WAV files. Transcribe each lecture to build a searchable archive, give it to students as captions, or feed it into an LMS.

Field recordings

Documentary and nature recordists use Sound Devices or Zaxcom rigs that output Broadcast Wave with timecode. The transcript provides scene-level logs you can match against your timecode without touching the original file.

DAW and studio bounces

Audiobook narrators, voiceover artists, and video editors all bounce 24-bit WAV intermediates. Use the WAV transcript to generate matching captions before the file gets compressed for delivery.

Archival audio

Libraries, museums, and family archive projects standardise on 24-bit WAV for long-term preservation. Run each WAV through transcription once and the archive becomes full-text searchable forever.

Recommended WAV settings for transcription

1

Aim for 16 kHz mono, 16-bit PCM

Whisper resamples to 16 kHz mono internally before transcription. Doing it on your side first makes the file about 9 times smaller than a 48 kHz stereo 24-bit original (288 kB per second versus 32 kB per second), with no meaningful difference for clean speech. ffmpeg one-liner: ffmpeg -i input.wav -ac 1 -ar 16000 -sample_fmt s16 output.wav.

2

Trim silence at the start and end

Field recorders often leave 30-60 seconds of dead air before and after the actual content. Audacity → Effect → Truncate Silence with default settings handles it quickly. Saves your 60 MB budget for words that matter.

3

Keep the original WAV in your project folder

The downsampled file is only for upload. Your original 24-bit master stays untouched on your drive for any future re-edit, archive copy, or higher-quality export.

4

For very long files, use a temporary MP3

A 90-minute mono 16-bit 16 kHz WAV is still 173 MB. For lectures or long-form podcasts, re-encode to a 64 kbps mono MP3 just for the upload. The MP3 is around 43 MB and transcribes with no meaningful quality difference for clean speech.

5

For noisy WAVs, denoise before upload

Background noise (wind, HVAC, room rumble, tape hiss) reduces accuracy more than any setting choice. Run the WAV through Audacity → Effect → Noise Reduction, or use the free Adobe Podcast Enhance web tool. Then upload the cleaned WAV.
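To see how much dead air the silence-trimming tip above would remove, you can measure it before opening Audacity. The sketch below is an illustration only, not what Audacity's Truncate Silence does internally: it assumes 16-bit PCM input, and the function name, the `threshold` of 500 (out of 32767), and the 20 ms window are all hypothetical choices you would tune per recording.

```python
import array
import sys
import wave


def find_speech_bounds(path_or_file, threshold=500, window_ms=20):
    """Return (start_s, end_s) of the audio between leading and trailing silence.

    Returns None if no window exceeds the threshold. 16-bit PCM only.
    """
    with wave.open(path_or_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian

    # Window length in interleaved samples (frames per window * channels).
    step = max(1, rate * window_ms // 1000) * channels
    loud = [
        i for i in range(0, len(samples), step)
        if max(abs(s) for s in samples[i:i + step]) > threshold
    ]
    if not loud:
        return None
    start_s = loud[0] / channels / rate
    end_s = min(len(samples), loud[-1] + step) / channels / rate
    return start_s, end_s
```

The returned pair gives the second marks where the first and last non-silent windows sit, so you can trim just outside them in your editor. A peak threshold is crude: a noisy room may never drop below it, in which case raise the threshold or denoise first.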

WAV files in plain language

A WAV file is, in the standard case, raw uncompressed PCM audio with a small header on top. There is no codec, no perceptual model, no compression. The bytes in the file are the recording. That simplicity is why every DAW and field recorder on the planet can export WAV without negotiation, and it is also why WAV files are so much larger than MP3 or M4A files of the same length.

Why WAV is so large

File size is determined almost entirely by three numbers multiplied by the duration: sample rate (how many samples per second), bit depth (how many bits per sample), and channel count (mono or stereo). A one-minute stereo CD-quality recording (44.1 kHz, 16-bit, two channels) is about 10.6 MB. A one-minute 24-bit 96 kHz stereo field recording is around 35 MB. A one-hour 32-bit float stereo master at 48 kHz lands near 1.4 GB. WAV does not compress, so those numbers scale linearly with duration.
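The size arithmetic is simple enough to check yourself. Here is a minimal sketch, with a hypothetical function name `wav_bytes`, that multiplies the three numbers by the duration. It ignores the WAV header and any metadata chunks, which add a negligible amount.

```python
def wav_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Approximate size of uncompressed PCM audio data in bytes.

    Header and metadata chunks are ignored; they are tiny next to the audio.
    """
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


def as_megabytes(num_bytes):
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000
```

For example, `wav_bytes(44_100, 16, 2, 60)` gives 10,584,000 bytes for one minute of CD-quality stereo. That is about 10.6 MB in decimal units, or 10.1 MiB if you count in binary units, which is why different tools report slightly different sizes for the same file.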

What this means for speech recognition

Whisper large-v3 (the model we run) resamples whatever you give it to 16 kHz mono before the first inference step. A 192 kHz 32-bit float multi-channel WAV ends up shaped exactly the same as a 16 kHz mono phone call by the time the model sees it. In our testing, the transcript quality difference between a 16 kHz mono WAV and a 96 kHz 24-bit stereo WAV of the same speech is statistically zero. What changes is your upload time and your file-size budget.

When uncompressed actually helps

There is one situation where WAV beats a low-bitrate MP3 for transcription: marginal audio. Very quiet voices, heavy ambient noise, dropouts from a flaky lavalier. MP3 encoders at low bitrates throw away exactly the high-frequency tail Whisper sometimes uses to disambiguate fricatives (s, f, sh sounds). If you already have a recording that transcribes poorly as MP3, the WAV version sometimes recovers words the compressed copy missed. For clean studio audio at any reasonable bitrate, you will not see the difference.

The Broadcast Wave (BWF) variant

Professional field recorders (Sound Devices, Zaxcom, recent Tascam and Zoom pro models) write Broadcast Wave, which is a regular WAV with extra metadata chunks: the bext chunk holds timecode and originator info, iXML carries scene and take numbers, sometimes there is a chna chunk for multi-channel naming. Mictoo reads BWF files the same as any other WAV. The metadata is ignored for transcription purposes, the audio is transcribed, and your original file on your drive is never touched or rewritten.
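To check whether a file carries BWF metadata, you can list its top-level RIFF chunks. This is a minimal sketch based on the standard RIFF layout (a 4-byte chunk ID, a 4-byte little-endian size, then the data padded to an even length). The function name is hypothetical, it does not handle RF64/BW64 files, and it only reads the file.

```python
import struct


def list_riff_chunks(stream):
    """Return [(chunk_id, size), ...] for the top-level chunks of a RIFF/WAVE stream.

    The stream must be opened in binary mode and be seekable.
    """
    riff, _total_size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    chunks = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file
        chunk_id, size = struct.unpack("<4sI", header)
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        # Skip the chunk body; odd-sized chunks carry one padding byte.
        stream.seek(size + (size & 1), 1)
    return chunks
```

Run it with `with open("take.wav", "rb") as f: print(list_riff_chunks(f))` (the file name is a placeholder). A Broadcast Wave file typically shows a `bext` entry alongside the usual `fmt ` and `data` chunks, and a plain WAV shows only the latter two.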

WAV vs other audio formats for transcription

All four formats work in Mictoo. Here is a practical comparison so you can pick the right starting format.

WAV
Size: Largest
Quality: Uncompressed
Best for: Studio, BWF, archival
Transcription: Works directly; downsample first if over 60 MB

MP3
Size: Smallest
Quality: Lossy (good at 128 kbps+)
Best for: Podcasts, long files, uploads
Transcription: Same accuracy as WAV for clean speech

FLAC
Size: About half of WAV
Quality: Lossless compressed
Best for: Audiophile archives, CD rips
Transcription: Identical to WAV, smaller file

M4A
Size: Small
Quality: Lossy AAC (very efficient)
Best for: iPhone Voice Memos, Apple ecosystem
Transcription: Same accuracy as WAV in practice

Need to convert before uploading? See how to compress audio.

Frequently asked questions

Can I transcribe a WAV for free?

Yes. Mictoo is free for files up to 60 MB. No signup needed, no watermark on exports, no upsell after the first transcription. For long studio bounces or multi-hour recordings, downsample to 16 kHz mono or re-encode to a low-bitrate mono MP3 to stay under the cap.

Is WAV better than MP3 for transcription accuracy?

For clean speech at any reasonable MP3 bitrate (128 kbps or above), no meaningful difference. For noisy, low-gain, or otherwise marginal recordings, WAV can sometimes recover words a low-bitrate MP3 would miss. Most podcast and interview audio falls in the first category.

What are the best WAV settings for transcription?

16 kHz mono, 16-bit PCM is the practical sweet spot. Whisper resamples to that internally anyway. Higher sample rates and bit depths make the file larger without improving the transcript. Keep your original studio-quality WAV in your project folder, and use the downsampled version only for upload.

Do you support 24-bit and 32-bit float WAV?

Yes. Both work directly. Internally we normalise to 16-bit before sending to the speech model, which matches what Whisper expects. The extra bit depth gives you editing headroom in your DAW, but does not change the transcript.

Do you support Broadcast Wave (BWF) files?

Yes. BWF is a standard WAV with extra metadata chunks (bext, iXML, chna). We read the audio and ignore the metadata. The original file on your drive stays untouched, including all timecode and scene/take info.

Will WAV files from my Zoom, Tascam, or Sound Devices recorder work?

Yes. Zoom H1n, H5, H6, H8, Tascam DR-40X, DR-100mkIII, Portacapture X8, and Sound Devices MixPre / Scorpio all default to standard or Broadcast Wave. Drop the file straight in, no conversion needed.

What about exports from Pro Tools, Logic, Reaper, or Audacity?

All four export standard PCM WAV by default. Pro Tools and Logic typically write 24-bit at the session sample rate, Reaper behaves similarly, and Audacity writes whatever bit depth you configured. Mictoo accepts all of them as-is.

My WAV is over the 60 MB limit, what do I do?

WAV does not compress, so size scales with sample rate, bit depth, channel count, and duration. A 30-minute stereo 24-bit 48 kHz file is around 518 MB. Three fixes, in order: (1) downsample to 16 kHz mono 16-bit, which shrinks a file like that about 9x (more at higher sample rates) with no transcript quality loss for clean speech; (2) trim leading and trailing silence with Audacity Truncate Silence; (3) for very long files, re-encode to a 64 kbps mono MP3 just for the upload. See our compress-audio and split-audio guides for exact steps.
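The order of those fixes can be sketched as a small decision helper. This is an illustration under stated assumptions, not Mictoo's own logic: the function names are hypothetical, the cap is taken as 60,000,000 bytes, the MP3 is assumed to be constant bitrate, and silence trimming is left out because its savings depend on the recording.

```python
def downsampled_wav_bytes(seconds):
    """Size of 16 kHz, 16-bit, mono PCM audio: 16,000 samples/s * 2 bytes."""
    return 16_000 * 2 * seconds


def mp3_bytes(seconds, bitrate_kbps=64):
    """Size of a constant-bitrate MP3: bits per second / 8 * duration."""
    return bitrate_kbps * 1000 // 8 * seconds


def choose_upload_plan(current_bytes, seconds, limit_bytes=60_000_000):
    """Pick the least invasive fix that fits under the (assumed) size cap."""
    if current_bytes <= limit_bytes:
        return "upload as-is"
    if downsampled_wav_bytes(seconds) <= limit_bytes:
        return "downsample to 16 kHz mono 16-bit WAV"
    if mp3_bytes(seconds) <= limit_bytes:
        return "re-encode to 64 kbps mono MP3"
    return "split the recording"
```

Under these assumptions a 30-minute recording fits after downsampling (about 57.6 MB), a 90-minute recording needs the MP3 route (about 43 MB), and anything past roughly two hours needs splitting as well.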

Can I export SRT or VTT subtitles?

Yes. After transcription finishes you can download SRT or VTT with timestamps every few seconds. Both formats align with your original audio timeline, so they drop straight into your video editor or subtitle workflow.
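For readers who want to see what an SRT file looks like under the hood, here is a minimal sketch that formats timed segments as SRT. The `(start_s, end_s, text)` tuple shape and the function names are assumptions made for illustration; they are not Mictoo's export format or API.

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start_s, end_s, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start_s, end_s, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start_s)} --> {srt_timestamp(end_s)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

WebVTT is nearly the same layout: the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.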

Can I get timestamps in the transcript?

Yes. The default transcript view shows segment-level timestamps you can click to jump to that moment in the audio. Download as VTT or JSON for word-level granularity, or as SRT for segment-level subtitle format.

How accurate is the transcript for a noisy WAV?

Background noise (wind, HVAC, traffic, tape hiss) reduces accuracy noticeably. Run the WAV through Audacity → Effect → Noise Reduction or the free Adobe Podcast Enhance tool before uploading. The cleaned version typically transcribes much better.

Will my original WAV file be changed in any way?

No. The file you upload is read by our backend, sent to the transcription provider, and discarded after the response comes back. Your original file on your computer is never modified. We never write a transformed copy back to you.

What can I do with the transcript after it is generated?

Edit wrong words inline before exporting. Then download as TXT (plain text), SRT or VTT (subtitle format with timestamps), or DOCX (Word document). Copy directly to clipboard if you just need to paste somewhere. The AI summary appears alongside the transcript automatically.

Upload your WAV and get an editable transcript

Drop the file, wait under a minute, copy or export the text. Free for files up to 60 MB. No signup.

Transcribe a WAV now