AAC · ADTS · Raw codec stream

AAC to Text
Raw AAC streams transcribed cleanly

Drop a .aac file from a podcast CDN, a broadcast archive, or a ripped iPhone export. We handle ADTS and ADIF stream formats and return an editable transcript with timestamps.

AI summary · Translate, 28 langs · OpenAI Whisper

MP3 · MP4 · WAV · M4A · OGG · WEBM · FLAC · Max 25MB · Max 30 min (60 min · Sign in)

Got a bigger file? See how to compress.

Got a longer recording? See how to split.

AAC is the audio codec that quietly powers most modern audio: YouTube, Apple Music, podcasts on some networks, broadcast radio archives, iPhone Voice Memos at lossy quality. Usually AAC lives inside a container like M4A or MP4. But sometimes you get a raw .aac file with no container at all, which is what this page is for.

Drop the raw .aac in. We detect whether it is an ADTS stream (the streamable format, most common) or ADIF (the file-only format, very rare), decode it, and run it through Whisper. The transcript comes back in seconds with timestamps, ready to export as TXT, SRT, VTT, or DOCX.

For AAC inside an M4A container (most iPhone Voice Memos), use the M4A to Text page instead. It has more iPhone-specific guidance.

How it works

📡

Upload your .aac file

Raw AAC, usually in ADTS stream format. Common sources: podcast CDN downloads, broadcast radio archives, ripped audio from FaceTime or iPhone, YouTube audio extracts.

🔍

We detect ADTS or ADIF

The two AAC stream wrappers. ADTS (Audio Data Transport Stream) is what almost all real-world .aac files use. ADIF (Audio Data Interchange Format) is older and rarer. We handle both.
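For the curious, the two wrappers can be told apart from the first few bytes of the file. The sketch below is illustrative only, not Mictoo's actual detector: the function name detect_aac_framing is made up for this example, and it assumes the stream starts at byte 0 (a file with a leading ID3 tag would come back as "unknown").

```python
def detect_aac_framing(data: bytes) -> str:
    """Classify the start of a raw AAC buffer as 'adif', 'adts', or 'unknown'."""
    # ADIF files begin with the four ASCII bytes "ADIF".
    if data[:4] == b"ADIF":
        return "adif"
    # ADTS frames begin with a 12-bit syncword of all ones (0xFFF).
    # The mask 0xF6 also checks that the two layer bits are 00 and
    # ignores the ID bit and the protection_absent bit.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xF6) == 0xF0:
        return "adts"
    return "unknown"
```

The layer-bit check is what keeps an MP3 frame, which also starts with a run of set bits, from being mistaken for ADTS.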

📝

Edit and export the transcript

Fix words inline, then download TXT, SRT, VTT, or DOCX. Or copy directly to clipboard for pasting into a document or email.

Why use Mictoo for raw AAC files

Raw .aac without a container is supported

Most online transcribers expect a container (MP3, M4A, WAV). If you got a bare .aac file from a podcast CDN or a broadcast archive, they reject it. We accept the raw stream directly, ADTS or ADIF.

AAC-LC, HE-AAC, HE-AAC v2 all work

AAC-LC (Low Complexity, the standard profile and by far the most widely used) decodes natively. HE-AAC (High Efficiency, used by some broadcasters) and HE-AAC v2 (with parametric stereo, used at very low bitrates) both decode too. You do not specify which profile.

Bitrate from 32 kbps up to 320 kbps

Podcast networks usually ship AAC at 64 to 128 kbps. Apple Music uses 256 kbps. Broadcast archives can be down to 32 kbps. All of these transcribe cleanly. Below 32 kbps, audio quality drops far enough to hurt transcription accuracy.

The decode step is automatic

Whisper does not read AAC directly. Our backend decodes the AAC stream into raw audio before passing it to Whisper. This adds a fraction of a second to processing, and you never see the step.
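If you want to reproduce a decode step like this locally, one common approach is to have ffmpeg turn the AAC into 16 kHz mono 16-bit PCM, which is the input the open-source Whisper models work on. This is a sketch under assumptions, not a description of our backend: the helper names are hypothetical, and it presumes ffmpeg is installed and on your PATH.

```python
import subprocess

def build_decode_command(src: str) -> list:
    """Build an ffmpeg command that decodes src to raw 16 kHz mono PCM on stdout."""
    return [
        "ffmpeg", "-v", "error",   # only print real errors
        "-i", src,                  # input: raw .aac (or anything ffmpeg reads)
        "-f", "s16le",              # output format: headerless 16-bit little-endian PCM
        "-acodec", "pcm_s16le",
        "-ac", "1",                 # downmix to mono
        "-ar", "16000",             # resample to 16 kHz
        "pipe:1",                   # write to stdout
    ]

def decode_to_pcm(src: str) -> bytes:
    """Run the command and return the PCM bytes (raises if ffmpeg fails)."""
    result = subprocess.run(build_decode_command(src), capture_output=True, check=True)
    return result.stdout
```

Calling decode_to_pcm("input.aac") returns two bytes per sample, so dividing the length by 32,000 gives the duration in seconds.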

Source file stays untouched

We read your AAC once, decode it, transcribe it, then drop the audio from memory. We never write anything back to your file or store it on our servers.

Where raw .aac files come from

Podcast network CDN downloads

Some podcast hosts ship the raw .aac stream rather than wrapping in MP3 or M4A, especially when the priority is the smallest possible file. Useful for transcribing for show notes or quoting in articles.

Broadcast radio archives

Some BBC, NPR, and public-radio archives offer downloads as raw AAC for efficiency. Lower bitrates (32-64 kbps) than typical music distribution, but plenty for spoken-word transcription.

YouTube audio-only extracts

When tools like yt-dlp extract audio from a YouTube video without re-encoding, the result is sometimes a raw .aac file (because YouTube serves AAC streams for many videos). Drop it here for the transcript.

iPhone audio rips outside an M4A

Some screen-recorder and audio-rip tools produce raw .aac instead of wrapping in M4A. Common from older third-party iPhone audio capture apps.

In-flight entertainment audio captures

Some airline entertainment systems stream AAC audio over their networks. People capturing audio from in-flight talks or audiobook material sometimes end up with raw .aac files.

Game and app voice line exports

Mobile games and apps often ship voice lines as raw AAC streams to save space and licensing costs. Modders and accessibility researchers occasionally need transcripts of these.

Working with raw AAC

If a player refuses your .aac, the file is probably ADIF (rare)

Most players expect ADTS and silently fail on ADIF. Mictoo accepts both, so for transcription you do not need to worry. If you also need to play the file in a stubborn player, convert with ffmpeg: ffmpeg -i input.aac -c:a copy -f adts output.aac (which keeps the same codec but ensures ADTS framing).

For long broadcast archives at very low bitrate, expect some accuracy loss

Below 32 kbps mono, AAC starts to compromise voice clarity. Whisper still tries but accuracy drops noticeably. If you have control over the source, re-encode at 64 kbps or higher before transcribing.
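A quick way to check which side of that line a file falls on is to estimate its average bitrate from file size and duration. The sketch below is plain arithmetic; the function names and the 32 kbps default are illustrative choices taken from the guidance above, not a setting in Mictoo.

```python
def average_bitrate_kbps(size_bytes: int, duration_seconds: float) -> float:
    """Average bitrate in kbps (1 kbps = 1000 bits per second)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds / 1000

def below_speech_floor(kbps: float, floor_kbps: float = 32.0) -> bool:
    """True when the bitrate is under the point where accuracy tends to drop."""
    return kbps < floor_kbps

# A one-hour archive file of 14,400,000 bytes works out to exactly 32 kbps.
print(average_bitrate_kbps(14_400_000, 3600))  # 32.0
```

The figure includes the small per-frame ADTS headers, so the audio payload itself is slightly lower than the number reported.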

Convert to M4A if you also need to keep the file long-term

Raw .aac files are awkward to manage on macOS and Windows (poor player support, no metadata). Wrap in M4A with ffmpeg: ffmpeg -i input.aac -c:a copy output.m4a (no re-encoding). The M4A is the same audio with a friendlier container.
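If you have a whole folder of raw .aac files to wrap, the same ffmpeg call can be scripted. A minimal sketch, assuming ffmpeg is on your PATH; build_remux_command and remux are hypothetical helper names, and the only flags used are the ones from the commands on this page.

```python
import subprocess

def build_remux_command(src: str, dst: str) -> list:
    """Copy the AAC audio from src into dst without re-encoding."""
    cmd = ["ffmpeg", "-i", src, "-c:a", "copy"]
    # When the target is another raw .aac, ask for ADTS framing explicitly.
    if dst.lower().endswith(".aac"):
        cmd += ["-f", "adts"]
    cmd.append(dst)
    return cmd

def remux(src: str, dst: str) -> None:
    """Run the remux and raise if ffmpeg reports an error."""
    subprocess.run(build_remux_command(src, dst), check=True)
```

For example, remux("input.aac", "output.m4a") wraps the stream in M4A, while remux("input.aac", "output.aac") rewrites it with ADTS framing.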

Set the language manually for short clips

Whisper auto-detect can misfire on clips under five minutes, especially with silence at the start. Pick the language explicitly in the dropdown for short broadcast clips or voice lines.

AAC is a codec, not a format

"What format is this file" gets confusing with AAC because AAC is just a codec, not a container. The codec compresses audio. The container packages compressed audio with metadata and timing information so a player knows how to navigate it. In most cases, AAC audio lives inside an M4A or MP4 container, which has all the metadata and seekability features modern players expect.

A raw .aac file has no container. It is just the bare codec output, usually wrapped in a minimal framing layer called ADTS that lets players sync to the start of any frame. That is useful for streaming (each frame is independent and self-describing) but inconvenient for offline use (no metadata, no chapter markers, no quick seek to a timestamp).
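The file extension is not always a reliable guide to whether you have a container or a bare stream. As a rough check, MP4-family files (including M4A) normally begin with an "ftyp" box, so the bytes at offsets 4 to 7 spell out "ftyp". The function name below is hypothetical and the check is a heuristic, not a full parser.

```python
def looks_like_mp4_container(data: bytes) -> bool:
    """Heuristic: MP4/M4A files usually open with a box whose type is 'ftyp'.

    Each MP4 box starts with a 4-byte big-endian size followed by a
    4-byte type code, so the type sits at offsets 4..7.
    """
    return len(data) >= 8 and data[4:8] == b"ftyp"

with_container = b"\x00\x00\x00\x20ftypM4A \x00\x00\x00\x00"
bare_stream = bytes([0xFF, 0xF1, 0x50, 0x80, 0x2E, 0x7F, 0xFC, 0x00])
print(looks_like_mp4_container(with_container))  # True
print(looks_like_mp4_container(bare_stream))     # False
```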

ADTS vs ADIF, briefly

ADTS (Audio Data Transport Stream) is the framing used for streamable AAC. Each AAC frame has a small header that lets a decoder lock on at any point in the stream, which is why ADTS is used for broadcast and HTTP streaming. Almost every raw .aac file in the wild is ADTS. ADIF (Audio Data Interchange Format) is the file-only alternative with a single header at the start and no per-frame sync, which makes it slightly smaller but unusable for streaming. ADIF is now rare; you mostly see it in legacy archives.
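To make the per-frame header concrete, here is a sketch that unpacks the fixed 7-byte ADTS header into its most useful fields. The class and function names are made up for illustration; the bit layout and sample-rate table follow the ADTS definition in the MPEG audio standards. It reads only the fields shown and does not validate the optional CRC.

```python
from dataclasses import dataclass

# Sampling-frequency index table used by ADTS (indices 13 to 15 are reserved).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]

@dataclass
class AdtsHeader:
    object_type: int      # 2 = AAC-LC
    sample_rate: int      # Hz
    channel_config: int   # 1 = mono, 2 = stereo
    frame_length: int     # bytes, header included
    has_crc: bool

def parse_adts_header(b: bytes) -> AdtsHeader:
    # Syncword is 12 set bits; the two layer bits after the ID bit must be 00.
    if len(b) < 7 or b[0] != 0xFF or (b[1] & 0xF6) != 0xF0:
        raise ValueError("not an ADTS frame header")
    sf_index = (b[2] >> 2) & 0x0F
    if sf_index >= len(SAMPLE_RATES):
        raise ValueError("reserved sampling frequency index")
    return AdtsHeader(
        object_type=((b[2] >> 6) & 0x03) + 1,          # 2-bit profile field + 1
        sample_rate=SAMPLE_RATES[sf_index],
        channel_config=((b[2] & 0x01) << 2) | (b[3] >> 6),
        frame_length=((b[3] & 0x03) << 11) | (b[4] << 3) | (b[5] >> 5),
        has_crc=(b[1] & 0x01) == 0,                    # protection_absent == 0
    )

print(parse_adts_header(bytes.fromhex("fff150802e7ffc")))
```

The sample header decodes as AAC-LC, 44,100 Hz, stereo, with a 371-byte frame and no CRC. Because every frame repeats this information, a decoder can join the stream at any frame boundary.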

AAC-LC, HE-AAC, HE-AAC v2

AAC comes in profiles tuned for different bitrates. AAC-LC (Low Complexity) is the workhorse, used at 64 kbps and up for most modern audio. HE-AAC (High Efficiency) adds Spectral Band Replication to make low bitrates sound better, used in some broadcast and streaming contexts at 32 to 64 kbps. HE-AAC v2 adds Parametric Stereo for ultra-low bitrates, used by digital radio and some podcast distribution at 24 to 48 kbps. Mictoo decodes all three transparently, so you do not pick a profile.

Why podcasts mostly use MP3, not AAC

AAC is technically better than MP3 at the same bitrate, yet most podcast networks still distribute MP3. The reason is historical compatibility: every podcast app on every device from 2005 onward plays MP3. AAC support is universal now too, but the install base of legacy MP3-only podcatchers was enough to keep MP3 as the safe choice. Networks that picked AAC tend to be newer and more closely tied to the Apple ecosystem.

AAC vs related audio formats

AAC the codec lives inside several containers. Pick the page that matches your actual file.

.aac (raw)

Container: None (ADTS or ADIF)
Typical source: Podcast CDN, broadcast, rip
Metadata support: None
For transcription: Direct (this page)

M4A →

Container: MP4 (audio-only)
Typical source: iPhone Voice Memos, GarageBand
Metadata support: Full (title, artist, chapters)
For transcription: Use M4A page

MP3 →

Container: None (codec only)
Typical source: Most podcasts, web audio
Metadata support: ID3 tags
For transcription: Use MP3 page

OGG (Opus) →

Container: OGG
Typical source: Telegram voice, Linux apps
Metadata support: Vorbis comments
For transcription: Use OGG page

Frequently asked questions

Will a raw .aac file from a podcast CDN work?

Yes. Most podcast CDN .aac files are ADTS streams. Drop the file in directly and we decode the ADTS framing and transcribe the audio inside. No conversion to MP3 or M4A is needed first.

What is the difference between .aac and .m4a?

.aac is the raw codec stream with no container around it. .m4a is the same AAC audio wrapped in an MP4 container, which adds metadata support (title, artist, chapters) and easier player compatibility. For transcription, both decode to the same audio. We have separate pages for each because the user-side workflows differ.

Do you support HE-AAC and HE-AAC v2?

Yes. Both High Efficiency AAC profiles decode automatically. You do not specify the profile when uploading. HE-AAC v2 with parametric stereo (used at very low bitrates by digital radio) also works.

My .aac file is from a YouTube audio extract. Does it work?

Yes. Tools like yt-dlp can extract YouTube audio as raw AAC without re-encoding. The result is usually an ADTS stream. Drop it in here as-is, no conversion needed.

What is ADTS vs ADIF?

Two different ways to wrap raw AAC. ADTS (Audio Data Transport Stream) puts a small header on every AAC frame so a player can lock onto the stream at any point, used for broadcast and streaming. ADIF (Audio Data Interchange Format) has a single header at the start of the file, rarer today. We handle both.

Why does my .aac not play in iTunes / Apple Music?

iTunes and Music expect AAC inside an M4A container, not a raw .aac stream. The fix is to wrap in M4A: ffmpeg -i input.aac -c:a copy output.m4a. This is a container change without re-encoding, so it takes about a second. For transcription you do not need this step, because we accept raw .aac directly.

Will broadcast radio archives at very low bitrate transcribe well?

Reasonably. At 32 kbps mono and above, accuracy is good. Below 32 kbps the AAC encoder starts removing too much of the high-frequency consonant information Whisper uses, and accuracy drops noticeably. Most modern archives are 64 kbps or above.

Can I get timestamps from a raw .aac file?

Yes. Download as SRT or VTT for timestamps, or JSON for word-level alignment. Even though raw .aac has no native timing metadata, our decoder reconstructs the timeline based on sample positions, so timestamps are accurate against the audio.
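To show how a timeline can be rebuilt from sample positions, the sketch below walks the ADTS frames in a buffer and gives each one a start time: every raw data block carries 1024 samples per channel, so the running sample count divided by the sample rate is the elapsed time. It is a simplified illustration, not our production decoder: the function names are hypothetical, it assumes a clean ADTS stream starting at byte 0, and it stops at the first byte sequence that is not a valid frame header.

```python
from typing import List

SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]
SAMPLES_PER_BLOCK = 1024  # samples per channel in one AAC raw data block

def adts_frame_start_times(data: bytes) -> List[float]:
    """Return the start time in seconds of each ADTS frame in data."""
    times, pos, samples = [], 0, 0
    while pos + 7 <= len(data):
        h = data[pos:pos + 7]
        if h[0] != 0xFF or (h[1] & 0xF6) != 0xF0:
            break  # lost sync: stop rather than guess
        sf_index = (h[2] >> 2) & 0x0F
        frame_length = ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5)
        if sf_index >= len(SAMPLE_RATES) or frame_length < 7:
            break
        times.append(samples / SAMPLE_RATES[sf_index])
        blocks = (h[6] & 0x03) + 1  # header field stores block count minus one
        samples += SAMPLES_PER_BLOCK * blocks
        pos += frame_length
    return times

def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used in SRT files."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(3661.5))  # 01:01:01,500
```

At 44.1 kHz each frame covers about 23 milliseconds, which is fine-grained enough for subtitle cues.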

Will my .aac file be saved on your servers?

No. The audio streams through to the transcription provider, gets decoded once for inference, and is dropped from memory after the response. We do not write the audio to disk. The text transcript is only stored if you sign in and choose to save it to your history.

Can I transcribe a .aac file in another language?

Yes. We support over 50 languages with auto-detect. Pick the language manually in the dropdown for short clips (under five minutes), where auto-detect sometimes misfires on silence or non-speech intros.

What about ALAC (Apple Lossless)? Same as AAC?

No, despite the similar name. ALAC is a lossless codec, AAC is lossy. ALAC always lives inside an M4A container, never as a raw .alac. If your file has ALAC audio, use the M4A page.

My .aac file has no metadata (artist, title). Is that normal?

Yes, completely. Raw .aac streams have no metadata layer, which is one of the reasons people wrap them in M4A. If you need title or artist info, wrap the file in M4A first (ffmpeg -i input.aac -c:a copy output.m4a) and add tags in iTunes, MusicBrainz Picard, or another tagging tool.