

From an 8-hour podcast recording to clean show notes in one pass

How to take a long-form, multi-host podcast episode from raw recording to a publish-ready show notes page in under an hour.

May 13, 2026 · 11 min read · 2,393 words · By CleanScribe Editorial

If you have ever finished editing a 90-minute episode and then realised you still have to write the show notes, pull the chapter markers, and find the three best quotes for social media, you know the problem this guide solves.

Most podcast workflows treat transcription as an afterthought. You record, you edit, you publish the audio, and then you stare at a blank document and try to reconstruct what happened in the episode from memory. The result is show notes that read like a press release no one asked for — a thin paragraph, a bullet list of timestamps that only the host can decode, and a link to "listen in your favourite app." That page is not doing anything for anyone.

The better approach treats the transcript as the first deliverable, not the last. Once you have a clean, named-speaker transcript from a good podcast transcription tool, everything else (show notes prose, chapter markers, quote pulls, the SEO page) becomes a repurposing task rather than writing from scratch. This guide walks through that workflow from the moment before you hit record to the moment you hit publish.

What show notes are actually for

Show notes serve three audiences, and most episode pages serve none of them well. A clean transcript is the unifying primitive that serves all three at once.

Search engines — the episode's SEO page. Without a transcript on the page, Google has little to index beyond the episode title and a 200-word summary. That is fine for listeners who already subscribe, but it leaves the episode invisible to everyone else. A long-form transcript on the page opens the back catalogue to listeners who search for a phrase discussed in episode 47, two years ago, on a topic they care about right now. The phrases your guests use naturally, the specific terms your hosts argue about, the product names and book titles that come up in conversation: those are the long-tail keywords that a transcript surfaces for free, and that no amount of manual SEO copywriting would have caught.

Prospective listeners — the "is this for me" scan. Someone lands on the episode page from a search result or a friend's link. They have a few seconds to decide whether to commit 90 minutes. Show notes that read like a real piece of writing (a short intro that sets up the episode's argument, a few lines on who the guest is and why it matters, a clear sense of what changed by the end) convert better than three bullet points and a list of timestamps. Polished prose, not a recording in text.

Existing listeners — the "what did they say about X" lookup. Your most loyal listeners are also the most likely to share episodes. They are the people who will text someone "you have to listen to this" and then realise they cannot find the quote they want to paste. Chapter markers and a full transcript turn an episode page into a searchable reference — the kind of page a listener bookmarks and comes back to, rather than scrolling past in their app.

Each audience needs something different from the same source material. Writing all three from scratch is the problem. Generating them from one clean transcript is the solution.

Before recording

The transcript quality you get out depends on the setup decisions you make before you hit record. Three preparations do most of the work.

Host name list

Write down every host and their preferred display name before the episode. "James" or "Dr Whitfield"? "Maya" or "M.C."? "Tomás" or "Tom"? Decide once, use everywhere. Pre-fill these at upload so the transcript labels every host's lines consistently from minute one, not just the lines where someone happens to say a name aloud. The pre-fill list is your single source of truth for the episode's speaker names, and it takes two minutes to prepare before any recording you plan to transcribe.

This is especially important for shows with recurring hosts but rotating guests. Build a saved template with the regular host names, add the guest's name before each episode, and upload with the full list. The transcript will be consistent across the back catalogue in a way that makes search and archive much more useful.
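If the regular hosts live in a small saved template, the per-episode step shrinks to adding the guest. A minimal sketch, assuming the upload form accepts a plain list of display names; the `HOST_TEMPLATE` layout, the `speaker_list` helper, and the guest name are hypothetical, not a CleanScribe API:

```python
# Hypothetical saved template: regular hosts, each mapped to the one
# display name you decided on and will use everywhere.
HOST_TEMPLATE = {
    "james": "James",
    "maya": "Maya",
    "tomas": "Tomás",
}


def speaker_list(guests: list[str], template: dict[str, str] = HOST_TEMPLATE) -> list[str]:
    """Return the full display-name list to pre-fill at upload.

    Hosts come first in template order, then this episode's guests.
    Names are compared case-insensitively after trimming whitespace,
    so a host accidentally listed again as a guest appears only once.
    """
    names: list[str] = []
    seen: set[str] = set()
    for name in [*template.values(), *guests]:
        cleaned = name.strip()
        key = cleaned.casefold()
        if key and key not in seen:
            seen.add(key)
            names.append(cleaned)
    return names


if __name__ == "__main__":
    # Episode-specific step: add the guest, leave the hosts untouched.
    print(", ".join(speaker_list(["Priya Nair"])))
```

Keeping the template separate from the guest list is what keeps host names identical across the back catalogue.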

Guest intros at the top

Ask each guest to say their name and give a one-sentence bio in the first two minutes of the recording. This is standard interview-podcast practice for a reason, but it is worth being intentional about. Two payoffs:

Anchors named-speaker labelling for the rest of the episode. The transcription engine uses the spoken intro as a voice sample tied to a name, and that match persists through the full recording. A guest who introduces themselves clearly at minute one should still have their name on their lines two hours in; stretches where two people talk over each other remain the weak spot.
Gives prospective listeners a 30-second clip they can use to decide whether to keep listening. The guest intro, lightly edited, is also the first paragraph of your show notes.

Mic per voice

If your setup supports multi-track recording — Riverside, SquadCast, Zencastr, and most modern remote recording platforms do this by default — configure each host and guest on their own track. Transcribing per-track is dramatically more accurate than mixed-down audio. Each voice has its own clean waveform, so the podcast transcription engine is not guessing where one speaker ends and another begins; it knows, because each track is a separate channel.
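Per-track output makes attribution almost mechanical: each track belongs to one person, so the speaker label is simply the track's owner, and the only remaining job is putting the lines in time order. A minimal sketch, assuming each track has already been transcribed into `(start_seconds, end_seconds, text)` segments; the `Segment` type and track names are illustrative, not any recording platform's export format:

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float   # seconds from the start of the recording
    end: float
    speaker: str   # taken from the track, not inferred from the voice
    text: str


def merge_tracks(tracks: dict[str, list[tuple[float, float, str]]]) -> list[Segment]:
    """Interleave per-track transcripts into a single timeline.

    Every segment is labelled with its track's owner. Each track is
    sorted by start time, then heapq.merge combines the sorted streams
    into one list ordered by start time.
    """
    streams = []
    for speaker, segments in tracks.items():
        ordered = sorted(segments, key=lambda seg: seg[0])
        streams.append([Segment(start, end, speaker, text) for start, end, text in ordered])
    return list(heapq.merge(*streams, key=lambda seg: seg.start))


if __name__ == "__main__":
    tracks = {
        "Maya": [(0.0, 4.2, "Welcome back to the show."), (9.8, 12.0, "Say more about that.")],
        "James": [(4.6, 9.5, "Thanks, glad to be here.")],
    }
    for seg in merge_tracks(tracks):
        print(f"[{seg.start:6.1f}s] {seg.speaker}: {seg.text}")
```

Because overlapping speech sits on separate tracks, two people talking at once simply become two adjacent lines, each with the correct name.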

If multi-track is not possible (a phone-based interview, a recording at a live event, a field recording with a single shared mic), a lavalier microphone clipped to each speaker still gives the model more acoustic separation than a single room mic placed in the middle of the table. The principle is the same: the more distinct each voice's audio signature, the more accurate the speaker labelling in the resulting transcript.

During recording

Three habits in the room make the transcript cleaner without changing anything about the episode itself.

Handoff cues by name. "Maya, what did you make of that?" gives the transcription engine a name anchor every time you use it. Aim for each speaker's name to be spoken two or three times in the first ten minutes of each guest segment. You are not interrupting the natural flow of conversation, since good interviewers use names anyway, but you are giving the model repeated confirmation that what it thinks it knows about each voice is correct.

Do not talk over guests. Even with multi-track recording, crosstalk muddies the transcript at the points where it happens. The half-second of silence between one person finishing and another starting is not dead air. It is the model's margin for getting the attribution right. The silence is fine.

Note timestamps of moments you want as chapter markers. A piece of paper next to the mic, a sticky note on the monitor, a quick note in your DAW. Ten seconds during the episode is worth thirty minutes during the post-production scrub. When you mark "good chapter moment — topic shifts to monetisation at around 47min," you are saving yourself the scrub-through later, and giving yourself a shortcut into the transcript at exactly the right point.
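Once the transcript carries timestamps, a rough note like "around 47min" becomes a direct jump point. A minimal sketch, assuming transcript segments have start times in seconds and that your notes use either `47min` or `h:mm:ss` / `mm:ss` style; the `parse_note_time` and `segment_at` helpers are hypothetical:

```python
import bisect
import re


def parse_note_time(note: str) -> int:
    """Convert a rough handwritten timestamp into seconds.

    Accepts "47min", "47 min", "47:05" (mm:ss) or "1:02:30" (h:mm:ss).
    """
    note = note.strip().lower()
    match = re.fullmatch(r"(\d+)\s*min", note)
    if match:
        return int(match.group(1)) * 60
    parts = [int(p) for p in note.split(":")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unrecognised timestamp: {note!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds


def segment_at(starts: list[float], note: str) -> int:
    """Return the index of the segment playing at the noted time.

    `starts` must be the segments' start times in ascending order;
    bisect_right finds the last segment starting at or before the note.
    """
    t = parse_note_time(note)
    return max(bisect.bisect_right(starts, t) - 1, 0)


if __name__ == "__main__":
    starts = [0, 600, 2800, 2835, 3700]  # illustrative segment start times
    print(segment_at(starts, "47min"))
```

Handwritten times are approximate, so it is worth reading a segment or two either side of the one returned before fixing the chapter title.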

Choosing a transcription approach

Not all podcast transcription options are created equal, and the right one depends on how many speakers you have, how often you publish, and how much cleanup time you can absorb per episode.

Option 1: Hire a human

A freelance transcriber or a premium transcription service will produce publication-grade accuracy. For a two-host show with one guest and clear audio, a good transcriber will get the speaker labels right and produce prose that needs minimal editing. The cost is $1.50–$4.00 per audio minute at current US rates. For a weekly two-hour show, that is $180–$480 every single week, before any editing time. Right for legal or compliance-driven podcast formats, where every speaker attribution must be defensible to a third party. Ruinous for indie shows running on Patreon revenue and a part-time producer.
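The per-minute rates compound quickly on a weekly schedule. A small calculator using the figures above ($1.50 to $4.00 per audio minute, a two-hour episode); the 52-episode year is an assumption, and editing time is not included:

```python
def human_transcription_cost(minutes: float, rate_per_minute: float, episodes: int = 1) -> float:
    """Transcription spend for `episodes` episodes of `minutes` audio each."""
    return minutes * rate_per_minute * episodes


if __name__ == "__main__":
    for rate in (1.50, 4.00):
        per_episode = human_transcription_cost(120, rate)
        per_year = human_transcription_cost(120, rate, episodes=52)
        print(f"${rate:.2f}/min: ${per_episode:,.0f} per episode, ${per_year:,.0f} per year")
    # $1.50/min: $180 per episode, $9,360 per year
    # $4.00/min: $480 per episode, $24,960 per year
```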

Option 2: AI transcription with diarization

Tools like Otter.ai, Notta, and the basic plans from Rev offer speaker diarization — the technical term for "the model groups lines by which voice produced them." They will return a transcript with Speaker 1, Speaker 2, Speaker 3 labels. Word accuracy is good. Speaker labelling is fine for solo or two-host shows where the listener can guess who's who from context.

It falls apart with three or more voices. You will spend fifteen to thirty minutes per episode relabelling speakers in the output before it is usable. On a weekly show, that is thirteen to twenty-six hours per year lost to transcript cleanup, before you write a single word of show notes.

Option 3: AI transcription with named-speaker labelling

Pre-fill host names at upload; let spoken introductions handle guests. The podcast transcript comes back with "Maya" and "James" and "Dr Whitfield" as labels — not Speaker 1 and Speaker 2 — ready to feed directly into the show notes repurposing pass. Cleanup time per episode: five to ten minutes, mostly verifying that the model matched the guest's spoken name correctly in the first minute or two.

This is the approach that makes the rest of this guide possible. Without named-speaker labels, the transcript is still raw material that needs reconstruction before it is usable. With them, it is a clean draft.
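The cleanup-time ranges for Options 2 and 3 are easier to compare over a full year. A quick sketch, assuming a weekly show of 52 episodes and using the per-episode minutes quoted above:

```python
WEEKS_PER_YEAR = 52


def yearly_hours(minutes_per_episode: float, episodes: int = WEEKS_PER_YEAR) -> float:
    """Hours per year spent on per-episode transcript cleanup."""
    return minutes_per_episode * episodes / 60


if __name__ == "__main__":
    cleanup = {
        "Option 2 (numbered speakers)": (15, 30),
        "Option 3 (named speakers)": (5, 10),
    }
    for option, (low, high) in cleanup.items():
        print(f"{option}: {yearly_hours(low):.1f} to {yearly_hours(high):.1f} hours per year")
    # Option 2 (numbered speakers): 13.0 to 26.0 hours per year
    # Option 3 (named speakers): 4.3 to 8.7 hours per year
```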

If you want to try CleanScribe on an episode you already have: the free tier is 120 minutes per month, no credit card required. That is a long episode and a short one before you decide whether to upgrade.

Turning the transcript into show notes

Once you have a clean, named-speaker transcript, the show notes are a repurposing task. Here is the sequence that works.

Skim for chapter markers. Every clear topic shift is a candidate. A new guest joining, a question that pivots the conversation, the moment someone says "actually, I want to push back on that" — these are natural chapters. Aim for five to ten per hour of audio. Note the timestamp and a six-word title next to each one. This takes fifteen minutes on a ninety-minute transcript and produces the episode's structural skeleton.
Pull three quote graphics. Read through the transcript looking for one-line zingers that stand alone out of context — the kind of sentence that needs no setup to land. The guest's unexpected admission, the host's sharpest summary of the episode's argument, the moment someone says something that sounds counterintuitive until you hear the reasoning. These become your social cards: quote + name + episode link. Pull three. One for launch day, one for the week-after push, one for the back catalogue.
Write the show notes intro from the transcript. Find the opening exchange that introduces the episode topic — usually the first three to five minutes. Lightly edit it into two or three prose paragraphs. The voice is already yours. The content is already there. You are not writing from scratch; you are editing a transcript into a readable opening. This is the highest-leverage part of the whole pass, because it is the section that converts the prospective listener who landed from search.
Build the timestamped TOC. Chapter title, arrow, timestamp: one row per chapter marker from the first step. Listeners use this constantly; the experienced podcast listener scans the TOC first, picks the section they want, and starts there. A good TOC also tells the search engine what the episode covers in specific language that a summary paragraph rarely achieves.
Publish the full transcript on the episode page. All of it, not a 500-word excerpt. SEO benefit: Google indexes the entire conversation, not just the summary. Accessibility benefit: deaf and hard-of-hearing listeners can read the episode they cannot hear. Discoverability benefit: someone searching for a guest's name, a book title, or a specific claim finds the episode page in their results because those phrases are on the page in text — not buried in audio that search engines cannot parse.
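The chapter markers and the TOC are the same data in two shapes. A minimal sketch that turns `(start_seconds, title)` pairs into "title → timestamp" rows, sorted by time, switching to `h:mm:ss` once an episode passes the hour mark; the input shape and helper names are assumptions, not a required format:

```python
def format_timestamp(seconds: int) -> str:
    """Render seconds as mm:ss, or h:mm:ss from one hour onwards."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def build_toc(chapters: list[tuple[int, str]]) -> str:
    """One row per chapter marker: title, arrow, timestamp."""
    rows = [f"{title} → {format_timestamp(start)}" for start, title in sorted(chapters)]
    return "\n".join(rows)


if __name__ == "__main__":
    chapters = [
        (0, "Introductions"),
        (2820, "Monetisation"),
        (611, "How the show started"),
    ]
    print(build_toc(chapters))
    # Introductions → 00:00
    # How the show started → 10:11
    # Monetisation → 47:00
```

Sorting inside `build_toc` means chapter notes can be collected in any order while skimming.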

A short checklist

Before recording:

Host name list ready to pre-fill at upload
Guests asked to introduce themselves at the top
Multi-track recording configured (if available)

During recording:

Hosts and guests address each other by name, especially early in each segment
No talking over each other
Note chapter-marker timestamps on paper

After:

Transcribe with named-speaker labelling
Skim for 5–10 chapter markers per hour
Pull 3 social-ready quote graphics
Write the show notes intro from the transcript
Publish the full transcript on the episode page

Where CleanScribe fits

We built CleanScribe around the podcast transcription use case specifically: long recordings, multiple voices, show notes as the end goal.

Named speakers, not numbers. Hosts are pre-filled from the upload form; guests are labelled from their spoken introductions. Every voice gets a name from minute one. When the episode comes back, "Speaker 4" is not a problem you have to solve before you can start writing.
Long files in one pass. Up to 8 hours per upload, no splitting. A 90-minute episode with three hosts and two guests goes in as one file and comes out as one transcript. No seams, no mismatched labelling at the join points, no stitching two partial transcripts together and hoping the speakers still line up.
Polished prose, not a recording in text. We strip the umms, the false starts, and the repeated half-sentences so the transcript reads as a conversation. The meaning stays; the noise goes. The original audio is still there for your editor to clip the social pulls from — the transcript is the clean version for publishing, not a replacement for the source recording.
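To make "clean" concrete, here is a deliberately naive sketch of the simplest slice of that cleanup: dropping standalone fillers and immediate word repetitions with regular expressions. It is a toy for illustration only, not CleanScribe's method; false starts and half-sentences need an understanding of meaning that a regex does not have, and the filler list here is an assumption:

```python
import re

# Toy filler list: standalone "um", "uh", "erm" plus a trailing comma or full stop.
FILLERS = re.compile(r"\b(?:um+|uh+|erm+)\b[,.]?\s*", re.IGNORECASE)
# A word immediately repeated, e.g. "the the" or "I I".
REPEATS = re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE)


def tidy(line: str) -> str:
    """Remove fillers, collapse doubled words, then tidy whitespace."""
    line = FILLERS.sub("", line)
    line = REPEATS.sub(r"\1", line)
    return re.sub(r"\s{2,}", " ", line).strip()


if __name__ == "__main__":
    print(tidy("Um, I I think the the point is, uh, pricing."))
    # I think the point is, pricing.
```

The word-boundary anchors keep words like "umbrella" and "hum" intact, but a pattern list like this cannot tell a filler "like" from a meaningful one, which is why cleanup beyond the trivial cases needs more than pattern matching.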

The free tier is 120 minutes per month. No credit card. Try it on your last episode, and see how much of the show notes write themselves.

→ Start free at cleanscribe.ai/for/podcasters

Have a podcast workflow tip we should add to this guide? Email us — we update this piece as new tools and techniques become standard.