Really good Hacker News post with transcription tips (comments). Some notes:
- A commenter suggested also stripping silence with the following FFmpeg command. Less silence leaves less room for hallucinations, and since transcription is usually billed by audio length, it's cheaper too!
ffmpeg -i video-audio.m4a \
-af "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:\
stop_periods=-1:stop_duration=0.02:stop_threshold=-50dB,\
apad=pad_dur=0.02" \
-c:a aac -b:a 128k output_minpause.m4a -y
- You can also speed up the audio to shorten it, which cuts transcription time further:
ffmpeg -i video-audio.m4a -filter:a "atempo=2.0" -ac 1 -b:a 64k video-audio-2x.mp3
- Other preprocessing steps, such as audio normalization, may also help.
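The silence-removal command from the first note can also be driven from Python. This is a minimal sketch that rebuilds the same filter string from two knobs, the silence threshold and the minimum pause length. It passes ffmpeg an argument list, so there is no shell quoting or line continuation to get wrong. The function names (`build_silence_strip_cmd`, `strip_silence`) and parameter names are illustrative assumptions, not part of the original post. Running it requires `ffmpeg` on your PATH.

```python
import subprocess
from typing import List


def build_silence_strip_cmd(
    src: str,
    dst: str,
    threshold_db: float = -50.0,  # quieter than this counts as silence
    min_pause: float = 0.02,      # seconds; longer silent stretches are cut
) -> List[str]:
    """Build an ffmpeg argv equivalent to the silence-stripping command above."""
    audio_filter = (
        "silenceremove="
        "start_periods=1:start_duration=0:"         # trim leading silence
        f"start_threshold={threshold_db:g}dB:"
        f"stop_periods=-1:stop_duration={min_pause:g}:"  # -1: also trim mid-file pauses
        f"stop_threshold={threshold_db:g}dB,"
        f"apad=pad_dur={min_pause:g}"               # append a little silence at the end
    )
    return [
        "ffmpeg", "-y",  # -y: overwrite dst without prompting
        "-i", src,
        "-af", audio_filter,
        "-c:a", "aac", "-b:a", "128k",
        dst,
    ]


def strip_silence(src: str, dst: str, **kwargs) -> None:
    # Raises subprocess.CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_silence_strip_cmd(src, dst, **kwargs), check=True)
```

With the defaults, `strip_silence("video-audio.m4a", "output_minpause.m4a")` runs the same filter chain as the command in the first note; raising `threshold_db` (e.g. to -40) treats more background noise as silence.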
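For the speed-up note: the command uses a single `atempo=2.0`. FFmpeg's documentation says tempos above 2 skip samples rather than blending them, and suggests daisy-chaining several `atempo` instances instead. The sketch below, with hypothetical helper names, splits a speed-up factor into equal stages of at most 2.0 and otherwise mirrors the command above (mono, 64 kb/s). It only handles speed-ups, and the cost comment assumes the service bills by audio duration.

```python
import math
from typing import List


def atempo_chain(factor: float) -> str:
    """Return a -filter:a value that speeds audio up by `factor`.

    The factor is split into equal atempo stages of at most 2.0 each.
    """
    if factor < 1.0:
        raise ValueError("this sketch only handles speed-ups (factor >= 1.0)")
    stages = max(1, math.ceil(math.log2(factor)))
    per_stage = factor ** (1.0 / stages)  # stages multiply back to ~factor
    return ",".join(f"atempo={per_stage:.8g}" for _ in range(stages))


def build_speedup_cmd(src: str, dst: str, factor: float = 2.0) -> List[str]:
    # Same output settings as the command above: mono (-ac 1) at 64 kb/s.
    return [
        "ffmpeg", "-i", src,
        "-filter:a", atempo_chain(factor),
        "-ac", "1", "-b:a", "64k",
        dst,
    ]


def sped_up_minutes(minutes: float, factor: float) -> float:
    # Length after the speed-up; per-minute billing (if used) scales the same way.
    return minutes / factor
```

For example, `sped_up_minutes(60, 2.0)` returns `30.0`: an hour-long recording becomes 30 minutes of audio to send, and `atempo_chain(4.0)` yields `atempo=2,atempo=2`.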
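For normalization, one option is FFmpeg's `loudnorm` filter (EBU R128 loudness normalization). This sketch builds a single-pass `loudnorm` command. The target values (integrated loudness -16 LUFS, true peak -1.5 dBTP, loudness range 11 LU) and the helper name are illustrative assumptions, not settings from the original post.

```python
from typing import List


def build_normalize_cmd(
    src: str,
    dst: str,
    integrated_lufs: float = -16.0,  # target integrated loudness (I)
    true_peak_db: float = -1.5,      # maximum true peak (TP)
    loudness_range: float = 11.0,    # target loudness range (LRA)
) -> List[str]:
    # Single-pass loudnorm; FFmpeg also supports a more accurate two-pass mode
    # that feeds measured values from a first analysis run into the second.
    audio_filter = (
        f"loudnorm=I={integrated_lufs:g}"
        f":TP={true_peak_db:g}:LRA={loudness_range:g}"
    )
    return ["ffmpeg", "-i", src, "-af", audio_filter, "-ac", "1", "-b:a", "64k", dst]
```

Run it with `subprocess.run(build_normalize_cmd("video-audio.m4a", "video-audio-norm.mp3"), check=True)`. Because `-af` accepts a comma-separated filter chain, `loudnorm` could also be combined with the other filters in a single ffmpeg pass.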