Transcription and Chapter support within the podcast

system

Chapter support was requested for the podcast, so I'm attempting to add it, along with transcriptions. This is my first foray into using AI tools for an actual purpose. So far I've tried transcribing the podcast using openai/whisper. See this guide on how it can be used; it is available on x86 and arm64.

My interest then shifted towards speaches-ai/speaches, which adds text-to-speech support in addition to Whisper. A live example of Speaches is hosted at https://huggingface.co/spaces/speaches-ai/speaches

Since the Raspberry Pi 4 and Pi 5 are supported without the need for a dedicated GPU, I'm looking to run one of these implementations locally. More details to come as testing continues.

system

After experimenting with running Whisper on an i9 laptop with a 4050 GPU, I've decided to drop all the way back to running things on an old Raspberry Pi 4 with 8 GB of RAM and an SSD attached over USB 3.0.

Is it slower on an 8 GB arm64 device? Absolutely. I'd guess at least 100% slower with the tiny model and 400% slower with the base model. But it gets literally the same results, and since I'm only dealing with offline recordings, not live ones, the speed doesn't matter much. In practice it takes about a minute to transcribe each minute of the recording.
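To make those rough numbers concrete, here is a small sketch of the arithmetic. It assumes "N% slower" means the job takes (1 + N/100) times as long, and it uses my ballpark of one minute of processing per minute of audio; both function names are made up for illustration, and none of this is a measured benchmark.

```python
def slowdown_to_time_factor(percent_slower: float) -> float:
    """Convert 'N% slower' into a wall-clock multiplier.

    Assumption: '100% slower' means the job takes twice as long.
    """
    return 1.0 + percent_slower / 100.0


def estimate_minutes(audio_minutes: float, minutes_per_audio_minute: float = 1.0) -> float:
    """Estimate processing time from the length of the recording.

    The default of 1.0 is the rough 'a minute per minute' figure, not a benchmark.
    """
    return audio_minutes * minutes_per_audio_minute


if __name__ == "__main__":
    print(slowdown_to_time_factor(100))  # guessed slowdown for the tiny model
    print(slowdown_to_time_factor(400))  # guessed slowdown for the base model
    print(estimate_minutes(45))          # a hypothetical 45-minute episode
```

For a batch job that runs overnight, a multiplier like this only changes when the subtitles are ready, not what they contain.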

ggml-org/whisper.cpp

This is the main project repo. As of today, Whisper is well supported when it comes to generating .srt subtitles. Both the tiny and base models work just fine. I have no need for a web UI, so this works just peachy and will be easier to automate.
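As a starting point for the automation, here is a minimal sketch of how one episode could be run through ffmpeg and whisper.cpp from Python. It assumes a recent whisper.cpp build where the binary is called `whisper-cli` and is on the PATH, that a ggml model has already been downloaded to `models/ggml-base.bin`, and that ffmpeg is installed. Those paths, the file names, and the helper names are placeholders for illustration, not my final setup.

```python
import subprocess
from pathlib import Path

WHISPER_BIN = "whisper-cli"     # assumption: binary name/location of the whisper.cpp build
MODEL = "models/ggml-base.bin"  # assumption: path to a downloaded ggml model


def build_ffmpeg_cmd(src: Path, wav: Path) -> list[str]:
    """Build the ffmpeg call that converts audio to 16 kHz mono 16-bit WAV."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]


def build_whisper_cmd(wav: Path, out_prefix: Path, model: str = MODEL) -> list[str]:
    """Build the whisper.cpp call; -osrt writes <out_prefix>.srt."""
    return [WHISPER_BIN, "-m", model, "-f", str(wav),
            "-osrt", "-of", str(out_prefix)]


def transcribe(src: Path) -> Path:
    """Convert one recording, transcribe it, and return the .srt path."""
    wav = src.with_name(src.stem + ".16k.wav")  # avoid overwriting a .wav source
    out_prefix = src.with_name(src.stem)
    subprocess.run(build_ffmpeg_cmd(src, wav), check=True)
    subprocess.run(build_whisper_cmd(wav, out_prefix), check=True)
    return Path(str(out_prefix) + ".srt")


if __name__ == "__main__":
    # Hypothetical file name; replace with a real episode.
    print(transcribe(Path("episode01.mp3")))
```

Keeping the command building separate from the `subprocess.run` calls makes it easy to print the commands and check them by hand before letting a cron job loose on the whole back catalogue.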