As I have previously stated, voice messages are evil. Unfortunately, there’s a lot of evil in this world and it can hardly ever be eradicated, so we should find a way to live with it. Fortunately, there’s no stopping progress, and new tools can make living with it a little more comfortable.
Yesterday I suddenly realized that whisper.cpp should run rather well on a modern phone, and that one could perhaps make a convenient local tool to parse voice messages. Of course, I know that Telegram, for one, has this built in, but I don’t feel like paying for a subscription for this purpose alone, and then again, voice messages are not exclusive to Telegram. A local tool, on the other hand, could receive voice messages from any messenger app (at least all the apps I use let you “share” a voice message as an audio file).
A quick search of app stores and GitHub showed that I wasn’t the first person on the planet to think of running whisper.cpp on a phone. However, the apps that do that can’t receive audio files through Android’s standard API (the “share” one). As usual, if you want something done, you do it yourself.
I bootstrapped from a year-old article by Javed Alam that lays out running whisper.cpp in Termux in great detail. I had to figure out some of the workings of Termux itself to build further upon it, and ended up with an executable script in ~/bin/ that goes somewhat like this:
Naturally, ~/whisper.cpp holds the repo clone with the executable built and the models downloaded (according to Javed’s instructions). I opted for small, which handles Russian much better than base, although it’s noticeably slower (base works almost instantly, while small takes several seconds).
The script is the secret sauce, in fact. If it exists in ~/bin/, Termux tells the system that it can accept files, and Termux then appears in that “share” menu. You can then send voice messages into it as files from Telegram or WhatsApp or wherever. ffmpeg will transcode them to the required format, and whisper.cpp will parse them and show them to you as human-readable text.
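As a rough illustration of the flow described above, here is a minimal sketch of such a handler; it is not the exact script from this post. It assumes the script is saved as ~/bin/termux-file-editor (the name Termux looks for when a file is shared to it), that the whisper.cpp binary sits at build/bin/whisper-cli (older builds call it main), and that the small model lives at models/ggml-small.bin. The transcribe function and the WHISPER_* variables are names made up for this example.

```shell
#!/data/data/com.termux/files/usr/bin/bash
# Sketch of ~/bin/termux-file-editor: Termux is assumed to run this script
# with the path of the shared file as its first argument.

# Assumed layout: whisper.cpp cloned and built in ~/whisper.cpp.
# Older builds name the binary ./main instead of build/bin/whisper-cli.
WHISPER_DIR="${WHISPER_DIR:-$HOME/whisper.cpp}"
WHISPER_BIN="${WHISPER_BIN:-$WHISPER_DIR/build/bin/whisper-cli}"
WHISPER_MODEL="${WHISPER_MODEL:-$WHISPER_DIR/models/ggml-small.bin}"
WHISPER_LANG="${WHISPER_LANG:-auto}"   # "auto" lets whisper.cpp detect the language

# transcribe FILE: convert FILE to WAV and print the recognized text.
transcribe() {
    local input="$1"
    local workdir wav status

    workdir="$(mktemp -d)" || return 1
    wav="$workdir/voice.wav"

    # whisper.cpp expects 16 kHz mono 16-bit PCM WAV input.
    if ffmpeg -nostdin -loglevel error -y -i "$input" \
              -ar 16000 -ac 1 -c:a pcm_s16le "$wav"; then
        # -nt drops timestamps, -np suppresses everything but the text.
        "$WHISPER_BIN" -m "$WHISPER_MODEL" -l "$WHISPER_LANG" -nt -np -f "$wav"
        status=$?
    else
        status=1
    fi

    rm -rf "$workdir"   # clean up the temporary WAV either way
    return "$status"
}

# Run only when a file path was actually passed in.
if [ "$#" -gt 0 ]; then
    transcribe "$1"
fi
```

The file has to be executable (chmod +x) for Termux to pick it up. Setting WHISPER_LANG=ru instead of auto should skip language detection for Russian messages.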
This doesn’t solve searching for the information from these messages in chat, of course. And all in all, it’s a kludge. But you can at least know what the message is about when you have no way — or will — to listen to it; that’s good.
And now I hesitate: should I leave it as is, or should I develop a full-blown app? The app would be nicer and more comfortable to use, but I’ve never developed anything for Android; it’ll cost a fortune in tokens…