Greetings Citizens of Hopefully Useful AI.
It has come to my attention that there are plenty of videos, as well as workflows, that would get so much better if there were a way of textifying their audio content.
That being said, I hear Whisper, at least over the past 9 months or so, was the cream of the crop when it came to speech recognition, and was also open source to boot (shocker).
Therefore, I’d be quite pleased to know if anyone has created a method to more easily make use of the model, because dedicating mental space to remembering specific ad hoc commands does not make for a good long-term tool.
For reference, I can throw 24GB of VRAM at the problem if need be, and am running a Windows machine. Is there anything like Oobabooga or A1111? (Or a standard program would work just as nicely.) That would be very much appreciated.
Type in your answer, and ENRICH the future of Lemmy with your knowledge. (As well as answer one’s question, pretty please.)
Thank you very much for reading and have a most fine of days!
https://github.com/ahmetoner/whisper-asr-webservice
This project might be of interest to you: it is a web service/API for transcribing with Whisper. You can either use its web interface or make programmatic calls to the API.
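For the programmatic route, here is a rough sketch of what a call could look like from Python using only the standard library. It assumes the service is running locally on port 9000, that it accepts a POST to `/asr` with the audio in a multipart field named `audio_file`, and that it takes `task`, `output` and `language` query parameters. That is how I remember the project's README, so please check it against the version you install. The helper names (`build_asr_url`, `encode_multipart`, `transcribe`) are just made up for this example.

```python
import mimetypes
import uuid
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the web service is reachable here (adjust host/port to your setup).
BASE_URL = "http://localhost:9000"


def build_asr_url(base_url=BASE_URL, task="transcribe", output="txt", language=None):
    """Build the request URL; the /asr path and parameter names are assumptions."""
    params = {"task": task, "output": output}
    if language:
        params["language"] = language
    return f"{base_url.rstrip('/')}/asr?{urlencode(params)}"


def encode_multipart(field_name, filename, data, content_type):
    """Encode a single file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, **url_options):
    """Upload one audio/video file and return the response text from the service."""
    path = Path(path)
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    # "audio_file" is the assumed name of the upload field.
    body, header = encode_multipart("audio_file", path.name, path.read_bytes(), content_type)
    request = Request(
        build_asr_url(**url_options),
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the service up, something like `print(transcribe("lecture.mp4", language="en"))` should print the transcript (the file name is only a placeholder). You could loop that over a folder to batch-process videos without having to remember any commands.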
Oh, this is quite interesting. Quite interesting indeed! I approve of this. Seems like it may be exactly what I was looking for.
Much appreciated!