Today I discovered…
Whisper.cpp
A plain C/C++ implementation of inference for OpenAI’s Whisper automatic speech recognition (ASR) model, with no dependencies
💖 What I like about Whisper.cpp:
Quick setup with no hiccups (surprising to me, as it is common to run into one or two issues when compiling a C project)
Plenty of ready-to-use examples for almost every use case I had in mind, such as file transcription, live stream transcription, talk to llama, command/intent classification, Node.js bindings, Android/iOS transcription, web support using WebAssembly (wasm), etc.
Impressive performance in transcribing a short English audio file
👎 What I dislike about Whisper.cpp:
I didn’t have much success with the multilingual usage, even with the medium model
It used almost all of my 4-core CPU, around 350% (even with the base model)
It used 2-3x more resources than expected. For example, in the stream use case, the base.en model used close to 2GB of memory compared to the expected 850MB (tested on Linux with an Intel CPU). The medium model performed decently for daily general-purpose usage, but its resource requirements were too high to be universally acceptable on low to mid-range computers.
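To reproduce CPU and memory figures like the ones above, here is a minimal sketch that runs any command and reports its CPU percentage and peak memory. It uses only the Python standard library and assumes Linux, where `ru_maxrss` is reported in KiB. The `measure` helper is a hypothetical name introduced for illustration, not part of whisper.cpp.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd to completion; return exit code, wall time, CPU % and peak RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU time (user + system) consumed by the child we just waited for.
    cpu_seconds = (after.ru_utime - before.ru_utime) + (
        after.ru_stime - before.ru_stime
    )
    return {
        "exit_code": completed.returncode,
        "wall_seconds": wall,
        # Same convention as top: 4 fully busy cores show up as about 400%.
        "cpu_percent": 100.0 * cpu_seconds / wall if wall > 0 else 0.0,
        # Assumption: Linux, where ru_maxrss is in KiB. It is a high-water
        # mark across all children waited for so far, so measure one
        # command per script run for a clean number.
        "peak_rss_mb": after.ru_maxrss / 1024.0,
    }


# Stand-in command so the sketch runs anywhere; swap in the transcription
# command you actually want to profile.
stats = measure([sys.executable, "-c", "pass"])
print(stats)
```

Replacing the stand-in command with your own whisper.cpp invocation should give numbers comparable to what `top` shows, though a long-running stream session has to exit before the figures are reported.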
Overall, my recommendation for general daily usage: use a quantized small or medium model and stick to English input. I have yet to try the OpenVINO optimisation and measure its impact. I will update this recommendation when I have deeper insights.
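The recommendation above could be wired into a script roughly as follows. This is a sketch under assumptions: the binary name `./main` matches the whisper.cpp build I am aware of but may differ in other versions, the quantized model filename is hypothetical, and the two-thread cap is an illustrative choice to keep CPU usage down. The `-m`, `-f`, `-l` and `-t` options are the model, input file, language and thread-count flags of the whisper.cpp command-line example; check `--help` on your build before relying on them.

```python
import shlex


def build_transcribe_command(
    audio_path,
    model_path,
    binary="./main",  # assumption: CLI binary name; may differ per version
    language="en",    # stick to English input, per the recommendation
    threads=2,        # illustrative cap to limit CPU usage
):
    """Return the argument list for a whisper.cpp file transcription run."""
    return [
        binary,
        "-m", model_path,    # path to the ggml model file
        "-f", audio_path,    # input audio file
        "-l", language,      # spoken language
        "-t", str(threads),  # number of threads
    ]


# Hypothetical quantized model path, for illustration only.
cmd = build_transcribe_command(
    "samples/jfk.wav", "models/ggml-small.en-q5_0.bin"
)
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting issues.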
⭐ Ratings and metrics
Based on my experience, I would rate this project as follows:
Production readiness: 8/10
Docs rating: 6/10
Time to POC (proof of concept): less than a day
Author: Georgi Gerganov @ggerganov
Demo | Source
🛡 License: MIT
Tech Stack: C, C++
If you discovered an interesting Open-Source project and want me to feature it in the newsletter, get in touch with me.
To support this newsletter and Open-Source authors, follow #OpenSourceDiscovery on LinkedIn and Twitter
I was working on a local/offline speech-to-text app and needed to figure out a way to use Whisper. There was one constraint: I could not take on any additional dependencies. I needed to minimize the app's disk and runtime footprint to make it practical for daily use.
After making some progress on my own, I landed on whisper.cpp, an Open Source project created by Georgi Gerganov (he also created the popular llama.cpp). I was surprised to see that it had already attracted massive interest (34k stars) and received contributions from close to 400 other people.
My experiments continued and this review is based on what I have learned after trying Whisper.cpp for a day.
By now, most of you probably already know about Whisper. If not, or if you just want to dive deeper, check out the following links:
* Paper - https://cdn.openai.com/papers/whisper.pdf
* Whisper intro by OpenAI - https://github.com/openai/whisper
* Official Whisper python package by OpenAI - https://github.com/openai/whisper