$ cat ~/work/vani.case
Vani.
Desktop app · In progress
// A cross-platform desktop dictation app — press a hotkey, speak naturally, and get AI-polished text injected into any active app via Whisper STT and Claude.
// impact at a glance
- Global hotkey dictation that injects polished text into any active app without focus loss
- Whisper STT + Claude cleanup pipeline: transcription and editing in under 3 seconds
- Machine-specific encrypted key storage, content protection, and no backend relay
// summary
Vani is an Electron desktop app for macOS and Windows. Press a global hotkey from any app, speak naturally, and get polished text injected directly into the focused window — powered by OpenAI Whisper for transcription and Claude for cleanup, with local model support planned.
// problem
Dictation tools either require constant app-switching or are locked to one input field. There is no frictionless way to speak naturally and get edited text wherever your cursor already is.
// what I built
A floating pill overlay appears on hotkey press, records mic audio with a live waveform, transcribes via Whisper, cleans up with Claude, and injects the result directly into the focused window — all without leaving your current app.
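The flow described above can be sketched as one small orchestration function. This is a minimal sketch under assumptions: the `DictationDeps` interface and the `runDictation` name are hypothetical, and the real Whisper, Claude, and injection calls are stood in for by injected functions so the control flow can be read on its own.

```typescript
// Hypothetical dependency surface; the names are illustrative, not Vani's actual API.
interface DictationDeps {
  record: () => Promise<Uint8Array>; // captured mic audio
  transcribe: (audio: Uint8Array) => Promise<string>; // e.g. a Whisper call
  cleanUp: (raw: string) => Promise<string>; // e.g. a Claude call
  inject: (text: string) => Promise<void>; // puts text into the focused window
}

interface DictationResult {
  raw: string;
  polished: string;
  injected: boolean;
}

// Runs one hotkey-triggered dictation: record, transcribe, clean, inject.
async function runDictation(deps: DictationDeps): Promise<DictationResult> {
  const audio = await deps.record();
  const raw = (await deps.transcribe(audio)).trim();

  // Nothing was said: skip the cleanup call and inject nothing.
  if (raw.length === 0) {
    return { raw, polished: "", injected: false };
  }

  let polished: string;
  try {
    polished = (await deps.cleanUp(raw)).trim() || raw;
  } catch {
    // Assumption: if cleanup fails, the raw transcript is better than losing the dictation.
    polished = raw;
  }

  await deps.inject(polished);
  return { raw, polished, injected: true };
}
```

Keeping the four stages behind one interface would also make it simpler to swap the cloud calls for a local runtime later without touching the orchestration.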
// core experience
- Press Cmd+Shift+Space from any app: a floating overlay appears and starts recording immediately
- Live waveform feedback with silence auto-stop; transcription and cleanup in under 3 seconds
- Full dashboard for history, notes, model settings, and usage, accessible from the system tray
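Silence auto-stop can be approximated by tracking the RMS level of incoming audio frames and stopping once the level stays below a threshold for long enough. The sketch below assumes frames arrive as `Float32Array` samples in the range [-1, 1]; the `SilenceDetector` name, the 0.01 threshold, and the 1500 ms window are illustrative values, not Vani's actual settings.

```typescript
// Root-mean-square level of one frame of PCM samples in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumSquares += frame[i] * frame[i];
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Hypothetical detector: reports true once the input has been quiet for `silenceMs`.
class SilenceDetector {
  private readonly threshold: number;
  private readonly silenceMs: number;
  private quietMs = 0;
  private heardSpeech = false;

  constructor(threshold = 0.01, silenceMs = 1500) {
    this.threshold = threshold; // assumed RMS level below which a frame counts as silence
    this.silenceMs = silenceMs; // assumed quiet time before auto-stop
  }

  // Feed one frame and its duration; returns true when recording should stop.
  push(frame: Float32Array, frameMs: number): boolean {
    if (rms(frame) >= this.threshold) {
      this.heardSpeech = true;
      this.quietMs = 0;
      return false;
    }
    // Do not auto-stop before the user has said anything.
    if (!this.heardSpeech) return false;
    this.quietMs += frameMs;
    return this.quietMs >= this.silenceMs;
  }
}
```

The same per-frame RMS value could drive the live waveform, so one pass over each frame would serve both the visual feedback and the stop decision.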
// architecture
- Electron 30 main process with an IPC surface for transcription, cleanup, text injection, notes, and history
- React 18 + Vite renderer for both the dashboard and the floating overlay pill
- OpenAI Whisper for STT, Claude for cleanup; electron-store with machine-specific AES encryption
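One way to make stored keys machine-specific is to derive the encryption key from identifiers of the current machine, so a copied config file does not decrypt elsewhere. The sketch below uses only Node's built-in `crypto` and `os` modules. The fingerprint recipe, the salt string, and the choice of AES-256-GCM are assumptions made for illustration; the source states only that electron-store is used with machine-specific AES encryption.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";
import { hostname, platform, userInfo } from "node:os";

// Assumption: a fingerprint built from OS-level identifiers. A real app might
// prefer a more stable hardware or OS-issued machine ID.
function machineFingerprint(): string {
  return [hostname(), userInfo().username, platform()].join("|");
}

// Derive a 32-byte AES-256 key from the fingerprint. The salt is a hypothetical app constant.
function deriveKey(fingerprint: string, salt = "vani-key-store-v1"): Buffer {
  return scryptSync(fingerprint, salt, 32);
}

// Encrypt with AES-256-GCM; output is base64 of iv (12 bytes) + auth tag (16 bytes) + ciphertext.
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Decrypt; throws if the key is wrong (for example, on another machine) or the data was modified.
function decryptSecret(payload: string, key: Buffer): string {
  const data = Buffer.from(payload, "base64");
  const iv = data.subarray(0, 12);
  const tag = data.subarray(12, 28);
  const ciphertext = data.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

A typical call would be `encryptSecret(apiKey, deriveKey(machineFingerprint()))` before writing the value to disk. electron-store also accepts an `encryptionKey` option, so a derived key could be passed there instead; whether Vani does so is not stated in the source.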
// ai involvement
Whisper handles speech-to-text; Claude cleans up the raw transcript into polished, context-aware prose. Local model runtime via Faster-Whisper and llama.cpp is the next milestone.
// challenges
- Injecting text reliably into any active app across macOS and Windows without stealing focus
- Keeping the overlay lightweight and hidden from screen recorders with content protection enabled
- Machine-specific encryption for API key storage without a backend or cloud dependency
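A common way to put text into whatever app is focused is to go through the clipboard: save its contents, write the new text, send a paste keystroke, then restore what was there. The source does not say which technique Vani uses, so the sketch below is only an illustration of that general approach. The `InjectionPlatform` interface, the `injectViaClipboard` name, and the 50 ms settle delay are hypothetical.

```typescript
// Hypothetical platform surface. In Electron, the clipboard half could map to
// clipboard.readText / clipboard.writeText; the keystroke needs a native helper.
interface InjectionPlatform {
  readClipboard: () => string;
  writeClipboard: (text: string) => void;
  sendPasteKeystroke: () => Promise<void>; // Cmd+V on macOS, Ctrl+V on Windows
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Pastes `text` into the window that currently has focus, then restores the clipboard.
async function injectViaClipboard(
  platform: InjectionPlatform,
  text: string,
  settleMs = 50, // assumed delay so the target app reads the clipboard before it is restored
): Promise<void> {
  const previous = platform.readClipboard();
  platform.writeClipboard(text);
  try {
    await platform.sendPasteKeystroke();
    await sleep(settleMs);
  } finally {
    // Restore even if the keystroke fails, so the user's clipboard is not lost.
    platform.writeClipboard(previous);
  }
}
```

Because the overlay never takes keyboard focus in this approach, the paste lands in the app the user was already in. The settle delay is the fragile part: apps read the clipboard at different speeds, which fits the point that every app handles input events differently.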
// outcome
The core dictation loop works end to end: hotkey, record, transcribe, clean, inject. A local model runtime and macOS notarization are the remaining milestones before public release.
// why this matters
It shows I can build ambient, privacy-conscious desktop AI tools that disappear into the user's workflow rather than demanding attention.
// reflection
The hardest part is injection — every app handles focus and input events differently. Reliability here matters more than features.
// capabilities
// links