๐Ÿ—ฃ
RP AIA ยท an area of the work

Voice AI

Voice intelligence โ€” speech in, speech understood, words read back.

BUILDING
The keyboard is the narrow gate between a fast mind and the machine โ€” thought outruns typing. Speech is the most natural human channel, yet most voice tools are cloud-bound transcribers that miss your intent and leak your words. Voice AI exists to make speaking to your machine โ€” and being understood โ€” effortless, private, and two-way.
Voice AI is the voice layer โ€” a two-way spoken channel. Speech becomes intent becomes action, and text is read back aloud. Speaker- and prosody-aware, it is the spoken interface to everything across the platform.
Speaking is about 3ร— faster than typing โ€” and 20โ€“63% more accurate. Pair voice with AI and one idea travels from mind to working solution roughly 5โ€“6ร— faster. The keyboard is the narrow pipe; voice + AI widen it.
โš™ Market baselines ยท placeholders pending RP-measured rates
0ร—
voice vs typing speed
Stanford/Baidu 2016
0 wpm
average typing
Dhakal 2018
0 wpm
average speaking
VirtualSpeech
~0ร—
end-to-end gain, voice + AI
compounded
Handwriting 13 wpm
Typing (avg) 52 wpm
Speaking 150 wpm
Speaking (fast) 220 wpm
Stageโ‘  Without AIโ‘ก With AI ยท voiceโ‘ข Gain
โ‘  Thinkform the idea ~12 min/idea (โ‰ˆ5 ideas/hr) AI prompts & seeds the idea โ†’ ~8 min ~1.5ร—
โ‘ก Convey500 words โ†’ the machine type 9.6 min ยท 52 wpm speak it ยท 150 wpm โ†’ 3.3 min 2.9ร—
โ‘ข Buildidea โ†’ working solution hand-built (baseline) AI expands the seed ยท 55% faster ~2.2ร—
ฮฃ End-to-endmind โ†’ solution ~55 min / idea ~10 min / idea โ‰ˆ5โ€“6ร—

Worked cost of moving one ~500-word idea from mind โ†’ machine โ†’ solution. Thought itself runs at ~400โ€“800 wpm โ€” far ahead of any output channel; the funnel's job is to widen the slowest pipe.

Sources: Ruan 2016 (Stanford/Baidu)Dhakal 2018 โ€” 136M keystrokesBrysbaert 2019 โ€” reading rateGitHub Copilot RCT

๐ŸŒฑ Seed
A faster way in โ€” dictate instead of type, fully on-device.
โ† shaped by the keyboard bottleneck: thought outruns the hands.
๐Ÿ›ค Path
Built the on-device speech-to-text pipeline with a local GUI and a hold-to-talk hotkey โ€” speech to text, anywhere on the Mac.
โ† shaped by the local-first rule โ€” nothing leaves the machine.
๐Ÿ”€ Pivot
Transcription alone wasn't enough. Added vocab-correction and on-demand on-device language-model refinement โ€” capturing not just the words, but the intent behind them.
โ† shaped by the realization that words โ‰  intent; raw transcripts drop domain terms and meaning.
๐Ÿ’Ž Crystal
Voice AI stopped being a dictation app and became the voice LAYER โ€” perception โ†’ intent โ†’ execution, the spoken interface to the whole platform.
โ† shaped by the stack principle โ€” phases are layers, not separate products.
โญ Principle
Voice as a natural two-way channel to all intelligence โ€” speak to it, it speaks back, in your language, on your device.
โ† shaped by the moonshot โ€” freeing human attention for higher thinking.
  • โœ“Voice pipeline built: STT โ†’ intent โ†’ execution router
  • โœ“On-device speech-to-text verified
  • โœ“Intent layer via tool-use โ€” text mode verified
  • โœ“Push-to-talk capture working
  • โ†’Speaker clustering via voice embeddings
  • โ†’Voice-activity detection + audio-type classification
  • โ†’Two-way voice assistant on the site (the orb)
  • โ†’Prosody-aware understanding
โ˜… the moonshot

Voice as a natural two-way channel to all intelligence โ€” speak to it, and it speaks back, in your language.

Imagine this working on your everyday tasks. The deepest how reveals itself when we build it together.

Build with me โ†’ See how it all fits โ€” RARE
โŒ‚Home
๐Ÿ”ŠOm
๐ŸŽ™Ask โœฆVision โ—ทRoadmap