Hi, I'm Sakharam Thorat 👨‍💻
AI Voice Systems Engineer | Real-time Speech AI | Conversational Agents
I specialize in building production-grade real-time voice AI systems, from telephony pipelines to speech-to-speech agents.
My work spans:
- 🎙️ Speech AI (STT, TTS, Speech-to-Speech)
- ⚡ Low-latency real-time systems
- 📡 SIP / VoIP / streaming architectures
- 🧠 Voice agents & conversational AI
- 🧩 On-device AI & optimization
- Based in Pune, India
- 💼 10+ years in real-time communication systems (Vonage, Avaya)
- Built end-to-end voice AI pipelines (STT → LLM → TTS)
- ⚡ Focused on low-latency conversational systems (<200 ms to 2 s)
- Exploring AI voice agents, speech models & edge AI
📧 Reach me at: srt.2011@outlook.com
LinkedIn
- Real-time speech-to-speech AI (Azure OpenAI)
- Multi-provider STT (Deepgram, Google, Azure, Whisper)
- TTS pipelines (Edge, streaming synthesis)
- Voice agents with multi-turn conversations
- SIP / RTP / VoIP pipelines
- WebRTC + WebSocket streaming
- Codec optimization (G.711, G.729, Opus)
- Packetization (ptime), jitter & packet loss handling
- Noise suppression (RNN-based training on A100)
- CTC-based ASR experimentation
- NVIDIA NeMo, Vosk (Kaldi-based)
- Audio quality evaluation (PESQ, MOS)
- Whisper.cpp, Parakeet (on-device STT)
- Model quantization for low-memory inference
- CPU-optimized real-time pipelines
- aura-sde-interview-agent: Real-time voice AI agent for Google SDE interviews
  - Full loop: Behavioural, Coding, System Design, Debugging
  - Built with Gemini Live + Vertex AI
  - Live grading + voice-only interaction
- nemotron-stt (In Progress): High-concurrency WebSocket ASR server using Nemotron-0.6B
- whisper-stt: Production-grade streaming STT using faster-whisper turbo
- qwen3-stt: Voice AI agent design + STT experimentation
- VeloxTx (In Progress): Multilingual low-latency translation engine
  - "Train anywhere, run anywhere" philosophy
- velox-realtime (In Progress): CPU-optimized real-time ASR (<30 ms latency)
- x-benchmark-tests: Benchmarking STT models (accuracy, latency, throughput)
- stt-systems-lab (In Progress): Experimental lab for:
  - Nemotron
  - Whisper Turbo
  - Qwen3
  - ONNX / browser inference
- freeswitch-speech-ai: Real-time:
  - Transcription
  - Translation
  - Sentiment
  - NER
  - Call summarization
- pjsua2-python: Prebuilt Python bindings for PJSIP (SIP stack)
- slm_framework_full (In Progress): Unified framework for:
  - STT, TTS, Vocoder
  - Small Language Models
  - Real-time inference
- STT: Whisper, Deepgram, Azure, Google
- TTS: Edge, streaming pipelines
- Speech Models: NeMo, Vosk, Kaldi-based
- Metrics: WER, PESQ, MOS
- FastAPI, WebSockets, WebRTC
- LiveKit, Cloud Run, Vertex AI
- Kubernetes, Docker
- Python (primary)
- TypeScript / JavaScript
- Java
- Real-time AI voice agents with tool-calling
- Ultra-low latency speech-to-speech systems
- On-device conversational AI (edge-first)
- Production-grade multilingual voice pipelines
- 💼 LinkedIn
- 📧 srt.2011@outlook.com
If you're working on Voice AI / Speech Systems / Real-time AI, let's collaborate!

