srthorat/README.md

👋 Welcome to My GitHub Profile!

Hi, I'm Sakharam Thorat 👨‍💻
AI Voice Systems Engineer | Real-time Speech AI | Conversational Agents

🚀 I specialize in building production-grade real-time voice AI systems — from telephony pipelines to speech-to-speech agents.

My work spans:

🎙️ Speech AI (STT, TTS, Speech-to-Speech)
⚡ Low-latency real-time systems
📡 SIP / VoIP / streaming architectures
🧠 Voice agents & conversational AI
🧩 On-device AI & optimization

🌟 About Me

📍 Based in Pune, India
💼 10+ years in real-time communication systems (Vonage, Avaya)
🔊 Built end-to-end voice AI pipelines (STT → LLM → TTS)
⚡ Focused on low-latency conversational systems (response times from under 200 ms to 2 s)
🧠 Exploring AI voice agents, speech models & edge AI

📧 Reach me at: srt.2011@outlook.com
🔗 LinkedIn

🧠 What I Work On

🎙️ Voice AI & Speech Systems

Real-time speech-to-speech AI (Azure OpenAI)
Multi-provider STT (Deepgram, Google, Azure, Whisper)
TTS pipelines (Edge, streaming synthesis)
Voice agents with multi-turn conversations

⚡ Real-time Systems & Telephony

SIP / RTP / VoIP pipelines
WebRTC + WebSocket streaming
Codec optimization (G.711, G.729, Opus)
Packetization (ptime), jitter & packet loss handling
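
As a rough illustration of the packetization point above, this sketch computes the RTP payload size and packet rate for constant-bitrate codecs at a given ptime. The function names are hypothetical and not taken from any of my repositories. The bitrates are the standard 64 kbit/s for G.711 and 8 kbit/s for G.729. Opus is left out because its bitrate is variable.

```python
def payload_bytes(bitrate_bps: int, ptime_ms: int) -> int:
    """Payload bytes carried by one RTP packet of a constant-bitrate codec."""
    bits_per_packet = bitrate_bps * ptime_ms // 1000
    return bits_per_packet // 8


def packets_per_second(ptime_ms: int) -> float:
    """Number of RTP packets sent per second for a given ptime."""
    return 1000 / ptime_ms


# Standard constant bitrates (bits per second).
G711_BPS = 64_000
G729_BPS = 8_000

for ptime in (10, 20, 30):
    print(
        f"ptime={ptime} ms: "
        f"G.711={payload_bytes(G711_BPS, ptime)} B, "
        f"G.729={payload_bytes(G729_BPS, ptime)} B, "
        f"{packets_per_second(ptime):.1f} packets/s"
    )
```

A larger ptime lowers the packet rate and the header overhead, but adds buffering delay and makes each lost packet cost more audio.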

🧪 Speech ML & Audio Processing

Noise suppression (RNN-based models trained on A100)
CTC-based ASR experimentation
NVIDIA NeMo, Vosk (Kaldi-based)
Audio quality evaluation (PESQ, MOS)

📦 Edge AI & Optimization

Whisper.cpp, Parakeet (on-device STT)
Model quantization for low-memory inference
CPU-optimized real-time pipelines

🚀 Featured Projects

🎤 Voice AI Agents

aura-sde-interview-agent
Real-time voice AI agent for Google SDE interviews
→ Full loop: Behavioural, Coding, System Design, Debugging
→ Built with Gemini Live + Vertex AI
→ Live grading + voice-only interaction

🔊 Speech-to-Text Systems

nemotron-stt (In Progress)
High-concurrency WebSocket ASR server using Nemotron-0.6B
whisper-stt
Production-grade streaming STT using faster-whisper turbo
qwen3-stt
Voice AI agent design + STT experimentation

🌍 Speech Translation

VeloxTx (In Progress)
Multilingual low-latency translation engine
→ “Train anywhere, run anywhere” philosophy
velox-realtime (In Progress)
CPU-optimized real-time ASR (<30ms latency)

📊 Benchmarking & Research

x-benchmark-tests
Benchmarking STT models (accuracy, latency, throughput)
stt-systems-lab (In Progress)
Experimental lab for:
- Nemotron
- Whisper Turbo
- Qwen3
- ONNX / browser inference

📞 Telephony + Voice Infrastructure

freeswitch-speech-ai
Real-time:
- Transcription
- Translation
- Sentiment
- NER
- Call summarization
pjsua2-python
Prebuilt Python bindings for PJSIP (SIP stack)

🧩 Voice AI Platforms & Frameworks

slm_framework_full (In Progress)
Unified framework for:
- STT, TTS, Vocoder
- Small Language Models
- Real-time inference

🛠 Core Tech Stack

🎙️ Speech & AI

STT: Whisper, Deepgram, Azure, Google
TTS: Edge, streaming pipelines
Speech Models: NeMo, Vosk (Kaldi-based)
Metrics: WER, PESQ, MOS
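
For reference, WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. Below is a minimal sketch, assuming whitespace tokenization and no text normalization. The `wer` function is illustrative and not code from any of the listed projects.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Real benchmarks usually normalize case, punctuation and numerals before scoring, which this sketch skips.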

⚡ Real-time & Infra

FastAPI, WebSockets, WebRTC
LiveKit, Cloud Run, Vertex AI
Kubernetes, Docker

💻 Programming

Python (primary)
TypeScript / JavaScript
Java

🚀 What I’m Building Next

Real-time AI voice agents with tool-calling
Ultra-low latency speech-to-speech systems
On-device conversational AI (edge-first)
Production-grade multilingual voice pipelines

🤝 Let’s Connect

💼 LinkedIn
📧 srt.2011@outlook.com

⭐ If you're working on Voice AI / Speech Systems / Real-time AI — let’s collaborate!

Pinned

aura-sde-interview-agent (Public)

Real-time voice AI agent for Google SDE mock interviews. Built with Google ADK + Gemini Live native audio on Vertex AI. Full interview loop (Behavioural, Coding, System Design, Debugging, Debrief) …

Python 2
pjsua2-python (Public)

Prebuilt cross-platform Python wheels for PJSUA2, version-matched to upstream PJSIP

Python