
👋 Welcome to My GitHub Profile!

Hi, I'm Sakharam Thorat 👨‍💻
AI Voice Systems Engineer | Real-time Speech AI | Conversational Agents


🚀 I specialize in building production-grade real-time voice AI systems, from telephony pipelines to speech-to-speech agents.

My work spans:

  • 🎙️ Speech AI (STT, TTS, Speech-to-Speech)
  • ⚡ Low-latency real-time systems
  • 📡 SIP / VoIP / streaming architectures
  • 🧠 Voice agents & conversational AI
  • 🧩 On-device AI & optimization

🌟 About Me

  • πŸ“ Based in Pune, India
  • πŸ’Ό 10+ years in real-time communication systems (Vonage, Avaya)
  • πŸ”Š Built end-to-end voice AI pipelines (STT β†’ LLM β†’ TTS)
  • ⚑ Focused on low-latency conversational systems (<200ms–2s)
  • 🧠 Exploring AI voice agents, speech models & edge AI

πŸ“§ Reach me at: srt.2011@outlook.com
πŸ”— LinkedIn


🧠 What I Work On

πŸŽ™οΈ Voice AI & Speech Systems

  • Real-time speech-to-speech AI (Azure OpenAI)
  • Multi-provider STT (Deepgram, Google, Azure, Whisper)
  • TTS pipelines (Edge, streaming synthesis)
  • Voice agents with multi-turn conversations

⚡ Real-time Systems & Telephony

  • SIP / RTP / VoIP pipelines
  • WebRTC + WebSocket streaming
  • Codec optimization (G.711, G.729, Opus)
  • Packetization (ptime), jitter & packet loss handling

🧪 Speech ML & Audio Processing

  • Noise suppression (RNN-based training on A100)
  • CTC-based ASR experimentation
  • NVIDIA NeMo, Vosk (Kaldi-based)
  • Audio quality evaluation (PESQ, MOS)

📦 Edge AI & Optimization

  • Whisper.cpp, Parakeet (on-device STT)
  • Model quantization for low-memory inference
  • CPU-optimized real-time pipelines

🚀 Featured Projects

🎤 Voice AI Agents

  • aura-sde-interview-agent
    Real-time voice AI agent for Google SDE interviews
    → Full loop: Behavioural, Coding, System Design, Debugging
    → Built with Gemini Live + Vertex AI
    → Live grading + voice-only interaction

🔊 Speech-to-Text Systems

  • nemotron-stt (In Progress)
    High-concurrency WebSocket ASR server using Nemotron-0.6B

  • whisper-stt
    Production-grade streaming STT using faster-whisper turbo

  • qwen3-stt
    Voice AI agent design + STT experimentation


🌍 Speech Translation

  • VeloxTx (In Progress)
    Multilingual low-latency translation engine
    → “Train anywhere, run anywhere” philosophy

  • velox-realtime (In Progress)
    CPU-optimized real-time ASR (<30ms latency)


📊 Benchmarking & Research

  • x-benchmark-tests
    Benchmarking STT models (accuracy, latency, throughput)

  • stt-systems-lab (In Progress)
    Experimental lab for:

    • Nemotron
    • Whisper Turbo
    • Qwen3
    • ONNX / browser inference

📞 Telephony + Voice Infrastructure

  • freeswitch-speech-ai
    Real-time:

    • Transcription
    • Translation
    • Sentiment
    • NER
    • Call summarization
  • pjsua2-python
    Prebuilt Python bindings for PJSIP (SIP stack)


🧩 Voice AI Platforms & Frameworks

  • slm_framework_full (In Progress)
    Unified framework for:
    • STT, TTS, Vocoder
    • Small Language Models
    • Real-time inference

🛠 Core Tech Stack

🎙️ Speech & AI

  • STT: Whisper, Deepgram, Azure, Google
  • TTS: Edge, streaming pipelines
  • Speech Models: NeMo, Vosk, Kaldi-based
  • Metrics: WER, PESQ, MOS

⚡ Real-time & Infra

  • FastAPI, WebSockets, WebRTC
  • LiveKit, Cloud Run, Vertex AI
  • Kubernetes, Docker

💻 Programming

  • Python (primary)
  • TypeScript / JavaScript
  • Java

🚀 What I’m Building Next

  • Real-time AI voice agents with tool-calling
  • Ultra-low latency speech-to-speech systems
  • On-device conversational AI (edge-first)
  • Production-grade multilingual voice pipelines

🀝 Let’s Connect


⭐ If you're working on Voice AI / Speech Systems / Real-time AI β€” let’s collaborate!
