🎙️ Applied Speech Processing & Audio AI in Python

Welcome to the Applied Speech Processing & Audio AI repository. This project-based curriculum is a comprehensive roadmap that bridges the gap between low-level Digital Signal Processing (DSP) and high-level Artificial Intelligence.

Starting from the raw physics of sound waves, this repository guides you through extracting meaningful audio features, building machine learning classifiers, and ultimately deploying advanced modern architectures like Whisper (ASR), Zero-shot TTS, and real-time Voice Assistants.

🗺️ Course Roadmap & Curriculum

🎛️ Phase 1: Signals, Mathematics, & Tools

Lecture 1: Dissecting Sound in Python
- Concepts: Physics of sound, Analog-to-Digital Conversion (ADC), Sample Rate, Bit Depth, and the Nyquist Theorem.
- Lab: Loading audio and plotting waveforms using librosa and matplotlib.
Lecture 2: Decoding Sound with the Fourier Transform
- Concepts: Time vs. Frequency domains, the math behind the Discrete Fourier Transform (DFT).
- Lab: Utilizing np.fft to extract fundamental frequencies and comparing vowels vs. consonants.
Lecture 3: Seeing Sound with Spectrograms
- Concepts: Overcoming Fourier uncertainty, Short-Time Fourier Transform (STFT), and Windowing functions (Hanning/Hamming).
- Lab: Visualizing full audio signals over time and frequency.
Lecture 4: The Audio Fingerprint (MFCC)
- Concepts: The human auditory system, the Mel Scale, and Cepstral analysis.
- Lab: Extracting the 13 Mel-Frequency Cepstral Coefficients (MFCCs) and visualizing them as heatmaps.

🤖 Phase 2: Machine Learning on Audio

Lecture 5: Advanced Feature Extraction for ML
- Concepts: Zero-Crossing Rate (ZCR) and Spectral Centroid.
- Lab: Building structured pandas DataFrames to feed audio features into ML pipelines.
Lecture 6: Project 1 (Part A) - The World of Emotions
- Concepts: Introduction to the RAVDESS dataset.
- Lab: Batch processing 100+ audio files, MFCC extraction loop, data normalization, and train-test splitting.
Lecture 7: Project 1 (Part B) - Voice Lie Detector
- Concepts: Classical ML algorithms for audio classification.
- Lab: Implementing Random Forest and SVM via scikit-learn, evaluating accuracy matrices, and running a live microphone test to detect emotion/stress.

🧠 Phase 3: Modern AI Systems & Applications

Lecture 8: Project 2 - Wake Word Detection
- Concepts: Streaming vs. Static audio processing.
- Lab: Using pyaudio for live audio chunking and implementing a keyword trigger system (e.g., "Hey System").
Lecture 9: Project 3 - Offline Smart Typist (Speech-to-Text)
- Concepts: The evolution of ASR from HMMs to Transformers.
- Lab: Setting up OpenAI Whisper to achieve highly accurate English/Persian podcast transcriptions.
Lecture 10: Project 4 - Speaking AI (Text-to-Speech)
- Concepts: Online vs. Offline TTS engines and modern TTS architectures (e.g., VITS).
- Lab: Implementing rapid online speech with gTTS and offline voices with pyttsx3.

🌐 Phase 4: Advanced Processing & Deep Learning

Lecture 11: Project 5 - Biometric Authentication & Speaker ID
- Concepts: Speaker Embeddings (d-vector/x-vector) and Cosine Similarity math.
- Lab: Utilizing pyannote.audio to build a system that verifies who is speaking.
Lecture 12: Project 6 - Smart Noise Reduction
- Concepts: DSP Low-pass/High-pass filters and Spectral Gating.
- Lab: Magically cleaning background noise (e.g., hairdryer sounds) from audio files using noisereduce without degrading vocal quality.
Lecture 13: Project 7 - Voice Cloning with Deep Learning
- Concepts: Zero-shot TTS algorithms.
- Lab: Deploying the XTTS model on Google Colab to clone a human voice using just a 3-second audio sample.
Lecture 14: Final Project - Building the "Jarvis" Voice Assistant
- Concepts: System architecture integration (Microphone $\rightarrow$ Whisper $\rightarrow$ LLM $\rightarrow$ TTS).
- Lab: Assembling previous modules into a single, comprehensive Python script to create a fully conversational AI assistant. Discussion on freelancing and international market opportunities.

⚙️ Getting Started

Prerequisites

Make sure you have Python 3.8+ installed. It is highly recommended to use a virtual environment (venv or conda).

Installation

Clone the repository:

git clone [https://github.com/your-username/Applied-Speech-Processing-in-Python.git](https://github.com/your-username/Applied-Speech-Processing-in-Python.git)
cd Applied-Speech-Processing-in-Python


2. Install the required dependencies:
```bash
pip install -r requirements.txt

(Note: For Deep Learning models like Whisper and XTTS, you may need to install PyTorch separately based on your CUDA configuration). 3. Launch Jupyter Notebook:

jupyter notebook

👨‍💻 About the Author

Mostafa Kermani Nia | Full-Stack Engineer & Computer Engineering Mindset

I am a Full-Stack Engineer and the Chief Teaching Assistant for Artificial Intelligence at the University of Tehran. My technical philosophy revolves around bridging the gap between low-level systems programming (operating systems, memory management) and high-level Machine Learning architectures.

My ongoing research and professional work focus heavily on Systems for ML, efficient model architectures, and advanced computer vision/audio analysis, specifically concerning noise reduction and speaker identification algorithms.

If you have questions regarding the underlying mathematics, memory-efficient implementations, or scaling these audio models, feel free to open an issue or connect!

📜 License

This repository is licensed under the MIT License. See the LICENSE file for more information.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
01-Dissecting-Sound		01-Dissecting-Sound
02-FFT-Audio-Analysis		02-FFT-Audio-Analysis
03-Spectrogram		03-Spectrogram
04-MFCC-Feature-Extraction		04-MFCC-Feature-Extraction
05-ML-Features		05-ML-Features
06-Speech-Emotion-Recognition		06-Speech-Emotion-Recognition
07-Voice-Lie-Detector		07-Voice-Lie-Detector
08-Wake-Word-Detection		08-Wake-Word-Detection
09-Speech-to-Text		09-Speech-to-Text
10-Text-to-Speech		10-Text-to-Speech
11-Speaker-Identification		11-Speaker-Identification
12-Smart-Noise-Reduction		12-Smart-Noise-Reduction
13-Voice-Cloning		13-Voice-Cloning
14-Jarvis-Voice-Assistant		14-Jarvis-Voice-Assistant
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎙️ Applied Speech Processing & Audio AI in Python

🗺️ Course Roadmap & Curriculum

🎛️ Phase 1: Signals, Mathematics, & Tools

🤖 Phase 2: Machine Learning on Audio

🧠 Phase 3: Modern AI Systems & Applications

🌐 Phase 4: Advanced Processing & Deep Learning

⚙️ Getting Started

Prerequisites

Installation

👨‍💻 About the Author

📜 License

About

Releases

Packages

Contributors

Languages

Folders and files

Latest commit

History

Repository files navigation

🎙️ Applied Speech Processing & Audio AI in Python

🗺️ Course Roadmap & Curriculum

🎛️ Phase 1: Signals, Mathematics, & Tools

🤖 Phase 2: Machine Learning on Audio

🧠 Phase 3: Modern AI Systems & Applications

🌐 Phase 4: Advanced Processing & Deep Learning

⚙️ Getting Started

Prerequisites

Installation

👨‍💻 About the Author

📜 License

About

Resources

Stars

Watchers

Forks

Releases

Packages

Contributors

Languages