Banks, utility companies, and telecommunication providers routinely record customer support calls. These recordings often contain sensitive customer data such as bank routing numbers, account numbers, Social Security Numbers (SSN), and other personally identifiable information (PII). Storing such data without protection raises serious privacy concerns and risks of data breaches.
To address this issue, we have developed Maskify, an automated system that detects and redacts sensitive entities in audio recordings. Maskify removes private information from both the transcript and the audio, helping organizations comply with privacy regulations and protect customer data.
The Maskify pipeline follows these steps:
- Audio Input: Accepts audio files (.wav, .mp3, etc.) from customer support calls.
- Audio Preprocessing: Converts audio to 16 kHz mono WAV format for consistency.
- Automatic Speech Recognition: Uses Faster-Whisper to transcribe audio and generate word-level timestamps.
- Text Preprocessing & Annotation: Cleans and annotates the transcript using tools like Doccano.
- PII Detection: Applies advanced NER models (DeBERTa or Mistral) to identify PII entities in the text.
- PII Marking & Redaction: Identifies and marks PII words/phrases, then mutes corresponding audio segments based on timestamps.
- Final Output: Produces a redacted transcript (Text/JSON) and a redacted audio file (with PII muted).
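The redaction step above (muting audio based on word timestamps) can be sketched as follows. This is a minimal illustration, not Maskify's actual implementation: the function names `pii_intervals` and `mute`, the `pad` parameter, and the word dictionary layout (`word`, `start`, `end` in seconds) are all assumptions, and the audio is assumed to be 16-bit mono PCM already loaded into memory.

```python
from array import array


def pii_intervals(words, pii_indices, pad=0.05):
    """Merge the time ranges of flagged words into sorted, non-overlapping intervals.

    `words` is a list of dicts with "start" and "end" in seconds (assumed layout).
    `pad` widens each range slightly so word edges are not left audible.
    """
    spans = sorted(
        (max(0.0, words[i]["start"] - pad), words[i]["end"] + pad)
        for i in pii_indices
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching ranges collapse into one interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mute(samples, sample_rate, intervals):
    """Return a copy of 16-bit mono PCM samples with each interval set to silence."""
    out = array("h", samples)
    for start, end in intervals:
        lo = max(0, int(start * sample_rate))
        hi = min(len(out), int(end * sample_rate))
        for i in range(lo, hi):
            out[i] = 0
    return out
```

Merging intervals before muting keeps a multi-word entity, such as a spoken account number, as one continuous silent stretch rather than a series of short gaps.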
- Faster-Whisper: Audio transcription
- DeBERTa & Mistral: Named Entity Recognition (NER)
- Python, Jupyter Notebook: Backend and experimentation
- React, Vite: Interactive web frontend
- CSV/JSONL/XLSX: Data storage and annotation
- Frontend:
  - Vite: Lightning-fast frontend tooling
  - React: Component-based UI
  - Material UI (MUI): Pre-styled UI components
  - Axios: For REST API communication
- Backend:
  - Flask (Python): Lightweight web framework
  - Tempfile + Subprocess: File handling and audio conversion
  - PIIDetector Class: Core logic for transcription, entity recognition, redaction, and audio editing
- Python & Jupyter Notebooks: For prototyping and testing
- CSV / JSON: For storing transcripts, labels, and results
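As an illustration of the subprocess-based audio conversion listed above, the sketch below shells out to `ffmpeg` to produce 16 kHz mono WAV. The use of `ffmpeg` is an assumption (the stack only names Python's `subprocess`), and the helper names `ffmpeg_command` and `convert_to_wav` are hypothetical.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that writes 16-bit mono PCM WAV at `sample_rate`."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", str(src),            # input file in any format ffmpeg can decode
        "-ac", "1",                # one channel (mono)
        "-ar", str(sample_rate),   # resample to the target rate
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src, workdir):
    """Convert `src` into a WAV file inside `workdir` and return the new path.

    Assumes the ffmpeg binary is on PATH; raises CalledProcessError on failure.
    """
    dst = Path(workdir) / (Path(src).stem + "_16k.wav")
    subprocess.run(ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst
```

Passing the command as a list rather than a shell string avoids quoting problems with uploaded filenames.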
- Files are converted to a standard format automatically
- Users can select:
  - DeBERTa (precise span-based detection)
  - Unsloth (flexible prompt-based output)
- DeBERTa: Detects PII spans with confidence scores
- Unsloth: Extracts PII in "LABEL: VALUE" format via prompting
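Prompt-based output in "LABEL: VALUE" form has to be parsed and matched back to the transcript before anything can be redacted. The sketch below shows one way this might be done; the function names `parse_entities` and `locate`, the regular expression, and the label names in the usage example are assumptions, not the project's actual parsing code.

```python
import re

# One entity per line: a label of letters, spaces, or underscores, then a colon, then the value.
LINE = re.compile(r"^\s*([A-Za-z_ ]+?)\s*:\s*(.+?)\s*$")


def parse_entities(model_output):
    """Parse 'LABEL: VALUE' lines into a list of {'label', 'value'} dicts.

    Lines that do not match the pattern are skipped.
    """
    entities = []
    for line in model_output.splitlines():
        match = LINE.match(line)
        if match:
            entities.append(
                {"label": match.group(1).upper(), "value": match.group(2)}
            )
    return entities


def locate(transcript, entities):
    """Attach character offsets for every exact occurrence of each value."""
    found = []
    for ent in entities:
        for m in re.finditer(re.escape(ent["value"]), transcript):
            found.append({**ent, "start": m.start(), "end": m.end()})
    return found
```

For example, `locate("My SSN is 123-45-6789.", parse_entities("SSN: 123-45-6789"))` returns one entity whose offsets cover the number. Exact matching will miss values the model has reformatted, so a real pipeline would likely need some normalization as well.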
- Original transcript
- Detected entities (with labels)
- Addresses
- Credit Card Numbers
- Social Security Numbers (SSNs)
- Bank Account Numbers
- Bank Routing Numbers
- Phone Numbers
- Names
- All files are processed temporarily and deleted after serving results
- No personal data is stored
- Redacted audio/text ensures data privacy even if files are shared
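The temporary-processing behavior described above could look roughly like the following. This is a sketch under stated assumptions: `process_upload` and its `handler` callback are hypothetical names, and the handler is assumed to return in-memory results rather than paths inside the temporary directory.

```python
import tempfile
from pathlib import Path


def process_upload(data, filename, handler):
    """Write an upload to a temporary directory, run `handler`, then delete everything.

    `handler` receives the path of the saved file and must return in-memory
    results (bytes, dicts), because the directory is gone once this returns.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Keep only the final path component to drop client-supplied directories.
        src = Path(workdir) / Path(filename).name
        src.write_bytes(data)
        result = handler(src)
    # Leaving the `with` block removes the directory and all files in it.
    return result
```

Using a context manager means the cleanup also runs when the handler raises, so a failed request does not leave a recording on disk.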