Enterprise-Grade Privacy Intelligence System
NDRA-PII is an advanced multi-agent system designed to automatically detect, evaluate, and redact Personally Identifiable Information (PII) from unstructured documents. Unlike simple regex tools, NDRA-PII employs a Neuro-Semantic approach, combining NLP signals with rigorous, logic-based governance policies (NSRL) to determine the risk level of every detected entity before taking action.
- Multi-Agent Architecture: Specialized agents for Extraction, Classification, Fusion, Policy, and Redaction.
- NSRL Policy Engine: Configurable YAML-based rules (e.g., "Redact US SSN if High Severity").
- Smart Redaction: Re-flows documents to strictly mask sensitive data while preserving context.
- Audit Trails: Full decision lineage for compliance (GDPR, HIPAA, PCI-DSS).
- Frozen Production Mode: Defaults to verified end-to-end ingestion paths only, with experimental handlers disabled.
The system now defaults to a "working-set freeze" for predictable production behavior:
FREEZE_WORKING_SYSTEM=true(default)ENABLE_EXPERIMENTAL_INGESTION=false(default)
When frozen, NDRA accepts only verified MIME families and quarantines unsupported/experimental inputs. Experimental ingestion paths (archive recursion, image metadata-only handling, and MSG parsing) are opt-in and remain disabled unless explicitly enabled.
The system operates as a linear pipeline:
- Ingestion Agent: Reads PDF, DOCX, TXT.
- Classifier Agent: Detects PII (Presidio + Spacy + Regex).
- Fusion Agent: Merges overlapping entities (e.g., "New York" vs "New York City").
- Policy Agent: Applies NSRL rules to assign Risk Scores and Actions.
- Redaction Agent: Executes "Redact" actions.
- Output: Generates sanitized documents (
_redacted.pdf).
| Directory | Description |
|---|---|
agents/ |
Core logic for all autonomous agents. |
config/ |
System configuration and architectural constraints. |
nsrl/ |
Neuro-Semantic Rule Language definitions (Rules, Specs). |
schemas/ |
Pydantic data models for strict type safety. |
datasets/ |
Test data for benchmarking. |
tests/ |
Unit and integration test suite. |
reports/ |
Generated implementation and audit reports. |
- Python 3.10+
pip
git clone https://github.com/Tejaswini4119/NDRA-PII.git
cd NDRA
pip install -r requirements.txt
python -m spacy download en_core_web_lgThe most interactive way to use NDRA-PII is the CLI:
python -m ndra_stack.cli- Enter the path to your file (e.g.,
datasets/Testing_Set.pdf). - The system will process the file through all agents.
- The result will be saved in
output/asfilename_redacted.pdf.
Start the server:
python -m ndra_stackPOST to http://localhost:8001/analyze/upload.
ndra_stack/: packaged entrypoints (api.py,cli.py,__main__.py)main.py: legacy API module retained for backward compatibilityndrapiicli.py: legacy CLI module retained for backward compatibilitywebui/: static Web UI served at/ui
- Docker Desktop (or Docker Engine + Compose plugin)
- Ports available:
8001,9090,3000
cp .env.example .envPowerShell:
Copy-Item .env.example .envdocker compose build api
docker compose up -dThe project can be run end-to-end in a single container (API + Web UI + internal Prometheus for native charts):
docker build -t ndra-stack:allinone .
# If your build host has intermittent Docker DNS failures:
docker build --network=host -t ndra-stack:allinone .
docker run --rm -p 8001:8001 -p 9090:9090 \
-v "$PWD/uploads:/app/uploads" \
-v "$PWD/output:/app/output" \
-v "$PWD/quarantine:/app/quarantine" \
-v "$PWD/artifacts:/app/artifacts" \
-v "$PWD/audit:/app/audit" \
--name ndra-stack ndra-stack:allinoneEndpoints:
- API:
http://localhost:8001/ - Web UI:
http://localhost:8001/ui - Metrics:
http://localhost:8001/metrics - Internal Prometheus:
http://localhost:9090/targets
docker compose ps- API health:
http://localhost:8001/ - API metrics:
http://localhost:8001/metrics - Prometheus:
http://localhost:9090/targets - Grafana:
http://localhost:3000/
./uploads->/app/uploads./output->/app/output./quarantine->/app/quarantine./artifacts->/app/artifacts./audit->/app/audit
docker compose downPolicies are defined in nsrl/rules/. Example Rule:
id: "FIN-PCI-CC-HIGH-001"
meta:
name: "Credit Card High Risk"
priority: 100
conditions:
- type: "PII_TYPE"
operator: "EQUALS"
value: "CREDIT_CARD"
actions:
classification: "RESTRICTED"
severity: "CRITICAL"
score: 1.0
justification: "PCI-DSS mandated protection."python -m unittest discover tests- Tejaswini - Lead, Applied Intelligence & Data Semantics Engineer - NDRA Intelligence Systems
- Pardhu Sree Rushi Varma - Core Architecture, Security & Governance Engineer - NDRA Governance Systems
- Rupa Yeshvitha Karedla - Dataset Management & Preprocessing Engineer - NDRA Data-Pipelines
- Jeshwanth Kamutam - Data Collection Engineering & QA Testing Engineer - NDRA Data-Pipelines
- Sathya Sai Kolusu - Technical Documentation Evangelist & Engineering Communications - Technical Communications