cleancloud-io · javvaji-devops · Jun 11, 2026
@@ -100,7 +100,7 @@ Gaspillage minimum estimé : ~$25 944/mois
 - Détecte le gaspillage IA/ML coûteux : SageMaker, AML, Vertex AI — ressources GPU signalées comme candidats à risque plus élevé (500–23 000 $/mois)
 - Fonctionne sur AWS, Azure et GCP en un seul outil
 - S'exécute entièrement dans votre environnement — aucun agent, pas de SaaS, aucun credential stocké
-- 46 règles de détection sélectives et haut signal, conçues pour éviter les faux positifs en environnements IaC
+- 47 règles de détection sélectives et haut signal, conçues pour éviter les faux positifs en environnements IaC
 - Prêt pour CI/CD — codes de sortie d'application + sorties JSON/CSV/markdown
 
 ### Ce que CleanCloud ne fait PAS
@@ -151,6 +151,7 @@ L'infrastructure IA/ML inactive est la source de gaspillage cloud invisible à l
 | Endpoint SageMaker (GPU) | 500 – 23 000 $ / mois |
 | Instance Notebook SageMaker (GPU) | 500 – 23 000+ $ / mois |
 | Studio Apps SageMaker (KernelGateway/JupyterLab/CodeEditor) | 42 – 1 600+ $ / mois |
+| Domaine SageMaker (stockage EFS inactif) | Charges EFS continues |
 | Training Job SageMaker (job GPU runaway/bloqué) | 670 – 2 360+ $ / jour |
 | Cluster AML Compute Azure (GPU) | 600 – 15 000 $ / mois |
 | Instance de calcul Azure ML (GPU) | 600 – 15 000+ $ / mois |
@@ -165,7 +166,7 @@ L'infrastructure IA/ML inactive est la source de gaspillage cloud invisible à l
 CleanCloud détecte les endpoints à zéro invocation / zéro prédiction, l'activité de contrôle inactive sur les notebooks et apps managés, ainsi que les training jobs managés anormalement longs sur les 3 clouds. Les outils natifs montrent la facture — ils ne nomment pas la ressource concrète à examiner.
 
 ```bash
-cleancloud scan --provider aws --category ai          # PTUs Bedrock + endpoints + notebooks + Studio apps SageMaker + training jobs SageMaker + EC2 GPU
+cleancloud scan --provider aws --category ai          # PTUs Bedrock + endpoints + notebooks + domaines + Studio apps SageMaker + training jobs SageMaker + EC2 GPU
 cleancloud scan --provider azure --category ai        # clusters AML + instances ML + endpoints en ligne + AI Search + PTUs OpenAI
 cleancloud scan --provider gcp --category ai          # endpoints Vertex AI + Workbench + training jobs + Cloud TPU + Feature Stores
 cleancloud scan --provider aws --category all         # hygiène + IA/ML ensemble
@@ -432,7 +433,7 @@ Oui. CleanCloud n'a besoin d'accès réseau qu'aux endpoints API de votre cloud
 
 ## Ce que CleanCloud détecte
 
-46 règles pour AWS, Azure et GCP — conservatrices, haut signal, conçues pour éviter les faux positifs en environnements IaC.
+47 règles pour AWS, Azure et GCP — conservatrices, haut signal, conçues pour éviter les faux positifs en environnements IaC.
 
 **AWS :**
 - Compute : instances arrêtées 30+ jours (charges EBS continuent)
@@ -441,7 +442,7 @@ Oui. CleanCloud n'a besoin d'accès réseau qu'aux endpoints API de votre cloud
 - Plateforme : instances RDS inactives (HIGH)
 - Observabilité : logs CloudWatch à rétention infinie
 - Gouvernance : ressources sans tags, security groups inutilisés
-- IA/ML *(opt-in : `--category ai`)* : Bedrock Provisioned Throughput (Model Units) inactifs avec zéro invocation depuis 7+ jours ; endpoints SageMaker sans trafic `InvokeEndpoint` observé depuis 14+ jours ; instances Notebook SageMaker avec timestamps de contrôle inactifs depuis 14+ jours ; Studio Apps SageMaker (`KernelGateway`/`JupyterLab`/`CodeEditor`) sans signal d'activité récent exploitable depuis 7+ jours ; training jobs SageMaker toujours `InProgress` au-delà du seuil de 24h
+- IA/ML *(opt-in : `--category ai`)* : Bedrock Provisioned Throughput (Model Units) inactifs avec zéro invocation depuis 7+ jours ; endpoints SageMaker sans trafic `InvokeEndpoint` observé depuis 14+ jours ; instances Notebook SageMaker avec timestamps de contrôle inactifs depuis 14+ jours ; Domaines SageMaker sans apps en cours d'exécution sur tous les profils et espaces depuis 30+ jours (coût de stockage EFS continu) ; Studio Apps SageMaker (`KernelGateway`/`JupyterLab`/`CodeEditor`) sans signal d'activité récent exploitable depuis 7+ jours ; training jobs SageMaker toujours `InProgress` au-delà du seuil de 24h
 
 **Azure :**
 - Compute : VMs arrêtées (non désallouées) (HIGH)

@@ -151,6 +151,7 @@ Idle AI/ML infrastructure is the fastest-growing source of invisible cloud spend
 | SageMaker endpoint (GPU) | $500 – $23,000 / month |
 | SageMaker Notebook Instance (GPU) | $500 – $23,000+ / month |
 | SageMaker Studio Apps (KernelGateway/JupyterLab/CodeEditor) | $42 – $1,600+ / month |
+| SageMaker Domain (idle EFS storage) | Continuous EFS charges |
 | SageMaker Training Job (runaway/hung GPU job) | $670 – $2,360+ / day |
 | Azure AML compute cluster (GPU) | $600 – $15,000 / month |
 | Azure ML Compute Instance (GPU) | $600 – $15,000+ / month |
@@ -165,7 +166,7 @@ Idle AI/ML infrastructure is the fastest-growing source of invisible cloud spend
 CleanCloud detects zero-invocation / zero-prediction endpoints, stale managed notebook and app activity, and long-running managed training jobs across all three clouds. Native cost tools show the bill — they do not name the specific resource to review.
 
 ```bash
-cleancloud scan --provider aws --category ai          # Bedrock PTUs + SageMaker endpoints + notebooks + Studio apps + training jobs + idle GPU EC2
+cleancloud scan --provider aws --category ai          # Bedrock PTUs + SageMaker endpoints + notebooks + domains + Studio apps + training jobs + idle GPU EC2
 cleancloud scan --provider azure --category ai        # AML compute + ML instances + online endpoints + AI Search + OpenAI PTUs
 cleancloud scan --provider gcp --category ai          # Vertex AI endpoints + Workbench + training jobs + Cloud TPU + Feature Stores
 cleancloud scan --provider aws --category all         # hygiene + AI/ML together
@@ -432,7 +433,7 @@ Yes. CleanCloud only needs network access to your cloud provider's API endpoints
 
 ## What CleanCloud Detects
 
-46 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.
+47 rules across AWS, Azure, and GCP — conservative, high-signal, designed to avoid false positives in IaC environments.
 
 **AWS:**
 - Compute: stopped instances 30+ days (EBS charges continue)
@@ -441,7 +442,7 @@ Yes. CleanCloud only needs network access to your cloud provider's API endpoints
 - Platform: idle RDS instances (HIGH)
 - Observability: infinite retention CloudWatch Logs
 - Governance: untagged resources, unused security groups
-- AI/ML *(opt-in: `--category ai`)*: idle Bedrock Provisioned Throughput (Model Units) with zero invocations 7+ days; idle SageMaker endpoints with no observed `InvokeEndpoint` traffic 14+ days; SageMaker Notebook Instances with stale control-plane timestamps 14+ days; SageMaker Studio apps (`KernelGateway`/`JupyterLab`/`CodeEditor`) with no usable recent activity signal 7+ days; SageMaker training jobs still `InProgress` beyond the 24h threshold
+- AI/ML *(opt-in: `--category ai`)*: idle Bedrock Provisioned Throughput (Model Units) with zero invocations 7+ days; idle SageMaker endpoints with no observed `InvokeEndpoint` traffic 14+ days; SageMaker Notebook Instances with stale control-plane timestamps 14+ days; SageMaker Domains with no running apps across all user profiles and spaces 30+ days (continuous EFS storage cost); SageMaker Studio apps (`KernelGateway`/`JupyterLab`/`CodeEditor`) with no usable recent activity signal 7+ days; SageMaker training jobs still `InProgress` beyond the 24h threshold
 
 **Azure:**
 - Compute: stopped (not deallocated) VMs (HIGH)

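The README hunks above describe the new rule as "SageMaker Domains with no running apps across all user profiles and spaces 30+ days". A minimal sketch of how such a check could look is shown below. It is not the implementation from this PR: the function names, the `RUNNING_STATUSES` set, and the use of the domain's `CreationTime` as a stand-in for the 30-day window are all assumptions made for illustration. The `sagemaker` argument is expected to behave like a boto3 SageMaker client (`list_domains`, `list_apps` with `DomainIdEquals`).

```python
from datetime import datetime, timedelta, timezone

# Assumption: these app statuses count as "running" for the purpose of this sketch.
RUNNING_STATUSES = {"InService", "Pending"}


def domain_has_running_apps(sagemaker, domain_id):
    """Return True if any app in the domain (any user profile or space) is running."""
    kwargs = {"DomainIdEquals": domain_id}
    while True:
        page = sagemaker.list_apps(**kwargs)
        if any(app.get("Status") in RUNNING_STATUSES for app in page.get("Apps", [])):
            return True
        token = page.get("NextToken")
        if not token:
            return False
        kwargs["NextToken"] = token


def find_idle_domains(sagemaker, min_age_days=30, now=None):
    """Return IDs of domains older than min_age_days that have no running apps.

    Assumption: domain age (CreationTime) approximates the idle window; the real
    rule may rely on a different signal.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    idle = []
    kwargs = {}
    while True:
        page = sagemaker.list_domains(**kwargs)
        for domain in page.get("Domains", []):
            created = domain.get("CreationTime")
            # Be conservative: skip domains that are too young or have no timestamp.
            if created is None or created > cutoff:
                continue
            if not domain_has_running_apps(sagemaker, domain["DomainId"]):
                idle.append(domain["DomainId"])
        token = page.get("NextToken")
        if not token:
            return idle
        kwargs["NextToken"] = token
```

With a real client this would be called as `find_idle_domains(boto3.client("sagemaker"))`. Skipping young or undated domains follows the conservative, low-false-positive stance the README describes.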
@@ -751,7 +751,30 @@ def run_aws_ai_doctor(profile: Optional[str], region: Optional[str] = None) -> N
         permissions_failed.append(("sagemaker:DescribeNotebookInstance", str(e)))
         warn(f"sagemaker:DescribeNotebookInstance - {e}")
 
-    # --- sagemaker:ListApps (aws.sagemaker.studio_app.idle) ---
+    # --- sagemaker:ListDomains + sagemaker:DescribeDomain (aws.sagemaker.domain.idle) ---
+    _domain_list = []  # filled by the ListDomains probe, reused for DescribeDomain
+    try:
+        _domain_list = sagemaker.list_domains(MaxResults=1).get("Domains", [])
+        permissions_tested.append("sagemaker:ListDomains")
+        success("sagemaker:ListDomains")
+    except Exception as e:
+        permissions_failed.append(("sagemaker:ListDomains", str(e)))
+        warn(f"sagemaker:ListDomains - {e}")
+
+    # DescribeDomain: probe only if ListDomains returned a domain. Reusing that
+    # result keeps a ListDomains failure from being logged as a DescribeDomain one.
+    if _domain_list:
+        try:
+            sagemaker.describe_domain(DomainId=_domain_list[0]["DomainId"])
+            permissions_tested.append("sagemaker:DescribeDomain")
+            success("sagemaker:DescribeDomain")
+        except Exception as e:
+            permissions_failed.append(("sagemaker:DescribeDomain", str(e)))
+            warn(f"sagemaker:DescribeDomain - {e}")
+    else:
+        info("sagemaker:DescribeDomain - not tested (no SageMaker domain available to probe)")
+
+    # --- sagemaker:ListApps (aws.sagemaker.studio_app.idle + aws.sagemaker.domain.idle) ---
     try:
         sagemaker.list_apps(MaxResults=1)
         permissions_tested.append("sagemaker:ListApps")
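The hunk above repeats one pattern per permission: make a cheap read-only call, then record the action as tested or failed. A minimal sketch of that pattern as a reusable helper follows. The name `probe_permission` and its signature are hypothetical and not part of this PR; the real doctor also logs through `success`/`warn`, which is left out here to keep the example self-contained.

```python
from typing import Callable, List, Tuple


def probe_permission(
    action: str,
    call: Callable[[], object],
    tested: List[str],
    failed: List[Tuple[str, str]],
) -> bool:
    """Run one read-only API call and record whether the IAM action is usable.

    Hypothetical helper: `action` is the IAM action name being probed and
    `call` is a zero-argument callable that performs the API request.
    """
    try:
        call()
    except Exception as exc:  # broad on purpose, mirroring the hunk above
        failed.append((action, str(exc)))
        return False
    tested.append(action)
    return True
```

Under these assumptions, a probe would read `probe_permission("sagemaker:ListApps", lambda: sagemaker.list_apps(MaxResults=1), permissions_tested, permissions_failed)`.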