25 changes: 25 additions & 0 deletions README.md
@@ -82,6 +82,31 @@ DCGM_FI_DEV_MEMORY_TEMP{gpu="0", UUID="GPU-604ac76c-d9cf-fef3-62e9-d92044ab6e52"
To integrate DCGM-Exporter with Prometheus and Grafana, see the full instructions in the [user guide](https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/latest/).
`dcgm-exporter` is deployed as part of the GPU Operator. To get started with integrating with Prometheus, check the Operator [user guide](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#gpu-telemetry).

### Scraping with OpenTelemetry Collector

Because DCGM-Exporter exposes Prometheus-format metrics, it can be scraped by the OpenTelemetry Collector using the Prometheus receiver.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: dcgm-exporter
          static_configs:
            # Default DCGM-Exporter metrics endpoint
            - targets: ["localhost:9400"]

exporters:
  # Prints scraped metrics to the Collector log
  debug:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [debug]
```

In Kubernetes, replace `localhost:9400` with the `dcgm-exporter` service name and port. Replace the `debug` exporter with the exporter for your metrics backend.
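
As a rough sketch of those two substitutions, the configuration might look like the following. The Service name `dcgm-exporter`, the namespace `gpu-monitoring`, and the OTLP endpoint URL are assumptions for illustration only; substitute the values from your own cluster and backend. The `otlphttp` exporter is just one possible choice of backend exporter.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: dcgm-exporter
          static_configs:
            # Assumed Service name and namespace; adjust to your deployment
            - targets: ["dcgm-exporter.gpu-monitoring.svc.cluster.local:9400"]

exporters:
  otlphttp:
    # Hypothetical OTLP/HTTP endpoint of a metrics backend
    endpoint: "https://otlp.example.com:4318"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlphttp]
```

A static target that points at a Service is load-balanced across the exporter pods, so each scrape reaches only one of them. To collect from every node, Prometheus `kubernetes_sd_configs` in the same `scrape_configs` block is likely the better fit.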

### TLS and Basic Auth

The exporter supports TLS and basic auth through [exporter-toolkit](https://github.com/prometheus/exporter-toolkit). To enable TLS, basic auth, or both, pass the `--web-config-file` CLI flag as follows: