Problem definition
Currently, there are 3 ways to provide sources to pgwatch: a Postgres URI, a YAML file, or a folder of YAML files containing info on which DBs to monitor.
When pgwatch is configured with a directory of YAML source files, it recursively walks the directory and loads all YAML files.
I'm deploying pgwatch on Kubernetes using the pgwatch Helm chart. I've created a ConfigMap from the user-provided YAML files and mounted it as a directory inside the pgwatch pod.
The created files inside the pod are:
/tmp/pgwatch-sources $ ls
custom-sources.yaml
/tmp/pgwatch-sources $ ls -al
total 12
drwxrwsrwx 3 root 1000 4096 Apr 15 19:35 .
drwxrwxrwt 1 root root 4096 Apr 15 19:35 ..
drwxr-sr-x 2 root 1000 4096 Apr 15 19:35 ..2026_04_15_19_35_17.1518305580
lrwxrwxrwx 1 root 1000 32 Apr 15 19:35 ..data -> ..2026_04_15_19_35_17.1518305580
lrwxrwxrwx 1 root 1000 26 Apr 15 19:35 custom-sources.yaml -> ..data/custom-sources.yaml
In Kubernetes, when a ConfigMap is mounted as a directory, hidden internal directories (e.g. ..data and the timestamped directory created by the atomic writer) are present inside the mount. pgwatch traverses these hidden directories as well, causing the same YAML files to be read multiple times. The whole atomic-writing algorithm is described here.
This results in a "duplicate source with name '%s' found" error during the validate function call.
To reproduce the issue, you can manually create a ConfigMap and mount it into a pgwatch pod running on Kubernetes. After that, provide this directory as a source to pgwatch and check the pgwatch logs.
Alternatively, you can deploy the Helm chart mentioned above to your Kubernetes cluster with
helm install pgwatch -n pgwatch --create-namespace \
-f values.yaml \
--set-file "pgwatch.sources.files.custom-sources\.yaml=custom-sources.yaml" \
.
and observe.
Possible Solutions
From the user perspective, the YAML files are provided once. Hidden/internal directories created by the underlying filesystem or orchestration platform should not result in duplicate source loading.
From the developer perspective, this is an implementation detail of how Kubernetes mounts ConfigMaps, and not necessarily a problem that pgwatch should solve.
I'd ask for finding a middle ground. There are several possible solutions:
- Ignore hidden directories (directories whose name starts with .) during directory traversal.
- Provide a configuration flag to disable traversal of hidden directories, so users can decide.
- (Less preferred) Ignore the ..data symlink and the timestamped .. directory during traversal. This one is very Kubernetes-specific and probably doesn't fit pgwatch's nature.
I'd propose the first or the second option: a generic solution rather than a Kubernetes-specific one.