Directory-based YAML source loading reads hidden Kubernetes mount internals and causes duplicate sources

### Problem definition
Currently, there are three ways to provide sources to pgwatch: a Postgres URI, a YAML file, or a folder of YAML files describing which DBs to monitor.

When pgwatch is configured with a directory of YAML source files, it recursively [walks](https://github.com/cybertec-postgresql/pgwatch/blob/master/internal/sources/yaml.go#L99) the directory and loads all YAML files.

I'm deploying pgwatch on Kubernetes using the [pgwatch Helm chart](https://github.com/cybertec-postgresql/pgwatch-charts/tree/686c0768ae517b0253656872a1fb263f56f8e4ab/helm/pgwatch). I've created a ConfigMap from the user-provided YAML files and mounted it as a directory inside the pgwatch pod.

The created files inside the pod are:
```
/tmp/pgwatch-sources $ ls
custom-sources.yaml
/tmp/pgwatch-sources $ ls -al
total 12
drwxrwsrwx    3 root     1000          4096 Apr 15 19:35 .
drwxrwxrwt    1 root     root          4096 Apr 15 19:35 ..
drwxr-sr-x    2 root     1000          4096 Apr 15 19:35 ..2026_04_15_19_35_17.1518305580
lrwxrwxrwx    1 root     1000            32 Apr 15 19:35 ..data -> ..2026_04_15_19_35_17.1518305580
lrwxrwxrwx    1 root     1000            26 Apr 15 19:35 custom-sources.yaml -> ..data/custom-sources.yaml
```

In Kubernetes, when a ConfigMap is mounted as a directory, hidden internal entries (e.g. the `..data` symlink and the timestamped directories created by the atomic writer) are present inside the mount. pgwatch traverses these hidden directories as well, so the same YAML files are read multiple times. The whole atomic writing algorithm is described [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/atomic_writer.go#L86-L138).
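To make the double read concrete, here is a minimal sketch that recreates the mount layout from the listing above in a temporary directory and walks it. It assumes a plain `filepath.WalkDir` traversal that collects every `.yaml`/`.yml` file; the helper names `makeConfigMapLayout` and `naiveWalk` are hypothetical and this is not pgwatch's actual code.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// makeConfigMapLayout recreates the ConfigMap mount layout shown in the
// listing inside a fresh temporary directory and returns its path.
func makeConfigMapLayout() (string, error) {
	root, err := os.MkdirTemp("", "pgwatch-sources")
	if err != nil {
		return "", err
	}
	ts := "..2026_04_15_19_35_17.1518305580"
	if err := os.Mkdir(filepath.Join(root, ts), 0o755); err != nil {
		return "", err
	}
	file := filepath.Join(root, ts, "custom-sources.yaml")
	if err := os.WriteFile(file, []byte("- name: demo\n"), 0o644); err != nil {
		return "", err
	}
	// Relative symlinks, mirroring the real mount.
	if err := os.Symlink(ts, filepath.Join(root, "..data")); err != nil {
		return "", err
	}
	link := filepath.Join(root, "custom-sources.yaml")
	if err := os.Symlink(filepath.Join("..data", "custom-sources.yaml"), link); err != nil {
		return "", err
	}
	return root, nil
}

// naiveWalk returns the root-relative path of every YAML file under root,
// descending into hidden directories as well (assumed behaviour).
func naiveWalk(root string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		ext := filepath.Ext(path)
		if !d.IsDir() && (ext == ".yaml" || ext == ".yml") {
			rel, relErr := filepath.Rel(root, path)
			if relErr != nil {
				return relErr
			}
			found = append(found, rel)
		}
		return nil
	})
	return found, err
}

func main() {
	root, err := makeConfigMapLayout()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	files, err := naiveWalk(root)
	if err != nil {
		panic(err)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

On a Linux filesystem this should list two paths for the single user-provided file: the real file inside the timestamped directory and the top-level symlink. `filepath.WalkDir` does not follow the `..data` symlink, so the duplication comes from the timestamped directory itself.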

This results in a `duplicate source with name '%s' found` error during the [validate](https://github.com/cybertec-postgresql/pgwatch/blob/master/internal/sources/types.go#L65) function call.
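As an illustration of why a double read ends in that error, the following sketch checks source names for uniqueness. The `Source` type and `validateUnique` function are hypothetical stand-ins; only the error message text is taken from this report, and the real validation in pgwatch may work differently.

```go
package main

import "fmt"

// Source is a hypothetical stand-in for one monitored-database entry.
type Source struct {
	Name string
}

// validateUnique returns an error for the first source name seen twice.
func validateUnique(sources []Source) error {
	seen := make(map[string]struct{}, len(sources))
	for _, s := range sources {
		if _, ok := seen[s.Name]; ok {
			return fmt.Errorf("duplicate source with name '%s' found", s.Name)
		}
		seen[s.Name] = struct{}{}
	}
	return nil
}

func main() {
	fromFile := []Source{{Name: "demo"}}
	// The same file reached via two paths contributes its entries twice.
	all := append(append([]Source{}, fromFile...), fromFile...)
	fmt.Println(validateUnique(all))
}
```

The file content is valid on its own; the error only appears because the loader merges two copies of it.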

To reproduce the issue, manually create a ConfigMap and mount it into a pgwatch pod running on Kubernetes. Then provide the mounted directory as the source to pgwatch and check the pgwatch logs.
Alternatively, deploy the Helm chart mentioned above to your Kubernetes cluster with
```
helm install pgwatch -n pgwatch --create-namespace \
  -f values.yaml \
  --set-file "pgwatch.sources.files.custom-sources\.yaml=custom-sources.yaml" \
  .
```
and observe the pgwatch logs.

### Possible Solutions

From the user's perspective, the YAML files are provided once. Hidden/internal directories created by the underlying filesystem or orchestration platform should not result in duplicate source loading.
From the developer's perspective, this is an implementation detail of how Kubernetes mounts ConfigMaps, so it is not necessarily a problem that pgwatch should have to solve.

I'd like to find a middle ground. There are several possible solutions:

1. Ignore hidden directories (dirs starting with `.`) during directory traversal.  
2. Provide a configuration flag to disable traversal of hidden directories, so users can decide.  
3. (Less preferred) Ignore the `..data` symlink and the `..timestamp` dir during traversal. This one is very Kubernetes-specific and probably doesn't fit pgwatch's nature.

I'd propose the 1st or the 2nd option, i.e. a generic solution rather than a Kubernetes-specific one.
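Options 1 and 2 could be sketched roughly as follows, assuming the traversal is built on `filepath.WalkDir`. The function `collectYAML`, its `skipHidden` parameter (standing in for a possible configuration flag) and the `makeLayout` helper are hypothetical names used for illustration only, not existing pgwatch APIs.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// collectYAML returns the root-relative paths of YAML files under root.
// With skipHidden set, directories whose name starts with "." are pruned.
func collectYAML(root string, skipHidden bool) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			// Never prune the root itself, even if its name is hidden.
			if skipHidden && path != root && strings.HasPrefix(d.Name(), ".") {
				return fs.SkipDir
			}
			return nil
		}
		if ext := filepath.Ext(path); ext == ".yaml" || ext == ".yml" {
			rel, relErr := filepath.Rel(root, path)
			if relErr != nil {
				return relErr
			}
			found = append(found, rel)
		}
		return nil
	})
	return found, err
}

// makeLayout builds a ConfigMap-like mount in a temporary directory.
func makeLayout() (string, error) {
	root, err := os.MkdirTemp("", "pgwatch-sources")
	if err != nil {
		return "", err
	}
	ts := "..2026_04_15_19_35_17.1518305580"
	if err := os.Mkdir(filepath.Join(root, ts), 0o755); err != nil {
		return "", err
	}
	file := filepath.Join(root, ts, "custom-sources.yaml")
	if err := os.WriteFile(file, []byte("- name: demo\n"), 0o644); err != nil {
		return "", err
	}
	if err := os.Symlink(ts, filepath.Join(root, "..data")); err != nil {
		return "", err
	}
	link := filepath.Join(root, "custom-sources.yaml")
	if err := os.Symlink(filepath.Join("..data", "custom-sources.yaml"), link); err != nil {
		return "", err
	}
	return root, nil
}

func main() {
	root, err := makeLayout()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	for _, skip := range []bool{false, true} {
		files, err := collectYAML(root, skip)
		if err != nil {
			panic(err)
		}
		fmt.Printf("skipHidden=%v: %d file(s) %v\n", skip, len(files), files)
	}
}
```

With `skipHidden` enabled, only the top-level `custom-sources.yaml` symlink remains, and it still resolves through `..data` when opened. The check is generic: it prunes any dot-prefixed directory and does not depend on Kubernetes naming.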
