Directory-based YAML source loading reads hidden Kubernetes mount internals and causes duplicate sources #1368

@serdardalgic

Problem definition

Currently, there are three ways to provide sources to pgwatch: a Postgres URI, a YAML file, or a folder of YAML files describing which DBs to monitor.

When pgwatch is configured with a directory of YAML source files, it recursively walks the directory and loads all YAML files.

I'm deploying pgwatch on Kubernetes using the pgwatch Helm chart. I've created a ConfigMap from the user-provided YAML files and mounted it as a directory inside the pgwatch pod.

The created files inside the pod are:

/tmp/pgwatch-sources $ ls
custom-sources.yaml
/tmp/pgwatch-sources $ ls -al
total 12
drwxrwsrwx    3 root     1000          4096 Apr 15 19:35 .
drwxrwxrwt    1 root     root          4096 Apr 15 19:35 ..
drwxr-sr-x    2 root     1000          4096 Apr 15 19:35 ..2026_04_15_19_35_17.1518305580
lrwxrwxrwx    1 root     1000            32 Apr 15 19:35 ..data -> ..2026_04_15_19_35_17.1518305580
lrwxrwxrwx    1 root     1000            26 Apr 15 19:35 custom-sources.yaml -> ..data/custom-sources.yaml

In Kubernetes, when a ConfigMap is mounted as a directory, hidden internal entries (e.g. the ..data symlink and the timestamped directories created by the atomic writer) are present inside the mount. pgwatch traverses these hidden directories as well, so the same YAML file is read more than once: once through the visible custom-sources.yaml symlink and once more inside the timestamped directory. The whole atomic writing algorithm is described here.

This results in the error "duplicate source with name '%s' found" during the validate function call.
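To make the mechanism concrete, here is a minimal Go sketch that recreates the mount layout from the listing above in a temporary directory and walks it with filepath.WalkDir. It assumes the loader behaves roughly like a plain recursive walk that picks up every .yaml/.yml file; pgwatch's actual loader may differ. The helper names (makeConfigMapLayout, naiveYAMLWalk, runDemo) and the file content are made up for illustration. It needs a Unix-like system because it creates symlinks.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// Timestamped directory name copied from the `ls -al` listing above.
const tsDir = "..2026_04_15_19_35_17.1518305580"

// makeConfigMapLayout (hypothetical helper) recreates the ConfigMap mount
// layout under root: a timestamped directory with the real file, a ..data
// symlink to that directory, and a visible symlink going through ..data.
func makeConfigMapLayout(root string) error {
	if err := os.Mkdir(filepath.Join(root, tsDir), 0o755); err != nil {
		return err
	}
	real := filepath.Join(root, tsDir, "custom-sources.yaml")
	if err := os.WriteFile(real, []byte("- name: demo\n"), 0o644); err != nil {
		return err
	}
	if err := os.Symlink(tsDir, filepath.Join(root, "..data")); err != nil {
		return err
	}
	return os.Symlink(
		filepath.Join("..data", "custom-sources.yaml"),
		filepath.Join(root, "custom-sources.yaml"),
	)
}

// naiveYAMLWalk (hypothetical, NOT pgwatch's code) returns every non-directory
// entry with a YAML extension, relative to root, hidden directories included.
func naiveYAMLWalk(root string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		ext := strings.ToLower(filepath.Ext(d.Name()))
		if !d.IsDir() && (ext == ".yaml" || ext == ".yml") {
			rel, relErr := filepath.Rel(root, path)
			if relErr != nil {
				return relErr
			}
			found = append(found, rel)
		}
		return nil
	})
	return found, err
}

// runDemo builds the layout in a temp dir and returns what the walk found.
func runDemo() ([]string, error) {
	root, err := os.MkdirTemp("", "pgwatch-sources-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := makeConfigMapLayout(root); err != nil {
		return nil, err
	}
	return naiveYAMLWalk(root)
}

func main() {
	files, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

The walk reports two paths for a single user-provided file: the copy inside the timestamped directory and the visible custom-sources.yaml symlink. filepath.WalkDir does not follow symbolic links, so ..data itself is not descended into; the duplicate comes from the real timestamped directory.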

To reproduce the issue, manually create a ConfigMap and mount it into a pgwatch pod running on Kubernetes. Then provide the mount directory as the source to pgwatch and check the pgwatch logs.
Alternatively, you can deploy the Helm chart mentioned above to your Kubernetes cluster with:

helm install pgwatch -n pgwatch --create-namespace \
  -f values.yaml \
  --set-file "pgwatch.sources.files.custom-sources\.yaml=custom-sources.yaml" \
  .

and observe.

Possible Solutions

From the user's perspective, the YAML files are provided once. Hidden/internal directories created by the underlying filesystem or orchestration platform should not result in duplicate source loading.
From the developer's perspective, this is an implementation detail of how Kubernetes mounts ConfigMaps. It's not necessarily a problem that pgwatch should have to solve.

I'd like to find a middle ground. There are several possible solutions:

  1. Ignore hidden directories (dirs starting with .) during directory traversal.
  2. Provide a configuration flag to disable traversal of hidden directories. (Users can decide.)
  3. (Less preferred) Ignore the ..data symlink and the ..timestamp dir during traversal. This one is very Kubernetes-specific and probably doesn't fit pgwatch's nature.

I'd propose the 1st or the 2nd option: a generic solution rather than a Kubernetes-specific one.
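As a rough illustration of options 1 and 2, the sketch below skips any directory whose name starts with a dot by returning filepath.SkipDir from the walk callback, and makes that behavior switchable through a boolean. The function name collectYAML, the skipHidden parameter, and the demo helper are hypothetical; this is not pgwatch's actual loader or an existing pgwatch flag. The demo recreates the ConfigMap layout in a temp dir and needs a Unix-like system for the symlinks.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// collectYAML (hypothetical helper) walks root and returns all YAML files.
// With skipHidden set, directories whose name starts with "." are pruned.
func collectYAML(root string, skipHidden bool) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			// Never prune the root itself, even if its own name starts with a dot.
			if skipHidden && path != root && strings.HasPrefix(d.Name(), ".") {
				return filepath.SkipDir
			}
			return nil
		}
		switch strings.ToLower(filepath.Ext(d.Name())) {
		case ".yaml", ".yml":
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// demo builds a ConfigMap-like layout in a temp dir and returns how many
// YAML files collectYAML finds there.
func demo(skipHidden bool) (int, error) {
	root, err := os.MkdirTemp("", "pgwatch-sources-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)

	ts := "..2026_04_15_19_35_17.1518305580"
	if err := os.Mkdir(filepath.Join(root, ts), 0o755); err != nil {
		return 0, err
	}
	real := filepath.Join(root, ts, "custom-sources.yaml")
	if err := os.WriteFile(real, []byte("- name: demo\n"), 0o644); err != nil {
		return 0, err
	}
	if err := os.Symlink(ts, filepath.Join(root, "..data")); err != nil {
		return 0, err
	}
	link := filepath.Join(root, "custom-sources.yaml")
	if err := os.Symlink(filepath.Join("..data", "custom-sources.yaml"), link); err != nil {
		return 0, err
	}

	files, err := collectYAML(root, skipHidden)
	return len(files), err
}

func main() {
	for _, skip := range []bool{false, true} {
		n, err := demo(skip)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("skipHidden=%v: %d YAML file(s)\n", skip, n)
	}
}
```

With skipHidden=false the walk finds two files; with skipHidden=true it finds only the visible custom-sources.yaml. The visible entry is a symlink rather than a regular file, so a real implementation would still need to read it through the link (os.ReadFile does this). Whether hidden files, as opposed to hidden directories, should also be ignored is left open here.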
