OKD/OpenShift >4.14 - Is it still possible to run cluster/status on the Open vSwitch DB? #2299
Unanswered
glowing-axolotl asked this question in Q&A
Replies: 0 comments
TL;DR: is the command ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound still supported in OKD/OCP?

As per https://access.redhat.com/articles/6963671 , it may sometimes be necessary to rebuild the OVN databases for some nodes under specific conditions.
When rebuilding OVN on v4.8 - v4.13 you could run the following commands to get an overview of the OVN status:
Which would give output along the lines of:
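For reference, ovsdb-server reports cluster/status as a short server ID line followed by "Key: value" lines and an indented server list. Below is a minimal sketch of reading such a report into a dictionary. The sample text uses placeholder IDs and a documentation-range address, not values from a real cluster, and the helper name parse_cluster_status is hypothetical.

```python
def parse_cluster_status(text: str) -> dict[str, str]:
    """Collect the top-level 'Key: value' lines of a cluster/status report."""
    fields = {}
    for line in text.splitlines():
        # Skip blank lines and indented lines (the per-server entries).
        if not line or line[0].isspace():
            continue
        key, sep, value = line.partition(": ")
        if sep:  # lines without ': ' (e.g. the leading short ID) are ignored
            fields[key] = value
    return fields


# Placeholder report: every value below is made up for illustration.
SAMPLE = """\
aaaa
Name: OVN_Northbound
Cluster ID: bbbb (bbbb0000-0000-0000-0000-000000000000)
Server ID: aaaa (aaaa0000-0000-0000-0000-000000000000)
Address: ssl:192.0.2.10:9643
Status: cluster member
Role: leader
Term: 3
Leader: self
"""

status = parse_cluster_status(SAMPLE)
print(status["Role"], status["Cluster ID"].split()[0])  # prints: leader bbbb
```

Comparing the "Cluster ID" and "Role" fields across the database pods is what gave the quick overview on 4.8 - 4.13.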
When rebuilding OVN on 4.14 and later, you are told to verify that the pods are up and running, but the cluster/status command isn't used anymore.

We can clearly see this removal, as the command is present in the doc for 4.13:
https://docs.okd.io/4.13/networking/ovn_kubernetes_network_provider/ovn-kubernetes-architecture-assembly.html#nw-ovn-kubernetes-list-database-contents_ovn-kubernetes-architecture
But not in 4.14:
https://docs.okd.io/4.14/networking/ovn_kubernetes_network_provider/ovn-kubernetes-architecture-assembly.html#nw-ovn-kubernetes-list-database-contents_ovn-kubernetes-architecture
It was removed from https://github.com/openshift/openshift-docs/ in commit 02c14ce669f70b55635bdfc6c65caa6736b53933 :
The command isn't explicitly deprecated anywhere; for example, it is still used in the upstream OVN-Kubernetes documentation here: https://ovn-kubernetes.io/installation/ha/#master2-master3-initialization .
At one point in time, it was even used in https://github.com/openshift/cluster-network-operator for the pod's probes (relevant bug is https://bugzilla.redhat.com/show_bug.cgi?id=1838343 ):
If we try running it on a 4.14 or, in this case, a 4.19 cluster, we see that it doesn't work on any of the pods, neither the "control-plane" pods nor the ovnkube-node pods on masters:
Taking a look at the ovnkube-node-wk95k pod's probe:
It is a function that, under the hood, uses ovn-appctl ... ovsdb-server/sync-status:

Running it manually:

And running it with our cluster/status command gives:
From the above we can see that the syntax of the command is correct. The problem is that the DB server does not recognize any "cluster/status" command:
But this makes no sense, as this command is used in upstream OVN-Kubernetes and was still in use not that long ago.
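One way to double-check which commands the target actually registers is the generic list-commands request that ovs-appctl/ovn-appctl targets answer. The sketch below only builds the oc exec invocation and parses a listing; it does not run anything. The namespace openshift-ovn-kubernetes and the container name nbdb are assumptions about the cluster layout, the helper names are hypothetical, and the sample listing is illustrative rather than captured output.

```python
import shlex


def appctl_argv(pod, ctl_socket, *command,
                namespace="openshift-ovn-kubernetes", container=None):
    """Build an `oc exec ... -- ovn-appctl -t <socket> <command>` argv list."""
    argv = ["oc", "exec", "-n", namespace, pod]
    if container:
        argv += ["-c", container]
    argv += ["--", "ovn-appctl", "-t", ctl_socket, *command]
    return argv


def registered_commands(list_commands_output):
    """Return the command names found in a `list-commands` reply.

    Assumes one indented entry per command, with the command name as the
    first whitespace-separated token; the unindented header is skipped.
    """
    names = set()
    for line in list_commands_output.splitlines():
        if line.startswith(" ") and line.strip():
            names.add(line.split()[0])
    return names


# Pod name taken from the post; container name is an assumption.
argv = appctl_argv("ovnkube-node-wk95k", "/var/run/ovn/ovnnb_db.ctl",
                   "list-commands", container="nbdb")
print(shlex.join(argv))

# Illustrative listing only, not output from a real cluster.
SAMPLE_LISTING = """\
The available commands are:
  list-commands
  ovsdb-server/sync-status
  version
"""
print("cluster/status" in registered_commands(SAMPLE_LISTING))  # prints: False
```

If cluster/status is missing from the real listing, the server simply never registered it, which would match the error above rather than a syntax problem.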
It is also used in many other resources:
Additionally, the Prometheus metric "ovn_db_cluster_id" was also removed from newer clusters. It was previously used to detect OVN DB partitions between nodes, with a PrometheusAlert using the query sum(ovn_db_cluster_id{db_name="OVN_Northbound"}) by (cluster_id).

Is there any reason in particular for this change? Is there a way to use the command in modern releases of OKD, or is it explicitly disabled? And if so, is there a specific reason?
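For context, the idea behind that alert query can be sketched as grouping the metric's series by their cluster_id label: healthy members of one database all report the same ID, so more than one group suggests a split. This is a rough illustration only; the series are modelled as plain label dictionaries, the instance label and its values are hypothetical, and the function names are made up for this sketch.

```python
from collections import Counter


def members_per_cluster_id(series, db_name="OVN_Northbound"):
    """Count series per cluster_id label for one database."""
    return Counter(
        s["cluster_id"] for s in series if s.get("db_name") == db_name
    )


def looks_partitioned(series, db_name="OVN_Northbound"):
    """True when members of the same database report different cluster IDs."""
    return len(members_per_cluster_id(series, db_name)) > 1


# Hypothetical label sets, one per reporting database member.
healthy = [
    {"db_name": "OVN_Northbound", "cluster_id": "bbbb", "instance": "master-0"},
    {"db_name": "OVN_Northbound", "cluster_id": "bbbb", "instance": "master-1"},
    {"db_name": "OVN_Northbound", "cluster_id": "bbbb", "instance": "master-2"},
]
split = healthy[:2] + [
    {"db_name": "OVN_Northbound", "cluster_id": "cccc", "instance": "master-2"},
]

print(looks_partitioned(healthy))  # prints: False
print(looks_partitioned(split))    # prints: True
```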
Kudos to anyone who can solve this (and if nobody can solve it, I'll just leave it open for reference since I spent some good hours on this and hopefully no one else gets lost in this rabbit hole).