Avoiding Zombie Cluster Members When Upgrading to etcd v3.6

By Benjamin Wang (VMware by Broadcom) and Josh Berkus (Red Hat), Kubernetes Blog

This article is a mirror of an original that was recently published to the official etcd blog.

The key takeaway? Always upgrade to etcd v3.5.26 or later before moving to v3.6. This ensures your cluster is automatically repaired, and avoids zombie members.

Issue summary

Recently, the etcd community addressed an issue that may appear when users upgrade from v3.5 to v3.6. This bug can cause the cluster to report "zombie members": etcd nodes that were removed from the database cluster some time ago, and that re-appear and join database consensus. The etcd cluster is then inoperable until these zombie members are removed.

In etcd v3.5 and earlier, the v2store was the source of truth for membership data, even though the v3store was also present. As part of our v2store deprecation plan, in v3.6 the v3store is the source of truth for cluster membership. Through a bug report we found out that, in some older clusters, the v2store and v3store could become inconsistent. This inconsistency manifests after upgrading as old, removed "zombie" cluster members re-appearing in the cluster.

The fix and upgrade path

We've added a mechanism in etcd v3.5.26 to automatically sync the v3store from the v2store, ensuring that affected clusters are repaired before upgrading to 3.6.x. To support the many users currently upgrading to 3.6, we have provided the following safe upgrade path:

1. Upgrade your cluster to v3.5.26 or later.
2. Wait and confirm that all members are healthy post-update.
3. Upgrade to v3.6.

We are unable to provide a safe workaround for users who have some obstacle preventing an update to v3.5.26. As such, if v3.5.26 is not available from your packaging source or vendor, you should delay upgrading to v3.6 until it is.

Additional technical detail

The information below is offered for reference only. Users can follow the safe upgrade path without knowledge of the following details.

This issue is encountered with clusters that have been running in production on etcd v3.5.25 or earlier. It is a side effect of adding and removing members from the cluster, or of recovering the cluster from failure. This means that the issue is more likely the older the etcd cluster is, but it cannot be ruled out for any user, regardless of the age of the cluster.

etcd maintainers, working with issue reporters, have found three possible triggers for the issue based on symptoms and an analysis of etcd code and logs:

Bug in etcdctl snapshot restore (v3.4 and older versions): When restoring a snapshot using etcdctl snapshot restore, etcdctl was supposed to remove existing members before adding the new ones. In v3.4, due to a bug, old members were not removed, resulting in zombie members. Refer to the comment on etcdctl.

Bug in --force-new-cluster (v3.5 and earlier versions): In rare cases, forcibly creating a new single-member cluster did not fully remove old members, leaving zombies. The issue was resolved in v3.5.22. Please refer to this PR...
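For step 2 of the upgrade path, one way to confirm that all members are healthy is with etcdctl's endpoint and member commands. This is a minimal sketch; the endpoint URLs are placeholders for your own cluster's client URLs.

    # Placeholder client endpoints; substitute your cluster's actual URLs.
    ENDPOINTS=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379

    # Every member should report healthy and show a v3.5.26+ version.
    etcdctl --endpoints=$ENDPOINTS endpoint health
    etcdctl --endpoints=$ENDPOINTS endpoint status -w table

    # The member list should contain exactly the members you expect, no extras.
    etcdctl --endpoints=$ENDPOINTS member list -w table

If the member list shows an entry you removed long ago, that is a zombie member.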
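If a zombie does appear and the cluster still has quorum, it can be removed like any other member. The member ID below is a made-up example; use the hex ID shown by member list.

    # Remove the stale member by its hex ID (example ID, not a real one).
    etcdctl --endpoints=$ENDPOINTS member remove 8e9e05c52164694d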
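For context on the first trigger, a typical single-node restore with etcdctl snapshot restore looks roughly like the sketch below; the member name, paths, and URLs are hypothetical. On a correctly working version, the restored data directory contains only the members named in --initial-cluster. (In newer etcd releases, the restore subcommand is also provided by the separate etcdutl tool.)

    # Hypothetical restore; adjust the name, data dir, and peer URLs for your setup.
    etcdctl snapshot restore backup.db \
      --name m1 \
      --data-dir /var/lib/etcd-restored \
      --initial-cluster m1=http://10.0.0.1:2380 \
      --initial-cluster-token rebuilt-cluster \
      --initial-advertise-peer-urls http://10.0.0.1:2380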
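Likewise for the second trigger: a disaster recovery that collapses a cluster down to a single member is typically started as shown below; the name and paths are illustrative. On v3.5.22 or later, this correctly drops all other members from the membership store.

    # Illustrative single-member recovery start; name and data dir are placeholders.
    etcd --name m1 \
      --data-dir /var/lib/etcd \
      --force-new-cluster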

Continue reading the full article at kubernetes.io.
