Increase etcd health check timeout

Under high load, the QGET request issued by the etcd /health check
occasionally times out. This has been observed during deployment of
an IPsec-enabled system, when controller-1 is unlocked and drbd is
synchronizing. In such cases the /health check fails on the timeout,
which causes an uncontrolled swact.

This change increases the timeout value from 1s to 5s.

Test Plan (DX system):
PASS: etcd package build and image build.
PASS: controller-0 successfully installed, bootstrapped and unlocked,
      with IPsec enabled.
PASS: controller-1 successfully installed, IPsec configured and enabled,
      IPsec SAs established between controllers.
PASS: After controller-1 is unlocked, verify there is no uncontrolled
      swact during drbd synchronization, and controller-1 comes up in
      "enabled" and "available" state.

Story: 2010940
Task: 49930

Change-Id: I7ba66599de255c204157de82115a415d5568920d
Signed-off-by: Andy Ning <andy.ning@windriver.com>
Andy Ning 2024-04-23 16:26:54 -04:00
parent 1b0db90e43
commit 9c49fa31bb
2 changed files with 34 additions and 0 deletions

@@ -0,0 +1,33 @@
From f909d99825d4364d271d757747ce47c016467e01 Mon Sep 17 00:00:00 2001
From: Andy Ning <andy.ning@windriver.com>
Date: Fri, 19 Apr 2024 11:28:39 -0400
Subject: [PATCH] Increate health check timeout
Under high load, the /health check QGET occasionally times out.
This change increases the timeout value to 5s.
Signed-off-by: Andy Ning <andy.ning@windriver.com>
---
etcdserver/api/etcdhttp/metrics.go | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/etcdserver/api/etcdhttp/metrics.go b/etcdserver/api/etcdhttp/metrics.go
index e5c062e..74eec9a 100644
--- a/etcdserver/api/etcdhttp/metrics.go
+++ b/etcdserver/api/etcdhttp/metrics.go
@@ -134,7 +134,11 @@ func checkHealth(srv etcdserver.ServerV2, excludedAlarms AlarmSet) Health {
}
if h.Health == "true" {
- ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+ time_out := time.Second*5
+ plog.Warningf("/health check; QGET timeout: %v", time_out)
+
+ //ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+ ctx, cancel := context.WithTimeout(context.Background(), time_out)
_, err := srv.Do(ctx, etcdserverpb.Request{Method: "QGET"})
cancel()
if err != nil {
--
2.25.1

@@ -0,0 +1 @@
0001-Increate-health-check-timeout.patch