with garbd's suspect_timeout
In openstack-helm-infra, the mariadb-etc configmap launches
mariadb-server with evs.suspect_timeout=PT30S. That setting was
written for a deployment of three mariadb-server pods, where every
node runs with the same 30s suspect timeout. After the change to two
mariadb-server pods plus one garbd arbitrator, the configmap value
only takes effect on the two mariadb-server pods; the garbd
arbitrator uses the Galera default, evs.suspect_timeout=PT5S. If
mariadb-server-1 exits abnormally, the garbd arbitrator suspects it
is dead after 5s, but mariadb-server-0 still considers it alive
because 30s has not yet elapsed. In this state quorum fails, the
garbd arbitrator and mariadb-server-0 both drop to a non-primary
component, and the service goes down.
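The mismatch can be illustrated with a small Python sketch. It is
not part of the change and only approximates the behaviour: it
models each node's view of the failed peer with a single timeout,
which simplifies Galera's EVS protocol. The names SUSPECT_TIMEOUT_S,
suspects_peer, views_disagree and disagreement_window are
hypothetical.

```python
# Illustrative model only; names and structure are assumptions,
# not taken from Galera or the helm charts.
SUSPECT_TIMEOUT_S = {
    "mariadb-server-0": 30,  # evs.suspect_timeout=PT30S from mariadb-etc
    "garbd": 5,              # Galera default, evs.suspect_timeout=PT5S
}


def suspects_peer(node, seconds_since_failure):
    """Return True once `node` would suspect the failed peer is dead."""
    return seconds_since_failure >= SUSPECT_TIMEOUT_S[node]


def views_disagree(seconds_since_failure):
    """Return True while the surviving nodes hold different views."""
    views = {suspects_peer(n, seconds_since_failure) for n in SUSPECT_TIMEOUT_S}
    return len(views) > 1


def disagreement_window():
    """Return (start, end) in seconds of the interval with split views."""
    timeouts = SUSPECT_TIMEOUT_S.values()
    return min(timeouts), max(timeouts)
```

With the values above, the two surviving nodes disagree about
mariadb-server-1 from 5s until 30s after the failure. Setting both
timeouts to the same value makes that window empty.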
To fix this, set value.conf.data.config_override to override
wsrep_provider_options in the mariadb helm chart, so that the garbd
arbitrator and mariadb-server launch with the same setting,
evs.suspect_timeout=PT5S, which is the default value. This also
shortens mariadb-server recovery time. Any later change to
evs.suspect_timeout must update the override in both the mariadb
and garbd helm charts.
The setting gmcast.listen_addr=tcp://0.0.0.0:<port> takes effect
for both IPv4 and IPv6, so it is kept unchanged.
Reference links for wsrep options and Galera cluster quorum:
https://mariadb.com/kb/en/wsrep_provider_options/
https://galeracluster.com/library/documentation/weighted-quorum.html
Closes-Bug: 1888546
Change-Id: I92af77fab929c9f598b7dc41543db6ad6238f812
Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>