What happened?
While restarting the etcd server (etcd v3), Patroni failed to select a new etcd server and shut down the database until etcd was running again.
How can we reproduce it (as minimally and precisely as possible)?
I couldn't reproduce it consistently, but the best way is to have the primary and the secondary select the same etcd server. The primary should be running multiple transactions. Then restart that etcd server (it does not need to be the etcd leader).
What did you expect to happen?
If the restarted etcd server is not the selected one: nothing.
If the restarted etcd server is the selected one: Patroni selects a new etcd server and retries.
Patroni/PostgreSQL/DCS version
Patroni version: 3.2.2 and 3.3.0
PostgreSQL version: 13 and 16
DCS (and its version): etcd 3.3.25 API 3.3
Patroni configuration file
etcd3:
  hosts: IP1:2379,IP2:2379,IP3:2379,IP4:2379,IP5:2379,IP6:2379,IP7:2379,IP8:2379,IP9:2379
  protocol:
  username: username
  password: password
restapi:
  listen: "IP10:8008"
  connect_address: "IP11:8008"
  certfile: PATH-TO-CERT
  keyfile: PATH-TO-KEYFILE
  authentication:
    username: username
    password: password
bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 30
    postgresql:
      use_pg_rewind: true
      remove_data_directory_on_rewind_failure: false
      remove_data_directory_on_diverged_timelines: false
      use_slots: false
      # The following parameters are given as command line options
      # overriding the settings in postgresql.conf.
      parameters:
        archive_command: PATH-TO-COMMAND
        max_connections: 1100
        wal_level: logical
        hot_standby: "on"
        max_wal_senders: 50
        max_replication_slots: 50
        max_worker_processes: 100
        wal_log_hints: "on"
        unix_socket_directories: '/var/run/postgresql/'
      recovery_conf:
        restore_command: PATH-TO-COMMAND
  # Some possibly desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
    - encoding: UTF8
    - data-checksums
  # Additional users to be created after initializing the cluster
  users:
    username:
      password: password
      options:
        - replication
        - login
postgresql:
  # Custom clone method
  # The options --scope= and --datadir= are passed to the custom script by
  # patroni and passed on to pg_createcluster by pg_clonecluster_patroni
  create_replica_method:
    - pgbackrest
    - pg_clonecluster
  pgbackrest:
    command: PATH-TO-COMMAND
    keep_data: True
    no_params: True
  pg_clonecluster:
    command: PATH-TO-COMMAND
  listen: "IP10:5433"
  connect_address: "IP11:5433"
  use_unix_socket: true
  data_dir: /var/lib/postgresql/13/main
  bin_dir: /usr/lib/postgresql/13/bin
  config_dir: /etc/postgresql/13/main
  pgpass: /var/lib/postgresql/13-main.pgpass
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: false
  use_slots: false
  parameters:
    archive_command: PATH-TO-COMMAND
    max_connections: 1100
    wal_level: logical
    hot_standby: "on"
    max_wal_senders: 50
    max_replication_slots: 50
    max_worker_processes: 100
    wal_log_hints: "on"
    unix_socket_directories: '/var/run/postgresql/'
  recovery_conf:
    restore_command: PATH-TO-COMMAND
  authentication:
    replication:
      username: username
      password: password
    rewind:
      username: username
      password: password
    # A superuser role is required in order for Patroni to manage the local
    # Postgres instance. If the option `use_unix_socket' is set to `true', then
    # specifying an empty password results in no md5 password for the superuser
    # being set and sockets being used for authentication. The `password:' line is
    # nevertheless required. Note that pg_rewind will not work if no md5 password
    # is set.
    superuser:
      username: username
      password: password
# uncomment the below and reload Patroni to set tags (will be commented again by Puppet!)
#tags:
#  nofailover: false
#  clonefrom: false
This seems to be the culprit, and maybe we can retry in this case, but etcd 3.3.25 is quite old (almost four years) and I am pretty sure there have been plenty of bug fixes since then. Therefore I would advise you to first upgrade to v3.5.15 (the latest stable version) and check whether it helps.
Sorry for the delayed response, I was on vacation. I'm using the newest version of etcd available in Ubuntu 22.04. It will take some time until I switch to Ubuntu 24.04, which ships 3.4.30. The error "patroni.dcs.etcd3.Unknown: <Unknown error: 'OK: HTTP status code 200; transport: missing content-type field', code: 2>" comes from gRPC. I will try to check whether a newer etcd version fixes this problem (though I might not be able to use it in production). But I do think it makes sense to validate that the response is valid before executing handle_server_response.
I'm using the newest version of etcd available in Ubuntu 22.04. It will take some time until I switch to Ubuntu 24.04, which ships 3.4.30.
I am sorry, but this is a really bad excuse. You can always download the etcd binaries from GitHub and run them as an unprivileged user.
But I do think it makes sense to validate that the response is valid before executing handle_server_response
Well, the HTTP status code is 200, which indicates success. It is not really clear what exactly etcd sends in the actual response...
Without knowing the exact response it is not even possible to add an exception for it.
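For illustration, the kind of pre-parse validation being suggested could look roughly like the sketch below. The function and exception names are hypothetical, not Patroni's actual API, and it assumes the caller has the HTTP status code, the Content-Type header, and the raw body at hand:

```python
import json


class Etcd3ResponseError(Exception):
    """Hypothetical error for responses that cannot be a valid etcd payload."""


def parse_etcd_response(status_code, content_type, body):
    """Validate a response before handing it to the normal response handler.

    During an etcd restart the gRPC gateway can apparently answer HTTP 200
    with a missing Content-Type and a non-JSON body; raising a dedicated,
    retriable error here (instead of an unknown gRPC error) would let the
    client fail over to another etcd endpoint.
    """
    if status_code == 200 and content_type != 'application/json':
        raise Etcd3ResponseError(
            'HTTP 200 but Content-Type is %r, not application/json' % content_type)
    try:
        return json.loads(body)
    except ValueError:
        raise Etcd3ResponseError('etcd returned a non-JSON body: %r' % body[:80])
```

A caller could then catch this error and retry against the next host from the configured hosts list instead of propagating the gRPC "Unknown" error.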