In this series, “MySQL with Diagrams,” I’ll use diagrams to explain internals, architectures, and structures as detailed as possible. In basic terms, here’s how replication works: the transactions are written into a binary log on the source side, carried into the replica, and applied.
The replica’s connection metadata repository contains information that the replication receiver thread needs to connect to the replication source server and retrieve transactions from the source’s binary log. The connection metadata repository is written to the mysql.slave_master_info table.
The replica’s applier metadata repository contains information that the replication applier thread needs to read and apply transactions from the replica’s relay log. The applier metadata repository is written to the mysql.slave_relay_log_info table.
Let’s take a look at the threads.
Binary log dump thread
This thread is responsible for transmitting the binary log events to the connected replica. It can be identified in the output of the SHOW PROCESSLIST command on the source server, where it will be listed as the Binlog Dump thread.
Replication I/O receiver thread
On the replica side, the replica sets up an I/O thread, commonly referred to as the I/O receiver thread. This thread connects to the source and requests any new changes stored in the source’s binary logs. The I/O receiver thread continuously fetches updates and writes them into the relay log on the replica. The I/O thread was talking with the Binary Log Dump thread to reach the changes.
The status of this thread can be checked in the output of SHOW REPLICA STATUS under the field Slave_IO_running.
Replication SQL applier thread
Once updates are written to the relay log, the SQL applier thread on the replica takes over. If the replica_parallel_workers setting is set to 0, a single SQL applier thread is created to execute the transactions from the relay log. However, if replica_parallel_workers is set to a value greater than or equal to 1, multiple worker threads are initiated, along with a coordinator thread. The coordinator assigns transactions from the relay log to the worker threads, which then apply the transactions in parallel.
Applier thread parallel workers must be set carefully because having more parallel threads does not mean that the event will apply faster. Parallel threads’ efficiency can be checked via PERFORMANCE_SCHEMA.
Relay-Log index files: The index file keeps track of the logs, ensuring that the replication process can continue seamlessly even after interruptions.
Binary-Log index files: To keep track of which binary log files have been used, mysqld also creates a binary log index file that contains the names of the binary log files.
Synchronization and durability: The sync_binlog
setting on the replica dictates how often the relay log is flushed to disk. Tuning this setting can help balance performance with data safety. A higher setting provides better durability, ensuring that fewer transactions are lost in case of a failure.
Configuration for Optimal Performance
Several MySQL system variables play a key role in replication performance and stability:
– replica_net_timeout: This variable controls the timeout for the replica’s connection to the source. If the source does not respond within the specified time, the replica will attempt to reconnect. Properly tuning this variable can help maintain stable replication in environments with variable network conditions.
– sync_binlog: On the source server, this setting determines how frequently the binary log is flushed to disk. A value of 1
ensures maximum durability, as the binary log is flushed to disk after every transaction, though at the cost of performance.
– binlog_cache_size: Increasing the binlog_cache_size
can help manage large transactions more efficiently by reducing the need for temporary files, thus optimizing I/O operations during logging.
– replica_preserve_commit_order: Ensures that transaction commit order on the replica matches the primary. This is useful for maintaining strict consistency but may slightly impact performance.
– replica_parallel_workers: Defines the number of threads for parallel replication on a replica. Higher values can improve performance if transactions are independent.
MySQL Performance Tuning is an essential guide covering the critical aspects of MySQL performance optimization.
Very nice initiative! I’m looking forward to the next ones, good job!
Thank you so much for your thoughts Frederic!