Saturday, September 21, 2013

MySQL 5.7.2 features an enhanced multi-threaded slave (MTS) that can apply transactions in parallel even within a single database. Internal details of how it works can be found in an earlier post. In this post we will see how to configure a replication slave to use this enhancement.

MySQL 5.7.2 has a new dynamic system variable, --slave-parallel-type. It can be set to the following values:

1. DATABASE: (Default) Use the database-partitioned MTS (1 worker per database).
2. LOGICAL_CLOCK: Use the logical-clock-based parallelization mode.

Apart from this, the original option --slave-parallel-workers=N is still valid and sets the number of workers to spawn. Also, since the slave leverages the groups of transactions that committed in parallel on the master, it makes sense to leave --binlog-max-flush-queue-time=0, which is the default value, intact on the master. This ensures that the leader thread on the master flushes all the transactions queued in the FLUSH QUEUE of binlog group commit without timing out, thereby delivering maximum parallelization on the slave.

Finally, to summarize the steps to set up the enhanced MTS:

ON MASTER:
1. Start the master with --binlog-max-flush-queue-time=0

ON SLAVE:
1.a. Start the slave server with --slave-parallel-type=LOGICAL_CLOCK --slave-parallel-workers=N

Or alternatively,

1.b. Start the slave server normally, then change the MTS options dynamically using the following statements:

mysql> STOP SLAVE;  -- if the slave is running
mysql> SET GLOBAL slave_parallel_type='LOGICAL_CLOCK';
mysql> SET GLOBAL slave_parallel_workers=N;
mysql> START SLAVE;
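
If you script this switch, the order of the statements matters: the slave has to be stopped before the options are changed. Here is a small Python sketch that only builds the statement list for a given worker count. The helper name mts_setup_statements is made up for this post, and no MySQL connector is used; you would feed the statements to whatever client you already have.

```python
def mts_setup_statements(workers):
    """Return, in order, the statements that switch a slave to LOGICAL_CLOCK MTS.

    Hypothetical helper: it only builds strings and does not talk to a server.
    """
    if isinstance(workers, bool) or not isinstance(workers, int) or workers < 0:
        raise ValueError("workers must be a non-negative integer")
    return [
        "STOP SLAVE",  # the slave must be stopped before changing the options
        "SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'",
        "SET GLOBAL slave_parallel_workers = %d" % workers,
        "START SLAVE",
    ]


for statement in mts_setup_statements(5):
    print(statement + ";")
```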


A small demo:

1. We created 5 tables in a single test database on the master and used 5 clients to insert into them in parallel.
2. The slave was configured with --slave-parallel-type="logical_clock" and --slave-parallel-workers=5.
3. We let the slave replicate from the master and checked the status of the workers using the Performance Schema replication tables and the SHOW PROCESSLIST command.

Here is the sample output on the slave.

When to use enhanced MTS

Since the slave uses the parallelization information from the master, it performs best when there are multiple clients on the master and multiple transactions committing at the same time. If the master is underloaded, spawning multiple threads may have no effect on slave performance, and may even degrade it.

Conclusion

This enhancement is available in MySQL 5.7.2, which can be downloaded from the MySQL download page. Try it out and let us know your feedback.

Introduction

Re-applying binary logs generated by a highly concurrent master on the slave has always been an area of focus. It is important for several reasons. First, in real-time systems, it is extremely important for the slave to keep up with the master. This can only be guaranteed if the slave's performance in re-applying the transactions from the binary log is similar (or at least comparable) to that of the master, which is accepting queries directly from multiple clients. Second, in synchronous replication scenarios, having fast slaves helps reduce the response times seen by the clients of the master. Both can be achieved by applying transactions from the binary log in parallel. However, if left uncontrolled, a simple round-robin multi-threaded applier will lead to inconsistency, and the slave will no longer be an exact replica of the master.

The infamous out of order commit problem

Out-of-order execution of transactions on the slave, if left uncontrolled, will cause the slave to diverge from the master. Here is an example: consider two transactions T1 and T2 being applied to an initial state.

On the master we apply T1 and T2 in that order.
State0: x = 1, y = 1
T1: { x := Read(y);
      x := x + 1;
      Write(x);
      Commit; }
State1: x = 2, y = 1

T2: { y := Read(x);
      y := y + 1;
      Write(y);
      Commit; }
State2: x = 2, y = 3

On the slave, however, these two transactions commit out of order (say, T2 and then T1).
State0: x = 1, y = 1
T2: { y := Read(x);
      y := y + 1;
      Write(y);
      Commit; }
State1: x = 1, y = 2

T1: { x := Read(y);
      x := x + 1;
      Write(x);
      Commit; }
State2: x = 3, y = 2


As we see above, the final state (State2) is different in the two cases. Clearly, we need to control which transactions can execute in parallel.
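
If you want to replay this example yourself, here is a minimal Python sketch. It models the state as a plain dictionary and each transaction as a function that reads one variable, increments it and writes the other. The names t1, t2 and apply_in_order are just for illustration and have nothing to do with the server code.

```python
def t1(state):
    """T1: x := Read(y); x := x + 1; Write(x); Commit."""
    x = state["y"]
    x = x + 1
    state["x"] = x


def t2(state):
    """T2: y := Read(x); y := y + 1; Write(y); Commit."""
    y = state["x"]
    y = y + 1
    state["y"] = y


def apply_in_order(transactions, initial):
    """Apply the transactions one after another to a copy of the initial state."""
    state = dict(initial)
    for trx in transactions:
        trx(state)
    return state


initial = {"x": 1, "y": 1}
master_state = apply_in_order([t1, t2], initial)  # commit order on the master
slave_state = apply_in_order([t2, t1], initial)   # out-of-order commit on the slave

print("master:", master_state)
print("slave: ", slave_state)
print("diverged:", master_state != slave_state)
```

The master ends up with x = 2, y = 3 and the slave with x = 3, y = 2, exactly as in the states listed above.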

Controlled parallelization

The above problem can be solved by controlling which transactions can be executed in parallel with the ones already being executed by the slave. This means we need some kind of information in the transactions themselves. It is interesting to note that we can reuse the parallelization information from the master on the slave. Since multiple transactions commit at the same time on the master, we can store, for each transaction, the information about which transactions were in the "process of committing" when it committed. Now let's define the phrase "process of committing".

The process of committing: On the slave we need to make sure that the transactions we schedule for parallel execution do not have conflicting read and write sets. This is the only requirement the slave workers need in order to work without conflicts. It also implies that if the transactions being executed in parallel do not have intersecting read and write sets, we don't care if they commit out of order. Since MySQL uses lock-based scheduling, all the transactions that have entered the prepare stage but have not yet committed have disjoint read and write sets and hence can be executed in parallel.
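
To make "conflicting read and write sets" concrete, here is a small Python sketch. It assumes the usual definition: two transactions conflict if one writes something the other reads or writes. Keep in mind this is only an illustration. The server does not compute these sets; it relies on the locks held by transactions in the prepare stage. The function name and the row labels are made up.

```python
def conflicts(trx_a, trx_b):
    """Return True if one transaction writes an item the other reads or writes."""
    a_touches_b = trx_a["writes"] & (trx_b["reads"] | trx_b["writes"])
    b_touches_a = trx_b["writes"] & (trx_a["reads"] | trx_a["writes"])
    return bool(a_touches_b or b_touches_a)


# T1 and T2 from the out-of-order example: each reads what the other writes.
t1 = {"reads": {"y"}, "writes": {"x"}}
t2 = {"reads": {"x"}, "writes": {"y"}}

# Two inserts into different tables (hypothetical row labels).
insert_a = {"reads": set(), "writes": {"table1.row1"}}
insert_b = {"reads": set(), "writes": {"table2.row1"}}

print(conflicts(t1, t2))              # the pair that diverged earlier
print(conflicts(insert_a, insert_b))  # independent inserts
```

The first pair conflicts, so it must not be reordered. The second pair does not, so the slave is free to apply and commit it in either order.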


Logical clock and commit parent

We have introduced a logical clock. Now, before I am tackled by a mathematician from one side and a computer engineer from the other, let me explain. It is a simple counter that is stepped when a binlog group of transactions commits on the master. Essentially, this clock is stepped every time the leader executes the flush stage of binlog group commit. The value of this clock is recorded on each transaction when it enters the prepare stage. This recorded value is the "commit parent".

The pseudo code is as follows.

During Prepare
trx.commit_parent= commit_clock.get_timestamp();

During Commit
for every binlog group
  commit_clock.step();

As is evident by now, transactions with the same commit parent follow our guiding principle of slave-side parallelization: they had entered the prepare stage but had not yet committed, and hence can be executed in parallel.

Schematics of binlog prepare stage and commit parent
In the example we take up three transactions (T1, T2, T3), two of which are committed as part of the same binlog group. T1 enters the prepare stage and gets 0 as its commit parent, since no group has been committed yet. T1 assigns itself as the leader and then goes on to flush its transaction/statement cache. In the meantime, transaction T2 enters the prepare stage. It is assigned the same commit parent "0" (CP as used in the figure), since the commit clock has not yet been stepped. T2 then waits for the leader to flush its cache into the binlog. After the leader has completed the flush, it signals T2 to continue and both of them enter the sync stage, where the leader thread calls fsync(), thereby finishing the binlog commit process. Transaction T3, however, enters the prepare stage after the previous group has been synced and thereby ends up getting the next CP.
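
The T1, T2, T3 walkthrough can be replayed with a toy version of the pseudo code. This is a minimal single-threaded Python sketch, assuming the clock is stepped once per binlog group; the class and function names are mine and do not correspond to the server source.

```python
class CommitClock:
    """A plain counter standing in for the logical clock on the master."""

    def __init__(self):
        self._value = 0

    def get_timestamp(self):
        return self._value

    def step(self):
        self._value += 1


class Transaction:
    def __init__(self, name):
        self.name = name
        self.commit_parent = None


def prepare(trx, clock):
    # During prepare: record the current clock value as the commit parent.
    trx.commit_parent = clock.get_timestamp()


def commit_group(group, clock):
    # During commit: the clock is stepped once for the whole binlog group.
    clock.step()


clock = CommitClock()
t1, t2, t3 = Transaction("T1"), Transaction("T2"), Transaction("T3")

prepare(t1, clock)             # T1 prepares first and becomes the leader
prepare(t2, clock)             # T2 prepares before the clock is stepped
commit_group([t1, t2], clock)  # T1 and T2 commit as one binlog group
prepare(t3, clock)             # T3 prepares after that group is done

for trx in (t1, t2, t3):
    print(trx.name, "commit parent =", trx.commit_parent)
```

T1 and T2 share commit parent 0, while T3 gets 1, matching the description above.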

Another thing to note here is that the "group" of transactions executed in parallel is not bounded by the binlog commit group. It is possible that a transaction has entered the binlog prepare stage but could not make it into the current binlog group. Our approach takes care of such cases and relaxes the boundary of the group being executed in parallel on the slave.

On the slave we use the existing infrastructure of the DB-partitioned MTS to execute the transactions in parallel, simply by modifying the scheduling logic.
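
To give a feel for what scheduling by commit parent means, here is a deliberately simplified Python sketch. It assumes the relay log is a list of (name, commit_parent) pairs in binlog order, runs consecutive transactions with the same commit parent on a thread pool, and waits for each batch to finish before starting the next. The real slave reuses the MTS worker infrastructure and is not this naive; apply_relay_log and record are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def apply_relay_log(transactions, apply, workers=4):
    """Apply (name, commit_parent) pairs, parallelizing within each commit parent."""
    batches = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for parent, group in groupby(transactions, key=lambda trx: trx[1]):
            batch = list(group)
            # list() drains the iterator, so every apply() in this batch has
            # finished before the next commit parent is scheduled.
            list(pool.map(apply, batch))
            batches.append((parent, [name for name, _ in batch]))
    return batches


applied = []


def record(trx):
    # Stand-in for actually executing the transaction on the slave.
    applied.append(trx[0])


relay_log = [("T1", 0), ("T2", 0), ("T3", 1)]
batches = apply_relay_log(relay_log, record)
print(batches)
print(applied)
```

T1 and T2 may finish in either order, but T3 is always applied after both of them.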

Conclusion

This feature is a great enhancement to the existing MySQL replication. To know more about the configuration options of this enhancement, refer to this post.
This feature is available in the MySQL 5.7.2 release. You can try it out and let us know your feedback.