Monday, October 1, 2012

With MySQL 5.6 coming closer to its release, I got a chance to sneak a look into the Row Based Replication (RBR). 5.6 release of MySQL will be a replication features packed one, and it is for this reason I write  another post on the new enhancements in Row Based Replication(RBR). RBR provides a safe way of replicating between master and slave and thats why RBR enhancements become even more important. RBR in 5.6 is far more optimized than what it was before and we hope it will be fun to deploy, maintain and use.

Folowing are some interesting enhancements in RBR in MySQL 5.6 Replication.

1. Optimized row image:

Large data transfer between master and slave while using RBR is a thing of past. Why log complete rows on the master? Why not just the changed colums?  The MySQL 5.6 release enables a DBA to configure the master to log only the changed columns in the row.  The Before Image (BI) will contain the columns for identifying the rows and the After Image (AI) will just contain the column that has changed.
  

Before MySQL 5.6: 

As one can observe in the illustration that the row images were full, i.e. all the columns in the row were logged for both the Before Image(BI) and the After Image AI

In MySQL 5.6:

In MySQL 5.6 the DBA can configure if full or minimal or NOBLOB image of rows should be logged in the binary log in the following ways.

1. start the server with --binlog-row-image=MINIMAL|FULL|NOBLOB
2. SET SESSION|GLOBAL binlog_row_image= MINIMAL|FULL|NOBLOB

In minimal image if there is a Primary Key (PK) or any kind of Unique Index/Key (UK) that can identify a row uniquely on the slave then the BI contains just that column. In the After Image (AI) only the changed columns are logged (See illustration below). 


In case of NOBLOB is used, the BLOB fields will be skipped from logging unless changed. In case  FULL is used the logging will continue as before.

As you can see the amount of data transfer will be reduced by a lot especially if you have used proper indexes/keys for the tables on the master and the slave. A detailed post of this enhancement be found here.

2. Batch operations on table without PK or UK:

I have already mentioned the details of this enhancement in a previous post. This is one of the long awaited optimization when using RBR. This enables the DBA to configure what algorithm will be used  on the slave to find the appropriate row for updating/deleting. We have introduced the concept of HASH_SCAN, in which an in-memory hash is being created on the slave, and provides a way to find and apply rows in a row event in O(n) time.  This enhancement provides a great scope for performance improvement on the slave for the events logged in row format.

3. Informational Log events: 

One of the difficulties DBA have faced till now, while using RBR is that, it is difficult to maintain. The Row images are binary in nature and is not suitable for human "consumption". This is where this enhancement comes handy. There are two ways of using this.

    1. Enabling switch --binlog-rows-query-log-events during the server startup
    2. Changing the corresponding session variable.

Once enabled this will cause the original query for the row event to be logged in the binary log as a special event which the DBA can check using one of the two  methods.

1. One can check these special events in the binary-log by using SHOW BINLOG EVENTS. The query corresponding the the row event will be logged as an event type: Row_query. below is one such example

mysql> SHOW BINLOG EVENTS;
+---------------+-----+-------------+-------+--------+-----------------------------------------------+
| Log_name      | Pos | Event_type  | S..id | End... | Info                                          |
+---------------+-----+-------------+-------+--------+-----------------------------------------------+
| master.000001 |   4 | Format_desc |     1 |    114 | Server ver: 5.6.7-m5-debug-log, Binlog ver: 4 |
| master.000001 | 114 | Query       |     1 |    200 | use `test`; CREATE TABLE NUM_INTS (a INT)     |
| master.000001 | 200 | Query       |     1 |    268 | BEGIN                                         |
| master.000001 | 268 | Rows_query  |     1 |    323 | # INSERT INTO NUM_INTS VALUES (1), (2), (3)   |
| master.000001 | 323 | Table_map   |     1 |    364 | table_id: 54 (test.NUM_INTS)                  |
| master.000001 | 364 | Write_rows  |     1 |    408 | table_id: 54 flags: STMT_END_F                |
| master.000001 | 408 | Query       |     1 |    477 | COMMIT                                        |
+---------------+-----+-------------+-------+--------+-----------------------------------------------+

In the above terminal output (snip-ed), one can see the event at pos= 268 is an informational event for the next Table_map and Write_rows events. 


2. Another way is to use the mysqlbinlog utility with  --vv switch (verbosity level 2 or more).

shell> mysqlbinlog -vv /master.000001
# at 268
#110401 14:24:29 server id 1  end_log_pos 323   Rows_query
# INSERT INTO t1 VALUES (1), (2), (3)
# at 323
#110401 14:24:29 server id 1  end_log_pos 364   Table_map: `test`.`NUM_INTS` mapped to number 54
# at 364
#110401 14:24:29 server id 1  end_log_pos 408   Write_rows: table id 54 flags:STMT_END_F

Conclusion: 

So we see how MySQL-5.6 Replication will provide a lot of features that enables a better, faster, reliable slave that is easy to maintain.  So why not just go try it out?