Monday, October 10, 2011

New features in MySQL Cluster 7.2.1

  • AQL (aka join pushdown)
    Further improvements and refinements compared to the 7.2.0 release from April.
  • Index statistics
    A long-overdue feature that aims to minimize the need for the manual query tuning that was previously essential for efficient SQL usage with ndb.
  • memcache access support
  • Active-Active replication enhancements
  • Various internal limits have been increased
    - Max row-size now 14k (previously 8k)
    - Max no of columns in table now 512 (previously 128)
  • Rebase to mysql-5.5 (7.2.1 is based on mysql-5.5.15)
  • Improved support for geographically separated cluster
    (note: this means a single cluster, i.e. not using asynchronous replication)

Brief introduction to AQL (aka join pushdown)

The basic concept is to evaluate joins down in the data nodes instead of (or in addition to) in mysqld.
Ndb examines the query plan created by mysqld, constructs a serialized definition of the join, and ships it down to the data nodes.
The data nodes then evaluate the join in parallel (where appropriate) and send the result set back to mysqld through a streaming interface.
The performance gain (latency reduction) is normally in the range of 20x for a 3-way join.
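
To make the latency argument concrete, here is a toy Python model (not NDB code). It assumes that a join driven from the SQL node costs one request per parent row, while a pushed join costs a single request. The class DataNodes, its methods, and the orders/customers tables are all hypothetical names made up for this sketch.

```python
# Toy model only: plain dicts stand in for NDB tables, and a counter
# stands in for network round trips between mysqld and the data nodes.

class DataNodes:
    """Hypothetical stand-in for the data nodes of a cluster."""

    def __init__(self, tables):
        self.tables = tables      # table name -> {primary key: row dict}
        self.round_trips = 0      # requests received from the SQL node

    def scan(self, table):
        """Return every row of a table (counted as one request)."""
        self.round_trips += 1
        return list(self.tables[table].values())

    def lookup(self, table, key):
        """Primary-key lookup of a single row (one request)."""
        self.round_trips += 1
        return self.tables[table].get(key)

    def pushed_join(self, parent, child, fk):
        """Evaluate the whole join next to the data (one request)."""
        self.round_trips += 1
        result = []
        for row in self.tables[parent].values():
            match = self.tables[child].get(row[fk])
            if match is not None:
                result.append((row, match))
        return result


def join_in_sql_node(nodes, parent, child, fk):
    """Nested-loop join driven from the SQL node: one lookup per parent row."""
    result = []
    for row in nodes.scan(parent):
        match = nodes.lookup(child, row[fk])
        if match is not None:
            result.append((row, match))
    return result


def make_nodes():
    """Build a small made-up data set: 100 orders referencing 3 customers."""
    orders = {i: {"id": i, "customer_id": i % 3} for i in range(100)}
    customers = {i: {"id": i, "name": "c%d" % i} for i in range(3)}
    return DataNodes({"orders": orders, "customers": customers})


classic, pushed = make_nodes(), make_nodes()
rows_classic = join_in_sql_node(classic, "orders", "customers", "customer_id")
rows_pushed = pushed.pushed_join("orders", "customers", "customer_id")

print(len(rows_classic), classic.round_trips)
print(len(rows_pushed), pushed.round_trips)
```

Both variants return the same rows; only the number of requests differs. The real implementation also batches and parallelizes work, which this sketch ignores.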

Brief introduction to Index statistics

Index statistics work a lot like InnoDB persistent statistics.
When you execute analyze table T, the data nodes scan the indexes of T and produce a histogram for each index.
This histogram is stored in tables in ndb (mysql.ndb_index_stat_head and mysql.ndb_index_stat_sample). The histogram can then be used by any mysqld connected to this cluster. The histogram will not be regenerated until a new analyze table T is requested.
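
As a rough illustration of how a stored histogram can answer "how many rows fall in this range?" without touching the index again, here is a simplified Python sketch. It is not the actual format of mysql.ndb_index_stat_sample; the functions analyze_table and estimate_rows and the stats_store dict are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical shared store, playing the role of the statistics tables in ndb.
stats_store = {}


def analyze_table(store, index_name, index_keys, sample_count=10):
    """Scan the index once and keep evenly spaced sample keys."""
    keys = sorted(index_keys)
    step = max(1, len(keys) // sample_count)
    # Each kept sample stands for roughly `step` rows of the index.
    store[index_name] = (keys[::step], step)


def estimate_rows(store, index_name, low, high):
    """Estimate the number of rows with low <= key <= high from the samples."""
    samples, step = store[index_name]
    hits = bisect_right(samples, high) - bisect_left(samples, low)
    return hits * step


# One "analyze" builds the samples; any reader of the store can reuse them.
analyze_table(stats_store, "T.idx_a", range(1000))
estimate = estimate_rows(stats_store, "T.idx_a", 200, 499)
print(estimate)
```

The estimate stays the same no matter how the underlying data changes afterwards, until analyze_table is run again, which mirrors the behaviour described above.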

Brief introduction to Active-Active enhancements

MySQL Cluster has supported active-active asynchronous replication with conflict detection and conflict resolution since 6.3.
In prior versions, the schema had to be modified by adding a timestamp column to each table, and the application had to be modified to maintain this timestamp column.
In this new version, no schema modification is required and no application modification is needed.
In previous versions, conflict detection/resolution was performed on a row-by-row basis.
In this new version, transaction boundaries are respected.
E.g. if a row R is determined to be in conflict, not only will this row change be resolved,
but so will the entire transaction T that modified the row, and all transactions that depend on T transitively.
Longer descriptions can be found here and here
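
The transitive part is easiest to see in a small example. The Python sketch below is a simplified model, not the NDB implementation. It assumes transactions arrive in commit order, that a later transaction depends on an earlier one when both touch a common row, and that every transaction that modified the conflicting row counts as in conflict. The function name and the data layout are made up for illustration.

```python
def transactions_to_resolve(transactions, conflicting_row):
    """Return the names of transactions that must be resolved.

    transactions: list of (name, set of rows modified), in commit order.
    """
    tainted_rows = {conflicting_row}
    to_resolve = []
    for name, rows in transactions:
        if rows & tainted_rows:
            # This transaction touched a conflicting or tainted row, so
            # everything else it modified becomes tainted as well.
            to_resolve.append(name)
            tainted_rows |= rows
    return to_resolve


txns = [
    ("T1", {"a"}),
    ("T2", {"R", "b"}),   # modifies the conflicting row R
    ("T3", {"b", "c"}),   # depends on T2 through row b
    ("T4", {"d"}),        # unrelated
    ("T5", {"c"}),        # depends on T3, and so transitively on T2
]

resolved = transactions_to_resolve(txns, "R")
print(resolved)
```

With row-by-row handling only the change to R would be resolved; here T2, T3 and T5 are resolved together, while T1 and T4 are left alone.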

Sorry for omitting hex dumps and/or unformatted numbers

Mandatory late update: the join described here has now gained an extra 2x (but this improvement did not make 7.2.1)

3 comments:

wstrucke said...

Jonas,

Really glad to see your first post since May! Awesome news!

7.2 said...

The 7.2 version seems like a great upgrade from the last one. Thanks.

html5 audio player said...

Thanks for sharing your info. I really appreciate your efforts and I will be waiting for your further write ups thanks once again.