jonas blog: Star schema benchmark on MySQL Cluster 7.2

Tuesday, May 10, 2011

Star schema benchmark on MySQL Cluster 7.2

I decided to try the star schema benchmark on our latest 7.2 release (link). Star schema benchmark is an analytics oriented benchmark, and MySQL Cluster has not been developed to address this kind of workload. Nevertheless I couldn't resist trying...

Setup

2 data-nodes each running on a 4-way Xeon E7420 @ 2.13GHx (total 16 cores) 256Gb RAM
The mysqld was co-located with one of the data-nodes
I used memory tables

Results

Queries: link

Query	sf10	sf100
Q1.1	5	62
Q1.2	0.4	4.6
Q1.3	0.1	1.1
Q2.1	9	95
Q2.2	48	495
Q2.3	51	517
Q3.1	47	481
Q3.2	47	476
Q3.3	2	20
Q3.4	2	19
Q4.1	57	572
Q4.2	57	574
Q4.3	12	120

(time in sec, less time is better)

Facts

I have done no analysis to why some queries are faster than others
This type of queries is currently not in focus of our efforts to improve join performance (nor has it been so far)
for lineorder table I specified max_rows=500000000
ndb_join_pushdown=on (default in 7.2)

Thoughts and guesses

sf100 is almost exactly 10x slower than sf10. So I think a reasonable guess is that sf1000 is 10x slower than sf100
Star schema benchmark is an easy benchmark (compared to e.g TPC-H)
I think results looks quite good
My guess is that scaling out (e.g to 4 nodes) would scale well (although I didn't try) (e.g scale out by running 2 ndbmtd on each machine)
Running high write load in parallel doesnt increase query times signficantly (as tested in presentation page 45-46) which is quite cool (although I didn't try this time)

Load procedure for sf100

I manually split the 61Gb file (lineorder) into 10 6Gb files (using split(1))
Then I loaded these 10 in parallel with 10 clients using load data infile
I disabled indexes while loading using ndb_restore --disable-indexes (from an backup made on empty db containing just schema definition)
I then rebuilt the indexes using ndb_restore --rebuild-indexes using BuildIndexThreads=15 in config.ini

(for sf10 i did nothing special)

Load time for sf100

4h including rebuilding of indexes

References

Blog where percona tries SSB

Last minute update

OleJohn reports that Q2.2 and Q2.3 should be 10x faster if forcing a different join order (e.g by using straight_join or force index). But this I haven't tried

5 comments:

Matty said...: While not designed for data warehousing type workloads, these results demonstrate the possibility of using Cluster into real time analytics workloads on sub 2TB data sets.

Recommendation engines, funnel analysis, etc are all examples of web workloads that could fit

Great job !; May 10, 2011 at 8:15 AM
rpbouman said...: Hi Jonas!

thanks for sharing this!

How many replicas are you using?; May 10, 2011 at 2:48 PM
Jonas Oreland said...: 2 replicas; May 10, 2011 at 4:34 PM
NRI LIFE said...: It’s an amazing blog regarding the topic "Star schema benchmark on MySQL Cluster 7.2".
I’m here to share with you a tip to Expand your Business @ Zero Cost! Post Free Ads For B2B & B2C Needs at nrilife.com!; April 26, 2021 at 7:59 PM
Advatix Logistic said...: A really great impressive article blog. Thanks for posting. Advatix is an outstanding team of Supply Chain Experts with remarkable experience in supply chain and Logistics Design, Planning, and Management to transform businesses. Advatix Logistic is the best logistics company having years of experience it provides the best logistics services. It is the best logistics services provider.
Logistics Expert
Supply Chain Experts
ecommerce customer support; May 31, 2021 at 3:21 PM