Setup
- 2 data nodes, each running on a 4-socket Xeon E7420 @ 2.13 GHz machine (16 cores in total) with 256 GB RAM
- The mysqld was co-located with one of the data nodes
- I used in-memory tables
Results
Queries: link
Query | sf10 (s) | sf100 (s)
---|---|---
Q1.1 | 5 | 62 |
Q1.2 | 0.4 | 4.6 |
Q1.3 | 0.1 | 1.1 |
Q2.1 | 9 | 95 |
Q2.2 | 48 | 495 |
Q2.3 | 51 | 517 |
Q3.1 | 47 | 481 |
Q3.2 | 47 | 476 |
Q3.3 | 2 | 20 |
Q3.4 | 2 | 19 |
Q4.1 | 57 | 572 |
Q4.2 | 57 | 574 |
Q4.3 | 12 | 120 |
(times in seconds; lower is better)
Facts
- I have done no analysis of why some queries are faster than others
- This type of query is currently not the focus of our efforts to improve join performance (nor has it been so far)
- For the lineorder table I specified MAX_ROWS=500000000 (see the sketch after this list)
- ndb_join_pushdown=ON (the default in 7.2)
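For reference, a minimal sketch of how these two settings can be expressed; the column list is abbreviated and the exact DDL used in this run may differ:

```sql
-- Sketch only: abbreviated lineorder definition, not the full SSB DDL.
-- MAX_ROWS acts as a sizing hint so NDB creates enough partitions for ~500M rows.
CREATE TABLE lineorder (
  lo_orderkey   BIGINT NOT NULL,
  lo_linenumber INT    NOT NULL,
  -- ... remaining SSB columns omitted ...
  lo_revenue    INT    NOT NULL,
  PRIMARY KEY (lo_orderkey, lo_linenumber)
) ENGINE=NDBCLUSTER
  MAX_ROWS=500000000;

-- Join pushdown to the data nodes; ON by default in MySQL Cluster 7.2,
-- shown here only to make the setting explicit.
SET GLOBAL ndb_join_pushdown = ON;
```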
Thoughts and guesses
- sf100 is almost exactly 10x slower than sf10, so a reasonable guess is that sf1000 would be 10x slower than sf100
- The Star Schema Benchmark is an easy benchmark (compared to e.g. TPC-H)
- I think the results look quite good
- My guess is that scaling out (e.g. to 4 data nodes by running 2 ndbmtd on each machine) would scale well, although I didn't try it
- Running a high write load in parallel doesn't increase query times significantly (as tested in the presentation, pages 45-46), which is quite cool (although I didn't try it this time)
Load procedure for sf100
- I manually split the 61 GB lineorder file into 10 files of ~6 GB each (using split(1))
- I then loaded these 10 files in parallel with 10 clients using LOAD DATA INFILE (see the sketch below)
- I disabled indexes while loading using ndb_restore --disable-indexes (from a backup made of an empty database containing just the schema definition)
- I then rebuilt the indexes using ndb_restore --rebuild-indexes, with BuildIndexThreads=15 in config.ini
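As a rough sketch, this is the kind of statement each of the 10 clients ran; the file name/path and the '|' field terminator are assumptions based on the standard SSB .tbl format, not the exact command used:

```sql
-- Sketch: one of the 10 parallel load clients (file name and path assumed).
LOAD DATA INFILE '/data/ssb/lineorder.part.03'
INTO TABLE lineorder
FIELDS TERMINATED BY '|';
```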
Load time for sf100
- ~4 hours, including the index rebuild
Last minute update
- OleJohn reports that Q2.2 and Q2.3 should be 10x faster when forcing a different join order (e.g. by using STRAIGHT_JOIN or FORCE INDEX), but I haven't tried this (a sketch of the idea follows below)
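To illustrate the idea only (not the actual query text or the join order OleJohn found faster): SELECT STRAIGHT_JOIN makes the optimizer join the tables in the order they appear in the FROM clause, so a query shaped like SSB Q2.2 could be forced to start from the dimension tables. Table and column names follow the standard SSB schema; the date table name and the constants are assumptions.

```sql
-- Sketch only: STRAIGHT_JOIN forces the listed join order.
-- Constants and the particular order shown are illustrative.
SELECT STRAIGHT_JOIN SUM(lo_revenue), d_year, p_brand1
FROM part
JOIN lineorder ON lo_partkey   = p_partkey
JOIN supplier  ON lo_suppkey   = s_suppkey
JOIN dates     ON lo_orderdate = d_datekey
WHERE p_brand1 BETWEEN 'MFGR#2221' AND 'MFGR#2228'
  AND s_region = 'ASIA'
GROUP BY d_year, p_brand1
ORDER BY d_year, p_brand1;
```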