Wednesday, October 14, 2009

Dbspj preliminary numbers

So some 5 month later...
- Dbspj has an ndbapi
- Dbspj works enough for simple benchmarks!

Reminder, what is Dbspj:
- It's a new feature for Ndb
- It gives the possibility to push-down linked operations (e.g in SQL terminology: joins)
- It currently only supports left-outer-joins, and only some kinds of joins
- It is currently *not* in anyway integrated with mysqld (for accelerating SQL access)

Anyway so here is the benchmark setup
2 computers
- ndbapi running on one
- 2 datanodes running on other

On images below:
- red is new code, blue is corresponding "current" code
- Y-axis is run-time, so lower is better
- X-axis is "depth", i.e no of tables joined

Note: this is debug-compiled, so the actually absolute numbers are
not that interesting...rather the comparison...



Query 1:
depth 1: select * from T t1, T t2 where T1.pk = constant and T2.pk = T1.pk
depth 2: select * from T t1, T t2, T t3 where T1.pk = constant and T2.pk = T1.pk and T3.pk = T2.pk
etc...




Query 2:
depth 1: select * from T t1, T t2 where T2.pk = T1.pk
depth 2: select * from T t1, T t2, T t3 where T2.pk = T1.pk and T3.pk = T2.pk
etc...

3 comments:

Roland Bouman said...

Hi Jonas,

this is really exciting!

I'm just curious about a few things.

First, you mention that mysqld currently does not know about the feature, and thus does not emit NDBAPI calls to benefit from it. So I take it the SQL queries you mention are just to illustrate the kind of operation that is going on, just not literally?

Second, this query:
"depth 2: select * from T t1, T t2 where T1.pk = constant and T2.pk = T1.pk and T3.pk = T2.pk"

I guess that should have been:

depth 2: select * from T t1, T t2, T t3 where T1.pk = constant and T2.pk = T1.pk and T3.pk = T2.pk

Thirdly, I see that for the first query, that with "linked operations" the graph seems to flatten off at the end, wheras without linked operations time seems to increase linearly. What happes at higher depth?

Finally, how many rows are in the table? How do a full scan of T and a cartesian product of T,T compare to these results?

thanks in advance and kind regards,

Roland

Jonas Oreland said...

Hi Roland,

1) illustrate, yes
2) correction, yes
3) flatten, don't know. But it's debug-compiled, and there can be bugs still...
4) 50 rows in table
(this only affects query 2 though)
5) comparison, don't really understand what you're asking

Roland Bouman said...

Jonas, thanks!

By compare I mean, could you add another series for the same query, but leave out the join condition, or better yet, do the cartesian as a subquery in the from clause and then apply a where. Just to see how that measures up against a supposedly more optimized way of joining.