Thursday, April 23, 2009

distributed pushed-down join

just managed to run the first join inside the data-nodes.
the query correspondce to

SELECT t1.*, t2.*
FROM T1 t1
LEFT OUTER JOIN T1 t2 on t1.pk = t2.pk
WHERE t1.pk = avalue

- the code inside the data-nodes is general...but incomplete (e.g leaks memory, doesnt handle error correctly...)
- there is no ndbapi and no SQL,the program testing is a hard-coded c-program sending messages using the ndb-cluster wire-protocol
- the result-set is not managed (the c-program just hangs)

Summary:
- there is *many* things left to do, when this will actually hit a release is very uncertain (or if it ever will...)
- this is the coolest thing i implemented in a long long time

Details:
- the code is written so that the query and the parameters are sent separately so that we in the future could have the queries permanently stored in the data nodes.
- the query is:
SELECT t1.?1, t2.?2
FROM T1 t1
LEFT OUTER JOIN T1 t2 on t1.pk = t2.pk
WHERE t1.pk = ?3
i.e both values and projection is parameterized

- the serialized form of this request is 44bytes query and 32 byte parameters
- i could do a N-way (key-lookup) join with aribitrary tables and join-conditions using the code inside the data-node
- code *only* supports left-outer-joins and only lookups
- next step is doing scan+lookup

Hard-core details:

sh> cat mysql-cluster/ndb_1.out.log
DBSPJ: ::build()
DBSPJ: - loop 0 pos: 1
DBSPJ: getOpInfo(1)
DBSPJ: createNode - seize -> ptrI: 57344
DBSPJ: lookup_build: len=5
DBSPJ: attrCnt: 1 -> len: 0
DBSPJ: param len: 4
DBSPJ: attrCnt: 1
DBSPJ: - loop 1 pos: 6
DBSPJ: getOpInfo(1)
DBSPJ: createNode - seize -> ptrI: 57376
DBSPJ: lookup_build: len=5
DBSPJ: attrCnt: 1 -> len: 0
DBSPJ: added 57376 as child of 57344
DBSPJ: param len: 4
DBSPJ: attrCnt: 1
LQHKEYREQ to 6f70002
ClientPtr = H'0000e000 hashValue = H'87c3aa01 tcBlockRef = H'01080002
transId1 = H'00000000 transId2 = H'00000301 savePointId = H'00000000
Op: 0 Lock: 0 Flags: Simple Dirty NoDisk ScanInfo/noFiredTriggers: H'0
AttrLen: 0 (0 in this) KeyLen: 0 TableId: 4 SchemaVer: 1
FragId: 0 ReplicaNo: 0 LastReplica: 0 NextNodeId: 0
ApiRef: H'80000003 ApiOpRef: H'00000008
AttrInfo:
KEYINFO: ptr.i = 7(0xac3ee800) ptr.sz = 1(1)
H'0x00000000
ATTRINFO: ptr.i = 8(0xac3ee900) ptr.sz = 5(5)
H'0x00000000 H'0xffee0000 H'0x01080002 H'0x0000e000 H'0xfff00005
DBSPJ: execTRANSID_AI
execTRANSID_AI: ptr.i = 3(0xac3ee400) ptr.sz = 2(2)
H'0x00000004 H'0x00000000
LQHKEYREQ to 6f70002
ClientPtr = H'0000e020 hashValue = H'87c3aa01 tcBlockRef = H'01080002
transId1 = H'00000000 transId2 = H'00000301 savePointId = H'00000000
Op: 0 Lock: 0 Flags: Simple Dirty ScanInfo/noFiredTriggers: H'0
AttrLen: 0 (0 in this) KeyLen: 0 TableId: 4 SchemaVer: 1
FragId: 0 ReplicaNo: 0 LastReplica: 0 NextNodeId: 0
ApiRef: H'80000003 ApiOpRef: H'00000008
AttrInfo:
KEYINFO: ptr.i = 8(0xac3ee900) ptr.sz = 1(1)
H'0x00000000
ATTRINFO: ptr.i = 9(0xac3eea00) ptr.sz = 1(1)
H'0xfff00005

sh> cat api-signal-log.txt
---- Send ----- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, gsn: 12 "TCKEYREQ" prio: 1
s.bn: 32768 "API", s.proc: 3, s.sigId: 0 length: 8 trace: 1 #sec: 2 fragInf: 0
apiConnectPtr: H'00000020, apiOperationPtr: H'00000008
Operation: Read, Flags: Dirty Start Execute NoDisk IgnoreError Simple spj
keyLen: 0, attrLen: 0, AI in this: 0, tableId: 4, tableSchemaVer: 1, API Ver: 5
transId(1, 2): (H'00000000, H'00000300)
-- Variable Data --
SECTION 0 type=generic size=1
H'00000000
SECTION 1 type=generic size=19
H'000b0002 H'00050001 H'00000000 H'00000004 H'00000001 H'00000001 H'00050001
H'00000001 H'00000004 H'00000001 H'00000001 H'00040001 H'00000000 H'00000008
H'fff00005 H'00040001 H'00000000 H'00000008 H'fff00005
---- Received - Signal ----------------
r.bn: 2047 "API", r.proc: 3, r.sigId: -1 gsn: 10 "TCKEYCONF" prio: 1
s.bn: 245 "DBTC", s.proc: 2, s.sigId: 241503 length: 9 trace: 1 #sec: 0 fragInf:
0
H'80000005 H'00000000 H'00000000 H'00000001 H'00000000 H'00000300 H'00000008
H'80000002 H'00000000
---- Received - Signal ----------------
r.bn: 2047 "API", r.proc: 3, r.sigId: -1 gsn: 5 "TRANSID_AI" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 241503 length: 22 trace: 1 #sec: 0 fragIn
f: 0
H'80000007 H'00000008 H'00000000 H'00000301 H'fff30004 H'0000001f H'00000000
H'00000005 H'ef76733c H'deece672 H'ffffffff H'80000007 H'00000008 H'00000000
H'00000301 H'fff30004 H'0000001f H'00000000 H'00000005 H'ef76733c H'deece672
H'ffffffff

3 comments:

Antony said...

Is the code visible anywhere? Stuff like this would be good for FederatedX.

Frazer Clement said...

Cool - looks like you should go on trips more often.

How does the serialized query + parameter size compare to the alternative 2*TCKEYREQ size?

Jonas Oreland said...

antony: as mentioned in the blog,
this poc does not have SQL (in any form), it does in fact not even have a ndbapi. So there is no visible code yet...since it really only applied to ndb.

frazer: good point!
2 * tckeyreq = 120 bytes
spj (including tree) = 132 bytes
spj (wo/ tree) = 84 bytes