Wednesday, May 6, 2009

ArenaAllocator complete (Dbspj)

and i created a new DataBuffer2, which has pool-type as template argument.
it was "impossible" to alter DataBuffer to do was simply to hard-wired to ArrayPool. too bad.


so now everything in Dbspj can/will be using the ArenaAllocator...
rationale for ArenaAllocator is really not to have a "quick" release (the code still releases each object reset magic values) but that it's a kind of variable length allocation strategy that provide great locality of data...


also pushed infrastructure for "result-set correlation".
Dbspj will (at least for now) when joining only return rows once so in the case of a 2-way join, a SQL result set will return the rows from the first table one time *for each* match in second table.
Dbspj will not...(but ha_ndbcluster will have to do this...when presenting things for mysqld) and since it will not, there has to be a way to correlate rows from the 2 tables.
this we call "result-set correlation".

Saturday, May 2, 2009

memory allocation in ndb(mt)d

inside the data nodes a number of different techniques for allocating memory is used.
there are 2 main variants
- page allocations
- object allocations (e.i slab allocator)

and for the object allocations there are a number of different variants
- RWPool, "ordinary" slab, only objects of same type on 1 page, free-list per page and object
- WOPool, allocations which are short-lived, no free-list just ref-count
- ArrayPool, fixed array of packed object

and I'm just now introducing a ArenaAllocator (which will be used for spj)


- RWPool, WOPool and ArenaAllocator/Pool are "new", meaning that they are build from start with support for more dynamic memory handling in mind.
- ArrayPool is very fixed, and exists both in the structured form, and various hard-coded variants.


one thing in common for all the "new" allocators is that they use magic numbers per object, so that every time a pointer (once per signal) that validity of the object is checked, and a trap is generated on a invalid memory access.

another thing in common is that they are all used by 1 thread at a time, and only pages are allocated/freed to a global free-list.


- to support variable length data, linked list of fixed size objects are used (DataBuffer)
- for input data (from e.g sockets) a special DataBuffer with additional thread-safety code is used so that data can be passed between threads wo/ copying.


but there is no malloc/free new/delete :-)

Friday, May 1, 2009

distributed pushed-down join - part II

now also support table/index scan as root node, i.e tree = [ scan ] ? [ lookup ] *
still with the limitations:
- a child may only be dependent on immediate parent
- still only left outer join.

i also completed the parameterized scan-filter inside tup but does not yet support this in spj or ndbapi.

currently working on
- the parameterized scan-filer in spj (and a tiny bit in the ndbapi).
- result set correlation

currently thinking on
- arena based allocator for spj-requests, (compare mysqld mem-root concept)

discussed with frazer what spj "really is", concluded that it probably is some kind of "data-flow-engine"...should probably find some hip acronym/term for it...

current mental capacity is limiting spj to only one scan in a tree.
this limitation can of course be lifted...but not now, or my brain will fry.


wonder how i should make people giving me votes on planetmysql??