Sunday, June 8, 2008

boom-tjackalack! table-reorg is pushed

so...now table-reorg is in 6.4.
pushbuild found a few problems...that are fixed.

what is left:
1) detailed test-prg (which will check consistency after each step, by pausing schema-trans)
2) handling of cluster-crash during reorg
the only way out right now is to restore a backup if the cluster crashes during reorg
3) node failure during reorg might cause SUMA to not scan some fragments
(this bug is an old one, existing in 4.1, that also affects unique index build)
4) reorg-abort (in a certain state) leaves the REORG_MOVED bit set on records,
which causes subsequent reorgs (to a different partitioning) to create inconsistent data.

Not too bad...
I do however think it's quite testable (although maybe not extremely interesting w/o add node)

Will start on add-node...and fix problems above in parallel

Thursday, June 5, 2008

almost push-time

I've now:
- fixed error handling (although testing is still not 100%)
- pushed the grand unified table state patch
- pushed a few patches in the series...

No one commented asking for a snapshot,
so I decided to push into 6.4 instead.

Will just spend some more time testing/cleaning up...

response to comment with questions

1) Which operations can I perform during a table reorg?
everything except DDL and node restart
ndb currently allows only one DDL at a time, and the reorg is a DDL
ndb currently prevents node restart while a DDL is ongoing

2) What happens to an ongoing table reorg during
2a) node failure
reorg will be completed or aborted depending on how far it has progressed
(i.e. whether commit has been started)
2b) cluster failure, and recovery?
reorg will be completed or aborted depending on how far it has progressed
(i.e. whether commit has been written)

The reorg is committed after rows have been copied, but before rows have been
deleted/cleaned up
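
A tiny sketch of that rule, in Python for brevity. The phase names and the `recover` function are made up for illustration and are not taken from the ndb source; the only thing it encodes is the ordering described above (copy, then commit, then cleanup) and the "before or after commit" decision.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Hypothetical phase names; the ordering follows the description above:
    # copy rows -> commit -> delete/clean up the moved rows.
    COPY = 1
    COMMIT = 2
    CLEANUP = 3


def recover(last_phase_reached: Phase) -> str:
    """Decide what happens to a reorg that was interrupted by a failure."""
    if last_phase_reached >= Phase.COMMIT:
        # The commit point was reached: finish the remaining cleanup.
        return "complete"
    # The commit point was not reached: undo the copy.
    return "abort"
```

So a failure while rows are still being copied gives "abort", and a failure during cleanup gives "complete".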

3) How do my a) SQL b) NDBAPI applications have to be changed to cope with table reorg?

Not at all, but
- your application can "hint" incorrectly if it does not check table state
and refresh it after reorg has been committed
- your application might encounter temporary errors due to the reorg;
this is the same error that you can get during a node restart, so no special
handling is needed.
And hopefully the temporary errors should be rare (testing will show...)
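
For reference, the kind of retry handling meant here could look roughly like this. It is a sketch in Python, not NDBAPI code: `TemporaryError` and `run_with_retry` are hypothetical names standing in for whatever your client library reports as a temporary error.

```python
import time


class TemporaryError(Exception):
    """Hypothetical stand-in for a temporary error reported by the cluster."""


def run_with_retry(txn, max_attempts=5, delay=0.05):
    """Run txn(), retrying when it fails with a temporary error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except TemporaryError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            time.sleep(delay * attempt)  # simple linear backoff
```

An application that already retries like this for node restarts should not need anything extra for reorg.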

4) How can I trade off the duration of a reorganisation against its resource impact (CPU, Memory, Bandwidth etc.)

Currently you can't. Speed is hard-coded. This may become a future feature

5) What performance impact does re-org have on ongoing DML and query operations?

Don't know yet, not enough testing. Debug-compiled versions that I tested gave maybe 5-10% impact. (there is also another optimization that I want to do...which will reduce the impact)

6) What impact does re-org have on DDL operations?
On ongoing DDL: none, because we only support one at a time.
And the re-org will prevent other DDL from starting while it's running

7) Will there be some easy way to re-org all cluster tables to balance across all available nodes?

write a stored procedure that lists all tables, and reorgs them one by one.
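
The same idea as a client-side script instead of a stored procedure, sketched in Python. Assumptions: the statement text `ALTER ONLINE TABLE ... REORGANIZE PARTITION` is my guess at the SQL syntax for triggering a reorg (check it against the version you run), and the helper names are made up. The sketch only builds the statements; running them one at a time over a connection is left out.

```python
# Query that lists the cluster tables (run it with your own client).
LIST_NDB_TABLES = (
    "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES "
    "WHERE ENGINE = 'ndbcluster'"
)


def quote(ident):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + ident.replace("`", "``") + "`"


def reorg_statements(tables):
    """Build one reorg statement per (schema, table) pair, in input order.

    The ALTER syntax is an assumption, not taken from the post.
    """
    return [
        f"ALTER ONLINE TABLE {quote(schema)}.{quote(table)} REORGANIZE PARTITION"
        for schema, table in tables
    ]
```

Since only one DDL can run at a time, the statements have to be executed sequentially anyway.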

8) How are indexes modified during table re-org?
ordered indexes are reorganised together with base table
unique indexes are currently untouched (this should probably change)

9) Which parts of the re-org are serial, and which are parallel?
Same as all other schema-transactions after wl3600.
I.e. each operation-step is run in parallel on each node,
but only one operation-step is run at a time.

This means that e.g. copy and "cleanup" are each run in parallel on all nodes.
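
A toy model of that execution order, in Python with threads standing in for nodes. This is only an illustration of "parallel across nodes, serial across steps"; the function names are hypothetical and nothing here mirrors the actual schema-transaction code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(step, node):
    """Placeholder for the work one node does in one operation-step."""
    return (step, node)


def run_schema_trans(steps, nodes):
    """Run each step on all nodes in parallel, one step at a time."""
    log = []
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        for step in steps:
            # list() waits until every node has finished this step,
            # so the next step cannot start early (acts as a barrier).
            results = list(pool.map(run_step, [step] * len(nodes), nodes))
            log.append(results)
    return log
```

Calling `run_schema_trans(["copy", "cleanup"], [1, 2])` runs "copy" on both nodes, waits, and only then runs "cleanup" on both nodes.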

10) Can I perform an online upgrade to a version of MySQL Cluster that supports re-org?

yes

11) Can I restore a backup from an old version of MySQL Cluster and get online re-org features?

yes

12) What are the down sides of this table re-org implementation?

none :-)
but there are some areas for improvement

13) Can re-org cope with heterogeneous NDBD nodes with different DataMemory capacities?

In the kernel, yes, but there is no SQL interface currently to expose this

14) How can I look at hash result to fragment id mapping tables?

Using a hand-written ndbapi program
(maybe will add this to ndb_desc)

---

Puh...
that comment held so many questions...
that I maybe should not be asking for more comments...