Tuesday, September 2, 2008

end of think-period

today I think I finally cracked how to create (drop) a nodegroup.
the basic concept is to:
- temporarily block gcp
- create (drop) the nodegroup
- unblock gcp

(btw, the same concept is used for adding a starting node to gcp)
the block should only last for microseconds.
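A toy, single-process model of the "block gcp, change, unblock gcp" idea. Everything here (`ToyGcp`, `start_gcp`, `blocked`, `create_nodegroup`) is hypothetical and is not NDB/DIH code. It only shows why the trick helps: because starting a GCP and changing the nodegroup set take the same lock, every epoch sees either the old configuration or the new one, never a half-applied change.

```python
import threading
from contextlib import contextmanager


class ToyGcp:
    """Hypothetical single-process model of global checkpoints (GCPs).

    Not NDB code: it only illustrates "block gcp, change config, unblock gcp".
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.epoch = 0
        self.nodegroups = {0}
        self.history = []  # (epoch, nodegroups visible when that epoch started)

    def start_gcp(self):
        # A new GCP (epoch) cannot start while a blocker holds the lock.
        with self._lock:
            self.epoch += 1
            self.history.append((self.epoch, frozenset(self.nodegroups)))

    @contextmanager
    def blocked(self):
        # Block GCP for the (short) duration of the schema change.
        self._lock.acquire()
        try:
            yield
        finally:
            self._lock.release()  # always unblock, even if the change fails


def create_nodegroup(gcp, ng):
    # The change happens strictly between two epoch starts, never during one.
    with gcp.blocked():
        gcp.nodegroups.add(ng)


gcp = ToyGcp()
gcp.start_gcp()
create_nodegroup(gcp, 1)
gcp.start_gcp()
print(gcp.history)  # [(1, frozenset({0})), (2, frozenset({0, 1}))]
```

The `try/finally` in `blocked()` reflects the one hard requirement of the pattern: whatever happens during the change, gcp must be unblocked again.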

now it's just a matter of implementing it...


very happy that I now know how to proceed.
I've spent quite a lot of time trying to figure out a 100% safe
way of doing it without blocking gcp...
but this solution will be efficient and fairly easy to implement
(if any protocol dealing with (multi-)node failures can be considered easy).


Frazer Clement said...

Sounds interesting, but since you mention it, can you clarify:
- Why does blocking GCP help make the add/drop atomic, able to cope with failures, or otherwise simplify the problem?
- How is GCP blocked/unblocked? (This presumably bounds the minimum block duration and the consequent effects of blocking GCP.)
- How do API nodes become aware of the new configuration?

Jonas Oreland said...


1) The "biggest" problem with adding a node group is making sure that an event listener learns, in a safe way, that the nodegroup has been added.

The message that the nodegroup has been added will be piggy-backed on the "GCP complete" message.

And it needs to arrive on the *same* gcp message from all the nodes
(i.e. the gcp message with the same epoch).
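To make the "same epoch from all nodes" requirement concrete, here is a sketch of the check an event listener could do. The message shape (`GcpComplete` with an `added_nodegroups` field) and the function `nodegroup_add_epoch` are hypothetical, not the real NDB event API; the point is only that a notice arriving on different epochs from different nodes would be ambiguous, which is exactly what the protocol has to rule out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GcpComplete:
    """Hypothetical "GCP complete" message as seen by an event listener."""
    node_id: int
    epoch: int
    added_nodegroups: frozenset = frozenset()  # piggy-backed notice


def nodegroup_add_epoch(messages, ng):
    """Return the one epoch on which every node announced nodegroup `ng`.

    Raises ValueError if some node is missing the notice or if nodes
    announced it on different epochs (the situation the protocol must prevent).
    """
    nodes = {m.node_id for m in messages}
    epochs_by_node = {}
    for m in messages:
        if ng in m.added_nodegroups:
            epochs_by_node.setdefault(m.node_id, set()).add(m.epoch)
    if set(epochs_by_node) != nodes:
        raise ValueError("not every node announced the nodegroup")
    epochs = set().union(*epochs_by_node.values())
    if len(epochs) != 1:
        raise ValueError(f"nodegroup {ng} announced on epochs {sorted(epochs)}")
    return epochs.pop()


msgs = [
    GcpComplete(node_id=1, epoch=10),
    GcpComplete(node_id=2, epoch=10),
    GcpComplete(node_id=1, epoch=11, added_nodegroups=frozenset({1})),
    GcpComplete(node_id=2, epoch=11, added_nodegroups=frozenset({1})),
]
print(nodegroup_add_epoch(msgs, 1))  # 11
```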

2) GCP is blocked/unblocked in DIH
as part of the schema-transaction to add a node group.

In the normal case, the start of a GCP will be delayed by ~3 distributed messages:

- block gcp on all nodes
- tell suma which epoch to start from (on all nodes)
- unblock gcp on all nodes
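The three steps above can be sketched as three rounds sent by a coordinator to every node. This is a toy model under stated assumptions: `ToyNode`, `add_nodegroup_round_trips` and the rule "start from one past the highest epoch any node has started" are hypothetical illustrations, not the actual DIH/SUMA signals, and the sketch has none of the node-failure or master-takeover handling that the real protocol needs.

```python
class ToyNode:
    """Hypothetical data node: a gcp-blocked flag plus a SUMA start epoch."""

    def __init__(self, node_id, current_epoch):
        self.node_id = node_id
        self.current_epoch = current_epoch  # highest epoch this node has started
        self.gcp_blocked = False
        self.suma_start_epoch = None

    def block_gcp(self):
        self.gcp_blocked = True
        return self.current_epoch

    def set_suma_start(self, epoch):
        if not self.gcp_blocked:
            raise RuntimeError("start epoch must be set while gcp is blocked")
        self.suma_start_epoch = epoch

    def unblock_gcp(self):
        self.gcp_blocked = False


def add_nodegroup_round_trips(nodes):
    """Three message rounds, as a coordinator (e.g. a DIH master) might send them."""
    # Round 1: block gcp on all nodes; each replies with its current epoch.
    replies = [n.block_gcp() for n in nodes]
    # Assumption: start from the first epoch that no node has started yet.
    start = max(replies) + 1
    try:
        # Round 2: tell suma on all nodes which epoch to start from.
        for n in nodes:
            n.set_suma_start(start)
    finally:
        # Round 3: unblock gcp on all nodes.
        for n in nodes:
            n.unblock_gcp()
    return start


nodes = [ToyNode(1, 41), ToyNode(2, 42)]
print(add_nodegroup_round_trips(nodes))  # 43
```

Because no node can start a new epoch between rounds 1 and 3, every node agrees on the start epoch before any epoch at or after it begins, which is what makes the piggy-backed notice land on the same epoch everywhere.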

What makes this approach easy is that blocking GCP can quite easily be made safe against (multi-)node failures (even in the presence of master failures).

3) API nodes need to be recycled to connect to new nodes (i.e. that is really more related to "add node" than to "add nodegroup").

A started API that is connected to newly added nodes will then be informed about the new event "producers" by a special message piggy-backed on the normal gcp-message, as part of the event stream.
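On the API side, the piggy-backed message can be pictured as a per-epoch update to the set of nodes the API expects event data from. This is a hypothetical sketch (`EpochMessage`, `new_producers` and `ToyEventConsumer` are invented for illustration, not the NDB API), and it assumes the announced producers take part starting with the epoch the notice arrives on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochMessage:
    """Hypothetical per-epoch gcp-message in an API node's event stream."""
    epoch: int
    new_producers: tuple = ()  # node ids piggy-backed on this message


class ToyEventConsumer:
    def __init__(self, producers):
        self.producers = set(producers)
        self.log = []  # (epoch, producers expected for that epoch)

    def on_epoch(self, msg):
        # Assumption: announced producers take part from this epoch onward.
        self.producers.update(msg.new_producers)
        self.log.append((msg.epoch, sorted(self.producers)))


c = ToyEventConsumer([1, 2])
for m in [EpochMessage(10), EpochMessage(11, (3, 4)), EpochMessage(12)]:
    c.on_epoch(m)
print(c.log)  # [(10, [1, 2]), (11, [1, 2, 3, 4]), (12, [1, 2, 3, 4])]
```

Since the change rides on the event stream itself, the API never has to guess at which point in the stream the new producers start.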
