Isis2 Runtime Parameters
Ken Birman
Cornell University

Parameters
- Many features of Isis2 depend on parameters that you can modify to “shape” the behavior of the platform.
- They give you very fine control over the behavior of Isis2.
- There are three main categories of parameters:
  1. Those that determine how the system will start up
  2. Those that determine how it sends messages
  3. Those that control limits, timeouts and other bounds

Startup Parameters
What happens when you call IsisSystem.Start()?

How IsisSystem.Start() works
1. The library initializes itself and determines the IP address of “local host.” If the host has several IP addresses, it picks the last of the IPv4 addresses.
2. The system scans the “environment” variables to read the values of the parameters. These override the default values compiled into Isis2.
   - In Linux/bash, use “export” to set them, either in .bashrc or in a shell script. Or call setenv(3).
   - In Windows, use the “set” command, or call Environment.SetEnvironmentVariable("something", somevalue);

How IsisSystem.Start() works (continued)
3. Next, the system decides which network interfaces it should use: all of them, unless you tell it otherwise by setting ISIS_NETWORK_INTERFACES.
   - Set this if you expect to run on machines that have a “production” network and a “management” network.
   - Otherwise leave ISIS_NETWORK_INTERFACES alone.
4. Having done this, it attempts to contact the ORACLE.
   - If the ORACLE isn’t found, it restarts the ORACLE.
   - Otherwise, it asks the ORACLE to let it join the ISISMEMBERS system group.
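
Because the parameters are read from the environment during IsisSystem.Start(), a program can also set them for its own process just before that call. The following is a minimal sketch using only the standard .NET Environment class. The class and method names are hypothetical, the values are examples taken from elsewhere in these slides, and the exact value syntax Isis2 accepts for each parameter should be checked against the library itself.

```csharp
using System;

// Hypothetical helper, not part of Isis2
public static class IsisEnvironment
{
    // Sets parameters for the current process only. Call this BEFORE
    // IsisSystem.Start(), since Start() is what scans the environment.
    public static void Configure()
    {
        // Example values only; the variable names come from the slides
        Environment.SetEnvironmentVariable("ISIS_MUTE", "true");
        Environment.SetEnvironmentVariable("ISIS_UNICAST_ONLY", "true");
        Environment.SetEnvironmentVariable("ISIS_HOSTS", "192.167.54.133,192.167.54.134");
    }

    // Reads a parameter back, or returns null when it is unset
    public static string Read(string name) => Environment.GetEnvironmentVariable(name);
}
```

A program would call `IsisEnvironment.Configure()` first and `IsisSystem.Start()` afterwards. Variables set this way are visible only to the calling process and to any children it launches.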

Logging
- Normally, upon restart, Isis2 creates a log file for messages printed by the library.
- You can inhibit this by setting ISIS_MUTE=true.
- When calling IsisSystem.Start(), you can also direct that messages be echoed to the Debug stream rather than the Console.
- If you allow logging and want to write to the log, call IsisSystem.Write() or IsisSystem.WriteLine(). Output goes to the log plus the Console or the Debug stream.

Fast start: But there can only be one…
- For extreme speed, you can tell Isis2 not to hunt for the ORACLE (by specifying an argument to IsisSystem.Start). It will restart instantly.
- But if you launch two instances this way, they won’t communicate with one another.
- So… do this only in the first instance that you launch.

Overwhelming the Membership Oracle
- If processes start one by one, no issue…
- But what if you try to start 50 at once, or 500?
[Figure: starting processes send “Hello?” to the Oracle, and the Oracle answers “Welcome!”]

Master/Worker
- If a system will be big, launching hundreds of members can overload the ORACLE.
- Better performance: add many all at the same time.
- In this case use the Master/Worker pattern:
  - The Master starts first and collects a list of the workers.
  - Workers start after the Master and register with it.
  - Then the Master can add a batch of workers to the system, and to any groups that are desired.

Master: Accumulates workers, tells them what to do

    // GOAL and myGroup are assumed to be declared elsewhere in the application
    static void beMaster(string[] args)
    {
        IsisSystem.Start();
        Semaphore waitForWorkers = new Semaphore(0, 1);
        bool fullyStaffed = false;
        List<Address> myWorkers = new List<Address>();
        IsisSystem.RegisterAsMaster((NewWorker)delegate(Address worker)
        {
            lock (myWorkers)
                if (fullyStaffed)
                    IsisSystem.RejectWorker(worker);
                else
                {
                    myWorkers.Add(worker);
                    if (myWorkers.Count() == GOAL)
                    {
                        fullyStaffed = true;
                        waitForWorkers.Release(1);
                    }
                }
        });
        waitForWorkers.WaitOne();
        IsisSystem.BatchStart(myWorkers);
        // This delays until they have all finished their batch start
        IsisSystem.WaitForWorkerSetup(myWorkers);
        Group.MultiJoin(myWorkers, new Group[] { myGroup });
        // In front of this next line do whatever you want this application to do
        IsisSystem.WaitForever();
        // If the master shuts down, its workers will too
        IsisSystem.Shutdown();
    }

Notes on the code:
- The RegisterAsMaster callback accumulates workers.
- The main thread waits until enough workers have connected, then starts them all at once…
- … then adds them all to the groups we may want to use.

RunAsWorker: Let Master run the show

    // myGroups is assumed to be declared elsewhere in the application
    static void beWorker(string[] args)
    {
        // This next line assumes that argument 0 is the master's Address.
        // You can also use new Address(mastersHost, 0) if you know the host IP
        // address of the master but don't know the master's pid.
        IsisSystem.RunAsWorker(args[0]);

        // This line blocks until the master issues the BatchStart() call.
        // Notice that in this one special case we call it AFTER RunAsWorker!
        IsisSystem.Start();

        // Before calling this next line do whatever setup this worker must do:
        // create your group handles and register callbacks, but don't call Join.
        // For example, you might call g = new Group("something"), then call
        // g.ViewHandlers += myViewHandler; etc: anything needed to have the
        // group ready for a Join. But you call WorkerSetupDone() INSTEAD of g.Join().
        IsisSystem.WorkerSetupDone();

        // Now, for each group the Master created using a MultiJoin, you wait
        // for its first view to be reported. This is one way to do that:
        foreach (Group g in myGroups)
            while (!g.HasFirstView)
                Thread.Sleep(250);

        // WaitForever would freeze the main thread, but if the worker has joined
        // groups (or gets added to groups by the master using MultiJoin()), the
        // worker could be quite active: receiving messages, sending them, etc.
        IsisSystem.WaitForever();

        // If the master shuts down, the worker will throw an
        // IsisException("master termination").
        // If this next line actually executes, this particular worker will exit.
        // In effect, this worker is a normal Isis application by now, except that
        // if the master terminates, it does too. In particular, it can
        // deliberately choose to leave the system if it wishes to do so.
        IsisSystem.Shutdown();
    }

Master/Worker Timeline
[Figure: timeline with three columns: Worker, Master and Oracle]
Master:
1. IsisSystem.Start();
2. Accumulate workers until the goal is reached.
3. IsisSystem.BatchStart(myWorkers);
4. IsisSystem.WaitForWorkerSetup(myWorkers); which returns once setup is done for all workers.
5. Group.MultiJoin(myWorkers, new Group[] { myGroup }); Before this call the master creates the group itself: Group myGroup = new Group(“myGroup”); attach handlers for myGroup, then myGroup.Join();
6. IsisSystem.WaitForever();
Worker:
1. IsisSystem.RunAsWorker(mAddress);
2. IsisSystem.Start();
3. Group g = new Group(“myGroup”); attach handlers for g, but don’t call Join.
4. IsisSystem.WorkerSetupDone();
5. foreach (Group g in myGroups) while (!g.HasFirstView) Thread.Sleep(250); This ends when the new view arrives.
6. IsisSystem.WaitForever();

Why does this help?
- Workers only send one message to the Master.
  - Hence it experiences less load.
- The Master adds them all at once, first to the system, then to whatever groups the application will use.
  - Hence only one group view needs to be sent, and it can be sent efficiently, using a broadcast.
- Overall load is much reduced.

Messaging Parameters
How to control what internet protocols Isis2 uses

IP multicast / ISIS_UNICAST_ONLY
- Isis2 will broadcast to find the ORACLE unless you tell it not to do so.
  - Default: OK to use IP multicast, UDP, broadcast.
  - ISIS_UNICAST_ONLY: don’t use IP multicast. Still requires UDP (the older ISIS_TCP_ONLY feature was eliminated starting in Isis v2.1).
- You must list the machines on which the Isis2 ORACLE will run if you put the system in ISIS_UNICAST_ONLY mode: ISIS_HOSTS=“…”

Normal versus UNICAST_ONLY
- With normal IP multicast, packets are still sent directly.
- With ISIS_UNICAST_ONLY, packets travel on a tree of point-to-point links and must be forwarded, perhaps log2(N) times.
[Figure: IP multicast compared with a unicast tree with power-of-2 “reach”]
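
To get a feel for the “perhaps log2(N) times” estimate, the sketch below counts how many doubling steps are needed before N members have been reached. It is only a model of the arithmetic, with hypothetical names; it does not describe how Isis2 actually builds or walks its tree.

```csharp
using System;

// Hypothetical model of the log2(N) estimate, not Isis2 code
public static class UnicastTree
{
    // Returns ceil(log2(members)): the number of doubling steps
    // needed before `members` processes have been reached.
    public static int EstimatedForwardingSteps(int members)
    {
        if (members < 1) throw new ArgumentOutOfRangeException(nameof(members));
        int steps = 0;
        // Each step doubles the number of members reached so far
        for (long reached = 1; reached < members; reached *= 2)
            steps++;
        return steps;
    }
}
```

For example, `UnicastTree.EstimatedForwardingSteps(500)` returns 9, since 2^9 = 512 is the first power of two that covers 500 members.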

ISIS_HOSTS
- The idea is to list the places where the ORACLE can run:
  ISIS_HOSTS=c1.cs.cornell.edu,c2.cs.cornell.edu … or
  ISIS_HOSTS=192.167.54.133,192.167.54.134
- Processes running on other machines can join the system but can’t restart it from scratch.

ISIS_HOSTS: numerical is best!
- We have seen bugs in the Linux DNS when accessed from Mono. Sometimes it hangs.
- To avoid this, use fully numerical IP addresses when you set the values in ISIS_HOSTS.
  - Use the IPv4 addresses of the machines on which you want the ORACLE to run. In this case DNS never hangs.
  - The “ping” and “traceroute” commands are examples of ways you can look these up.
- On Windows, string names are fine. On Linux, they work, but don’t put the DNS under heavy load.
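
One way to follow the “numerical is best” advice is to check an ISIS_HOSTS value before using it. The sketch below is a hypothetical helper, not part of Isis2, built on the standard .NET IPAddress.TryParse call. It assumes the comma-separated format shown in the ISIS_HOSTS examples.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of Isis2
public static class IsisHosts
{
    // True when every comma-separated entry is a dotted-quad IPv4 address
    public static bool IsFullyNumeric(string hosts)
    {
        if (string.IsNullOrWhiteSpace(hosts)) return false;
        return hosts.Split(',').All(entry =>
        {
            string h = entry.Trim();
            // Require four dot-separated parts, because IPAddress.TryParse
            // on its own also accepts shorthand forms such as "10.1"
            return h.Split('.').Length == 4
                && IPAddress.TryParse(h, out IPAddress addr)
                && addr.AddressFamily == AddressFamily.InterNetwork;
        });
    }
}
```

With this check, `IsisHosts.IsFullyNumeric("192.167.54.133,192.167.54.134")` is true, while a list of DNS names such as `"c1.cs.cornell.edu,c2.cs.cornell.edu"` is rejected.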

ISIS_PORTp
- The system uses two standard IP ports:
  - ISIS_PORTp: for p2p messages
  - ISIS_PORTa: set to ISIS_PORTp+1, for acks/nacks
- These ports should not be blocked by your firewall.
  - On Linux, also check iptables, which is like a firewall.
- If two instances of Isis2 use non-overlapping port ranges, they will not notice one another.
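
The relationship between the two ports makes it easy to check whether two Isis2 instances would be isolated from each other. The sketch below is a hypothetical helper that assumes each instance's “port range” is exactly the pair {ISIS_PORTp, ISIS_PORTp+1}; the port numbers in the usage note are made up for illustration.

```csharp
using System;

// Hypothetical helper, not part of Isis2
public static class IsisPorts
{
    // Per the slides, ISIS_PORTa is ISIS_PORTp + 1
    public static int AckPortFor(int portP)
    {
        if (portP < 1 || portP >= 65535)
            throw new ArgumentOutOfRangeException(nameof(portP));
        return portP + 1;
    }

    // The pairs {p1, p1+1} and {p2, p2+1} share a port
    // exactly when p1 and p2 differ by at most 1
    public static bool RangesOverlap(int portP1, int portP2)
    {
        return Math.Abs(portP1 - portP2) <= 1;
    }
}
```

Under that assumption, instances started with ISIS_PORTp values of 9753 and 9755 do not overlap, while 9753 and 9754 do, because the first instance's ack port is the second instance's p2p port.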

ISIS_MAXIPMCADDRS
- When permitted to use IP multicast, Isis2 tries not to overuse that feature:
  - ISIS_MCRANGE_LOW: low end of the IPMC address range Isis2 should use. By default, CLASSD+5000, where CLASSD is 224.0.0.0, the start of the class D (IP multicast) address range.
  - ISIS_MCRANGE_HIGH: high end of the IPMC range.
  - ISIS_MAXIPMCADDRS: limit on how many multicast addresses Isis2 can use, system-wide. It is perfectly reasonable to set this to a small number, like 5 or 10. The system should work if ISIS_MAXIPMCADDRS ≥ 2.
- If ISIS_UNICAST_ONLY is true, then no IPMC addresses are used at all.
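
To see which address “CLASSD+5000” denotes, the offset can be added to the base address as a 32-bit number. The sketch below assumes that CLASSD means 224.0.0.0, the first address of the class D multicast range, and that the offset is plain integer addition; the class and method names are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of Isis2
public static class McastRange
{
    // 224.0.0.0 as a 32-bit value (assumed meaning of CLASSD)
    public const uint ClassD = 224u << 24;

    // Returns the address at ClassD + offset in dotted-quad form
    public static IPAddress AddressAt(uint offset)
    {
        uint value = ClassD + offset;
        byte[] octets =
        {
            (byte)((value >> 24) & 0xFF),
            (byte)((value >> 16) & 0xFF),
            (byte)((value >> 8) & 0xFF),
            (byte)(value & 0xFF)
        };
        return new IPAddress(octets);
    }
}
```

Under these assumptions `McastRange.AddressAt(5000)` is 224.0.19.136, because 5000 = 19 × 256 + 136.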

ISIS_TTL
- Broadcast and multicast messages are automatically relayed by routers.
  - Each “hop” causes the “time to live” (TTL) field in the message to be decremented.
  - If the TTL reaches zero, the router drops the packet.
- Isis2 initializes the TTL value using ISIS_TTL.
- You can set this to 0 or 1 to confine the system to a single segment of your network.
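
The effect of the TTL can be modeled in a few lines. The sketch below is a simplified, hypothetical model of the rule described above (each router decrements the TTL and drops the packet when it reaches zero); it is not Isis2 code and does not send any packets.

```csharp
using System;

// Hypothetical model of TTL handling, not Isis2 code
public static class TtlModel
{
    // True if a packet sent with initialTtl is still alive after
    // being relayed by routersOnPath routers.
    public static bool Survives(int initialTtl, int routersOnPath)
    {
        if (initialTtl < 0 || routersOnPath < 0)
            throw new ArgumentOutOfRangeException();
        int ttl = initialTtl;
        for (int hop = 0; hop < routersOnPath; hop++)
        {
            ttl--;                       // each router decrements the TTL
            if (ttl <= 0) return false;  // and drops the packet at zero
        }
        return true;
    }
}
```

In this model a TTL of 1 reaches hosts on the sender's own segment (`Survives(1, 0)` is true) but does not get past the first router (`Survives(1, 1)` is false), which is why small values keep the system on one segment.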

ISIS_MAXMSGLEN
- Automatically adjusted, but you can provide a recommended value if you wish.
  - Isis2 will override the value in some situations.
  - Normally not something you would need to modify.
- If a message is too large, Isis2 will automatically fragment it and reassemble it prior to delivery.

Other limits and timeouts
These are less often changed

ISIS_DEFAULTTIMEOUT
- Normally 45 secs. OK to reduce if you wish.
  - Failure detection needs twice this long, hence 90s.
  - This applies if you kill a process “suddenly” (e.g. ^C) or if the machine on which it was running crashes.
  - 45s is very slow, but on cloud computing systems long delays happen more often than you would expect!
- On lightly loaded clusters, you can set ISIS_DEFAULTTIMEOUT much lower, but not less than 2s.
- If you design a failure sensing solution of your own, call Isis.ProcessFailed(who) to tell us if a process crashes.
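
The timing rule above is simple arithmetic: detection takes twice the timeout, and the timeout has a floor of 2 seconds. The sketch below is a hypothetical helper that captures just that arithmetic with TimeSpan values. It makes no claim about the units or syntax in which ISIS_DEFAULTTIMEOUT itself is written.

```csharp
using System;

// Hypothetical helper, not part of Isis2
public static class FailureTiming
{
    // The slides advise against timeouts below 2 seconds
    public static readonly TimeSpan MinimumTimeout = TimeSpan.FromSeconds(2);

    // Failure detection needs twice the default timeout
    public static TimeSpan DetectionDelay(TimeSpan defaultTimeout)
    {
        if (defaultTimeout < MinimumTimeout)
            throw new ArgumentOutOfRangeException(nameof(defaultTimeout),
                "timeout should not be less than 2 seconds");
        return TimeSpan.FromTicks(defaultTimeout.Ticks * 2);
    }
}
```

With the default of 45 seconds this gives the 90 seconds quoted above; at the 2-second floor, a crash would be noticed after about 4 seconds.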

Help! I’ve been poisoned!
- If a process throws this exception, it means that some other process thought it had failed.
  - If a dead process reappears, live members send it a “you have been poisoned” message.
  - This prevents system partitioning.
- Rule in Isis2: only allow a single partition to remain alive at one time. If a partition forms, immediately shut one side down (the side lacking a majority).
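
The majority rule can be stated as a one-line test. The sketch below is a hypothetical illustration that assumes “majority” means strictly more than half of the members known before the partition formed; how Isis2 itself counts members is not described in these slides.

```csharp
using System;

// Hypothetical illustration of a majority test, not Isis2 code
public static class PartitionRule
{
    // True when this side of a partition still holds a strict majority
    public static bool MaySurvive(int reachableMembers, int totalMembers)
    {
        if (totalMembers < 1 || reachableMembers < 0 || reachableMembers > totalMembers)
            throw new ArgumentOutOfRangeException();
        return 2 * reachableMembers > totalMembers;
    }
}
```

A strict majority guarantees that at most one side passes the test: with 5 members split 3 and 2, only the side of 3 survives, and with an even 2 and 2 split of 4 members neither side does.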

Speeding up failure detection
- If a process will exit (rather than crash), call IsisSystem.Shutdown() first.
  - This rapidly announces the departure, and the process will immediately be removed from the groups it belongs to.
  - It is like a fast failure notification, as if it said “bye!”
- You can also eliminate a group rapidly (without killing its members) using g.Terminate().

Hints for EC2 users
- On EC2 we recommend using ISIS_UNICAST_ONLY.
  - EC2 gives you a “virtual cluster” with nodes numbered from IP address xxx.xxx.xxx.0. You can use this range to set ISIS_HOSTS even before launching your application.
- If you use the Master/Worker startup mode, you can tell the system the master is at: new Address(xxx.xxx.xxx.0, 0);
  - This works because the master will run on node xxx.xxx.xxx.0 (due to ISIS_HOSTS) and the pid is ignored in the RunAsWorker call, so using 0 is fine.

Debugging Isis2 issues
How can it be done?

Debugging is hard… debugging distributed systems is even harder
Useful tools:
- Visual Studio. Keep in mind that even an exception thrown inside Isis2 could be caused by a mistake in your code. All those upcalls will be issued from Isis2 stacks!
- You can call IsisSystem.GetState() to obtain a string representing the state of the Isis system itself. But you’ll need help from Cornell experts to understand this data.
- You can call IsisSystem.RunTimeStatsState() to obtain a self-explanatory string with counts of messages sent and received. The data itself is in IsisSystem.RTS, and you can access this at runtime.

Suggestions
- Isis2 is multithreaded, so write thread-safe code.
- Don’t block during upcalls from Isis2 into your code. The library assumes that upcalls will complete quickly and could malfunction otherwise.
- Isis2 has a lot of threads. Don’t let this worry you.
- We gave you the source code. If you notice a bug, post it to isis2.codeplex.com on the “issues” page.
- Post questions on the codeplex “discussions” page.
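
One common way to follow the “don’t block during upcalls” suggestion is to have the handler hand its work to a queue and return immediately, while a separate thread does the slow part. The sketch below uses only standard .NET types. UpcallOffloader is a hypothetical helper, not part of Isis2, and the slides do not prescribe this particular pattern.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper, not part of Isis2: queues work so that
// an upcall handler can return at once.
public sealed class UpcallOffloader<T> : IDisposable
{
    private readonly BlockingCollection<T> queue = new BlockingCollection<T>();
    private readonly Task consumer;

    public UpcallOffloader(Action<T> slowWork)
    {
        // A single long-running consumer handles items in arrival order
        consumer = Task.Factory.StartNew(() =>
        {
            foreach (T item in queue.GetConsumingEnumerable())
                slowWork(item);
        }, TaskCreationOptions.LongRunning);
    }

    // Call this from the upcall: it only enqueues, so it returns quickly
    public void Post(T item) => queue.Add(item);

    // Stops accepting work and waits for the queued items to finish
    public void Dispose()
    {
        queue.CompleteAdding();
        consumer.Wait();
        queue.Dispose();
    }
}
```

A handler registered with Isis2 would then contain little more than a call such as `offloader.Post(message)`. Because a single consumer thread runs `slowWork`, the slow code sees items one at a time and in order, which also keeps it simple to make thread-safe.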