Imagine you are running a big service provider like Google or Microsoft. Your server runs at your office, and one day it crashes for some unknown reason, or a natural disaster strikes and your office gets demolished (sorry for being so optimistic! :-P). Your services, like mail, chat and video sharing, stop suddenly, and users cannot access them until you fix the problem. That may take a very long time, and your users will not wait that long. They will migrate to some other service provider, and your business will go crashing downhill.
Here is a scenario. You have a server running some application, connected to a storage device where all the data for your services is stored. Now, if the connection between the server and the storage, or the connection to the server itself, is cut, imagine the kind of damage your business can suffer.
This is where Cluster Computing comes to the rescue. You have several servers, spread across different geographic locations, running your services. If disaster of any form strikes one of these nodes, the node fails over to some other node. In plain English, the services that were running on the failed node are moved to some other node, and the service continues running almost immediately. So the user will not know about the server crash and will get the service uninterrupted, as usual. This powerful concept is called Cluster Computing.
Most people confuse it with Grid Computing. In a grid, you have more than one computer working together as a single computer. Here, you have individual computers running individually, and if any node crashes, its service is shifted to some other node. Let's get more technical about cluster computing.
How Is a Cluster Formed and Maintained?
Clusters are formed by using cluster-creation tools like OHAC (Open High Availability Cluster), by writing programs called agents, and by distributing your servers. Each node in a cluster gives out periodic signals similar to heartbeats, so they are named heartbeats. The nodes and the cluster tool identify the other nodes in the cluster by these heartbeats. If a node stops giving out heartbeats, it is considered dead and its services are shifted to some other node.
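To make the heartbeat idea concrete, here is a minimal sketch in Python. It is not how OHAC does it; the class `HeartbeatMonitor`, the function `plan_failover`, the node names and the timeout are all made up for illustration. It only shows the logic described above: remember when each node was last heard from, treat a node as dead once it has been silent for too long, and hand its services to a surviving node.

```python
import time


class HeartbeatMonitor:
    """Remembers when each node last sent a heartbeat (hypothetical helper, not OHAC's API)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node name -> time of its last heartbeat

    def record_heartbeat(self, node, now=None):
        # 'now' can be passed in explicitly, which makes the sketch easy to test.
        self.last_seen[node] = time.monotonic() if now is None else now

    def dead_nodes(self, now=None):
        # A node is considered dead if it has been silent longer than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


def plan_failover(services_by_node, dead):
    """Assign every service of a dead node to a surviving node, round-robin."""
    survivors = sorted(node for node in services_by_node if node not in dead)
    if not survivors:
        raise RuntimeError("no surviving node to fail over to")
    plan = {}
    i = 0
    for node in sorted(dead):
        for service in services_by_node.get(node, []):
            plan[service] = survivors[i % len(survivors)]
            i += 1
    return plan
```

For example, if node-a and node-b both send a heartbeat at time 0 and only node-a sends another one at time 4, then with a 3 second timeout `dead_nodes(now=5.0)` reports node-b, and `plan_failover` moves node-b's services to node-a. Real cluster software chooses the target node with far more care than a simple round-robin.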
Components of a Cluster
Now that you know how a cluster works, let's look deeper and understand the different components that are essential for a cluster to work.
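1. Quorum : You need a minimum of n/2 + 1 nodes (rounding n/2 down) up and talking to each other for a cluster of n nodes to work. In other words, more than half of the nodes must agree, so that two halves of a split cluster can never both believe they are the cluster.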
2. Cluster Infrastructure : This is the software used to run the cluster, like OHAC, together with the servers and the storage devices, which are all interconnected. The cluster software decides which node a running service should fail over to in case of a disaster.
3. Agents : Usually, the applications we use are not meant to be run on multiple computers at the same time. For instance, your RealPlayer is not meant for multi-computer usage. Agents take care of this problem and make such software run in a cluster. These programs act as a communication bridge between the cluster infrastructure and the service. In case of a disaster, they pick up certain parameters from the service application and pass them to the cluster infrastructure, which takes care of the rest.
4. Application : This is the service running on a server, e.g. a mail service. In case of a disaster, the cluster makes sure that this application keeps running safely for the user by failing over to some other node.
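The quorum rule from point 1 is just a little arithmetic, so here it is as a small Python sketch. The function names `quorum_size` and `has_quorum` are my own, and I am assuming the simplest case where every node counts as exactly one vote; real cluster software may count votes differently.

```python
def quorum_size(total_nodes):
    """Smallest number of nodes that is more than half: n/2 + 1, with n/2 rounded down."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def has_quorum(total_nodes, reachable_nodes):
    """True if the nodes that can still see each other are enough to keep the cluster running."""
    return reachable_nodes >= quorum_size(total_nodes)
```

So a 5-node cluster needs 3 nodes and a 4-node cluster also needs 3. If a 4-node cluster splits into two halves of 2 nodes each, neither half has quorum, which is exactly the point: both halves stop instead of both carrying on independently.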
A Special Case of Failure
Consider this case: your servers are running fine, but then suddenly the network cable for one of your nodes gets cut. The server that got isolated from your network can still access and write data to your storage device. We don’t want this, do we? To tackle this problem, another concept was introduced, called Data Fencing. With data fencing, you make sure your storage devices can be accessed and changed only by the nodes that are currently part of your cluster. Data Fencing is a very big and interesting topic which will be dealt with in some future session of GLOSS.
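Until that session happens, here is a toy model of the idea in Python. This is only a sketch under my own assumptions: the class `FencedStorage` and its methods are invented for illustration, and real fencing is enforced by the storage layer itself, not by a Python dictionary. The point is just that the storage keeps a list of nodes it trusts, and a node that drops out of the cluster is removed from that list before it can write anything more.

```python
class FencedStorage:
    """Toy shared storage that only accepts writes from current cluster members."""

    def __init__(self, members):
        self.allowed = set(members)  # nodes currently allowed to write
        self.blocks = {}             # the stored data: key -> value

    def fence(self, node):
        # Called when a node is declared out of the cluster.
        self.allowed.discard(node)

    def write(self, node, key, value):
        if node not in self.allowed:
            raise PermissionError(f"{node} is fenced off from the storage")
        self.blocks[key] = value
```

If node-b writes something, then loses its network cable and gets fenced, its next write raises `PermissionError` and the data it wrote earlier stays untouched.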
You can get the source code for OHAC here: http://www.opensolaris.org/os/community/haclusters/ohac/ Download it, build it and create your own clusters. When you start using OHAC, try to contribute to the project through the same link.
I have tried my best to explain the concept of Cluster Computing in a simple way, and I have covered only the most important components of this area. My advice: not many people in this world are into cluster computing, so it's better to act fast and make your moves in this field.
My sincere thanks to
Kumar Abhishek (for the session on Cluster Computing and for giving me his presentation slides)
The Photography Wing of GLOSS (trust me, they did a really good job this time)
and to you! (for reading through my blog :-))