OVERVIEW:
What is a cluster?
A cluster is a group of independent computers working together as a single
system to ensure that mission-critical applications and resources are as
highly available as possible. The group of computers is managed as a single
system, shares a common namespace, and is specifically designed to
tolerate component failures. A cluster supports the addition or removal of
components in a way that's transparent to users. Clustered applications have
several advantages, including fault tolerance, high availability, scalability,
simplified management, and support for rolling upgrades and other planned
maintenance activities.
There are two different types of cluster models in the industry: the shared
device model and the shared nothing model.
In the shared device model, applications running within a cluster can access
any hardware resource connected to any node in the cluster. As a result,
access to the data must be synchronized. In many such implementations, a
special component called a Distributed Lock Manager (DLM) is used for this
purpose. A DLM is a service that manages access to cluster hardware resources.
When multiple applications access the same resource, the DLM resolves any
conflicts that might arise. This sophistication comes at a cost: a DLM adds
complexity and significant overhead to the cluster. Most of the performance
loss appears as additional traffic between nodes; however, a performance hit is
also incurred due to the loss of serialized access to hardware resources.
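To make the role of a DLM more concrete, here is a minimal sketch of a lock manager that arbitrates access to a shared resource. It is a toy, single-process illustration only; the class name ToyLockManager, the node and resource names, and the message counter are all assumptions made for this example and do not correspond to any real DLM product or API.

```python
import threading


class ToyLockManager:
    """Hypothetical stand-in for a DLM: one holder per resource at a time."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the table below
        self._holders = {}               # resource name -> node holding the lock
        self.messages = 0                # requests handled (stand-in for inter-node traffic)

    def acquire(self, node, resource):
        """Return True if the node now holds the resource, False on a conflict."""
        with self._guard:
            self.messages += 1
            holder = self._holders.get(resource)
            if holder is None or holder == node:
                self._holders[resource] = node
                return True
            return False

    def release(self, node, resource):
        """Release the resource if the given node is its current holder."""
        with self._guard:
            self.messages += 1
            if self._holders.get(resource) == node:
                del self._holders[resource]


dlm = ToyLockManager()
first = dlm.acquire("node-a", "disk-1")    # granted: nobody holds disk-1
second = dlm.acquire("node-b", "disk-1")   # refused: node-a holds the lock
dlm.release("node-a", "disk-1")
third = dlm.acquire("node-b", "disk-1")    # granted after the release
```

Every request and release passes through the lock manager, which is where the extra traffic described above comes from in a real multi-node implementation.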
By default, Microsoft Cluster Server and the Windows Cluster Service use the
shared nothing model. Because this model does not use a DLM, it does not have
the overhead incurred by using such a service. In the shared nothing model,
only one node can own and access a single hardware resource at any given time.
When failure occurs, a surviving node can take ownership of the failed node's
resources and make them available to users.
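The ownership rule of the shared nothing model can be sketched in a few lines. This is an illustrative model only, not how Microsoft Cluster Server or the Windows Cluster Service is implemented; the class SharedNothingCluster, its methods, and the way a surviving node is chosen are assumptions made for the example.

```python
class SharedNothingCluster:
    """Hypothetical model: each resource has exactly one owning node."""

    def __init__(self, nodes):
        self.alive = set(nodes)
        self.owner = {}   # resource name -> the single node that owns it

    def assign(self, resource, node):
        """Give ownership of a resource to an active node."""
        if node not in self.alive:
            raise ValueError(f"{node} is not an active node")
        self.owner[resource] = node

    def can_access(self, node, resource):
        """Only the current owner may access the resource."""
        return self.owner.get(resource) == node

    def fail(self, failed_node):
        """Mark a node as failed and hand its resources to a surviving node."""
        self.alive.discard(failed_node)
        if not self.alive:
            raise RuntimeError("no surviving node can take ownership")
        survivor = sorted(self.alive)[0]   # arbitrary choice for this sketch
        for resource, node in list(self.owner.items()):
            if node == failed_node:
                self.owner[resource] = survivor
        return survivor


cluster = SharedNothingCluster(["node-a", "node-b"])
cluster.assign("disk-1", "node-a")
before = cluster.can_access("node-b", "disk-1")   # node-b is not the owner
survivor = cluster.fail("node-a")                 # node-a fails
after = cluster.can_access("node-b", "disk-1")    # node-b has taken ownership
```

Because only one node owns a resource at a time, no lock manager is needed; ownership changes only when a node fails or resources are moved.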
Although both Microsoft Cluster Server and the Windows Cluster Service use the
shared nothing model by default, they can also use the shared device model, but
only if the clustered application supplies its own DLM.
Why should organizations use clusters?
Generally speaking, hardware failure is not the predominant cause of downtime
for applications. The leading causes of downtime are typically related to
events that are external to the system, such as misconfiguration, power
outages, security breaches, and so forth. Clustering cannot help you solve
those types of problems. In addition, a cluster cannot protect you from
software incompatibilities, corrupt databases, viruses, catastrophes, or
mistakes. Clustering is best implemented when a substantial proportion of your
server downtime is caused by hardware failure, patching, and upgrades. If your
organization’s leading cause of downtime is the result of failures in
administration, software, or infrastructure, an investment in clustering
technology may not reduce your downtime.
You need to assess the reasons for server downtime in your organization. Look
at the problems that clustering solves, and then make a business decision as to
whether clustering is an appropriate solution. The primary focus of clustering
is solving problems that arise from hardware failure, such as a blown CPU, bad
memory, or the loss of an entire server, and reducing the downtime associated
with patching and upgrading. In addition, clustering allows you to continue
providing resources during planned outages that would otherwise cause downtime
for users. A cluster can allow resources to be manually moved (failed over) to
one server while the other is brought down to perform a rolling upgrade, a
configuration change, or other maintenance.
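One simple way to approach this assessment is to total your downtime by cause and see what share falls into the categories clustering addresses. The sketch below assumes a hypothetical downtime log; the causes and hours are invented purely for illustration and are not measurements from any real organization.

```python
# Hypothetical downtime log: (cause, hours lost). Illustrative numbers only.
downtime_log = [
    ("hardware failure", 6.0),
    ("patching", 4.0),
    ("upgrades", 2.0),
    ("misconfiguration", 9.0),
    ("power outage", 3.0),
]

# Causes the text identifies as the ones clustering is best suited to address.
CLUSTER_ADDRESSABLE = {"hardware failure", "patching", "upgrades"}


def addressable_share(log):
    """Return the fraction of total downtime that clustering could address."""
    total = sum(hours for _, hours in log)
    if total == 0:
        return 0.0
    addressable = sum(hours for cause, hours in log if cause in CLUSTER_ADDRESSABLE)
    return addressable / total


share = addressable_share(downtime_log)
print(f"{share:.0%} of recorded downtime falls into categories clustering addresses")
```

If that share is small, the remaining downtime comes from administration, software, or infrastructure problems, and clustering alone is unlikely to reduce it.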
A rolling upgrade is the process of applying a service pack or other hardware
or software update to each node in the cluster in turn while the other nodes
continue providing service. A rolling upgrade typically proceeds in stages:
- Move the groups from the node to be upgraded to another node.
- Take the node to be upgraded offline.
- Install or upgrade the software or hardware on the offline node.
- Bring the upgraded node online.
- Move the groups back to the upgraded node.
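The stages above can be sketched for a single node as follows. This is a simplified model, not an actual cluster administration script; the function upgrade_node, the apply_update hook, and the group and node names are assumptions introduced for the example.

```python
def upgrade_node(groups, online, target, apply_update):
    """Run the five stages for one node.

    groups:       dict mapping group name -> node that currently owns it
    online:       set of nodes currently online
    target:       node to upgrade
    apply_update: hypothetical hook called while the target node is offline
    """
    others = sorted(online - {target})
    if not others:
        raise RuntimeError("a rolling upgrade needs another online node")
    standby = others[0]
    moved = [name for name, owner in groups.items() if owner == target]

    for name in moved:          # 1. move groups off the node to be upgraded
        groups[name] = standby
    online.discard(target)      # 2. take the node offline
    apply_update(target)        # 3. install or upgrade on the offline node
    online.add(target)          # 4. bring the upgraded node online
    for name in moved:          # 5. move the groups back
        groups[name] = target


groups = {"sql": "node-a", "file-share": "node-b"}
online = {"node-a", "node-b"}
owners_during_update = {}


def apply_service_pack(node):
    # Record who owns each group while the node is offline.
    owners_during_update.update(groups)


upgrade_node(groups, online, "node-a", apply_service_pack)
```

While node-a is offline, its groups are served by node-b, so users lose service only for the time it takes to move the groups.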
Then, repeat this process on each node in the cluster until the entire
cluster is upgraded. Rolling upgrades are very attractive from a server
management standpoint because services are unavailable only during the time it
takes to move resources from one node to the other.
By design, clusters help increase uptime by reducing both planned and
unplanned downtime. When a mission-critical system fails, the consequences can
include lost revenue, interruption of services to customers, and idle
knowledge workers. In organizations of all sizes, failures incur costs in many
areas. Hidden costs often include damage to your reputation among customers,
suppliers, and end users, and the perception that your organization isn’t able
to satisfy customer needs.
Understanding the limitations of clustering is just as important as
understanding the benefits. While clustering protects against the failure of
a node in the cluster, it does not provide any protection against other
problems, such as network failures, database corruption, loss of shared storage,
or disasters.
Before implementing a cluster in your environment, you should evaluate
whether this solution really solves enough of your problems to justify its cost.
Clustering adds complexity to your environment and administration. Therefore, it
is important that you understand and evaluate this technology in relation to
your overall goals and the needs of your network.
Goals
Upon completion of four days of intensive training in Microsoft clustering
technologies, attendees will be able to:
- Identify the best clustering technology to use in various situations
- Identify the best practices for building clusters
- Install and administer clusters
- Troubleshoot cluster problems
- Prepare and perform disaster recovery processes
For more information, please email info@mindsharp.com or call 952-230-6500,
option 2.