Document header start
Linux High Availability
How-To for Mitel SME v5.1.2
Version : 1.2.2
Status : Public release
Initial release : DRAFT
Last revision :
Next update : After feed-back from you guys
Initiator : Hsing-Foo Wang (hsing-foo.wang@star-support.com)
Tested with : MITEL SME Server version 5.1.2 http://www.e-smith.org/
Available at : (Primary) http://www.star-support.com/sme/Linux-HA/SME%20High%20Availability%20How-To.html
: (mirrors) http://www.northwestlinux.co.uk/sme/highavailability.html
Contributors : Robert Heaton (rob@northwestlinux.co.uk), Steve Bovingdon (steve@bov.nu)
: Kees Blokland (kees@blokland.net), Paul Miller (pmiller@innercite.com)
Latest changes : see at the end of this document.
Comments : see at the end of this document.
!Always check for the latest available How-to at www.star-support.com!
Copyright © 2002 Hsing-Foo Wang (hsing-foo.wang@star-support.com).
This document may be freely redistributed as long as this document header remains intact.
Document header end
*** FOR TESTING PURPOSES ONLY. DO NOT USE ON PRODUCTION SERVERS ***
Note: This how-to is not finished yet, but I thought this first public release may be useful to get some feedback. I would be interested in your results, so please drop me an e-mail (hsing-foo@star-support.com).
To Do:
- Complete review and restyle of the How-To!
- Install and configure the MON services monitoring tool for use with heartbeat
- Configure HA for different file systems (e.g. NFS)
- Create custom templates for services to serve the cluster's IP address
Issue: You would like to create High Availability for your Mitel SME server environment.
Solution: Follow the steps below...
Disclaimer: It's all at your own risk and responsibility.
Mailing-List
There is a mailing-list that has been set up specifically to discuss the SME HA issue. To catch up with the latest, I advise you to subscribe.
The address for posting is smeha@bov.nu
For help and a description of available commands, send a message to: smeha-help@bov.nu
To subscribe to the list, send a message to: smeha-subscribe@bov.nu
(Thanks to Steve Bovingdon (steve@bov.nu) who has been so kind to set this up.)
Feedback
I love to hear from you with any results or comments. Please subscribe to the mailing list and post them there.
Introduction
High Availability can be achieved in many ways. The question is how 'high' do you want it...
Now I'm not going to discuss high availability and how it should be achieved. If you would like to discuss it, go to http://www.linux-ha.org/. I'm just trying to get something working.
This How-To describes the 'poor man's' High Availability. Instead of using an external shared SCSI storage cabinet or NAS, we will use cheap internal IDE drives.
All High Availability software used is Open Source software, and you have to accept all the individual licenses or 'terms of use' agreements.
This How-To describes the following 'High Availability' and is intended to get you started. For now it's intended for SME in server-only mode and focused on data redundancy. Complete system redundancy should be possible, but that is the next thing (unless you did it and are willing to share / contribute).
• Automatic Fail-over and Fail-back between 2 nodes (2 SME servers)
• Real-time replication of user data to both nodes (LAN RAID-1)
• On-line uncompressed user accessible backup of user data every 24 hours*
*(not yet done at this moment, but should be relatively easy with Unison or Rsync)
So basically we're doing 2 things here: HA and On-Line Backup.
'Human' understandable explanation:
2 servers are presented to the user as only 1 server with 1 virtual IP address.
If the first (primary) server fails, the 2nd server will take over its tasks and will have all current user data, including open files. You can now safely 'repair' the defective server whilst the network and user data are still available. Besides a nightly tape back-up, user data is mirrored to a back-up location (ibay) once every 24 hours, so users can recover a 'lost document' themselves within a time-frame of 24 hours.
Both servers can be placed up to 80 meters apart if we use UTP as heartbeat link, for example in case of fire or if you want to place the servers in different buildings. If they are next to each other, we will use the serial link and UTP. Just remember to use different power-outlet groups :-)
Advantages
• High Availability against relatively low investment
• Less down-time
• Fewer unproductive employees
• Less loss of work
• Fewer yelling bosses
• Less stress
Did you ever wait for your supplier to correct your problem? They have to:
- Be there in the first place
- Have a qualified engineer available
- Have spare parts if needed
- Re-install the OS if needed
- Restore the latest back-up
(If they can manage the problem in 1 day...)
This How-To is intended to reduce productivity loss in case of server failure.
Requirements
• Knowledge of MITEL SME (see http://www.e-smith.org/docs/manual/) and the ability to customize it (see http://www.e-smith.org/custom/)
• 2 hardware servers, each with 2 IDE hard disks (1 system and 1 data), 2 NICs and at least 1 free serial port. The data drives must have the same capacity but do not need to be of the same brand. Make sure your hardware is on the compatibility list for use with SME (Red Hat 7.0):
(http://www.redhat.com/docs/manuals/linux/RHL-7-Manual/install-guide/s1-steps-hardware.html)
• 1 null-modem cable
• 1 cross-over Ethernet cable (UTP/RJ45)
• 1 hub or switch (or your existing network :-) and at least 1 client attached to the 'network' for testing purposes
• SME V5.1.2 software (http://www.e-smith.org/downloads/)
• Good spirit, faith, coffee and a big cigar if all works fine...
• Download the following rpm's/files and save them in separate directories like /install/ha, /install/unison, /install/drbd. (also available here; http://www.rpmfind.net/ is also a good resource)
Linux-HA software
1. http://www.ultramonkey.org/download/1.0.2/RPMS/perl-Net-SSLeay-1.05-5.i386.rpm
2. http://www.ultramonkey.org/download/1.0.2/RPMS/ipvsadm-1.14-1.i386.rpm
3. http://www.linux-ha.org/download/heartbeat-0.4.9.1-1.i386.rpm
4. http://www.linux-ha.org/download/heartbeat-ldirectord-0.4.9.1-1.i386.rpm
Unison and ssh-panel software
5. http://www.ifost.org.au/~peterw/unison-2.7.7-2.i386.rpm
6. http://www.ifost.org.au/~peterw/ssh-keys-0.1.1-1.noarch.rpm
7. http://www.ifost.org.au/~peterw/unison-manager-0.2-1.noarch.rpm
8. ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/tk-8.3.1-53.i386.rpm
9. ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/tcl-8.3.1-53.i386.rpm
10. ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/XFree86-libs-4.0.3-5.i386.rpm
11. ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/expect-5.31-53.i386.rpm
(Peter from Ifost is busy with Linux-HA too, based upon Unison and his contributions to SME. More to come...)
LAN RAID-1 software (pre-compiled modules)
12. http://www.star-support.com/downloads/mitel/contrib/Linux-HA/drdb/drbd.zip
The LAN RAID-1 software has to be compiled. To save you the trouble of installing compilation software on your production SME server, I pre-compiled the modules for your convenience. Of course you can compile them yourself. Read about it at http://www.complang.tuwien.ac.at/reisner/drbd/
The version used in this case is v5.8.1, which is the latest stable version at the time of writing.
Useful Tools
20: http://myezserver.com/downloads/mitel/contrib/service-control-0.0.1/
Documentation / references:
HA software: http://www.linux-ha.org
LAN RAID-1 Software: http://www.complang.tuwien.ac.at/reisner/drbd/
Unison, SSH panel software: http://www.ifost.org.au/~peterw/
E-smith forums: http://www.e-smith.org/bboard/list.php?f=3
Ultramonkey: http://ultramonkey.sourceforge.net/
Test environment:
I will use the following names and ip's for the test environment:
• Node 1 is called 'goliath' with eth0 ip 192.168.1.201 and eth1 ip 10.0.0.1
• Node 2 is called 'david' with eth0 ip 192.168.1.202 and eth1 ip 10.0.0.2
• The cluster-server netbios name is 'The_Rock', description 'High Available Server', with ip 192.168.1.210
Note: Don't fill in the 192.168.1.210 address anywhere else besides the heartbeat config file. It's a virtual ip number.
Text in purple = content of a (config) file
Text in green = console command as root
This How-To contains the following sections:
A. Initial Setup hardware/software
B. Installing and configuring
C. Creating on-line backup (not yet finished)
Getting to the end of section B can take a couple of hours, for this How-To doesn't contain any magic... :-)
PS. Where possible, I performed the steps on goliath completely, then on david.
Let the game begin...
SECTION A (Initial Setup hardware/software)
Step 1.
• Set up both your servers (*) incl. 2 NICs but with only 1 hard disk (system drive) installed/active.
(* means setting up hardware only; software installation will follow below)
(I disabled the second IDE channel in the BIOS to disable the second hard disk at install time.)
I used a small drive for just the server software and config files (10Gb) and a larger drive as data drive (40Gb). So we have the system drive and the data drive.
• Connect the system drive on IDE-1 as master and (after initial SME install) the data drive on IDE-2 as master.
• Do not connect both servers to the network at this time!
(you could change the boot diskette to prevent the install process from claiming all hard drive space, but I don't know how to do this and the above way is simple and fast)
Step 2.
To make sure that the users only see 1 server on the network (The_Rock), we have to make samba (workgroup) the same on both servers.
• Install SME version 5.1.2 as server only with the above mentioned details (david, goliath)
• Create ibay 'data': group=everyone, Read=group/Write=group, no password
• Create ibay 'backup': group=admin, Read=everyone/Write=group, no password
• Change the workgroup setting so that Windows Workgroup = The_Rock.
• Access the remote access panel in the server manager and enable ssh connections. (both servers)
• Enable 'allow administrative command line access'
(to remotely access your servers see: http://www.chiark.greenend.org.uk/~sgtatham/putty/)
Step 3.
Since we installed the server in server-only mode, we only have 1 configured NIC (eth0), so we have to add the second NIC by hand.
(TIP: You can cut and paste commands into the console by pressing the right mouse button on the console.)
• Figure out which module from /lib/modules/2.2.19-7.0.8/net you need to use with your NIC
- You can use kudzu to determine your 2nd NIC (enter: kudzu --help for more details)
- Or you could set the server in server/gateway mode to see which card is discovered. Don't forget to set the server back to server-only mode. (thanks Kees :-) )
• Bind the module to the NIC (eth1) by executing on the console:
/sbin/e-smith/config set EthernetDriver2 rtl8139
(which is rtl8139.o in /lib/modules/2.2.19-7.0.8/net; if you make a mistake in your type of card, just re-enter the command with the right module name)
• Create a custom template directory for the new NIC:
mkdir -p /etc/e-smith/templates-custom/etc/sysconfig/network-scripts/ifcfg-eth1
and copy /etc/sysconfig/network-scripts/ifcfg-eth0 to it as ifcfg-eth1:
cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/e-smith/templates-custom/etc/sysconfig/network-scripts/ifcfg-eth1/ifcfg-eth1
• Enter the newly created directory and change the new config file of eth1 to the correct private net values for goliath and david (the config below is for goliath):
DEVICE=eth1
USERCTL=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.0.0.1
NETMASK=255.255.255.0
NETWORK=10.0.0.0
BROADCAST=10.0.0.255
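For david, the same template would presumably differ only in the address. The sketch below is derived from the test environment listed earlier (david's eth1 is 10.0.0.2) and has not been checked against a real install; the path in the comment is the custom template created above.

```shell
# /etc/e-smith/templates-custom/etc/sysconfig/network-scripts/ifcfg-eth1/ifcfg-eth1
# on david (sketch: same as goliath except for IPADDR)
DEVICE=eth1
USERCTL=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.0.0.2
NETMASK=255.255.255.0
NETWORK=10.0.0.0
BROADCAST=10.0.0.255
```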
• Expand the new template by executing:
/sbin/e-smith/expand-template /etc/sysconfig/network-scripts/ifcfg-eth1
• Copy /etc/e-smith/events/actions/conf-ethernet to /etc/e-smith/events/actions/conf-ethernetx:
cp /etc/e-smith/events/actions/conf-ethernet /etc/e-smith/events/actions/conf-ethernetx
• Edit conf-ethernetx to look like the following and save it:
#!/usr/bin/perl -w
#----------------------------------------------------------------------
# copyright (C) 1999-2001 e-smith, inc.
# cut to save space...
# Please visit our web site www.e-smith.com for details.
#----------------------------------------------------------------------
package esmith;
use strict;
use Errno;
use esmith::config;
use esmith::util;
my %conf;
tie %conf, 'esmith::config';
#------------------------------------------------------------
# Update ethernet interface files. We always have at least
# one ethernet adapter (eth0) for the internal network. If
# running in server and gateway mode with a dedicated connection,
# a second adapter (eth1) is used for the external connection.
# If running in HA mode, then we need at least 2 or maybe even
# 3 adapters. This could be a way to add them.
#------------------------------------------------------------
unlink ("/etc/pump.conf");
#unlink ("/etc/sysconfig/network-scripts/ifcfg-eth0");
unlink ("/etc/sysconfig/network-scripts/ifcfg-eth1");
unlink ("/etc/sysconfig/network-scripts/ifcfg-eth2");
unlink ("/etc/sysconfig/network-scripts/ifcfg-lo:0");
if (($conf{'SystemHA'} =~ /highavailable/)
    && ($conf{'AccessType'} eq "dedicated"))
{
    defined $conf{'EthernetDriver2'}
        && ($conf{'EthernetDriver2'} ne "unknown")
        && esmith::util::processTemplate
            (\%conf, "/etc/sysconfig/network-scripts/ifcfg-eth1");
    # just in case we have 3 cards in the system
    # it's no use yet, but at least it's there when we need it
    defined $conf{'EthernetDriver3'}
        && ($conf{'EthernetDriver3'} ne "unknown")
        && esmith::util::processTemplate
            (\%conf, "/etc/sysconfig/network-scripts/ifcfg-eth2");
}
exit (0);
• We now have to create a new action so our new card will be recognized while maintaining compatibility with future system updates:
cd /etc/e-smith/events/console-save
ln -s ../actions/conf-ethernetx S36conf-ethernetx
• In order to use this new version, you will have to create a new entry in the config database:
/sbin/e-smith/config set SystemHA highavailable
• Integrate and finalize the new settings within the MITEL SME configuration:
/sbin/e-smith/signal-event console-save
• Restart the network to activate the new NIC by executing /etc/rc.d/init.d/network restart and check the presence of the new 2nd NIC by entering ifconfig. If all went well, it's there...
(you could add more NICs this way for other purposes)
• Connect the cross-over cable from the 2nd NIC to the 2nd NIC on the other server (net 10.0.0.0)
• All is correct if you can ping from goliath to 10.0.0.2 (2nd NIC on david) and 192.168.1.202, and vice versa from david to goliath.
• While we're here we also connect the null-modem cable, if the servers are placed together. Otherwise we will use the cross-over connection (UTP) as heartbeat link ('are you alive?').
• Connect the serial null-modem cable to the serial ports on each server. ttyS0=com1, ttyS1=com2 and so on, depending on which port you are connecting to. In our case they are both connected to ttyS0. The ports don't have to be the same on both servers (maybe you have a UPS), but you have to know which one you connected the link to.
• On the goliath console you type:
cat </dev/ttyS0
On the david console you type:
echo hello >/dev/ttyS0
• You should see 'hello' appear on the goliath console. Do the same in reverse order (that is, change roles) to double check. (Hit ctrl-c to stop goliath from listening first)
Step 4
Now we're ready to install the second hard disk. Power down the servers and physically install the drive on the second IDE channel as master (see step 1).
You can also use a drive that already has data on it. Just connect it as explained above, and continue with preparing the mount point below.
New drive without existing data:
We will only make 1 partition for the whole drive, although more are possible. We're going to use fdisk to partition and prepare the new drive, so type fdisk /dev/hdc on the console.
• delete all current partitions with option d followed by the partition number.
• create a new partition with the n option, followed by option p, followed by partition number 1
• write the partition information with option w
• create the filesystem on your new partition: mkfs -t ext2 /dev/hdc1
Preparing the mount point
We will use /home/e-smith/files/ibays/data/files as the mount point for our drive. To prepare it cd to the directory /home/e-smith/files/ibays/data
enter: chown root.shared files
enter: chmod 0775 files (or different permissions according to your wishes)
The drive is now ready for use with our HA system. We will integrate and mount it later at step 8.
SECTION B
Now we're going to install SSH keys and Unison. We need the following downloaded rpm's:
http://www.ifost.org.au/~peterw/unison-2.7.7-2.i386.rpm
http://www.ifost.org.au/~peterw/ssh-keys-0.1.1-1.noarch.rpm
http://www.ifost.org.au/~peterw/unison-manager-0.2-1.noarch.rpm
These are needed to satisfy dependencies:
ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/tk-8.3.1-53.i386.rpm
ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/tcl-8.3.1-53.i386.rpm
ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/XFree86-libs-4.0.3-5.i386.rpm
ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/expect-5.31-53.i386.rpm
(For now you have to get these, someone has to figure out if we really need all dependencies)
(put these packages in a separate directory and cd to it)
I chose Unison and Peter's contributions because they provide 2 things: SSH key installation, which is needed for a trusted login from one server to the other and is made easy by the ssh-keys panel, and the Unison program, which synchronizes data between 2 ibays and is used for the 24-hour on-line backup. You could also use rsync; the main difference is that rsync has 1 master source which is mirrored to other locations, whereas Unison has 2 master sources which are compared and synchronized. It's really up to you which one you prefer, but in this case I use Unison.
Install the packages with rpm -Uvh * (make sure you have them in a separate directory).
If you do it manually, install them in the following order:
- XFree86-libs
- tcl-8.3.1
- tk-8.3.1
- expect-5.31
- ssh-keys
- unison-2.7.7-2
- unison-manager
Now you will find the new 'SSH Keys' panel in the security section of the e-smith manager. Select it and generate the local ssh key by pressing the 'create keys' button. It will tell you when the keys have been generated. After that, select SSH KEYS again and you are presented with the option of sending your key to other hosts. On goliath you enter the ip and root password of david, and vice versa.
In case of an error like this: An error occurred trying to send the key at /etc/e-smith/web/panels/manager/cgi-bin/sshkeys line 272.
Try to establish an ssh connection manually first. On the (goliath) console enter:
ssh 192.168.1.202 and you will be asked to accept. You are now on the console of david. Exit the david console (type exit) and try to send the keys again via the ssh-keys panel. On success, the panel will list 192.168.1.202 among the other hosts your key has been sent to.
On BOTH servers
Now we are going to install Heartbeat. We need the following packages:
ftp://ftp.ultramonkey.org/pub/ultramonkey/ultramonkey-1.0.2/RPMS/perl-Net-SSLeay-1.05-5.i386.rpm
ftp://ftp.ultramonkey.org/pub/ultramonkey/ultramonkey-1.0.2/RPMS/ipvsadm-1.14-1.i386.rpm
http://www.linux-ha.org/download/heartbeat-0.4.9.1-1.i386.rpm
http://www.linux-ha.org/download/heartbeat-ldirectord-0.4.9.1-1.i386.rpm
(ldirectord is not needed for basic HA use, but for future use related to the http service)
• Enter rpm -Uvh * and the packages will be installed (make sure you have them in a separate directory)
If you do it manually, install them in the following order:
- perl-Net-SSLeay
- ipvsadm
- heartbeat
- ldirectord
After installation there will be a new directory called: ha.d at /etc/ha.d. This is the home of the configuration files of heartbeat. Example config files are now at /usr/share/doc/packages/heartbeat/. We will need 2 files: ha.cf and haresources. Copy them both to /etc/ha.d/. Next we need 1 more file which we will create ourselves.
Enter the /etc/ha.d directory and enter: touch authkeys on the console. This will create an empty file with this name. So now we have 3 config files:
ha.cf = configuration of the heartbeat link
haresources = configuration of the fail-over fail-back actions
authkeys = configuration of authentication between the 2 nodes (for now it's empty)
We have to edit these files to configure our test environment. Let's start with ha.cf
• Edit ha.cf with your favourite editor, make it look like the listing below and save it:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
initdead 120
serial /dev/ttyS0
baud 19200
udpport 694
udp eth1
node goliath
node david
The file itself is well documented so you can try the various options later. In this case, eth1 is our 2nd NIC with ip 10.0.0.1 and our serial cable is connected to ttyS0. Change these values so they match your set-up.
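Heartbeat expects the names on the node lines to match what uname -n prints on each server, so a quick check before the first start can save some log reading. The sketch below is only an illustration: node_listed and check_this_node are hypothetical helper names, not part of heartbeat, and the node name is used as a plain pattern.

```shell
# node_listed FILE NAME: succeed if FILE contains a line "node NAME".
# Hypothetical helper; FILE would be /etc/ha.d/ha.cf in this how-to.
node_listed() {
    grep -Eq "^[[:space:]]*node[[:space:]]+$2[[:space:]]*\$" "$1"
}

# check_this_node FILE: compare the local "uname -n" against the node lines.
check_this_node() {
    me=$(uname -n)
    if node_listed "$1" "$me"; then
        echo "ok: $me is listed in $1"
    else
        echo "warning: $me is not listed in $1"
        return 1
    fi
}
```

On each server you would then run check_this_node /etc/ha.d/ha.cf; on goliath it should report that goliath is listed.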
? Edit haresources and make it look like listed below:
goliath 192.168.1.210/24/eth0 smb
Comment out all other lines (with #) and save the file. This configuration file is also well documented. The line tells heartbeat that we will use virtual IP number 192.168.1.210 with the default subnet for our cluster, and that the Samba service will be started and stopped in case of fail-over and fail-back. We will change it later to use it with our LAN RAID-1 system.
The ip number 192.168.1.210 is the virtual ip number for our 'cluster' and eth0 is the interface on which it is brought up. smb is the service to start (and take over) in case of failure of the other node.
The EXACT SAME line is needed for david, so don't replace the first word goliath with david!
Next edit the authkeys file and add 2 lines to it like listed below:
auth 1
1 crc
Out of the 3 possible authentication methods that can be used (sha1, md5 and crc) we use the simplest one, for we trust our friend completely and we're tied together on a private net. The heartbeat code needs the authkeys file to have 600 permissions, so enter chmod 600 authkeys on the console to change the permissions.
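The authkeys step can also be done in one go. This is a minimal sketch, assuming the two-line crc configuration shown above; write_authkeys is a hypothetical helper name, and the directory is passed as an argument so you can try it somewhere harmless first.

```shell
# write_authkeys DIR: create DIR/authkeys with the crc method from this how-to.
# Hypothetical helper, not part of heartbeat.
write_authkeys() {
    dir=$1
    # umask 077 makes a newly created file private from the start.
    ( umask 077 && printf 'auth 1\n1 crc\n' > "$dir/authkeys" ) || return 1
    # Also fix the mode if the file already existed (e.g. created with touch).
    chmod 600 "$dir/authkeys"
}
```

On both servers this would be called as write_authkeys /etc/ha.d (as root).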
It's time to give heartbeat a first try, here we go... On the console enter:
/etc/rc.d/init.d/heartbeat start
1. If it returns: Starting High-Availability services: [ OK ] then it's working!
Stop it by entering /etc/rc.d/init.d/heartbeat stop
2. If it returns: Starting High-Availability services: [ FAILED ]
Check the log files (which we defined in ha.cf) ha-log and ha-debug in the /var/log directory. These will give you a lot of information.
Now when heartbeat starts, it will automatically start the services defined in haresources (in our case samba). You will notice that samba is stopped when we stop heartbeat. This is because there can only be 1 samba service active within the cluster. When heartbeat dies, the other server will start up the defined services. This means we have to disable the smb service on david at start-up time. I chose to use this rpm: http://myezserver.com/downloads/mitel/contrib/service-control-0.0.1/ .
Install it on david (and goliath) if you wish and look for a new entry 'services' in the e-smith manager. Deactivate the smb service for david and goliath.
Now start heartbeat on goliath and then on david. Take a look at the ha-log. Now stop heartbeat on goliath and after 5 seconds take a look at the ha-log on david. Start heartbeat on goliath again and after 5 seconds take a look at the ha-log on both servers. It's working, right?
Stop heartbeat on both servers for now because we've got to go on:
/etc/rc.d/init.d/heartbeat stop
Now we are going to install the drbd modules and files. (we do this manually, for they are precompiled)
Here's where the files are needed:
drbdsetup -> /usr/sbin/drbdsetup
drbd.o -> /lib/modules/2.2.19-7.0.8/block/drbd.o
drbd -> /etc/rc.d/init.d/drbd
datadisk -> /etc/ha.d/resource.d/datadisk
drbd.conf -> /etc/drbd.conf
drbdsetup.8 -> /usr/share/man/man8/drbdsetup.8
drbd.conf.5 -> /usr/share/man/man5/drbd.conf.5
• Copy the files to their locations. Change the permissions on drbd.o to 644 (chmod 644 drbd.o)
See if you can load the drbd module: insmod drbd
Now check if it's loaded, enter: lsmod (it should be at the top of the list)
Now unload the module, enter: rmmod drbd
(on BOTH servers)
The configuration file drbd.conf has to be changed to your own settings. Change it to match the below settings. More variables are possible (see drbd documentation):
resource drbd0 {
protocol=B
fsckcmd=fsck.ext2 -p -y
disk {
do-panic
}
net {
sync-rate=6M
tl-size=256
timeout=60
connect-int=10
ping-int=10
}
on goliath {
device=/dev/nb0
disk=/dev/hdc1
address=10.0.0.1
port=7788
}
on david {
device=/dev/nb0
disk=/dev/hdc1
address=10.0.0.2
port=7788
}
}
(on BOTH servers)
Now let's see if we can establish a mirror between the 2 servers.
• enter /etc/rc.d/init.d/drbd start on the goliath console
• enter /etc/rc.d/init.d/drbd start on the david console
• enter cat /proc/drbd on the goliath console to see the status
• if all is well, then david is showing that it is syncing with the primary (goliath)
David:
0: cs:SyncingAll st:Secondary/Primary ns:0 nr:192048 dw:192048 dr:0 gc:4,7,3
Goliath:
0: cs:SyncingAll st:Primary/Secondary ns:192384 nr:0 dw:132296 dr:192577 gc:4,7,3
(numbers can be different depending on the size of the drives used)
Upon first connection the status will be cs:SyncingAll. A complete synchronisation of the disks can take quite a while. This is the case with a first setup or after fail-back. After a fail-over/back situation within a small period of time, a partial sync will be performed, which takes less time. During synchronisation the cluster can be used as normal; it's a background task.
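If you want to read the status from a script rather than by eye, the fields of a status line can be picked apart as sketched below. drbd_field and is_syncing are hypothetical helper names, not part of drbd. Only cs:SyncingAll appears in this how-to, so treating every state that starts with "Syncing" as a sync in progress is an assumption.

```shell
# drbd_field NAME LINE: print the value of NAME (e.g. cs or st) taken from
# one status line of /proc/drbd. Hypothetical helper, not part of drbd.
drbd_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p"
}

# is_syncing LINE: succeed while the connection state starts with "Syncing".
is_syncing() {
    case $(drbd_field cs "$1") in
        Syncing*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

With the david line shown above, drbd_field st gives Secondary/Primary. On a live node the line could be fetched with grep 'cs:' /proc/drbd and passed to is_syncing.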
In the drbd configuration file we defined our new RAID-1 drive as /dev/nb0. This drive may only be mounted on 1 node at a time! This will be handled by the heartbeat script automatically. In the above status goliath is the primary node, so the RAID-1 drive should be controlled by goliath. You can check this by entering df -h on the goliath console, which will give you:
[root@goliath ha.d]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda6 27G 406M 25G 2% /
/dev/hda1 15M 2.6M 11M 18% /boot
/dev/nb0 7.9G 829M 6.6G 11% /home/e-smith/files/ibays/data/files
(size and percentage are different depending on the drives used)
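The same check can be scripted, which is handy later when testing fail-over on both nodes. A minimal sketch, assuming df output in the layout shown above; nb0_mount_point is a hypothetical helper name.

```shell
# nb0_mount_point: read df-style output on stdin and print the mount point
# of /dev/nb0. Exits non-zero if /dev/nb0 is not mounted.
# Hypothetical helper for this how-to.
nb0_mount_point() {
    awk '$1 == "/dev/nb0" { print $NF; found = 1 } END { exit !found }'
}
```

Usage on a node would be df -h | nb0_mount_point; on the node that currently holds the RAID-1 it should print /home/e-smith/files/ibays/data/files, and on the other node it prints nothing and fails.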
You can start and stop the drbd by entering:
/etc/rc.d/init.d/drbd stop or /etc/rc.d/init.d/drbd start
Stop drbd for now by entering /etc/rc.d/init.d/drbd stop, first on david and then on goliath. We will perform a full sync later.
So where are we?
1. We have a working heartbeat (fail-over fail-back) mechanism
2. We have a working LAN RAID-1 mechanism
And now we have to glue it together...
Shut down the drbd and heartbeat services if any are running (david first):
• /etc/rc.d/init.d/drbd stop
• /etc/rc.d/init.d/heartbeat stop
Step 9
Integrating the RAID-1 with heartbeat is the next thing to do.
• We have to tell drbd that we want the RAID-1 to be controlled by the secondary node upon fail-over/back and vice versa. (Make it work in parallel with heartbeat.)
Since we set up our RAID-1 with 1 resource called drbd0 (see drbd.conf) we have to change the /etc/ha.d/haresources file. (more resources are possible for each partition you make on the RAID-1)
Current entry in /etc/ha.d/haresources:
goliath 192.168.1.210/24/eth0 smb
Change it to:
goliath 192.168.1.210/24/eth0 datadisk::drbd0 smb
The script datadisk is located at /etc/ha.d/resource.d and will take care of switching the drbd state. Heartbeat will search for the script in the /etc/rc.d/init.d directory. If it's not there, it will look for it in /etc/ha.d/resource.d.
So the above line tells heartbeat to use the script 'datadisk' with argument 'drbd0' (:: is the argument separator).
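To make the layout of such a line explicit, here is a small sketch that splits it into its parts. explain_haresources is a hypothetical helper name; it only handles the simple form used in this how-to (node name, then IP/netmask-bits/interface, then resources).

```shell
# explain_haresources LINE: print the parts of a simple haresources line.
# Hypothetical helper; handles only "node ip/bits/iface resource..." lines.
explain_haresources() {
    set -- $1                 # word-split the line into positional parameters
    node=$1
    ipspec=$2
    shift 2                   # what is left are the resources
    ip=${ipspec%%/*}          # text before the first slash
    rest=${ipspec#*/}         # text after the first slash
    echo "node=$node ip=$ip netmask_bits=${rest%%/*} interface=${rest#*/}"
    for r in "$@"; do
        case $r in
            *::*) echo "resource=${r%%::*} argument=${r#*::}" ;;
            *)    echo "resource=$r" ;;
        esac
    done
}
```

Calling it with the new line, explain_haresources 'goliath 192.168.1.210/24/eth0 datadisk::drbd0 smb', lists the node and address first, then datadisk with argument drbd0, then smb. That is also the order in which the resources appear on the line.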
Now we want to make our RAID-1 mountable at boot time. We have to edit the /etc/fstab file.
? Add the following line to /etc/fstab:
/dev/nb0 /home/e-smith/files/ibays/data/files ext2 noauto 0 0
Do NOT actually mount it at boot time; the datadisk script together with the noauto option will take care of this.
On BOTH servers
Step 10
We finished the configuration of heartbeat and drbd. To start our High Available server, we first have to reboot the servers to make our RAID-1 mountable on both servers.
Keep in mind: Drbd always has to be started prior to heartbeat
Start drbd
• Reboot both servers.
• On goliath enter:
/etc/rc.d/init.d/drbd start
It will ask whether you want to cancel waiting for the other server and make this one primary. Leave it in this state.
• On david enter:
/etc/rc.d/init.d/drbd start
It will respond with SyncingAll abort?. Leave it in this state.
• The goliath console will now continue, for drbd has found the secondary node and has made goliath the primary node. (check this by entering: cat /proc/drbd)
Start heartbeat
• On goliath enter:
/etc/rc.d/init.d/heartbeat start
goliath will now mount the RAID-1
• Check the availability of the RAID-1 (/dev/nb0) by entering: df -h
This can take a couple of seconds, so try a few times
• You can wait for the full synchronisation to finish and then start heartbeat on david, or you can start heartbeat now by opening a second console on david (ALT-F2) or by using a new PuTTY connection.
• Enter in the new david console: /etc/rc.d/init.d/heartbeat start
Check the availability of the RAID-1 by browsing your network from your client machine and looking for the The_Rock server. (it can take a while for it to appear; otherwise search for it by its ip number, which is 192.168.1.210) The_Rock has a data directory which is our RAID-1 drive.
Copy some files to it so you can check the availability of the files after a fail-over.
Before you can test fail-over/back procedures, you have to await completion of the full synchronisation process. With my 8Gb RAID-1 it took approx. 1 hour.
To test fail-over and fail-back, use the start and stop commands for drbd and heartbeat as described above. Once you're more confident with the system you can start powering off machines and yanking out cables...
Procedure for testing after full or partial synchronisation:
1. Stop heartbeat on goliath
2. Check the heartbeat logfiles on goliath and david
3. Check RAID-1 availability by: df -h on david and goliath
4. Check files availability on The_Rock
5. Copy some files to The_Rock
6. Start heartbeat on goliath
7. Check the heartbeat logfiles on goliath and david
8. Check RAID-1 availability by: df -h on david and goliath
9. Check files availability on The_Rock
Did it work? :-)
You can make drbd and heartbeat start up at boot time. This is up to you. Personally I only want goliath to auto start drbd and heartbeat so I get an automatic Fail-over but a manual Fail-back.
I figure that if goliath dies on me and david automatically takes over, I have to check and repair goliath, for something went wrong. To bring the cluster back to a normal state I want to control the start-up sequence and checking. But again, it's up to you.
To start heartbeat and drbd at boot time, add the following symbolic links to /etc/rc7.d:
ln -s /etc/rc.d/init.d/heartbeat /etc/rc7.d/K35heartbeat
ln -s /etc/rc.d/init.d/heartbeat /etc/rc7.d/S99heartbeat
ln -s /etc/rc.d/init.d/drbd /etc/rc7.d/K36drbd
ln -s /etc/rc.d/init.d/drbd /etc/rc7.d/S98drbd
That's it!
It's time to light your cigar! :-)
Thanks for taking the effort to read this How-To. Your comments are always welcome.
Revision History:
Date | Revision | Changes
| 1.0.0 | DRAFT release
| 1.0.1 | Changes to some URL links; Added revision history
| 1.0.2 | Changes to some URL links; Cosmetic changes; Changed drbd installation section (thanks to Robert Heaton)
| 1.0.3 | Cosmetic changes; Changed some important details (Thanks to Robert Heaton again); Added automatic startup links and auto mounting the RAID-1 at boot time; Added 'COMMENTS' section
| 1.1.0 | Public release; Cosmetic changes; Changed drbd set-up procedure
| 1.1.1 | Cosmetic and spelling changes (Thanks to Robert Heaton again)
| | Added the availability of the mailing list (Thanks to Steve Bovingdon (steve@bov.nu))
| | Added some comments
| 1.2.1 | Spelling changes (Thanks to Robert Heaton and Paul Miller); Major change in adding extra NICs (Thanks to Kees Blokland (kees@blokland.net)); Changed document header, Author is now Initiator for I feel the document will be changed radically over time. :-)
| 1.2.1 | Ultramonkey changes download location
COMMENTS (some comments/results I received via e-mail)
Peter Werner (peter_a_werner@yahoo.com)
way cool!
that looks really good, congrats.
ifost is actually working on some high availability
stuff, using heartbeat and unison. basically, the
slave uses unison to keep a backup of the masters /etc
and /home, and heartbeat to detect if the master dies.
if the aster does die, it copies over some files,
merges some files (like /home/e-smith/configuration )
reboots and comes up as the new master. not sure when
it will be done though. i will add a link to your
document on my page when you release it.
great stuff!
cheers
-pete
Robert Heaton (rob@northwestlinux.co.uk)
Hi!
I've followed the How-To and all is working!
Instead of using a Cross-over Ethernet Cable I used a 10BASE Hub, (So I can see the link working between the two machines)
So Now I have flashing lights all over the place!
I'm now up to: 'We will mount it later at /home/e-smith/files/ibays/data/files by editing fstab.'
I take it this will be done in the next update to the how-to?
Kind regards,
Rob.
Jeff C (jcoleman_AT_rstrat.com)
Bravo! Congratulations.
-jeff
Judy Morgann (judymorgann@yahoo.com)
Hi,
this is the best news i have ever seen here! very great
job!!! i think this is the
most important function e-smith lacks of. e-smith is a
greate server distribution but with your howto it can take place in enterprise.
taking over other services like apache, imap works greate
too.
some "problem" i try
to figure out is how to replicate the user profiles from an e-smith pdc.
we have some e-smith servers running as
pdc for our win2k/xp clients. but if one fails user can't log in
to another server cause of missing the user and machine
accounts.
i try to copy smbpasswd and MACHINE.SID from one server to the others via
rsync, but clients can't logon.
if someone has a idea it would be very
nice.
thanks again for your greate work!!!
judy
Alan Robertson (alanr@unix.sh) http://www.linux-ha.org
Using Phillip's DRBD software together with heartbeat is a very powerful
combination with no single points of failure. This is something many
commercial high-end HA systems don't provide.
-- Alan Robertson
alanr@unix.sh