Document header start

Linux High Availability How-To for Mitel SME v5.1.2

 

FEB 11TH 2003, UPDATED HOW-TO IN PROGRESS...

 

Version             : 1.2.2

Status               : Public release

Initial release     : DRAFT 14-02-2002

Last revision      : 24-03-2002

Next update       : After feed-back from you guys

Initiator              : Hsing-Foo Wang (hsing-foo.wang@star-support.com)

Tested with        : MITEL SME Server version 5.1.2 http://www.e-smith.org/
Available at        : (Primary) http://www.star-support.com/sme/Linux-HA/SME%20High%20Availability%20How-To.html

                        : (mirrors) http://www.northwestlinux.co.uk/sme/highavailability.html

Contributors       : Robert Heaton (rob@northwestlinux.co.uk), Steve Bovingdon (steve@bov.nu)
                                :
Kees Blokland (kees@blokland.net), Paul Miller (pmiller@innercite.com)

Latest changes  : see at the end of this document.

Comments        : see at the end of this document.

 

!Always check for the latest available How-to at www.star-support.com!

 

Copyright © 2002 Hsing-Foo Wang (hsing-foo.wang@star-support.com).

This document may be freely redistributed as long as this document header remains intact.

Document header end

 

*** FOR TESTING PURPOSES ONLY. DO NOT USE ON PRODUCTION SERVERS ***

 

Note: This how-to is not finished yet, but I thought this first public release may be useful for getting some feedback. I would be interested in your results, so please drop me an e-mail (hsing-foo@star-support.com).

To Do:

- Complete review and restyle of the How-To!
- Install, configure MON services monitoring tool for use with heartbeat

- Configure HA for different file systems (e.g. NFS)

- Create custom templates for services to serve the cluster's IP address

- Lots of other stuff...

 

Issue: You would like to create High Availability for your Mitel SME server environment.

 

Solution: Follow the steps below...

 

Disclaimer: It's all at your own risk and responsibility.

 

Mailing-List

There is a mailing list that was set up especially to discuss SME HA issues. To keep up with the latest developments, I advise you to subscribe.

 

The address for posting is smeha@bov.nu

For help and a description of available commands, send a message to: smeha-help@bov.nu

To subscribe to the list, send a message to: smeha-subscribe@bov.nu

 

(Thanks to Steve Bovingdon (steve@bov.nu), who was kind enough to set this up.)

 

Feedback

I would love to hear from you with any results or comments. Please subscribe to the mailing list and post them there.

Introduction

 

High Availability can be achieved in many ways. The question is how 'high' you want it...

I'm not going to discuss high availability in general or how it should be achieved. If you would like to discuss that, go to http://www.linux-ha.org/. I'm just trying to get something working.

This How-To describes the "poor man's" High Availability. Instead of using an external shared SCSI storage cabinet or NAS, we will use cheap internal IDE drives.

All High Availability software used is Open Source software, and you have to accept all the individual licenses or "terms of use" agreements.

 

This How-To describes the following kind of 'High Availability' and is intended to get you started. For now it is intended for SME in server-only mode and focuses on data redundancy. Complete system redundancy should be possible, but that is the next thing (unless you have done it and are willing to share / contribute).

• Automatic Fail-over and Fail-back between 2 nodes (2 SME servers)

• Real-time replication of user data to both nodes (LAN RAID-1)

• On-line, uncompressed, user-accessible backup of user data every 24 hours *

* (not yet done at this moment, but should be relatively easy with Unison or Rsync)

 

So basically we're doing 2 things here: HA and On-Line Backup.

 

"Human" understandable explanation:

2 servers are presented to the user as only 1 server with 1 virtual IP address.

If the first (primary) server fails, the 2nd server will take over its tasks and will have all current user data, including open files. You can now safely 'repair' the defective server while the network and user data are still available. Besides a nightly tape back-up, user data is mirrored to a back-up location (ibay) once every 24 hours, so users can recover a "lost document" themselves within a time-frame of 24 hours.

The servers can be placed up to 80 meters apart if we use UTP as the heartbeat link. That helps in case of fire, or if you want to place the servers in different buildings. If they are next to each other, we will use both the serial link and UTP. Just remember to use different power-outlet groups :-)

 

Advantages

• High Availability for a relatively low investment

• Less down-time

• Fewer unproductive employees

• Less loss of work

• Fewer yelling bosses

• Less stress

• Did you ever have to wait for your supplier to correct your problem? They have to:
- Be there in the first place
- Have a qualified engineer available
- Have spare parts if needed
- Re-install the OS if needed
- Restore the latest back-up
(If they can manage the problem in 1 day...)

 

 

 

 

This How-To is intended to reduce productivity loss in case of server failure.


 

 

 

Requirements

 

• Knowledge of MITEL SME (see http://www.e-smith.org/docs/manual/) and the ability to customize it (see http://www.e-smith.org/custom/)

• 2 hardware servers, each with 2 IDE hard disks (1 system and 1 data), 2 NICs and at least 1 free serial port. The data drives must have the same capacity but do not need to be of the same brand. Make sure your hardware is on the compatibility list for use with SME (Red Hat 7.0):
http://www.redhat.com/docs/manuals/linux/RHL-7-Manual/install-guide/s1-steps-hardware.html

• 1 null-modem cable

• 1 cross-over Ethernet cable (UTP/RJ45)

• 1 hub or switch (or your existing network :-) and at least 1 client attached to the "network" for testing purposes

• SME V5.1.2 software (http://www.e-smith.org/downloads/)

• Good spirit, faith, coffee and a big cigar if all works fine...

• Download the following rpm's/files and save them in separate directories like /install/ha, /install/unison, /install/drbd. (http://www.rpmfind.net/ is also a good resource)

 

Linux-HA software

1.        http://www.ultramonkey.org/download/1.0.2/RPMS/perl-Net-SSLeay-1.05-5.i386.rpm

2.        http://www.ultramonkey.org/download/1.0.2/RPMS/ipvsadm-1.14-1.i386.rpm

3.        http://www.linux-ha.org/download/heartbeat-0.4.9.1-1.i386.rpm

4.        http://www.linux-ha.org/download/heartbeat-ldirectord-0.4.9.1-1.i386.rpm

 

Unison and ssh-panel software

5. http://www.ifost.org.au/~peterw/unison-2.7.7-2.i386.rpm

6. http://www.ifost.org.au/~peterw/ssh-keys-0.1.1-1.noarch.rpm

7. http://www.ifost.org.au/~peterw/unison-manager-0.2-1.noarch.rpm

8. ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/tk-8.3.1-53.i386.rpm

9. ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/tcl-8.3.1-53.i386.rpm

10. ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/XFree86-libs-4.0.3-5.i386.rpm

11. ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/expect-5.31-53.i386.rpm

(Peter from Ifost is also working on Linux-HA, based on Unison and his contributions to SME; more to come...)

 

LAN RAID-1 software (pre-compiled modules)

12. http://www.star-support.com/downloads/mitel/contrib/Linux-HA/drdb/drbd.zip

The LAN RAID-1 software has to be compiled. To save you the trouble of installing compilation software on your production SME server, I pre-compiled the modules for your convenience. Of course you can compile them yourself. Read about it at http://www.complang.tuwien.ac.at/reisner/drbd/

The version used in this case is v5.8.1, which is the latest stable version at the time of writing.

 

Useful Tools

20: http://myezserver.com/downloads/mitel/contrib/service-control-0.0.1/

 

 

Documentation / references:

HA software: http://www.linux-ha.org

LAN RAID-1 Software: http://www.complang.tuwien.ac.at/reisner/drbd/

Unison, SSH panel software: http://www.ifost.org.au/~peterw/

E-smith forums: http://www.e-smith.org/bboard/list.php?f=3

Ultramonkey: http://ultramonkey.sourceforge.net/

 


Test environment:

 

I will use the following names and IPs for the test environment:

• Node 1 is called 'goliath', with eth0 IP 192.168.1.201 and eth1 IP 10.0.0.1

• Node 2 is called 'david', with eth0 IP 192.168.1.202 and eth1 IP 10.0.0.2

• The cluster server's NetBIOS name is 'The_Rock', description "High Available Server", with IP 192.168.1.210

Note: Don't fill in the 192.168.1.210 address anywhere besides the heartbeat config file. It's a virtual IP number.

 

Text in purple = content of a (config) file

Text in green = console command as root

 

This How-To contains the following sections:

A. Initial Setup hardware/software

B. Installing and configuring

C. Creating on-line backup (not yet finished)

 

Getting to the end of section B can take a couple of hours, for this How-To doesn't contain any magic... :-)

PS. Where possible, I performed the steps completely on goliath first, then on david.

 

Let the game begin...


 

SECTION A (Initial Setup hardware/software)

 

Step 1.

• Set up the hardware of both servers, including 2 NICs, but with only 1 hard disk (the system drive) installed/active. Software installation follows below.
(I disabled the second IDE channel in the BIOS to disable the second hard disk at install time.)
I used a small drive (10 GB) for just the server software and config files, and a larger drive (40 GB) as the data drive. So we have the system drive and the data drive.

• Connect the system drive to IDE-1 as master and (after the initial SME install) the data drive to IDE-2 as master.

• Do not connect both servers to the network at this time!

(You could change the boot diskette to prevent the install process from claiming all hard drive space, but I don't know how to do this, and the above way is simple and fast.)

 

 

Step 2.

To make sure that the users only see 1 server on the network (The_Rock), we have to make the samba workgroup the same on both servers.

• Install SME version 5.1.2 as server-only with the above-mentioned details (david, goliath)

• Create ibay "data": group=everyone, Read=group / Write=group, no password

• Create ibay "backup": group=admin, Read=everyone / Write=group, no password

• Change the workgroup setting so that Windows Workgroup = The_Rock

• Open the remote access panel in the server manager and enable ssh connections (on both servers)

• Enable "allow administrative command line access"

 

 (to remotely access your servers see: http://www.chiark.greenend.org.uk/~sgtatham/putty/)

 

On BOTH servers


Step 3.

Since we installed the server in server-only mode, we only have 1 configured NIC (eth0), so we have to add the second NIC by hand...

(TIP: You can paste commands into the console by pressing the right mouse button on the console.)

• Figure out which module from /lib/modules/2.2.19-7.0.8/net you need for your NIC
- You can use kudzu to determine your 2nd NIC (enter: kudzu --help for more details)
- Or you could set the server to server/gateway mode to see which card is discovered. Don't forget to set the server back to server-only mode. (thanks Kees :-) )

• Bind the module to the NIC (eth1) by executing on the console:
/sbin/e-smith/config set EthernetDriver2 rtl8139
(rtl8139 is rtl8139.o in /lib/modules/2.2.19-7.0.8/net; if you picked the wrong module for your card, just re-enter the command with the right module name)

• Create a custom template directory for the new NIC:
mkdir -p /etc/e-smith/templates-custom/etc/sysconfig/network-scripts/ifcfg-eth1
and copy /etc/sysconfig/network-scripts/ifcfg-eth0 into it as ifcfg-eth1:
cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/e-smith/templates-custom/etc/sysconfig/network-scripts/ifcfg-eth1/ifcfg-eth1

• Enter the newly created directory and edit the new eth1 config file with the correct private network values for goliath and david (the config below is for goliath):

DEVICE=eth1
USERCTL=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.0.0.1
NETMASK=255.255.255.0
NETWORK=10.0.0.0
BROADCAST=10.0.0.255


 

• Expand the new template by executing:
/sbin/e-smith/expand-template /etc/sysconfig/network-scripts/ifcfg-eth1

• Copy /etc/e-smith/events/actions/conf-ethernet to /etc/e-smith/events/actions/conf-ethernetx:
cp /etc/e-smith/events/actions/conf-ethernet /etc/e-smith/events/actions/conf-ethernetx

• Edit conf-ethernetx to look like the following and save it:

 

#!/usr/bin/perl -w

#----------------------------------------------------------------------
# copyright (C) 1999-2001 e-smith, inc.
# cut to save space...
# Please visit our web site www.e-smith.com for details.
#----------------------------------------------------------------------

package esmith;

use strict;
use Errno;
use esmith::config;
use esmith::util;

my %conf;
tie %conf, 'esmith::config';

#------------------------------------------------------------
# Update ethernet interface files. We always have at least
# one ethernet adapter (eth0) for the internal network. If
# running in server and gateway mode with a dedicated connection,
# a second adapter (eth1) is used for the external connection.
# If running in HA mode, then we need at least 2 or maybe even
# 3 adapters. This could be a way to add them.
#------------------------------------------------------------

unlink ("/etc/pump.conf");
#unlink ("/etc/sysconfig/network-scripts/ifcfg-eth0");
unlink ("/etc/sysconfig/network-scripts/ifcfg-eth1");
unlink ("/etc/sysconfig/network-scripts/ifcfg-eth2");
unlink ("/etc/sysconfig/network-scripts/ifcfg-lo:0");

if (($conf{'SystemHA'} =~ /highavailable/)
    && ($conf{'AccessType'} eq "dedicated"))
{
    # second adapter (eth1): only when a driver has been configured
    defined $conf{'EthernetDriver2'}
    && ($conf{'EthernetDriver2'} ne "unknown")
    && esmith::util::processTemplate (\%conf,
        "/etc/sysconfig/network-scripts/ifcfg-eth1");

    # just in case we have 3 cards in the system
    # it's no use yet, but at least it's there when we need it
    defined $conf{'EthernetDriver3'}
    && ($conf{'EthernetDriver3'} ne "unknown")
    && esmith::util::processTemplate (\%conf,
        "/etc/sysconfig/network-scripts/ifcfg-eth2");
}
exit (0);

 

 

• We now have to create a new action so that our new card is recognized while maintaining compatibility with future system updates:
cd /etc/e-smith/events/console-save
ln -s ../actions/conf-ethernetx S36conf-ethernetx

• In order to use this new version, you have to create a new entry in the config database:

/sbin/e-smith/config set SystemHA highavailable

• Integrate and finalize the new settings within the MITEL SME configuration:

/sbin/e-smith/signal-event console-save

• Restart the network to activate the new NIC by executing "/etc/rc.d/init.d/network restart" and check for the presence of the 2nd NIC by entering "ifconfig". If all went well, it's there...
(You could add more NICs this way for other purposes.)

• Connect the cross-over cable from the 2nd NIC to the 2nd NIC on the other server (net 10.0.0.0)

• All is correct if you can ping from goliath to 10.0.0.2 (2nd NIC on david) and 192.168.1.202,
and vice versa from david to goliath.

• While we're here, we also connect the null-modem cable if the servers are placed together. Otherwise we will use only the cross-over connection (UTP) as the heartbeat ("are you alive?") link.

• Connect the serial null-modem cable to the serial ports on each server.
ttyS0=com1, ttyS1=com2, and so on, depending on which port you connect to. In our case both are connected to ttyS0. The ports don't have to be the same on both servers (maybe you have a UPS on one of them), but you have to know which port the link is connected to.

• On the goliath console type:
cat </dev/ttyS0
On the david console type:
echo hello >/dev/ttyS0

• You should see "hello" appear on the goliath console. Do the same in reverse order (that is, swap roles) to double-check. (Hit ctrl-c to stop goliath from listening first.)
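The eth1 file created earlier in this step differs between goliath and david only in IPADDR, so its content can also be generated. The sketch below is a hypothetical helper (the name make_ifcfg is not part of SME). It assumes a /24 private network as in the listing for goliath and only prints the text; it does not write any file.

```shell
#!/bin/sh
# Hypothetical helper: print an ifcfg file for a /24 network.
# Usage: make_ifcfg DEVICE IPADDR
make_ifcfg() {
    dev=$1
    ip=$2
    # Everything before the last dot, e.g. 10.0.0 for 10.0.0.1
    prefix=${ip%.*}
    cat <<EOF
DEVICE=$dev
USERCTL=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=$ip
NETMASK=255.255.255.0
NETWORK=$prefix.0
BROADCAST=$prefix.255
EOF
}
```

For david you would run: make_ifcfg eth1 10.0.0.2 and redirect the output into the custom template file.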

 


 

Step 4

Now we're ready to install the second hard disk. Power down the servers and physically install the drive on the second IDE channel as master. Normally IDE 1 Master=hda, Slave=hdb and IDE 2 Master=hdc, Slave=hdd, so in our case the new drive will be device /dev/hdc.

You can also use a drive that already contains data. Just connect it as explained above and continue with preparing the mount point below.

New drive without existing data:

We will make only 1 partition for the whole drive, although more are possible. We're going to use fdisk to partition the new drive, so type fdisk /dev/hdc on the console.

• delete all current partitions with option d followed by the partition number

• create a new partition with option n, followed by option p, followed by partition number 1

• write the partition information with option w

• create the filesystem on your new partition: mkfs -t ext2 /dev/hdc1

 

Preparing the mount point

We will use /home/e-smith/files/ibays/data/files as the mount point for our drive. To prepare it, cd to the directory /home/e-smith/files/ibays/data

enter: chown root.shared files

enter: chmod 0775 files (or different permissions according to your wishes)

 

The drive is now ready for use with our HA system. We will integrate and mount it later at step 8.

 

On BOTH servers

 


 

 

SECTION B

 

Step 6

Now we're going to install SSH keys and Unison. We need the following downloaded rpm's:

 

http://www.ifost.org.au/~peterw/unison-2.7.7-2.i386.rpm

http://www.ifost.org.au/~peterw/ssh-keys-0.1.1-1.noarch.rpm

http://www.ifost.org.au/~peterw/unison-manager-0.2-1.noarch.rpm

 

These are needed to satisfy dependencies:

ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/tk-8.3.1-53.i386.rpm

ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/tcl-8.3.1-53.i386.rpm

ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/XFree86-libs-4.0.3-5.i386.rpm

ftp://ftp.rpmfind.net/linux/redhat/7.1/en/os/i386/RedHat/RPMS/expect-5.31-53.i386.rpm

(For now you have to get all of these; someone has to figure out whether we really need all the dependencies.)

(Put these packages in a separate directory and cd to it.)

I chose Unison and Peter's contributions because they provide 2 things: the ssh-keys panel, which makes it easy to install the SSH keys needed for a trusted login from one server to the other, and the Unison program, which synchronizes data between 2 ibays and is used for the 24-hour on-line backup. You could also use rsync. The main difference is that rsync has 1 master source which is mirrored to other locations, whereas Unison has 2 master sources which are compared and synchronized. It's really up to you which one you prefer, but in this case I use Unison.

Install the packages with rpm -Uvh * (make sure you have them in a separate directory).

If you do it manually, install them in the following order:

-          XFree86-libs

-          tcl-8.3.1

-          tk-8.3.1

-          Expect-5.31

-          ssh-keys

-          unison-2.7.7-2

-          unison-manager
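If you prefer the manual route, the order above can be scripted. This is only a sketch: the function name install_in_order and the DRY_RUN switch are my own inventions, and it assumes the rpm files carry the exact names from the download list (for example expect-5.31-53.i386.rpm in lower case).

```shell
#!/bin/sh
# Sketch: install the downloaded rpm's one by one in the order listed above.
# Set DRY_RUN=1 to only print the rpm commands instead of running them.
# Usage: install_in_order DIR
install_in_order() {
    dir=$1
    for name in XFree86-libs tcl tk expect ssh-keys unison unison-manager; do
        # Match name, dash, version digit (so "unison" skips unison-manager).
        for pkg in "$dir"/"$name"-[0-9]*.rpm; do
            [ -f "$pkg" ] || { echo "missing: $name" >&2; return 1; }
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "rpm -Uvh $pkg"
            else
                rpm -Uvh "$pkg" || return 1
            fi
        done
    done
}
```

Running it once with DRY_RUN=1 shows what would be installed before anything is changed on the server.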

 

Now you will find the new "SSH Keys" panel in the security section of the e-smith manager. Select it and generate the local ssh key by pressing the "create keys" button. It will tell you when the keys have been generated. After that, select SSH Keys again and you are presented with the option of sending your key to other hosts. On goliath enter the IP and root password of david, and vice versa.

In case of an error like this:
An error occurred trying to send the key at /etc/e-smith/web/panels/manager/cgi-bin/sshkeys line 272.

try to establish an ssh connection manually first. On the (goliath) console enter:

ssh 192.168.1.202

and you will be asked to accept the host key. You are now on the console of david. Exit the david console (type exit) and try to send the keys again via the ssh-keys panel. On success, the panel will show 192.168.1.202 among the other hosts your key has been sent to.

 

On BOTH servers


 

Step 7

Now we are going to install Heartbeat. We need the following packages:

 

ftp://ftp.ultramonkey.org/pub/ultramonkey/ultramonkey-1.0.2/RPMS/perl-Net-SSLeay-1.05-5.i386.rpm

ftp://ftp.ultramonkey.org/pub/ultramonkey/ultramonkey-1.0.2/RPMS/ipvsadm-1.14-1.i386.rpm

http://www.linux-ha.org/download/heartbeat-0.4.9.1-1.i386.rpm

http://www.linux-ha.org/download/heartbeat-ldirectord-0.4.9.1-1.i386.rpm

(ldirectord is not needed for basic HA use, but for future use related to the http service)

 

• rpm -Uvh * and the packages will be installed (make sure you have them in a separate directory)

If you do it manually, install them in the following order:

-          perl-Net-SSLeay

-          ipvsadm

-          heartbeat

-          ldirectord

 

After installation there will be a new directory, /etc/ha.d. This is the home of the heartbeat configuration files. Example config files are at /usr/share/doc/packages/heartbeat/. We will need 2 of them: ha.cf and haresources. Copy both to /etc/ha.d/. We need 1 more file, which we will create ourselves.

Enter the /etc/ha.d directory and enter: touch authkeys on the console. This creates an empty file with this name. So now we have 3 config files:

ha.cf                 = configuration of the heartbeat link

haresources      = configuration of the fail-over and fail-back actions

authkeys           = configuration of authentication between the 2 nodes (for now it's empty)

 

We have to edit these files to configure our test environment. Let's start with ha.cf.

• Edit ha.cf with your favourite editor, make it look like the listing below and save it:

 

debugfile /var/log/ha-debug

logfile /var/log/ha-log

logfacility     local0

keepalive 2

deadtime 10

initdead 120

serial  /dev/ttyS0

baud    19200

udpport 694

udp     eth1

node    goliath

node    david

 

The file itself is well documented, so you can try the various options later. In this case, eth1 is our 2nd NIC (10.0.0.1 on goliath) and our serial cable is connected to ttyS0. Change these values so they match your set-up.
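Before starting heartbeat it can help to sanity-check ha.cf. The function below is a hypothetical sketch, not part of the heartbeat package: it only verifies that at least one serial or udp link is configured and that every node name you pass in has its own node line.

```shell
#!/bin/sh
# Sketch: sanity-check a ha.cf file before starting heartbeat.
# Usage: check_ha_cf FILE NODE [NODE...]
check_ha_cf() {
    file=$1
    shift
    # At least one heartbeat medium (serial or udp) must be configured.
    grep -Eq '^[[:space:]]*(serial|udp)[[:space:]]+[^[:space:]]' "$file" || {
        echo "no serial or udp heartbeat link in $file" >&2
        return 1
    }
    # Every expected node needs its own "node NAME" line.
    for n in "$@"; do
        grep -Eq "^[[:space:]]*node[[:space:]]+$n[[:space:]]*\$" "$file" || {
            echo "missing line: node $n" >&2
            return 1
        }
    done
}
```

Example: check_ha_cf /etc/ha.d/ha.cf goliath david. As far as I know, the names on the node lines have to match the output of uname -n on each server, so that is worth comparing as well.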

 

• Edit haresources and make it look like the listing below:

 

goliath    192.168.1.210/24/eth0 smb

 

Comment out all other lines (with #) and save the file. This configuration file is also well documented. The line tells heartbeat that goliath normally owns virtual IP number 192.168.1.210, with a 24-bit netmask, for our cluster, and that the Samba service will be started and stopped on fail-over and fail-back. We will change it later to use it with our LAN RAID-1 system.

The IP number 192.168.1.210 is the virtual IP number of our "cluster" and eth0 is the interface on which it is brought up. smb is the service to start (and take over) in case of failure of the other node.

The EXACT SAME line is needed on david, so don't replace the first word goliath with david!
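To make the structure of the haresources line explicit, here is a small sketch that splits it into its fields. The function name is made up, and the field meanings reflect my understanding of the IP/netmask-bits/interface syntax; check the comments in the haresources example file for the authoritative description.

```shell
#!/bin/sh
# Sketch: show how one haresources line breaks down into its fields.
# Usage: explain_haresources 'goliath 192.168.1.210/24/eth0 smb'
explain_haresources() {
    # Intentionally unquoted: split the line into words on whitespace.
    set -- $1
    node=$1
    addr=$2
    shift 2
    # addr has the form IP/netmask-bits/interface.
    ip=${addr%%/*}
    rest=${addr#*/}
    bits=${rest%%/*}
    iface=${rest#*/}
    echo "preferred node: $node"
    echo "virtual ip:     $ip"
    echo "netmask bits:   $bits"
    echo "interface:      $iface"
    echo "resources:      $*"
}
```

Everything after the address is the list of resources, which heartbeat starts from left to right on the node that takes over.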

 

 

Next, edit the authkeys file and add the 2 lines listed below:

 

auth 1

1 crc

 

Out of the 3 possible authentication methods (sha1, md5 and crc) we use the simplest one, for we trust our friend completely and we're tied together on a private net. The heartbeat code requires the authkeys file to have 600 permissions, so enter "chmod 600 authkeys" on the console to change the permissions.
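The two authkeys steps (content and permissions) can be combined so the file is never readable by others, not even briefly. A minimal sketch, with a function name of my own choosing:

```shell
#!/bin/sh
# Sketch: write the two-line authkeys file and restrict it to mode 600.
# Usage: write_authkeys [DIR]   (DIR defaults to /etc/ha.d)
write_authkeys() {
    dir=${1:-/etc/ha.d}
    # The subshell keeps the stricter umask from leaking into the caller.
    (
        umask 077
        printf 'auth 1\n1 crc\n' > "$dir/authkeys"
    ) || return 1
    # Also covers the case where the file already existed with other modes.
    chmod 600 "$dir/authkeys"
}
```

Run it on both servers, since the authkeys file has to be the same on goliath and david.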

 

It's time to give heartbeat a first try. Here we go... On the console enter:

/etc/rc.d/init.d/heartbeat start

1. If it returns: Starting High-Availability services:                       [   OK   ] then it's working!!

Stop it by entering "/etc/rc.d/init.d/heartbeat stop"

2. If it returns: Starting High-Availability services:                       [ FAILED ] then Houston, we have a...

Check the log files ha-log and ha-debug in the /var/log directory (we defined them in ha.cf). These will give you a lot of information.

 

When heartbeat starts, it automatically starts the services defined in haresources (in our case samba). You will notice that samba is stopped when we stop heartbeat. This is because only 1 samba service can be active within the cluster. When heartbeat dies, the other server starts the defined services. This means we have to disable the smb service on david at start-up time. I chose to use this rpm: http://myezserver.com/downloads/mitel/contrib/service-control-0.0.1/ .

Install it on david (and on goliath if you wish) and look for a new entry "services" in the e-smith manager. Deactivate the smb service for david and goliath.

Now start heartbeat on goliath and then on david. Take a look at the ha-log. Now stop heartbeat on goliath and after 5 seconds take a look at the ha-log on david. Start heartbeat on goliath again and after 5 seconds take a look at the ha-log on both servers. It's working, right?

 

Stop heartbeat on both servers for now, because we have to move on:

/etc/rc.d/init.d/heartbeat stop


 

 

Step 8

Now we are going to install the drbd modules and files. (We do this manually because they are precompiled.)

This is where the files need to go:

 

drbdsetup          -> /usr/sbin/drbdsetup

drbd.o               -> /lib/modules/2.2.19-7.0.8/block/drbd.o

drbd                  -> /etc/rc.d/init.d/drbd

datadisk            -> /etc/ha.d/resource.d/datadisk

drbd.conf           -> /etc/drbd.conf

drbdsetup.8       -> /usr/share/man/man8/drbdsetup.8

drbd.conf.5        -> /usr/share/man/man5/drbd.conf.5

 

• Copy the files to their locations. Change the permissions on drbd.o to 644 (chmod 644 drbd.o)

See if you can load the drbd module: insmod drbd

Now check whether it's loaded. Enter: lsmod (it should be at the top of the list)

Now unload the module. Enter: rmmod drbd

(on BOTH servers)

 

The configuration file drbd.conf has to be changed to your own settings. Change it to match the settings below. More variables are possible (see the drbd documentation):

 

resource drbd0 {

 

protocol=B

fsckcmd=fsck.ext2 -p -y

 

  disk {

    do-panic

  }

 

  net {

    sync-rate=6M

    tl-size=256

    timeout=60

    connect-int=10

    ping-int=10

  }

 

  on goliath {

    device=/dev/nb0

    disk=/dev/hdc1

    address=10.0.0.1

    port=7788

  }

 

  on david {

    device=/dev/nb0

    disk=/dev/hdc1

    address=10.0.0.2

    port=7788

  }

}    

 

(on BOTH servers)


 

Now let's see if we can establish a mirror between the 2 servers.

• enter /etc/rc.d/init.d/drbd start on the goliath console

• enter /etc/rc.d/init.d/drbd start on the david console

• enter cat /proc/drbd on the goliath console to see the status

• if all is well, david shows that it is syncing with the primary (goliath)

 

David:

0: cs:SyncingAll st:Secondary/Primary ns:0 nr:192048 dw:192048 dr:0 gc:4,7,3

 

Goliath:

0: cs:SyncingAll st:Primary/Secondary ns:192384 nr:0 dw:132296 dr:192577 gc:4,7,3

 

(numbers can be different depending on the size of the drives used)
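The two fields that matter most in these status lines are cs: (connection state) and st: (role of this node / role of the other node). A small sketch to extract just those, assuming /proc/drbd lines look like the samples shown for David and Goliath; the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: pull the connection state (cs:) and node roles (st:) out of
# /proc/drbd-style status lines read from standard input.
# Usage: drbd_state < /proc/drbd
drbd_state() {
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([^ ]*\) st:\([^ ]*\).*/device \1: \2 \3/p'
}
```

Lines that do not match the pattern (such as a version banner, if there is one) are simply skipped.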

 

Upon first connection the status will be cs:SyncingAll. A complete synchronisation of the disks can take quite a while. This is the case with a first setup or after fail-back. After a fail-over/back situation within a short period of time, a partial sync is performed, which takes less time. During synchronisation the cluster can be used as normal; it's a background task.

In the drbd configuration file we defined our new RAID-1 drive as /dev/nb0. This drive may only be mounted on 1 node at a time! This will be handled by the heartbeat script automatically. In the above status goliath is the primary node, so the RAID-1 drive should be controlled by goliath. You can check this by entering df -h on the goliath console, which will give you:

 

[root@goliath ha.d]# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/hda6              27G  406M   25G   2% /

/dev/hda1              15M  2.6M   11M  18% /boot

/dev/nb0              7.9G  829M  6.6G  11% /home/e-smith/files/ibays/data/files

 

(size and percentage are different depending on the drives used)

 

You can start and stop drbd by entering:

/etc/rc.d/init.d/drbd stop or /etc/rc.d/init.d/drbd start

Stop drbd for now by entering /etc/rc.d/init.d/drbd stop, first on david and then on goliath. We will perform a full sync later.

 

 

So where are we?

1.       We have a working heartbeat (fail-over fail-back) mechanism

2.       We have a working LAN RAID-1 mechanism

And now we have to glue it together...

Shut down the drbd and heartbeat services if any are running (david first)

?         /etc/rc.d/init.d/drbd stop

?         /etc/rc.d/init.d/heartbeat stop


 

Step 9

Integrating the RAID-1 with heartbeat is the next thing to do.

 

• We have to tell drbd that we want the RAID-1 to be controlled by the secondary node upon fail-over, and vice versa upon fail-back. (Make it work in parallel with heartbeat.)

Since we set up our RAID-1 with 1 resource called drbd0 (see drbd.conf), we have to change the /etc/ha.d/haresources file. (More resources are possible, one for each partition you make on the RAID-1.)

Current entry in /etc/ha.d/haresources:

goliath 192.168.1.210/24/eth0 smb

Change it to:

goliath 192.168.1.210/24/eth0 datadisk::drbd0 smb

The script datadisk is located in /etc/ha.d/resource.d and takes care of switching the drbd state. Heartbeat will search for the script in the /etc/rc.d/init.d directory. If it's not there, it will look for it in /etc/ha.d/resource.d.

So the above line tells heartbeat to use the script "datadisk" with argument "drbd0" (:: = argument separator).

 

Now we want to make our RAID-1 mountable. We have to edit the /etc/fstab file.

• Add the following line to /etc/fstab:

 

/dev/nb0   /home/e-smith/files/ibays/data/files     ext2    noauto  0 0

 

The drive must NOT actually be mounted at boot time. The noauto option prevents that, and the datadisk script mounts it when needed.
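Because this has to be done on both servers, and possibly more than once while experimenting, it is handy to add the fstab line only if it is not there yet. A sketch with a made-up function name; it appends the exact line shown above:

```shell
#!/bin/sh
# Sketch: add the /dev/nb0 line to fstab only if no /dev/nb0 entry exists yet.
# Usage: add_nb0_fstab [FILE]   (FILE defaults to /etc/fstab)
add_nb0_fstab() {
    fstab=${1:-/etc/fstab}
    if grep -q '^/dev/nb0[[:space:]]' "$fstab"; then
        echo "/dev/nb0 already listed in $fstab"
        return 0
    fi
    printf '%s\n' \
        '/dev/nb0   /home/e-smith/files/ibays/data/files     ext2    noauto  0 0' \
        >> "$fstab"
}
```

Make a copy of /etc/fstab before trying this on a real server.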

 

On BOTH servers

 


 

 

Step 10

 

We have finished the configuration of heartbeat and drbd. To start our High Available server, we first have to reboot the servers to make our RAID-1 mountable on both servers.

Keep in mind: drbd always has to be started before heartbeat.

 

Start drbd

• Reboot both servers.

• On goliath enter: /etc/rc.d/init.d/drbd start
It will ask whether you want to cancel waiting for the other server and make this one primary. Leave it in this state.

• On david enter: /etc/rc.d/init.d/drbd start
It will respond with SyncingAll abort?. Leave it in this state.

• The goliath console will now continue, for drbd has found the secondary node and has made goliath the primary node. (Check this by entering: cat /proc/drbd)

 

Start heartbeat

• On goliath enter: /etc/rc.d/init.d/heartbeat start
goliath will now mount the RAID-1

• Check the availability of the RAID-1 (/dev/nb0) by entering: df -h
This can take a couple of seconds, so try a few times

• You can wait for the full synchronisation to finish and then start heartbeat on david, or you can start heartbeat now by opening a second console on david (ALT-F2) or by using a new PuTTY connection.

• Enter in the new david console: /etc/rc.d/init.d/heartbeat start

Check the availability of the RAID-1 by browsing your network from your client machine and looking for the The_Rock server. (It can take a while for it to appear; otherwise search for it by its IP number, which is 192.168.1.210.) The_Rock has a data directory, which is our RAID-1 drive.

Copy some files to it so you can check the availability of the files after a fail-over.

 

Before you can test the fail-over and fail-back procedures, you have to wait for the full synchronisation to complete. With my 8 GB RAID-1 it took approximately 1 hour.

 

To test fail-over and fail-back, use the start and stop commands for drbd and heartbeat as described above. Once you are more confident with the system, you can start powering off machines and yanking out cables...

 

Procedure for testing after full or partial synchronisation:

1.       Stop heartbeat on goliath

2.       Check the heartbeat logfiles on goliath and david

3.       Check RAID-1 availability by: df -h on david and goliath

4.       Check files availability on The_Rock

5.       Copy some files to The_Rock

6.       Start heartbeat on goliath

7.       Check the heartbeat logfiles on goliath and david

8.       Check RAID-1 availability by: df -h on david and goliath

9.       Check files availability on The_Rock
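
For the file checks in the procedure above, comparing checksums is more reliable than eyeballing a directory listing. The following is a minimal sketch using md5sum; the function names make_manifest and verify_manifest are hypothetical, and the manifest path must be absolute because the functions change directory.

```shell
# Hypothetical helpers: record checksums of all files below a directory,
# and later verify the files against that record.
make_manifest() {
    dir="$1"
    manifest="$2"       # absolute path, stored OUTSIDE of $dir
    (cd "$dir" && find . -type f -exec md5sum {} \; | sort -k 2) > "$manifest"
}

verify_manifest() {
    dir="$1"
    manifest="$2"       # absolute path
    # md5sum -c returns non-zero if a file is missing or has changed
    (cd "$dir" && md5sum -c "$manifest" > /dev/null 2>&1)
}

# Usage on the node that currently has the RAID-1 mounted:
#   make_manifest /home/e-smith/files/ibays/data/files /tmp/ha-test.md5
# After the fail-over, with the manifest copied to the other node:
#   verify_manifest /home/e-smith/files/ibays/data/files /tmp/ha-test.md5 \
#       && echo "files OK" || echo "files differ or are missing"
```

Only the files listed in the manifest are checked, so files copied to The_Rock after the manifest was made do not cause a failure.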

 

Did it work?? :-)

You can make drbd and heartbeat start at boot time. This is up to you. Personally, I only want goliath to auto-start drbd and heartbeat, so I get an automatic fail-over but a manual fail-back.

I figure that if goliath dies on me and david automatically takes over, I have to check and repair goliath, because something went wrong. To bring the cluster back to a normal state, I want to control the start-up sequence and the checks myself. But again, it's up to you.

 

To start heartbeat and drbd at boot time, add the following symbolic links to /etc/rc7.d:

 

ln -s /etc/rc.d/init.d/heartbeat /etc/rc7.d/K35heartbeat

ln -s /etc/rc.d/init.d/heartbeat /etc/rc7.d/S99heartbeat

ln -s /etc/rc.d/init.d/drbd /etc/rc7.d/K36drbd

ln -s /etc/rc.d/init.d/drbd /etc/rc7.d/S98drbd
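
Note that the numbers in the link names set the order: S98drbd runs before S99heartbeat, so drbd starts first, and K35heartbeat runs before K36drbd, so heartbeat stops first. If you prefer something you can run more than once without errors, the links can be created with a helper like the sketch below. The function name add_boot_links is hypothetical; the directories and priorities are the ones used above.

```shell
# Hypothetical helper: create the K (kill) and S (start) links for one
# init script, skipping links that already exist.
add_boot_links() {
    initdir="$1"        # e.g. /etc/rc.d/init.d
    rcdir="$2"          # e.g. /etc/rc7.d
    name="$3"           # init script name
    kill_prio="$4"
    start_prio="$5"
    for link in "$rcdir/K$kill_prio$name" "$rcdir/S$start_prio$name"; do
        if [ -L "$link" ]; then
            echo "$link already exists"
        else
            ln -s "$initdir/$name" "$link"
        fi
    done
}

# Usage (as root):
#   add_boot_links /etc/rc.d/init.d /etc/rc7.d heartbeat 35 99
#   add_boot_links /etc/rc.d/init.d /etc/rc7.d drbd 36 98
#   ls -l /etc/rc7.d | grep -E 'drbd|heartbeat'
```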

 

 

That's it!

 

 

It's time to light your cigar! :-)

 

 

 

Thanks for taking the effort to read this How-To. Your comments are always welcome.

 

 

 

Revision History:

Date        Revision  Changes

14-02-2002  1.0.0     DRAFT release

17-02-2002  1.0.1     Changes to some URL links
                      Added revision history

20-02-2002  1.0.2     Changes to some URL links
                      Cosmetic changes
                      Changed drbd installation section (thanks to Robert Heaton)

22-02-2002  1.0.3     Cosmetic changes
                      Changed some important details (thanks to Robert Heaton again)
                      Added automatic startup links and auto mounting the RAID-1 at boot time
                      Added "COMMENTS" section

05-03-2002  1.1.0     Public release
                      Cosmetic changes
                      Changed drbd set-up procedure

12-03-2002  1.1.1     Cosmetic and spelling changes (thanks to Robert Heaton again)
                      Added the availability of the mailing list (thanks to Steve Bovingdon (steve@bov.nu))
                      Added some comments

21-03-2002  1.2.1     Spelling changes (thanks to Robert Heaton and Paul Miller)
                      Major change in adding extra NICs (thanks to Kees Blokland (kees@blokland.net))
                      Changed document header, Author is now Initiator for I feel the document will be changed radically over time. :-)

09-08-2002  1.2.1     Ultramonkey changes download location

 

 

 

 


 

 

COMMENTS (some comments/results I received via e-mail)

 

 

Peter Werner (peter_a_werner@yahoo.com)

 

way cool!

 

that looks really good, congrats.

 

ifost is actually working on some high availability

stuff, using heartbeat and unison. basically, the

slave uses unison to keep a backup of the masters /etc

and /home, and heartbeat to detect if the master dies.

if the master does die, it copies over some files,

merges some files (like /home/e-smith/configuration )

reboots and comes up as the new master. not sure when

it will be done though. i will add a link to your

document on my page when you release it.

 

great stuff!

 

cheers

-pete

 


 

Robert Heaton (rob@northwestlinux.co.uk)

 

Hi!

 

I've followed the How-To and all is working!

 

Instead of using a Cross-over Ethernet Cable I used a 10BASE Hub, (So I can see the link working between the two machines)

So Now I have flashing lights all over the place!

 

 

I'm now up to: "We will mount it later at /home/e-smith/files/ibays/data/files by editing fstab."

I take it this will be done in the next update to the how-to??

 

Kind regards,

Rob.

 


 

Jeff C (jcoleman_AT_rstrat.com)

 

Bravo!  Congratulations.

 

-jeff

 


 

Judy Morgann (judymorgann@yahoo.com)

 

Hi,

this is the best news i have ever seen here! very great job!!! i think this is the most important function e-smith lacks of. e-smith is a greate server distribution but with your howto it can take place in enterprise.
taking over other services like apache, imap works greate too.
some "problem" i try to figure out is how to replicate the user profiles from an e-smith pdc.
we have some e-smith servers running as pdc for our win2k/xp clients. but if one fails user can't log in to another server cause of missing the user and machine accounts.
i try to copy smbpasswd and MACHINE.SID from one server to the others via rsync, but clients can't logon.
if someone has a idea it would be very nice.

thanks again for your greate work!!!

judy

 


 

Alan Robertson (alanr@unix.sh) http://www.linux-ha.org

 

Using Phillip's DRBD software together with heartbeat is a very powerful

combination with no single points of failure.  This is something many

commercial high-end HA systems don't provide.

 

       -- Alan Robertson

          alanr@unix.sh

 
