Backup

This section covers:

  1. Full system backup
  2. User account backup
  3. Other considerations
    Media integrity
    Data safety
     

Put SGE and MPICH together under Ubuntu 9.10

This turned out to be more difficult than I expected. Although both SGE and MPICH are available from the Ubuntu 9.10 repository and each works well on its own, SGE does not start MPI jobs properly.

Two issues are involved:

1) The MPICH installation provided by the Ubuntu repository is not placed in a directory shared among all nodes.

2) SGE is not aware of the existence of MPI; one needs SGE startmpi and stopmpi scripts for the Parallel Environment.

This post therefore describes reinstalling MPICH and SGE under /home, which is assumed to be a directory shared by all nodes.
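The second issue comes down to defining a Parallel Environment (PE) whose start and stop hooks call those scripts. As a rough sketch only, and not necessarily the configuration used on this cluster, the function below prints a PE definition in the format that qconf reads. The PE name "mpich", the slot count, and the script location /home/sge/mpi are all assumptions made for illustration.

```shell
# Print a hypothetical SGE parallel environment definition to stdout.
# Assumptions: PE name "mpich", 8 slots, and startmpi.sh/stopmpi.sh
# copied to /home/sge/mpi (a made-up location on the shared /home).
write_pe_config() {
    # Quoted delimiter: $pe_hostfile and $round_robin are SGE
    # placeholders and must reach the file unexpanded.
    cat <<'EOF'
pe_name            mpich
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /home/sge/mpi/startmpi.sh $pe_hostfile
stop_proc_args     /home/sge/mpi/stopmpi.sh
allocation_rule    $round_robin
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
EOF
}

# Typical use (needs SGE manager rights, so left commented out):
#   write_pe_config > mpich.pe && qconf -Ap mpich.pe
```

The PE also has to be listed in a queue's pe_list before jobs can request it with "qsub -pe mpich N".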

Install Sun Grid Engine (as Ubuntu package)

Sun Grid Engine (SGE) is used as the job scheduler for this experimental cluster.

CPU slots on nodes <-> SGE queues <-> Users

Find top 10 memory or CPU consuming jobs

Find the top 10 memory-consuming processes:

ps auxf | sort -nr -k 4 | head -10

 

Find the top 10 CPU-consuming processes:

ps auxf | sort -nr -k 3 | head -10
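One quirk of the two pipelines above is that the ps header line is sorted along with the data, so it usually drops out of the first ten lines. A small sketch of a helper that keeps the header and sorts only the rows; the name top_by_column is made up here and is not part of any package:

```shell
# top_by_column COL [N]: read `ps aux`-style text on stdin, print the
# header line unchanged, then the N rows (default 10) with the largest
# numeric value in column COL.
top_by_column() {
    col=$1
    n=${2:-10}
    # The first line of input is the header; keep it out of the sort.
    IFS= read -r header
    printf '%s\n' "$header"
    # sort and head consume the remaining lines of stdin.
    sort -nr -k "$col,$col" | head -n "$n"
}

# Usage (column 4 is %MEM, column 3 is %CPU in `ps aux` output):
#   ps aux | top_by_column 4 10
```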

Install MPICH (as Ubuntu package)

1. Install MPICH on master node:

sudo aptitude install mpich-bin mpich-mpd-bin mpich-shmem-bin libmpich1.0-dev

 

2. Sample MPI program:

Passwordless ssh among nodes

Do this on the master node, from your home directory.

If you don't have .ssh/id_rsa.pub, run

ssh-keygen -t rsa

 

Then (or if you already have .ssh/id_rsa.pub):

ssh-copy-id -i ~/.ssh/id_rsa.pub node01
ssh node01
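With more than one slave node, the copy step can be looped. A minimal sketch, assuming the nodes are named node01 and node02 (only node01 appears above; node02 is a guess) and using a made-up COPY_CMD variable so the loop can be tried as a dry run first:

```shell
# Hypothetical node list; adjust to the real host names.
NODES="node01 node02"
# Command used to install the key; set COPY_CMD=echo for a dry run.
COPY_CMD=${COPY_CMD:-ssh-copy-id}

copy_key_to_nodes() {
    for node in $NODES; do
        # Each real call asks for the password on that node one last time.
        $COPY_CMD -i "$HOME/.ssh/id_rsa.pub" "$node" || return 1
    done
}

# Run it with:  copy_key_to_nodes
```

Afterwards, "ssh -o BatchMode=yes node01 true" is a quick way to confirm that login works without a password prompt, since BatchMode makes ssh fail instead of asking.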

 

To give new accounts passwordless ssh

Install compilers

Before installing MPICH, the parallel programming toolkit, let's make sure the basic compilers are installed. In a scientific computing setting, these compilers are essential:

gcc, gfortran, and g77

Although gfortran can compile most Fortran 77 code, it is sometimes desirable to have a dedicated Fortran 77 compiler.

This command installs gcc

sudo apt-get install build-essential linux-headers-`uname -r`
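After an install, it is worth confirming which compilers actually ended up on the PATH. A small sketch (the helper name check_tools is made up here); it relies only on the standard "command -v" shell builtin:

```shell
# check_tools NAME...: report for each name whether a command of that
# name is on the PATH. Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the compilers named in this section:
#   check_tools gcc gfortran g77
```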

This command installs gfortran and f77

Set up NIS user authentication

On a cluster, user authentication and account information should be synchronized between the master and slave nodes. A simple solution is to set up an NIS server on the master and an NIS client on each slave.

 

To install the NIS server on the master node:

Set up Network File System

The master node will export /home over NFS to all slave nodes.

On master node:

sudo aptitude install nfs-kernel-server

Then edit the file /etc/exports and add this line:
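The exact entry depends on how the cluster's network is addressed. As a sketch only, assuming a hypothetical private subnet 192.168.1.0/24 for the slave nodes and a commonly used set of options (rw, sync, no_subtree_check), an entry can be composed like this; the helper name exports_line is made up for illustration:

```shell
# Build one /etc/exports entry: <directory> <client>(<options>).
# The options are standard exports(5) options; whether they match
# this cluster's actual entry is an assumption.
exports_line() {
    printf '%s %s(rw,sync,no_subtree_check)\n' "$1" "$2"
}

# 192.168.1.0/24 is a hypothetical subnet, not taken from this setup.
exports_line /home 192.168.1.0/24
```

After /etc/exports has been edited, "sudo exportfs -ra" makes the NFS server reread it.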

Install OS on slave nodes

The two slave nodes follow the same installation procedure, differing only in node name and IP address.
