This section covers:

  1. Full system backup
  2. User account backup
  3. Other considerations
    Media integrity
    Data safety

Put SGE and MPICH together under Ubuntu 9.10

This turned out to be more difficult than I thought. Although SGE and MPICH are both provided by the Ubuntu 9.10 repository, and each works well on its own, SGE does not start MPI jobs properly.

Two issues are involved:

1) The MPICH installation provided by the Ubuntu repository is not placed in a directory shared among all nodes.

2) SGE is not aware of the existence of MPI; the Parallel Environment needs SGE startmpi and stopmpi scripts.

So this post describes reinstalling MPICH and SGE under /home, which is assumed to be a directory shared by all nodes.

Install Sun Grid Engine (as Ubuntu package)

Sun Grid Engine (SGE) is used as the job scheduler for this experimental cluster.

CPU slots on nodes <-> SGE queues <-> Users

Find top 10 memory or CPU consuming jobs

Find the top 10 memory-consuming processes:

ps auxf | sort -nr -k 4 | head -10


Find the top 10 CPU-consuming processes:

ps auxf | sort -nr -k 3 | head -10
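As an alternative sketch, the procps version of ps shipped with Ubuntu can do the sorting itself through its --sort option, which keeps the header line at the top. The function name top_procs is hypothetical, chosen only for this example; the column list is one reasonable choice, not the only one.

```shell
#!/bin/sh
# top_procs KEY N: print the header plus the N processes with the
# largest value of KEY (for example %mem or %cpu).
# Assumes procps ps, which supports --sort; busybox ps does not.
top_procs() {
    key=$1
    count=$2
    ps -eo pid,user,%cpu,%mem,comm --sort="-$key" | head -n "$((count + 1))"
}

# Top 10 by memory, then top 10 by CPU.
top_procs %mem 10
top_procs %cpu 10
```

Selecting the columns explicitly with -o avoids having to remember that memory is column 4 and CPU is column 3 of the aux output.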

Install MPICH (as Ubuntu package)

1. Install MPICH on master node:

sudo aptitude install mpich-bin mpich-mpd-bin mpich-shmem-bin libmpich1.0-dev


2. Sample mpi program:

Passwordless ssh among nodes

Do this on the master node, in your home directory.

If you don't have .ssh/, run

ssh-keygen -t rsa


Then (or if you already have .ssh/):

ssh-copy-id -i ~/.ssh/ node01
ssh node01
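
The two cases above (no key yet, or a key already present) can be folded into one small helper. This is only a sketch: the function name ensure_key is hypothetical, the empty passphrase is an assumption that may not suit every site, and ~/.ssh/id_rsa is the default location that ssh-keygen -t rsa uses when no path is given.

```shell
#!/bin/sh
# ensure_key PATH: create an RSA key pair at PATH unless one already exists.
# Assumption: an empty passphrase (-N "") is acceptable on this cluster.
ensure_key() {
    key=$1
    if [ -f "$key" ]; then
        echo "key already exists: $key"
    else
        mkdir -p "$(dirname "$key")"
        chmod 700 "$(dirname "$key")"
        ssh-keygen -q -t rsa -N "" -f "$key"
    fi
}

# Typical use on the master node (node01 is the host name used above):
#   ensure_key "$HOME/.ssh/id_rsa"
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" node01
#   ssh node01 hostname
```

Running the helper a second time leaves an existing key untouched, so it is safe to call before every ssh-copy-id.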


To give new accounts passwordless ssh

Install compilers

Before installing MPICH, the parallel programming toolkit, let's make sure we have the basic compilers installed. In a scientific computing setting, these compilers are essential:

gcc, gfortran, and g77

Although gfortran can compile most Fortran 77 code, it is sometimes desirable to have a full Fortran 77 compiler.
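
Before installing anything, it may help to see which of these compilers are already on the PATH. The check below is a minimal sketch using the standard command -v builtin; the function name missing_tools is hypothetical.

```shell
#!/bin/sh
# missing_tools NAME...: print each NAME that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools gcc gfortran g77)
if [ -z "$missing" ]; then
    echo "all compilers found"
else
    # Unquoted on purpose so the names print on one line.
    echo "missing compilers:" $missing
fi
```

The same function can be rerun after the installation steps to confirm that nothing is still missing.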

This command installs gcc

sudo apt-get install build-essential linux-headers-`uname -r`

This command installs gfortran and f77

Set up NIS user authentication

On a cluster, user authentication and account information should be synchronized between the master and slave nodes. A simple solution is to set up an NIS server on the master and an NIS client on each slave.


To install the NIS server on the master node:

Set up Network File System

The master node will export /home over NFS to all slave nodes.

On the master node:

sudo aptitude install nfs-kernel-server

Then edit the file /etc/exports and add this line:

Install OS on slave nodes

The two slave nodes follow the same installation procedure, except that each has its own node name and IP address.
