Welcome to the FAQ page of the Linux cluster of the QBMI.
What is the configuration of the cluster?
The cluster consists of 17 dual-processor AMD nodes (the head node, schroedinger,
and 16 slave nodes) and two Dell PC nodes (the login and nfs node, neumann, and the
backup node, heisenberg). Sixteen of the AMD nodes have 2800+ processors and one
(node 14) has 2400+ processors. The neumann and heisenberg nodes are
dual Xeon 2.66 GHz and single processor Celeron 1.7 GHz PCs, respectively.
All the nodes (except heisenberg) have 1 GB of operating memory. Neumann has two 250 GB
disks holding three software RAID 1 (fully redundant) arrays, which are
mounted as the /, /home and /opt/programs file systems.
/dev/md1 /
/dev/md0 /home
/dev/md2 /opt/programs
Neumann, schroedinger and heisenberg have Gigabit Ethernet cards, and they are all connected
to a Gigabit Ethernet switch, which in turn is connected to a Gigabit Ethernet switch of the
Computer Center. In addition, neumann and heisenberg each have a second Gigabit Ethernet card,
and the two are linked directly by a crossover cable. The home directories on neumann
are mounted on heisenberg via nfs over that Gigabit link.
The home directories as well as the directory /opt/programs which are physically on the
disks in neumann are mounted employing nfs over the Fast Ethernet internal network on the
head node and all 16 slave nodes.
Each slave node has two 100 megabit Fast Ethernet cards and two 40 GB hard disks
which are used mainly for the /scratch directory (70 GB; software RAID 0).
The user node is a very fast dual Xeon Processor 2.66GHz node having a
wide I/O bandwidth. Its video card is "nVidia, QuadroFX 500, 128MB, dual monitor VGA
or DVI/VGA capable".
The users must use this node (neumann.cem.msu.edu) to connect to the cluster,
not the head node (schroedinger).
The backup node is a Celeron 1.7 GHz node having 256 MB of operating memory and
a 120 GB disk. It is dedicated only to making backups on four 240 GB USB2 disks as well as
generating web pages. Users cannot log in to the backup node.
How can I get my account?
Contact Paul Reed.
How can I log in to the cluster?
Only ssh, scp and sftp are allowed. The example commands are as follows:
To log in:
ssh neumann.cem.msu.edu
To connect by sftp:
sftp neumann.cem.msu.edu
To copy a file using scp:
To copy the file filename from the cluster to the /home/yourdirectory directory
on yourcomputer.cem.msu.edu:
scp filename yourcomputer.cem.msu.edu:/home/yourdirectory
To copy the file filename from the /home/yourdirectory directory on
yourcomputer.cem.msu.edu to the current directory on the cluster:
scp yourcomputer.cem.msu.edu:/home/yourdirectory/filename .
Can I log in directly to a slave node from my computer?
No. You can log in directly only to the user node (neumann);
from there you can log in to any slave
node using ssh. The hostnames of the slave nodes are as follows:
c1-2
c1-3
....
c1-17
E.g. if you want to log in to c1-17, type
ssh c1-17
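The hostname pattern above can be generated instead of typed out. The following is a minimal sketch in sh, assuming the slave nodes are numbered contiguously from c1-2 to c1-17; the function name slave_nodes is made up for this example.

```shell
#!/bin/sh
# Print the slave-node hostnames c1-2 through c1-17, one per line.
# Assumes the numbering has no gaps (the list above elides the middle).
slave_nodes() {
    i=2
    while [ "$i" -le 17 ]; do
        echo "c1-$i"
        i=$((i + 1))
    done
}

slave_nodes
```

The output can feed a loop such as "for h in $(slave_nodes); do ...; done" when the same command has to be run on every slave node.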
What is the operating system of the cluster?
Red Hat 9 Linux is installed on all 19 nodes.
Is there a queuing system on the cluster?
No. You can start your job manually on a slave node.
How do I know on which slave node I can start a job?
Click here to display the average loads of all the nodes and choose the slave node
with the lowest load average. If no slave node has
a load average of about zero or one, you have to wait.
Only two single processor jobs or one double processor job
can run on a slave node.
You can also click here to see who is running jobs on each slave node.
When you find an available slave node, login to that node using ssh and start your job
on that slave node.
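If you prefer the command line to the web page, the choice can be sketched as a small filter. This is only an illustration: the function name pick_node, the "hostname load" input format and the default cutoff of 1.5 are assumptions made for this example, and the load values shown are invented.

```shell
#!/bin/sh
# Read "hostname load" pairs on stdin and print the host with the lowest
# load, provided that load is below the limit given as $1.
# The default limit of 1.5 is an assumed cutoff, not a cluster policy.
pick_node() {
    awk -v limit="${1:-1.5}" '
        NF >= 2 && (best == "" || $2 + 0 < min) { best = $1; min = $2 + 0 }
        END {
            if (best != "" && min < limit) print best
            else exit 1    # nothing suitable: the caller has to wait
        }'
}

# Example with invented load values; no node is contacted here.
printf '%s\n' "c1-2 1.98" "c1-3 0.03" "c1-4 1.01" | pick_node
```

On the cluster the input might be collected by running "ssh c1-3 cat /proc/loadavg" for each slave node and keeping the first field, assuming ssh between nodes works without a password prompt.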
Can I run a job on the head node (schroedinger) or neumann or heisenberg?
No. No job may run on the head node (this policy will change very soon), on neumann
or on the backup node. However, you can run visualization programs (molden, nmr draw, etc.) on neumann.
Any job found running on these three nodes will be killed; they
are not meant for running jobs.
Can I run my job in parallel?
Yes, but only some jobs. There are two ways to run a job in parallel on a Linux
cluster. Some programs (e.g. Gaussian) can run
in parallel on a computer with shared memory.
For us this means that a Gaussian job could run in parallel on the two processors of
the same slave node.
Gaussian was tested in parallel on two processors
sharing memory; see the discussion below.
On the other hand, some programs (e.g. Amber) can run across several slave nodes
using MPI or a different parallel environment (e.g. Linda for Gaussian;
Linda was not purchased). Keep in mind that when a job runs in parallel using MPI
(or Linda), the internal network can become the bottleneck of the system.
The cluster's internal network runs at only 100 megabits per second,
which might not be enough for efficient use of MPI on the cluster.
Should I compare the results obtained by the single and MPI codes
of the same program?
Yes. When you run a program in parallel, also run the corresponding
single processor code on a few input files and compare the results.
Do you back up the hard disks?
Regular backups of the home directories are made once every eight days.
Each backup is kept on a USB2 disk for 1 month. The home directories are stored on a fully
redundant software RAID (RAID 1), so if one of the disks fails there will be no loss of data.
On the other hand, RAID 1 does NOT replace backups: if you remove a file by mistake,
it can be recovered ONLY from a backup.
Is there any size quota on my files?
No. However, the disk space is limited, and so a file size quota might be imposed in the future.
Where can I keep my TEMPORARY files when I am running a program (e.g. Gaussian) on a slave node?
Each slave node has a 70 GB filesystem (/scratch) for that purpose.
Do not save your TEMPORARY files in your home directory.
That directory is on a filesystem which is mounted via nfs. This means that all the data would
go over the Ethernet network and the speed of the cluster would be degraded. Moreover,
e.g. Gaussian would crash since it has no I/O buffer.
Where can I keep my OUTPUT (PERMANENT) files when I am running a program on a slave node?
Keep those files in your home directory (or a directory which is a subdirectory of your
home directory). Output files are usually small and they will not slow down the cluster.
If your output files are huge, then keep them in the /scratch filesystem on the slave
node.
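One possible way to follow both rules (temporary files in /scratch, permanent results in your home directory) is sketched below in sh. The function name run_in_scratch, the SCRATCH_ROOT variable and the per-user directory layout are all assumptions made for this example; nothing like it is installed on the cluster.

```shell
#!/bin/sh
# Run a command inside a private directory under the scratch filesystem,
# then copy one named result file to a destination directory.
# usage: run_in_scratch RESULT_FILE DEST_DIR COMMAND [ARGS...]
run_in_scratch() {
    result=$1; dest=$2; shift 2
    # SCRATCH_ROOT defaults to /scratch; the user/job.PID subdirectory
    # is a naming convention assumed for this sketch.
    workdir="${SCRATCH_ROOT:-/scratch}/$(id -un)/job.$$"
    mkdir -p "$workdir" "$dest" || return 1
    # Run the command with the scratch directory as its working directory.
    ( cd "$workdir" && "$@" ) || return 1
    # Keep only the requested result; everything else is discarded.
    cp "$workdir/$result" "$dest/" || return 1
    rm -rf "$workdir"
}
```

A call such as "run_in_scratch job.log $HOME/results myprogram input" would run the hypothetical myprogram in scratch and leave only job.log under your home directory. If the command fails, the scratch directory is left in place so that the files can be inspected.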
Could you tell us whether some compilers were installed?
Yes, the Intel C++ (icc) and Fortran (ifc) compilers are installed on neumann and schroedinger.
You have to add to your .cshrc or .login file the lines which are available here.
The documentation is available here: Fortran, C++
Where are the programs installed?
Programs which are meant to be run on slave nodes are installed in /opt/programs.
The corresponding file system resides on the neumann disks and is mounted employing
nfs on the head node and all the slave nodes.
Visualization programs are installed in /usr/local on neumann. That directory is not exported.
Where can I install a program?
You can install a program in your home directory or in a subdirectory of your home
directory. If you need to install a program in /opt/programs, please contact
Paul Reed.
I have a program and do not know how to install it. What should I do?
Please contact Paul Reed.
Could you tell us which programs are installed on the cluster?
As of November 7, 2002 there are four program packages installed on the cluster.
Gaussian 98, rel. A11
Dock
Dock 5, rel. 5.1.0 (Dock 5 pdf manual).
The current release of Dock 5 (which is installed on the cluster) does not work.
Dock 4, rel. 4.02 (Dock 4 pdf manual).
Amber 7
Mopac 509mn
Do I have to modify my .cshrc file to be able to run Gaussian 98?
You have to add into your .cshrc or .login file the lines as follows:
setenv g98root /opt/programs
setenv GAUSSARC $g98root/g98/arch/archive.arc
setenv GAUSS_SCRDIR /local
source $g98root/g98/bsd/g98.login
Since the g98 binaries were compiled on a computer with Red Hat 7.3, they will not run
in parallel (%NPROC=2) on a computer with Red Hat 9, because the threading was changed
in Red Hat 9. However, the code will run in single processor mode.
The "%NPROC" command MUST NOT appear in the input file.
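A quick check for a stray %NPROC line before a job is started could look like the sketch below, written in sh. The function name has_nproc is made up for this example, and matching the keyword case-insensitively with optional leading blanks is an assumption about how the input file is written.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given Gaussian input file contains a
# line starting with %NPROC; fail otherwise.
# Case-insensitive matching and leading blanks are assumptions.
has_nproc() {
    grep -qi '^[[:space:]]*%nproc' "$1"
}
```

For example, "if has_nproc job.com; then echo 'remove the %NPROC line'; fi" prints a warning for a hypothetical input file job.com that still contains the command.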
And what about Dock 4?
You can add into your .cshrc or .login file the line as follows:
set path = (/opt/programs/DOCK_4.0.2/bin $path)
And what about Amber 7?
You can add into your .cshrc or .login file the lines as follows
set path = (/opt/programs/amber7-RH9/exe $path)
setenv AMBERHOME /opt/programs/amber7-RH9
for the single processor code or
set path = (/opt/programs/amber7-RH9-MPICH/exe $path)
set path = (/opt/programs/mpich-1.2.5/bin $path)
setenv AMBERHOME /opt/programs/amber7-RH9-MPICH
for the parallel (MPICH) code
Could you tell us how to run Amber 7 on a single processor?
To run Amber 7 on a single processor, use the executables
in /opt/programs/amber7-RH9/exe (e.g. sander, gibbs, ...).
And how can I run Amber 7 in parallel?
Only sander, gibbs and roar are parallel programs. Use the corresponding executables
in /opt/programs/amber7-RH9-MPICH/exe. The parallel code runs under MPI. It is possible to
run the parallel executables across several slave nodes
(the number of processors must be a power of 2, and no greater than 128);
however, the network is too slow for parallel
programs to run efficiently on two or more slave nodes.
It is therefore strongly recommended to run parallel executables
only on the two processors of a single slave node.
You can see the results of tests here.
To run e.g. sander in parallel on two processors of the same slave node,
use the command as follows:
mpirun -machinefile machines.LINUX -np 2 /opt/programs/amber7-RH9-MPICH/exe/sander
and add other sander options.
The machines.LINUX file contains the hostname of the machine on which sander will
run on two processors. The hostname must be the same as the hostname of the slave node on which
sander is started.
E.g. if you want to run sander in parallel on compute-1-7 then login to that slave node
and start sander on that slave node.
The machines.LINUX file has to contain two lines as follows:
compute-1-7
compute-1-7
The hostname must appear twice, each time on a separate line, in the machines.LINUX
file in order for sander to run on two processors.
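The two-line file described above can also be generated. The following is a minimal sketch in sh; the function name machines_lines is made up for this example.

```shell
#!/bin/sh
# Print HOST once per processor, one entry per line.
# usage: machines_lines HOST NPROCS
machines_lines() {
    n=0
    while [ "$n" -lt "$2" ]; do
        echo "$1"
        n=$((n + 1))
    done
}

# Example: the two entries needed for compute-1-7.
machines_lines compute-1-7 2
```

On the slave node itself, "machines_lines $(uname -n) 2 > machines.LINUX" would write the file, assuming uname -n reports the same hostname that MPICH expects for that node.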
Could you tell us how to run Mopac?
There are two executable files of the Mopac program. Both files were compiled using
the Portland group Fortran compiler (pgf77). The difference between the files
is that one was compiled with and the other one without an optimization.
The former and latter files are stored in
/opt/programs/mopac509mn/bin and /opt/programs/mopac509mn-NO-OPT/bin, respectively.
The optimized mopac code fails to reproduce the correct results for the test9 input file.
The unoptimized mopac code provides the correct results for the test9 input file.
When the mopac program was compiled with g77, the optimized code also failed to
provide the correct results for the test9 input file.
Should I use the optimized or unoptimized Mopac code?
The optimized code is faster and it provides the correct results for 13 out of 14 test
files. So, in most cases it is better to use the optimized code.
You can click here to see the test9 input file.
Do I have to modify my .cshrc file to be able to run Mopac?
You can add into your .cshrc or .login file the line as follows:
set path = (/opt/programs/mopac509mn/bin $path)
to run the optimized code or
set path = (/opt/programs/mopac509mn-NO-OPT/bin $path)
to run the unoptimized code.
Then type mopac to run the Mopac program.
Did somebody compare how fast programs run on the cluster?
Yes. Click here to see the results.
Can I run a Gaussian job in parallel?
Yes, but only some jobs, and only on one slave node using the two processors
of that slave node (use %NPROC=2). You cannot run a Gaussian job
across several slave nodes. It makes sense to run only HF and
DFT jobs in parallel. Post-HF jobs will not run in parallel.
See the results of the tests.
Click here to read comments from Mike Frisch regarding running Gaussian
jobs in parallel.