Welcome to the FAQ page of the Linux cluster of the QBMI.




What is the configuration of the cluster?
The cluster consists of 17 dual-processor AMD nodes (the head node, schroedinger, and 16 slave nodes) and two Dell PC nodes (the login and NFS node, neumann, and the backup node, heisenberg). Sixteen of the AMD nodes have two AMD 2800+ processors each; the remaining one (node 14) has two AMD 2400+ processors. Neumann is a dual Xeon 2.66 GHz PC and heisenberg is a single-processor Celeron 1.7 GHz PC. All nodes except heisenberg have 1 GB of memory. Neumann has two 250 GB disks carrying three software RAID 1 (fully redundant) arrays, mounted as the /, /home and /opt/programs file systems:

/dev/md1 /
/dev/md0 /home
/dev/md2 /opt/programs

Neumann, schroedinger and heisenberg have Gigabit Ethernet cards and are all connected to a Gigabit Ethernet switch, which is in turn connected to a Gigabit Ethernet switch of the Computer Center. In addition, neumann and heisenberg each have a second Gigabit Ethernet card; these connect the two machines directly through a crossover cable. The home directories on neumann are mounted on heisenberg via NFS over that direct Gigabit link.
The home directories and the /opt/programs directory, both physically on the disks in neumann, are mounted via NFS over the internal Fast Ethernet network on the head node and all 16 slave nodes.
Each slave node has two 100 megabit Fast Ethernet cards and two 40 GB hard disks, which are used mainly for the /scratch directory (70 GB; software RAID 0).
The user node is a very fast dual Xeon 2.66 GHz node with high I/O bandwidth. Its video card is "nVidia, QuadroFX 500, 128MB, dual monitor VGA or DVI/VGA capable". Users must connect to the cluster through this node (neumann.cem.msu.edu), not through the head node (schroedinger).
The backup node is a Celeron 1.7 GHz node with 256 MB of memory and a 120 GB disk. It is dedicated to making backups on four 240 GB USB2 disks and to generating web pages. Users cannot log in to the backup node.

How can I get an account?
Contact Paul Reed.

How can I log in to the cluster?
Only ssh, scp and sftp are allowed. Example commands are as follows:

To log in:
ssh neumann.cem.msu.edu

To connect by sftp:
sftp neumann.cem.msu.edu

To copy a file using scp:

To copy the file filename from the cluster to the /home/yourdirectory directory on yourcomputer.cem.msu.edu:

scp filename yourcomputer.cem.msu.edu:/home/yourdirectory

To copy the file filename from the /home/yourdirectory directory on yourcomputer.cem.msu.edu to the current directory on the cluster:

scp yourcomputer.cem.msu.edu:/home/yourdirectory/filename .

Can I log in directly to a slave node from my computer?
No. You can log in directly only to the user node; from there you can log in to any slave node using ssh. The hostnames of the slave nodes are as follows:
c1-2
c1-3
....
c1-17

E.g. if you want to log in to c1-17, type

ssh c1-17
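
The slave node names follow a regular pattern, so they can be generated rather than typed. The following is a minimal sketch in POSIX sh (not csh), assuming the nodes are numbered contiguously from c1-2 to c1-17 as the list above suggests; the function name slave_nodes is made up for illustration.

```shell
#!/bin/sh
# Print the slave node hostnames, one per line.
# Assumption: the names run contiguously from c1-2 to c1-17.
slave_nodes() {
    i=2
    while [ "$i" -le 17 ]; do
        echo "c1-$i"
        i=$((i + 1))
    done
}
```

Once you are logged in to neumann, a loop such as `for n in $(slave_nodes); do ssh "$n" uptime; done` would then visit each slave node in turn.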

What is the operating system of the cluster?
Red Hat 9 Linux is installed on all 19 nodes.

Is there a queuing system on the cluster?
No. You can start your job manually on a slave node.

How do I know on which slave node I can start a job?
Click here to display the load averages of all the nodes, and choose the slave node with the lowest load average. If no slave node has a load average of about zero or one, you have to wait: a slave node can run only two single-processor jobs or one two-processor job at a time.
You can also click here to see who is running jobs on each slave node. When you find an available slave node, log in to it using ssh and start your job there.
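
If you prefer to pick the node from the command line, the selection rule above can be sketched as a small filter. This is only an illustration in POSIX sh: the two-column "hostname load" input format, the function name least_loaded and the default limit of 1.5 (one reading of "about zero or one") are all assumptions, not something the cluster provides.

```shell
#!/bin/sh
# Read "hostname load" pairs on stdin and print the least loaded host.
# Prints nothing if even the least loaded host is above the limit.
# Assumptions: the input format and the default limit of 1.5 are hypothetical.
least_loaded() {
    limit=${1:-1.5}
    awk -v limit="$limit" '
        NF >= 2 && (best == "" || $2 + 0 < min) { best = $1; min = $2 + 0 }
        END { if (best != "" && min <= limit + 0) print best }
    '
}
```

An empty result means that every node is busy and you have to wait. How the pairs are collected is left open here; the load page mentioned above remains the reference.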

Can I run a job on the head node (schroedinger) or neumann or heisenberg?
No. Jobs must not be run on the head node (this policy will change very soon), on neumann, or on the backup node. These three nodes are not meant for running jobs, and any job found running on them will be killed. The exception is visualization programs (molden, nmr draw, etc.), which you can run on neumann.

Can I run my job in parallel?
Yes, but only some jobs. There are two ways to run a job in parallel on a Linux cluster. Some programs (e.g. Gaussian) can run in parallel on a computer with shared memory. For us this means that a Gaussian job can run in parallel on the two processors of a single slave node. Gaussian was tested running in parallel on two processors sharing memory; see the discussion below.
Alternatively, some programs (e.g. Amber) can run across several slave nodes using MPI or another parallel environment (e.g. Linda for Gaussian; Linda was not purchased). Keep in mind that when a job runs in parallel using MPI (or Linda), the internal network can become the bottleneck of the system. The cluster's internal network runs at only 100 megabits per second, which might not be enough for efficient use of MPI.
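
To get a feeling for what 100 megabits per second means, here is a back-of-the-envelope sketch in POSIX sh. It computes the ideal transfer time only, ignoring latency and protocol overhead, so real figures will be worse; the function name transfer_seconds is made up for illustration.

```shell
#!/bin/sh
# Ideal time, in whole seconds (rounded down), to move size_mb megabytes
# over a link of link_mbit megabits per second.
# 1 megabyte = 8 megabits; latency and protocol overhead are ignored.
transfer_seconds() {
    size_mb=$1
    link_mbit=$2
    echo $(( size_mb * 8 / link_mbit ))
}
```

For example, `transfer_seconds 1000 100` gives 80 seconds for 1000 MB over the internal Fast Ethernet network, against 8 seconds for the same data over a Gigabit link.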

Should I compare the results obtained with the single-processor and MPI versions of the same program?
Yes. When you run a program in parallel, also run the corresponding single-processor code for a few input files and compare the results.

Do you back up the hard disks?
The home directories are backed up once every eight days. Each backup is kept on a USB2 disk for one month. The home directories are stored on a fully redundant software RAID (RAID 1), so if one of the disks fails no data will be lost. However, RAID 1 does NOT replace backups: if you remove a file by mistake, it can be recovered ONLY from a backup.

Is there any size quota on my files?
No. However, the disk space is limited, and so a file size quota might be imposed in the future.

Where can I keep my TEMPORARY files when I am running a program (e.g. Gaussian) on a slave node?
Each slave node has a 70 GB filesystem (/scratch) for that purpose. Do not save your TEMPORARY files in your home directory. That directory is on a filesystem mounted via NFS, so all the data would travel over the Ethernet network and the whole cluster would slow down. Moreover, a program such as Gaussian would crash, since it has no I/O buffer.

Where can I keep my OUTPUT (PERMANENT) files when I am running a program on a slave node?
Keep those files in your home directory (or a directory which is a subdirectory of your home directory). Output files are usually small and they will not slow down the cluster.
If your output files are huge, then keep them in the /scratch filesystem on the slave node.
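
The two answers above suggest a simple pattern: work in /scratch, then copy only the small results home. A minimal sketch in POSIX sh follows. The per-user layout /scratch/<user>/<job>, the function name make_scratch and the SCRATCH_ROOT override are assumptions made for illustration, not a convention of the cluster.

```shell
#!/bin/sh
# Create a private working directory under /scratch and print its path.
# Assumptions: the layout /scratch/<user>/<job> is only a suggestion, and
# SCRATCH_ROOT is a hypothetical override for trying the function elsewhere.
make_scratch() {
    job=$1
    root=${SCRATCH_ROOT:-/scratch}
    user=${USER:-$(id -un)}
    dir="$root/$user/$job"
    mkdir -p "$dir" && echo "$dir"
}
```

A job would then change into the printed directory, run there, copy its small output files back to the home directory, and remove the directory when it is done.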

Are any compilers installed?
Yes, the Intel C++ (icc) and Fortran (ifc) compilers are installed on neumann and schroedinger. You have to add the lines which are available here to your .cshrc or .login file.
The documentation is available here: Fortran, C++

Where are the programs installed?
Programs which are meant to be run on slave nodes are installed in /opt/programs. The corresponding file system resides on the neumann disks and is mounted via NFS on the head node and all the slave nodes.
Visualization programs are installed in /usr/local on neumann. That directory is not exported.

Where can I install a program?
You can install a program in your home directory or in a subdirectory of your home directory. If you need to install a program in /opt/programs, please contact Paul Reed.

I have a program and do not know how to install it. What should I do?
Please contact Paul Reed.

Could you tell us which programs are installed on the cluster?
As of November 7, 2002 there are four program packages installed on the cluster.

Gaussian 98, rel. A11

Dock
  • Dock 5, rel. 5.1.0 (Dock 5 pdf manual).
  • The current release of Dock 5 (which is installed on the cluster) does not work.
  • Dock 4, rel. 4.02 (Dock 4 pdf manual).

Amber 7

Mopac 509mn

Do I have to modify my .cshrc file to be able to run Gaussian 98?
You have to add the following lines to your .cshrc or .login file:

setenv g98root /opt/programs
setenv GAUSSARC $g98root/g98/arch/archive.arc
setenv GAUSS_SCRDIR /local
source $g98root/g98/bsd/g98.login

Since the g98 binaries were compiled on a computer running Red Hat 7.3, they will not run in parallel (%NPROC=2) on a computer running Red Hat 9, because Red Hat changed the threading implementation. However, the code will run in single-processor mode. The "%NPROC" command MUST NOT appear in the input file.
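
Given that restriction, it can be handy to check an input file before starting the job. The following is a minimal sketch in POSIX sh; the function name has_no_nproc is made up for illustration, and the match ignores case and leading blanks simply to be on the safe side.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given Gaussian input file contains
# no line starting with %NPROC; fail otherwise.
# The match ignores case and leading whitespace as a precaution.
has_no_nproc() {
    ! grep -qi '^[[:space:]]*%nproc' "$1"
}
```

For example, `has_no_nproc job.com || echo "remove the %NPROC line first"`, where job.com stands for your own input file.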

And what about Dock 4?
You can add the following line to your .cshrc or .login file:

set path = (/opt/programs/DOCK_4.0.2/bin $path)

And what about Amber 7?
You can add the following lines to your .cshrc or .login file:

set path = (/opt/programs/amber7-RH9/exe $path)
setenv AMBERHOME /opt/programs/amber7-RH9

for the single-processor code, or

set path = (/opt/programs/amber7-RH9-MPICH/exe $path)
set path = (/opt/programs/mpich-1.2.5/bin $path)
setenv AMBERHOME /opt/programs/amber7-RH9-MPICH

for the parallel (MPICH) code.

Could you tell us how to run Amber 7 on a single processor?
To run Amber 7 on a single processor, use the executables in /opt/programs/amber7-RH9/exe (e.g. sander, gibbs, ...).

And how can I run Amber 7 in parallel?
Only sander, gibbs and roar are parallel programs. Use the corresponding executables in /opt/programs/amber7-RH9-MPICH/exe. The parallel code runs under MPI. It is possible to run the parallel executables across several slave nodes (the number of processors must be a power of 2 and no greater than 128); however, the network is too slow for parallel programs to run efficiently on two or more slave nodes. It is therefore strongly recommended to run parallel executables only on the two processors of a single slave node. You can see the results of tests here. To run e.g. sander in parallel on the two processors of one slave node, use the following command:

mpirun -machinefile machines.LINUX -np 2 /opt/programs/amber7-RH9-MPICH/exe/sander

and add the other sander options.
The machines.LINUX file contains the hostname of the machine on which sander will run on two processors. That hostname must be the hostname of the slave node on which sander is started.
E.g. if you want to run sander in parallel on compute-1-7, log in to that slave node and start sander there. The machines.LINUX file has to contain the following two lines:

compute-1-7
compute-1-7

The hostname must appear twice, each time on a separate line, in the machines.LINUX file in order for sander to run on two processors.
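
The machines.LINUX file can of course be written by hand. If you would rather generate it, here is a minimal sketch in POSIX sh; the function name write_machinefile is made up for illustration.

```shell
#!/bin/sh
# Write a machinefile that lists one host once per process.
# Arguments: hostname, number of processes, output file
# (the output file defaults to machines.LINUX in the current directory).
write_machinefile() {
    host=$1
    nproc=$2
    out=${3:-machines.LINUX}
    : > "$out"                      # create or empty the file
    i=0
    while [ "$i" -lt "$nproc" ]; do
        echo "$host" >> "$out"
        i=$((i + 1))
    done
}
```

For the example above, `write_machinefile compute-1-7 2` produces the two-line file in the current directory.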

Could you tell us how to run Mopac?
There are two executables of the Mopac program. Both were compiled using the Portland Group Fortran compiler (pgf77); one was compiled with optimization and the other without. The optimized and unoptimized executables are stored in /opt/programs/mopac509mn/bin and /opt/programs/mopac509mn-NO-OPT/bin, respectively. The optimized code fails to reproduce the correct results for the test9 input file, whereas the unoptimized code reproduces them. When Mopac was compiled with g77, the optimized code also failed to provide the correct results for the test9 input file.

Should I use the optimized or unoptimized Mopac code?
The optimized code is faster and provides the correct results for 13 out of 14 test files. So, in most cases it is better to use the optimized code. You can click here to see the test9 input file.

Do I have to modify my .cshrc file to be able to run Mopac?
You can add the following line to your .cshrc or .login file:

set path = (/opt/programs/mopac509mn/bin $path)

to run the optimized code, or

set path = (/opt/programs/mopac509mn-NO-OPT/bin $path)

to run the unoptimized code.
Then type mopac to run the Mopac program.

Has anybody compared how fast programs run on the cluster?
Yes. Click here to see the results.

Can I run a Gaussian job in parallel?
Yes, but only some jobs, and only on a single slave node using its two processors (use %NPROC=2). You cannot run a Gaussian job across several slave nodes. It makes sense to run only HF and DFT jobs in parallel; post-HF jobs will not run in parallel. See the results of the tests. Click here to read comments from Mike Frisch regarding running Gaussian jobs in parallel.