Computer Systems Design (COMP6281) Lab Tutorial

Note:
I put up this web page to help you use the cluster lab so that you can complete your assignments. I will generally answer your questions on this page instead of replying to each email individually.
The information on this page still needs to be validated. Contributions are welcome. ~ Lin ~
Electronic Assignment Submission URL:
https://mailhost.cs.concordia.ca/cgi-bin/e-as-sub.cgi
Please clean up your email account regularly so that it does not exceed your quota. I cannot send email to some of you because your accounts are over quota.

Content
1 About the Beowulf lab at Concordia
2 How to access the lab
3 MPI
4 PVM
5 Other useful resources
6 Questions & Answers

 

1 About the Beowulf lab at Concordia.

http://www.cs.concordia.ca/Beowulf/

http://www.scyld.com/support/docs/beowulf.html


2 How to access the lab

You need remote connection software that supports the SSH (Secure Shell) protocol to access the system.

Connecting from a Unix/Linux machine in the CS lab: Type "ssh -X guthlaf" and log in with the user name and password of your CS account. This works as long as you are registered in this course.
Connecting from a Windows2000 machine in the CS lab: You can use either "Telnet (Tera Term)" or "SSH Secure Shell Client"; both are available on the desktop of each Windows2000 machine.

Telnet:
(1) Open "Telnet". It starts with a dialog on top of a window.

(2) Make sure the following fields are filled in correctly:
~ Protocol: TCP/IP
~ Host: guthlaf
~ TCP port: 22
~ Service: SSH
Then click the "OK" button. A second dialog will appear.

(3) Enter your user name and passphrase, then click the "OK" button. You will be connected to the host.

SSH Secure Shell Client:
(1) Open "SSH Secure Shell Client". It starts with a window.
(2) Click the "Quick Connect" button. A dialog will appear.
(3) Fill in the form: host name (guthlaf), user name (your CS account user name), port number (22), then click the "Connect" button.
(4) Enter your password when prompted to complete the connection.

Connecting from a Unix/Linux machine connected to the Internet at home:

This is similar to connecting from a Unix/Linux machine in the lab, except that the host name should be guthlaf.cs.concordia.ca. That is, type:
ssh -X guthlaf.cs.concordia.ca

Connecting from a Windows95/98/2000/XP machine connected to the Internet at home:
(1) Download and install an SSH client on your machine.
(2) Follow the same instructions as above for connecting from a Windows2000 machine in the lab, except that the host name should be guthlaf.cs.concordia.ca

CS home directory

Your CS home directory is available under: /cshome/...
e.g.
Your Unix home directory: /cshome/grad/your_account or /cshome/ugrad/your_account
Your NT home directory: /cshome/nthome/your_account

SSH software download:

(1) Putty
http://www.oocities.org/wonlin/ine4481/putty.zip

(2) Tera Term Telnet
http://hp.vector.co.jp/authors/VA002416/teraterm.html

(3) SSH Secure Shell Client
http://www.ssh.com/support/downloads/secureshellwks/non-commercial.html


3 MPI

Resources:

MPICH home page:
http://www-unix.mcs.anl.gov/mpi/mpich/
MPI User guide:
http://www.cs.concordia.ca/Beowulf/mpich-1.2.2.3/guide.pdf
Scyld Beowulf MPICH Note:
http://www.cs.concordia.ca/Beowulf/mpich-1.2.2.3/README.SCYLD
Beginners Guide to MPI Message Passing Interface:
http://www.jicompsci.org/MPI/MPIguide/MPIguide.html

Quick instructions on compilation and execution:

Compilation:
mpicc -c foo.c // C program
mpif77 -c foo.f // Fortran77 program
mpiCC -c foo.C // C++ program
mpif90 -c foo.f // Fortran90 program
Linking:
mpicc -o foo foo.o // C program
mpif77 -o foo foo.o // Fortran77 program
mpiCC -o foo foo.o // C++ program
mpif90 -o foo foo.o // Fortran90 program
Combining compilation and linking:
mpicc -o foo foo.c // C program
mpif77 -o foo foo.f // Fortran77 program
mpiCC -o foo foo.C // C++ program
mpif90 -o foo foo.f // Fortran90 program

Execution:
mpirun -np 3 myexecutable
// run the program "myexecutable" on three processors

MPI sample:

//////////////////////////////////////////////////
// hello.c

#include <stdio.h>
#include <string.h>   /* strcpy */
#include "mpi.h"

int main(int argc, char **argv)
{
  int rank, size, tag, rc, i;
  MPI_Status status;
  char message[20];

  rc = MPI_Init(&argc, &argv);
  rc = MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
  rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process */
  tag = 7;
  if (rank == 0) {
    /* process 0 sends the greeting (12 characters plus '\0') to all others */
    strcpy(message, "Hello, world");
    for (i = 1; i < size; i++)
      rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
  }
  else
    rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
  printf("node %d : %.13s\n", rank, message);
  rc = MPI_Finalize();
  return 0;
}
////////////////////////////////////////////////

[li_wang@guthlaf mpi]$ mpicc -o hello hello.c
[li_wang@guthlaf mpi]$ ls
hello hello.c hello.o
[li_wang@guthlaf mpi]$ mpirun -np 3 hello
node 0 : Hello, world
node 1 : Hello, world
node 2 : Hello, world

[li_wang@guthlaf mpi]$


4 PVM

Resources:

PVM home page:
http://www.epm.ornl.gov/pvm/pvm_home.html
Scyld Beowulf PVM notes:
http://www.cs.concordia.ca/Beowulf/pvm-3.4.3/Readme.Scyld
PVM User guide:
http://www.netlib.org/pvm3/book/pvm-book.html

Quick instructions on compilation and execution:

(1) Compilation:
cc -o foo foo.c -lpvm3
or
gcc -o foo foo.c -lpvm3

(2) Copy your executable files to the directory:
~/pvm3/bin/BEOSCYLD

(3) Start PVM
Run the command:

pvm
You will see the PVM console:
pvm>

(4) Configure the runtime environment:
For example, to add a host to the virtual machine: add .1
To remove a host from the virtual machine: delete .1
How many hosts you add depends on how many your program needs.
Host names are: .0 .1 .2 .3 .4 .5 .6 .7 .8 .9
You can use the "conf" command to see the current configuration of the running PVM.

(5) Run your program
Use the "spawn" command:
pvm> spawn -> myprogram


An alternative way to compile is to use the "aimk" command together with a configuration file, "Makefile.aimk", which helps when you need to compile multiple files. See the PVM examples further down this page.

An alternative way of PVM configuration is to predefine your configuration in a host file, for example:
file name: hostfile.txt
------------------
.1
.2
.3
.4
.5
.6
.7
.8
.9
------------------
Use the command "pvm hostfile.txt" to start PVM. The virtual machine will be configured automatically according to the contents of the file.

PVM manual

PVM(1PVM) PVM Version 3.4 PVM(1PVM)

NAME
pvm - PVM version 3 console

SYNOPSIS
pvm [ -options ] [ hostfile ]

DESCRIPTION
Pvm is a stand alone PVM task which allows the user to interactively query and modify the virtual machine. The
console can be started and stopped multiple times on any of the hosts in the virtual machine without affecting PVM or any applications that may be running.

When started pvm determines if PVM is already running and if not automatically executes pvmd3 on this host, passing pvmd3 the command line options and host file. Thus PVM need not be running to start the console. Once started the console prints the prompt:
pvm>

The following console commands are available:
add hostname(s) - Add hosts to virtual machine
alias - Define/list command aliases
conf - List virtual machine configuration
delete hostname(s) - Delete hosts from virtual machine
echo - Echo arguments
export - Add environment variables to spawn export list
halt - Stop pvmds
help [command] - Print helpful information about a command
id - Print console task id
jobs - List running jobs
kill task-tid - Terminate tasks
mstat host-tid - Show status of hosts
ps -a - List all PVM tasks
pstat task-tid - Show status of tasks
quit - Exit console
reset - Kill all tasks
setenv - Display/set environment variables
sig signum task - Send signal to task
spawn [opt] a.out - Spawn task
  opts are:
  -(count) number of tasks, default is 1
  -(host) spawn on host, default is any
  -(ARCH) spawn on hosts of ARCH
  -? enable debugging
  -> redirect task output to console
  -> file redirect task output to file
  ->>file redirect task output append to file
trace - Set/display trace event mask
unexport - Remove environment variables from spawn export list
unalias - Undefine command alias
version - Show libpvm version

pvm reads $HOME/.pvmrc before reading commands from the tty, so it can be used to customize the console environment, for example:
alias ? help
alias j jobs
setenv PVM_EXPORT DISPLAY
# print my id
echo new pvm shell
id

EXAMPLES
pvm
Starts up pvmd3 on the local host or connects to running
pvmd3.

pvm hostfile
Starts up console and pvmd3, which in turn reads the host
file and adds the listed computers to the virtual machine.

PVM examples:

(1) Download the zipped examples file. Extract the files and copy them to the directory ~/pvm3/examples/
(2) Enter the directory: cd ~/pvm3/examples/
Then run the command "aimk hello hello_other". The example programs "hello.c" and "hello_other.c" will be compiled and the executable files will be moved to the directory "~/pvm3/bin/BEOSCYLD/" automatically.
You may want to read the configuration file "Makefile.aimk" and modify it to compile your own programs.
Alternatively, you can compile each program manually by following the compilation instructions given above.
(3) Run one executable file: hello
[guthlaf examples]$ pvm
pvm> add .1
add .1
1 successful
HOST DTID
.1 80000
pvm> spawn -> hello
spawn -> hello
[1]
1 successful
t80001
pvm> [1:t40002] EOF
[1:t80001] i am t80001
[1:t80001] from t40002: hello, world from .-1
[1:t80001] EOF
[1] finished


5 Other useful resources

"Make" tutorial http://www.opussoftware.com/tutorial/TutMakefile.htm
http://vertigo.hsrl.rutgers.edu/ug/make_help.html
http://www.eng.hawaii.edu/Tutor/Make/
C, C++ compiler tutorial http://galton.uchicago.edu/~gosset/Compdocs/gcc.html


6 Questions & Answers

Q1

For MPI, I always run into a problem when linking the library, although I have already specified the library path in the makefile. In the end, I copied my .cc file into the directory where the library files are, which bypassed the problem. Below is my makefile for the MPI program, stored in the current directory; the library file is libgd.a and is stored in the gimage directory.
// makefile
C++ = g++
PWD = /home/ugrad/dm_tang/asg
LIBDIR = $(PWD)/gimage
INCDIR = $(PWD)/gimage
LIBS = -lgd

TARGET = Convolute
OBJS = Convolute.o

$(TARGET): $(OBJS)
$(C++) -o $(TARGET) $(OBJS) $(LIBS) -I$(INCDIR) -L$(LIBDIR)

Convolute.o: Convolute.cc
$(C++) -c $(@:.o=.cc) $(LIBS) -I$(INCDIR) -L$(LIBDIR)

true:

//
Answer:
(1) You don't need to set the variable PWD.
(2) Copy the gimage package into a gimage directory inside your Convolute program directory.

Q2 A more serious problem is with PVM. I do not know what to set PVM_ROOT and PVM_ARCH to. I tried some paths, but they do not work, so I cannot even compile my sample files using "aimk". It would be very nice of you to give me some instructions on how to set up the environment for PVM.
Answer:
You don't need to set PVM_ROOT and PVM_ARCH. I guess you got some hints from students from last semester; the configuration of the lab has changed since then.
To solve the problem, you can either use "unsetenv" to remove the environment variables, or set them as follows:
setenv PVM_ROOT /usr/share/pvm3
setenv PVM_ARCH BEOSCYLD
Q3 Is it possible to connect to the cluster (lab) from home to develop and run my program? How do we connect? (ssh?)
Answer: Yes. Read the instructions above and download the software.
I was told that there is a web page about MPI, but I don't remember exactly where it is. It's something about "Concordia XXwulf", where I don't remember what XX is.
Answer: http://www.cs.concordia.ca/Beowulf/
Q4

Could you please tell me how I can run the sequential code that the professor gave out for assignment 1?
Answer:
The following are the steps to run the sequential code of assignment 1:
(1) Extract the source code files (*.Z), either using the Winzip program on Windows, or using the commands "gzip -d file.Z" and "tar -xf file.tar" on a Unix machine. Make sure that the directory "gimage", which contains the gimage source code files, is under the directory "SEQ_DiscConv", which contains "Convolute.cc" etc.
(2) Compile the "gimage" library files. Enter the gimage directory and run the command "make".
(3) Because the compiler on Guthlaf differs from the one on other machines, you need to modify a few things in the source code and in the makefile:
--(a) Edit "Convolute.cc" on line 76: " for (int y =0; ........) ". Declare the variable y outside the "for" statement:
-- int y;
-- for (y=0; .........)
--(b) Edit "makefile" on the first line: change " C++ = CC " into " C++ = g++ ". This is because there is no CC on Guthlaf.
(4) Compile the Convolute.cc file: enter the SEQ_DiscConv directory and run the command "make".
(5) Now you can run the program.
There is no CC compiler on that machine. What should I do?
Answer: Some compilers available on Guthlaf are: cc, gcc, c++, g++.

Q5 I found that when I run my program on multiple hosts (using add .2, add .3 etc.), my program is always thrown out with "Segmentation fault (core dumped)".
Answer:
This looks like a memory violation. Check your program for statements that access memory illegally, such as reading or writing past the bounds of an array.
Q6 When I run my program on just one host, I found that the main program takes most of the CPU time and the child processes get very little time to run. The main process is busy in a loop checking whether it has received feedback from the children, but the children have little time to run, so the parent process wastes a lot of time in the loop.
Answer:
If you call a non-blocking receive in a loop that does nothing else, the loop keeps the CPU busy and leaves little time for other processes. If this is the case, use a blocking receive instead of the non-blocking receive loop, or let the process sleep for a while in each iteration to release the CPU for other processes. Note that this only matters when several processes share one host, not when they run in parallel on separate nodes.
Q7

I use pvm_mcast() inside the main process to send a termination signal to all children, but it looks like the children do not receive it.
Answer:
The example programs master1 and slave1 show a typical use of pvm_mcast(), and they work fine. Try to follow this example.

Q8

I am trying to write a program using PVM. I compiled the demo program hello along with hello_other and I was able to run it. The problem is that I don't see any output from the printf() function. I tried fflush(stdout) and fprintf(stdout, ...), but nothing works as it is supposed to.
Answer:
As far as I know, PVM programs do not always print their complete output to your terminal, probably because of a synchronization problem in PVM. If this happens, you can reset the PVM environment with the "reset" command, or halt PVM (the "halt" command) and start a new one. If the program is correct, it should print correctly most of the time, though not always.

Q9 While developing my pvm program, I tried pvm_joingroup(), but it does not work. I am not sure whether the problem is in my program or whether this function does not work on the Beowulf cluster.
Answer:

Look at "gexample.c", which shows a typical use of pvm_joingroup(). Programs that use any of the group functions must be linked with a separate library, libgpvm3.a. The easiest way is to compile your program with "aimk": follow the compile and link statements for gexample.c in the file "Makefile.aimk" and adapt them for your own program.
Q10 Is there a synchronous pvm_send() function? (I searched with Google, but did not find one.)
Answer:

No. As far as I know, the current PVM only supports asynchronous message sending (pvm_send and pvm_psend). However, PVM supports blocking receive (pvm_recv and pvm_precv) and non-blocking receive (pvm_nrecv). As I understand it, you should not have this problem if you run on a multi-node PVM.
Q11

I tried to use Convolute and got the attached pictures. It seems that there is no mask; it just changed the colors. Is this correct? If not, what's wrong? If it is correct, why is there no mask?
Answer:
I get the same result as you do, which is normal. Try it on some photos and you will see how it "masks" them. The mask here is a file that is used to convolve images. Your main focus should be on parallelizing the algorithm so that your parallel program produces the same result as the sequential one.

Q12

I have already compiled and run the program using g++. However, I found that mpicc cannot compile the original Convolute.cc. There are lots of errors. It seems that mpicc does not support C++, because many of the errors are about cout and ifstream.
Answer:
mpicc is the MPI compiler for C programs, while mpiCC is the MPI compiler for C++ programs. You should use mpiCC because Convolute.cc is a C++ program.

Also, why can't the linker find the functions gdImageCreateFromGif and gdImageGif?
Answer:
I noticed that when compiling a program using mpiCC with library options such as "-lgd -Igimage -Lgimage", the compiler cannot find the library. It seems the wrapper script has been changed, because "mpiCC" worked well before. In the meantime, here is a workaround.
1) Use the following commands to get the compile and link options used by mpiCC:
[li_wang@guthlaf ~]$ mpiCC -compile_info
g++ -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1 -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -I/usr/include -DHAVE_MPI_CPP -I/usr/include/mpi2c++ -fexceptions -c
[li_wang@guthlaf ~]$ mpiCC -link_info
g++ -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1 -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -L/usr/lib -lpmpich++ -lmpich -lbproc -Wl,--undefined=beowulf_sched_shim,--undefined=get_beowulf_job_map -lbproc -lbpsh -lpvfs -lbeomap -lbeostat -ldl
2) Edit the command above by inserting your own options right after "g++", then compile and link your program.
For example, to compile "Convolute.cc" I use the following command, and it works well:
g++ -o Convolute Convolute.cc -lgd -Igimage -Lgimage -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1 -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -L/usr/lib -lpmpich++ -lmpich -lbproc -Wl,--undefined=beowulf_sched_shim,--undefined=get_beowulf_job_map -lbproc -lbpsh -lpvfs -lbeomap -lbeostat -ldl
I know it is too long to type correctly. Luckily, most SSH programs support copy and paste.

Q13 I want to send a masked image, which is a type of "class Image". Can I send this by using MPI_SEND, or is there another function to perform this?
Answer:

There are many ways to overcome this problem, e.g. object serialization or extracting the data from your object.

Q14

Is there a limit on the buffer size in MPI_SEND?
Answer:

The following page describes MPI size limits:
http://hpcf.nersc.gov/vendor_docs/ibm/pe/am106mst39.html#HDRMSL

Q15

"pl_24486: p4-error: interrupt SIGSEGV: 11 Do you know what does it mean?
Answer:
If your program fails with "p4_error: interrupt SIGSEGV", the problem is probably not with MPI. Instead, check for program bugs, including:
1) Array overwrites or accesses beyond array bounds. Be particularly careful of a[size] in C, where a is declared as int a[size].
2) Invalid pointers, including null pointers.
3) Missing or mismatched parameters to subroutines or functions. Fortran users should check that all MPI calls include the integer error return parameter and that any status variable is dimensioned as an array of size MPI_STATUS_SIZE.

More generally, how can we find out the meaning of an error message?
Answer:
Unfortunately, I could not find a list of error messages for you. You need to read the relevant documentation yourself to understand how to debug your program, especially the troubleshooting sections.

Q16 How can I access the programs stored in my directory when I log in to Guthlaf from Linux?
Answer:
Your CS home directory is available under: /cshome/...
e.g.
Your Unix home directory: /cshome/grad/your_account or /cshome/ugrad/your_account
Your NT home directory: /cshome/nthome/your_account
Q17

I have problems running the code and seeing output, though the program compiles well. Does it have to do with the environment (directory)? And could you please tell me where to place my executable file?
Answer:
1) Make sure your executable files exist in the directory "~/pvm3/bin/BEOSCYLD/"
2) Stop(halt) your pvm machine and restart it, then try your program again.
3) If it still doesn't work, there might be bugs in your program.

Q18

I think I have a problem with message passing. The master sends data to the slave, but does not seem to receive data from the slave correctly. How can I debug it? I read a bit about debugging, but still do not really understand how it works. Can you tell me a simple way to debug message passing?
Answer:
I debug programs by inserting printf() statements, which is the easiest way for me.
The command option "spawn -@(file) yourprogram" is also a simple way to trace your program.

Q19 It seems that I cannot add more than 9 processors in PVM now. What's wrong with it?
Answer:
The following is the hardware information for the cluster:

11 IBM X330 model 8674 31X dual-CPU nodes.

Master Node:

CPU: 2 x Pentium III 1.26 GHz
RAM: 2GB ECC SDRAM 133 MHz
NIC: 2 x 10/100 Ether adapter 06P3601
HDD: 2 x 36GB Ultra 160 SCSI

10 Compute Nodes:

CPU: 2 x Pentium III 1.26 GHz
RAM: 2GB ECC SDRAM 133 MHz
NIC: 2 x 10/100 Ether adapter 06P3601
HDD: 18GB Ultra 160 SCSI

That is to say, you can add nodes .0 to .9, so you may have a PVM with a maximum of 11 nodes (22 CPUs).
You can use the "beostatus" command to check the status of the cluster. Press "q" to quit.

Q20 How do we hand in our assignment? Upload the code somewhere or put it on a disk?
Answer:
Upload your complete assignment and hand in a hard copy (paper). You don't need to submit a floppy.
Q21

I was able to start and stop pvm on guthlaf a few minutes ago. Now, when I try to start pvm it shows error msg:
[xdai_1@guthlaf lab2]$ pvm
libpvm [pid26933] mksocs() connect: Connection refused
libpvm [pid26933] socket address tried: /tmp/pvmtmp026481.0
pvmd already running.
libpvm [pid26933] mksocs() connect: Connection refused
libpvm [pid26933] socket address tried: /tmp/pvmtmp026481.0
libpvm [pid26933] mksocs() connect: Connection refused
libpvm [pid26933] socket address tried: /tmp/pvmtmp026481.0
libpvm [pid26933] mksocs() connect: Connection refused
libpvm [pid26933] socket address tried: /tmp/pvmtmp026481.0
libpvm [pid26933]: pvm_mytid(): Can't contact local daemon
[xdai_1@guthlaf lab2]$
Any idea what went wrong? I checked all the processes that belong to me. Everything seems normal:
[xdai_1@guthlaf lab2]$ ps -aux |grep xdai_1
xdai_1 23547 0.0 0.0 3228 1456 pts/12 S 06:28 0:00 -tcsh
xdai_1 26847 0.0 0.0 3224 1812 pts/11 S 10:46 0:00 -tcsh
xdai_1 26882 0.0 0.0 3208 1796 pts/15 S 10:46 0:00 -tcsh
xdai_1 27035 17.0 0.0 2884 952 pts/12 R 11:07 0:00 ps -aux
xdai_1 27036 0.0 0.0 1732 592 pts/12 S 11:07 0:00 grep xdai_1
What can I do to fix this problem?
Answer:
Do the following before starting pvm again:
1- Use the ps command to check your running processes. Kill all pvm processes.
2- Delete the files that your pvm generated in the directory /tmp/.
