**The HPCf Batch System**
boreas.hpcf.upr.edu runs OpenPBS and Maui to manage CPU resources. This page explains how we set up OpenPBS and Maui to manage the queue. User-level documentation on the batch queue is available on our public web site:
http://www.hpcf.upr.edu/docs/batch/
**PBS**
**What is PBS?**
The Portable Batch System, PBS, is a batch job and computer system resource
management package. It was developed with the intent to be conformant with
the POSIX 1003.2d Batch Environment Standard. As such, it will accept batch
jobs, a shell script and control attributes, preserve and protect the job
until it is run, run the job, and deliver output back to the submitter.
**Homepage and Download**
http://www.openpbs.org
The program and most of the documentation are available only to
registered users. Available documentation consists of:
* PBS Commands: html version of common man pages
* Using xpbs: Documentation for xpbs, a GUI to PBS commands
* System Administrator's Guide: provides information on building, installing and configuring OpenPBS.
**Additional Useful Sites**
http://www.msi.umn.edu/smp/info/jobs/
Supercomputing Institute at University of Minnesota
http://www.hpc.gatech.edu/starting/o2000.html#HEADING19
PBS information at the Georgia Tech HPC Group
http://www.chpc.utah.edu/policies/pbs.html
PBS User Guide at the University of Utah's Center for High Performance
Computing
**Installation**
PBS is distributed as source code. The distribution is installed under
/usr/people/humberto/src/OpenPBS_2_3_12. The configure command used to compile
the version currently running was:
<pre>
# Necessary to be able to build tcl/tk for 64 bits
> setenv LDFLAGS "-L/usr/freeware/lib64"
# Set compilation configuration options
> ./configure \
    --enable-docs \
    --set-server-home=/usr/local/spool/pbs \
    --enable-plock-daemons=7 \
    --enable-syslog \
    --with-scp \
    --enable-nodemask \
    --enable-array \
    --set-cc \
    --enable-gui \
    --with-tcl=/usr/freeware \
    --set-default-server=boreas.hpcf.upr.edu \
    --set-cflags=-64
</pre>
On manutara, I disabled the GUI (no xpbs):
<pre>
./configure \
--enable-docs \
--set-server-home=/usr/local/spool/pbs \
--enable-plock-daemons=7 \
--enable-syslog \
--with-scp \
--set-cc \
--set-default-server=manutara.hpcf.upr.edu \
--set-cflags=-64
</pre>
**Configuration**
PBS consists of three daemons, each with its own configuration. The PBS
Server (pbs_server) accepts submitted jobs and then, following the
scheduler's decisions, puts each job to run. The PBS Mom (pbs_mom) runs on
every node controlled by PBS; it runs the jobs, monitors the node so it
won't get overloaded, and informs the server of the status of jobs.
Finally, the PBS scheduler (pbs_sched) decides what runs when and where
(we replace pbs_sched with maui). The server presents the scheduler with a
list of jobs and attributes, and the scheduler determines the order in which
jobs will run based on a user-configurable policy.
The configuration files are stored in every daemon's private directory, under
$PBS_HOME/<daemon_name>_priv
**Server Configuration**
Unlike Mom and the Job Scheduler, the Job Server (pbs_server) is configured
while it is running. Only the nodes file has to be created before the server
is started. This file ($PBS_HOME/server_priv/nodes) lists all the nodes
controlled by PBS. It's not necessary if the system has only one exclusive
node, i.e., one that is used by one and only one job at a time. If there's
only one node in the system and it is timeshared, it has to be declared in
this file.
For example, the nodes file for boreas consists of:
boreas.hpcf.upr.edu np=20
The np declares the number
of virtual processors that this node has.
The complete syntax is: <node_name>[:ts] [property...] [np=#] (see page 21
of Admin Guide)
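Since np sets how many virtual processors the server sees, it can help to total the np values in a nodes file and compare the result with what you expect. The following is a minimal sketch, not part of OpenPBS: the function name total_np and the second node in the sample file are hypothetical, and the sketch assumes that a node line without np= counts as one virtual processor.

```shell
#!/bin/sh
# Sum the np= values in a PBS nodes file.
# Assumption: a node line without np= counts as one virtual processor.
total_np() {
    awk '
        NF == 0 { next }                      # skip blank lines
        {
            np = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=[0-9]+$/) np = substr($i, 4) + 0
            total += np
        }
        END { print total + 0 }
    ' "$1"
}

# Example: the boreas line from above plus a hypothetical timeshared node.
nodes_file=$(mktemp)
printf '%s\n' \
    'boreas.hpcf.upr.edu np=20' \
    'example-node.hpcf.upr.edu:ts bigmem' > "$nodes_file"
total_np "$nodes_file"    # prints 21
rm -f "$nodes_file"
```

On a real server the argument would be $PBS_HOME/server_priv/nodes.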
After the nodes file is ready, the server is started and its configuration
is done through the qmgr(8) command. qmgr provides the means to send
commands to the server, either as arguments to qmgr (using the -c flag) or
through its interface (by typing qmgr, a prompt ">" will come up waiting
for commands).
The syntax for commands is (more details on the man page):
command server [names] [attr OP value[,attr OP value,...]]
command queue [names] [attr OP value[,attr OP value,...]]
command node [names] [attr OP value[,attr OP value,...]]
Where,
command is the command to perform on an object (objects being server,
queue and node). Commands are:
active -> sets active objects. Primarily to avoid using object
names in commands
create -> create a new object
delete -> destroy an object
set -> define or alter the attributes of the object
unset -> clear attributes of the object
list -> list the current attributes and values for the object
print -> print all queue and server attributes in a format usable
as input for qmgr
names is a list of one or more names of specific objects. The name list
is in the form:
[name][@server][,queue_name[@server]...]
the name is declared when the object is first created.
To record the changes made to the server configuration, use the command:
qmgr -c "print server" > $PBS_HOME/server_priv/server_conf
Once the server is running, the saved configuration can be reloaded using:
qmgr < $PBS_HOME/server_priv/server_conf
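One caveat with redirecting straight into server_conf is that the shell truncates the file before qmgr runs, so a failed qmgr (for example, when the server is down) leaves an empty file behind. A possible workaround, sketched here under assumptions, is to write to a temporary file first and only replace the saved copy when qmgr succeeds. The function name save_server_conf and the QMGR override variable are hypothetical and are not part of PBS.

```shell
#!/bin/sh
# Save the output of qmgr -c "print server" without clobbering a good copy.
# QMGR is a hypothetical override so a different qmgr path can be used.
save_server_conf() {
    conf=$1
    tmp="$conf.new.$$"
    if "${QMGR:-qmgr}" -c "print server" > "$tmp" 2>/dev/null && [ -s "$tmp" ]; then
        # qmgr succeeded and produced output: replace the saved copy
        mv "$tmp" "$conf"
    else
        # qmgr failed or printed nothing: keep the previous file
        rm -f "$tmp"
        echo "qmgr failed; $conf left unchanged" >&2
        return 1
    fi
}
```

It would be called as save_server_conf $PBS_HOME/server_priv/server_conf after each round of qmgr changes.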
To start the server the first time:
<pre>
# /usr/local/sbin/pbs_server -t create
</pre>
The server and mom must run as root; maui runs as the batch user.
See below for example server configurations.
**Mom Configuration**
(page 31 Admin Guide)
The configuration file is stored in $PBS_HOME/mom_priv/config. Mom doesn't
require that this file exist: another file may be specified using -c, or, if
the default values are ok, mom may be run without reading any config
file. The current configuration is as suggested on the maui site for integrating maui and PBS.
Current configuration options are:
<pre>
$logevent 0x1FF
$clienthost boreas.hpcf.upr.edu
$restricted boreas.hpcf.upr.edu
</pre>
**Scheduler Configuration**
We replace the PBS scheduler with maui. Maui documentation is extensive and is available at http://www.supercluster.org/.
Maui runs under the batch userid, and is set up in /usr/people/batch/pbs/maui-3.0.5.
Maui by default dedicates a machine to a job, so the most important configuration change is to set NODEACCESSPOLICY to SHARED.
The current configuration reads:
<pre>
# maui.cfg 3.0
SERVERHOST boreas.hpcf.upr.edu
# primary admin must be first in list
ADMIN1 batch
ADMIN2 humberto william
RMTYPE[0] PBS
# parameters documented at http://supercluster.org/documentation/maui/parameters.html
# use the 'showconfig' command to display current maui settings
RMPOLLINTERVAL 00:01:00
SERVERPORT 42559
SERVERMODE NORMAL
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
#DEFAULTDOMAIN <DOMAIN>
# Priority Weights
QUEUETIMEWEIGHT 2
XFACTORWEIGHT 1000
RESOURCEWEIGHT 800
# FairShare
FSPOLICY OFF
FSDEPTH 7
FSINTERVAL 86400
FSDECAY 0.80
# Policies
BACKFILLPOLICY ON
BACKFILLTYPE FIRSTFIT
ALLOCATIONPOLICY MINRESOURCE
RESERVATIONPOLICY CURRENTHIGHEST
MAXJOBPERUSERPOLICY OFF
MAXJOBPERUSERCOUNT 8
MAXPROCPERUSERPOLICY OFF
MAXPROCPERUSERCOUNT 256
MAXPROCSECONDPERUSERPOLICY OFF
MAXPROCSECONDPERUSERCOUNT 36864000
MAXJOBQUEUEDPERUSERPOLICY OFF
MAXJOBQUEUEDPERUSERCOUNT 2
MAXPROCPERGROUPPOLICY OFF
SMAXPROCPERGROUPCOUNT 128
MAXPROCPERGROUPCOUNT 160
NODEACCESSPOLICY SHARED
</pre>
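Because NODEACCESSPOLICY is the one setting this installation depends on, a quick check of the file can catch an accidental edit. The sketch below is only an illustration: the helper maui_param is hypothetical (it is not a maui command), it reads the first matching non-comment line, and it assumes the simple "NAME value" layout shown above.

```shell
#!/bin/sh
# Print the value of one parameter from a maui.cfg-style file.
# Lines starting with '#' are treated as comments; the first match is used.
maui_param() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }
        $1 == key  { $1 = ""; sub(/^[ \t]+/, ""); print; found = 1; exit }
        END        { exit !found }
    ' "$1"
}

# Example with a shortened, made-up copy of the policies section.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# Policies
BACKFILLPOLICY        ON
NODEACCESSPOLICY      SHARED
EOF
if [ "$(maui_param "$cfg" NODEACCESSPOLICY)" = "SHARED" ]; then
    echo "nodes are shared between jobs"
else
    echo "warning: NODEACCESSPOLICY is not SHARED" >&2
fi
rm -f "$cfg"
```

The showconfig command mentioned in the file's comments remains the way to see what the running maui actually uses; this sketch only inspects the file on disk.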
**Starting and Stopping the Services**
The daemons are started automatically at boot time by the /etc/init.d/pbs
script. The mom and the scheduler don't have any special command line options
(see the pbs_mom and pbs_sched man pages). The most relevant option for the
server (pbs_server) is the -t flag, which specifies what happens to jobs that
were running when the server shut down. The default value
is warm, which means that all rerunnable jobs which were running when the
server went down are requeued. The current setting is to start up hot, i.e.,
all jobs are requeued except non-rerunnable jobs that were executing, and any
rerunnable job which was executing when the server went down will be run
immediately (see the man page for details and the /etc/init.d/pbs script).
To stop the daemons, a simple TERM signal should work (that's what the script does).
-- Main.RicardoBaratto <br>
**Setting up the batch cpusets**
See the IRIX Admin: Resource Administration book chapter on cpusets, available from:
http://techpubs.sgi.com/library/dynaweb_bin/ebt-bin/0650/nph-infosrch.cgi/infosrchtpl/SGI_Admin/IA_Resource/@InfoSearch__BookTextView/4458
We have an open boot cpuset with 4 processors (see man boot_cpuset) defined in
/etc/config/boot_cpuset.config
<pre>
MEMORY_LOCAL
MEMORY_MANDATORY
CPU 0
CPU 1
CPU 2
CPU 3
</pre>
and enabled with chkconfig:
<pre>
# chkconfig -f boot_cpuset on
</pre>
An exclusive Batch cpuset is defined in
/etc/config/batch_cpuset.config
This is the batch_cpuset.config file:
<pre>
EXCLUSIVE
MEMORY_LOCAL
MEMORY_MANDATORY
CPU 4
CPU 5
CPU 6
CPU 7
CPU 8
CPU 9
CPU 10
CPU 11
CPU 12
CPU 13
CPU 14
CPU 15
CPU 16
CPU 17
CPU 18
CPU 19
CPU 20
CPU 21
CPU 22
CPU 23
</pre>
The permissions are set up so only root can execute on the cpuset (PBS runs as root).
<pre>
# ls -l /etc/config/batch_cpuset.config
-rwx------ 1 root sys 268 Mar 13 09:36 batch_cpuset.config
</pre>
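The batch cpuset above lists CPUs 4 through 23, which matches the np=20 declared for boreas in the nodes file. If either file is edited later, a small check like the following can confirm that the two still agree. This is a sketch under assumptions: the function name cpuset_cpu_count is hypothetical, and it simply counts lines whose first word is CPU.

```shell
#!/bin/sh
# Count the "CPU n" lines in an IRIX cpuset configuration file.
cpuset_cpu_count() {
    awk '$1 == "CPU" { n++ } END { print n + 0 }' "$1"
}

# Example: rebuild the batch cpuset layout shown above in a temp file.
cfg=$(mktemp)
{
    echo EXCLUSIVE
    echo MEMORY_LOCAL
    echo MEMORY_MANDATORY
    cpu=4
    while [ "$cpu" -le 23 ]; do
        echo "CPU $cpu"
        cpu=$((cpu + 1))
    done
} > "$cfg"

np=20   # value from the nodes file
if [ "$(cpuset_cpu_count "$cfg")" -eq "$np" ]; then
    echo "cpuset matches np=$np"
else
    echo "cpuset and np=$np disagree" >&2
fi
rm -f "$cfg"
```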
The boot cpuset is configured at boot time, and init is attached to it. All system processes will therefore be restricted to the first 4 cpus in the machine.
To use the other 20 cpus, we need to start the batch cpuset and attach the PBS system to it. I've found that maui must also be attached to the batch cpuset for backfilling to work reliably. This is done by the /etc/init.d/pbs script:
<pre>
#!/bin/sh
##
## Start/stop pbs services
##
IS_ON=/sbin/chkconfig
PBS_SERVER=pbs_server
# (from man pbs_server)
# When pbs server is restarted after a crash, all jobs are requeued
# except non-rerunnable jobs that were executing. Any rerunnable job
# which was executing when the server went down will be run immediately.
# After those jobs are restarted, then normal scheduling takes place for
# all remaining queued jobs. If a job cannot be restarted immediately
# the server will attempt to restart it periodically for up to 5 minutes.
# (check man page for details)
PBS_SERVER_OPTIONS="-t hot"
PBS_MOM=pbs_mom
PBS_SCHED=pbs_sched
PBS_QMGR=qmgr
PBS_SBINDIR=/usr/local/sbin
PBS_BINDIR=/usr/local/bin
CONFIG_FILE=/usr/local/spool/pbs/server_priv/server_conf
MAUIBINDIR=/usr/local/bin
MAUI=maui
MAUIUSER=batch
SU=/sbin/su
if $IS_ON verbose || test -t 1; then
ECHO=echo
VERBOSE=-v
else # quiet startup and shutdown
ECHO=:
VERBOSE=
fi
case "$1" in
'start')
if $IS_ON pbs ; then
$ECHO "PBS Services: "
$ECHO "Starting Batch cpuset: "
cpuset -q Batch -c -f /etc/config/batch_cpuset.config
$ECHO -n "PBS mom: $PBS_MOM"
cpuset -q Batch -A $PBS_SBINDIR/$PBS_MOM
$ECHO "."
# Use maui instead of pbs scheduler
# $ECHO -n "PBS scheduler: $PBS_SCHED"
# $PBS_SBINDIR/$PBS_SCHED
# $ECHO "."
$ECHO -n "PBS server: $PBS_SERVER"
cpuset -q Batch -A $PBS_SBINDIR/$PBS_SERVER $PBS_SERVER_OPTIONS
$PBS_BINDIR/$PBS_QMGR < $CONFIG_FILE > /dev/null 2>&1
$ECHO -n "maui scheduler: $MAUI"
cpuset -q Batch -A $SU $MAUIUSER -c "$MAUIBINDIR/$MAUI"
$ECHO "."
fi
;;
'stop')
killall $PBS_MOM
killall $MAUI
killall $PBS_SERVER
cpuset -q Batch -m
cpuset -q Batch -d
;;
*)
$ECHO "usage: $0 {start|stop}"
esac
</pre>
PBS was set up with its own scheduling turned off (set server scheduling = False); maui handles scheduling instead. I set up the queues and nodes per the OpenPBS docs:
<pre>
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.mem = 100mb
set queue default resources_default.ncpus = 1
set queue default resources_default.walltime = 00:00:00
set queue default resources_available.mem = 16gb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = False
set server acl_host_enable = True
set server acl_hosts = *.hpcf.upr.edu
set server managers = batch@boreas.hpcf.upr.edu
set server operators = batch@boreas.hpcf.upr.edu
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
</pre>
The node can be created as follows:
<pre>
create node boreas.hpcf.upr.edu np=20
</pre>
-- Main.HumbertoOrtiz - 13 Mar 2001 <br>
**PBS configuration on manutara/ehecatl**
The PBS installation on manutara/ehecatl consists of two nodes: manutara (4 cpus) and ehecatl (8 cpus). Manutara is the master node of the cluster and runs the server and the maui scheduler. The nodes file on manutara, /usr/local/spool/pbs/server_priv/nodes, is:
<pre>
ehecatl.uprm.edu np=8
manutara.uprm.edu np=4
</pre>
There are two cpusets on manutara: the batch cpuset for jobs assigned by PBS and the boot cpuset for system and interactive jobs. <br>
The batch cpuset is defined in /etc/config/batch_cpuset.config and consists of 4 cpus, all with the same speed (250 MHz):
<pre>
EXCLUSIVE
MEMORY_LOCAL
MEMORY_MANDATORY
CPU 2
CPU 3
CPU 4
CPU 5
</pre>
The boot cpuset consists of the remaining 4 cpus and is defined in /etc/config/boot_cpuset.config:
<pre>
MEMORY_LOCAL
MEMORY_MANDATORY
CPU 0
CPU 1
CPU 6
CPU 7
</pre>
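Since the batch cpuset is marked EXCLUSIVE, it seems sensible to confirm that the two files never name the same CPU. The following sketch prints any CPU number that appears in both files; the function name cpuset_overlap is hypothetical, and the sample files are copies of the two configurations above.

```shell
#!/bin/sh
# Print CPU numbers that appear in both cpuset configuration files.
# No output means the two cpusets are disjoint.
cpuset_overlap() {
    awk '
        $1 != "CPU"         { next }
        FILENAME == ARGV[1] { in_first[$2] = 1; next }
        ($2 in in_first)    { print $2 }
    ' "$1" "$2"
}

# Example using the manutara layout: batch = 2-5, boot = 0, 1, 6, 7.
batch=$(mktemp)
boot=$(mktemp)
printf '%s\n' EXCLUSIVE MEMORY_LOCAL MEMORY_MANDATORY \
    'CPU 2' 'CPU 3' 'CPU 4' 'CPU 5' > "$batch"
printf '%s\n' MEMORY_LOCAL MEMORY_MANDATORY \
    'CPU 0' 'CPU 1' 'CPU 6' 'CPU 7' > "$boot"

shared=$(cpuset_overlap "$batch" "$boot")
if [ -z "$shared" ]; then
    echo "cpusets are disjoint"
else
    echo "CPUs in both cpusets: $shared" >&2
fi
rm -f "$batch" "$boot"
```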
All three daemons run on manutara: pbs_server, pbs_mom, and maui (in place of pbs_sched). Therefore the PBS script /etc/init.d/pbs on manutara attaches all of them to the batch cpuset when it starts PBS and detaches them when it stops. It looks like the pbs script on boreas.
<br>
The ehecatl node doesn't have any cpusets, and the pbs script on ehecatl, /etc/init.d/pbs, serves only to start or stop mom:
<pre>
#!/bin/sh
##
## Start/stop pbs services
##
IS_ON=/sbin/chkconfig
# (from man pbs_server)
# (check man page for details)
PBS_MOM=pbs_mom
PBS_SBINDIR=/usr/local/sbin
if $IS_ON verbose || test -t 1; then
ECHO=echo
VERBOSE=-v
else # quiet startup and shutdown
ECHO=:
VERBOSE=
fi
case "$1" in
'start')
if $IS_ON pbs ; then
$ECHO "PBS Services: "
$ECHO -n "PBS mom: $PBS_MOM"
$PBS_SBINDIR/$PBS_MOM
$ECHO "."
fi
;;
'stop')
killall $PBS_MOM
;;
*)
$ECHO "usage: $0 {start|stop}"
esac
</pre>
Server configuration on manutara and ehecatl:
<pre>
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.mem = 150mb
set queue default resources_default.ncpus = 1
set queue default resources_default.walltime = 00:00:00
set queue default resources_available.mem = 2gb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = False
set server max_user_run = 8
set server acl_host_enable = True
set server acl_hosts = *.uprm.edu
set server managers = batch@manutara.uprm.edu
set server operators = batch@manutara.uprm.edu
set server default_queue = default
set server log_events = 511
set server mail_from = root@hpcf.upr.edu
set server query_other_jobs = True
set server scheduler_iteration = 600
</pre>
Mom configuration on manutara:
<pre>
$logevent 0x1ff
$clienthost manutara.uprm.edu
$restricted *.uprm.edu
$usecp ehecatl.uprm.edu:/usr/people /usr/people
$usecp ehecatl.uprm.edu:/disk4 /disk4
</pre>
Mom configuration on ehecatl:
<pre>
$logevent 0x1ff
$clienthost manutara.uprm.edu
$restricted *.uprm.edu
$usecp manutara.uprm.edu:/usr/people /usr/people
$usecp manutara.uprm.edu:/disk4 /disk4
</pre>
**NFS configuration on manutara/ehecatl**
Manutara is the master node and exports two NFS filesystems to its client, ehecatl: /usr/people and /disk4. Both of them contain users' home directories. The exported filesystems are listed in the /etc/exports file on manutara:
<pre>
/usr/people -rw,access=ehecatl.uprm.edu
/disk4 -rw,access=ehecatl.uprm.edu
</pre>
After editing the /etc/exports file execute the exportfs command:
<pre>
exportfs -a
</pre>
The /etc/fstab file on a client has to list the filesystems exported by the master node.<br>This is the /etc/fstab file on ehecatl:
<pre>
/dev/root / xfs rw,raw=/dev/rroot 0 0
manutara.uprm.edu:/usr/people /usr/people nfs rw 0 1
manutara.uprm.edu:/disk4 /disk4 nfs rw 0 1
</pre>
After editing the /etc/fstab file execute the mount command:
<pre>
mount -t nfs -a
</pre>
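Every directory exported on the master should have a matching nfs line on the client, so comparing the two files is a cheap sanity check. The sketch below is an illustration only: the function name missing_nfs_mounts is hypothetical, and it assumes the simple layouts shown above (exported directory in the first field of exports, host:directory in the first field of fstab).

```shell
#!/bin/sh
# Print exported directories (from an exports-style file) that have no
# matching nfs entry in an fstab-style file.
missing_nfs_mounts() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }                 # comments, blanks
        FILENAME == ARGV[1]   { exported[$1] = 1; next }
        $3 == "nfs" {
            n = split($1, src, ":")                    # host:directory
            if (n == 2) delete exported[src[2]]
        }
        END { for (dir in exported) print dir }
    ' "$1" "$2" | sort
}

# Example with copies of the two files shown above.
exports=$(mktemp)
fstab=$(mktemp)
printf '%s\n' \
    '/usr/people -rw,access=ehecatl.uprm.edu' \
    '/disk4 -rw,access=ehecatl.uprm.edu' > "$exports"
printf '%s\n' \
    '/dev/root / xfs rw,raw=/dev/rroot 0 0' \
    'manutara.uprm.edu:/usr/people /usr/people nfs rw 0 1' \
    'manutara.uprm.edu:/disk4 /disk4 nfs rw 0 1' > "$fstab"

missing=$(missing_nfs_mounts "$exports" "$fstab")
if [ -z "$missing" ]; then
    echo "all exports are mounted"
else
    echo "exported but not in fstab: $missing" >&2
fi
rm -f "$exports" "$fstab"
```

In practice the exports file lives on manutara and the fstab on ehecatl, so one of the two would have to be copied over before running the comparison.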
The RPC mountd services have to be uncommented in the /etc/inetd.conf files on both manutara and ehecatl:
<pre>
# RPC-based services
# These use the portmapper instead of /etc/services.
#
# we only support mountd versions 1 and 3
mountd/1,3 stream rpc/tcp wait/lc root /usr/etc/rpc.mountd mountd
mountd/1,3 dgram rpc/udp wait/lc root /usr/etc/rpc.mountd mountd
sgi_mountd/1 stream rpc/tcp wait/lc root /usr/etc/rpc.mountd mountd
sgi_mountd/1 dgram rpc/udp wait/lc root /usr/etc/rpc.mountd mountd
#rstatd/1-3 dgram rpc/udp wait root /usr/etc/rpc.rstatd rstatd
</pre>
Be sure to make inetd reread these files after changing them, using the following command:
<pre>
/etc/killall -HUP inetd
</pre>
**NIS configuration on manutara/ehecatl**
yp, ypmaster, and ypserv must be enabled on manutara with the chkconfig command:
<pre>
chkconfig -f yp on
chkconfig -f ypmaster on
chkconfig -f ypserv on
</pre>
yp must be enabled on ehecatl with the chkconfig command:
<pre>
chkconfig -f yp on
</pre>
Stop and start /etc/init.d/network:
<pre>
/etc/init.d/network stop
/etc/init.d/network start
</pre>
-- Main.ElenaLeyderman - 06 Sep 2001 <br>
**OpenPBS and maui on cafeina**
I set up OpenPBS and maui on cafeina, more or less following the instructions above, with some minor and some major changes. I'm not running any cpusets on cafeina; I hope all our users will be well behaved and use the honor system for batch jobs. If not, I'll kill them (or their jobs anyway).
I installed versions: OpenPBS_2_3_16.tar.gz and maui-3.0.7p8.tar.gz
OpenPBS configuration flags were:
<pre>
./configure \
    --enable-docs \
    --set-server-home=/usr/local/spool/pbs \
    --enable-plock-daemons=7 \
    --enable-syslog \
    --with-scp \
    --enable-nodemask \
    --set-cc \
    --enable-gui \
    --with-tcl=/usr/freeware \
    --set-default-server=boreas.hpcf.upr.edu \
    --set-cflags=-64
</pre>
maui runs as the batch user, and is installed in
<pre>/usr/people/batch/src/maui-3.0.7/</pre>
I created a nodes file for PBS in <pre>/usr/local/spool/pbs/server_priv/nodes</pre> with the line
<pre>
cafeina.hpcf.upr.edu np=32
</pre>
But it didn't seem to work. I then created the node from within qmgr with the line:
<pre>
create node cafeina.hpcf.upr.edu np=32
</pre>
and there is a server_conf file in the server_priv directory with the correct node definition, so the name of the file may have changed. Until I created the node in qmgr, maui and pbs were running, but maui did not detect any processors available.
The mom configuration file reads:
<pre>
$logevent 0x1FF
$clienthost cafeina.hpcf.upr.edu
$restricted cafeina.hpcf.upr.edu
</pre>
The queues were created as follows:
<pre>
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.mem = 100mb
set queue default resources_default.ncpus = 1
set queue default resources_default.walltime = 00:00:00
set queue default resources_available.mem = 24gb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = *.hpcf.upr.edu
set server managers = batch@cafeina.hpcf.upr.edu
set server operators = batch@cafeina.hpcf.upr.edu
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
</pre>
Here is the /etc/init.d/pbs script (no cpusets):
<pre>
#!/bin/sh
##
## Start/stop pbs services
##
IS_ON=/sbin/chkconfig
PBS_SERVER=pbs_server
PBS_SERVER_OPTIONS="-t hot"
PBS_MOM=pbs_mom
PBS_SBINDIR=/usr/local/sbin
MAUIBINDIR=/usr/local/bin
MAUI=maui
MAUIUSER=batch
SU=/sbin/su
if $IS_ON verbose || test -t 1; then
ECHO=echo
VERBOSE=-v
else # quiet startup and shutdown
ECHO=:
VERBOSE=
fi
case "$1" in
'start')
if $IS_ON pbs ; then
$ECHO "PBS Services: "
$ECHO -n "PBS mom: $PBS_MOM"
$PBS_SBINDIR/$PBS_MOM
$ECHO "."
$ECHO -n "PBS server: $PBS_SERVER"
$PBS_SBINDIR/$PBS_SERVER $PBS_SERVER_OPTIONS
$ECHO -n "maui scheduler: $MAUI"
$SU $MAUIUSER -c "$MAUIBINDIR/$MAUI"
$ECHO "."
fi
;;
'stop')
killall $PBS_MOM
killall $MAUI
killall $PBS_SERVER
;;
*)
$ECHO "usage: $0 {start|stop}"
esac
</pre>
Remember to run <pre>chkconfig -f pbs on</pre>
-- Main.HumbertoOrtiz - 04 Sep 2002 <br>