UPR HPCf

BatchSystemAdmin

last edited 5 years ago by humberto

The HPCf Batch System

boreas.hpcf.upr.edu runs OpenPBS and Maui to manage CPU resources. This page explains how we set up OpenPBS and Maui to manage the queue. User-level documentation on the batch queue is available on our public web site:

http://www.hpcf.upr.edu/docs/batch/

PBS

What is PBS?

The Portable Batch System, PBS, is a batch job and computer system resource management package. It was developed with the intent to be conformant with the POSIX 1003.2d Batch Environment Standard. As such, it will accept batch jobs, a shell script and control attributes, preserve and protect the job until it is run, run the job, and deliver output back to the submitter.

Homepage and Download

http://www.openpbs.org

The program and most of the documentation are available only to registered users. The available documentation consists of:

  • PBS Commands: html version of common man pages
  • Using xpbs: Documentation for xpbs, a GUI to PBS commands
  • System Administrator's Guide: provides information on building, installing and configuring OpenPBS.

Additional Useful Sites

http://www.msi.umn.edu/smp/info/jobs/ Supercomputing Institute at University of Minnesota

http://www.hpc.gatech.edu/starting/o2000.html#HEADING19 PBS information at the Georgia Tech HPC Group

http://www.chpc.utah.edu/policies/pbs.html PBS User Guide at the University of Utah's Center for High Performance Computing

Installation

PBS is distributed as source code. The distribution is installed under /usr/people/humberto/src/OpenPBS_2_3_12. The configure command used to compile the version currently running was:

# Necessary to be able to build tcl/tk for 64 bits
> setenv LDFLAGS "-L/usr/freeware/lib64"
# Set compilation configuration options
> ./configure \
--enable-docs \
--set-server-home=/usr/local/spool/pbs \
--enable-plock-daemons=7 \
--enable-syslog \
--with-scp \
--enable-nodemask \
--enable-array \
--set-cc \
--enable-gui \
--with-tcl=/usr/freeware \
--set-default-server=boreas.hpcf.upr.edu \
--set-cflags=-64

On manutara, I disabled the GUI (no xpbs):

./configure \
--enable-docs \
--set-server-home=/usr/local/spool/pbs \
--enable-plock-daemons=7 \
--enable-syslog \
--with-scp \
--set-cc \
--set-default-server=manutara.hpcf.upr.edu \
--set-cflags=-64

Configuration

PBS consists of three daemons, each with its own configuration. The PBS server (pbs_server) accepts submitted jobs and, following the scheduler's decisions, puts each job to run. The PBS mom (pbs_mom) runs on every node controlled by PBS; it runs the jobs, monitors the node so it won't get overloaded, and informs the server of the status of jobs. Finally, the PBS scheduler (pbs_sched) decides what runs when and where (we replace pbs_sched with maui). The server presents the scheduler with a list of jobs and attributes, and the scheduler determines the order in which jobs will run based on a user-configurable policy.

The configuration files are stored in each daemon's private directory under $PBS_HOME (for example, $PBS_HOME/server_priv and $PBS_HOME/mom_priv).

Server Configuration

Unlike Mom and the Job Scheduler, the Job Server (pbs_server) is configured while it is running. Only the nodes file has to be created before the server is started. This file ($PBS_HOME/server_priv/nodes) lists all the nodes controlled by PBS. It's not necessary if the system has only one exclusive node, i.e., one that is used by one and only one job at a time. If there's only one node in the system and it is timeshared, it has to be declared in this file. For example, the nodes file for boreas consists of:

boreas.hpcf.upr.edu np=20

The np field declares the number of virtual processors that this node has. The complete syntax is: node_name[:ts] [property...] [np=#] (see page 21 of the Admin Guide).
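
As a quick sanity check, the np values in a nodes file can be totalled with awk. This is only a sketch: total_np is a made-up helper name, not part of OpenPBS, and it assumes one node per line with the np=# form shown above. Treating a line without np= as one virtual processor is also an assumption.

```shell
#!/bin/sh
# total_np FILE: sum the np=# values found in a PBS nodes file.
# Hypothetical helper for illustration, not an OpenPBS command.
# A node line with no np= field is counted as 1 (assumed default).
total_np() {
    awk '
        NF == 0 { next }                 # skip blank lines
        {
            np = 1
            for (i = 2; i <= NF; i++)    # field 1 is the node name
                if ($i ~ /^np=[0-9]+$/) np = substr($i, 4) + 0
            total += np
        }
        END { print total + 0 }
    ' "$1"
}
```

For example, total_np /usr/local/spool/pbs/server_priv/nodes prints the number of virtual processors the server has been told about, which can be compared with the number of cpus set aside for batch work.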

After the nodes file is ready, the server is started and its configuration is done through the qmgr(8) command. qmgr provides the means to send commands to the server, either as arguments to qmgr (using the -c flag) or through its interface (by typing qmgr, a prompt ">" will come up waiting for commands).

The syntax for commands is (more details on the man page):

command server [names] [attr OP value[,attr OP value,...]]
command queue [names] [attr OP value[,attr OP value,...]]
command node [names] [attr OP value[,attr OP value,...]]

Here, command is the operation to perform on an object (the objects being server, queue and node). The commands are:

  • active: sets the active objects, primarily to avoid repeating object names in later commands
  • create: creates a new object
  • delete: destroys an object
  • set: defines or alters attributes of the object
  • unset: clears attributes of the object
  • list: lists the current attributes and values of the object
  • print: prints all queue and server attributes in a format usable as input to qmgr

names is a list of one or more names of specific objects. The name list has the form [name][@server][,queue_name[@server]...]. The name is declared when the object is first created.

To record the changes made to the server configuration, use the command:

qmgr -c "print server" > $PBS_HOME/server_priv/server_conf

When the server is started, the configuration file may be reread using:

qmgr < $PBS_HOME/server_priv/server_conf

To start the server the first time:

# /usr/local/sbin/pbs_server -t create

The server and mom must run as root; maui will run as the batch user.

See below for example server configurations.

Mom Configuration (page 31 Admin Guide)

The configuration file is stored in $PBS_HOME/mom_priv/config. Mom doesn't require this file to exist: another file may be specified using -c, or, if the default values are OK, mom may be run without reading any config file. The current configuration is as suggested on the maui site for integrating maui and PBS.

Current configuration options are:

$logevent 0x1FF
$clienthost boreas.hpcf.upr.edu
$restricted boreas.hpcf.upr.edu

Scheduler Configuration

We replace the PBS scheduler with maui. Maui documentation is extensive, and is available at http://www.supercluster.org/.

Maui runs under the batch userid, and is set up in /usr/people/batch/pbs/maui-3.0.5

Maui by default dedicates a machine to a job, so the most important configuration change is to set NODEACCESSPOLICY to SHARED.

The current configuration reads:

# maui.cfg 3.0
SERVERHOST                              boreas.hpcf.upr.edu
# primary admin must be first in list
ADMIN1                                   batch
ADMIN2  humberto william
RMTYPE[0]                     PBS
# parameters documented at http://supercluster.org/documentation/maui/parameters.html
# use the showconfig command to display current maui settings

RMPOLLINTERVAL 00:01:00

SERVERPORT 42559
SERVERMODE NORMAL

LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3

#DEFAULTDOMAIN

# Priority Weights

QUEUETIMEWEIGHT 2
XFACTORWEIGHT 1000
RESOURCEWEIGHT 800

# FairShare

FSPOLICY OFF
FSDEPTH 7
FSINTERVAL 86400
FSDECAY 0.80

# Policies

BACKFILLPOLICY ON
BACKFILLTYPE FIRSTFIT
ALLOCATIONPOLICY MINRESOURCE
RESERVATIONPOLICY CURRENTHIGHEST

MAXJOBPERUSERPOLICY OFF
MAXJOBPERUSERCOUNT 8

MAXPROCPERUSERPOLICY OFF
MAXPROCPERUSERCOUNT 256

MAXPROCSECONDPERUSERPOLICY OFF
MAXPROCSECONDPERUSERCOUNT 36864000

MAXJOBQUEUEDPERUSERPOLICY OFF
MAXJOBQUEUEDPERUSERCOUNT 2

MAXPROCPERGROUPPOLICY OFF
SMAXPROCPERGROUPCOUNT 128
MAXPROCPERGROUPCOUNT 160

NODEACCESSPOLICY SHARED
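
To confirm what the file on disk says for a given setting (for instance NODEACCESSPOLICY) without a running scheduler, a small awk lookup can help. This is a sketch under assumptions: maui_param is a hypothetical helper name, and it expects one "NAME value" pair per line. The showconfig command remains the way to see the settings maui is actually using.

```shell
#!/bin/sh
# maui_param FILE NAME: print the value set for parameter NAME in a
# maui.cfg-style file. Hypothetical helper, not part of maui.
# Assumes one "NAME value..." pair per line; commented-out parameters
# (such as "#DEFAULTDOMAIN") never match because the "#" is part of
# their first field.
maui_param() {
    awk -v name="$2" '
        $1 == name {
            $1 = ""                      # drop the parameter name
            sub(/^ +/, "")               # trim the separator left behind
            print
        }
    ' "$1"
}
```

For example, maui_param maui.cfg NODEACCESSPOLICY should print SHARED for the configuration listed here.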

Starting and Stopping the Services

The daemons are started automatically at boot time by the /etc/init.d/pbs script. The mom and the scheduler don't need any special command line options (see the pbs_mom and pbs_sched man pages). The most relevant option for the server (pbs_server) is the -t flag, which specifies what happens to jobs that were running when the server shut down. The default value is warm, which means that all rerunnable jobs that were running when the server went down are requeued. The current setting is to start up hot: all jobs are requeued except non-rerunnable jobs that were executing, and any rerunnable job that was executing when the server went down is run immediately (see the man page and the /etc/init.d/pbs script for details). To stop the daemons, a simple TERM signal should work (that's what the script does).

-- Main.RicardoBaratto

Setting up the batch cpusets

See the IRIX Admin: Resource Administration book chapter on cpusets, available from: http://techpubs.sgi.com/library/dynaweb_bin/ebt-bin/0650/nph-infosrch.cgi/infosrchtpl/SGI_Admin/IA_Resource/@InfoSearch__BookTextView/4458

We have an open boot cpuset with 4 processors (see man boot_cpuset) defined in

/etc/config/boot_cpuset.config

MEMORY_LOCAL
MEMORY_MANDATORY

CPU 0
CPU 1
CPU 2
CPU 3

and enabled with chkconfig:

# chkconfig -f boot_cpuset on

An exclusive Batch cpuset is defined in

/etc/config/batch_cpuset.config

This is the batch_cpuset.config file:

EXCLUSIVE
MEMORY_LOCAL
MEMORY_MANDATORY

CPU 4
CPU 5
CPU 6
CPU 7
CPU 8
CPU 9
CPU 10
CPU 11
CPU 12
CPU 13
CPU 14
CPU 15
CPU 16
CPU 17
CPU 18
CPU 19
CPU 20
CPU 21
CPU 22
CPU 23

The permissions are set up so only root can execute on the cpuset (PBS runs as root).

# ls -l /etc/config/batch_cpuset.config
-rwx------       1 root   sys                    268 Mar 13 09:36 batch_cpuset.config

The boot cpuset is configured at boot time, and init is attached to it. All system processes will therefore be restricted to the first 4 cpus in the machine.
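
Because the batch cpuset is exclusive, it is worth checking that no cpu appears in both config files before rebooting. The following is a minimal sketch, assuming one "CPU n" entry per line (no ranges) and a first file that lists at least one cpu. cpuset_overlap is a hypothetical helper name, not an IRIX command.

```shell
#!/bin/sh
# cpuset_overlap FILE1 FILE2: print the CPU numbers that are listed in
# both cpuset config files, one per line. Prints nothing when the two
# sets are disjoint. Hypothetical helper for illustration only.
cpuset_overlap() {
    awk '
        $1 != "CPU" { next }                 # skip flags such as EXCLUSIVE
        FNR == NR   { seen[$2] = 1; next }   # first file: remember its CPUs
        $2 in seen  { print $2 }             # second file: report duplicates
    ' "$1" "$2"
}
```

Running cpuset_overlap /etc/config/boot_cpuset.config /etc/config/batch_cpuset.config should produce no output for the files shown above.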

To use the other 20 cpus, we need to start the batch cpuset and attach the PBS system to it. I've found that maui must also be attached to the batch cpuset for backfilling to work reliably. This is done by the /etc/init.d/pbs script:

#!/bin/sh
##
## Start/stop pbs services
##
IS_ON=/sbin/chkconfig
PBS_SERVER=pbs_server
# (from man pbs_server)
# When pbs server is restarted after a crash, all jobs are requeued 
# except non-rerunnable jobs that were executing. Any rerunnable job 
# which was executing when the server went down will be run immediately.
# After those jobs are restarted, then normal scheduling takes place for
# all remaining queued jobs. If  a job cannot be restarted immediately
# the  server  will attempt  to restart it periodically for upto 5 minutes.
# (check man page for details)
PBS_SERVER_OPTIONS="-t hot"
PBS_MOM=pbs_mom
PBS_SCHED=pbs_sched
PBS_QMGR=qmgr
PBS_SBINDIR=/usr/local/sbin
PBS_BINDIR=/usr/local/bin
CONFIG_FILE=/usr/local/spool/pbs/server_priv/server_conf

MAUIBINDIR=/usr/local/bin
MAUI=maui
MAUIUSER=batch
SU=/sbin/su

if $IS_ON verbose || test -t 1; then
    ECHO=echo
    VERBOSE=-v
else # quiet startup and shutdown
    ECHO=:
    VERBOSE=
fi

case "$1" in
start)
    if $IS_ON pbs ; then
        $ECHO "PBS Services: "

        $ECHO "Starting Batch cpuset: "
        cpuset -q Batch -c -f /etc/config/batch_cpuset.config

        $ECHO -n "PBS mom: $PBS_MOM"
        cpuset -q Batch -A $PBS_SBINDIR/$PBS_MOM
        $ECHO "."

        # Use maui instead of pbs scheduler
        # $ECHO -n "PBS scheduler: $PBS_SCHED"
        # $PBS_SBINDIR/$PBS_SCHED
        # $ECHO "."

        $ECHO -n "PBS server: $PBS_SERVER"
        cpuset -q Batch -A $PBS_SBINDIR/$PBS_SERVER $PBS_SERVER_OPTIONS
        $PBS_BINDIR/$PBS_QMGR < $CONFIG_FILE > /dev/null 2>&1

        $ECHO -n "maui scheduler: $MAUI"
        cpuset -q Batch -A $SU $MAUIUSER -c "$MAUIBINDIR/$MAUI"
        $ECHO "."
    fi
    ;;
stop)
    killall $PBS_MOM
    killall $MAUI
    killall $PBS_SERVER
    cpuset -q Batch -m
    cpuset -q Batch -d
    ;;
*)
    $ECHO "usage: $0 {start|stop}"
esac

PBS was set up with no scheduler of its own; maui handles scheduling. I set up the queues and nodes per the OpenPBS docs:

#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.mem = 100mb
set queue default resources_default.ncpus = 1
set queue default resources_default.walltime = 00:00:00
set queue default resources_available.mem = 16gb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = False
set server acl_host_enable = True
set server acl_hosts = *.hpcf.upr.edu
set server managers = batch@boreas.hpcf.upr.edu
set server operators = batch@boreas.hpcf.upr.edu
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600

The node can be created as follows:

create node boreas.hpcf.upr.edu np=20
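
If the nodes file already exists, the matching create node commands can be generated from it rather than typed by hand. This is a sketch only: nodes_to_qmgr is a hypothetical helper name, it carries over just the node name and the np=# field, and it does not handle the :ts suffix or node properties.

```shell
#!/bin/sh
# nodes_to_qmgr FILE: turn each line of a PBS nodes file into a qmgr
# "create node" command on standard output.
# Hypothetical helper for illustration; only the node name and the
# np=# field are carried over, other fields are ignored.
nodes_to_qmgr() {
    awk '
        NF == 0 { next }                 # skip blank lines
        {
            cmd = "create node " $1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=[0-9]+$/) cmd = cmd " " $i
            print cmd
        }
    ' "$1"
}
```

Review the output first; since qmgr reads commands from standard input, it can then be piped into qmgr on the server host.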

-- Main.HumbertoOrtiz - 13 Mar 2001

PBS configuration on manutara/ehecatl

The PBS installation on manutara/ehecatl consists of two nodes: manutara (4 cpus) and ehecatl (8 cpus). Manutara is the master node of the cluster and runs the server and the maui scheduler. The nodes file on manutara, /usr/local/spool/pbs/server_priv/nodes, is:

ehecatl.uprm.edu np=8
manutara.uprm.edu np=4

There are two cpusets on manutara: the batch cpuset for jobs assigned by PBS, and the boot cpuset for system and interactive jobs.

The batch cpuset is defined in /etc/config/batch_cpuset.config and consists of 4 cpus, all running at the same speed (250 MHz):

EXCLUSIVE
MEMORY_LOCAL
MEMORY_MANDATORY

CPU 2
CPU 3
CPU 4
CPU 5

The boot cpuset consists of the remaining 4 cpus and is defined in /etc/config/boot_cpuset.config:

MEMORY_LOCAL
MEMORY_MANDATORY

CPU 0
CPU 1
CPU 6
CPU 7

All three daemons run on manutara: pbs_server, pbs_mom, and maui in place of pbs_sched. The PBS script /etc/init.d/pbs on manutara therefore attaches all of them to the batch cpuset when it starts PBS, and detaches them when it stops. It looks like the pbs script on boreas.

The ehecatl node doesn't have any cpusets, and the pbs script on ehecatl, /etc/init.d/pbs, serves only to start or stop mom:

#!/bin/sh
##
## Start/stop pbs services
##
IS_ON=/sbin/chkconfig
# (from man pbs_server)
# (check man page for details)
PBS_MOM=pbs_mom
PBS_SBINDIR=/usr/local/sbin

if $IS_ON verbose || test -t 1; then
    ECHO=echo
    VERBOSE=-v
else # quiet startup and shutdown
    ECHO=:
    VERBOSE=
fi

case "$1" in
start)
    if $IS_ON pbs ; then
        $ECHO "PBS Services: "
        $ECHO -n "PBS mom: $PBS_MOM"
        $PBS_SBINDIR/$PBS_MOM
        $ECHO "."
    fi
    ;;
stop)
    killall $PBS_MOM
    ;;
*)
    $ECHO "usage: $0 {start|stop}"
esac

Server configuration on manutara and ehecatl:

#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.mem = 150mb
set queue default resources_default.ncpus = 1
set queue default resources_default.walltime = 00:00:00
set queue default resources_available.mem = 2gb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = False
set server max_user_run = 8
set server acl_host_enable = True
set server acl_hosts = *.uprm.edu
set server managers = batch@manutara.uprm.edu
set server operators = batch@manutara.uprm.edu
set server default_queue = default
set server log_events = 511
set server mail_from = root@hpcf.upr.edu
set server query_other_jobs = True
set server scheduler_iteration = 600

Mom configuration on manutara:

$logevent 0x1ff
$clienthost manutara.uprm.edu
$restricted *.uprm.edu
$usecp ehecatl.uprm.edu:/usr/people /usr/people
$usecp ehecatl.uprm.edu:/disk4 /disk4

Mom configuration on ehecatl:

$logevent 0x1ff
$clienthost manutara.uprm.edu
$restricted *.uprm.edu
$usecp manutara.uprm.edu:/usr/people /usr/people
$usecp manutara.uprm.edu:/disk4 /disk4

NFS configuration on manutara/ehecatl

Manutara is the master node and exports two NFS filesystems to its client, ehecatl: /usr/people and /disk4. Both of them contain users' home directories. The exported filesystems are described in the /etc/exports file on manutara:

/usr/people -rw,access=ehecatl.uprm.edu
/disk4 -rw,access=ehecatl.uprm.edu

After editing the /etc/exports file, execute the exportfs command:

exportfs -a

The /etc/fstab file on a client has to list the filesystems exported by the master node. This is the /etc/fstab file on ehecatl:

/dev/root / xfs rw,raw=/dev/rroot 0 0
manutara.uprm.edu:/usr/people /usr/people nfs rw 0 1
manutara.uprm.edu:/disk4 /disk4 nfs rw 0 1

After editing the /etc/fstab file, execute the mount command:

mount -t nfs -a

The RPC mountd services have to be uncommented in the /etc/inetd.conf files on both manutara and ehecatl:

# RPC-based services
# These use the portmapper instead of /etc/services.
#
# we only support mountd versions 1 and 3 
mountd/1,3       stream  rpc/tcp wait/lc         root    /usr/etc/rpc.mountd      mountd
mountd/1,3      dgram   rpc/udp wait/lc  root    /usr/etc/rpc.mountd      mountd
sgi_mountd/1 stream rpc/tcp wait/lc      root    /usr/etc/rpc.mountd      mountd
sgi_mountd/1 dgram  rpc/udp wait/lc      root    /usr/etc/rpc.mountd      mountd
#rstatd/1-3  dgram      rpc/udp wait     root    /usr/etc/rpc.rstatd      rstatd

Be sure to make inetd reread these files after changing them, using the following command:

/etc/killall -HUP inetd

NIS configuration on manutara/ehecatl

yp, ypmaster and ypserv must be enabled on manutara with the chkconfig command:

 
chkconfig -f yp on 
chkconfig -f ypmaster on 
chkconfig -f ypserv on 

yp must be enabled on ehecatl with the chkconfig command:

 
chkconfig -f yp on 

Stop and start /etc/init.d/network:

/etc/init.d/network stop
/etc/init.d/network start

-- Main.ElenaLeyderman - 06 Sep 2001

OpenPBS and maui on cafeina

I set up OpenPBS and maui on cafeina, sort of following the instructions above, with some minor and some major changes. I'm not running any cpusets on cafeina; I hope all our users will be well behaved and use the honor system for batch jobs. If not, I'll kill them (or their jobs anyway).

I installed versions OpenPBS_2_3_16.tar.gz and maui-3.0.7p8.tar.gz.

OpenPBS configuration flags were:

./configure \
--enable-docs \
--set-server-home=/usr/local/spool/pbs \
--enable-plock-daemons=7 \
--enable-syslog \
--with-scp \
--enable-nodemask \
--set-cc \
--enable-gui \
--with-tcl=/usr/freeware \
--set-default-server=boreas.hpcf.upr.edu \
--set-cflags=-64

maui runs as the batch user, and is installed in

/usr/people/batch/src/maui-3.0.7/

I created a nodes file for PBS in

/usr/local/spool/pbs/server_priv/nodes

with the line

cafeina.hpcf.upr.edu np=32

But it didn't seem to work. I then created the node from within qmgr with the line:

create node cafeina.hpcf.upr.edu np=32

and there is a server_conf file in the server_priv directory with the correct node definition, so the name of the file may have changed. Until I created the node in qmgr, maui and pbs were running, but maui did not detect any processors available.

The mom configuration file reads:

$logevent 0x1FF
$clienthost cafeina.hpcf.upr.edu
$restricted cafeina.hpcf.upr.edu

The queues were created as follows:

#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.mem = 100mb
set queue default resources_default.ncpus = 1
set queue default resources_default.walltime = 00:00:00
set queue default resources_available.mem = 24gb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = *.hpcf.upr.edu
set server managers = batch@cafeina.hpcf.upr.edu
set server operators = batch@cafeina.hpcf.upr.edu
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600

Here is the /etc/init.d/pbs script (no cpusets):

#!/bin/sh
##
## Start/stop pbs services
##
IS_ON=/sbin/chkconfig

PBS_SERVER=pbs_server
PBS_SERVER_OPTIONS="-t hot"
PBS_MOM=pbs_mom
PBS_SBINDIR=/usr/local/sbin

MAUIBINDIR=/usr/local/bin
MAUI=maui
MAUIUSER=batch
SU=/sbin/su

if $IS_ON verbose || test -t 1; then
    ECHO=echo
    VERBOSE=-v
else # quiet startup and shutdown
    ECHO=:
    VERBOSE=
fi

case "$1" in
start)
    if $IS_ON pbs ; then
        $ECHO "PBS Services: "

        $ECHO -n "PBS mom: $PBS_MOM"
        $PBS_SBINDIR/$PBS_MOM
        $ECHO "."

        $ECHO -n "PBS server: $PBS_SERVER"
        $PBS_SBINDIR/$PBS_SERVER $PBS_SERVER_OPTIONS

        $ECHO -n "maui scheduler: $MAUI"
        $SU $MAUIUSER -c "$MAUIBINDIR/$MAUI"
        $ECHO "."
    fi
    ;;
stop)
    killall $PBS_MOM
    killall $MAUI
    killall $PBS_SERVER
    ;;
*)
    $ECHO "usage: $0 {start|stop}"
esac

Remember to run

chkconfig -f pbs on

-- Main.HumbertoOrtiz - 04 Sep 2002

 
