Pre-Installation
User virtualization (consistent username, UID, and GID values)
The username of the user submitting a job must be recognized on the compute host where the job runs and each user must have unique and consistent UID/GID values.
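One way to spot mismatches is to compare the `uid:gid` pair each host reports for a user. The sketch below is illustrative only: `passwd_ids` and `same_ids` are hypothetical helpers (not part of Grid Engine) that read passwd-format files, for instance the output of `getent passwd` collected from each host.

```shell
# Hypothetical helper: print "uid:gid" for a user from a passwd-format file.
# Assumes the standard layout name:pw:uid:gid:gecos:home:shell.
passwd_ids() {
    user=$1
    passwd_file=${2:-/etc/passwd}
    awk -F: -v u="$user" '
        $1 == u { print $3 ":" $4; found = 1; exit }
        END     { exit !found }
    ' "$passwd_file"
}

# Succeeds only when two passwd-format files agree on the user's IDs.
same_ids() {
    a=$(passwd_ids "$1" "$2") || return 1
    b=$(passwd_ids "$1" "$3") || return 1
    [ "$a" = "$b" ]
}
```

For example, after saving `getent passwd` from the master and from a compute host into two files, `same_ids sgeadmin master.passwd node.passwd` returns non-zero if the IDs differ or the user is missing on either side.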
Creating the sgeadmin user account
The sgeadmin account must be virtualized in the same way: the same username, UID, and GID on every host.
Home directories
Grid Engine runs jobs in the user's home directory. Every user must therefore have a home directory on every compute host, containing all the desired dot-file configurations.
Hostnames and DNS
Grid Engine relies on DNS: both forward and reverse lookups must be configured for every host.
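A quick way to test both directions is to resolve a name to an address and then resolve that address back. The following is a minimal sketch, assuming `getent` is available on the host; `check_dns` is a hypothetical helper name.

```shell
# Hypothetical helper: forward- then reverse-resolve a host with getent.
check_dns() {
    name=$1
    # Forward lookup: first address returned for the name.
    ip=$(getent hosts "$name" 2>/dev/null | awk '{ print $1; exit }')
    [ -n "$ip" ] || { echo "forward lookup failed: $name" >&2; return 1; }
    # Reverse lookup: first name returned for that address.
    rev=$(getent hosts "$ip" 2>/dev/null | awk '{ print $2; exit }')
    [ -n "$rev" ] || { echo "reverse lookup failed: $ip" >&2; return 1; }
    echo "$name -> $ip -> $rev"
}
```

Running `check_dns <hostname>` on each host for every other host in the cluster should print a line for each; any failure message points at the lookup direction that still needs fixing.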
On all hosts, edit /etc/services
Code Block |
---|
sge_qmaster 536/tcp # Sun Grid Engine queue master
sge_execd 537/tcp # Sun Grid Engine exec daemon
|
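Because the entries have to be present on every host, a small check can help before starting the installers. This is a sketch only; `has_sge_services` is a hypothetical helper that looks for the two port entries given above in a services-format file.

```shell
# Hypothetical helper: succeed only if both Grid Engine entries are present
# with the ports used in this guide (536/tcp and 537/tcp).
has_sge_services() {
    services_file=${1:-/etc/services}
    grep -Eq '^sge_qmaster[[:space:]]+536/tcp' "$services_file" &&
    grep -Eq '^sge_execd[[:space:]]+537/tcp' "$services_file"
}
```

For example, `has_sge_services || echo "edit /etc/services on $(hostname)"` flags a host that still needs the edit.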
Creating the SGE root directory and exporting it via NFS to all cluster nodes.
All compute farm members must share a common path to the SGE root, so make sure the path to the Grid Engine files is the same on the master node as on the other servers and compute elements. Use this path globally as the SGE root directory. For example:
Code Block |
---|
/opt/sge
|
On Execution Hosts, NFS mount the Grid Engine directory of the Master node, $SGE_ROOT
On Submit hosts
Insert the proper line into system or user .bashrc files.
Code Block |
---|
. /opt/sge/default/common/settings.sh
|
Application and data files
The prolog and epilog script feature of Grid Engine provides a generic mechanism for implementing a site-specific stage-in/stage-out facility. Alternatively, these steps could be embedded directly into job scripts.
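As an illustration of the second approach, the sketch below embeds stage-in and stage-out in the job script itself. All names (`stage_in`, `stage_out`, `run_job`, `result.txt`) are hypothetical, and the line-count step is only a placeholder for the real application. It assumes `TMPDIR` points at node-local scratch space and falls back to `/tmp` otherwise.

```shell
#!/bin/sh
# Sketch: stage-in, compute, stage-out inside a job script.

stage_in()  { cp "$1" "$2"/; }   # copy input to local scratch
stage_out() { cp "$1" "$2"/; }   # copy result back to shared storage

run_job() {
    input=$1
    results_dir=$2
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || return 1
    stage_in "$input" "$scratch" || { rm -rf "$scratch"; return 1; }

    # Placeholder compute step: count the lines of the staged input.
    wc -l < "$scratch/$(basename "$input")" | tr -d ' ' > "$scratch/result.txt"

    if stage_out "$scratch/result.txt" "$results_dir"; then
        status=0
    else
        status=1
    fi
    rm -rf "$scratch"            # always clean up local scratch
    return $status
}
```

A job script would end with a call such as `run_job /shared/data/input.txt /shared/results`, where both paths are examples rather than paths used by this site.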
Shared filesystem options
If you plan to install into a shared NFS filesystem, make sure the server is not exporting the filesystem with options that block the root user or remap the root UID to a non-privileged value. Grid Engine can run as a non-root user, but it needs to be started by root. There are also setuid binaries in the distribution that will break if root-squashing is enabled.
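On a Linux NFS server, root squashing is the default unless an export sets `no_root_squash`, so a rough check is to look for export lines that lack that option. The sketch below is an assumption-laden illustration: `squashed_exports` is a hypothetical helper, it only understands simple one-path-per-line `/etc/exports` entries, and it does not detect an export with several client specifications where only some of them set the option.

```shell
# Hypothetical helper: print export lines for a path that do NOT set
# no_root_squash (and therefore squash root by default on Linux).
squashed_exports() {
    export_path=$1
    exports_file=${2:-/etc/exports}
    awk -v p="$export_path" '$1 == p && $0 !~ /no_root_squash/' "$exports_file"
}
```

Running `squashed_exports /opt/sge` on the NFS server should print nothing if the SGE root is exported in a way that lets root on the clients act as root.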
Classic Spooling vs. Berkeley-DB Spooling
If you are just starting out with Grid Engine, use classic spooling. If your cluster has fewer than 20 nodes, use classic spooling. Once you have had the system up and running for a while, you'll easily be able to tell whether your usual workloads and workflows are being affected by spool performance. By that time, you'll be comfortable enough with Grid Engine that you'll have no trouble backing up your configuration and reinstalling with Berkeley DB spooling enabled.
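To see which method an existing cell uses, the spooling method can usually be read from the cell's bootstrap file. The sketch below assumes that file lives at `$SGE_ROOT/$SGE_CELL/common/bootstrap` and contains a `spooling_method` line; `spooling_method` here is a hypothetical shell helper, and the default path is built from the `/opt/sge` root and `default` cell used elsewhere on this page.

```shell
# Hypothetical helper: print the spooling method recorded in a bootstrap file.
# Assumes a line of the form "spooling_method <value>".
spooling_method() {
    bootstrap=${1:-${SGE_ROOT:-/opt/sge}/${SGE_CELL:-default}/common/bootstrap}
    awk '
        $1 == "spooling_method" { print $2; found = 1; exit }
        END                     { exit !found }
    ' "$bootstrap"
}
```

Calling `spooling_method` with no argument on the master host prints the configured value, or returns non-zero if the file has no such line.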
The automatic install scripts are not worth dealing with on small clusters
For clusters smaller than 30 nodes in size (where I already have passwordless SSH access set up) it is actually quicker to manually log into each node and invoke the "./install_execd" script by hand.
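With passwordless SSH in place, the per-node logins can be generated from a host list. The sketch below only prints the commands (a dry run) so they can be reviewed and run one at a time; `print_execd_installs` is a hypothetical helper, and logging in as root and the `/opt/sge` default are assumptions taken from the examples on this page.

```shell
# Hypothetical helper: read node names on stdin (one per line, blank lines
# and "#" comments ignored) and print one ssh command per node.
print_execd_installs() {
    sge_root=${SGE_ROOT:-/opt/sge}
    while read -r node; do
        case $node in
            ''|'#'*) continue ;;
        esac
        # -t allocates a terminal so the interactive installer can prompt.
        printf "ssh -t root@%s 'cd %s && ./install_execd'\n" "$node" "$sge_root"
    done
}
```

For example, `print_execd_installs < nodes.txt` lists the commands for every node named in a (hypothetical) `nodes.txt` file.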
Qmaster Installation
Unpacking and initial setup
Code Block |
---|
[root@host ~]# SGE_ROOT=/opt/sge; export SGE_ROOT
[root@host ~]# cd ${SGE_ROOT}
[root@host ~]# gzip -dc sge-6.0u8-common.tar.gz | tar xvpf -
[root@host ~]# gzip -dc sge-6.0u8-bin-lx24-x86.tar.gz | tar xvpf -
[root@host ~]# gzip -dc sge-6.0u8-bin-lx24-amd64.tar.gz | tar xvpf -
[root@host ~]# util/setfileperm.sh $SGE_ROOT
|
Create a db spool dir and start the installation on the master host
Code Block |
---|
[root@host ~]# export SGE_ROOT=/opt/sge
[root@host ~]# mkdir -p /var/spool/sge
[root@host ~]# chown -R sgeadmin /var/spool/sge
[root@host ~]# cd $SGE_ROOT
[root@host ~]# ./install_qmaster
|
Accept defaults except
- User name to install as sgeadmin
- Grid Engine group id range of 20000-20200
- <administrator_mail> set to sgeadmin@example.com
- Adding admin and submit hosts set to server1 server2 server3
- Do you want to add your shadow host(s) now? (y/n) [y] >> n
Execution Host Installation
Add execution hosts as administrative hosts
All execution hosts must be administrative hosts during their installation. You may verify your administrative hosts with the command
Code Block |
---|
[root@host ~]# qconf -sh
|
and you may add new administrative hosts on the master host with the command
Code Block |
---|
[root@host ~]# qconf -ah <hostname>
|
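The two commands can be combined so that a host is only added when it is not already listed. This is a minimal sketch using only `qconf -sh` and `qconf -ah` as shown above; `is_admin_host` and `ensure_admin_host` are hypothetical helper names, and the host name must be given exactly as `qconf -sh` prints it.

```shell
# Hypothetical helper: succeed if the host appears in the admin host list.
is_admin_host() {
    qconf -sh | grep -Fqx "$1"
}

# Hypothetical helper: add the host as an admin host only if it is missing.
ensure_admin_host() {
    if is_admin_host "$1"; then
        echo "$1 is already an administrative host"
    else
        qconf -ah "$1"
    fi
}
```

Run `ensure_admin_host <hostname>` on the master host before installing the execution daemon on that node.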
Create spooling directories on each execution host:
Code Block |
---|
[root@host ~]# mkdir -p /var/spool/sge
[root@host ~]# chown sgeadmin /var/spool/sge
|
Run the installer script in auto-install mode
The install_execd script accepts options that install the exec daemon with default settings, without interactive input, and optionally without creating the default queue.
Code Block |
---|
[root@host ~]# export SGE_ROOT=/opt/sge
[root@host ~]# cd ${SGE_ROOT}
[root@host ~]# ./install_execd -auto -fast [-noqueue]
|
Run the installer script in interactive mode
Code Block |
---|
[root@host ~]# export SGE_ROOT=/opt/sge
[root@host ~]# cd ${SGE_ROOT}
[root@host ~]# ./install_execd
|
Accept defaults except
- Do you want to configure a local spool directory for this host (y/n) [n] >> y
- Enter path /var/spool/sge
When the install script is done, Grid Engine should be installed and running. Run
Code Block |
---|
[root@host ~]# qstat -f
|
and you should see an entry for all.q@hostname. If so, everything is set up.
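The same check can be scripted. The sketch below assumes that `qstat -f` prints the queue instance name (for example `all.q@node1`) as the first field of a line and that the name matches what `hostname` returns on the node; `has_queue_instance` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if qstat -f lists all.q@<host>.
# Defaults to the local host name when no argument is given.
has_queue_instance() {
    host=${1:-$(hostname)}
    qstat -f | awk -v q="all.q@$host" '
        $1 == q { found = 1 }
        END     { exit !found }
    '
}
```

For example, `has_queue_instance node1 && echo "node1 is registered"`. If the cluster reports fully qualified names, pass the name in that form.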
Troubleshooting
Reinstallation
BEFORE you reinstall the server for any reason, you MUST stop the execution host daemons. After the reinstall, you must also reinstall the execution hosts.
Grid Engine messages
Grid Engine messages can be found at:
/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)
After startup the daemons log their messages in their spool directories.
Qmaster: /var/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages
Queue error states
If a queue enters an error state, the queue must be reset before further jobs will be scheduled on that queue. To reset a queue, become sgeadmin on the qmaster and run the command
Code Block |
---|
[root@host ~]# qmod -cq <queuename>
|
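When several queues are affected, the reset commands can be generated from `qstat -f`. The sketch below is a dry run that only prints the commands. It assumes the queue lines of `qstat -f` carry the state flags in the sixth column and that an error state shows up as an `E` there; `print_queue_resets` is a hypothetical helper, so check its output against your own `qstat -f` listing before running anything.

```shell
# Hypothetical helper: print a "qmod -cq" command for every queue instance
# whose state column (assumed to be field 6) contains E.
print_queue_resets() {
    qstat -f | awk '
        $1 ~ /@/ && NF >= 6 && $6 ~ /E/ { print "qmod -cq " $1 }
    '
}
```

Investigate why the queue went into the error state (the messages files listed earlier are a good start) before clearing it, otherwise the next job is likely to put it back.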
For NFS-mounted spool dirs, ensure a spool dir exists and permissions are set
Code Block |
---|
[root@host ~]# mkdir <SGE_CELL>/spool/<HOSTNAME>
[root@host ~]# chown sgeadmin.root <SGE_CELL>/spool/<HOSTNAME>/
|
Resources
- Department-Based Resource Allocation within Grid Engine: http://bioteam.net/dag/sge6-funct-share-dept.html
- File-Staging approaches in Grid Engine: http://gridengine.sunsource.net/howto/filestaging/
- Delegated File Staging with GridEngine: http://gridengine.sunsource.net/howto/filestaging/filestaging6.html
- Sun's Compute Server technology (https://computeserver.developer.network.com/) aims to enable Java developers to easily and efficiently use the Sun Grid Compute Utility as a platform for the distributed execution of parallel computations.
- GridEngine Documents and Binaries: http://gridengine.sunsource.net/servlets/ProjectDocumentList
- DRMAA Java API: http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/javadocs/index.html?content-type=text/html