
US ATLAS gridui0x Node Configuration

by John S. De Stefano Jr. last modified Aug 19, 2008 04:00 PM
Contributors: Dantong Yu, John Hover, Tadashi Maeno, Torre Wenaus
System configuration details and customizations for each gridui0x.usatlas server.

All of these systems rely heavily upon the Panda database servers.

 

gridui0x System Configuration Detail

Details on facility-installed operating systems and packages.

Hosts covered: gridui01 gridui02 gridui03 gridui04 gridui05 gridui06 gridui07 gridui08 gridui09 gridui10 gridui11 gridui12
Hardware Spec
gridui01 to gridui04: PowerEdge 1750, dual Intel Xeon CPU 3.06GHz, 5GB memory, three 73GB SCSI local drives, and 1GE NIC
gridui05 to gridui07: Penguin Computing Relion 2600SA, dual quad-core Intel Xeon CPU E5335 @ 2.00GHz (eight cores per host), 16GB memory, and six 750GB SATA drives (software RAID 10 provides 2TB of local storage)
gridui08 to gridui12: TBD
Base system installation
gridui01, gridui03: RHEL 3; glite-UI-3.0.x; condor-6.8.4, classads; racf-config (AFS, NFS, LDAP)
gridui02, gridui04: RHEL 4; glite-UI-3.0.x; condor-6.8.1; racf-config (AFS, NFS, LDAP)
gridui05 to gridui12: RHEL 4; local /home/sm; LDAP
Ports Required
gridui01 to gridui06: 25080, 25443 (Panda server); 25880 (Monitor)
gridui07: TBD (External AutoPilot)
gridui08: 25080, 25443 (Panda server); 25880 (Monitor)
gridui09, gridui10: TBD (Internal AutoPilot)
gridui11: TBD (Job Submission Plots)
gridui12: TBD (External AutoPilot)
Supporting system packages
gridui01 to gridui03: --
gridui04:
  MySQL-python24-1.2.2-1
  python23-2.3.5-10.el4.pyv
  python23-devel-2.3.5-10.el4.pyv
  python24-2.4.2-10.el4.pyv
  python24-crypto-2.0.1-2.el4.pyv
  python24-curl-7.12.1-1.rf
  python24-devel-2.4.2-10.el4.pyv
  python24-docs-2.4.2-10.el4.pyv
  python24-elementtree-1.2.6-5.el4.pyv
  python24-fpconst-0.7.1-1.py24
  python24-ldap-2.0.10-1.el4.pyv
  python24-optik-1.5-2.py24
  python24-soappy-0.11.6-1.py24
  python24-tkinter-2.4.2-10.el4.pyv
  python24-tools-2.4.2-10.el4.pyv
gridui05 to gridui07: gridsite-1.1.14-1.i386
gridui08 to gridui12: TBD
Critical level
gridui01: High. Loss stops all production and analysis. All data transfers and data stage-ins managed by PandaMover stop as well.
gridui02: High. Loss impacts Panda monitoring and stops development and testing. It also stops all Autopilot pilots and jobs, and therefore impacts production and analysis.
gridui03: High. Loss stops all US-managed Autopilot pilot jobs. When PandaMover is in production, all US ATLAS data transfers managed by PandaMover stop.
gridui04: Medium
gridui05 to gridui07: High
gridui08: Medium
gridui09 to gridui12: High
Subnet
gridui01 to gridui03: 185
gridui04 to gridui08: 54
gridui09, gridui10: 185
gridui11, gridui12: 54
Status
gridui01 to gridui03: Retiring
gridui04 to gridui10: On-line
gridui11, gridui12: Pending
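
The ports listed under Ports Required can be checked from a client host with a short script. The following is a minimal sketch, not a facility tool: the function names, the timeout, and the host name in the usage note are assumptions made for illustration. Only the port numbers and their labels come from this page.

```python
import socket

# Port numbers and labels as listed under "Ports Required" on this page.
REQUIRED_PORTS = {25080: "Panda server", 25443: "Panda server", 25880: "Monitor"}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=REQUIRED_PORTS, timeout=3.0):
    """Map each port to True or False depending on whether it accepts connections."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

A call such as `check_ports("gridui01.example.org")` (a hypothetical host name) returns a dictionary keyed by port number. A `False` value only shows that the connection attempt failed from the calling host; it does not distinguish a stopped service from a firewall rule.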

 

gridui0x User cron Configuration Detail

Details on US ATLAS cron jobs on each server.

cron Entries

gridui01
Database back-up:
15 0-23 * * * /data/sm/prod/panda/test/copyArchive.sh > /dev/null 2>&1
Adding files to DQ2 datasets:
0-59/5 * * * * /data/sm/prod/panda/test/add.sh > /dev/null 2>&1
Log rotate:
10 1 * * * /data/sm/prod/panda/test/logrotate.sh > /dev/null 2>&1

gridui02
Adding files to DQ2 datasets:
5,15,25,35,45,55 * * * * /data/sm/dev/panda/test/add.sh > /dev/null 2>&1
Log rotate:
2 2 * * * /data/sm/dev/panda/test/logrotate.sh > /dev/null 2>&1

gridui03
Autopilot (details TBD)

gridui04 to gridui12
TBD
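
To make the schedules in the cron entries easier to compare, the time fields can be expanded into explicit values. The helper below is a small illustrative sketch, not part of the Panda tooling: the name `expand_cron_field` is hypothetical, and it handles only the numeric forms used on this page (single values, comma lists, ranges, `*`, and `/step`), not month or weekday names.

```python
def expand_cron_field(field, low=0, high=59):
    """Expand one numeric cron field into a sorted list of the values it matches.

    low and high bound the field (0-59 for minutes, 0-23 for hours).
    """
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            # "0-59/5" means every 5th value within the range 0-59.
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)
```

Expanding the minute fields shows that the gridui01 `add.sh` entry (`0-59/5`) runs every five minutes, while the gridui02 entry (`5,15,25,35,45,55`) runs every ten minutes. The hour field `0-23` in the gridui01 back-up entry matches every hour, so that job runs at 15 minutes past each hour.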

 

gridui0x User Software Configuration Detail

Details on Apache, Panda, and other services and daemons.

Columns: Panda Server, Autopilot Process, Autopilot Web Services, Transformation Server, Monitoring Server

gridui01
Panda Server: Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.7a mod_python/3.2.8 Python/2.4.3 mod_gridsite/1.1.18 DQ2/0.2.12
  Apache: /data/sm/prod/httpd
  Python: /data/sm/prod/python
  Panda: /data/sm/prod/panda
  gridsite: /data/sm/prod/gridsite
  DQ2: /data/sm/prod/DQ2_0_2_12
Autopilot Process: --
Autopilot Web Services: (blank)
Transformation Server: Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.7a DAV/2 SVN/1.3.1
  Apache: /data/sm/prod/httpd
  Subversion: /data/sm/subv
Monitoring Server: Apache: ~sm/mon/httpd

gridui02
Panda Server: Apache/2.0.58 mod_ssl/2.0.58 OpenSSL/0.9.7a mod_python/3.2.8 Python/2.4.3 mod_gridsite/1.2.3 DQ2/0.2.12
  Apache: /data/sm/prod/httpd
  Python: /data/sm/prod/python
  Panda: /data/sm/prod/panda
  gridsite: /data/sm/prod/gridsite
  DQ2: /data/sm/prod/DQ2_0_2_12
Autopilot Process: --
Autopilot Web Services: Apache: ~sm/mon/httpd (note: it is a monitoring server)
Transformation Server: TBD
Monitoring Server: Apache: ~sm/mon/httpd (note: it is the primary Panda monitoring server)

gridui03
Panda Server: --
Autopilot Process:
  Python:
    /usatlas/u/sm/autopilot/pilotScheduler.py
    /usatlas/u/wenaus/work/autopilot/pilotScheduler.py
  Bash:
    /usatlas/u/sm/autopilot/pilotCron.sh
    /usatlas/u/thor/autopilot/autopilot/pilotCron.sh
    /usatlas/u/wenaus/work/autopilot/pilotCron.sh
Autopilot Web Services: (blank)
Transformation Server: --
Monitoring Server: Apache: /data/sm/mon/httpd

gridui04 to gridui12
Panda Server: TBD
Autopilot Process: --
Autopilot Web Services: (blank)
Transformation Server: TBD
Monitoring Server: TBD

 

Software Definitions

Detailed definitions of processes and servers.

Autopilot process: A pilot scheduler that submits pilot jobs with Condor-G from a number of submit hosts, including gridui03 and hosts at Madison, TRIUMF, and Lyon. Autopilot manages databases containing queue configuration information and pilot metadata, including job status. Loss of gridui03 means loss of the US Autopilot-managed pilots. At BNL, Autopilot is currently used for PandaMover and not for production and analysis pilots. It does not provide external services. Autopilot depends heavily on the database servers, the local batch queue, and the OSG gatekeepers.
Autopilot web service: A specialized Panda monitoring server. It serves the pilot code, fetched over HTTP, for all Autopilot pilots and jobs when pilots start and finish.
Monitoring server: Monitors the running jobs, provides a web interface for the Panda server, and serves the pilot code fetched over HTTP. Uses the HTTP and HTTPS protocols.
Panda server: The main Panda component, which provides a task queue that centrally manages all job information. The Panda server receives jobs from clients into the task queue through an HTTP interface. The Autopilot retrieves jobs and updates their status through an HTTP interface. Uses the HTTP and HTTPS protocols.
Transformation server: Transformations are Python scripts for running applications such as Athena. The transformation server manages a cache repository that contains various transformations. The Autopilot retrieves transformations from the server through an HTTP interface. Uses the HTTP and HTTPS protocols.
Logging server: The Panda logger uses Python's logging module with an adapted version of its HTTP handler, so that logs issued from Python (the Panda services) go via HTTP to Apache (the monitor instances), where they are fed into a log table in MySQL. It has nothing to do with either the pilot logs or the application logs. gridui05, gridui06, and gridui07 all handle logger HTTP messages; the pandamon address is used as the logger HTTP address.
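
The logging arrangement described above can be approximated with the standard library alone. The sketch below uses the stock `logging.handlers.HTTPHandler` rather than Panda's adapted handler, which is not reproduced on this page. The function name, the logger name, and the host and URL path in the usage note are all hypothetical.

```python
import logging
import logging.handlers


def build_http_logger(name, host, url, method="POST"):
    """Return a logger whose records are sent to the web server at host + url.

    host may include a port ("server:port"). The stock HTTPHandler sends each
    record's attributes as URL-encoded form fields; Panda's adapted handler
    may differ.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.HTTPHandler(host, url, method=method)
    logger.addHandler(handler)
    # Keep records from also reaching the root logger's handlers.
    logger.propagate = False
    return logger
```

A service would call, for example, `build_http_logger("panda.sketch", "pandamon.example.org:25880", "/log")` and then `logger.info("...")`. The receiving side, which writes each message into the MySQL log table, runs under Apache on the monitor instances and is outside the scope of this sketch.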

 

Panda HTTP/HTTPS Server Codes

Detailed definitions of server status and error codes.

200: Session succeeded.
401: File not found.
403: Permission denied.
Time out: The Panda server is busy.
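
A client script can turn these codes into the messages listed above. The following is a minimal sketch that follows the table exactly as written; the names `PANDA_STATUS` and `describe_status` are hypothetical. In standard HTTP, 401 means Unauthorized and 404 means Not Found, so the sketch reflects this page's table rather than the standard meanings.

```python
# Messages exactly as listed in the table on this page.
PANDA_STATUS = {
    200: "Session succeeded.",
    401: "File not found.",
    403: "Permission denied.",
}


def describe_status(code=None, timed_out=False):
    """Return the documented message for a status code or a timeout."""
    if timed_out:
        return "The Panda server is busy."
    return PANDA_STATUS.get(code, "Unlisted status code: %s" % code)
```

For example, `describe_status(403)` yields the "Permission denied." message, and `describe_status(timed_out=True)` yields the busy-server message for a request that received no response in time.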

 

OSG Software Status

Details to be added.

 

Grid Environment Configuration

To be added: what login setups should be run to configure for grid use.
