US ATLAS gridui0x Node Configuration
All of these systems rely heavily upon the Panda database servers.
gridui0x System Configuration Detail
Details on facility-installed operating systems and packages.
Item | gridui01 | gridui02 | gridui03 | gridui04 | gridui05 | gridui06 | gridui07 | gridui08 | gridui09 | gridui10 | gridui11 | gridui12 |
Hardware Spec | PowerEdge 1750, Dual Intel Xeon CPU 3.06GHz, 5GB memory, three 73GB SCSI local drives, and 1GE NIC | PowerEdge 1750, Dual Intel Xeon CPU 3.06GHz, 5GB memory, three 73GB SCSI local drives, and 1GE NIC | PowerEdge 1750, Dual Intel Xeon CPU 3.06GHz, 5GB memory, three 73GB SCSI local drives, and 1GE NIC | PowerEdge 1750, Dual Intel Xeon CPU 3.06GHz, 5GB memory, three 73GB SCSI local drives, and 1GE NIC | Penguin Computing Relion 2600SA, Dual Quad-Core Intel Xeon CPU E5335 @ 2.00GHz (eight cores per host), 16GB memory, and six 750GB SATA drives (software RAID 10 provides about 2TB of local storage) | Penguin Computing Relion 2600SA, Dual Quad-Core Intel Xeon CPU E5335 @ 2.00GHz (eight cores per host), 16GB memory, and six 750GB SATA drives (software RAID 10 provides about 2TB of local storage) | Penguin Computing Relion 2600SA, Dual Quad-Core Intel Xeon CPU E5335 @ 2.00GHz (eight cores per host), 16GB memory, and six 750GB SATA drives (software RAID 10 provides about 2TB of local storage) | TBD | TBD | TBD | TBD | TBD |
Base system installation | RHEL 3, glite-UI-3.0.x, condor-6.8.4, classads, racf-config (AFS, NFS, LDAP) | RHEL 4, glite-UI-3.0.x, condor-6.8.1, racf-config (AFS, NFS, LDAP) | RHEL 3, glite-UI-3.0.x, condor-6.8.4, classads, racf-config (AFS, NFS, LDAP) | RHEL 4, glite-UI-3.0.x, condor-6.8.1, racf-config (AFS, NFS, LDAP) | RHEL 4, local /home/sm, LDAP | RHEL 4, local /home/sm, LDAP | RHEL 4, local /home/sm, LDAP | RHEL 4, local /home/sm, LDAP | RHEL 4, local /home/sm, LDAP | RHEL 4, local /home/sm, LDAP | RHEL 4, local /home/sm, LDAP | RHEL 4, local /home/sm, LDAP |
Ports Required | 25080, 25443 (Panda server); 25880 (Monitor) | 25080, 25443 (Panda server); 25880 (Monitor) | 25080, 25443 (Panda server); 25880 (Monitor) | 25080, 25443 (Panda server); 25880 (Monitor) | 25080, 25443 (Panda server); 25880 (Monitor) | 25080, 25443 (Panda server); 25880 (Monitor) | TBD (External AutoPilot) | 25080, 25443 (Panda server); 25880 (Monitor) | TBD (Internal AutoPilot) | TBD (Internal AutoPilot) | TBD (Job Submission Plots) | TBD (External AutoPilot) |
Supporting system packages | -- | -- | -- | MySQL-python24-1.2.2-1 python23-2.3.5-10.el4.pyv python23-devel-2.3.5-10.el4.pyv python24-2.4.2-10.el4.pyv python24-crypto-2.0.1-2.el4.pyv python24-curl-7.12.1-1.rf python24-devel-2.4.2-10.el4.pyv python24-docs-2.4.2-10.el4.pyv python24-elementtree-1.2.6-5.el4.pyv python24-fpconst-0.7.1-1.py24 python24-ldap-2.0.10-1.el4.pyv python24-optik-1.5-2.py24 python24-soappy-0.11.6-1.py24 python24-tkinter-2.4.2-10.el4.pyv python24-tools-2.4.2-10.el4.pyv | gridsite-1.1.14-1.i386 | gridsite-1.1.14-1.i386 | gridsite-1.1.14-1.i386 | TBD | TBD | TBD | TBD | TBD |
Critical level | High: Loss stops all production and analysis. All data transfers and data stage-ins managed by PandaMover also stop. | High: Loss impacts Panda monitoring and stops development and testing. It also stops all Autopilot pilots/jobs, which impacts production and analysis. | High: Loss stops all US-managed Autopilot pilot jobs. When PandaMover is in production, all US ATLAS data transfers managed by PandaMover will also stop. | Medium | High | High | High | Medium | High | High | High | High |
Subnet | 185 | 185 | 185 | 54 | 54 | 54 | 54 | 54 | 185 | 185 | 54 | 54 |
Status | Retiring | Retiring | Retiring | On-line | On-line | On-line | On-line | On-line | On-line | On-line | Pending | Pending |
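To confirm from a client host that the ports listed under Ports Required are reachable, a plain TCP connection attempt is usually enough. The sketch below uses only Python's standard `socket` module and covers only the ports actually listed (the AutoPilot and plotting hosts are still TBD). The function names are hypothetical, and a successful connect shows only that something is listening on the port, not that the Panda service behind it is healthy.

```python
import socket

# Ports from the "Ports Required" row of the table above.
PANDA_PORTS = {
    25080: "Panda server",
    25443: "Panda server",
    25880: "Monitor",
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False


def check_host(host, ports=PANDA_PORTS, timeout=3.0):
    """Map each port to (description, reachable) for one host."""
    return {port: (desc, port_open(host, port, timeout)) for port, desc in ports.items()}


if __name__ == "__main__":
    import sys

    # Usage: python check_ports.py HOST [HOST ...]
    for host in sys.argv[1:]:
        for port, (desc, ok) in check_host(host).items():
            print(f"{host}:{port:<6} {desc:<14} {'open' if ok else 'CLOSED'}")
```

For example, `python check_ports.py gridui05.example.org` (a hypothetical fully qualified name) prints one line per port. A short timeout keeps the check fast when a firewall silently drops packets instead of refusing the connection.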
gridui0x User cron Configuration Detail
Details on US ATLAS cron jobs on each server.
Host | cron entries |
gridui01 | Database backup (hourly, at minute 15): 15 0-23 * * * /data/sm/prod/panda/test/copyArchive.sh > /dev/null 2>&1; Adding files to DQ2 datasets (every 5 minutes): 0-59/5 * * * * /data/sm/prod/panda/test/add.sh > /dev/null 2>&1; Log rotation (daily at 01:10): 10 1 * * * /data/sm/prod/panda/test/logrotate.sh > /dev/null 2>&1 |
gridui02 | Adding files to DQ2 datasets (every 10 minutes, starting at minute 5): 5,15,25,35,45,55 * * * * /data/sm/dev/panda/test/add.sh > /dev/null 2>&1; Log rotation (daily at 02:02): 2 2 * * * /data/sm/dev/panda/test/logrotate.sh > /dev/null 2>&1 |
gridui03 | Autopilot (details TBD) |
gridui04 | TBD |
gridui05 | TBD |
gridui06 | TBD |
gridui07 | TBD |
gridui08 | TBD |
gridui09 | TBD |
gridui10 | TBD |
gridui11 | TBD |
gridui12 | TBD |
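The schedules above use standard five-field crontab syntax (minute, hour, day of month, month, day of week). As a reading aid, the sketch below expands the minute and hour fields into explicit values. It is a hypothetical helper that handles only the forms used on this page (`*`, comma lists, `a-b` ranges and `/step` suffixes), not names such as `mon` or the full crontab(5) grammar.

```python
def expand_field(field, low, high):
    """Expand one crontab field such as "0-59/5", "5,15,25" or "*" into sorted values."""
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")  # optional "/step" suffix
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-", 1))
        else:
            start = end = int(rng)
        if not low <= start <= end <= high or step < 1:
            raise ValueError(f"bad field {field!r} for range {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_schedule(entry):
    """Return (minutes, hours) for a crontab line whose day/month/weekday fields are '*'."""
    minute, hour, dom, month, dow = entry.split()[:5]
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only handles schedules that repeat every day")
    return expand_field(minute, 0, 59), expand_field(hour, 0, 23)


if __name__ == "__main__":
    for line in [
        "15 0-23 * * * /data/sm/prod/panda/test/copyArchive.sh > /dev/null 2>&1",
        "0-59/5 * * * * /data/sm/prod/panda/test/add.sh > /dev/null 2>&1",
        "5,15,25,35,45,55 * * * * /data/sm/dev/panda/test/add.sh > /dev/null 2>&1",
    ]:
        minutes, hours = daily_schedule(line)
        # Runs per day = number of matching minutes x number of matching hours.
        print(f"{len(minutes) * len(hours):4d} runs/day  {line.split()[5]}")
```

Applied to the entries above, `0-59/5` expands to the twelve minutes 0, 5, 10, through 55, so add.sh on gridui01 runs 288 times a day, while the gridui02 entry runs it 144 times a day.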
gridui0x User Software Configuration Detail
Details on Apache, Panda, and other services and daemons.
Host | Panda Server | Autopilot Process | Autopilot Web Services | Transformation Server | Monitoring Server |
gridui01 | Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.7a mod_python/3.2.8 Python/2.4.3 mod_gridsite/1.1.18 DQ2/0.2.12; Apache: /data/sm/prod/httpd; Python: /data/sm/prod/python; Panda: /data/sm/prod/panda; gridsite: /data/sm/prod/gridsite; DQ2: /data/sm/prod/DQ2_0_2_12 | -- | Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.7a DAV/2 SVN/1.3.1; Apache: /data/sm/prod/httpd; Subversion: /data/sm/subv | Apache: ~sm/mon/httpd | |
gridui02 | Apache/2.0.58 mod_ssl/2.0.58 OpenSSL/0.9.7a mod_python/3.2.8 Python/2.4.3 mod_gridsite/1.2.3 DQ2/0.2.12; Apache: /data/sm/prod/httpd; Python: /data/sm/prod/python; Panda: /data/sm/prod/panda; gridsite: /data/sm/prod/gridsite; DQ2: /data/sm/prod/DQ2_0_2_12 | -- | Apache: ~sm/mon/httpd (note: this is a monitoring server) | TBD | Apache: ~sm/mon/httpd (note: this is the primary Panda monitoring server) |
gridui03 | -- | Python: /usatlas/u/sm/autopilot/pilotScheduler.py, /usatlas/u/wenaus/work/autopilot/pilotScheduler.py; Bash: /usatlas/u/sm/autopilot/pilotCron.sh, /usatlas/u/thor/autopilot/autopilot/pilotCron.sh, /usatlas/u/wenaus/work/autopilot/pilotCron.sh | -- | Apache: /data/sm/mon/httpd | |
gridui04 | TBD | -- | TBD | TBD | |
gridui05 | TBD | -- | TBD | TBD | |
gridui06 | TBD | -- | TBD | TBD | |
gridui07 | TBD | -- | TBD | TBD | |
gridui08 | TBD | -- | TBD | TBD | |
gridui09 | TBD | -- | TBD | TBD | |
gridui10 | TBD | -- | TBD | TBD | |
gridui11 | TBD | -- | TBD | TBD | |
gridui12 | TBD | -- | TBD | TBD |
Software Definitions
Detailed definitions of processes and servers.
Autopilot process | A pilot scheduler that submits pilot jobs through Condor-G from several submit hosts, including gridui03 and hosts at Madison, TRIUMF, and Lyon. Autopilot manages databases containing queue configuration information and pilot metadata, including job status. Losing gridui03 stops all US Autopilot-managed pilots. At BNL, Autopilot is currently used only for PandaMover, not for production and analysis pilots. It does not provide external services. Autopilot depends heavily on the database servers, the local batch queue, and the OSG gatekeepers. |
Autopilot web service | A specialized Panda monitoring server. It serves the pilot code, fetched over HTTP, to all Autopilot pilots and jobs when the pilots start and finish. |
Monitoring server | Monitors running jobs, provides a web interface to the Panda server, and serves the pilot code fetched over HTTP. Uses the HTTP and HTTPS protocols. |
Panda server | The main Panda component. It provides a task queue that centrally manages all job information. The Panda server receives jobs from clients through an HTTP interface and places them in the task queue; the Autopilot retrieves jobs and updates their status through an HTTP interface. Uses the HTTP and HTTPS protocols. |
Transformation server | Transformations are Python scripts for running applications such as Athena. The transformation server manages a cache repository of transformations, which the Autopilot retrieves through an HTTP interface. Uses the HTTP and HTTPS protocols. |
Logging Server | The Panda logger uses Python's logging module with an adapted version of its HTTP handler, so that log messages issued by the Python Panda services are sent over HTTP to Apache (the monitor instances), where they are written to a log table in MySQL. The logger is unrelated to both the pilot logs and the application logs. gridui05, gridui06, and gridui07 all handle logger HTTP messages; the pandamon address is used as the logger HTTP address. |
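The Autopilot process entry says pilots are submitted through Condor-G to OSG gatekeepers. The sketch below builds a minimal Condor-G submit description for a batch of pilots. It is not taken from pilotScheduler.py: the function name, gatekeeper string, pilot wrapper path and log directory are hypothetical placeholders, and the GT2 grid type is an assumption. The submit commands themselves (`universe = grid`, `grid_resource`, `executable`, `output`, `error`, `log`, `queue`) are standard Condor submit-file keywords, but should be checked against the manual for the installed Condor version (6.8.x here).

```python
def condor_g_submit(gatekeeper, executable, n_pilots, log_dir):
    """Return the text of a Condor-G submit description queuing n_pilots pilot jobs.

    gatekeeper is a GT2 resource string of the form "host/jobmanager-name"
    (hypothetical here); executable is the pilot wrapper script to run.
    """
    if n_pilots < 1:
        raise ValueError("n_pilots must be at least 1")
    lines = [
        "universe      = grid",
        f"grid_resource = gt2 {gatekeeper}",
        f"executable    = {executable}",
        # $(Cluster) and $(Process) are Condor macros, so each pilot gets its own files.
        f"output        = {log_dir}/pilot.$(Cluster).$(Process).out",
        f"error         = {log_dir}/pilot.$(Cluster).$(Process).err",
        f"log           = {log_dir}/pilot.$(Cluster).log",
        f"queue {n_pilots}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical gatekeeper and pilot wrapper; save the output as pilots.sub
    # and submit it with: condor_submit pilots.sub
    print(condor_g_submit("gk.example.org/jobmanager-condor",
                          "/path/to/pilotWrapper.sh", 5, "/tmp/pilots"), end="")
```

A single submit description with `queue N` creates one cluster of N pilots, which keeps the Condor-G job log for a submission cycle in one file.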
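To make the Panda server's role more concrete: clients put jobs into one central task queue, and pilots pull jobs from it and report their status back. The sketch below is a toy, in-memory model of that flow only. It is not Panda code; the class, method and status names are hypothetical, and the real server sits behind HTTP/HTTPS and, as noted at the top of this page, relies on the Panda database servers.

```python
import itertools
from collections import deque


class ToyTaskQueue:
    """Hypothetical in-memory stand-in for a central job task queue."""

    def __init__(self):
        self._ids = itertools.count(1)  # job IDs 1, 2, 3, ...
        self._pending = deque()         # IDs waiting for a pilot, oldest first
        self.jobs = {}                  # job ID -> {"spec": ..., "status": ...}

    def submit(self, spec):
        """Client side: add a job description and return its new ID."""
        job_id = next(self._ids)
        self.jobs[job_id] = {"spec": spec, "status": "pending"}
        self._pending.append(job_id)
        return job_id

    def get_job(self):
        """Pilot side: take the oldest pending job, or return None if none is waiting."""
        if not self._pending:
            return None
        job_id = self._pending.popleft()
        self.jobs[job_id]["status"] = "running"
        return job_id, self.jobs[job_id]["spec"]

    def update_status(self, job_id, status):
        """Pilot side: record progress or the final state of a job."""
        if job_id not in self.jobs:
            raise KeyError(job_id)
        self.jobs[job_id]["status"] = status
```

Keeping all job state in one place is what lets any pilot, from any site, pick up the next job and lets the monitor report on every job; the trade-off, reflected in the Critical level row, is that losing the central server stops production and analysis.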
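The Transformation server, Monitoring server and Autopilot web service entries all describe clients fetching scripts (transformations or pilot code) over HTTP. A minimal client-side sketch, assuming the script is published at `base_url` joined with its name and that a local cache directory is acceptable; the URL layout and the function name are hypothetical, not part of the Panda interface.

```python
import os
import urllib.parse
import urllib.request


def fetch_script(base_url, name, cache_dir, timeout=30.0):
    """Download base_url + name into cache_dir once; later calls reuse the cached copy.

    base_url should end with "/" so that urljoin appends name instead of replacing
    the last path component.
    """
    os.makedirs(cache_dir, exist_ok=True)
    local = os.path.join(cache_dir, os.path.basename(name))
    if os.path.exists(local):
        return local  # cache hit: no network access
    url = urllib.parse.urljoin(base_url, name)
    tmp = local + ".part"
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
        out.write(resp.read())
    os.replace(tmp, local)  # rename, so readers never see a half-written file
    return local
```

This cache never refreshes a file once it has been downloaded, which is only safe if each transformation version is published under its own name; otherwise the client would need to compare versions or timestamps before reusing a cached copy.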
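The Logging Server entry describes Python services sending log records to Apache over HTTP through an adapted version of the logging module's HTTP handler. A minimal sketch of that idea using the standard `logging.handlers.HTTPHandler`, with its documented `mapLogRecord` hook overridden so that only a few fields are posted. The subclass name, the chosen fields, the host and the `/logger` path are hypothetical; the actual Panda adaptation and the URL it posts to are not described on this page.

```python
import logging
import logging.handlers


class PandaHTTPHandler(logging.handlers.HTTPHandler):
    """Hypothetical adaptation: post only the fields a log table would need."""

    def mapLogRecord(self, record):
        # The default implementation posts the whole record.__dict__.
        return {
            "name": record.name,
            "levelname": record.levelname,
            "message": record.getMessage(),
            "created": record.created,
        }


def make_panda_logger(name, host, url):
    """Return a logger that POSTs each INFO-or-higher record to http://host/url."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(PandaHTTPHandler(host, url, method="POST"))
    return logger
```

A service would call, for example, `make_panda_logger("panda.server", "pandamon.example.org:25880", "/logger")`, where the host and path are placeholders for the pandamon address. `HTTPHandler.emit` is synchronous, so each record costs one HTTP round trip, and a failed POST is passed to `handleError` instead of raising inside the service.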
Panda HTTP/HTTPS Server Codes
Detailed definitions of server status and error codes.
200 | Request succeeded. |
404 | File not found. |
403 | Permission denied. |
Timeout | No response in time; the Panda server is busy. |
OSG Software Status
Details to be added.
Grid Environment Configuration
To be added: the login setup to run to configure the environment for grid use.