CRS Software Overview
The new CRS Batch software was designed to allow users a transparent transition from old CRS batch software, whenever possible. In many instances, functionality and flexibility of the new software has been enhanced to address users' requests. An example is the ability of the new CRS batch software to leverage the use of commodity hardware (Intel-based servers) and open-source Linux OS to simplify the interface to the RCF/ACF mass storage system (HPSS).
The new CRS system has a number of enhancements and new features. It's design goals were driven by the requests from STAR and PHENIX documented here, with comments regarding the feaibility of each goal and how it will be implemented
Key features include the following:
- Asynchronous staging of files from HPSS
- Multiple queues with configurable shares of the farm
- Multiple executables with different inputs/outputs per job
- Staging algorithm that handles staging jobs with multiple inputs efficiently
- Callback execution on job completion or error
- Modular design so new input/output/staging methods can be easily supported
Top Down Overview
CRS is essentially a input-output file abstraction layer on top of a batch system. It takes a job-description-file that has in it the inputs, program(s) to run over these inputs, and expected output files from these programs, and will take care of moving the data in and out of the various supported subsystems.
The supported subsystems now include the following
- HPSS (staged via David Yu's batch system, fetched via PFTP)
- DCACHE (I/O via dccp)
- NFS (via cp command)
- LOCAL (via open(), read(), write() calls)
Input files are fetched directly from wherever they are specified, but in the case of HPSS the job issues stage commands before running on the farm, so the farm is not blocked while the tapes are mounted and the files are staged.
Output files are written to the locations specified in the job files and if the directory does not exist the job will create it/them.
The jobs are submitted to the CRS software via a command line interface on the rcrsuserX machines. On these machines is a daemon that runs and submits stage-requests to HPSS and monitors them, then submits jobs to Condor. Jobs progress through various states as documented here.
There are two daemons that need to run on the submit machine, both can be controlled via the crsctl command. One is the submitd, which handles submission of jobs to HPSS Staging and Condor, and the other is a simple logserver, that reads log messages from each remote job and puts them into a file per job. The system stores jobs in a MySQL database, and requires this database to be up and running in order to function.
Jobs, when created, enter the CREATED state, and are not yet considered runnable. You must submit jobs to the submit-daemon in order for them to run. There is a shortcut, the crs_insert command, which creates and submits a group of jobs at once.
Job Scheduling and Queues
Queues are no longer managed by condor configuration, they are internal to CRS and manageable by the crs_qedit command. Take the default setup for an example: there are three queues, low, default, and high shown here.
The priority number is a weight to assign to each queue, with lower priorities being worse (typical range is -6 < p < 6). The weights affect what percentage of the farm jobs in each queue will run on when the system is in a steady-state with jobs pending in all queues. If all queues are not full of waiting jobs, the extra slots will be distributed to each queue in priority-order. For example, if default and high each have 1000 jobs waiting to fill a farm of 100 slots, and low has 10 jobs, low's share should be 25 jobs according to the table above, and the extra 15 slots will be made available first to high, then if it is exhausted before using the 15 slots, the balance is made available to default.
As jobs can have multiple input files, and each input file can be requested by more than one job, the scheduling of jobs is not simply FIFO (first-in-first-out). The submit-deamon will reorder requests so as to submit the most advantageous stage requests. For example, if three jobs want file B, and only one wants file A, file B will be requested first, regardless of order of submission. Additionally, jobs that require > 1 input and have some of their files partially staged will get preferential treatment as to not waste the time already spent staging some of their inputs.
As mentioned above, jobs can have multiple executables that run sequentially. If more than one executable is specified, you can specify input and output required/produced by each job. This is for accounting purposes and to ensure that each sub-executable gets the appropriate environment provided to it.
Although the executables can be passed arguments and extra environment variables, CRS does provide some environment variables like the names of input/output files and standard output/error paths by default, the details of which are here.