Condor VM Universe on Grid Testbed

by Xin Zhao last modified Dec 05, 2011 09:36 AM
This document describes the setup and configuration of the grid condor testbed, and how to run a VM universe job on it.

The setup and configuration of this testbed are managed using Puppet.

  • Right now there are 5 nodes in this testbed:
    • gridreserve09 is the central manager.
    • gridreserve06/08/11 are the execution nodes. Because of disk space limitations, gridreserve11 is the only node that can run VM universe jobs. In total there are 12 job slots, 4 of which can run VM universe jobs.
    • gridreserve07 is the submit host, running the condor schedd.
  • The condor version used in this test is 7.7.1, the latest release at the time of writing, which fixes some bugs in the VM universe.
  • Xen is the hypervisor available on the testbed.
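
The slot counts above can be checked against the live pool with condor_status. The following is a minimal sketch, not part of the testbed's Puppet setup. It assumes the VM-capable slots advertise the standard HasVM and VM_Type machine ClassAd attributes, and that the command is run on a node with the condor client tools installed.

```shell
#!/bin/sh
# Constraint selecting slots whose machine ClassAd advertises Xen VM universe support.
# Assumption: the testbed nodes publish the standard HasVM and VM_Type attributes.
VM_CONSTRAINT='HasVM =?= True && VM_Type == "xen"'

if command -v condor_status >/dev/null 2>&1; then
    # List only the slots that match the constraint.
    condor_status -constraint "$VM_CONSTRAINT"
else
    echo "condor_status not found; run this on a testbed node" >&2
fi
```

If the description above still holds, only slots on gridreserve11 should be listed.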

How to run a test VM universe job

  • Log on to the submit host, gridreserve07, as a user.
  • Copy your image to the host. You can make images using the Xen client tools or the virt-manager interface.
    • I have a simple SL5 image at /home/xinzhao/images/sl5xinvm_2.img . The first time it starts, it runs a simple script from /etc/rc.local, which prints some messages and shuts down the machine (VM). On later starts it does nothing, allowing users to log in and check the results of the previous run.
  • Here is a sample condor vm universe job submit file:
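    universe        = vm

    Notification = Complete
    # In the VM universe there is no program to run; Executable only labels the job.
    Executable = vm-univ-job-xin
    Arguments =
    Requirements =

    Log = /home/xinzhao/condor_test/vm-univ-job.log.$(Process)

    # Ship the image to the execution node and bring it back when the job exits or is evicted.
    transfer_input_files = /home/xinzhao/images/sl5xinvm_2.img
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT_OR_EVICT
    # false means the modified VM image is transferred back as output.
    vm_no_output_vm = false

    Notify_user = xzhao@bnl.gov

    vm_type = xen
    # Memory for the VM, in MB.
    vm_memory = 2048
    vm_networking = true
    # vm_disk = /var/lib/libvirt/images/sl5xinvm_2.img:xvda:w
    # Format is file:device:permission; the file name is relative to the job's scratch directory.
    vm_disk = sl5xinvm_2.img:xvda:w
    # The kernel is taken from inside the disk image.
    xen_kernel = included

    Queue 1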
  • The job transfers your image file to the execution node as an input file and, after execution, transfers the image back to the submit host as an output file.
  • The staged-out VM image file will be in the condor job working directory on the submit host.
  • There are other ways of getting your VM onto the worker node: one is to pre-stage the VM image on the worker node; another is to use the condor transfer plugin.
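
The steps above can be strung together on the submit host roughly as follows. This is a sketch under stated assumptions, not the exact procedure used on the testbed: the submit file name vm-univ-job.sub is hypothetical, the log path is the one from the sample submit file for process 0, and the image is expected to come back to the directory the job was submitted from.

```shell
#!/bin/sh
# Assumed names: vm-univ-job.sub is a hypothetical name for the sample submit file;
# the log path matches its Log line with $(Process) = 0.
SUBMIT_FILE="${SUBMIT_FILE:-vm-univ-job.sub}"
JOB_LOG="${JOB_LOG:-/home/xinzhao/condor_test/vm-univ-job.log.0}"
IMAGE="${IMAGE:-sl5xinvm_2.img}"

run_vm_job() {
    # Queue the job from the directory where the output image should land.
    condor_submit "$SUBMIT_FILE" || return 1
    # Block until the job log records that the job has finished.
    condor_wait "$JOB_LOG" || return 1
    # The transferred-back image should now be in the current directory.
    ls -l "$IMAGE"
}
```

Calling run_vm_job from the intended working directory submits the job, waits for it, and lists the returned image. While the job is running, condor_q on the submit host shows its state.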