Minutes: 8/11/2011

by John S. De Stefano Jr. last modified Aug 12, 2011 04:55 PM
Notes from the ATLAS Frontier meeting on August 11, 2011.

Participants:   Florentin Bujor, Alastair Dewhurst, Dave Dykstra, David Front, 
                Elizabeth Gallas, Fred Luehring, Emmanouil Vamvakopoulos
Early check-in: Andrew Wong

*** Site Reports: ***

BNL:
- Increased max Tomcat threads on launchpads from 500 to 600 
- Disabled AJP connector on launchpads after it produced errors

CERN:
- Increased max Tomcat threads on launchpads from 500 to 600 
  * Config file installed via RPM: will need to verify/change value after next 
    upgrade
    - David will incorporate this change with others in upcoming package 
      update
      * Dave recommends increasing limit to 1000 in RPM (for CMS)
- Testing new RPM releases with Quattor on atlasfrontier3
  * After testing, deployed to atlasfrontier1|2 (Savannah SR #122701)
    - Bug fix needed to Tomcat (6.0.32-6) to solve log rotation issue
- Installed AGIS on atlasfrontier1|2 (Savannah SR #122704)
  * Added to Quattor template by Serguei
  * Also desired on frontier1|2 for monitoring script integration

KIT:
- Frontier service outage and reinstallation
  * Package upgrade made modifications; system files and binaries were lost
    due to Squid RPM upgrade
    - Potential problem in code recognized by David & Dave; it will be fixed
      in next version, but it is not enough to explain deletion of system
      files
  * Service and SLS monitoring restored 30 Jul
- Increased max Tomcat threads on launchpads from 500 to 600 

LYON:
- Have deployed new configuration for Frontier launchpad for testing
  * Need to coordinate AWStats passwords configuration, SLS monitoring
  * Discussion regarding ACLs for launchpads, site Squids
    - Launchpads: open access from anywhere, but restrict destination 
      connections to only the local Frontier service
    - Site proxies: restrict ACLs to local cloud connections only, but open 
      destination access
  * Currently installed packages on test launchpad:
    - frontier-tomcat-6.0.32-5 
    - frontier-awstats-6.0-4 
    - frontier-servlet-3.29-3
    - frontier-squid-2.7.STABLE9-5.6
- See GGUS #73074 for more information on crash, local configuration (Alastair)
  * https://ggus.eu/ws/ticket_info.php?ticket=73074
  * Frontier at LYON has been running in unorthodox, unsupported deployment 
    model (normal Squid instance on launchpad machine; no accelerator Squid on 
    launchpad; no site Squid; logging suppressed locally)
  
RAL:
- Increased max Tomcat threads on launchpads from 500 to 600 
- AWStats installed, central entries added for both launchpads
- Alastair has enabled remote launchpad login via GSISSH (John, Florentin)
- IT site Squids down, failing over directly to RAL Frontier (GGUS #73397)
  * Squids crashed due to disk problem, fixed and restarted (Alessandro)
  * Also seeing direct Frontier connections from WNs
- Slightly older Quattor template in use, will be updated 
  
TRIUMF:
- Increased max Tomcat threads on launchpad from 500 to 600 


*** Deployment: ***

AWStats setup for all launchpad sites:
- Sites can contact Florentin to coordinate setup (will be away next week) 
- Password must be established privately with Dave
- Alastair will send a mail with details and recommendations

Tomcat thread increase:
- Recommended Tomcat server configuration change to increase maximum number of 
  available threads for ATLAS Frontier launchpads from 500 to 600 in 
  server.xml
  * Should not change value of 500 for servlet configuration
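
The thread change above could be scripted roughly as follows. This is a minimal sketch, not the procedure used at any site: it assumes the limit is set through the `maxThreads` attribute of the HTTP `<Connector>` element in server.xml, and the sample XML (including port 8000) and the helper name `set_max_threads` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative server.xml fragment (assumption); real files hold more elements.
SAMPLE = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8000" protocol="HTTP/1.1" maxThreads="500"/>
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443"/>
  </Service>
</Server>"""


def set_max_threads(xml_text, new_limit):
    """Return server.xml text with maxThreads updated on non-AJP connectors."""
    root = ET.fromstring(xml_text)
    for conn in root.iter("Connector"):
        # Leave AJP connectors alone.
        if conn.get("protocol", "").startswith("AJP"):
            continue
        # Only touch connectors that already declare maxThreads.
        if "maxThreads" in conn.attrib:
            conn.set("maxThreads", str(new_limit))
    return ET.tostring(root, encoding="unicode")


print(set_max_threads(SAMPLE, 600))
```

Note that ElementTree drops XML comments when parsing, so for a production file a manual edit or a comment-preserving tool may be preferable. The servlet configuration value of 500 is a separate setting and is not touched here.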
  
Disable AJP Tomcat connector:
- Noticed AJP error on BNL launchpads
  * Related to Tomcat AJP connector, enabled by default
    - Recommendation: disable AJP connector (port 8009) in server.xml
    - Will update documentation to describe specifics
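
Until the documentation is updated, the recommendation above can be illustrated with a short sketch. It assumes the AJP connector appears in server.xml as a `<Connector>` element under `<Service>` with port 8009 or an `AJP/...` protocol value; the sample XML and the helper name `disable_ajp` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative server.xml fragment (assumption); real files hold more elements.
SAMPLE = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8000" protocol="HTTP/1.1" maxThreads="600"/>
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443"/>
  </Service>
</Server>"""


def disable_ajp(xml_text, ajp_port="8009"):
    """Remove AJP connectors; return (new XML text, number removed)."""
    root = ET.fromstring(xml_text)
    removed = 0
    for service in root.iter("Service"):
        # Iterate over a copy of the list because children are removed in the loop.
        for conn in list(service.findall("Connector")):
            is_ajp = conn.get("protocol", "").startswith("AJP")
            if is_ajp or conn.get("port") == ajp_port:
                service.remove(conn)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


new_xml, count = disable_ajp(SAMPLE)
print(count)
print(new_xml)
```

Commenting the element out by hand has the same effect and keeps the original line available for reference. Tomcat needs a restart for the change to take effect.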

Tomcat v5.x:
- End of life: September 2012
- Two ATLAS Frontier launchpad sites affected, should consider upgrading

Excessive T0 database connections:
- Frontier offered (again by Elizabeth and Dario) as possible solution to T0 
  processing, coordination
  * Increasing capacity, limited by available database connections
  * Recent direct database testing/development caused outage of ATLR
    - Unfortunate, but it kicked off a shifter discussion and led to
      inclusion of Frontier monitoring in the shifter instructions
  * Would like verification of cache consistency (already done) and launchpad 
    stability (recent crashes at CERN under investigation)
    - Shorter retention time at CERN possible; Dave would like to be included 
      in relevant discussions (should pick up in September)
    - May request dedicated Frontier server (or servlet) for T0
    - Would be helpful, proactive to propose server configuration, monitoring
    - Fail-over at CERN is configured to use Lyon; will switch to RAL

CAF trigger reprocessing:
- Ran out of database connections last week
- Discussion with Gancho/Roman/Luca will include possible Frontier use
- Rapid data refresh not as critical as T0 processing (1 hour should suffice)
- A separate servlet should suffice for this purpose
- Purpose and requirements:
  "Trigger jobs are essential to validate candidate AtlasP1HLT releases prior 
  to deployment. We need a turn-around of less than 24hrs"

AGIS integration:
- Florentin working with Alessandro, AGIS team to integrate client into 
  installation procedure (also in CVMFS)
  * Will test addition of end point via API
  * ATLAS switching to CVMFS (from AFS) to distribute software kits
- AGIS had reported to ADC lack of Frontier feedback since April
  * Alastair has responded to detail past feedback and current work

Discussion on mailing lists:
- ADC ATLAS list for ATLAS support discussion
- RACF Frontier list for general discussion (non-ATLAS specific)
- ATLAS Support TWiki page updated with this information by Elizabeth:
  https://twiki.cern.ch/twiki/bin/viewauth/Atlas/DatabaseMailingLists

*** Development: ***

Monitoring:
- SLS now fully functional for all sites:
  * https://sls.cern.ch/sls/service.php?id=ATLAS-Frontier
  * Savannah #21274 completed/closed
  * Dave provided feedback, recommendations for improving scripts
    - Frontier site service alias is probed, not individual launchpad nodes
    - Florentin is working on updates
- MRTG configuration:
  * Recommended format proposed by Dave for input file generation
  * Florentin to develop configuration tool
- AWStats now monitoring launchpads at BNL, CERN, RAL

Packages:
- Investigating problems with Squid package and configuration files, possible 
  missing system files and binaries
  * David was away last week, still investigating issues reported at ALT2, 
    CERN, KIT
  * Init/restart script, which used to run as a non-root user, was changed to
    run as root
    - Need to ensure `/etc/squid` is writable by non-root (squid) user
- Upcoming update to implement multiple servlets in Frontier package becoming 
  more urgent
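
The `/etc/squid` permission requirement noted above could be checked with something like the following. It is a rough sketch that only looks at the owner/group/other mode bits; it ignores ACLs and read-only mounts. The function name `writable_by` is hypothetical.

```python
import os
import stat
import tempfile


def writable_by(path, uid, gids):
    """Approximate check: may user `uid` (with group ids `gids`) write to `path`?"""
    if uid == 0:
        return True  # root bypasses mode bits
    st = os.stat(path)
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IWUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IWGRP)
    return bool(st.st_mode & stat.S_IWOTH)


# Demo on a temporary directory so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o755)
some_other_uid = os.stat(demo_dir).st_uid + 1  # a uid that does not own demo_dir
print(writable_by(demo_dir, some_other_uid, set()))
os.rmdir(demo_dir)
```

On a launchpad, the uid and gid of the squid account could be looked up with `pwd.getpwnam("squid")` and passed in along with the path `/etc/squid`; the account name is taken from the note above and may differ per site.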

*** A.O.B.: ***

To do: update meeting announcement instructions (separate audio call, 
  emphasize free #)