
Minutes: 11/17/2011

by John S. De Stefano Jr. last modified Dec 15, 2011 09:58 AM
Notes from the ATLAS Frontier meeting on November 17, 2011.

Site Status:
BNL: Did not upgrade during the downtime. John, David and Dave are currently investigating a squid problem at BNL.
- So far, it seems that running a squid logrotate causes the pid file to be removed.
  The missing pid file confuses the frontier-squid service:
  the status is reported as not running while squid is in fact running,
  and service stop fails because it relies on the (missing) pid file.
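
The symptom described above (squid running, but the service reporting it as stopped because the pid file is gone) can be illustrated with a small check. This is only a sketch under stated assumptions: the function names are hypothetical, the candidate pids would have to come from some other source such as a process listing, and it does not reflect how the frontier-squid init script is actually written.

```python
import os
from pathlib import Path


def process_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def service_state(pid_file: Path, candidate_pids: list[int]) -> str:
    """Classify the service state (hypothetical helper, not part of frontier-squid)."""
    if pid_file.exists():
        try:
            pid = int(pid_file.read_text().strip())
        except ValueError:
            return "pid file unreadable"
        return "running" if process_alive(pid) else "stale pid file"
    # No pid file: a check that relies only on the pid file would say "not running" here.
    if any(process_alive(pid) for pid in candidate_pids):
        return "running without pid file"
    return "stopped"
```

A result of "running without pid file" corresponds to the state seen at BNL after the logrotate.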

CERN: Nothing to report

Lyon: Working well. From time to time a small number of worker nodes (WN) at Lyon access the Frontier server at RAL; the reason is not yet understood. A second Frontier server should be ready for deployment within 2 weeks.

RAL: Working well. Still needs to upgrade to using RPMs.

There is an AWStats problem that could potentially affect all sites: the AWStats files are growing too fast. David has manually applied a fix on atlasfrontier1/2 and will email the other sites about how to fix it.


RPMs:
To let Florentin easily test the upgrade from the RPMs used in production to the new ones,
David is testing this first on atlasfrontier3.
He would prefer to complete that testing before letting Florentin test on atlasfrontier4.
David is learning, with the help of Serquei and Vladimir, how to simulate Quattor installing the configuration files.

David plans to move the RPM documentation to HTML format when he has time.


AGIS:
Florentin is continuing to work on the questions about how to use AGIS that were raised at the last meeting.

AGIS currently takes a primary and a failover for each site. If a site were to have multiple machines, we would just use the alias. Dave Dykstra believes it is better to list the alias followed by the individual machines. This would require AGIS to take a list of machines rather than just a primary and a backup. Need to ask whether this can be implemented.
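
The ordering Dave Dykstra suggests (alias first, then the individual machines) could be represented as a simple ordered list. The sketch below is only an illustration: the function name and the hostnames are hypothetical, and it says nothing about how AGIS stores or would store this information.

```python
def server_list(alias: str, machines: list[str]) -> list[str]:
    """Return the alias first, then each individual machine, without duplicates."""
    ordered: list[str] = []
    for host in [alias, *machines]:
        # Skip empty entries and anything already listed.
        if host and host not in ordered:
            ordered.append(host)
    return ordered


# Hypothetical hostnames, for illustration only.
servers = server_list(
    "frontier.example.org",
    ["frontier1.example.org", "frontier2.example.org"],
)
```

With a plain primary/failover pair only two entries fit, so a site with an alias and two machines cannot be described in full.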


Monitoring:
AWStats - The issue at BNL where IP addresses are not being resolved correctly should be fixed when they move to the new hardware.
SLS - No problems.
MRTG - Still a problem generating the complete list of sites. Florentin is working on this.
Dario has asked Florentin to add monitoring to SSB. Only a limited colour scheme is available.
CMS uses a monitoring system that emails administrators if their worker nodes are failing over to contact a launchpad directly. The CMS author is working on making the monitor more general so that it is applicable to ATLAS too. Florentin will then probably need to modify it to update the SSB rather than emailing the administrators directly.
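
The idea behind the failover monitor can be sketched as follows: any client that reaches the launchpad without going through a known site squid counts as a direct (failover) connection. This is not the CMS monitor, whose implementation is not described in these minutes; the function names, the status strings and the addresses are all assumptions made for illustration.

```python
from collections import Counter


def direct_clients(launchpad_clients: list[str], site_squids: list[str]) -> Counter:
    """Count launchpad requests per client that did not come through a known squid."""
    squids = set(site_squids)
    return Counter(c for c in launchpad_clients if c not in squids)


def ssb_status(direct: Counter, threshold: int = 0) -> str:
    """Map the number of direct requests to a status string (hypothetical values)."""
    return "warning" if sum(direct.values()) > threshold else "ok"


# Hypothetical client addresses seen at a launchpad; 192.0.2.10 is the site squid.
seen = ["192.0.2.10", "192.0.2.10", "192.0.2.50"]
direct = direct_clients(seen, ["192.0.2.10"])
status = ssb_status(direct)
```

Returning a status value instead of sending mail matches the planned change of updating the SSB rather than emailing administrators directly.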

Testing:
Fred will make a Frontier test tarball and put instructions on how to run test jobs on the TWiki.
Alastair will document how to run the fnget.py test on the TWiki.

The next meeting will be on 1st December at 14:30 UTC.