Failure Management in the London Distributed Tier 2

Authors: M. Aggarwal, D. Colling, A. Fage, S. George, W. Hay, P. Kyberd, A. Martin, G. Mazza, D. McBride, H. Nebrensky, D. Rand, G. Rybkine, G. Sciacca, O. van der Aa, B. Waugh

The LCG [1] has adopted a hierarchical Grid computing model with a Tier 0 centre at CERN, national Tier 1 centres and regional Tier 2 centres. The roles of the different Tier centres are described in the LCG Technical Design Report [2], and the levels of service required from each level of Tier centre are described in the LCG Memorandum of Understanding [3]. Many of the Tier 2 centres are formed by federating the resources of geographically distributed institutes in a given region. The institutes within such a federation are able to provide different levels of resources and typically have different levels of expertise. Providing a good level of service in such situations is challenging. In this context, the London Tier 2 (LT2) [4] is one of the four federated Tier 2 centres within the GridPP [5] collaboration in the UK. The LT2 is distributed between five institutes in the London area and currently totals around 1 Mega Spec Int 2000 [6]. In this paper we analyze how to minimize the time to solve LT2 failures within the constraints of the available human resources and their mobility. The analysis takes into account the time to travel between institutes, the types of problem each support person can solve, and their availability. We demonstrate how to create a hierarchy of support staff to solve an identified problem. We also provide an estimate of the time to solve future LT2 failures, based on failure rates extracted from the monitoring information and on known response times. We suggest this failure management method as a model for any distributed Tier 2.
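As an illustration only, the kind of ranking described above can be sketched as follows. The abstract does not give the paper's actual method, so this is a minimal sketch under stated assumptions: the estimated time to solve is taken to be the sum of the wait until a person is free, the travel time to the failing site, and a typical fix time. All names, sites and numbers (Supporter, support_hierarchy, site_a, the hour values) are hypothetical and are not LT2 data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supporter:
    """A hypothetical support person; the fields are illustrative assumptions."""
    name: str
    home_site: str
    skills: frozenset        # problem types this person can solve
    available_in_h: float    # hours until this person can start


def support_hierarchy(failure_site, problem_type, staff, travel_h, fix_h):
    """Rank the staff able to solve the problem, fastest first.

    Assumed model: estimate = wait until free + travel time + typical fix time.
    Returns a list of (name, estimated hours) pairs.
    """
    ranked = []
    for person in staff:
        if problem_type not in person.skills:
            continue  # this person cannot solve this type of problem
        if person.home_site == failure_site:
            travel = 0.0
        else:
            travel = travel_h[(person.home_site, failure_site)]
        estimate = person.available_in_h + travel + fix_h[problem_type]
        ranked.append((person.name, estimate))
    return sorted(ranked, key=lambda pair: pair[1])


# Made-up example data: three people at three placeholder sites.
staff = [
    Supporter("alice", "site_a", frozenset({"storage", "network"}), 0.0),
    Supporter("bob", "site_b", frozenset({"storage"}), 2.0),
    Supporter("carol", "site_c", frozenset({"network"}), 0.0),
]
travel_h = {("site_a", "site_b"): 1.0, ("site_c", "site_b"): 0.5}
fix_h = {"storage": 2.0, "network": 1.0}

hierarchy = support_hierarchy("site_b", "storage", staff, travel_h, fix_h)
print(hierarchy)
```

In this made-up case the remote but immediately available person ranks ahead of the local one who is busy for two hours, which is the kind of trade-off between travel time and availability that the analysis considers.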

[1] LCG, http://lcg.web.cern.ch/LCG/
[2] LHC Computing Grid, Technical Design Report, LCG-TDR-001, CERN-LHCC-2005-024.
[3] LCG Memorandum of Understanding, http://lcg.web.cern.ch/LCG/C-RRB/MoU/LCG_T0-2_draft_final_051012.pdf
[4] LT2, http://www.gridpp.ac.uk/tier2/london/
[5] GridPP, UK computing for particle physics, http://www.gridpp.ac.uk/
[6] Spec Int 2000, http://www.spec.org/cpu2000/


Last modified Thu 24 November 2005