HAPCW 2008
High Availability (HA) Computing and Resiliency have always played a critical role in commercial mission-critical applications. Likewise, High Performance Computing (HPC) has been a significant enabler of the R&D community through its role in scientific discovery. Serviceability concerns effective means by which corrective and preventive maintenance can be performed on a system. Higher serviceability improves availability, and resiliency helps retain quality, performance, and continuity of services at expected levels. Together, the combination of HA, resiliency, and HPC promises even greater benefits for critical, shared, major HPC resource environments.
The 5th High Availability and Performance Computing Workshop (HAPCW) 2008 will be held in conjunction with the High-Performance Computer Science Week (HPCSW) 2008 event on April 3-4 at the Grand Hyatt Hotel in Denver, CO, USA.
This workshop aims to provide a forum for researchers to discuss state-of-the-art and ongoing research and development, and to share their findings and ideas in high availability and performance computing (HAPC). Since 2003, we have held four consecutive successful workshops in conjunction with the Los Alamos Computer Science Institute (LACSI) Symposium at the Eldorado Hotel in Santa Fe, NM, USA. This year the workshop continues as part of the High-Performance Computer Science Week. In addition to the presentation of reviewed papers, the workshop will include a panel discussion of relevant topics.
Original, unpublished work is required. Extended abstracts must not exceed 2 pages (two columns, single-spaced, 10-point font), including tables and illustrations. Accepted contributions will be published on the proceedings website and on a CD, which will be available at the workshop. The final manuscript shall be a maximum of 6 IEEE-style pages in camera-ready format. Please send all extended abstracts by email, in PostScript or PDF format, to Dr. Ben He, email@example.com
Topics of interest are those relevant to HAPC including the following:
• Hardware for fault detection and resiliency
• System-level resiliency for HPC
• Statistical methods to improve system resiliency
• Fault tolerance mechanisms and experiments
• Resource management for system resiliency and availability
• Resilient systems based on hardware probes
• Reliability and robustness in HPC applications and systems
• Failure recovery strategies in Grid computing and HPC
• Reliable communication in HPC environments
• Architecture and tools supporting HAPC
• High availability computing
• Experience in creating HAPC environments
• Self-healing, self-configuration, self-optimization, fault prevention, detection and recovery, fault tolerance, and autonomic computing
• Configuration, resource and fault management
• Mission critical HPC applications
• HAPCW: http://xcr.cenit.latech.edu/hapcw2008
• HPCSW: http://www.hpcsw.org/
The HAPCW2008 is supported by the fastOS program,