Abstract: As we move to large manycores, the hardware-based global checkpointing schemes that have been proposed for small shared-memory machines do not scale. Scalability barriers include global operations, work lost to global rollback, and inefficiencies in imbalanced or I/O-intensive loads. Scalable checkpointing requires tracking inter-thread dependences and building the checkpoint and rollback operations around dynamic groups of communicating processors.
Publication Year: 2011
Publication Date: 2011-06-04
Language: en
Type: article
Indexed In: Crossref
Access and Citation
Cited By Count: 22