Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing

  • The availability of cloud computing infrastructures relies on many components, such as software, hardware, the cloud management system (CMS), security, environmental factors, and human operation. When something goes wrong, root cause analysis (RCA) is often complex. This paper explores the integration of Machine Learning (ML) with Fault Tree Analysis (FTA) to enhance explainable failure detection in cloud computing systems. We introduce a framework that employs ML for fault tree (FT) selection and generation, and for predicting Basic Events (BEs), to enhance the explainability of failure analysis. Our experimental validation focuses on predicting BEs and using these predictions to calculate the Top Event (TE) probability. The results demonstrate improved diagnostic accuracy and reliability, highlighting the potential of combining ML predictions with traditional FTA to identify root causes of failures in cloud computing environments and to make failure diagnosis more explainable.
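
As an illustration of the last step described in the abstract, the following minimal sketch shows how predicted BE probabilities might be propagated through AND/OR gates to obtain a TE probability. It is not taken from the paper: the tree layout, the event names (`disk_failure`, `backup_failure`, `network_outage`), the probability values, and the helper functions are all hypothetical, and the calculation assumes statistically independent basic events.

```python
from math import prod

def gate_probability(gate, child_probs):
    """Combine child probabilities, assuming independent events."""
    if gate == "AND":
        # All children must occur.
        return prod(child_probs)
    if gate == "OR":
        # At least one child occurs: 1 - P(no child occurs).
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unsupported gate type: {gate}")

def top_event_probability(node, be_probs):
    """Recursively evaluate a fault tree.

    A node is either a basic-event name (str) or a tuple
    (gate, [children]) with gate in {"AND", "OR"}.
    """
    if isinstance(node, str):
        return be_probs[node]
    gate, children = node
    child_probs = [top_event_probability(c, be_probs) for c in children]
    return gate_probability(gate, child_probs)

# Hypothetical fault tree: the service fails if the network is down,
# or if both the disk and its backup fail.
tree = ("OR", [("AND", ["disk_failure", "backup_failure"]), "network_outage"])

# Hypothetical BE probabilities, e.g. as they might come from an ML predictor.
be_probs = {"disk_failure": 0.1, "backup_failure": 0.2, "network_outage": 0.05}

te = top_event_probability(tree, be_probs)
print(f"Top Event probability: {te:.3f}")
```

With these illustrative inputs the AND gate yields 0.1 × 0.2 = 0.02, and the OR gate yields 1 − (0.98 × 0.95) = 0.069. Because each gate's contribution is visible, the tree shows which basic events drive the TE probability, which is the explainability benefit the abstract refers to.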

Metadata
Author: Rudolf Hoffmann, Christoph Reich
URN: https://urn:nbn:de:bsz:fn1-opus4-106772
DOI: https://doi.org/10.5220/0012727600003711
ISBN: 978-989-758-701-6
Parent Title (English): Proceedings of the 14th International Conference on Cloud Computing and Services Science, May 2-4, 2024, Angers, France
Document Type: Conference Proceeding
Language: English
Year of Completion: 2024
Release Date: 2024/05/29
Tag: AI; Cloud computing; Machine learning; Reliability; XAI
First Page: 295
Last Page: 302
Open-Access-Status: Open Access (Gold)
Licence: Creative Commons - CC BY-NC-ND - Attribution - NonCommercial - NoDerivatives 4.0 International