Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing
- The availability of cloud computing infrastructures relies on many components, such as software, hardware, the cloud management system (CMS), security, environmental factors, and human operation. When something goes wrong, root cause analysis (RCA) is often complex. This paper explores the integration of Machine Learning (ML) with Fault Tree Analysis (FTA) to enhance explainable failure detection in cloud computing systems. We introduce a framework that employs ML for fault tree (FT) selection and generation, and for predicting Basic Events (BEs), to enhance the explainability of failure analysis. Our experimental validation focuses on predicting BEs and using these predictions to calculate the Top Event (TE) probability. The results demonstrate improved diagnostic accuracy and reliability, highlighting the potential of combining ML predictions with traditional FTA to identify root causes of failures in cloud computing environments and to make failure diagnosis more explainable.
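To illustrate the last step mentioned in the abstract, the following is a minimal sketch of how predicted BE probabilities could be propagated through a fault tree to obtain a TE probability. It is not the authors' implementation: the `Gate` class, the `top_event_probability` function, the example tree, and the event names and probabilities are all hypothetical, and the sketch assumes statistically independent basic events that each appear only once in the tree, combined through AND and OR gates.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Union


@dataclass
class Gate:
    """A fault tree gate (hypothetical helper type for this sketch)."""
    kind: str                             # "AND" or "OR"
    children: List[Union["Gate", str]]    # sub-gates or basic event names


def top_event_probability(node: Union[Gate, str],
                          be_probs: Dict[str, float]) -> float:
    """Propagate basic event probabilities up the tree.

    Assumes independent basic events with no repeated events.
    """
    if isinstance(node, str):
        # Leaf: a basic event whose probability was predicted, e.g. by an ML model
        return be_probs[node]
    child_ps = [top_event_probability(c, be_probs) for c in node.children]
    if node.kind == "AND":
        # All children must occur
        return prod(child_ps)
    if node.kind == "OR":
        # At least one child occurs
        return 1.0 - prod(1.0 - p for p in child_ps)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Illustrative tree: the service fails if the network is down, OR if
# both the disk and its backup fail. All values are made up.
tree = Gate("OR", [Gate("AND", ["disk_failure", "backup_failure"]),
                   "network_outage"])
predicted = {"disk_failure": 0.1, "backup_failure": 0.2, "network_outage": 0.05}
te_probability = top_event_probability(tree, predicted)
print(f"TE probability: {te_probability:.3f}")
```

Because each gate only combines the probabilities of its children, the intermediate values show which branch contributes most to the top event, which is the kind of traceability that FTA adds on top of a bare ML prediction.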
Author: | Rudolf Hoffmann, Christoph Reich |
---|---|
URN: | https://urn:nbn:de:bsz:fn1-opus4-106772 |
DOI: | https://doi.org/10.5220/0012727600003711 |
ISBN: | 978-989-758-701-6 |
Parent Title (English): | Proceedings of the 14th International Conference on Cloud Computing and Services Science, May 2-4, 2024, Angers, France |
Document Type: | Conference Proceeding |
Language: | English |
Year of Completion: | 2024 |
Release Date: | 2024/05/29 |
Tag: | AI; Cloud computing; Machine learning; Reliability; XAI |
First Page: | 295 |
Last Page: | 302 |
Open-Access-Status: | Open Access (Gold) |
Licence (German): | Creative Commons - CC BY-NC-ND - Attribution - NonCommercial - NoDerivatives 4.0 International |