AAAI 2021

Carbon to Diamond: An Incident Remediation Assistant System From Site Reliability Engineers’ Conversations in Hybrid Cloud Operations

Abstract

Conversational channels are changing the landscape of hybrid cloud service management. These channels are becoming important avenues for Site Reliability Engineers (SREs) to work together to resolve an incident or issue. Identifying segmented conversations and extracting key insights or artefacts from them can help engineers improve the efficiency of the incident remediation process by using information retrieval mechanisms for similar incidents. However, it has been empirically observed that, because such conversations are semi-formal human language, they are highly varied and also contain domain-specific terms. In this paper, we build a framework that taps into the conversational channels and uses various learning methods to (1) understand and extract key artefacts from conversations, such as diagnostic steps and resolution actions taken, and (2) identify past conversations about similar issues. Experimental results on our dataset show the efficacy of the methods used in our proposed system.
