GenAI poised to overhaul the observability arena, ServiceNow says

  • Like many segments, observability is poised to be transformed by GenAI

  • Major potential lies in the technology's ability to filter and sort data to pinpoint issues faster and speed incident response

  • But experts say humans will still need to stay in the loop, as some root causes simply can't be identified by software

Generative artificial intelligence (GenAI) is quickly eclipsing observability in terms of how broadly it’s being applied, ServiceNow VP and GM of Cloud Observability Ben Sigelman told Silverlinings. But the two technologies together present a promising future for holistically observing cloud environments, he added.

Across incident response, data interpretation and overall control of system monitoring, GenAI can supercharge the observability space and “completely revolutionize how human beings interact with these really complex systems,” Sigelman explained. “They'll be woken up to fewer false positives, and they’ll also have a much easier time getting a bird's-eye view of what’s going on, which is really the first step in incident response.”

One of the biggest opportunities for GenAI in this space, according to Sigelman, is to swiftly summarize what can often be a confusing and overwhelming number of signals — which a human can then contextualize.

“Where monitoring and observability solutions were originally just positioned as a way of getting data,” in the last decade “that's been largely replaced by open source,” he said. Sigelman highlighted the OpenTelemetry project, an open-source initiative he co-founded that aims to simplify observability in cloud-native environments, as an example of this trend.
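For readers unfamiliar with the project, the sketch below shows roughly what OpenTelemetry instrumentation looks like using its Python SDK. It is illustrative only: the service name, span name and attribute are hypothetical, and a real deployment would export telemetry to a collector or observability backend rather than the console.

```python
# A minimal, hypothetical sketch of OpenTelemetry instrumentation in Python,
# assuming the opentelemetry-sdk package is installed.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Configure a tracer provider that exports spans; in production the exporter
# would typically ship data to a vendor-neutral collector instead of stdout.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

def handle_checkout(order_id: str) -> None:
    # Each unit of work is recorded as a span; attributes carry the context
    # an engineer (or a summarization tool) can later use during an incident.
    with tracer.start_as_current_span("handle_checkout") as span:
        span.set_attribute("order.id", order_id)
        # ... business logic would go here ...

if __name__ == "__main__":
    handle_checkout("order-123")
```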

“Solutions are shifting more towards a unification [and] consolidation story where, instead of focusing on getting the data, they're focused on bringing it together and analyzing it in some kind of coherent way,” Sigelman continued.

But there's still “a very real problem” when it comes to reducing the time to resolution for incidents that span many different domains. Enter GenAI.

Making sense of it all

A large language model (LLM) can dramatically improve the signal-to-noise ratio during incident response.

“The opportunity to take a somewhat noisy signal of potential insights, look at that holistically and reduce it down to events that a human might actually care about… that's an area where LLMs do have a very significant part to play,” especially in emergency scenarios, he explained.
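As a rough illustration of that workflow — not something drawn from ServiceNow or the interview — the sketch below feeds a handful of hypothetical alerts to an LLM and asks for a short triage summary, assuming the OpenAI Python SDK and an API key in the environment. The alert data, model choice and prompt are all illustrative.

```python
# Hypothetical sketch: compress noisy alerts into a short incident summary
# with an LLM, assuming the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative alert feed; in practice this would come from a monitoring system.
alerts = [
    "CPU > 90% on checkout-7f2 for 12m",
    "p99 latency 4.8s on /api/checkout (baseline 300ms)",
    "Error rate 7% on payment-gateway, mostly 502s",
    "Disk usage 58% on checkout-7f2 (within normal range)",
]

prompt = (
    "You are an SRE assistant. Summarize these alerts into the 2-3 events "
    "an on-call engineer should actually care about, and note likely links "
    "between them:\n" + "\n".join(f"- {a}" for a in alerts)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

# The model's summary becomes the bird's-eye view a responder starts from;
# a human still decides what to do with it.
print(response.choices[0].message.content)
```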

Sigelman stressed that no human is getting replaced by GenAI, as sometimes “the ultimate root cause [of an incident] can be very surprising.”

He recalled a memorably outlandish example of a serious incident during his time at Google. The root cause? A shark biting a cable at the bottom of the Atlantic Ocean. “You're not going to build software that's going to tell you that,” he said. Still, “there's an opportunity to consolidate and integrate into something greater than the sum of its parts, mainly as a communication tool.”

Separating the wheat from the chaff

Sigelman said 2023 was a “more interesting year than most for the development of any data-based technology,” thanks to what he referred to as both “genuine innovations” and “marketing thrash” around AI. “It's always a combination of the two.”

Marketing thrash indeed accompanied AI’s mainstream emergence over the last year, with businesses “slapping an AI moniker on whatever their product lines” were, as Cockroach Labs CEO Spencer Kimball described to Silverlinings in a recent interview.

“It's in a very early stage of the hype cycle,” Thomas King, CTO of internet exchange operator DE-CIX, told Silverlinings, expressing a similar sentiment.

But in separating the wheat from the chaff, valuable developments did begin materializing in 2023. As an example, data management platform Elastic launched its AI assistant for observability to help in incident response scenarios like those described by Sigelman.

King believes value will really begin to take shape next year as AI becomes “productized.” Moving beyond hype and broad public services like ChatGPT, “we will see real products… I think that will be the year 2024.”

Sigelman has no doubt GenAI will revolutionize observability, but amid the technology's unprecedented pace of adoption this year, he has observed people “conflating a lot of different things with the same word, which is GenAI or LLMs.”

He concluded that one of the most important things the industry can do in 2024 is to be clear with its terminology, avoid speaking broadly about GenAI and “give people the language they need to compare solutions.”