Nexus Podcast: Vincente Diaz on Using AI for Malware Analysis

Michael Mimoso

Jul 24, 2024

VirusTotal is one of the largest online malware analysis services. Users upload files or URLs they suspect to be malicious, and VirusTotal checks those uploads against the leading antivirus engines and its database of known threats, which goes back 19 years.

While VirusTotal, owned by Google, is considered one of the most reputable cybersecurity services, its results are not absolute. Artificial intelligence and machine learning are integrated into the service to fine-tune its analysis.

In this episode of the Nexus Podcast, Vincente Diaz, a threat intelligence strategist on Google’s VirusTotal team, explains how AI and ML engines are being used in VirusTotal’s malware analysis, and how those results differ from what a traditional AV engine's analysis might render.

Diaz also delivered a presentation on this topic at this year’s RSA Conference. There he dove into Code Insight, a novel VirusTotal feature that, unlike other large language model engines that write code, reads code snippets and generates natural-language reports written from an analyst’s point of view.

Diaz said that Code Insight, which was unveiled a year ago, can read simple code samples such as an Office document macro or a PowerShell script. Since then, Google has amassed a large body of tokenized data used to train the LLM and bring context to whatever is asked of it, and Code Insight is delivering fascinating insights.

For example, during his talk, Diaz shared a code snippet from a PowerShell script that installs the Postman API client. Nine security vendor products labeled this snippet as malicious, failing to distinguish an installer’s behavior from what a malware dropper might do. Code Insight’s analysis of the script, however, determined that it was a legitimate Postman CLI installer, despite the functional similarities between an installer and a dropper.

A VirusTotal dashboard complemented by Code Insight analysis and context.

“Advantage No. 1, you don’t need to analyze this by yourself if you trust the LLM and you save a lot of time. Advantage No. 2, you can get a non-binary answer,” Diaz said during his talk. “When you are asking a known antivirus, you get a verdict whether it’s malicious or not. The verdict is not very verbose. You’re not getting a lot of information.”

A year’s worth of information fed into the LLM brought the necessary context to deliver a more accurate analysis of this particular code snippet.

“For the first time, we have something that is comprehensive and not just a good or bad verdict,” Diaz said. “We have a full explanation. This is the first big thing from an LLM; they are providing you with an explanation.”

During the podcast, Diaz explains how Code Insight and ML overall are broadening VirusTotal’s analysis, including cutting through obfuscation and other techniques attackers use to evade detection.

He also compares how well AI systems fare against AV engines and urges that the two be considered complementary technologies. Finally, he describes how effective AI systems are for triage and noise reduction.
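
To make the triage idea concrete, here is a minimal Python sketch of how an analyst might combine binary AV verdicts with an LLM-generated explanation when sorting samples into queues. It is not VirusTotal's implementation: the `Report` class, the `triage` function, the queue labels, the 0.5 threshold, and the engine total of 60 in the usage example are all hypothetical and chosen only for illustration. Only the count of nine detections comes from the Postman example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    """Hypothetical container for one sample's analysis results."""
    name: str
    av_detections: int                      # engines that flagged the sample
    av_total: int                           # engines that scanned the sample
    llm_summary: Optional[str] = None       # natural-language explanation, if any
    llm_says_benign: Optional[bool] = None  # analyst's reading of that summary


def triage(report: Report, threshold: float = 0.5) -> str:
    """Return a coarse queue label. The threshold is illustrative only."""
    ratio = report.av_detections / report.av_total if report.av_total else 0.0
    # No AV detections and no explanation pointing to malice: deprioritize.
    if report.av_detections == 0 and report.llm_says_benign is not False:
        return "low priority"
    # A minority of engines flag the sample but the explanation reads as
    # benign: likely noise, yet still routed to a human for confirmation.
    if report.llm_says_benign and ratio < threshold:
        return "review: possible false positive"
    return "high priority"


# Usage: nine detections as in the Postman example; the total of 60 engines
# is an assumed figure, not taken from the talk.
postman = Report(
    name="postman-install.ps1",
    av_detections=9,
    av_total=60,
    llm_summary="Downloads and installs the Postman CLI.",
    llm_says_benign=True,
)
print(triage(postman))
```

The design keeps both signals visible rather than letting one override the other, which matches the point that AV engines and AI systems are complementary: the explanation reduces noise, while the final call stays with the analyst.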