
Skynet the CTI Intern: Building Effective Machine Augmented Intelligence

The presentation emphasizes integrating LLMs into CTI as enhancements for analysts, showcasing efficiency gains with real-world examples. It underscores LLM limitations, advocating a collaborative, symbiotic relationship between human analysts and LLMs for proactive cybersecurity defense.

Scott J. Roberts

May 09, 2024

Transcript

  1. Allow me to Reintroduce Myself • Scott J Roberts •

    Head of Threat Research @ Interpres Security • 20 years of experience including Mandiant, GitHub, Apple, & Splunk • MAI w/Data Analytics @ USU • Adjunct Instructor of Cyber Security @ USU
  2. What This Talk is Not! • No Foundational Models •

    No Releasing Software or Products • No Collection of Perfect Prompts
  3. What This Talk Is! • Talking through pros and cons

    of trying to adopt new AI tools incrementally • Not specific to any particular LLM provider or model (though I used OpenAI & Copilot for most of them) • The Plan • Overall Problem • Problem Use Cases • Conclusion
  4. The Problem™ • Interpres needs to continuously collect, analyze, and disseminate

    intelligence data about the entire global galaxy of espionage, attack, and criminal adversaries. • We extract and create relationships to both MITRE ATT&CK and custom Interpres intelligence STIX2 objects.
  5. This Is Chandler M • Former 😭 Interpres Threat Engineering

    Intern • Utah State University Data Analytics & Information Systems Senior • Head of USU Student Organization for Cybersecurity
  6. Google Cloud - AI and the 5 Phases of the

    Threat Intelligence Lifecycle • Broke down application of LLMs at each stage of the intelligence cycle (with an obvious focus on cyber threats) • Very few details, but enough to come up with a few ideas
  7. Google Cloud - Supercharging security with generative AI • A

    useful breakdown of a major security centric org (Google) taking a multifaceted approach to using LLMs across a wide variety of tools • Less to go off of in the document, but links to a lot of very inspiring posts, such as the work being done by VirusTotal
  8. Thomas Roccia - Applying LLMs to Threat Intelligence • Probably

    one of the single best blog posts about not just ideas, but examples of implementing both basic and advanced LLM techniques to security • Dives Into • Prompt Engineering • Few Shot Prompting • Retrieval Augmented Generation • Tokenization & Embeddings
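Of the techniques that post covers, few-shot prompting is the easiest to sketch: the model is shown labeled examples before the real input. The sketch below is illustrative only; the system prompt, example sentences, and labels are assumptions, not taken from Roccia's post or the talk.

```python
def build_few_shot_messages(examples, query):
    """Build a chat-style message list with worked examples up front."""
    messages = [{"role": "system",
                 "content": "Classify each sentence as 'technique' or 'other'."}]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages

# Hypothetical labeled examples a CTI analyst might supply
examples = [
    ("The actor used scheduled tasks for persistence.", "technique"),
    ("The report was published in March.", "other"),
]
messages = build_few_shot_messages(
    examples, "Credentials were dumped from LSASS memory.")
```

The resulting list is what a chat-completion endpoint would receive: the paired user/assistant turns act as in-context training data for the final query.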
  9. Use Case #1 Summary Generation - Problem • Challenge: Summarizing

    longer articles for rapid evaluation • Original Solution: Natural Language Toolkit • Problem: Summaries were often very uneven and included content that made them incredibly difficult to read
  10. Use Case #1 Summary Generation - Solution • Experimentation: Programmatic

    summary generation with the OpenAI Conversational Endpoint • Outcome: Cut over fully to using OpenAI based summaries
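A minimal sketch of what that cutover might look like, assuming the OpenAI chat completions endpoint. The model name, prompt wording, and truncation limit are assumptions, not the speaker's actual code; the request-building step is kept separate from the (commented-out) network call.

```python
def build_summary_request(article_text, max_chars=12000):
    """Return kwargs for a chat-completion call that summarizes an article."""
    return {
        "model": "gpt-4o-mini",  # hypothetical model choice
        "messages": [
            {"role": "system",
             "content": "Summarize this threat report in 3 sentences "
                        "for a CTI analyst. Keep technique IDs and actor names."},
            # truncate long articles to stay within context limits
            {"role": "user", "content": article_text[:max_chars]},
        ],
    }

# The actual call would be along these lines:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(**build_summary_request(text))
#   summary = resp.choices[0].message.content
request = build_summary_request("APT29 used T1133 external remote services ...")
```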
  11. Use Case #2: One Off Data Generation - Problem •

    Challenge: As part of improving the service we often need one off data for enrichment • Original Solution: Manual Effort • Problem: Demonym to ISO3166-3 mapping
  12. Demonym        Country Name                            ISO3166-3

      Chinese        The People's Republic of China          CHN
      Iranian        The Islamic Republic of Iran            IRN
      North Korean   Democratic People's Republic of Korea   PRK
      Russian        Russian Federation                      RUS
  13. Use Case #2: One Off Data Generation - Solution •

    Experimentation: Asking directly in the Chat interface for the data we want • Outcome: Mixed; with good prompting, eventually accurate results that still required manual manipulation
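The end state of that workflow is a small lookup table pasted back into code after manual checking, like the slide-12 data below. The validation helper is illustrative of the kind of spot check that catches rows the model got wrong; it is an assumption, not the team's actual process.

```python
# One-off mapping generated via the chat interface, then hand-verified
DEMONYM_TO_ISO3166 = {
    "Chinese": ("The People's Republic of China", "CHN"),
    "Iranian": ("The Islamic Republic of Iran", "IRN"),
    "North Korean": ("Democratic People's Republic of Korea", "PRK"),
    "Russian": ("Russian Federation", "RUS"),
}

def validate_mapping(mapping):
    """Keep only rows whose code looks like a three-letter uppercase code."""
    return {demonym: row for demonym, row in mapping.items()
            if len(row[1]) == 3 and row[1].isalpha() and row[1].isupper()}

clean = validate_mapping(DEMONYM_TO_ISO3166)
```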
  14. Use Case #3: ATT&CK Technique Extraction - Problem • Challenge:

    Finding references to techniques in vendor blog posts • Original Solution: Regex Based Extraction • Problem: Not everyone gives nice and easy tables
  15. Found      True Positive   False Positive   False Negative

      T1133      T1133
      T1078.002  T1078.002
      T1059.003  T1059.003
      T1078.003  T1078.003
      T1078.002  T1078.002
      T1543.003  T1543.003
      T1562.001  T1562.001
      T1560.001  T1560.001
      T1219      T1219
      T1105      T1105
      T1537      T1537
      T1486      T1486
  17. Use Case #3: ATT&CK Technique Extraction - Solution • Experimentation:

    • Direct Call to OpenAI Conversation API: Failure • Langchain Tool Mode with OpenAI: Failure • Prompting in the OpenAI Chat Interface: Success • Outcome: Needs more experimentation
  18. Use Case #4: STIX2 Object Merging - Problem • Challenge: We

    often create ATT&CK objects for the “same” group and need to cluster them after the fact • Original Solution: Lots of manual effort • Problem: Manual effort is time consuming and error prone
  19. Use Case #4: STIX2 Object Merging - Solution • Experimentation:

    Programmatic merging of STIX2 objects using the OpenAI API • Outcome: Success! Shockingly effective in testing; operationalization is still being investigated
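A hedged sketch of what the merge step could look like: serialize two intrusion-set-style STIX2 dicts and ask the model to produce one merged object. The field names follow STIX 2.1 conventions, but the prompt text and the sample objects are assumptions, not the talk's implementation.

```python
import json

def build_merge_prompt(obj_a, obj_b):
    """Build a prompt asking the model to merge two STIX2 objects."""
    return (
        "Merge these two STIX2 intrusion-set objects describing the same "
        "group. Keep one id, union the aliases, and combine descriptions.\n"
        f"Object A:\n{json.dumps(obj_a, indent=2)}\n"
        f"Object B:\n{json.dumps(obj_b, indent=2)}\n"
        "Return only the merged JSON object."
    )

# Hypothetical duplicate records for the same group
a = {"type": "intrusion-set", "name": "APT29", "aliases": ["Cozy Bear"]}
b = {"type": "intrusion-set", "name": "APT29", "aliases": ["The Dukes"]}
prompt = build_merge_prompt(a, b)
```

The model's JSON response would still need schema validation before being written back into the intelligence store.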
  20. Code Assistance with GitHub Copilot • Increased Productivity: Speeds up

    the coding process by providing intelligent code suggestions and completions. It can help “good” developers write code faster, reducing the time spent on repetitive tasks and boilerplate code. • Code Quality Improvement: Copilot can assist developers in writing code that follows best practices and coding conventions. • Easy Access to an LLM in Code, even for Data: Given access to base LLMs, it allows easy generation of data, especially when you’re creating data in code.
  21. In The End • Case #1: Summary Generation •

    Successful • Operational • Case #2: One Off Data Generation • Successful… Mostly • Used as Necessary • Case #3: ATT&CK Technique Extraction • Mixed Success • More Research Needed • Case #4: STIX2 Object Merging • Successful (Unexpectedly) • More Research Needed
  22. What about Chandler M? • Still around Utah State •

    Now with another internship this summer • Not replaced by AI no matter how hard I tried • Proved the value of not just interns but early career folks in CTI
  23. Conclusions • While LLMs cannot do everything, they can often

    be effective at specific tasks • Speculation is usually somewhat useless, but code wins • If at first you don’t succeed, try and try again • Different Services • Different Models • Different Hyperparameters • We’re going to continue integrating and start looking at fine-tuning our own models