apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic Grok, Peter Marelas, New Relic

apidays
October 24, 2023

apidays Australia 2023 - Platforms, Products, and People: The Power of APIs
October 11 & 12, 2023
https://www.apidays.global/australia/

How We Built Our Generative AI Assistant: New Relic Grok
Peter Marelas, Chief Architect and Head of Technical Specialists, APAC at New Relic

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/

Transcript

  1. How we built our Generative AI assistant: New Relic Grok. Peter Marelas, Chief Architect, APJ, New Relic.
  2. The New Relic cloud observability platform. Collect: applications, web, mobile, cloud, IoT, etc. Store: filter, enrich and build relationships (system, software, users, topology maps, etc.). Visualise: real-time dashboards, service maps, query builders, curated experiences. Analyse: correlation, causal analysis, trends, anomaly detection, real-time alerting, health indicators. No team silos (one pricing model for ubiquity and scale), no data silos (one purpose-built telemetry data cloud), no tool silos (all monitoring and security tools in one connected experience). Capabilities for infrastructure, security, DevOps, web, AI/ML, mobile, network and SRE teams span APM, infrastructure, Kubernetes, synthetics, serverless, model performance, network, browser, mobile, distributed tracing, log management and AIOps, full-stack observability on the Telemetry Data Platform.
  3. Motivation: the peak of the hype cycle creates customer expectations. (* Gartner Hype Cycle for Artificial Intelligence, 2023)
  4. Grok has 4 specific skills:
     Skill                                                      | Common NL prefix     | Tool             | Source of knowledge
     Answer questions about New Relic                           | How do I …           | NL 2 Docs        | New Relic documentation
     Answer questions about the user's data                     | What … / How many …  | NL 2 NRQL        | NRDB
     Check for problems or anomalies in the user's environment  | Are …                | NL 2 Anomalies   | NRDB
     Interpret the user's dashboards                            | What is …            | NL 2 Dashboards  | Dashboard definition, NRDB
  5. How does Grok decide which skill (tool) to use? For example, given "What is my transaction count?", the LLM is asked to pick the right tool for the instruction from a description of each tool: NL 2 Docs, NL 2 NRQL, NL 2 Dashboards or NL 2 Anomalies (here it picks NL 2 NRQL). A sketch of this routing step follows below.
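
A minimal sketch of that routing step, assuming a generic `complete` function that sends a prompt to an LLM and returns its text; the prompt wording and the `route_request` helper are illustrative, not New Relic's implementation.

```python
# Hypothetical sketch of LLM-based tool routing (not New Relic's actual code).
from typing import Callable

TOOLS = {
    "NL 2 Docs": "Answers questions about New Relic itself, e.g. 'How do I ...'",
    "NL 2 NRQL": "Answers questions about the user's own data, e.g. 'What ...', 'How many ...'",
    "NL 2 Anomalies": "Checks for problems or anomalies in the user's environment, e.g. 'Are ...'",
    "NL 2 Dashboards": "Interprets the user's dashboards, e.g. 'What is ...'",
}

def route_request(question: str, complete: Callable[[str], str]) -> str:
    """Ask the LLM to pick exactly one tool for the user's instruction."""
    tool_descriptions = "\n".join(f"- {name}: {desc}" for name, desc in TOOLS.items())
    prompt = (
        "Pick the single best tool for the user's request. "
        "Reply with the tool name only.\n"
        f"Available tools:\n{tool_descriptions}\n"
        f"User request: {question}"
    )
    choice = complete(prompt).strip()
    # Fall back to documentation search if the model replies with an unknown name.
    return choice if choice in TOOLS else "NL 2 Docs"
```
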
  6. How Grok processes NL 2 NRQL requests, e.g. "What is my transaction count?": ask the LLM to pick the most relevant tables for the user's question; get the schema for those tables as metadata; retrieve similar examples of question/NRQL pairs from a vector database; combine the metadata, examples and user's question into a prompt; ask the LLM to generate NRQL from the prompt; validate that the query is syntactically correct (if not, ask the LLM to correct it); execute the NRQL; pass the result back to the LLM to render a natural-text response; render a chart and natural-language response to the user. A sketch of this loop follows below.
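
A condensed sketch of the NL 2 NRQL loop above, following the shape the slide outlines; every helper (`pick_tables`, `get_schemas`, `find_similar_examples`, `validate_nrql`, `run_nrql`) is a stand-in stub, not a real New Relic API.

```python
# Hypothetical sketch of the NL 2 NRQL flow described on the slide.
from typing import Callable, Optional

def pick_tables(question: str, complete: Callable[[str], str]) -> list[str]:
    """LLM picks the most relevant tables; stubbed here."""
    return complete(f"Which event tables are relevant to: {question}?").split(",")

def get_schemas(tables: list[str]) -> str:
    return "\n".join(f"{t}: <schema placeholder>" for t in tables)

def find_similar_examples(question: str) -> str:
    return "<similar question/NRQL pairs from the vector database>"

def validate_nrql(nrql: str) -> Optional[str]:
    """Return an error message if the query looks invalid, else None; stubbed."""
    return None if nrql.upper().startswith(("SELECT", "FROM")) else "query must start with SELECT or FROM"

def run_nrql(nrql: str) -> list[dict]:
    return [{"count": 1234}]  # placeholder result from NRDB

def answer_with_nrql(question: str, complete: Callable[[str], str]) -> dict:
    schemas = get_schemas(pick_tables(question, complete))
    examples = find_similar_examples(question)
    prompt = f"{schemas}\n{examples}\nUser question: {question}\nNRQL query:"
    nrql = complete(prompt)
    for _ in range(2):                       # bounded correction loop
        error = validate_nrql(nrql)
        if error is None:
            break
        nrql = complete(f"Fix this NRQL query.\nQuery: {nrql}\nError: {error}")
    rows = run_nrql(nrql)
    summary = complete(f"Describe this result in plain English: {rows}")
    return {"nrql": nrql, "rows": rows, "summary": summary}
```
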
  7. How Grok processes NL 2 Docs requests, e.g. "How do I …?": convert the question to embeddings using the LLM; search the vector database for similar embeddings; extract the text passages associated with those embeddings; generate a prompt containing the question and the relevant passages; pass the prompt to the LLM to render a natural response from the passages; render the response to the user. This is in-context learning with Retrieval Augmented Generation (RAG); a sketch follows below.
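
A minimal, self-contained sketch of the NL 2 Docs RAG flow. The `embed` function is a toy stand-in for a real embedding model (the deck elsewhere mentions 1536-dimension embeddings), and the in-memory list stands in for the vector database.

```python
# Illustrative RAG sketch of the NL 2 Docs flow; not New Relic's implementation.
import hashlib
import math

def embed(text: str, dims: int = 32) -> list[float]:
    """Toy stand-in for a real embedding model (e.g. a 1536-dim model)."""
    digest = hashlib.sha256(text.lower().encode()).digest()
    return [b / 255.0 for b in digest[:dims]]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# "Vector database": passages indexed by their embeddings.
PASSAGES = ["To install the agent, run ...", "NRQL supports SINCE and UNTIL ..."]
INDEX = [(embed(p), p) for p in PASSAGES]

def answer_from_docs(question: str, complete) -> str:
    query_vec = embed(question)                                   # question -> embedding
    top = sorted(INDEX, key=lambda item: cosine(query_vec, item[0]), reverse=True)[:2]
    context = "\n".join(passage for _, passage in top)            # extract matching passages
    prompt = f"Answer using only these passages:\n{context}\nQuestion: {question}"
    return complete(prompt)                                       # LLM renders the answer
```
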
  8. What we want from an AI assistant: natural-language instructions in, deterministic output out, backed by specific knowledge DBs, specific rule interpreters and general output formats.
  9. What foundational LLMs offer instead: natural-language instructions in, but creative output out, backed by generic knowledge DBs, generic rule interpreters and generic output formats, rather than the specific knowledge DBs, specific rule interpreters and general output formats we want.
  10. Foundational LLM + Retrieval Augmented Generation closes the gap: natural-language instructions combined with a specific knowledge DB and specific rules produce deterministic output, instead of the creative output a foundational LLM gives over generic knowledge DBs, generic rule interpreters and generic output formats.
  11. What questions do our users want to ask? In a user study, 79% said they wanted to learn something about a capability or get insights from their own dataset.
  12. Finding the right prompts (prompt engineering): ongoing refinement for edge cases; add examples to the prompt (few-shot); add rules to the prompt; a feedback mechanism; a robust test harness; ROUGE and BERT scores; a 2nd LLM to assess quality. The NL 2 NRQL prompt combines a system instruction, rules, examples and the user's question, for example:

    <Context information>: You are an AI assistant specialized in translating user questions into New Relic Query Language (NRQL), with no knowledge of SQL. Given a user's question, information about the user, descriptions of event schemas, and examples of questions and answers, your task is to generate an appropriate NRQL query. The provided event schemas contain only the most relevant ones and you need to use only one. In the context of New Relic, an entity is a basic data reporting element, such as an application, host, or database service; each entity has a unique Guid, which is a base64-encoded unique identifier; if a user references an entity by its Guid, you should use it in the NRQL you generate, but if an entity Guid is not explicitly referenced, you should not use one in the query that you generate. The wording of the question should tell you whether the user wants totals or data over a time interval. Use the TIMESERIES clause in the NRQL query that you generate only if the user requests data over time or per day/hour. Otherwise, do not use it.

    <How to select time range in NRQL queries>: Every NRQL query should contain a SINCE and may contain an UNTIL clause, as this is the only viable way to select a time range in NRQL. If the SINCE clause is not used, the query uses the last 1 hour of data by default, but you should always use the SINCE clause in the query you generate, and if the time range is not explicitly specified, use SINCE 1 hour ago.

    <Examples of valid NRQL queries with time range selections>:
    <User question>: How many transactions happened today?
    <NRQL query>: SELECT count(*) FROM Transaction SINCE TODAY
    <User question>: How many transactions happened on 25th of April?
    <NRQL query>: FROM Transaction SELECT count(*) SINCE '2023-04-25 00:00:00' UNTIL '2023-04-25 23:59:59'
    <User question>: How many transactions happened in the previous calendar week?
    <NRQL query>: FROM Transaction SELECT COUNT(*) SINCE LAST WEEK UNTIL THIS WEEK
    <User question>: How many transactions happened on Monday?
    <NRQL query>: FROM Transaction SELECT count(*) SINCE MONDAY UNTIL TUESDAY
    <User question>: How many transactions per day occurred this year until 10 days ago?
    <NRQL query>: FROM Transaction SELECT count(*) SINCE THIS YEAR UNTIL 10 days ago TIMESERIES 1 day

    A sketch of assembling such a prompt follows below.
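
A sketch of assembling the prompt from the four parts the slide labels (system instruction, rules, examples, user question); the `build_nl2nrql_prompt` name, the `<Event schemas>` tag and the trimmed wording are assumptions made for brevity.

```python
# Illustrative few-shot prompt assembly for NL 2 NRQL.
SYSTEM_INSTRUCTION = (
    "You are an AI assistant specialized in translating user questions into "
    "New Relic Query Language (NRQL), with no knowledge of SQL."
)
RULES = (
    "Every NRQL query should contain a SINCE clause; if the time range is not "
    "explicitly specified, use SINCE 1 hour ago. Use TIMESERIES only if the "
    "user asks for data over time."
)
EXAMPLES = [
    ("How many transactions happened today?",
     "SELECT count(*) FROM Transaction SINCE TODAY"),
    ("How many transactions happened in the previous calendar week?",
     "FROM Transaction SELECT COUNT(*) SINCE LAST WEEK UNTIL THIS WEEK"),
]

def build_nl2nrql_prompt(question: str, schemas: str) -> str:
    shots = "\n".join(
        f"<User question>: {q}\n<NRQL query>: {a}" for q, a in EXAMPLES
    )
    return (
        f"<Context information>: {SYSTEM_INSTRUCTION}\n"
        f"<Event schemas>: {schemas}\n"
        f"<How to select time range in NRQL queries>: {RULES}\n"
        f"<Examples of valid NRQL queries with time range selections>:\n{shots}\n"
        f"<User question>: {question}\n<NRQL query>:"
    )
```
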
  13. Performance: high variance; track time to first token, time to intermediate token and time to last token; intermediate messages; distribute requests; cache some answers (a cache sketch follows below).
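
A sketch of the "cache some answers" tactic: an in-memory cache keyed by the normalized question so repeated questions skip the LLM round trip. The normalization and TTL are assumptions.

```python
# Illustrative answer cache; not New Relic's implementation.
import hashlib
import time

CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 3600

def cached_answer(question: str, answer_fn) -> str:
    # Normalize whitespace and case before hashing so trivial variants hit the cache.
    key = hashlib.sha256(" ".join(question.lower().split()).encode()).hexdigest()
    hit = CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                      # serve a cached answer, skip the LLM
    answer = answer_fn(question)
    CACHE[key] = (time.time(), answer)
    return answer
```
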
  14. Cost of the LLM (Microsoft Azure OpenAI Service):
      Model            | Prompt (1000 tokens) | Completion (1000 tokens)
      GPT-4            | $0.03                | $0.06
      Ada (embeddings) | $0.0001              |

      Query   | Avg prompt tokens | Avg completion tokens | Avg cost per e2e request
      NL2Docs | 3016              | 568                   | $0.13
      NL2NRQL | 6516              | 118                   | $0.20

      New Relic Grok users | Daily docs requests | Daily NL2NRQL requests | Monthly cost
      1 user               | 5                   | 5                      | $49
      100 users            | 500                 | 500                    | $4,900
      10,000 users         | 50,000              | 50,000                 | $490,000

      The arithmetic is reproduced in the sketch below.
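
The arithmetic behind the table, as a sketch. It assumes GPT-4 pricing of $0.03 per 1,000 prompt tokens and $0.06 per 1,000 completion tokens, which is what the per-request costs in the table imply.

```python
# Back-of-the-envelope reproduction of the cost table on this slide.
PROMPT_PRICE, COMPLETION_PRICE = 0.03 / 1000, 0.06 / 1000   # $ per token (assumed GPT-4 rates)

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

nl2docs = request_cost(3016, 568)    # roughly $0.12-0.13
nl2nrql = request_cost(6516, 118)    # roughly $0.20

# 5 docs + 5 NRQL requests per user per day, ~30 days in a month.
monthly_per_user = 30 * 5 * (nl2docs + nl2nrql)
print(f"NL2Docs ${nl2docs:.2f}, NL2NRQL ${nl2nrql:.2f}, per user/month ${monthly_per_user:.0f}")
# Scales linearly: ~$4,900 for 100 users, ~$490,000 for 10,000 users.
```
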
  15. What's next for Grok? Improve the quality of responses; experiment with our own models; develop a new skill (NL2Config).
  16. Peter Marelas, Chief Architect, APJ. https://www.linkedin.com/in/peter-marelas
  17. Deprecation of LLMs: a robust test harness; ROUGE (quantifies the overlap of words between generated output and reference text); BERTScore (semantic similarity); or use GPT-4 to evaluate (at a cost). An evaluation sketch follows below.
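
A sketch of scoring generated output against a reference with the two metrics the slide names, using the open-source rouge-score and bert-score packages; using these particular packages is an assumption, the slide only names the metrics.

```python
# Illustrative regression check for generated answers against references,
# useful before switching or upgrading the underlying LLM.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

def evaluate(prediction: str, reference: str) -> dict:
    # ROUGE: word overlap between generated output and reference text.
    rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge_f1 = rouge.score(reference, prediction)["rougeL"].fmeasure

    # BERTScore: semantic similarity from contextual embeddings.
    _, _, f1 = bert_score([prediction], [reference], lang="en")
    return {"rougeL_f1": rouge_f1, "bertscore_f1": float(f1[0])}

print(evaluate("SELECT count(*) FROM Transaction SINCE TODAY",
               "FROM Transaction SELECT count(*) SINCE TODAY"))
```
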
  18. LLM rate limits: a 40,000 token-per-minute limit divided by roughly 5,200 tokens per request allows about 7 requests/min. Mitigations: use multiple endpoints; queue requests; distribute requests; limit max completion tokens (they count towards the token-per-minute limit). A token-budget sketch follows below.
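
A sketch of a token-per-minute budget matching the slide's arithmetic (40,000 TPM over ~5,200 tokens per request is roughly 7 requests/min); the sliding-window implementation is an assumption, not New Relic's queueing code.

```python
# Illustrative token-per-minute throttle for one LLM endpoint.
import time
from collections import deque

class TokenBudget:
    def __init__(self, tokens_per_minute: int = 40_000):
        self.limit = tokens_per_minute
        self.window: deque[tuple[float, int]] = deque()  # (timestamp, tokens used)

    def acquire(self, tokens: int) -> None:
        """Block until `tokens` fit inside the last 60 seconds of usage."""
        while True:
            now = time.monotonic()
            while self.window and now - self.window[0][0] > 60:
                self.window.popleft()                    # drop usage older than a minute
            used = sum(t for _, t in self.window)
            if used + tokens <= self.limit:
                self.window.append((now, tokens))
                return
            time.sleep(1)  # wait for the oldest requests to fall out of the window

budget = TokenBudget()
budget.acquire(5_200)  # an average Grok request; about 7 of these fit per minute
```
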
  19. LLM context length limits: GPT-4 has an 8,192-token context length; prompt + completion must stay within the context length to avoid hallucinations; transform the prompt to save tokens: remove extra spaces, remove pronouns, convert JSON to CSV (a sketch follows below).
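
A sketch of two of the prompt-shrinking transforms the slide lists: collapsing whitespace and flattening JSON records to CSV so field names are not repeated per record. The function names are illustrative.

```python
# Illustrative prompt compression helpers.
import csv
import io
import json
import re

def squeeze_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def json_records_to_csv(records_json: str) -> str:
    """Flatten a JSON array of objects to CSV: keys appear once, in the header."""
    records = json.loads(records_json)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().strip()

rows = '[{"app": "checkout", "errors": 12}, {"app": "search", "errors": 3}]'
print(json_records_to_csv(rows))
# app,errors
# checkout,12
# search,3
```
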
  20. LLMs have no knowledge after 2021: use in-context learning and pass the question plus relevant docs; answers are only as good as the algorithm used to find the relevant docs; cross-encoder re-ranking improves it (a sketch follows below).
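
A sketch of cross-encoder re-ranking, using the sentence-transformers library and a public MS MARCO cross-encoder model as assumed stand-ins; the slide names the technique, not the tooling. A first-pass vector search supplies candidates, then the cross-encoder scores each (question, passage) pair jointly.

```python
# Illustrative cross-encoder re-ranking of retrieved passages.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, candidates: list[str], top_k: int = 3) -> list[str]:
    # Score every (question, passage) pair jointly, then keep the best top_k.
    scores = reranker.predict([(question, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]

docs = ["How to install the Python agent ...", "NRQL time range clauses ...",
        "Alert policy configuration ..."]
print(rerank("How do I set a time range in NRQL?", docs, top_k=2))
```
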
  21. How are similar documents and examples found? Indexing: documents are split into passages, each passage is converted to a text embedding (a 1536-dimension vector, e.g. [0.354, 0.234, …, 0.87]) and stored in a vector DB, indexed by embedding. Search: the search text is converted to an embedding the same way and the vector DB returns the top-K matches by maximal marginal relevance (a sketch follows below).
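
A sketch of top-K selection by maximal marginal relevance, the criterion the slide names: each pick balances similarity to the query against redundancy with passages already selected. A real vector database would typically do this server-side over the stored 1536-dimension embeddings; this plain-Python version is illustrative.

```python
# Illustrative maximal marginal relevance (MMR) selection over embeddings.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, candidates, k: int = 3, lam: float = 0.7) -> list[int]:
    """Return indices of k candidates that are relevant to the query yet diverse."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.0]
cands = [[0.9, 0.1], [0.95, 0.05], [0.1, 0.9]]
print(mmr(query, cands, k=2))  # indices of the 2 selected passages
```
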
  22. What tools do we use? An LLM + a vector database + LLM logic.