Slide 1

Slide 1 text

Search at Bloomberg: Challenges, Opportunities, and Lessons Learned
SIGIR 2022 – SIRIP 2022 Keynote
July 12, 2022
Edgar Meij, Ph.D.
Head of AI Search and Discovery
@edgarmeij
© 2022 Bloomberg Finance L.P. All rights reserved.

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Search @ Bloomberg
● Terminal / Enterprise search
  ○ Data
  ○ Information
  ○ Analytics
  ○ News
  ○ Functionality
  ○ …
● Query understanding
● Question answering
● Autocomplete / Query suggestions
● Intent detection
● Related entity suggestions
● Recommender systems
● …

Slide 8

Slide 8 text

Search @ Bloomberg

Slide 9

Slide 9 text

Today's topic
Search and discovery, applied research, and corresponding challenges/opportunities – all in the context of the financial services domain

Slide 10

Slide 10 text

Bloomberg is just finance, right?
● A technology company founded in New York City in 1981
● 325,000+ subscribers in 170 countries
● Over 20,000 employees in 163 locations, including over 7,000 software engineers – with more than 200 engineers and data scientists working on AI and related problems
● Increased use of and contributions to open source software
● Increased presence in academic research

Slide 11

Slide 11 text

Bloomberg
DATA, ANALYTICS, NEWS, COMMUNITY
…to facilitate financial decision-making.

Slide 12

Slide 12 text

The Bloomberg Terminal is software that delivers a diverse array of information, news and analytics to facilitate financial decision-making.

Slide 13

Slide 13 text

Finding a trading strategy is not unlike MLDC

Slide 14

Slide 14 text

Data is the backbone of the financial markets
● Historically mostly “structured” market data (ticks/quotes/trades)
  ○ Well-understood
  ○ Enabling advanced forms of automation
● Other types of data/information
  ○ Real-world events, natural disasters
  ○ Sociocultural phenomena
  ○ Economic indicators
  ○ Sales, revenue forecasts, futures, etc.
  ○ Government policies
  ○ Legal proceedings and litigation
  ○ The weather
  ○ …
Our Challenge: Identify financially relevant signal from noisy, complex, tangentially related datasets

Slide 15

Slide 15 text

Data is the backbone of the financial markets
● Increasingly non-traditional factors, based on “alternative” data, such as:
  ○ Satellite images / CO2 emissions over factories
  ○ Sentiment analytics on news
  ○ Shopping mall footfall traffic
  ○ Number of people riding the subway
  ○ “Pret index”
  ○ Credit card transactions
  ○ etc.
● But also “unstructured” data…
Our Challenge: Identify financially relevant signal from noisy, complex, tangentially related datasets

Slide 18

Slide 18 text

“Unstructured” data
● 80% of data exists in the form of “raw”, unstructured text, e.g.,
  ○ Company filings, earnings call transcripts
  ○ Tweets, Reddit, Facebook posts, news stories
  ○ Research analyst reports, CRMs
  ○ Economic policy, govt communications
  ○ Press releases
  ○ Web pages
  ○ Chats & e-mail, client feedback
  ○ etc.
  ○ Lots of jargon and custom terminology (sometimes even firm-specific!)
Our Challenge: Identify financially relevant signal from noisy, complex, tangentially related datasets

Slide 19

Slide 19 text

Why?

Slide 20

Slide 20 text

Real-time multi-modal data moves markets
Sanford Bernstein’s Toni Sacconaghi: “And so, where specifically will you be in terms of capital requirements?”
Elon Musk: “Excuse me. Next. Boring, bonehead questions are not cool. Next?”
speech recognition, entity recognition, linking, salience, topic classification, summarization

Slide 21

Slide 21 text

Latency matters
entity recognition, linking, salience, sentiment

Slide 22

Slide 22 text

Finance professionals (i.e., our users)

Slide 23

Slide 23 text

Roles in finance
different roles → different needs
different times → different needs

Slide 24

Slide 24 text

Even within a single role, context matters
financial research analyst
● REVIEW FORECASTS
  ○ sell-side research
  ○ press releases
● IDENTIFY RISK
  ○ liquidity analysis
  ○ new market conditions
● LISTEN TO CONFERENCE CALLS
  ○ assess tone
  ○ note new guidance / comments
  ○ take notes for later reference
● FIND & READ RELEVANT DOCS
  ○ financial reports
  ○ presentations
  ○ sell-side research
● ASSESS FUTURE PERFORMANCE
  ○ competition
  ○ ESG picture
  ○ news / media sentiment
  ○ tone in management calls
● ANNOTATE & STRUCTURE FINDINGS
  ○ highlight research
  ○ copy and share snippets
  ○ set up alerts for new events
Contexts shown: EARNINGS SEASON, NEW COMPANY

Slide 25

Slide 25 text

Even within a single role, context matters
portfolio manager
● IDEATION
  ○ sell-side research
  ○ press releases
  ○ searching/browsing
  ○ inefficiencies in the market
● FIND & READ RELEVANT DOCS
  ○ financial reports
  ○ presentations
  ○ sell-side research
● RISK ANALYSIS
  ○ liquidity analysis
  ○ valuation
  ○ peer analysis
  ○ sell-side analyst recommendations
● ASSESS FUTURE PERFORMANCE
  ○ industry comparison
  ○ ESG
  ○ news / media sentiment
  ○ backtest
● TRADE
  ○ factor analysis
  ○ pricing information
  ○ forecast market conditions
● MONITOR
  ○ identify anomalies
  ○ news alerts
  ○ sell-side analyst recommendations
Contexts shown: IDEA GENERATION, TRADE

Slide 26

Slide 26 text

User models
● Most of our clients use the Terminal in their day-to-day workflows to:
  ○ Trade
  ○ Spot inefficiencies/opportunities in the market
  ○ Find signal
  ○ Keep abreast of developments
  ○ etc.
● Deeply ingrained muscle memory for executing Bloomberg functions
● Limited room for “discovery”

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

So where does one start?

Slide 29

Slide 29 text

Today's topic
Search and discovery, applied research, and corresponding challenges/opportunities – all in the context of the financial services domain

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

10K+ Functions (verticals)
● Structured Data*
  ○ EQS: Equity Screening
  ○ PEOP: People Search
  ○ SRCH: Bond Search
  ○ …
● Unstructured Data
  ○ BI: BBG Intelligence
  ○ NSE: News
  ○ HELP: Help
  ○ …
● Commands*
  ○ GP: Charting
  ○ ALRT: Alerts
  ○ MAP: Mapping
  ○ …

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Bloomberg Terminal commandline search
● Autocomplete: vast majority is navigational (i.e., no search intent)
● Returning results of vastly different types
  ○ Securities
  ○ Non-securities
    ■ People
    ■ Companies
    ■ Wikipedia
    ■ Help pages + FAQs
    ■ Contributors
    ■ Issuers
    ■ Research categories/topics/analysts
    ■ Definitions
    ■ Fields
    ■ Functions

Slide 35

Slide 35 text

Federating across many disparate units of retrieval
● Many disparate sources
  ○ Some indexed by us
  ○ Some indexed by others/owners
● How do you normalize scores for results from different back-ends, and then merge and present a single list?
  ○ Account for different “document” lengths and collection statistics
  ○ Account for different “document” fields
  ○ Account for multiple languages
  ○ Account for multiple ranking functions
  ○ Account for different, perhaps non-comparable scores
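One common way to approach the score-normalization question above is to rescale each back-end's scores before merging. The sketch below is illustrative only and is not taken from the talk: it assumes per-source z-score normalization with optional per-source weights, and the names `z_normalize`, `federate`, and the sample back-ends and scores are all hypothetical.

```python
from statistics import mean, pstdev

def z_normalize(results):
    """Rescale one back-end's (doc, score) pairs to zero mean, unit variance."""
    scores = [score for _, score in results]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:  # all scores identical: nothing to distinguish
        return [(doc, 0.0) for doc, _ in results]
    return [(doc, (score - mu) / sigma) for doc, score in results]

def federate(result_lists, weights=None):
    """Merge results from several back-ends into one ranked list.

    result_lists maps a source name to its (doc, raw_score) pairs.
    weights (hypothetical) lets a caller boost or demote a source.
    """
    merged = []
    for source, results in result_lists.items():
        if not results:
            continue
        weight = (weights or {}).get(source, 1.0)
        for doc, z in z_normalize(results):
            merged.append((doc, source, weight * z))
    return sorted(merged, key=lambda row: row[2], reverse=True)

# Hypothetical back-ends whose raw scores live on very different scales.
backends = {
    "securities": [("IBM US Equity", 12.4), ("IBM 3 05/15/24 Corp", 9.1), ("IBMX", 3.3)],
    "people": [("Jane Doe (IBM)", 0.92), ("John Roe (IBM)", 0.55)],
}
merged = federate(backends)
for doc, source, score in merged:
    print(f"{score:+.2f}  {source:<10}  {doc}")
```

Z-scores only align the scale of each source; they do not account for document length, fields, or language, which the bullets above list as separate concerns.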

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

EQS An example workflow in 30 seconds.

Slide 41

Slide 41 text

Show me the EPS of Chinese pharmaceutical companies that had dividend yield over 5% last year EQS

Slide 43

Slide 43 text

What are the market caps of German pharmaceuticals? Show me the EPS of Chinese pharmaceutical companies that had dividend yield over 5% last year EQS

Slide 45

Slide 45 text

Show me the EPS of Chinese companies that had dividend yield over 5% last year Show me the EPS of Chinese pharmaceutical companies that had dividend yield over 5% last year EQS

Slide 52

Slide 52 text

What are the market caps of German pharmaceuticals? EQS Show me the EPS of Chinese pharmaceutical companies that had dividend yield over 5% last year

Slide 53

Slide 53 text

Semantic parsing framework
• Reuse, reuse, reuse!
  Lots of domains, but they share a lot in common: language about time, currency, aggregation ops, …
• Developer efficiency
  Build a semantic parser for a new domain fast, then iterate
• Flexibility
  To support language for new semantic operations
• Performance
  Interactive times, on par with the typical search engine
• Interpretability
  Not just an answer, but how it was derived
Developer / User

Slide 54

Slide 54 text

Query understanding for News
Example queries:
● sony or toyota last two weeks in japanese
● german or french parliament elections
● oil last week
● pandora papers
DATE/TIME, TOPIC, ENTITY RECOGNITION
  oil last week → (topic:OIL, time(-1,week,now))
COMPLEX LOGICAL STRUCTURES
  german or french parliament elections → ((topic:GEPARM or topic:FRPARM) AND topic:ELECTIONS)
OPEN VOCABULARY
  pandora papers → ("pandora papers") vs. ("papers" AND company:PNDORA@DC)
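To make the first mapping above concrete, here is a toy rule-based sketch that turns "oil last week" into the logical form shown on the slide. It is a minimal illustration, not Bloomberg's parser: the `TOPICS` lexicon, the `TIME_RE` pattern, and the `parse` function are hypothetical, and the sketch ignores boolean operators, languages, and entity ambiguity.

```python
import re

# Hypothetical topic lexicon; a real system would use a far richer vocabulary.
TOPICS = {"oil": "OIL", "elections": "ELECTIONS"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}
TIME_RE = re.compile(r"\blast(?:\s+(\d+|one|two|three))?\s+(day|week|month|year)s?\b")

def parse(query):
    """Map a keyword query to a logical form such as (topic:OIL, time(-1,week,now))."""
    q = query.lower()
    time_expr = None
    match = TIME_RE.search(q)
    if match:
        raw = match.group(1)
        n = 1 if raw is None else NUMBER_WORDS.get(raw) or int(raw)
        time_expr = f"time(-{n},{match.group(2)},now)"
        q = q[:match.start()] + q[match.end():]  # drop the time phrase
    # Known topics become topic codes; anything else stays a quoted keyword.
    terms = [f"topic:{TOPICS[tok]}" if tok in TOPICS else f'"{tok}"' for tok in q.split()]
    if time_expr:
        terms.append(time_expr)
    return "(" + ", ".join(terms) + ")"

print(parse("oil last week"))             # (topic:OIL, time(-1,week,now))
print(parse("elections last two weeks"))  # (topic:ELECTIONS, time(-2,week,now))
```

The open-vocabulary case on the slide ("pandora papers" as a phrase versus a company plus a keyword) is exactly where a lexicon lookup like this breaks down and a statistical model is needed.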

Slide 55

Slide 55 text

Query understanding is used across the Bloomberg Terminal
● Bonds: Show all floating Asian tech bonds maturing in the next 10 years
● Charts: Yearly net income of Google and IBM in last 20 years
● Economic: Show GDP of China and Germany in Q1 2016
● Equities: What are the top 10 Asian tech companies with eps >= 4
● Holdings: German holders of French ETFs
● News: Show me news about oil from the FT from the last month
● People: Who are the UMich alumni that work for GS in NYC?

Slide 56

Slide 56 text

Aggregated autocomplete suggestions
Prime Terminal real estate
BondsAC, NewsAC, 5W1H AC, ChartAC, …
Balance:
● Relevance (short term): Get things done
  ○ Clicks
  ○ Task completion
● Discovery (long term): Serendipity and utility
  ○ Adoption
  ○ Workflows
Personalization
Semantically Driven Auto-completion, CIKM 2019
Auto-completion for Question Answering Systems at Bloomberg, SIGIR 2018

Slide 57

Slide 57 text

(Text-based) Question Answering

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

(Text-based) Question Answering

Slide 60

Slide 60 text

TextQA
Document Stores → Retriever (tens of documents) → Scorer & Answer Extractor → Confidence Modeling
Multiple domains, multiple backends, multiple extractors

Slide 61

Slide 61 text

TextQA
Document Stores → Retriever (tens of documents) → Scorer & Answer Extractor → Confidence Model → Show/No show
Multiple domains, multiple backends, multiple extractors
● High confidence → Best Answer
● Moderate confidence → Regular Result
● Low confidence → Φ No Show
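The show/no-show decision can be pictured as thresholding a confidence score. The sketch below assumes a single calibrated confidence in [0, 1] and two cut-offs; the function name `route_answer` and the threshold values 0.8 and 0.4 are hypothetical placeholders, not values from the talk.

```python
def route_answer(confidence, high=0.8, low=0.4):
    """Decide how to present an extracted answer given model confidence.

    The thresholds are hypothetical; in practice they would be tuned
    against the precision required for each presentation slot.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= high:
        return "best_answer"      # promoted answer box
    if confidence >= low:
        return "regular_result"   # shown inline with other results
    return "no_show"              # suppressed entirely

for c in (0.93, 0.55, 0.12):
    print(c, route_answer(c))
```

Keeping the routing separate from the extractor means the same thresholds can be applied across multiple domains, backends, and extractors, provided their confidences are calibrated to a comparable scale.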

Slide 62

Slide 62 text

Some more exotic examples
● How much new railway development is planned in Europe?
  Should projects work as planned and remain on schedule, Europe could have more than 17,200 km of new and upgraded railway by 2025, out of a total planned pipeline of 39,180 km. This equates to about $97 billion of investment in 2022 and $110 billion in 2023. High-speed rail could provide about 22,500 additional km. Industrial and materials producers are closely monitoring higher demand. The U.K., France, Russia, Italy, Germany and Sweden -- where some listed construction companies operate -- are showing up with leading projects, both in execution or in the planning phase. EU governments are making significant efforts to use the co-funding available -- as much as possible -- to boost their economies. From 2000-17, Spain spent 47.3% of EU funds on high-speed rail.
● When will the cruise industry return to profit?
  RECENT EVENT REACTION: Carnival's deflated January bookings for 2H cruises elevates our concern that the industry's targeted 2H return to profit could falter if the omicron variant's impact on sales lingers. Though Carnival expects to operate over 96% of disclosed 1Q capacity days despite virus disruption, even that shortfall could clip 1Q sales by 4% vs. consensus, our analysis shows.

Slide 63

Slide 63 text

Challenges (and thus Opportunities!)

Slide 64

Slide 64 text

Challenges (and thus Opportunities!)
● Federating across many disparate units of retrieval
● Partial observability: incomplete/noisy interactions
● Augmented intelligence
● Staying performant

Slide 65

Slide 65 text

Federating across many disparate units of retrieval

Slide 66

Slide 66 text

Federating across many disparate units of retrieval
● Search-as-a-platform
● In-domain vs. cross-domain search
● Domain specialists vs. novice users
  ○ Aliases in-context
  ○ How to control quality?
● Sample live queries as much as you can

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

Federating across many disparate units of retrieval
● Entity / Intent / Domain detection, for partial as well as for full queries
  ○ In some cases NER is sufficient (when the label is unique)
  ○ Typeahead prediction first, followed by NER/NED on the predicted full query, seems to work better than NER/NED on partial queries
● Adjust downstream ranking and presentation accordingly
Identifying Named Entities as they are Typed, EACL 2021
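The "complete first, then tag" idea can be illustrated with a deliberately tiny sketch. It assumes a frequency-ranked query log for typeahead and a dictionary lookup standing in for NER; neither is the method of the cited EACL 2021 paper, and `QUERY_LOG`, `GAZETTEER`, `complete`, `tag_entities`, and `entities_as_typed` are hypothetical names with made-up data.

```python
# Hypothetical query log (query -> frequency) and entity gazetteer.
QUERY_LOG = {
    "apple inc earnings": 120,
    "apple inc bonds": 45,
    "applied materials news": 30,
}
GAZETTEER = {"apple inc": "COMPANY", "applied materials": "COMPANY"}

def complete(prefix):
    """Typeahead stand-in: return the most frequent logged query with this prefix."""
    prefix = prefix.lower()
    candidates = [(count, query) for query, count in QUERY_LOG.items()
                  if query.startswith(prefix)]
    return max(candidates)[1] if candidates else prefix

def tag_entities(query):
    """NER stand-in: dictionary lookup of known entity names in the query."""
    return [(name, label) for name, label in GAZETTEER.items() if name in query]

def entities_as_typed(prefix):
    """Predict the full query first, then tag entities on the prediction."""
    return tag_entities(complete(prefix))

print(tag_entities("appl"))        # tagging the partial query finds nothing
print(entities_as_typed("appl"))   # tagging the completed query finds the company
print(entities_as_typed("appli"))
```

The partial query "appl" contains no full entity mention, so direct tagging returns nothing, while the completed query does; this is the intuition behind running typeahead before NER/NED.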

Slide 72

Slide 72 text

Learning to Rank, in practice

Slide 73

Slide 73 text

NLP in Practice
● Generation 1: Write a bunch of rules (“templates”, “grammars”)
  ○ High-precision
  ○ Slow, manual, difficult to maintain or update
● Generation 2: Train a statistical classifier
  ○ For sequence tagging: conditional random fields
  ○ For document classification: logistic regression, SVMs, decision trees/random forests
  ○ Need labeled data
● Generation 3: Deep learning and human in the loop
  ○ Need a lot of labeled data, or distant supervision
  ○ May be slower

Slide 74

Slide 74 text

Learning to Rank in practice
● Generation 1: Parametrized BM25F / LM for IR
● Generation 2: LambdaRank, RFs, GBRTs, (“deeper” learning)
  ○ Address cold-start issues, add contextual info, enrich instances
● Generation 3: Beyond supervised learning
  ○ Neural IR / Dense retrievers / Vector-based similarity
  ○ Reinforcement learning
  ○ Want to optimize for long-term “utility” and stickiness

Slide 75

Slide 75 text

Partial observability
● How to train LTR models with (i) limited amounts of (ii) weakly supervised data?
  ○ System bias: only feedback on seen items
  ○ What if you don’t have that many users?
  ○ What if you have cohorts with outliers/different behaviours?
● Cold-start issues around sampling questions from logs for training
  ○ If users don’t know they can ask questions they probably won’t
  ○ Generate + paraphrase questions instead
Novelty Controlled Paraphrase Generation with Retrieval Augmented Conditional Prompt Tuning, AAAI 2022

Slide 76

Slide 76 text

Temporal relevance
● How to deal with annotations/relevance assessments that change/decay over time?
  ○ Developments in the real world
  ○ Timeliness of answers, stale answers, expiring answers
  ○ Temporally-anchored answers
● Need online metrics, ability to blocklist, continuous (re)training
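One simple way to model stale and expiring answers is to discount a stored relevance score by age and to suppress anything expired or blocklisted. The sketch below is an assumption for illustration, not the approach described in the talk: exponential decay with a half-life is one choice among many, and `effective_score`, `half_life_days`, `expires`, and `blocklisted` are hypothetical names.

```python
from datetime import datetime, timedelta

def effective_score(score, created, now, half_life_days=30.0,
                    expires=None, blocklisted=False):
    """Discount a relevance score by the age of the answer.

    Assumes exponential decay: the score halves every half_life_days.
    Blocklisted or expired answers are suppressed outright.
    """
    if blocklisted or (expires is not None and now >= expires):
        return 0.0
    age_days = (now - created).total_seconds() / 86400
    return score * 0.5 ** (age_days / half_life_days)

now = datetime(2022, 7, 12)
fresh = effective_score(1.0, now - timedelta(days=1), now)
month_old = effective_score(1.0, now - timedelta(days=30), now)
expired = effective_score(1.0, now - timedelta(days=5), now,
                          expires=now - timedelta(days=1))
print(round(fresh, 3), round(month_old, 3), expired)
```

Temporally anchored answers (for example, "as of Q1 2016") would need the opposite treatment: they stay valid for their anchor period, so a single global half-life is unlikely to fit every answer type.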

Slide 77

Slide 77 text

“Augmented” intelligence
● In industry, no model is static
  ○ New entities, new vocabulary, new contexts, new relationships, new data, etc.
● Humans in the loop
  ○ To generate questions, to annotate results, to judge relevance
  ○ Help prevent model drift and ameliorate lack of recall, precision
  ○ Provide important training data
● “Automation is augmentation, not replacement”
  ○ Need effective, easy to use tools for humans to work with algorithms

Slide 78

Slide 78 text

Practical considerations

Slide 79

Slide 79 text

Practical considerations
● Interpretability, explainability
● Regulatory / compliance
  ○ Data permissioning: per-role/per-person
  ○ On-/Off-prem data storage, compute
  ○ Encrypting data at rest and in transit
  ○ Private data flows, separated networks
  ○ Right to be forgotten

Slide 80

Slide 80 text

Practical considerations
● Buy vs. build: invest in resources and build from scratch, partner with vendor(s), or look at (and potentially improve) open source?
  ○ Type of problem, type of data
  ○ Where is the “alpha”?
  ○ Accuracy
  ○ Transparency
  ○ Customization
  ○ Maintenance, ease of adding more/different data
  ○ Time to market
  ○ Privacy/Regulatory concerns
  ○ Cost

Slide 81

Slide 81 text

Staying performant
● Being dependent on a (search) stack
  ○ Elastic? Solr?
  ○ Migrations, removing tech debt
  ○ Patching up old systems or redesigning?
  ○ Second stage re-ranker in-/outside of Solr
● Legacy systems and patchwork processes
  ○ Allocate time to disentangle and move to modern platforms and architectures

Slide 82

Slide 82 text

Conclusion
● Search at Bloomberg: making structured and unstructured data machine-readable, human-interpretable, discoverable, and findable
  ○ At scale with high accuracy and low latency, to enable swift and effective financial decision-making
● Deliver value by pushing the state-of-the-art through applied research
  ○ Address challenges encountered in “production” scenarios (cold-start issues, confidence modeling, partially observed behavior, system-induced biases, and more)
  ○ Validation through scientific peer review, open source contributions
● Generate data and perform continuous annotations/training with a human-in-the-loop, to address (some of) these issues

Slide 83

Slide 83 text

Thank you
https://TechAtBloomberg.com/AI
https://TechAtBloomberg.com/data-science-research-grant-program/
https://www.bloomberg.com/careers
@edgarmeij