Initiatives of ABYSS, a Vertical Search Platform

Agenda
- Self introduction
- Search technology in Yahoo! JAPAN
- What is ABYSS?
- Use case 1: Timeline of Yahoo! JAPAN App
- Use case 2: Yahoo! JAPAN Knowledge Search
- Conclusion
2

Self introduction
What do I do? I work as an ABYSS engineer.
What do I like? I like fast things, especially cars. One of my favorite hobbies is Mini 4WD, which is why my internal Slack icon is a Mini 4WD car I built myself. 3

Do you know? image: aflo 4

Search is NOT ONLY a search box. image: aflo 5

In Yahoo! JAPAN 6

Implicit searches are running in the form of recommendations. 7

Question: Which services use search technology? 8

Search websites 9

Search weather 10

Search mails 11

Search items 12

Search transit information 13

Almost all services use search technology in Yahoo! JAPAN. 14

Search is a necessary technology. image: aflo 15

ABYSS is a search platform to support these services. image:
aflo 16

What is ABYSS? 17

“abyss” Abyss comes from Greek: a- "without" + byssos, "depth,
bottom." You may know the related adjective abysmal, which means "appallingly bad" — or "way down in the depths," as it were. from vocabulary.com image: aflo 18

Is ABYSS named after hell? image: aflo 19

/P image: aflo 20

ABYSS is Automated Building Yahoo! Search Sdk. image: aflo 21

History of ABYSS
• 2002: In-house search application released
• 2010: ABYSS 1.0 released using the in-house search application
• 2014: ABYSS 2.0 released using Apache Solr
22

Why was ABYSS born? 23

Before the birth of ABYSS
• The in-house search application was packaged and published.
• Service personnel installed and maintained the application on real servers.
• Knowledge was accumulated separately within each service.
• However, there was no guarantee that this knowledge would work well in another service, because the application execution environments differed from one another.
24

ABYSS was born as a common search platform. Roles are clarified.
• Knowledge is collected in one place.
• This knowledge works well in other services because the application execution environments are the same.
• Platform personnel focus on reducing operating costs and incorporating modern technologies.
• Service personnel focus on improving their service.
ABYSS provides a foundation for carrying successful cases over from one service to another.
25

Current ABYSS 26

ABYSS is based on Apache Solr. 27

What is Solr?
• An open source full-text search application.
• Written in Java.
• Solr's functions can be extended by implementing interfaces.
28

ABYSS provides managed environments as a common search platform. image:
aflo 29

100+ services use ABYSS. 30

10,000+ virtual machines are running in ABYSS. image: aflo 31

Focus on reducing operating costs and incorporating modern technologies. image:
aflo 32

Reduce operating costs 33

Reduce operating costs by virtualization technology.
• VM layer: Use virtual machines (VMs) provided by the in-house cloud vendor.
• Container layer: Use container technology by building our own Solr container image.
34

Reduce operating costs by virtualization technology.
• Save fed data on the VM layer, outside the container layer.
• ABYSS operators can recreate containers without worrying about data loss.
※ We call putting data into ABYSS “feeding”.
35

Reduce operating costs by redundancy.
• A VM can lose its data due to a physical layer failure.
• In ABYSS, shards have at least 3 replicas for redundancy.
• Thanks to Solr’s replication, a shard will not lose data even if one VM loses its data.
36

Reduce operating costs by removing single points of failure (SPOF).
• It is not safe if several VMs of one shard run on the same physical layer.
• ABYSS has a checker component that checks whether shards have a SPOF on the physical layer.
• If a SPOF is found, the ABYSS operator removes it by swapping VMs.
37
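
The check described on this slide can be sketched in a few lines of Python. This is only an illustration, not the actual ABYSS checker: the `(shard, vm, physical_host)` tuple shape, the function name `find_spofs`, and all host and VM names are assumptions made for the example.

```python
from collections import defaultdict


def find_spofs(placements):
    """Return {shard: {host: [vms]}} for hosts holding 2+ replicas of a shard.

    `placements` is a hypothetical list of (shard, vm, physical_host) tuples.
    """
    by_shard_host = defaultdict(lambda: defaultdict(list))
    for shard, vm, host in placements:
        by_shard_host[shard][host].append(vm)

    spofs = {}
    for shard, hosts in by_shard_host.items():
        # A host that carries more than one replica of the same shard is a SPOF.
        shared = {host: vms for host, vms in hosts.items() if len(vms) > 1}
        if shared:
            spofs[shard] = shared
    return spofs


# Made-up placement data: shard2 has two replicas on host-1.
placements = [
    ("shard1", "vm-a", "host-1"),
    ("shard1", "vm-b", "host-2"),
    ("shard1", "vm-c", "host-3"),
    ("shard2", "vm-d", "host-1"),
    ("shard2", "vm-e", "host-1"),
    ("shard2", "vm-f", "host-4"),
]
print(find_spofs(placements))
```

In this sketch, swapping `vm-d` or `vm-e` onto another physical host would clear the finding for `shard2`.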

Diagram of ABYSS: internal API, Web UI, ZooKeeper, routing server. Service personnel set configurations and see logs; users search via service servers. 38

Diagram of ABYSS: internal API, Web UI, ZooKeeper, routing server, with users and service personnel. Arrows: set configurations. 39

Diagram of ABYSS: internal API, Web UI, ZooKeeper, routing server, cloud vendor API, repairer, and checker, with users and service personnel. Arrows: restart VM, check SPOF, check failure and recovery information. 40

Summary: The ABYSS operator can rest assured even if a server fails. Virtualization, Remove SPOF, Auto healing. image: aflo 41

Incorporate modern technologies 42

Common feature request: If we could implement the search described in this paper, we would deliver a better search experience to users. image: aflo 43

image: aflo Please wait until the function is implemented in
the application, the version that can use it is released, and our operation check is completed. 44

We can't wait for that. image: aflo 45

Development relationship (Service, Science, Platform): Around ABYSS, the platform team and the science team work together to realize service requests. 46

Development relationship (Service, Science, Platform): The science team implements Solr extensions for service requests. The platform team incorporates these extensions into ABYSS, so that the service team can use the requested functions before Solr natively supports them. 47

In case of Approximate Nearest Neighbor search (ANN) Plugin
What is ANN? ANN is a technology to "guess" the document vectors nearest to a query vector. ANN drastically reduces search latency at the cost of only a little accuracy. (Figure: Voronoi diagram) 48

In case of Approximate Nearest Neighbor search (ANN) Plugin
Background: In the fields of image search and natural language processing (NLP), vector search is attracting attention.
Timeline (2020 to 2022): ANN is natively supported in Solr 9.0.0, released in May 2022. 49

In case of Approximate Nearest Neighbor search (ANN) Plugin
Timeline (2020 to 2022):
• 2020: The science team starts to develop the ANN plugin.
• November 2020: A prototype of the ANN plugin is delivered to ABYSS.
• June 2021: The first service to use the ANN plugin consults ABYSS.
• October 2021: The service switches to using the ANN plugin.
50
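
To make the "guess" idea from the ANN slides concrete, here is a minimal Python sketch of one common approximation: partition the vectors into Voronoi-like cells around a few centroids and search only the cells closest to the query. The slides do not describe the plugin's algorithm, so this is an assumption chosen for illustration; the function names, centroids, and vectors are all made up.

```python
import math


def build_index(vectors, centroids):
    """Assign each document vector to the cell of its nearest centroid."""
    cells = {i: [] for i in range(len(centroids))}
    for doc_id, vec in vectors.items():
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(vec, centroids[i]))
        cells[nearest].append(doc_id)
    return cells


def ann_search(query, vectors, centroids, cells, k=1, n_probe=1):
    """Approximate search: only look inside the n_probe closest cells."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))[:n_probe]
    candidates = [doc_id for cell in probe for doc_id in cells[cell]]
    return sorted(candidates, key=lambda d: math.dist(query, vectors[d]))[:k]


def exact_search(query, vectors, k=1):
    """Exact search: compare the query with every document vector."""
    return sorted(vectors, key=lambda d: math.dist(query, vectors[d]))[:k]


# Made-up data for illustration only.
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = {"a": (1.0, 1.0), "b": (0.0, 2.0), "c": (9.0, 9.0), "d": (11.0, 10.0)}
cells = build_index(vectors, centroids)

query = (8.0, 8.5)
print(ann_search(query, vectors, centroids, cells))   # compares 2 of 4 vectors
print(exact_search(query, vectors))                   # compares all 4 vectors
```

With `n_probe=1` the approximate search skips half of the vectors here. A query that lies near a cell border can miss its true nearest neighbor, which is the small accuracy loss traded for lower latency.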

ABYSS incorporates many in-house extensions like this. image: aflo 51

Popular in-house extensions
• Japanese morphological analysis tokenizers
• WebMA Tokenizer
• Asagi Tokenizer
• Two Phase Ranker (TPR) Plugin
• multi-phase ranking
• dense vector search
• Approximate Nearest Neighbor search (ANN) Plugin
• Dedupe Plugin
52

Summary
• Reduce operating costs: Virtualization, Remove SPOF, Auto healing
• Incorporate modern technologies: Service, Science, Platform
image: aflo 53

Use case 1 54

Timeline of Yahoo! JAPAN App 55

The timeline where many articles appear by scrolling 56

Functional requirements 57

Functional Requirement 1: We want to recommend articles according to individual interests. 58

Functional Requirement 2: We want to recommend articles that have just been created. Batch processing every 5 or 10 minutes does not meet this requirement. 59

System requirements 60

System Requirement 1: A huge number of requests arrive.
• Normal daytime: thousands of requests / sec
• Daily peak time: thousands of requests / sec * 2
• Push notification triggered: thousands of requests / sec * n
61

System Requirement 2: Show articles immediately.
• Average: tens of milliseconds
• 99%ile: hundreds of milliseconds
62

System Requirement 3: We want to recommend articles with high accuracy by using machine learning. However, there is a trade-off between accuracy and response speed. image: aflo 63

How do we meet these requirements? 64

Run machine learning outside of ABYSS. A simple vector search runs in ABYSS.
• Article: generate an article vector, then feed articles with their article vectors.
• User: generate and update the user vector, then search articles with the user vector.
More details: https://www.slideshare.net/techblogyahoo/yjtc-yjtc21-a1-241223218
image: aflo 65
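
The division of labor on this slide (vectors are produced outside the search platform, and the search side only compares them) can be sketched as follows. This is an illustration under assumptions: the slides do not say which similarity measure is used, so the inner product here is a stand-in, and the function names and vectors are made up.

```python
import heapq


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def recommend(user_vector, article_vectors, k=2):
    """Return the ids of the k articles whose vectors best match the user vector."""
    return heapq.nlargest(
        k, article_vectors, key=lambda a: dot(user_vector, article_vectors[a])
    )


# Made-up vectors; in the slides these come from machine learning run outside ABYSS.
article_vectors = {
    "sports": (1.0, 0.0, 0.0),
    "politics": (0.0, 1.0, 0.0),
    "tech": (0.5, 0.5, 0.0),
}
user_vector = (0.9, 0.1, 0.0)

print(recommend(user_vector, article_vectors))
```

Because the search step is only a vector comparison, a newly fed article becomes a candidate as soon as its vector is indexed, without waiting for a batch job.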

What does ABYSS team provide? 66

We provide many clusters with many replicas.
There are too many requests for one cluster to handle. To distribute them, we use many clusters holding the same data. We build the clusters with middle spec VMs because only a simple vector search runs on them.
VM specs: High-end spec VM, High spec VM, Middle spec VM (← We provide!), Low spec VM 67

Use case 2 68

Yahoo! JAPAN Knowledge Search 69

ABYSS is used to search questions and answers. 70

Functional requirements 71

Functional Requirement: Questions that have good answers are viewed many times. We want to lead users to such questions because they visit Yahoo! JAPAN Knowledge Search to solve their problems. 72

Functional Requirement: We want to score how good each question is. We use machine learning for that. It is tough to score them using natural language processing. image: aflo 73

System requirements 74

System Requirement 1: Huge amounts of questions and answers
Questions and answers have been collected for over 18 years, since 2004.
• 250,000,000+ questions
• 600,000,000+ answers
40,000+ questions are added every day.
75

System Requirement 2: Response performance
• Throughput: thousands of requests / sec (normal daytime)
• Latency: tens of milliseconds (average), hundreds of milliseconds (99%ile)
76

How do we meet these requirements? 77

Use the Two Phase Ranker (TPR) plugin.
• Generate a search result in 2 phases.
• 1st Phase
  • Score questions using simple morphological analysis.
  • Pass the top N (determined by a tuning parameter) in each shard to the 2nd Phase.
• 2nd Phase
  • Score N * shards questions using machine learning.
  • Rerank questions by these scores and return the Top 10.
This is faster than using machine learning to score all questions that hit the search query.
78
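
The two phases above can be sketched in Python as follows. This is not the TPR plugin's code: `two_phase_rank`, the document fields `text_match` and `quality`, and the sample data are assumptions standing in for the morphological-analysis score and the machine learning score.

```python
import heapq


def two_phase_rank(shards, cheap_score, ml_score, n, top=10):
    """Rank documents in two phases.

    Phase 1 keeps the top-n documents per shard using a cheap score.
    Phase 2 applies the expensive score to at most n * len(shards) documents.
    """
    candidates = []
    for docs in shards:
        candidates.extend(heapq.nlargest(n, docs, key=cheap_score))
    return heapq.nlargest(top, candidates, key=ml_score)


# Made-up documents: "text_match" stands in for the 1st-phase score,
# "quality" for the 2nd-phase machine learning score.
shards = [
    [
        {"id": "q1", "text_match": 0.9, "quality": 0.2},
        {"id": "q2", "text_match": 0.8, "quality": 0.9},
        {"id": "q3", "text_match": 0.1, "quality": 1.0},
    ],
    [
        {"id": "q4", "text_match": 0.7, "quality": 0.5},
        {"id": "q5", "text_match": 0.6, "quality": 0.95},
        {"id": "q6", "text_match": 0.2, "quality": 0.99},
    ],
]

result = two_phase_rank(
    shards,
    cheap_score=lambda d: d["text_match"],
    ml_score=lambda d: d["quality"],
    n=2,
    top=3,
)
print([d["id"] for d in result])
```

In this sketch `q3` and `q6` never reach the second phase despite their high `quality`, because their first-phase score is low. That is the trade-off the tuning parameter N controls: a larger N lets more documents reach the expensive scorer at the price of more machine learning work per query.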

What does ABYSS team provide? 79

We provide large-scale clusters built with high-end spec VMs.
We use high-end spec VMs that have large memory and high-performance vCPUs. Because Solr is a full-text search application, keeping all texts in memory greatly contributes to its performance. Because we use machine learning, we need high-performance vCPUs.
VM specs: High-end spec VM (← We provide!), High spec VM, Middle spec VM, Low spec VM 80

Summary: ABYSS provides Solr clusters that meet each service’s requirements and maintains them. 81

Conclusion 82

We continue to reduce operating costs and incorporate modern technologies.
83

We introduced 2 use cases: the timeline of the Yahoo! JAPAN App and Yahoo! JAPAN Knowledge Search. 84

We hope this presentation helps anyone who wants to improve
their search. 85

Thank you 86

