Semantic Search Engine: Semantic Search and Query Parsing with Phrases and Entities

@KorayGubur Semantic Search Engine & Query Parsing in the Light of Semantic Search Principles

@KorayGubur About Me
Koray Tuğberk GÜBÜR, Owner and Founder of Holistic SEO & Digital
• Educates his team
• Publishes SEO Case Studies, Research & Guides
• Twitter: @KorayGubur
• Email: [email protected]
• Official Site: https://www.holisticseo.digital

@KorayGubur SEO Case Studies of KTG

@KorayGubur SEO Guides of KTG

@KorayGubur Webinars and Interviews of KTG

@KorayGubur What is Query Parsing?
• Query Parsing is the process of understanding the different sections of a query.
• Types: Entity-seeking Query, Substitute Term, or Synonym Term.
• Canonical and Represented Versions: A Canonical Query can represent close variations.
• Query Character: Affects the SERP Design and the Dominant and Minor Search Intent Assignments.
• Query Processing: Another name for Query Parsing.

@KorayGubur Multi-Stage Query Processing
• The first patent that talks about «Context of Words».
• It tries to delete the stop words.
• It stems the concrete words.
• It expands words with synonyms and co-occurrences.
• Some criteria: Absent Queries, Boolean Logic, Query Term Weights, Document Popularity, Word Proximity (Distance), Word Adjacency.
• It uses «VIPS» (Vision-based Page Segmentation) and Web Page Layout.
Inventors: Jeffrey Adgate Dean, Paul G. Haahr, Olcan Sercinoglu, and Amitabh K. Singhal. US Patent Application 20060036593. Filed: August 13, 2004. Published: February 16, 2006.
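The first three stages listed above (stop-word removal, stemming, synonym expansion) can be sketched as a small pipeline. This is a minimal illustration, not the patent's method: the stop-word list, suffix rules, and synonym table are toy assumptions, and `stem` and `process_query` are hypothetical names.

```python
# Toy resources; all three are assumptions made up for this illustration.
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "to"}
SUFFIXES = ("ing", "es", "s")
SYNONYMS = {"car": ["auto", "automobile"], "repair": ["fix"]}


def stem(word: str) -> str:
    """Strip one known suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process_query(query: str) -> list:
    """Return one list of alternatives (stem plus synonyms) per kept term."""
    stages = []
    for token in query.lower().split():
        if token in STOP_WORDS:
            continue  # stage 1: drop stop words
        root = stem(token)  # stage 2: stem the remaining word
        stages.append([root] + SYNONYMS.get(root, []))  # stage 3: expand
    return stages


print(process_query("repairing the cars"))
```

Each inner list holds the alternatives a retrieval step could match for one query term. Term weights, proximity, and adjacency from the slide are left out of this sketch.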

@KorayGubur Query Breadth
• This is for «adjacent words» and «unknown entities».
• It uses the related document count to measure the ‘query breadth’.
• Query Breadth can be decreased with the ‘adjacent word’ count.
• Query Breadth can be used for ‘Named Entity Recognition’ or Triple Creation (a subject, a predicate, and an object).
Invented by Karl Pfleger and Brian Larson. Assigned to Google. US Patent 7,925,657. Granted: April 12, 2011. Filed: March 17, 2004.

@KorayGubur Query Analysis
• Selection Over Time: A document can be chosen more frequently in some timespans than in others.
• Documents with Hot Topics: Rising queries can boost documents that include these queries.
• Documents with Related Hot Topics: Queries related to rising queries can boost the documents that include them.
• Constant Queries with Consistently Changing Results: A Constant Query is an always-popular query whose topic has changing information.
• Freshness of Documents: The date of the information on the web page, not the date of the document’s last version.
Invented by Karl Pfleger and Brian Larson. Assigned to Google. US Patent 7,925,657. Granted: April 12, 2011. Filed: March 17, 2004.

@KorayGubur Query Analysis
• Staleness of Documents: The amount of historical data can be a positive ranking signal for a page for a query.
• Overly Broad Pages: Pages that include discordant queries, a signal for spam.
• A continuation patent was filed in 2011 for «document locator», and some terms changed.
Inventors: Jeffrey Dean (Palo Alto, CA); Paul Haahr (San Francisco, CA); Monika Henzinger (Corseaux, CH); Steve Lawrence (Mountain View, CA); Karl Pfleger (Mountain View, CA); Olcan Sercinoglu (Mountain View, CA); Simon Tong (Mountain View, CA). Assignee: Google Inc., Mountain View, CA. Family ID: 34381362. Appl. No.: 13/244853. Filed: September 26, 2011.

@KorayGubur Query Analysis
• Trends Related to Topics and Search Terms: Topics and subtopics are grouped for trending queries.
• Access Times to Determine Freshness and Staleness: Compares the first access and last access times for certain documents.
• Frequency of Selection: Compares the selection count between an earlier and a later period.
• When Staleness Might be Preferred: Even if there are fresh news items or documents, the user can choose the stale document. Such documents are not demoted for their stale information.
• Spam Determination Based Upon Breadth of Rankings, and Authority: If the document is popular or authoritative (link-based), or the source is relevant enough, it will be an exception.
Inventors: Jeffrey Dean (Palo Alto, CA); Paul Haahr (San Francisco, CA); Monika Henzinger (Corseaux, CH); Steve Lawrence (Mountain View, CA); Karl Pfleger (Mountain View, CA); Olcan Sercinoglu (Mountain View, CA); Simon Tong (Mountain View, CA). Assignee: Google Inc., Mountain View, CA. Family ID: 34381362. Appl. No.: 13/244853. Filed: September 26, 2011.

@KorayGubur Query Analysis
• Continuation of the Historical Data Patent.
• Speaks about topics and query categorization based on topics.
• It is important because Google launched its Knowledge Graph the following year, in 2012, with 500 million entities and 3.5 billion facts.
Inventors: Jeffrey Dean (Palo Alto, CA); Paul Haahr (San Francisco, CA); Monika Henzinger (Corseaux, CH); Steve Lawrence (Mountain View, CA); Karl Pfleger (Mountain View, CA); Olcan Sercinoglu (Mountain View, CA); Simon Tong (Mountain View, CA). Assignee: Google Inc., Mountain View, CA. Family ID: 34381362. Appl. No.: 13/244853. Filed: September 26, 2011.

@KorayGubur Midpage Query Refinements
• In 2006, Google published the «Midpage Query Refinements», a.k.a. today's Search Suggestions.
• The GUI test ran between 2004 and 2006.
• The patent was filed in 2003.
• Includes Semantic Query Clusters for different contexts.
• Components: a Matcher, a Clusterer, a Scorer, and a Presenter.
Inventors: Paul Haahr (San Francisco, CA); Steven Baker (Mountain View, CA). Correspondence Address: Patrick J S Inouye P S, 810 3rd Avenue, Suite 258, Seattle, WA 98104, US. Family ID: 34228721. Appl. No.: 10/668721. Filed: September 22, 2003.

@KorayGubur Midpage Query Refinements
• The Precomputation Engine has four parts.
• Associator: Query and Document Association.
• Selector: Document and Query Section Selector.
• Regenerator: Checks the query logs to refresh the selections.
• Inverter: Checks the cached data for presenting.
Inventors: Paul Haahr (San Francisco, CA); Steven Baker (Mountain View, CA). Family ID: 34228721. Appl. No.: 10/668721. Filed: September 22, 2003.

@KorayGubur Midpage Query Refinements
• Query Ambiguity: If the query is ambiguous, the Search Engine can use the query clusters.
• Homonyms, General Terms, Improper Context, and Narrow Terms can create a stateless SERP Instance.
• To prevent this, Semantic Grouping, Centroids, and Centroid distance are used.
• A Query Cluster and a Document Cluster can be paired. If the Document Cluster is larger or more relevant, the Query Cluster will be used as a query suggestion.
Inventors: Paul Haahr (San Francisco, CA); Steven Baker (Mountain View, CA). Family ID: 34228721. Appl. No.: 10/668721. Filed: September 22, 2003.

@KorayGubur Midpage Query Refinements
• Matcher: Stored query variations are put into a cluster, and document phrase variations are matched.
• Clusterer: The matched query variations and documents are clustered together. These clusters differ from query clusters.
• Scorer: Determines the centroid of the cluster. If the term vectors are distant from the centroid, another cluster will be chosen by the Clusterer for the Scorer.
• Presenter: Created clusters and centroids are presented to the user. According to the preferred choices, the Presenter will use sub-centroids.
Inventors: Paul Haahr (San Francisco, CA); Steven Baker (Mountain View, CA). Family ID: 34228721. Appl. No.: 10/668721. Filed: September 22, 2003.

@KorayGubur Midpage Query Refinements
• In 2017, the patent was refreshed.
• The Scorer method was changed.
• Representative Queries are chosen based on centroids.
• For every cluster, a representative query is chosen.
• The clusters are ordered according to cluster size and relevance scores.
• Sub-queries are used as the refinement queries.
Inventors: Paul Haahr and Steven D. Baker. Assignee: Google Inc. US Patent 9,552,388. Granted: January 24, 2017. Filed: January 31, 2014.
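Choosing a representative query from a centroid can be sketched as follows. This is an illustration under stated assumptions, not the patented Scorer: the term vectors are made-up weights over four hypothetical dimensions, cosine similarity is one possible closeness measure, and `centroid`, `cosine`, and `representative_query` are hypothetical names.

```python
import math


def centroid(vectors):
    """Component-wise mean of equally long vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def representative_query(cluster):
    """Pick the query whose vector lies closest to the cluster centroid."""
    center = centroid(list(cluster.values()))
    return max(cluster, key=lambda query: cosine(cluster[query], center))


# Hypothetical term weights over the dimensions (jaguar, car, price, animal).
cluster = {
    "jaguar car": [1.0, 1.0, 0.0, 0.0],
    "jaguar car price": [1.0, 1.0, 1.0, 0.0],
    "jaguar price": [1.0, 0.0, 1.0, 0.0],
}
print(representative_query(cluster))
```

Here the query that shares the most terms with the rest of the cluster ends up nearest the centroid, so it is chosen to represent the cluster.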

@KorayGubur Midpage Query Refinements
• The inventors of the Midpage Query Refinement methodology are Paul Haahr and Steven D. Baker.
• Steven Baker wrote the Google blog post about Google’s synonym update before the RankBrain announcement.
• Helping Computers Understand Language: https://googleblog.blogspot.com/2010/01/helping-computers-understand-language.html
• Paul Haahr gave the How Google Works presentation at SMX West, which includes lots of useful insights.
Inventors: Paul Haahr and Steven D. Baker. Assignee: Google Inc. US Patent 9,552,388. Granted: January 24, 2017. Filed: January 31, 2014.

@KorayGubur Context-Vectors
• Midpage Query Refinements and Query-Document Logical Pairs with Centroids and Clusters are the beginning of RankBrain.
• Context-Vectors were the second step for completing the journey.
• Word Vectors and Context Vectors are different from each other.
• Word Vectors are combinations of words.
• Context Vectors are lists of word combinations for a Contextual Domain.
• A Term Vector is a word combination from a Contextual Domain.
Inventor: David C. Taylor. Application Date: 09/04/2012. Grant Number: 09449105. Grant Date: 09/20/2016.

@KorayGubur Context-Vectors
• Context-Vectors are close to the ‘Lexicon’ of Google's first research paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine.
• Context-Vectors are a version of the Lexicon with different Contextual Domains.
• Context-Vectors are located in Domain List Terms.
• Domain List Terms can include 800,000 words and word combinations.
• Domain List Terms can include a macro-context and a sub-context with sub-portions.
Inventor: David C. Taylor. Application Date: 09/04/2012. Grant Number: 09449105. Grant Date: 09/20/2016.

@KorayGubur Context-Vectors
• Context-Vectors use ‘Topical Entries’.
• A Topical Entry can be used for a macro-context.
• These topical entries can be used for question generation.
• Generated questions can be used for differentiating the sub-contexts from each other.
• A Macro-context can have a Dominant Knowledge Domain. A Context-Vector can be used for intersectional areas.
Inventor: David C. Taylor. Application Date: 09/04/2012. Grant Number: 09449105. Grant Date: 09/20/2016.

@KorayGubur Categorical Quality
• This is a ‘Re-ranking’ Algorithm Patent.
• There is a strong difference between Re-ranking and Initial Ranking.
• Re-ranking Algorithms modify the Query Results.
• One inventor is Trystan Upstill, author of the Evidence-based Ranking research.
• Categorical Quality doesn’t focus on relevance or authoritativeness; it focuses on understanding the category of the query.
Inventors: Trystan G. Upstill, Abhishek Das, Jeongwoo Ko, Neesha Subramaniam, and Vishnu P. Natchu. US Patent Application: 20190155948. Published: May 23, 2019. Filed: March 31, 2015.

@KorayGubur Categorical Quality
• This patent mentions ‘social media shares’ and community size.
• If the query satisfies the ‘categorical query’ conditions, the search results will be evaluated for related and close queries too.
• If a resource also satisfies the related categorical queries, a categorical quality score will be assigned to the source.
• The Categorical Quality methodology collects Navigational Queries for different sources.
• If the source has more navigational queries, it means that it is popular for the category.
• Categorical Quality mentions a «Topicality Score».
Inventors: Trystan G. Upstill, Abhishek Das, Jeongwoo Ko, Neesha Subramaniam, and Vishnu P. Natchu. US Patent Application: 20190155948. Published: May 23, 2019. Filed: March 31, 2015.

@KorayGubur Categorical Quality
• If a source includes all query terms for a topic, it will have a higher Categorical Quality and Topicality Score.
• This method also mentions ‘Click Selection’.
• To understand the model’s success, they do not take every click or CTR into account.
• They take CTR and clicks into account only if they meet certain criteria such as time, frequency, or personal interest.
Inventors: Trystan G. Upstill, Abhishek Das, Jeongwoo Ko, Neesha Subramaniam, and Vishnu P. Natchu. US Patent Application: 20190155948. Published: May 23, 2019. Filed: March 31, 2015.

@KorayGubur Substitute Query
• A Substitute Query is a query that can replace another query. These queries are used for bolding some sections of the content.
• Substitute Queries make ‘context’ more important, because synonyms can change the context. For example, car and auto can be the same thing for ‘repair’, but they are not the same for ‘railroad’.
• There is a railroad car, but no railroad auto.
• Thus, Substitute Queries are not synonyms. They are words that can be replaced without changing the context.
Invented by Daisuke Ikeda and Ke Yang. Assigned to Google. US Patent 8,504,562. Granted: August 6, 2013. Filed: April 3, 2012.

@KorayGubur Substitute Query
• A Co-occurrence Matrix and Phrase-based Indexing are used to support Substitute Queries.
• The method uses Space Vectors to compare the word vectors to each other.
• If the queries are similar to each other with enough co-occurring words, it means that they can be substitutes for each other.
Invented by Daisuke Ikeda and Ke Yang. Assigned to Google. US Patent 8,504,562. Granted: August 6, 2013. Filed: April 3, 2012.
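Comparing terms by their co-occurring words can be sketched with a tiny query log. This is a simplified illustration, not the patent's method: the log, the 0.8 threshold, and the names `cooccurrence_vector`, `cosine`, and `can_substitute` are all assumptions, and the comparison is global rather than per context.

```python
import math
from collections import Counter

# Toy query log; made up for illustration only.
QUERIES = [
    "car repair shop", "auto repair shop", "car repair cost",
    "auto repair cost", "railroad car museum",
    "car insurance quote", "auto insurance quote",
]


def cooccurrence_vector(term, queries):
    """Count the words that appear in the same query as `term`."""
    counts = Counter()
    for query in queries:
        tokens = query.split()
        if term in tokens:
            counts.update(t for t in tokens if t != term)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def can_substitute(t1, t2, queries, threshold=0.8):
    """True when the two terms share enough co-occurring words."""
    return cosine(cooccurrence_vector(t1, queries), cooccurrence_vector(t2, queries)) >= threshold


print(can_substitute("car", "auto", QUERIES))
print(can_substitute("car", "railroad", QUERIES))
```

In this toy log, ‘car’ and ‘auto’ appear with nearly the same neighbours (repair, shop, cost, insurance, quote), while ‘railroad’ shares almost none of them.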

@KorayGubur Synthetic Query
• A Synthetic Query is the version of the user's query re-written by the search engine.
• A search engine can re-write a query by augmenting it to diversify the SERP Features and increase the possibility of a satisfying search activity.
• Some score types that Synthetic Queries include are ‘Edit Distance Score’, ‘Similarity Score’, and ‘Transformation Cost Score’.
• Synthetic Queries can be collected from web documents, Structured Data, and Similarity Between Documents.
Inventors: Anand Shukla, Mark Pearson, Krishna Bharat, and Stefan Buettcher. Assignee: Google LLC. US Patent: 9,916,366. Granted: March 13, 2018. Filed: July 28, 2015.
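As one example of these scores, an edit distance between two queries can be computed with the standard Levenshtein algorithm. The slide does not say how the patent defines its ‘Edit Distance Score’, so this is only a sketch of the general technique; `edit_distance` and `edit_similarity` are hypothetical names, and the normalization is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Distance scaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))
```

A rewritten query with a small edit distance stays close to what the user typed, which is one way to limit how far a synthetic query drifts from the original.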

@KorayGubur Synthetic Query and Query Templates
• Query Templates are intermediary forms between Seed Queries and Synthetic Queries.
• Synthetic Queries help a Search Engine create pre-defined and pre-ordered SERP Instances.
• Synthetic Queries can be generated from HTML Tags, IDF Scores, and Close Phrases.
• If a document has «Dorothy Parker Biography» as H1 and «Sylvia Plath» as H2, the Search Engine can use «Sylvia Plath Biography» as a synthetic query.
• If the results are good enough for relevance and quality, the Synthetic Query will become a Seed Query.
Invented by Steven D. Baker, Michael Flaster, Nitin Gupta, Paul Haahr, Srinivasan Venkatachary, and Yonghui Wu. Assigned to Google. US Patent 8,346,792. Granted: January 1, 2013. Filed: November 9, 2010.
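The Dorothy Parker example can be sketched as a two-step process: turn a seed query into a template, then fill the template with other entities found in the document. This is a minimal sketch, not the patent's procedure: `make_template` and `synthesize` are hypothetical names, and the entity is assumed to be known in advance.

```python
def make_template(seed_query: str, entity: str) -> str:
    """Replace the known entity in a seed query with a placeholder slot."""
    if entity not in seed_query:
        raise ValueError("entity not found in seed query")
    return seed_query.replace(entity, "{entity}", 1)


def synthesize(template: str, candidates: list) -> list:
    """Fill the template with each candidate entity."""
    return [template.format(entity=candidate) for candidate in candidates]


# H1 of the document is the seed query; H2 supplies a candidate entity.
template = make_template("Dorothy Parker Biography", "Dorothy Parker")
print(template)
print(synthesize(template, ["Sylvia Plath"]))
```

The intermediate `{entity} Biography` string plays the role of the Query Template that sits between the seed query and the synthetic query.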

@KorayGubur Synthetic Query and Query Templates
• Synthetic Queries can be generated from the same author, same journal, source, or time period.
• Synthetic Queries and Open Information Extraction are closely related to each other.
• Before entering the world of entities, understanding the world of phrases is important.
• Open Information Extraction, Unknown Phrases, and Entities are connected to each other.
Invented by Steven D. Baker, Michael Flaster, Nitin Gupta, Paul Haahr, Srinivasan Venkatachary, and Yonghui Wu. Assigned to Google. US Patent 8,346,792. Granted: January 1, 2013. Filed: November 9, 2010.

@KorayGubur Open Information Extraction
• Google bought Wavii for $30,000,000 in 2013.
• Open Information Extraction is about ‘fact extraction’ around nouns.
• It is for connecting different nouns to each other based on relations.
• A classifier assigns a confidence score to a relation between two nouns.
• This is a text-to-data example.
• Wavii was originally a news aggregator based on topics, not phrases.
Invented by Michael J. Cafarella, Michele Banko, and Oren Etzioni. Assigned to: University of Washington through its Center for Commercialization. US Patent 7,877,343. Granted: January 25, 2011.

@KorayGubur Open Information Extraction
• The relational tuples include at least two nouns connected to each other by at least one verb and adverb, such as ‘created by’, ‘author of’, ‘is from’, ‘located there’.
• ‘... Moreover, the number and complexity of entity types on the Web means that existing NER systems are inapplicable...’
• Open IE is for Unknown Entities and for recognizing Minor Entities without a registration in the Knowledge Base.
Invented by Michael J. Cafarella, Michele Banko, and Oren Etzioni. Assigned to: University of Washington through its Center for Commercialization. US Patent 7,877,343. Granted: January 25, 2011.
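The shape of a relational tuple can be illustrated with a toy pattern matcher. Real Open IE does not depend on a fixed relation list and scores each tuple with a trained classifier; here the relation phrases are simply taken from the slide, no confidence score is computed, and `RelationTuple` and `extract_tuple` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Relation phrases from the slide's examples; the fixed list is an
# assumption of this toy, not how Open IE actually finds relations.
RELATION_PHRASES = ("created by", "author of", "is from")


class RelationTuple(NamedTuple):
    left: str
    relation: str
    right: str


def extract_tuple(sentence: str) -> Optional[RelationTuple]:
    """Return the first (noun phrase, relation, noun phrase) match, if any."""
    text = sentence.strip().rstrip(".")
    for phrase in RELATION_PHRASES:
        left, separator, right = text.partition(f" {phrase} ")
        if separator and left and right:
            # Drop trailing helper words such as "was" or "is the".
            left = re.sub(r"(\s+(?:is|was|the))+$", "", left)
            return RelationTuple(left.strip(), phrase, right.strip())
    return None


print(extract_tuple("Python was created by Guido van Rossum."))
```

The result connects two nouns through a relation phrase, which is the text-to-data step the slide describes.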

@KorayGubur Answer-seeking Query
• Answer-seeking Queries have specific elements within the questions and answers.
• Google’s purpose is to extract question and answer formats for answer-seeking queries.
• Answer-seeking queries require concise answers without any skepticism.
• The Answer-seeking Query is an important bridge between Natural Language Queries and Intent.
Inventors: Yi Liu, Preyas Popat, Nitin Gupta, and Afroz Mohiuddin. Assignee: Google LLC. US Patent: 10,592,540. Granted: March 17, 2020. Filed: June 28, 2016.

@KorayGubur Answer-seeking Query
• Question Elements are Entity Instance, Entity Class, Part of Speech Class, Root Word, N-Gram, and Question Triggering Words.
• Answer Elements are Measurement, N-Gram, Verb, Preposition, Entity Instance, N-gram near entity, verb near entity, preposition near entity, verb class, and skip grams.
• Answer-seeking Queries trigger the Answer Scoring Engine.
Inventors: Yi Liu, Preyas Popat, Nitin Gupta, and Afroz Mohiuddin. Assignee: Google LLC. US Patent: 10,592,540. Granted: March 17, 2020. Filed: June 28, 2016.

@KorayGubur Natural Language Queries
• Natural Language Queries are queries in everyday language.
• They do not follow proper grammar rules or form complete sentences.
• They do not explicitly tell their intent.
• That’s why these queries are also called Intent Queries, or Queries with a specific minor intent.
• For such a query, a Search Engine should return an answer without lots of details or structure.
International Application No.: WO/2014/197227. Published: 11.12.2014. International Filing Date: 23.05.2014. Applicant: Google. Inventors: Tomer Shmiel, Dvir Keysar, and Yonatan Erez.

@KorayGubur Natural Language Queries
• Natural Language Queries are not factual queries; this is the main difference from Answer-seeking queries.
• Natural Language Queries are related to Intent Template Generation.
• A Natural Language Query can have multiple intents with non-factual information, such as ‘How do I make hummus?’.
• There might be different methods to make hummus, and there are different types of hummus. Also, the query includes ‘I’, so no one can know how you make hummus.
• The answer-seeking version of this query is ‘How to make hummus’.
• One important methodology point here is that Google creates ‘heading-text’ pairs to understand the topics of the sub-sections of an article.
International Application No.: WO/2014/197227. Published: 11.12.2014. International Filing Date: 23.05.2014. Applicant: Google. Inventors: Tomer Shmiel, Dvir Keysar, and Yonatan Erez.

@KorayGubur Natural Language Queries
• Variable and Non-Variable Portions are important concepts for intent templates.
• The non-variable portion of the intent for the previous query is ‘hummus’.
• The variable portion can be a ‘place, method, tool, or style’. And ‘I’ can vary: a child, a woman, a man, an adult, or a blind person.
• For Natural Language Queries, Intent Templates can be applied to different Query Patterns such as ‘X Causes’ and ‘X Reasons’.
• If someone searches for only X, the intent templates will be used to assign the natural language results to the query.
International Application No.: WO/2014/197227. Published: 11.12.2014. International Filing Date: 23.05.2014. Applicant: Google. Inventors: Tomer Shmiel, Dvir Keysar, and Yonatan Erez.
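The ‘X Causes’ and ‘X Reasons’ patterns can be sketched as templates with one placeholder slot and one fixed intent word. This is an illustration only: the template list, the topic ‘headache’, and the names `expand` and `parse` are assumptions, not taken from the patent.

```python
from typing import Optional, Tuple

# Hypothetical intent templates; {x} is the placeholder slot.
INTENT_TEMPLATES = ("{x} causes", "{x} reasons")


def expand(topic: str) -> list:
    """Turn a bare topic X into the intent queries it may stand for."""
    return [template.format(x=topic) for template in INTENT_TEMPLATES]


def parse(query: str) -> Optional[Tuple[str, str]]:
    """Split a query into (matching template, value of X), if any."""
    normalized = query.lower().strip()
    for template in INTENT_TEMPLATES:
        fixed = template.replace("{x}", "").strip()
        if normalized.endswith(" " + fixed):
            return template, normalized[: -len(fixed)].strip()
    return None


print(expand("headache"))
print(parse("Headache causes"))
```

`expand` corresponds to the last bullet above: when only X is searched, the templates suggest which fuller queries and their results could be associated with it.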

@KorayGubur Query Rewriting for Same Intent Across Languages
• Google has previously tried to unite different search intents, the data for these intents, and the phrases that represent these intents to improve the search results.
• This is called Query Expansion. Query Expansion can compare the results for a query in one language to the results for the same query in a different language.
• If the click satisfaction possibility is higher in another language for the same intent, the search engine can re-rank the results for the first language.
Invented by Stefan Riezler and Alexander L. Vasserman. Assigned to Google. US Patent Application 20080319962. Published: December 25, 2008. Filed: March 17, 2008.

@KorayGubur Seed Queries
• Seed Queries can be synthetic queries or user-generated queries. The main requirement for a seed query is that it should be satisfied by a set of documents.
• If a query is logical, popular, and satisfying for the user, it will be marked as a seed query whether it is synthetic or searcher-generated.
• Seed Queries are used to determine the representative queries for query variations, and for query and intent templates.
Inventors: Manaal Faruqui and Dipanjan Das. Applicant: Google LLC. Publication Number: 20200167379. Filed: January 18, 2019. Publication Date: May 28, 2020.

@KorayGubur End of Phrase-based Indexing and Query Processing Chaos
• Query Parsing • Seed Query • Substitute Query • Natural Language Query • Answer-seeking Query • Factual Query • Non-factual Query • Non-variable Portion in Query • Variable Portion in Query • Discordant Query • Query Re-writing • Open Information Extraction • Synthetic Query • Categorical Query • Contextual Vectors • Term Vectors • Intent Templates • Question and Answer Elements • Co-occurrence Matrix • Query Expansion • Query Term Weight • Multi-stage Query Processing • Query Breadth • Query Template • Relation Types and Noun Tuples • Macro-context • Topical Entry • Mid-page Query Refinement • Query Ambiguity • Query Cluster – Document Cluster for Logical Pair • Associator, Matcher, Scorer for Query, Document Association • Edit Distance Score, Similarity Score, Transformation Cost Score • Phrase-based Indexing • Contextual Domains • Contextual Domain Word List • Query Analysis • Representative Query • Canonical Query • Minor Intent • Space Vectors • Navigational Query as a Popularity Signal • Evidence Based Ranking • Word Proximity • Word Adjacency

@KorayGubur First Semantic Web Announcement
• The Semantic Web Road Map was published in September 1998 by Tim Berners-Lee.
• Semantic HTML, the Semantic Web, and Semantic User Patterns were the principles of Semantic Search.
• The main purpose of the Semantic Web is making the web understandable to machines so that machines can help human beings surf the web better.
• Tim Berners-Lee talked about Agents, Ontology, Structured Data, RDFa or Semantic HTML Tags, and Digital Signatures.
• ‘Such an agent coming to the clinic's Web page will know not just that the page has keywords such as "treatment, medicine, physical, therapy" (as might be encoded today) but also that Dr. Hartman works at this clinic on Mondays, Wednesdays and Fridays and that the script takes a date range in yyyy-mm-dd format and returns appointment times. And it will "know" all this without needing artificial intelligence’
‘The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.’ - Tim Berners-Lee

@KorayGubur First Semantic Search Patent
• Google’s first Semantic Search Engine patent is from 1999, one year after Tim Berners-Lee’s announcement.
• The inventor is Sergey Brin himself.
• The document doesn’t use legal language, like other early patents of Google.
• The document says that things of a similar type share the same features.
• Things on the web can be collected for a certain type of information and stored with this information.
Invented by Sergey Brin. Assigned to Google. US Patent 6,678,681. Granted: January 13, 2004. Filed: March 9, 2000.

@KorayGubur First Semantic Search Patent
• Sergey Brin encountered problems such as Named Entity Recognition, Main Entity detection, and Entity Relation Detection.
• These problems were not yet described in terms of Entities, but these books were entities with string representations.
• Even a single-letter difference resulted in big problems for Sergey Brin.
• Some books didn’t have a price or a proper title, and some of them were not even real books.
• In the first attempt, the cost was high, the process was slow, and the results were partial, but Google kept going.
Invented by Sergey Brin. Assigned to Google. US Patent 6,678,681. Granted: January 13, 2004. Filed: March 9, 2000.

@KorayGubur Knowledge Graph Launch
• ‘Things, not strings.’ is the motto of the Knowledge Graph. Everything on the web is divided into entities, entity types, and entity connections.
• Named Entity Recognition and Natural Language Processing increased in value and prominence within the algorithmic hierarchy of Google.
• The Knowledge Graph supported the Knowledge Panels.
• Fact Extraction, Question Answering, Accuracy Audit, and Entity Relations are the pillars of an Entity-oriented Search Engine.
• ‘Wouldn’t it be great understanding every word of user, instead of matching words?’, by Jack Menzel.
Inventor: John R. Provine. Assignee: Google LLC. US Patent: 10,922,326. Granted: February 16, 2021. Filed: March 14, 2013.

@KorayGubur Browsable Fact Repository
• The Browsable Fact Repository is the main and primitive version of the Google Knowledge Graph.
• There are three important problems for the Browsable Fact Repository:
1. Updating the Knowledge Graph.
2. Extracting the New Entities.
3. Auditing the Fact Accuracy.
Invented by Andrew W. Hogue and Jonathan T. Betz. Assigned to Google Inc. US Patent 7,774,328. Granted: August 10, 2010. Filed: February 17, 2006.

@KorayGubur Entity-seeking Query
• Today’s last Query type.
• Entity-seeking Queries are one of the basic pillars of Entity-oriented search.
• The method identifies whether the query seeks a singular entity or plural things of the same type.
• If it is singular, the entity-seeking query will match the term and the entity based on an attribute.
• Entity-seeking Queries include a Semantic Dependency Tree and a Relevance Threshold.
Inventors: Mugurel Ionut Andreica, Tatsiana Sakhar, Behshad Behzadi, Marcin M. Nowak-Przygodzki, and Adrian-Marius Dumitran. US Patent Application: 20190370326. Published: December 5, 2019. Filed: May 29, 2018.

@KorayGubur Entity-seeking Query @KorayGubur

@KorayGubur Structured Search Engine
• Sergey Brin said ‘Structured Form’ in 1999.
• In 2011, Andrew Hogue said Structured Search Engine.
• Andrew Hogue introduced Open-Domain Fact Extraction methodologies for extracting and clustering entities from the web.
• Andrew Hogue showed future Google Engineers some concrete examples of the direction that they want to head.
Cartoon created by Gary Larson.

@KorayGubur Semantic Search Engine
• Google can extract all attributes of an entity to understand its general features.
• According to the Source Attribute, these features can be changed, detected, or altered.
• Based on the entity types and candidate entities, Google can generate more entity types and connections between them.
• Another name for the Structured Search Engine is Semantic Search Engine.
• Semi-structured Text Understanding, Question Generation from Keywords, and Question-Answer Pairing are the main objectives of a Semantic Search Engine.

@KorayGubur Semantic Search Engine
This is a Query Parsing Example from a Google Engineer for Entity-oriented Search. Source: The Structured Search Engine by Andrew Hogue

First step is the Named Entity Recognition process for the query.
• Entity-seeking Queries are the backbone of entity-oriented search.
• Recognizing an entity from a query is not easy or cheap.
• Neural Matching, RankBrain, the Subtopics Update, BERT, MUM, LaMDA... All of them are used for recognizing the entity and its related attributes.

Second step is Entity Resolution.
• Entity Resolution and Attribute Extraction are for understanding the related attribute of the entity.
• Entity-seeking Queries usually try to find an entity’s attribute such as look, height, taste, inception, or history.
• After the entity and its attribute are taken from the query, the Question Format is taken at the next step.

Third step is Synonym Extraction.
• Synonym Extraction strengthens the confidence score.
• Another function of Synonym Extraction is that it helps in using alternate documents for the same question.
• According to the synonyms, the question format can change.

Question format is necessary to understand the query by increasing the confidence score and matching similar successful documents.
• Question format is important to determine the answer format.
• Question term order and answer term order can increase the success rate.
• The last important thing here is the ‘answer data type’, which is a date.

Fourth step is Entity Reconciliation and data accuracy audit. At the next step, Google can check the related search activity and possible search activity, and choose the best answer.
• The answer formats and answer phrases will be used for entity reconciliation.
• Entity reconciliation includes the standardization of the entity with the correct information.
• Five Rand Fishkin entity records exist in the Knowledge Graph for the same Rand Fishkin.

Entity Reconciliation
• Entity Reconciliation is another patent from Google.
• It includes checking multiple sources to complete the missing information in the Knowledge Graph.
• It also uses a similarity threshold between different sources and the Knowledge Graph.
• If the source is authoritative, it will be easier to modify the Knowledge Graph.
Inventors: Oksana Yakhnenko and Norases Vesdapunt. Assignee: Google LLC. US Patent: 10,331,706. Granted: June 25, 2019. Filed: October 4, 2017.
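Filling missing Knowledge Graph information from a source only when the two records are similar enough can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: records are plain dictionaries, similarity is the share of agreeing values among shared attributes, the 0.75 threshold is arbitrary, and the entity ‘Jane Doe’ and the names `similarity` and `reconcile` are hypothetical.

```python
def similarity(record_a: dict, record_b: dict) -> float:
    """Share of attributes present in both records whose values agree."""
    shared = set(record_a) & set(record_b)
    if not shared:
        return 0.0
    agreeing = sum(1 for key in shared if record_a[key] == record_b[key])
    return agreeing / len(shared)


def reconcile(kg_record: dict, source_record: dict, threshold: float = 0.75) -> dict:
    """Fill gaps in the knowledge-graph record when the source is similar enough."""
    if similarity(kg_record, source_record) < threshold:
        return dict(kg_record)  # too different: treat as another entity
    merged = dict(source_record)
    merged.update(kg_record)  # existing knowledge-graph values win
    return merged


kg = {"name": "Jane Doe", "occupation": "author"}
same_person = {"name": "Jane Doe", "occupation": "author", "birthplace": "Springfield"}
namesake = {"name": "Jane Doe", "occupation": "chef", "birthplace": "Shelbyville"}

print(reconcile(kg, same_person))
print(reconcile(kg, namesake))
```

Source authority, which the slide says makes modification easier, could be modelled by lowering the threshold for trusted sources. That is left out of this sketch.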

“For other people it can be a little more complicated. Like me, for example, John Mueller. If you search for me you’ll find Wikipedia pages, barbecue restaurants, bands, all kinds of people who are called John Mueller. And if, on my site, I don’t specify who I actually am, then it could happen that our systems look at my page and go: “oh this is that guy that runs that barbecue restaurant.” And suddenly I’m associated with a barbecue restaurant, which might be a move up, I don’t know. But these subtle things make it easier for us to recognize who is actually behind something. We call that reconciliation when it comes to structured data, kind of recognizing which of these entities belong together.” - John Mueller

Important Terms and Concepts for NER and Semantic Search Engine: Semantic Role Labeling, Named Entity Resolution, Named Entity Extraction, Relation Detection, Lexical Semantics, Taxonomy, Ontology, Onomastics

Entity Extraction • Entity extraction is a complementary step for Named Entity Recognition. • A recognized entity can be extracted from the text to be stored in a Knowledge Base. • Entity Extraction uses attributes to connect the entity with its meaning, prominence, and other attributes. • Take the sentence ‘The 46th President of the United States (US) decided to go to Paris on Monday, 2nd June 2002.’ • ‘46th President of the United States’ is the named entity. • The president’s decision is the attribute, and the date attached to it is also included in the entity extraction.

Entity Resolution • Entity Resolution has two phases. • The first phase is finding the correct identity of the mentioned entity. • The second phase is finding the correct profile of the mentioned entity. • For instance, Bill Clinton was a U.S. President, but also an actor in Hollywood. An American football player can also be a cook or a journalist. • To find the right entity from the entity reference, a Search Engine can use related entities and their types. • Entity Resolution helps feed the text-to-data systems of Search Engines. • If you say ‘Barry Schwartz entered the classroom and asked the students questions’, Entity Resolution will decide that it is Professor Barry, not our Barry.
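As a rough illustration of using related entities and types to pick the right profile, the sketch below scores each candidate by its overlap with the words around the mention. The candidate list, the IDs, and the related-term sets are hypothetical; a search engine would draw on a knowledge base and far richer signals.

```python
import re

# Hypothetical candidate profiles for one ambiguous name.
CANDIDATES = {
    "Barry Schwartz": [
        {
            "id": "barry_schwartz_professor",
            "type": "Professor",
            "related": {"classroom", "students", "university", "lecture"},
        },
        {
            "id": "barry_schwartz_seo",
            "type": "Journalist",
            "related": {"seo", "search", "google", "news"},
        },
    ]
}


def resolve(mention: str, context: str, candidates: dict = CANDIDATES):
    """Return the id of the candidate whose related terms overlap most
    with the context words, or None if the mention is unknown.
    Ties go to the first candidate listed."""
    profiles = candidates.get(mention, [])
    if not profiles:
        return None
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    best = max(profiles, key=lambda p: len(p["related"] & tokens))
    return best["id"]


sentence = "Barry Schwartz entered the classroom and asked the students questions"
resolved = resolve("Barry Schwartz", sentence)
```

With the classroom sentence, the overlap with "classroom" and "students" favors the professor profile.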

Relation Detection • Relation Detection is the process of understanding the relation types and labels between different entities within a text. • There are different types of relations, such as ‘isSimilarOf’, ‘locatedIn’, ‘superiorOf’, ‘closeTo’, ‘sameAs’. • Some of these relation types are familiar from Structured Data. • Some of the relation types are unique to specific entities and specific topics. • Relation Detection draws on Lexical Semantics. • Relation Detection can be used for visual-to-text algorithms too.
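A toy sketch of relation detection, assuming a few hand-written sentence patterns mapped to the relation labels named above. Real systems learn these mappings; the patterns and the `detect_relation` helper are hypothetical.

```python
import re

# Hypothetical surface patterns, each mapped to a relation label.
PATTERNS = [
    (re.compile(r"^(?P<s>.+?) is located in (?P<o>.+?)\.?$"), "locatedIn"),
    (re.compile(r"^(?P<s>.+?) is close to (?P<o>.+?)\.?$"), "closeTo"),
    (re.compile(r"^(?P<s>.+?) is the same as (?P<o>.+?)\.?$"), "sameAs"),
]


def detect_relation(sentence: str):
    """Return a (subject, relation, object) triple, or None if no
    pattern matches the sentence."""
    text = sentence.strip()
    for pattern, label in PATTERNS:
        match = pattern.match(text)
        if match:
            return (match.group("s"), label, match.group("o"))
    return None


triple = detect_relation("Paris is located in France.")
```

The output is a triple of the same shape a knowledge base stores: two entities and the labeled relation between them.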

Lexical Semantics • Every human being should know Lexical Semantics in order to think and speak in a healthy way. • Lexical Semantics covers the meaning connections between different words. • Lexical Semantics is used to understand the relational connections between named entities. • For instance, ‘Boy’ includes the meanings ‘single’, ‘teenage’, ‘male’, and ‘young’ by default. But some of these meanings are highly probable, and some are less so. • For instance, someone young, male, and teenage can also be married. • Lexical Semantics is used to understand a named entity’s resolution and its connection with other things. Lexeme: a unit that cannot be analyzed further by itself. Lexicon: a list of lexemes.

Semantic Role Labeling • Semantic Role Labeling is the process of understanding the parts of a sentence by assigning related labels. • Semantic Role Labeling draws on Lexical Semantics and Part-of-Speech tags. • Semantic Role Labeling helps Relation Detection. • There are more than 32 Semantic Roles. • For Semantic Role Labeling, the most important part is finding the theme, predicate, agent, and effect. • Semantic Role Labeling is useful for auditing the content’s accuracy and for fact extraction from the prepositions.

Taxonomy • Taxonomy (from the Greek taxis, ‘arrangement’, and nomos, ‘law’) means the arrangement of things. • It was first used for animal classification in Ancient Greece. • In the modern era, it has been used to classify all living things in biology, and later to classify chemicals and other kinds of existing things. • In the field of Search Engine Optimization, Semantic Entity Types and the Semantic Dependency Tree are important. • Creating a hierarchy between entities based on their type and size, prominence, or superiority and inferiority is important for increasing contextual relevance and specifying the relevance of the article. • Every entity type has a different attribute group, and the hierarchy can be refreshed. • If the context is the size of cities, ‘berlin’, ‘paris’, and ‘istanbul’ can have a different taxonomy in terms of big, medium, and small cities. • If the context is the countries of these cities, the taxonomy can be aligned with country names and region or continent names.
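The city example can be sketched as grouping the same entities by whichever attribute the context makes relevant. The population figures are rough, rounded values used only for illustration, and the size thresholds and the `build_taxonomy` helper are assumptions.

```python
from collections import defaultdict

# Rough, rounded city-proper populations in millions (illustrative only).
CITIES = {
    "berlin": {"population_millions": 3.6, "country": "Germany"},
    "paris": {"population_millions": 2.1, "country": "France"},
    "istanbul": {"population_millions": 15.0, "country": "Turkey"},
}


def size_class(attrs: dict) -> str:
    """Hypothetical thresholds for big, medium, and small cities."""
    population = attrs["population_millions"]
    if population >= 10:
        return "big"
    if population >= 3:
        return "medium"
    return "small"


def build_taxonomy(entities: dict, key) -> dict:
    """Group entity names under the label that key() assigns to each one."""
    groups = defaultdict(list)
    for name, attrs in entities.items():
        groups[key(attrs)].append(name)
    return {label: sorted(names) for label, names in groups.items()}


by_size = build_taxonomy(CITIES, size_class)
by_country = build_taxonomy(CITIES, lambda attrs: attrs["country"])
```

The entities stay the same in both calls; only the grouping attribute changes, which is why the taxonomy shifts with the context.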

Ontology • Ontology completes the taxonomy. • Ontos-logos: the essence of things. • It is a branch of philosophy. • Ontology is a reflex for all human beings. • An ontology can be created based on the mutual points of different entities. • Depending on the mutual attribute between entities, the taxonomy can change, and the ontology can follow it. • If three named entities are from the same region, the region name is the mutual attribute, and the entities can have other types of connections based on it.

Onomastics • Onomastics is the science of naming and of analyzing the name patterns of different languages. • Every entity type has a different naming pattern. • Name patterns are used to recognize entities, entity types, and attributes of entities. • It comes from the Greek onomastikos (from onoma, ‘name’) and refers to the names of things. • Different science names, city names, event names, situation names, or institution names can have naming patterns. • Some examples of onomastics sub-types: 1. helonyms: proper names of swamps, marshes and bogs. 2. limnonyms: proper names of lakes and ponds. 3. oceanonyms: proper names of oceans. 4. pelagonyms: proper names of seas and maritime bays. 5. potamonyms: proper names of rivers and streams. • Onomastics can be used for taxonomy and ontology creation too. Even a body of water can have multiple naming patterns based on its sub-type.
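A small sketch of how name patterns can hint at an entity type, using the five water-related sub-types listed above. The English keyword patterns and the `classify_name` helper are assumptions; real naming patterns vary by language and are far richer.

```python
import re

# Hypothetical English keyword patterns for each sub-type.
NAME_PATTERNS = [
    ("helonym", re.compile(r"\b(swamp|marsh|bog)\b", re.IGNORECASE)),
    ("limnonym", re.compile(r"\b(lake|pond)\b", re.IGNORECASE)),
    ("oceanonym", re.compile(r"\bocean\b", re.IGNORECASE)),
    ("pelagonym", re.compile(r"\b(sea|bay)\b", re.IGNORECASE)),
    ("potamonym", re.compile(r"\b(river|stream)\b", re.IGNORECASE)),
]


def classify_name(name: str):
    """Return the first sub-type whose pattern occurs in the name,
    or None if no pattern matches."""
    for label, pattern in NAME_PATTERNS:
        if pattern.search(name):
            return label
    return None


labels = [classify_name(n) for n in ("Lake Van", "Black Sea", "Danube River")]
```

A match only suggests a type; it would be one signal among several when recognizing an entity.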

Important Announcements for Structured Search Engine: BERT - SMITH, MuM, LaMDA, Conversational Search

BERT - SMITH • Uses a Masked Language Model: it masks 15% of all tokens for the prediction model. • Uses Bidirectional Language Understanding: it reads the whole sentence at once, from both directions. • It predicts the next sentence. • With SMITH, inputs longer than 512 tokens are used. • Uses a fine-tuning based representation model.
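The 15% masking step can be sketched as below. It is a simplification: it always substitutes a `[MASK]` placeholder and works on whitespace-split words, whereas BERT works on subword tokens and sometimes keeps or randomly replaces the selected token. The function name, the fixed seed, and the sample sentence are assumptions.

```python
import random


def mask_tokens(tokens, mask_rate=0.15, seed=0, mask_token="[MASK]"):
    """Hide roughly mask_rate of the tokens.

    Returns (masked_tokens, labels), where labels maps each masked
    position to the original token the model would have to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [mask_token if i in positions else tok for i, tok in enumerate(tokens)]
    labels = {i: tokens[i] for i in sorted(positions)}
    return masked, labels


sentence = (
    "semantic search engines parse every query into entities phrases "
    "and intents before they rank the most relevant documents for users"
).split()
masked, labels = mask_tokens(sentence)
```

With 20 input words, three positions are masked, and `labels` holds the words to be predicted at those positions.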

MuM • The research papers date from March 2021. • In May 2021, they announced MuM. • In June 2021, they announced that they had started to use MuM. • The whole system is about understanding ‘Related Search Activity’ in order to predict future queries.

MuM • If you search for trekking to a mountain, there are three possible contexts: Trekking, Mountain, and Specific Mountain Trekking.

LaMDA • LaMDA is for connecting one question to another in a way that is sensible to humans. • Qualities: Specificity, Factuality, Interestingness, Sensibleness. • LaMDA is a part of Conversational AI.

Conversational Search • Conversational Search is close to Conversational AI. • It connects different entities, concepts, and intents to each other. • It creates new Contextual Domains and Co-occurrence Matrices. • The Conversational Search announcement includes only past queries. • MuM and LaMDA include future queries.

Important Language Models for the Near Future in the Context of Semantic Search Engine: ReALM, KELM

ReALM: Retrieval-Augmented Language Model • Based on the Entity Dependency Tree, missing attributes and facts can be extracted. Source: https://ai.googleblog.com/2020/08/realm-integrating-retrieval-into.html

ReALM. Inventors: Kenton Chiu Tsun Lee, Kelvin Gu, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. Assignee: Google LLC. US Patent: 11,003,865. Granted: May 11, 2021. Filed: May 20, 2020. First a research paper, then a patent, and lastly an update with an official or non-official statement.

KELM: Knowledge Graph Integrated Language Model for Fact and Accuracy Checking. Source: https://ai.googleblog.com/2021/05/kelm-integrating-knowledge-graphs-with.html Data-to-Text Triple Example
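To make the data-to-text idea concrete, the sketch below turns a knowledge-graph triple into a sentence. The source post describes a trained text-generation model for this step; the hand-written templates, relation labels, and `verbalize` helper here are only a hypothetical stand-in.

```python
# Hypothetical sentence templates keyed by relation label.
TEMPLATES = {
    "locatedIn": "{subject} is located in {object}.",
    "sameAs": "{subject} is the same as {object}.",
}


def verbalize(triple) -> str:
    """Turn a (subject, relation, object) triple into a sentence.
    Unknown relations fall back to a plain subject-relation-object string."""
    subject, relation, obj = triple
    template = TEMPLATES.get(relation, "{subject} {relation} {object}.")
    return template.format(subject=subject, relation=relation, object=obj)


text = verbalize(("Paris", "locatedIn", "France"))
```

Sentences produced this way can be compared against a document's statements, which is the fact-checking use the slide points to.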

Encazip.com: a Holistic SEO Case Study based on Semantic SEO. Used Entity-oriented Search. From 150 daily clicks to 6,000 daily clicks.

An Education Brand: 11,000 queries and 30,000 monthly clicks within 25 days.

An unpublished case study: 422,000 queries and 220,000 clicks in 66 days. It is also a Technical SEO Case Study. 73,000 pages were indexed in 66 days.

15,000 new queries and a monthly traffic of 35,000 in 3 months. Used Semantic SEO.

‘Without understanding Query Processing through the eyes of the Search Engine, you can’t create a relevant and satisfying document based on minor and dominant search activity types.’ Thank You
