
Scale, Structure, and Semantics



This Semantic Technology & Business Conference (SemTechBiz) keynote argues that knowledge representation is overrated for AI systems and computation is underrated. It discusses past attempts at knowledge representation like Cyc and Freebase, and how today's data-driven approaches using large datasets have proven more effective than rule-based systems for tasks like machine translation and question answering. It advocates for semi-structured data and data-driven recommendations and queries to empower users and fill gaps in systems' knowledge. It concludes that communication is both the problem and solution, and systems should leverage users as intelligent partners rather than relying solely on perfect schemas or vocabularies.


Daniel Tunkelang

May 25, 2026


Transcript

  1. The Bad News
     1. Knowledge representation is overrated.
     2. Computation is underrated.
     3. We have a communication problem.
  2. Knowledge representation is overrated.
     Today’s knowledge repositories are:
     - incomplete
     - inconsistent
     - inscrutable
     - and not sustained by economic incentives.
     1986 estimate of effort to complete Cyc: 250,000 rules + 350 person-years
  3. The Good News
     1. Knowledge representation is overrated.
     2. Computation is underrated.
     3. We have a communication problem.
  4. The Unreasonable Effectiveness of Data
     - simple models + lots of data >> elaborate models + less data
     - machine translation: parallel corpora >> elaborate rules for syntactic and semantic patterns
     - semantic web formalism just means semantic interpretation on shorter strings between angle brackets
     Alon Halevy, Peter Norvig, and Fernando Pereira (2009)
  5. Today’s Challenge
     1. Knowledge representation is overrated.
     2. Computation is underrated.
     3. We have a communication problem.
  6. Semi-structured Data at LinkedIn
     <person>
       <id>
       <first-name />
       <last-name />
       <location>
         <name>
         <country> <code> </country>
       </location>
       <industry>
       …
     </person>
     Summary
     I lead a data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn’s members. Prior to LinkedIn, I led a local search quality team at Google and was a founding employee of faceted search pioneer Endeca (acquired by Oracle in 2010), where…
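The profile on this slide mixes structured fields with a free-text summary. As a rough illustration of how such a record might be read, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element names follow the slide; the sample values, the `<summary>` element, and the `read_person` helper are assumptions made for this example and are not taken from the deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record. Element names follow the slide; all values
# and the <summary> element are invented for illustration.
SAMPLE = """
<person>
  <id>hypothetical-id</id>
  <first-name>Ada</first-name>
  <last-name>Example</last-name>
  <location>
    <name>San Francisco Bay Area</name>
    <country><code>us</code></country>
  </location>
  <summary>I lead a data science team.</summary>
</person>
"""


def read_person(xml_text):
    """Return a flat dict of profile fields; absent fields map to None."""
    root = ET.fromstring(xml_text)

    def text(path):
        # Semi-structured records may omit any field, so tolerate gaps.
        node = root.find(path)
        if node is None or node.text is None:
            return None
        return node.text.strip()

    return {
        "id": text("id"),
        "first_name": text("first-name"),
        "last_name": text("last-name"),
        "location": text("location/name"),
        "country_code": text("location/country/code"),
        "industry": text("industry"),  # not present in SAMPLE
        "summary": text("summary"),    # free text, left unparsed
    }


person = read_person(SAMPLE)
```

Returning `None` for a missing field, rather than raising, reflects the slide's point that the data is only partly structured: downstream code has to cope with gaps.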
  7. Another Example: Helping a Friend
     Dear Daniel,
     I'm attaching the resume of an old friend who just moved up to the Bay Area. He has a very strong background in:
     - mobile / wireless applications
     - start-ups and new product launches
     - international expansion
     Best regards,
     XXX
  8. Data-Driven Computation Serves Communication
     for i in [1..n]
       s ← w1 w2 … wi
       if Pc(s) > 0
         a ← new Segment()
         a.segs ← {s}
         a.prob ← Pc(s)
         B[i] ← {a}
       for j in [1..i-1]
         for b in B[j]
           s ← wj wj+1 … wi
           if Pc(s) > 0
             a ← new Segment()
             a.segs ← b.segs ∪ {s}
             a.prob ← b.prob * Pc(s)
             B[i] ← B[i] ∪ {a}
       sort B[i] by prob
       truncate B[i] to size k
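The pseudocode on this slide reads as a beam search over segmentations of a word sequence: `B[i]` keeps the `k` most probable ways to split the first `i` words, and `Pc(s)` is the probability of treating the string `s` as a single segment. The following Python sketch is one possible reading under stated assumptions. The `Segmentation` class, the `TOY_PC` table, and its probabilities are hypothetical. The sketch also starts each new segment at word `j+1`, so that it does not overlap the segmentation of the first `j` words; that indexing is an interpretation of the slide, not something it states.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segmentation:
    segs: Tuple[str, ...]  # segments chosen so far, in order
    prob: float            # product of the segment probabilities


def segment(words: List[str], pc: Callable[[str], float], k: int = 5) -> List[Segmentation]:
    """Return up to k segmentations of `words`, most probable first."""
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Case 1: the first i words form a single segment.
        s = " ".join(words[:i])
        p = pc(s)
        if p > 0:
            beams[i].append(Segmentation((s,), p))
        # Case 2: extend a segmentation of the first j words with
        # one new segment covering words j+1..i (1-based).
        for j in range(1, i):
            s = " ".join(words[j:i])
            p = pc(s)
            if p > 0:
                for b in beams[j]:
                    beams[i].append(Segmentation(b.segs + (s,), b.prob * p))
        # Keep only the k most probable candidates (the beam).
        beams[i].sort(key=lambda a: a.prob, reverse=True)
        del beams[i][k:]
    return beams[n]


# Hypothetical segment probabilities, invented for illustration only.
TOY_PC = {
    "new": 0.1, "york": 0.05, "times": 0.1, "square": 0.05,
    "new york": 0.04, "york times": 0.002,
    "new york times": 0.03, "times square": 0.02,
}

results = segment("new york times square".split(), lambda s: TOY_PC.get(s, 0.0))
best = results[0]
```

With this toy table the top candidate is `("new york times", "square")`, followed by `("new york", "times square")`. In practice `pc` would be estimated from data such as query logs or a text corpus, which is the sense in which the computation is data-driven.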
  9. Recommendations Leverage Semi-structured Data
     Diagram labels: Corpus Stats; Job; User Base; Filtered
     Job: title, geo, company, industry, description, functional area, …
     Candidate
     - General: expertise, specialties, education, headline, geo, experience
     - Current Position: title, summary, tenure length, industry, functional area, …
     Example feature values
     - Similarity (candidate expertise, job description): 0.56
     - Similarity (candidate specialties, job description): 0.2
     - Transition probability (candidate industry, job industry): 0.43
     - Title Similarity: 0.8
     - Similarity (headline, title): 0.7
     - . . .
     Matching
     - Binary: exact matches: geo, industry, …
     - Soft: transition probabilities, similarity, …
     - Text
     Transition probabilities; Connectivity; yrs of experience to reach title; education needed for this title; …
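The slide pairs binary features (exact matches on fields such as geo and industry) with soft features (similarities between text fields). A minimal Python sketch of that idea follows. The slide does not say how similarity is computed or how features are combined, so the bag-of-words cosine similarity, the field names in the dictionaries, the sample records, and the weights are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term counts of two strings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_features(candidate: Dict[str, str], job: Dict[str, str]) -> Dict[str, float]:
    """Build binary (exact-match) and soft (similarity) features."""
    return {
        # Binary features: 1.0 on an exact match, else 0.0.
        "geo_exact": float(candidate["geo"] == job["geo"]),
        "industry_exact": float(candidate["industry"] == job["industry"]),
        # Soft features: text similarity between paired fields.
        "expertise_vs_description": cosine_similarity(
            candidate["expertise"], job["description"]),
        "headline_vs_title": cosine_similarity(
            candidate["headline"], job["title"]),
    }


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of features; a hypothetical way to combine them."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical records and weights, invented for illustration only.
candidate = {"geo": "San Francisco Bay Area", "industry": "Internet",
             "expertise": "search data science", "headline": "data scientist"}
job = {"geo": "San Francisco Bay Area", "industry": "Software",
       "title": "senior data scientist",
       "description": "data science and search quality"}
weights = {"geo_exact": 1.0, "industry_exact": 1.0,
           "expertise_vs_description": 2.0, "headline_vs_title": 2.0}

features = match_features(candidate, job)
total = score(features, weights)
```

A real system would likely learn the weights from outcome data instead of fixing them by hand, and would add the transition-probability features the slide lists; both are outside this sketch.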
  10. There is no perfect schema or vocabulary.
      - And even if there were, not everyone would use it.
      - Knowledge representation has only succeeded within narrow scope.
      - Brute force is surprisingly effective but does not leverage the user as an intelligent partner.
  11. Communication is the problem and the solution.
      - Rich communication channel fills gaps in system’s knowledge representation and in user’s knowledge.
      - Use data science to make the system smart, but be humble and empower the human user.
      You've got the brawn
      I've got the brains
      Let's make lots of money
      Pet Shop Boys, “Opportunities”
  12. One More Thing
      “More data beats clever algorithms but better data beats more data.”
      Monica Rogati @ Strata 2012