Web Search and SEO - Lecture 10 - Web Technologies (1019888BNR)

Beat Signer
November 27, 2023

This lecture forms part of the course Web Technologies given at the Vrije Universiteit Brussel.

  1. 2 December 2005
    Web Technologies
    Web Search and SEO
    Prof. Beat Signer
    Department of Computer Science
    Vrije Universiteit Brussel
    beatsigner.com

  2. Beat Signer - Department of Computer Science - [email protected] 2
    November 28, 2023
    Search Engine Result Pages (SERP)

  3. Search Engine Result Pages (SERP) …

  4. Vertical Search Result Pages

  5. Search Engine Result Page
    ▪ A variety of information is shown on a search
    engine result page (SERP)
    ▪ organic search results
    ▪ non-organic search results
    ▪ meta-information about the result (e.g. number of result pages)
    ▪ vertical navigation
    ▪ advanced search options
    ▪ query refinement suggestions
    ▪ ...

  6. Global Search Engine Market Share (2020)
    [https://alphametic.com/global-search-engine-market-share]

  7. Search Engine History
    ▪ Early "search engines" include various systems
    starting with Bush's Memex
    ▪ Archie (1990)
    ▪ first Internet search engine
    ▪ indexing of files on FTP servers
    ▪ W3Catalog (September 1993)
    ▪ first "web search engine"
    ▪ mirroring and integration of manually maintained catalogues
    ▪ JumpStation (December 1993)
    ▪ first web search engine combining crawling, indexing and
    searching

  8. Search Engine History ...
    ▪ In the following two years (1994 / 1995) many
    new search engines appeared
    ▪ AltaVista, Infoseek, Excite, Inktomi, Yahoo!, ...
    ▪ Two categories of early Web search solutions
    ▪ full-text search
    - based on an index that is automatically created by a web crawler in
    combination with an indexer
    - e.g. AltaVista or Infoseek
    ▪ manually maintained classification (hierarchy) of webpages
    - significant human editing effort
    - e.g. Yahoo! (until 2014)

  9. Information Retrieval
    ▪ Precision and recall can be used to measure the
    performance of different information retrieval algorithms
    $$\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$$
    $$\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$$
    ▪ e.g. documents D1 to D10 with the relevant documents {D3, D5, D8, D9}
    and a query that retrieves {D1, D3, D8, D9, D10}
    ▪ precision = 3/5 = 0.6
    ▪ recall = 3/4 = 0.75
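A minimal Python sketch of both measures follows. The set contents are our reading of the slide's example (relevant documents {D3, D5, D8, D9}, retrieved documents {D1, D3, D8, D9, D10}), the function names are hypothetical, and both functions assume non-empty sets.

```python
def precision(relevant: set[str], retrieved: set[str]) -> float:
    """Fraction of the retrieved documents that are relevant (retrieved must be non-empty)."""
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant: set[str], retrieved: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved (relevant must be non-empty)."""
    return len(relevant & retrieved) / len(relevant)


# Assumed example sets, reconstructed from the slide's figure.
relevant = {"D3", "D5", "D8", "D9"}
retrieved = {"D1", "D3", "D8", "D9", "D10"}

print(precision(relevant, retrieved))  # 0.6
print(recall(relevant, retrieved))     # 0.75
```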

  10. Information Retrieval ...
    ▪ A combination of precision and recall, the so-called
    F-score (harmonic mean), is often used as a single measure
    $$\text{F-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
    ▪ e.g. relevant documents {D3, D5, D8, D9}
    ▪ query retrieves {D1, D3, D8, D9, D10}: precision = 0.6, recall = 0.75, F-score = 0.67
    ▪ query retrieves {D1, D2, D3, D5, D8, D9, D10}: precision = 0.57, recall = 1, F-score = 0.73
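The F-score is a one-liner on top of precision and recall. This hypothetical helper returns 0 when both inputs are 0 (a common convention, assumed here) and reproduces the two example values.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when nothing relevant was retrieved
    return 2 * precision * recall / (precision + recall)


print(round(f_score(0.6, 0.75), 2))   # 0.67
print(round(f_score(4 / 7, 1.0), 2))  # 0.73
```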

  11. Boolean Model
    ▪ Based on set theory and boolean logic
    ▪ Exact matching of documents to a user query
    ▪ Uses the boolean AND, OR and NOT operators
    ▪ query: Shopping AND Ghent AND NOT Delhaize
    ▪ computation: 101110 AND 100111 AND 000111 = 000110
    ▪ result: document set {D4, D5}
    [Figure: inverted index as a term-document matrix with the terms Bank, Delhaize, Ghent, Metro, Shopping and Train and the documents D1 to D6]
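The boolean query can be evaluated with plain bitwise operators when each term is stored as one bit per document. This is a sketch rather than a real index: only the three term vectors used in the query are included, the Delhaize vector 111000 is inferred as the complement of 000111, and the `docs` helper is hypothetical.

```python
# One bit per document, D1 is the leftmost (most significant) bit.
N_DOCS = 6
MASK = (1 << N_DOCS) - 1  # 0b111111, keeps NOT within six bits

index = {
    "Shopping": 0b101110,
    "Ghent": 0b100111,
    "Delhaize": 0b111000,  # complement of NOT Delhaize = 000111
}


def docs(bits: int) -> list[str]:
    """Translate a result bit vector back into document identifiers."""
    return [f"D{i + 1}" for i in range(N_DOCS) if bits & (1 << (N_DOCS - 1 - i))]


# Shopping AND Ghent AND NOT Delhaize
result = index["Shopping"] & index["Ghent"] & (~index["Delhaize"] & MASK)
print(format(result, "06b"), docs(result))  # 000110 ['D4', 'D5']
```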

  12. Boolean Model ...
    ▪ Advantages
    ▪ relatively easy to implement and scalable
    ▪ fast query processing based on parallel scanning of indexes
    ▪ Disadvantages
    ▪ no ranking of output
    ▪ often the user has to learn a special syntax such as the use of
    double quotes to search for phrases
    ▪ Variants of the boolean model form the basis of many
    search engines
    ▪ inverted index
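Inverted indexes usually store a sorted postings list per term rather than a full bit vector, so AND and AND NOT become linear merges of sorted lists. The sketch assumes postings derived from the boolean model example (Shopping 101110, Ghent 100111, and Delhaize 111000 as the complement of 000111); the function names are hypothetical.

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Boolean AND: merge two sorted postings lists in linear time."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_not(p1: list[int], p2: list[int]) -> list[int]:
    """Boolean AND NOT: keep the documents of p1 that do not occur in p2."""
    j, out = 0, []
    for doc in p1:
        # Skip postings of p2 that are smaller than the current document.
        while j < len(p2) and p2[j] < doc:
            j += 1
        if j == len(p2) or p2[j] != doc:
            out.append(doc)
    return out


# Inverted index: term -> sorted list of document numbers (postings list).
index = {
    "Shopping": [1, 3, 4, 5],
    "Ghent": [1, 4, 5, 6],
    "Delhaize": [1, 2, 3],
}

result = and_not(intersect(index["Shopping"], index["Ghent"]), index["Delhaize"])
print(result)  # [4, 5]
```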

  13. Web Search Engines
    ▪ Most web search engines are based on traditional
    information retrieval techniques, but they must be
    adapted to deal with the characteristics of the Web
    ▪ immense amount of web resources (>150 billion web pages)
    ▪ hyperlinked resources
    ▪ dynamic content with frequent updates
    ▪ self-organised web resources
    ▪ Evaluation of performance
    ▪ no standard collections
    ▪ often based on user studies (satisfaction)
    ▪ Besides precision and recall, the query response time is also
    an important issue

  14. Web Search Engine Architecture
    [Figure: web search engine architecture with the components WWW, crawler, URL pool, storage manager ("content already added?"), page repository, indexers, document index, inverted index, special indexes, URL handler (filter, normalisation and duplicate elimination), URL repository, client, query handler and ranking]
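The URL handler's normalisation and duplicate elimination step could look roughly like this sketch. The normalisation rules (lowercase scheme and host, drop default ports and fragments, use "/" for an empty path) are a common minimal choice assumed here, not the rules of any particular search engine; `normalise` and `URLPool` are hypothetical names, and user info and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalise(url: str) -> str:
    """Canonical form of a URL so that trivial spelling variants collapse."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # drops any user info
    # Keep the port only if it differs from the scheme's default port.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Drop the fragment: '#section' never changes the fetched document.
    return urlunsplit((scheme, host, path, parts.query, ""))


class URLPool:
    """Crawl frontier that accepts each normalised URL only once."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.queue: list[str] = []

    def add(self, url: str) -> bool:
        canonical = normalise(url)
        if canonical in self.seen:
            return False  # duplicate: already queued or crawled
        self.seen.add(canonical)
        self.queue.append(canonical)
        return True


pool = URLPool()
print(pool.add("HTTPS://BeatSigner.com:443/#top"))  # True
print(pool.add("https://beatsigner.com/"))          # False
print(pool.queue)                                   # ['https://beatsigner.com/']
```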

  15. Web Crawler
    ▪ A web crawler or spider is used to create an
    index of webpages to be used by a web search engine
    ▪ any web search is then based on this index
    ▪ A web crawler has to deal with the following issues
    ▪ freshness
    - the index should be updated regularly (based on web page update frequency)
    ▪ quality
    - since not all web pages can be indexed, the crawler should give priority to
    "high quality" pages
    ▪ scalability
    - it should be possible to increase the crawl rate by just adding additional
    servers (modular architecture)
    - e.g. the estimated number of Google servers in 2016 was 2.5 million (including
    not only the crawler but the entire Google platform)

  16. Web Crawler ...
    ▪ distribution
    - the crawler should be able to run in a distributed manner (computer
    centres all over the world)
    ▪ robustness
    - the Web contains a lot of pages with errors and a crawler must deal with
    these problems
    - e.g. deal with a web server that creates an unlimited number of "virtual web
    pages" (crawler trap)
    ▪ efficiency
    - resources (e.g. network bandwidth) should be used in the most efficient way
    ▪ crawl rates
    - the crawler should pay attention to existing web server policies
    (e.g. revisit-after HTML meta tag or robots.txt file)
    robots.txt:
      User-agent: *
      Disallow: /cgi-bin/
      Disallow: /tmp/
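Python's standard library ships a robots.txt parser, `urllib.robotparser.RobotFileParser`, which a crawler can consult before fetching a URL. The sketch parses the rules above from a string instead of downloading them; the host `www.example.org` and the user agent `MyCrawler` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the slide, parsed from a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# www.example.org and MyCrawler are placeholder host and user-agent names.
for path in ["/index.html", "/cgi-bin/search", "/tmp/cache.html"]:
    allowed = parser.can_fetch("MyCrawler", "https://www.example.org" + path)
    print(path, allowed)
# /index.html True
# /cgi-bin/search False
# /tmp/cache.html False
```

A real crawler would call `set_url()` and `read()` once per host and cache the parser, so that the rules are not fetched again for every page.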

  17. Pre-1998 Web Search
    ▪ Find all documents for a given query term
    ▪ use information retrieval (IR) solutions
    - boolean model
    - vector space model
    - ...
    ▪ ranking based on "on-page factors"
    → problem: poor quality of search results (order)
    ▪ Larry Page and Sergey Brin proposed computing the
    absolute quality of a page, called PageRank
    ▪ based on the number and quality of pages linking
    to a page (votes)
    ▪ query-independent

  18. Origins of PageRank
    ▪ Developed as part of an
    academic project at Stanford
    University
    ▪ research platform to aid understanding of large-scale web data
    and enable researchers to easily experiment with new search
    technologies
    ▪ Larry Page and Sergey Brin worked on a project on a new
    kind of search engine (1995-1998), which eventually led to a functional
    prototype called Google
    [Photos: Larry Page and Sergey Brin]

  19. PageRank
    ▪ A page $P_i$ has a high PageRank $R_i$ if
    ▪ there are many pages linking to it
    ▪ or, if there are some pages with a high PageRank linking to it
    ▪ Total score = IR score × PageRank
    [Figure: example graph with pages P1 to P8 and their PageRank values R1 to R8]

  20. Basic PageRank Algorithm
    $$R(P_i) = \sum_{P_j \in B_i} \frac{R(P_j)}{L_j}$$
    ▪ where
    ▪ $B_i$ is the set of pages that link to page $P_i$
    ▪ $L_j$ is the number of outgoing links for page $P_j$
    [Figure: example with pages P1, P2 and P3, showing initial values of 1 for each page and the values R(P1) = 1.5, R(P2) = 1.5 and R(P3) = 0.75]
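The formula translates directly into one update step over out-link lists. The link structure below (P1 → P2, P2 → P1 and P3, P3 → P1) is our assumption about the example graph; with it, applying the formula to the values 1.5, 1.5 and 0.75 returns the same values. A page without outgoing links would cause a division by zero, which is exactly the dangling page problem.

```python
# Assumed example graph: page -> pages it links to.
OUT_LINKS = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P1"],
}


def pagerank_step(ranks: dict[str, float],
                  out_links: dict[str, list[str]]) -> dict[str, float]:
    """Apply R(P_i) = sum over P_j in B_i of R(P_j) / L_j once.

    Assumes every page has at least one outgoing link."""
    new_ranks = {page: 0.0 for page in out_links}
    for source, targets in out_links.items():
        share = ranks[source] / len(targets)  # R(P_j) / L_j
        for target in targets:
            new_ranks[target] += share  # source belongs to B_target
    return new_ranks


ranks = {"P1": 1.5, "P2": 1.5, "P3": 0.75}
print(pagerank_step(ranks, OUT_LINKS))  # {'P1': 1.5, 'P2': 1.5, 'P3': 0.75}
```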

  21. Matrix Representation
    ▪ Let us define a hyperlink matrix H
    $$H_{ij} = \begin{cases} 1/L_j & \text{if } P_j \in B_i \\ 0 & \text{otherwise} \end{cases}$$
    [Figure: example graph with pages P1, P2 and P3]
    $$H = \begin{pmatrix} 0 & 1/2 & 1 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$$
    ▪ With $R = [R(P_i)]$ we get $R = HR$
    ▪ R is an eigenvector of H
    with eigenvalue 1
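Building H from out-link lists is mechanical. This sketch uses exact fractions and an assumed example graph (P1 → P2, P2 → P1 and P3, P3 → P1, our reading of the slide's H) and checks that R = (2, 2, 1) satisfies R = HR. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed example graph: page -> pages it links to.
OUT_LINKS = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P1"],
}


def hyperlink_matrix(out_links: dict[str, list[str]]) -> list[list[Fraction]]:
    """Build H with H[i][j] = 1/L_j if page j links to page i, else 0."""
    pages = list(out_links)
    position = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    h = [[Fraction(0)] * n for _ in range(n)]
    for j, source in enumerate(pages):
        for target in out_links[source]:
            h[position[target]][j] = Fraction(1, len(out_links[source]))
    return h


def multiply(h: list[list[Fraction]], r: list[Fraction]) -> list[Fraction]:
    """Matrix-vector product H R."""
    return [sum((h_ij * r_j for h_ij, r_j in zip(row, r)), Fraction(0)) for row in h]


H = hyperlink_matrix(OUT_LINKS)
R = [Fraction(2), Fraction(2), Fraction(1)]
print(multiply(H, R) == R)  # True: R is an eigenvector of H with eigenvalue 1
```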

  22. Matrix Representation ...
    ▪ We can use the power method to find R
    ▪ sparse matrix H with 150 billion columns and rows but only an
    average of 10 non-zero entries in each column
    $$R^{t+1} = H R^{t}$$
    ▪ For our example
    $$H = \begin{pmatrix} 0 & 1/2 & 1 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$$
    this results in $R = (2, 2, 1)^T$ or, normalised, $R = (0.4, 0.4, 0.2)^T$
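A dense power iteration is enough to reproduce the small example. The matrix below is our reconstruction of the example's H; a real implementation would store only the non-zero entries of the sparse matrix. Starting from a uniform vector keeps the entries summing to 1, so the result is the normalised eigenvector.

```python
# H of the three-page example (our reconstruction; each column sums to 1).
H = [
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
]


def power_method(h: list[list[float]], tol: float = 1e-12,
                 max_iter: int = 1000) -> list[float]:
    """Iterate R^(t+1) = H R^t from a uniform start vector until it settles."""
    n = len(h)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(h[i][j] * r[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r  # no convergence within max_iter steps


R = power_method(H)
print([round(x, 4) for x in R])  # [0.4, 0.4, 0.2]
```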

  23. Dangling Pages (Rank Sink)
    ▪ Problem with pages that
    have no outgoing links (e.g. $P_2$)
    ▪ Stochastic adjustment
    ▪ if page $P_j$ has no outgoing links then replace column j with 1/n,
    where n is the total number of pages
    ▪ New stochastic matrix S always has a stationary vector R
    ▪ can also be interpreted as a Markov chain
    [Figure: page P1 linking to page P2, which has no outgoing links]
    $$H = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad R = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
    $$C = \begin{pmatrix} 0 & 1/2 \\ 0 & 1/2 \end{pmatrix} \quad\text{and}\quad S = H + C = \begin{pmatrix} 0 & 1/2 \\ 1 & 1/2 \end{pmatrix}$$
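The effect of the stochastic adjustment can be checked numerically on the two-page example (P1 links to P2, and P2 has no outgoing links). The sketch replaces every all-zero column with 1/n; the helper names and the fixed iteration count are assumptions.

```python
def stochastic_adjustment(h: list[list[float]]) -> list[list[float]]:
    """Return S = H + C: every all-zero column (dangling page) becomes 1/n."""
    n = len(h)
    s = [row[:] for row in h]  # copy so that H itself is left unchanged
    for j in range(n):
        if all(h[i][j] == 0 for i in range(n)):
            for i in range(n):
                s[i][j] = 1.0 / n
    return s


def iterate(m: list[list[float]], steps: int) -> list[float]:
    """Apply R <- M R a fixed number of times from a uniform start vector."""
    n = len(m)
    r = [1.0 / n] * n
    for _ in range(steps):
        r = [sum(m[i][j] * r[j] for j in range(n)) for i in range(n)]
    return r


# P1 links to P2; P2 has no outgoing links, so column 2 of H is all zeros.
H = [
    [0.0, 0.0],
    [1.0, 0.0],
]
S = stochastic_adjustment(H)

print(iterate(H, 100))  # [0.0, 0.0]: all rank drains away through P2
print(S)                # [[0.0, 0.5], [1.0, 0.5]]
print([round(x, 4) for x in iterate(S, 100)])  # [0.3333, 0.6667]
```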

  24. Strongly Connected Pages (Graph)
    ▪ Add new transition probabilities between all pages
    ▪ with probability d we follow
    the hyperlink structure S
    ▪ with probability 1-d we
    choose a random page
    ▪ matrix G becomes irreducible
    ▪ Google matrix G reflects
    a random surfer
    ▪ no modelling of back button
    $$G = dS + (1-d)\frac{1}{n}\mathbf{1} \quad\text{and}\quad R = GR$$
    where $\mathbf{1}$ is the $n \times n$ matrix of ones and n is the number of pages
    [Figure: pages P1 to P5 with additional random-jump transitions labelled 1-d]
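Putting the pieces together, the Google matrix can be formed and iterated as follows. The damping factor d = 0.85 is the value Brin and Page report using, taken here only as an assumed default; S is the adjusted two-page example (P1 → P2, dangling P2 replaced by 1/n entries), and the function names are hypothetical.

```python
def google_matrix(s: list[list[float]], d: float = 0.85) -> list[list[float]]:
    """G = d*S + (1 - d)/n * (matrix of ones): follow a link with
    probability d, jump to any of the n pages with probability 1 - d."""
    n = len(s)
    return [[d * s[i][j] + (1 - d) / n for j in range(n)] for i in range(n)]


def stationary_vector(g: list[list[float]], tol: float = 1e-12,
                      max_iter: int = 1000) -> list[float]:
    """Power iteration R <- G R starting from the uniform vector."""
    n = len(g)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(g[i][j] * r[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Adjusted two-page example: P1 -> P2, dangling P2 replaced by 1/n entries.
S = [
    [0.0, 0.5],
    [1.0, 0.5],
]
G = google_matrix(S, d=0.85)
R = stationary_vector(G)
print([round(x, 4) for x in R])  # [0.3509, 0.6491]
```

Because every entry of G is positive, the iteration converges to a unique vector regardless of the link structure, which is the point of the random-jump term.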

  25. Examples
    $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$
    [Figure: pages A1 (0.26), A2 (0.37) and A3 (0.37)]

  26. Examples ...
    [Figure: pages A1 (0.13), A2 (0.185), A3 (0.185), B1 (0.13), B2 (0.185) and B3 (0.185)]
    P(A) = 0.5 and P(B) = 0.5
    $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

  27. Examples
    ▪ PageRank leakage
    [Figure: pages A1 (0.10), A2 (0.14), A3 (0.14), B1 (0.22), B2 (0.20) and B3 (0.20)]
    P(A) = 0.38 and P(B) = 0.62
    $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

  28. Examples ...
    [Figure: pages A1 (0.3), A2 (0.23), A3 (0.18), B1 (0.10), B2 (0.095) and B3 (0.095)]
    P(A) = 0.71 and P(B) = 0.29
    $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

  29. Examples
    ▪ PageRank feedback
    [Figure: pages A1 (0.35), A2 (0.24), A3 (0.18), B1 (0.09), B2 (0.07) and B3 (0.07)]
    P(A) = 0.77 and P(B) = 0.23
    $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

  30. Examples ...
    [Figure: pages A1 (0.33), A2 (0.17), A3 (0.175), A4 (0.125), B1 (0.08), B2 (0.06) and B3 (0.06)]
    P(A) = 0.80 and P(B) = 0.20
    $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

  31. Google Search Central
    ▪ Various services and information about a website
    ▪ Site configuration
    ▪ submission of sitemap
    ▪ crawler access
    ▪ URLs of indexed pages
    ▪ Performance
    ▪ search queries
    ▪ countries
    ▪ devices
    ▪ …

  32. Google Search Central …
    ▪ Enhancements
    ▪ Core Web Vitals (speed)
    - mobile as well as desktop
    ▪ mobile usability
    ▪ Security issues
    ▪ Similar tools offered by other search engines
    ▪ e.g. Bing Webmaster Tools

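  33. XML Sitemaps
    ▪ List of URLs that should be crawled and indexed
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://beatsigner.com/</loc>
        <lastmod>2023-11-28</lastmod>
        <changefreq>weekly</changefreq>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>https://beatsigner.com/publications.html</loc>
        <lastmod>2023-11-25</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.9</priority>
      </url>
      ...
    </urlset>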

  34. XML Sitemaps ...
    ▪ All major search engines support the sitemap format
    ▪ The URLs of a sitemap are not guaranteed to be added
    to a search engine's index
    ▪ helps search engines find pages that are not yet indexed
    ▪ Additional metadata might be provided to search engines
    ▪ relative page relevance (priority)
    ▪ date of last modification (lastmod)
    ▪ update frequency (changefreq)
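A sitemap with this metadata can be generated with Python's `xml.etree.ElementTree`. The sketch is one possible way to do it, reusing the two example entries from the previous slide; the function name is hypothetical and `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[dict[str, str]]) -> str:
    """Return a sitemap document with one <url> element per entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; lastmod, changefreq and priority are optional hints.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    ET.indent(urlset)  # pretty-print the nested elements
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


entries = [
    {"loc": "https://beatsigner.com/", "lastmod": "2023-11-28",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "https://beatsigner.com/publications.html", "lastmod": "2023-11-25",
     "changefreq": "weekly", "priority": "0.9"},
]
print(build_sitemap(entries))
```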

  35. Questions
    ▪ Is PageRank fair?
    ▪ What about Google's power and influence?
    ▪ What about Web 2.0 or Web 3.0 and web search?
    ▪ "non-existent" webpages such as those offered by Rich Internet
    Applications (e.g. using AJAX) may cause problems for traditional
    search engines (hidden web)
    ▪ new forms of social search
    - social bookmarking
    - ...
    ▪ social marketing

  36. The Google Effect
    ▪ A study by Sparrow et al. (2011) shows that
    people are less likely to remember things that they
    believe to be accessible online
    ▪ Internet as a transactive memory
    ▪ Does our memory work differently in the age of Google?
    ▪ What implications will the future of the Internet and new
    search have?

  37. Search Engine Marketing (SEM)
    ▪ For many companies Internet marketing
    has become a big business
    ▪ Search engine marketing (SEM) aims to
    increase the visibility of a website
    ▪ search engine optimisation (SEO)
    ▪ paid search advertising (non-organic search)
    ▪ social media marketing
    ▪ SEO should not be decoupled from a website's
    content, structure, design and the technologies used
    ▪ SEO has to be seen as a continuous process in a rapidly
    changing environment
    ▪ different search engines with regular changes in ranking

  38. Structural Choices
    ▪ Keep the website structure as flat as possible
    ▪ minimise link depth
    ▪ avoid pages with much more than 100 links
    ▪ Think about your website's internal link structure
    ▪ which pages are directly linked from the homepage?
    ▪ create many internal links for important pages
    ▪ be "careful" about where to put outgoing links
    - PageRank leakage
    ▪ use keyword-rich anchor texts
    ▪ dynamically create links between related content
    - e.g. "customers who bought this also bought ..." or "visitors who viewed this
    also viewed ..."
    ▪ Increase the number of pages

  39. Technological Choices
    ▪ Use an SEO-friendly content management system (CMS)
    ▪ Dynamic URLs vs. static URLs
    ▪ avoid session IDs and parameters in URL
    ▪ use URL rewriting to get descriptive URLs containing keywords
    ▪ Think carefully about the use of dynamic content
    ▪ Rich Internet Applications (RIAs) based on AJAX etc.
    ▪ content hidden behind pull-down menus etc.
    ▪ Address webpages consistently
    ▪ e.g. https://www.vub.ac.be vs. https://www.vub.ac.be/index.php

  40. Search Engine Optimisations
    ▪ Different things can be optimised
    ▪ on-page factors
    ▪ off-page factors
    ▪ It is assumed that some search engines use more than
    200 on-page and off-page factors for their ranking
    ▪ Difference between optimisation and breaking the
    "search engine rules"
    ▪ white hat and black hat optimisations
    ▪ A bad ranking or removal from index can cost a company
    a lot of money or even mark the end of the company
    ▪ e.g. supplemental index ("Google hell")

  41. Positive On-Page Factors
    ▪ Use of keywords at relevant places
    ▪ in title tag (preferably one of the first words)
    ▪ in URL and domain name
    ▪ in header tags (e.g. <h1>) and multiple times in body text
    ▪ Mobile usability
    ▪ mobile-first indexing by Google since 2016
    ▪ Fast page load times
    ▪ mobile as well as desktop
    ▪ Provide metadata
    ▪ e.g. the description meta tag, which is also used by search engines to
    create the text snippets on the SERPs

  42. Positive On-Page Factors
    ▪ Quality of HTML code
    ▪ Security and accessibility
    ▪ Uniqueness of content across the website
    ▪ …

  43. Negative On-Page Factors
    ▪ Links to "bad neighbourhood"
    ▪ Link selling
    ▪ in 2007 Google announced a campaign against
    paid links that transfer PageRank
    ▪ Over-optimisation penalty (keyword stuffing)
    ▪ Text with same colour as background (hidden content)
    ▪ Automatic redirect via the refresh meta tag
    ▪ Cloaking
    ▪ different pages for spider and user
    ▪ Malware being hosted on the page

  44. Negative On-Page Factors ...
    ▪ Duplicate or similar content
    ▪ Duplicate page titles or meta tags
    ▪ Slow page load time
    ▪ Any copyright violations
    ▪ ...

  45. Positive Off-Page Factors
    ▪ Links from pages with a high PageRank
    ▪ Keywords in anchor text of inbound links
    ▪ Links from topically relevant sites
    ▪ High clickthrough rate (CTR) from search engine for a
    given keyword
    ▪ High number of shares on social media (social signals)
    ▪ e.g. Facebook or Twitter
    ▪ Site age (stability)
    ▪ Domain expiration date
    ▪ …

  46. Negative Off-Page Factors
    ▪ Site often not accessible to crawlers
    ▪ e.g. server problem
    ▪ High bounce rate
    ▪ users immediately press the back button
    ▪ Link buying
    ▪ rapidly increasing number of inbound links
    ▪ Use of link farms
    ▪ Participation in link sharing programmes
    ▪ Links from bad neighbourhood?
    ▪ Competitor attack (e.g. via duplicate content)?

  47. Black Hat Optimisations (Don'ts)
    ▪ Link farms
    ▪ Spamdexing in guestbooks, Wikipedia etc.
    ▪ "solution": ...
    ▪ Keyword stuffing
    ▪ overuse of keywords
    - content keyword stuffing
    - image keyword stuffing
    - keywords in meta tags
    - invisible text with keywords
    ▪ Selling/buying links
    ▪ "big" business until 2007
    ▪ costs based on the PageRank of the linking site

  48. Black Hat Optimisations (Don'ts) ...
    ▪ Doorway pages (cloaking)
    ▪ doorway pages are normally just designed for search engines
    - user is automatically redirected to the target page
    ▪ e.g. BMW Germany and Ricoh Germany were banned
    in February 2006

  49. Nofollow Link Example
    ▪ nofollow value for hyperlinks introduced by Google in
    2005 to avoid spamdexing
    ▪ ...
    ▪ Links with a nofollow value were not counted in the
    PageRank computation
    ▪ division by number of outgoing links
    ▪ e.g. page with 9 outgoing links and 3 of them are nofollow links
    - PageRank divided by 6 and distributed across the 6 "really linked pages"
    ▪ SEO experts started to use (misuse) the nofollow links
    for PageRank sculpting
    ▪ control flow of PageRank within a website

  50. Nofollow Link Example ...
    ▪ In June 2009 Google decided to treat nofollow links
    differently to avoid PageRank sculpting
    ▪ division by total number of outgoing links
    ▪ e.g. page with 9 outgoing links and 3 of them are nofollow links
    - PageRank divided by 9 and distributed across the 6 "really linked pages"
    ▪ no longer a good solution to prevent spamdexing since we lose
    (diffuse) some PageRank
    ▪ SEO experts started to use alternative techniques to
    replace nofollow links
    ▪ e.g. obfuscated JavaScript links
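The difference between the two nofollow treatments is simple arithmetic. This sketch ignores the damping factor and assumes a linking page with a PageRank of 1.0 and 9 outgoing links, 3 of them nofollow; the function name is hypothetical.

```python
def share_per_followed_link(rank: float, total_links: int,
                            nofollow_links: int, divide_by_all: bool) -> float:
    """PageRank passed along each followed (non-nofollow) link.

    divide_by_all=False: before June 2009, divide by the followed links only.
    divide_by_all=True:  from June 2009, divide by all outgoing links.
    """
    followed = total_links - nofollow_links
    divisor = total_links if divide_by_all else followed
    return rank / divisor


rank, total, nofollow = 1.0, 9, 3
for label, divide_by_all in [("before 2009", False), ("from 2009", True)]:
    share = share_per_followed_link(rank, total, nofollow, divide_by_all)
    passed_on = share * (total - nofollow)  # rank reaching the 6 followed pages
    print(label, round(share, 4), round(passed_on, 4))
# before 2009 0.1667 1.0
# from 2009 0.1111 0.6667
```

Under the 2009 treatment a third of the page's PageRank no longer reaches any linked page, which is why nofollow stopped being useful for PageRank sculpting.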

  51. Non-Organic Search
    ▪ In addition to the so-called organic search, websites can
    also participate in non-organic web search
    ▪ cost-per-impression (CPI)
    ▪ cost-per-click (CPC)
    ▪ The non-organic web search should not be treated
    independently from the organic web search
    ▪ Quality of the landing page can have an impact on the
    non-organic web search performance!
    ▪ The Google Ads programme is an example of a
    commercial non-organic web search service
    ▪ other services include Yahoo! Advertising Solutions,
    Facebook Ads, ...

  52. Google Ads and Google AdSense
    ▪ pay-per-click (PPC) or
    cost-per-thousand (CPM)
    ▪ Campaigns and ad groups
    ▪ Two types of advertising
    ▪ search
    ▪ content network
    - Google AdSense
    ▪ Highly customisable ads
    ▪ region
    ▪ language
    ▪ time of day
    ▪ ...

  53. Google Ads ...
    ▪ Excellent control and monitoring for Ads users
    ▪ cost per conversion
    ▪ Google advertising revenues
    ▪ 2022: USD 224.47 billion (total revenues USD 279.8 billion)

  54. Conclusions
    ▪ Web information retrieval techniques have to deal with
    the specific characteristics of the Web
    ▪ PageRank algorithm
    ▪ absolute quality of a page based on incoming links
    ▪ based on random surfer model
    ▪ computed as eigenvector of Google matrix G
    ▪ PageRank is just one factor
    ▪ Various implications for website development and SEO

  55. Exercise 10
    ▪ PageRank and Security

  56. References
    ▪ L. Page, S. Brin, R. Motwani and T. Winograd,
    The PageRank Citation Ranking: Bringing Order
    to the Web, January 1998
    ▪ S. Brin and L. Page, The Anatomy of a Large-Scale
    Hypertextual Web Search Engine, Computer Networks
    and ISDN Systems, 30(1-7), April 1998
    ▪ https://research.google/pubs/pub334.pdf
    ▪ A.N. Langville and C.D. Meyer, Google's PageRank
    and Beyond: The Science of Search Engine Rankings,
    Princeton University Press, July 2006

  57. References …
    ▪ B. Sparrow, J. Liu and D.M. Wegner, Google
    Effects on Memory: Cognitive Consequences of Having
    Information at Our Fingertips, Science, July 2011
    ▪ https://doi.org/10.1126/science.1207745
    ▪ Google Search Central
    ▪ https://developers.google.com/search
    ▪ The W3C Markup Validation Service
    ▪ https://validator.w3.org
    ▪ SEO Book
    ▪ https://www.seobook.com

  58. 2 December 2005
    Next Lecture
    Security, Privacy and Trust
