Getting the most out of Google Search Console API with RegEx

A talk given for SMX Advanced 2022 covering the Google Search Console API and how to use regex.

Getting the most out of the Google Search Console API

Google Search Console is an amazing tool that provides invaluable search data from real users, directly from Google. While the charts and tables are friendly to work with, a large part of the data is not accessible from the UI. The only way to reach this hidden data is through the API, which lets you extract all of the search data available to you, provided you know how.

After this session, you’ll be able to:
» Extract the maximum amount of data from the GSC API
» Use the GSC API in your SEO process with Spreadsheets, Data Studio, or Python
» Use regular expressions (RegEx) to filter difficult URLs

Eric Wu

May 24, 2022

Transcript

  1. @SPEAKERNAME/#SMX
    Getting the most out of the
    Google Search Console API
    with RegEx

  2. @SPEAKERNAME/#SMX
    stagnant

  3. @SPEAKERNAME/#SMX
    declining

  4. @SPEAKERNAME/#SMX
    core update drop
    🤔

  5. @SPEAKERNAME/#SMX

  6. @SPEAKERNAME/#SMX

  7. @SPEAKERNAME/#SMX
    GSC Analysis in
    Data Studio

  8. @SPEAKERNAME/#SMX
    https://twitter.com/aleyda/status/1461358112745537545

  9. @SPEAKERNAME/#SMX
    https://datastudio.google.com/u/0/reporting/1Fm7x1vc0vLokRhGf0WqaMd52mw7wjaSI/page/6zXD

  10. @SPEAKERNAME/#SMX
    https://twitter.com/DataChaz/status/1509198629361303560

  11. @SPEAKERNAME/#SMX
    https://twitter.com/HannahRampton/status/1513923100768935939

  12. @SPEAKERNAME/#SMX
    https://www.hannahrampton.co.uk/v2-search-console-explorer-studio/

  13. @SPEAKERNAME/#SMX
    https://twitter.com/HannahRampton/status/1226166365788233728

  14. @SPEAKERNAME/#SMX
    https://www.hannahrampton.co.uk/search-console-explorer-sheet-free-google-sheet/

  15. @SPEAKERNAME/#SMX
    🐢

  16. @SPEAKERNAME/#SMX
    https://twitter.com/anthonydnelson/status/967059108905078786

  17. @SPEAKERNAME/#SMX
    Overcoming the
    Sampling Problem

  18. @SPEAKERNAME/#SMX
    https://twitter.com/noahlearner/status/1339621958665682944

  19. @SPEAKERNAME/#SMX
    https://twitter.com/Similar_ai/status/1511008285951991811

  20. @SPEAKERNAME/#SMX
    https://twooctobers.com/two-octobers-explorer-for-search/

  21. @SPEAKERNAME/#SMX
    https://twooctobers.com/two-octobers-explorer-for-search/

  22. @SPEAKERNAME/#SMX
    https://similar.ai/blog/closing-google-search-console-sampling-gap/
    • Adding 10 well-chosen sub-directories as GSC profiles can close the gap to almost 75%
    • The gap saturates towards the end because of longer tail sub-directories
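
One way to picture the sub-directory approach is to query the root property and each sub-directory property separately, then merge the rows. The sketch below is only an illustration under stated assumptions: the property URLs, rows, and the `total_clicks` figure are hypothetical, and keeping the higher-click row for a duplicated key is an assumed merge rule, not something taken from the linked post.

```python
def merge_property_rows(rows_by_property):
    """Union rows from several GSC properties, de-duplicated by their keys."""
    merged = {}
    for rows in rows_by_property.values():
        for row in rows:
            key = tuple(row["keys"])
            # Assumption: when two properties report the same key,
            # keep the row with more clicks rather than summing them.
            if key not in merged or row["clicks"] > merged[key]["clicks"]:
                merged[key] = row
    return list(merged.values())


def click_coverage(rows, total_clicks):
    """Share of the site's total clicks that the returned rows account for."""
    return sum(row["clicks"] for row in rows) / total_clicks


# Hypothetical rows: the root property plus one sub-directory property.
rows_by_property = {
    "https://www.example.com/": [
        {"keys": ["/tvs/a", "query 1"], "clicks": 50},
        {"keys": ["/phones/b", "query 2"], "clicks": 30},
    ],
    "https://www.example.com/tvs/": [
        {"keys": ["/tvs/a", "query 1"], "clicks": 50},
        {"keys": ["/tvs/c", "query 3"], "clicks": 10},
    ],
}

root_only = rows_by_property["https://www.example.com/"]
merged = merge_property_rows(rows_by_property)
print(click_coverage(root_only, total_clicks=100))
print(click_coverage(merged, total_clicks=100))
```

In this toy data the sub-directory property contributes one row that the root property did not return, so the merged set accounts for a larger share of the total.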

  23. @SPEAKERNAME/#SMX
    Getting more with
    Regular Expressions

  24. @SPEAKERNAME/#SMX
    https://twitter.com/jackson_lo/status/1352997899806912513

  25. @SPEAKERNAME/#SMX
    https://twitter.com/googlesearchc/status/1379775388193320962

  26. @SPEAKERNAME/#SMX

  27. @SPEAKERNAME/#SMX
    https://xkcd.com/208/

  28. @SPEAKERNAME/#SMX
    https://xkcd.com/208/

  29. @SPEAKERNAME/#SMX
    RE2!
    https://xkcd.com/208/

  30. @SPEAKERNAME/#SMX
    https://www.reddit.com/r/ProgrammerHumor/comments/tdtdfn/id_like_you_to_meet_regex/
    🥲

  31. @SPEAKERNAME/#SMX
    https://www.reddit.com/r/ProgrammerHumor/comments/tdtdfn/id_like_you_to_meet_regex/
    🤪

  32. @SPEAKERNAME/#SMX
    https://github.com/google/re2/wiki/Syntax

  33. @SPEAKERNAME/#SMX
    RegEx
    Informational Queries

  34. @SPEAKERNAME/#SMX
    https://twitter.com/danielkcheung/status/1524314760346365954

  35. @SPEAKERNAME/#SMX
    https://twitter.com/seo_notebook/status/1381792930197831687

  36. @SPEAKERNAME/#SMX
    are, can, can't, could, couldn't, did, didn't,
    do, does, doesn't, how, if, is, isn't, should,
    shouldn't, was, wasn't, were, weren't, what,
    when, where, who, whom, whose, why, will,
    won't, would, wouldn't
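
The word list above can be turned into a single filter expression. The sketch below is one possible construction, not necessarily the one shown in the talk: anchoring at the start of the query and requiring whitespace after the word are assumptions, and Python's `re` module stands in for RE2 (the syntax used here is common to both).

```python
import re

# Question words copied from the slide above.
QUESTION_WORDS = [
    "are", "can", "can't", "could", "couldn't", "did", "didn't",
    "do", "does", "doesn't", "how", "if", "is", "isn't", "should",
    "shouldn't", "was", "wasn't", "were", "weren't", "what",
    "when", "where", "who", "whom", "whose", "why", "will",
    "won't", "would", "wouldn't",
]


def build_question_regex(words):
    """Join the words into one alternation anchored at the start of the query."""
    # The words contain no regex metacharacters, so no escaping is needed.
    # Assumption: a question word only counts when it opens the query
    # and is followed by whitespace.
    return "^(" + "|".join(words) + r")\s"


QUESTION_RE = re.compile(build_question_regex(QUESTION_WORDS))


def is_question(query):
    """True when the query starts with one of the question words."""
    return QUESTION_RE.search(query.lower()) is not None


for query in ["how to reset samsung tv", "samsung tv reset", "ifixit samsung"]:
    print(query, "->", is_question(query))
```

The string returned by `build_question_regex` can be pasted into a custom regex filter on the query dimension.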

  37. @SPEAKERNAME/#SMX

  38. @SPEAKERNAME/#SMX

  39. @SPEAKERNAME/#SMX

  40. @SPEAKERNAME/#SMX

  41. @SPEAKERNAME/#SMX
    https://regexper.com/

  42. @SPEAKERNAME/#SMX
    https://regexper.com/

  43. @SPEAKERNAME/#SMX
    https://regexper.com/

  44. @SPEAKERNAME/#SMX

  45. @SPEAKERNAME/#SMX
    https://twitter.com/danielwaisberg/status/1402979440183939074

  46. @SPEAKERNAME/#SMX
    https://twitter.com/lazarinastoy/status/1461302669172166661

  47. @SPEAKERNAME/#SMX
    RegEx
    Branded Queries

  48. @SPEAKERNAME/#SMX

  49. @SPEAKERNAME/#SMX
    aamaung, damsung, mamsang, sam sung, samaung,
    samdung, samesung, sameung, samgsung, samgung,
    samsang, samsaung, samsgu, samshgg, samshng,
    samsing, samsnug, samssung, samsu, samsuag,
    samsubg, samsubng, samsug, samsumg, samsumng,
    samsun g, samsunb, samsund, samsund, samsunh,
    samsunt …

  50. @SPEAKERNAME/#SMX
    (s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)
    Consider:
    • Main letters
    • Consonants
    • Letters surrounding hard consonants
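
To see how the pattern behaves, it can be run against a few of the misspellings listed earlier. This is a quick check rather than part of the talk: Python's `re` module stands in for RE2, and the non-brand query `iphone 13` is a made-up example.

```python
import re

# The brand pattern from the slide above, written on one line.
BRAND_RE = re.compile(r"(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)")

# A handful of the misspellings from the previous slide.
MISSPELLINGS = ["aamaung", "damsung", "mamsang", "sam sung", "samsu", "samsunt"]

# "iphone 13" is a hypothetical non-brand query added for contrast.
results = {q: bool(BRAND_RE.search(q)) for q in MISSPELLINGS + ["iphone 13"]}

for query, matched in results.items():
    print(f"{query!r}: {matched}")
```

The pattern is loose by design: because it is unanchored and allows spaces, it can also match unrelated queries, so the results are worth spot-checking before relying on them.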

  51. @SPEAKERNAME/#SMX
    samsung galaxy note
    galaxy samsung
    new samsung TV
    Consider:
    • Start of string
    • Surrounded by spaces
    • End of string
    🔍
    🔍
    🔍

  52. @SPEAKERNAME/#SMX
    (^|\s)(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)(\s|$)
    • Start of string = ^
    • Surrounded by spaces = \s
    • End of string = $
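
The effect of the three boundary tokens can be checked against the example queries from the previous slide. This is a small sketch using Python's `re` module in place of RE2; the string `xsamsungx` is a made-up case showing what the boundaries rule out.

```python
import re

# Brand pattern wrapped in start/space/end boundaries, as on the slide above.
BOUNDED_RE = re.compile(
    r"(^|\s)(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)(\s|$)"
)
# The same pattern without the boundaries, for comparison.
UNBOUNDED_RE = re.compile(r"(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)")

QUERIES = ["samsung galaxy note", "galaxy samsung", "new samsung TV"]

for query in QUERIES:
    print(f"{query!r}: {bool(BOUNDED_RE.search(query))}")

# Hypothetical string with the brand embedded inside a longer token.
embedded = "xsamsungx"
print(bool(UNBOUNDED_RE.search(embedded)), bool(BOUNDED_RE.search(embedded)))
```

All three example queries match, covering the brand at the start, at the end, and in the middle of a query, while the embedded token only matches the unbounded version.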

  53. @SPEAKERNAME/#SMX
    https://twitter.com/ChouinardJC/status/1405471189653360646

  54. @SPEAKERNAME/#SMX
    Going deeper
    With the API

  55. @SPEAKERNAME/#SMX
    https://twitter.com/GregBernhardt4/status/1462797592664887305

  56. @SPEAKERNAME/#SMX
    https://importsem.com/calculate-gsc-ctr-stats-by-position-using-python-for-seo/

  57. @SPEAKERNAME/#SMX
    https://www.oncrawl.com/technical-seo/extract-data-google-search-console-data-analysis-in-python/

  58. @SPEAKERNAME/#SMX
    US Product URLs
    //
    ///p-
    /tvs/
    /tvs/4k-led/
    /tvs/4k-led/p-4k-hd-model-314
    i18n Language & Country URLs
    //
    /-/
    /fr/
    /fr-br/
    /ae-ar/
    /fr/tvs/4k/
    /fr-br/tvs/4k-led/p-4k-hd-model-314
    /ae-ar/tvs/4k-led/p-4k-hd-model-314

  59. @SPEAKERNAME/#SMX
    Get All US PLPs + PDPs and NOT i18n pages
    Include /([^/]+/){1,2}p?
    • Any character that’s not a slash = [^/]+
    • 1 or 2 directories = ([^/]+/){1,2}
    • Sometimes followed by a product slug = p?
    Exclude /([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})/
    • Any 2 letter directory = [a-zA-Z]{2}
    • 2 letter + 2 letter lang-country combo = [a-zA-Z]{2}-[a-zA-Z]{2}
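
The include/exclude pair can be tried against the example paths from the previous slide. This is a sketch with Python's `re` module standing in for RE2. One assumption is made explicit: the exclude pattern here wraps its two alternatives in a group so that both sit between slashes. Without the group, the `|` splits the whole expression, and the first branch `/[a-zA-Z]{2}` would also match `/tvs/`.

```python
import re

INCLUDE_RE = re.compile(r"/([^/]+/){1,2}p?")
# Grouped so the alternation applies only between the two slashes (assumption).
EXCLUDE_RE = re.compile(r"/([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})/")
# Ungrouped variant, kept only to show what the grouping changes.
UNGROUPED_EXCLUDE_RE = re.compile(r"/[a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}/")

# Complete example paths from the previous slide.
PATHS = [
    "/tvs/",
    "/tvs/4k-led/",
    "/tvs/4k-led/p-4k-hd-model-314",
    "/fr/",
    "/fr-br/",
    "/ae-ar/",
    "/fr/tvs/4k/",
    "/fr-br/tvs/4k-led/p-4k-hd-model-314",
    "/ae-ar/tvs/4k-led/p-4k-hd-model-314",
]


def keep(path):
    """Keep paths that match the include pattern and not the exclude pattern."""
    return bool(INCLUDE_RE.search(path)) and not EXCLUDE_RE.search(path)


us_paths = [path for path in PATHS if keep(path)]
print(us_paths)
```

With the grouped exclude, only the three US paths remain and every language or country path is dropped.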

  60. @SPEAKERNAME/#SMX

  61. @SPEAKERNAME/#SMX

  62. @SPEAKERNAME/#SMX
    https://twitter.com/eywu/status/1508992377372901376

  63. @SPEAKERNAME/#SMX
    https://twitter.com/victorpan/status/629291612812746753

  64. @SPEAKERNAME/#SMX
    https://developers.google.com/webmaster-tools/v1/api_reference_index

  65. @SPEAKERNAME/#SMX
    https://developers.google.com/webmaster-tools/v1/api_reference_index

  66. @SPEAKERNAME/#SMX
    https://developers.google.com/webmaster-tools/v1/api_reference_index

  67. @SPEAKERNAME/#SMX
    https://www.postman.com/

  68. @SPEAKERNAME/#SMX
    https://www.postman.com/

  69. @SPEAKERNAME/#SMX

  70. @SPEAKERNAME/#SMX
    https://www.jcchouinard.com/how-to-get-google-search-console-api-keys/

  71. @SPEAKERNAME/#SMX

  72. @SPEAKERNAME/#SMX

  73. @SPEAKERNAME/#SMX

  74. @SPEAKERNAME/#SMX

  75. @SPEAKERNAME/#SMX

  76. @SPEAKERNAME/#SMX

  77. @SPEAKERNAME/#SMX
    { "rows": [
    { "keys": ["2022-06-15"], "clicks": 359756, "impressions": 7294403, "ctr": 0.049319457671861563, "position": 9.128287263536166 }
    ], "responseAggregationType": "byPage"
    }
    Export JSON to file: samsung-g11n.json
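
A response like the one above comes back from the Search Analytics `query` method. The sketch below shows one way a request body with a regex page filter could be built and paged through. The field names (`dimensionFilterGroups`, `includingRegex`, `rowLimit`, `startRow`) are from the Search Console API; `run_query` is a hypothetical callable standing in for whatever client sends the request, and the fake used at the end exists only so the sketch runs without credentials.

```python
def build_query_body(start_date, end_date, page_regex, start_row=0, row_limit=25000):
    """Request body: daily totals for pages whose URL matches page_regex."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["date"],
        "dimensionFilterGroups": [{
            "filters": [{
                "dimension": "page",
                "operator": "includingRegex",
                "expression": page_regex,
            }]
        }],
        "rowLimit": row_limit,   # the API caps this at 25,000 rows per request
        "startRow": start_row,   # offset used to page through larger results
    }


def fetch_all_rows(run_query, start_date, end_date, page_regex, row_limit=25000):
    """Call run_query repeatedly until a page comes back short or empty."""
    rows, start_row = [], 0
    while True:
        body = build_query_body(start_date, end_date, page_regex, start_row, row_limit)
        batch = run_query(body).get("rows", [])
        rows.extend(batch)
        if len(batch) < row_limit:
            return rows
        start_row += row_limit


# Hypothetical stand-in for the real API call, serving five fake rows.
FAKE_ROWS = [{"keys": [f"2022-06-{day:02d}"], "clicks": day} for day in range(1, 6)]

def fake_run_query(body):
    page = FAKE_ROWS[body["startRow"]:body["startRow"] + body["rowLimit"]]
    return {"rows": page} if page else {}

all_rows = fetch_all_rows(fake_run_query, "2022-06-01", "2022-06-05", "/tvs/", row_limit=2)
print(len(all_rows))
```

Passing the request function in as an argument keeps the paging logic separate from authentication, so the same loop works with Postman-exported calls, a client library, or a test double.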

  78. @SPEAKERNAME/#SMX
    https://stedolan.github.io/jq/

  79. @SPEAKERNAME/#SMX
    { "rows": [
    { "keys": ["2022-06-15"],
    "clicks": 359756,
    "impressions": 7294403,
    "ctr": 0.049319457671861563,
    "position": 9.128287263536166
    }
    ],
    "responseAggregationType": "byPage"
    }
    jq . < samsung-g11n.json

  80. @SPEAKERNAME/#SMX
    jq '.rows |
    [.[] |
    { _Date: .keys[0],
    Clicks: .clicks|tostring,
    Impressions: .impressions|tostring,
    CTR: .ctr,
    Position: .position
    }]' < samsung-g11n.json

  81. @SPEAKERNAME/#SMX
    [
    { "_Date": "2022-06-15",
    "Clicks": "359756",
    "Impressions": "7294403",
    "CTR": 0.049319457671861563,
    "Position": 9.128287263536166
    }
    ]

  82. @SPEAKERNAME/#SMX
    https://github.com/TomWright/dasel

  83. @SPEAKERNAME/#SMX
    jq '.rows |
    [.[] |
    { _Date: .keys[0],
    Clicks: .clicks|tostring,
    Impressions: .impressions|tostring,
    CTR: .ctr,
    Position: .position
    }]' < samsung-g11n.json > samsung.json

  84. @SPEAKERNAME/#SMX
    cat samsung.json |
    dasel -r json -w csv > data.csv

  85. @SPEAKERNAME/#SMX
    jq '.rows |
    [.[] |
    { _Date: .keys[0],
    Clicks: .clicks|tostring,
    Impressions: .impressions|tostring,
    CTR: .ctr,
    Position: .position
    }]' < samsung-g11n.json |
    dasel -r json -w csv > data.csv
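
For readers who would rather stay in Python than chain jq and dasel, the same reshaping can be sketched with the standard library. This is an alternative illustration, not the pipeline from the talk; the sample response is the one shown earlier, and the column names mirror the jq filter above.

```python
import csv
import io
import json

# Sample API response, as exported earlier in the deck.
RESPONSE_JSON = """
{ "rows": [
    { "keys": ["2022-06-15"],
      "clicks": 359756,
      "impressions": 7294403,
      "ctr": 0.049319457671861563,
      "position": 9.128287263536166 }
  ],
  "responseAggregationType": "byPage"
}
"""

COLUMNS = ["_Date", "Clicks", "Impressions", "CTR", "Position"]


def rows_to_csv(response):
    """Flatten the 'rows' array into CSV text with one line per row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in response.get("rows", []):
        writer.writerow({
            "_Date": row["keys"][0],
            # Written as strings, mirroring the tostring calls in the jq filter.
            "Clicks": str(row["clicks"]),
            "Impressions": str(row["impressions"]),
            "CTR": row["ctr"],
            "Position": row["position"],
        })
    return out.getvalue()


csv_text = rows_to_csv(json.loads(RESPONSE_JSON))
print(csv_text)
```

Writing `csv_text` to a file gives a result comparable to the `data.csv` produced by the shell pipeline.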

  86. @SPEAKERNAME/#SMX

  87. @SPEAKERNAME/#SMX
    https://en.ryte.com/

  88. @SPEAKERNAME/#SMX

  89. @SPEAKERNAME/#SMX
    @eywu

  90. @SPEAKERNAME/#SMX
    APPENDIX

  91. @SPEAKERNAME/#SMX

  92. @SPEAKERNAME/#SMX
    https://twitter.com/garabatokid/status/1147063121678389253

  93. @SPEAKERNAME/#SMX
    https://twitter.com/garabatokid/status/1147063121678389253

  94. @SPEAKERNAME/#SMX

  95. @SPEAKERNAME/#SMX
