Getting the most out of Google Search Console API with RegEx

A talk given for SMX Advanced 2022 covering the Google Search Console API and how to use regex.

Getting the most out of the Google Search Console API

Google Search Console is an amazing tool that provides invaluable search data from real users, directly from Google. While the charts and tables are friendly to work with, a large part of the data is not accessible from the UI. The only way to reach this hidden data is through the API, which lets you extract all the search data available to you, if you know how.

After this session, you’ll be able to:
» Extract the maximum amount of data from the GSC API
» Use the GSC API in your SEO process with spreadsheets, Data Studio, or Python
» Use regular expressions (RegEx) to filter difficult URLs

Eric Wu

May 24, 2022

Transcript

  1. @SPEAKERNAME/#SMX Getting the most out of the Google Search Console API with RegEx
  2. @SPEAKERNAME/#SMX stagnant

  3. @SPEAKERNAME/#SMX declining

  4. @SPEAKERNAME/#SMX core update drop 🤔

  5. @SPEAKERNAME/#SMX

  6. @SPEAKERNAME/#SMX

  7. @SPEAKERNAME/#SMX GSC Analysis in Data Studio

  8. @SPEAKERNAME/#SMX https://twitter.com/aleyda/status/1461358112745537545

  9. @SPEAKERNAME/#SMX https://datastudio.google.com/u/0/reporting/1Fm7x1vc0vLokRhGf0WqaMd52mw7wjaSI/page/6zXD

  10. @SPEAKERNAME/#SMX https://twitter.com/DataChaz/status/1509198629361303560

  11. @SPEAKERNAME/#SMX https://twitter.com/HannahRampton/status/1513923100768935939

  12. @SPEAKERNAME/#SMX https://www.hannahrampton.co.uk/v2-search-console-explorer-studio/

  13. @SPEAKERNAME/#SMX https://twitter.com/HannahRampton/status/1226166365788233728

  14. @SPEAKERNAME/#SMX https://www.hannahrampton.co.uk/search-console-explorer-sheet-free-google-sheet/

  15. @SPEAKERNAME/#SMX 🐢

  16. @SPEAKERNAME/#SMX https://twitter.com/anthonydnelson/status/967059108905078786

  17. @SPEAKERNAME/#SMX Overcoming the Sampling Problem

  18. @SPEAKERNAME/#SMX https://twitter.com/noahlearner/status/1339621958665682944

  19. @SPEAKERNAME/#SMX https://twitter.com/Similar_ai/status/1511008285951991811

  20. @SPEAKERNAME/#SMX https://twooctobers.com/two-octobers-explorer-for-search/

  21. @SPEAKERNAME/#SMX https://twooctobers.com/two-octobers-explorer-for-search/

  22. @SPEAKERNAME/#SMX https://similar.ai/blog/closing-google-search-console-sampling-gap/

    • Adding 10 well-chosen sub-directories as GSC profiles can close the gap to almost 75%
    • The gap saturates towards the end because of longer tail sub-directories
  23. @SPEAKERNAME/#SMX Getting more with Regular Expressions

  24. @SPEAKERNAME/#SMX https://twitter.com/jackson_lo/status/1352997899806912513

  25. @SPEAKERNAME/#SMX https://twitter.com/googlesearchc/status/1379775388193320962

  26. @SPEAKERNAME/#SMX

  27. @SPEAKERNAME/#SMX https://xkcd.com/208/

  28. @SPEAKERNAME/#SMX https://xkcd.com/208/

  29. @SPEAKERNAME/#SMX RE2! https://xkcd.com/208/

  30. @SPEAKERNAME/#SMX https://www.reddit.com/r/ProgrammerHumor/comments/tdtdfn/id_like_you_to_meet_regex/ 🥲

  31. @SPEAKERNAME/#SMX https://www.reddit.com/r/ProgrammerHumor/comments/tdtdfn/id_like_you_to_meet_regex/ 🤪

  32. @SPEAKERNAME/#SMX https://github.com/google/re2/wiki/Syntax

  33. @SPEAKERNAME/#SMX Informational Queries RegEx

  34. @SPEAKERNAME/#SMX https://twitter.com/danielkcheung/status/1524314760346365954

  35. @SPEAKERNAME/#SMX https://twitter.com/seo_notebook/status/1381792930197831687

  36. @SPEAKERNAME/#SMX are, can, can't, could, couldn't, did, didn't, do, does, doesn't, how, if, is, isn't, should, shouldn't, was, wasn't, were, weren't, what, when, where, who, whom, whose, why, will, won't, would, wouldn't
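A minimal sketch of how the question-word list above could be turned into a query filter. It uses Python's `re` module as a stand-in for RE2, and it assumes a query counts as informational when it starts with one of the listed words; the helper names and sample queries are hypothetical, not taken from the talk.

```python
import re

# Question words copied from the slide above.
QUESTION_WORDS = [
    "are", "can", "can't", "could", "couldn't", "did", "didn't", "do", "does",
    "doesn't", "how", "if", "is", "isn't", "should", "shouldn't", "was",
    "wasn't", "were", "weren't", "what", "when", "where", "who", "whom",
    "whose", "why", "will", "won't", "would", "wouldn't",
]


def build_question_regex(words):
    """Build one alternation anchored to the start of the query (assumption)."""
    alternation = "|".join(re.escape(word) for word in words)
    # \b stops "is" from matching the beginning of "island".
    return r"^(" + alternation + r")\b"


QUESTION_RE = re.compile(build_question_regex(QUESTION_WORDS))


def is_informational(query):
    """Return True when the lowercased query starts with a question word."""
    return QUESTION_RE.search(query.lower()) is not None


if __name__ == "__main__":
    for q in ["how to reset a tv", "island vacation deals", "what is oled"]:
        print(q, is_informational(q))
```

The string returned by `build_question_regex` can be pasted into a regex filter as-is, since it only uses alternation, `^`, and `\b`.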
  37. @SPEAKERNAME/#SMX

  38. @SPEAKERNAME/#SMX

  39. @SPEAKERNAME/#SMX

  40. @SPEAKERNAME/#SMX

  41. @SPEAKERNAME/#SMX https://regexper.com/

  42. @SPEAKERNAME/#SMX https://regexper.com/

  43. @SPEAKERNAME/#SMX https://regexper.com/

  44. @SPEAKERNAME/#SMX

  45. @SPEAKERNAME/#SMX https://twitter.com/danielwaisberg/status/1402979440183939074

  46. @SPEAKERNAME/#SMX https://twitter.com/lazarinastoy/status/1461302669172166661

  47. @SPEAKERNAME/#SMX Branded Queries RegEx

  48. @SPEAKERNAME/#SMX

  49. @SPEAKERNAME/#SMX aamaung, damsung, mamsang, sam sung, samaung, samdung, samesung, sameung, samgsung, samgung, samsang, samsaung, samsgu, samshgg, samshng, samsing, samsnug, samssung, samsu, samsuag, samsubg, samsubng, samsug, samsumg, samsumng, samsun g, samsunb, samsund, samsund, samsunh, samsunt …
  50. @SPEAKERNAME/#SMX (s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)

    Consider:
    • Main letters
    • Consonants
    • Letters surrounding hard consonants
  51. @SPEAKERNAME/#SMX 🔍 samsung galaxy note 🔍 galaxy samsung 🔍 new samsung TV

    Consider:
    • Start of string
    • Surrounded by spaces
    • End of string
  52. @SPEAKERNAME/#SMX (^|\s)(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)(\s|$)

    • Start of string = ^
    • Surrounded by spaces = \s
    • End of string = $
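A quick way to sanity-check the branded pattern before pasting it into a filter is to run it over known misspellings. The sketch below uses Python's `re` as a stand-in for RE2 and assumes the space after `m?` in the transcript is a line-wrap artifact, so the pattern is written without it; the helper name and the non-brand sample queries are hypothetical.

```python
import re

# Pattern from the slide above, with the space after "m?" removed (assumption).
BRAND_RE = re.compile(
    r"(^|\s)(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)(\s|$)"
)


def is_branded(query):
    """Return True when the lowercased query matches the brand pattern."""
    return BRAND_RE.search(query.lower()) is not None


if __name__ == "__main__":
    samples = [
        "samsung galaxy note",  # brand at start of string
        "galaxy samsung",       # brand at end of string
        "new samsung TV",       # brand surrounded by spaces
        "samsnug",              # misspelling from the list above
        "tv stand",             # hypothetical non-brand query
        "shopping",             # hypothetical non-brand query
    ]
    for q in samples:
        print(q, is_branded(q))
```

Because the character classes are deliberately loose, the pattern also matches some unrelated words such as "shopping", so it is worth reviewing the matched queries rather than trusting the filter blindly.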
  53. @SPEAKERNAME/#SMX https://twitter.com/ChouinardJC/status/1405471189653360646

  54. @SPEAKERNAME/#SMX Going deeper with the API

  55. @SPEAKERNAME/#SMX https://twitter.com/GregBernhardt4/status/1462797592664887305

  56. @SPEAKERNAME/#SMX https://importsem.com/calculate-gsc-ctr-stats-by-position-using-python-for-seo/

  57. @SPEAKERNAME/#SMX https://www.oncrawl.com/technical-seo/extract-data-google-search-console-data-analysis-in-python/

  58. @SPEAKERNAME/#SMX US Product URLs

    Patterns: /<cat>/ and /<cat>/<sub-cat>/p-<product>
    Examples: /tvs/, /tvs/4k-led/, /tvs/4k-led/p-4k-hd-model-314

    i18n Language & Country URLs

    Patterns: /<lang>/ and /<lang>-<country>/
    Examples: /fr/, /fr-br/, /ae-ar/, /fr/tvs/4k/, /fr-br/tvs/4k-led/p-4k-hd-model-314, /ae-ar/tvs/4k-led/p-4k-hd-model-314
  59. @SPEAKERNAME/#SMX Get All US PLPs + PDPs and NOT i18n pages

    Include /([^/]+/){1,2}p?
    • Any character that’s not a slash = [^/]+
    • 1 or 2 directories = /){1,2}
    • Sometimes followed by a product slug = p?

    Exclude /[a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}/
    • Any 2 letter directory = [a-zA-Z]{2}
    • 2 letter + 2 letter lang-country combo = [a-zA-Z]{2}-[a-zA-Z]{2}
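The include/exclude pair above can be tried out on the example URLs with a short script. This is a sketch using Python's `re` as a stand-in for RE2, applied to URL paths only. One assumption is made loudly: the exclude pattern is grouped and anchored as `^/(…)/`, because with the alternation ungrouped as transcribed, the first branch `/[a-zA-Z]{2}` would also match `/tvs/`. In a real GSC filter the expression is matched against the full URL, so the `^` anchor would need adjusting. The function name is hypothetical.

```python
import re
from urllib.parse import urlparse

# Include pattern as shown on the slide.
INCLUDE_RE = re.compile(r"/([^/]+/){1,2}p?")

# Exclude pattern, grouped and anchored to the first directory (assumption).
EXCLUDE_RE = re.compile(r"^/([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})/")


def is_us_product_url(url):
    """True for US category/product paths, False for language or country paths."""
    path = urlparse(url).path
    return bool(INCLUDE_RE.search(path)) and not EXCLUDE_RE.search(path)


if __name__ == "__main__":
    urls = [
        "/tvs/",
        "/tvs/4k-led/",
        "/tvs/4k-led/p-4k-hd-model-314",
        "/fr/",
        "/fr-br/tvs/4k-led/p-4k-hd-model-314",
        "/ae-ar/tvs/4k-led/p-4k-hd-model-314",
    ]
    for u in urls:
        print(u, is_us_product_url(u))
```

A two-letter US category such as a hypothetical `/tv/` would be excluded by this approach, so the directory names on the site need checking before relying on it.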
  60. @SPEAKERNAME/#SMX

  61. @SPEAKERNAME/#SMX

  62. @SPEAKERNAME/#SMX https://twitter.com/eywu/status/1508992377372901376

  63. @SPEAKERNAME/#SMX https://twitter.com/victorpan/status/629291612812746753

  64. @SPEAKERNAME/#SMX https://developers.google.com/webmaster-tools/v1/api_reference_index

  65. @SPEAKERNAME/#SMX https://developers.google.com/webmaster-tools/v1/api_reference_index

  66. @SPEAKERNAME/#SMX https://developers.google.com/webmaster-tools/v1/api_reference_index
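As a rough illustration of what a Search Analytics query with a regex filter might look like, the sketch below builds the request body and pages through results with `startRow`. The field names (`startDate`, `endDate`, `dimensions`, `dimensionFilterGroups`, `rowLimit`, `startRow`) and the `includingRegex` operator follow the API reference linked above; `send_request` is a hypothetical callable standing in for an authenticated HTTP call, and the dates and expression are placeholders.

```python
def build_query_body(start_date, end_date, page_regex, start_row=0, row_limit=25000):
    """Build a Search Analytics query body that filters pages by regex."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["date", "page", "query"],
        "dimensionFilterGroups": [
            {
                "filters": [
                    {
                        "dimension": "page",
                        "operator": "includingRegex",
                        "expression": page_regex,
                    }
                ]
            }
        ],
        "rowLimit": row_limit,
        "startRow": start_row,
    }


def fetch_all_rows(send_request, start_date, end_date, page_regex, row_limit=25000):
    """Page through results until a batch comes back smaller than row_limit.

    send_request is a hypothetical callable: it takes the request body and
    returns the parsed JSON response as a dict.
    """
    rows = []
    start_row = 0
    while True:
        body = build_query_body(start_date, end_date, page_regex, start_row, row_limit)
        batch = send_request(body).get("rows", [])
        rows.extend(batch)
        if len(batch) < row_limit:
            return rows
        start_row += row_limit


if __name__ == "__main__":
    # Stub that returns no rows, so the example runs without credentials.
    print(fetch_all_rows(lambda body: {}, "2022-05-01", "2022-05-31", "/tvs/"))
```

Keeping the HTTP call behind a plain callable makes the paging logic easy to test with a stub before wiring in real credentials.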

  67. @SPEAKERNAME/#SMX https://www.postman.com/

  68. @SPEAKERNAME/#SMX https://www.postman.com/

  69. @SPEAKERNAME/#SMX

  70. @SPEAKERNAME/#SMX https://www.jcchouinard.com/how-to-get-google-search-console-api-keys/

  71. @SPEAKERNAME/#SMX

  72. @SPEAKERNAME/#SMX

  73. @SPEAKERNAME/#SMX

  74. @SPEAKERNAME/#SMX

  75. @SPEAKERNAME/#SMX

  76. @SPEAKERNAME/#SMX

  77. @SPEAKERNAME/#SMX Export JSON to file: samsung-g11n.json

    { "rows": [ { "keys": ["2022-06-15"], "clicks": 359756, "impressions": 7294403, "ctr": 0.049319457671861563, "position": 9.128287263536166 } ], "responseAggregationType": "byPage" }
  78. @SPEAKERNAME/#SMX https://stedolan.github.io/jq/

  79. @SPEAKERNAME/#SMX jq . < samsung-g11n.json

    {
      "rows": [
        {
          "keys": ["2022-06-15"],
          "clicks": 359756,
          "impressions": 7294403,
          "ctr": 0.049319457671861563,
          "position": 9.128287263536166
        }
      ],
      "responseAggregationType": "byPage"
    }
  80. @SPEAKERNAME/#SMX jq '.rows | [.[] | { _Date: .keys[0], Clicks: .clicks|tostring, Impressions: .impressions|tostring, CTR: .ctr, Position: .position }]' < samsung-g11n.json
  81. @SPEAKERNAME/#SMX [ { "_Date": "2022-06-15", "Clicks": "359756", "Impressions": "7294403", "CTR": "0.049319457671861563", "Position": "9.128287263536166" } ]
  82. @SPEAKERNAME/#SMX https://github.com/TomWright/dasel

  83. @SPEAKERNAME/#SMX jq '.rows | [.[] | { _Date: .keys[0], Clicks: .clicks|tostring, Impressions: .impressions|tostring, CTR: .ctr, Position: .position }]' < samsung-g11n.json > samsung.json
  84. @SPEAKERNAME/#SMX cat samsung.json | dasel -r json -w csv > data.csv
  85. @SPEAKERNAME/#SMX jq '.rows | [.[] | { _Date: .keys[0], Clicks: .clicks|tostring, Impressions: .impressions|tostring, CTR: .ctr, Position: .position }]' < samsung-g11n.json | dasel -r json -w csv > data.csv
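For anyone who prefers to stay in Python, the same JSON-to-CSV reshaping can be sketched with the standard library alone. This is not the pipeline from the slides, just an equivalent under the assumption that the response has the `rows` / `keys` / `clicks` / `impressions` / `ctr` / `position` shape shown earlier; the function name is hypothetical and the column names mirror the jq filter above.

```python
import csv
import io
import json


def rows_to_csv(response_json):
    """Flatten a Search Analytics JSON response into CSV text."""
    data = json.loads(response_json)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["_Date", "Clicks", "Impressions", "CTR", "Position"])
    for row in data.get("rows", []):
        writer.writerow([
            row["keys"][0],      # first dimension, here the date
            row["clicks"],
            row["impressions"],
            row["ctr"],
            row["position"],
        ])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '{"rows": [{"keys": ["2022-06-15"], "clicks": 359756, '
        '"impressions": 7294403, "ctr": 0.049319457671861563, '
        '"position": 9.128287263536166}], '
        '"responseAggregationType": "byPage"}'
    )
    print(rows_to_csv(sample))
```

Reading the input from `samsung-g11n.json` and writing the result to `data.csv` would reproduce the file names used in the commands above.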
  86. @SPEAKERNAME/#SMX

  87. @SPEAKERNAME/#SMX https://en.ryte.com/

  88. @SPEAKERNAME/#SMX

  89. @SPEAKERNAME/#SMX @eywu

  90. @SPEAKERNAME/#SMX APPENDIX

  91. @SPEAKERNAME/#SMX Getting the most out of the Google Search Console API

    Google Search Console is an amazing tool that provides invaluable search data from real users, directly from Google. While the charts and tables are friendly to work with, a large part of the data is not accessible from the UI. The only way to reach this hidden data is through the API, which lets you extract all the search data available to you, if you know how.

    After this session, you’ll be able to:
    • Extract the maximum amount of data from the GSC API
    • Use the GSC API in your SEO process with spreadsheets, Data Studio, or Python
    • Use regular expressions (RegEx) to filter difficult URLs
  92. @SPEAKERNAME/#SMX https://twitter.com/garabatokid/status/1147063121678389253

  93. @SPEAKERNAME/#SMX https://twitter.com/garabatokid/status/1147063121678389253

  94. @SPEAKERNAME/#SMX

  95. @SPEAKERNAME/#SMX