Retiring and removing
content… at scale.
William Kerslake
University of Cambridge
@williamkerslake
Speakerdeck.com/wk252
Slide 2
Slide 2 text
An overgrown site
Slide 3
Slide 3 text
Duplicate
content risk
Slide 4
Slide 4 text
A complex web
of institutions.
Slide 5
Slide 5 text
University
collects
everything
Slide 6
Slide 6 text
Trojan Room
Coffee Pot’s
content lifecycle
Slide 7
Slide 7 text
Our web estate
is huge and
growing
[Chart: size of the web estate by year, 2008 to 2024; vertical axis from 0 to 3,000]
Slide 8
Slide 8 text
Basics
Websites are
locally optimised
Slide 9
Slide 9 text
2,500
sites
with over 2 million
web pages
Slide 10
Slide 10 text
1,300+ editors
Slide 11
Slide 11 text
SEO
Make it easier for
search engines to
find and understand.
Slide 12
Slide 12 text
Make it easier for users, and it is easier
for search engines.
Slide 13
Slide 13 text
Try it
I want to study
English at Cambridge.
What will it cost?
Slide 14
Slide 14 text
No content
Slide 15
Slide 15 text
£9,250
Maybe.
Slide 16
Slide 16 text
Google gets it wrong
Slide 17
Slide 17 text
Where to start?
Prioritise and reduce
Slide 18
Slide 18 text
Find
content
owners
Slide 19
Slide 19 text
Don’t go
alone
Slide 20
Slide 20 text
Get data
Google Analytics 4 on
everything.
Slide 21
Slide 21 text
Tag
Manager
Lookup Table variable
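A Lookup Table variable maps an input value to an output value and falls back to a default when nothing matches. The sketch below imitates that behaviour in Python, assuming the input is the page hostname and the output is a GA4 measurement ID. The hostnames, the IDs and the function name are hypothetical placeholders, not values from the talk.

```python
# Hypothetical hostname -> GA4 measurement ID table. The hostnames and IDs
# are placeholders for illustration, not real properties.
LOOKUP_TABLE = {
    "www.example.ac.uk": "G-AAAAAAA",
    "library.example.ac.uk": "G-BBBBBBB",
}
DEFAULT_ID = "G-ZZZZZZZ"  # assumed catch-all property for unlisted sites


def measurement_id_for(hostname: str) -> str:
    """Return the measurement ID for a hostname, or the default if unlisted."""
    # Lower-casing is a choice made for this sketch so lookups are predictable.
    return LOOKUP_TABLE.get(hostname.lower(), DEFAULT_ID)
```

With a default row in place, a newly launched site still reports somewhere even before anyone adds it to the table.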
Slide 22
Slide 22 text
Inventory
and audit
Slide 23
Slide 23 text
Start crawling
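The core step of any crawl is pulling the links out of a page and keeping the ones that stay on the same site. A minimal sketch of that step using only the Python standard library follows. It takes HTML as a string so that it runs offline; fetching, politeness delays and robots.txt handling are left out, and the names `LinkCollector` and `internal_links` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(page_url: str, html: str) -> set[str]:
    """Return absolute, fragment-free links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links against the page, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host:
            links.add(absolute)
    return links
```

A full crawl would repeat this for each newly discovered URL while keeping a set of pages already visited.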
Slide 24
Slide 24 text
Be
ruthless
Slide 25
Slide 25 text
Additive
bias
Easy to add more, hard
to take away
Slide 26
Slide 26 text
Archive
Make it less scary for
people.
Slide 27
Slide 27 text
Clear
criteria
For last year {
  If views < 366 then delete
  Else review
}
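The rule on this slide can be sketched as a small Python function. The threshold of 366 comes from the slide; the function name and the returned labels are assumptions made for illustration.

```python
VIEW_THRESHOLD = 366  # from the slide: fewer than 366 views in the last year


def triage_by_views(views_last_year: int) -> str:
    """Apply the slide's rule: low-traffic pages are deleted, the rest reviewed."""
    return "delete" if views_last_year < VIEW_THRESHOLD else "review"


# Usage with made-up page paths and view counts:
pages = {"/old-event": 12, "/fees": 48000}
decisions = {path: triage_by_views(views) for path, views in pages.items()}
```

Writing the rule down this plainly means every editor gets the same answer for the same page.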
Slide 28
Slide 28 text
Redundant
Obsolete
Trivial
Slide 29
Slide 29 text
Obsolete…
As the Millennium approaches, the
Fitzwilliam Museum is opening a new
exhibition focusing on the apocalyptic
vision of a German artist five hundred
years ago.
Slide 30
Slide 30 text
Redundant…
Slide 31
Slide 31 text
Trivial… last 12 months
Slide 32
Slide 32 text
Criteria
need
data
For published > 10 years {
  If ROT = true then delete
  Else review
}
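This second rule needs two pieces of data per page: a publication date and a ROT (redundant, obsolete, trivial) flag. Here is a minimal Python sketch, assuming both have already been collected. The slide does not say what happens to pages published within the last ten years, so the "out of scope" label is an assumption, as are the function names.

```python
from datetime import date


def ten_years_before(today: date) -> date:
    """Return the date ten years before today."""
    try:
        return today.replace(year=today.year - 10)
    except ValueError:
        # 29 February with no counterpart ten years earlier: use 28 February.
        return today.replace(year=today.year - 10, day=28)


def triage_old_page(published: date, is_rot: bool, today: date) -> str:
    """Apply the slide's rule to pages published more than ten years ago."""
    if published >= ten_years_before(today):
        return "out of scope"  # assumption: the rule only covers older pages
    return "delete" if is_rot else "review"
```

Passing `today` in explicitly keeps the rule repeatable, so an audit run next month can be compared with one run now.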
Slide 33
Slide 33 text
=IMPORTXML("URL","xpath")
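The spreadsheet formula fetches a page and returns whatever matches an XPath expression. The extraction half of that idea can be sketched in Python with the standard library. This version takes markup as a string rather than fetching a URL, and it assumes the markup is well formed. `xml.etree.ElementTree` supports only a subset of XPath, so real pages would likely need a more forgiving HTML parser. The function name `extract` is made up for this example.

```python
import xml.etree.ElementTree as ET


def extract(markup: str, path: str) -> list[str]:
    """Return the text of every element matching path in well-formed markup."""
    root = ET.fromstring(markup)
    return [(element.text or "").strip() for element in root.findall(path)]


# Usage with a made-up page:
page = (
    "<html><head><title>Fees</title></head>"
    "<body><h1>Tuition fees</h1><p>One</p><p>Two</p></body></html>"
)
headings = extract(page, ".//h1")
paragraphs = extract(page, ".//p")
```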
Slide 34
Slide 34 text
Custom crawl
Screaming Frog's custom extraction and the GA4 API
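Combining a crawl with analytics comes down to joining two tables on URL. The sketch below does that join in Python for two CSV exports held as strings. The column names (`url`, `last_updated`, `views`) are hypothetical and are not the actual headers produced by the crawler or by GA4; treating a URL with no analytics row as zero views is also an assumption.

```python
import csv
import io


def merge_crawl_with_views(crawl_csv: str, views_csv: str) -> list[dict]:
    """Attach a view count to every crawled URL; URLs with no views row get 0."""
    views = {
        row["url"]: int(row["views"])
        for row in csv.DictReader(io.StringIO(views_csv))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(crawl_csv)):
        merged.append({**row, "views": views.get(row["url"], 0)})
    return merged
```

Each merged row then carries both the extracted page data and the traffic figure, which is what the deletion criteria need.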
Slide 35
Slide 35 text
Crawl output
Slide 36
Slide 36 text
Export
Use criteria and conditional formatting to target
Slide 37
Slide 37 text
Take out
easy
targets
Slide 38
Slide 38 text
Redirect
users
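One possible redirect policy, offered here purely as an illustration and not taken from the talk, is to send each removed URL to its nearest surviving parent page. The Python sketch below assumes section pages end with a trailing slash and falls back to the site root when no parent survives. The function name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(removed_url: str, live_urls: set[str]) -> str:
    """Walk up the removed URL's path until a live page is found."""
    parts = urlsplit(removed_url)
    segments = [segment for segment in parts.path.split("/") if segment]
    while segments:
        segments.pop()  # step up one level
        path = "/" + "/".join(segments) + ("/" if segments else "")
        candidate = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
        if candidate in live_urls:
            return candidate
    # No surviving parent: fall back to the site root.
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
```

The resulting old-to-new pairs can then be loaded into whatever redirect mechanism the web server or CMS provides.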
Slide 39
Slide 39 text
Success?
Slide 40
Slide 40 text
URL and domain
strategy
Slide 41
Slide 41 text
Let’s get fixing.
William Kerslake
University of Cambridge
@williamkerslake
Speakerdeck.com/wk252