User behaviour modeling for data
prefetching in web applications
Kacper Łukawski
Slide 2
The problem
Users tend to use the
application in a similar manner
over time
Most modern web
applications use multiple HTTP
calls to fully generate the
content
Each request increases total
latency perceived by a
particular user
Slide 3
Mobile applications
Network connection is not always good
Applications may become useless without access to some
external resources
It would be great to have the data locally and synchronize whenever
possible
Slide 4
User behaviour modeling
Get only the resources that the user requested
Fetch everything that is available
Try to predict what might be useful in the near
future
Slide 5
Content based methods
Statistical modeling
Predicting user's interests
Slide 8
NLP related technique
Performs statistical analysis of the language
Uses fixed-length subsequences as the base unit
of modeling
Builds statistics of the collocations
N-gram statistical model
Slide 9
N-gram – an example
“The man who does not read has no advantage
over the man who cannot read”
Bigram      Count    Bigram           Count
The man     2        Has no           1
Man who     2        No advantage     1
Who does    1        Advantage over   1
Does not    1        Over the         1
Not read    1        Who cannot       1
Read has    1        Cannot read      1
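The counts in the example above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `ngrams` and the choice to lowercase and split on whitespace are assumptions made for illustration, not part of the original slides.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous subsequences of exactly n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("The man who does not read has no advantage "
            "over the man who cannot read")
# Assumption: case-insensitive tokens separated by whitespace.
tokens = sentence.lower().split()

# Count how often each bigram (n = 2) occurs in the sentence.
bigram_counts = Counter(ngrams(tokens, 2))

for bigram, count in bigram_counts.items():
    print(" ".join(bigram), count)
```

Only "the man" and "man who" occur twice; every other bigram occurs once.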
Slide 10
Statistics are built in a manner similar to the
standard n-gram architecture
But this time, n is the minimal length of a sequence,
not a fixed length
Such a strategy usually leads to more accurate
predictions
N-gram+
Slide 11
N-gram+ – an example
“The man who does not read has no advantage
over the man who cannot read”
Sequences of at least two words (excerpt):
The man
The man who does not read
The man who does not read has no advantage over the man who cannot read
Man who
Man who does not read has no advantage over the man who cannot read
Who does ...
Does not ...
Not read ...
Read has ...
Has no
No advantage
Advantage over
Over the
Who cannot
Cannot read
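One possible reading of N-gram+ is that statistics are collected for every contiguous subsequence of at least n tokens. The sketch below follows that reading; the function name `ngrams_plus` and the optional `max_n` cap are hypothetical and were added only for illustration.

```python
from collections import Counter


def ngrams_plus(tokens, min_n, max_n=None):
    """Return every contiguous subsequence with at least min_n tokens.

    max_n is an optional upper bound (an assumption of this sketch);
    by default subsequences up to the full length are returned.
    """
    max_n = len(tokens) if max_n is None else max_n
    return [
        tuple(tokens[i:i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


sentence = ("The man who does not read has no advantage "
            "over the man who cannot read")
tokens = sentence.lower().split()

# n = 2 is now the minimal length, not the fixed length.
counts = Counter(ngrams_plus(tokens, min_n=2))
print(counts[("the", "man")], counts[("the", "man", "who")])
```

In practice a cap such as `max_n` is likely to be needed, because the number of subsequences grows quadratically with the length of the session.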
Slide 12
A collection of Apache HTTP server logs
Delivers the following details of the requests:
Host
Timestamp
Request (URL and method)
HTTP reply code
Number of bytes in the response
NASA dataset
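The fields listed above match the Apache Common Log Format, so a single line can be split with a regular expression. The sketch below is illustrative only: the sample line is made up (it is not copied from the dataset), and the names `LOG_PATTERN` and `parse_log_line` are hypothetical.

```python
import re
from datetime import datetime

# host ident user [timestamp] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # The request field looks like: METHOD URL PROTOCOL
    method, _, rest = match["request"].partition(" ")
    url = rest.split(" ")[0] if rest else ""
    return {
        "host": match["host"],
        "timestamp": datetime.strptime(match["timestamp"],
                                       "%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "url": url,
        "status": int(match["status"]),
        # A size of "-" means that no bytes were sent.
        "size": 0 if match["size"] == "-" else int(match["size"]),
    }


# Made-up sample line in Common Log Format.
sample_line = ('host.example.org - - [01/Jul/1995:00:00:01 -0400] '
               '"GET /history/apollo/ HTTP/1.0" 200 6245')
record = parse_log_line(sample_line)
print(record["host"], record["method"], record["url"], record["status"])
```

Grouping the parsed records by host and sorting them by timestamp gives an approximation of user sessions, which can then be fed into the n-gram statistics.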
Slide 13
NASA dataset
Slide 14
NASA dataset - results
N-gram ~50% accuracy
N-gram+ ~60% accuracy
Slide 15
Typical web applications are built on top of the HTTP
protocol
HTTP defines 9 methods, each with a different
meaning (GET, HEAD, PUT, POST, DELETE,
OPTIONS, TRACE, CONNECT, PATCH)
A proposal of extension
Slide 16
GET /order
GET /product/{product_id}
A proposal of extension – an example
GET /order [
{'id': 1, 'products': [{'id': 100}, {'id': 200}]},
{'id': 2, 'products': [{'id': 300}]}
]
GET /product/100 {
'id': 100,
'name': 'lorem'
}
GET /product/200 {
'id': 200,
'name': 'ipsum'
}
GET /product/300 {
'id': 300,
'name': 'dolor'
}
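In this example every product id needed by the follow-up calls is already present in the response to GET /order. A minimal sketch of how the follow-up requests could be derived is shown below; the function name `predicted_product_requests` is hypothetical, and the response is assumed to be already decoded into Python lists and dicts.

```python
# Response to GET /order, assumed to be decoded from JSON.
orders = [
    {"id": 1, "products": [{"id": 100}, {"id": 200}]},
    {"id": 2, "products": [{"id": 300}]},
]


def predicted_product_requests(order_response):
    """Build the GET /product/{product_id} URLs implied by an order list."""
    urls = []
    for order in order_response:
        for product in order["products"]:
            url = f"/product/{product['id']}"
            # Skip products that were already scheduled for prefetching.
            if url not in urls:
                urls.append(url)
    return urls


for url in predicted_product_requests(orders):
    print("GET", url)
```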
Slide 17
Typical API endpoint:
GET /resource/{id}/?foo={foo}&bar={bar}
Between two HTTP actions from one session:
➔ No relation at all
➔ Request or response tokens of the second action are taken from
the first one
➔ Tokens are filled using some external knowledge
➔ Value of the token in the second action might be calculated
from the values of the first one
A proposal of extension - relations
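To look for such relations, the tokens first have to be extracted from concrete requests. The sketch below matches a URL against an endpoint template and then checks the simplest relation, a token value copied from the previous action. All names (`template_to_regex`, `extract_tokens`, `tokens_taken_from`) are hypothetical, and treating query parameters as tokens is an assumption of this example.

```python
import re
from urllib.parse import parse_qs, urlsplit


def template_to_regex(path_template):
    """Turn '/resource/{id}/' into a regex with one named group per token."""
    # re.split with a capturing group alternates literal text and token names.
    parts = re.split(r"\{(\w+)\}", path_template)
    pattern = ""
    for index, part in enumerate(parts):
        pattern += f"(?P<{part}>[^/]+)" if index % 2 else re.escape(part)
    return re.compile("^" + pattern + "$")


def extract_tokens(path_template, url):
    """Return token values from the path and query string, or None."""
    split = urlsplit(url)
    match = template_to_regex(path_template).match(split.path)
    if match is None:
        return None
    tokens = match.groupdict()
    for name, values in parse_qs(split.query).items():
        tokens[name] = values[0]
    return tokens


def tokens_taken_from(previous_tokens, current_tokens):
    """Names of current tokens whose value also occurs in the previous action."""
    previous_values = set(previous_tokens.values())
    return sorted(name for name, value in current_tokens.items()
                  if value in previous_values)


tokens = extract_tokens("/resource/{id}/", "/resource/42/?foo=a&bar=b")
print(tokens)
print(tokens_taken_from({"id": "42"}, {"resource_id": "42", "foo": "a"}))
```

The remaining relation types (external knowledge, calculated values) would need additional logic that is not covered by this sketch.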
Slide 19
Let's treat each HTTP request and response as a
single action performed by the user
Try to find the relations between the actions that
often take place in a similar order
Use only the actions that do not change anything
(GET, HEAD)
A proposal of extension - contd
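Restricting the model to actions that do not change anything could be done with a simple filter. The sketch below assumes that actions arrive as (session id, method, URL) tuples already ordered by time; this input format and the function name are hypothetical.

```python
from collections import defaultdict

# Only methods that do not modify state are safe to prefetch.
SAFE_METHODS = {"GET", "HEAD"}


def sessions_of_safe_actions(actions):
    """Group safe actions by session, preserving their original order."""
    sessions = defaultdict(list)
    for session_id, method, url in actions:
        if method in SAFE_METHODS:
            sessions[session_id].append((method, url))
    return dict(sessions)


actions = [
    ("s1", "GET", "/order"),
    ("s1", "POST", "/order"),
    ("s1", "GET", "/product/100"),
    ("s2", "HEAD", "/order"),
]
print(sessions_of_safe_actions(actions))
```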
Slide 20
Assign the HTTP actions to the HTTP endpoints
that were used to process them
Tokenize each request and response
Using some n-gram-like architecture, try to find
the relations within the subsequences
(request/response → request)
Collect the statistics of token values in the requests
A proposal of extension – an algorithm
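The training steps above might be sketched as follows. The example is deliberately simplified: it uses plain bigrams over endpoints rather than a full n-gram+ model, it assumes each action has already been assigned to an endpoint template and tokenized, and the function name `train` and the data layout are hypothetical.

```python
from collections import Counter, defaultdict


def train(sessions):
    """Collect endpoint transition counts and token value statistics.

    sessions: list of sessions, each a list of (endpoint, tokens) actions,
    where tokens maps token names to the values seen in the request.
    """
    transitions = defaultdict(Counter)   # endpoint -> next endpoint -> count
    token_values = defaultdict(Counter)  # (endpoint, token) -> value -> count
    for session in sessions:
        for endpoint, tokens in session:
            for name, value in tokens.items():
                token_values[(endpoint, name)][value] += 1
        # Consecutive pairs of actions form the endpoint bigrams.
        for (previous, _), (following, _) in zip(session, session[1:]):
            transitions[previous][following] += 1
    return transitions, token_values


product = "/product/{product_id}"
sessions = [
    [("/order", {}), (product, {"product_id": "100"})],
    [("/order", {}), (product, {"product_id": "100"})],
    [("/order", {}), (product, {"product_id": "300"})],
]
transitions, token_values = train(sessions)
print(transitions["/order"].most_common(1))
print(token_values[(product, "product_id")].most_common(1))
```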
Slide 21
In the prediction phase try to find the most
probable endpoint that will be used in the next step
Having the endpoint, fill the tokens using relations
to previous actions
If not all the tokens can be filled, use the statistics
of the values
Perform the action(s) using predicted values of
tokens
Send aggregated responses at once
A proposal of extension – an algorithm
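The prediction phase could then look roughly like the sketch below. It assumes statistics shaped as transition counts per endpoint and value counts per (endpoint, token) pair, and it reduces "relations to previous actions" to the simplest case, a token with the same name in the previous action. The function `predict_next` and the `token_names` mapping are hypothetical.

```python
from collections import Counter


def predict_next(transitions, token_values, token_names,
                 last_endpoint, last_tokens):
    """Predict the next endpoint and its token values, or return None."""
    candidates = transitions.get(last_endpoint)
    if not candidates:
        return None
    # 1. Most probable endpoint to follow the last one.
    endpoint = candidates.most_common(1)[0][0]
    filled = {}
    for name in token_names.get(endpoint, []):
        if name in last_tokens:
            # 2. Fill the token from the previous action when possible.
            filled[name] = last_tokens[name]
        else:
            # 3. Otherwise fall back to the most frequent observed value.
            stats = token_values.get((endpoint, name))
            if not stats:
                return None
            filled[name] = stats.most_common(1)[0][0]
    return endpoint, filled


product = "/product/{product_id}"
transitions = {"/order": Counter({product: 3})}
token_values = {(product, "product_id"): Counter({"100": 2, "300": 1})}
token_names = {product: ["product_id"]}

endpoint, filled = predict_next(transitions, token_values, token_names,
                                "/order", {})
url = endpoint.format(**filled)
print("GET", url)
```

The predicted requests would then be executed on the server side, and their responses sent together with the response to the original request.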
Slide 22
A. Georgakis, H. Li, “User behavior modeling and content
based speculative web page prefetching”, Data & Knowledge
Engineering 59 (2006) 770–788
M. Narvekar, S. S. Banu, “Predicting User’s Web Navigation
Behavior Using Hybrid Approach”, Procedia Computer
Science 45 (2015) 3–12
http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
https://en.wikipedia.org/wiki/N-gram
References