Slide 1

Slide 1 text

User behaviour modeling for data prefetching in web applications
Kacper Łukawski

Slide 2

Slide 2 text

The problem
Users tend to use the application in a similar manner over time.
The majority of modern web applications use multiple HTTP calls to fully generate the content.
Each request increases the total latency perceived by a particular user.

Slide 3

Slide 3 text

Mobile applications
The network connection is not always good.
Applications may become useless without access to some external resources.
It would be great to have the data locally and synchronize whenever possible.

Slide 4

Slide 4 text

User behaviour modeling
Get only the resources that the user requested.
Fetch everything that is available.
Try to predict what might be useful in the near future.

Slide 5

Slide 5 text

Content-based methods
Statistical modeling
Predicting the user's interests

Slide 8

Slide 8 text

N-gram statistical model
An NLP-related technique.
Performs statistical analysis of the language.
Uses fixed-length subsequences as the base unit of modeling.
Builds statistics of the collocations.

Slide 9

Slide 9 text

N-gram – an example
“The man who does not read has no advantage over the man who cannot read”
The man 2        Has no
Man who 2        No advantage
Who does         Advantage over
Does not         Over the
Not read         Who cannot
Read has         Cannot read
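
The counts in this example can be reproduced with a short Python sketch. The function name `ngram_counts` is illustrative, and lowercasing plus whitespace tokenization are assumptions made only for this example.

```python
from collections import Counter

def ngram_counts(tokens, n=2):
    """Count every contiguous subsequence of exactly n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

sentence = "The man who does not read has no advantage over the man who cannot read"
tokens = sentence.lower().split()  # assumption: case-insensitive, split on spaces
counts = ngram_counts(tokens, n=2)

# Print only the bigrams that occur more than once.
for gram, count in counts.items():
    if count > 1:
        print(" ".join(gram), count)
```

Only "the man" and "man who" occur twice; every other bigram in the sentence occurs once.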

Slide 10

Slide 10 text

N-gram+
Statistics are built in a similar manner to the standard n-gram architecture.
This time, however, n is the minimal length of a sequence, not a fixed length.
Such a strategy usually leads to more accurate predictions.

Slide 11

Slide 11 text

N-gram+ – an example
“The man who does not read has no advantage over the man who cannot read”
Sequences of length 2: The man, Man who, Who does, Does not, Not read, Read has, Has no, No advantage, Advantage over, Over the, Who cannot, Cannot read
Longer sequences:
The man who does not read has no advantage over the man who cannot read
The man who does not read cannot
Man who does not read has no advantage over the man who cannot read
...
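
A minimal sketch of the N-gram+ idea, assuming that statistics are kept for every subsequence of length n or more and that prediction uses the longest known context first. The names `build_ngram_plus` and `predict_next`, and the longest-context rule itself, are assumptions for illustration and are not taken from the slides.

```python
from collections import Counter, defaultdict

def build_ngram_plus(tokens, min_n=2, max_n=None):
    """Map each context to a Counter of the tokens that followed it.

    Every subsequence of length min_n..max_n is used, so n is a
    minimal length rather than a fixed one.
    """
    max_n = max_n or len(tokens)
    model = defaultdict(Counter)
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *context, nxt = tokens[i:i + n]
            model[tuple(context)][nxt] += 1
    return model

def predict_next(model, history):
    """Try the longest suffix of the history first, then shorter ones."""
    for start in range(len(history)):
        context = tuple(history[start:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None

sentence = "The man who does not read has no advantage over the man who cannot read"
tokens = sentence.lower().split()
model = build_ngram_plus(tokens, min_n=2)

# "who" alone is followed by both "does" and "cannot";
# the longer context "over the man who" is unambiguous.
print(predict_next(model, ["over", "the", "man", "who"]))
```

With plain bigrams the context "who" is ambiguous in this sentence, which is the kind of case where longer sequences can help.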

Slide 12

Slide 12 text

NASA dataset
A collection of Apache HTTP server logs
Delivers the following details of the requests:
➔ Host
➔ Timestamp
➔ Request (URL and method)
➔ HTTP reply code
➔ Number of bytes in the response
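
A hedged sketch of how one log entry with these fields could be parsed in Python. It assumes the entries follow the Common Log Format; the sample line, the regular expression and the name `parse_log_line` are illustrative.

```python
import re
from datetime import datetime

# Assumption: host, two unused fields, [timestamp], "request", status, bytes.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return the five fields listed above as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    method, _, rest = match["request"].partition(" ")
    # Drop the trailing protocol version ("HTTP/1.0") when it is present.
    url = rest.rsplit(" ", 1)[0] if " " in rest else rest
    size = match["size"]
    return {
        "host": match["host"],
        "timestamp": datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "url": url,
        "status": int(match["status"]),
        "bytes": 0 if size == "-" else int(size),
    }

# Illustrative line in Common Log Format.
sample = '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245'
record = parse_log_line(sample)
print(record["method"], record["url"], record["status"])
```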

Slide 13

Slide 13 text

NASA dataset

Slide 14

Slide 14 text

NASA dataset - results
N-gram: ~50% accuracy
N-gram+: ~60% accuracy

Slide 15

Slide 15 text

A proposal of extension
Typical web applications are built on top of the HTTP protocol.
HTTP defines 9 methods, each with a different meaning (GET, HEAD, PUT, POST, DELETE, OPTIONS, TRACE, CONNECT, PATCH).

Slide 16

Slide 16 text

A proposal of extension – an example
Endpoints: GET /order, GET /product/{product_id}
GET /order
[ {'id': 1, 'products': [{'id': 100}, {'id': 200}]}, {'id': 2, 'products': [{'id': 300}]} ]
GET /product/100
{ 'id': 100, 'name': 'lorem' }
GET /product/200
{ 'id': 200, 'name': 'ipsum' }
GET /product/300
{ 'id': 300, 'name': 'dolor' }
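
The example suggests that the product requests can be derived from the order response. A small Python sketch of that step, assuming the response has already been decoded into lists and dicts; `product_urls_to_prefetch` is a hypothetical helper name.

```python
ORDERS_RESPONSE = [
    {"id": 1, "products": [{"id": 100}, {"id": 200}]},
    {"id": 2, "products": [{"id": 300}]},
]

def product_urls_to_prefetch(orders, template="/product/{product_id}"):
    """Build one URL per distinct product id found in the order list."""
    seen = set()
    urls = []
    for order in orders:
        for product in order.get("products", []):
            product_id = product["id"]
            if product_id not in seen:  # avoid prefetching the same product twice
                seen.add(product_id)
                urls.append(template.format(product_id=product_id))
    return urls

urls = product_urls_to_prefetch(ORDERS_RESPONSE)
print(urls)
```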

Slide 17

Slide 17 text

A proposal of extension - relations
Typical API endpoint: GET /resource/{id}/?foo={foo}&bar={bar}
Between two HTTP actions from one session:
➔ No relation at all
➔ Request or response tokens of the second action are taken from the first one
➔ Tokens are filled using some external knowledge
➔ The value of a token in the second action might be calculated from the values of the first one
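
One possible way to extract the tokens of a concrete URL and to spot the second kind of relation (values taken from the first action), sketched in Python. The helper names and the matching rule (a token may not contain `/`, `?` or `&`) are assumptions for illustration.

```python
import re

def template_to_regex(template):
    """Turn '/resource/{id}/?foo={foo}' into a regex with one named group per token."""
    parts = re.split(r"\{(\w+)\}", template)  # literal, name, literal, name, ...
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += f"(?P<{part}>[^/?&]+)"
    return re.compile(f"^{pattern}$")

def extract_tokens(template, url):
    """Return {token name: value} or None when the URL does not fit the template."""
    match = template_to_regex(template).match(url)
    return match.groupdict() if match else None

def copied_tokens(first_tokens, second_tokens):
    """Names of tokens in the second action whose value already appeared in the first."""
    first_values = set(first_tokens.values())
    return {name for name, value in second_tokens.items() if value in first_values}

TEMPLATE = "/resource/{id}/?foo={foo}&bar={bar}"
tokens = extract_tokens(TEMPLATE, "/resource/42/?foo=a&bar=b")
print(tokens)
print(copied_tokens({"id": "42"}, tokens))
```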

Slide 19

Slide 19 text

A proposal of extension - contd
Let's treat each HTTP request and response as a single action performed by the user.
Try to find the relations between the actions that often take place in a similar order.
Use only the actions that do not change anything (GET, HEAD).
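
A brief Python sketch of the "action" notion described here, keeping only the read-only methods. The `Action` class and its field names are hypothetical.

```python
from dataclasses import dataclass

SAFE_METHODS = {"GET", "HEAD"}  # the methods named on this slide

@dataclass(frozen=True)
class Action:
    """One request together with its response, treated as a single user action."""
    method: str
    url: str
    response: object = None

def read_only_actions(actions):
    """Drop every action whose method may change server state."""
    return [action for action in actions if action.method.upper() in SAFE_METHODS]

session = [
    Action("GET", "/order"),
    Action("POST", "/order"),
    Action("get", "/product/100"),
]
kept = read_only_actions(session)
print([action.url for action in kept])
```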

Slide 20

Slide 20 text

A proposal of extension – an algorithm
Assign the HTTP actions to the HTTP endpoints that were used to process them.
Tokenize each request and response.
Using some n-gram-like architecture, try to find the relations within the subsequences (request/response → request).
Collect the statistics of token values in the requests.
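
The steps above might be sketched as follows. This is a simplified illustration under several assumptions: actions are already assigned to endpoints and tokenized into flat dicts, only pairs of consecutive actions are considered (n = 2), and the only relation detected is equality of values. The name `train` and the data layout are hypothetical.

```python
from collections import Counter, defaultdict

def train(sessions):
    """sessions: list of sessions; each session is a list of (endpoint, tokens) pairs."""
    transitions = defaultdict(Counter)  # previous endpoint -> counts of next endpoint
    relations = defaultdict(Counter)    # (prev, next) endpoint -> counts of (next token, prev token)
    value_stats = defaultdict(Counter)  # (endpoint, token) -> counts of observed values
    for session in sessions:
        for endpoint, tokens in session:
            for name, value in tokens.items():
                value_stats[(endpoint, name)][value] += 1
        for (prev_ep, prev_tokens), (next_ep, next_tokens) in zip(session, session[1:]):
            transitions[prev_ep][next_ep] += 1
            for next_name, next_value in next_tokens.items():
                for prev_name, prev_value in prev_tokens.items():
                    if next_value == prev_value:  # value copied from the previous action
                        relations[(prev_ep, next_ep)][(next_name, prev_name)] += 1
    return transitions, relations, value_stats

ORDER = "GET /order"
PRODUCT = "GET /product/{product_id}"
# Toy sessions; "products.id" stands for a value found in the /order response.
sessions = [
    [(ORDER, {"products.id": 100}), (PRODUCT, {"product_id": 100})],
    [(ORDER, {"products.id": 300}), (PRODUCT, {"product_id": 300})],
]
transitions, relations, value_stats = train(sessions)
print(transitions[ORDER].most_common(1))
```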

Slide 21

Slide 21 text

A proposal of extension – an algorithm
In the prediction phase, try to find the most probable endpoint that will be used in the next step.
Having the endpoint, fill the tokens using relations to previous actions.
If not all the tokens can be filled, use the statistics of the values.
Perform the action(s) using the predicted values of the tokens.
Send the aggregated responses at once.
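
A minimal Python sketch of the first three prediction steps, assuming small hand-written statistics. The function `predict_action` and the layout of `transitions`, `relations` and `value_stats` are hypothetical; performing the requests and aggregating the responses are not shown.

```python
import re
from collections import Counter

def predict_action(last_endpoint, last_tokens, transitions, relations, value_stats):
    """Predict the next request as a string, or None when nothing can be predicted."""
    candidates = transitions.get(last_endpoint)
    if not candidates:
        return None
    # Step 1: the most probable next endpoint.
    next_endpoint = candidates.most_common(1)[0][0]
    filled = {}
    for name in re.findall(r"\{(\w+)\}", next_endpoint):
        # Step 2: fill the token from a related token of the previous action.
        source = relations.get((last_endpoint, next_endpoint), {}).get(name)
        if source is not None and source in last_tokens:
            filled[name] = last_tokens[source]
            continue
        # Step 3: otherwise fall back to the most frequent value seen so far.
        stats = value_stats.get((next_endpoint, name))
        if not stats:
            return None
        filled[name] = stats.most_common(1)[0][0]
    return next_endpoint.format(**filled)

ORDER = "GET /order"
PRODUCT = "GET /product/{product_id}"
transitions = {ORDER: Counter({PRODUCT: 2})}
relations = {(ORDER, PRODUCT): {"product_id": "products.id"}}
value_stats = {(PRODUCT, "product_id"): Counter({100: 3, 300: 1})}

print(predict_action(ORDER, {"products.id": 200}, transitions, relations, value_stats))
print(predict_action(ORDER, {}, transitions, relations, value_stats))
```

The first call fills the token through the relation; the second has no related value and falls back to the value statistics.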

Slide 22

Slide 22 text

References
A. Georgakis, H. Li, “User behavior modeling and content based speculative web page prefetching”, Data & Knowledge Engineering 59 (2006) 770–788
M. Narvekar, S. S. Banu, “Predicting User’s Web Navigation Behavior Using Hybrid Approach”, Procedia Computer Science 45 (2015) 3–12
http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
https://en.wikipedia.org/wiki/N-gram