Slide 9
• http://laptop/offsets/Python
• Get the byte offset of every title starting with
‘Python’
• Get the byte offset of the following page via
offsets.searchsorted()
• Adjust them (start – 7 bytes, end – 11 bytes)
• …and return a JSON of [ [‘’, starting_byte,
ending_byte] ]
• Web client then does an HTTP range request against
/xml to grab the relevant XML fragment
• Or, http://laptop/wiki/Python for exact lookup
Does all of the above but issues the range request as
well if there’s an exact hit, returning the fragment in one
• Non-trivial app that does something half-useful; a good use
case
• Exercises external C modules (NumPy, Cythonized datrie,
etc.)
• Not something you could easily do with existing solutions
(multiprocessing) (Could you?)
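
The /offsets lookup described in the first bullets could be sketched roughly as below. This is a minimal illustration, not the talk's actual code. The plain dict stands in for the datrie prefix lookup, and the sample titles, offsets, end-of-file sentinel, and the function names lookup_prefix and range_header are all hypothetical. Putting the title in the first element of each JSON entry is also an assumption, since the slide leaves that slot empty.

```python
import json

import numpy as np

# Hypothetical stand-in for the datrie: page title -> byte offset in the dump.
title_to_offset = {
    "Perl": 5000,
    "Python": 1000,
    "Python (genus)": 3000,
    "Ruby": 8000,
}

# Sorted array of every page offset, plus an assumed end-of-file sentinel
# so that the last page also has a "next" offset.
offsets = np.array(sorted(title_to_offset.values()) + [9000], dtype=np.int64)

# Adjustments quoted on the slide: start - 7 bytes, end - 11 bytes.
START_ADJUST = 7
END_ADJUST = 11


def lookup_prefix(prefix):
    """Return [[title, starting_byte, ending_byte], ...] for a title prefix."""
    hits = sorted(
        (title, off)
        for title, off in title_to_offset.items()
        if title.startswith(prefix)
    )
    if not hits:
        return []
    starts = np.array([off for _, off in hits], dtype=np.int64)
    # side="right" gives the index of the first offset strictly greater
    # than each start, i.e. where the following page begins.
    next_idx = offsets.searchsorted(starts, side="right")
    ends = offsets[next_idx]
    return [
        [title, int(start - START_ADJUST), int(end - END_ADJUST)]
        for (title, _), start, end in zip(hits, starts, ends)
    ]


def range_header(starting_byte, ending_byte):
    """Build the Range header value a client might send to /xml.

    HTTP byte ranges are inclusive at both ends. The slide does not say
    whether ending_byte is inclusive, so it is passed through unchanged.
    """
    return f"bytes={starting_byte}-{ending_byte}"


if __name__ == "__main__":
    results = lookup_prefix("Python")
    print(json.dumps(results))
    for _title, start, end in results:
        print(range_header(start, end))
```

A client would send the resulting header value as Range on its request to /xml. The /wiki variant would presumably do the same read on the server when the title matches exactly.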