May 28, 2014


  1. ndb

    “NDB is a better datastore API for the Google App Engine Python runtime.”
  2. Part 1 of 2

  3. Why ndb?

    1. Less stupid by default
    2. More flexible queries
    3. Tasklets with autobatching
  4. Less stupid by default

    With db:

        class UserVideo(db.Model):
            user_id = db.StringProperty()
            video = db.ReferenceProperty(Video)

        user_video = UserVideo.get_for_video_and_user_data(
            video, user_data)
        return jsonify(user_video)  # slow
  5. Less stupid by default

    With ndb:

        class UserVideo(ndb.Model):
            user_id = ndb.StringProperty()
            video = ndb.KeyProperty(kind=Video)

        user_video = UserVideo.get_for_video_and_user_data(
            video, user_data)
        return jsonify(user_video)  # not slow!
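
The slides don't say why the db version is slow. One plausible reading: reading a db.ReferenceProperty attribute fetches the referenced entity from the datastore, so a serializer that touches every property pays a hidden round trip per object, while an ndb.KeyProperty holds only a Key and fetches nothing until you call `.get()`. The toy below models just that difference with a fetch counter. `CountingStore`, `DbStyleUserVideo`, `NdbStyleUserVideo` and `to_dict` are hypothetical stand-ins, not the db or ndb classes, and whether `jsonify` reads the reference is an assumption.

```python
class CountingStore:
    """Hypothetical in-memory datastore that counts fetch round trips."""

    def __init__(self, entities):
        self.entities = dict(entities)
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return self.entities[key]


class DbStyleUserVideo:
    """Mimics db.ReferenceProperty: reading `video` fetches the referenced entity."""

    def __init__(self, store, user_id, video_key):
        self._store = store
        self.user_id = user_id
        self._video_key = video_key
        self._video = None

    @property
    def video(self):
        # Hidden round trip on first read; the result is then cached on the instance.
        if self._video is None:
            self._video = self._store.get(self._video_key)
        return self._video


class NdbStyleUserVideo:
    """Mimics ndb.KeyProperty: `video` holds only the key, nothing is fetched."""

    def __init__(self, user_id, video_key):
        self.user_id = user_id
        self.video = video_key


def to_dict(user_video):
    """Hypothetical serializer that reads every field, as a jsonify helper might."""
    return {"user_id": user_video.user_id, "video": user_video.video}


store = CountingStore({"v1": {"title": "Intro"}})
db_rows = [to_dict(DbStyleUserVideo(store, "u%d" % i, "v1")) for i in range(100)]
print(store.fetches)  # → 100: one hidden fetch per serialized row
ndb_rows = [to_dict(NdbStyleUserVideo("u%d" % i, "v1")) for i in range(100)]
print(store.fetches)  # → still 100: key-only references fetch nothing
```

With ndb, the cost becomes explicit: you decide when (and whether) to turn the key into an entity.
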
  6. More flexible queries

    ndb lets you build filters using ndb.AND and ndb.OR:

        questions = (Feedback.query()
                     .filter(Feedback.type == 'question')
                     .filter(Feedback.target == video_key)
                     .filter(ndb.OR(
                         Feedback.is_visible_to_public == True,
                         Feedback.author_user_id == current_id))
                     .fetch(1000))

    Magic happens.
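
Part of the "magic" in slide 6: the datastore has no native OR, so ndb rewrites the filter into disjunctive normal form (an OR of ANDs) and runs one subquery per AND term, merging the results. The sketch below shows only that rewriting step on plain strings; `And`, `Or` and `to_dnf` are hypothetical helpers, not ndb's internal node classes.

```python
from itertools import product


def And(*terms):
    """Hypothetical AND node over leaf filters (strings) or other nodes."""
    return ("and", terms)


def Or(*terms):
    """Hypothetical OR node over leaf filters (strings) or other nodes."""
    return ("or", terms)


def to_dnf(node):
    """Rewrite a filter tree as a list of conjunctions whose OR equals the tree."""
    if isinstance(node, str):
        return [(node,)]  # a single leaf filter is a one-term conjunction
    op, terms = node
    child_dnfs = [to_dnf(term) for term in terms]
    if op == "or":
        # OR of ORs: concatenate every child's conjunctions.
        return [conj for dnf in child_dnfs for conj in dnf]
    # AND distributes over OR: choose one conjunction from each child.
    return [sum(combo, ()) for combo in product(*child_dnfs)]


query_filter = And(
    "type == 'question'",
    "target == video_key",
    Or("is_visible_to_public == True", "author_user_id == current_id"),
)
for conj in to_dnf(query_filter):
    print(" AND ".join(conj))
# type == 'question' AND target == video_key AND is_visible_to_public == True
# type == 'question' AND target == video_key AND author_user_id == current_id
```

Each conjunction is a plain query the datastore can run on its own. An IN filter such as `Feedback.in_reply_to.IN(question_keys)` expands the same way, into an OR of equality filters, which is why it also becomes a multi-query.
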
  7. Performance

    The datastore is slow. How can we speed things up?

    - Batch operations together
    - Do things in parallel
    - Avoid the datastore
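
Two of the bullets in slide 7 can be made concrete by counting round trips. Fetching N entities one at a time costs N round trips, one batch call (in ndb, `ndb.get_multi`) costs one, and entities already held in a cache cost none. The toy below only counts simulated round trips; `FakeDatastore`, `get_one_by_one` and `get_batched_with_cache` are hypothetical.

```python
class FakeDatastore:
    """Hypothetical in-memory datastore that counts simulated round trips."""

    def __init__(self, entities):
        self.entities = dict(entities)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.entities.get(key)

    def get_multi(self, keys):
        # One round trip no matter how many keys are requested.
        self.round_trips += 1
        return [self.entities.get(k) for k in keys]


def get_one_by_one(store, keys):
    return [store.get(k) for k in keys]


def get_batched_with_cache(store, keys, cache):
    """Serve hits from `cache`; fetch all distinct misses in a single batch call."""
    misses = list(dict.fromkeys(k for k in keys if k not in cache))
    if misses:
        for k, v in zip(misses, store.get_multi(misses)):
            cache[k] = v
    return [cache[k] for k in keys]


store = FakeDatastore({i: "entity %d" % i for i in range(50)})
keys = list(range(50))
get_one_by_one(store, keys)
print(store.round_trips)  # → 50
cache = {}
get_batched_with_cache(store, keys, cache)
print(store.round_trips)  # → 51: one batch for all 50 misses
get_batched_with_cache(store, keys, cache)
print(store.round_trips)  # → 51: every key now comes from the cache
```
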
  8. Tasklets and autobatching

        def get_user_exercise_cache(user_data):
            uec = UEC.get_for_user_data(user_data)
            if not uec:
                user_exercises = UE.get_all(user_data)
                uec = UEC.build(user_exercises)
            return uec

        def get_all_uecs(user_datas):
            return map(get_user_exercise_cache, user_datas)
  9. Tasklets and autobatching

        @ndb.tasklet
        def get_user_exercise_cache_async(user_data):
            uec = yield UEC.get_for_user_data_async(user_data)
            if not uec:
                # A tasklet may only yield futures (or lists of them), so this
                # assumes UE.get_all returns a future here.
                user_exercises = yield UE.get_all(user_data)
                uec = UEC.build(user_exercises)
            raise ndb.Return(uec)

        @ndb.synctasklet
        def get_all_uecs(user_datas):
            uecs = yield map(get_user_exercise_cache_async, user_datas)
            raise ndb.Return(uecs)
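
Autobatching in slide 9 works because tasklets yield instead of blocking: while one tasklet waits, the others keep running, so their pending gets can be combined into one RPC. The toy scheduler below illustrates only that idea with plain Python generators. It is not ndb's implementation (real tasklets yield Futures and ndb's own event loop does the batching), and `Get`, `BatchingStore`, `run_tasklets` and `get_cache_tasklet` are hypothetical names.

```python
class Get:
    """A request for one key; a tasklet yields this instead of blocking."""

    def __init__(self, key):
        self.key = key


class BatchingStore:
    """Hypothetical in-memory store that counts batch RPCs."""

    def __init__(self, entities):
        self.entities = dict(entities)
        self.rpcs = 0

    def get_multi(self, keys):
        self.rpcs += 1
        return [self.entities.get(k) for k in keys]


def run_tasklets(store, tasklets):
    """Run generator tasklets together, batching each round's Gets into one RPC."""
    results = [None] * len(tasklets)
    # (index, generator, value to send on the next resume)
    runnable = [(i, gen, None) for i, gen in enumerate(tasklets)]
    while runnable:
        waiting = []  # (index, generator, key) for tasklets blocked on a Get
        for i, gen, value in runnable:
            try:
                request = gen.send(value)
            except StopIteration as stop:
                results[i] = stop.value  # the tasklet's return value
                continue
            waiting.append((i, gen, request.key))
        if not waiting:
            break
        # Autobatching: every key requested this round goes out in one RPC.
        values = store.get_multi([key for _, _, key in waiting])
        runnable = [(i, gen, v) for (i, gen, _), v in zip(waiting, values)]
    return results


def get_cache_tasklet(user_id):
    """Hypothetical analogue of get_user_exercise_cache_async."""
    cached = yield Get(("cache", user_id))
    if cached is None:
        raw = yield Get(("raw", user_id))
        cached = "built from " + raw
    return cached


store = BatchingStore({("cache", 1): "c1", ("raw", 2): "r2", ("raw", 3): "r3"})
print(run_tasklets(store, [get_cache_tasklet(u) for u in (1, 2, 3)]))
# → ['c1', 'built from r2', 'built from r3']
print(store.rpcs)  # → 2
```

Run one at a time, the same three lookups would take five round trips (one for the cache hit, two each for the misses); interleaved, they take two, no matter how many users are in the list.
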
  10. Moral

    ndb is awesome. Use it.

  11. Part 2 of 2

  12. The sad truth

    ndb isn't perfect.

  13. Mysterious errors

    You heard from Marcia about this gem back in March:

        TypeError: '_BaseValue' object is not subscriptable
  14. Q: What's worse than code that doesn't work at all?

    A: Code that mostly works but breaks in subtle ways.
  15. Secret slowness #1

    Multi-queries, with IN and OR:

        answers = (Feedback.query()
                   .filter(Feedback.type == 'answer')
                   .filter(Feedback.in_reply_to.IN(question_keys))
                   .fetch(1000))

    Doesn't run in parallel!
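  16. Secret slowness #1

    A not-horribly-slow multi-query:

        answers = (Feedback.query()
                   .filter(Feedback.type == 'answer')
                   .filter(Feedback.in_reply_to.IN(question_keys))
                   .order(Feedback.__key__)
                   .fetch(1000))

    The added sort order is what helps: without one, ndb runs the subqueries one after another; with one, it runs them concurrently and merges their sorted results.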
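
Why an order helps in slide 16, as a toy: if every subquery returns results already sorted by key, all subqueries can run at once and their streams can be merged in key order, dropping duplicates (one entity can match several subqueries, e.g. when a repeated property matches more than one IN value). Here a thread pool stands in for ndb's asynchronous RPCs and `heapq.merge` does the merge; `run_subquery`, `ordered_multi_query` and the row layout are hypothetical, not ndb's code.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def run_subquery(rows, value):
    """Hypothetical subquery: rows whose in_reply_to contains value, sorted by key."""
    return sorted((r for r in rows if value in r["in_reply_to"]), key=lambda r: r["key"])


def ordered_multi_query(rows, values):
    """Emulate `in_reply_to IN values` ordered by key."""
    with ThreadPoolExecutor() as pool:
        # Each subquery comes back sorted, so they can all run at the same time.
        streams = list(pool.map(lambda v: run_subquery(rows, v), values))
    merged = []
    last_key = None
    for row in heapq.merge(*streams, key=lambda r: r["key"]):
        if row["key"] != last_key:  # equal keys arrive adjacent, so this dedupes
            merged.append(row)
            last_key = row["key"]
    return merged


rows = [
    {"key": 3, "in_reply_to": ["q1"]},
    {"key": 1, "in_reply_to": ["q2"]},
    {"key": 2, "in_reply_to": ["q1", "q2"]},
]
print([r["key"] for r in ordered_multi_query(rows, ["q1", "q2"])])  # → [1, 2, 3]
```

Without a shared sort order there is no cheap way to interleave the streams and spot duplicates, which is one reason to fall back to running subqueries in sequence.
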
  17. Secret slowness #2

    Query iterators:

        query = Feedback.query().filter(
            Feedback.topic_ids == 'algebra')
        questions = []
        for q in query.iter(batch_size=20):
            if q.is_visible_to(user_data):
                questions.append(q)
                if len(questions) >= 10:
                    break
  18. Secret slowness #2

    Solution? Sometimes you have to do it by hand.
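
The slide doesn't show its "by hand" fix; one possible version is to fetch explicit pages and filter them yourself, stopping once you have enough, so that each batch arrives in one call instead of being pulled through the iterator one entity at a time. In ndb, `Query.fetch_page(page_size, start_cursor=...)` returns a `(results, cursor, more)` triple; the `FakeQuery` below imitates only that shape over an in-memory list, and `FakeQuery`, `first_visible` and the default page size are hypothetical.

```python
class FakeQuery:
    """Hypothetical stand-in mimicking the shape of ndb's Query.fetch_page."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def fetch_page(self, page_size, start_cursor=None):
        # Returns (results, cursor, more); here the cursor is just an offset.
        self.calls += 1
        start = start_cursor or 0
        end = start + page_size
        return self.rows[start:end], end, end < len(self.rows)


def first_visible(query, is_visible, want, page_size=50):
    """Collect up to `want` rows passing is_visible, one page per call."""
    found = []
    cursor, more = None, True
    while more and len(found) < want:
        page, cursor, more = query.fetch_page(page_size, start_cursor=cursor)
        for row in page:
            if is_visible(row):
                found.append(row)
                if len(found) == want:
                    break
    return found


query = FakeQuery(list(range(1000)))
print(first_visible(query, lambda n: n % 7 == 0, want=10))
# → [0, 7, 14, 21, 28, 35, 42, 49, 56, 63]
print(query.calls)  # → 2
```

The page size is the tuning knob: bigger pages mean fewer calls but more wasted reads when the visible items you need are near the front.
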
  19. Moral

    ndb isn't perfect. Pay attention. Profile your code.

  20. The End