Slide 1

Slide 1 text

ndb

“NDB is a better datastore API for the Google App Engine Python runtime.”

Slide 2

Slide 2 text

Part 1 of 2

Slide 3

Slide 3 text

Why ndb?

1. Less stupid by default
2. More flexible queries
3. Tasklets with autobatching

Slide 4

Slide 4 text

Less stupid by default

With db:

    class UserVideo(db.Model):
        user_id = db.StringProperty()
        video = db.ReferenceProperty(Video)

    user_video = UserVideo.get_for_video_and_user_data(
        video, user_data)
    return jsonify(user_video)  # slow

Slide 5

Slide 5 text

Less stupid by default

With ndb:

    class UserVideo(ndb.Model):
        user_id = ndb.StringProperty()
        video = ndb.KeyProperty(kind=Video)

    user_video = UserVideo.get_for_video_and_user_data(
        video, user_data)
    return jsonify(user_video)  # not slow!
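The slides do not spell out why the db version is slow. A likely reason is that db.ReferenceProperty loads the referenced entity from the datastore the first time the attribute is read, so serializing a UserVideo quietly triggers an extra get, while ndb.KeyProperty only hands back the key. The toy below imitates that difference in plain Python. FakeDatastore, DbStyleUserVideo, NdbStyleUserVideo and to_dict are hypothetical stand-ins, not App Engine APIs.

```python
class FakeDatastore(object):
    """In-memory stand-in for the datastore that counts round trips."""

    def __init__(self, entities):
        self.entities = dict(entities)
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self.entities[key]


class DbStyleUserVideo(object):
    """Imitates db.ReferenceProperty: reading .video loads the entity."""

    def __init__(self, datastore, user_id, video_key):
        self._datastore = datastore
        self.user_id = user_id
        self._video_key = video_key

    @property
    def video(self):
        return self._datastore.get(self._video_key)  # hidden round trip


class NdbStyleUserVideo(object):
    """Imitates ndb.KeyProperty: .video is just the key."""

    def __init__(self, user_id, video_key):
        self.user_id = user_id
        self.video = video_key


def to_dict(obj, names=("user_id", "video")):
    """Naive serializer that reads every listed attribute (hypothetical)."""
    return dict((name, getattr(obj, name)) for name in names)


datastore = FakeDatastore({"video:1": {"title": "Intro to algebra"}})

db_style = to_dict(DbStyleUserVideo(datastore, "u1", "video:1"))
gets_after_db = datastore.gets

ndb_style = to_dict(NdbStyleUserVideo("u1", "video:1"))
gets_after_ndb = datastore.gets - gets_after_db

print(gets_after_db, gets_after_ndb)  # prints: 1 0
```

One hidden get per entity looks harmless, but it multiplies when a handler serializes a long list of entities.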

Slide 6

Slide 6 text

More flexible queries

ndb lets you build filters using ndb.AND and ndb.OR:

    questions = (Feedback.query()
        .filter(Feedback.type == 'question')
        .filter(Feedback.target == video_key)
        .filter(ndb.OR(
            Feedback.is_visible_to_public == True,
            Feedback.author_user_id == current_id))
        .fetch(1000))

Magic happens.

Slide 7

Slide 7 text

Performance

The datastore is slow. How can we speed things up?

- Batch operations together
- Do things in parallel
- Avoid the datastore

Slide 8

Slide 8 text

Tasklets and autobatching

    def get_user_exercise_cache(user_data):
        uec = UEC.get_for_user_data(user_data)
        if not uec:
            user_exercises = UE.get_all(user_data)
            uec = UEC.build(user_exercises)
        return uec

    def get_all_uecs(user_datas):
        return map(get_user_exercise_cache, user_datas)

Slide 9

Slide 9 text

Tasklets and autobatching

    @ndb.tasklet
    def get_user_exercise_cache_async(user_data):
        uec = yield UEC.get_for_user_data_async(user_data)
        if not uec:
            # A tasklet must yield a future, so use the async variant here too.
            user_exercises = yield UE.get_all_async(user_data)
            uec = UEC.build(user_exercises)
        raise ndb.Return(uec)

    @ndb.synctasklet
    def get_all_uecs(user_datas):
        uecs = yield map(get_user_exercise_cache_async, user_datas)
        raise ndb.Return(uecs)
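To make the idea of autobatching concrete, here is a toy scheduler in plain Python 3. It is not how ndb is implemented (ndb uses futures and an event loop); it only illustrates the principle that when many generators are paused on a datastore request at once, their requests can be combined into one batch call. FakeDatastore, run_all and get_display_name are hypothetical names made up for this sketch.

```python
class FakeDatastore(object):
    """In-memory stand-in for the datastore that counts batch calls."""

    def __init__(self, entities):
        self.entities = dict(entities)
        self.batch_calls = 0

    def get_multi(self, keys):
        self.batch_calls += 1
        return [self.entities.get(key) for key in keys]


def run_all(datastore, generators):
    """Drive all generators to completion, batching the keys they yield.

    Each generator yields a key and is resumed with the entity for that key.
    Its return value plays the role that ndb.Return plays in a tasklet.
    """
    results = [None] * len(generators)
    waiting = {}  # generator index -> key it is currently waiting for
    for index, gen in enumerate(generators):
        try:
            waiting[index] = next(gen)
        except StopIteration as stop:
            results[index] = stop.value
    while waiting:
        indexes = sorted(waiting)
        # One batch call serves every generator that is currently paused.
        entities = datastore.get_multi([waiting[i] for i in indexes])
        still_waiting = {}
        for index, entity in zip(indexes, entities):
            try:
                still_waiting[index] = generators[index].send(entity)
            except StopIteration as stop:
                results[index] = stop.value
        waiting = still_waiting
    return results


def get_display_name(user_key):
    """Toy 'tasklet': up to two dependent lookups per user."""
    user = yield user_key
    if user is None:
        return "anonymous"
    profile = yield user["profile_key"]
    return profile["name"]


datastore = FakeDatastore({
    "user:1": {"profile_key": "profile:1"},
    "user:2": {"profile_key": "profile:2"},
    "profile:1": {"name": "Ada"},
    "profile:2": {"name": "Grace"},
})
user_keys = ["user:1", "user:2", "user:3"]
names = run_all(datastore, [get_display_name(key) for key in user_keys])

print(names, datastore.batch_calls)  # prints: ['Ada', 'Grace', 'anonymous'] 2
```

Run one at a time, the three lookups would cost five separate gets. Interleaved, they cost two batch calls, one per "round" of yields.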

Slide 10

Slide 10 text

Moral

ndb is awesome. Use it.

Slide 11

Slide 11 text

Part 2 of 2

Slide 12

Slide 12 text

The sad truth

ndb isn't perfect.

Slide 13

Slide 13 text

Mysterious errors

You heard from Marcia about this gem back in March:

    TypeError: '_BaseValue' object is not subscriptable

Slide 14

Slide 14 text

Q: What's worse than code that doesn't work at all?

A: Code that mostly works but breaks in subtle ways.

Slide 15

Slide 15 text

Secret slowness #1

Multi-queries, with IN and OR:

    answers = (Feedback.query()
        .filter(Feedback.type == 'answer')
        .filter(Feedback.in_reply_to.IN(question_keys))
        .fetch(1000))

Doesn't run in parallel!

Slide 16

Slide 16 text

Secret slowness #1

A not-horribly-slow multi-query:

    answers = (Feedback.query()
        .filter(Feedback.type == 'answer')
        .filter(Feedback.in_reply_to.IN(question_keys))
        .order(Feedback.key)  # ndb exposes the __key__ ordering as Model.key
        .fetch(1000))
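A filter with IN or OR is run as several simpler sub-queries whose results are combined. One plausible reading of why the explicit ordering helps: if every sub-query returns its results in the same order, the streams can be fetched side by side and merged on the fly, with duplicates detected as they appear. The sketch below shows only that merge step in plain Python; merge_subqueries and the sample data are hypothetical, and ndb's internals may differ.

```python
import heapq


def merge_subqueries(streams, sort_key):
    """Merge already-sorted result streams, dropping duplicates.

    Every stream must be sorted by sort_key, and sort_key must be unique per
    entity (like a datastore key), so duplicates always arrive back to back.
    """
    previous = object()  # sentinel that equals no real key
    for entity in heapq.merge(*streams, key=sort_key):
        current = sort_key(entity)
        if current != previous:
            yield entity
            previous = current


# Results of two sub-queries, each already ordered by key.
# The entity with key 5 matches both sub-queries.
first_stream = [{"key": 2, "text": "a"}, {"key": 5, "text": "c"}]
second_stream = [{"key": 3, "text": "b"}, {"key": 5, "text": "c"}]

merged = list(merge_subqueries([first_stream, second_stream],
                               sort_key=lambda entity: entity["key"]))

print([entity["key"] for entity in merged])  # prints: [2, 3, 5]
```

Without a shared order there is nothing to merge on, so the only way to deduplicate is to remember every key seen so far.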

Slide 17

Slide 17 text

Secret slowness #2

Query iterators:

    query = Feedback.query().filter(
        Feedback.topic_ids == 'algebra')

    questions = []
    for q in query.iter(batch_size=20):
        if q.is_visible_to(user_data):
            questions.append(q)
            if len(questions) >= 10:
                break

Slide 18

Slide 18 text

Secret slowness #2

Solution? Sometimes you have to do it by hand.
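The slides do not show what "by hand" looked like. One possible reading is to replace the iterator with explicit pages, so the code decides exactly when the next batch is requested and stops as soon as it has enough results. The sketch below assumes a fetch_page callable that returns (results, next_cursor, more), the same shape as ndb's Query.fetch_page. collect_visible and make_fake_fetch_page are hypothetical helpers, and the fake pager stands in for a real query so the example runs anywhere.

```python
def collect_visible(fetch_page, is_visible, wanted=10, page_size=20):
    """Collect up to `wanted` visible items, one explicit page at a time.

    fetch_page(page_size, cursor) must return (results, next_cursor, more).
    """
    visible = []
    cursor = None
    more = True
    while more and len(visible) < wanted:
        results, cursor, more = fetch_page(page_size, cursor)
        for item in results:
            if is_visible(item):
                visible.append(item)
                if len(visible) >= wanted:
                    break
    return visible


def make_fake_fetch_page(items, calls):
    """Build an in-memory pager over `items` that logs each call in `calls`."""
    def fetch_page(page_size, cursor):
        calls.append(cursor)
        start = cursor or 0
        end = start + page_size
        return items[start:end], end, end < len(items)
    return fetch_page


calls = []
items = list(range(100))
fetch_page = make_fake_fetch_page(items, calls)

# Treat even numbers as "visible"; ten of them fit in the first page of 20.
first_even = collect_visible(fetch_page, lambda n: n % 2 == 0)

print(first_even, len(calls))  # prints: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18] 1
```

With a real ndb query, the pager could be something like `lambda size, cursor: query.fetch_page(size, start_cursor=cursor)`. Whether this beats the iterator for a given query is worth confirming with a profiler.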

Slide 19

Slide 19 text

Moral

ndb isn't perfect. Pay attention. Profile your code.

Slide 20

Slide 20 text

The End