
Stream Data Processing with Kinesis and Go at Timehop


Stream data processing, data pipeline architecture, unified log system, event sourcing, CQRS, Complex Event Processing: these are just some of the names for an approach to system design that embraces the idea that data *flows* like a stream. There’s been a recent surge in discussions about building systems this way, from LinkedIn to Confluent to Yahoo. There are lots of fascinating and inspiring articles, books, and conference talks, but many of them are bold, broad, and fundamental. There’s a dearth of guidance on the nuts and bolts of actually building systems this way. So, in the spirit of Go, this talk will start to fill that gap a bit, from the bottom up.

I’ll describe an existing queue-based data processing system at Timehop that was starting to break down, and the steps we took to replace it with a stream-based system. We will discuss the overall dataflow of each system and review the Go code used to interface with Kinesis and process the streaming data.

Avi Flax

June 02, 2015


Transcript

  1. Stream Data Processing with Kinesis and Go at Timehop
    Avi Flax
    June 2015


  3. Background
    • Whenever a user opens the app, the app notifies the API
    • We do 4 things with this data
    • (1) Count daily unique user opens
    • (2) Record the last opened time for each user
    • (3) Update user data
    • (4) Archive the app open event


  4. Prior System


  5. Interlude I: Jay Kreps ❤s Logs
    • Jan 2011: Kafka released as FLOSS
    • Jul 2011: Kafka entered Apache incubation
    • Nov 2012: Kafka graduated incubation
    • Dec 2013: Kreps published The Log: What every software
    engineer should know about real-time data's unifying abstraction
    • Sep 2014: published in book form: I ❤ Logs: Event Data, Stream
    Processing, and Data Integration
    • Nov 2014: co-founded Confluent


  6. Interlude II: Unified Logs in a Nutshell
    • A very specific approach to streaming data transport
    • A server providing “an append-only, totally-ordered sequence of
    records ordered by time”
    • Decouples: producers & consumers; transport & processing;
    consumers & consumers
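
To make the "append-only, totally-ordered" idea and the consumer decoupling concrete, here is a minimal in-memory sketch. It is not from the talk and is not how Kafka or Kinesis are implemented; the `Log` and `Consumer` types and their methods are hypothetical names chosen for illustration. The point it tries to show is that each consumer keeps its own offset, so one consumer's progress never affects another's.

```go
package main

import "fmt"

// Log is a toy in-memory append-only log (hypothetical, for illustration).
// Records are only ever appended; a record's offset is its position.
type Log struct {
	records [][]byte
}

// Append adds a record to the end of the log and returns its offset.
func (l *Log) Append(record []byte) int {
	l.records = append(l.records, record)
	return len(l.records) - 1
}

// ReadFrom returns every record at or after the given offset.
func (l *Log) ReadFrom(offset int) [][]byte {
	if offset < 0 || offset >= len(l.records) {
		return nil
	}
	return l.records[offset:]
}

// Consumer tracks its own position in the log, independent of other consumers.
type Consumer struct {
	name   string
	offset int
}

// Poll reads any records this consumer has not yet seen and advances its offset.
func (c *Consumer) Poll(l *Log) []string {
	var out []string
	for _, r := range l.ReadFrom(c.offset) {
		out = append(out, string(r))
		c.offset++
	}
	return out
}

func main() {
	events := &Log{}
	events.Append([]byte("open:user-1"))
	events.Append([]byte("open:user-2"))

	counter := &Consumer{name: "counter"}
	archiver := &Consumer{name: "archiver"}

	fmt.Println(counter.name, counter.Poll(events)) // sees the first two records
	events.Append([]byte("open:user-3"))
	fmt.Println(counter.name, counter.Poll(events))   // sees only the new record
	fmt.Println(archiver.name, archiver.Poll(events)) // sees all three, in order
}
```

Because the producer only appends and never knows who is reading, a new consumer can be added later and replay the log from offset 0 without any change to the producer.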


  7. Interlude III: AWS Kinesis
    …recently Amazon has offered a service that is very very similar to
    Kafka called Kinesis… I was pretty happy about this.
    - Jay Kreps


  8. Interlude IV: Go
    • An object-oriented systems language with GC


  13. The Workers


  15. Producer / apicard
    • We’re using Sendgrid’s go-kinesis library to produce records
    to Kinesis
    • We pretty quickly ran into some issues:
    • Kinesis somenormal)
    • Expense of sending records individually
    • Need to do batching and retrying in the background
    • Solu
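
The "batching and retrying in the background" requirement above can be sketched roughly as follows. This is not the go-kinesis batch producer: `Batcher`, `SendFunc`, `NewBatcher`, and all parameters are hypothetical names, and the send function stands in for a batched Kinesis put call. The sketch assumes a batch is flushed when it reaches a size limit or when a timer fires, retried immediately a fixed number of times, and dropped if every attempt fails; a production version would likely add backoff and report failures.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SendFunc delivers one batch; a stand-in for a batched put call (assumption).
type SendFunc func(batch [][]byte) error

// Batcher buffers records and sends them in batches from a background goroutine.
type Batcher struct {
	in         chan []byte
	done       chan struct{}
	send       SendFunc
	batchSize  int
	interval   time.Duration
	maxRetries int
}

// NewBatcher starts the background goroutine and returns the running Batcher.
func NewBatcher(send SendFunc, batchSize int, interval time.Duration, maxRetries int) *Batcher {
	b := &Batcher{
		in: make(chan []byte, 1024), done: make(chan struct{}),
		send: send, batchSize: batchSize, interval: interval, maxRetries: maxRetries,
	}
	go b.run()
	return b
}

// Add queues a record without blocking; it fails if the buffer is full.
func (b *Batcher) Add(record []byte) error {
	select {
	case b.in <- record:
		return nil
	default:
		return errors.New("batcher: buffer full")
	}
}

// Stop closes the input and waits for the final flush. Add must not be called afterwards.
func (b *Batcher) Stop() {
	close(b.in)
	<-b.done
}

func (b *Batcher) run() {
	defer close(b.done)
	ticker := time.NewTicker(b.interval)
	defer ticker.Stop()
	var batch [][]byte
	flush := func() {
		if len(batch) == 0 {
			return
		}
		// One initial attempt plus up to maxRetries retries; the batch is
		// dropped if every attempt fails.
		for attempt := 0; attempt <= b.maxRetries; attempt++ {
			if err := b.send(batch); err == nil {
				break
			}
		}
		batch = nil
	}
	for {
		select {
		case rec, ok := <-b.in:
			if !ok { // input closed: send whatever is left, then exit
				flush()
				return
			}
			batch = append(batch, rec)
			if len(batch) >= b.batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

func main() {
	sent, failNext := 0, true
	send := func(batch [][]byte) error {
		if failNext { // simulate one transient failure
			failNext = false
			return errors.New("transient failure")
		}
		sent += len(batch)
		return nil
	}
	b := NewBatcher(send, 3, 50*time.Millisecond, 2)
	for i := 0; i < 7; i++ {
		_ = b.Add([]byte(fmt.Sprintf("event-%d", i)))
	}
	b.Stop()
	fmt.Println("records sent:", sent)
}
```

The caller's request path only pays for a channel send, which is the main reason to batch in the background rather than putting each record individually.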


  16. go-kinesis/batchproducer
    type Producer interface {
        Start() error
        Stop() error
        Add(data []byte, partitionKey string) error
        Flush(timeout time.Duration, sendStats bool) (sent int, remaining int, err error)
    }

    // BatchingKinesisClient is a subset of KinesisClient to ease mocking.
    type BatchingKinesisClient interface {
        PutRecords(args *kinesis.RequestArgs) (resp *kinesis.PutRecordsResp, err error)
    }

    func New(client BatchingKinesisClient, streamName string, config Config) (Producer, error)


  17. API Startup
    AppOpenBatcher, err = batchproducer.New(ksis, appOpenStreamName, config)
    if err != nil {
        golog.Fatal(gologID, "Oh noes!")
    }
    AppOpenBatcher.Start()


  18. App Opens API Resource
    func enqueueAppOpenEventForStream(event appopen.AppOpen) {
        avrobytes, err := event.ToAvro()
        if err != nil {
            ...
            return
        }
        partitionKey := strconv.Itoa(time.Unix(event.Timestamp, 0).Second())
        err = conf.AppOpenBatcher.Add(avrobytes, partitionKey)
        if err != nil {
            ...
            return
        }
    }
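
The partition key expression on this slide is worth a closer look: `Second()` returns the second-of-minute, so the key is one of the strings "0" through "59". The sketch below isolates that expression in a hypothetical helper named `partitionKey` (the `.UTC()` call is an addition for determinism) and counts how ten minutes of one-per-second events spread over the keys. Kinesis maps each partition key to a shard by hashing it, so sixty evenly used keys should spread records across shards without tying any user to a particular shard.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// partitionKey mirrors the slide's expression: the second-of-minute of the
// event's Unix timestamp, as a decimal string in the range "0".."59".
func partitionKey(unixSeconds int64) string {
	return strconv.Itoa(time.Unix(unixSeconds, 0).UTC().Second())
}

func main() {
	counts := map[string]int{}
	start := int64(1433203200) // 2015-06-02 00:00:00 UTC
	// Simulate one event per second for ten minutes.
	for ts := start; ts < start+600; ts++ {
		counts[partitionKey(ts)]++
	}
	fmt.Println("distinct keys:", len(counts))
	fmt.Println("events with key \"0\":", counts["0"])
}
```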


  20. Streams Repo
    Four notable concepts/packages:
    • models
    • package kclmultilang
    • tasks
    • cmd


  21. package …/streams/models/appopen
    type AppOpen struct {
        Timestamp int64
        UserID    int64
        ...
    }

    func FromAvro(record []byte) (*AppOpen, error)
    func (s AppOpen) ToAvro() ([]byte, error)
    func (s AppOpen) Validate() error


  22. kclmultilang
    type Config struct {
        StreamName string
        WorkerName string
        ...
    }

    func RunWithSingleProcessor(Config, SingleEventProcessor)

    type SingleEventProcessor func(appopen.AppOpen, log.Logger) error
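
The function-type design above means a worker only has to supply one function per event, and any dependencies can be captured in a closure. The following is a rough sketch of that pattern, not the real kclmultilang package: `runWithSingleProcessor` is a hypothetical stand-in that loops over an in-memory slice instead of talking to the KCL, and the logger argument is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// AppOpen is a cut-down version of the event model (only two fields assumed).
type AppOpen struct {
	Timestamp int64
	UserID    int64
}

// SingleEventProcessor handles one decoded event.
type SingleEventProcessor func(AppOpen) error

// runWithSingleProcessor is a hypothetical stand-in for the real runner: it
// applies the processor to each event in order and returns the failure count.
func runWithSingleProcessor(events []AppOpen, process SingleEventProcessor) (failed int) {
	for _, e := range events {
		if err := process(e); err != nil {
			fmt.Printf("user %d: %v\n", e.UserID, err)
			failed++
		}
	}
	return failed
}

func main() {
	// The map plays the role of an external dependency captured by the closure.
	lastOpens := map[int64]int64{}
	processor := func(e AppOpen) error {
		if e.UserID <= 0 {
			return errors.New("invalid user ID")
		}
		lastOpens[e.UserID] = e.Timestamp
		return nil
	}

	events := []AppOpen{
		{Timestamp: 1433203200, UserID: 1},
		{Timestamp: 1433203260, UserID: 0}, // rejected by the processor
		{Timestamp: 1433203320, UserID: 1},
	}
	failed := runWithSingleProcessor(events, processor)
	fmt.Println("failed:", failed, "last open for user 1:", lastOpens[1])
}
```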


  23. Task RecordLastOpen
    // RecordLastOpen updates a certain Redis key for each user that
    // stores the last time that user opened the Timehop mobile app.
    func RecordLastOpen(
        event appopen.AppOpen,
        redis redis.Pool,
        logger log.Logger, // trailing comma is required before a newline
    ) error {
        key := fmt.Sprintf("user:%v:checkpoint", event.UserID)
        field := fmt.Sprintf("%v_app_open", event.Platform)
        value := fmt.Sprint(event.Timestamp)
        _, err := redis.HSet(key, field, value)
        return err
    }
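
Because the task is a plain function of an event and a Redis handle, it is easy to exercise without a Redis server. The sketch below is an assumption-laden illustration, not code from the Timehop repo: `hashSetter` and `fakeRedis` are hypothetical names, `Platform` is assumed to be a string, and the logger argument is left out. It reuses the key and field formats from the slide to show what ends up stored.

```go
package main

import "fmt"

// hashSetter is a hypothetical interface covering the one command the task needs.
type hashSetter interface {
	HSet(key, field, value string) (bool, error)
}

// fakeRedis is an in-memory stand-in for a Redis hash store, for illustration only.
type fakeRedis map[string]map[string]string

// HSet stores value under key/field and reports whether the field is new.
func (f fakeRedis) HSet(key, field, value string) (bool, error) {
	if f[key] == nil {
		f[key] = map[string]string{}
	}
	_, existed := f[key][field]
	f[key][field] = value
	return !existed, nil
}

// AppOpen carries only the fields the task reads; Platform's type is assumed.
type AppOpen struct {
	Timestamp int64
	UserID    int64
	Platform  string
}

// recordLastOpen follows the slide's logic: one hash per user, one field per platform.
func recordLastOpen(event AppOpen, store hashSetter) error {
	key := fmt.Sprintf("user:%v:checkpoint", event.UserID)
	field := fmt.Sprintf("%v_app_open", event.Platform)
	_, err := store.HSet(key, field, fmt.Sprint(event.Timestamp))
	return err
}

func main() {
	store := fakeRedis{}
	_ = recordLastOpen(AppOpen{Timestamp: 1433203200, UserID: 42, Platform: "ios"}, store)
	_ = recordLastOpen(AppOpen{Timestamp: 1433206800, UserID: 42, Platform: "ios"}, store)
	// The later event overwrites the earlier one for the same user and platform.
	fmt.Println(store["user:42:checkpoint"]["ios_app_open"])
}
```

Note that the write is a blind overwrite, so the stored value is the last event processed rather than necessarily the latest timestamp; that matches the slide's code as written.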


  24. Worker lastopens
    // cmd/appopen/lastopens/main.go
    func main() {
        resultsRedisURL := env.MandatoryVar("RESULTS_REDIS_URL")
        streamName := env.MandatoryVar("STREAM_NAME")
        logFilePath := env.ImportantVar("LOG_PATH", defaultLogFilePath)
        config := kclmultilang.Config{...}
        resultsRedisPool := redis.NewPool(resultsRedisURL, redis.DefaultConfig)
        processor := func(record appopen.AppOpen, logger log.Logger) error {
            return lastopens.RecordLastOpen(record, resultsRedisPool, logger)
        }
        kclmultilang.RunWithSingleProcessor(config, processor)
    }


  26. Lessons Learned, Hints, Tips, and Miscellany
    • Didn’t need to write kclmultilang; could have (should have)
    used Niek Sanders’ gokinesis
    • Probably could have delayed adding the batch producer to
    go-kinesis; could have lived without it
    • Deployment is the other 90%


  27. Questions, Comments, Suggestions?


  28. More Resources
    • The Log: an epic software engineering article by Bryan Pendleton
    • The three eras of business data processing by Alex Dean
    • Loving a Log-Oriented Architecture by Andrew Montalenti
    • Stream Processing, Event Sourcing, Reactive, CEP… And Making
    Sense Of It All by Martin Kleppmann
    And a whole bunch more here including books and videos.


  29. Bonus: AWS Lambda and Go
    • Ruben Fonseca: AWS Lambda Functions in Go


  30. AWS Lambda Go Adapter
    exports.handler = function(event, context, test_config) {
        var config = test_config || prod_config;
        var options = {
            env: config.env,
            input: JSON.stringify(event)
        };
        var result = child_process.spawnSync(config.child_path, [], options);
        if (result.status !== 0) {
            return context.fail(new Error(result.stderr.toString()));
        }
        context.succeed();
    }
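
The adapter above passes the JSON-encoded event to the child process on stdin and treats a non-zero exit status as failure, reporting the child's stderr. A Go program on the receiving end might look roughly like the sketch below. This is an illustration rather than the talk's code: the `event` struct and `handle` function are hypothetical, and the only assumption about the payload is that it has a top-level `Records` array, as Kinesis-triggered Lambda events do.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// event models only the part of the payload this sketch needs (assumed shape).
type event struct {
	Records []json.RawMessage `json:"Records"`
}

// handle decodes the raw event and returns how many records it contains.
func handle(payload []byte) (int, error) {
	var e event
	if err := json.Unmarshal(payload, &e); err != nil {
		return 0, fmt.Errorf("decoding event: %w", err)
	}
	return len(e.Records), nil
}

func main() {
	// The Node adapter supplies the event via the spawnSync "input" option,
	// which arrives here on stdin.
	payload, err := io.ReadAll(os.Stdin)
	if err == nil {
		var n int
		n, err = handle(payload)
		if err == nil {
			fmt.Fprintf(os.Stderr, "processed %d records\n", n)
			return
		}
	}
	// Anything written to stderr plus a non-zero exit status is what the
	// adapter turns into context.fail.
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}
```

One way to try it locally, assuming the file is saved as main.go: `echo '{"Records":[{},{}]}' | go run main.go`.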
