Stream Data Processing with Kinesis and Go at Timehop

Stream data processing, data pipeline architecture, unified log system, event sourcing, CQRS, Complex Event Processing — these are just some of the names for an approach to system design that emphasizes and embraces that data *flows* like a stream. There’s been a recent surge in discussions about building systems this way, from LinkedIn to Confluent to Yahoo. There are lots of fascinating and inspiring articles, books, and conference talks, but many of them are bold, broad, and fundamental. There’s a dearth of guidance on the nuts and bolts of actually building systems this way. So, in the spirit of Go, this talk will start to fill that gap a bit, from the bottom up.

I’ll describe an existing queue-based data processing system at Timehop that was starting to break down, and the steps we took to replace it with a stream-based system. We will discuss the overall dataflow of each system and review the Go code used to interface with Kinesis and process the streaming data.

Avi Flax

June 02, 2015

Transcript

1. Producer / apicard

   • We’re using Sendgrid’s go-kinesis library to produce records to Kinesis
   • We pretty quickly ran into some issues:
     • Kinesis sometimes returns 500s (Amazon considers this normal)
     • Expense of sending records individually
   • Need to do batching and retrying in the background (see the sketch after this list)
   • Solution: added batchproducer to go-kinesis
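The batching-and-retrying requirement above is easy to state and fiddly to get right. Here is a minimal sketch of retrying a send with exponential backoff when Kinesis returns a retryable (500-class) error; the send and isRetryable functions are hypothetical stand-ins, since the real batchproducer handles this internally.

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // sendWithRetry retries a send with exponential backoff. send stands in
    // for a real PutRecords call; isRetryable stands in for checking whether
    // the error is a 500-class Kinesis response.
    func sendWithRetry(send func() error, isRetryable func(error) bool, maxAttempts int) error {
        backoff := 100 * time.Millisecond
        var err error
        for attempt := 1; attempt <= maxAttempts; attempt++ {
            if err = send(); err == nil || !isRetryable(err) {
                return err
            }
            time.Sleep(backoff)
            backoff *= 2 // double the wait between attempts
        }
        return fmt.Errorf("giving up after %d attempts: %v", maxAttempts, err)
    }

    func main() {
        // Simulate a producer that always gets a 500 back from Kinesis.
        send := func() error { return errors.New("InternalFailure (HTTP 500)") }
        isRetryable := func(error) bool { return true }
        fmt.Println(sendWithRetry(send, isRetryable, 3))
    }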
2. go-kinesis/batchproducer

    type Producer interface {
        Start() error
        Stop() error
        Add(data []byte, partitionKey string) error
        Flush(timeout time.Duration, sendStats bool) (sent int, remaining int, err error)
    }

    // BatchingKinesisClient is a subset of KinesisClient to ease mocking.
    type BatchingKinesisClient interface {
        PutRecords(args *kinesis.RequestArgs) (resp *kinesis.PutRecordsResp, err error)
    }

    func New(client BatchingKinesisClient, streamName string, config Config) (Producer, error)
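Since BatchingKinesisClient exists specifically to ease mocking, a test double is cheap to write. A minimal sketch, assuming the go-kinesis package is imported as kinesis; the struct and field names here are made up.

    // mockKinesisClient satisfies BatchingKinesisClient so code built on
    // batchproducer can be exercised without talking to AWS.
    type mockKinesisClient struct {
        calls int // number of PutRecords invocations observed
    }

    func (m *mockKinesisClient) PutRecords(args *kinesis.RequestArgs) (*kinesis.PutRecordsResp, error) {
        m.calls++
        // Return an empty, successful response; a test could return an error
        // here instead to exercise the producer's retry behavior.
        return &kinesis.PutRecordsResp{}, nil
    }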
3. API Startup

    AppOpenBatcher, err = batchproducer.New(ksis, appOpenStreamName, config)
    if err != nil {
        golog.Fatal(gologID, "Oh noes!")
    }
    AppOpenBatcher.Start()
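The slides show startup but not shutdown. A reasonable counterpart, sketched here rather than taken from Timehop's code, is to flush whatever is still buffered and then stop the batcher when the API process exits, using the Flush and Stop methods from the Producer interface above (the standard library "log" and "time" packages are assumed to be imported, with log standing in for Timehop's logger).

    // shutdownBatcher drains the batcher before the process exits: flush any
    // buffered records, waiting up to 10 seconds, then stop the background
    // batching goroutine.
    func shutdownBatcher(p batchproducer.Producer) {
        sent, remaining, err := p.Flush(10*time.Second, false) // false: don't send stats
        if err != nil {
            log.Printf("flushing batcher: %v", err)
        }
        log.Printf("batcher flushed: sent=%d remaining=%d", sent, remaining)
        if err := p.Stop(); err != nil {
            log.Printf("stopping batcher: %v", err)
        }
    }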
4. App Opens API Resource

    func enqueueAppOpenEventForStream(event appopen.AppOpen) {
        avrobytes, err := event.ToAvro()
        if err != nil {
            ...
            return
        }
        partitionKey := strconv.Itoa(time.Unix(event.Timestamp, 0).Second())
        err = conf.AppOpenBatcher.Add(avrobytes, partitionKey)
        if err != nil {
            ...
            return
        }
    }
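One detail worth pausing on: the partition key is the second-of-the-minute of the event timestamp, so it only ever takes the values "0" through "59", which spreads records across shards by time rather than by user. The slide doesn't explain this choice; that reading is ours. A tiny standalone illustration of just that expression:

    package main

    import (
        "fmt"
        "strconv"
        "time"
    )

    func main() {
        ts := int64(1433260800) // example Unix timestamp on an exact minute boundary
        // Same expression as on the slide: seconds within the minute, as a string.
        partitionKey := strconv.Itoa(time.Unix(ts, 0).Second())
        fmt.Println(partitionKey) // "0" for this timestamp
    }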
5. package …/streams/models/appopen

    type AppOpen struct {
        Timestamp int64
        UserID    int64
        ...
    }

    func FromAvro(record []byte) (*AppOpen, error)
    func (s AppOpen) ToAvro() ([]byte, error)
    func (s AppOpen) Validate() error
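Validate is only declared on the slide. A minimal sketch of what such a method might check, using only the two fields shown; the specific rules are an assumption, not Timehop's actual validation.

    // Validate rejects events whose required fields are still zero values.
    // (Assumes the standard library "errors" package is imported; the real
    // method presumably checks the struct's other, elided fields too.)
    func (s AppOpen) Validate() error {
        if s.UserID == 0 {
            return errors.New("appopen: UserID is required")
        }
        if s.Timestamp <= 0 {
            return errors.New("appopen: Timestamp must be a positive Unix time")
        }
        return nil
    }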
6. kclmultilang

    type Config struct {
        StreamName string
        WorkerName string
        ...
    }

    func RunWithSingleProcessor(Config, SingleEventProcessor)

    type SingleEventProcessor func(appopen.AppOpen, log.Logger) error
7. Task RecordLastOpen

    // RecordLastOpen updates a certain Redis key for each user that
    // stores the last time that user opened the Timehop mobile app.
    func RecordLastOpen(event appopen.AppOpen, redis redis.Pool, logger log.Logger) error {
        key := fmt.Sprintf("user:%v:checkpoint", event.UserID)
        field := fmt.Sprintf("%v_app_open", event.Platform)
        value := fmt.Sprint(event.Timestamp)
        _, err := redis.HSet(key, field, value)
        return err
    }
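For orientation, here is how this task might be invoked directly, for example from a test, outside the stream worker shown next. The AppOpen literal is made up, Platform is assumed to be a string field, and the pool and logger are whatever the caller already has on hand.

    // recordExample is a hypothetical direct call to RecordLastOpen.
    func recordExample(pool redis.Pool, logger log.Logger) error {
        event := appopen.AppOpen{
            UserID:    42,
            Timestamp: time.Now().Unix(),
            Platform:  "ios", // assumed to be a string field on AppOpen
        }
        // Equivalent Redis command: HSET user:42:checkpoint ios_app_open <timestamp>
        return lastopens.RecordLastOpen(event, pool, logger)
    }

Because the value lands in a hash field named after the platform, a single read of user:<id>:checkpoint returns the last open per platform.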
8. Worker lastopens

    // cmd/appopen/lastopens/main.go
    func main() {
        resultsRedisURL := env.MandatoryVar("RESULTS_REDIS_URL")
        streamName := env.MandatoryVar("STREAM_NAME")
        logFilePath := env.ImportantVar("LOG_PATH", defaultLogFilePath)
        config := kclmultilang.Config{...}
        resultsRedisPool := redis.NewPool(resultsRedisURL, redis.DefaultConfig)
        processor := func(record appopen.AppOpen, logger log.Logger) error {
            return lastopens.RecordLastOpen(record, resultsRedisPool, logger)
        }
        kclmultilang.RunWithSingleProcessor(config, processor)
    }
9. AWS Lambda Go Adapter

    exports.handler = function(event, context, test_config) {
        var config = test_config || prod_config;
        var options = {
            env: config.env,
            input: JSON.stringify(event)
        };
        var result = child_process.spawnSync(config.child_path, [], options);
        if (result.status !== 0) {
            return context.fail(new Error(result.stderr.toString()));
        }
        context.succeed();
    };
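The adapter hands the Lambda event to a Go binary on stdin (spawnSync's input option) and treats a non-zero exit status as failure, surfacing stderr via context.fail. A minimal sketch of what the Go side of that contract could look like; the event shape and the handle function are placeholders, not the adapter's actual child process.

    package main

    import (
        "encoding/json"
        "log"
        "os"
    )

    func main() {
        // Decode the event JSON the Node wrapper writes to our stdin.
        var event map[string]interface{} // placeholder for a concrete event type
        if err := json.NewDecoder(os.Stdin).Decode(&event); err != nil {
            log.Fatalf("decoding event: %v", err) // non-zero exit -> context.fail
        }
        if err := handle(event); err != nil {
            log.Fatalf("handling event: %v", err)
        }
    }

    // handle is a placeholder for whatever work the Lambda performs.
    func handle(event map[string]interface{}) error {
        log.Printf("received event with %d top-level keys", len(event))
        return nil
    }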