to ingest and persist weather data. Each topology is responsible for fetching one dataset from an internal or external network (the Internet), reshaping records for use by our company and persisting the records to a relational database” – The Weather Channel Source: https://github.com/nathanmarz/storm/wiki/Powered-By
product, processing every tweet and click that happens on Twitter to provide analytics for Twitter’s publisher partners” Source: https://github.com/nathanmarz/storm/wiki/Powered-By
tuples
• Defined with a schema
  – Names of “fields” in the tuples being transported by the stream
  – Values are dynamically typed
• Serializers for primitive types are provided
• Complex types require custom serializers
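To make the schema idea concrete, here is a minimal plain-Java sketch of a tuple whose field names are declared once and whose values are dynamically typed. The class `TupleSketch` and its method names are hypothetical and exist only for illustration; this is not Storm's own tuple class.

```java
import java.util.Arrays;
import java.util.List;

public class TupleSketch {
    /** Field names declared once per stream (the schema). */
    private final List<String> fields;
    /** Values of one tuple; dynamically typed, so held as Object. */
    private final List<Object> values;

    public TupleSketch(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("schema and values differ in length");
        }
        this.fields = fields;
        this.values = values;
    }

    /** Looks a value up by the field name declared in the schema. */
    public Object getValueByField(String name) {
        int i = fields.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(i);
    }

    public static void main(String[] args) {
        // Hypothetical schema for a stream of tweets.
        List<String> schema = Arrays.asList("author", "retweet_count");
        TupleSketch t = new TupleSketch(schema, Arrays.<Object>asList("alice", 42));
        System.out.println(t.getValueByField("author"));
        System.out.println(t.getValueByField("retweet_count"));
    }
}
```

Because values are held as `Object`, the consumer has to know (or check) the type of each field, which is the trade-off of dynamic typing.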
Entry point for data – connect to data sources
• Inject tuples into the topology
• Tuples are emitted on streams
• Can output more than one stream
• Reliable or Unreliable
from Spouts or other Bolts
• Can emit tuples to other Bolts
• Can do anything, e.g. filtering, joins, aggregations, reading from or writing to databases, running arbitrary functions
• All sinks in the topology are bolts but not all bolts are sinks
specified by the grouping
• If the stream is partitioned by “user-id” then all tuples with the same user-id will go to the same instance of the bolt
• Tuples with different user-ids may go to different instances
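One way to picture partitioning by a field is hashing the field value and taking it modulo the number of bolt instances. The sketch below assumes that scheme for illustration; `FieldsGroupingSketch` and `chooseTask` are hypothetical names, not part of any library.

```java
public class FieldsGroupingSketch {
    /** Picks a bolt instance index from the value of the grouping field. */
    public static int chooseTask(Object groupingValue, int numTasks) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(groupingValue.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // assumed number of bolt instances
        String[] userIds = {"user-1", "user-2", "user-1", "user-3"};
        for (String id : userIds) {
            // The same user-id always maps to the same task index.
            System.out.println(id + " -> task " + chooseTask(id, numTasks));
        }
    }
}
```

The same user-id always yields the same index, so per-user state can live in one bolt instance. Two different user-ids can still hash to the same index.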
that connects to the Twitter stream
• Create a bolt that receives tweets from the spout
  – Initialize top_tweet_retweets = 50
  – If (retweet_count > top_tweet_retweets)
    • Print tweet author, tweet text and retweet count
    • Update top_tweet_retweets
• Bonus: Keep an in-memory leaderboard of the most retweeted tweets in the past 5 minutes
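The threshold logic in the challenge can be sketched in plain Java, independent of any Storm classes. `TopRetweetTracker` and `offer` are hypothetical names; in the actual exercise this logic would sit inside the bolt's tuple-handling method, and the bonus leaderboard is not covered here.

```java
public class TopRetweetTracker {
    // Starting threshold taken from the challenge description.
    private int topTweetRetweets = 50;

    /** Prints the tweet and raises the threshold if it beats the current top count. */
    public boolean offer(String author, String text, int retweetCount) {
        if (retweetCount > topTweetRetweets) {
            System.out.println(author + ": " + text + " (" + retweetCount + " retweets)");
            topTweetRetweets = retweetCount;
            return true;
        }
        return false;
    }

    public int getTopTweetRetweets() {
        return topTweetRetweets;
    }

    public static void main(String[] args) {
        TopRetweetTracker tracker = new TopRetweetTracker();
        tracker.offer("alice", "hello", 10);  // ignored: not above 50
        tracker.offer("bob", "storm!", 120);  // printed; threshold becomes 120
        tracker.offer("carol", "hi", 80);     // ignored: not above 120
    }
}
```

Keeping the counter as instance state works only if a single bolt instance sees every tweet. With several instances, each would track its own local maximum.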
https://github.com/abh1nav/dvto1 OR wget https://github.com/abh1nav/dvto1/archive/v0.1.zip
• What’s in it?
  – TwitterSampleSpout: connects to the Twitter API and emits tweets
  – LogBolt: logs tweet author and text to console
  – Topology: connects 1 TwitterSampleSpout to 1 LogBolt and runs locally
• What should I do?
  – Import project into Eclipse as an existing Maven project
  – Add in your Twitter credentials to Topology.java
  – Modify LogBolt to complete today’s challenge