Let's Connect on Vodafone 360 - Using Apache ActiveMQ in Mobile Web 2.0

How to meet design goals and non-functional requirements: an architecture for using JMS with ActiveMQ in very large backends, with respect to high availability and high scalability.

Dirk Fröhner

May 25, 2010
Transcript

  1. Confidentiality C1 V1.0 People Services/VIS/Group Marketing 1 Let's Connect on

    Vodafone 360 25 May 2010 Let's Connect on Vodafone 360 - Using Apache ActiveMQ in Mobile Web 2.0 Dirk Fröhner People Services / Vodafone Internet Services (VIS) / Group Marketing 25 May 2010
  2. Table of contents

    Introduction
    What is Vodafone 360
    JMS components in Vodafone 360
    JMS architecture in Vodafone 360
    > Design goals
    Experience / problems / best practice with ActiveMQ
    > Bugs, problems, troubleshooting, testing
  3. Introduction

    What this presentation is all about
    > What we do with JMS in the Vodafone 360 backend
    > Non-functional requirements – design goals for the JMS architecture
    > Experience – problems, best practice, testing
    → An overview of how JMS can be used in Mobile Web 2.0
  4. Vodafone 360 is a set of internet services comprising five elements
  5. Vodafone 360 works on a range of mobiles and PC

    > Vodafone 360 phones: best Vodafone 360 experience
    > Over 100 other mobile phones with Vodafone 360 services
    > Website: www.360.com
  6. Vodafone 360: progress since launch

    • Launched in 8 markets: DE, ES, UK, IT, NL, PT, GR & IE
    • 500K registered 360 customers
    • Sold ca. 800K devices with 360 services on them
    • Currently ca. 15 device models that have all/some 360 services pre-embedded
    • 360 services downloadable to over 100 popular devices
    • Works on most major phone platforms, from S60 through to Apple and Android
    • Ca. 9K apps in Apps Shop
  7. JMS components in Vodafone 360 backend

    [Diagram: Social networks backend, Clients proxy, Email backend, BAR backend, FOO backend, JMS channel, 360 API]
  8. JMS architecture in Vodafone 360

    Most important design goals to meet non-functional requirements
    > Reliable messaging
    > High availability
    > Horizontal scalability
    > Performance
  9. JMS architecture in Vodafone 360

    Excursus: JMS messaging paradigms
    > Point-to-point (queues)
    > Publish-subscribe (topics)
    > ActiveMQ also offers a combination of both: Virtual Topics
  10. JMS architecture in Vodafone 360

    Excursus: JMS messaging paradigms – Queues
    > Multiple consumers can subscribe to a queue destination
    > Exactly one consumer is guaranteed to (eventually) receive a particular message
    > Multiple consumers can therefore be used to spread the load of messages from the queue
    [Diagram: Producers, JMS Broker with Queue Destination, Consumer A, Consumer B]
  11. JMS architecture in Vodafone 360

    Excursus: JMS messaging paradigms – Topics
    > Multiple consumers can subscribe to a topic destination
    > All consumers receive each message
    > More consumers therefore mean more outgoing message load from the topic
    [Diagram: Producers, JMS Broker with Topic Destination, Consumer A, Consumer B]
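The difference between the two paradigms can be sketched without a broker. The following is a simplified in-memory model, not ActiveMQ code: the class and method names are made up for illustration, and the strict round robin for queues is an assumption (a real broker dispatches depending on prefetch and consumer speed).

```java
import java.util.ArrayList;
import java.util.List;

final class DispatchDemo {
    /** Queue semantics: each message goes to exactly one consumer (round robin assumed). */
    static List<List<String>> dispatchQueue(List<String> messages, int consumers) {
        List<List<String>> inboxes = emptyInboxes(consumers);
        for (int i = 0; i < messages.size(); i++) {
            inboxes.get(i % consumers).add(messages.get(i));
        }
        return inboxes;
    }

    /** Topic semantics: every consumer receives every message. */
    static List<List<String>> dispatchTopic(List<String> messages, int consumers) {
        List<List<String>> inboxes = emptyInboxes(consumers);
        for (String message : messages) {
            for (List<String> inbox : inboxes) {
                inbox.add(message);
            }
        }
        return inboxes;
    }

    private static List<List<String>> emptyInboxes(int consumers) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> messages = List.of("m1", "m2", "m3", "m4");
        // prints: queue: [[m1, m3], [m2, m4]]
        System.out.println("queue: " + dispatchQueue(messages, 2));
        // prints: topic: [[m1, m2, m3, m4], [m1, m2, m3, m4]]
        System.out.println("topic: " + dispatchTopic(messages, 2));
    }
}
```

With two consumers, the queue halves the number of messages each consumer sees, while the topic doubles the number of messages the broker has to send out.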
  12. JMS architecture in Vodafone 360

    Excursus: JMS messaging paradigms – Virtual Topics
    > Producers send messages to a topic destination
    > Consumers subscribe to a queue destination
    > Consumers can be grouped: each group receives each message, but within a particular group exactly one consumer receives a particular message
    > Multiple consumers can therefore be used for load distribution within each group
    [Diagram: Producers, JMS Broker with Topic Destination and Queue Destinations, Consumer Group A (Consumer A1, Consumer A2), Consumer Group B]
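A minimal sketch of the virtual topic semantics described above, again as an in-memory model rather than broker code. The queue name follows ActiveMQ's default convention (producers publish to a topic named `VirtualTopic.<name>`, each group consumes from the queue `Consumer.<group>.VirtualTopic.<name>`); the topic name `VirtualTopic.Events`, the group and consumer names, and the round robin inside a group are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VirtualTopicDemo {
    /** Queue a consumer group subscribes to, per ActiveMQ's default naming convention. */
    static String consumerQueue(String group, String virtualTopic) {
        return "Consumer." + group + "." + virtualTopic;
    }

    /**
     * Every group sees every message; inside a group each message
     * goes to exactly one member (round robin assumed).
     */
    static Map<String, List<String>> dispatch(List<String> messages,
                                              Map<String, List<String>> groups) {
        Map<String, List<String>> inboxes = new LinkedHashMap<>();
        for (List<String> members : groups.values()) {
            for (String member : members) {
                inboxes.put(member, new ArrayList<>());
            }
        }
        for (int i = 0; i < messages.size(); i++) {
            for (List<String> members : groups.values()) {
                String receiver = members.get(i % members.size());
                inboxes.get(receiver).add(messages.get(i));
            }
        }
        return inboxes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("A", List.of("A1", "A2"));
        groups.put("B", List.of("B1"));
        // prints: Consumer.A.VirtualTopic.Events
        System.out.println(consumerQueue("A", "VirtualTopic.Events"));
        // prints: {A1=[m1, m3], A2=[m2, m4], B1=[m1, m2, m3, m4]}
        System.out.println(dispatch(List.of("m1", "m2", "m3", "m4"), groups));
    }
}
```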
  13. JMS architecture in Vodafone 360

    Reliable messaging
    Non-functional requirement
    > A message, once sent to the broker, will survive an outage of the broker and / or the consumers
    Implementation
    > Needs a persistence layer (JDBC database or file system based)
    > Message is stored in the persistence layer before the ACK is sent to the producer
    > Message is removed from the persistence layer after the ACK is received from the consumer
    Drawbacks
    > Performance loss due to access to the persistence layer – decent throughput tests are essential
    > Careless setup of the persistence layer can make it even worse
    > Concurrency issues in past versions (5.3.0.4) can lead to total inactivity of the broker when the number of persistent messages gets significant
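The ordering of persistence and acknowledgements described above can be sketched as follows. This is a toy model under stated assumptions, not ActiveMQ's implementation: a `LinkedHashMap` stands in for the JDBC or file-based persistence layer, and all class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReliableBrokerSketch {
    // Stands in for the persistence layer; it outlives the broker object.
    private final Map<Long, String> store;
    private long nextId;

    ReliableBrokerSketch(Map<Long, String> store) {
        this.store = store;
        // Continue numbering after whatever survived a previous broker instance.
        this.nextId = store.keySet().stream().mapToLong(Long::longValue).max().orElse(0L) + 1;
    }

    /** Persist first; returning the id plays the role of the ACK to the producer. */
    long send(String body) {
        long id = nextId++;
        store.put(id, body);
        return id;
    }

    /** Oldest message still awaiting a consumer ACK, or null if none. */
    Long peekOldestId() {
        return store.isEmpty() ? null : store.keySet().iterator().next();
    }

    String body(long id) {
        return store.get(id);
    }

    /** Only the consumer's ACK removes the message from the store. */
    void ack(long id) {
        store.remove(id);
    }

    int pending() {
        return store.size();
    }

    public static void main(String[] args) {
        Map<Long, String> sharedStore = new LinkedHashMap<>();
        ReliableBrokerSketch broker = new ReliableBrokerSketch(sharedStore);
        long id = broker.send("hello");

        // Simulated broker outage: a new instance starts on the same store.
        ReliableBrokerSketch restarted = new ReliableBrokerSketch(sharedStore);
        System.out.println("pending after restart: " + restarted.pending()); // prints 1
        System.out.println("body: " + restarted.body(restarted.peekOldestId())); // prints hello
        restarted.ack(id);
        System.out.println("pending after ack: " + restarted.pending()); // prints 0
    }
}
```

Every `send` costs a write to the store, which is where the performance drawback listed above comes from.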
  14. JMS architecture in Vodafone 360

    High availability – Standalone JMS node
    > One host
    > One broker process
    > If host dies, messaging dies
    > If process dies, messaging dies
    > Only vertical scaling (naturally limited)
    [Diagram: Producers and Consumers, Host A with Broker 01]
  15. JMS architecture in Vodafone 360

    High availability – HA JMS node
    > Two hosts
    > Two broker processes
    > If host A dies, broker 012 takes over
    > If broker 011 dies, broker 012 takes over
    > Only vertical scaling (naturally limited)
    [Diagram: Producers and Consumers, Host A with Broker 011 (Master), Host B with Broker 012 (Slave)]
  16. JMS architecture in Vodafone 360

    High availability
    Non-functional requirement
    > When a master process dies (e.g. due to CPU meltdown), a slave process can instantly take over
    > The new active broker has instant access to all persistent messages and none of them will be lost
    Implementation
    > Build a JMS node with at least two processes, master and slave (or even multiple slaves)
    > Use a shared persistence layer that all processes can access
    > Use a mutex mechanism on the persistence layer to determine who is master and who is slave
    > Clients need to connect with failover and include the host:port pairs of all processes
    Drawbacks
    > Topic messages in the dead master will be lost (naturally)
    > Requires expensive additional hardware in case of a shared filesystem
    > ActiveMQ has to rely on the HA capabilities of the underlying persistence layer (naturally)
      • e.g. issues with an unstable MySQL HA cluster that made the brokers shut down repeatedly
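On the client side, "connect with failover" means using ActiveMQ's failover transport with the addresses of all broker processes of the node. A small sketch of building such a URI; the host names are hypothetical, 61616 is assumed as the port, and the helper itself is illustrative only. The resulting string is what a client would hand to `org.apache.activemq.ActiveMQConnectionFactory` as its broker URL.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FailoverUri {
    /** Builds a failover transport URI from the host:port pairs of all broker processes. */
    static String failoverUri(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host:port pair is required");
        }
        return hostPorts.stream()
                .map(hostPort -> "tcp://" + hostPort)
                .collect(Collectors.joining(",", "failover:(", ")"));
    }

    public static void main(String[] args) {
        // Master and slave of one HA JMS node (hypothetical host names).
        // prints: failover:(tcp://host-a:61616,tcp://host-b:61616)
        System.out.println(failoverUri(List.of("host-a:61616", "host-b:61616")));
    }
}
```

Only the current master accepts connections, so the client lands on whichever process holds the lock and reconnects to the other one after a takeover.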
  17. JMS architecture in Vodafone 360

    Horizontal scalability
    Horizontal scalability is the most important design goal
    > Vodafone 360 needs to be able to serve several tens of millions of users
    > Vertical scalability is naturally limited
    > Message load needs to be spread horizontally
      – Leads to the obvious approach of having more than one JMS node
      – JMS nodes can be organized and can interact in different ways
      – Not all approaches are suitable for all types of destinations
  18. JMS architecture in Vodafone 360

    Horizontal scalability – Approach 1a: network of brokers (NWOB)
    > ActiveMQ's built-in proposal
    > Consists of a number of JMS nodes that know about each other
    > Nodes keep themselves aligned regarding connected clients through advisory messages
    > Messages are routed to other nodes in case there is no (available) local app consumer
    > Scalability details in a minute...
    [Diagram: Producers and Consumers, Node 1 with Broker 01, Node 2 with Broker 02]
  19. JMS architecture in Vodafone 360

    Horizontal scalability – Approach 1b: HA network of brokers
    > n hosts, n*2 broker processes
    > Nodes spread crosswise on hosts
    > No node lost when one host dies
    > Combines HA with NWOB (required)
    > NWOB provides horizontal scalability...
    > ...for queues, if clients are set up properly
    > ...not for topics (we'll see why not)
    [Diagram: Producers and Consumers, Host A with Broker 011 and Broker 022, Host B with Broker 012 and Broker 021, Node 1, Node 2]
  20. JMS architecture in Vodafone 360

    Horizontal scalability – HA NWOB for queues
    [Diagram: producers p1, p2, p3, p4 and consumers c1, c2, c3, c4 against a single node n1, then spread over nodes n1, n2, n3, n4. Reduce load per node to 25%]
  21. JMS architecture in Vodafone 360

    Horizontal scalability – HA NWOB for queues
    Producer view
    > Producers can spread their load on all nodes
    > If existing nodes cannot handle load fast enough so that producers are slowed down, new nodes can be added
    > → production should be uniformly distributed over all nodes
    Consumer view
    > Consumers can spread their greed on all nodes
    > If existing nodes cannot serve greed fast enough so that consumers idle around, new nodes can be added
    > → consumption should take place from all nodes
    Broker view
    > For each queue destination there has to be at least one producer and one consumer against every node
      • To make use of the width of the NWOB
      • To avoid unnecessary message forwarding between the nodes
      • To avoid stuck messages when consumers reconnect with randomize=true
    > → Tuple (producer, broker, consumer) should always be considered in a holistic way
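The broker-view rule above (at least one producer and one consumer against every node, per queue) lends itself to an automated check. A hypothetical sketch, assuming the sets of nodes with attached producers and consumers have already been collected from somewhere, for example from monitoring data; none of the names come from ActiveMQ.

```java
import java.util.Set;
import java.util.TreeSet;

final class NwobCoverageCheck {
    /** Returns the nodes that lack a producer or a consumer for the queue in question. */
    static Set<String> uncoveredNodes(Set<String> nodes,
                                      Set<String> nodesWithProducer,
                                      Set<String> nodesWithConsumer) {
        Set<String> uncovered = new TreeSet<>();
        for (String node : nodes) {
            if (!nodesWithProducer.contains(node) || !nodesWithConsumer.contains(node)) {
                uncovered.add(node);
            }
        }
        return uncovered;
    }

    public static void main(String[] args) {
        Set<String> nodes = Set.of("n1", "n2", "n3", "n4");
        Set<String> producers = Set.of("n1", "n2", "n3", "n4");
        Set<String> consumers = Set.of("n1", "n2", "n4");
        // n3 has no consumer, so its messages would have to be forwarded or get stuck.
        // prints: [n3]
        System.out.println(uncoveredNodes(nodes, producers, consumers));
    }
}
```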
  22. JMS architecture in Vodafone 360

    Horizontal scalability – HA NWOB for topics
    [Diagram: producers p1, p2 and consumers c1, c2 against a single node n1, then spread over nodes n1 and n2. Reproduce load to new node]
  23. JMS architecture in Vodafone 360

    Horizontal scalability – HA NWOB for topics
    Producer view (isolated view same as for queues)
    > Producers can spread their load on all nodes
    > If existing nodes cannot handle load fast enough so that producers are slowed down, new nodes can be added
    > → production should be uniformly distributed over all nodes
    Consumer view (isolated view same as for queues)
    > Consumers can spread their greed on all nodes
    > If existing nodes cannot serve greed fast enough so that consumers idle around, new nodes can be added
    > → consumption should take place from all nodes
    Broker view
    > Number of incoming and outgoing messages per broker does not decrease
    > No load distribution, but load multiplication
      • If n1 wasn't able to cope with the load of topic messages, it won't be better with the additional node
      • In the above example: still 2 messages in and 2 messages out, now on every node
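The contrast between queues and topics in a NWOB comes down to simple arithmetic. The following is a deliberately simplified model, assuming uniformly distributed production and, for topics, subscribers on every node; the message count of 1000 is an arbitrary example value and the class is illustrative only.

```java
final class NwobLoadModel {
    /** Queue in a NWOB: each node handles 1/n of the messages. */
    static double queueMessagesPerNode(double totalMessages, int nodes) {
        requirePositive(nodes);
        return totalMessages / nodes;
    }

    /** Topic in a NWOB with subscribers on every node: each node still handles every message. */
    static double topicMessagesPerNode(double totalMessages, int nodes) {
        requirePositive(nodes);
        return totalMessages;
    }

    private static void requirePositive(int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
    }

    public static void main(String[] args) {
        // Four nodes: a queue drops to 25% per node, a topic stays at 100% per node.
        System.out.println(queueMessagesPerNode(1000, 4)); // prints 250.0
        System.out.println(topicMessagesPerNode(1000, 4)); // prints 1000.0
    }
}
```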
  24. JMS architecture in Vodafone 360

    Horizontal scalability – partitioning for topics
    [Diagram: producers p1, p2, p3, p4 and consumers c1, c2, c3, c4 against topic t1 on node n1, then against partitions t1.1 on n1 and t1.2 on n2, with cp.a and cp.b. Reduce load per node to 50%]
  25. JMS architecture in Vodafone 360

    Horizontal scalability – partitioning for topics
    Partitioning for topics for horizontal scalability and independent JMS nodes (no NWOB)
    > Find a way to divide the load of the topic into parts of about the same size
    > Avoid unnecessary overhead of NWOB
    Partitioning by number of messages
    > Easy approach: round robin production on topic partitions (producers p1, ..., p4 shown above)
    > Especially suitable when each consumer process really needs each message
    Partitioning by message property values: sharding
    > Sophisticated approach: find a message property whose values can be used to partition the set of messages
    > Can also reduce the total number of messages on the wire: if consumer processes are not interested in every message, but only in a certain set of partitions, they connect only to those nodes that serve those partitions
    > Needs a sharding-aware lib on producer and consumer side
    > Avoids the overhead of massive use of message selectors
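Both partitioning strategies boil down to a function that picks a partition name for each message. A minimal sketch of what the producer side of such a sharding-aware lib might look like; the class, the partition naming (`t1.1`, `t1.2`, modeled on the diagram labels) and the use of `String.hashCode` as the shard function are all assumptions, not the library used in the 360 backend.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class TopicPartitioner {
    private final String baseTopic;
    private final int partitions;
    private final AtomicInteger counter = new AtomicInteger();

    TopicPartitioner(String baseTopic, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        this.baseTopic = baseTopic;
        this.partitions = partitions;
    }

    /** Partitioning by number of messages: rotate over baseTopic.1 ... baseTopic.n. */
    String nextRoundRobin() {
        int index = Math.floorMod(counter.getAndIncrement(), partitions);
        return baseTopic + "." + (index + 1);
    }

    /** Sharding: the same property value always maps to the same partition. */
    String forProperty(String propertyValue) {
        int index = Math.floorMod(propertyValue.hashCode(), partitions);
        return baseTopic + "." + (index + 1);
    }

    public static void main(String[] args) {
        TopicPartitioner partitioner = new TopicPartitioner("t1", 2);
        System.out.println(partitioner.nextRoundRobin()); // prints t1.1
        System.out.println(partitioner.nextRoundRobin()); // prints t1.2
        System.out.println(partitioner.nextRoundRobin()); // prints t1.1
        // A consumer interested only in this user connects only to the node serving this partition.
        System.out.println(partitioner.forProperty("user-4711"));
    }
}
```

With round robin, a consumer has to subscribe to all partitions to see every message; with sharding it can restrict itself to the partitions that carry the property values it cares about.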
  26. JMS architecture in Vodafone 360

    Performance
    Performance as part of the architecture goals
    > Unfortunately contrary to reliability and redundancy
    > Fortunately supported by horizontal scalability
      – Faster production through more nodes to produce against
      – Faster consumption through more nodes serving the consumers' greed
      – Possibility to apply virtual topics to also speed up consumption on topics
    > Performance is also influenced by the persistence layer
  27. Experience / problems / best practice with ActiveMQ

    We can share a few experience-related things with you
    > Infamous bugs
    > Social aspects
    > Troubleshooting
    > Testing
  28. Experience / problems / best practice with ActiveMQ

    Infamous bugs and problems
    Multiplication of topic messages in a NWOB / frozen topics
    > Becomes obvious only in a larger NWOB (e.g. eight nodes)
    > Number of messages routed around becomes several orders of magnitude higher than the number of messages actually produced (even when multiple subscribers are taken into account)
    > In conjunction with a concurrency issue that prevents topic messages from being swapped into the temp storage, this can lead to “frozen” topics
    Looping thread in DefaultJDBCAdapter
    > Due to a concurrency issue, a thread can end up looping on a collection inside the DefaultJDBCAdapter implementation when there is a significant number of persistent messages
    > Unfortunately, the looping thread also blocks a monitor that all transport threads need to enter to do their work
    > Results in a practical paralysis of the broker: no production or consumption can take place anymore
  29. Experience / problems / best practice with ActiveMQ

    Social aspects
    It's all the fault of JMS!
    > When something is suddenly fupped, especially after a deployment, we all know the responsible entity:
      • either JMS
      • or the JMS guy
      • or both
    > Opportunity to make it easy on yourself and refuse to check your own code / config until the JMS guy provides all evidence that JMS is working fine
    > But maybe it's actually
      • The persistence layer
      • The network
      • Broken clients
      • Eyjafjallajökull
  30. Experience / problems / best practice with ActiveMQ

    Troubleshooting – Logs
    > Factory settings require some rework to get a decent setup
    > Helpful to observe what happens – but logs usually don't tell you what does not happen
    > Give basic hints on startup: whether the config is accessible or totally broken, whether other JMS nodes can be discovered and connected to
    > Good support for debugging message routing, especially with the logging interceptor enabled (unfortunately fills the HDD very soon)
    > Not very helpful for analyzing technical problems, semantic configuration problems or broken clients
  31. Experience / problems / best practice with ActiveMQ

    Troubleshooting – Logs
  32. Experience / problems / best practice with ActiveMQ

    Troubleshooting – JMX / jconsole
    > Essential tool that accumulates all JMX probes of ActiveMQ
    > Reveals mistakes that occur again and again everywhere (misconfiguration of the broker, broken clients, network issues):
      • Too many connections
      • No connections
      • Too many queue subscriptions
      • No subscriptions
      • Consumer blocked in message processing
      • Multicast not functioning properly (for discovery in a NWOB)
  33. Experience / problems / best practice with ActiveMQ

    Troubleshooting – JMX / jconsole
  34. Experience / problems / best practice with ActiveMQ

    Troubleshooting – JMX / jconsole
    > When brokers reside behind a firewall, configure both JMX ports to fixed values in the XML config:
      <managementContext>
        <managementContext createConnector="true" rmiServerPort="4711" connectorPort="4712" />
      </managementContext>
    > The above config makes the broker connectable for jconsole via this URI:
      service:jmx:rmi://host:4711/jndi/rmi://host:4712/jmxrmi
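The same URI also works from code, using the JDK's standard JMX remote API instead of jconsole. A sketch under assumptions: the host name is a placeholder, the ports are the ones from the config above, and `org.apache.activemq` is assumed as the JMX domain under which the broker registers its MBeans. The helper class itself is illustrative.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class BrokerJmxProbe {
    /** Builds the service URL from the two fixed ports of the managementContext config. */
    static JMXServiceURL serviceUrl(String host, int rmiServerPort, int connectorPort) {
        String url = "service:jmx:rmi://" + host + ":" + rmiServerPort
                + "/jndi/rmi://" + host + ":" + connectorPort + "/jmxrmi";
        try {
            return new JMXServiceURL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("invalid JMX service URL: " + url, e);
        }
    }

    /** Lists the names of all MBeans the broker registers under the ActiveMQ domain. */
    static Set<ObjectName> activeMqMBeans(JMXServiceURL url)
            throws IOException, MalformedObjectNameException {
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.queryNames(new ObjectName("org.apache.activemq:*"), null);
        }
    }

    public static void main(String[] args) throws Exception {
        // "localhost" is a placeholder; a broker must be running for the query to succeed.
        JMXServiceURL url = serviceUrl("localhost", 4711, 4712);
        System.out.println("connecting to " + url);
        activeMqMBeans(url).forEach(System.out::println);
    }
}
```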
  35. Experience / problems / best practice with ActiveMQ

    Troubleshooting – ActiveMQ web console
    > Webapp in embedded Jetty that presents a subset of the JMX probes
    > Reported to not be under regular development
    > Doesn't show all essential values
    > But provides a nice summary of the state of the destinations
  36. Experience / problems / best practice with ActiveMQ

    Troubleshooting – ActiveMQ web console
  37. Experience / problems / best practice with ActiveMQ

    Troubleshooting – jconsole vs. ActiveMQ web console
    > Combination of both needed
    > Recommended: a webapp that presents the JMX probes of all JMS nodes in an environment in one overview
      • That saves you from stepping through several windows of jconsole
      • Or several browser tabs of the web console
      • But includes all important values at a glance
  38. Experience / problems / best practice with ActiveMQ

    Troubleshooting – Thread dumps
    > Although in most cases no detectable deadlock occurs, a thread dump can immediately highlight what is going wrong, especially for concurrency issues
    > Run $> kill -3 <pid> several times, 20 – 30 seconds apart
    > Analyze with tda and find long-running threads, or monitors that are not released while everybody else is waiting for them
    > E.g. the issue with the DefaultJDBCAdapter is revealed by a thread dump
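What to look for in such a dump, a monitor held by one thread while many others wait for it, can also be sketched with the JDK's `ThreadMXBean`. Note the assumption: this inspects the JVM it runs in, so it is only an in-process illustration of the analysis, not a replacement for `kill -3` against a broker process. The class and method names are made up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

final class MonitorContention {
    /** Maps each lock-owning thread name to the number of threads currently waiting on it. */
    static Map<String, Integer> waitersPerOwner(ThreadInfo[] dump) {
        Map<String, Integer> waiters = new TreeMap<>();
        for (ThreadInfo info : dump) {
            if (info != null && info.getLockOwnerName() != null) {
                waiters.merge(info.getLockOwnerName(), 1, Integer::sum);
            }
        }
        return waiters;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Include locked monitors and ownable synchronizers, as a full thread dump does.
        ThreadInfo[] dump = threads.dumpAllThreads(true, true);
        // An owner with many waiters across repeated dumps is the suspect.
        System.out.println(waitersPerOwner(dump));
    }
}
```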
  39. Experience / problems / best practice with ActiveMQ

    Troubleshooting – thread dumps
  40. Experience / problems / best practice with ActiveMQ

    Troubleshooting – tcpdump
    > For certain cases where certain forces still need certain proof of whether JMS messages go over the wire or not
    > Although JMX and thread dumps usually reveal everything already
  41. Experience / problems / best practice with ActiveMQ

    Testing
    > Strongly advisable to run decent tests on the JMS infrastructure separately
    > Tests should cover both non-functional and pseudo-functional aspects:
      • Test throughput
      • Test reliability
      • Considering actual destination infrastructure
      • Considering estimated message load
      • With different scenarios regarding production and consumption rate
    > Expected output:
      • Hints on adjusting the broker config
      • Hints on adjusting the producer / consumer config
      • Hints on adjusting the persistence config
      • Hints on adjusting the network infrastructure
  42. Experience / problems / best practice with ActiveMQ

    Testing
    > Tools for testing:
      • FUSE's JMS test framework
      • Grinder with self-made client code
      • JUnit for simple tests
      • The troubleshooting tools mentioned above