
Big Data in Domino? Yes!

liuqibj
January 29, 2014

IBM Connect AD203 session deck

Transcript

  1. Divider / Subchapter

  2. AD203: Big Data in Domino? Yes!

    Qi Liu, IBM
    © 2014 IBM Corporation
  3. Please Note

    IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

    Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  4. Agenda

    • Customer's Requirements
    • Big Data Solution
      – Big Data Topology
      – NSF Partition
      – Global Search
      – Statistical Reports
      – Performance Statistics
    • Design Pattern Extension
    • Summary
    • Q & A
  5. Agenda
  6. Customer's Requirements

    • Customers are planning to centralize data for dozens of subsidiary corporations.
    • The workflow applications (e.g. finance, vacation) can grow beyond 64 GB in one year; customers were also worried about the performance of searches and statistical reports against such large databases.
    • Performance with big data! NSF > 64 GB? Cross-NSF search?
  7. 64 GB Limitation Diagnosis

    • A long-standing issue, present since Notes/Domino v1
    • We are improving all the time …
      – DAOS
        • Saves up to 50% of space by “consolidating” attachments
        • Enabled via the database property “Use Domino Attachment and Object Service” for transaction-logged ODS 51 applications
      – Compression
        • Can provide 40% size savings
        • Enabled via the database property setting
      – ...
    • “How do we overcome the limitation?”
  8. Cross-NSF Search Diagnosis

    • Domain Search
      – Domain Search is a powerful tool that lets you search a Domino® domain, or across several Domino domains
    • Site Search
      – Site Search is a feature from a previous release of Notes® that your organization may use instead of Domain Search if your administrator has not yet set up Domain Search
    • But …
      – Customers want to sort the results by any field
    • “Any ideas?”
  9. Performance Diagnosis

    • How do customers search?
      – End users enter keywords in a text box of the web application
      – The Domino server receives the request
      – The Domino server uses view.FTSearch to retrieve a document collection
      – It calculates the total number of pages from the returned document collection
      – It calculates the document list for the requested page (e.g. the nth page)
      – It generates the HTML for the response
        • Note: a Form is used to render the document content as HTML
      – The Domino server sends the HTML to the end user
    • Every PgUp/PgDn repeats all of the steps above!
    • “What's wrong with the app?”
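Slide 9 notes that every PgUp/PgDn re-runs the full-text search and re-renders the page. As a minimal sketch, assuming the hits are fetched once and cached (for example in a view-scoped bean), the page count and the nth page can be computed from the cached list without querying the server again. `PageSlicer` is a hypothetical helper, not part of the original application:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: pages through results that were fetched once and cached,
// instead of re-running FTSearch for every PgUp/PgDn.
final class PageSlicer<T> {
    private final List<T> cachedResults;
    private final int pageSize;

    PageSlicer(List<T> cachedResults, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.cachedResults = new ArrayList<>(cachedResults);
        this.pageSize = pageSize;
    }

    // Total number of pages: integer ceiling of resultCount / pageSize
    int totalPages() {
        return (cachedResults.size() + pageSize - 1) / pageSize;
    }

    // Returns the nth page (1-based); empty if n is out of range
    List<T> page(int n) {
        if (n < 1 || n > totalPages()) {
            return Collections.emptyList();
        }
        int from = (n - 1) * pageSize;
        int to = Math.min(from + pageSize, cachedResults.size());
        return Collections.unmodifiableList(cachedResults.subList(from, to));
    }
}
```

For example, 45 cached hits with 20 rows per page give 3 pages, and page 3 holds the last 5 hits. The expression `(size + pageSize - 1) / pageSize` is a ceiling division that avoids floating point.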
  10. Performance Diagnosis (Cont.)

    • How do customers produce statistical reports?
      – Take finance as an example: customers want to know the budget, cost, and surplus of each project
        • An administrator sends the statistics request to the Domino server
        • The Domino server issues one request for each project
        • For each project, it calculates the budget/cost/surplus from the returned document collection
        • It generates the HTML for one table that holds all the results
        • The Domino server sends the response to the client
    • “What's wrong with the app?”

      Project   Budget    Cost      Surplus
      A         1000K$    900K$     100K$
      B         1500K$    1550K$    -50K$
      C         280K$     300K$     -20K$
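The per-project request loop on slide 10 can instead be done in a single pass that groups documents by project. The sketch below assumes each finance document has already been read into a hypothetical `FinanceEntry` (project id plus budget and cost amounts in K$); surplus is simply budget minus cost, which reproduces the slide's figures (e.g. project B: 1500 - 1550 = -50).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of one finance document: project id plus budget/cost amounts (in K$).
final class FinanceEntry {
    final String project;
    final long budget;
    final long cost;

    FinanceEntry(String project, long budget, long cost) {
        this.project = project;
        this.budget = budget;
        this.cost = cost;
    }
}

// Running totals for one project; a negative surplus means the project is over budget.
final class ProjectTotals {
    long budget;
    long cost;

    long surplus() {
        return budget - cost;
    }
}

final class FinanceReport {
    // One pass over all entries (e.g. gathered from every NSF partition)
    // instead of one server request per project; keeps first-seen project order.
    static Map<String, ProjectTotals> aggregate(List<FinanceEntry> entries) {
        Map<String, ProjectTotals> totals = new LinkedHashMap<>();
        for (FinanceEntry e : entries) {
            ProjectTotals t = totals.computeIfAbsent(e.project, k -> new ProjectTotals());
            t.budget += e.budget;
            t.cost += e.cost;
        }
        return totals;
    }
}
```

The resulting map can then be rendered as a single table (or sent to the page as JSON) in one response.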
  11. Agenda
  12. Big Data Topology

    [Diagram: Big Data topology. Labels shown: Thread Pool (threads 1 ... m, m < n), NSF (1..n), Json objects, Access, Search Bean, Statistics Bean, Base Bean, Database Bean, Data Bean, Common Lib, Others, two “Consists of” relationships, and step numbers 1–5.]
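The topology shows a thread pool of m threads serving n NSF partitions (m < n). Below is a minimal sketch of that idea with `java.util.concurrent`, assuming a caller-supplied `searchOne` function that runs the full-text search against one partition (hypothetical here; in Domino it might wrap `Database.FTSearch`). Results are merged in partition order. Domino back-end objects generally should not be shared across threads, so a real worker would open its own session and database handle; that part is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PartitionSearcher {
    // Runs searchOne against every partition on a fixed pool of `threads` workers
    // (threads may be smaller than the number of partitions), then merges the hits.
    static <R> List<R> searchAll(List<String> partitions, int threads,
                                 Function<String, List<R>> searchOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (String partition : partitions) {
                Callable<List<R>> task = () -> searchOne.apply(partition);
                futures.add(pool.submit(task));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                try {
                    merged.addAll(future.get()); // blocks until this partition is done
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Partition search failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for results", e);
                }
            }
            return merged;
        } finally {
            // Stops the idle workers; after a failure it also interrupts tasks still running
            pool.shutdownNow();
        }
    }
}
```

A bounded pool keeps the number of concurrent full-text searches fixed no matter how many partitions the application profile lists.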
  13. Referenced Applications on OpenNTF

    • XPages Insights into Big Data
      – http://www.openntf.org/Internal/home.nsf/project.xsp?databaseName=CN=NotesOSS2/O=NotesOSS!!Projects%5Cpmt.nsf&documentId=404CBFCE4205B13586257B5000210CE1&action=openDocument
    • OpenNTF Hyper Search
      – https://github.com/OpenNTF/org.openntf.domino/wiki/OpenNTF-Hyper-Search
  14. Use Cases and Solution

    • NSF Partition: when the application NSF reaches the partition criteria, a new NSF is created to store incoming requests.
    • Global Search: end users enter a query to search documents across the multiple NSFs of a workflow application.
    • Statistical Reports: an operations-team administrator generates reports for all projects with a good user experience (UI and performance).
    • My Requests: end users navigate to “My Documents” (all requests of the logged-in user) to view their own documents.
    • NSF > 64 GB? Yes. Cross-NSF search? Yes. Good performance with big data? NSF partitioning and multi-threading! XPages!
  15. NSF Partition

    • A daily agent, “NSF Cutting”, checks whether a new NSF should be created to store incoming requests
  16. NSF Partition (Cont.)

    • Golden rules for workflow applications
      – By document status, e.g. New/In Progress, Completed, etc.
      – By document count, e.g. about 500,000 documents
      – By NSF size, e.g. about 20 GB
    • Normally implemented by agents
      – e.g. run daily at 2:00 AM
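The size and document-count rules from slide 16 reduce to a simple predicate. This is a hypothetical sketch for the daily partition agent: the thresholds mirror the slide's examples, 20 GB is taken as binary gigabytes, and the inputs are assumed to come from the Domino API (e.g. `Database.getSize()` and `Database.getAllDocuments().getCount()`), which is not shown. The status-based rule is left out.

```java
// Hypothetical partition check for the daily "NSF Cutting" agent.
// Real thresholds would be read from the application profile.
final class PartitionCriteria {
    static final long GB = 1024L * 1024L * 1024L;

    final long maxDocuments;
    final long maxSizeBytes;

    PartitionCriteria(long maxDocuments, long maxSizeBytes) {
        this.maxDocuments = maxDocuments;
        this.maxSizeBytes = maxSizeBytes;
    }

    // Example thresholds from the slide: about 500,000 documents or about 20 GB
    static PartitionCriteria slideDefaults() {
        return new PartitionCriteria(500_000L, 20L * GB);
    }

    // True when the current NSF has reached either limit and a new NSF should be created
    boolean shouldCut(long documentCount, long sizeBytes) {
        return documentCount >= maxDocuments || sizeBytes >= maxSizeBytes;
    }
}
```

Keeping both limits well below 64 GB leaves headroom for growth between two agent runs.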
  17. Application Profile

    • Each application has one profile, which lists:
      – Title
      – Template name
      – NSF list
      – Current NSF
      – Partition criteria
      – ...
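A hypothetical in-memory model of the profile, assuming that a partition simply appends the new NSF to the NSF list and makes it the current NSF: new requests are written to the current NSF, while searches and reports still span the whole list. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory view of an application profile document.
final class ApplicationProfile {
    final String title;
    final String templateName;
    private final List<String> nsfList = new ArrayList<>();
    private String currentNsf;

    ApplicationProfile(String title, String templateName, String firstNsf) {
        this.title = title;
        this.templateName = templateName;
        this.nsfList.add(firstNsf);
        this.currentNsf = firstNsf;
    }

    // Called after the partition agent creates a new NSF from templateName
    void rollOver(String newNsf) {
        if (nsfList.contains(newNsf)) {
            throw new IllegalArgumentException("NSF already registered: " + newNsf);
        }
        nsfList.add(newNsf);
        currentNsf = newNsf;
    }

    // Target for newly submitted requests
    String currentNsf() {
        return currentNsf;
    }

    // All partitions, oldest first; used by global search and statistical reports
    List<String> nsfList() {
        return Collections.unmodifiableList(nsfList);
    }
}
```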
  18. Demo: NSF Partition

  19. Global Search User Interface

  20. Back-end Logic

    • Managed beans
      – Base Bean: provides debugging configuration
      – Data Bean: provides multi-NSF handling and utility methods for views, etc.
      – Database Bean: retrieves the NSF sources from the application profile document
      – Search Bean: advanced search functionality
      – Statistics Bean: statistical analysis
    • Common interface
    • etc.
  21. Search Logic Choices

    • Multi-threaded search
      – Multiple threads perform full-text index searches against the NSF partitions
    • OpenNTF Hyper Search (alternative)
      – OpenNTF API: a new set of classes in the org.openntf.domino.big.impl package is intended for dealing with data sets across multiple NSFs
        • An index database is maintained that contains database documents and term documents
        • The index database can iterate over the directory and loop over all the documents
        • A term document contains the databases in which the term was found, along with the number of documents containing that term
    [Diagram: thread pool of threads 1 ... m (m < n); labels shown: Json objects, Access, step numbers 1–5.]
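The Hyper Search description amounts to an inverted index from terms to databases, with a per-database document count. The sketch below is a simplified in-memory model of that idea, not the `org.openntf.domino.big.impl` API; it shows how such an index can narrow a query to the partitions that actually contain a term. `TermIndex` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified, hypothetical term index (illustration only, not the OpenNTF API).
final class TermIndex {
    // term -> (database path -> number of documents in that database containing the term)
    private final Map<String, Map<String, Integer>> termToDbCounts = new HashMap<>();

    private static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Call once per (document, term) pair while indexing a database
    void addOccurrence(String term, String dbPath) {
        termToDbCounts
                .computeIfAbsent(normalize(term), t -> new HashMap<>())
                .merge(dbPath, 1, Integer::sum);
    }

    private Map<String, Integer> countsFor(String term) {
        return termToDbCounts.getOrDefault(normalize(term), Collections.emptyMap());
    }

    // Databases that contain the term, most matching documents first (ties by path)
    List<String> databasesFor(String term) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(countsFor(term).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            result.add(e.getKey());
        }
        return result;
    }

    // Total number of documents containing the term across all databases
    int documentCount(String term) {
        int total = 0;
        for (int c : countsFor(term).values()) {
            total += c;
        }
        return total;
    }
}
```

The trade-off versus plain multi-threaded search is that the index must be kept up to date, but a query can skip partitions that have no matches.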
  22. Demo: Global Search & My Requests

  23. Statistical Reports User Interface

  24. Demo: Statistical Reports

  25. Performance Statistics

    Customer's environment (Configuration / FT Search / XPages)
    • Customer's original environment
      – All application data is stored in one big NSF
      – The NSF is full-text indexed, and view.FTSearch is used to search documents
    • Customer's new environment
      – The big NSF is divided into 3 NSFs
      – All the NSFs are full-text indexed
      – The XPages application and all the NSFs are located on one server
  26. Performance Statistics (Cont.)

  27. Agenda
  28. Design Pattern Extension

    [Diagram: labels shown: Application Profile, Partition Agent, XPages UI, Back-end Classes, Cluster, NSF Databases, Load Balance, SCR, with “...” marking further elements.]
  29. Summary

    • Customer's requirements
      – 64 GB limitation
      – Cross-NSF search
      – Poor performance with big data
    • Big data solution
      – NSF Partition
      – Multi-threaded global search
      – Statistical reports
    • Why does the big data solution work?
      – XPages make it possible!
    • NSF > 64 GB! Cross-NSF search! Performance with big data? NSF cutting and multi-threading! Apply XPages: transfer JSON, not HTML code
  30. More Details?

    • Visit the IBM Notes and Domino Application Development Wiki for the paper “AD203: Big Data in Domino? Yes!”
      – http://www-10.lotus.com/ldd/ddwiki.nsf
      – It will be published within one week
    • Meet the Developer time (Domino Database round table)
      – Jan 29, 1:00 PM – 3:30 PM
      – Jan 30, 10:00 AM – 11:30 AM
  31. Engage Online

    • SocialBiz User Group socialbizug.org
      – Join the epicenter of Notes and Collaboration user groups
    • Follow us on Twitter
      – @IBMConnect and @IBMSocialBiz
    • LinkedIn http://bit.ly/SBComm
      – Participate in the IBM Social Business group on LinkedIn
    • Facebook https://www.facebook.com/IBMSocialBiz
      – Like IBM Social Business on Facebook
    • Social Business Insights blog ibm.com/blogs/socialbusiness
      – Read and engage with our bloggers
  32. Access Connect Online to complete your session surveys using any:

      – Web or mobile browser
      – Connect Online kiosk onsite
  33. Acknowledgements and Disclaimers

    © Copyright IBM Corporation 2014. All rights reserved.

    U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

    IBM, the IBM logo, ibm.com, IBM Domino, IBM Notes, IBM XPages are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml. Openntf, StackOverflow and GitHub may be trademarks or service marks of others.

    Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.

    The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

    All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
  34. Backup

  35. NSF Partition Agent

  36. Search Bean: Threads Initialization/Start/Wait

      public void searchByFTDB(DataBean dataProvider, int maxDocs, boolean doFuzzySearch, ...) {
          // Create one search thread per NSF partition
          for (int i = 0; i < dataProvider.databases.size(); i++) {
              searchThreads.put(dataProvider.databases.get(i),
                      new SearchThreadFTDB(this, dataProvider, dataProvider.databases.get(i),
                              searchCriteria.toString(), doFuzzySearch, maxDocsPortion));
          }
          // Start the search threads
          for (Map.Entry<String, Thread> searchThreadEntry : searchThreads.entrySet()) {
              SearchThreadFTDB searchThread = (SearchThreadFTDB) searchThreadEntry.getValue();
              searchThread.start();
          }
          // Wait until every thread completes; this assumes each search thread
          // calls notifyAll() on this bean once it becomes ready
          for (Map.Entry<String, Thread> searchThreadEntry : searchThreads.entrySet()) {
              SearchThreadFTDB searchThread = (SearchThreadFTDB) searchThreadEntry.getValue();
              synchronized (this) {
                  while (!searchThread.isReady()) {
                      try {
                          this.wait();
                      } catch (InterruptedException e) {
                          // Restore the interrupt flag instead of silently swallowing it
                          Thread.currentThread().interrupt();
                          return;
                      }
                  }
              }
          }
          ...
  37. Search Bean: NSF Search Logic

      public SearchThreadFTDB(final SearchBean owner, final DataBean dataProvider,
              final String dbKey, final String searchCriteria, ...) {
          Database database = dataProvider.getDatabase(dbKey, session);
          DocumentCollection collection;
          // FT_SCORES sorts hits by relevance; FT_FUZZY or FT_STEMS controls matching
          if (doFuzzySearch) {
              collection = database.FTSearch(searchCriteria, maxDocs,
                      Database.FT_SCORES, Database.FT_FUZZY);
          } else {
              collection = database.FTSearch(searchCriteria, maxDocs,
                      Database.FT_SCORES, Database.FT_STEMS);
          }
          // Guard against a null collection before calling getCount()
          int count = (collection == null) ? 0 : collection.getCount();
          if (count > 0) {
              // Several search threads update the shared total, so lock on the owning bean
              synchronized (owner) {
                  resultCount += count;
              }
              ...
          }
  38. Search Bean: JSON Objects

  39. Database Bean

  40. Beans Registration

  41. Beans Registration (Cont.)

  42. XPages User Interface: Search Bean Usage

      <?xml version="1.0" encoding="UTF-8"?>
      <xp:view xmlns:xp="http://www.ibm.com/xsp/core"
          xmlns:xc="http://www.ibm.com/xsp/custom"
          xmlns:xe="http://www.ibm.com/xsp/coreex">
          <xp:this.beforePageLoad>
              <xp:executeScript>
                  <xp:this.script><![CDATA[#{javascript:databaseBean.init(session);
      searchBean.setViewName("requests");
      searchBean.setQuery(userBean.displayName);
      searchBean.setField("requester");
      searchBean.searchByFTVW(databaseBean)}]]></xp:this.script>
              </xp:executeScript>
          </xp:this.beforePageLoad>
          <xc:dataviewRender></xc:dataviewRender>
      </xp:view>
  43. XPages User Interface: Search Results Rendering

      <xe:dataView id="resultDataView" var="searchResult"
          collapsibleRows="false" collapsibleDetail="true" columnTitles="false"
          rows="20" partialRefresh="true" partialExecute="true" style="width:100%"
          rowStyleClass="xspDataViewRow" value="#{searchBean.searchResults}"
          detailsOnClient="true">
          <xe:this.extraColumns>
              <xe:viewExtraColumn value="#{javascript:searchResult.projectId}">
              </xe:viewExtraColumn>
              <xe:viewExtraColumn value="#{javascript:searchResult.cost}">
              </xe:viewExtraColumn>
              <xe:viewExtraColumn value="#{javascript:searchResult.author}">
              </xe:viewExtraColumn>
          </xe:this.extraColumns>
          <xp:this.facets>
              <xp:div xp:key="pagerTopLeft">
                  <xp:comboBox id="sortKey">
                      <xp:selectItem itemLabel="Subject" itemValue="subject"></xp:selectItem>
                      <xp:selectItem itemLabel="Project Id" itemValue="projectId">
                      </xp:selectItem>
                      <xp:selectItem itemLabel="Cost" itemValue="cost">
                      </xp:selectItem>
                      <xp:eventHandler event="onchange" submit="true" refreshMode="complete">
                          <xp:this.action>
                          ...
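The `sortKey` combo box on slide 43 lets users re-sort results by subject, project id or cost, which addresses the requirement to sort by any field. Below is a hypothetical server-side counterpart, assuming each result is held as a JSON-like `Map<String, Object>` keyed by field name: numbers compare numerically, other values case-insensitively by their text, and missing values sort last in ascending order. `ResultSorter` is illustrative, not part of the original beans.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sort helper for cached search results held as field-name -> value maps.
final class ResultSorter {
    // Numbers compare numerically, other values by case-insensitive text;
    // a null (missing) value sorts after any present value
    private static int compareValues(Object a, Object b) {
        if (a == null || b == null) {
            if (a == b) {
                return 0;
            }
            return (a == null) ? 1 : -1;
        }
        if (a instanceof Number && b instanceof Number) {
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        return a.toString().compareToIgnoreCase(b.toString());
    }

    // Returns a sorted copy, leaving the cached result list untouched
    static List<Map<String, Object>> sortBy(List<Map<String, Object>> results,
                                            String key, boolean ascending) {
        Comparator<Map<String, Object>> byKey =
                (r1, r2) -> compareValues(r1.get(key), r2.get(key));
        List<Map<String, Object>> sorted = new ArrayList<>(results);
        sorted.sort(ascending ? byKey : byKey.reversed());
        return sorted;
    }
}
```

Sorting the cached list in memory avoids another round of full-text searches across all partitions when only the order changes.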