are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. Please Note
for dozens of sub-corp. The workflow applications (e.g. finance, vacation, etc.) can grow beyond 64 GB in one year; meanwhile, customers were worried about the performance of searches and statistical reports in such big databases.
Performance in big data! NSF > 64G? Cross-NSF search?
Notes/Domino v1 We are improving all the time …
– DAOS
  • Saves up to 50% of space by “consolidating” attachments
  • Enabled via the database property “Use Domino Attachment and Object Service” for transaction-logged ODS 51 applications
– Compression
  • Can provide 40% size savings
  • Enabled via the database property setting
– ...
“How to overcome the limitation?”
Search is a powerful tool that allows you to search a Domino® domain, or across several Domino domains.
Site Search
– Search Site is a feature from a previous release of Notes® that your organization may use instead of Domain Search, if your administrator has not yet set up Domain Search.
But …
– Customers want to sort the results by any field
“Any ideas?”
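To make the "sort by any field" requirement concrete, here is a minimal sketch that sorts an already-retrieved result list by a field chosen at request time. It assumes each hit has been copied into a plain Map of field name to string value; the class name ResultSorter and that representation are illustrative assumptions, not part of Domino or of the original application.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: sorts search hits by an arbitrary field name. */
final class ResultSorter {
    /**
     * Returns a sorted copy of hits. Hits that lack the field are placed
     * last when ascending (and therefore first when descending).
     */
    static List<Map<String, String>> sortBy(List<Map<String, String>> hits,
                                            String field, boolean ascending) {
        Comparator<Map<String, String>> cmp = Comparator.comparing(
                (Map<String, String> hit) -> hit.get(field),
                Comparator.nullsLast(Comparator.<String>naturalOrder()));
        if (!ascending) {
            cmp = cmp.reversed();
        }
        List<Map<String, String>> sorted = new ArrayList<>(hits); // leave the input untouched
        sorted.sort(cmp);
        return sorted;
    }
}
```

Values are compared as strings here, so numeric or date fields would need their own comparator in a real application.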
users enter keywords in the text box of the web application
– The Domino server receives the request
– The Domino server uses view.FTSearch to retrieve the document collection
– Calculate the total number of pages based on the returned document collection
– Calculate the document list for the requested index (e.g. the nth page)
– Generate the HTML code for the response
  • Note: A form is used to render the document content, which consists of HTML code
– The Domino server sends the HTML code to the end user
PgUp/PgDn repeats the same steps as above!
“What's wrong with the app?”
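The two paging calculations in the steps above can be sketched as follows. This is a minimal illustration that assumes the search hits are already held in an in-memory list; SearchPaging and its method names are hypothetical and do not come from the Domino API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical paging helper for an already-retrieved hit list. */
final class SearchPaging {
    /** Total number of pages for totalDocs hits (ceiling division). */
    static int pageCount(int totalDocs, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (totalDocs <= 0) {
            return 0;
        }
        return (totalDocs + pageSize - 1) / pageSize;
    }

    /** Hits shown on the 1-based page pageNumber; empty if the page is out of range. */
    static <T> List<T> page(List<T> hits, int pageNumber, int pageSize) {
        int pages = pageCount(hits.size(), pageSize);
        if (pageNumber < 1 || pageNumber > pages) {
            return new ArrayList<>();
        }
        int from = (pageNumber - 1) * pageSize;
        int to = Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            hits.add("doc" + i);
        }
        System.out.println(pageCount(hits.size(), 10)); // 3 pages for 25 hits
        System.out.println(page(hits, 3, 10));          // the last, partial page
    }
}
```

Because the slicing is cheap, the expensive part of each PgUp/PgDn in the flow above is re-running the search and re-rendering the HTML, not the page arithmetic itself.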
– Take finance for example: customers would like to know the budget/cost/surplus for each project
  • The administrator sends the statistics request to the Domino server
  • The Domino server issues one request for each project
  • For each project, it calculates the budget/cost/surplus based on the returned document collection
  • It generates the HTML code for one table hosting all the results
  • The Domino server sends the response to the client
“What's wrong with the app?”

Project   Budget    Cost      Surplus
A         1000K$    900K$     100K$
B         1500K$    1550K$    -50K$
C         280K$     300K$     -20K$
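As a contrast to issuing one request per project, the sketch below computes the same budget/cost/surplus table in a single pass over all entries. It is only an illustration under assumptions: the Entry and Totals classes and their field names are hypothetical, amounts are in K$, and surplus is taken as budget minus cost, which matches the figures in the table above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-pass aggregation of finance entries by project. */
final class ProjectStats {
    /** One finance document; the field names are illustrative assumptions. */
    static final class Entry {
        final String project;
        final long budgetK;
        final long costK;

        Entry(String project, long budgetK, long costK) {
            this.project = project;
            this.budgetK = budgetK;
            this.costK = costK;
        }
    }

    /** Running totals for one project, in K$. */
    static final class Totals {
        long budgetK;
        long costK;

        long surplusK() {
            return budgetK - costK;
        }
    }

    /** Groups entries by project in one loop, keeping first-seen project order. */
    static Map<String, Totals> summarize(List<Entry> entries) {
        Map<String, Totals> byProject = new LinkedHashMap<>();
        for (Entry e : entries) {
            Totals t = byProject.computeIfAbsent(e.project, k -> new Totals());
            t.budgetK += e.budgetK;
            t.costK += e.costK;
        }
        return byProject;
    }
}
```

For example, entries ("A", 1000, 400) and ("A", 0, 500) sum to a budget of 1000K$, a cost of 900K$ and a surplus of 100K$, the first row of the table.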
NSF reaches the partition criteria, then one new NSF is created to store the incoming new requests.
Global Search: The end user enters a query to search documents in multiple NSFs of the workflow application.
Statistical Reports: The operations team administrator generates all the project reports with an excellent user experience (UI + performance).
My Requests: The end user navigates to “My Documents” (all of the logged-in user's requests) to view her/his documents.
NSF > 64G? Yes
Cross-NSF search? Yes
Good perf in big data? NSF Partition and multi-threads! XPages!
– By document status, e.g. New/In Progress, Completed, etc.
– By number of documents, e.g. about 500,000 documents
– By NSF size, e.g. about 20 GB
Normally implemented by agents
– e.g. run at 2:00 am daily
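A count- or size-based partition check such as the ones listed above could look like the sketch below. The class PartitionPolicy is hypothetical, the thresholds are simply the example values from the list, and the status-based criterion is not covered. In a real deployment a scheduled agent would supply the current document count and file size.

```java
/** Hypothetical rollover policy for deciding when to start a new NSF. */
final class PartitionPolicy {
    static final long GIB = 1024L * 1024L * 1024L;

    final long maxDocuments;
    final long maxSizeBytes;

    PartitionPolicy(long maxDocuments, long maxSizeBytes) {
        this.maxDocuments = maxDocuments;
        this.maxSizeBytes = maxSizeBytes;
    }

    /** Example thresholds only: about 500,000 documents or about 20 GB. */
    static PartitionPolicy exampleDefaults() {
        return new PartitionPolicy(500_000L, 20L * GIB);
    }

    /** True when the current NSF has reached either limit. */
    boolean shouldStartNewPartition(long documentCount, long sizeBytes) {
        return documentCount >= maxDocuments || sizeBytes >= maxSizeBytes;
    }
}
```

Checking both limits with a logical OR means whichever criterion is reached first triggers the new partition.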
perform full text index search against NSF Partitions
OpenNTF Hyper Search (Alternative)
– OpenNTF API: A new set of classes in the org.openntf.domino.big.impl package is intended for dealing with data sets across multiple NSFs
  • An index database is maintained to contain db documents and term documents
  • The index database can iterate over the directory and loop over all the documents
  • A term document contains the databases in which the term was found, along with the number of documents containing that term
[Figure: Thread Pool (threads 1 ... m, m<n), Access, Json objects]
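The thread pool idea (m worker threads serving a larger number of partition searches) can be sketched with the standard java.util.concurrent classes. This is a simplified illustration, not the presenters' implementation: the Partition interface and the in-memory stand-in are hypothetical, and a real Domino version would additionally have to manage Notes sessions per thread, which is not attempted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical cross-partition search using a fixed-size thread pool. */
final class GlobalSearch {
    /** Stand-in for one full-text-indexed NSF partition. */
    interface Partition {
        List<String> search(String query);
    }

    /** Runs one search task per partition on poolSize threads and merges the hits. */
    static List<String> searchAll(List<Partition> partitions, String query, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Partition p : partitions) {
                tasks.add(() -> p.search(query));
            }
            List<String> merged = new ArrayList<>();
            // invokeAll returns futures in task order, so the merged order is stable.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("partition search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** In-memory partition used only for demonstration: substring match on each document. */
    static Partition inMemory(String name, List<String> docs) {
        return query -> {
            List<String> hits = new ArrayList<>();
            for (String d : docs) {
                if (d.contains(query)) {
                    hits.add(name + ":" + d);
                }
            }
            return hits;
        };
    }
}
```

With three partitions and a pool size of 2, two searches run concurrently and the third starts as soon as a worker is free, which is the m<n situation in the figure.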
Customer's original environment
– All the application data is stored in one big NSF
– The NSF is FT indexed and view.FTSearch is used to search documents
Customer's new environment
– The big NSF is divided into 3 NSFs
– All the NSFs are FT indexed
– The XPages app and all the NSFs are located on one server
NSF search
– Poor performance for big data
Big data solution
– NSF Partition
– Multi-thread global search
– Statistical Reports
Why does the big data solution work?
– XPages makes it possible!
NSF > 64G! Cross-NSF search
Performance in big data? NSF cutting and multi-threads!
Apply XPages – transfer JSON, not HTML code
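To illustrate "transfer JSON, not HTML code", the sketch below serializes a list of search hits into a compact JSON array that a client page could render itself. It is a hand-rolled encoder written only to keep the example dependency-free; the class JsonResults and the map-based hit representation are assumptions, and a production XPages application would more likely use an existing JSON library.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical minimal JSON encoder for a list of flat string-valued hits. */
final class JsonResults {
    /** Quotes a string and escapes the characters JSON requires to be escaped. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Encodes each hit as a JSON object; a null value becomes the JSON literal null. */
    static String toJson(List<Map<String, String>> hits) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < hits.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('{');
            int fieldIndex = 0;
            for (Map.Entry<String, String> e : hits.get(i).entrySet()) {
                if (fieldIndex++ > 0) {
                    sb.append(',');
                }
                sb.append(quote(e.getKey())).append(':');
                sb.append(e.getValue() == null ? "null" : quote(e.getValue()));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }
}
```

The payload carries only field values, so paging or re-sorting on the client does not require the server to regenerate markup.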
Development Wiki for paper “AD203: Big Data in Domino? Yes”
– http://www-10.lotus.com/ldd/ddwiki.nsf
– It will be published within one week
Meet the developer time (Domino Database round table)
– Jan 29, 1:00 PM - 3:30 PM
– Jan 30, 10:00 AM - 11:30 PM
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. IBM, the IBM logo, ibm.com, IBM Domino, IBM Notes, IBM XPages are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml OpenNTF, StackOverflow and GitHub may be trademarks or service marks of others. Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Acknowledgements and Disclaimers