Oracle Machine Learning Notebooks - Deep Dive Part II

This is Part II of the deeper dive into Oracle Machine Learning Notebooks. In Part I we learned about the administrative and collaborative functionality of OML Notebooks, the Zeppelin-based interface available for Oracle Autonomous Database.

In Part II we will look at more specific tasks: scheduling notebooks for execution, the output formats available in notebooks, using forms inside paragraphs, and lesser-known OML4SQL features such as text mining and partitioned models.

We will finish the session with a look at the roadmap.

Mark Hornick, Senior Director of Product Management for Data Science and Machine Learning will join us and present.

Marcos Arancibia

April 07, 2020

Transcript

  1. Oracle Machine Learning Office Hours: Oracle Machine Learning Notebooks, Part 2

    With Mark Hornick, Senior Director, Product Management, Data Science and Machine Learning (@MarkHornick) and Marcos Arancibia, Product Manager, Data Science and Big Data (@MarcosArancibia)
    oracle.com/machine-learning
    Copyright © 2020, Oracle and/or its affiliates. All rights reserved
  2. Today’s Agenda

    • Upcoming session
    • Speaker: Mark Hornick – OML Notebooks Deep Dive – Part 2
    • Q&A
  3. Next Session

    May 5, 2020: Oracle Machine Learning Office Hours, 9AM US Pacific
    Machine Learning 101 – Classification
    Have you always been curious about what machine learning can do for your business problem, but could never find the time to learn the necessary practical skills? Do you wish to learn what Classification, Regression, Clustering and Feature Extraction techniques do, and how to apply them using the Oracle Machine Learning family of products? Join us for this special series “Oracle Machine Learning Office Hours – Machine Learning 101”, where we will go through the main steps of solving a business problem from beginning to end, using the different components available in Oracle Machine Learning: programming languages and interfaces, including Notebooks with SQL, UI, and languages like R and Python. This first session in the series will cover Classification, where we will learn how to set up a data set for classification modeling, build machine learning models that can, e.g., discern between good and bad customers for a marketing offer, and evaluate the quality of those models.
    Marcos Arancibia, OML Product Management
  4. Today’s Session: Oracle Machine Learning Notebooks, Part 2

    Take a deeper dive into Oracle Machine Learning Notebooks through the Zeppelin-based interface and additional OML4SQL API functionality.
    We will review demos and code, see what's possible today, and hear about what's coming on Oracle Machine Learning Notebooks for Autonomous Database.
  5. Oracle Machine Learning Notebooks Deep Dive – Part 2

    Mark Hornick, Oracle Machine Learning Product Management
    Oracle Machine Learning Office Hours
  6. Safe harbor statement

    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
  7. OML Notebooks Topics

    • Quick Orientation Demo
    • Objects and Relationships
    • Projects and Workspaces
    • Notebook Basics
    • Collaboration
    • Connection Groups and Interpreter Bindings
    • Scheduling Jobs for Notebook Execution
    • Output Formats in Notebooks
    • Input Forms in Notebooks
    • Lesser-known OML4SQL Features
    • OML Roadmap on ADB
    See Part 1, presented last month: Oracle Machine Learning Notebooks - Deep Dive
  8. OML Notebooks Topics

    • Quick Orientation Demo
    • Objects and Relationships
    • Projects and Workspaces
    • Notebook Basics
    • Collaboration
    • Connection Groups and Interpreter Bindings
    • Scheduling Jobs for Notebook Execution
    • Output Formats in Notebooks
    • Input Forms in Notebooks
    • Lesser-known OML4SQL Features
    • OML Roadmap on ADB
    Today: Part 2 of the OML Notebooks Deep Dive
  9. Oracle Machine Learning Notebooks

    Autonomous Database as a Data Science Platform
    Collaborative UI
    • Based on Apache Zeppelin
    • Supports data scientists, data analysts, application developers, DBAs
    • Easy sharing of notebooks and templates
    • Permissions, versioning, and execution scheduling
    Included with Autonomous Database
    • Automatically provisioned, managed, backed up
    • In-database SQL algorithms and analytics functions
    • Explore and prepare data, build and evaluate models, score data, deploy solutions
    • SQL today, soon to be augmented with Python and R
  10. Jobs use cases

    Under what circumstances would we want to schedule notebook execution?
    • Execute a long-running notebook at off-peak hours, perhaps 2:00 AM
    • Rebuild one or more models at periodic intervals, e.g., nightly or weekly, on the latest data
    • Perform batch scoring of latest data at periodic intervals, e.g., nightly
    • Generate notebook “report” with latest analytics results, including tables and graphs
  11. Jobs

    Jobs allow you to schedule the running of notebooks
    Operations on Jobs
    • Edit – Edit job metadata of any job listed in the Jobs page
    • Create – Create a new job to schedule your Notebook
    • Duplicate – Create a copy of an existing job listed in the Jobs page
    • Stop – Terminate a job that is currently running
    • Start – Enabled only for jobs that are in Scheduled status
    • Delete – Delete any job listed on the Jobs page
    Example: Project Lead schedules Clustering notebook to run daily, but clicks Start to also run it immediately
  12. Jobs Advanced Settings

    Maximum Number of Runs
    • Maximum number of times the job will be run given repeat frequency
    • If notebook times out or exceeds max runs allowed, status is set to COMPLETED
    Maximum Failures Allowed
    • Maximum number of times a job can fail on consecutive scheduled runs
    • If notebook execution fails more than max failures allowed, status is set to BROKEN
    Timeout in Minutes
    • Maximum amount of time a job should be allowed to run, otherwise it is stopped
    • If notebook execution exceeds specified timeout, status is set to STOPPED
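The three advanced settings above interact, so a toy model can help when reasoning about what status a job will end up in. The sketch below is an illustration only, not Oracle's scheduler: `JobSettings` and `final_status` are hypothetical names, a run counts toward the maximum even when it fails, and a timed-out run is treated as STOPPED (following the Timeout bullet rather than the Maximum Number of Runs bullet, which the slide words differently).

```python
from dataclasses import dataclass


@dataclass
class JobSettings:
    """Hypothetical container for the three advanced settings on the slide."""
    max_runs: int           # "Maximum Number of Runs"
    max_failures: int       # "Maximum Failures Allowed"
    timeout_minutes: float  # "Timeout in Minutes"


def final_status(settings, runs):
    """Return the job status after a sequence of scheduled runs.

    `runs` is a list of (succeeded, duration_minutes) tuples in schedule order.
    Assumed rules, paraphrased from the slide:
      - a run longer than the timeout stops the job        -> STOPPED
      - more consecutive failures than allowed              -> BROKEN
      - the maximum number of runs has been used up         -> COMPLETED
      - otherwise the job is still waiting for its next run -> SCHEDULED
    """
    consecutive_failures = 0
    runs_done = 0
    for succeeded, minutes in runs:
        if minutes > settings.timeout_minutes:
            return "STOPPED"
        runs_done += 1
        if succeeded:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures > settings.max_failures:
                return "BROKEN"
        if runs_done >= settings.max_runs:
            return "COMPLETED"
    return "SCHEDULED"


settings = JobSettings(max_runs=3, max_failures=1, timeout_minutes=60)
print(final_status(settings, [(True, 5), (True, 5), (True, 5)]))
print(final_status(settings, [(False, 5), (False, 5)]))
```

With these assumptions, a success between two failures resets the failure counter, which matches the slide's wording about failures "on consecutive scheduled runs".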
  13. Job Logs

    Historical logs of jobs in current user’s Job Log interface
    On Jobs page, click on name of job to view
    On the Job Log page for the selected job
    • View Date of execution, Status (success, running, stopped, failed), and Duration
    • To view notebook execution results from job history – select the row and click view icon
    • To delete a job log entry – select it and click delete icon
  14. Set Output Format in Notebooks

    Use SQLFORMAT to format query output:
        %script
        SET SQLFORMAT format_option
    Project Lead created some examples in the Project Examples project in the DevTest workspace. For example:
        SET SQLFORMAT ansiconsole;
        SELECT * FROM SH.SALES;
  15. Set Output Format in Notebooks

    The available output formats are:
    • ANSICONSOLE: resizes the columns to the width of the data to save space. It also underlines the column headings instead of using a separate line of output
    • CSV: standard comma-separated variable output, with string values enclosed in double quotes
    • DELIMITED: manually define the delimiter string and the characters that enclose string values
    • FIXED: fixed-width columns with all data enclosed in double quotes
    • HTML: HTML for a responsive table. The content of the table changes dynamically to match the search string entered in the text field
    • INSERT: the INSERT statements that could be used to recreate the rows in a table
    • JSON: a JSON document containing the definitions of the columns along with the data that it contains
    • LOADER: pipe-delimited output with string values enclosed in double quotes; column names are not included in the output
    • XML: a tag-based XML document. All data is presented as CDATA tags
    • DEFAULT: clears all previous SQLFORMAT settings and returns to the default output; alternatively, specify SET SQLFORMAT on its own
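To make the difference between two of the formats above concrete, here is a minimal sketch that renders the same rows the way the slide describes CSV (comma-separated, strings in double quotes, header row) and LOADER (pipe-delimited, strings in double quotes, no column names). It is an approximation built on Python's `csv` module, not the notebook's own formatter; `format_rows`, the column names, and the sample rows are made up for illustration.

```python
import csv
import io


def format_rows(columns, rows, fmt):
    """Render rows roughly as the slide describes the CSV and LOADER formats."""
    buf = io.StringIO()
    if fmt == "csv":
        # Comma-separated; every non-numeric value (including headers) is quoted.
        writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
    elif fmt == "loader":
        # Pipe-delimited, quoted strings, and no column-name row.
        writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_NONNUMERIC,
                            lineterminator="\n")
        writer.writerows(rows)
    else:
        raise ValueError(f"unsupported format in this sketch: {fmt}")
    return buf.getvalue()


columns = ["PROD_ID", "CHANNEL"]        # hypothetical columns
rows = [(13, "Direct"), (14, "Web")]    # hypothetical data

print(format_rows(columns, rows, "csv"))
print(format_rows(columns, rows, "loader"))
```

The real output may differ in details such as trailing delimiters or date formatting; the point is only that switching SQLFORMAT changes the serialization, not the query.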
  16. Create Text Input Forms in Notebooks

    Request user input for a value with '${formName}', or with a default value using '${formName=defaultValue}':
        SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = '${OBJ}';
        SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = '${OBJ=TABLE}'
  17. Create Select Forms in Notebooks

    Define the Select form by using the syntax: '${formName=defaultValue,option1|option2...}'
        SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = '${OBJ=INDEX,INDEX|TABLE|VIEW|SYNONYM}'
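The text input and select form syntaxes are plain text templates: the value the user enters or picks replaces the `${...}` placeholder before the SQL runs. The sketch below mimics that substitution so the syntax is easier to read. It is a simplified stand-in written for this transcript, not the notebook's actual parser; `render` and the validation against the option list are assumptions.

```python
import re

# Matches ${name}, ${name=default} and ${name=default,opt1|opt2}.
# Names containing ":" (such as checkbox forms) are deliberately not handled here.
FORM = re.compile(r"\$\{(?P<name>[^=}:]+)(?:=(?P<spec>[^}]*))?\}")


def render(template, values=None):
    """Replace each form placeholder with the supplied value or its default."""
    values = values or {}

    def substitute(match):
        name = match.group("name")
        spec = match.group("spec") or ""
        # Everything before the first comma is the default; the rest lists options.
        default, _, option_text = spec.partition(",")
        options = option_text.split("|") if option_text else []
        chosen = values.get(name, default)
        if options and chosen not in options:
            raise ValueError(f"{chosen!r} is not one of {options}")
        return chosen

    return FORM.sub(substitute, template)


text_form = "SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = '${OBJ=TABLE}'"
select_form = ("SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = "
               "'${OBJ=INDEX,INDEX|TABLE|VIEW|SYNONYM}'")

print(render(text_form))                     # uses the default, TABLE
print(render(select_form, {"OBJ": "VIEW"}))  # uses the selected option
```

Because the value is pasted into the statement as text, the surrounding quotes in the SQL have to be written by the notebook author, as in the slide examples.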
  18. Create Check Box Forms in Notebooks

    Syntax: '${checkbox:formName=defaultValue1|defaultValue2...,option1|option2...}'
        SELECT ${checkbox:Which Columns=OWNER|OBJECT_TYPE, OWNER|OBJECT_NAME|OBJECT_TYPE|STATUS} FROM ALL_OBJECTS
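A check box form differs from the other forms in that several options can be selected at once, and the selected values are joined into one string before substitution. The following sketch shows that idea for the `Which Columns` example. It is an illustration under assumptions: `render_checkbox` is a hypothetical helper, the comma delimiter is assumed to be the default, and keeping the selected values in option order is a choice made here, not documented behavior.

```python
import re

# Matches ${checkbox:name=default1|default2,option1|option2}.
CHECKBOX = re.compile(
    r"\$\{checkbox:(?P<name>[^=}]+)=(?P<defaults>[^,}]*),(?P<options>[^}]*)\}"
)


def render_checkbox(template, checked=None, delimiter=","):
    """Replace each checkbox form with its selected options, joined by `delimiter`."""
    checked = checked or {}

    def substitute(match):
        options = [o.strip() for o in match.group("options").split("|")]
        defaults = [d.strip() for d in match.group("defaults").split("|") if d.strip()]
        selected = checked.get(match.group("name"), defaults)
        unknown = [s for s in selected if s not in options]
        if unknown:
            raise ValueError(f"not valid options: {unknown}")
        # Emit the selected values in the order the options were declared.
        return delimiter.join(o for o in options if o in selected)

    return CHECKBOX.sub(substitute, template)


query = ("SELECT ${checkbox:Which Columns=OWNER|OBJECT_TYPE, "
         "OWNER|OBJECT_NAME|OBJECT_TYPE|STATUS} FROM ALL_OBJECTS")

print(render_checkbox(query))
print(render_checkbox(query, {"Which Columns": ["STATUS", "OWNER"]}))
```

Joining with a comma is what makes the form usable directly in a SELECT list, as in the slide's example.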
  19. Text Processing/Mining

    OML processing text data
    Unstructured text data: web pages, document libraries, PowerPoint presentations, product specifications, emails, comment fields in reports, and call center notes
    Extracting meaningful information from unstructured text can enhance ML results
    OML…
    • uses Oracle Text utilities and term weighting strategies to prepare text for TEXT columns
    • passes user-provided text configuration to Oracle Text
    • uses extracted tokens or themes for model building
    • supports text in columns of data types VARCHAR2, CHAR, CLOB, BLOB, and BFILE
    Oracle Text: an Oracle Database technology for term extraction, word and theme searching, and other utilities for querying text
    Enable via ADB PDB Admin by invoking “GRANT RESOURCE, CONNECT, CTXAPP TO user;”
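To give a feel for what "extracted tokens" and "term weighting" mean before a model sees text, here is a small generic sketch using TF-IDF, a common weighting scheme. The slide does not say which weighting Oracle Text applies, so TF-IDF is swapped in purely for illustration; `tokenize`, `tf_idf`, and the sample notes are invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return one {token: weight} dict per document.

    weight = (token count / tokens in document) * ln(documents / documents containing token)
    """
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    weights = []
    for toks in token_lists:
        counts = Counter(toks)
        weights.append({
            tok: (count / len(toks)) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        })
    return weights


notes = [
    "printer is broken",
    "printer works fine",
    "refund for broken screen",
]
for doc_weights in tf_idf(notes):
    print(doc_weights)
```

Tokens that appear in fewer documents receive higher weights, which is why rare, distinctive words tend to matter more to a model than words shared by every document.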
  20. Partitioned Models

    Builds ensemble model with multiple sub-models, one for each data partition
    • Potentially achieve better accuracy through multiple targeted models
    • Sub-models managed and used as one model
    Simplified scoring using top-level model only
    • Proper sub-model chosen by system based on row of data to be scored
    Automates a typical machine learning task for data scientists
    [Diagram: an Oracle Database table is split on the specified partition column(s) into Partition-1 … Partition-n; the in-DB algorithm builds Sub-Model-1 … Sub-Model-n under one top-level model, and new data is scored using the top-level model]
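The partitioned-model idea above can be pictured as a dictionary of sub-models behind a single scoring entry point. The sketch below is a conceptual toy, not the OML4SQL implementation: `PartitionedMeanModel` is a hypothetical class, each "sub-model" is just the mean of the target within its partition, and the column names and rows are made up.

```python
from collections import defaultdict
from statistics import mean


class PartitionedMeanModel:
    """Top-level model that keeps one trivial sub-model per partition value."""

    def __init__(self, partition_column, target_column):
        self.partition_column = partition_column
        self.target_column = target_column
        self.sub_models = {}  # partition value -> sub-model (here: a mean)

    def fit(self, rows):
        """Group rows by the partition column and build one sub-model per group."""
        groups = defaultdict(list)
        for row in rows:
            groups[row[self.partition_column]].append(row[self.target_column])
        self.sub_models = {key: mean(values) for key, values in groups.items()}
        return self

    def predict(self, row):
        """Score a row; the matching sub-model is chosen from the row itself."""
        key = row[self.partition_column]
        if key not in self.sub_models:
            raise KeyError(f"no sub-model was built for partition {key!r}")
        return self.sub_models[key]


training_rows = [  # hypothetical data
    {"REGION": "EU", "SPEND": 10},
    {"REGION": "EU", "SPEND": 20},
    {"REGION": "US", "SPEND": 50},
]
model = PartitionedMeanModel("REGION", "SPEND").fit(training_rows)
print(model.predict({"REGION": "EU"}))
print(model.predict({"REGION": "US"}))
```

The caller only ever talks to the top-level object, which mirrors the slide's point that sub-models are managed and used as one model.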
  21. Roadmap: Expand Autonomous Database with Python and R

    Autonomous Database as a Data Science Platform
    OML Notebooks add support for Python and R
    Python and R scripts managed in-database
    • Invoke from OML Notebooks, and REST or SQL APIs
    • Deploy into SQL and Web applications easily
    Scalable Python and R execution
    • Transparency layer-enabled database functionality
    • In-database machine learning algorithms
    AutoML functionality via OML4Py
    OML4Py integrated with OCI Data Science
    [Diagram labels: Data Scientists, SQL Clients, REST Applications, SQL]
  22. Roadmap: OML AutoML User Interface

    “Code-free” user interface supporting automated end-to-end machine learning
    Automate production and deployment of ML models
    • Enhance Data Scientist productivity, user-experience
    • Enable non-expert users to leverage ML
    • Unify model deployment and monitoring
    • Support model management
    Features
    • Minimal user input: data, target
    • Model leaderboard
    • Model deployment in applications via REST endpoint
    • Model monitoring: accuracy, prediction/predictor drift
    • Cognitive features for processing image and text
    [Sample screen mock-up]
  23. AutoML code-free User Interface

    Model building and monitoring