Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Oracle Machine Learning Notebooks - Deep Dive

Oracle Machine Learning Notebooks - Deep Dive

This is a deeper dive into Oracle Machine Learning Notebooks. We talk about the administrative and collaborative functionality of the OML Notebooks, which is the Zeppelin–based interface available for the Oracle Autonomous Database, and we finish with more specific and advanced OML4SQL API functionality, including partitioned models and text mining.

Marcos Arancibia

August 25, 2020
Tweet

More Decks by Marcos Arancibia

Other Decks in Technology

Transcript

  1. Oracle Machine Learning Notebooks Deep Dive Mark Hornick Oracle Machine

    Learning Product Management Copyright © 2020 Oracle and/or its affiliates.
  2. The following is intended to outline our general product direction.

    It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. Safe harbor statement Copyright © 2020, Oracle and/or its affiliates. All rights reserved
  3. Copyright © 2020, Oracle and/or its affiliates 3 • Quick

    Orientation Demo • Objects and Relationships • Projects and Workspaces • Notebook Basics • Collaboration • Connection Groups and Interpreter Bindings • Scheduling Jobs for Notebook Execution • Output Formats in Notebooks • Input Forms in Notebooks • Lessor-known OML4SQL Features OML Notebooks Topics
  4. Collaborative UI • Based on Apache Zeppelin • Supports data

    scientists, data analysts, application developers, DBAs • Easy sharing of notebooks and templates • Permissions, versioning, and execution scheduling Included with Autonomous Database • Automatically provisioned, managed, backed up • In-database SQL algorithms and analytics functions • Explore and prepare data, build and evaluate models, score data, deploy solutions • SQL today — soon to be augmented with Python and R Autonomous Database as a Data Science Platform Oracle Machine Learning Notebooks Copyright © 2020 Oracle and/or its affiliates.
  5. Copyright © 2020, Oracle and/or its affiliates 5 Need an

    account? https://www.oracle.com/cloud/free/
  6. Copyright © 2020, Oracle and/or its affiliates 7 Login Hamburger

    menu: Home, Notebooks, Templates, Jobs, Connection Groups, Notebook Sessions Home page: How do I?, Quick Actions, Recent Activities, Recent Notebooks Project Selection and Workspace management User: preferences, password changes, Help, About, Log out Notebooks page Templates • Personal Page • Shared Page • Examples Jobs Connection groups Notebooks Sessions Script – quick overview of OML Notebooks environment
  7. Copyright © 2020, Oracle and/or its affiliates 9 Autonomous Database

    and OML Users Autonomous Database Pluggable Database (PDB) OML User - 1 OML User - 2 OML User - N …
  8. Copyright © 2020, Oracle and/or its affiliates 10 OML Notebooks

    – workspaces, projects, notebooks OML User Workspace Workspace Project Project Project Notebook Notebook Notebook Notebook Scratchpad Notebook Notebook Notebook Scratchpad Notebook Scratchpad Notebook … … …
  9. Copyright © 2020, Oracle and/or its affiliates 11 A Data

    Science Team Structure Data Scientist DBA Application Developer Executive Data Analyst Project Lead Provides reports to Oversees Data Management Support Provides analytics to
  10. Copyright © 2020, Oracle and/or its affiliates 12 Autonomous Database

    and OML – Data Science Project Scenario DB1 Autonomous Database PDB MARK Project Lead PETE App Developer CINDY Data Analyst ADMIN ANNE Data Scientist SERGIO Executive TIM DBA GOLD_DATA
  11. Copyright © 2020, Oracle and/or its affiliates 13 Users from

    ADMIN User Administration System Administrator privileges • Create, edit, and delete OML user accounts • Read-only access to notebooks • Cannot create or run notebooks • Reassign user workspace when deleting users Developer privileges • Create projects and workspaces • Create, edit, delete, version, and run notebooks • Use SQL and SQL Script Scratchpads • Schedule jobs for notebook execution To follow along with the slides in your account, use the OML Notebooks ADMIN interface to create users as shown in this screen capture, or minimally create two users.
  12. Copyright © 2020, Oracle and/or its affiliates 14 OML Notebooks

    – workspaces, projects, notebooks OML User: MARK – project lead MARK Workspace – default DevTest Workspace MARK Project Customer Segmentation Demand Forecasting Notebook Notebook Notebook Scratchpad Notebook Notebook Scratchpad Notebook Scratchpad Notebook … Create Customer Segmentation, Demand Forecasting, and Project Examples projects in DevTest workspace shared with other team members as Developer or Viewer …
  13. Copyright © 2020, Oracle and/or its affiliates 15 Data Science

    Team Workspace Permissions Data Scientist Developer DBA Developer Application Developer Viewer Executive Viewer Data Analyst Developer Project Lead DevTest Workspace owner • Create, edit, and delete OML user accounts • Read-only access to notebooks • Cannot create or run notebooks • Reassign user workspace when deleting users ADMIN • Provide notebook reports to Executive • Review team’s work and collaborate using concurrent notebook sessions • Explore, prepare, and analyze data • Build and test ML models • Deliver ML scripts to data analyst and application developer • View notebook reports on project from the project lead • View any DevTest workspace notebooks • Access machine learning scripts to include in applications and dashboards • Use SQL or REST for scoring data with ML models • Read-only view of others’ scripts • Analyze data using Notebooks or BI tools like Oracle Analytics Cloud • Use machine learning scripts from data scientist • Load and prepare data for team • Collaborate with other users to assist with queries and data analysis
  14. Copyright © 2020, Oracle and/or its affiliates 16 User Roles

    from MARK ‘DevTest’ Workspace Permissions
  15. Copyright © 2020, Oracle and/or its affiliates 18 Initial workspace

    and default project created by OML service automatically when you first log in to OML CINDY Data Analyst creates a new project within a new workspace using the Create Project dialog and shares her workspace with other users. Note: users must be in a project where they have manager rights. Manage Workspaces • On the top right corner in your home page, click Select a Project pulldown menu • Select Manage Workspaces • Manage Workspaces dialog box opens • Click Permissions to add and delete users • Click Edit to edit the selected workspace • Click Delete to delete the selected workspace • When finished, click Close. Projects and Workspaces
  16. Copyright © 2020, Oracle and/or its affiliates 19 Collaborate with

    other users by granting permissions to access your workspace Grant permissions Manager, Developer, and Viewer to allow another user to view your workspace Granting permission type Manager or Developer allows user to drop tables, create tables, and run any scripts at any time on your account Granting permission type Manager allows the creation of new projects and workspaces Workspace Permissions • On the top right corner in your home page, click Select a Project pulldown menu • Select Workspace Permissions • Workspace Permissions dialog box opens • Select user name and permission type, then click + Add • Select row and click Delete to remove user permissions Note: re-adding a user overwrites previous settings – use this to change settings More on Workspace Permissions
  17. Copyright © 2020, Oracle and/or its affiliates 20 Viewer •

    Project: view only • Workspace: view only • Notebooks: view only • Jobs: view jobs and job runs of shared notebooks only Note: Users are not authorized to run or make any changes to your notebooks – useful for “reports” Developer • Project: view only • Workspace: view only • Notebooks: create, update, run, and delete notebooks that a developer creates only • Jobs: view and run jobs of shared notebooks only. A developer cannot create jobs for notebooks that are shared Note: Users can drop tables and run any script at any time on the owner’s account Manager • Project: create, update, delete • Workspace: view only • Notebooks: create, update, run, delete • Jobs: schedule jobs Note: Users can drop tables and run any script at any time on the owner’s account Workspace Permissions
  18. Copyright © 2020, Oracle and/or its affiliates 22 On the

    Notebooks page, click Create Must be in a workspace/project you can write to ANNE Data Scientist creates a notebook in DevTest Workspace and Customer Segmentation Project This is now visible to other team members who have access to DevTest Workspace Create a Notebook
  19. Copyright © 2020, Oracle and/or its affiliates 23 On the

    Notebooks page, click Import Select one or more json notebook files in folder to import Must be in a workspace/project you can write to ANNE Data Scientist imports a OML4SQL Clustering notebook in DevTest Workspace and Customer Segmentation Project Discussed later: check interpreter bindings are set to enable proper notebooks execution Import a Notebook
  20. Copyright © 2020, Oracle and/or its affiliates 24 Scratchpads are

    “glorified” notebooks SQL and Script scratchpads can have paragraphs of either %sql or %script, and later %python with OML4Py TIM DBA opens a SQL Query Scratchpad and lists the tables in all_tables If you return to Notebooks page, your Scratchpad is saved with name “SQL Query Scratchpad” or “SQL Script Scratchpad” Users can rename a Scratchpad notebook Renaming or deleting a default-named Scratchpad notebook gets you a clean Scratchpad next access Save a Scratchpad as a Notebook
  21. Copyright © 2020, Oracle and/or its affiliates 25 On the

    Notebooks page, MARK Project Lead selects notebook to version and clicks Version This allows seeing and manipulating existing versions, as well as create new ones On Versions page • Click + Version • Create Versions dialog opens • Enter comments for this version of your notebook, and click OK The versioned notebook is now listed in the Versions page. You can… • Click Revert Version to restore the older version of your notebook • Click Delete to delete the selected version of your notebook • Click New Notebook to create a new notebook from the selected notebook version • Results in a notebook with same name + “_i”, where i = 1..n each time invoked Version a Notebook
  22. Copyright © 2020, Oracle and/or its affiliates 26 OML Templates

    UI provides another way to share work • Publish notebooks as read-only reports • Publish snapshots of notebooks • Allow users to create notebooks from templates ANNE Data Scientist, on the Notebooks page, selects a notebook and click Save as Template Save as Template dialog opens • Name field – enter a name for the notebook template • Comments field – enter comments, if any • Tags field – enter tags for the template, if any • In Save To, select: • Personal: If you want to save this notebook template to Personal • Shared: If you want to save and share this notebook template Save a Notebook as a Template
  23. Copyright © 2020, Oracle and/or its affiliates 28 Share your

    work, publish your work as reports, or create notebooks from templates Granting Workspace Access To collaborate, grant access to your workspace to another user When other users log in, your workspace, projects in that workspace, and any workspace notebooks are visible Open the same notebook in different users’ accounts to dynamically see others’ changes All can open notebooks and work on or see changes in real time relative to granted permissions OML Templates UI Three types: Personal, Shared, and Examples View selected templates in read-only mode Create new editable/runnable notebooks from selected templates Collaborate with other users
  24. Copyright © 2020, Oracle and/or its affiliates 29 Autonomous Database

    and OML – limited architectural view Autonomous Database Pluggable Database (PDB) User User User … OML Repository Personal Templates Notebook Notebook Shared Templates Notebook Notebook Example Templates Notebook Notebook
  25. Copyright © 2020, Oracle and/or its affiliates 31 A connection

    group, also known as a Zeppelin interpreter set, is a collection of database connections Global Connection Group is created automatically when a new autonomous database is provisioned Developer • View database connections Administrator • View and Edit database connections: Name, Row Render Limit, Comment • Reset and Stop database connections • In the Database section, specify PL/SQL DBMS output option • Select Enabled to allow the PL/SQL interpreter to display messages sent to DBMS_OUTPUT in paragraph results • Requires users to log back in for change to take effect Connection Groups
  26. Copyright © 2020, Oracle and/or its affiliates 32 Interpreters •

    A plug-in for using a specific data processing language at the backend, e.g., SQL, Python, R • Currently supported interpreters • sql and script (PL/SQL) interpreters within an Oracle Database interpreter group • md (MarkDown) interpreter for plain text formatting syntax for conversion to HTML Interpreter Bindings • Association of interpreters to notebooks, and individual paragraphs • Notebooks contain an internal ordered list of bindings, default order  Low, Medium, High • *Low: least level of resources to each SQL statement, maximizes concurrent SQL statements • Medium: lower level of resources to each SQL statement, allows more concurrent SQL statements • High: highest level of resources to each SQL statement resulting in the highest performance, but significantly limits number of concurrent SQL statements • Toggle bindings on/off and drag buttons to order – top bindings take precedence over lower bindings Interpreters and Interpreter Bindings * Only non-parallel Low access is supported with the Always Free account
  27. Copyright © 2020, Oracle and/or its affiliates 33 May want

    to override execution priority / resources for an individual SQL interpreter paragraph To change the interpreter bindings order for a particular SQL paragraph in the a notebook: • Click the paragraph where you want to change the interpreter • Invoke the interpreter with the specific binding • Run the paragraph For example, TIM DBA invokes Scratchpad with medium resource allocation, where db1 is the database name %db1_medium Validate the interpreter binding by executing SELECT SYS_CONTEXT ('USERENV', 'SERVICE_NAME') FROM DUAL; Override Interpreter Bindings within a Paragraph
  28. Copyright © 2020, Oracle and/or its affiliates 34 If you

    do not bind any specific interpreter to your notebook, you receive the error: databasename_servicename not found TIM DBA explicitly sets the interpreter binding order in the following scenarios: • Notebook creation: notebook inherits the initial interpreter binding order, which is low (default), medium, high • Notebook import: notebook inherits the defined interpreter bindings – after importing a notebook, verify interpreter binding order and activation • Notebook export: notebook inherits the defined interpreter bindings • Interpreter bindings are not portable across Pluggable Databases (PDBs) • If export and import notebooks within the same PDB, interpreter bindings are preserved • If notebooks are created in different PDBs, interpreter bindings are not preserved after import, and must be manually set • Notebook creation from templates: notebook inherits the default order of interpreter bindings Interpreter Bindings
  29. Copyright © 2020, Oracle and/or its affiliates 36 Under what

    circumstances would we want to schedule notebook execution? • Execute a long-running notebook at off-peak hours, perhaps 2:00 AM • Rebuild one or more models at periodic intervals, e.g., nightly or weekly, on the latest data • Perform batch scoring of latest data at periodic intervals, e.g., nightly • Generate notebook “report” with latest analytics results, including tables and graphs Jobs use cases
  30. Copyright © 2020, Oracle and/or its affiliates 37 Jobs allow

    you to schedule the running of notebooks Operations on Jobs • Edit – Edit job metadata of any job listed in the Jobs page • Create – Create a new job to schedule your Notebook • Duplicate – Create a copy of an existing job listed in the Jobs page • Stop – Terminate a job that is currently running • Start – Enabled only for jobs that are in Scheduled status • Delete – Delete any job listed on the Jobs page Example MARK Project Lead schedules Clustering notebook to run daily, but clicks Start to also run it immediately Jobs
  31. Copyright © 2020, Oracle and/or its affiliates 38 Maximum Number

    of Runs • Maximum number of times the job will be run given repeat frequency • If notebook times out or exceeds max runs allowed, status is set to COMPLETED Maximum Failures Allowed • Maximum number of times a job can fail on consecutive scheduled runs • If notebook execution fails more than max failures allowed, status is set to BROKEN Timeout in Minutes • Maximum amount of time a job should be allowed to run, otherwise it is stopped • If notebook execution exceeds specified timeout, status is set to STOPPED Jobs Advanced Settings
  32. Copyright © 2020, Oracle and/or its affiliates 40 Historical logs

    of jobs in current user’s Job Log interface On Jobs page, click on name of job to view On the Job Log page for the selected job • View Date of execution, Status (success, running, stopped, failed), and Duration • To view notebook execution results from job history – select the row and click view icon • To delete a job log entry – select it and click delete icon Job Logs
  33. Copyright © 2020, Oracle and/or its affiliates 42 Use SQLFORMAT

    to format query output %script SET SQLFORMAT format_option MARK Project Lead created some examples in the Project Examples project in the DevTest workspace For example, SET SQLFORMAT ansiconsole; SELECT * FROM SH.SALES; Set Output Format in Notebooks Import Notebooks 1), 2), and 3) into the Project Examples project
  34. Copyright © 2020, Oracle and/or its affiliates 43 The available

    output formats are: • ANSICONSOLE — resizes the columns to the width of the data to save space. It also underlines the columns, instead of separate line of output • CSV —standard comma-separated variable output, with string values enclosed in double quotes • DELIMITED — manually define the delimiter string, and the characters that are enclosed in the string values • FIXED —fixed width columns with all data enclosed in double-quotes • HTML —HTML for a responsive table. The content of the table changes dynamically to match the search string entered in the text field • INSERT— the INSERT statements that could be used to recreate the rows in a table. • JSON — a JSON document containing the definitions of the columns along with the data that it contains. • LOADER — pipe delimited output with string values enclosed in double quotes—column names not included in the output • XML —a tag-based XML document. All data is presented as CDATA tags • DEFAULT — clears all previous SQLFORMAT settings, and returns to the default output, also, just specify SET SQLFORMAT Set Output Format in Notebooks
  35. Copyright © 2020, Oracle and/or its affiliates 45 Request user

    input for a value '${formName}’ or with default value ’${formName=defaultValue}’ SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = '${OBJ}'; SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = '${OBJ=TABLE}' Create Text Input Forms in Notebooks
  36. Copyright © 2020, Oracle and/or its affiliates 46 Define the

    Select form by using the syntax: '${formName=defaultValue,option1|option2...}' SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = '${OBJ=INDEX,INDEX|TABLE|VIEW|SYNONYM}' Create Select Forms in Notebooks
  37. Copyright © 2020, Oracle and/or its affiliates 47 '${checkbox:formName=defaultValue1|defaultValue2...,option1|option2...}' SELECT

    ${checkbox:Which Columns=OWNER|OBJECT_TYPE, OWNER|OBJECT_NAME|OBJECT_TYPE|STATUS} FROM ALL_OBJECTS Create Check Box Forms in Notebooks
  38. Unstructured text data: web pages, document libraries, Power Point presentations,

    product specifications, emails, comment fields in reports, and call center notes Extracting meaningful information from unstructured text can enhance ML results OML… • uses Oracle Text utilities and term weighting strategies to prepare text for TEXT columns • passes user-provided text configuration to Oracle Text • uses extracted tokens or themes for model building • supports text for column data types VARCHAR2, CHAR, CLOB, BLOB, and BFILE Oracle Text: an Oracle Database technology for term extraction, word and theme searching, and other utilities for querying text Enable via ADB PDB Admin by invoking “GRANT RESOURCE, CONNECT, CTXAPP TO user;” OML processing text data Text Processing/Mining Copyright © 2020, Oracle and/or its affiliates. All rights reserved
  39. Builds ensemble model with multiple sub-models, one for each data

    partition • Potentially achieve better accuracy through multiple targeted models • Sub-models automatically managed and used as one model Simplified scoring using top-level model only • Proper sub-model chosen by system based on row of data to be scored Automates a typical machine learning task for data scientists Partitioned Models Oracle Database Table Specify Partition Column(s) Partition-1 Partition-2 Partition-3 Partition-n … Sub-Model-1 Sub-Model-2 Sub-Model-3 Sub-Model-n Top Level Model New Data Score data using top level model In-DB Algorithm … Copyright © 2020, Oracle and/or its affiliates. All rights reserved