Oracle Cloud Infrastructure Data Science Service

Technical overview of OCI Data Science Service (released February 2020)

oracle4engineer

April 06, 2020

Transcript

  1. Oracle Cloud Infrastructure Data Science (April 2, 2020)

  2. Oracle and Java are registered trademarks of Oracle Corporation and/or its affiliates.
    Copyright © 2020 Oracle and/or its affiliates.
  3. Overview
    • Machine learning (ML), OSS, Oracle Accelerated Data Science (ADS)
    • ML
    • PaaS, IaaS
  4. Notebook sessions
    • Notebook: Jupyter Notebook for ML, running on Compute
    • Compartment, VCN, Subnet, Compute, Block Volume
    • ML libraries: Keras, scikit-learn, XGBoost, Oracle Accelerated Data Science (ADS)
    • Diagram: Accelerated Data Science, scikit-learn, ML, Jupyter Notebook; Notebook on Compute with Block Storage
  5. Notebook (Python)
    Notebook on OCI: Jupyter Notebook
  6. Oracle Accelerated Data Science (ADS)
    • Python library for Oracle Cloud Infrastructure Data Science
    • API
    • Oracle AutoML
    Diagram: the Accelerated Data Science workflow with AutoML, including ② data transformation, ⑤ model evaluation and ⑥ model interpretation
  7. ADS DatasetFactory
    • Data sources
      • Object storage: OCI Object Storage, Amazon S3, Google Cloud Storage, Azure Blob
      • Databases and other stores: Oracle DB, ADW, MongoDB, HDFS, NoSQL DB, Elasticsearch, etc.
    • File formats: CSV, TSV, Parquet, libsvm, JSON, Excel, HDF5, SQL, XML, Apache server log files (clf, log), ARFF
  8. Loading data with DatasetFactory.open()
        import os
        from ads.dataset.factory import DatasetFactory

        # Local file
        ds = DatasetFactory.open("/path/to/data.data", format='csv', delimiter=" ")

        # OCI Object Storage Service
        ds = DatasetFactory.open("oci://<bucket-name>/<file-name>", storage_options={
            "config": "~/.oci/config",
            "profile": "DEFAULT_USER"
        })

        # Amazon S3
        ds = DatasetFactory.open("s3://bucket_name/iris.csv", storage_options={
            'key': 'aws key',
            'secret': 'aws secret',
            'blocksize': 1000000,
            'client_kwargs': {"endpoint_url": "https://s3-us-west-1.amazonaws.com"}
        })

        # ADW (table and index_col are placeholders for your table and index column)
        uri = f'oracle+cx_oracle://{os.environ["ADW_USER"]}:{os.environ["ADW_PASSWORD"]}@{os.environ["ADW_SID"]}'
        ds = DatasetFactory.open(uri, format="sql", table=table, index_col=index_col, target='label')
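The ADW example builds an SQLAlchemy-style oracle+cx_oracle://user:password@sid URI by plain string interpolation, which breaks if the password contains characters such as @ or /. A minimal sketch of a safer helper, assuming you want to percent-encode the credentials before they go into the URI; build_oracle_uri is a hypothetical name, not part of ADS.

```python
from urllib.parse import quote_plus


def build_oracle_uri(user: str, password: str, sid: str) -> str:
    """Build an oracle+cx_oracle URI like the ADW example (hypothetical helper).

    quote_plus percent-encodes characters such as '@' and '/' that would
    otherwise be misread as URI delimiters.
    """
    return f"oracle+cx_oracle://{quote_plus(user)}:{quote_plus(password)}@{sid}"


# Made-up credentials; in the slide they come from ADW_USER, ADW_PASSWORD, ADW_SID.
print(build_oracle_uri("admin", "p@ss/word", "mydb_high"))
# oracle+cx_oracle://admin:p%40ss%2Fword@mydb_high
```

The resulting string can be passed to DatasetFactory.open(uri, format="sql", ...) in place of the f-string above.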
  9. • RDB
    • etc.
  10. • String
    • Null
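  11. Data transformation with ADS (the transformed data is then passed to ADS AutoML)
        # Show the recommended transformations
        ds.get_recommendations()
        transformed_ds = ds.get_transformed_dataset()

        # Alternatively, apply the recommended transformations automatically
        transformed_ds = ds.auto_transform()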
  12. ADS transformation recommendations with get_recommendations(): example with the recommended action “Drop”
  13. get_recommendations(): another example with the recommended action “Drop”
  14. get_recommendations(): another example with the recommended action “Drop”
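The slides do not spell out the rules behind the “Drop” recommendations. As a hedged illustration only, the sketch below flags columns with two common heuristics: mostly null, or a single constant value. The function name recommend_drops, the 0.5 threshold and the toy columns are assumptions, not what ADS applies.

```python
from typing import Any, Dict, List, Optional


def recommend_drops(columns: Dict[str, List[Optional[Any]]],
                    max_null_ratio: float = 0.5) -> Dict[str, str]:
    """Return {column name: reason} for columns a simple rule would drop.

    Hypothetical heuristics (not the ADS rules):
    - more than max_null_ratio of the values are None
    - all non-null values are identical (constant column)
    """
    recommendations: Dict[str, str] = {}
    for name, values in columns.items():
        if not values:
            continue
        null_ratio = sum(v is None for v in values) / len(values)
        distinct = {v for v in values if v is not None}
        if null_ratio > max_null_ratio:
            recommendations[name] = "Drop (mostly null)"
        elif len(distinct) <= 1:
            recommendations[name] = "Drop (constant)"
    return recommendations


# Toy columns with made-up values
toy = {
    "Age": [22, 38, 26, 35],
    "Cabin": [None, "C85", None, None],
    "Country": ["UK", "UK", "UK", "UK"],
}
print(recommend_drops(toy))
```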
  15. get_recommendations(): for an imbalanced target, the recommended actions are “Up-sample” or “Down-sample”
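“Up-sample” and “Down-sample” rebalance the classes of the target column. A minimal sketch of plain random over- and under-sampling, assuming rows and labels are given as parallel lists; the function names are hypothetical and ADS may use a different resampling strategy.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _group(rows: Sequence[T], labels: Sequence[int]) -> Dict[int, List[T]]:
    """Map each class label to the rows carrying that label."""
    groups: Dict[int, List[T]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def upsample(rows: Sequence[T], labels: Sequence[int],
             seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate random minority rows until every class has the majority count."""
    rng = random.Random(seed)
    groups = _group(rows, labels)
    target = max(len(g) for g in groups.values())
    out_rows: List[T] = []
    out_labels: List[int] = []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def downsample(rows: Sequence[T], labels: Sequence[int],
               seed: int = 0) -> Tuple[List[T], List[int]]:
    """Keep a random subset of each class so every class has the minority count."""
    rng = random.Random(seed)
    groups = _group(rows, labels)
    target = min(len(g) for g in groups.values())
    out_rows: List[T] = []
    out_labels: List[int] = []
    for label, members in groups.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


rows, labels = list(range(10)), [0] * 8 + [1] * 2
print(Counter(upsample(rows, labels)[1]))    # 8 rows per class
print(Counter(downsample(rows, labels)[1]))  # 2 rows per class
```

Up-sampling keeps every original row but repeats minority rows; down-sampling discards majority rows, which is cheaper but loses data.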
  16. Visualization
    • Custom plots through an API (Seaborn, Matplotlib, GIS)
  17. show_in_notebook()
        # Summary of the dataset
        ds.show_in_notebook()
  18. Plotting columns with ds.plot()
        # One column
        ds.plot("col02").show_in_notebook(figsize=(4,4))
        # Two columns
        ds.plot("col02", y="col01").show_in_notebook(figsize=(4,4))
        ds.plot("col01", y="col03").show_in_notebook()
  19. Custom plots through the API (Seaborn, Matplotlib, GIS)
        import matplotlib.pyplot as plt
        import pandas as pd
        from numpy.random import randn
        from ads.dataset.factory import DatasetFactory

        # Custom plot with Matplotlib
        df = pd.DataFrame(randn(1000, 4), columns=list('ABCD'))

        def ts_plot(df, figsize):
            ts = pd.Series(randn(1000), index=pd.date_range('1/1/2000', periods=1000))
            # set_index returns a new DataFrame; use the date index of ts
            df = df.set_index(ts.index)
            df = df.cumsum()
            plt.figure()
            df.plot(figsize=figsize)
            plt.legend(loc='best')

        ds = DatasetFactory.from_dataframe(df, target='A')
        ds.call(ts_plot, figsize=(7,7))
  20. ADS AutoML
        import logging
        from ads.automl.provider import OracleAutoMLProvider
        from ads.automl.driver import AutoML

        # Split into training and test data (90% / 10%)
        train, test = transformed_ds.train_test_split(test_size=0.1)

        # Train with the Oracle AutoML engine
        ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
        oracle_automl = AutoML(train, provider=ml_engine)
        automl_model1, baseline = oracle_automl.train()
    • Candidate algorithms: AdaBoostClassifier, DecisionTreeClassifier, ExtraTreesClassifier, KNeighborsClassifier, LGBMClassifier, LinearSVC, LogisticRegression, RandomForestClassifier, SVC, XGBClassifier
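train_test_split(test_size=0.1) holds out 10% of the rows for testing. The sketch below shows the idea in plain Python, assuming a shuffled random split with a fixed seed; it is an illustration, not the ADS implementation, and the function here only mirrors the name and the test_size parameter.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_size: float = 0.1,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly hold out round(len(rows) * test_size) rows (at least one) as test data."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_test = max(1, round(len(rows) * test_size))
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test


train, test = train_test_split(list(range(100)), test_size=0.1)
print(len(train), len(test))  # 90 10
```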
  21. Oracle AutoML: visualizing the trials
        oracle_automl.visualize_algorithm_selection_trials()
        oracle_automl.visualize_adaptive_sampling_trials()
  22. Oracle AutoML: visualizing the trials
        oracle_automl.visualize_feature_selection_trials()
        oracle_automl.visualize_tuning_trials()
  23. N-fold cross-validation (※1)
    [Figure: rounds 1 to 5; in each round a different fold is TEST and the remaining folds are TRAIN]
    ※1 The data is split into N folds. Round 1 uses fold 1 as TEST and the other N-1 folds as TRAIN, round 2 uses fold 2 as TEST and the other N-1 folds as TRAIN, and so on for N rounds.
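The rotation in ※1 can be written as a small index generator. A minimal sketch, assuming contiguous (unshuffled) folds whose sizes differ by at most one when N does not divide the sample count; k_fold_indices is a hypothetical name.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int,
                   n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; round k uses fold k as TEST, the rest as TRAIN."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)  # first `extra` folds get one more row
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for round_no, (train, test) in enumerate(k_fold_indices(10, 5), start=1):
    print(round_no, "TEST:", test, "TRAIN size:", len(train))
```

Each row lands in TEST exactly once, so averaging the N test scores uses every row for evaluation.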
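  24. Model evaluation with ADSEvaluator
    • Metrics and curves such as PR and ROC
        from ads.evaluations.evaluator import ADSEvaluator

        # Evaluate two trained models (bin_lr_model, bin_rf_model) on the test data
        bin_evaluator = ADSEvaluator(test, models=[bin_lr_model, bin_rf_model], training_data=train)
        # Show the results, including a perfect model for reference
        bin_evaluator.show_in_notebook(perfect=True)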
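The area under the ROC curve (AUC) equals the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. A minimal sketch of that definition, assuming binary labels 0/1 and ties counted as one half; roc_auc is a hypothetical name, not how ADSEvaluator computes it, and the pairwise loop is quadratic, so it only suits small examples.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```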
  25. Model explanation
    • Global Explainer: explains the behavior of the model as a whole
      - Feature Permutation Importance
      - Individual Conditional Expectation (ICE)
      - Partial Dependence Plot (PDP)
    • Local Explainer: explains an individual prediction
  26. ADS Global Explainer: Feature Permutation Importance
    Original data: score of the model on it = baseline_score
        PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
        1            0         3       Braund, Mr. Owen              male    22   1      0      7.25     S
        2            1         1       Cumings, Mrs. John            female  38   1      0      71.2833  C
        3            1         3       Heikkinen, Miss. Laina        female  26   0      0      7.925    S
        4            1         1       Futrelle, Mrs. Jacques Heath  female  35   1      0      53.1     S
    Data with the Sex column shuffled: score of the model on it = shuffled_score
        PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
        1            0         3       Braund, Mr. Owen              Female  22   1      0      7.25     S
        2            1         1       Cumings, Mrs. John            Male    38   1      0      71.2833  C
        3            1         3       Heikkinen, Miss. Laina        Male    26   0      0      7.925    S
        4            1         1       Futrelle, Mrs. Jacques Heath  male    35   1      0      53.1     S
    • Importance of the feature = baseline_score - shuffled_score (the larger the drop, the more the model relies on the feature)
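The baseline_score - shuffled_score idea can be written out in a few lines. A minimal sketch, assuming accuracy as the score and a toy model that only uses the first feature; the function names, toy data and n_repeats are illustrative assumptions, not the MLX implementation.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean of (baseline_score - shuffled_score) for each feature column."""
    rng = random.Random(seed)
    baseline_score = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_shuffled = [list(row[:j]) + [value] + list(row[j + 1:])
                          for row, value in zip(X, column)]
            shuffled_score = accuracy(model, X_shuffled, y)
            drops.append(baseline_score - shuffled_score)
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    """Toy classifier: predicts survival from is_female only, ignores age."""
    return 1 if row[0] == 1 else 0


# Toy data: [is_female, age]
X = [[1, 22], [0, 38], [1, 26], [0, 35], [1, 54], [0, 2]]
y = [1, 0, 1, 0, 1, 0]
print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores age, shuffling that column never changes a prediction and its importance is exactly 0.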
  27. Global Explainer: Feature Importance sample code
        # Build the model using AutoML. 'model' is a subclass of type ADSModel.
        # Note that the ADSExplainer below works with any model (classifier or
        # regressor) that is wrapped in an ADSModel
        import logging
        from ads.automl.provider import OracleAutoMLProvider
        from ads.automl.driver import AutoML

        ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
        oracle_automl = AutoML(train, provider=ml_engine)
        model, baseline = oracle_automl.train()

        # Create the ADS explainer object, which is used to construct global
        # and local explanation objects. The ADSExplainer takes as input the
        # model to explain and the train/test dataset
        from ads.explanations.explainer import ADSExplainer
        explainer = ADSExplainer(test, model, training_data=train)

        # With ADSExplainer, create a global explanation object using
        # the MLXGlobalExplainer provider
        from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
        global_explainer = explainer.global_explanation(provider=MLXGlobalExplainer())

        # A summary of the global feature permutation importance algorithm and
        # how to interpret the output can be displayed with
        global_explainer.feature_importance_summary()

        # Compute the global Feature Permutation Importance explanation
        importances = global_explainer.compute_feature_importance()

        # ADS supports multiple visualizations for the global Feature
        # Permutation Importance explanations. Simple bar chart highlighting
        # the average impact on model score across multiple iterations of the algorithm
        importances.show_in_notebook()
  29. ADS Global Explainer: Individual Conditional Expectation (ICE/PDP)
    Training data:
        F1   F2   F3   T
        2    1.2  0    15.1
        7    2.4  4    12.5
        8    9.7  3    18.1
        ...  ...  ...  13.5
    Selected row:
        F1   F2   F3   T
        2    1.2  0    15.1
    Input values 1, 2, 3, ... for F1, with T to be predicted:
        F1   F2   F3   T
        1    1.2  0    ?
        2    2.4  4    ?
        3    9.7  3    ?
        ...  ...  ...  ?
    Predicted T:
        F1   F2   F3   T
        1    1.2  0    13.5
        2    2.4  4    15.1
        3    9.7  3    17.5
        ...  ...  ...  ...
    Plotting T against the F1 input shows how the prediction T changes as F1 changes.
  30. ADS Global Explainer: Partial Dependence Plot (ICE/PDP)
    (Same example tables as slide 29: training data, a selected row, F1 set to 1, 2, 3, ... and the predicted T.)
    • ICE: one curve per sample, showing how the predicted T changes with F1
    • PDP = average of the ICE curves
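ICE and PDP follow directly from the procedure above: set the feature to each grid value in every row, predict, and average. A minimal sketch, assuming a model given as a Python function on a feature list; ice_curves, partial_dependence and the linear toy model are hypothetical and not the MLX API.

```python
from typing import Callable, List, Sequence

Row = List[float]


def ice_curves(model: Callable[[Row], float], X: Sequence[Row], feature: int,
               grid: Sequence[float]) -> List[List[float]]:
    """One curve per row: predictions with `feature` set to each grid value,
    all other features of that row kept fixed."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: List[List[float]]) -> List[float]:
    """PDP = point-wise mean of the ICE curves."""
    n = len(curves)
    return [sum(curve[i] for curve in curves) / n for i in range(len(curves[0]))]


def toy_model(row: Row) -> float:
    """Toy regressor: T = 2 * F1 + F2."""
    return 2 * row[0] + row[1]


X = [[2, 1], [7, 2], [8, 9]]  # toy rows of [F1, F2]
grid = [1, 2, 3]              # F1 input values
curves = ice_curves(toy_model, X, feature=0, grid=grid)
print(curves)                      # one ICE curve per row
print(partial_dependence(curves))  # [6.0, 8.0, 10.0]
```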
  31. Global Explainer: ICE/PDP sample code
        # Build the model using AutoML. 'model' is a subclass of type ADSModel.
        # Note that the ADSExplainer below works with any model (classifier or
        # regressor) that is wrapped in an ADSModel
        import logging
        from ads.automl.provider import OracleAutoMLProvider
        from ads.automl.driver import AutoML

        ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
        oracle_automl = AutoML(train, provider=ml_engine)
        model, baseline = oracle_automl.train()

        # Create the ADS explainer object, which is used to construct
        # global and local explanation objects. The ADSExplainer takes
        # as input the model to explain and the train/test dataset
        from ads.explanations.explainer import ADSExplainer
        explainer = ADSExplainer(test, model, training_data=train)

        # With ADSExplainer, create a global explanation object using
        # the MLXGlobalExplainer provider
        from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
        global_explainer = explainer.global_explanation(provider=MLXGlobalExplainer())

        # A summary of the global partial feature dependence explanation
        # algorithm and how to interpret the output can be displayed with
        global_explainer.partial_dependence_summary()

        # Compute the 1-feature PDP on the categorical feature, "sex",
        # and numerical feature, "age"
        pdp_sex = global_explainer.compute_partial_dependence("sex")
        pdp_age = global_explainer.compute_partial_dependence("age", partial_range=(0, 1))

        # ADS supports PDP visualizations for both 1-feature and 2-feature
        # Feature Dependence explanations, and ICE visualizations for 1-feature
        # Feature Dependence explanations.
        # Visualize the categorical feature PDP for the True (Survived) label
        pdp_sex.show_in_notebook(labels=True)
  32. Local Explainer
    • Example: predicting survival (Survived = 0 or 1) on the Titanic dataset (https://www.kaggle.com/c/titanic)
    Training data:
        PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
        1            0         3       Braund, Mr. Owen              male    22   1      0      7.25     S
        2            1         1       Cumings, Mrs. John            female  38   1      0      71.2833  C
        3            1         3       Heikkinen, Miss. Laina        female  26   0      0      7.925    S
        ...          ...       ...     ...                           ...     ...  ...    ...    ...      ...
    New passenger to predict:
        PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
        500          ?         1       Anna. Miss. Bworn             female  36   1      0      71.283   C
    Prediction:
        PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
        500          1         1       Anna. Miss. Bworn             female  36   1      0      71.283   C
    Why?
  33. Local Explainer
    Oracle MLX explains the prediction the model trained on the Titanic data (slide 32) makes for Passenger ID = 500.
  34. Local Explainer
    [Figure: local explanation of the prediction for PassengerID 500]
  35. Local Explainer sample code
        # Build the model using AutoML. 'model' is a subclass of type ADSModel.
        # Note that the ADSExplainer below works with any model (classifier or
        # regressor) that is wrapped in an ADSModel
        import logging
        from ads.automl.provider import OracleAutoMLProvider
        from ads.automl.driver import AutoML

        ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
        oracle_automl = AutoML(train, provider=ml_engine)
        model, baseline = oracle_automl.train()

        # Create the ADS explainer object, which is used to construct
        # global and local explanation objects. The ADSExplainer takes
        # as input the model to explain and the train/test dataset
        from ads.explanations.explainer import ADSExplainer
        explainer = ADSExplainer(test, model, training_data=train)

        # With ADSExplainer, create a local explanation object using
        # the MLXLocalExplainer provider
        from ads.explanations.mlx_local_explainer import MLXLocalExplainer
        local_explainer = explainer.local_explanation(provider=MLXLocalExplainer())

        # A summary of the local explanation algorithm and how to interpret
        # the output can be displayed with
        local_explainer.summary()

        # Select a specific sample (instance/row) to generate a local explanation for
        sample = 14
        # Compute the local explanation on our sample from the test set
        explanation = local_explainer.explain(test.X.iloc[sample:sample+1],
                                              test.y.iloc[sample:sample+1])
        # Visualize the explanation for the label True (Survived)
        explanation.show_in_notebook(labels=True)
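To make the idea of a local explanation concrete, the sketch below uses a deliberately simple swapped-in technique, occlusion: for one instance, replace each feature with its mean over a background set and record how much the prediction moves. This is not the algorithm MLXLocalExplainer uses; the function name, toy model and data are assumptions for illustration.

```python
from typing import Callable, List, Sequence

Row = List[float]


def occlusion_attribution(model: Callable[[Row], float], instance: Row,
                          background: Sequence[Row]) -> List[float]:
    """Per feature: prediction(instance) minus prediction(instance with that
    feature replaced by its background mean)."""
    n_features = len(instance)
    means = [sum(row[j] for row in background) / len(background)
             for j in range(n_features)]
    original = model(instance)
    contributions = []
    for j in range(n_features):
        occluded = list(instance)
        occluded[j] = means[j]  # "remove" feature j by setting it to an average value
        contributions.append(original - model(occluded))
    return contributions


def toy_model(row: Row) -> float:
    """Toy score: 3 * feature0 - feature1."""
    return 3 * row[0] - row[1]


background = [[0, 10], [2, 30]]  # feature means: [1.0, 20.0]
print(occlusion_attribution(toy_model, [3, 20], background))  # [6.0, 0.0]
```

A positive value means the feature pushed this particular prediction up relative to an average input; the second feature already equals its mean, so it gets no credit.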
  36. • Data Science Platform
    • ADS, ML
    • scikit-learn, keras, xgboost, lightGBM (shown: scikit-learn, lightGBM)
    • OCI [ ] > [ ], Notebook
  37. Serving models with Oracle Functions
    [Figure: OCI Data Science → OCI Registry Service → OCI Functions Service (application: func.yml, func.py, scorefn.py, requirement.txt) → OCI API Gateway → REST endpoint (http://hoge:8080/invoke/..), called with cURL]
    • Files in the function application: func.yml, func.py, scorefn.py, requirement.txt
    • Deploy to OCI Functions with Fn
    • Expose the function through OCI API Gateway
    • Result: the model runs on OCI (OCI Functions) and is reachable over REST (API Gateway)
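The REST endpoint ultimately passes a request body to the scoring code and returns predictions. A minimal sketch of such a JSON-in / JSON-out contract in plain Python, assuming a request shaped like {"data": [records]} and a response {"prediction": [...]}; make_handler, the payload keys and the toy predict function are hypothetical and do not reproduce the actual func.py or scorefn.py from the slide.

```python
import json
from typing import Any, Callable, Dict, List

Predict = Callable[[List[Dict[str, Any]]], List[Any]]


def make_handler(predict: Predict) -> Callable[[bytes], str]:
    """Wrap a predict function as a handler that takes a raw request body
    and returns a JSON string (hypothetical contract)."""
    def handle(body: bytes) -> str:
        try:
            payload = json.loads(body or b"{}")
        except ValueError:  # invalid JSON or invalid UTF-8
            return json.dumps({"error": "request body must be JSON"})
        if not isinstance(payload, dict):
            return json.dumps({"error": "request body must be a JSON object"})
        records = payload.get("data", [])
        if not isinstance(records, list):
            return json.dumps({"error": "'data' must be a list of records"})
        return json.dumps({"prediction": predict(records)})
    return handle


def toy_predict(records: List[Dict[str, Any]]) -> List[int]:
    """Toy model standing in for the trained model."""
    return [1 if r.get("Sex") == "female" else 0 for r in records]


handler = make_handler(toy_predict)
print(handler(b'{"data": [{"Sex": "female"}, {"Sex": "male"}]}'))
```

Keeping the model-specific predict function separate from the request parsing makes it possible to test the scoring logic locally before deploying it with Fn.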