Efficient Energy Analytics with Airflow, Spark, and MLflow

Slide 1

Slide 1 text

Confidential
Efficient Energy Analytics with Airflow, Spark, and MLflow
Hank Ehly / ENECHANGE Inc.

Slide 2

Slide 2 text

Introduction

Slide 3

Slide 3 text

Introduction

● Energy tech is a fun, fulfilling area to work in
  ○ Exciting technical projects for software developers
  ○ Fulfilling career choice due to the ubiquitous nature of energy
● I want to show you:
  ○ some examples of software problems that we encounter at ENECHANGE
  ○ the technologies that we use to solve these problems

Example areas (images): IoT; Electric Vehicles (EV); Electricity Market Price Prediction; Energy Usage Data Analysis
Image credits: Photo by Michael Marais; Photo by Dan LeFebvre; Image: Renewable Energy Institute; Image: Synergy

Slide 4

Slide 4 text

Introduction

● Energy consumption data from smart meters
● 30-minute intervals (48 values per day)

Smart meter energy consumption data (sample):

Meter ID  Date/time         Usage (kWh)
1         2023/04/03 12:00  1.4
1         2023/04/03 12:30  1.0
1         2023/04/03 13:00  1.1

48 values per day × number of meters × number of days (per meter) = lots of data!
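To make the scale concrete, the row count grows as the product of the readings per day, the number of meters, and the number of days of history. The sketch below only does that arithmetic; the fleet size and history length are made-up figures for illustration, since the slide gives neither.

```python
READINGS_PER_DAY = 48  # one reading every 30 minutes (from the slide)


def total_readings(num_meters: int, num_days: int) -> int:
    """Return the number of half-hourly readings for a fleet of meters."""
    return READINGS_PER_DAY * num_meters * num_days


# Assumed example figures (not from the slides): 100,000 meters, 730 days each.
example_rows = total_readings(num_meters=100_000, num_days=730)
print(f"{example_rows:,} readings")
```

Even at this assumed scale the table runs to billions of rows, which is why a columnar format and batch tooling come up in the examples that follow.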

Slide 5

Slide 5 text

Example #1: Bulk data downloads with Apache Airflow

Slide 6

Slide 6 text

Example #1 - Bulk data downloads with Apache Airflow

Diagram labels: energy users, energy company, ENECHANGE datastore, usage data

Meter ID  Date/time         Prediction (kWh)  Actual (kWh)  Savings (kWh)
1         2023/04/03 12:00  1.4               1.1           0.3
1         2023/04/03 12:30  1.0               1.1           0.1
1         2023/04/03 12:30  1.1               1.1           0.0

Message to energy users: "Save electricity between 5-6 PM tomorrow to earn points!"

Have:
● Big blob of data
● Columnar (Parquet) format

Want:
● Subset of data
● Multiple zipped CSV files (~100 MB each)
● E-mail pre-signed links
● Slack notifications
● Automatic retries (for idempotent, known-to-fail tasks)
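The "multiple zipped CSV files" requirement can be sketched with the Python standard library alone. This is a minimal illustration, not ENECHANGE's implementation: the function name `zipped_csv_parts`, the `part-0001.csv` naming, and the choice to measure the limit on the uncompressed CSV text are all assumptions. Reading the Parquet source and selecting the subset are left out; the function simply takes an iterable of rows.

```python
import csv
import io
import zipfile
from typing import Iterable, Iterator, Sequence


def zipped_csv_parts(
    header: Sequence[str],
    rows: Iterable[Sequence[object]],
    max_csv_chars: int = 100 * 1024 * 1024,
) -> Iterator[bytes]:
    """Yield ZIP archives (as bytes), each holding one CSV part.

    A part is closed as soon as its uncompressed CSV text reaches
    max_csv_chars, so a part can exceed the limit by at most one row.
    """

    def new_part():
        buf = io.StringIO(newline="")
        writer = csv.writer(buf)
        writer.writerow(header)  # every part repeats the header row
        return buf, writer

    def to_zip(buf: io.StringIO, part_no: int) -> bytes:
        out = io.BytesIO()
        with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(f"part-{part_no:04d}.csv", buf.getvalue())
        return out.getvalue()

    buf, writer = new_part()
    part_no, rows_in_part = 1, 0
    for row in rows:
        writer.writerow(row)
        rows_in_part += 1
        if buf.tell() >= max_csv_chars:
            yield to_zip(buf, part_no)
            buf, writer = new_part()
            part_no, rows_in_part = part_no + 1, 0
    if rows_in_part:  # flush the final, partially filled part
        yield to_zip(buf, part_no)
```

Because the function is a generator, each archive can be uploaded and released before the next one is built, which keeps memory use bounded by one part. In a workflow tool, each part would typically be one retryable unit of work.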

Slide 7

Slide 7 text

Example #1 - Bulk data downloads with Apache Airflow

● Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows
● You define the workflow steps in Python code, and Airflow turns them into an executable program.
● Run on a schedule or trigger via API request

(redacted)

    from datetime import datetime

    from airflow import DAG
    from airflow.decorators import task
    from airflow.operators.bash import BashOperator

    # A DAG groups tasks; this one is scheduled with a cron expression (daily at midnight)
    with DAG(dag_id="demo", schedule="0 0 * * *", …):
        # A task defined with an operator
        task1 = BashOperator(
            task_id="hello",
            bash_command="echo hello"
        )

        # A task defined by decorating a Python function
        @task()
        def task2():
            print("world")

        # task1 runs before task2
        task1 >> task2()
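The `task1 >> task2()` line is what declares the ordering between steps. As a rough mental model only, the toy below shows how operator overloading can record dependencies and how a run order can be derived from them. `ToyTask` and `run_order` are hypothetical names invented for this sketch; this is not Airflow's implementation, and it requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


class ToyTask:
    """Hypothetical stand-in for a workflow task; not an Airflow class."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.upstream: set[str] = set()

    def __rshift__(self, other: "ToyTask") -> "ToyTask":
        # `a >> b` records that b depends on a, then returns b so that
        # chains such as `a >> b >> c` read left to right.
        other.upstream.add(self.task_id)
        return other


def run_order(tasks: list[ToyTask]) -> list[str]:
    """Return task ids in an order that respects every recorded dependency."""
    graph = {t.task_id: t.upstream for t in tasks}
    return list(TopologicalSorter(graph).static_order())


hello = ToyTask("hello")
world = ToyTask("world")
hello >> world
order = run_order([hello, world])  # ["hello", "world"]
```

The point of the sketch is that the Python file describes a graph rather than running the steps directly; a scheduler then decides when each node executes.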

Slide 8

Slide 8 text

Example #2: Organized machine learning with MLflow

Slide 9

Slide 9 text

Example #2 - Organized Machine Learning with MLflow

● Predict how much electricity buildings A, B, C will use tomorrow
● Machine Learning
● Challenges:
  ○ Growing number of models
  ○ Different models require different inputs
  ○ Tweak each model individually
  ○ Track performance
  ○ Record what training data we used
  ○ Save model files (and any associated data visualizations)

(Image: smart meter)

Slide 10

Slide 10 text

Example #2 - Organized Machine Learning with MLflow

● MLflow is an open-source Python framework for creating and managing machine learning models.
● Deploy it as a web application. Interact via Python API (see code example) and Web UI.
● Keeps track of model executable files, parameters, model versions, performance metrics, and data visualization files.

Training and saving a new model to MLflow:

    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import ElasticNet

    # get_train_data() and eval_metrics() are helpers not shown on the slide

    # Get training data
    train_x, train_y, test_x, test_y = get_train_data()

    # Hyperparameters, named so the same values can be logged below
    alpha = 0.5
    l1_ratio = 0.5

    with mlflow.start_run():
        # Train a model
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
        model.fit(train_x, train_y)

        # Evaluate performance
        y_pred = model.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, y_pred)

        # Record model parameters
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)

        # Record performance metrics
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        # Save model executable file
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name="MyFirstModel"
        )

Comparing model performance via the UI (Image: Databricks)
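The slide's code calls `eval_metrics` without showing it. One possible version, written with only the standard library, is sketched below; it is an assumption about what such a helper computes (root mean squared error, mean absolute error, and the coefficient of determination), not the speaker's actual code.

```python
import math
from typing import Sequence


def eval_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> tuple[float, float, float]:
    """Return (rmse, mae, r2) for paired actual and predicted values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    # R^2 is undefined when the actual values are all identical; report NaN then.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else math.nan

    return rmse, mae, r2
```

Logging all three values for every run is what makes the side-by-side comparison in the MLflow UI meaningful, since each metric penalizes errors differently.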

Slide 11

Slide 11 text

Example #2 - Organized Machine Learning with MLflow

Diagram components: Object Storage, RDB, Jupyter Notebook (our team)

Prediction job (./run_prediction_job.py), scheduled with cron:

    0 10 * * *    Every day at 10 AM

1. Download models from MLflow (select * where stage=Production)
2. Make predictions
3. Export to storage
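In standard five-field cron syntax the first field is the minute, so a job meant to run once a day at 10 AM is written `0 10 * * *`, whereas `* 10 * * *` fires on every minute of the 10 o'clock hour. The toy matcher below illustrates the difference. It is a simplified sketch that supports only `*` and single integers in each field, and the function names are made up for this example.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; this toy supports only '*' or a single integer."""
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` satisfies a five-field cron expression."""
    minute, hour, day, month, weekday = expr.split()
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        # cron counts Sunday as 0; datetime.isoweekday() counts it as 7.
        and field_matches(weekday, when.isoweekday() % 7)
    )


def fires_per_day(expr: str, day: datetime) -> int:
    """Count the minutes of one calendar day on which the expression fires."""
    return sum(
        cron_matches(expr, day.replace(hour=h, minute=m))
        for h in range(24)
        for m in range(60)
    )


day = datetime(2023, 4, 3)
once = fires_per_day("0 10 * * *", day)        # 1
every_minute = fires_per_day("* 10 * * *", day)  # 60
```

For a prediction job that downloads models and writes results to storage, running once rather than sixty times matters for both cost and duplicate output.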

Slide 12

Slide 12 text

Summary

● Energy tech is a fun, fulfilling area to work in.
● ENECHANGE uses modern technology to solve problems in the energy industry.
● Apache Airflow is a platform for creating and managing complex batch workflows.
● MLflow is a framework for keeping machine learning operations organized.

Thank you for listening!