Efficient Energy Analytics with Airflow, Spark, and MLFlow

Hank Ehly

April 06, 2023

Transcript

  1. Introduction
     • Energy tech is a fun, fulfilling area to work in
       ◦ Exciting technical projects for software developers
       ◦ Fulfilling career choice due to the ubiquitous nature of energy
     • Example areas: IoT, Electric Vehicles (EV), Electricity Market Price Prediction, Energy Usage Data Analysis
     • I want to show you:
       ◦ some examples of software problems that we encounter at ENECHANGE
       ◦ the technologies that we use to solve these problems
     (Photos: Michael Marais, Dan LeFebvre. Images: Renewable Energy Institute, Synergy.)
  2. Introduction
     • Energy consumption data from smart meters
     • 30-minute intervals (48 values per day)
     • Data volume grows as (number of meters) × (number of days per meter) × 48 values per day: lots of data!
       (a back-of-the-envelope estimate follows below)

     Smart meter energy consumption data (sample):

       Meter ID | Date/time        | Usage (kWh)
       ---------+------------------+------------
       1        | 2023/04/03 12:00 | 1.4
       1        | 2023/04/03 12:30 | 1.0
       1        | 2023/04/03 13:00 | 1.1
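     To make the scale concrete, here is a rough back-of-the-envelope estimate. The meter and day counts below are illustrative assumptions, not actual ENECHANGE figures.

       # Illustrative scale estimate (assumed figures, not actual ENECHANGE numbers)
       num_meters = 100_000           # assumed fleet of smart meters
       num_days = 365                 # one year of history per meter
       readings_per_day = 48          # 30-minute intervals

       total_rows = num_meters * num_days * readings_per_day
       print(f"{total_rows:,} rows")  # 1,752,000,000 rows -> "lots of data!"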
  3. Example #1 - Bulk data downloads with Apache Airflow
     (Diagram: usage data flows from energy users to the energy company and into the ENECHANGE datastore; users receive messages such as "Save electricity between 5-6 PM tomorrow to earn points!")

       Meter ID | Date/time        | Prediction (kWh) | Actual (kWh) | Savings (kWh)
       ---------+------------------+------------------+--------------+--------------
       1        | 2023/04/03 12:00 | 1.4              | 1.1          | 0.3
       1        | 2023/04/03 12:30 | 1.0              | 1.1          | 0.1
       1        | 2023/04/03 13:00 | 1.1              | 1.1          | 0.0

     Have:
     • Big blob of data
     • Columnar (Parquet) format
     Want:
     • Subset of data
     • Multiple zipped CSV files (~100 MB each)
     • E-mailed pre-signed download links
     • Slack notifications
     • Automatic retries (for tasks that are idempotent and known to fail occasionally)
     (A sketch of the extract-and-zip step follows below.)
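     As a rough illustration of the "Have to Want" step, here is a minimal sketch that filters a Parquet dataset and writes the subset out as zipped CSV chunks. The file path, meter IDs, and chunk size are placeholders, not ENECHANGE's actual pipeline code.

       import pandas as pd

       # Placeholder inputs -- for illustration only
       SOURCE_PARQUET = "usage_data.parquet"
       TARGET_METER_IDS = [1, 2, 3]
       ROWS_PER_CHUNK = 2_000_000  # tune so each zipped CSV lands near ~100 MB

       # Read only the requested subset of the columnar data (pyarrow engine filters)
       df = pd.read_parquet(SOURCE_PARQUET, filters=[("meter_id", "in", TARGET_METER_IDS)])

       # Split into chunks and write each one as a zipped CSV
       for i, start in enumerate(range(0, len(df), ROWS_PER_CHUNK)):
           chunk = df.iloc[start:start + ROWS_PER_CHUNK]
           chunk.to_csv(f"usage_data_part{i:03d}.csv.zip", index=False, compression="zip")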
  4. Example #1 - Bulk data downloads with Apache Airflow
     • Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows
     • You define the workflow steps in Python code, and Airflow turns them into an executable program
     • Run on a schedule or trigger via API request (redacted)

       from datetime import datetime
       from airflow import DAG
       from airflow.decorators import task
       from airflow.operators.bash import BashOperator

       with DAG(dag_id="demo", schedule="0 0 * * *", …):
           task1 = BashOperator(
               task_id="hello",
               bash_command="echo hello"
           )

           @task()
           def task2():
               print("world")

           task1 >> task2()
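     To connect the demo above to the bulk-download use case, here is a hedged sketch of what such a DAG could look like. The DAG name, task bodies, bucket name, and Slack webhook URL are placeholders made up for illustration; only the Airflow constructs (DAG, @task, default_args retries, TaskFlow dependencies) mirror the style of the demo above.

       from datetime import datetime

       import boto3
       import requests
       from airflow import DAG
       from airflow.decorators import task

       with DAG(
           dag_id="bulk_usage_data_download",   # hypothetical DAG name
           schedule="0 0 * * *",
           start_date=datetime(2023, 4, 1),
           catchup=False,
           default_args={"retries": 3},         # automatic retries for idempotent tasks
       ):

           @task()
           def export_zipped_csvs() -> list[str]:
               # Filter the Parquet data and upload zipped CSV chunks (placeholder body)
               return ["exports/usage_data_part000.csv.zip"]

           @task()
           def create_presigned_links(keys: list[str]) -> list[str]:
               # Generate time-limited download links for each exported file
               s3 = boto3.client("s3")
               return [
                   s3.generate_presigned_url(
                       "get_object",
                       Params={"Bucket": "example-bucket", "Key": key},
                       ExpiresIn=7 * 24 * 3600,
                   )
                   for key in keys
               ]

           @task()
           def notify(links: list[str]) -> None:
               # Post a Slack message; e-mailing the links would be a similar task
               requests.post(
                   "https://hooks.slack.com/services/XXX/YYY/ZZZ",   # placeholder webhook
                   json={"text": f"Bulk download ready: {len(links)} file(s)"},
               )

           # TaskFlow call chaining sets the task dependencies
           notify(create_presigned_links(export_zipped_csvs()))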
  5. Example #2 - Organized Machine Learning with MLflow
     • Predict how much electricity buildings A, B, and C will use tomorrow (from smart meter data)
     • Machine Learning
     • Challenges:
       ◦ Growing number of models
       ◦ Different models require different inputs
       ◦ Tweak each model individually
       ◦ Track performance
       ◦ Record what training data we used
       ◦ Save model files (and any associated data visualizations)
  6. Example #2 - Organized Machine Learning with MLflow
     • MLflow is an open-source Python framework for creating and managing machine learning models
     • Deploy it as a web application. Interact via the Python API (see code example) and the Web UI
     • Keeps track of model executable files, parameters, model versions, performance metrics, and data visualization files
     (Image: Databricks. Screenshot: comparing model performance via the UI.)

     Training and saving a new model to MLflow:

       import mlflow
       from sklearn.linear_model import ElasticNet

       # Get training data (helper from the original slide, not shown here)
       train_x, train_y, test_x, test_y = get_train_data()

       alpha, l1_ratio = 0.5, 0.5

       with mlflow.start_run():
           # Train a model
           model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
           model.fit(train_x, train_y)

           # Evaluate performance (eval_metrics is another helper from the slide)
           y_pred = model.predict(test_x)
           (rmse, mae, r2) = eval_metrics(test_y, y_pred)

           # Record model parameters
           mlflow.log_param("alpha", alpha)
           mlflow.log_param("l1_ratio", l1_ratio)

           # Record performance metrics
           mlflow.log_metric("rmse", rmse)
           mlflow.log_metric("r2", r2)
           mlflow.log_metric("mae", mae)

           # Save model executable file and register it
           mlflow.sklearn.log_model(
               model,
               artifact_path="model",
               registered_model_name="MyFirstModel"
           )
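     Since the slide mentions deploying MLflow as a web application, a client script typically points at that server before logging runs. The tracking URI and experiment name below are placeholders, not the team's actual values.

       import mlflow

       # Point the client at the deployed MLflow tracking server (placeholder URL)
       mlflow.set_tracking_uri("http://mlflow.internal.example:5000")

       # Group runs under a named experiment (placeholder name)
       mlflow.set_experiment("building-energy-forecasts")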
  7. Example #2 - Organized Machine Learning with MLflow
     (Diagram: Object Storage, RDB, Jupyter Notebook (our team))

     ./run_prediction_job.py, scheduled with cron "0 10 * * *" (every day at 10 AM):
     1. Download models from MLflow (select * where stage=Production)
     2. Make predictions
     3. Export to storage
     (A sketch of step 1 follows below.)
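     As a rough illustration of step 1 above, here is a minimal sketch that loads the Production-stage version of a registered model and predicts with it. The model name reuses "MyFirstModel" from the earlier slide, and load_tomorrow_features() plus the output path are hypothetical.

       import mlflow
       import pandas as pd

       # Load the Production-stage version of a registered model from the MLflow registry
       model = mlflow.pyfunc.load_model("models:/MyFirstModel/Production")

       # Hypothetical helper that builds tomorrow's feature rows for each building
       features = load_tomorrow_features()

       # Make predictions and export them to storage (placeholder path)
       predictions = pd.DataFrame({"prediction_kwh": model.predict(features)})
       predictions.to_csv("predictions_tomorrow.csv", index=False)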
  8. Summary
     • Energy tech is a fun, fulfilling area to work in.
     • ENECHANGE uses modern technology to solve problems in the energy industry.
     • Apache Airflow is a platform for creating and managing complex batch workflows.
     • MLflow is a framework for keeping machine learning operations organized.
     Thank you for listening!