Efficient Energy Analytics with Airflow, Spark, and MLFlow

Hank Ehly

April 06, 2023


  1. Confidential: Introduction

    • Energy tech is a fun, fulfilling area to work in
      ◦ Exciting technical projects for software developers
      ◦ Fulfilling career choice due to ubiquitous nature
    • I want to show you:
      ◦ some examples of software problems that we encounter at ENECHANGE
      ◦ the technologies that we use to solve these problems

    Image panels: IoT, Electric Vehicles (EV), Electricity Market Price Prediction, Energy Usage Data Analysis.
    Credits: photos by Michael Marais and Dan LeFebvre; images from Renewable Energy Institute and Synergy.
  2. Introduction

    • Energy consumption data from smart meters
    • 30 minute intervals (48 values per day)

    Smart meter energy consumption data (sample):

    Meter ID  Date/time         Usage (kWh)
    1         2023/04/03 12:00  1.4
    1         2023/04/03 12:30  1.0
    1         2023/04/03 13:00  1.1

    number of meters × number of days (per meter) = lots of data!
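
To put a rough number on "lots of data", here is a back-of-the-envelope count of readings. Only the 48 readings per day comes from the slide; the meter count and history length are hypothetical placeholders, not ENECHANGE figures.

```python
# One reading every 30 minutes gives 48 readings per meter per day.
READINGS_PER_DAY = 24 * 60 // 30


def total_readings(num_meters: int, num_days: int) -> int:
    """Rows in the usage table: one per meter per 30-minute interval."""
    return num_meters * num_days * READINGS_PER_DAY


# Hypothetical fleet: 100,000 meters with one year of history each.
rows = total_readings(num_meters=100_000, num_days=365)
print(f"{rows:,} readings")  # 1,752,000,000 readings
```

Even this modest hypothetical fleet lands in the billions of rows, which is why a columnar format and batch tooling come up in the examples that follow.
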
  3. Example #1 - Bulk data downloads with Apache Airflow

    Diagram: energy users, the energy company, and the ENECHANGE datastore, connected by usage data flows.
    Example message to energy users: "Save electricity between 5-6 PM tomorrow to earn points!"

    Meter ID  Date/time         Prediction (kWh)  Actual (kWh)  Savings (kWh)
    1         2023/04/03 12:00  1.4               1.1           0.3
    1         2023/04/03 12:30  1.0               1.1           0.1
    1         2023/04/03 12:30  1.1               1.1           0.0

    Have:
    • Big blob of data
    • Columnar (Parquet) format

    Want:
    • Subset of data
    • Multiple zipped CSV files (~100 MB each)
    • E-mail pre-signed links
    • Slack notifications
    • Automatic retries (for idempotent, known-to-fail tasks)
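
The "multiple zipped CSV files" step could look roughly like the sketch below. It uses only the standard library and takes rows that have already been read and filtered, so reading Parquet, e-mailing pre-signed links, and Slack notifications are out of scope here. The function names, the file naming scheme, and the choice to apply the size limit to uncompressed bytes are all assumptions made for illustration, not details from the talk.

```python
import csv
import gzip
import io
import tempfile
from pathlib import Path
from typing import Iterable, List, Sequence


def csv_line(row: Sequence[object]) -> str:
    """Format one row as a CSV line; the csv module handles quoting."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


def write_csv_parts(
    header: Sequence[str],
    rows: Iterable[Sequence[object]],
    out_dir: Path,
    max_bytes: int = 100 * 1024 * 1024,  # assumed: limit on uncompressed size
) -> List[Path]:
    """Split rows into gzip-compressed CSV parts, each repeating the header."""
    out_dir.mkdir(parents=True, exist_ok=True)
    header_line = csv_line(header)
    header_size = len(header_line.encode("utf-8"))
    paths: List[Path] = []
    lines: List[str] = []
    size = header_size

    def flush() -> None:
        # Hypothetical naming scheme: usage-part-0001.csv.gz, usage-part-0002.csv.gz, ...
        path = out_dir / f"usage-part-{len(paths) + 1:04d}.csv.gz"
        with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
            f.write(header_line)
            f.writelines(lines)
        paths.append(path)

    for row in rows:
        line = csv_line(row)
        lines.append(line)
        size += len(line.encode("utf-8"))
        if size >= max_bytes:
            flush()
            lines.clear()
            size = header_size
    if lines:  # write whatever is left over as the final part
        flush()
    return paths


# Tiny demo with the sample rows and an artificially small limit.
sample = [
    (1, "2023/04/03 12:00", 1.4),
    (1, "2023/04/03 12:30", 1.0),
    (1, "2023/04/03 13:00", 1.1),
]
with tempfile.TemporaryDirectory() as tmp:
    parts = write_csv_parts(
        ["meter_id", "datetime", "usage_kwh"], sample, Path(tmp), max_bytes=60
    )
    part_names = [p.name for p in parts]
print(part_names)
```

Because each part is written from scratch under a deterministic name, rerunning the function overwrites the same files, which is the kind of idempotence that makes automatic retries safe.
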
  4. Example #1 - Bulk data downloads with Apache Airflow

    • Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows
    • You define the workflow steps in Python code and Airflow turns them into an executable program.
    • Run on a schedule or trigger via API request

    (redacted)

        from datetime import datetime

        from airflow import DAG
        from airflow.decorators import task
        from airflow.operators.bash import BashOperator

        # A DAG named "demo" that runs every day at midnight
        with DAG(dag_id="demo", schedule="0 0 * * *", …):
            # A task defined with an operator
            task1 = BashOperator(task_id="hello", bash_command="echo hello")

            # A task defined by decorating a Python function
            @task()
            def task2():
                print("world")

            # task2 runs after task1
            task1 >> task2()
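
In the demo DAG, `task1 >> task2()` declares that the second task runs after the first. A scheduler then has to run tasks in an order that respects every such dependency. The sketch below illustrates that idea with the standard library's `graphlib`; it is not how Airflow is implemented, and the task functions and dictionaries are stand-ins that mirror the demo.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple


def hello() -> str:
    return "hello"


def world() -> str:
    return "world"


# Task id -> callable, standing in for the two tasks in the demo DAG.
tasks: Dict[str, Callable[[], str]] = {"hello": hello, "world": world}

# Task id -> set of tasks that must finish first.
# This plays the role of `task1 >> task2()`.
upstream: Dict[str, Set[str]] = {"hello": set(), "world": {"hello"}}


def run_in_order(
    tasks: Dict[str, Callable[[], str]], upstream: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Run every task once, never before the tasks it depends on."""
    results = []
    for task_id in TopologicalSorter(upstream).static_order():
        results.append((task_id, tasks[task_id]()))
    return results


print(run_in_order(tasks, upstream))
# [('hello', 'hello'), ('world', 'world')]
```

A real scheduler adds what this sketch leaves out: schedules, retries, parallelism, and monitoring.
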
  5. Example #2 - Organized Machine Learning with MLflow

    • Predict how much electricity buildings A, B, C will use tomorrow
    • Machine Learning
    • Challenges:
      ◦ Growing number of models
      ◦ Different models require different inputs
      ◦ Tweak each model individually
      ◦ Track performance
      ◦ Record what training data we used
      ◦ Save model files (and any associated data visualizations)

    Diagram label: smart meter
  6. Example #2 - Organized Machine Learning with MLflow

    • MLflow is an open source Python framework for creating and managing machine learning models.
    • Deploy it as a web application. Interact via Python API (see code example) and Web UI.
    • Keeps track of model executable files, parameters, model versions, performance metrics and data visualization files.

    Training and saving a new model to MLflow:

        import mlflow
        import mlflow.sklearn
        from sklearn.linear_model import ElasticNet

        # Hyperparameters, named so the logged values match the trained model
        alpha = 0.5
        l1_ratio = 0.5

        # Get training data (get_train_data and eval_metrics are not shown on the slide)
        train_x, train_y, test_x, test_y = get_train_data()

        with mlflow.start_run():
            # Train a model
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
            model.fit(train_x, train_y)

            # Evaluate performance
            y_pred = model.predict(test_x)
            (rmse, mae, r2) = eval_metrics(test_y, y_pred)

            # Record model parameters
            mlflow.log_param("alpha", alpha)
            mlflow.log_param("l1_ratio", l1_ratio)

            # Record performance metrics
            mlflow.log_metric("rmse", rmse)
            mlflow.log_metric("r2", r2)
            mlflow.log_metric("mae", mae)

            # Save model executable file
            mlflow.sklearn.log_model(
                model, artifact_path="model", registered_model_name="MyFirstModel"
            )

    Comparing model performance via the UI (image: Databricks)
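
The slide calls `eval_metrics` without showing its definition. One plausible version is sketched below, assuming it returns RMSE, MAE, and R² in that order, as the tuple unpacking suggests. It is written with the standard library only; the author's actual helper may well differ, for example by using scikit-learn's metric functions. The predicted values in the usage lines are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def eval_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (rmse, mae, r2) for paired actual and predicted values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")

    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    # R² is undefined when every actual value is identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, mae, r2


# Actual usage values from the sample table; the predictions are hypothetical.
rmse, mae, r2 = eval_metrics([1.4, 1.0, 1.1], [1.3, 1.1, 1.1])
print(f"rmse={rmse:.3f} mae={mae:.3f} r2={r2:.3f}")
```

Logging all three gives the MLflow UI several columns to compare runs on, since a model can improve on one metric while getting worse on another.
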
  7. Example #2 - Organized Machine Learning with MLflow

    Diagram labels: Object Storage, RDB, Jupyter Notebook (our team)

    ./run_prediction_job.py
    1. Download models from MLflow (select * where stage=Production)
    2. Make predictions
    3. Export to storage

    Schedule: every day at 10 AM (cron: 0 10 * * *)
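
The three steps of the prediction job can be sketched as below. The loader and exporter are passed in as functions so the example runs without an MLflow server; in a real job the loader could be `mlflow.pyfunc.load_model`, which accepts `models:/<name>/<stage>` URIs in MLflow versions that support registry stages. The function names, the `ConstantModel` stand-in, and the building name are hypothetical and do not come from the talk.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence


def production_uri(model_name: str) -> str:
    """Registry URI for whichever version of a model is in the Production stage."""
    return f"models:/{model_name}/Production"


def run_prediction_job(
    model_names: Iterable[str],
    load_model: Callable[[str], Any],
    inputs: Dict[str, Sequence[Any]],
    export: Callable[[str, List[float]], None],
) -> None:
    for name in model_names:
        model = load_model(production_uri(name))  # 1. download the model
        predictions = list(model.predict(inputs[name]))  # 2. make predictions
        export(name, predictions)  # 3. export to storage


class ConstantModel:
    """Stand-in for a trained model; always predicts the same value."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, rows: Sequence[Any]) -> List[float]:
        return [self.value for _ in rows]


# In-memory stand-ins for the model registry and the export target.
fake_registry = {production_uri("building_a"): ConstantModel(1.2)}
exported: Dict[str, List[float]] = {}

run_prediction_job(
    model_names=["building_a"],
    load_model=fake_registry.__getitem__,
    inputs={"building_a": [[0.0], [0.5]]},
    export=exported.__setitem__,
)
print(exported)  # {'building_a': [1.2, 1.2]}
```

Selecting by stage rather than by version number means that promoting a new version in the registry changes what the next scheduled run uses, with no change to the job itself.
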
  8. Summary

    • Energy tech is a fun, fulfilling area to work in.
    • ENECHANGE uses modern technology to solve problems in the energy industry.
    • Apache Airflow is a platform for creating and managing complex batch workflows.
    • MLflow is a framework for keeping machine learning operations organized.

    Thank you for listening!