Data Analysis with Rust on AWS / 20260523iotlt-niigata-rust-on-aws

Slides presented at IoTLT新潟 Vol.17, held on Saturday, 2026/05/23

Event page
https://iotlt.connpass.com/event/391502/

kasacchiful PRO

May 23, 2026

Transcript

  1. Metadata:
       BuildMethod: rust-cargolambda

     cargo-lambda
     provided.al2023

     [dependencies]
     lambda_runtime = "0.13"
     aws-sdk-s3 = "1"
     polars = { version = "0.50", features = ["lazy", "parquet", "temporal", "abs"] }
     tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
  2. LazyFrame::scan_parquet(input, ScanArgsParquet::default())?
         .with_columns([
             col("timestamp").dt().truncate(lit("1h")).alias("hour"),
             col("temperature").mean().over([col("device_id")]).alias("_dev_mean"),
             col("temperature").std(1).over([col("device_id")]).alias("_dev_std"),
         ])
         .with_columns([
             ((col("temperature") - col("_dev_mean")).abs()
                 .gt(lit(3.0) * col("_dev_std"))).alias("_is_anomaly"),
             col("status").eq(lit("error")).alias("_is_error"),
         ])
         .group_by([col("device_id"), col("hour")])
         .agg([
             col("temperature").mean().alias("temp_mean"),
             col("_is_error").sum().alias("error_count"),
             col("_is_anomaly").sum().alias("anomaly_count"),
         ])
         .sort(["device_id", "hour"], SortMultipleOptions::default())
         .collect()?
  3. df["hour"] = df["timestamp"].dt.floor("h")
     dev_stats = df.groupby("device_id")["temperature"].agg(
         _dev_mean="mean",
         _dev_std=lambda s: s.std(ddof=1),
     )
     df = df.merge(dev_stats, left_on="device_id", right_index=True)
     df["_is_anomaly"] = (
         (df["temperature"] - df["_dev_mean"]).abs() > 3.0 * df["_dev_std"]
     )
     df["_is_error"] = df["status"].eq("error")
     agg = (df.groupby(["device_id", "hour"], sort=True)
              .agg(temp_mean=("temperature", "mean"),
                   error_count=("_is_error", "sum"),
                   anomaly_count=("_is_anomaly", "sum"))
              .reset_index())
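
Slides 2 and 3 both flag a reading as anomalous when its temperature lies more than three sample standard deviations (ddof=1) from the mean of its own device. The following is a minimal standard-library sketch of that rule only, not the author's implementation: the function name `flag_anomalies`, the `(device_id, temperature)` tuple layout, and the parameter `k` are assumptions made for illustration, and the hourly grouping and error counting from the slides are left out.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_anomalies(readings, k=3.0):
    """Return one bool per reading: True if it deviates more than
    k sample standard deviations from its device's mean.

    `readings` is a list of (device_id, temperature) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Collect the temperatures of each device.
    by_device = defaultdict(list)
    for device_id, temperature in readings:
        by_device[device_id].append(temperature)

    # Per-device mean and sample standard deviation (ddof=1).
    # A device with a single reading has no defined sample std,
    # so NaN is used and every comparison against it is False.
    stats = {}
    for device_id, temps in by_device.items():
        std = stdev(temps) if len(temps) > 1 else float("nan")
        stats[device_id] = (mean(temps), std)

    # Apply the k-sigma rule to each reading, in input order.
    return [
        abs(temperature - stats[device_id][0]) > k * stats[device_id][1]
        for device_id, temperature in readings
    ]
```

Note that with a sample standard deviation, a single outlier among n readings can reach a z-score of at most (n - 1) / sqrt(n), so a device needs at least 11 readings before any one of them can exceed the 3-sigma threshold.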