What the HACK happened to the data?

loleg
March 07, 2020

Presentation by Tina & Nico at the #energyhack2020 event which I co-organized.


Transcript

  1. What the HACK happened to the data? by Tina & Nico
    Client Analysis 2.0 and the power of open data
  2. Research approach: how the BFS (Swiss Federal Statistical Office) calculates energy consumption at the national level; segmentation
  3. Example of Lucerne

  4. None
  5. So let’s do it for fun!

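  6. Search for ‘openly available’ customer segmentation
    • Gebäude mit Wohnnutzung (buildings with residential use)
    • Further examples of Zurich and Lucerne
    • Kennzahlen der Basler Wohnviertel und Landgemeinde (key figures for Basel's residential districts and rural municipalities)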
  7. Segmentation of regions in Basel
    Data: https://data.bs.ch/explore/dataset/100011/table/ (19 factors, 21 regions)
    Assumption: The factors in this dataset are all related to a region's energy needs and consumption behaviour.
    Limitations: Too few regions; the information is too high level (individual households would be much more interesting); and many characteristics, possibly the most relevant ones, are missing (e.g. traffic, population density, industry, connections between the variables).
  8. Examples of potential factors
    • Buildings’ mean age
    • Population’s age ratio
    • % of persons living in single households
  9. Number of segments?
    Based on all regions and factors in the dataset, how many segments might be useful? E.g. elbow plot (k-means clustering, plotting the SSE for each possible number of segments): https://towardsdatascience.com/customer-segmentation-using-k-means-clustering-d33964f238c3
    3 groups
  10. Location of segments based on the example factors
    • Buildings’ mean age
    • Population’s age ratio
    • % of persons living in single households
  11. Potential next steps
    • Defining the characteristics of a segment (e.g. regions with a high number of one-person households and newer buildings -> possibly a high-potential region? But what about the red dots?)
    • Finding proxies for energy consumption in each region and seeing how they relate to our segmentation (could be the number of devices; check data from Swisscom)
    • Example question which could be answered: Is the assumed high-potential region consuming less or more power?
    Example limitation: Since we do not have many regions and are missing characteristics, an outlier (e.g. a region with a large number of unemployed persons) may end up in an inappropriate segment. For example, such persons may live in regions with otherwise low potential and therefore receive no offer, although they may be more willing to spend time selecting a better provider and have the greatest need to benefit.
  12. Libraries for visualising the data on a map
    • Worldwide view: rnaturalearth is an R package to hold and facilitate interaction with Natural Earth vector map data.
    • National view: the geofacet R package provides geofaceting functionality for 'ggplot2'. Geofaceting arranges a sequence of plots of data for different geographical entities into a grid that preserves some of the geographical orientation.
  13. “Technology now allows people to connect anytime, anywhere, to anyone in the world, from almost any device. This is dramatically changing the way people work, facilitating 24/7 collaboration with colleagues who are dispersed across time zones, countries, and continents.”
    Michael Dell, Chairman and CEO of Dell
  14. Recommendations
    For a virtual set-up, use tools like:
    • Online code-sharing platforms, like Google Colab
    • Frequent hangout calls
    • Frequent exchange with the challenge owner
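
The elbow approach mentioned on slide 9 can be illustrated with a minimal sketch. It is not the presenters' code: the data below is synthetic (21 made-up "regions" with 3 made-up factors, deliberately drawn from three groups), the farthest-first initialisation is swapped in to keep the run deterministic, and the "largest second difference" rule for picking the elbow is only one simple heuristic. With the real Basel dataset, the factors would presumably need to be standardised first so that no single factor dominates the distances.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic init (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's k-means and return the final sum of squared errors (SSE)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Synthetic stand-in data: 21 hypothetical regions, 3 hypothetical factors,
# generated around three made-up group centres. NOT the Basel dataset.
rng = random.Random(42)
centres = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0), (0.0, 5.0, 5.0)]
regions = [
    tuple(rng.gauss(m, 0.5) for m in centre)
    for centre in centres
    for _ in range(7)
]

# SSE for each candidate number of segments; these are the values an
# elbow plot would show on its y-axis.
sse_by_k = {k: kmeans_sse(regions, k) for k in range(1, 7)}

# Simple heuristic: the elbow is where the SSE curve bends most sharply,
# i.e. where the discrete second difference is largest.
elbow_k = max(
    range(2, 6),
    key=lambda k: sse_by_k[k - 1] - 2 * sse_by_k[k] + sse_by_k[k + 1],
)

for k, sse in sse_by_k.items():
    print(f"k={k}: SSE={sse:.1f}")
print("elbow at k =", elbow_k)
```

Plotting `sse_by_k` against `k` gives the elbow plot itself; the point after which adding more segments barely reduces the SSE is the candidate number of segments.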