NLP for Energy Transition - Demo Day

October 2020.

Julia Wabant

August 31, 2020

Transcript

  1. Using NLP and sentiment analysis to better interpret and implement energy transition

    Source: https://www.vox.com/2019/4/22/18510953/climate-change-2020-bernie-sanders-donald-trump-harvard-poll
  2. What to do with all this data? How do we analyze all this information?

    The Problem – Lack of Social Interpretation

    Challenges
    • Multiple data sources.
    • Incomplete information.
    • How do we account for biases?
    • What role do citizens play?
    • Languages and finances.
    • https://www.worldenergy.org/news-views/entry/global-challenge-to-define-a-role-of-a-citizen-in-energy-transition

    Source: https://www.vox.com/2019/4/22/18510953/climate-change-2020-bernie-sanders-donald-trump-harvard-poll
  3. Purpose & Motivation: Data, the New Oil

    • A shift in support for alternatives requires social upheaval and greater understanding among populations.
    • Consumers and individuals vote with their wallets and must be educated on the merits and demerits of each strategy.
    • It is thus imperative to interpret social sentiments regarding existing and emerging sources as well as technologies globally.
    • Data is the fuel that can facilitate a better decision-making system for environmentalists and policy makers.
  4. Problem Scope

    Those being studied:
    • Reddit, Facebook, YouTube, Twitter.
    • Weibo.
    • News websites like Huffington Post, Reuters, Bloomberg, etc.
    • Developing and developed nations, including those poised to be global superpowers.
    • U.S., China, India, Nigeria, Europe, etc.

    Those being benefited:
    • Policy makers and economists.
    • Government stakeholders and environmentalists.
    • Consumers, taxpayers and society.
  5. The Goal

    OBJECTIVES
    • List data sources from social media, forums, news articles and surveys related to energy and energy transition.
    • Analyze and break down the data to remove noise.
    • Create visualizations to help understand social sentiments across nations and societies for various issues, including energy prices and willingness to switch sources.

    WHAT WE ACHIEVED
    • Cleaned datasets with proper labels from various media sources.
    • Interpretable sentiment analysis visualizations for specific keywords and phrases.
    • Highly accurate AI models to predict and classify texts.
    • Automation code to collect, scrape and analyze text from social media.
  6. PIPELINE STEPS

    • Find relevant links and discussion forums
    • Connect website data and scrape sources
    • Automate data cleaning and filtering of relevant content
    • Perform sentiment analysis and compare best results
    • Build code to predict and classify texts
  7. BREAKING DOWN DATA: SENTIMENT ANALYSIS

    • Data from online sources is usually noisy, filled with irrelevant text, numbers, articles, stop words and whitespace.
    • Data has to be segmented, tokenized, vectorized and categorized by comparing it with existing dictionaries that carry additional information about words and their context.
    • Data cleanup for NLP also matters for computing speed, since less processing is wasted on irrelevant tokens.
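
The cleanup described on this slide can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `clean_and_tokenize` helper and the tiny `STOP_WORDS` set are assumptions made for the example, and a real pipeline would use a fuller stop-word list and language-aware tokenization.

```python
import re

# Tiny illustrative stop-word list (an assumption; not from the deck).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on"}


def clean_and_tokenize(text):
    """Lowercase the text, strip URLs, digits and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # remove digits and punctuation
    tokens = text.split()                      # split on any run of whitespace
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = clean_and_tokenize("The price of solar is falling 20%!  https://example.com")
```

Here `tokens` keeps only the content words of the sentence, which is the form later steps (vectorizing, dictionary lookup) would consume.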
  8. BREAKING DOWN THE TASKS

    • Collect Data
    • Clean Data
    • Generate Features
    • Convert data to usable forms
    • Load Dictionaries
    • Generate Sentiments
    • Create bag of words
    • Generate tabular features
    • Train and test models for prediction
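
As a rough illustration of the "create bag of words" and "generate tabular features" tasks, the sketch below turns already-tokenized documents into rows of word counts. The function names and the sample documents are hypothetical, chosen only for this example; the deck does not say which library or representation was used.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Return a sorted list of every distinct token across the documents."""
    return sorted({tok for doc in tokenized_docs for tok in doc})


def bag_of_words(tokenized_docs, vocabulary):
    """Turn each document into a row of token counts, one column per vocabulary word."""
    rows = []
    for doc in tokenized_docs:
        counts = Counter(doc)  # missing words count as 0
        rows.append([counts[word] for word in vocabulary])
    return rows


# Hypothetical, already-cleaned documents.
docs = [["solar", "price", "falling"], ["coal", "price", "price", "rising"]]
vocab = build_vocabulary(docs)
table = bag_of_words(docs, vocab)
```

Each row of `table` lines up with `vocab`, so the result can be fed to a classifier as a plain feature matrix.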
  9. 10

  10. 11

  11. Sentiment Analysis

    • Text analysis method
    • Can detect polarity (positive or negative opinion)
    • Can be very precise (fine-grained)
    • Can be aspect-based
    • Can detect emotion
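
To make polarity detection concrete, here is a minimal dictionary-based sketch. It is a deliberately simple stand-in, not the approach the deck reports results for: the `POSITIVE` and `NEGATIVE` word sets and the `polarity` function are assumptions made up for this example.

```python
# Hypothetical mini-lexicons; a real system would load a published sentiment dictionary.
POSITIVE = {"clean", "cheap", "good", "affordable", "reliable"}
NEGATIVE = {"expensive", "dirty", "bad", "unreliable", "costly"}


def polarity(tokens):
    """Classify a token list by counting positive and negative lexicon hits."""
    score = sum(1 for t in tokens if t in POSITIVE) - sum(1 for t in tokens if t in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no hits, or hits cancel out


label = polarity(["solar", "is", "clean", "and", "cheap"])
```

A counting approach like this ignores negation and context, which is one reason fine-grained or aspect-based methods are listed separately on the slide.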
  12. Transfer Learning

    • Use a pre-trained model → "portability" of features → useful when only a small amount of data is available
    • Enabled the breakthrough of NLP
  13. General

    • Pre-trained model (BERT base)
        • BookCorpus
        • English Wikipedia dataset
    • Fine-tuned on scraped tweets
        • Positive
        • Negative

    98% accuracy
  14. Focus On Costs

    • Same pre-trained model
    • Fine-tuned on Climate-44 dataset
    • With sentiment index (between 0 and 1)
    • Threshold 0.5

    94% accuracy
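
The thresholding step on this slide can be sketched as below. The slide gives only the index range and the 0.5 threshold; which side counts as positive, how a value exactly at the threshold is handled, and the helper names `label_from_index` and `accuracy` are all assumptions made for this illustration.

```python
def label_from_index(sentiment_index, threshold=0.5):
    """Map a sentiment index in [0, 1] to a binary label.

    Assumption: values at or above the threshold are treated as positive.
    """
    if not 0.0 <= sentiment_index <= 1.0:
        raise ValueError("sentiment index must lie between 0 and 1")
    return "positive" if sentiment_index >= threshold else "negative"


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches / len(expected)


# Hypothetical indices, not data from the deck.
predicted = [label_from_index(x) for x in (0.8, 0.2, 0.5)]
```

An accuracy figure like the one quoted on the slide would come from comparing such predicted labels against held-out ground truth.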
  15. Steps (1)

    • Map tweets to OpenStreetMap data (location)
    • Identify topics by applying a topic model
    • Filter tweets on renewable energies
  16. Steps (2)

    • Use fine-tuned BERT sentiment classifier
    • Label the remaining tweets using Label Propagation
    • Perform community detection → show how an aspect-based sentiment spreads across nearby locations
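
The idea behind label propagation can be sketched on a small graph: nodes with known sentiment keep their labels, and each unlabeled node repeatedly adopts the majority label among its neighbours. This is a simplified majority-vote variant written for illustration, not the deck's implementation; the graph, the node names and the `propagate_labels` function are hypothetical.

```python
from collections import Counter


def propagate_labels(neighbors, seed_labels, max_iters=10):
    """Spread seed labels over a graph by synchronous majority vote.

    neighbors: dict mapping each node to a list of adjacent nodes.
    seed_labels: dict of nodes whose labels are known and never change.
    Nodes that no label ever reaches are left out of the result.
    """
    labels = dict(seed_labels)
    for _ in range(max_iters):
        updates = {}
        for node, adjacent in neighbors.items():
            if node in seed_labels:
                continue
            votes = Counter(labels[n] for n in adjacent if n in labels)
            if node in labels:
                votes[labels[node]] += 1  # a node also votes for its current label
            if votes:
                # Ties are broken alphabetically so the result is deterministic.
                updates[node] = max(sorted(votes), key=votes.get)
        if all(labels.get(n) == lab for n, lab in updates.items()):
            break  # nothing changed in this round
        labels.update(updates)
    return labels


# Hypothetical chain of locations: a - b - c - d, with sentiment known only at the ends.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = propagate_labels(graph, {"a": "positive", "d": "negative"})
```

In this toy chain each interior node ends up with the label of its nearest seed, which mirrors the slide's point about sentiment spreading between nearby locations.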
  17. 38

  18. 39

  19. 41

  20. 42

  21. 44

  22. • Topic variation by geographical area and country

    • Ex: electricity availability isn't discussed everywhere
    • Discussion forums and articles can raise the same concerns
    • Sentiments flow between nearby cities
    • Some countries show a more positive overall sentiment than others
  23. • Apply graph approach to other areas

    • Scalable code base
    • See how the fine-tuned models perform on any comments (not only Twitter)
  24. THE PRESENT AND THE ROAD AFAR

    • We have a repository of information about social sentiments (different countries, websites, languages).
    • Sentiment analysis results for sources other than English, Spanish and Chinese (Mandarin, Cantonese).
    • Current generated models for topic synthesis and prediction are well adjusted to dissect information obtained from new social media sources.
    • Linear classifiers like LinearSVC are better at connecting sentiments to context.
    • Social media is a rich resource to tap into and understand what people are talking about.
    • Social perceptions about energy bills, energy use, green energy and more topics vary depending on online anonymity and platform culture.
    • Reddit and Facebook are preferred choices for social sentiment analysis.
    • Sentiment analysis needs to expand beyond text (pictures, emojis, etc.).