AI's Hidden Cost: The Environmental Impact of Machine Learning

Luca Corbucci

November 10, 2024

Transcript

  1. Why this talk? Climate change is one of our generation's biggest challenges. We are trying to fight it, but at the same time we find new ways to hurt our planet.
  2. We all love using AI tools, but what is their environmental impact?
  3. The news on this topic is not encouraging.
  4. Why this talk? As a computer scientist, I asked myself what the environmental cost of using these tools is.
  5. I started to read more about this topic.
  6. Today I'll tell you what I have learnt.
  7. Luca Corbucci: PhD student in Computer Science @ University of Pisa (Responsible AI), Podcaster @ PointerPodcast, Community Manager @ SuperHero Valley and @ Pisa.dev. https://lucacorbucci.me/
  8. Models are getting bigger: GPT-3 (2020), 175 billion parameters; GPT-4 (2023), 1.76 trillion parameters (unofficial).
  9. GPT o1-preview (2024), 2.8 trillion parameters (unofficial).
  10. Most closed LLMs do not give any information about their size, so it is hard to know what the training cost was.
  11. There are some estimates for the training of GPT-3: 1,300 MWh of electricity and 552 tonnes of CO2.
  12. Training GPT-3 (175 billion parameters) consumed roughly what 130 US homes use in a year.
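A quick sanity check of that comparison, as a minimal sketch. The average-US-home figure (~10,500 kWh/year) is an assumption of mine, not a number from the deck:

    GPT3_TRAINING_KWH = 1_300_000   # 1,300 MWh training estimate from slide 11
    HOME_KWH_PER_YEAR = 10_500      # assumed average annual US household usage

    print(f"{GPT3_TRAINING_KWH / HOME_KWH_PER_YEAR:.0f} homes")  # -> 124, i.e. roughly 130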
  13. BLOOM, the BigScience Large Open-science Open-access Multilingual Language Model, was trained on 384 A100 GPUs.
  14. The training took 1.08 million GPU hours.
  15. Because BLOOM is open, we can estimate the energy used and the carbon footprint of training an LLM.
  16. How can we measure the energy consumption? Assume each GPU draws its TDP of 400 W (they considered 100% usage).
  17. Multiply by the GPU time: the run used 384 GPUs...
  18. ...for a total of 1,082,990 GPU hours.
  19. 0.4 kW x 1,082,990 GPU hours = 433,196 kWh. (Icons from https://thenounproject.com/)
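A minimal sketch of the arithmetic on these slides, using only the deck's numbers (the 400 W figure is the A100's TDP, assumed to be the constant draw):

    TDP_WATTS = 400               # assumed full-power draw per A100 GPU
    TOTAL_GPU_HOURS = 1_082_990   # total GPU hours across the whole training run

    # energy (kWh) = power per GPU (kW) x total GPU hours
    energy_kwh = (TDP_WATTS / 1000) * TOTAL_GPU_HOURS
    print(f"{energy_kwh:,.0f} kWh")  # -> 433,196 kWh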
  20. How can we measure the carbon footprint? Multiply the energy consumption (433,196 kWh) by the carbon intensity of the energy grid (57 gCO2eq/kWh).
  21. 433,196 kWh x 57 gCO2eq/kWh = 24.69 tonnes of CO2.
  22. 24.69 tonnes of CO2 from training is equivalent to 101,623 km driven by an average gasoline-powered passenger vehicle.
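The same calculation as a minimal sketch. The per-km emission factor (~243 gCO2/km for an average gasoline car) is my assumption, chosen because it is consistent with the slide's 101,623 km figure:

    ENERGY_KWH = 433_196        # training energy from slide 19
    GRID_G_CO2_PER_KWH = 57     # grid carbon intensity from slide 20
    CAR_G_CO2_PER_KM = 243      # assumed average gasoline-car emission factor

    co2_tonnes = ENERGY_KWH * GRID_G_CO2_PER_KWH / 1e6
    print(f"{co2_tonnes:.2f} t CO2eq")                              # -> 24.69 t
    print(f"{co2_tonnes * 1e6 / CAR_G_CO2_PER_KM:,.0f} km driven")  # -> ~101,600 km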
  23. Inference (for LLMs): the LLM is loaded on the GPUs; we give it a prompt and wait for an answer.
  24. The LLM predicts a series of tokens based on the prompt.
  25. This phase is called inference, and it needs one or more GPUs.
  26. “A single Google search takes 0.3 watt-hours of electricity, while a ChatGPT request takes 2.9 watt-hours.” AI already uses as much energy as a small country. It’s only the beginning. https://www.vox.com/climate/2024/3/28/24111721/climate-ai-tech-energy-demand-rising
  27. “If ChatGPT were integrated into the 9 billion searches done daily, the electricity demand would increase by 10 terawatt-hours a year — the amount consumed by about 1.5 million European Union residents.” Electricity 2024: Analysis and forecast to 2026 - https://bit.ly/3C13ZZ3
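A minimal sketch of where that order of magnitude comes from, using only the figures quoted on these two slides:

    SEARCHES_PER_DAY = 9e9        # daily searches, from the quote
    WH_PER_CHATGPT_REQUEST = 2.9  # from the Vox quote on the previous slide

    twh_per_year = SEARCHES_PER_DAY * WH_PER_CHATGPT_REQUEST * 365 / 1e12
    print(f"{twh_per_year:.1f} TWh/year")  # -> ~9.5 TWh, i.e. roughly 10 TWh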
  28. BLOOM was deployed on a GCP instance with 16 GPUs and served 230k requests in 18 days, for a total consumption of 914 kWh of electricity (the training consumed 433,196 kWh).
  29. Breakdown of that consumption: GPU 75%, RAM 23%, CPU 2%. The draw is ~0.28 kWh even when the model is not answering questions.
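A minimal sketch of what those measurements imply per request. I read the ~0.28 kWh idle figure as a per-hour draw, which is an assumption on my part:

    TOTAL_KWH = 914            # measured over the 18-day deployment
    REQUESTS = 230_000
    IDLE_KWH_PER_HOUR = 0.28   # assumed: hourly draw with the model loaded but idle

    print(f"{TOTAL_KWH / REQUESTS * 1000:.1f} Wh per request")    # -> ~4.0 Wh
    print(f"{IDLE_KWH_PER_HOUR * 18 * 24:.0f} kWh spent idling")  # -> ~121 kWh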
  30. Training represents the biggest single energy cost, but inference could exceed training emissions in just a few weeks.
  31. Inference of 88 Gen AI models compared. Power Hungry Processing: Watts Driving the Cost of AI Deployment? https://arxiv.org/abs/2311.16863
  32. 💡 Image generation is more costly than text generation.
  33. In car-travel equivalents: 🚗 6.5 km vs 🚗 0.0009 km.
  34. What is the main problem with inference for “famous” models?
  35. For models like ChatGPT, two weeks could be enough for the inference cost to exceed the training cost.
  36. How can I choose an LLM while considering energy efficiency? The ML.Energy Leaderboard: https://ml.energy/leaderboard/
  37. Data centers: it is not enough to optimize training and inference, we need to optimize the entire training stack.
  38. Data center power demand over time. AI is poised to drive a 160% increase in data center power demand. https://www.goldmansachs.com/insights/articles/AI-poised-to-drive-160-increase-in-power-demand
  39. For years, increased workloads were offset by increased data center efficiency.
  40. (chart: data center power demand over time)
  41. Now power demand is increasing while the efficiency gains are shrinking.
  42. (chart: data center power demand over time, continued)
  43. AI could represent 30% of data center power demand in 2030.
  44. The estimated increase in power consumption in Europe could equal the combined consumption of the Netherlands, Greece, and Portugal.
  45. Is it only a matter of energy? Microsoft recently opened new data centers in Goodyear, Arizona. (Image from https://deepgram.com/learn/how-ai-consumes-water; Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models, https://arxiv.org/pdf/2304.03271)
  46. Data centers not only consume energy: they also need a lot of water to keep temperatures low.
  47. Training GPT-3 in Microsoft’s data centers can evaporate 700,000 liters of clean freshwater.
  48. Asking between twenty and fifty questions on ChatGPT is equivalent to consuming half a litre of water.
  49. Researchers at UC Riverside estimated that global AI demand could cause data centers to use more than 4 trillion liters of fresh water by 2027.
  50. The solution they found: invest in two startups that produce small nuclear reactors, and pay to revive the shuttered Three Mile Island nuclear power plant in Pennsylvania. (They have been stopped by 🐝🐝🐝🐝🐝.)
  51. Mitigation strategies. Hardware solutions: specialized hardware, improving the efficiency of the accelerators.
  52. Example: Etched, an ASIC specialized for transformer-based model inference.
  53. Algorithmic techniques: fine-tuning, quantization, small language models, new training algorithms.
  54. Fine-tuning. Big generative models like Llama 3.2 70B try to do a lot of things at once; this reduces the training cost but increases the inference cost. Fine-tuning can create smaller models that are more specialized and consume less energy (see the sketch below).
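The deck does not prescribe a method, but as a minimal sketch, one common parameter-efficient approach is LoRA via the Hugging Face peft library. The model name and hyperparameters below are illustrative assumptions, not the deck's setup:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, TaskType, get_peft_model

    # Load a base model (illustrative choice).
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

    # Train small low-rank adapters instead of all of the base weights.
    config = LoraConfig(r=8, lora_alpha=16, task_type=TaskType.CAUSAL_LM)
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the weights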
  55. Quantization. Modern transformer-based LLMs like Llama 3.2 70B rely heavily on matrix multiplication. Quantization converts the weights from high-precision values to lower-precision ones: for example, from 32-bit floating-point numbers to 8-bit integers.
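A minimal sketch of the fp32-to-int8 idea on one weight matrix, assuming simple symmetric per-tensor quantization (production quantizers are per-channel and calibrated):

    import numpy as np

    w = np.random.randn(4096, 4096).astype(np.float32)  # fp32 weights: 64 MB
    scale = np.abs(w).max() / 127                       # map the largest |w| to 127
    w_q = np.round(w / scale).astype(np.int8)           # int8 weights: 16 MB, 4x smaller

    w_restored = w_q.astype(np.float32) * scale         # dequantize for use
    print("max error:", np.abs(w - w_restored).max())   # small, bounded by scale / 2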
  56. MatMul-free language models. Modern transformer-based LLMs rely heavily on matrix multiplication operations. Can we reduce them?
  57. Instead of using MatMul, the authors propose additive operations and ternary weights. The goal is competitive performance with reduced resource demands (see the sketch below).
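A minimal sketch of why ternary weights remove multiplications: with weights in {-1, 0, +1}, each output is just sums and differences of input elements. This only illustrates the core trick; the actual paper adds scaling and hardware-friendly packing:

    import numpy as np

    x = np.random.randn(8)                         # input activations
    w = np.random.choice([-1, 0, 1], size=(8, 4))  # ternary weight matrix

    # Multiplication-free evaluation: add inputs where w = +1, subtract where w = -1.
    y = np.array([x[w[:, j] == 1].sum() - x[w[:, j] == -1].sum()
                  for j in range(w.shape[1])])

    assert np.allclose(y, x @ w)  # matches the ordinary matrix product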
  58. Conclusion. As a researcher, I can report as many details as possible about training setups and their energy cost (see the Code Carbon sketch after these conclusions).
  59. As a user of Gen AI tools, I can rely on on-device models as much as possible.
  60. As a developer who wants to build Gen AI tools into an app, I can try to find a balance between energy efficiency and the “utility” of the tool.
  61. I’m optimistic on this topic: different companies are competing to create the best possible Gen AI, and it is useful for them to have energy-efficient models.
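One concrete way to do that reporting is Code Carbon, which is cited in the references below. A minimal sketch; train_model() is a hypothetical placeholder for your actual training loop:

    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker(project_name="my-training-run")
    tracker.start()
    train_model()                  # hypothetical: your training code goes here
    emissions_kg = tracker.stop()  # kg CO2eq; also written to emissions.csv
    print(f"{emissions_kg:.3f} kg CO2eq")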
  62. References
      - AI/Data Centers' Global Power Surge and the Sustainability Impact: https://www.goldmansachs.com/images/migrated/insights/pages/gs-research/gs-sustain-generational-growth-ai-data-centers-global-power-surge-and-the-sustainability-impact/sustain-data-center-redaction.pdf
      - Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models: https://arxiv.org/pdf/2304.03271
      - AI Is Taking Water From the Desert: https://www.theatlantic.com/technology/archive/2024/03/ai-water-climate-microsoft/677602/
      - Power Hungry Processing: Watts Driving the Cost of AI Deployment? https://arxiv.org/abs/2311.16863
      - MatMul-free LM: https://huggingface.co/collections/ridger/matmulfree-lm-665f4d2b4e4648756e0dd13c
      - A Guide to Quantization in LLMs: https://symbl.ai/developers/blog/a-guide-to-quantization-in-llms/
  63. References (continued)
      - 🌸 Introducing The World’s Largest Open Multilingual Language Model: BLOOM 🌸: https://bigscience.huggingface.co/blog/bloom
      - Sasha Luccioni: https://x.com/SashaMTL
      - Code Carbon: https://github.com/mlco2/codecarbon
      - Hungry for Energy, Amazon, Google and Microsoft Turn to Nuclear Power: https://www.nytimes.com/2024/10/16/business/energy-environment/amazon-google-microsoft-nuclear-energy.html
      - Addition is All You Need for Energy-efficient Language Models: https://arxiv.org/abs/2410.00907