In this post, I’d like to share some thoughts on serverless computing, especially in the context of AI/ML. I’ve kept the content light on technical jargon so it’s easier to follow, even if you’re not deeply familiar with these concepts. Feel free to comment and share your thoughts!
Running powerful computing systems isn’t just about getting fast results — it also comes with an energy cost.
Imagine you have a high-performance server designed for AI/ML tasks. Even when there’s no active training or inference job, the server still consumes power. Why? Because essential hardware like the GPU, CPU, and NIC (Network Interface Card) needs to stay on standby, ready to process data when a job arrives.
In short, ‘idle’ doesn’t mean ‘off.’ Even when a system is doing nothing, it’s still using energy — like keeping a car engine idling at a red light. The power consumption doesn’t hit zero.
You might have heard that Generative AI (Gen-AI) has an environmental footprint. That might sound surprising at first — how could training a language model or generating AI art contribute to environmental concerns?
Here’s the catch: Gen-AI models are computationally demanding — both during training and inference. This means they require powerful servers that consume substantial energy.
Where does this energy come from? Often from power plants that burn fossil fuels, contributing to carbon emissions. Even before Gen-AI, computing systems weren’t exactly “green,” but the increasing demand for Gen-AI models has made the challenge even bigger.
Fortunately, there are ways to reduce this energy impact without compromising much on performance. Techniques like:
✅ Model Pruning — Removing less important weights or layers from a model to reduce its size.
✅ Quantization — Using lower-precision data types (e.g., int8 instead of float32) to cut memory use and computation.
✅ Distillation — Training a smaller “student” model to mimic a larger “teacher” model.
These methods shrink model sizes, reduce computational overhead, and improve speed — all while maintaining solid prediction accuracy.
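To make quantization a little more concrete, here’s a minimal sketch of symmetric int8 quantization using plain NumPy. The helper names (`quantize_int8`, `dequantize`) are illustrative, not from any particular library; real toolkits handle this with far more care.

```python
import numpy as np

def quantize_int8(weights):
    """Illustrative helper: map float32 weights to int8 with a single
    symmetric scale factor (largest magnitude maps to 127)."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The int8 array takes a quarter of the memory of the float32 original, and the reconstruction error stays small — which is why quantized models can keep solid accuracy while running cheaper and faster.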
Many researchers and open-source communities are working hard to improve these techniques, helping AI systems become faster, leaner, and more eco-friendly.
Now, what about those idle servers still consuming power?
Imagine deploying a Gen-AI model inside a Virtual Machine (VM). Even if no ML job is running, the VM’s vCPU or vGPU stays powered on — still drawing energy.
One simple fix could be to shut down the VM when it’s idle and spin it back up when needed. This saves energy but comes with a trade-off: you lose high availability — meaning there’s a delay when spinning the VM back up. For some applications, that’s not acceptable.
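The “shut it down when idle” fix is essentially a watchdog decision. A minimal sketch, assuming a hypothetical 5-minute idle threshold and a timestamp for when the last job finished:

```python
import time

IDLE_LIMIT_SECONDS = 300  # hypothetical threshold: 5 minutes with no jobs

def should_shut_down(last_job_finished_at, now=None):
    """Return True once the VM has sat idle longer than the limit.
    In a real deployment this decision would trigger a cloud API call
    to stop the VM; here we only model the decision itself."""
    now = time.time() if now is None else now
    return (now - last_job_finished_at) > IDLE_LIMIT_SECONDS
```

The trade-off mentioned above lives entirely outside this function: once it returns True and the VM stops, the next job pays the boot delay.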
Enter Serverless Computing — A Smarter Solution
This is where serverless computing makes a big difference.
In a serverless setup, the compute behind your workload behaves like it’s “off” when idle but automatically turns “on” when a request arrives — you no longer manage an always-on VM at all. Think of it like motion-sensor lights: they stay off until someone enters the room, then turn on automatically. Serverless computing helps reduce energy consumption, cuts costs, and makes your infrastructure smarter. It’s a win for your budget and the environment.
In my next post, I’ll dive deeper into serverless computing, how it works, and why it’s gaining traction in AI/ML deployments.
Stay tuned!