Serverless Computing (Dive Deeper)


When reading my articles, you might notice that some ideas are simplified — and that’s intentional. I know that not everyone comes from a technical background, and I want these posts to be as approachable as possible. That’s also why I welcome conversation in the comments — to dive deeper, clarify, and learn together. My audience comes from all kinds of fields — not just IT. Whether you’re in biotech, e-commerce, business, or something entirely different, my goal is to share the exciting, sometimes complex world of computer systems in a way that feels clear and engaging.

Let’s pick up the thread from my previous post on serverless computing. There, I highlighted that the best way to balance energy efficiency and cost savings is to ensure that the VM or server hosting GenAI models turns off when idle and powers back on only when needed for training or inference. This is where serverless computing comes in. Imagine your virtual machine behaving like a light with a motion sensor — turning off when no one’s around and turning on when needed.

The concept behind serverless is simple: the virtual machine powers on only when required and shuts down when idle. The delay between initiating a request and the virtual machine being ready is known as the cold start time.


Now, you might wonder:
“Why isn’t every cloud solution built this way?”


The answer lies in serverless limitations. The first challenge is that serverless computing is stateless by design. An analogy can be drawn to a calculator — every time you turn it off, it forgets the previous calculation. Similarly, serverless functions forget previous data after each run. Developers must rely on external storage to manage state between tasks.
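
To make that concrete, here is a tiny sketch of a stateless function (the names and the in-memory "store" are purely illustrative, standing in for something like a database or object store). The function has to read and write its state externally, because it remembers nothing between runs:

```python
# Minimal sketch of a stateless, serverless-style function.
# "external_store" stands in for a real service such as a database or
# object store; the function itself keeps nothing between invocations.

external_store = {}  # pretend this lives outside the function, e.g. in Redis or DynamoDB

def handle_request(user_id: str, amount: float) -> float:
    """Add `amount` to the user's running total and return it."""
    # Load the state from outside, because any local variables
    # from the previous invocation are gone.
    total = external_store.get(user_id, 0.0)
    total += amount
    # Persist the new state back to the external store before returning.
    external_store[user_id] = total
    return total

print(handle_request("alice", 10))  # 10.0
print(handle_request("alice", 5))   # 15.0 -- only because the external store remembered it
```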

The second challenge is the cold start period itself. When a virtual machine starts, it must load the operating system, services, and dependencies — all of which take time. A typical Ubuntu VM boot usually takes anywhere from a few seconds to a couple of minutes, depending on the hardware, software, and configuration — far too slow for real-time AI/ML inference applications.

The solution lies in lightweight virtual machines. Lightweight virtualization technologies like AWS Firecracker can start a VM in roughly 100 milliseconds by using a minimal OS setup. A minimal OS setup means loading only the boot components needed to run the assigned task (e.g., model inference). AWS Lambda, one of the most popular serverless services, uses Firecracker as its backbone. Similarly, Google Cloud Run offers quick activation for on-demand workloads.
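
To give you a feel for what such a function looks like, here is a minimal sketch in the style of an AWS Lambda handler. The stand-in model and the toy prediction are my own illustration, not code from any real deployment:

```python
import json
import time

# Work done at module scope runs once, during the cold start,
# and is reused while the instance stays warm.
COLD_START_BEGAN = time.time()
MODEL = {"weights": [0.4, 0.6]}   # stand-in for loading a real model from storage

def lambda_handler(event, context):
    """Entry point in the style of an AWS Lambda handler.

    `event` carries the request payload; `context` carries runtime metadata.
    """
    features = event.get("features", [1.0, 1.0])
    # A toy "inference": dot product of the features with the stand-in weights.
    prediction = sum(w * x for w, x in zip(MODEL["weights"], features))
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }

# Local smoke test (on Lambda, the platform calls the handler for you).
if __name__ == "__main__":
    print(lambda_handler({"features": [2.0, 3.0]}, None))
```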


Yeah, we could stop here. We already have solutions that can spin up in under 100 milliseconds. So imagine this — when a heavy machine learning job comes in, like model inference (a.k.a. making a prediction), we just start the virtual machine at that moment, run the ML model, and return the result. Simple math: total response time = cold start time (say 100 milliseconds using lightweight VM technology) + the actual inference time.

But here’s the catch: for a GPU-backed generative AI model, the inference itself might take less than 100 milliseconds. And for a lightweight ML model — say a linear regression with just a few parameters — the inference time is going to be even faster, especially on a GPU. So in such cases, the 100 ms cold start delay becomes the bottleneck; depending on the use case, it can seriously affect user experience or performance expectations.
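
To put rough numbers on that, here is a tiny back-of-the-envelope calculation (the figures are illustrative assumptions, not benchmarks):

```python
# Illustrative latency math: how much of the total response time
# the cold start eats up for different inference speeds.

COLD_START_MS = 100  # assumed cold start with a lightweight micro-VM

for model, inference_ms in [("GenAI model on GPU", 80),
                            ("small linear regression", 2)]:
    total_ms = COLD_START_MS + inference_ms
    share = COLD_START_MS / total_ms * 100
    print(f"{model}: {total_ms} ms total, "
          f"{share:.0f}% of it is just the cold start")
```

For the fast models, the cold start is no longer a small tax on top of the work — it is most of the work.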

And one more thing — if you’re turning your virtual machines ON and OFF frequently just to be clever with resource usage, then honestly? I personally believe it might be better to just keep them always ON instead of building some complex ON/OFF scheduling system. Sometimes, simple and always-available beats smart and overly complicated.

Let’s take a moment to recall how we usually deploy an application in the cloud. Typically, the application runs inside a virtual machine, which itself runs on a powerful physical server located somewhere in the cloud.

That physical server uses tools like VMware, Vagrant, or libvirt to manage these virtual machines. These tools help create and delete virtual machines and allocate resources like CPU and memory to each one.

To create a virtual machine, the tool takes a base operating system — such as Ubuntu, CentOS, or Windows — and sets up a virtual environment with the necessary system resources.

Once the virtual machine is ready, you install all the dependencies and prerequisites your application needs, and then finally run your application inside that virtual machine.
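
As a rough illustration of that workflow, here is a sketch using libvirt’s Python bindings. The domain XML is heavily trimmed (a real definition would also describe disks, networking, and a bootable image), so treat it as the shape of the API rather than a working recipe:

```python
import libvirt  # Python bindings for libvirt (pip install libvirt-python)

# A heavily trimmed domain definition: name, memory, CPUs, and OS type only.
# A real VM would also need a disk image, a network interface, and so on.
DOMAIN_XML = """
<domain type='kvm'>
  <name>demo-vm</name>
  <memory unit='MiB'>2048</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
</domain>
"""

conn = libvirt.open("qemu:///system")   # connect to the local hypervisor
dom = conn.defineXML(DOMAIN_XML)        # register the VM with the hypervisor
dom.create()                            # power it on (the application and its
                                        # dependencies get installed inside later)
print("running" if dom.isActive() else "stopped")
conn.close()
```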

Running applications on virtual machines in the cloud gives us a lot of flexibility. But the way we manage those machines can be quite different. In some setups, all virtual machines stay ON all the time — ready to go, but using up resources. In others, the machines are turned on only when needed — which saves resources, but can cause delays when things need to start quickly.

So, a balanced approach works better in many cases. Instead of keeping everything running or starting from scratch each time, we can keep a few powerful virtual machines on standby — with lots of CPU, memory, and storage. Then, whenever there’s work to do, we send it to those machines. It’s a nice trade-off between speed and efficiency.
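
Here is a deliberately simplified sketch of that idea: a small pool of always-on workers, with incoming jobs handed out round-robin. A real scheduler would also track load, health, and autoscaling signals.

```python
from itertools import cycle

# An illustrative warm pool: a handful of always-on workers standing by.
WARM_POOL = ["worker-1", "worker-2", "worker-3"]
next_worker = cycle(WARM_POOL)  # simple round-robin dispatch

def dispatch(job: str) -> str:
    """Send a job to the next standby worker instead of booting a new VM."""
    worker = next(next_worker)
    # In a real system this would be an RPC or HTTP call to the worker.
    return f"{job} -> {worker}"

for job in ["inference-1", "inference-2", "inference-3", "inference-4"]:
    print(dispatch(job))
```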

But even this setup has its limits — especially when we move toward serverless computing, where we expect things to happen instantly. In a serverless world, you don’t want to wait for a virtual machine to boot up or for software to install. You just want your code to run the moment it’s needed.

This is where containers make a big difference.

Unlike virtual machines, containers are much smaller and faster. They don’t need to load an entire operating system — they just include the app and whatever it needs to run. That means they can start up in seconds, sometimes even milliseconds. This speed makes containers perfect for serverless platforms, where functions need to respond quickly and scale automatically without wasting resources.
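
If you want to see that speed for yourself, here is a small sketch using the Docker SDK for Python. It assumes a local Docker daemon, and the first run will be slower if the image still needs to be pulled:

```python
import time
import docker  # Docker SDK for Python (pip install docker); needs a local Docker daemon

client = docker.from_env()

start = time.time()
# Run a tiny container that just echoes and exits; the image includes
# only the app's userspace, not a whole guest operating system.
output = client.containers.run("alpine", ["echo", "hello from a container"], remove=True)
elapsed = time.time() - start

print(output.decode().strip())
print(f"container ran and exited in {elapsed:.2f} s")
```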

Stay tuned — in the next article, I’ll wrap up our discussion on serverless by exploring how containerization makes it all possible behind the scenes. From there, we’ll dive into the world of microservices and how they tie into modern cloud architectures.
