ECS vs Lambda for Web APIs: When Functions Stop Making Sense


Over the past few years, AWS Lambda has often been one of the first options people consider when building web APIs on AWS — especially for greenfield projects, internal tools, and teams adopting a serverless-first mindset.

Small payloads? Lambda.
Event-driven? Lambda.
Low or unpredictable traffic? Lambda.

And to be clear — Lambda is a great product. We use it extensively.

But after building and maintaining SaaS APIs with real users, dashboards, background jobs, and enterprise performance expectations, I no longer see function-based serverless as the default choice for response-time sensitive, user-facing web APIs.

In many cases, container-based serverless (with ECS Fargate) ends up being a better architectural fit.

This isn’t about serverless vs servers. It’s about functions vs containers, on-demand execution vs always-warm runtimes, and where each model breaks down for response-time sensitive APIs.


1. Why Lambda is often an early choice for APIs

Lambda’s appeal is easy to understand:

  • No servers to manage
  • Automatic scaling
  • Pay only for what you use
  • Tight integration with API Gateway

For simple APIs, internal tools, background jobs, or async workflows, this model works extremely well.

It’s common to see early architectures built around:

  • API Gateway
  • Lambda
  • DynamoDB or RDS

For many workloads, this setup is more than sufficient — and sometimes ideal.
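
As a rough sketch, a minimal AWS SAM template for that kind of setup might look something like this (the function name, handler, and table are illustrative, not from a real project):

    # Minimal SAM template: API Gateway + Lambda + DynamoDB (illustrative names)
    AWSTemplateFormatVersion: '2010-09-09'
    Transform: AWS::Serverless-2016-10-31

    Resources:
      GetOrdersFunction:
        Type: AWS::Serverless::Function
        Properties:
          Handler: GetOrders::GetOrders.Function::FunctionHandler   # hypothetical .NET handler
          Runtime: dotnet8
          MemorySize: 512
          Timeout: 30
          # CodeUri, policies, etc. omitted for brevity
          Events:
            GetOrders:
              Type: Api                 # implicit API Gateway endpoint
              Properties:
                Path: /orders
                Method: get

      OrdersTable:
        Type: AWS::Serverless::SimpleTable   # simple DynamoDB table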


2. The real decision factor isn’t execution time — it’s end-to-end response time

A rule of thumb that comes up frequently is:

“If the workload finishes in under 15 minutes, Lambda is a good fit.”

Technically, that’s true. But architecturally, it’s incomplete.

For many systems, execution time alone isn’t the real constraint — the total time it takes for a request to complete from the user’s perspective is.

If you’re running a batch job or background process, a cold start adding a second or two is usually irrelevant. But when you’re building a user-facing API for an enterprise application, that extra time directly impacts perceived performance.

Consider a realistic API flow:

  • A user hits an API endpoint
  • That request triggers another internal API call
  • That second call hits a cold start
  • A third downstream service may also be cold

Suddenly, the response time includes:

  • Seconds of startup delay due to cold starts
  • Plus the actual execution time
  • Plus network latency (depending on where the user and services are located)

Even if each individual delay is relatively small, chained calls amplify the problem quickly. The end result is a UI that intermittently feels slow, even though nothing is technically failing.
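
To put rough, purely illustrative numbers on it: if each of those three hops adds around half a second to a second of cold-start delay on top of, say, 100 ms of actual work and some network latency per hop, the user can easily wait two to three seconds for a request that would otherwise take a few hundred milliseconds. And only some requests pay that price, which is exactly what makes the slowness feel erratic.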


3. Lambda has improved — but the trade-offs still exist

To be fair, AWS has invested heavily in improving Lambda over the years, particularly around cold-start performance. Today, there are several ways to reduce cold-start impact:

  • Provisioned Concurrency
  • Lambda SnapStart (for supported runtimes)
  • Scheduled warm-ups using EventBridge

These options can be very effective, especially for response-time sensitive workloads.
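
As a sketch of what that looks like in a SAM template (the alias name, concurrency value, and schedule are assumptions for illustration, not recommendations):

    # Option A: Provisioned Concurrency (eliminates cold starts, billed even while idle)
    MyApiFunction:
      Type: AWS::Serverless::Function
      Properties:
        Handler: MyApi::MyApi.Function::FunctionHandler   # hypothetical handler
        Runtime: dotnet8
        MemorySize: 1024
        AutoPublishAlias: live                         # Provisioned Concurrency attaches to a version/alias
        ProvisionedConcurrencyConfig:
          ProvisionedConcurrentExecutions: 5           # pre-warmed environments you pay for around the clock
        Events:
          Api:
            Type: Api
            Properties:
              Path: /orders
              Method: get
          # Option B (cheaper, best-effort): skip Provisioned Concurrency and ping on a schedule instead
          # WarmUp:
          #   Type: Schedule
          #   Properties:
          #     Schedule: rate(5 minutes)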

On the .NET side, improvements have also been substantial. With .NET 8 Native AOT, cold-start times can be reduced dramatically, and some benchmarks report startup times as low as ~50ms under ideal conditions.

However, it’s important to understand what those numbers represent. They typically apply to minimal APIs or “hello world” scenarios. As soon as you introduce real-world concerns — logging, authentication, configuration, database clients, dependency injection — startup time increases, sometimes significantly.

Provisioned Concurrency can largely eliminate cold starts by keeping execution environments warm. But at that point you’re explicitly trading on-demand execution for predictability, which starts to undermine one of Lambda’s core advantages (paying only when code runs).


4. Per-function tuning: powerful, but adds friction

One of Lambda’s strengths is the ability to tune memory (and CPU) on a per-function basis.

Because Lambda pricing is tied to execution time and allocated resources, increasing memory (and therefore CPU) can shorten execution time, improve response times, and sometimes even lower the cost per invocation.

The challenge isn’t whether this works — it’s what it requires at scale.

Each function effectively becomes its own performance profile:

  • Too little memory → slower execution and longer response times
  • More memory → faster execution, until performance gains level off while cost continues to increase

Finding the right balance usually means benchmarking, tuning, and revisiting these settings over time. For a handful of functions, this is manageable. For dozens of API endpoints, it becomes ongoing work.
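
To make the idea of per-function performance profiles concrete, a template often ends up looking something like this, with each endpoint tuned separately (the numbers are made up; in practice they come from benchmarking, for example with the open-source AWS Lambda Power Tuning tool):

    # Each endpoint becomes its own tuning exercise (illustrative values; other properties omitted)
    GetOrderFunction:
      Type: AWS::Serverless::Function
      Properties:
        MemorySize: 512          # lightweight read path
        Timeout: 10

    SearchOrdersFunction:
      Type: AWS::Serverless::Function
      Properties:
        MemorySize: 1024         # needs more CPU to keep response times acceptable
        Timeout: 15

    GenerateReportFunction:
      Type: AWS::Serverless::Function
      Properties:
        MemorySize: 3008         # CPU scales with memory, so heavy endpoints get more of both
        Timeout: 120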

At that point, the primary cost concern isn’t the Lambda bill — it’s developer time. Every function might need attention, tuning, and validation. In most teams, the time spent configuring and maintaining optimal performance across many individual functions is far more expensive than the infrastructure itself.


5. Configuration and operational friction

As Lambda-based systems grow, configuration complexity often grows with them.

When using AWS SAM or CloudFormation, the template (for example a serverless.template file) tends to sit in an uncomfortable middle ground:

  • Partially auto-generated
  • Partially manually maintained
  • Easy to break unintentionally in PRs
  • Prone to merge conflicts in team environments

When new functions are added, some parts of the template are updated automatically, while others still require manual changes (environment variables, permissions, project associations, and so on). Over time, this creates a subtle risk: automatic updates can overwrite or conflict with manual adjustments if teams aren’t careful.

Globals and shared configurations help reduce duplication, but they don’t eliminate the need for manual wiring. None of this is catastrophic, but it adds ongoing cognitive overhead and requires extra discipline during reviews and deployments.
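
For reference, this is roughly what shared defaults look like in SAM. They cut down repetition, but each function still needs its own handler, events, and permissions wired up by hand (values are illustrative):

    Globals:
      Function:
        Runtime: dotnet8
        MemorySize: 512                      # default for every function in the template
        Timeout: 30
        Environment:
          Variables:
            ASPNETCORE_ENVIRONMENT: Production

    Resources:
      GenerateReportFunction:
        Type: AWS::Serverless::Function
        Properties:
          MemorySize: 2048                   # per-function override still lives here
          # Handler, CodeUri, events, IAM policies: still manual, per function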


6. Scaling, concurrency, and database pressure

Lambda can handle traffic spikes very well — provided you plan for them.

As you approach higher concurrency levels, you typically need to:

  • Request concurrency limit increases
  • Protect your database from connection storms (e.g. via RDS Proxy)
  • Introduce a proxy/pooling layer for external resources

By default, AWS imposes a concurrent execution limit of 1,000 per account per Region, shared across all Lambda functions. If this limit is reached, new invocations will be throttled until capacity becomes available. While this limit can be increased by AWS on request, teams need to be aware of it as traffic grows.

Without precautions like connection pooling or throttling, Lambda can scale itself correctly while downstream systems struggle. For example, a burst of Lambda invocations can overwhelm a database if each function opens a new connection. It’s on the architecture team to mitigate that.
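
One common mitigation, sketched below, is to cap how far a database-facing function can scale with reserved concurrency and route its connections through RDS Proxy. The cap value here is an assumption you would size against your database's connection limits:

    OrdersWriteFunction:
      Type: AWS::Serverless::Function
      Properties:
        ReservedConcurrentExecutions: 50   # hard ceiling: at most 50 concurrent DB-facing invocations
        Environment:
          Variables:
            DB_HOST: my-api.proxy-abc123.eu-west-1.rds.amazonaws.com   # hypothetical RDS Proxy endpoint
        # Handler, Runtime, VPC config, etc. omitted for brevity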

It’s also worth noting that cold starts are not a one-time event per function. Lambda scales by creating new execution environments as concurrency increases, and each environment handles only one request at a time. If every existing environment is busy, an additional concurrent request triggers a new environment with its own cold start, even though warm instances of the function already exist (they are simply occupied).

During a sudden traffic spike, the first wave of concurrent requests could all hit cold starts in parallel, which contributes to inconsistent response times even if the function had been invoked recently under lower load.

ECS behaves differently. Scaling containers is generally slower but more controlled. A service might take longer to spin up additional tasks, but once running, each container can handle many requests concurrently.

Because containers are long-running processes, applications can typically reuse connection pools and other in-memory resources across requests, which tends to behave more predictably under sustained load once tasks are running.


7. ECS for APIs: predictable and boring (in a good way)

ECS, especially when paired with Fargate, is also serverless in the sense that you don’t manage servers or EC2 instances directly. The distinction here isn’t serverless vs non-serverless, but containers vs functions.

With ECS/Fargate for an API service, you get:

  • No cold starts on requests (the container is already running)
  • Stable CPU and memory resources for each task
  • Natural reuse of in-memory resources within the container (like connection pools)
  • A flat, predictable cost model (each running task costs a fixed amount per hour for its vCPU and memory)

There is more setup upfront — building Docker images, defining your task and service, configuring scaling policies — but once deployed, the system behaves consistently. You’re optimizing the service as a whole (e.g. adjusting task size or count), not wrangling dozens of independent function configurations.
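
As a rough sketch of that upfront setup (names, sizes, and counts are illustrative, and the cluster, IAM roles, networking, and load balancer wiring are omitted), the CloudFormation for a Fargate API service looks something like this:

    ApiTaskDefinition:
      Type: AWS::ECS::TaskDefinition
      Properties:
        RequiresCompatibilities: [FARGATE]
        Cpu: '512'                      # 0.5 vCPU shared by all endpoints in the service
        Memory: '1024'                  # 1 GB
        NetworkMode: awsvpc
        ContainerDefinitions:
          - Name: api
            Image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-api:latest   # hypothetical image
            PortMappings:
              - ContainerPort: 8080

    ApiService:
      Type: AWS::ECS::Service
      Properties:
        LaunchType: FARGATE
        DesiredCount: 2                 # always-warm tasks; scaling policies adjust this number
        TaskDefinition: !Ref ApiTaskDefinition
        Cluster: !Ref ApiCluster        # cluster, subnets, and ALB wiring omitted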

In practice, ECS often feels more predictable and, frankly, a bit boring — which is a good thing for a core API. There are fewer surprises in how it performs under load.


8. The hidden cost: developer effort

Although Lambda is conceptually simple, using it for APIs can involve bending it to fit scenarios it wasn’t originally designed for:

  • Crafting warm-up strategies
  • Tuning provisioned concurrency levels
  • Setting per-function reserved concurrency limits
  • Optimizing memory settings per function

Each of these problems is solvable, but together they represent ongoing effort. In many cases, the engineering hours spent on these optimizations outweigh the infrastructure savings that Lambda initially promised.

It’s not that Lambda can’t be made to work for high-performance APIs — it absolutely can. The question is how much complexity you’re willing to manage for the benefits it provides.


9. Local development and debugging

While Lambda tooling has improved significantly, container-based APIs are still generally easier to debug and iterate on locally:

  • You’re running the same environment locally as in production (Docker container matching your service)
  • Standard debugging tools (breakpoints, local logs) work out of the box
  • Fewer mocks and emulators are needed for basic tests
  • You get faster feedback loops (run the container/service locally and hit the endpoint, without packaging/deploying)
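
For illustration, a minimal docker-compose setup along those lines, assuming a containerized .NET API with a local Postgres standing in for the real database (image names, ports, and credentials are made up):

    # docker-compose.yml: run locally the same container image that ECS runs in production
    services:
      api:
        build: .                        # same Dockerfile used to build the ECS image
        ports:
          - "8080:8080"
        environment:
          ConnectionStrings__Default: "Host=db;Database=app;Username=app;Password=app"
        depends_on:
          - db
      db:
        image: postgres:16
        environment:
          POSTGRES_USER: app
          POSTGRES_PASSWORD: app
          POSTGRES_DB: app

Run docker compose up, hit http://localhost:8080, and attach a debugger to the running process, with no packaging or deployment step in the loop.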

As teams and codebases grow, these differences become increasingly important.


10. When Lambda is the right tool

All that said, Lambda remains an excellent choice for many scenarios. I still reach for it regularly, but in a more targeted way:

  • Background jobs
  • Event-driven workflows
  • File processing
  • Webhooks
  • Async and scheduled tasks

Used intentionally for those use cases, it excels. You get fantastic scalability and zero idle cost for workloads that don’t need to respond in milliseconds.

It’s just not always the best fit for response-time sensitive, user-facing web APIs.


11. Key takeaways

  • Lambda and ECS are both “serverless” — the real choice is between functions and containers as the unit of deployment
  • Cold starts matter for user-facing APIs — unpredictable multi-second delays are a big deal
  • Lambda offers flexibility, but consistent low response times require careful tuning (or paying for warm capacity)
  • ECS tends to offer more consistent performance — always-warm containers mean requests are served immediately
  • Developer time often costs more than infrastructure
  • Hybrid architectures are common — ECS for core APIs combined with Lambda for async tasks and spiky workloads often hits the sweet spot

As with most things in software architecture, there’s no single right answer. Different teams, workloads, and constraints will lead to different conclusions.

From my experience, container-based approaches (ECS/Fargate) have often been a better fit for response-time sensitive, user-facing APIs, while Lambda shines in event-driven and asynchronous workloads.

If you’re building an API today, don’t choose Lambda by default just because it’s serverless. Ask instead:

Is this workload better served by short-lived, on-demand functions — or by long-running, always-warm containers?

For user-facing APIs, that distinction tends to matter far more than the label.