9/16/2025

From naive agent deployment to scalable architectures: Building Asynchronous, Observable Workflows on GCP

Over the past year, we’ve built and now operate an agentic platform designed to manage prospect interactions from the very first touchpoint to lead conversion. This means supporting multi-turn, multi-channel conversations that range from simple qualification to complex, context-aware exchanges.

When we first shipped our conversational agent, we went with the most straightforward setup possible: a FastAPI app exposed behind an API Gateway, hosted in a Cloud Run container. The agent logic (built with LangGraph) lived in that same container, and we used LangSmith for monitoring and debugging.

This made sense at the time: we wanted fast iterations, low overhead, and something that “just worked.” And for a few months, it did. Our main focus then was a stable, well-performing agentic system.

But as we transitioned from prototyping to real-world usage, the simplicity that had served us well began to show its limits.

Why we had to change

The first sign came from response time variability. Some queries completed in under a second. Others required long database queries or calls to external services, and could easily take 30 seconds or more. We had no way to predict which in advance.

Keeping services waiting for that long, while holding a synchronous connection open, was a poor experience and a fragile design choice. Any failure along the way would bubble up directly to the client, with no retry or recovery strategy.

Then came the traceability issue. LangSmith was helpful for tracking the agent’s internal logic, but we were blind to everything outside of it. We didn’t know when or how requests entered the system, how long processing took, or whether delays came from our infrastructure or the agent itself. Error analysis was also difficult, since we did not capture the full request metadata, including status codes and error descriptions.

At the same time, a new challenge emerged: controlling the timing of processing. Since we work with text and instant messaging, we couldn’t just forward each input in real time. Some messages were too fragmented on their own; others needed to be grouped. We had to know when to start processing, define windows, and decide exactly what context to send to the LangGraph server. Without this kind of control, the agent’s behavior was inconsistent and hard to optimize.

And finally, there was the question of long-term visibility. We wanted to spot slow regressions, errors that only happen once a week, and performance changes across agent releases. LangSmith logs weren’t built for that kind of analysis.

It became clear: we needed a better architecture, not just for performance, but for resilience, observability, and full control over how and when inputs are processed.

This article explains how we restructured our system to support asynchronous processing, streaming data pipelines, and deep analytics, all while keeping the agent logic isolated and reusable.

Moving Past the Naive Setup

The initial setup was simple:

  • The client sent a request to our API Gateway.
  • The Gateway forwarded it to a FastAPI app hosted on Cloud Run.
  • The app ran the LangGraph agent logic and returned a response.
  • LangSmith tracked the agent’s execution for debugging.

It was easy to deploy, easy to test, and everything was contained in one place.
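
For context, here is a minimal sketch of what that single-container setup looked like. The module name `my_agent`, the `build_agent_graph` helper, and the request shape are illustrative placeholders, not our production code.

```python
# Minimal sketch of the naive setup: FastAPI + LangGraph agent in one container.
# `my_agent.build_agent_graph` and the request fields are illustrative placeholders.
from fastapi import FastAPI
from pydantic import BaseModel

from my_agent import build_agent_graph  # hypothetical module holding the compiled LangGraph graph

app = FastAPI()
graph = build_agent_graph()  # built once at startup, shared by every request


class ChatRequest(BaseModel):
    conversation_id: str
    message: str


@app.post("/chat")
async def chat(req: ChatRequest):
    # The whole request/response cycle blocks until the agent finishes,
    # which is exactly what became a problem for runs of 30 seconds or more.
    result = await graph.ainvoke(
        {"messages": [("user", req.message)]},
        config={"configurable": {"thread_id": req.conversation_id}},
    )
    return {"reply": result["messages"][-1].content}
```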

But the issues began to multiply:

  • Long or resource-intensive requests blocked the container.
  • Errors in agent logic took down the whole stack.
  • No buffering, no retries, no decoupling.
  • Difficult to trace anything outside LangSmith’s scope.

We didn’t just need to move to async; we needed to split responsibilities, add streaming mechanisms, and give ourselves the ability to observe and evolve each part independently.

Designing the scalable architecture

We rebuilt the system around GCP-native tools: Cloud Functions, Pub/Sub, and Dataflow. Instead of treating requests as isolated transactions, we started treating them as messages flowing through a distributed pipeline.

End-to-end architecture for async message processing and agent execution.

Here’s how it works now:

  1. A client sends a message through our API Gateway, which forwards it to a lightweight Cloud Function (the orchestrator).
  2. The orchestrator attaches metadata (a UUID, timestamps, routing tags) and publishes the message to a Pub/Sub topic; a sketch of this function follows the list.
  3. A streaming Dataflow pipeline subscribes to this topic, windows and batches the events, and invokes the LangGraph server through the SDK’s async API.
  4. The response is published to a second Pub/Sub topic.
  5. A second Cloud Function consumes these responses and calls the client’s callback endpoint with the enriched result.

In many cases, we receive fragmented user input: short messages, incomplete prompts, or sequences that only make sense in aggregate. Calling the agent on every message is wasteful and leads to inconsistent results.

With windowing, we buffer inputs and only process them once a window closes, for example, after collecting a batch of related messages.
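
A simplified sketch of that windowing stage, assuming Apache Beam session windows keyed by conversation; the subscription name, gap size, and `call_agent` stub are illustrative rather than our exact pipeline.

```python
# Simplified sketch of the streaming Dataflow pipeline: messages are keyed by
# conversation and buffered in session windows so fragmented inputs are grouped
# before the agent is invoked. Names and the gap size are placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window


def call_agent(conversation_id, messages):
    # Placeholder for the async LangGraph invocation performed in the real pipeline.
    ...


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadInbound" >> beam.io.ReadFromPubSub(
                subscription="projects/my-gcp-project/subscriptions/inbound-messages-sub")
            | "Parse" >> beam.Map(json.loads)
            | "KeyByConversation" >> beam.Map(
                lambda m: (m["body"]["conversation_id"], m))
            | "SessionWindow" >> beam.WindowInto(window.Sessions(gap_size=30))  # close after 30s of silence
            | "GroupFragments" >> beam.GroupByKey()
            | "InvokeAgent" >> beam.MapTuple(call_agent)
        )


if __name__ == "__main__":
    run()
```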

Sync isn’t dead. It’s just optional

Not every request goes through the async pipeline. While most production traffic benefits from decoupling and streaming, we kept a synchronous path available.

Some consumer services interacting with our system aren’t designed for async communication. In particular, our automated testing and evaluation platform expects immediate responses to maintain strict execution order and build reliable dashboards. Introducing async behavior there would have made test results inconsistent and harder to interpret.

To support those use cases, the orchestrator can bypass Pub/Sub entirely and call the LangGraph SDK directly. This path is used sparingly, but it gives us the flexibility to integrate with legacy systems or controlled environments where determinism and low latency matter more than throughput.
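
A rough sketch of that bypass, assuming the Python `langgraph_sdk` client; the server URL, the assistant name, and the thread handling are simplified placeholders (the real orchestrator decides per request which path to take).

```python
# Rough sketch of the synchronous bypass: skip Pub/Sub and wait on the LangGraph
# server directly. Assumes the Python langgraph_sdk client; URL and assistant
# name are placeholders, and the thread is assumed to exist already.
from langgraph_sdk import get_client

client = get_client(url="https://langgraph-server-xxxx.run.app")


async def handle_sync(conversation_id: str, message: str) -> dict:
    # Block until the run completes; acceptable here because this path is only
    # used by controlled consumers such as the evaluation platform.
    return await client.runs.wait(
        conversation_id,   # thread id, so the checkpointer keeps conversation history
        "agent",           # assistant / graph name deployed on the LangGraph server
        input={"messages": [{"role": "user", "content": message}]},
    )
```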

Observability across the full system

LangSmith gives great insight into what happens inside the agent (tool calls, reasoning steps, error stacks), and it’s still an essential part of our support and debugging workflow.

But we needed observability across the entire system: from the moment a message enters, through preprocessing and orchestration, to the moment a response is sent back, regardless of whether the agent executes at all.

Thanks to our Pub/Sub-based architecture, we now have full traceability of every incoming and outgoing message, including metadata like message IDs, timestamps, sources, and response codes. All of this flows through our pipeline and can be processed directly in BigQuery for analytics.
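
As an illustration of the kind of analysis this enables, a query like the following can compute daily latency percentiles and error counts; the dataset, table, and column names are placeholders for our actual schema.

```python
# Illustrative example: daily p95 latency and error counts from the message
# trace table in BigQuery. Dataset, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
SELECT
  DATE(received_at) AS day,
  APPROX_QUANTILES(
    TIMESTAMP_DIFF(responded_at, received_at, MILLISECOND), 100)[OFFSET(95)] AS p95_latency_ms,
  COUNTIF(response_code >= 400) AS errors
FROM `my-gcp-project.agent_traces.messages`
GROUP BY day
ORDER BY day
"""

for row in client.query(QUERY).result():
    print(row.day, row.p95_latency_ms, row.errors)
```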

To complement this, we also built a daily export pipeline using GCP-native components:

Daily trace and performance export pipeline, used to build Looker dashboards.

Every 24 hours:

  • A Cloud Scheduler job triggers a Cloud Function that calls the LangGraph bulk export API (sketched after this list).
  • The results are stored as partitioned Parquet files in GCS.
  • A batch Dataflow job processes the files, flattens the schema, adds SLA or error flags, and writes to structured BigQuery tables.
  • Looker dashboards built on top of these tables display trends, latency percentiles, error distributions, and usage insights.
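
A minimal sketch of the export step, with the bulk export call hidden behind a hypothetical `fetch_daily_traces` wrapper (the exact endpoint and pagination are deployment-specific) and placeholder bucket paths.

```python
# Sketch of the daily export function triggered by Cloud Scheduler. The export
# call is abstracted behind fetch_daily_traces (hypothetical wrapper around the
# bulk export API); the bucket and partition layout are placeholders.
from datetime import date

import functions_framework
import pandas as pd

from exports import fetch_daily_traces  # hypothetical wrapper around the bulk export API


@functions_framework.http
def export_traces(request):
    day = date.today().isoformat()
    traces = fetch_daily_traces(day)  # list of dicts: run id, latency, errors, ...

    df = pd.DataFrame(traces)
    # Partitioned Parquet on GCS (requires pyarrow + gcsfs), later flattened
    # into BigQuery by the batch Dataflow job.
    df.to_parquet(f"gs://agent-traces/date={day}/traces.parquet", index=False)
    return {"exported": len(df), "date": day}
```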

We now have longitudinal visibility across agent and system behavior. We can:

  • Track how performance evolves across versions.
  • Detect rare issues that only appear over time.
  • Reconstruct full message journeys end to end.
  • As a bonus, build business dashboards on the ROI of the system.

All of this is embedded directly into our admin tools and dashboards, making performance monitoring and debugging part of our daily workflow.

How responsibilities are split

Each component now has a clear role:

  • API Gateway + Orchestrator: Handle secure ingress, validate and shape requests, enrich them with metadata, and dispatch them to the appropriate processing path (sync or async).
  • Pub/Sub: Decouple incoming request ingestion from downstream processing, providing buffering and resilience during peak loads or failures.
  • Dataflow: Act as the streaming orchestrator. It windows and batches incoming messages, applies pre-processing logic, and invokes the LangGraph Server via its async API.
  • LangGraph Server (Cloud Run): Hosts the agentic logic using a ReAct architecture. This service receives structured prompts and executes reasoning chains, tool invocations, and response generation in a serverless, scalable environment.
  • Cloud SQL (PostgreSQL): Stores LangGraph’s checkpoints and conversation history.
  • Redis: Serves as the task queue for LangGraph.
  • Cloud Function (Response Dispatcher): Listens to the outbound Pub/Sub topic and forwards enriched agent responses to client callback endpoints (sketched after this list).
  • Cloud Scheduler + GCS: Handle daily trace extraction and schema normalization, and write analytics data to BigQuery for downstream consumption.
  • BigQuery + Looker: Power the observability layer with dashboards on usage, latency, errors, and trends.
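
For completeness, a minimal sketch of the response dispatcher, assuming a Pub/Sub-triggered CloudEvent function; the payload fields and the callback contract are placeholders.

```python
# Sketch of the response dispatcher: a Cloud Function subscribed to the outbound
# Pub/Sub topic that forwards the enriched agent response to the client's
# callback endpoint. Field names and the callback contract are placeholders.
import base64
import json

import functions_framework
import requests


@functions_framework.cloud_event
def dispatch_response(cloud_event):
    # Pub/Sub delivers the payload base64-encoded inside the CloudEvent envelope.
    raw = base64.b64decode(cloud_event.data["message"]["data"])
    response = json.loads(raw)

    requests.post(
        response["callback_url"],  # provided by the client at ingestion time
        json={
            "message_id": response["message_id"],
            "reply": response["reply"],
            "latency_ms": response.get("latency_ms"),
        },
        timeout=10,
    )
```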

This separation of concerns makes debugging easier, upgrades safer, and scaling more predictable.

Conclusion

The naive deployment, while useful for fast prototyping, couldn’t meet the real-world demands of latency variability, observability, and failure isolation.

By moving to an asynchronous pipeline backed by Pub/Sub and Dataflow, and surrounding it with export pipelines and dashboards, we turned an opaque system into a transparent, traceable flow. We can now reason about our agent workflows, monitor them over time, and improve them continuously.

It also made future development easier. New services can subscribe to the same streams, we can introduce new preprocessing logic without breaking the core agent, and we can confidently scale usage across clients.

This architecture didn’t just solve today’s problems; it gave us a foundation we trust for what comes next.

