June 17, 2026

The 5 Architectural Decisions That Determine Whether Your AI Pilot Survives Production

Five architectural decisions made in the first week of an AI pilot determine whether it ever reaches production. Most pilots make all five by accident, default to whatever was easiest in the moment, and discover six months later that the production rebuild costs more than the original build.

The five decisions are: where the data comes from, how the model is served, how the AI integrates with the rest of the stack, what gets monitored from day one, and how governance and rollback work. None of them are flashy. All of them are reversible only at significant cost.

Below is what each decision actually involves, the default that kills production pilots, and the question to ask in week one so you do not end up there.

Decision 1: Where the Data Comes From

The pilot needs data. The question is which source.

Most pilots wire up to whatever was easiest in week one, which is usually the analytics warehouse or a CSV export. The analytics warehouse has clean, governed data. It is also often two to three days stale, which is fine for a demo and useless for any use case that needs to respond to current state.

The production version of the same use case almost always needs operational data with sub-second latency. The team that built the pilot on warehouse data has to rebuild the integration against the operational system, which is a different team, a different access pattern, a different security model, and a different SLA. That rebuild is the work that kills the timeline.

The week one question is not "where can we get data fast." It is "what is the latency this use case actually requires in production, and does the source we are wiring up support that latency under production load?"

If the answer is no, the architecture is wrong. Fix it before the pilot ships, not after.
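One way to make the week one question concrete is to write the freshness requirement down next to each candidate source and compare them. The sketch below is a minimal illustration, not a real integration: the `DataSource` class, the source names, and the staleness figures are all assumptions chosen to mirror the warehouse versus operational example above. It checks freshness only; behavior under production load still has to be measured against the real system.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataSource:
    """A candidate data source and its worst-case staleness (hypothetical model)."""
    name: str
    max_staleness: timedelta  # worst-case age of a record when the AI reads it


def meets_freshness(source: DataSource, required: timedelta) -> bool:
    """Return True if the source's worst-case staleness is within the requirement."""
    return source.max_staleness <= required


# Illustrative numbers only: a warehouse refreshed every few days versus
# an operational system that is read directly.
warehouse = DataSource("analytics_warehouse", timedelta(days=3))
operational = DataSource("orders_service", timedelta(milliseconds=500))

required = timedelta(seconds=1)  # assumed production requirement

for source in (warehouse, operational):
    verdict = "ok" if meets_freshness(source, required) else "too stale"
    print(f"{source.name}: {verdict}")
```

If the source the pilot is wired to comes back "too stale" against the production requirement, that is the signal to change the source before the pilot ships.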

Decision 2: How the Model Is Served

You can use a managed AI platform (Vertex AI, Bedrock, Azure ML, or similar) or build your own container infrastructure for model serving. This is the most reversible-looking decision in the list. It is actually the least reversible, because the choice locks in your scaling profile, your monitoring tooling, your cost model, and your team's operational footprint.

Managed platforms give you scaling, security defaults, baseline monitoring, and a transparent cost model. They also charge a premium per request and limit your ability to customize the serving layer.

Custom containers give you flexibility, lower per-request cost at scale, and full control over the serving stack. They also require you to build everything the managed platform gives you for free, and to staff someone to operate it past launch.

For mid-market organizations, the default should be a managed platform unless there is a specific technical or cost reason to do otherwise. The reasoning is straightforward. A managed platform lets a small team ship a production AI system. A custom container needs an MLOps function that most mid-market organizations do not have and cannot afford to build for a single pilot.

The week one question is "do we have a named operator who will run this custom serving stack in production." If the answer is no, use the managed platform.
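If the case for custom serving is cost, it helps to check the arithmetic before committing. The sketch below is a rough break-even calculation under stated assumptions: the function name and every number are hypothetical, and the fixed monthly cost is meant to include the operator's time as well as infrastructure.

```python
def break_even_requests(managed_per_request: float,
                        custom_per_request: float,
                        custom_fixed_monthly: float) -> float:
    """Monthly request volume above which custom serving becomes cheaper.

    Managed cost is modeled as volume * managed_per_request.
    Custom cost is modeled as custom_fixed_monthly + volume * custom_per_request.
    """
    saving_per_request = managed_per_request - custom_per_request
    if saving_per_request <= 0:
        # Custom is not cheaper per request, so it never pays back its fixed cost.
        return float("inf")
    return custom_fixed_monthly / saving_per_request


# Hypothetical figures for illustration only.
volume = break_even_requests(
    managed_per_request=0.004,     # USD per request on a managed platform
    custom_per_request=0.001,      # USD per request on your own containers
    custom_fixed_monthly=15_000,   # USD per month for operator time and infrastructure
)
print(f"Break-even at roughly {volume:,.0f} requests per month")
```

With these made-up inputs the break-even is about five million requests per month. A pilot expecting far less traffic than its own break-even has no cost argument for custom serving.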

Decision 3: How the AI Integrates with the Rest of the Stack

Once the model is serving predictions, how does the rest of your stack use them? Three patterns dominate.

API first means the AI exposes a clean API and every consumer builds a client. Strong separation, easy to evolve, but every new consumer is a small project.

Embedded means the AI is bolted directly into one application. Fast to demo, painful to evolve, and impossible to share across teams without rebuilding.

Event driven means the AI consumes events from a queue and publishes results back. Scalable, async, and excellent for high volume use cases, but requires existing event infrastructure.

Most pilots default to embedded because it is the fastest path to a demo. That is the right call for the pilot and the wrong call for anything that needs to scale beyond one team. The embedded pattern looks like it works until the second team asks for the same capability and the first team realizes they have to rebuild from scratch to share it.

The week one question is "will more than one team eventually consume this AI." If yes, do not embed. Use API first or event driven from the start.
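To illustrate what API first buys you, here is a minimal sketch in which two consumers depend only on a shared contract and never on the model's internals. Everything in it is hypothetical: the `ScoringService` contract, the stub implementation, and both consumers are stand-ins, and a real deployment would put the contract behind an HTTP or RPC endpoint.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ScoreRequest:
    customer_id: str
    features: dict[str, float]


@dataclass(frozen=True)
class ScoreResponse:
    customer_id: str
    score: float
    model_version: str


class ScoringService(Protocol):
    """The contract every consumer codes against (hypothetical)."""

    def score(self, request: ScoreRequest) -> ScoreResponse: ...


class StubScoringService:
    """Stand-in for the real model endpoint; returns the mean feature value."""

    def score(self, request: ScoreRequest) -> ScoreResponse:
        values = list(request.features.values())
        mean = sum(values) / len(values) if values else 0.0
        return ScoreResponse(request.customer_id, mean, model_version="stub-0")


def dashboard_label(service: ScoringService, request: ScoreRequest) -> str:
    """Consumer 1 (hypothetical): a support dashboard that shows a label."""
    return "high risk" if service.score(request).score >= 0.5 else "low risk"


def needs_review(service: ScoringService, request: ScoreRequest) -> bool:
    """Consumer 2 (hypothetical): a billing workflow with its own threshold."""
    return service.score(request).score >= 0.8


service = StubScoringService()
request = ScoreRequest("c-1", {"late_payments": 0.6, "open_tickets": 0.8})
print(dashboard_label(service, request), needs_review(service, request))
```

The second team adds its own consumer without touching the first team's code, which is exactly what the embedded pattern cannot offer.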

Decision 4: What Gets Instrumented from Day One

This is the decision most teams skip because the pilot does not need it yet. Production absolutely does.

The instrumentation that matters from day one:

Input distribution monitoring tells you when the data feeding the model has shifted from what you trained on. Without it, the model degrades silently.

Output distribution monitoring tells you when the model itself has drifted. Without it, you find out from a customer complaint.

User feedback capture means the operators can flag a wrong output and the team can review it. Without it, the model is a black box even to its owners.

Latency and error rate monitoring is operational health. Without it, the first outage is also the first time anyone notices something is wrong.

Cost per query is financial control. AI workloads can drift from thousands of dollars per month to tens of thousands without a corresponding increase in usage. Without cost monitoring, you find out from finance.

Adding all five to the pilot in week one takes about a week of engineering effort. Adding them after launch takes a month of retrofitting, with a much harder political case to justify the spend.
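Input and output distribution monitoring can start very small. The sketch below uses the Population Stability Index (PSI), one common drift statistic, as an example; the article does not prescribe a method, so treat this choice, the bin count, and the 0.2 alert threshold (a widely used rule of thumb) as assumptions. The same function can be pointed at a model input feature or at the model's output scores.

```python
import math
from bisect import bisect_right


def psi(baseline: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index of `current` against `baseline`.

    Bins are equal-width over the baseline's range; values outside that
    range fall into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    if lo == hi:
        raise ValueError("baseline has no spread; cannot build bins")
    width = (hi - lo) / bins
    edges = [lo + width * i for i in range(1, bins)]  # interior bin edges

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor each share at a small epsilon so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Deterministic toy data: the "current" sample is the baseline shifted upward.
training_values = [i / 100 for i in range(1000)]
live_values = [v + 5 for v in training_values]

drift = psi(training_values, live_values)
ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb threshold, tune for your use case
print("drift alert" if drift > ALERT_THRESHOLD else "stable")
```

Running a check like this on a schedule against a stored training sample is usually enough to turn silent degradation into an alert.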

The week one question is "if this AI degrades in production, how do we know within an hour and not from a customer call." If you cannot answer that, instrumentation is not in place.
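A rolling one-hour window over latency, error rate, and cost per query is one simple way to answer that question. The sketch below is an assumption-laden illustration: the `HealthWindow` class, the record fields, and the thresholds are all hypothetical, and in practice the breaches would feed whatever alerting tool you already run.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    timestamp: float   # seconds since epoch
    latency_ms: float
    ok: bool           # False if the request errored
    cost_usd: float


class HealthWindow:
    """Rolling window of recent queries (default: the last hour)."""

    def __init__(self, window_seconds: float = 3600.0) -> None:
        self.window_seconds = window_seconds
        self.records = deque()

    def add(self, record: QueryRecord) -> None:
        self.records.append(record)

    def breaches(self, now: float, limits: dict[str, float]) -> dict[str, float]:
        """Return each metric whose observed value exceeds its limit."""
        # Drop records that have aged out of the window.
        while self.records and self.records[0].timestamp < now - self.window_seconds:
            self.records.popleft()
        if not self.records:
            return {}
        n = len(self.records)
        latencies = sorted(r.latency_ms for r in self.records)
        observed = {
            "error_rate": sum(not r.ok for r in self.records) / n,
            "p95_latency_ms": latencies[math.ceil(0.95 * n) - 1],  # nearest rank
            "cost_per_query_usd": sum(r.cost_usd for r in self.records) / n,
        }
        return {k: v for k, v in observed.items() if v > limits[k]}


# Hypothetical thresholds and traffic: one request in ten fails.
limits = {"error_rate": 0.05, "p95_latency_ms": 500.0, "cost_per_query_usd": 0.05}
window = HealthWindow()
for i in range(100):
    window.add(QueryRecord(1000.0 + i, latency_ms=100.0, ok=(i % 10 != 0), cost_usd=0.01))

print(window.breaches(now=1100.0, limits=limits))
```

Only the error rate breaches its limit in this toy run. An empty result means no limit was exceeded in the window; it also means nothing if no traffic arrived, so a real monitor should alert on silence as well.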

Decision 5: Governance and Rollback

The final decision is the one nobody wants to make in week one because it feels premature. It is not premature. It is the decision that lets the AI ship safely.

Four questions need answers before the pilot is allowed to touch production.

Who approves what the AI can do? The model has a scope. Someone needs to own that scope and approve any expansion of it.

How does the AI get disabled if it starts going wrong? There must be a documented kill switch, a manual fallback path, and a named person who has the authority to pull it.

What is the manual fallback? When the AI is off, the work still has to get done. The fallback process must exist and someone must have practiced it.

Who is paged when it breaks? Production AI breaks. Someone must be on call. If the answer is "the data scientist who built it," the AI is not production ready.

These four answers do not require a 30-person governance function. They require a one-page document and a real owner. Most mid-market organizations can produce both in a week.

The week one question is "if this AI made a wrong decision that cost a customer something tomorrow, what would happen next?" If the answer involves "we would figure it out," governance is not in place.
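In code, the kill switch and the manual fallback can be as small as a flag with a named owner and a wrapper that routes around the AI when the flag is off. The sketch below is a minimal illustration under assumptions: the `KillSwitch` class, the owner name, and both decision paths are hypothetical, and falling back to the manual path when the AI path raises is a design choice of this example rather than something the checklist requires.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KillSwitch:
    """A flag that only its named owner may flip (hypothetical model)."""
    owner: str
    enabled: bool = True
    reason: str = ""

    def disable(self, by: str, reason: str) -> None:
        if by != self.owner:
            raise PermissionError(f"{by} is not authorized to disable the AI")
        self.enabled = False
        self.reason = reason


def decide(switch: KillSwitch,
           ai_path: Callable[[str], str],
           manual_path: Callable[[str], str],
           case: str) -> str:
    """Use the AI when it is enabled and healthy, otherwise the manual fallback."""
    if not switch.enabled:
        return manual_path(case)
    try:
        return ai_path(case)
    except Exception:
        # Assumed policy: a failing AI call degrades to the manual process.
        return manual_path(case)


def ai_triage(case: str) -> str:
    return f"auto-approved {case}"


def manual_triage(case: str) -> str:
    return f"queued {case} for human review"


switch = KillSwitch(owner="ops-lead")
print(decide(switch, ai_triage, manual_triage, "claim-17"))

switch.disable(by="ops-lead", reason="wrong outputs reported by support")
print(decide(switch, ai_triage, manual_triage, "claim-18"))
```

The point is less the code than what it forces you to name: who owns the switch, and what the manual path actually does.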

The 1 Page Pilot Architecture Checklist

Before any week one architecture work starts, the team should be able to answer all of the following on a single page. If they cannot, the pilot is not ready to build.

  • Data source: What system, what latency, what production load profile, what owner.
  • Model serving: Managed or custom, named operator, monitoring tooling, cost model.
  • Integration surface: API first, embedded, or event driven, and the reason for the choice.
  • Instrumentation: Input distribution, output distribution, user feedback, latency and error rate, cost per query. All five before launch.
  • Governance and rollback: Scope owner, kill switch, manual fallback, on call rotation.

A pilot that can answer all five categories has a real chance at production. A pilot that cannot is a demo with extra steps.
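If the team prefers to keep the checklist next to the code, it can live as a small structure with a check for blank answers. This is only a sketch: the `PilotChecklist` field names follow the five categories above, and the example answers are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class PilotChecklist:
    """One free-text answer per checklist category (hypothetical structure)."""
    data_source: str = ""
    model_serving: str = ""
    integration_surface: str = ""
    instrumentation: str = ""
    governance_and_rollback: str = ""


def unanswered(checklist: PilotChecklist) -> list[str]:
    """Return the categories that are still blank."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name).strip()]


# Invented example: three of the five categories have been answered.
draft = PilotChecklist(
    data_source="orders service, under 1 s, owner: platform team",
    model_serving="managed platform, named operator assigned",
    integration_surface="API first, two teams expected to consume",
)

missing = unanswered(draft)
print("ready to build" if not missing else f"not ready, missing: {missing}")
```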

The Choice Is Architectural, Not Technical

The five decisions above are not about which model to use. They are about whether the system around the model can survive contact with real users, real data, and real operational pressure. Most pilots fail because the team focused on the model and skipped the system. The 12 percent that succeed got the system right in week one.

Curious whether your organization is ready to make these five decisions well?

The Thessia Enterprise AI Readiness Assessment scores your organization across the readiness dimensions that determine whether the architecture above can actually be executed. Fifteen questions, five minutes, free.

Frequently asked questions

Q1: Why can't I just use our analytics warehouse for the AI pilot data source?

The analytics warehouse is typically the easiest data source to connect in week one, but it's usually 2 to 3 days stale. While that's fine for demos, production AI that needs to respond to current state requires operational data with sub-second latency. If you build the pilot on warehouse data, you'll likely need to rebuild the entire integration against operational systems later, which means a different team, access pattern, security model, and SLA. That rebuild is what kills most production timelines.

Q2: When should we choose custom container serving over a managed AI platform?

For most mid-market organizations, the default should be a managed platform (like Vertex AI, Bedrock, or Azure ML) unless there's a specific technical or cost reason to go custom. Managed platforms let a small team ship production AI with built-in scaling, security, monitoring, and transparent costs. Custom containers require you to build and operate all of that yourself, which demands an MLOps function most mid-market organizations don't have. The key week one question: Do you have a named operator who will run the custom stack in production? If not, use managed.

Q3: What instrumentation do we actually need before launch, and why can't we add it later?

You need five things instrumented from day one: (1) input distribution monitoring to catch data drift, (2) output distribution monitoring to catch model drift, (3) user feedback capture so operators can flag bad outputs, (4) latency and error rate monitoring for operational health, and (5) cost per query for financial control. Adding all five during the pilot takes about a week of engineering. Retrofitting them after launch takes roughly a month and comes with a much harder political case to justify the spend. Without them, you'll likely find out about degradation from customer complaints or finance, not from your own systems.

Published June 17, 2026