Cloud Architecture Options for CATz Deployments

Every CATz engagement runs somewhere. This article is the decision framework for where.

The CATz stack has specific infrastructure requirements that interact differently with each cloud platform's pricing, managed service availability, and operational posture. Getting this decision right during Scaffold is load-bearing: the wrong choice surfaces at scale, not at proof of concept.

The CATz Stack — Infrastructure Requirements

The Canary Retail platform that anchors every CATz engagement depends on five infrastructure primitives:

Requirement	Component	Notes
Relational database + vector index	PostgreSQL 17 + pgvector extension	pgvector must be available; version compatibility matters — not all managed PostgreSQL offerings run 17 at extension-compatible patch levels
In-memory data structure store	Valkey 8 (Redis-compatible)	Used for sessions, caching, and Valkey Streams for the TSP ingestion pipeline. Managed offerings under the "Redis" label vary widely in compatibility
Local inference / embedding	Ollama with qwen3-embedding:8b	1024-dimension vectors; GPU-accelerated preferred but CPU workloads are viable at SMB scale — matters for container sizing
Object storage	S3-compatible	Transaction exports, vault snapshots, tenant backups
Container runtime	Flask 3 + Gunicorn	Deployed in containers; managed container options vary in operational overhead

Platform Comparison

PostgreSQL 17 + pgvector

The choice of managed PostgreSQL depends on whether pgvector is available and at which version. pgvector 0.7+ is required for half-vector (halfvec) storage of the 1024-dimension embeddings; HNSW indexes have been available since pgvector 0.5.0.

Platform	Managed Offering	pgvector Support	Cost Reference (us-east-1 / us-central1 / eastus equivalent)
AWS	RDS for PostgreSQL, Aurora PostgreSQL	pgvector available on RDS PostgreSQL 15+, Aurora PostgreSQL 15+. AWS updates pgvector actively, so version support is current.	RDS db.t3.micro (2 vCPU, 1GB RAM): ~$14/month + $0.115/GB-month storage. db.t3.small (2 vCPU, 2GB RAM): ~$28/month. Multi-AZ doubles the compute cost.
GCP	Cloud SQL for PostgreSQL	pgvector available as an extension since Cloud SQL PostgreSQL 15. AlloyDB (Aurora equivalent) supports pgvector.	Cloud SQL db-g1-small (0.5 vCPU, 1.7GB RAM): ~$25/month + $0.17/GB-month SSD storage. Shared-core pricing attractive at low scale; jumps at growth.
Azure	Azure Database for PostgreSQL Flexible Server	pgvector supported on Flexible Server (PostgreSQL 13+). Extension enable/disable is self-service in Flexible Server.	Burstable B1ms (1 vCPU, 2GB RAM): ~$14/month + $0.115/GB-month storage. Comparable to AWS at entry tier.

Verdict. AWS RDS PostgreSQL is the most operationally straightforward choice if the rest of the stack is AWS-native. Azure Flexible Server is price-competitive and the pgvector support is solid. GCP Cloud SQL is capable but the pricing model is less transparent at small scale.
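
Whichever platform is chosen, the version requirement above is cheap to verify before committing. The following is a minimal sketch of a pre-flight check, assuming the two version strings have already been read from the database (for example via `SHOW server_version` and `SELECT extversion FROM pg_extension WHERE extname = 'vector'`). The function names are hypothetical, and treating PostgreSQL releases newer than 17 as acceptable is an assumption, not a stated CATz rule.

```python
import re
from typing import Optional, Tuple

REQUIRED_PG_MAJOR = 17          # PostgreSQL 17, from the requirements table
REQUIRED_PGVECTOR = (0, 7, 0)   # "pgvector 0.7+", from the text above


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the first dotted numeric version found in a string."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_catz_requirements(server_version: str,
                            pgvector_version: Optional[str]) -> bool:
    """True when both PostgreSQL and pgvector are new enough.

    pgvector_version is None when the extension is not installed.
    """
    if pgvector_version is None:
        return False
    pg_major = parse_version(server_version)[0]
    vec = parse_version(pgvector_version)
    vec = vec + (0,) * (3 - len(vec))  # pad "0.7" to (0, 7, 0) before comparing
    # Assumption: anything at or above 17 is accepted, not only 17 exactly.
    return pg_major >= REQUIRED_PG_MAJOR and vec >= REQUIRED_PGVECTOR
```

Running this against a candidate managed instance during Scaffold turns "version compatibility matters" into a pass/fail gate rather than a line item to remember.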

Valkey / Redis

Valkey is the open-source, Redis-compatible fork of Redis. Managed offerings universally market themselves as "Redis compatible"; the critical check is Streams support, which the TSP ingestion pipeline depends on.

Platform	Managed Offering	Valkey/Redis Streams Support	Cost Reference
AWS	ElastiCache for Redis (OSS)	Streams supported. ElastiCache OSS Redis runs the upstream open-source codebase, not the Redis Ltd commercial version, making it Valkey-compatible.	cache.t3.micro (2 vCPU, 0.5GB): ~$13/month. cache.t3.small (2 vCPU, 1.4GB): ~$26/month. Multi-AZ: ~2x.
GCP	Memorystore for Redis (Cluster or Single Instance)	Streams supported in Memorystore for Redis 6.x+.	M1 Basic (1GB): ~$35/month. Higher floor than AWS/Azure at entry tier.
Azure	Azure Cache for Redis	Streams supported. C0 Basic (250MB) not suitable for Streams workloads; C1 Standard (1GB) is minimum viable.	C1 Standard: ~$54/month. Higher cost at entry tier; discounted substantially at reserved pricing (1-year: ~$33/month).

Verdict. AWS ElastiCache is the lowest-cost entry and has the deepest operational tooling. For self-managed deployments (including GrowDirect-managed infrastructure where the retailer has control), Valkey can run in-container alongside the Flask app on a single EC2 instance — eliminating the managed service cost entirely until scale demands it.
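
Because "Redis compatible" is a marketing claim rather than a guarantee, Streams support is worth probing directly against any candidate service. The sketch below assumes a client object that exposes `xadd`, `xlen`, and `delete` with the shapes used by the redis-py client; the `StreamClient` protocol, the function name, and the probe key are hypothetical.

```python
from typing import Any, Protocol


class StreamClient(Protocol):
    """The subset of client methods this probe relies on (redis-py style)."""

    def xadd(self, name: str, fields: dict) -> Any: ...
    def xlen(self, name: str) -> int: ...
    def delete(self, *names: str) -> int: ...


def supports_streams(client: StreamClient,
                     probe_key: str = "catz:probe:stream") -> bool:
    """Write one entry to a throwaway stream and confirm it can be counted."""
    try:
        client.xadd(probe_key, {"probe": "1"})
        return client.xlen(probe_key) >= 1
    except Exception:
        # Any failure (unknown command, permission error) counts as "no".
        return False
    finally:
        # Best-effort cleanup so the probe leaves nothing behind.
        try:
            client.delete(probe_key)
        except Exception:
            pass
```

With redis-py installed, a call such as `supports_streams(redis.Redis(host="localhost", port=6379))` would exercise a local Valkey container; the same call against a managed endpoint checks that offering before the TSP pipeline depends on it.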

Ollama / GPU Inference (Embedding Workloads)

The qwen3-embedding:8b model running under Ollama generates 1024-dimension vectors for the memory bus. At SMB retailer scale (100–5,000 chunks/day being embedded), CPU-only inference is viable. GPU becomes relevant when embedding workloads exceed ~50,000 chunks/day — which corresponds roughly to a 100+ store rollout.

Platform	GPU Option	Cost Reference (T4/L4-class)	CPU Alternative
AWS	EC2 g4dn.xlarge (T4 GPU, 4 vCPU, 16GB RAM)	~$0.526/hr on-demand; ~$0.188/hr Spot. For batch embedding jobs, Spot pricing is viable. Sustained over a 730-hour month: ~$384/month on-demand, ~$137/month on Spot. ~$50/month as a Spot batch runner that is not always on.	EC2 t3.medium (2 vCPU, 4GB RAM): ~$30/month. Viable for qwen3-embedding:8b at low throughput.
GCP	Cloud Run with GPU (NVIDIA L4, preview)	~$0.80/hr for L4-attached Cloud Run jobs. Best for bursty, not sustained, workloads.	Cloud Run CPU: per-request billing eliminates idle cost. Suitable for low-frequency embedding jobs.
Azure	NC4as T4 v3 (T4 GPU, 4 vCPU, 28GB RAM)	~$0.526/hr on-demand; reserved pricing competitive.	Standard_B2ms (2 vCPU, 8GB RAM): ~$60/month. More expensive CPU baseline than AWS.

Practical recommendation. At SMB scale, run Ollama in a container on the application server (CPU-only). The embedding workload for a 5–15 store retailer is well within CPU capacity — qwen3-embedding:8b requires ~8GB RAM at full precision but runs acceptably on a 4GB instance with quantization. Separate GPU infrastructure only when the embedding queue consistently lags or when the retailer's vault is growing faster than ~10,000 new chunks/month.
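
The recommendation above reduces to two triggers: a queue that cannot keep up, and vault growth past ~10,000 new chunks/month. A minimal sketch of that rule follows. The throughput figure is an input that must be measured on the actual instance, and interpreting "consistently lags" as "a day's chunks take more than the available hours to embed" is an assumption; all names are hypothetical.

```python
VAULT_GROWTH_LIMIT = 10_000  # new chunks/month, from the recommendation above


def daily_embedding_hours(chunks_per_day: float,
                          measured_chunks_per_second: float) -> float:
    """Hours of compute needed to embed one day's worth of chunks."""
    if measured_chunks_per_second <= 0:
        raise ValueError("throughput must be positive")
    return chunks_per_day / measured_chunks_per_second / 3600


def should_split_out_gpu(chunks_per_day: float,
                         new_chunks_per_month: float,
                         measured_chunks_per_second: float,
                         budget_hours: float = 24.0) -> bool:
    """Apply the two triggers: sustained queue lag or fast vault growth."""
    # Assumption: "lagging" means a day's work does not fit in the daily budget.
    lagging = daily_embedding_hours(
        chunks_per_day, measured_chunks_per_second
    ) > budget_hours
    return lagging or new_chunks_per_month > VAULT_GROWTH_LIMIT
```

Setting `budget_hours` below 24 models the case where Ollama shares the application server and embedding may only consume part of the day.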

Object Storage

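All three platforms offer object storage, though S3 API compatibility differs: S3 is native on AWS, Cloud Storage exposes an S3-interoperable XML API, and Azure Blob Storage has no native S3 endpoint. The differentiating factor at CATz scale is egress cost and the cross-region transfer story when tenants are in one region and reporting consumers are in another.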

Platform	Offering	Storage Cost	Egress to Internet (per GB)
AWS	S3 Standard	$0.023/GB-month	$0.09/GB (first 10TB/month)
GCP	Cloud Storage Standard	$0.020/GB-month	$0.08/GB (Americas/Europe egress)
Azure	Blob Storage (LRS)	$0.018/GB-month	$0.087/GB

At CATz scale (see method/data-economics for worked examples), storage differences are in the noise. Egress becomes the relevant cost when exporting transaction history to external analytics consumers or when migrating tenants. GCP is marginally cheaper on egress; AWS is the default because the rest of the stack already runs there.
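
To see why storage is noise and egress is not, the table rates can be applied to a concrete month. This is a rough sketch using only the first-tier rates quoted above; it ignores request charges, tiered discounts, and cross-region transfer, and the function names are hypothetical.

```python
# (storage $/GB-month, internet egress $/GB), copied from the table above
RATES = {
    "aws": (0.023, 0.09),
    "gcp": (0.020, 0.08),
    "azure": (0.018, 0.087),
}


def monthly_object_cost(platform: str, stored_gb: float, egress_gb: float) -> float:
    """Storage plus internet egress for one month at first-tier rates."""
    storage_rate, egress_rate = RATES[platform]
    return stored_gb * storage_rate + egress_gb * egress_rate


def cheapest_platform(stored_gb: float, egress_gb: float) -> str:
    """Platform with the lowest combined cost for the given volumes."""
    return min(RATES, key=lambda p: monthly_object_cost(p, stored_gb, egress_gb))
```

For 100 GB stored and 50 GB exported, the three platforms land within about a dollar of each other; the gap only widens as egress grows, which is the scenario the text flags for analytics exports and tenant migrations.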

Managed Container Options

The CATz Flask stack runs in containers (Flask 3 + Gunicorn). Managed container platforms differ on cold-start latency, persistent volume support, and operational overhead for a workload that is persistent-process (not bursty/serverless).

Platform	Offering	Fit for CATz Flask	Entry Cost
AWS	ECS Fargate, ECS on EC2	Fargate is viable for the web tier; the TSP consumers (stream processors) run better on EC2 because they are persistent processes, not request-scoped. ECS on EC2 gives more control.	Fargate: ~$0.04048/vCPU-hour + $0.004445/GB-hour. For 0.5 vCPU / 1GB: ~$18/month. EC2 t3.micro: ~$7.50/month (bare instance).
GCP	Cloud Run	Cloud Run is request-scoped; persistent stream consumers require minimum instance = 1, which changes the cost profile. Not ideal for the TSP pipeline. Cloud Run Jobs work for batch workloads.	Cloud Run (0.5 vCPU / 512MB, always-on): ~$18/month.
Azure	Azure Container Apps	Similar to Cloud Run — consumption-based pricing favors bursty workloads. Persistent process model requires dedicated replicas.	Basic tier (0.5 vCPU / 1GB, dedicated): ~$13/month.

Verdict. For small deployments (single tenant, low transaction volume), EC2 with Docker Compose is the simplest and cheapest option — no managed container service overhead. ECS on EC2 becomes attractive when operating 10+ tenants and the operational leverage of managed scheduling is worth the overhead.
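
Most of the entry costs in this article are hourly rates multiplied out to a month, and persistent processes pay for every hour. A small sketch of that conversion, assuming a 730-hour month and the Fargate rates quoted in the table; the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average month, always-on workload


def monthly_from_hourly(hourly_rate: float) -> float:
    """Monthly cost of a resource billed hourly and never switched off."""
    return hourly_rate * HOURS_PER_MONTH


def fargate_monthly(vcpu: float, memory_gb: float,
                    vcpu_rate: float = 0.04048,
                    gb_rate: float = 0.004445) -> float:
    """Always-on Fargate task cost from per-vCPU and per-GB hourly rates."""
    return monthly_from_hourly(vcpu * vcpu_rate + memory_gb * gb_rate)
```

At the quoted rates, a 0.5 vCPU / 1 GB task comes to roughly $18 for a 730-hour month. The same helper applied to any hourly price in this article shows what "sustained" actually costs before a workload is placed on it.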

Recommended Posture

Upgrade Path at Scale

Scale	Tenant Model	Recommended Architecture	Estimated Monthly Infra Cost
1–10 tenants, 1–50 stores total	Shared GrowDirect-managed	EC2 t3.small, shared PostgreSQL, Valkey, Ollama	~$75–120/month shared
10–50 tenants, 50–500 stores total	Hybrid: shared control plane, isolated data planes	EC2 t3.medium control plane; RDS t3.micro per tenant for databases; shared Valkey/Ollama	~$50–75/month per tenant + ~$100/month shared infra
50–200 tenants, 500–2,000 stores total	Fully isolated per tenant	ECS on EC2 per tenant, RDS per tenant, ElastiCache cluster shared or per-tenant	~$80–150/month per tenant
200+ tenants, 2,000+ stores	Platform play	Multi-region, auto-scaling, dedicated SRE discipline	~$50–80/month per tenant at scale (unit economics improve)
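
The tiers above can be expressed as a lookup keyed on tenant count. A minimal sketch follows; because the ranges in the table share their boundary values (10, 50, 200), it assumes a boundary count stays in the lower tier. The `Tier` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    max_tenants: Optional[int]  # None means no upper bound
    tenant_model: str
    architecture: str


# Rows transcribed from the table above, in ascending order of scale.
TIERS = (
    Tier(10, "Shared GrowDirect-managed",
         "EC2 t3.small, shared PostgreSQL, Valkey, Ollama"),
    Tier(50, "Hybrid: shared control plane, isolated data planes",
         "EC2 t3.medium control plane; RDS t3.micro per tenant; shared Valkey/Ollama"),
    Tier(200, "Fully isolated per tenant",
         "ECS on EC2 per tenant, RDS per tenant, ElastiCache shared or per-tenant"),
    Tier(None, "Platform play",
         "Multi-region, auto-scaling, dedicated SRE discipline"),
)


def recommend_tier(tenants: int) -> Tier:
    """Return the first tier whose upper bound covers the tenant count."""
    if tenants < 1:
        raise ValueError("tenant count must be at least 1")
    for tier in TIERS:
        # Assumption: a boundary value (10, 50, 200) stays in the lower tier.
        if tier.max_tenants is None or tenants <= tier.max_tenants:
            return tier
    raise AssertionError("unreachable: last tier has no upper bound")
```

Keying on tenant count rather than store count is a simplification; a deployment with few tenants but many stores may warrant moving up a tier earlier.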

Tenant Isolation Model

The current production pattern is per-tenant PostgreSQL databases on a shared host. This is the lowest-cost isolation mechanism and the right starting posture for a GrowDirect-managed deployment.

Per-Tenant Database Pattern (Current)

Each tenant gets:
- Dedicated PostgreSQL database (canary_<tenant_id>, cove_<tenant_id>)
- Database-level user with no cross-tenant privileges
- Valkey namespace isolation via key prefix convention (<tenant_id>:*)
- Application-layer tenant context enforcement (Flask middleware)
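
The naming conventions in this pattern are simple enough to centralize in one helper, so that no caller builds a database name or key prefix by hand. The sketch below is illustrative only: the allowed tenant ID format (lowercase letters, digits, underscores, up to 32 characters) is an assumption chosen to keep names valid as PostgreSQL identifiers and free of the `:` separator, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumption: tenant IDs are short, lowercase, and never contain ":".
_TENANT_ID = re.compile(r"^[a-z0-9_]{1,32}$")


def _check(tenant_id: str) -> str:
    if not _TENANT_ID.match(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return tenant_id


def tenant_databases(tenant_id: str) -> Tuple[str, str]:
    """Database names following canary_<tenant_id> and cove_<tenant_id>."""
    tid = _check(tenant_id)
    return f"canary_{tid}", f"cove_{tid}"


def tenant_key(tenant_id: str, key: str) -> str:
    """Valkey key under the <tenant_id>: prefix convention."""
    return f"{_check(tenant_id)}:{key}"


def belongs_to(tenant_id: str, full_key: str) -> bool:
    """True when a key sits inside the tenant's namespace."""
    # The trailing ":" stops "acme" from matching keys of "acme2".
    return full_key.startswith(f"{_check(tenant_id)}:")
```

Rejecting malformed IDs at this layer matters because the prefix convention is only a convention: Valkey itself does not enforce it, so the application has to.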

Cost of 50 isolated tenant databases on each platform:

Platform	Service	Per-DB Cost	50-tenant Total
AWS Aurora	Aurora Serverless v2, shared cluster	~$0.12/ACU-hour; minimum 0.5 ACU. 50 databases on one Aurora cluster: ~$43/month at the minimum ACU + $0.10/GB-month storage.	~$100–200/month for 50 tenants (storage + ACU, light workload)
AWS RDS	RDS for PostgreSQL db.t3.micro per tenant (isolated instances)	~$14/month/tenant × 50 = $700/month	$700/month, rarely the right choice at this scale
GCP Cloud SQL	Shared instance, per-database isolation	Cloud SQL is instance-scoped — per-tenant billing is instance-based, not database-based. 50 databases on one db-g1-small: ~$25/month. Scale requires instance upgrades.	~$50–200/month depending on instance tier
Azure Flexible Server	Shared instance per region	Same pattern as GCP — Flexible Server is instance-scoped. 50 databases on one B2s (2 vCPU, 4GB RAM): ~$55/month.	~$55–150/month

Recommended. For GrowDirect-managed deployments at 10–50 tenants: shared PostgreSQL host (RDS db.t3.small or equivalent), per-tenant databases. Aurora Serverless v2 is the upgrade path when workload patterns are spiky — the ACU auto-scaling eliminates over-provisioning. Isolated RDS instances per tenant are only warranted when a specific tenant's compliance or SLA requirement demands it.
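
The gap between shared and isolated hosting is easiest to see as a break-even tenant count. The following is a rough sketch using the figures from the table (~$14/month per isolated instance, ~$0.12/ACU-hour with a 0.5 ACU minimum) and an assumed 730-hour month; it excludes storage, and the function names are hypothetical.

```python
import math


def isolated_total(tenants: int, per_instance: float = 14.0) -> float:
    """Monthly cost of one small RDS instance per tenant."""
    return tenants * per_instance


def aurora_floor(min_acu: float = 0.5, acu_rate: float = 0.12,
                 hours: int = 730) -> float:
    """Monthly compute floor of a shared Aurora Serverless v2 cluster."""
    return min_acu * acu_rate * hours


def tenants_where_shared_wins(shared_monthly: float,
                              per_instance: float = 14.0) -> int:
    """Smallest tenant count at which isolated instances cost more than shared."""
    return math.floor(shared_monthly / per_instance) + 1
```

Even at the upper end of the shared estimate (~$200/month), isolated instances become the more expensive option from 15 tenants onward, which is consistent with reserving per-instance isolation for tenants with a specific compliance or SLA requirement.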

Decision Summary

Decision	Recommendation	When to Revisit
Cloud platform	AWS (default for GrowDirect-managed; follow client requirements for self-hosted)	Client mandate or regulatory constraint forces another platform
PostgreSQL	RDS PostgreSQL 17 or shared EC2 Docker for ≤10 tenants	Aurora Serverless v2 when tenant count exceeds 10
Valkey	In-container Valkey for ≤5 tenants; ElastiCache for higher scale	ElastiCache when stream consumer lag becomes observable
Ollama/inference	In-container CPU for ≤31 stores; EC2 g4dn Spot for batch embedding at scale	When embedding queue lags consistently or vault growth exceeds 10K chunks/month
Object storage	S3 Standard	No change unless multi-cloud mandate
Container runtime	Docker Compose on EC2 for ≤10 tenants; ECS on EC2 for >10 tenants	When operational overhead of Docker Compose exceeds ECS management cost
Tenant isolation	Per-database shared host	Per-instance isolation when compliance or SLA demands it

Related

method/catz-method-detail: CDF phases and workstreams (WS1: Architecture & Infra)
method/data-economics: data egress economics, storage costs, own-your-data argument
method/it-architecture-options: Phase II architecture evaluation framework