Cloud Architecture Options for CATz Deployments

Every CATz engagement runs somewhere. This article is the decision framework for where.

The CATz stack has specific infrastructure requirements that interact differently with each cloud platform's pricing, managed service availability, and operational posture. Getting this decision right during Scaffold is load-bearing: the wrong choice surfaces at scale, not at proof of concept.


The CATz Stack — Infrastructure Requirements

The Canary Retail platform that anchors every CATz engagement depends on five infrastructure primitives:

Requirement | Component | Notes
Relational database + vector index | PostgreSQL 17 + pgvector extension | pgvector must be available; version compatibility matters, since not all managed PostgreSQL offerings run 17 at extension-compatible patch levels
In-memory data structure store | Valkey 8 (Redis-compatible) | Used for sessions, cache, and Valkey Streams for the TSP ingestion pipeline. Managed offerings under the "Redis" label vary widely in compatibility
Local inference / embedding | Ollama with qwen3-embedding:8b | 1024-dimension vectors; GPU-accelerated preferred, but CPU workloads are viable at SMB scale (matters for container sizing)
Object storage | S3-compatible | Transaction exports, vault snapshots, tenant backups
Container runtime | Flask 3 + Gunicorn | Deployed in containers; managed container options vary in operational overhead

Platform Comparison

PostgreSQL 17 + pgvector

The choice of managed PostgreSQL turns on whether pgvector is available, at what version, and at what cost. pgvector 0.7+ is required for the 1024-dimension vectors used here: that release adds half-vector (halfvec) compression alongside the existing HNSW indexes.

Platform | Managed Offering | pgvector Support | Cost Reference (us-east-1 / us-central1 / eastus or equivalent)
AWS | RDS for PostgreSQL, Aurora PostgreSQL | pgvector available on RDS PostgreSQL 15+, Aurora PostgreSQL 15+. AWS updates its pgvector support actively; version support is current. | RDS db.t3.micro (2 vCPU, 1GB RAM): ~$14/month + $0.115/GB-month storage. db.t3.small (2 vCPU, 2GB RAM): ~$28/month. Multi-AZ doubles the compute cost.
GCP | Cloud SQL for PostgreSQL | pgvector available as an extension since Cloud SQL PostgreSQL 15. AlloyDB (Aurora equivalent) supports pgvector. | Cloud SQL db-g1-small (0.5 vCPU, 1.7GB RAM): ~$25/month + $0.17/GB-month SSD storage. Shared-core pricing attractive at low scale; jumps at growth.
Azure | Azure Database for PostgreSQL Flexible Server | pgvector supported on Flexible Server (PostgreSQL 13+). Extension enable/disable is self-service in Flexible Server. | Burstable B1ms (1 vCPU, 2GB RAM): ~$14/month + $0.115/GB-month storage. Comparable to AWS at entry tier.

Verdict. AWS RDS PostgreSQL is the most operationally straightforward choice if the rest of the stack is AWS-native. Azure Flexible Server is price-competitive and the pgvector support is solid. GCP Cloud SQL is capable but the pricing model is less transparent at small scale.
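
The pgvector version requirement above can be checked at deploy time rather than discovered later. The following is a minimal sketch, not the platform's actual check: the helper names are illustrative, and it assumes the version string is read from the standard pg_extension catalog with whichever database driver the deployment already uses.

```python
import re

MIN_PGVECTOR = (0, 7, 0)  # minimum version named in this article

# Catalog query whose single result column feeds pgvector_ok().
# The extension registers itself under the name 'vector'.
VERSION_QUERY = "SELECT extversion FROM pg_extension WHERE extname = 'vector';"


def parse_version(text: str) -> tuple:
    """Turn '0.7.4' into (0, 7, 4); missing parts are treated as 0."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def pgvector_ok(extversion: str, minimum: tuple = MIN_PGVECTOR) -> bool:
    """True when the installed extension meets the minimum version."""
    return parse_version(extversion) >= minimum
```

Running VERSION_QUERY against each candidate managed offering and passing the result to pgvector_ok() gives a yes/no answer before any schema is created.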


Valkey / Redis

Valkey is the open-source, Redis-compatible fork of Redis. Managed offerings universally market themselves as "Redis compatible"; the critical check is Streams support, which drives the TSP ingestion pipeline.

Platform | Managed Offering | Valkey/Redis Streams Support | Cost Reference
AWS | ElastiCache for Redis (OSS) | Streams supported. ElastiCache OSS Redis runs the upstream open-source codebase, not the Redis Ltd commercial version, making it Valkey-compatible. | cache.t3.micro (2 vCPU, 0.5GB): ~$13/month. cache.t3.small (2 vCPU, 1.4GB): ~$26/month. Multi-AZ: ~2x.
GCP | Memorystore for Redis (Cluster or Single Instance) | Streams supported in Memorystore for Redis 6.x+. | M1 Basic (1GB): ~$35/month. Higher floor than AWS/Azure at entry tier.
Azure | Azure Cache for Redis | Streams supported. C0 Basic (250MB) not suitable for Streams workloads; C1 Standard (1GB) is minimum viable. | C1 Standard: ~$54/month. Higher cost at entry tier; discounted substantially at reserved pricing (1-year: ~$33/month).

Verdict. AWS ElastiCache is the lowest-cost entry and has the deepest operational tooling. For self-managed deployments (including GrowDirect-managed infrastructure where the retailer has control), Valkey can run in-container alongside the Flask app on a single EC2 instance — eliminating the managed service cost entirely until scale demands it.
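
Because "Redis compatible" labels vary, Streams support is worth verifying against the actual endpoint before committing to a managed offering. The sketch below is one way to do that, under stated assumptions: the client is any object exposing redis-py style xadd, xrange, and delete methods, and the probe key name is hypothetical.

```python
def streams_smoke_test(client, key: str = "catz:smoke:stream") -> bool:
    """Round-trip one entry through a stream; True if it reads back intact.

    `client` is assumed to expose redis-py style xadd/xrange/delete methods.
    The default key name is illustrative, not a CATz convention.
    """
    payload = {"probe": "1"}
    try:
        entry_id = client.xadd(key, payload)
        # Read back exactly the entry that was just written.
        entries = client.xrange(key, min=entry_id, max=entry_id)
        return len(entries) == 1 and entries[0][0] == entry_id
    except Exception:
        # Unknown command, permission error, etc. all count as "not supported".
        return False
    finally:
        # Best-effort cleanup so the probe leaves nothing behind.
        try:
            client.delete(key)
        except Exception:
            pass
```

A False result does not say why the check failed; it only flags the endpoint for a closer look before the TSP pipeline is pointed at it.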


Ollama / GPU Inference (Embedding Workloads)

The qwen3-embedding:8b model running under Ollama generates 1024-dimension vectors for the memory bus. At SMB retailer scale (100–5,000 chunks/day being embedded), CPU-only inference is viable. GPU becomes relevant when embedding workloads exceed ~50,000 chunks/day — which corresponds roughly to a 100+ store rollout.

Platform | GPU Option | Cost Reference (T4/L4-class GPU) | CPU Alternative
AWS | EC2 g4dn.xlarge (T4 GPU, 4 vCPU, 16GB RAM) | ~$0.526/hr on-demand; ~$0.188/hr Spot. For batch embedding jobs, Spot pricing is viable. Sustained (730 hr/month): ~$384/month on-demand, ~$137/month on Spot. ~$50/month as a part-time Spot batch runner. | EC2 t3.medium (2 vCPU, 4GB RAM): ~$30/month. Viable for qwen3-embedding:8b at low throughput.
GCP | Cloud Run with GPU (NVIDIA L4, preview) | ~$0.80/hr for L4-attached Cloud Run jobs. Best for bursty, not sustained, workloads. | Cloud Run CPU: per-request billing eliminates idle cost. Suitable for low-frequency embedding jobs.
Azure | NC4as T4 v3 (T4 GPU, 4 vCPU, 28GB RAM) | ~$0.526/hr on-demand; reserved pricing competitive. | Standard_B2ms (2 vCPU, 8GB RAM): ~$60/month. More expensive CPU baseline than AWS.

Practical recommendation. At SMB scale, run Ollama in a container on the application server (CPU-only). The embedding workload for a 5–15 store retailer is well within CPU capacity — qwen3-embedding:8b requires ~8GB RAM at full precision but runs acceptably on a 4GB instance with quantization. Separate GPU infrastructure only when the embedding queue consistently lags or when the retailer's vault is growing faster than ~10,000 new chunks/month.
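
The CPU-versus-GPU rule in this section reduces to a threshold check plus an hourly-to-monthly conversion. A rough sketch follows; the function names are illustrative, and the 730-hour month is an assumption (8,760 hours divided by 12), not a figure from any provider's pricing page.

```python
GPU_THRESHOLD_CHUNKS_PER_DAY = 50_000  # threshold quoted in this section
HOURS_PER_MONTH = 730                  # assumption: 8,760 hours / 12


def inference_tier(chunks_per_day: int, queue_lagging: bool = False) -> str:
    """Return 'gpu' once volume passes the threshold or the queue lags, else 'cpu'."""
    if queue_lagging or chunks_per_day > GPU_THRESHOLD_CHUNKS_PER_DAY:
        return "gpu"
    return "cpu"


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Convert an hourly instance rate into a monthly figure."""
    return hourly_rate * hours
```

At the g4dn.xlarge rates quoted above, monthly_cost(0.526) comes to about $384 for sustained on-demand use and monthly_cost(0.188) to about $137 for sustained Spot, which is why a GPU runner only makes sense as a part-time batch job at SMB scale.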


Object Storage

All three platforms offer S3-compatible object storage. The differentiating factor at CATz scale is egress cost and the cross-region transfer story when tenants are in one region and reporting consumers are in another.

Platform | Offering | Storage Cost | Egress to Internet (per GB)
AWS | S3 Standard | $0.023/GB-month | $0.09/GB (first 10TB/month)
GCP | Cloud Storage Standard | $0.020/GB-month | $0.08/GB (Americas/Europe egress)
Azure | Blob Storage (LRS) | $0.018/GB-month | $0.087/GB

At CATz scale (see method/data-economics for worked examples), storage differences are in the noise. Egress becomes the relevant cost when exporting transaction history to external analytics consumers or when migrating tenants. GCP is marginally cheaper on egress; AWS is the default because the rest of the stack already runs there.
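
To see how storage and egress trade off, the per-GB prices from the table can be plugged into a two-term formula. This is a back-of-envelope sketch: the prices are copied from this section, while the function name and the example volumes are illustrative.

```python
# Per-GB list prices copied from the table in this section (USD).
PRICES = {
    "aws":   {"storage": 0.023, "egress": 0.09},
    "gcp":   {"storage": 0.020, "egress": 0.08},
    "azure": {"storage": 0.018, "egress": 0.087},
}


def monthly_object_cost(platform: str, stored_gb: float, egress_gb: float) -> float:
    """Monthly cost = stored GB x storage rate + egress GB x egress rate."""
    rates = PRICES[platform]
    return stored_gb * rates["storage"] + egress_gb * rates["egress"]


if __name__ == "__main__":
    # Hypothetical tenant: 200 GB stored, 50 GB exported per month.
    for name in PRICES:
        print(name, round(monthly_object_cost(name, 200, 50), 2))
```

For the hypothetical 200 GB / 50 GB tenant, all three platforms land within about a dollar of each other, which is the sense in which storage differences are in the noise.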


Managed Container Options

The CATz Flask stack runs in containers (Flask 3 + Gunicorn). Managed container platforms differ on cold-start latency, persistent volume support, and operational overhead for a workload that is persistent-process (not bursty/serverless).

Platform | Offering | Fit for CATz Flask | Entry Cost
AWS | ECS Fargate, ECS on EC2 | Fargate is viable for the web tier; the TSP consumers (stream processors) run better on EC2 because they are persistent processes, not request-scoped. ECS on EC2 gives more control. | Fargate: ~$0.04048/vCPU-hour + $0.004445/GB-hour. For 0.5 vCPU / 1GB: ~$18/month. EC2 t3.micro: ~$7.50/month (bare instance).
GCP | Cloud Run | Cloud Run is request-scoped; persistent stream consumers require minimum instance = 1, which changes the cost profile. Not ideal for the TSP pipeline. Cloud Run Jobs work for batch workloads. | Cloud Run (0.5 vCPU / 512MB, always-on): ~$18/month.
Azure | Azure Container Apps | Similar to Cloud Run: consumption-based pricing favors bursty workloads. Persistent process model requires dedicated replicas. | Basic tier (0.5 vCPU / 1GB, dedicated): ~$13/month.

Verdict. For small deployments (single tenant, low transaction volume), EC2 with Docker Compose is the simplest and cheapest option — no managed container service overhead. ECS on EC2 becomes attractive when operating 10+ tenants and the operational leverage of managed scheduling is worth the overhead.
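
Fargate bills vCPU and memory separately, so the monthly figure for an always-on task is a two-term product. The sketch below uses the per-hour rates quoted in this section; the 730-hour month and the function name are assumptions for illustration.

```python
HOURS_PER_MONTH = 730         # assumption: 8,760 hours / 12
FARGATE_VCPU_HOUR = 0.04048   # USD per vCPU-hour, as quoted in this section
FARGATE_GB_HOUR = 0.004445    # USD per GB-hour, as quoted in this section


def fargate_monthly(vcpu: float, memory_gb: float,
                    hours: float = HOURS_PER_MONTH) -> float:
    """Always-on Fargate task cost: (vCPU rate + memory rate) x hours."""
    hourly = vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR
    return hourly * hours
```

For a 0.5 vCPU / 1GB task, fargate_monthly(0.5, 1) works out to roughly $18 per month, a little more than double the bare t3.micro figure.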


Current Production (GrowDirect-Managed Infrastructure)

GrowDirect's production stack runs on EC2 t3.micro (AWS us-east-1) under the growdirect.app domain. This is the GrowDirect-managed posture: one EC2 instance runs the shared Docker Compose stack (PostgreSQL, Valkey, Ollama, Flask apps), with per-tenant database isolation on a single PostgreSQL host.

This posture holds when:
- Total tenant count ≤ 10 (connection pooling and shared PostgreSQL remain manageable)
- Aggregate transaction volume ≤ ~2,000 transactions/hour (EC2 t3.micro CPU stays below 70% sustained)
- No tenant has compliance requirements (PCI DSS, SOC 2) that demand dedicated infrastructure
- The VAR or MSP model (Rapid POS, for example) doesn't require tenant-specific cloud credentials

When it breaks:
- Any tenant requires dedicated cloud infrastructure (their own AWS account, their own RDS instance)
- Aggregate load exceeds t3.micro capacity; the next tier is t3.small (~$15/month) or t3.medium (~$30/month)
- A third-party audit requires infrastructure isolation documentation that shared Docker Compose can't provide
- Sub-tenant MSP model: Rapid POS deploys Canary to their merchant base, and each merchant needs an isolated environment
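
The conditions above can be expressed as a simple check that reports which limits have been crossed. This is a sketch only: the FleetState type and its field names are hypothetical, and the thresholds are the ones listed in this section.

```python
from dataclasses import dataclass


@dataclass
class FleetState:
    """Hypothetical snapshot of the shared deployment; field names are illustrative."""
    tenant_count: int
    tx_per_hour: int
    needs_dedicated_compliance_infra: bool = False
    needs_tenant_cloud_credentials: bool = False


def shared_posture_violations(state: FleetState) -> list:
    """Return the reasons the shared EC2 posture no longer holds (empty = still fine)."""
    reasons = []
    if state.tenant_count > 10:
        reasons.append("tenant count exceeds 10")
    if state.tx_per_hour > 2000:
        reasons.append("aggregate volume exceeds ~2,000 transactions/hour")
    if state.needs_dedicated_compliance_infra:
        reasons.append("a tenant requires dedicated infrastructure for compliance")
    if state.needs_tenant_cloud_credentials:
        reasons.append("VAR/MSP model requires tenant-specific cloud credentials")
    return reasons
```

Returning the list of reasons rather than a single boolean keeps the output usable as the opening of a migration discussion with the affected tenant.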

Upgrade Path at Scale

Scale (tenants / stores) | Tenant Model | Recommended Architecture | Estimated Monthly Infra Cost
1–10 tenants, 1–50 stores total | Shared GrowDirect-managed | EC2 t3.small, shared PostgreSQL, Valkey, Ollama | ~$75–120/month shared
10–50 tenants, 50–500 stores total | Hybrid: shared control plane, isolated data planes | EC2 t3.medium control plane; RDS t3.micro per tenant for databases; shared Valkey/Ollama | ~$50–75/month per tenant + ~$100/month shared infra
50–200 tenants, 500–2,000 stores total | Fully isolated per tenant | ECS on EC2 per tenant, RDS per tenant, ElastiCache cluster shared or per-tenant | ~$80–150/month per tenant
200+ tenants, 2,000+ stores | Platform play | Multi-region, auto-scaling, dedicated SRE discipline | ~$50–80/month per tenant at scale (unit economics improve)

These are estimates. Actual costs vary materially based on transaction volume, data retention policy, and whether GPU inference is required.


Tenant Isolation Model

The current production pattern is per-tenant PostgreSQL databases on a shared host. This is the lowest-cost isolation mechanism and the right starting posture for a GrowDirect-managed deployment.

Per-Tenant Database Pattern (Current)

Each tenant gets:
- Dedicated PostgreSQL database (canary_<tenant_id>, cove_<tenant_id>)
- Database-level user with no cross-tenant privileges
- Valkey namespace isolation via key prefix convention (<tenant_id>:*)
- Application-layer tenant context enforcement (Flask middleware)
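
The naming conventions in this pattern are mechanical enough to derive from the tenant ID in one place. The helper below is a hedged sketch: the database and key-prefix formats follow the list above, but the function names and the tenant ID validation rule are assumptions, not the platform's actual code.

```python
import re

# Assumed ID format (lowercase letters, digits, underscore); not from the source.
# Validating here keeps the ID safe to embed in database names and key prefixes.
_TENANT_ID = re.compile(r"[a-z0-9_]{1,32}")


def tenant_resources(tenant_id: str) -> dict:
    """Derive per-tenant database names and the Valkey key prefix."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError("invalid tenant id: %r" % tenant_id)
    return {
        "canary_db": "canary_" + tenant_id,
        "cove_db": "cove_" + tenant_id,
        "valkey_prefix": tenant_id + ":",
    }


def tenant_key(tenant_id: str, key: str) -> str:
    """Namespace a Valkey key under the tenant's prefix."""
    return tenant_resources(tenant_id)["valkey_prefix"] + key
```

Routing every key through one helper such as tenant_key() means a missing prefix becomes a code-review finding rather than a cross-tenant data leak.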

Cost of 50 isolated tenant databases on each platform:

Platform | Service | Per-DB Cost | 50-tenant Total
AWS RDS | RDS t3.micro Multi-AZ shared cluster (Aurora Serverless v2) | Aurora Serverless v2: ~$0.12/ACU-hour; minimum 0.5 ACU. 50 databases on one Aurora instance: ~$43/month + $0.10/GB storage. | ~$100–200/month for 50 tenants (storage + ACU, light workload)
AWS RDS | RDS for PostgreSQL t3.micro per tenant (isolated instances) | ~$14/month/tenant × 50 = $700/month | $700/month (rarely the right choice at this scale)
GCP Cloud SQL | Shared instance, per-database isolation | Cloud SQL is instance-scoped: per-tenant billing is instance-based, not database-based. 50 databases on one db-g1-small: ~$25/month. Scale requires instance upgrades. | ~$50–200/month depending on instance tier
Azure Flexible Server | Shared instance per region | Same pattern as GCP: Flexible Server is instance-scoped. 50 databases on one B2s (2 vCPU, 4GB RAM): ~$55/month. | ~$55–150/month

Recommended. For GrowDirect-managed deployments at 10–50 tenants: shared PostgreSQL host (RDS db.t3.small or equivalent), per-tenant databases. Aurora Serverless v2 is the upgrade path when workload patterns are spiky — the ACU auto-scaling eliminates over-provisioning. Isolated RDS instances per tenant are only warranted when a specific tenant's compliance or SLA requirement demands it.
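
The gap between a shared cluster and isolated instances is easiest to see as arithmetic. The sketch below reproduces the two AWS figures from the table using the quoted rates; the 730-hour month and the function names are assumptions, and storage is left out for simplicity.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours / 12


def aurora_serverless_floor(min_acu: float = 0.5, acu_hour: float = 0.12,
                            hours: float = HOURS_PER_MONTH) -> float:
    """Minimum monthly compute cost of one shared Aurora Serverless v2 cluster."""
    return min_acu * acu_hour * hours


def isolated_instances(tenants: int, per_instance: float = 14.0) -> float:
    """Monthly cost of one small RDS instance per tenant."""
    return tenants * per_instance
```

With the defaults, the shared cluster floor is about $44 per month against $700 for 50 isolated instances, so isolation has to be justified by a compliance or SLA requirement rather than by cost.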


Decision Summary

Decision | Recommendation | When to Revisit
Cloud platform | AWS (default for GrowDirect-managed; follow client requirements for self-hosted) | Client mandate or regulatory constraint forces another platform
PostgreSQL | RDS PostgreSQL 17 or shared EC2 Docker for ≤10 tenants | Aurora Serverless v2 when tenant count exceeds 10
Valkey | In-container Valkey for ≤5 tenants; ElastiCache for higher scale | ElastiCache when stream consumer lag becomes observable
Ollama/inference | In-container CPU for ≤31 stores; EC2 g4dn Spot for batch embedding at scale | When embedding queue lags consistently or vault growth exceeds 10K chunks/month
Object storage | S3 Standard | No change unless multi-cloud mandate
Container runtime | Docker Compose on EC2 for ≤10 tenants; ECS on EC2 for >10 tenants | When operational overhead of Docker Compose exceeds ECS management cost
Tenant isolation | Per-database shared host | Per-instance isolation when compliance or SLA demands it