Sovereign MLOps Platform: LULC Crop-Mapping Portfolio

Overview

This document outlines the execution plan for restructuring the GeoCrop platform into a GitOps-driven, self-hosted MLOps platform on K3s. It replaces the full Supabase stack with a lightweight Postgres+PostGIS standalone container to conserve RAM while meeting all spatial querying requirements.

Phased Execution Strategy

Phase 1: Infrastructure Setup (The Foundation)

  1. Terraform (Namespaces & Quotas): Apply Terraform to configure the K3s namespace (geocrop) with explicit ResourceQuotas. We will apply 512 MB memory limits to lightweight services (API, Web) and allocate 2 GB each to the ML Worker and Jupyter instances to prevent OOM errors.
  2. Database (Postgres + PostGIS): Deploy a standalone StatefulSet for PostGIS on port 5433 (db.techarvest.co.zw), fully isolated from other apps.
  3. MLOps Tools (MLflow & Jupyter):
    • Deploy MLflow (ml.techarvest.co.zw) backed by the new PostGIS DB and the existing MinIO artifact store.
    • Deploy a Jupyter Data Science workspace (lab.techarvest.co.zw) configured to pull datasets directly from the MinIO geocrop-datasets bucket, ensuring node-agnostic scheduling.
  4. GitOps Tools (Gitea & ArgoCD): Initialize Gitea (git.techarvest.co.zw) and ArgoCD (cd.techarvest.co.zw) to take over cluster management.
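
The memory figures in step 1 can be sanity-checked before they are written into Terraform. The following is a minimal sketch, not the project's actual configuration: it assumes 512 MB and 2 GB map to 512Mi and 2Gi, that the quota covers only the four services the plan names, and that the quota object name is free to choose. PostGIS, MLflow, Gitea and ArgoCD are not sized in the plan, so they would need extra headroom.

```python
# Per-service memory limits in MiB. The four services and their sizes come from
# the plan; reading 512MB as 512Mi and 2GB as 2Gi is an assumption.
MEMORY_LIMITS_MIB = {
    "api": 512,
    "web": 512,
    "worker": 2048,
    "jupyter": 2048,
}


def container_resources(service: str) -> dict:
    """Return the `resources` stanza for one container of the given service."""
    return {"limits": {"memory": f"{MEMORY_LIMITS_MIB[service]}Mi"}}


def namespace_quota(namespace: str = "geocrop", headroom_mib: int = 0) -> dict:
    """Build a ResourceQuota manifest capping the sum of all memory limits.

    `headroom_mib` is a hypothetical allowance for services the plan does not
    size (for example PostGIS or MLflow running in the same namespace).
    """
    total = sum(MEMORY_LIMITS_MIB.values()) + headroom_mib
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # The object name is illustrative, not taken from the plan.
        "metadata": {"name": f"{namespace}-memory", "namespace": namespace},
        "spec": {"hard": {"limits.memory": f"{total}Mi"}},
    }


if __name__ == "__main__":
    print(namespace_quota()["spec"]["hard"])
```

With one replica of each service, the four limits add up to 5120Mi, so a namespace quota below that would block scheduling of at least one pod.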

Phase 2: Frontend (React/Vite) Setup & Testing

  1. Zero-Downtime Requirement: The current live web page at portfolio.techarvest.co.zw MUST remain active and untouched during this transition as it is actively receiving traffic from job applications.
  2. Parallel Loading Strategy: Configure the new React frontend components to instantly fetch and render Dynamic World (DW) baselines (2015-2025) via the TiTiler service (tiles.portfolio.techarvest.co.zw) while awaiting ML inference.
  3. ArgoCD Deployment: Commit the new frontend manifests to the Gitea repository and sync via ArgoCD, carefully routing traffic to avoid disrupting the live welcome page.
  4. Verification: Test that the new frontend components successfully load and render TiTiler COGs instantly without backend dependency.
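
For the verification step, it may help to generate the tile URLs the frontend is expected to request and check them independently of the ML backend. This is a sketch under stated assumptions: the `https` scheme, the bucket and key layout of the DW COGs, and the use of TiTiler's `/cog/tiles/WebMercatorQuad/{z}/{x}/{y}` route are not specified in the plan, and route layout can differ between TiTiler versions.

```python
from urllib.parse import urlencode

# Host taken from the plan; the https scheme is an assumption.
TITILER_BASE = "https://tiles.portfolio.techarvest.co.zw"


def dw_cog_url(year: int) -> str:
    """Return the COG location for one Dynamic World baseline year.

    The bucket name and key layout are hypothetical placeholders.
    """
    if not 2015 <= year <= 2025:
        raise ValueError(f"DW baselines cover 2015-2025, got {year}")
    return f"s3://geocrop-baselines/dw/{year}.tif"


def tile_url(year: int, z: int, x: int, y: int) -> str:
    """Build a TiTiler PNG tile URL for the given year and XYZ tile index."""
    query = urlencode({"url": dw_cog_url(year)})
    return f"{TITILER_BASE}/cog/tiles/WebMercatorQuad/{z}/{x}/{y}.png?{query}"


if __name__ == "__main__":
    print(tile_url(2020, 10, 590, 580))
```

Because the URL depends only on the year and tile index, the baseline layer can be requested as soon as the AOI is drawn, with no call to the API or the worker.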

Phase 3: Backend (API + ML Worker) Setup & CI/CD

  1. Gitea Actions (CI/CD): Implement .gitea/workflows/build-push.yaml to automatically build apps/worker/Dockerfile and apps/api/Dockerfile, and push them to Docker Hub (frankchine/geocrop-worker:latest, etc.).
  2. ArgoCD Deployment: Update backend Kubernetes manifests in the GitOps repo to pull from frankchine/.... Sync ArgoCD.
  3. Worker Tuning: Ensure the ML worker is correctly configured to use the standalone PostGIS database (if spatial logging is needed) and MinIO for models/results.
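
The worker tuning step can be sketched as a small settings object that is read from environment variables, so that the same image runs unchanged under ArgoCD. Every variable name, the default database name, and the results bucket below are hypothetical; only the database host and port 5433 come from the plan. PostGIS is treated as optional, matching the "if spatial logging is needed" condition.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class WorkerSettings:
    minio_endpoint: str
    results_bucket: str
    postgis_dsn: Optional[str]  # None means spatial logging is disabled

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "WorkerSettings":
        """Read worker settings from environment variables (names are illustrative)."""
        dsn = None
        if env.get("GEOCROP_SPATIAL_LOGGING", "false").lower() == "true":
            # Percent-encode credentials so special characters do not break the DSN.
            user = quote(env["POSTGIS_USER"], safe="")
            password = quote(env["POSTGIS_PASSWORD"], safe="")
            host = env.get("POSTGIS_HOST", "db.techarvest.co.zw")
            port = env.get("POSTGIS_PORT", "5433")  # port from the plan
            database = env.get("POSTGIS_DB", "geocrop")  # hypothetical name
            dsn = f"postgresql://{user}:{password}@{host}:{port}/{database}"
        return cls(
            minio_endpoint=env["MINIO_ENDPOINT"],
            results_bucket=env.get("RESULTS_BUCKET", "geocrop-results"),
            postgis_dsn=dsn,
        )


if __name__ == "__main__":
    print(WorkerSettings.from_env({"MINIO_ENDPOINT": "minio:9000"}))
```

Failing fast on a missing `MINIO_ENDPOINT` at startup is a deliberate choice here: a misconfigured pod crashes immediately instead of failing on the first job.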

Phase 4: End-to-End System Testing

  1. Trigger Job: Submit an AOI via the React frontend.
  2. Verify Instant UX: Ensure the DW baseline renders immediately.
  3. Verify Inference: Monitor the Redis queue and ML Worker logs to confirm that the worker pulls STAC data, runs the XGBoost/Ensemble model, and writes the output COG to MinIO.
  4. Verify Result Overlay: Ensure the frontend polls the API and seamlessly overlays the high-resolution LULC prediction once complete.
  5. Verify MLflow: Check ml.techarvest.co.zw to confirm the run metrics were logged successfully.
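
The polling behaviour checked in the result-overlay step can be exercised from a smoke-test script as well as from the browser. The sketch below assumes a job status payload with a `status` field (`queued`, `running`, `done`, `failed`) and a `result_url` field; these names are hypothetical, since the plan does not define the API response. The status fetcher is passed in as a function so that the loop can be tested without a live API.

```python
import time
from typing import Callable


def wait_for_result(
    fetch_status: Callable[[], dict],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it finishes and return the location of the output COG.

    `fetch_status` should return the job payload as a dict; the field names
    used here are assumptions about the API, not its documented contract.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job["status"] == "done":
            return job["result_url"]
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "inference failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated API responses standing in for real HTTP calls.
    responses = iter([
        {"status": "queued"},
        {"status": "running"},
        {"status": "done", "result_url": "s3://example-bucket/result.tif"},
    ])
    print(wait_for_result(lambda: next(responses), sleep=lambda _: None))
```

In a real run, `fetch_status` would wrap an HTTP GET against the job endpoint, and the returned location would be handed to the tile service in the same way as the DW baseline.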