# Sovereign MLOps Platform: LULC Crop-Mapping Portfolio

## Overview

This document outlines the execution plan for restructuring the GeoCrop platform into a GitOps-driven, self-hosted MLOps platform on K3s. It replaces the full Supabase stack with a lightweight Postgres+PostGIS standalone container to conserve RAM while meeting all spatial querying requirements.

## Phased Execution Strategy

### Phase 1: Infrastructure Setup (The Foundation)

1. **Terraform (Namespaces & Quotas):** Apply Terraform to configure the K3s namespace (`geocrop`) with explicit ResourceQuotas. Lightweight services (API, Web) get 512 MB memory limits, while the ML Worker and Jupyter instances are allocated 2 GB to prevent out-of-memory (OOM) errors.
2. **Database (Postgres + PostGIS):** Deploy PostGIS as a standalone StatefulSet on port 5433 (`db.techarvest.co.zw`), fully isolated from other apps.
3. **MLOps Tools (MLflow & Jupyter):**
   - Deploy MLflow (`ml.techarvest.co.zw`) backed by the new PostGIS DB and the existing MinIO artifact store.
   - Deploy a Jupyter Data Science workspace (`lab.techarvest.co.zw`) configured to pull datasets directly from the MinIO `geocrop-datasets` bucket rather than from node-local storage, which keeps scheduling node-agnostic.
4. **GitOps Tools (Gitea & ArgoCD):** Initialize Gitea (`git.techarvest.co.zw`) and ArgoCD (`cd.techarvest.co.zw`) to take over cluster management.
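The memory figures in step 1 imply a minimum budget for the namespace. The sketch below adds them up under several assumptions that are not stated in this plan: one replica per service, "512MB" and "2GB" read as the Kubernetes quantities `512Mi` and `2Gi`, the ML Worker and Jupyter each receiving their own 2 GB, and illustrative service names.

```python
from typing import Dict, Optional

# Hypothetical per-pod memory limits in MiB. The service names and the
# one-limit-per-service layout are assumptions made for illustration.
MEMORY_LIMITS_MIB: Dict[str, int] = {
    "api": 512,
    "web": 512,
    "ml-worker": 2048,
    "jupyter": 2048,
}


def namespace_memory_budget_mib(
    limits: Dict[str, int],
    replicas: Optional[Dict[str, int]] = None,
) -> int:
    """Sum per-pod memory limits, weighted by replica count (default 1)."""
    replicas = replicas or {}
    return sum(limit * replicas.get(name, 1) for name, limit in limits.items())


def to_quantity(mib: int) -> str:
    """Format MiB as a Kubernetes quantity, using Gi when it divides evenly."""
    return f"{mib // 1024}Gi" if mib % 1024 == 0 else f"{mib}Mi"


if __name__ == "__main__":
    total = namespace_memory_budget_mib(MEMORY_LIMITS_MIB)
    print(to_quantity(total))
```

Any other workload scheduled in the same namespace would have to be added to the table before the result is used for the quota's `limits.memory` value.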
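For the MLflow deployment in step 3, the server needs a backend store URI that points at the PostGIS instance and an artifact root in MinIO. The following is a minimal sketch of assembling that command line. The database name, credentials, and artifact bucket are placeholders rather than values from this plan; only the host `db.techarvest.co.zw` and port 5433 come from step 2.

```python
from typing import List
from urllib.parse import quote


def backend_store_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def mlflow_server_args(backend_uri: str, artifact_root: str) -> List[str]:
    """Return the argv for `mlflow server` with a database backend and an S3-style artifact root."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_uri,
        "--default-artifact-root", artifact_root,
        "--host", "0.0.0.0",
        "--port", "5000",
    ]


if __name__ == "__main__":
    # Placeholder credentials, database name, and bucket (assumptions).
    uri = backend_store_uri("mlflow", "change-me", "db.techarvest.co.zw", 5433, "mlflow")
    print(" ".join(mlflow_server_args(uri, "s3://mlflow-artifacts")))
```

Because MinIO is S3-compatible, the server and clients would also need `MLFLOW_S3_ENDPOINT_URL` set to the MinIO endpoint, along with the usual access-key variables.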
### Phase 2: Frontend (React/Vite) Setup & Testing

1. **Zero-Downtime Requirement:** The current live web page at `portfolio.techarvest.co.zw` MUST remain active and untouched during this transition because it is actively receiving traffic from job applications.
2. **Parallel Loading Strategy:** Configure the new React frontend components to fetch and render Dynamic World (DW) baselines (2015-2025) via the TiTiler service (`tiles.portfolio.techarvest.co.zw`) immediately, without waiting for ML inference to finish.
3. **ArgoCD Deployment:** Commit the new frontend manifests to the Gitea repository and sync via ArgoCD, routing traffic carefully so that the live welcome page is not disrupted.
4. **Verification:** Test that the new frontend components load and render TiTiler COGs immediately, with no dependency on the backend.
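The baseline layers in step 2 come down to one tile URL template per year. The sketch below builds those templates, assuming an HTTPS endpoint, a hypothetical object layout for the DW COGs, and the `/cog/tiles/{tileMatrixSetId}/{z}/{x}/{y}` route with a `url` query parameter; the exact route depends on the TiTiler version deployed, so it should be checked against the running service.

```python
from typing import Dict
from urllib.parse import urlencode

# Host taken from the plan; the https scheme is an assumption.
TITILER_BASE = "https://tiles.portfolio.techarvest.co.zw"


def dw_cog_url(year: int) -> str:
    """Hypothetical location of the Dynamic World baseline COG for a year."""
    return f"s3://geocrop-baselines/dw/{year}.tif"


def tile_url_template(cog_url: str, base: str = TITILER_BASE) -> str:
    """Return an XYZ template; {z}/{x}/{y} are left for the map client to fill in."""
    query = urlencode({"url": cog_url})
    return f"{base}/cog/tiles/WebMercatorQuad/{{z}}/{{x}}/{{y}}?{query}"


def baseline_templates(first_year: int = 2015, last_year: int = 2025) -> Dict[int, str]:
    """Map each baseline year to its tile URL template."""
    return {
        year: tile_url_template(dw_cog_url(year))
        for year in range(first_year, last_year + 1)
    }


if __name__ == "__main__":
    for year, template in baseline_templates().items():
        print(year, template)
```

Since these templates depend only on the year, the frontend can render them before any job is submitted, which is what makes the baseline independent of the backend.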
### Phase 3: Backend (API + ML Worker) Setup & CI/CD

1. **Gitea Actions (CI/CD):** Implement `.gitea/workflows/build-push.yaml` to automatically build `apps/worker/Dockerfile` and `apps/api/Dockerfile` and push the resulting images to Docker Hub (`frankchine/geocrop-worker:latest`, etc.).
2. **ArgoCD Deployment:** Update the backend Kubernetes manifests in the GitOps repo to pull from `frankchine/...`, then sync ArgoCD.
3. **Worker Tuning:** Ensure the ML worker is configured to use the standalone PostGIS database (if spatial logging is needed) and MinIO for models and results.
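One way to make the worker tuning in step 3 explicit is to load all connection settings from the environment at startup and fail fast when a required one is missing. This is only a sketch: the environment variable names and the `WorkerConfig` type are hypothetical, and the database URL is treated as optional because the plan only calls for PostGIS when spatial logging is needed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; the real worker may use different ones.
REQUIRED_VARS = ("MINIO_ENDPOINT", "MINIO_BUCKET", "REDIS_URL")


@dataclass(frozen=True)
class WorkerConfig:
    minio_endpoint: str
    minio_bucket: str
    redis_url: str
    database_url: Optional[str]  # None when spatial logging is disabled


def load_worker_config(env: Mapping[str, str] = os.environ) -> WorkerConfig:
    """Read worker settings from the environment, raising if a required one is absent."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return WorkerConfig(
        minio_endpoint=env["MINIO_ENDPOINT"],
        minio_bucket=env["MINIO_BUCKET"],
        redis_url=env["REDIS_URL"],
        database_url=env.get("DATABASE_URL") or None,
    )
```

Failing at startup surfaces a misconfigured manifest in the pod logs right after an ArgoCD sync, rather than at the first job.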
### Phase 4: End-to-End System Testing

1. **Trigger Job:** Submit an area of interest (AOI) via the React frontend.
2. **Verify Instant UX:** Ensure the DW baseline renders immediately.
3. **Verify Inference:** Monitor the Redis queue and ML Worker logs to confirm that the worker pulls STAC data, runs the XGBoost/Ensemble model, and writes the output COG to MinIO.
4. **Verify Result Overlay:** Ensure the frontend polls the API and overlays the high-resolution LULC prediction once inference completes.
5. **Verify MLflow:** Check `ml.techarvest.co.zw` to confirm the run metrics were logged successfully.
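The polling behaviour checked in step 4 can be exercised from a test script as well as from the browser. The sketch below shows the loop in Python with an injectable status function, so it can run against the real API or a stub. The `state` field and its values (`done`, `failed`) are assumptions about the job status response, not the API's documented contract.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = ("done", "failed")  # assumed status values


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_status repeatedly until the job reaches a terminal state or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)


if __name__ == "__main__":
    # Stubbed responses standing in for the job status endpoint.
    responses = iter([{"state": "queued"}, {"state": "running"}, {"state": "done"}])
    print(poll_until_done(lambda: next(responses), interval_s=0.01))
```

In the React frontend the same loop would typically be written with `fetch` and a timer; the point of the sketch is the terminal-state check and the timeout, which keep a stuck job from polling forever.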