Restructuring Report: GeoCrop Sovereign MLOps Platform

This document summarizes the end-to-end transformation of the GeoCrop project into a professional, GitOps-driven MLOps platform on K3s.

1. Foundation & Backup

  • Image Migration: Identified 31 unique container images running in the cluster, then pulled, re-tagged, and pushed each one to the frankchine/ namespace on Docker Hub so that every dependency is held under the project's own account.
  • Project Renaming: Restructured the project from a single application folder into a unified monorepo ready to showcase in a professional portfolio.
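The image migration reduces to a pull, tag, push sequence per image. The sketch below only plans those commands. It assumes a naming scheme (keep the last path segment of the source repository and the original tag, under frankchine/) that the report does not actually specify, and the helper names are hypothetical.

```python
from dataclasses import dataclass

TARGET_NAMESPACE = "frankchine"  # Docker Hub namespace named in the report


@dataclass(frozen=True)
class ImageRef:
    repository: str  # e.g. "quay.io/argoproj/argocd"
    tag: str         # e.g. "v2.10.0"


def parse_image(ref: str) -> ImageRef:
    """Split 'registry/path/name:tag' into repository and tag (tag defaults to 'latest')."""
    if "@" in ref:
        # Digest-pinned references (name@sha256:...) are out of scope for this sketch.
        raise ValueError(f"digest references not handled: {ref}")
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon after the last slash starts the tag; one before it belongs to a registry port.
    if last_colon > last_slash:
        return ImageRef(ref[:last_colon], ref[last_colon + 1:])
    return ImageRef(ref, "latest")


def retarget(ref: str, namespace: str = TARGET_NAMESPACE) -> str:
    """Hypothetical mapping: keep only the final path segment and the original tag."""
    image = parse_image(ref)
    name = image.repository.rsplit("/", 1)[-1]
    return f"{namespace}/{name}:{image.tag}"


def migration_commands(refs: list[str]) -> list[list[str]]:
    """Return docker pull/tag/push argv lists for each unique image, in a stable order."""
    seen: dict[str, str] = {}
    commands: list[list[str]] = []
    for ref in sorted(set(refs)):
        target = retarget(ref)
        # Two upstream images with the same short name would overwrite each other.
        if target in seen:
            raise ValueError(f"{ref} and {seen[target]} both map to {target}")
        seen[target] = ref
        commands.append(["docker", "pull", ref])
        commands.append(["docker", "tag", ref, target])
        commands.append(["docker", "push", target])
    return commands
```

Keeping planning separate from execution makes it possible to review all 31 mappings, and catch collisions such as two different upstream `redis` images, before anything is pushed; each argv list can then be run with `subprocess.run(cmd, check=True)`.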

2. Infrastructure as Code (Phase 1)

  • Terraform Management: Established Terraform as the authority for cluster namespaces (geocrop, argocd).
  • Gitea Deployment: Launched a self-hosted Gitea instance (git.techarvest.co.zw) as the central source of truth and CI/CD hub.
  • Database Isolation: Replaced the heavy Supabase stack with a lightweight standalone PostGIS instance on port 5433, ensuring low RAM usage and full spatial capabilities.
  • MLOps Tooling:
    • MLflow: Live at ml.techarvest.co.zw, connected to PostGIS for experiment tracking.
    • JupyterLab: Live at lab.techarvest.co.zw with 20Gi persistent storage for interactive data science.
  • GitOps Orchestration: Deployed ArgoCD to manage the lifecycle of all services via Git.
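MLflow's tracking server accepts a SQLAlchemy database URL through `mlflow server --backend-store-uri`, which is one way to wire it to the standalone PostGIS instance. A minimal sketch, assuming the `psycopg2` driver; only port 5433 comes from the report, and the user, password, host, and database name are placeholders supplied by the caller.

```python
from urllib.parse import quote


def mlflow_backend_uri(user: str, password: str, host: str, database: str,
                       port: int = 5433, driver: str = "psycopg2") -> str:
    """Build a SQLAlchemy-style PostgreSQL URL for MLflow's backend store.

    Port 5433 matches the standalone PostGIS instance described above;
    every other value is supplied by the caller.
    """
    # Percent-encode credentials so characters such as '@', ':' or '/' cannot break the URL.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    safe_db = quote(database, safe="")
    return f"postgresql+{driver}://{safe_user}:{safe_password}@{host}:{port}/{safe_db}"


def mlflow_server_args(backend_uri: str, host: str = "0.0.0.0", port: int = 5000) -> list[str]:
    """argv for starting the tracking server against that backend store."""
    return ["mlflow", "server", "--backend-store-uri", backend_uri,
            "--host", host, "--port", str(port)]
```

In the cluster the password would normally come from a Kubernetes Secret rather than a literal, and the assembled URI would be passed to the container as an argument or environment variable.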

3. Frontend & UX Strategy (Phase 2)

  • Zero-Downtime Migration: Maintained the live portfolio page at portfolio.techarvest.co.zw throughout the entire transition.
  • Parallel Loading: Updated the React MapComponent to support a dual-layer loading strategy:
    1. Instant Context: Immediate rendering of Dynamic World baselines from MinIO via TiTiler.
    2. Async Overlay: Background polling for high-resolution ML predictions.
  • GitOps Integration: Moved all Kubernetes manifests to k8s/base/ and configured ArgoCD to track the Gitea repository.
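The dual-layer strategy is a render-then-poll pattern. The real MapComponent is React code; the sketch below restates its control flow in Python to keep these examples in one language. The status payload shape (`state`, `tile_url`), the back-off parameters, and the `render` callback are assumptions, not the component's actual interface.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical: returns e.g. {"state": "pending"} or {"state": "done", "tile_url": "..."}.
StatusFetcher = Callable[[], Awaitable[dict]]


async def poll_prediction(fetch_status: StatusFetcher, interval: float = 1.0,
                          max_interval: float = 10.0, timeout: float = 300.0) -> Optional[str]:
    """Poll until the prediction job reports 'done'; return its tile URL, or None on failure/timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    delay = interval
    while loop.time() < deadline:
        status = await fetch_status()
        if status.get("state") == "done":
            return status["tile_url"]
        if status.get("state") == "failed":
            return None
        await asyncio.sleep(delay)
        # Exponential back-off, capped, so slow jobs don't flood the API.
        delay = min(delay * 2, max_interval)
    return None


async def load_map(baseline_url: str, fetch_status: StatusFetcher,
                   render: Callable[[str, str], None], **poll_kwargs) -> None:
    # 1. Instant context: draw the baseline layer without waiting on the ML job.
    render("baseline", baseline_url)
    # 2. Async overlay: add the prediction layer only once it is available.
    overlay = await poll_prediction(fetch_status, **poll_kwargs)
    if overlay is not None:
        render("prediction", overlay)
```

Because the baseline is drawn before the first status request, the user sees a usable map immediately, and a failed or slow prediction job only means the overlay never appears.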

4. Backend Automation & Training (Phase 3)

  • CI/CD Pipeline:
    • Deployed a Gitea Actions runner with Docker-in-Docker (DinD) support.
    • Created a workflow to automatically build and push Worker/API images to Docker Hub on every commit.
  • Argo Workflows: Installed to support future automated retraining pipelines.
  • Training Workflow:
    • Created a reusable MinIOStorageClient for high-performance, in-memory dataset loading.
    • Implemented a training template (train_v2.py) that logs to MLflow, saves models to MinIO, and dynamically generates tailored inference scripts.
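The report says train_v2.py generates tailored inference scripts but not how. One plausible approach is to render a source template with the run's metadata, so each script is pinned to its own model URI and feature order. The template, the example model name `maize_rf`, the `s3://` URI, and the feature names used below are all hypothetical.

```python
from string import Template
from typing import Sequence

# Hypothetical template; the real train_v2.py may generate something quite different.
INFERENCE_TEMPLATE = Template('''\
# Auto-generated by the training template; do not edit by hand.
import json
import sys

MODEL_NAME = $model_name
RUN_ID = $run_id
MODEL_URI = $model_uri
FEATURES = $features


def main() -> None:
    rows = json.load(sys.stdin)
    # Select the training-time feature columns, in training order.
    matrix = [[row[name] for name in FEATURES] for row in rows]
    # Loading the model from MODEL_URI and predicting is omitted from this sketch.
    print(json.dumps({"model": MODEL_NAME, "run_id": RUN_ID, "rows": len(matrix)}))


if __name__ == "__main__":
    main()
''')


def render_inference_script(model_name: str, run_id: str, model_uri: str,
                            features: Sequence[str]) -> str:
    """Fill the template with one training run's metadata."""
    if not features:
        raise ValueError("at least one feature is required")
    # repr() turns each value into a valid Python literal, so quotes or newlines
    # in a name cannot break the generated source.
    return INFERENCE_TEMPLATE.substitute(
        model_name=repr(model_name),
        run_id=repr(run_id),
        model_uri=repr(model_uri),
        features=repr(list(features)),
    )
```

Freezing the feature list into the generated file guards against a common failure where inference code silently reorders or drops columns relative to training.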

5. Troubleshooting & Stability

  • Network Resolution: Diagnosed and bypassed a persistent egress blockage on node vmi3047336 by migrating the JupyterLab workspace to node vmi3045103.
  • Database Connectivity: Resolved MLflow's connection failures to PostGIS by switching to the official image with the correct psycopg2 driver.
  • Cluster Balance: Managed pod placement deliberately so that the control-plane node keeps headroom for other host services such as CloudPanel and the mail server.
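Both placement fixes can be expressed as required node affinity on a workload's pod template. A sketch, assuming the control-plane node carries the `node-role.kubernetes.io/control-plane` label that K3s applies to server nodes by default, and using the standard `kubernetes.io/hostname` label to pin a pod to a named node such as vmi3045103; the function name is hypothetical.

```python
import json
from typing import Any, Optional

# Label K3s applies to server (control-plane) nodes; assumed present on this cluster.
CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"
# Well-known label every node carries with its hostname.
HOSTNAME_LABEL = "kubernetes.io/hostname"


def placement_patch(pin_to_node: Optional[str] = None) -> dict[str, Any]:
    """Build a Deployment patch that keeps pods off control-plane nodes.

    If pin_to_node is given, the pod is additionally restricted to that node.
    Expressions inside one nodeSelectorTerm are ANDed together.
    """
    expressions: list[dict[str, Any]] = [
        {"key": CONTROL_PLANE_LABEL, "operator": "DoesNotExist"},
    ]
    if pin_to_node:
        expressions.append(
            {"key": HOSTNAME_LABEL, "operator": "In", "values": [pin_to_node]}
        )
    node_affinity = {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [{"matchExpressions": expressions}]
        }
    }
    return {"spec": {"template": {"spec": {"affinity": {"nodeAffinity": node_affinity}}}}}


if __name__ == "__main__":
    print(json.dumps(placement_patch("vmi3045103"), indent=2))
```

The JSON output can be applied with `kubectl patch deployment <name> -n geocrop -p '<json>'`, but under GitOps the same `affinity` block belongs in the manifest under k8s/base/, since ArgoCD may revert a live patch when self-heal is enabled. If the pinned node were itself the control-plane node, the two expressions could never both match and the pod would stay Pending.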

6. Portfolio & Recruiter Experience (Phase 5)

  • Technical Deep-Dive: Implemented a comprehensive TechnicalDocs suite within the frontend.
  • Interactive Architecture: Visualized the system with a custom SVG architecture diagram.
  • Transparent Engineering: Documented trade-offs (e.g., Gitea vs GitLab), resource strategies, and MLOps workflows.
  • Live Observability: Integrated a service health dashboard and links to monitoring endpoints (Grafana, Uptime Kuma).
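A health dashboard needs something that probes each service and reports its state. A minimal Python sketch, assuming a plain HTTP GET against each service's root URL is an acceptable liveness signal; `check_services` and `http_probe` are hypothetical names, and only the hostnames come from this report. Grafana and Uptime Kuma are left out because the report does not list their URLs.

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from typing import Callable, Dict
from urllib.request import urlopen

# Hostnames taken from this report; probing the root URL is a crude liveness check.
SERVICES: Dict[str, str] = {
    "gitea": "https://git.techarvest.co.zw",
    "mlflow": "https://ml.techarvest.co.zw",
    "jupyterlab": "https://lab.techarvest.co.zw",
    "portfolio": "https://portfolio.techarvest.co.zw",
}


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Healthy if the final response (after redirects) has a status below 400."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, HTTPException, ValueError):
        # URLError, HTTPError (4xx/5xx) and timeouts are OSError subclasses;
        # HTTPException covers malformed responses; ValueError covers malformed URLs.
        return False


def check_services(services: Dict[str, str],
                   probe: Callable[[str], bool] = http_probe,
                   max_workers: int = 8) -> Dict[str, str]:
    """Probe all services concurrently and map each name to 'up' or 'down'."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(probe, services.values()))
    return {name: ("up" if ok else "down") for name, ok in zip(services, results)}
```

Threads suit this job because each probe spends nearly all its time waiting on the network. Where a service exposes a dedicated health endpoint, probing that instead of the homepage gives a more precise signal.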

📈 Current Status: COMPLETED


Report generated on: Thursday, April 23, 2026.