GeoCrop - Sovereign MLOps Platform

GeoCrop is a production-grade, self-hosted ML platform for crop-type classification in Zimbabwe. It uses Sentinel-2 satellite imagery (DEA STAC), computes 51 spectral/phenological features, and applies ensemble ML models to generate high-resolution Cloud Optimized GeoTIFFs (COGs).

🚀 System Architecture

The platform follows a Sovereign MLOps philosophy: the entire lifecycle, from source control and experiment tracking to inference and GitOps, is hosted on a private K3s cluster.

  • Frontend: React 19 + OpenLayers/Leaflet (Portfolio & App).
  • Backend: FastAPI REST API + Redis/RQ Job Queue.
  • ML Engine: Python Inference Workers + XGBoost/CatBoost/LightGBM Ensembles.
  • Infrastructure:
    • GitOps: ArgoCD (CD) + Gitea (Source Control & CI).
    • Experiment Tracking: MLflow (Postgres/MinIO backend).
    • Development: JupyterLab (integrated with MinIO).
    • Storage: MinIO (S3-compatible) for datasets, models, and results.
    • Database: Postgres + PostGIS for spatial metadata and app state.

🛠️ Building and Running

Development

# Frontend Development
cd apps/web && npm install && npm run dev

# API Development
cd apps/api && pip install -r requirements.txt
uvicorn main:app --reload

# Worker Development
cd apps/worker && pip install -r requirements.txt
python worker.py --worker

GitOps Workflow (CI/CD)

  1. Push code to Gitea (git.techarvest.co.zw).
  2. CI: Gitea Actions builds images with Kaniko (no Docker-in-Docker). Images are tagged with the Git commit SHA.
  3. CD: The CI pipeline updates k8s/base/kustomization.yaml with the new SHA tags. ArgoCD detects the change and reconciles the cluster state.
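
The tag update in step 3 can be sketched as follows. This is a minimal illustration, not the pipeline's actual script: it assumes k8s/base/kustomization.yaml uses the Kustomize `images` field and has already been parsed into a dictionary (for example with a YAML library). The function name `set_image_tags` and the image names are hypothetical.

```python
import copy


def set_image_tags(kustomization: dict, commit_sha: str) -> dict:
    """Return a copy of a parsed kustomization with every image pinned to commit_sha.

    Assumes the Kustomize `images` list is in use; each entry keeps its
    `name` and only `newTag` is rewritten.
    """
    updated = copy.deepcopy(kustomization)  # leave the caller's data untouched
    for image in updated.get("images", []):
        image["newTag"] = commit_sha
    return updated


# Hypothetical parsed manifest with two images.
parsed = {
    "resources": ["api.yaml", "worker.yaml"],
    "images": [
        {"name": "geocrop-api", "newTag": "0000000"},
        {"name": "geocrop-worker", "newTag": "0000000"},
    ],
}
pinned = set_image_tags(parsed, "3f2a9c1")
```

Because the only change committed back to Git is the tag value, ArgoCD sees an ordinary manifest diff and reconciles it like any other commit.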

🛑 Engineering Policy

  • STRICT GitOps: All approved changes MUST be committed and pushed to Gitea.
  • NO Manual Deploys: All deployments MUST occur ONLY through the CI/CD pipeline via ArgoCD.
  • NO Hotfixes: Direct manual modification of running containers or K8s resources is forbidden.
  • Deterministic Tagging: Never rely on :latest for production rollouts; always use SHA-based tags managed by CI.
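
The Deterministic Tagging rule lends itself to an automated check. The sketch below is one possible guard, under the assumption that a deterministic tag is a 7 to 40 character lowercase hex string (a short or full Git SHA). The helper name `has_sha_tag` and the registry host are hypothetical, and digest references (`@sha256:...`) are not handled.

```python
import re

# Assumption: a deterministic tag looks like a short or full Git commit SHA.
_SHA_TAG = re.compile(r"[0-9a-f]{7,40}")


def has_sha_tag(image: str) -> bool:
    """Return True if the image reference ends in a SHA-like tag.

    Only the last path segment is inspected, so a registry port such as
    `localhost:5000/...` is not mistaken for a tag.
    """
    last_segment = image.rsplit("/", 1)[-1]
    _name, sep, tag = last_segment.partition(":")
    return bool(sep) and _SHA_TAG.fullmatch(tag) is not None


print(has_sha_tag("registry.example/geocrop-api:3f2a9c1"))  # True
print(has_sha_tag("registry.example/geocrop-api:latest"))   # False
print(has_sha_tag("localhost:5000/geocrop-api"))            # False (no tag)
```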

Kubernetes Deployment

# Manual apply (only where ArgoCD auto-sync is not in use; the Engineering Policy above forbids manual deploys)
kubectl apply -k k8s/base/

📐 Development Conventions

Critical Patterns (Non-Obvious)

  • Kubernetes Only: Focus exclusively on resources managed by Kubernetes (pods, services, ingresses, etc.). NEVER modify host-level Nginx configurations (/etc/nginx/), CloudPanel settings, or system services outside the cluster.
  • AOI Format: Always use a (lon, lat, radius_m) tuple. Longitude comes first.
  • Season Window: September 1 to May 31 (Zimbabwe summer season). year=2022 means 2022-09-01 to 2023-05-31.
  • Feature Order: FEATURE_ORDER_V1 (51 features) is immutable; changing it breaks model compatibility.
  • Storage Contract: Use the geocrop-results bucket for outputs and the geocrop-models bucket for serialized artifacts.
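
The AOI, season window, and feature order conventions above can be sketched in a few lines. This is an illustrative sketch, not the worker's code: the names `AOI`, `make_aoi`, `season_window`, and `to_row` are hypothetical, and the bounding box is a rough approximation of Zimbabwe's extent, used only to catch swapped coordinates. The real FEATURE_ORDER_V1 is not reproduced here; `to_row` takes whatever ordered name list the caller supplies.

```python
from datetime import date
from typing import Mapping, NamedTuple, Sequence


class AOI(NamedTuple):
    lon: float       # longitude first, as the convention requires
    lat: float
    radius_m: float


# Assumption: approximate bounding box for Zimbabwe, for sanity checks only.
LON_RANGE = (25.0, 34.0)
LAT_RANGE = (-23.0, -15.0)


def make_aoi(lon: float, lat: float, radius_m: float) -> AOI:
    """Build an AOI, rejecting values that suggest a lat/lon swap."""
    if not LON_RANGE[0] <= lon <= LON_RANGE[1]:
        raise ValueError(f"longitude {lon} outside {LON_RANGE}; are lon/lat swapped?")
    if not LAT_RANGE[0] <= lat <= LAT_RANGE[1]:
        raise ValueError(f"latitude {lat} outside {LAT_RANGE}")
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    return AOI(lon, lat, radius_m)


def season_window(year: int) -> tuple[date, date]:
    """Summer season for `year`: September 1 of that year to May 31 of the next."""
    return date(year, 9, 1), date(year + 1, 5, 31)


def to_row(features: Mapping[str, float], order: Sequence[str]) -> list[float]:
    """Arrange named features in a fixed order, failing on missing or extra names."""
    if set(features) != set(order):
        raise ValueError("feature names do not match the expected order")
    return [features[name] for name in order]


print(season_window(2022))  # (datetime.date(2022, 9, 1), datetime.date(2023, 5, 31))
```

Building rows by name against a fixed order, rather than relying on dictionary or column order, is one way to keep inputs aligned with what the models were trained on.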

Storage Layout (MinIO)

  • geocrop-models/: ML model .pkl files and MLflow artifacts.
  • geocrop-baselines/: Dynamic World COGs (dw/zim/summer/...).
  • geocrop-results/: Output COGs (results/<job_id>/...).
  • geocrop-datasets/: Training CSVs and ground-truth labels.
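
The results layout can be captured in a small helper so that object keys are built in one place. This is a sketch under assumptions: `result_location` and the file name `prediction.tif` are hypothetical, and only the `results/<job_id>/` prefix comes from the layout above.

```python
RESULTS_BUCKET = "geocrop-results"
MODELS_BUCKET = "geocrop-models"


def result_location(job_id: str, filename: str) -> tuple[str, str]:
    """Return (bucket, object key) for a job output under results/<job_id>/."""
    if not job_id or "/" in job_id:
        raise ValueError("job_id must be a non-empty string without '/'")
    if not filename or "/" in filename:
        raise ValueError("filename must be a non-empty string without '/'")
    return RESULTS_BUCKET, f"results/{job_id}/{filename}"


bucket, key = result_location("job-123", "prediction.tif")
print(bucket, key)  # geocrop-results results/job-123/prediction.tif
```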

📂 Key Files

  • apps/web/src/App.tsx: Main React entry point with Portfolio/App view logic.
  • apps/worker/worker.py: Core orchestration of the inference pipeline.
  • k8s/base/: GitOps manifests for all services (ArgoCD tracking root).
  • k8s/argocd-app.yaml: ArgoCD Application definition for GeoCrop.
  • .gitea/workflows/build-push.yaml: CI pipeline that builds and pushes container images.

🌐 Infrastructure (Endpoints)

  • Frontend: portfolio.techarvest.co.zw
  • API: api.portfolio.techarvest.co.zw
  • Gitea: git.techarvest.co.zw
  • ArgoCD: cd.techarvest.co.zw
  • MLflow: ml.techarvest.co.zw
  • Jupyter: lab.techarvest.co.zw
  • Tiler: tiles.portfolio.techarvest.co.zw
  • MinIO: minio.portfolio.techarvest.co.zw