GeoCrop - Sovereign MLOps Platform
GeoCrop is a production-grade, self-hosted ML platform designed for crop-type classification in Zimbabwe. It utilizes Sentinel-2 satellite imagery (DEA STAC), computes 51 spectral/phenological features, and employs ensemble ML models to generate high-resolution Cloud Optimized GeoTIFFs (COGs).
🚀 System Architecture
The platform follows a Sovereign MLOps philosophy, hosting the entire lifecycle—from source control and experiment tracking to inference and GitOps—on a private K3s cluster.
- Frontend: React 19 + OpenLayers/Leaflet (Portfolio & App).
- Backend: FastAPI REST API + Redis/RQ Job Queue.
- ML Engine: Python Inference Workers + XGBoost/CatBoost/LightGBM Ensembles.
- Infrastructure:
  - GitOps: ArgoCD (CD) + Gitea (Source Control & CI).
  - Experiment Tracking: MLflow (Postgres/MinIO backend).
  - Development: JupyterLab (integrated with MinIO).
  - Storage: MinIO (S3-compatible) for datasets, models, and results.
  - Database: Postgres + PostGIS for spatial metadata and app state.
🛠️ Building and Running
Development
# Frontend Development
cd apps/web && npm install && npm run dev
# API Development
cd apps/api && pip install -r requirements.txt
uvicorn main:app --reload
# Worker Development
cd apps/worker && pip install -r requirements.txt
python worker.py --worker
GitOps Workflow (CI/CD)
- Push code to Gitea (`git.techarvest.co.zw`).
- CI: Gitea Actions build images using Kaniko (no DinD). Images are tagged with the Git commit SHA.
- CD: CI pipeline updates `k8s/base/kustomization.yaml` with new SHA tags. ArgoCD detects these changes and reconciles the cluster state.
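The CD step's tag update can be sketched in Python as below. This is a minimal illustration only, assuming `k8s/base/kustomization.yaml` uses kustomize's standard `images:` list with `name` and `newTag` entries. The function name `set_image_tag` and the image names in the usage note are hypothetical, and the real pipeline may well use a different mechanism.

```python
def set_image_tag(kustomization_text: str, image: str, tag: str) -> str:
    """Return the kustomization text with `newTag` rewritten for one image.

    Line-based sketch: it assumes each image entry looks like
        - name: <image>
          newTag: <tag>
    with unquoted values. A YAML parser would be more robust.
    """
    out = []
    in_target = False  # True while inside the entry for `image`
    for line in kustomization_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- name:"):
            # Everything after "- name:" is the image name (may contain ':').
            in_target = stripped.split(":", 1)[1].strip() == image
        elif in_target and stripped.startswith("newTag:"):
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}newTag: {tag}"
            in_target = False
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, calling `set_image_tag(text, "geocrop-api", "3f2a9c1")` would rewrite only the `newTag` under the hypothetical `geocrop-api` entry and leave every other image untouched.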
🛑 Engineering Policy
- STRICT GitOps: All approved changes MUST be committed and pushed to Gitea.
- NO Manual Deploys: All deployments MUST occur ONLY through the CI/CD pipeline via ArgoCD.
- NO Hotfixes: Direct manual modification of running containers or K8s resources is forbidden.
- Deterministic Tagging: Never rely on `:latest` for production rollouts; always use SHA-based tags managed by CI.
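The deterministic-tagging rule lends itself to an automated check. The sketch below assumes CI tags are lowercase hexadecimal commit SHAs, either abbreviated (7+ characters) or full (40 characters); the exact tag format is not stated here, and the helper names and image references are hypothetical.

```python
import re

# Assumption: tags are lowercase hex Git SHAs, 7 to 40 characters long.
SHA_TAG = re.compile(r"[0-9a-f]{7,40}")


def image_tag(image_ref: str) -> str:
    """Return the tag part of an image reference, or '' if it has none."""
    name = image_ref.split("@", 1)[0]       # drop any @sha256:... digest
    last_segment = name.rsplit("/", 1)[-1]  # ignore a registry host:port
    if ":" not in last_segment:
        return ""
    return last_segment.rsplit(":", 1)[1]


def is_sha_pinned(image_ref: str) -> bool:
    """True only when the image tag looks like a Git commit SHA."""
    return SHA_TAG.fullmatch(image_tag(image_ref)) is not None
```

A check like this could run against rendered manifests before merge, rejecting any reference where `is_sha_pinned` is false, which covers both `:latest` and untagged images.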
Kubernetes Deployment
# Manual apply (if not using ArgoCD auto-sync)
kubectl apply -k k8s/base/
📐 Development Conventions
Critical Patterns (Non-Obvious)
- Kubernetes Only: Focus exclusively on resources managed by Kubernetes (pods, services, ingresses, etc.). NEVER modify host-level Nginx configurations (`/etc/nginx/`), CloudPanel settings, or system services outside the cluster.
- AOI Format: Always use a `(lon, lat, radius_m)` tuple. Longitude comes first.
- Season Window: Sept 1st to May 31st (Zimbabwe Summer Season). `year=2022` implies 2022-09-01 to 2023-05-31.
- Feature Order: `FEATURE_ORDER_V1` (51 features) is immutable; changing it breaks model compatibility.
- Storage Contract: Use `geocrop-results` for outputs and `geocrop-models` for serialized artifacts.
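The AOI and season conventions above can be made concrete with a short sketch. The names `AOI`, `validate_aoi`, and `season_window` are hypothetical, not taken from the codebase, and the Zimbabwe bounding box is a deliberately loose approximation added here as an assumption so that swapped coordinates are caught.

```python
from datetime import date
from typing import NamedTuple, Tuple

# Assumption: a loose bounding box around Zimbabwe, used only to catch
# swapped (lat, lon) input. Not an authoritative border definition.
ZIM_LON_RANGE = (25.0, 34.0)
ZIM_LAT_RANGE = (-23.0, -15.0)


class AOI(NamedTuple):
    """Area of interest as (lon, lat, radius_m); longitude comes first."""
    lon: float
    lat: float
    radius_m: float


def validate_aoi(aoi: Tuple[float, float, float]) -> AOI:
    """Check ordering and ranges of an AOI tuple and return it as an AOI."""
    lon, lat, radius_m = aoi
    if not ZIM_LON_RANGE[0] <= lon <= ZIM_LON_RANGE[1]:
        raise ValueError(f"longitude {lon} outside expected range; is the order (lon, lat)?")
    if not ZIM_LAT_RANGE[0] <= lat <= ZIM_LAT_RANGE[1]:
        raise ValueError(f"latitude {lat} outside expected range; is the order (lon, lat)?")
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    return AOI(lon, lat, radius_m)


def season_window(year: int) -> Tuple[date, date]:
    """Summer season for `year`: Sept 1 of `year` to May 31 of `year + 1`."""
    return date(year, 9, 1), date(year + 1, 5, 31)
```

Here `season_window(2022)` yields 2022-09-01 and 2023-05-31, matching the convention above. A point near Harare passed as `(31.05, -17.83, 5000)` validates, while the swapped `(-17.83, 31.05, 5000)` raises a `ValueError`.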
Storage Layout (MinIO)
- `geocrop-models/`: ML model `.pkl` files and MLflow artifacts.
- `geocrop-baselines/`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results/`: Output COGs (`results/<job_id>/...`).
- `geocrop-datasets/`: Training CSVs and ground-truth labels.
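As a rough illustration of the results layout, the helper below builds an object key under `results/<job_id>/`. The function name `result_key` and the `filename` argument are hypothetical; the actual file names written below the job prefix are not specified here.

```python
from pathlib import PurePosixPath

RESULTS_BUCKET = "geocrop-results"  # bucket name from the storage layout


def result_key(job_id: str, filename: str) -> str:
    """Build an object key of the form results/<job_id>/<filename>.

    `filename` is a placeholder for whatever the worker writes; the real
    naming scheme under the job prefix is an open assumption.
    """
    if not job_id or "/" in job_id:
        raise ValueError("job_id must be a non-empty string without '/'")
    if not filename or "/" in filename:
        raise ValueError("filename must be a non-empty string without '/'")
    return str(PurePosixPath("results") / job_id / filename)
```

Keeping key construction in one place makes it harder for a worker to write outside the job's prefix or into the wrong bucket.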
📂 Key Files
- `apps/web/src/App.tsx`: Main React entry point with Portfolio/App view logic.
- `apps/worker/worker.py`: Core orchestration of the inference pipeline.
- `k8s/base/`: GitOps manifests for all services (ArgoCD tracking root).
- `k8s/argocd-app.yaml`: ArgoCD Application definition for GeoCrop.
- `.gitea/workflows/build-push.yaml`: CI pipeline for Docker builds.
🌐 Infrastructure (Endpoints)
- Frontend: `portfolio.techarvest.co.zw`
- API: `api.portfolio.techarvest.co.zw`
- Gitea: `git.techarvest.co.zw`
- ArgoCD: `cd.techarvest.co.zw`
- MLflow: `ml.techarvest.co.zw`
- Jupyter: `lab.techarvest.co.zw`
- Tiler: `tiles.portfolio.techarvest.co.zw`
- MinIO: `minio.portfolio.techarvest.co.zw`