Sovereign MLOps Platform: GeoCrop LULC Portfolio

Welcome to the Sovereign MLOps Platform, a self-hosted environment running on K3s for end-to-end Land Use / Land Cover (LULC) crop mapping in Zimbabwe.

This project showcases professional skills in MLOps, Cloud-Native Architecture, Geospatial Analysis, and GitOps.

🏗️ System Architecture

The platform runs on a self-hosted K3s (lightweight Kubernetes) cluster, with a focus on data sovereignty and scalability.

  • Source Control & CI/CD: Gitea (Self-hosted GitHub alternative)
  • Infrastructure as Code: Terraform (Managing K3s Namespaces & Quotas)
  • GitOps: ArgoCD (Automated deployment from Git to Cluster)
  • Experiment Tracking: MLflow (Model versioning & metrics)
  • Interactive Workspace: JupyterLab (Data science & training)
  • Spatial Database: Standalone PostgreSQL + PostGIS (Port 5433)
  • Object Storage: MinIO (S3-compatible storage for datasets, baselines, and models)
  • Frontend: React 19 + OpenLayers (Parallel loading of baselines and ML predictions)
  • Backend: FastAPI + Redis Queue (Job orchestration)
  • Visualization: TiTiler (Dynamic tile server for Cloud Optimized GeoTIFFs)
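As a rough illustration of how the frontend could point OpenLayers at TiTiler, the sketch below builds an XYZ tile URL template for a Cloud Optimized GeoTIFF stored in MinIO. The host name, bucket, and object key are hypothetical, and the `/cog/tiles/WebMercatorQuad/...` route is an assumption that depends on the TiTiler version deployed (older releases omit the tile matrix set segment).

```python
from urllib.parse import urlencode


def cog_tile_url_template(titiler_base: str, cog_url: str) -> str:
    """Return an XYZ tile URL template for a COG served through TiTiler."""
    # The COG location is passed as the `url` query parameter and must be URL-encoded.
    query = urlencode({"url": cog_url})
    # {z}/{x}/{y} stay literal so the map client can substitute them per tile.
    return (
        f"{titiler_base.rstrip('/')}"
        f"/cog/tiles/WebMercatorQuad/{{z}}/{{x}}/{{y}}.png?{query}"
    )


template = cog_tile_url_template(
    "http://titiler.example.local",        # hypothetical TiTiler host
    "s3://geocrop-baselines/dw_2020.tif",  # hypothetical bucket and object key
)
print(template)
```

The resulting string can be handed to an XYZ tile source on the client; only the encoded `url` parameter changes between baseline years or prediction outputs.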

🗺️ UX Data Flow: Parallel Loading Strategy

To ensure a seamless user experience, the system implements a dual-loading strategy:

  1. Instant Context: While waiting for ML inference, Dynamic World (DW) TIFF baselines (2015-2025) are immediately served from MinIO via TiTiler.
  2. Asynchronous Inference: The ML worker processes heavy classification tasks in the background and overlays high-resolution predictions once complete.
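The two steps above can be sketched as a pair of concurrent tasks. This is a minimal illustration of the pattern in Python, not the platform's frontend code: both coroutines are stand-ins with simulated delays, and the function names are hypothetical.

```python
import asyncio


async def load_baseline(events: list) -> str:
    # Stand-in for requesting pre-computed DW tiles; finishes almost immediately.
    await asyncio.sleep(0.01)
    events.append("baseline")
    return "baseline-layer"


async def run_inference(events: list) -> str:
    # Stand-in for the background ML worker; deliberately slower.
    await asyncio.sleep(0.1)
    events.append("prediction")
    return "prediction-layer"


async def load_map() -> list:
    events: list = []
    # Start both requests at once so neither blocks the other.
    await asyncio.gather(load_baseline(events), run_inference(events))
    return events


order = asyncio.run(load_map())
print(order)  # ['baseline', 'prediction']
```

Because both tasks start together, the baseline layer is ready as soon as its own request completes, and the prediction layer is added later without delaying it.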

🛠️ Training Workflow

Training is performed in JupyterLab using a custom MinIOStorageClient, which reads datasets from MinIO straight into memory and uploads trained artifacts back to object storage.

Using the MinIO Storage Client

from training.storage_client import MinIOStorageClient

# Initialize client (uses environment variables automatically)
storage = MinIOStorageClient()

# List available training batches
batches = storage.list_files('geocrop-datasets')

# Load a batch directly into memory (No disk I/O)
df = storage.load_dataset('geocrop-datasets', 'batch_1.csv')

# Train your model and upload the artifact
# ... training code ...
storage.upload_file('model.pkl', 'geocrop-models', 'Zimbabwe_Ensemble_Model.pkl')
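The internals of MinIOStorageClient are not shown in this README. As a hedged sketch of the "no disk I/O" idea only, the example below parses CSV bytes entirely in memory using the standard library. The `payload` stands in for the body of an object-store GET, and its column names and rows are made-up sample data.

```python
import csv
import io


def parse_csv_bytes(payload: bytes) -> list:
    """Parse CSV bytes into a list of row dicts without writing to disk."""
    # Wrap the raw bytes in an in-memory buffer and decode them as text.
    text_stream = io.TextIOWrapper(
        io.BytesIO(payload), encoding="utf-8", newline=""
    )
    return list(csv.DictReader(text_stream))


# Hypothetical sample payload; a real batch would come from the bucket.
payload = b"field_id,ndvi,label\n1,0.62,maize\n2,0.18,fallow\n"
rows = parse_csv_bytes(payload)
print(rows[0]["label"])  # maize
```

A client built this way never creates a temporary file, which keeps notebook pods stateless; the trade-off is that each batch has to fit in memory.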

🚀 Deployment & GitOps

The platform follows a strict GitOps workflow:

  1. All changes are committed to the geocrop-platform repository on Gitea.
  2. Gitea Actions build container images and push them to Docker Hub (frankchine).
  3. ArgoCD monitors the k8s/base directory and automatically synchronizes the cluster state.
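Conceptually, the synchronization in step 3 compares the desired state in Git with the live state in the cluster. The sketch below illustrates that comparison only; it is not ArgoCD's implementation, and the resource names and image tags are hypothetical. Note that in ArgoCD, pruning resources that were removed from Git is an opt-in setting for automated sync.

```python
def plan_sync(desired: dict, live: dict) -> dict:
    """Compare desired (Git) and live (cluster) state and list the actions needed."""
    return {
        # Declared in Git but missing from the cluster.
        "create": sorted(k for k in desired if k not in live),
        # Present in both, but the live spec has drifted from Git.
        "update": sorted(
            k for k in desired if k in live and desired[k] != live[k]
        ),
        # Running in the cluster but no longer declared in Git.
        "prune": sorted(k for k in live if k not in desired),
    }


desired = {"api": "image:v2", "worker": "image:v1"}  # hypothetical manifests
live = {"api": "image:v1", "legacy": "image:v0"}     # hypothetical cluster state
plan = plan_sync(desired, live)
print(plan)  # {'create': ['worker'], 'update': ['api'], 'prune': ['legacy']}
```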

🖥️ Service Registry


Created and maintained by fchinembiri.