Sovereign MLOps Platform: GeoCrop LULC Portfolio
Welcome to the Sovereign MLOps Platform, a comprehensive self-hosted environment on K3s designed for end-to-end Land Use / Land Cover (LULC) crop-mapping in Zimbabwe.
This project showcases professional skills in MLOps, Cloud-Native Architecture, Geospatial Analysis, and GitOps.
🏗️ System Architecture
The platform is built on a robust, self-hosted Kubernetes (K3s) cluster with a focus on data sovereignty and scalability.
- Source Control & CI/CD: Gitea (Self-hosted GitHub alternative)
- Infrastructure as Code: Terraform (Managing K3s Namespaces & Quotas)
- GitOps: ArgoCD (Automated deployment from Git to Cluster)
- Experiment Tracking: MLflow (Model versioning & metrics)
- Interactive Workspace: JupyterLab (Data science & training)
- Spatial Database: Standalone PostgreSQL + PostGIS (Port 5433)
- Object Storage: MinIO (S3-compatible storage for datasets, baselines, and models)
- Frontend: React 19 + OpenLayers (Parallel loading of baselines and ML predictions)
- Backend: FastAPI + Redis Queue (Job orchestration)
- Visualization: TiTiler (Dynamic tile server for Cloud Optimized GeoTIFFs)
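The Terraform layer mentioned above manages namespaces and quotas on the K3s cluster. A minimal sketch of what that might look like with the HashiCorp Kubernetes provider (namespace name and quota values are illustrative assumptions, not taken from the repository):

```hcl
# Hypothetical example -- names and limits are placeholders.
resource "kubernetes_namespace" "mlops" {
  metadata {
    name = "geocrop-mlops"
  }
}

resource "kubernetes_resource_quota" "mlops" {
  metadata {
    name      = "geocrop-quota"
    namespace = kubernetes_namespace.mlops.metadata[0].name
  }
  spec {
    hard = {
      "requests.cpu"    = "8"
      "requests.memory" = "16Gi"
    }
  }
}
```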
🗺️ UX Data Flow: Parallel Loading Strategy
To ensure a seamless user experience, the system implements a dual-loading strategy:
- Instant Context: While waiting for ML inference, Dynamic World (DW) TIFF baselines (2015-2025) are immediately served from MinIO via TiTiler.
- Asynchronous Inference: The ML worker processes heavy classification tasks in the background and overlays high-resolution predictions once complete.
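The dual-loading handoff can be sketched as a backend handler that returns a baseline tile URL immediately while enqueueing the inference job for the worker. The URL pattern, bucket name, and queue shape below are assumptions for illustration, not the platform's actual API (a plain list stands in for a Redis queue, e.g. `rpush`):

```python
# Hypothetical sketch of the dual-loading strategy.
# TiTiler URL template, bucket layout, and payload fields are assumptions.
import json
import uuid

TITILER = "https://titiler.example/cog/tiles/{z}/{x}/{y}"

def start_classification(aoi: dict, year: int, queue) -> dict:
    """Return instant baseline context plus a job id for async inference."""
    # 1. Instant context: Dynamic World baseline COG already sitting in MinIO,
    #    served on the fly by TiTiler -- no waiting on the ML worker.
    baseline_url = TITILER + f"?url=s3://geocrop-datasets/dw_baseline_{year}.tif"

    # 2. Asynchronous inference: enqueue the heavy classification task;
    #    the worker overlays high-resolution predictions when done.
    job_id = str(uuid.uuid4())
    queue.append(json.dumps({"job_id": job_id, "aoi": aoi, "year": year}))

    return {"baseline_tiles": baseline_url, "job_id": job_id}
```

The frontend can render the baseline layer from `baseline_tiles` at once and poll on `job_id` for the prediction overlay.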
🛠️ Training Workflow
Training is performed in JupyterLab using a custom MinIOStorageClient that bridges the gap between object storage and in-memory data processing.
Using the MinIO Storage Client
```python
from training.storage_client import MinIOStorageClient

# Initialize client (uses environment variables automatically)
storage = MinIOStorageClient()

# List available training batches
batches = storage.list_files('geocrop-datasets')

# Load a batch directly into memory (no disk I/O)
df = storage.load_dataset('geocrop-datasets', 'batch_1.csv')

# Train your model and upload the artifact
# ... training code ...
storage.upload_file('model.pkl', 'geocrop-models', 'Zimbabwe_Ensemble_Model.pkl')
```
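To illustrate how such a client can bridge object storage and in-memory processing, here is a hypothetical re-implementation sketch, not the actual `training.storage_client` code. Environment variable names are assumptions; the backing client is injected so the sketch works without a live MinIO server:

```python
# Illustrative sketch only -- env var names and defaults are assumptions.
import io
import os

import pandas as pd


class MinIOStorageClient:
    """Bridge between S3-style object storage and in-memory DataFrames.

    `client` is any object exposing list_objects/get_object/fput_object
    (e.g. a minio.Minio instance); injecting it keeps the class testable.
    """

    def __init__(self, client=None):
        if client is None:
            from minio import Minio  # third-party SDK, imported lazily
            client = Minio(
                os.environ.get("MINIO_ENDPOINT", "minio:9000"),
                access_key=os.environ.get("MINIO_ACCESS_KEY"),
                secret_key=os.environ.get("MINIO_SECRET_KEY"),
                secure=False,
            )
        self._client = client

    def list_files(self, bucket):
        """List object names in a bucket."""
        return [obj.object_name for obj in self._client.list_objects(bucket)]

    def load_dataset(self, bucket, name):
        """Stream a CSV object straight into a DataFrame -- no disk I/O."""
        data = self._client.get_object(bucket, name).read()
        return pd.read_csv(io.BytesIO(data))

    def upload_file(self, local_path, bucket, name):
        """Upload a local artifact (e.g. a trained model) to a bucket."""
        self._client.fput_object(bucket, name, local_path)
```

Reading objects into `BytesIO` buffers is what avoids the disk round-trip mentioned above.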
🚀 Deployment & GitOps
The platform follows a strict GitOps workflow:
- All changes are committed to the geocrop-platform repository on Gitea.
- Gitea Actions build and push containers to Docker Hub (frankchine).
- ArgoCD monitors the k8s/base directory and automatically synchronizes the cluster state.
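An ArgoCD Application watching the k8s/base path might look like the following sketch. The repository URL, branch, and destination namespace are assembled from details elsewhere in this README and should be treated as assumptions:

```yaml
# Hypothetical manifest -- repoURL, branch, and namespace are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: geocrop-platform
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.techarvest.co.zw/fchinembiri/geocrop-platform.git
    path: k8s/base
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: geocrop-mlops
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```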
🖥️ Service Registry
- Portfolio Frontend: portfolio.techarvest.co.zw
- Source Control: git.techarvest.co.zw
- JupyterLab: lab.techarvest.co.zw
- MLflow: ml.techarvest.co.zw
- ArgoCD: cd.techarvest.co.zw
- MinIO Console: console.minio.portfolio.techarvest.co.zw
Created and maintained by fchinembiri.