# Sovereign MLOps Platform: GeoCrop LULC Portfolio

Welcome to the **Sovereign MLOps Platform**, a comprehensive self-hosted environment on K3s designed for end-to-end Land Use / Land Cover (LULC) crop mapping in Zimbabwe.

This project showcases professional skills in **MLOps, Cloud-Native Architecture, Geospatial Analysis, and GitOps**.

## 🏗️ System Architecture

The platform runs on a self-hosted Kubernetes (K3s) cluster, with a focus on data sovereignty and scalability.

- **Source Control & CI/CD**: [Gitea](https://git.techarvest.co.zw) (Self-hosted GitHub alternative)
- **Infrastructure as Code**: Terraform (Managing K3s Namespaces & Quotas)
- **GitOps**: ArgoCD (Automated deployment from Git to Cluster)
- **Experiment Tracking**: [MLflow](https://ml.techarvest.co.zw) (Model versioning & metrics)
- **Interactive Workspace**: [JupyterLab](https://lab.techarvest.co.zw) (Data science & training)
- **Spatial Database**: Standalone PostgreSQL + PostGIS (Port 5433)
- **Object Storage**: MinIO (S3-compatible storage for datasets, baselines, and models)
- **Frontend**: React 19 + OpenLayers (Parallel loading of baselines and ML predictions)
- **Backend**: FastAPI + Redis Queue (Job orchestration)
- **Visualization**: TiTiler (Dynamic tile server for Cloud Optimized GeoTIFFs)

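
As a rough illustration of how the Object Storage and Visualization components could fit together, the sketch below builds an XYZ tile URL template that a map client such as OpenLayers could request from TiTiler for a Cloud Optimized GeoTIFF held in MinIO. The TiTiler host, the bucket name `geocrop-baselines`, and the object key are hypothetical placeholders. The `/cog/tiles/WebMercatorQuad/{z}/{x}/{y}` route and the `url` query parameter follow TiTiler's COG endpoints in recent releases, so check them against the deployed version.

```python
from urllib.parse import quote

# Hypothetical values: neither the TiTiler host nor the baseline bucket
# name is given in this README.
TITILER_BASE = "https://tiles.example.org"
BASELINE_BUCKET = "geocrop-baselines"


def cog_tile_template(object_key: str, bucket: str = BASELINE_BUCKET) -> str:
    """Return an XYZ URL template for one COG served through TiTiler."""
    # TiTiler reads the raster location from the `url` query parameter,
    # so the S3-style address is percent-encoded in full.
    cog_url = quote(f"s3://{bucket}/{object_key}", safe="")
    # {z}/{x}/{y} are left unexpanded so the map client can fill them in.
    return (
        f"{TITILER_BASE}/cog/tiles/WebMercatorQuad/{{z}}/{{x}}/{{y}}.png"
        f"?url={cog_url}"
    )


template = cog_tile_template("dw_2020.tif")
print(template)
```

The resulting string can be passed as the URL of an XYZ tile source on the frontend.
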
## 🗺️ UX Data Flow: Parallel Loading Strategy

To ensure a seamless user experience, the system implements a dual-loading strategy:

1. **Instant Context**: While ML inference runs, Dynamic World (DW) TIFF baselines (2015-2025) are served immediately from MinIO via TiTiler.
2. **Asynchronous Inference**: The ML worker processes heavy classification tasks in the background and overlays high-resolution predictions once they are complete.

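
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's code: a thread pool stands in for the Redis Queue worker, and `baseline_layer`, `run_inference`, `request_map`, and the layer names are hypothetical.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the background ML worker (the platform uses Redis Queue).
_executor = ThreadPoolExecutor(max_workers=1)


def baseline_layer(year: int) -> str:
    """Cheap lookup: name a precomputed baseline that can be shown at once."""
    return f"dw_baseline_{year}"  # hypothetical layer naming


def run_inference(year: int) -> str:
    """Stand-in for the slow ML classification job."""
    time.sleep(0.1)  # simulate heavy work
    return f"ml_prediction_{year}"


def request_map(year: int) -> tuple[str, Future]:
    """Return the baseline immediately plus a handle to the pending prediction."""
    pending = _executor.submit(run_inference, year)
    return baseline_layer(year), pending


baseline, pending = request_map(2020)
print("show now:", baseline)  # available without waiting for inference
print("overlay later:", pending.result())  # blocks until inference finishes
```

The caller gets something to display right away and can attach the prediction whenever the background job reports completion.
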
## 🛠️ Training Workflow

Training is performed in **JupyterLab** using a custom `MinIOStorageClient` that loads data from object storage directly into memory for processing.

### Using the MinIO Storage Client

```python
from training.storage_client import MinIOStorageClient

# Initialize client (uses environment variables automatically)
storage = MinIOStorageClient()

# List available training batches
batches = storage.list_files('geocrop-datasets')

# Load a batch directly into memory (no disk I/O)
df = storage.load_dataset('geocrop-datasets', 'batch_1.csv')

# Train your model and upload the artifact
# ... training code ...
storage.upload_file('model.pkl', 'geocrop-models', 'Zimbabwe_Ensemble_Model.pkl')
```

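
The "no disk I/O" idea can be shown without the real client. The implementation of `MinIOStorageClient` is not included in this README, so the following is only a standard-library sketch: `fetch_bytes` is a hypothetical stand-in for an S3-style download, the sample rows are made up, and the real `load_dataset` may well return a pandas DataFrame rather than a list of dicts.

```python
import csv
import io

# Fake object store mapping (bucket, key) to raw bytes.
# The CSV columns and values are invented purely for illustration.
_FAKE_STORE = {
    ("geocrop-datasets", "batch_1.csv"): b"ndvi,label\n0.71,maize\n0.32,fallow\n",
}


def fetch_bytes(bucket: str, key: str) -> bytes:
    """Stand-in for an S3-style GET that returns the object body as bytes."""
    return _FAKE_STORE[(bucket, key)]


def load_rows(bucket: str, key: str) -> list[dict[str, str]]:
    """Parse a CSV object entirely in memory, never touching the filesystem."""
    raw = fetch_bytes(bucket, key)
    text = io.StringIO(raw.decode("utf-8"))  # file-like wrapper around the text
    return list(csv.DictReader(text))


rows = load_rows("geocrop-datasets", "batch_1.csv")
print(rows[0])
```

Wrapping the downloaded bytes in a file-like buffer is what lets a parser consume the object without a temporary file.
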
|
## 🚀 Deployment & GitOps

The platform follows a strict **GitOps** workflow:

1. All changes are committed to the `geocrop-platform` repository on Gitea.
2. Gitea Actions build container images and push them to Docker Hub (`frankchine`).
3. ArgoCD monitors the `k8s/base` directory and automatically synchronizes the cluster state.

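
Step 3 would typically correspond to an ArgoCD `Application` resource with automated sync enabled. The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. Only the repository name `geocrop-platform` and the path `k8s/base` come from this README. The application name, the `<owner>` segment of the repository URL, the `default` project, and the target namespace are hypothetical.

```python
import json

# Hypothetical: the repository owner is not stated in this README.
REPO_URL = "https://git.techarvest.co.zw/<owner>/geocrop-platform.git"


def argocd_application(name: str, repo_url: str, path: str, namespace: str) -> dict:
    """Build an ArgoCD Application that auto-syncs one Git directory."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": "HEAD", "path": path},
            "destination": {
                "server": "https://kubernetes.default.svc",  # the local cluster
                "namespace": namespace,
            },
            # Automated sync applies Git changes without a manual step.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


app = argocd_application("geocrop", REPO_URL, "k8s/base", "geocrop")
print(json.dumps(app, indent=2))
```

With `prune` and `selfHeal` set, resources deleted from Git are removed from the cluster and manual drift is reverted, which matches the "strict" workflow described above.
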
|
## 🖥️ Service Registry

- **Portfolio Frontend**: [portfolio.techarvest.co.zw](https://portfolio.techarvest.co.zw)
- **Source Control**: [git.techarvest.co.zw](https://git.techarvest.co.zw)
- **JupyterLab**: [lab.techarvest.co.zw](https://lab.techarvest.co.zw)
- **MLflow**: [ml.techarvest.co.zw](https://ml.techarvest.co.zw)
- **ArgoCD**: [cd.techarvest.co.zw](https://cd.techarvest.co.zw)
- **MinIO Console**: [console.minio.portfolio.techarvest.co.zw](https://console.minio.portfolio.techarvest.co.zw)

---
*Created and maintained by [fchinembiri](mailto:fchinembiri24@gmail.com).*