
Sovereign MLOps Platform: GeoCrop LULC Portfolio

Welcome to the Sovereign MLOps Platform, a self-hosted K3s environment for end-to-end Land Use / Land Cover (LULC) crop mapping in Zimbabwe.

This project showcases professional skills in MLOps, Cloud-Native Architecture, Geospatial Analysis, and GitOps.

🏗️ System Architecture

The platform runs on a self-hosted Kubernetes (K3s) cluster, with a focus on data sovereignty and scalability.

graph TD
    subgraph "Frontend & Entry"
        WEB[React 19 Frontend]
        ING[Nginx Ingress]
    end

    subgraph "Core Services (geocrop namespace)"
        API[FastAPI Backend]
        RQ[Redis Queue]
        WORKER[ML Inference Worker]
        TILER[TiTiler Dynamic Server]
    end

    subgraph "MLOps & Infra"
        GITEA[Gitea Source Control]
        ARGO[ArgoCD GitOps]
        MLF[MLflow Tracking]
        JUPYTER[JupyterLab Workspace]
    end

    subgraph "Storage & Data"
        MINIO[(MinIO S3 Storage)]
        POSTGIS[(Postgres + PostGIS)]
    end

    %% Flow
    WEB --> ING
    ING --> API
    API --> RQ
    RQ --> WORKER
    WORKER --> MINIO
    WORKER --> POSTGIS
    TILER --> MINIO
    WEB --> TILER
    ARGO --> GITEA
    ARGO --> ING
    JUPYTER --> MINIO
    MLF --> POSTGIS
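
The API to queue to worker path in the diagram above can be sketched as follows. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the Redis queue, and `submit_job`, `run_worker_once`, and `job_status` are hypothetical names, not the repository's actual code.

```python
import queue
import uuid

# Stand-in for the Redis queue (assumption: the real platform uses Redis).
job_queue = queue.Queue()
# Hypothetical status store keyed by job ID.
job_status = {}


def submit_job(aoi, year):
    """API side: record the job, enqueue it, and return its ID right away."""
    job_id = uuid.uuid4().hex
    job_status[job_id] = "queued"
    job_queue.put({"id": job_id, "aoi": aoi, "year": year})
    return job_id


def run_worker_once():
    """Worker side: take one job off the queue and mark it complete."""
    job = job_queue.get()
    job_status[job["id"]] = "processing"
    # ... inference and upload of the result would happen here ...
    job_status[job["id"]] = "complete"
    job_queue.task_done()
```

The point of the split is that `submit_job` returns immediately, so the API never blocks on inference; the worker drains the queue at its own pace.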

📊 System Data Flow (DFD)

How data moves from raw satellite imagery to final crop-type predictions:

graph LR
    subgraph "External Sources"
        DEA[Digital Earth Africa STAC]
    end

    subgraph "Storage (MinIO)"
        DS[(/geocrop-datasets)]
        BS[(/geocrop-baselines)]
        MD[(/geocrop-models)]
        RS[(/geocrop-results)]
    end

    subgraph "Processing"
        TRAIN[Jupyter Training]
        INFER[Inference Worker]
    end

    %% Data movement
    DEA -- "Sentinel-2 Imagery" --> INFER
    DS -- "CSV Batches" --> TRAIN
    TRAIN -- "Trained Models" --> MD
    MD -- "Model Load" --> INFER
    BS -- "DW TIFFs" --> INFER
    INFER -- "Classification COG" --> RS
    RS -- "Map Tiles" --> WEB[Frontend Visualization]
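
The last hop above, from a result COG to map tiles, might look like this from the frontend's point of view. It is a minimal sketch: the route shape follows recent TiTiler releases (`/cog/tiles/{tileMatrixSetId}/{z}/{x}/{y}`) and may differ on the deployed version, and the host and object key in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def tile_url_template(tiler_base, cog_url):
    """Build an XYZ tile URL template for a COG served through TiTiler."""
    # The COG location is passed as a URL-encoded `url` query parameter.
    query = urlencode({"url": cog_url})
    base = tiler_base.rstrip("/")
    # {z}/{x}/{y} are left as literal placeholders for the map library to fill.
    return f"{base}/cog/tiles/WebMercatorQuad/{{z}}/{{x}}/{{y}}.png?{query}"
```

For example, `tile_url_template("https://tiler.example.com", "s3://geocrop-results/job-123/prediction.tif")` yields a template that a web map library can use as a raster tile source.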

🗺️ UX Data Flow: Parallel Loading Strategy

To keep the map responsive, the frontend loads baseline (DW) tiles immediately while the prediction job runs asynchronously:

sequenceDiagram
    participant U as User (Frontend)
    participant T as TiTiler (S3 Proxy)
    participant A as FastAPI
    participant W as ML Worker
    participant M as MinIO

    U->>A: Submit Job (AOI + Year)
    A->>U: Job ID (Accepted)
    
    par Instant Visual Context
        U->>T: Fetch Baseline Tiles (DW)
        T->>M: Stream Baseline COG
        M-->>T: Baseline COG data
        T->>U: Render Baseline Map
    and Asynchronous Prediction
        A->>W: Enqueue Task
        W->>M: Fetch Model & Data
        W->>W: Run Inference & Post-processing
        W->>M: Upload Prediction COG
        loop Polling
            U->>A: Get Status?
            A-->>U: Processing...
        end
        W->>A: Job Complete
        U->>A: Get Status?
        A->>U: Prediction URL
        U->>T: Fetch Prediction Tiles
        T->>M: Stream Prediction COG
        T->>U: Overlay High-Res Result
    end
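
The polling loop in the sequence above can be sketched on the client side as follows. The status payload shape (`state`, `prediction_url`, `error`) and the function name are assumptions made for illustration; the actual API response format is not documented here.

```python
import time


def wait_for_prediction(get_status, interval_s=2.0, timeout_s=600.0):
    """Poll get_status() until the job completes, then return the prediction URL."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.get("state") == "complete":
            return status["prediction_url"]
        if status.get("state") == "failed":
            raise RuntimeError(status.get("error", "job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not finish in time")
        # Back off before asking again so the API is not flooded.
        time.sleep(interval_s)
```

Passing `get_status` as a callable keeps the loop independent of the HTTP client; in practice it would wrap a request to the job-status endpoint.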

🚀 Deployment & GitOps Pipeline

graph LR
    DEV[Developer] -->|Push| GITEA[Gitea]
    
    subgraph "CI/CD Pipeline"
        GITEA -->|Trigger| GA[Gitea Actions]
        GA -->|Build & Push| DH[Docker Hub: frankchine]
    end

    subgraph "GitOps Sync"
        ARGO[ArgoCD] -->|Monitor| GITEA
        DH -->|Image Pull| K3S[K3s Cluster]
        ARGO -->|Apply Manifests| K3S
    end

🛠️ Training Workflow

Training runs in JupyterLab using a custom MinIOStorageClient that loads objects from MinIO straight into memory for processing.

Using the MinIO Storage Client

from training.storage_client import MinIOStorageClient

# Initialize client (uses environment variables automatically)
storage = MinIOStorageClient()

# List available training batches
batches = storage.list_files('geocrop-datasets')

# Load a batch directly into memory (No disk I/O)
df = storage.load_dataset('geocrop-datasets', 'batch_1.csv')

# Train your model, save it locally as model.pkl, then upload the artifact
# ... training code ...
storage.upload_file('model.pkl', 'geocrop-models', 'Zimbabwe_Ensemble_Model.pkl')
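
To illustrate what "no disk I/O" means for `load_dataset`, here is a minimal standard-library sketch of parsing a CSV object directly from a byte stream. It is not the actual `MinIOStorageClient` implementation, which is not shown in this README; `load_csv_in_memory` is a hypothetical helper, and `io.BytesIO` stands in for the stream an object store would return.

```python
import csv
import io


def load_csv_in_memory(stream):
    """Parse a binary CSV stream into a list of row dicts without a temp file."""
    # Wrap the byte stream as text; newline="" lets the csv module handle line endings.
    text = io.TextIOWrapper(stream, encoding="utf-8", newline="")
    return list(csv.DictReader(text))
```

Called as `load_csv_in_memory(io.BytesIO(raw_bytes))`, it returns one dict per row keyed by the CSV header, with all values as strings.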

🖥️ Service Registry


Created and maintained by fchinembiri.