From 9f3a2c040831696f0a5fd3762a2b7c3090ac3a3e Mon Sep 17 00:00:00 2001
From: fchinembiri
Date: Thu, 23 Apr 2026 22:31:22 +0200
Subject: [PATCH] Enhance README with Mermaid diagrams for architecture, DFD, and GitOps pipeline

---
 README.md | 143 ++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 124 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index c15ba42..bed323d 100644
--- a/README.md
+++ b/README.md
@@ -8,22 +8,134 @@ This project showcases professional skills in **MLOps, Cloud-Native Architecture
 
 The platform is built on a robust, self-hosted Kubernetes (K3s) cluster with a focus on data sovereignty and scalability.
 
-- **Source Control & CI/CD**: [Gitea](https://git.techarvest.co.zw) (Self-hosted GitHub alternative)
-- **Infrastructure as Code**: Terraform (Managing K3s Namespaces & Quotas)
-- **GitOps**: ArgoCD (Automated deployment from Git to Cluster)
-- **Experiment Tracking**: [MLflow](https://ml.techarvest.co.zw) (Model versioning & metrics)
-- **Interactive Workspace**: [JupyterLab](https://lab.techarvest.co.zw) (Data science & training)
-- **Spatial Database**: Standalone PostgreSQL + PostGIS (Port 5433)
-- **Object Storage**: MinIO (S3-compatible storage for datasets, baselines, and models)
-- **Frontend**: React 19 + OpenLayers (Parallel loading of baselines and ML predictions)
-- **Backend**: FastAPI + Redis Queue (Job orchestration)
-- **Visualization**: TiTiler (Dynamic tile server for Cloud Optimized GeoTIFFs)
+```mermaid
+graph TD
+    subgraph "Frontend & Entry"
+        WEB[React 19 Frontend]
+        ING[Nginx Ingress]
+    end
+
+    subgraph "Core Services (geocrop namespace)"
+        API[FastAPI Backend]
+        RQ[Redis Queue]
+        WORKER[ML Inference Worker]
+        TILER[TiTiler Dynamic Server]
+    end
+
+    subgraph "MLOps & Infra"
+        GITEA[Gitea Source Control]
+        ARGO[ArgoCD GitOps]
+        MLF[MLflow Tracking]
+        JUPYTER[JupyterLab Workspace]
+    end
+
+    subgraph "Storage & Data"
+        MINIO[(MinIO S3 Storage)]
+        POSTGIS[(Postgres + PostGIS)]
+    end
+
+    %% Flow
+    WEB --> ING
+    ING --> API
+    API --> RQ
+    RQ --> WORKER
+    WORKER --> MINIO
+    WORKER --> POSTGIS
+    TILER --> MINIO
+    WEB --> TILER
+    ARGO --> GITEA
+    ARGO --> ING
+    JUPYTER --> MINIO
+    MLF --> POSTGIS
+```
+
+## 📊 System Data Flow (DFD)
+
+How data moves from raw satellite imagery to final crop-type predictions:
+
+```mermaid
+graph LR
+    subgraph "External Sources"
+        DEA[Digital Earth Africa STAC]
+    end
+
+    subgraph "Storage (MinIO)"
+        DS[(/geocrop-datasets)]
+        BS[(/geocrop-baselines)]
+        MD[(/geocrop-models)]
+        RS[(/geocrop-results)]
+    end
+
+    subgraph "Processing"
+        TRAIN[Jupyter Training]
+        INFER[Inference Worker]
+    end
+
+    %% Data movement
+    DEA -- "Sentinel-2 Imagery" --> INFER
+    DS -- "CSV Batches" --> TRAIN
+    TRAIN -- "Trained Models" --> MD
+    MD -- "Model Load" --> INFER
+    BS -- "DW TIFFs" --> INFER
+    INFER -- "Classification COG" --> RS
+    RS -- "Map Tiles" --> WEB[Frontend Visualization]
+```
 
 ## 🗺️ UX Data Flow: Parallel Loading Strategy
 
 To ensure a seamless user experience, the system implements a dual-loading strategy:
-1. **Instant Context**: While waiting for ML inference, Dynamic World (DW) TIFF baselines (2015-2025) are immediately served from MinIO via TiTiler.
-2. **Asynchronous Inference**: The ML worker processes heavy classification tasks in the background and overlays high-resolution predictions once complete.
+
+```mermaid
+sequenceDiagram
+    participant U as User (Frontend)
+    participant T as TiTiler (S3 Proxy)
+    participant A as FastAPI
+    participant W as ML Worker
+    participant M as MinIO
+
+    U->>A: Submit Job (AOI + Year)
+    A->>U: Job ID (Accepted)
+
+    par Instant Visual Context
+        U->>T: Fetch Baseline Tiles (DW)
+        T->>M: Stream Baseline COG
+        M->>T:
+        T->>U: Render Baseline Map
+    and Asynchronous Prediction
+        A->>W: Enqueue Task
+        W->>M: Fetch Model & Data
+        W->>W: Run Inference & Post-processing
+        W->>M: Upload Prediction COG
+        loop Polling
+            U->>A: Get Status?
+            A-->>U: Processing...
+        end
+        W->>A: Job Complete
+        U->>A: Get Status?
+        A->>U: Prediction URL
+        U->>T: Fetch Prediction Tiles
+        T->>M: Stream Prediction COG
+        T->>U: Overlay High-Res Result
+    end
+```
+
+## 🚀 Deployment & GitOps Pipeline
+
+```mermaid
+graph LR
+    DEV[Developer] -->|Push| GITEA[Gitea]
+
+    subgraph "CI/CD Pipeline"
+        GITEA -->|Trigger| GA[Gitea Actions]
+        GA -->|Build & Push| DH[Docker Hub: frankchine]
+    end
+
+    subgraph "GitOps Sync"
+        ARGO[ArgoCD] -->|Monitor| GITEA
+        DH -->|Image Pull| K3S[K3s Cluster]
+        ARGO -->|Apply Manifests| K3S
+    end
+```
 
 ## 🛠️ Training Workflow
 
@@ -48,13 +160,6 @@ df = storage.load_dataset('geocrop-datasets', 'batch_1.csv')
 storage.upload_file('model.pkl', 'geocrop-models', 'Zimbabwe_Ensemble_Model.pkl')
 ```
 
-## 🚀 Deployment & GitOps
-
-The platform follows a strict **GitOps** workflow:
-1. All changes are committed to the `geocrop-platform` repository on Gitea.
-2. Gitea Actions build and push containers to Docker Hub (`frankchine`).
-3. ArgoCD monitors the `k8s/base` directory and automatically synchronizes the cluster state.
-
 ## 🖥️ Service Registry
 
 - **Portfolio Frontend**: [portfolio.techarvest.co.zw](https://portfolio.techarvest.co.zw)