Enhance README with Mermaid diagrams for architecture, DFD, and GitOps pipeline
This commit is contained in:
parent
bdc4d52f21
commit
9f3a2c0408
143
README.md
@@ -8,22 +8,134 @@ This project showcases professional skills in **MLOps, Cloud-Native Architecture

The platform is built on a robust, self-hosted Kubernetes (K3s) cluster with a focus on data sovereignty and scalability.

- **Source Control & CI/CD**: [Gitea](https://git.techarvest.co.zw) (Self-hosted GitHub alternative)
- **Infrastructure as Code**: Terraform (Managing K3s Namespaces & Quotas)
- **GitOps**: ArgoCD (Automated deployment from Git to Cluster)
- **Experiment Tracking**: [MLflow](https://ml.techarvest.co.zw) (Model versioning & metrics)
- **Interactive Workspace**: [JupyterLab](https://lab.techarvest.co.zw) (Data science & training)
- **Spatial Database**: Standalone PostgreSQL + PostGIS (Port 5433)
- **Object Storage**: MinIO (S3-compatible storage for datasets, baselines, and models)
- **Frontend**: React 19 + OpenLayers (Parallel loading of baselines and ML predictions)
- **Backend**: FastAPI + Redis Queue (Job orchestration)
- **Visualization**: TiTiler (Dynamic tile server for Cloud Optimized GeoTIFFs)
```mermaid
graph TD
    subgraph "Frontend & Entry"
        WEB[React 19 Frontend]
        ING[Nginx Ingress]
    end

    subgraph "Core Services (geocrop namespace)"
        API[FastAPI Backend]
        RQ[Redis Queue]
        WORKER[ML Inference Worker]
        TILER[TiTiler Dynamic Server]
    end

    subgraph "MLOps & Infra"
        GITEA[Gitea Source Control]
        ARGO[ArgoCD GitOps]
        MLF[MLflow Tracking]
        JUPYTER[JupyterLab Workspace]
    end

    subgraph "Storage & Data"
        MINIO[(MinIO S3 Storage)]
        POSTGIS[(Postgres + PostGIS)]
    end

    %% Flow
    WEB --> ING
    ING --> API
    API --> RQ
    RQ --> WORKER
    WORKER --> MINIO
    WORKER --> POSTGIS
    TILER --> MINIO
    WEB --> TILER
    ARGO --> GITEA
    ARGO --> ING
    JUPYTER --> MINIO
    MLF --> POSTGIS
```
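The FastAPI → Redis Queue → worker hand-off in the diagram can be sketched with an in-process stand-in (Python's `queue.Queue` substitutes for Redis here, and the job fields, status strings, and result key layout are illustrative assumptions, not the project's actual API):

```python
import queue
import uuid

# Stand-in for the Redis queue; the real platform uses Redis for this.
job_queue: "queue.Queue[dict]" = queue.Queue()
job_status: dict = {}

def submit_job(aoi: str, year: int) -> str:
    """FastAPI side: accept a request, enqueue it, return a job ID."""
    job_id = uuid.uuid4().hex
    job_status[job_id] = "PENDING"
    job_queue.put({"id": job_id, "aoi": aoi, "year": year})
    return job_id

def worker_step() -> str:
    """Worker side: pull one job, run inference, record the result key."""
    job = job_queue.get()
    job_status[job["id"]] = "PROCESSING"
    # Real worker: load the model from MinIO, classify, upload a COG to results.
    result_key = f"geocrop-results/{job['id']}.tif"
    job_status[job["id"]] = f"COMPLETE:{result_key}"
    return job["id"]

job_id = submit_job("zimbabwe-aoi", 2024)
done = worker_step()
print(job_status[done])  # COMPLETE:geocrop-results/<job id>.tif
```

The decoupling matters: the API can acknowledge the job immediately while the worker drains the queue at its own pace, which is what makes the parallel-loading UX below possible.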
## 📊 System Data Flow (DFD)

How data moves from raw satellite imagery to final crop-type predictions:
```mermaid
graph LR
    subgraph "External Sources"
        DEA[Digital Earth Africa STAC]
    end

    subgraph "Storage (MinIO)"
        DS[("/geocrop-datasets")]
        BS[("/geocrop-baselines")]
        MD[("/geocrop-models")]
        RS[("/geocrop-results")]
    end

    subgraph "Processing"
        TRAIN[Jupyter Training]
        INFER[Inference Worker]
    end

    %% Data movement
    DEA -- "Sentinel-2 Imagery" --> INFER
    DS -- "CSV Batches" --> TRAIN
    TRAIN -- "Trained Models" --> MD
    MD -- "Model Load" --> INFER
    BS -- "DW TIFFs" --> INFER
    INFER -- "Classification COG" --> RS
    RS -- "Map Tiles" --> WEB[Frontend Visualization]
```
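The final "Map Tiles" hop is TiTiler reading result COGs straight from MinIO. A sketch of how a frontend might build such a tile request (the path shape follows TiTiler's stock `/cog/tiles/...` route; the tile server hostname and object key are assumptions):

```python
from urllib.parse import urlencode

def tile_url(result_key: str, z: int, x: int, y: int,
             titiler: str = "https://tiles.example.com") -> str:
    """Build a TiTiler tile request for a COG stored in MinIO (S3)."""
    # TiTiler takes the source COG as a `url` query parameter.
    query = urlencode({"url": f"s3://{result_key}"})
    return f"{titiler}/cog/tiles/WebMercatorQuad/{z}/{x}/{y}.png?{query}"

url = tile_url("geocrop-results/job42.tif", 7, 75, 70)
print(url)
```

Because tiles are cut on demand from the COG, nothing is pre-rendered: uploading one classification COG to `/geocrop-results` is enough to make the whole map zoomable.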
## 🗺️ UX Data Flow: Parallel Loading Strategy

To ensure a seamless user experience, the system implements a dual-loading strategy:

1. **Instant Context**: While waiting for ML inference, Dynamic World (DW) TIFF baselines (2015-2025) are immediately served from MinIO via TiTiler.
2. **Asynchronous Inference**: The ML worker processes heavy classification tasks in the background and overlays high-resolution predictions once complete.
```mermaid
sequenceDiagram
    participant U as User (Frontend)
    participant T as TiTiler (S3 Proxy)
    participant A as FastAPI
    participant W as ML Worker
    participant M as MinIO

    U->>A: Submit Job (AOI + Year)
    A->>U: Job ID (Accepted)

    par Instant Visual Context
        U->>T: Fetch Baseline Tiles (DW)
        T->>M: Stream Baseline COG
        M-->>T: Baseline COG Data
        T->>U: Render Baseline Map
    and Asynchronous Prediction
        A->>W: Enqueue Task
        W->>M: Fetch Model & Data
        W->>W: Run Inference & Post-processing
        W->>M: Upload Prediction COG
        loop Polling
            U->>A: Get Status?
            A-->>U: Processing...
        end
        W->>A: Job Complete
        U->>A: Get Status?
        A->>U: Prediction URL
        U->>T: Fetch Prediction Tiles
        T->>M: Stream Prediction COG
        T->>U: Overlay High-Res Result
    end
```
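The `par` branches in the sequence diagram map naturally onto concurrent futures: the baseline fetch resolves almost immediately while the prediction future completes later. A minimal sketch (the two functions are placeholders simulating the fast and slow paths, not the project's client code):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_baseline_tiles(year: int) -> str:
    """Fast path: baseline COGs already sit in MinIO, so tiles render at once."""
    return f"baseline-{year} rendered"

def run_inference(job_id: str) -> str:
    """Slow path: classification runs in the background worker."""
    time.sleep(0.2)  # stand-in for heavy inference + post-processing
    return f"prediction-{job_id} overlay ready"

with ThreadPoolExecutor(max_workers=2) as pool:
    baseline = pool.submit(fetch_baseline_tiles, 2024)
    prediction = pool.submit(run_inference, "job42")
    # The map shows the baseline as soon as it resolves...
    print(baseline.result(timeout=1))
    # ...and overlays the high-res prediction when inference completes.
    print(prediction.result(timeout=5))
```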
## 🚀 Deployment & GitOps Pipeline
```mermaid
graph LR
    DEV[Developer] -->|Push| GITEA[Gitea]

    subgraph "CI/CD Pipeline"
        GITEA -->|Trigger| GA[Gitea Actions]
        GA -->|Build & Push| DH["Docker Hub: frankchine"]
    end

    subgraph "GitOps Sync"
        ARGO[ArgoCD] -->|Monitor| GITEA
        DH -->|Image Pull| K3S[K3s Cluster]
        ARGO -->|Apply Manifests| K3S
    end
```
## 🛠️ Training Workflow
@@ -48,13 +160,6 @@ df = storage.load_dataset('geocrop-datasets', 'batch_1.csv')

storage.upload_file('model.pkl', 'geocrop-models', 'Zimbabwe_Ensemble_Model.pkl')
```
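The `storage` helper used above is not defined in this hunk. A minimal stand-in for the interface it implies, with an in-memory dict replacing the MinIO client (bucket names and method signatures follow the snippet; the implementation, and returning plain dicts rather than the DataFrame the real helper presumably returns, are assumptions):

```python
import csv
import io

class Storage:
    """In-memory stand-in mirroring the `storage` interface used above."""

    def __init__(self) -> None:
        self._objects: dict = {}  # (bucket, key) -> bytes

    def upload_file(self, local_name: str, bucket: str, key: str) -> None:
        # Real helper: push the local file to MinIO (e.g. via an S3 client).
        with open(local_name, "rb") as fh:
            self._objects[(bucket, key)] = fh.read()

    def load_dataset(self, bucket: str, key: str) -> list:
        # Real helper: stream the CSV from MinIO, likely into a DataFrame.
        text = self._objects[(bucket, key)].decode()
        return list(csv.DictReader(io.StringIO(text)))
```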
## 🚀 Deployment & GitOps

The platform follows a strict **GitOps** workflow:
1. All changes are committed to the `geocrop-platform` repository on Gitea.
2. Gitea Actions build and push containers to Docker Hub (`frankchine`).
3. ArgoCD monitors the `k8s/base` directory and automatically synchronizes the cluster state.
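Step 3 typically corresponds to an ArgoCD `Application` resource watching `k8s/base`. A sketch of what one might look like for this repository (the resource name, destination namespace, sync options, and the `<owner>` segment of the repo URL are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: geocrop-platform
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.techarvest.co.zw/<owner>/geocrop-platform.git  # owner elided
    targetRevision: main
    path: k8s/base          # directory ArgoCD monitors
  destination:
    server: https://kubernetes.default.svc
    namespace: geocrop
  syncPolicy:
    automated:
      prune: true           # delete resources removed from Git
      selfHeal: true        # revert manual drift in the cluster
```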
## 🖥️ Service Registry

- **Portfolio Frontend**: [portfolio.techarvest.co.zw](https://portfolio.techarvest.co.zw)