# GeoCrop - Sovereign MLOps Platform

GeoCrop is a production-grade, self-hosted ML platform for crop-type classification in Zimbabwe. It ingests Sentinel-2 satellite imagery (DEA STAC), computes 51 spectral/phenological features, and uses ensemble ML models to generate high-resolution Cloud Optimized GeoTIFFs (COGs).

## 🚀 System Architecture

The platform follows a **Sovereign MLOps** philosophy: the entire lifecycle, from source control and experiment tracking to inference and GitOps, is hosted on a private K3s cluster.

- **Frontend**: React 19 + OpenLayers/Leaflet (Portfolio & App).
- **Backend**: FastAPI REST API + Redis/RQ Job Queue.
- **ML Engine**: Python Inference Workers + XGBoost/CatBoost/LightGBM Ensembles.
- **Infrastructure**:
  - **GitOps**: ArgoCD (CD) + Gitea (Source Control & CI).
  - **Experiment Tracking**: MLflow (Postgres/MinIO backend).
  - **Development**: JupyterLab (integrated with MinIO).
  - **Storage**: MinIO (S3-compatible) for datasets, models, and results.
  - **Database**: Postgres + PostGIS for spatial metadata and app state.

## 🛠️ Building and Running

### Development
```bash
# Run each block from the repository root, in its own terminal.

# Frontend Development
cd apps/web && npm install && npm run dev

# API Development
cd apps/api && pip install -r requirements.txt
uvicorn main:app --reload

# Worker Development
cd apps/worker && pip install -r requirements.txt
python worker.py --worker
```

### GitOps Workflow (CI/CD)
1. **Push** code to Gitea (`git.techarvest.co.zw`).
2. **CI**: Gitea Actions builds images with **Kaniko** (no Docker-in-Docker). Images are tagged with the Git commit SHA.
3. **CD**: The CI pipeline updates `k8s/base/kustomization.yaml` with the new SHA tags. ArgoCD detects the change and reconciles the cluster state.

## 🛑 Engineering Policy
- **STRICT GitOps:** All approved changes MUST be committed and pushed to Gitea.
- **NO Manual Deploys:** All deployments MUST occur ONLY through the CI/CD pipeline via ArgoCD.
- **NO Hotfixes:** Direct manual modification of running containers or K8s resources is forbidden.
- **Deterministic Tagging:** Never rely on `:latest` for production rollouts; always use SHA-based tags managed by CI.
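
The deterministic-tagging rule can be checked mechanically before an image reference is written anywhere. The following is a minimal sketch, not the project's actual CI code: `image_ref` and the registry name in the usage note are hypothetical, and the 7 to 40 hex-character pattern is an assumption about how abbreviated or full commit SHAs appear in tags.

```python
import re

# Assumption: tags are abbreviated or full Git commit SHAs,
# i.e. 7 to 40 lowercase hexadecimal characters.
_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def image_ref(repository: str, commit_sha: str) -> str:
    """Build an image reference pinned to a commit SHA (hypothetical helper).

    Anything that is not a SHA-like tag, including ``latest``, is rejected.
    """
    if not _SHA_RE.fullmatch(commit_sha):
        raise ValueError(f"not a commit SHA: {commit_sha!r}")
    return f"{repository}:{commit_sha}"
```

For example, `image_ref("registry.example/geocrop-api", "3f2a9c1")` returns a pinned reference, while passing `"latest"` raises `ValueError`.
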

### Kubernetes Deployment
```bash
# Manual apply (only if ArgoCD auto-sync is not in use; routine deployments go through ArgoCD, see Engineering Policy)
kubectl apply -k k8s/base/
```

## 📐 Development Conventions

### Critical Patterns (Non-Obvious)
- **Kubernetes Only:** Focus exclusively on resources managed by Kubernetes (pods, services, ingresses, etc.). **NEVER** modify host-level Nginx configurations (`/etc/nginx/`), CloudPanel settings, or system services outside the cluster.
- **AOI Format:** Always use a `(lon, lat, radius_m)` tuple. Longitude comes first.
- **Season Window:** September 1 to May 31 (Zimbabwe summer season). `year=2022` means 2022-09-01 to 2023-05-31.
- **Feature Order:** `FEATURE_ORDER_V1` (51 features) is immutable; changing it breaks model compatibility.
- **Storage Contract:** Use `geocrop-results` for outputs and `geocrop-models` for serialized artifacts.
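
The AOI and season-window conventions above are easy to get subtly wrong, so here is a minimal sketch of both. The names `AOI`, `validate_aoi`, and `season_window` are hypothetical and not taken from the codebase. The bounding box is a rough, padded approximation of Zimbabwe's extent, included only as an assumption to show how a swapped `(lat, lon)` pair could be caught.

```python
from datetime import date
from typing import NamedTuple, Tuple


class AOI(NamedTuple):
    """Area of interest. Field order mirrors the convention: longitude first."""
    lon: float
    lat: float
    radius_m: float


# Assumption: rough, padded bounding box for Zimbabwe (degrees).
_LON_RANGE = (25.0, 34.0)
_LAT_RANGE = (-23.0, -15.0)


def validate_aoi(aoi: AOI) -> AOI:
    """Reject AOIs outside the assumed box or with a non-positive radius."""
    if not _LON_RANGE[0] <= aoi.lon <= _LON_RANGE[1]:
        raise ValueError(f"longitude out of range (lat/lon swapped?): {aoi.lon}")
    if not _LAT_RANGE[0] <= aoi.lat <= _LAT_RANGE[1]:
        raise ValueError(f"latitude out of range: {aoi.lat}")
    if aoi.radius_m <= 0:
        raise ValueError(f"radius_m must be positive: {aoi.radius_m}")
    return aoi


def season_window(year: int) -> Tuple[date, date]:
    """Map a season year to its window: Sept 1 of `year` to May 31 of `year + 1`."""
    return date(year, 9, 1), date(year + 1, 5, 31)
```

With these definitions, `season_window(2022)` gives the 2022-09-01 to 2023-05-31 window described above, and `validate_aoi(AOI(-17.83, 31.05, 5000.0))` raises because the first value is not a plausible longitude for the region.
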

### Storage Layout (MinIO)
- `geocrop-models/`: ML model `.pkl` files and MLflow artifacts.
- `geocrop-baselines/`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results/`: Output COGs (`results/<job_id>/...`).
- `geocrop-datasets/`: Training CSVs and ground-truth labels.
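
Centralizing bucket names and key prefixes in one place helps keep callers from drifting away from this layout. The sketch below is illustrative only: `results_prefix` and `s3_uri` are hypothetical helpers, and it assumes the `results/<job_id>/` prefix lives inside the `geocrop-results` bucket. File names below the prefix are not specified here, so the sketch stops at the prefix.

```python
RESULTS_BUCKET = "geocrop-results"
MODELS_BUCKET = "geocrop-models"


def results_prefix(job_id: str) -> str:
    """Return the key prefix for one job's outputs: results/<job_id>/."""
    # A slash in the job ID would silently create extra path levels.
    if not job_id or "/" in job_id:
        raise ValueError(f"invalid job_id: {job_id!r}")
    return f"results/{job_id}/"


def s3_uri(bucket: str, key: str) -> str:
    """Format an S3-style URI for a bucket and object key."""
    return f"s3://{bucket}/{key}"
```

For instance, `s3_uri(RESULTS_BUCKET, results_prefix("abc123"))` yields the location under which that job's output COGs would be written.
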

## 📂 Key Files
- `apps/web/src/App.tsx`: Main React entry point with Portfolio/App view logic.
- `apps/worker/worker.py`: Core orchestration of the inference pipeline.
- `k8s/base/`: GitOps manifests for all services (ArgoCD tracking root).
- `k8s/argocd-app.yaml`: ArgoCD Application definition for GeoCrop.
- `.gitea/workflows/build-push.yaml`: CI pipeline for Docker builds.

## 🌐 Infrastructure (Endpoints)
- **Frontend**: `portfolio.techarvest.co.zw`
- **API**: `api.portfolio.techarvest.co.zw`
- **Gitea**: `git.techarvest.co.zw`
- **ArgoCD**: `cd.techarvest.co.zw`
- **MLflow**: `ml.techarvest.co.zw`
- **Jupyter**: `lab.techarvest.co.zw`
- **Tiler**: `tiles.portfolio.techarvest.co.zw`
- **MinIO**: `minio.portfolio.techarvest.co.zw`

### 🛑 STRICT ENGINEERING POLICY
* All approved changes MUST be committed and pushed to Gitea.
* All deployments MUST occur ONLY through the CI/CD pipeline via ArgoCD.
* Direct manual server modifications are forbidden.
* No bypassing ArgoCD.
* No hotfixes directly on running containers.
* Infrastructure state must remain GitOps-managed.