# GeoCrop Portfolio App — End-State Checklist, Architecture, and Next Steps

*Last updated: 27 Feb 2026 (Africa/Harare)*

This document captures:

* What’s **already built and verified** in your K3s cluster
* The **full end-state feature checklist** (public + admin)
* The **target architecture** and data flow
* The **next steps** (what to build next, in the order that won’t get you stuck)
* Notes to make this **agent-friendly** (Roo / Minimax execution)

---
## 0) Current progress — what you have done so far (verified)

### 0.1 Cluster + networking

* **K3s cluster running** (1 control-plane + 2 workers)
* **NGINX Ingress Controller installed and running**
  * Ingress controller exposed on worker `vmi3045103` public IP `167.86.68.48`
* **cert-manager installed**
* **Let’s Encrypt prod ClusterIssuer created** (`letsencrypt-prod`, Ready=True)

### 0.2 DNS

A records pointing to `167.86.68.48`:

* `portfolio.techarvest.co.zw`
* `api.portfolio.techarvest.co.zw`
* `minio.portfolio.techarvest.co.zw`
* `console.minio.portfolio.techarvest.co.zw`

### 0.3 Namespace + core services (geocrop)

Namespace:

* `geocrop`

Running components:

* **Redis** (queue/broker)
* **MinIO** (S3 storage) with PVC (30Gi, local-path)
* Placeholder web + API behind Ingress
* TLS certificates for all subdomains (Ready=True)

### 0.4 Connectivity tests (verified)

* `portfolio.techarvest.co.zw` reachable over HTTPS
* `api.portfolio.techarvest.co.zw` reachable over HTTPS
* `console.minio.portfolio.techarvest.co.zw` loads correctly

### 0.5 What you added recently (major progress)

* Uploaded ML model artifact to **MinIO** (`geocrop-models` bucket)
* Implemented working **FastAPI backend** with JWT authentication
* Implemented **Python RQ worker** consuming the Redis queue
* Verified end-to-end async job submission + dummy inference response

### 0.6 Dynamic World baseline migration (completed)

* Configured **rclone** with a Google Drive remote (`gdrive`)
* Copied ~7.9 GiB of Dynamic World seasonal GeoTIFFs (132 files) from Google Drive to the server path `~/geocrop/data/dw_baselines`
* Installed `rio-cogeo`, `rasterio`, `pyproj`, and dependencies
* Converted all baseline GeoTIFFs to **Cloud Optimized GeoTIFFs (COGs)**, output directory: `~/geocrop/data/dw_cogs`

> This is a major milestone: your Dynamic World baselines are now local and converted to COG format, which is required for efficient tiling and MinIO-based serving.

> Note: Your earlier `10-redis.yaml` and `20-minio.yaml` edits suffered some terminal echo corruption, but the K8s objects did apply and are running. We’ll clean the manifests into a proper repo layout next.

---
## 1) End-state: what the app should have (complete checklist)

### 1.1 Public user experience

**Auth & access**

* Login for public users (best for portfolio: **invite-only registration** or “request access”)
* JWT auth (already planned)
* Clear “demo limits” messaging

**AOI selection**

* Leaflet map:
  * Place a marker OR draw a circle (center + radius)
  * Radius slider up to **5 km**
  * Optional polygon draw (but enforce max area / vertex count)
* Manual input:
  * Latitude/longitude center
  * Radius (meters / km)

**Parameters**

* Year chooser: **2015 → present**
* Season chooser:
  * Summer cropping only (Nov 1 → Apr 30) for now
* Model chooser:
  * RandomForest / XGBoost / LightGBM / CatBoost / Ensemble

**Job lifecycle UI**

* Submit job
* Loading/progress screen with stages:
  * Queued → Downloading imagery → Computing indices → Running model → Smoothing → Exporting GeoTIFF → Uploading → Done
* Results page:
  * Map viewer with layer toggles
  * Download links (GeoTIFF only)

**Map layers (toggles)**

* ✅ Refined crop/LULC map (final product) at **10 m**
* ✅ Dynamic World baseline toggle
  * Prefer the **Highest Confidence** composite (as you stated)
* ✅ True colour composite
* ✅ Indices toggles:
  * Peak NDVI
  * Peak EVI
  * Peak SAVI
  * (Optional later: NDMI, NDRE)

**Outputs**

* Download refined result as **GeoTIFF only**
* Optional downloads:
  * Baseline DW clipped to the AOI (GeoTIFF)
  * True colour composite (GeoTIFF)
  * Indices rasters (GeoTIFF)

**Legend / key**

* On-map legend showing your refined classes (color-coded)
* Class list includes:
  * Your refined crop classes (from your image)
  * Plus non-crop landcover classes, so it remains a full LULC map

### 1.2 Processing pipeline requirements

**Validation**

* AOI inside Zimbabwe only
* Radius ≤ 5 km
* Reject overly complex geometries

**Data sources**

* DEA STAC endpoint:
  * `https://explorer.digitalearth.africa/stac/search`
* Dynamic World baseline:
  * Your pre-exported DW GeoTIFFs per year/season (now in Google Drive; migrate to MinIO)

**Core computations**

* Pull imagery from DEA STAC for the selected year + summer season window
* Build feature stack:
  * True colour
  * Indices: NDVI, EVI, SAVI (+ optional NDRE/NDMI)
  * “Peak” index logic (seasonal maximum)
* Load the DW baseline for the same year/season, clip to the AOI

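The index formulas and the “peak” logic above can be sketched as plain functions. This is a minimal illustration on single pixel values; in the worker the same math would run over whole numpy arrays. The SAVI `L` factor and the EVI coefficients below are the commonly used defaults, not values fixed anywhere in this doc.

```python
# Spectral index formulas and the "peak" (seasonal maximum) rule, per pixel.

def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)

def evi(nir: float, red: float, blue: float) -> float:
    # Standard Sentinel-2 style coefficients (G=2.5, C1=6, C2=7.5, L=1).
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def savi(nir: float, red: float, L: float = 0.5) -> float:
    return (1.0 + L) * (nir - red) / (nir + red + L)

def peak(series):
    """'Peak' index = per-pixel maximum across all scenes in the season."""
    return max(series)

# Example: NDVI for three acquisitions in the Nov-Apr window, then the peak.
scene_ndvi = [ndvi(0.45, 0.30), ndvi(0.60, 0.20), ndvi(0.50, 0.25)]
peak_ndvi = peak(scene_ndvi)
```

The per-scene values stay available if you later want a confidence or phenology layer; only the maximum feeds the feature stack.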
**ML refinement**

* Take the DW baseline + EO features and run the selected ML model
* Refine crops into crop-specific classes
* Keep non-crop classes to output a full LULC map

**Neighborhood smoothing**

* Majority filter rule:
  * If a pixel is surrounded by a majority class, set it to that majority class
* Configurable kernel sizes: 3×3 / 5×5

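The majority rule can be sketched on a small 2D class grid. This is a pure-Python illustration; on a real raster you would apply the same rule with numpy/scipy (e.g. a window filter), but the logic is identical: replace a pixel only when a strict majority of its k×k neighborhood agrees.

```python
# Majority filter on a 2D grid of class codes (lists of ints).
from collections import Counter

def majority_filter(grid, k=3):
    h, w, r = len(grid), len(grid[0]), k // 2
    out = [row[:] for row in grid]
    for y in range(h):
        for x in range(w):
            # k x k window, clipped at the raster edges.
            neighborhood = [
                grid[ny][nx]
                for ny in range(max(0, y - r), min(h, y + r + 1))
                for nx in range(max(0, x - r), min(w, x + r + 1))
            ]
            cls, count = Counter(neighborhood).most_common(1)[0]
            # Only overwrite when a strict majority of the window agrees.
            if count > len(neighborhood) // 2:
                out[y][x] = cls
    return out

# A lone "2" surrounded by "1"s gets smoothed away with a 3x3 kernel.
smoothed = majority_filter([[1, 1, 1], [1, 2, 1], [1, 1, 1]], k=3)
```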
**Export and storage**

* Export refined output as GeoTIFF (prefer **Cloud Optimized GeoTIFF**)
* Save to MinIO
* Provide **signed URLs** for downloads

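A sketch of that export step, assuming `rio-cogeo` (already installed for the baseline conversion) and the `minio` Python client. The object-key layout, the in-cluster endpoint, and the credentials are all placeholders, not deployed values.

```python
# Export the refined raster as a COG and push it to the geocrop-results bucket.

def result_object_key(job_id: str) -> str:
    """Deterministic object key for a job's refined output COG (assumed layout)."""
    return f"jobs/{job_id}/refined_lulc.tif"

def export_and_upload(job_id: str, src_tif: str, cog_tif: str) -> str:
    from rio_cogeo.cogeo import cog_translate    # third-party
    from rio_cogeo.profiles import cog_profiles  # third-party
    from minio import Minio                      # third-party

    # 1) Rewrite the plain GeoTIFF as a Cloud Optimized GeoTIFF.
    cog_translate(src_tif, cog_tif, cog_profiles.get("deflate"))

    # 2) Upload to MinIO; credentials would come from a K8s Secret,
    #    and the endpoint here is a placeholder in-cluster service name.
    client = Minio("minio.geocrop.svc:9000",
                   access_key="...", secret_key="...", secure=False)
    key = result_object_key(job_id)
    client.fput_object("geocrop-results", key, cog_tif)
    return key
```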
### 1.3 Admin capabilities

* Admin login (role-based)
* Dataset uploads:
  * Upload training CSVs and/or labeled GeoTIFFs
  * Version datasets (v1, v2…)
* Retraining:
  * Trigger model retraining using a Kubernetes Job
  * Save trained models to MinIO (versioned)
  * Promote a model to “production default”
* Job monitoring:
  * See queued/running/failed jobs, timing, logs
* User management:
  * Invite/create/disable users
  * Per-user limits

### 1.4 Reliability + portfolio safety (high value)

**Compute control**

* Global concurrency cap (cluster-wide): e.g. **2 jobs running**
* Per-user daily limits: e.g. **3–5 jobs/day**
* Job timeouts: kill jobs running longer than 25 minutes

**Caching**

* Deterministic caching:
  * If (AOI + year + season + model) repeats → return the cached output

**Resilience**

* Queue-based async processing (RQ)
* Retry logic for STAC fetches
* Clean error reporting to the user

### 1.5 Security

* HTTPS everywhere (already done)
* JWT auth
* RBAC roles: admin vs user
* K8s Secrets for:
  * JWT secret
  * MinIO credentials
  * DB credentials
* MinIO should not be publicly writable
* Downloads are signed URLs only

### 1.6 Nice-to-have portfolio boosters

* Swipe/slider compare: refined vs DW baseline
* Confidence raster toggle (if the model outputs probabilities)
* Stats panel:
  * Area per class (ha)
* Metadata JSON (small but very useful even if downloads are “GeoTIFF only”):
  * job_id, timestamp, year/season, model version, AOI, CRS, pixel size

---
## 2) Recommendation: “best” login + limiting approach for a portfolio

Because this is a portfolio project on VPS resources:

**Best default**

* **Invite-only accounts** (you create accounts or send invites)
* Simple password login (JWT)
* Hard limits:
  * Global: 1–2 jobs running
  * Per user: 3 jobs/day

**Why invite-only is best for a portfolio**

* It prevents random abuse from your CV link
* It keeps your compute predictable
* It still demonstrates full auth + quota features

**Optional later**

* Public “Request Access” form (email + reason)
* Or Google OAuth (more work, not necessary for a portfolio)

---
## 3) Target architecture (final)

### 3.1 Components

* **Frontend**: React + Leaflet
  * Select AOI + params
  * Submit job
  * Poll status
  * Render map layers from tiles
  * Download GeoTIFF
* **API**: FastAPI
  * Auth (JWT)
  * Validate AOI + quotas
  * Create job records
  * Push jobs to the Redis queue
  * Generate signed URLs
* **Worker**: Python RQ worker
  * Pull job
  * Query DEA STAC
  * Compute features/indices
  * Load DW baseline
  * Run model inference
  * Neighborhood smoothing
  * Write outputs as COG GeoTIFF
  * Update job status
* **Redis**
  * Job queue
* **MinIO**
  * Baselines (DW)
  * Models
  * Results (COGs)
* **Database (recommended)**
  * Postgres (preferred) for:
    * users, roles
    * jobs, params
    * quota usage
    * model registry metadata
* **Tile server**
  * TiTiler or a rio-tiler-based service
  * Serves tiles from MinIO-hosted COGs

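The API → Redis → worker hand-off above can be sketched with RQ (the queue/worker stack already running in the cluster). The queue name, Redis host, and worker function path here are illustrative, not the deployed ones.

```python
# API-side job submission: build a validated payload, then enqueue it.

def build_job_payload(user_id, lat, lon, radius_m, year, model):
    """The params the API validates and the worker consumes."""
    return {"user_id": user_id, "lat": lat, "lon": lon,
            "radius_m": radius_m, "year": year,
            "season": "summer", "model": model}

def enqueue_job(payload):
    from redis import Redis  # third-party
    from rq import Queue     # third-party
    q = Queue("geocrop", connection=Redis(host="redis.geocrop.svc"))
    # The string path points at a worker function such as
    # worker.pipeline.run_job(payload) — a hypothetical module name.
    return q.enqueue("worker.pipeline.run_job", payload, job_timeout=25 * 60)
```

The `job_timeout` matches the 25-minute kill rule from section 1.4, so RQ itself enforces the timeout.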
### 3.2 Buckets (MinIO)

* `geocrop-baselines` (DW GeoTIFF/COG)
* `geocrop-models` (pkl/onnx + metadata)
* `geocrop-results` (output COGs)
* `geocrop-datasets` (training data uploads)

### 3.3 Subdomains

* `portfolio.techarvest.co.zw` → frontend
* `api.portfolio.techarvest.co.zw` → FastAPI
* `tiles.portfolio.techarvest.co.zw` → TiTiler (recommended addition)
* `minio.portfolio.techarvest.co.zw` → MinIO API (private)
* `console.minio.portfolio.techarvest.co.zw` → MinIO Console (admin-only)

---
## 4) What to build next (exact order)

### Phase A — Clean repo + manifests (so you stop fighting YAML)

1. Create a Git repo layout:

   * `geocrop/`
     * `k8s/`
       * `base/`
       * `prod/`
     * `api/`
     * `worker/`
     * `web/`

2. Move your current YAML into files with predictable names:

   * `k8s/base/00-namespace.yaml`
   * `k8s/base/10-redis.yaml`
   * `k8s/base/20-minio.yaml`
   * `k8s/base/30-api.yaml`
   * `k8s/base/40-worker.yaml`
   * `k8s/base/50-web.yaml`
   * `k8s/base/60-ingress.yaml`

3. Add `kubectl apply -k` using Kustomize later (optional).

### Phase B — Make the API real (replace hello-api)

4. Build FastAPI endpoints:

   * `POST /auth/register` (admin-only or invite)
   * `POST /auth/login`
   * `POST /jobs` (create job)
   * `GET /jobs/{job_id}` (status)
   * `GET /jobs/{job_id}/download` (signed URL)
   * `GET /models` (list available models)

5. Add quotas + a concurrency guard:

   * Global running jobs ≤ 2
   * Per-user jobs/day ≤ 3–5

6. Store job status:

   * Start with Redis
   * Upgrade to Postgres when stable

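The quota rules in step 5 are easiest to keep correct as a pure policy function, tested independently of storage. In the API, the two counters would live in Redis (for example an `INCR` on a per-user key that expires at midnight, plus a count of jobs in the running state); the function below only encodes the decision.

```python
# Policy check for job submission: global concurrency cap + per-user daily cap.

MAX_GLOBAL_RUNNING = 2   # cluster-wide running jobs
MAX_JOBS_PER_DAY = 3     # per-user daily quota

def can_submit(global_running: int, user_jobs_today: int):
    """Return (allowed, reason) for a new job request."""
    if global_running >= MAX_GLOBAL_RUNNING:
        return False, "cluster busy, try again shortly"
    if user_jobs_today >= MAX_JOBS_PER_DAY:
        return False, "daily job limit reached"
    return True, "ok"
```

`POST /jobs` would call this before enqueueing and return HTTP 429 with the reason when `allowed` is false.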
### Phase C — Worker: “real pipeline v1”

7. Implement DEA STAC search + download/clip for the AOI:

   * Sentinel-2 (`s2_l2a`) is likely easiest first
   * Compute indices (NDVI, EVI, SAVI)
   * Compute peak indices (season max)

8. Load the DW baseline GeoTIFF for the year:

   * Step 1: upload DW GeoTIFFs from Google Drive to MinIO
   * Step 2: clip to the AOI

9. Run model inference:

   * Load the model from MinIO
   * Apply it to the feature stack
   * Output the refined label raster

10. Neighborhood smoothing:

    * Majority filter 3×3 / 5×5 (configurable)

11. Export the result as GeoTIFF (prefer COG):

    * Write to temp
    * Upload to MinIO

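Step 7 can be sketched with `pystac-client` against the DEA endpoint from section 1.2. The season-window helper is pure and testable; the cloud-cover threshold is an assumption to tune later, not something fixed in this doc.

```python
# STAC search for the summer cropping window over an AOI bounding box.

def season_window(year: int) -> str:
    """Summer cropping window: Nov 1 of `year` to Apr 30 of `year` + 1."""
    return f"{year}-11-01/{year + 1}-04-30"

def search_s2_items(bbox, year):
    from pystac_client import Client  # third-party
    # Catalog root; /search is what the client calls under the hood.
    client = Client.open("https://explorer.digitalearth.africa/stac")
    search = client.search(
        collections=["s2_l2a"],
        bbox=bbox,                             # (min_lon, min_lat, max_lon, max_lat)
        datetime=season_window(year),
        query={"eo:cloud_cover": {"lt": 20}},  # assumed threshold
    )
    return list(search.items())
```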
### Phase D — Tiles + map UI

12. Deploy a TiTiler service and expose it at:

    * `tiles.portfolio...`

13. Frontend:

    * Leaflet selection + coords input
    * Submit job + poll
    * Add layers from tile URLs
    * Legend + downloads

### Phase E — Admin portal + retraining

14. Admin UI:

    * Dataset upload
    * Model list + promote

15. Retraining pipeline:

    * A Kubernetes Job that:
      * pulls the dataset from MinIO
      * trains models
      * saves the artifact to MinIO
      * registers the new model version

---
## 5) Important “you might forget” items (add now)

### 5.1 Model registry metadata

For each model artifact, store:

* model_name
* version
* training datasets used
* training timestamp
* expected feature list
* class mapping

### 5.2 Class mapping (must be consistent)

Create a single `classes.json` used by:

* training
* inference
* the frontend legend

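One possible shape for `classes.json`: integer class codes mapped to a label and a legend colour, so training, inference, and the Leaflet legend all read the same file. The class names and colours below are placeholders — the real list comes from your refined class scheme.

```python
# Hypothetical classes.json structure; the entries are placeholders, not
# the project's actual class scheme.
import json

classes = {
    "1": {"label": "Maize", "color": "#f5d142"},
    "2": {"label": "Tobacco", "color": "#8fbf4d"},
    "10": {"label": "Water", "color": "#2e6bd6"},
    "11": {"label": "Built-up", "color": "#b0413e"},
}

# Written once into the repo; every consumer loads it rather than
# hard-coding its own copy of the mapping.
classes_json = json.dumps(classes, indent=2, sort_keys=True)
```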
### 5.3 Zimbabwe boundary validation

Use a Zimbabwe boundary polygon in the API/worker to validate the AOI.

* Best: store the boundary geometry as GeoJSON in the repo.

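The check itself is point-in-polygon on the AOI center. A stdlib ray-casting sketch is below; in practice you would load the GeoJSON and use shapely (`shape(geom).contains(Point(lon, lat))`). The tiny rectangle here is a stand-in for the real boundary, roughly Zimbabwe's lon/lat extent.

```python
# Ray-casting point-in-polygon check for AOI validation.

def point_in_polygon(lon, lat, ring):
    """ring: list of (lon, lat) vertices of a closed polygon exterior."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal ray to the right of the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > lon:
                inside = not inside
    return inside

# Stand-in rectangle approximating Zimbabwe's bounding box (NOT the real border).
zw_ring = [(25.0, -22.5), (33.1, -22.5), (33.1, -15.6), (25.0, -15.6)]
harare_ok = point_in_polygon(31.05, -17.83, zw_ring)
```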
### 5.4 Deterministic job cache key

Hash:

* year
* season
* model_version
* center lat/lon
* radius

If the key exists → return the cached result (huge compute saver).

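A minimal sketch of that key: normalize the parameters, serialize them canonically, and hash. Rounding the coordinates keeps “the same place clicked twice” from producing two keys; the rounding precision is an assumption to tune.

```python
# Deterministic cache key over the normalized job parameters.
import hashlib
import json

def job_cache_key(year, season, model_version, lat, lon, radius_m):
    params = {
        "year": year, "season": season, "model": model_version,
        # ~1 m precision at the equator; coarsen if you want more cache hits.
        "lat": round(lat, 5), "lon": round(lon, 5),
        "radius_m": int(radius_m),
    }
    # sort_keys makes the serialization canonical, so the hash is stable.
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

The key doubles as the results folder name in `geocrop-results`, so a cache hit is just “does this object exist”.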
### 5.5 Signed downloads

Never expose MinIO objects publicly.

* The API generates signed GET URLs that expire.

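To show the mechanism, here is a stdlib illustration of expiring signed URLs using HMAC. This is the concept only — in the real API you would call the MinIO client's `presigned_get_object(bucket, key, expires=...)` rather than rolling your own scheme.

```python
# Expiring signed URLs: sign (path, expiry) with a shared secret, verify later.
import hashlib
import hmac
import time

SECRET = b"from-k8s-secret"  # placeholder; load from a K8s Secret in practice

def sign_download(path, ttl_s=900, now=None):
    expires = int((now if now is not None else time.time()) + ttl_s)
    sig = hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def verify_download(url, now=None):
    path, _, qs = url.partition("?")
    fields = dict(p.split("=") for p in qs.split("&"))
    expires = int(fields["expires"])
    expected = hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()
    ts = now if now is not None else time.time()
    # compare_digest avoids timing side-channels on the signature check.
    return ts < expires and hmac.compare_digest(fields["sig"], expected)
```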
---

## 6) Open items to decide (tomorrow)

1. **Frontend framework**: React + Vite (recommended)
2. **Tile approach**: TiTiler vs pre-rendered PNGs (TiTiler looks much more professional)
3. **DB**: add Postgres now vs later (recommended soon for quotas + user management)
4. **Which DEA collections** to use for the first version:
   * Start with Sentinel-2 L2A (`s2_l2a`)
   * Later add a Landsat fallback
5. **Model input features**: exact feature vector and normalization rules

---
## 7) Roo/Minimax execution notes (so it doesn’t get confused)

* Treat the current cluster as **production-like**
* All services live in namespace: `geocrop`
* Ingress class: `nginx`
* ClusterIssuer: `letsencrypt-prod`
* Public IP of the ingress node: `167.86.68.48`
* Subdomains already configured and reachable
* Next change should be swapping placeholder services for real deployments

---
## 8) Short summary

You already have the hard part done:

* K3s + ingress + TLS + DNS work
* MinIO + Redis work
* You proved async jobs can be queued and processed

Next is mostly **application engineering**:

* Replace the placeholder web/API with the real app
* Add job status + quotas
* Implement DEA STAC fetch + DW baseline clipping + ML inference
* Export COG + tile server + map UI