GeoCrop Portfolio App — End-State Checklist, Architecture, and Next Steps

Last updated: 27 Feb 2026 (Africa/Harare)

This document captures:

  • What's already built and verified in your K3s cluster
  • The full end-state feature checklist (public + admin)
  • The target architecture and data flow
  • The next steps (what to build next, in the order that won't get you stuck)
  • Notes to make this agent-friendly (Roo / Minimax execution)

0) Current progress — what you have done so far (verified)

0.1 Cluster + networking

  • K3s cluster running (1 control-plane + 2 workers)

  • NGINX Ingress Controller installed and running

    • Ingress controller exposed on worker vmi3045103 public IP 167.86.68.48
  • cert-manager installed

  • Let's Encrypt prod ClusterIssuer created (letsencrypt-prod) and is Ready=True

0.2 DNS

A records pointing to 167.86.68.48:

  • portfolio.techarvest.co.zw
  • api.portfolio.techarvest.co.zw
  • minio.portfolio.techarvest.co.zw
  • console.minio.portfolio.techarvest.co.zw

0.3 Namespace + core services (geocrop)

Namespace:

  • geocrop

Running components:

  • Redis (queue/broker)
  • MinIO (S3 storage) with PVC (30Gi, local-path)
  • Placeholder web + API behind Ingress
  • TLS certificates for all subdomains (Ready=True)

0.4 Connectivity tests (verified)

  • portfolio.techarvest.co.zw reachable over HTTPS
  • api.portfolio.techarvest.co.zw reachable over HTTPS
  • console.minio.portfolio.techarvest.co.zw loads correctly

0.5 What you added recently (major progress)

  • Uploaded ML model artifact to MinIO (geocrop-models bucket)
  • Implemented working FastAPI backend with JWT authentication
  • Implemented Python RQ worker consuming Redis queue
  • Verified end-to-end async job submission + dummy inference response

0.6 Dynamic World Baseline Migration (Completed)

  • Configured rclone with Google Drive remote (gdrive)

  • Successfully copied ~7.9 GiB of Dynamic World seasonal GeoTIFFs (132 files) from Google Drive to server path:

    • ~/geocrop/data/dw_baselines
  • Installed rio-cogeo, rasterio, pyproj, and dependencies

  • Converted all baseline GeoTIFFs to Cloud Optimized GeoTIFFs (COGs):

    • Output directory: ~/geocrop/data/dw_cogs

This is a major milestone: your Dynamic World baselines are now local and converted to COG format, which is required for efficient tiling and MinIO-based serving.

Note: your earlier edits to 10-redis.yaml and 20-minio.yaml suffered some terminal echo corruption, but the K8s objects did apply and are running. We'll clean the manifests into a proper repo layout next.


1) End-state: what the app should have (complete checklist)

1.1 Public user experience

Auth & access

  • Login for public users (best for portfolio: invite-only registration or “request access”)
  • JWT auth (already planned)
  • Clear “demo limits” messaging

AOI selection

  • Leaflet map:

    • Place a marker OR draw a circle (center + radius)
    • Radius slider up to 5 km
    • Optional polygon draw (but enforce max area / vertex count)
  • Manual input:

    • Latitude/Longitude center
    • Radius (meters / km)

Parameters

  • Year chooser: 2015 → present

  • Season chooser:

    • Summer cropping only (Nov 1 → Apr 30) for now
  • Model chooser:

    • RandomForest / XGBoost / LightGBM / CatBoost / Ensemble

Job lifecycle UI

  • Submit job

  • Loading/progress screen with stages:

    • Queued → Downloading imagery → Computing indices → Running model → Smoothing → Exporting GeoTIFF → Uploading → Done
  • Results page:

    • Map viewer with layer toggles
    • Download links (GeoTIFF only)

Map layers (toggles)

  • Refined crop/LULC map (final product) at 10m

  • Dynamic World baseline toggle

    • Prefer Highest Confidence composite (as you stated)
  • True colour composite

  • Indices toggles:

    • Peak NDVI
    • Peak EVI
    • Peak SAVI
    • (Optional later: NDMI, NDRE)

Outputs

  • Download refined result as GeoTIFF only

  • Optional downloads:

    • Baseline DW clipped AOI (GeoTIFF)
    • True colour composite (GeoTIFF)
    • Indices rasters (GeoTIFF)

Legend / key

  • On-map legend showing your refined classes (color-coded)

  • Class list includes:

    • Your refined crop classes (from your image)
    • Plus non-crop landcover classes so it remains full LULC

1.2 Processing pipeline requirements

Validation

  • AOI inside Zimbabwe only
  • Radius ≤ 5 km
  • Reject overly complex geometries

Data sources

  • DEA STAC endpoint:

    • https://explorer.digitalearth.africa/stac/search
  • Dynamic World baseline:

    • Your pre-exported DW GeoTIFFs per year/season (now converted to COGs on the server; migrate to MinIO)

Core computations

  • Pull imagery from DEA STAC for selected year + summer season window

  • Build feature stack:

    • True colour
    • Indices: NDVI, EVI, SAVI (+ optional NDRE/NDMI)
    • “Peak” index logic (seasonal maximum)
  • Load DW baseline for the same year/season, clip to AOI
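
The peak-index logic above can be sketched in NumPy; the array layout (time, height, width) and 0–1 reflectance scaling are assumptions, while the NDVI/EVI/SAVI formulas are the standard ones:

```python
import numpy as np

def peak_indices(red, nir, blue, L=0.5):
    """Compute per-pixel seasonal-peak NDVI/EVI/SAVI.

    red/nir/blue: float arrays of shape (time, height, width),
    surface reflectance scaled to 0..1 (assumed layout).
    Returns (height, width) arrays: the seasonal maximum of each
    index, matching the "peak" logic described above.
    """
    eps = 1e-6  # avoid division by zero
    ndvi = (nir - red) / (nir + red + eps)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0 + eps)
    savi = (1.0 + L) * (nir - red) / (nir + red + L + eps)
    # "Peak" = maximum over the season (time axis 0), ignoring NaN scenes
    return {
        "peak_ndvi": np.nanmax(ndvi, axis=0),
        "peak_evi": np.nanmax(evi, axis=0),
        "peak_savi": np.nanmax(savi, axis=0),
    }
```

The worker would call this once per AOI after stacking all cloud-masked scenes in the season window.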

ML refinement

  • Take baseline DW + EO features and run selected ML model
  • Refine crops into crop-specific classes
  • Keep non-crop classes to output full LULC map

Neighborhood smoothing

  • Majority filter rule:

    • If pixel is surrounded by majority class, set it to majority class
  • Configurable kernel sizes: 3×3 / 5×5
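
A minimal sketch of the majority filter using SciPy; edge handling and tie-breaking (lowest class id wins) are simplifying assumptions:

```python
import numpy as np
from scipy.ndimage import generic_filter

def majority_filter(labels: np.ndarray, size: int = 3) -> np.ndarray:
    """Replace each pixel with the majority class in its size x size window.

    labels: 2-D raster of non-negative integer class ids.
    size:   kernel width, e.g. 3 (3x3) or 5 (5x5).
    Edges use nearest-pixel padding; ties resolve to the lowest id
    (np.bincount(...).argmax() behaviour).
    """
    def mode(window: np.ndarray) -> int:
        # window arrives as a flattened float buffer; count class votes
        return int(np.bincount(window.astype(np.int64)).argmax())

    return generic_filter(labels, mode, size=size, mode="nearest")
```

`generic_filter` with a Python callback is slow on large rasters; it is fine for a ≤5 km AOI, and can be swapped for a vectorised implementation later without changing the rule.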

Export and storage

  • Export refined output as GeoTIFF (prefer Cloud Optimized GeoTIFF)
  • Save to MinIO
  • Provide signed URLs for downloads

1.3 Admin capabilities

  • Admin login (role-based)

  • Dataset uploads:

    • Upload training CSVs and/or labeled GeoTIFFs
    • Version datasets (v1, v2…)
  • Retraining:

    • Trigger model retraining using Kubernetes Job
    • Save trained models to MinIO (versioned)
    • Promote a model to “production default”
  • Job monitoring:

    • See queue/running/failed jobs, timing, logs
  • User management:

    • Invite/create/disable users
    • Per-user limits

1.4 Reliability + portfolio safety (high value)

Compute control

  • Global concurrency cap (cluster-wide): e.g. 2 jobs running
  • Per-user daily limits: e.g. 3–5 jobs/day
  • Job timeouts: kill jobs > 25 minutes

Caching

  • Deterministic caching:

    • If (AOI + year + season + model) repeats → return cached output

Resilience

  • Queue-based async processing (RQ)
  • Retry logic for STAC fetch
  • Clean error reporting to user
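
The STAC retry can be a small exponential-backoff decorator (pure Python; attempt counts and delays are illustrative):

```python
import time
from functools import wraps

def with_retries(attempts: int = 4, base_delay: float = 1.0):
    """Retry a flaky call (e.g. a STAC search) with exponential backoff."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of retries: surface a clean error to the user
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

Decorate the worker's fetch function with `@with_retries(attempts=3)`; the final re-raise is what feeds the "clean error reporting" path.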

1.5 Security

  • HTTPS everywhere (already done)

  • JWT auth

  • RBAC roles: admin vs user

  • K8s Secrets for:

    • JWT secret
    • MinIO credentials
    • DB credentials
  • MinIO should not be publicly writable

  • Downloads are signed URLs only

1.6 Nice-to-have portfolio boosters

  • Swipe/slider compare: Refined vs DW baseline

  • Confidence raster toggle (if model outputs probabilities)

  • Stats panel:

    • area per class (ha)
  • Metadata JSON (small but very useful even if downloads are “GeoTIFF only”)

    • job_id, timestamp, year/season, model version, AOI, CRS, pixel size

2) Recommendation: “best” login + limiting approach for a portfolio

Because this is a portfolio project on VPS resources:

Best default

  • Invite-only accounts (you create accounts or send invites)

  • Simple password login (JWT)

  • Hard limits:

    • Global: 2 jobs running
    • Per user: 3–5 jobs/day

Why invite-only is best for portfolio

  • It prevents random abuse from your CV link
  • It keeps your compute predictable
  • It still demonstrates full auth + quota features

Optional later

  • Public “Request Access” form (email + reason)
  • Or Google OAuth (more work, not necessary for portfolio)

3) Target architecture (final)

3.1 Components

  • Frontend: React + Leaflet

    • Select AOI + params
    • Submit job
    • Poll status
    • Render map layers from tiles
    • Download GeoTIFF
  • API: FastAPI

    • Auth (JWT)
    • Validate AOI + quotas
    • Create job records
    • Push job to Redis queue
    • Generate signed URLs
  • Worker: Python RQ Worker

    • Pull job
    • Query DEA STAC
    • Compute features/indices
    • Load DW baseline
    • Run model inference
    • Neighborhood smoothing
    • Write outputs as COG GeoTIFF
    • Update job status
  • Redis

    • job queue
  • MinIO

    • Baselines (DW)
    • Models
    • Results (COGs)
  • Database (recommended)

    • Postgres (preferred) for:

      • users, roles
      • jobs, params
      • quotas usage
      • model registry metadata
  • Tile server

    • TiTiler or rio-tiler based service
    • Serves tiles from MinIO-hosted COGs
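
For reference, TiTiler's /cog/tiles endpoint would let Leaflet load a result layer with a URL template like this (the tiles hostname matches the subdomain plan in 3.3; the object path is a placeholder, and TiTiler must be configured with MinIO's S3 endpoint and credentials to resolve s3:// URLs):

```
https://tiles.portfolio.techarvest.co.zw/cog/tiles/WebMercatorQuad/{z}/{x}/{y}.png?url=s3://geocrop-results/<job_id>/refined.tif
```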

3.2 Buckets (MinIO)

  • geocrop-baselines (DW GeoTIFF/COG)
  • geocrop-models (pkl/onnx + metadata)
  • geocrop-results (output COGs)
  • geocrop-datasets (training data uploads)

3.3 Subdomains

  • portfolio.techarvest.co.zw → frontend
  • api.portfolio.techarvest.co.zw → FastAPI
  • tiles.portfolio.techarvest.co.zw → TiTiler (recommended add)
  • minio.portfolio.techarvest.co.zw → MinIO API (private)
  • console.minio.portfolio.techarvest.co.zw → MinIO Console (admin-only)

4) What to build next (exact order)

Phase A — Clean repo + manifests (so you stop fighting YAML)

  1. Create a Git repo layout:

    • geocrop/

      • k8s/

        • base/
        • prod/
      • api/

      • worker/

      • web/

  2. Move your current YAML into files with predictable names:

    • k8s/base/00-namespace.yaml
    • k8s/base/10-redis.yaml
    • k8s/base/20-minio.yaml
    • k8s/base/30-api.yaml
    • k8s/base/40-worker.yaml
    • k8s/base/50-web.yaml
    • k8s/base/60-ingress.yaml
  3. Adopt Kustomize (kubectl apply -k) later (optional).

Phase B — Make API real (replace hello-api)

  1. Build FastAPI endpoints:

    • POST /auth/register (admin-only or invite)
    • POST /auth/login
    • POST /jobs (create job)
    • GET /jobs/{job_id} (status)
    • GET /jobs/{job_id}/download (signed url)
    • GET /models (list available models)
  2. Add quotas + concurrency guard:

    • Global running jobs ≤ 2
    • Per-user jobs/day ≤ 5
  3. Store job status:

    • Start with Redis
    • Upgrade to Postgres when stable
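
The quota guard in step 2 can start as a pure function over two counters the API keeps (the Redis key naming in the comments is illustrative, and the caps are the examples from this plan):

```python
from dataclasses import dataclass

GLOBAL_RUNNING_CAP = 2   # cluster-wide running jobs
PER_USER_DAILY_CAP = 5   # jobs per user per day

@dataclass
class QuotaDecision:
    allowed: bool
    reason: str = ""

def check_quota(global_running: int, user_jobs_today: int) -> QuotaDecision:
    """Decide whether POST /jobs may enqueue a new job.

    global_running:  jobs currently running (e.g. a Redis counter).
    user_jobs_today: jobs this user submitted today (e.g. a Redis key
                     like quota:{user_id}:{date} that expires at midnight).
    """
    if global_running >= GLOBAL_RUNNING_CAP:
        return QuotaDecision(False, "cluster is busy, try again shortly")
    if user_jobs_today >= PER_USER_DAILY_CAP:
        return QuotaDecision(False, "daily job limit reached")
    return QuotaDecision(True)
```

Keeping the decision logic pure makes it trivial to unit-test before the Redis (and later Postgres) plumbing exists.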

Phase C — Worker: “real pipeline v1”

  1. Implement DEA STAC search + download clip for AOI:

    • Sentinel-2 (s2_l2a) is likely easiest first
    • Compute indices (NDVI, EVI, SAVI)
    • Compute peak indices (season max)
  2. Load DW baseline GeoTIFF for the year:

    • Step 1: upload the converted DW COGs (~/geocrop/data/dw_cogs) to MinIO
    • Step 2: clip to AOI
  3. Run model inference:

    • Load model from MinIO
    • Apply to feature stack
    • Output refined label raster
  4. Neighborhood smoothing:

    • Majority filter 3×3 / 5×5 (configurable)
  5. Export result as GeoTIFF (prefer COG):

    • Write to temp
    • Upload to MinIO
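
Step 1's STAC search against the DEA endpoint can be sketched as a plain request body; the degrees-per-metre buffer conversion is a rough approximation, and the collection id and season window are the ones named in this plan:

```python
import math

def stac_search_body(lat: float, lon: float, radius_m: float, year: int) -> dict:
    """Build a POST body for https://explorer.digitalearth.africa/stac/search
    covering the summer season window (Nov 1 -> Apr 30) around the AOI."""
    # Rough conversion: ~111,320 m per degree latitude;
    # a degree of longitude shrinks by cos(latitude)
    dlat = radius_m / 111_320.0
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat)))
    return {
        "collections": ["s2_l2a"],
        "bbox": [lon - dlon, lat - dlat, lon + dlon, lat + dlat],
        # Season spans the year boundary: Nov of `year` to Apr of `year + 1`
        "datetime": f"{year}-11-01T00:00:00Z/{year + 1}-04-30T23:59:59Z",
        "limit": 100,
    }
```

The worker would POST this body (paginating via the `next` link) and then clip the returned assets to the exact AOI circle.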

Phase D — Tiles + map UI

  1. Deploy TiTiler service and expose it at:
    • tiles.portfolio...
  2. Frontend:
    • Leaflet selection + coords input
    • Submit job + poll
    • Add layers from tile URLs
    • Legend + downloads

Phase E — Admin portal + retraining

  1. Admin UI:
    • Dataset upload
    • Model list + promote
  2. Retraining pipeline:
    • A Kubernetes Job that:

    • pulls dataset from MinIO
    • trains models
    • saves artifact to MinIO
    • registers new model version
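
A retraining Job manifest could look roughly like this sketch; the image, entrypoint, and Secret name are all assumptions to be replaced:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: geocrop-retrain
  namespace: geocrop
spec:
  backoffLimit: 1
  ttlSecondsAfterFinished: 3600   # auto-clean finished Jobs
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: retrain
          image: geocrop/trainer:latest        # assumed image
          command: ["python", "train.py"]      # assumed entrypoint
          envFrom:
            - secretRef:
                name: minio-credentials        # assumed Secret name
          resources:
            requests: { cpu: "1", memory: 2Gi }
            limits:   { cpu: "2", memory: 4Gi }
```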

5) Important “you might forget” items (add now)

5.1 Model registry metadata

For each model artifact store:

  • model_name
  • version
  • training datasets used
  • training timestamp
  • feature list expected
  • class mapping

5.2 Class mapping (must be consistent)

Create a single classes.json used by:

  • training
  • inference
  • frontend legend
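
classes.json can be a simple id → name/colour map shared by all three consumers; the entries below are placeholders, not your actual refined class list:

```json
{
  "0": { "name": "water", "color": "#419bdf" },
  "1": { "name": "trees", "color": "#397d49" },
  "2": { "name": "grass", "color": "#88b053" },
  "4": { "name": "crops", "color": "#e49635" },
  "40": { "name": "maize (example refined crop class)", "color": "#f5d33c" }
}
```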

5.3 Zimbabwe boundary validation

Use a Zimbabwe boundary polygon in the API/worker to validate AOI.

  • Best: store the boundary geometry as GeoJSON in repo.
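
A minimal sketch of the AOI check, assuming the GeoJSON's outer ring is already loaded as (lon, lat) pairs; in production shapely's `contains` on a prepared geometry would be the idiomatic choice, but the ray-casting test below shows the idea with no dependencies:

```python
def point_in_polygon(lon: float, lat: float, ring) -> bool:
    """Ray-casting point-in-polygon test.

    ring: list of (lon, lat) vertices of the boundary's outer ring,
    e.g. coordinates[0] of the Zimbabwe GeoJSON polygon in the repo.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (lon, lat) cross edge (i, i+1)?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside
```

For a circular AOI, also check that the centre buffered by the radius stays inside (or simply reject centres within the radius of the border).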

5.4 Deterministic job cache key

Hash:

  • year
  • season
  • model_version
  • center lat/lon
  • radius

If exists → return cached result (huge compute saver).
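
A sketch of the key derivation; the coordinate rounding is an assumption to keep near-identical requests cache-friendly:

```python
import hashlib
import json

def job_cache_key(year, season, model_version, lat, lon, radius_m):
    """Deterministic cache key for a job's parameter set."""
    params = {
        "year": year,
        "season": season,
        "model_version": model_version,
        # Round to ~1 m so floating-point noise doesn't defeat the cache
        "lat": round(lat, 5),
        "lon": round(lon, 5),
        "radius_m": int(radius_m),
    }
    # sort_keys gives a canonical JSON string -> stable hash
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

The API computes this key before enqueueing; if a result object with this key already exists in geocrop-results, it returns the cached output immediately.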

5.5 Signed downloads

Never expose MinIO objects publicly.

  • API generates signed GET URLs that expire.

6) Open items to decide (tomorrow)

  1. Frontend framework: React + Vite (recommended)

  2. Tile approach: TiTiler vs pre-render PNGs (TiTiler looks much more professional)

  3. DB: add Postgres now vs later (recommended soon for quotas + user mgmt)

  4. Which DEA collections to use for the first version:

    • Start with Sentinel-2 L2A (s2_l2a)
    • Later add Landsat fallback
  5. Model input features: exact feature vector and normalization rules


7) Roo/Minimax execution notes (so it doesn't get confused)

  • Treat current cluster as production-like
  • All services live in namespace: geocrop
  • Ingress class: nginx
  • ClusterIssuer: letsencrypt-prod
  • Public IP of ingress node: 167.86.68.48
  • Subdomains already configured and reachable
  • Next change should be swapping placeholder services for real deployments

8) Short summary

You already have the hard part done:

  • K3s + ingress + TLS + DNS all work
  • MinIO + Redis work
  • You proved async jobs can be queued and processed

Next is mostly application engineering:

  • Replace placeholder web/api with real app
  • Add job status + quotas
  • Implement DEA STAC fetch + DW baseline clipping + ML inference
  • Export COG + tile server + map UI