Software Requirements Specification (SRS) & Development Context

Project: GeoCrop Platform
Format: Modified IEEE Std 830-1998 (Optimized for AI Agent / Roo Code Initialization)
Date: February 2026


1. Introduction

1.1 Purpose

This document defines the Software Requirements Specification (SRS) for the GeoCrop Platform.

⚠️ This document also serves as the master initialization and execution context for Roo Code (AI agent). It explicitly states:

  • What infrastructure already exists and must NOT be rebuilt
  • What services are live and working
  • What remains to be implemented
  • Architectural constraints that must be respected

Roo must treat the infrastructure layer as stable and focus on application-layer implementation.


1.2 Scope

GeoCrop is a cloud-native web application designed to generate refined Land Use and Land Cover (LULC) maps for regions in Zimbabwe.

The system:

  • Uses satellite imagery from the Digital Earth Africa (DEA) STAC API
  • Uses Dynamic World (DW) seasonal baselines
  • Applies custom ML models for crop refinement
  • Is geographically restricted to Zimbabwe only
  • Is spatially restricted to a maximum AOI radius of 5 km
  • Is deployed on a self-managed K3s Kubernetes cluster

Outputs are delivered as:

  • Refined LULC GeoTIFF (10m resolution)
  • Optional supporting rasters (DW baseline, indices, true color)

2. Current System State (Already Built — Do Not Rebuild)

⚠️ ATTN ROO CODE: Infrastructure is complete and running. Do NOT recreate cluster, ingress, TLS, or storage.


2.1 Infrastructure & Networking

  • K3s cluster (1 control plane, 2 workers)

  • NGINX Ingress Controller active

  • cert-manager with Let's Encrypt Production ClusterIssuer

  • All domains operational over HTTPS:

    • portfolio.techarvest.co.zw
    • api.portfolio.techarvest.co.zw
    • minio.portfolio.techarvest.co.zw
    • console.minio.portfolio.techarvest.co.zw

2.2 Live Services (Namespace: geocrop)

MinIO (minio)

  • S3-compatible object storage

  • Buckets (planned/partially used):

    • geocrop-baselines
    • geocrop-models
    • geocrop-results
    • geocrop-datasets

Redis (redis)

  • Used as message broker for asynchronous ML tasks
  • Queue name: geocrop_tasks

FastAPI Backend (geocrop-api)

  • Live and publicly accessible
  • JWT authentication functional
  • Accepts AOI payload (Lat, Lon, Radius)
  • Pushes async tasks to Redis
  • Returns job_id

Python RQ Worker (geocrop-worker)

  • Live and listening to Redis queue
  • Currently runs mock inference using time.sleep()
  • Returns hardcoded JSON result

Dynamic World Baselines

  • DW seasonal GeoTIFFs successfully migrated from Google Drive
  • Converted to Cloud Optimized GeoTIFFs (COGs)
  • Stored locally and ready for MinIO upload

3. Development Objectives for Roo Code

Primary Objective: Replace mock pipeline with real geospatial + ML processing.


Phase 1 — Real Worker Pipeline

3.1 STAC Integration

Worker must:

  • Query DEA STAC endpoint: https://explorer.digitalearth.africa/stac/search

  • Filter by:

    • AOI geometry (circle polygon)
    • Date range: Summer Cropping Season = Sept 1 to May 30 (must match model training window exactly)
    • Year range: 2015 to present
  • Use Sentinel-2 L2A collection (initial version)

⚠️ Correct seasonal window: Sept 1 to May 30 (not Sept-May)
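
As a minimal sketch of the query described above, the following builds a STAC item-search request body from an AOI centre, radius, and season. It only constructs the payload; sending it to the DEA endpoint is left out. The function names, the collection ID `s2_l2a`, the vertex count, and the convention that a season is labelled by the year in which it starts are all assumptions, not part of this SRS.

```python
import math
from datetime import date


def circle_polygon(lat, lon, radius_m, n_vertices=32):
    """Approximate a circle as a closed GeoJSON ring of [lon, lat] pairs."""
    # Equirectangular approximation: adequate for radii of a few kilometres.
    dlat = radius_m / 111_320.0
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat)))
    ring = []
    for i in range(n_vertices):
        theta = 2 * math.pi * i / n_vertices
        ring.append([lon + dlon * math.cos(theta), lat + dlat * math.sin(theta)])
    ring.append(ring[0])  # GeoJSON rings must repeat the first vertex
    return {"type": "Polygon", "coordinates": [ring]}


def season_window(season_year):
    """Sept 1 of season_year to May 30 of the next year (assumed labelling)."""
    start = date(season_year, 9, 1)
    end = date(season_year + 1, 5, 30)
    return f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z"


def build_search_body(lat, lon, radius_m, season_year,
                      collection="s2_l2a", limit=100):
    """Assemble a STAC API item-search body (collection ID is an assumption)."""
    return {
        "collections": [collection],
        "intersects": circle_polygon(lat, lon, radius_m),
        "datetime": season_window(season_year),
        "limit": limit,
    }
```

The resulting dictionary could be sent as the JSON body of a POST request to the search endpoint listed above, with pagination handled by the caller.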


3.2 Feature Engineering

Worker must compute:

  • True Color composite
  • NDVI
  • EVI
  • SAVI
  • Peak NDVI/EVI/SAVI (seasonal max)

Computed features must match the feature names and order in the training feature_list.json.
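
For reference, the indices listed above can be sketched per pixel as follows. These are the commonly published formulas and assume surface reflectance scaled to the 0 to 1 range; the soil factor of 0.5 for SAVI and the EVI coefficients are the usual defaults, which may differ from what the training pipeline used. Function names are illustrative.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor=0.5 is an assumed default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def seasonal_peak(values):
    """Peak (maximum) of one pixel's index values across the season."""
    return max(values)
```

A raster implementation would apply the same arithmetic to whole arrays and take the maximum along the time axis, with a guard for zero denominators and masked (cloudy) pixels.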


3.3 ML Inference

Worker must:

  • Load selected model from MinIO (geocrop-models bucket)

  • Load:

    • model.pkl
    • feature_list.json
    • scaler.pkl (if used)
    • label_encoder.json
  • Validate feature alignment

  • Run inference
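
One way to validate feature alignment before inference is sketched below. It assumes feature_list.json is a JSON array of feature names in training order; this SRS does not define the file's schema, so treat that layout and the function names as hypothetical.

```python
import json


def load_feature_list(json_text):
    """Parse feature_list.json, assumed to be a JSON array of strings."""
    features = json.loads(json_text)
    if not isinstance(features, list) or not all(isinstance(f, str) for f in features):
        raise ValueError("feature list must be a JSON array of feature names")
    return features


def align_features(computed, expected):
    """Return computed feature values ordered exactly as the model expects.

    computed: dict mapping feature name -> value (or raster band)
    expected: list of feature names in training order
    """
    missing = [name for name in expected if name not in computed]
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Features not in the expected list are ignored so they never reach the model.
    return [computed[name] for name in expected]
```

Failing fast on a missing feature avoids silently feeding the model columns in the wrong order.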


3.4 Neighborhood Smoothing

Implement configurable majority filter:

  • Default: 3x3 kernel
  • Optional: 5x5

Rule: If a pixel's class confidence is low AND its neighborhood has a strong majority class → flip the pixel to that majority class.
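
The rule above can be sketched on plain nested lists as follows. The thresholds `low_conf` and `majority_share` are hypothetical placeholders, since this SRS does not define what counts as "low" confidence or a "strong" majority; a production version would likely operate on arrays instead.

```python
from collections import Counter


def smooth_labels(labels, confidence, kernel=3, low_conf=0.6, majority_share=0.6):
    """Confidence-gated majority filter over a 2D grid of class labels.

    labels, confidence: equally sized lists of rows.
    low_conf and majority_share are assumed thresholds, not SRS values.
    """
    if kernel not in (3, 5):
        raise ValueError("kernel must be 3 or 5")
    half = kernel // 2
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]  # read from the input, write to a copy
    for r in range(rows):
        for c in range(cols):
            if confidence[r][c] >= low_conf:
                continue  # confident pixels are never changed
            neighbours = [
                labels[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
                if (rr, cc) != (r, c)
            ]
            if not neighbours:
                continue
            top_class, count = Counter(neighbours).most_common(1)[0]
            if count / len(neighbours) >= majority_share:
                out[r][c] = top_class
    return out
```

Reading from the original grid while writing to a copy keeps the result independent of scan order.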


3.5 Output Generation

Worker must:

  • Export refined output as Cloud Optimized GeoTIFF (COG)
  • Save to MinIO under: geocrop-results/jobs/{job_id}/refined.tif

Optional outputs:

  • truecolor.tif
  • ndvi_peak.tif
  • dw_clipped.tif

Phase 2 — Tile Service (TiTiler)

Deploy geocrop-tiler service:

  • Reads COGs from MinIO
  • Serves XYZ tiles
  • Exposed via tiles.portfolio.techarvest.co.zw

Requirement:

  • Sub-second tile latency

Phase 3 — React Frontend

Frontend must:

  • Implement JWT login
  • AOI selection via Leaflet
  • Radius limit 5 km
  • Zimbabwe boundary validation
  • Year dropdown (2015 to present)
  • Model dropdown
  • Submit job to API
  • Poll status endpoint
  • Render layers via tile server
  • Download GeoTIFF via signed URL

Map Layers:

  • Refined ML LULC
  • Dynamic World baseline
  • True Color
  • Peak NDVI/EVI/SAVI

Legend required.


4. Functional Requirements

4.1 Authentication & Quotas

  • REQ-1.1: JWT authentication required.
  • REQ-1.2: Standard users limited to 5 jobs per 24 hours.
  • REQ-1.3: Global concurrency cap = 2 running jobs cluster-wide.
  • REQ-1.4: Admin users bypass quotas.
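
A minimal sketch of the per-user quota check (REQ-1.2 and REQ-1.4) is shown below. It assumes a rolling 24-hour window rather than a calendar day, and that submission timestamps are available as Unix seconds; both are assumptions, as is the function name. The global concurrency cap in REQ-1.3 is not covered here.

```python
WINDOW_SECONDS = 24 * 60 * 60


def quota_allows(submission_times, now, is_admin=False,
                 max_jobs=5, window_s=WINDOW_SECONDS):
    """Return True if the user may submit another job.

    submission_times: Unix timestamps of the user's earlier submissions.
    now: current Unix timestamp, passed in to keep the check testable.
    """
    if is_admin:
        return True  # REQ-1.4: admins bypass quotas
    recent = [t for t in submission_times if now - t < window_s]
    return len(recent) < max_jobs
```

Passing `now` explicitly, instead of reading the clock inside the function, makes the boundary cases straightforward to test.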


4.2 AOI Validation

  • REQ-2.1: AOI must fall strictly within Zimbabwe boundary (GeoJSON boundary file required).
  • REQ-2.2: Radius must be ≤ 5 km.
  • REQ-2.3: Reject complex polygons exceeding vertex threshold.
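
The three checks above might be combined as in the sketch below, which uses a standard ray-casting point-in-polygon test on `[lon, lat]` rings. The vertex threshold of 256 and the function names are hypothetical, and testing only the AOI vertices is an approximation: an edge could still cross a concave stretch of the boundary, so a geometry library would be the safer choice in production.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the closed ring of [lon, lat] pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Count edges that a ray cast to the east from the point would cross.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def validate_aoi(aoi_ring, radius_m, boundary_ring,
                 max_radius_m=5000, max_vertices=256):
    """Return a list of validation errors; an empty list means the AOI is accepted."""
    errors = []
    if radius_m > max_radius_m:
        errors.append("radius exceeds 5 km")
    if len(aoi_ring) > max_vertices:  # max_vertices is an assumed threshold
        errors.append("polygon exceeds vertex threshold")
    if not all(point_in_ring(x, y, boundary_ring) for x, y in aoi_ring):
        errors.append("AOI is not fully inside the boundary")
    return errors
```

Here `boundary_ring` stands for the exterior ring read from the required GeoJSON boundary file.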


4.3 Job Pipeline

  • REQ-3.1: API queues job in Redis.
  • REQ-3.2: Worker processes job asynchronously.
  • REQ-3.3: Worker saves COG outputs to MinIO.
  • REQ-3.4: API generates signed URL for download.
  • REQ-3.5: Deterministic cache key must be implemented:

Hash(year + season + model_version + lat + lon + radius)

If identical request exists → return cached result.
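
A possible implementation of the cache key is sketched below. The choice of SHA-256, the `|` separator, and rounding coordinates to five decimal places (roughly one metre) are assumptions made so that numerically equivalent requests hash identically; the SRS only fixes which fields go into the hash.

```python
import hashlib


def cache_key(year, season, model_version, lat, lon, radius_m, precision=5):
    """Deterministic key for (year, season, model_version, lat, lon, radius)."""
    parts = [
        str(int(year)),
        season,
        model_version,
        f"{lat:.{precision}f}",  # fixed precision avoids float-noise mismatches
        f"{lon:.{precision}f}",
        str(int(radius_m)),
    ]
    canonical = "|".join(parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The key could then double as a lookup for an existing result before a new job is queued.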


4.4 Admin Operations

  • REQ-4.1: Admin can upload datasets to the geocrop-datasets bucket.
  • REQ-4.2: Admin can trigger retraining via a Kubernetes Job.
  • REQ-4.3: Model registry must store:

  • model_name
  • version
  • training_date
  • features_used
  • class_mapping
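
As an illustration, a registry entry with the fields above could be modelled as a small dataclass and serialized to JSON. The field types, the ISO 8601 date string, and the sample values in the usage test are assumptions; the SRS names the fields but not their formats.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelRecord:
    """One model registry entry; field types are assumed, not specified."""
    model_name: str
    version: str
    training_date: str   # assumed ISO 8601 date, e.g. "2026-02-01"
    features_used: list  # feature names in training order
    class_mapping: dict  # class ID (as string) -> class label


def to_registry_json(record):
    """Serialize a record with sorted keys so the output is stable."""
    return json.dumps(asdict(record), sort_keys=True)


def from_registry_json(text):
    """Rebuild a record from its JSON form."""
    return ModelRecord(**json.loads(text))
```

Stored alongside model.pkl in the geocrop-models bucket, such a record would let the API populate the model dropdown without loading the model itself.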

5. External Interfaces

5.1 DEA STAC API

HTTPS-based STAC search queries.

5.2 MinIO S3 API

All object storage via signed S3 URLs.

5.3 Optional Historical Weather API

Used to enrich metadata (not required for MVP).


6. Performance & Security Attributes

  • GET /jobs/{id} response < 500ms
  • ML job timeout = 25 minutes
  • Tile latency < 1 second
  • All traffic via HTTPS
  • MinIO private
  • Signed URLs only
  • Secrets stored as Kubernetes Secrets

7. Architectural Clarifications for Roo

  • Namespace: geocrop
  • Ingress class: nginx
  • ClusterIssuer: letsencrypt-prod
  • Do NOT rebuild cluster
  • Do NOT expose MinIO publicly
  • Do NOT bypass quota logic

8. Future Enhancements (Post-MVP)

  • Postgres for persistent user/job storage
  • Confidence raster layer
  • Area statistics panel
  • Model comparison mode
  • Side-by-side DW vs Refined slider

Summary

Infrastructure is complete.

Next focus:

  1. Replace mock worker with real STAC + ML pipeline
  2. Deploy tiler
  3. Build frontend
  4. Add quotas + caching

This SRS defines execution boundaries for Roo Code.