GeoCrop - Crop-Type Classification Platform

GeoCrop is an ML platform for crop-type classification in Zimbabwe. It pulls Sentinel-2 satellite imagery from the Digital Earth Africa (DEA) STAC catalog, computes spectral and phenological features, and uses several ML models (XGBoost, LightGBM, CatBoost, and soft-voting ensembles) to generate high-resolution classification maps.

🚀 Project Overview

  • Architecture: Distributed system with a FastAPI REST API, Redis/RQ job queue, and Python workers.
  • Data Pipeline:
    1. DEA STAC: Fetches Sentinel-2 L2A imagery.
    2. Feature Engineering: Computes 51 features derived from six spectral indices (NDVI, NDRE, EVI, SAVI, CI_RE, NDWI), including phenology metrics, harmonics, and seasonal window summaries.
    3. Inference: Loads models from MinIO, runs windowed predictions, and applies a majority filter.
    4. Output: Generates Cloud Optimized GeoTIFFs (COGs) stored in MinIO and served via TiTiler.
  • Deployment: Kubernetes (K3s) with automated SSL (cert-manager) and NGINX Ingress.
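
The majority filter in step 3 can be pictured as a sliding-window mode over the predicted class raster. The sketch below is a minimal pure-Python version on a small nested list. The function name `majority_filter`, the 3x3 window, the clipped windows at the raster edge, and the tie-breaking rule are all assumptions, since this document does not say how the worker implements the filter.

```python
from collections import Counter


def majority_filter(grid, size=3):
    """Replace each cell with the most common class in its size x size window.

    Hypothetical helper for illustration only. Windows are clipped at the
    raster edge, and ties go to the class seen first in row-major order.
    """
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the input raster is not modified
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out


# A single isolated pixel of class 2 is absorbed by its class-1 neighbours.
predicted = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
smoothed = majority_filter(predicted)
```

This version only illustrates the logic; a full-size raster would need a vectorized implementation.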

🛠️ Building and Running

Development

# API Development
cd apps/api && pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000

# Worker Development
cd apps/worker && pip install -r requirements.txt
python worker.py --worker

# Training Models
cd training && pip install -r requirements.txt
python train.py --data /path/to/data.csv --out ./artifacts --variant Raw

Docker

docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/

Kubernetes

# Create the namespace first, then apply the rest of the manifests
kubectl apply -f k8s/00-namespace.yaml
kubectl apply -f k8s/

📐 Development Conventions

Critical Patterns (Non-Obvious)

  • AOI Format: Always use a (lon, lat, radius_m) tuple. Longitude comes first.
  • Season Window: September 1 to May 31 of the following year (Zimbabwe summer season). For example, year=2022 means 2022-09-01 to 2023-05-31.
  • Zimbabwe Bounds: Lon 25.2 to 33.1, Lat -22.5 to -15.6.
  • Feature Order: FEATURE_ORDER_V1 (51 features) is immutable; changing it breaks existing model compatibility.
  • Redis Connection: Use redis.geocrop.svc.cluster.local within the cluster.
  • Queue: Always use the geocrop_tasks queue.
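
The AOI, season, and bounds conventions above are easy to get wrong, especially the longitude-first ordering. A minimal sketch of how they might be enforced follows. The helper names `season_window` and `validate_aoi` are hypothetical and are not taken from the codebase, and the longitude range is read as 25.2 to 33.1.

```python
from datetime import date
from typing import Tuple

# Zimbabwe bounds from the conventions above (longitude read as 25.2 to 33.1).
ZIM_LON = (25.2, 33.1)
ZIM_LAT = (-22.5, -15.6)


def season_window(year: int) -> Tuple[date, date]:
    """Return the summer season for `year`: Sept 1 to May 31 of the next year."""
    return date(year, 9, 1), date(year + 1, 5, 31)


def validate_aoi(aoi: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Check a (lon, lat, radius_m) tuple against the Zimbabwe bounds.

    Hypothetical helper: raises ValueError when the point falls outside the
    bounds, which also catches most accidental (lat, lon) swaps.
    """
    lon, lat, radius_m = aoi
    if not ZIM_LON[0] <= lon <= ZIM_LON[1]:
        raise ValueError(f"longitude {lon} outside Zimbabwe bounds {ZIM_LON}")
    if not ZIM_LAT[0] <= lat <= ZIM_LAT[1]:
        raise ValueError(f"latitude {lat} outside Zimbabwe bounds {ZIM_LAT}")
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    return float(lon), float(lat), float(radius_m)


start, end = season_window(2022)
aoi = validate_aoi((31.05, -17.83, 5000))  # a point near Harare, lon first
```

Passing `(-17.83, 31.05, 5000)` fails the longitude check, because -17.83 is not a valid longitude for Zimbabwe.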

Storage Layout (MinIO)

  • geocrop-models: ML model .pkl files in the root directory.
  • geocrop-baselines: Dynamic World COGs (dw/zim/summer/...).
  • geocrop-results: Output COGs (results/<job_id>/...).
  • geocrop-datasets: Training CSV files.
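
Keeping bucket names and key prefixes in one place helps the API and the worker agree on the layout. The sketch below shows one way to do that. The constant and function names are hypothetical, the example file name `example_model.pkl` is invented, and only the `results/<job_id>/` prefix is built because the rest of the key is not specified here.

```python
# Bucket names copied from the storage layout above.
MODELS_BUCKET = "geocrop-models"
BASELINES_BUCKET = "geocrop-baselines"
RESULTS_BUCKET = "geocrop-results"
DATASETS_BUCKET = "geocrop-datasets"


def model_key(filename: str) -> str:
    """Return the object key for a model file, which lives in the bucket root."""
    if "/" in filename or not filename.endswith(".pkl"):
        raise ValueError(f"expected a root-level .pkl file name, got {filename!r}")
    return filename


def results_prefix(job_id: str) -> str:
    """Return the key prefix for one job's output COGs: results/<job_id>/."""
    if not job_id or "/" in job_id:
        raise ValueError(f"invalid job id: {job_id!r}")
    return f"results/{job_id}/"


# Hypothetical values, for illustration only.
prefix = results_prefix("abc123")
key = model_key("example_model.pkl")
```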

📂 Key Files

  • apps/api/main.py: REST API entry point and job dispatcher.
  • apps/worker/worker.py: Core orchestration logic for the inference pipeline.
  • apps/worker/feature_computation.py: Implementation of the 51 spectral features.
  • training/train.py: Script for training and exporting ML models to MinIO.
  • CLAUDE.md: Primary guide for Claude Code development patterns.
  • AGENTS.md: Technical stack details and current cluster state.
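
For orientation when reading feature_computation.py, the six indices named in the data pipeline can be written with their standard published formulas. This is a sketch and not the project's implementation. The function signatures, the choice of red-edge band, the SAVI soil factor of 0.5, and the McFeeters (green/NIR) form of NDWI are all assumptions.

```python
# Inputs are surface reflectance scaled to 0..1. The functions use plain
# arithmetic, so they accept Python floats or NumPy arrays unchanged.
# No guard against zero denominators is included in this sketch.


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return (nir - red_edge) / (nir + red_edge)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor=0.5 is the common default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ci_re(nir, red_edge):
    """Red-edge Chlorophyll Index."""
    return nir / red_edge - 1.0


def ndwi(green, nir):
    """NDWI in the McFeeters form (assumption; an NIR/SWIR variant also exists)."""
    return (green - nir) / (green + nir)
```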

🌐 Infrastructure

  • API: api.portfolio.techarvest.co.zw
  • Tiler: tiles.portfolio.techarvest.co.zw
  • MinIO: minio.portfolio.techarvest.co.zw
  • Frontend: portfolio.techarvest.co.zw