GeoCrop - Crop-Type Classification Platform
GeoCrop is an ML-based platform for crop-type classification in Zimbabwe. It uses Sentinel-2 satellite imagery from the Digital Earth Africa (DEA) STAC catalog, computes spectral and phenological features, and applies multiple ML models (XGBoost, LightGBM, CatBoost, and soft-voting ensembles) to generate high-resolution classification maps.
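Soft voting averages the class probabilities predicted by each member model and picks the class with the highest mean probability. The sketch below illustrates the technique in plain Python. `soft_vote` is a hypothetical helper, not code from the GeoCrop repository, and the probabilities are made-up values. In practice each row would come from a fitted model's `predict_proba` output.

```python
from __future__ import annotations

from collections.abc import Sequence


def soft_vote(
    probas: Sequence[Sequence[float]],
    weights: Sequence[float] | None = None,
) -> tuple[int, list[float]]:
    """Combine per-model class probabilities for a single sample.

    probas[m][c] is model m's predicted probability for class c.
    Returns (index of the winning class, averaged probabilities).
    """
    if weights is None:
        weights = [1.0] * len(probas)  # equal weights: plain soft voting
    if len(weights) != len(probas):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_classes = len(probas[0])
    # Weighted mean probability of every class across all models.
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    # The class with the highest mean probability wins.
    winner = max(range(n_classes), key=avg.__getitem__)
    return winner, avg


if __name__ == "__main__":
    # Made-up probabilities for three classes from three models.
    xgb = [0.90, 0.05, 0.05]
    lgbm = [0.40, 0.45, 0.15]
    cat = [0.35, 0.50, 0.15]
    print(soft_vote([xgb, lgbm, cat]))
```

In this example, two of the three models prefer class 1, so a hard (majority) vote would choose it. The first model's high confidence, however, lifts the mean probability of class 0 to 0.55, so soft voting returns class 0.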
🚀 Project Overview
- Architecture: Distributed system with a FastAPI REST API, Redis/RQ job queue, and Python workers.
- Data Pipeline:
  - DEA STAC: Fetches Sentinel-2 L2A imagery.
  - Feature Engineering: Computes 51 features from six spectral indices (NDVI, NDRE, EVI, SAVI, CI_RE, NDWI), including phenology metrics, harmonics, and seasonal window summaries.
  - Inference: Loads models from MinIO, runs windowed predictions, and applies a majority filter to the predicted class map.
  - Output: Generates Cloud Optimized GeoTIFFs (COGs), stores them in MinIO, and serves them via TiTiler.
- Deployment: Kubernetes (K3s) with automated SSL (cert-manager) and NGINX Ingress.
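The six indices behind the feature set have standard textbook definitions. The per-pixel sketch below is illustrative only and rests on several assumptions: reflectance is already scaled to 0-1, red edge is Sentinel-2 band B05, and NDWI is the McFeeters (green vs. NIR) variant. GeoCrop's `feature_computation.py` may choose differently. The 51 model features (phenology, harmonics, seasonal summaries) would be derived from time series of these values and are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reflectance:
    """Surface reflectance (0-1 scale) for one Sentinel-2 pixel.

    Assumed band mapping: blue=B02, green=B03, red=B04,
    red_edge=B05, nir=B08.
    """

    blue: float
    green: float
    red: float
    red_edge: float
    nir: float


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(px: Reflectance, savi_l: float = 0.5) -> dict[str, float]:
    """Per-pixel values of the six indices named in the overview."""
    return {
        "NDVI": normalized_difference(px.nir, px.red),
        "NDRE": normalized_difference(px.nir, px.red_edge),
        # EVI with the standard coefficients G=2.5, C1=6, C2=7.5, L=1.
        "EVI": 2.5 * (px.nir - px.red)
        / (px.nir + 6.0 * px.red - 7.5 * px.blue + 1.0),
        # SAVI with soil-brightness factor L (0.5 is the common default).
        "SAVI": (1.0 + savi_l) * (px.nir - px.red) / (px.nir + px.red + savi_l),
        # Red-edge chlorophyll index: NIR / RedEdge - 1.
        "CI_RE": px.nir / px.red_edge - 1.0 if px.red_edge != 0 else 0.0,
        # NDWI, McFeeters variant (assumption; a SWIR-based variant also exists).
        "NDWI": normalized_difference(px.green, px.nir),
    }


if __name__ == "__main__":
    vegetated = Reflectance(blue=0.04, green=0.08, red=0.05, red_edge=0.15, nir=0.45)
    print(spectral_indices(vegetated))
```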
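A majority filter, like the one in the inference step, replaces each pixel's class with the most common class in its neighbourhood. This removes isolated "salt-and-pepper" misclassifications. The plain-Python sketch below assumes a square window (3x3 by default) clipped at the raster edges and a tie rule that keeps the pixel's own class when it is among the most frequent. GeoCrop's actual window size and tie handling are not documented here, and `majority_filter` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter


def majority_filter(labels: list[list[int]], size: int = 3) -> list[list[int]]:
    """Replace each pixel with the most common class in a size x size window.

    Windows are clipped at the raster edges. On ties, the pixel keeps its
    own class if it is among the most frequent; otherwise the smallest tied
    class id wins.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    r = size // 2
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for i in range(rows):
        for j in range(cols):
            # Count class ids in the (edge-clipped) neighbourhood of (i, j).
            counts = Counter(
                labels[y][x]
                for y in range(max(0, i - r), min(rows, i + r + 1))
                for x in range(max(0, j - r), min(cols, j + r + 1))
            )
            best = max(counts.values())
            tied = [c for c, n in counts.items() if n == best]
            center = labels[i][j]
            out[i][j] = center if center in tied else min(tied)
    return out


if __name__ == "__main__":
    # A single stray class-2 pixel inside a class-1 field is smoothed away.
    print(majority_filter([[1, 1, 1], [1, 2, 1], [1, 1, 1]]))
```

The nested loops keep the logic easy to follow. For full-size rasters, a vectorised approach (for example `scipy.ndimage.generic_filter` with a mode function) would be the more practical choice.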
🛠️ Building and Running
Development
```bash
# API development (run from the repository root)
cd apps/api && pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000
```

```bash
# Worker development (from the repository root, in a second terminal)
cd apps/worker && pip install -r requirements.txt
python worker.py --worker
```

```bash
# Training models (from the repository root)
cd training && pip install -r requirements.txt
python train.py --data /path/to/data.csv --out ./artifacts --variant Raw
```
Docker
```bash
# Build the API and worker images from the repository root
docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/
```
Kubernetes
```bash
# Apply manifests in order: the namespace first, then everything else
kubectl apply -f k8s/00-namespace.yaml
kubectl apply -f k8s/
```
📐 Development Conventions
Critical Patterns (Non-Obvious)
- AOI Format: Always use a `(lon, lat, radius_m)` tuple. Longitude comes first.
- Season Window: September 1 to May 31 (Zimbabwe summer season). `year=2022` means 2022-09-01 to 2023-05-31.
- Zimbabwe Bounds: Lon 25.2 to 33.1, Lat -22.5 to -15.6.
- Feature Order: `FEATURE_ORDER_V1` (51 features) is immutable; changing it breaks compatibility with existing models.
- Redis Connection: Use `redis.geocrop.svc.cluster.local` within the cluster.
- Queue: Always use the `geocrop_tasks` queue.
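The AOI, season-window, and bounds conventions can each be captured as a small check. This is a minimal sketch: `season_window` and `validate_aoi` are hypothetical helper names, not functions from the GeoCrop code base. The AOI check also only tests the centre point, not whether the full radius stays inside Zimbabwe.

```python
from __future__ import annotations

from datetime import date

# Bounds as listed in the conventions (degrees, WGS84 assumed).
ZIM_LON = (25.2, 33.1)
ZIM_LAT = (-22.5, -15.6)


def season_window(year: int) -> tuple[date, date]:
    """Summer season for `year`: 1 September of `year` to 31 May of `year + 1`."""
    return date(year, 9, 1), date(year + 1, 5, 31)


def validate_aoi(aoi: tuple[float, float, float]) -> tuple[float, float, float]:
    """Check a (lon, lat, radius_m) AOI whose centre must lie inside Zimbabwe."""
    lon, lat, radius_m = aoi
    if not ZIM_LON[0] <= lon <= ZIM_LON[1]:
        # A swapped (lat, lon) tuple usually fails here first.
        raise ValueError(f"lon {lon} outside {ZIM_LON}; expected (lon, lat, radius_m)")
    if not ZIM_LAT[0] <= lat <= ZIM_LAT[1]:
        raise ValueError(f"lat {lat} outside {ZIM_LAT}")
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    return lon, lat, radius_m


if __name__ == "__main__":
    print(season_window(2022))
    print(validate_aoi((31.05, -17.83, 5000.0)))  # roughly Harare, longitude first
```

Passing `(-17.83, 31.05, 5000.0)` (latitude first) raises a `ValueError`. This is exactly the mistake the "longitude comes first" rule guards against.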
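Feature order matters because tabular models consume columns by position. A reordered list silently feeds each value into the wrong feature. One way to guard against this is to build every input vector by name from a fixed tuple, as in the sketch below. The four feature names are hypothetical stand-ins, since the real `FEATURE_ORDER_V1` has 51 entries that are not listed here, and `to_feature_vector` is an illustrative helper, not GeoCrop's API.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

# HYPOTHETICAL stand-in names; the real FEATURE_ORDER_V1 lists 51 features.
FEATURE_ORDER_V1: tuple[str, ...] = ("ndvi_max", "ndvi_mean", "ndre_mean", "evi_amplitude")


def to_feature_vector(
    features: Mapping[str, float],
    order: Sequence[str] = FEATURE_ORDER_V1,
) -> list[float]:
    """Arrange named features in the fixed training order.

    Extra keys are ignored. A missing feature raises an error instead of
    silently shifting every later column by one position.
    """
    missing = [name for name in order if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(features[name]) for name in order]


if __name__ == "__main__":
    row = {"evi_amplitude": 0.3, "ndre_mean": 0.4, "ndvi_mean": 0.6, "ndvi_max": 0.8}
    print(to_feature_vector(row))
```

Using a tuple prevents accidental in-place edits. New feature sets would then get a new constant (for example a `V2`) rather than a modified `V1`.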
Storage Layout (MinIO)
- `geocrop-models`: ML model `.pkl` files in the root directory.
- `geocrop-baselines`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results`: Output COGs (`results/<job_id>/...`).
- `geocrop-datasets`: Training CSV files.
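The bucket layout can be encoded in a couple of small path helpers so that workers and the API agree on object locations. This is a sketch with hypothetical function names. Because the file name after `results/<job_id>/` is not specified in the layout, the caller supplies it. The returned `(bucket, key)` pair could then be handed to an S3-compatible client, for example minio-py's `Minio.fput_object(bucket_name, object_name, file_path)`.

```python
from __future__ import annotations

# Bucket names from the storage layout.
MODELS_BUCKET = "geocrop-models"
BASELINES_BUCKET = "geocrop-baselines"
RESULTS_BUCKET = "geocrop-results"
DATASETS_BUCKET = "geocrop-datasets"


def model_location(filename: str) -> tuple[str, str]:
    """(bucket, key) for a model file: .pkl objects sit at the bucket root."""
    if "/" in filename or not filename.endswith(".pkl"):
        raise ValueError("model files are root-level .pkl objects")
    return MODELS_BUCKET, filename


def result_location(job_id: str, filename: str) -> tuple[str, str]:
    """(bucket, key) for an output COG under results/<job_id>/."""
    if not job_id or "/" in job_id:
        raise ValueError("job_id must be a non-empty string without '/'")
    return RESULTS_BUCKET, f"results/{job_id}/{filename}"
```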
📂 Key Files
- `apps/api/main.py`: REST API entry point and job dispatcher.
- `apps/worker/worker.py`: Core orchestration logic for the inference pipeline.
- `apps/worker/feature_computation.py`: Implementation of the 51 spectral features.
- `training/train.py`: Script for training ML models and exporting them to MinIO.
- `CLAUDE.md`: Primary guide for Claude Code development patterns.
- `AGENTS.md`: Technical stack details and current cluster state.
🌐 Infrastructure
- API: `api.portfolio.techarvest.co.zw`
- Tiler: `tiles.portfolio.techarvest.co.zw`
- MinIO: `minio.portfolio.techarvest.co.zw`
- Frontend: `portfolio.techarvest.co.zw`