# GeoCrop - Crop-Type Classification Platform

GeoCrop is an ML-based platform designed for crop-type classification in Zimbabwe. It utilizes Sentinel-2 satellite imagery from Digital Earth Africa (DEA) STAC, computes advanced spectral and phenological features, and employs multiple ML models (XGBoost, LightGBM, CatBoost, and Soft-Voting Ensembles) to generate high-resolution classification maps.

## 🚀 Project Overview

- **Architecture**: Distributed system with a FastAPI REST API, Redis/RQ job queue, and Python workers.
- **Data Pipeline**:
  1. **DEA STAC**: Fetches Sentinel-2 L2A imagery.
  2. **Feature Engineering**: Computes 51 features (NDVI, NDRE, EVI, SAVI, CI_RE, NDWI) including phenology, harmonics, and seasonal window summaries.
  3. **Inference**: Loads models from MinIO, runs windowed predictions, and applies a majority filter.
  4. **Output**: Generates Cloud Optimized GeoTIFFs (COGs) stored in MinIO and served via TiTiler.
- **Deployment**: Kubernetes (K3s) with automated SSL (cert-manager) and NGINX Ingress.

## 🛠️ Building and Running

### Development

```bash
# API Development
cd apps/api && pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000

# Worker Development
cd apps/worker && pip install -r requirements.txt
python worker.py --worker

# Training Models
cd training && pip install -r requirements.txt
python train.py --data /path/to/data.csv --out ./artifacts --variant Raw
```

### Docker

```bash
docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/
```

### Kubernetes

```bash
# Apply manifests in order
kubectl apply -f k8s/00-namespace.yaml
kubectl apply -f k8s/
```

## 📐 Development Conventions

### Critical Patterns (Non-Obvious)

- **AOI Format**: Always use `(lon, lat, radius_m)` tuple. Longitude comes first.
- **Season Window**: Sept 1st to May 31st (Zimbabwe Summer Season). `year=2022` implies 2022-09-01 to 2023-05-31.
- **Zimbabwe Bounds**: Lon 25.2–33.1, Lat -22.5 to -15.6.
- **Feature Order**: `FEATURE_ORDER_V1` (51 features) is immutable; changing it breaks existing model compatibility.
- **Redis Connection**: Use `redis.geocrop.svc.cluster.local` within the cluster.
- **Queue**: Always use the `geocrop_tasks` queue.

### Storage Layout (MinIO)

- `geocrop-models`: ML model `.pkl` files in the root directory.
- `geocrop-baselines`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results`: Output COGs (`results//...`).
- `geocrop-datasets`: Training CSV files.

## 📂 Key Files

- `apps/api/main.py`: REST API entry point and job dispatcher.
- `apps/worker/worker.py`: Core orchestration logic for the inference pipeline.
- `apps/worker/feature_computation.py`: Implementation of the 51 spectral features.
- `training/train.py`: Script for training and exporting ML models to MinIO.
- `CLAUDE.md`: Primary guide for Claude Code development patterns.
- `AGENTS.md`: Technical stack details and current cluster state.

## 🌐 Infrastructure

- **API**: `api.portfolio.techarvest.co.zw`
- **Tiler**: `tiles.portfolio.techarvest.co.zw`
- **MinIO**: `minio.portfolio.techarvest.co.zw`
- **Frontend**: `portfolio.techarvest.co.zw`
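
The AOI format, season window, and Zimbabwe bounds listed under Critical Patterns can be illustrated with a short Python sketch. This is not code from the repository: the names `AOI`, `season_window`, and `validate_aoi` are hypothetical, and the positive-radius check is an added assumption. Only the tuple order, the date range, and the bounds come from the conventions themselves.

```python
from datetime import date
from typing import NamedTuple, Tuple

# Bounds as stated in the conventions: Lon 25.2 to 33.1, Lat -22.5 to -15.6.
ZIM_LON: Tuple[float, float] = (25.2, 33.1)
ZIM_LAT: Tuple[float, float] = (-22.5, -15.6)


class AOI(NamedTuple):
    """Hypothetical container for the documented (lon, lat, radius_m) order."""

    lon: float
    lat: float
    radius_m: float


def season_window(year: int) -> Tuple[date, date]:
    """Return Sept 1 of `year` through May 31 of `year + 1`."""
    return date(year, 9, 1), date(year + 1, 5, 31)


def validate_aoi(aoi: AOI) -> AOI:
    """Raise ValueError if the AOI centre lies outside the Zimbabwe bounds."""
    if not ZIM_LON[0] <= aoi.lon <= ZIM_LON[1]:
        raise ValueError(f"longitude {aoi.lon} is outside {ZIM_LON}")
    if not ZIM_LAT[0] <= aoi.lat <= ZIM_LAT[1]:
        raise ValueError(f"latitude {aoi.lat} is outside {ZIM_LAT}")
    # Assumption: a radius must be strictly positive to be meaningful.
    if aoi.radius_m <= 0:
        raise ValueError("radius_m must be positive")
    return aoi
```

For example, `validate_aoi(AOI(31.05, -17.83, 5000.0))` passes for a point near Harare, while the swapped order `AOI(-17.83, 31.05, 5000.0)` raises `ValueError` because the first value is read as a longitude. Checking bounds this way is one inexpensive method of catching a lat/lon mix-up before a job is queued.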