# GeoCrop - Crop-Type Classification Platform
GeoCrop is an ML-based platform for crop-type classification in Zimbabwe. It uses Sentinel-2 satellite imagery from Digital Earth Africa (DEA) STAC, computes spectral and phenological features, and applies multiple ML models (XGBoost, LightGBM, CatBoost, and soft-voting ensembles) to generate high-resolution classification maps.
## 🚀 Project Overview
- **Architecture**: Distributed system with a FastAPI REST API, Redis/RQ job queue, and Python workers.
- **Data Pipeline**:
  1. **DEA STAC**: Fetches Sentinel-2 L2A imagery.
  2. **Feature Engineering**: Computes 51 features (NDVI, NDRE, EVI, SAVI, CI_RE, NDWI) including phenology, harmonics, and seasonal window summaries.
  3. **Inference**: Loads models from MinIO, runs windowed predictions, and applies a majority filter.
  4. **Output**: Generates Cloud Optimized GeoTIFFs (COGs) stored in MinIO and served via TiTiler.
- **Deployment**: Kubernetes (K3s) with automated SSL (cert-manager) and NGINX Ingress.
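
The index names in the feature-engineering step follow widely used definitions. The sketch below shows those per-pixel formulas under stated assumptions: reflectance scaled to 0..1, the common SAVI soil factor L = 0.5, the standard EVI coefficients, and the green/NIR variant of NDWI. The function name, argument names, and band choices are illustrative and are not taken from `feature_computation.py`, which may define these indices differently.

```python
from typing import Dict


def spectral_indices(blue: float, green: float, red: float,
                     red_edge: float, nir: float,
                     eps: float = 1e-9) -> Dict[str, float]:
    """Hypothetical helper: per-pixel indices from reflectance in 0..1.

    `eps` only guards against division by zero for the non-negative
    denominators; EVI is left unguarded because its denominator can
    legitimately change sign for very bright (e.g. cloudy) pixels.
    """
    return {
        # Normalized difference of NIR and red.
        "NDVI": (nir - red) / (nir + red + eps),
        # Same form as NDVI, with a red-edge band in place of red.
        "NDRE": (nir - red_edge) / (nir + red_edge + eps),
        # Enhanced Vegetation Index with the usual 2.5 / 6 / 7.5 / 1 coefficients.
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
        # Soil-adjusted index with L = 0.5, so (1 + L) = 1.5.
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5 + eps),
        # Red-edge chlorophyll index.
        "CI_RE": nir / (red_edge + eps) - 1.0,
        # Green/NIR water index (assumed variant).
        "NDWI": (green - nir) / (green + nir + eps),
    }


# Example call with made-up reflectance values for a vegetated pixel.
example = spectral_indices(blue=0.03, green=0.06, red=0.05,
                           red_edge=0.15, nir=0.45)
```

In a real pipeline these formulas would normally be applied to whole raster arrays per time step; scalar inputs are used here only to keep the example dependency-free.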
## 🛠️ Building and Running
### Development
```bash
# Run each snippet from the repository root.

# API development
cd apps/api && pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000

# Worker development
cd apps/worker && pip install -r requirements.txt
python worker.py --worker

# Training models
cd training && pip install -r requirements.txt
python train.py --data /path/to/data.csv --out ./artifacts --variant Raw
```
### Docker
```bash
docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/
```
### Kubernetes
```bash
# Create the namespace first, then apply the remaining manifests
kubectl apply -f k8s/00-namespace.yaml
kubectl apply -f k8s/
```
## 📐 Development Conventions
### Critical Patterns (Non-Obvious)
- **AOI Format**: Always use `(lon, lat, radius_m)` tuple. Longitude comes first.
- **Season Window**: September 1 to May 31 (Zimbabwe summer season). `year=2022` means 2022-09-01 to 2023-05-31.
- **Zimbabwe Bounds**: Lon 25.2 to 33.1, Lat -22.5 to -15.6.
- **Feature Order**: `FEATURE_ORDER_V1` (51 features) is immutable; changing it breaks existing model compatibility.
- **Redis Connection**: Use `redis.geocrop.svc.cluster.local` within the cluster.
- **Queue**: Always use the `geocrop_tasks` queue.
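
The AOI, season, and bounds conventions above are easy to get wrong at call sites, so a small validation layer can help. The following is a minimal sketch, not the project's actual code: `season_window`, `validate_aoi`, and the two range constants are hypothetical names, and the longitude bound is read as 25.2 to 33.1.

```python
from datetime import date
from typing import Tuple

# Bounds as stated in the conventions (longitude read as 25.2 to 33.1).
ZIM_LON_RANGE = (25.2, 33.1)
ZIM_LAT_RANGE = (-22.5, -15.6)


def season_window(year: int) -> Tuple[date, date]:
    """Return (start, end) of the summer season that begins in `year`."""
    return date(year, 9, 1), date(year + 1, 5, 31)


def validate_aoi(aoi: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Check a (lon, lat, radius_m) tuple against the Zimbabwe bounds."""
    lon, lat, radius_m = aoi  # longitude comes first
    if not ZIM_LON_RANGE[0] <= lon <= ZIM_LON_RANGE[1]:
        raise ValueError(f"lon {lon} is outside Zimbabwe; were lon/lat swapped?")
    if not ZIM_LAT_RANGE[0] <= lat <= ZIM_LAT_RANGE[1]:
        raise ValueError(f"lat {lat} is outside Zimbabwe")
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    return lon, lat, radius_m


# Example: a point near Harare with a 2 km radius, for the 2022 season.
aoi = validate_aoi((31.05, -17.83, 2000.0))
start, end = season_window(2022)
```

Because Zimbabwe's longitude and latitude ranges do not overlap, a swapped `(lat, lon, ...)` tuple always fails the longitude check, which makes this kind of guard a cheap way to catch the most common ordering mistake.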
### Storage Layout (MinIO)
- `geocrop-models`: ML model `.pkl` files in the root directory.
- `geocrop-baselines`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results`: Output COGs (`results/<job_id>/...`).
- `geocrop-datasets`: Training CSV files.
## 📂 Key Files
- `apps/api/main.py`: REST API entry point and job dispatcher.
- `apps/worker/worker.py`: Core orchestration logic for the inference pipeline.
- `apps/worker/feature_computation.py`: Implementation of the 51 spectral features.
- `training/train.py`: Script for training and exporting ML models to MinIO.
- `CLAUDE.md`: Primary guide for Claude Code development patterns.
- `AGENTS.md`: Technical stack details and current cluster state.
## 🌐 Infrastructure
- **API**: `api.portfolio.techarvest.co.zw`
- **Tiler**: `tiles.portfolio.techarvest.co.zw`
- **MinIO**: `minio.portfolio.techarvest.co.zw`
- **Frontend**: `portfolio.techarvest.co.zw`