# GeoCrop - Crop-Type Classification Platform

GeoCrop is an ML-based platform for crop-type classification in Zimbabwe. It uses Sentinel-2 satellite imagery from Digital Earth Africa (DEA) STAC, computes spectral and phenological features, and applies multiple ML models (XGBoost, LightGBM, CatBoost, and soft-voting ensembles) to generate high-resolution classification maps.

## 🚀 Project Overview

- **Architecture**: Distributed system with a FastAPI REST API, a Redis/RQ job queue, and Python workers.
- **Data Pipeline**:
  1. **DEA STAC**: Fetches Sentinel-2 L2A imagery.
  2. **Feature Engineering**: Computes 51 features derived from spectral indices (NDVI, NDRE, EVI, SAVI, CI_RE, NDWI), including phenology metrics, harmonics, and seasonal window summaries.
  3. **Inference**: Loads models from MinIO, runs windowed predictions, and applies a majority filter.
  4. **Output**: Generates Cloud Optimized GeoTIFFs (COGs) stored in MinIO and served via TiTiler.
- **Deployment**: Kubernetes (K3s) with automated SSL (cert-manager) and NGINX Ingress.
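
The spectral indices named in the feature-engineering step follow well-known formulas. The sketch below is illustrative only and is not taken from `feature_computation.py`. It assumes surface reflectance scaled to the 0-1 range, the usual Sentinel-2 band mapping (B02 blue, B03 green, B04 red, B05 red edge, B08 NIR), the green/NIR (McFeeters) form of NDWI, and the common default coefficients for EVI and SAVI. All function and parameter names are hypothetical.

```python
"""Standard spectral index formulas (illustrative; not the project's code).

Inputs are reflectances in the 0-1 range. Only arithmetic operators are
used, so the functions accept plain floats or NumPy arrays alike.
"""


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return (nir - red_edge) / (nir + red_edge)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index; soil_factor=0.5 is the usual default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ci_re(nir, red_edge):
    """Red-edge chlorophyll index."""
    return nir / red_edge - 1.0


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR form, an assumption here)."""
    return (green - nir) / (green + nir)


if __name__ == "__main__":
    # A single vegetated pixel with made-up reflectance values.
    print("NDVI:", round(ndvi(0.45, 0.05), 3))
    print("SAVI:", round(savi(0.45, 0.05), 3))
```

The sketch does not guard against zero denominators, so nodata or masked pixels would need to be filtered out before these functions are applied to real imagery.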

## 🛠️ Building and Running

### Development
```bash
# Run each group from the repository root.

# API Development
cd apps/api && pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000

# Worker Development
cd apps/worker && pip install -r requirements.txt
python worker.py --worker

# Training Models
cd training && pip install -r requirements.txt
python train.py --data /path/to/data.csv --out ./artifacts --variant Raw
```

### Docker
```bash
docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/
```

### Kubernetes
```bash
# Create the namespace first, then apply the remaining manifests
kubectl apply -f k8s/00-namespace.yaml
kubectl apply -f k8s/
```

## 📐 Development Conventions

### Critical Patterns (Non-Obvious)
- **AOI Format**: Always use a `(lon, lat, radius_m)` tuple. Longitude comes first.
- **Season Window**: September 1 to May 31 (Zimbabwe summer season). `year=2022` means 2022-09-01 to 2023-05-31.
- **Zimbabwe Bounds**: Longitude 25.2 to 33.1, latitude -22.5 to -15.6.
- **Feature Order**: `FEATURE_ORDER_V1` (51 features) is immutable; changing it breaks compatibility with existing models.
- **Redis Connection**: Use `redis.geocrop.svc.cluster.local` within the cluster.
- **Queue**: Always use the `geocrop_tasks` queue.
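
The AOI and season conventions above can be expressed as two small helpers. This is a minimal sketch, not the project's implementation: the function and constant names are hypothetical, and the positive-radius check is an assumption that the conventions do not state. The bounds and the season rule are copied from the list above.

```python
from datetime import date

# Bounds as listed in the conventions; the constant names are hypothetical.
ZIM_LON_RANGE = (25.2, 33.1)
ZIM_LAT_RANGE = (-22.5, -15.6)


def validate_aoi(aoi):
    """Check a (lon, lat, radius_m) tuple and return it unchanged."""
    lon, lat, radius_m = aoi
    if not ZIM_LON_RANGE[0] <= lon <= ZIM_LON_RANGE[1]:
        raise ValueError(
            f"longitude {lon} is outside Zimbabwe bounds; is the order (lon, lat)?"
        )
    if not ZIM_LAT_RANGE[0] <= lat <= ZIM_LAT_RANGE[1]:
        raise ValueError(f"latitude {lat} is outside Zimbabwe bounds")
    if radius_m <= 0:
        # Assumption: a non-positive radius is treated as invalid.
        raise ValueError("radius_m must be positive")
    return aoi


def season_window(year):
    """Return (start, end) of the summer season that begins in `year`."""
    return date(year, 9, 1), date(year + 1, 5, 31)


if __name__ == "__main__":
    # Approximate coordinates for Harare, longitude first.
    print(validate_aoi((31.05, -17.83, 5000)))
    print(season_window(2022))
```

Because the longitude and latitude ranges for Zimbabwe do not overlap, a tuple passed in `(lat, lon)` order fails the longitude check, which makes the most common mistake easy to catch early.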

### Storage Layout (MinIO)
- `geocrop-models`: ML model `.pkl` files in the root directory.
- `geocrop-baselines`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results`: Output COGs (`results/<job_id>/...`).
- `geocrop-datasets`: Training CSV files.

## 📂 Key Files
- `apps/api/main.py`: REST API entry point and job dispatcher.
- `apps/worker/worker.py`: Core orchestration logic for the inference pipeline.
- `apps/worker/feature_computation.py`: Implementation of the 51 spectral features.
- `training/train.py`: Script for training and exporting ML models to MinIO.
- `CLAUDE.md`: Primary guide for Claude Code development patterns.
- `AGENTS.md`: Technical stack details and current cluster state.

## 🌐 Infrastructure
- **API**: `api.portfolio.techarvest.co.zw`
- **Tiler**: `tiles.portfolio.techarvest.co.zw`
- **MinIO**: `minio.portfolio.techarvest.co.zw`
- **Frontend**: `portfolio.techarvest.co.zw`