# AGENTS.md

This file provides guidance to agents when working with code in this repository.

## Project Stack
- **API**: FastAPI + Redis + RQ job queue
- **Worker**: Python 3.11, rasterio, scikit-learn, XGBoost, LightGBM, CatBoost
- **Storage**: MinIO (S3-compatible) with signed URLs
- **K8s**: Namespace `geocrop`, ingress class `nginx`, ClusterIssuer `letsencrypt-prod`

## Build Commands

### API
```bash
cd apps/api && pip install -r requirements.txt && uvicorn main:app --host 0.0.0.0 --port 8000
```

### Worker
```bash
cd apps/worker && pip install -r requirements.txt && python worker.py
```

### Training
```bash
cd training && python train.py --data /path/to/data.csv --out ./artifacts --variant Scaled
```

### Docker Build
```bash
docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/
```

## Critical Non-Obvious Patterns

### Season Window (Sept → May, NOT Nov-Apr)
[`apps/worker/config.py:135-141`](apps/worker/config.py:135) - Use `InferenceConfig.season_dates(year, "summer")` which returns Sept 1 to May 31 of following year.

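The season window rule can be illustrated with a minimal sketch. It is written as a standalone function for clarity, so the real `InferenceConfig.season_dates` in `apps/worker/config.py` may have a different signature or return type; only the Sept 1 to May 31 rule comes from this document.

```python
from datetime import date


def season_dates(year: int, season: str = "summer") -> tuple[date, date]:
    """Return (start, end) of the season that begins in `year` (illustrative stand-in)."""
    if season != "summer":
        # Assumption: only the summer season is defined in this sketch.
        raise ValueError(f"unsupported season: {season!r}")
    # Summer runs from Sept 1 of `year` to May 31 of the following year.
    return date(year, 9, 1), date(year + 1, 5, 31)


start, end = season_dates(2021)
print(start.isoformat(), end.isoformat())  # 2021-09-01 2022-05-31
```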
### AOI Tuple Format (lon, lat, radius_m)
[`apps/worker/features.py:80`](apps/worker/features.py:80) - AOI is `(lon, lat, radius_m)` NOT `(lat, lon, radius)`.

### Redis Service Name
[`apps/api/main.py:18`](apps/api/main.py:18) - Use `redis.geocrop.svc.cluster.local` (Kubernetes DNS), NOT `localhost`.

### RQ Queue Name
[`apps/api/main.py:20`](apps/api/main.py:20) - Queue name is `geocrop_tasks`.

### Job Timeout
[`apps/api/main.py:96`](apps/api/main.py:96) - Job timeout is 25 minutes (`job_timeout='25m'`).

### Max Radius
[`apps/api/main.py:90`](apps/api/main.py:90) - Radius cannot exceed 5.0 km.

### Zimbabwe Bounds (rough bbox)
[`apps/worker/features.py:97-98`](apps/worker/features.py:97) - Lon: 25.2 to 33.1, Lat: -22.5 to -15.6.

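As a sketch of how the AOI ordering, the 5.0 km radius limit, and the rough Zimbabwe bounds fit together, the helper below validates an AOI tuple and derives a WGS84 bounding box. The function name `aoi_to_bbox` and the metres-to-degrees approximation are assumptions made for illustration; they are not taken from `features.py`.

```python
import math

ZIM_LON = (25.2, 33.1)    # rough bbox, as listed above
ZIM_LAT = (-22.5, -15.6)
MAX_RADIUS_M = 5000.0     # 5.0 km limit


def aoi_to_bbox(aoi: tuple[float, float, float]) -> list[float]:
    """Validate (lon, lat, radius_m) and return [min_lon, min_lat, max_lon, max_lat]."""
    lon, lat, radius_m = aoi  # longitude comes first, then latitude
    if not (ZIM_LON[0] <= lon <= ZIM_LON[1] and ZIM_LAT[0] <= lat <= ZIM_LAT[1]):
        raise ValueError("AOI center is outside the Zimbabwe bounding box")
    if not (0 < radius_m <= MAX_RADIUS_M):
        raise ValueError("radius_m must be in (0, 5000]")
    # Hypothetical small-area approximation: about 111.32 km per degree of latitude.
    dlat = radius_m / 111_320.0
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat)))
    return [lon - dlon, lat - dlat, lon + dlon, lat + dlat]
```

Passing the tuple in `(lat, lon, radius)` order, for example `(-17.8, 31.0, 2000)`, fails the bounds check here, which is one cheap way to catch the swap early.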
### Model Artifacts Expected
[`apps/worker/inference.py:66-70`](apps/worker/inference.py:66) - `model.joblib`, `label_encoder.joblib`, `scaler.joblib` (optional), `selected_features.json`.

### DEA STAC Endpoint
[`apps/worker/config.py:147-148`](apps/worker/config.py:147) - Use `https://explorer.digitalearth.africa/stac/search`.

### Feature Names
[`apps/worker/features.py:221`](apps/worker/features.py:221) - Currently: `["ndvi_peak", "evi_peak", "savi_peak"]`.

### Majority Filter Kernel
[`apps/worker/features.py:254`](apps/worker/features.py:254) - Must be odd (3, 5, 7).

### DW Baseline Filename Format
[`Plan/srs.md:173`](Plan/srs.md:173) - `DW_Zim_HighestConf_YYYY_YYYY.tif`

### MinIO Buckets
- `geocrop-models` - trained ML models
- `geocrop-results` - output COGs
- `geocrop-baselines` - DW baseline COGs
- `geocrop-datasets` - training datasets

## Current Kubernetes Cluster State (as of 2026-02-27)

### Namespaces
- `geocrop` - Main application namespace
- `cert-manager` - Certificate management
- `ingress-nginx` - Ingress controller
- `kubernetes-dashboard` - Dashboard

### Deployments (geocrop namespace)
| Deployment | Image | Status | Age |
|------------|-------|--------|-----|
| geocrop-api | frankchine/geocrop-api:v3 | Running (1/1) | 159m |
| geocrop-worker | frankchine/geocrop-worker:v2 | Running (1/1) | 86m |
| redis | redis:alpine | Running (1/1) | 25h |
| minio | minio/minio | Running (1/1) | 25h |
| hello-web | nginx | Running (1/1) | 25h |

### Services (geocrop namespace)
| Service | Type | Cluster IP | Ports |
|---------|------|------------|-------|
| geocrop-api | ClusterIP | 10.43.7.69 | 8000/TCP |
| geocrop-web | ClusterIP | 10.43.101.43 | 80/TCP |
| redis | ClusterIP | 10.43.15.14 | 6379/TCP |
| minio | ClusterIP | 10.43.71.8 | 9000/TCP, 9001/TCP |

### Ingress (geocrop namespace)
| Ingress | Hosts | TLS | Backend |
|---------|-------|-----|---------|
| geocrop-web-api | portfolio.techarvest.co.zw, api.portfolio.techarvest.co.zw | geocrop-web-api-tls | geocrop-web:80, geocrop-api:8000 |
| geocrop-minio | minio.portfolio.techarvest.co.zw, console.minio.portfolio.techarvest.co.zw | minio-api-tls, minio-console-tls | minio:9000, minio:9001 |

### Storage
- MinIO PVC: 30Gi (local-path storage class), bound to pvc-44bf8a0f-cbc9-4336-aa54-edf1c4d0be86

### TLS Certificates
- ClusterIssuer: letsencrypt-prod (cert-manager)
- All TLS certificates are managed by cert-manager with automatic renewal

---

## STEP 0: Alignment Notes (Worker Implementation)

### Current Mock Behavior (apps/worker/*)

| File | Current State | Gap |
|------|--------------|-----|
| `features.py` | [`build_feature_stack_from_dea()`](apps/worker/features.py:193) returns placeholder zeros | **CRITICAL** - Need full DEA STAC loading + feature engineering |
| `inference.py` | Model loading with expected bundle format | Need to adapt to ROOT bucket format |
| `config.py` | [`MinIOStorage`](apps/worker/config.py:130) class exists | May need refinement for ROOT bucket access |
| `worker.py` | Mock handler returning fake results | Need full staged pipeline |

### Training Pipeline Expectations (plan/original_training.py)

#### Feature Engineering (must match exactly):
1. **Smoothing**: [`apply_smoothing()`](plan/original_training.py:69) - Savitzky-Golay (window=5, polyorder=2) + linear interpolation of zeros
2. **Phenology**: [`extract_phenology()`](plan/original_training.py:101) - max, min, mean, std, amplitude, auc, peak_timestep, max_slope_up, max_slope_down
3. **Harmonics**: [`add_harmonics()`](plan/original_training.py:141) - harmonic1_sin/cos, harmonic2_sin/cos
4. **Windows**: [`add_interactions_and_windows()`](plan/original_training.py:177) - early/peak/late windows, interactions

#### Indices Computed:
- ndvi, ndre, evi, savi, ci_re, ndwi

#### Junk Columns Dropped:
```python
['.geo', 'system:index', 'latitude', 'longitude', 'lat', 'lon', 'ID', 'parent_id', 'batch_id', 'is_syn']
```

### Model Storage Convention (FINAL)

**Location**: ROOT of `geocrop-models` bucket (no subfolders)

**Exact Object Names**:
```
geocrop-models/
├── Zimbabwe_XGBoost_Raw_Model.pkl
├── Zimbabwe_XGBoost_Model.pkl
├── Zimbabwe_RandomForest_Raw_Model.pkl
├── Zimbabwe_RandomForest_Model.pkl
├── Zimbabwe_LightGBM_Raw_Model.pkl
├── Zimbabwe_LightGBM_Model.pkl
├── Zimbabwe_Ensemble_Raw_Model.pkl
└── Zimbabwe_CatBoost_Raw_Model.pkl
```

**Model Selection Logic**:
| Job "model" value | MinIO filename | Scaler needed? |
|-------------------|---------------|----------------|
| "Ensemble" | Zimbabwe_Ensemble_Raw_Model.pkl | No |
| "Ensemble_Raw" | Zimbabwe_Ensemble_Raw_Model.pkl | No |
| "Ensemble_Scaled" | Zimbabwe_Ensemble_Model.pkl | Yes |
| "RandomForest" | Zimbabwe_RandomForest_Model.pkl | Yes |
| "XGBoost" | Zimbabwe_XGBoost_Model.pkl | Yes |
| "LightGBM" | Zimbabwe_LightGBM_Model.pkl | Yes |
| "CatBoost" | Zimbabwe_CatBoost_Raw_Model.pkl | No |

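One way to encode the selection table is a plain lookup, sketched below. The names `MODEL_TABLE` and `select_model` are hypothetical; the filenames and scaler flags are copied from the table above.

```python
# Job "model" value -> (MinIO filename, scaler needed?), copied from the table above.
MODEL_TABLE: dict[str, tuple[str, bool]] = {
    "Ensemble": ("Zimbabwe_Ensemble_Raw_Model.pkl", False),
    "Ensemble_Raw": ("Zimbabwe_Ensemble_Raw_Model.pkl", False),
    "Ensemble_Scaled": ("Zimbabwe_Ensemble_Model.pkl", True),
    "RandomForest": ("Zimbabwe_RandomForest_Model.pkl", True),
    "XGBoost": ("Zimbabwe_XGBoost_Model.pkl", True),
    "LightGBM": ("Zimbabwe_LightGBM_Model.pkl", True),
    "CatBoost": ("Zimbabwe_CatBoost_Raw_Model.pkl", False),
}


def select_model(model: str = "Ensemble") -> tuple[str, bool]:
    """Return (filename, needs_scaler) for a job's "model" value."""
    try:
        return MODEL_TABLE[model]
    except KeyError:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODEL_TABLE)}") from None
```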
**Label Encoder Handling**:
- No separate `label_encoder.joblib` file exists
- Labels encoded in model via `model.classes_` attribute
- Default classes (if not available): `["cropland_rainfed", "cropland_irrigated", "tree_crop", "grassland", "shrubland", "urban", "water", "bare"]`

### DEA STAC Configuration

| Setting | Value |
|---------|-------|
| STAC Root | `https://explorer.digitalearth.africa/stac` |
| STAC Search | `https://explorer.digitalearth.africa/stac/search` |
| Primary Collection | `s2_l2a` (Sentinel-2 L2A) |
| Required Bands | red, green, blue, nir, nir08 (red-edge), swir16, swir22 |
| Cloud Filter | eo:cloud_cover < 30% |
| Season Window | Sep 1 → May 31 (year → year+1) |

### Dynamic World Baseline Layout

**Bucket**: `geocrop-baselines`

**Path Pattern**: `dw/zim/summer/<season>/<type>/DW_Zim_<Type>_<year>_<year+1>.tif`

**Tile Format**: COGs with 65536x65536 pixel tiles
- Example: `DW_Zim_HighestConf_2021_2022-0000000000-0000000000.tif`

### Results Layout

**Bucket**: `geocrop-results`

**Path Pattern**: `results/<job_id>/<filename>`

**Output Files**:
- `refined.tif` - Main classification result
- `dw_baseline.tif` - Clipped DW baseline (if requested)
- `truecolor.tif` - RGB composite (if requested)
- `ndvi_peak.tif`, `evi_peak.tif`, `savi_peak.tif` - Index peaks (if requested)

### Job Payload Schema

```json
{
  "job_id": "uuid",
  "user_id": "uuid",
  "lat": -17.8,
  "lon": 31.0,
  "radius_m": 2000,
  "year": 2022,
  "season": "summer",
  "model": "Ensemble",
  "smoothing_kernel": 5,
  "outputs": {
    "refined": true,
    "dw_baseline": false,
    "true_color": false,
    "indices": []
  }
}
```

**Required Fields**: `job_id`, `lat`, `lon`, `radius_m`, `year`

**Defaults**:
- `season`: "summer"
- `model`: "Ensemble"
- `smoothing_kernel`: 5
- `outputs.refined`: true

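The required fields and defaults above could be applied roughly as follows. This is a sketch only: `normalize_payload` is a hypothetical helper, and the worker may validate payloads differently (for example with Pydantic models on the API side).

```python
import copy

REQUIRED = ("job_id", "lat", "lon", "radius_m", "year")
DEFAULTS = {"season": "summer", "model": "Ensemble", "smoothing_kernel": 5}


def normalize_payload(payload: dict) -> dict:
    """Check required fields and fill in the documented defaults."""
    missing = [key for key in REQUIRED if key not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Copy first so the caller's dict is never mutated.
    job = {**DEFAULTS, **copy.deepcopy(payload)}
    outputs = job.setdefault("outputs", {})
    outputs.setdefault("refined", True)  # outputs.refined defaults to true
    return job
```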
### Pipeline Stages

| Stage | Description |
|-------|-------------|
| `fetch_stac` | Query DEA STAC for Sentinel-2 scenes |
| `build_features` | Load bands, compute indices, apply feature engineering |
| `load_dw` | Load and clip Dynamic World baseline |
| `infer` | Run ML model inference |
| `smooth` | Apply majority filter post-processing |
| `export_cog` | Write GeoTIFF as COG |
| `upload` | Upload to MinIO |
| `done` | Complete |

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `REDIS_HOST` | `redis.geocrop.svc.cluster.local` | Redis service |
| `MINIO_ENDPOINT` | `minio.geocrop.svc.cluster.local:9000` | MinIO service |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `GEOCROP_CACHE_DIR` | `/tmp/geocrop-cache` | Local cache directory |

### Assumptions / TODOs

1. **EPSG**: Default to UTM Zone 36S (EPSG:32736) for Zimbabwe - compute dynamically from AOI center in production
2. **Feature Names**: Training uses selected features from LightGBM importance - may vary per model
3. **Label Encoder**: No separate file - extract from model or use defaults
4. **Scaler**: Only for non-Raw models; Raw models use unscaled features
5. **DW Tiles**: Must handle 2x2 tile mosaicking for full AOI coverage

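For the first TODO, the UTM EPSG code can be derived from the AOI center with the standard zone formula. The sketch below uses a hypothetical helper name, `utm_epsg`, and ignores the special zone exceptions around Norway and Svalbard, which do not affect Zimbabwe.

```python
def utm_epsg(lon: float, lat: float) -> int:
    """Return the EPSG code of the WGS84 UTM zone containing (lon, lat)."""
    # Zones are 6 degrees wide and numbered 1..60 starting at 180 degrees west.
    zone = int((lon + 180.0) // 6) + 1
    zone = min(max(zone, 1), 60)  # clamp lon = 180 into zone 60
    # EPSG 326xx is the northern hemisphere, 327xx the southern hemisphere.
    return (32600 if lat >= 0 else 32700) + zone


print(utm_epsg(31.0, -17.8))  # 32736 (UTM 36S)
print(utm_epsg(26.0, -18.0))  # 32735 (UTM 35S)
```

The rough Zimbabwe bbox (lon 25.2 to 33.1) straddles the 30°E zone boundary, so AOIs in the west of the country resolve to zone 35S rather than the 36S default.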
---

## Worker Contracts (STEP 1)

### Job Payload Contract

```python
# Minimal required fields:
{
    "job_id": "uuid",
    "lat": -17.8,
    "lon": 31.0,
    "radius_m": 2000,  # max 5000m
    "year": 2022  # 2015-current
}

# Full with all options:
{
    "job_id": "uuid",
    "user_id": "uuid",  # optional
    "lat": -17.8,
    "lon": 31.0,
    "radius_m": 2000,
    "year": 2022,
    "season": "summer",  # default
    "model": "Ensemble",  # or RandomForest, XGBoost, LightGBM, CatBoost
    "smoothing_kernel": 5,  # 3, 5, or 7
    "outputs": {
        "refined": True,
        "dw_baseline": True,
        "true_color": True,
        "indices": ["ndvi_peak", "evi_peak", "savi_peak"]
    },
    "stac": {
        "cloud_cover_lt": 20,
        "max_items": 60
    }
}
```

### Worker Stages

```
fetch_stac → build_features → load_dw → infer → smooth → export_cog → upload → done
```

### Default Class List (TEMPORARY V1)

Until class names are extracted dynamically, use these classes (order matters if the model doesn't provide classes):

```python
CLASSES_V1 = [
    "Avocado","Banana","Bare Surface","Blueberry","Built-Up","Cabbage","Chilli","Citrus","Cotton","Cowpea",
    "Finger Millet","Forest","Grassland","Groundnut","Macadamia","Maize","Pasture Legume","Pearl Millet",
    "Peas","Potato","Roundnut","Sesame","Shrubland","Sorghum","Soyabean","Sugarbean","Sugarcane","Sunflower",
    "Sunhem","Sweet Potato","Tea","Tobacco","Tomato","Water","Woodland"
]
```

Note: This is TEMPORARY - later we will extract class names dynamically from the trained model.

---

## STEP 2: Storage Adapter (MinIO)

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MINIO_ENDPOINT` | `minio.geocrop.svc.cluster.local:9000` | MinIO service |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin123` | MinIO secret key |
| `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `MINIO_REGION` | `us-east-1` | AWS region |
| `MINIO_BUCKET_MODELS` | `geocrop-models` | Models bucket |
| `MINIO_BUCKET_BASELINES` | `geocrop-baselines` | Baselines bucket |
| `MINIO_BUCKET_RESULTS` | `geocrop-results` | Results bucket |

### Bucket/Key Conventions

- **Models**: ROOT of `geocrop-models` (no subfolders)
- **DW Baselines**: `geocrop-baselines/dw/zim/summer/<season>/<type>/DW_Zim_<Type>_<year>_<year+1>.tif`
- **Results**: `geocrop-results/results/<job_id>/<filename>`

### Model Filename Mapping

| Job model value | Primary filename | Fallback |
|-----------------|-----------------|----------|
| "Ensemble" | Zimbabwe_Ensemble_Model.pkl | Zimbabwe_Ensemble_Raw_Model.pkl |
| "RandomForest" | Zimbabwe_RandomForest_Model.pkl | Zimbabwe_RandomForest_Raw_Model.pkl |
| "XGBoost" | Zimbabwe_XGBoost_Model.pkl | Zimbabwe_XGBoost_Raw_Model.pkl |
| "LightGBM" | Zimbabwe_LightGBM_Model.pkl | Zimbabwe_LightGBM_Raw_Model.pkl |
| "CatBoost" | Zimbabwe_CatBoost_Model.pkl | Zimbabwe_CatBoost_Raw_Model.pkl |

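The primary/fallback rule in the mapping table can be sketched as a lookup against the keys actually present in the bucket. Both helper names are hypothetical, and `existing_keys` stands in for whatever `list_objects("geocrop-models", "")` returns; the real `download_model_file` may be structured differently.

```python
def candidate_filenames(model: str) -> list[str]:
    """Return [primary, fallback] object names for a job model value."""
    return [f"Zimbabwe_{model}_Model.pkl", f"Zimbabwe_{model}_Raw_Model.pkl"]


def pick_model_key(model: str, existing_keys: set[str]) -> str:
    """Pick the first candidate that exists in the bucket listing."""
    candidates = candidate_filenames(model)
    for name in candidates:
        if name in existing_keys:
            return name
    raise FileNotFoundError(f"none of {candidates} found in geocrop-models")


# Keys as listed under "Exact Object Names" in STEP 0.
keys = {
    "Zimbabwe_XGBoost_Raw_Model.pkl", "Zimbabwe_XGBoost_Model.pkl",
    "Zimbabwe_RandomForest_Raw_Model.pkl", "Zimbabwe_RandomForest_Model.pkl",
    "Zimbabwe_LightGBM_Raw_Model.pkl", "Zimbabwe_LightGBM_Model.pkl",
    "Zimbabwe_Ensemble_Raw_Model.pkl", "Zimbabwe_CatBoost_Raw_Model.pkl",
}
print(pick_model_key("CatBoost", keys))  # Zimbabwe_CatBoost_Raw_Model.pkl
```

With that listing, "Ensemble" and "CatBoost" both resolve to their Raw files through the fallback, because no scaled object exists for them.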
### Methods

- `ping()` → `(bool, str)`: Check MinIO connectivity
- `head_object(bucket, key)` → `dict|None`: Get object metadata
- `list_objects(bucket, prefix)` → `list[str]`: List object keys
- `download_file(bucket, key, dest_path)` → `Path`: Download file
- `download_model_file(model_name, dest_dir)` → `Path`: Download model with fallback
- `upload_file(bucket, key, local_path)` → `str`: Upload file, returns s3:// URI
- `upload_result(job_id, local_path, filename)` → `(s3_uri, key)`: Upload result
- `presign_get(bucket, key, expires)` → `str`: Generate presigned URL

---

## STEP 3: STAC Client (DEA)

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DEA_STAC_ROOT` | `https://explorer.digitalearth.africa/stac` | STAC root URL |
| `DEA_STAC_SEARCH` | `https://explorer.digitalearth.africa/stac/search` | STAC search URL |
| `DEA_CLOUD_MAX` | `30` | Cloud cover filter (percent) |
| `DEA_TIMEOUT_S` | `30` | Request timeout (seconds) |

### Collection Resolution

Preferred Sentinel-2 collection IDs (in order):
1. `s2_l2a`
2. `s2_l2a_c1`
3. `sentinel-2-l2a`
4. `sentinel_2_l2a`

If none is found, a `ValueError` is raised that lists the available collections.

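The preference order above amounts to a first-match search. In this sketch the list of available collection IDs is passed in explicitly (an assumption made to keep the example self-contained); the client's own `resolve_s2_collection()` takes no arguments and presumably calls `list_collections()` itself.

```python
PREFERRED_S2 = ["s2_l2a", "s2_l2a_c1", "sentinel-2-l2a", "sentinel_2_l2a"]


def resolve_s2_collection(available: list[str]) -> str:
    """Return the first preferred Sentinel-2 collection ID present in `available`."""
    for collection_id in PREFERRED_S2:
        if collection_id in available:
            return collection_id
    raise ValueError(f"no Sentinel-2 collection found; available: {sorted(available)}")


# Collection IDs below are made up for the example.
print(resolve_s2_collection(["ls8_sr", "s2_l2a_c1", "sentinel-2-l2a"]))  # s2_l2a_c1
```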
### Methods

- `list_collections()` → `list[str]`: List available collections
- `resolve_s2_collection()` → `str|None`: Resolve best S2 collection
- `search_items(bbox, start_date, end_date)` → `list[pystac.Item]`: Search for items
- `summarize_items(items)` → `dict`: Summarize search results without downloading

### summarize_items() Output Structure

```python
{
    "count": int,
    "collection": str,
    "time_start": "ISO datetime",
    "time_end": "ISO datetime",
    "items": [
        {
            "id": str,
            "datetime": "ISO datetime",
            "bbox": [minx, miny, maxx, maxy],
            "cloud_cover": float|None,
            "assets": {
                "red": {"href": str, "type": str, "roles": list},
                ...
            }
        }, ...
    ]
}
```

**Note**: stackstac loading is NOT implemented in this step. It will come in Step 4/5.

---

## STEP 4A: Feature Computation (Math)

### Features Produced

**Base indices (time-series):**
- ndvi, ndre, evi, savi, ci_re, ndwi

**Smoothed time-series:**
- For every index above, Savitzky-Golay smoothing (window=5, polyorder=2)
- Suffix: `*_smooth`

**Phenology metrics (computed across time for NDVI, NDRE, EVI):**
- `_max`, `_min`, `_mean`, `_std`, `_amplitude`, `_auc`, `_peak_timestep`, `_max_slope_up`, `_max_slope_down`

**Harmonic features (for NDVI only):**
- ndvi_harmonic1_sin, ndvi_harmonic1_cos, ndvi_harmonic2_sin, ndvi_harmonic2_cos

**Interaction features:**
- ndvi_ndre_peak_diff = ndvi_max - ndre_max
- canopy_density_contrast = evi_mean / (ndvi_mean + 0.001)

### Smoothing Approach

1. **fill_zeros_linear**: Treats 0 as missing and linearly interpolates between non-zero neighbors
2. **savgol_smooth_1d**: Uses `scipy.signal.savgol_filter` if available, falls back to a simple moving average

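A dependency-free sketch of the two smoothing steps is shown below, using plain lists instead of NumPy arrays. The edge handling (holding the nearest valid value for leading or trailing zeros, and shrinking the averaging window at the ends) is an assumption; the worker's `fill_zeros_linear` and its moving-average fallback may treat edges differently.

```python
def fill_zeros_linear(y: list[float]) -> list[float]:
    """Treat 0 as missing and linearly interpolate between non-zero neighbours."""
    known = [i for i, v in enumerate(y) if v != 0]
    if not known:
        return list(y)  # nothing to interpolate from
    out = list(y)
    for i, v in enumerate(y):
        if v != 0:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = y[right]  # leading gap: hold the first valid value
        elif right is None:
            out[i] = y[left]   # trailing gap: hold the last valid value
        else:
            w = (i - left) / (right - left)
            out[i] = y[left] + w * (y[right] - y[left])
    return out


def moving_average(y: list[float], window: int = 5) -> list[float]:
    """Centered moving average; the window shrinks near the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(y)):
        chunk = y[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

For example, `fill_zeros_linear([0.2, 0, 0, 0.8])` fills the gap with values close to 0.4 and 0.6 before any smoothing is applied.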
### Phenology Metrics Definitions

| Metric | Formula |
|--------|---------|
| max | np.max(y) |
| min | np.min(y) |
| mean | np.mean(y) |
| std | np.std(y) |
| amplitude | max - min |
| auc | trapezoidal integral (dx=10 days) |
| peak_timestep | argmax(y) |
| max_slope_up | max(diff(y)) |
| max_slope_down | min(diff(y)) |

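The table translates to a few lines of code. The sketch below uses only the standard library so it runs anywhere; the function name `phenology_metrics` and the list-based input are assumptions, and the worker itself presumably operates on NumPy arrays.

```python
import statistics


def phenology_metrics(y: list[float], dx: float = 10.0) -> dict[str, float]:
    """Compute the nine phenology metrics for one time series (len(y) >= 2)."""
    diffs = [b - a for a, b in zip(y, y[1:])]
    return {
        "max": max(y),
        "min": min(y),
        "mean": statistics.fmean(y),
        "std": statistics.pstdev(y),  # population std, like np.std's default (ddof=0)
        "amplitude": max(y) - min(y),
        # Trapezoidal rule with a constant step of dx days.
        "auc": sum((a + b) / 2.0 * dx for a, b in zip(y, y[1:])),
        "peak_timestep": y.index(max(y)),  # first maximum, like np.argmax
        "max_slope_up": max(diffs),
        "max_slope_down": min(diffs),
    }
```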
### Harmonic Coefficient Definition

For normalized time `t = 2*pi*k/N`:
- `h1_sin = mean(y * sin(t))`
- `h1_cos = mean(y * cos(t))`
- `h2_sin = mean(y * sin(2t))`
- `h2_cos = mean(y * cos(2t))`

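A minimal sketch of these coefficients follows. It assumes `k` runs over the timestep indices 0..N-1 and `N` is the series length, which the definition above implies but does not state; the function name is hypothetical.

```python
import math


def harmonic_coefficients(y: list[float]) -> dict[str, float]:
    """First and second harmonic sine/cosine projections of a time series."""
    n = len(y)
    # Assumption: k = 0..N-1, so t covers one full cycle over the season.
    t = [2.0 * math.pi * k / n for k in range(n)]
    return {
        "harmonic1_sin": sum(v * math.sin(a) for v, a in zip(y, t)) / n,
        "harmonic1_cos": sum(v * math.cos(a) for v, a in zip(y, t)) / n,
        "harmonic2_sin": sum(v * math.sin(2 * a) for v, a in zip(y, t)) / n,
        "harmonic2_cos": sum(v * math.cos(2 * a) for v, a in zip(y, t)) / n,
    }
```

As a sanity check, a pure first-harmonic sine wave sampled over one cycle gives `harmonic1_sin` close to 0.5 (the mean of sin²) and the other three coefficients close to 0.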
### Note
Step 4B will add seasonal window summaries and final feature vector ordering.

---

## STEP 4B: Window Summaries + Feature Order

### Seasonal Window Features (18 features)

Season window is Oct–Jun, split into:
- **Early**: Oct–Dec
- **Peak**: Jan–Mar
- **Late**: Apr–Jun

For each window, computed for NDVI, NDWI, NDRE:
- `<index>_<window>_mean`
- `<index>_<window>_max`

Total: 3 indices × 3 windows × 2 stats = **18 features**

### Feature Ordering (FEATURE_ORDER_V1)

51 scalar features in order:
1. **Phenology metrics** (27): ndvi, ndre, evi (each with max, min, mean, std, amplitude, auc, peak_timestep, max_slope_up, max_slope_down)
2. **Harmonics** (4): ndvi_harmonic1_sin/cos, ndvi_harmonic2_sin/cos
3. **Interactions** (2): ndvi_ndre_peak_diff, canopy_density_contrast
4. **Window summaries** (18): ndvi/ndwi/ndre × early/peak/late × mean/max

Note: Additional smoothed array features (`*_smooth`) are not in FEATURE_ORDER_V1 since they are arrays, not scalars.

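The count of 51 can be checked by generating the names group by group. The sketch below follows the group order listed above, but the ordering inside each group (index-major, then statistic) is an assumption and should be verified against the real `FEATURE_ORDER_V1` before relying on it for inference.

```python
PHENO_INDICES = ["ndvi", "ndre", "evi"]
PHENO_STATS = ["max", "min", "mean", "std", "amplitude", "auc",
               "peak_timestep", "max_slope_up", "max_slope_down"]
HARMONICS = ["ndvi_harmonic1_sin", "ndvi_harmonic1_cos",
             "ndvi_harmonic2_sin", "ndvi_harmonic2_cos"]
INTERACTIONS = ["ndvi_ndre_peak_diff", "canopy_density_contrast"]
WINDOW_INDICES = ["ndvi", "ndwi", "ndre"]
WINDOWS = ["early", "peak", "late"]
WINDOW_STATS = ["mean", "max"]


def build_feature_order() -> list[str]:
    """Generate scalar feature names: phenology, harmonics, interactions, windows."""
    names = [f"{idx}_{stat}" for idx in PHENO_INDICES for stat in PHENO_STATS]
    names += HARMONICS + INTERACTIONS
    names += [f"{idx}_{win}_{stat}"
              for idx in WINDOW_INDICES for win in WINDOWS for stat in WINDOW_STATS]
    return names


FEATURE_ORDER_SKETCH = build_feature_order()
print(len(FEATURE_ORDER_SKETCH))  # 51 = 27 + 4 + 2 + 18
```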
### Window Splitting Logic
- If `dates` provided: Use month membership (10,11,12 = early; 1,2,3 = peak; 4,5,6 = late)
- Fallback: Positional split (first 9 steps = early, next 9 = peak, next 9 = late)

---

## STEP 5: DW Baseline Loading

### DW Object Layout

**Bucket**: `geocrop-baselines`

**Prefix**: `dw/zim/summer/`

**Path Pattern**: `dw/zim/summer/<season>/<type>/DW_Zim_<Type>_<year>_<year+1>.tif`

**Tile Naming**: COGs with 65536x65536 pixel tiles
- Example: `DW_Zim_HighestConf_2021_2022-0000000000-0000000000.tif`
- Format: `DW_Zim_{Type}_{Year}_{Year+1}-{TileRow}-{TileCol}.tif`

### DW Types
- `HighestConf` - Highest confidence class
- `Agreement` - Class agreement across predictions
- `Mode` - Most common class

### Windowed Reads

The worker MUST use windowed reads to avoid downloading entire huge COG tiles:

1. **Presigned URL**: Get temporary URL via `storage.presign_get(bucket, key, expires=3600)`
2. **AOI Transform**: Convert AOI bbox from WGS84 to tile CRS using `rasterio.warp.transform_bounds`
3. **Window Creation**: Use `rasterio.windows.from_bounds` to compute window from transformed bbox
4. **Selective Read**: Call `src.read(window=window)` to read only the needed portion
5. **Mosaic**: If multiple tiles needed, read each window and mosaic into single array

### CRS Handling

- DW tiles may be in EPSG:3857 (Web Mercator) or UTM - do NOT assume
- Always transform AOI bbox to tile CRS before computing window
- Output profile uses tile's native CRS

### Error Handling

- If no matching tiles found: Raise `FileNotFoundError` with searched prefix
- If window read fails: Retry 3x with exponential backoff
- Nodata value: 0 (preserved from DW)

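The retry rule for failed window reads could look like the sketch below. Several details are assumptions: `with_retries` is a hypothetical helper, "3x" is read as three attempts in total, only `OSError` is treated as transient, and the 1 s base delay is arbitrary. The `sleep` parameter is injectable so the behaviour can be exercised without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on OSError wait base_delay_s * 2**attempt, then try again."""
    if max_retries < 1:
        raise ValueError("max_retries must be >= 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except OSError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise AssertionError("unreachable")
```

A windowed read would then be wrapped as `with_retries(lambda: src.read(window=window))`, where `src` and `window` come from the steps listed under Windowed Reads.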
### Primary Function

```python
from typing import List, Tuple

import numpy as np


def load_dw_baseline_window(
    storage,
    year: int,
    *,  # keyword-only from here: a required parameter cannot follow a defaulted one
    season: str = "summer",
    aoi_bbox_wgs84: List[float],  # [min_lon, min_lat, max_lon, max_lat]
    dw_type: str = "HighestConf",
    bucket: str = "geocrop-baselines",
    max_retries: int = 3,
) -> Tuple[np.ndarray, dict]:
    """Load DW baseline clipped to AOI window from MinIO.

    Returns:
        dw_arr: uint8 or int16 raster clipped to AOI
        profile: rasterio profile for writing outputs aligned to this window
    """
```

---

## Plan 02 - Step 1: TiTiler Deployment+Service

### Files Changed
- Created: [`k8s/25-tiler.yaml`](k8s/25-tiler.yaml)
- Created: Kubernetes Secret `geocrop-secrets` with MinIO credentials

### Commands Run
```bash
kubectl create secret generic geocrop-secrets -n geocrop --from-literal=minio-access-key=minioadmin --from-literal=minio-secret-key=minioadmin123
kubectl -n geocrop apply -f k8s/25-tiler.yaml
kubectl -n geocrop get deploy,svc | grep geocrop-tiler
```

### Expected Output / Acceptance Criteria
- `kubectl -n geocrop apply -f k8s/25-tiler.yaml` succeeds (syntax correct)
- Creates Deployment `geocrop-tiler` with 2 replicas
- Creates Service `geocrop-tiler` (ClusterIP on port 8000 → container port 80)
- TiTiler container reads COGs from MinIO via S3
- Pods are Running and Ready (1/1)

### Actual Output
```
deployment.apps/geocrop-tiler 2/2 2 2 2m
service/geocrop-tiler ClusterIP 10.43.47.225 <none> 8000/TCP 2m
```

### TiTiler Environment Variables
| Variable | Value |
|----------|-------|
| AWS_ACCESS_KEY_ID | from secret geocrop-secrets |
| AWS_SECRET_ACCESS_KEY | from secret geocrop-secrets |
| AWS_REGION | us-east-1 |
| AWS_S3_ENDPOINT_URL | http://minio.geocrop.svc.cluster.local:9000 |
| AWS_HTTPS | NO |
| TILED_READER | cog |

### Notes
- Container listens on port 80 (not 8000) - service maps 8000 → 80
- Health probe path `/healthz` on port 80
- Secret `geocrop-secrets` created for MinIO credentials

### Next Step
- Step 2: Add Ingress for TiTiler (with TLS)

---

## Plan 02 - Step 2: TiTiler Ingress

### Files Changed
- Created: [`k8s/26-tiler-ingress.yaml`](k8s/26-tiler-ingress.yaml)

### Commands Run
```bash
kubectl -n geocrop apply -f k8s/26-tiler-ingress.yaml
kubectl -n geocrop get ingress geocrop-tiler -o wide
kubectl -n geocrop describe ingress geocrop-tiler
```

### Expected Output / Acceptance Criteria
- Ingress object created with host `tiles.portfolio.techarvest.co.zw`
- TLS certificate will be pending until DNS A record is pointed to ingress IP

### Actual Output
```
NAME CLASS HOSTS ADDRESS PORTS AGE
geocrop-tiler nginx tiles.portfolio.techarvest.co.zw 167.86.68.48 80, 443 30s
```

### Ingress Details
- Host: tiles.portfolio.techarvest.co.zw
- Backend: geocrop-tiler:8000
- TLS: geocrop-tiler-tls (cert-manager with letsencrypt-prod)
- Annotations: nginx.ingress.kubernetes.io/proxy-body-size: "50m"

### DNS Requirement
External DNS A record must point to ingress IP (167.86.68.48):
- `tiles.portfolio.techarvest.co.zw` → `167.86.68.48`

---

## Plan 02 - Step 3: TiTiler Smoke Test

### Commands Run
```bash
kubectl -n geocrop port-forward svc/geocrop-tiler 8000:8000 &
curl -sS http://127.0.0.1:8000/ | head
curl -sS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/healthz
```

### Test Results
| Endpoint | Status | Notes |
|----------|--------|-------|
| `/` | 200 | Landing page JSON returned |
| `/healthz` | 200 | Health check passes |
| `/api` | 200 | OpenAPI docs available |

### Final Probe Path
- **Confirmed**: `/healthz` on port 80 works correctly
- No manifest changes needed

---

## Plan 02 - Step 4: MinIO S3 Access Test

### Commands Run
```bash
# With correct credentials (minioadmin/minioadmin123)
curl -sS "http://127.0.0.1:8000/cog/info?url=s3://geocrop-baselines/dw/zim/summer/summer/highest/DW_Zim_HighestConf_2016_2017-0000000000-0000000000.tif"
```

### Test Results
| Test | Result | Notes |
|------|--------|-------|
| S3 Access | ❌ Failed | Error: "The AWS Access Key Id you provided does not exist in our records" |

### Issue Analysis
- MinIO credentials used: `minioadmin` / `minioadmin123`
- The root user is `minioadmin` with password `minioadmin123`
- TiTiler pods have correct env vars set (verified via `kubectl exec`)
- Possible causes: (1) bucket not created, (2) bucket path incorrect, or (3) network policy

### Environment Variables (Verified in Pods)
| Variable | Value |
|----------|-------|
| AWS_ACCESS_KEY_ID | minioadmin |
| AWS_SECRET_ACCESS_KEY | minioadmin123 |
| AWS_S3_ENDPOINT_URL | http://minio.geocrop.svc.cluster.local:9000 |
| AWS_HTTPS | NO |
| AWS_REGION | us-east-1 |

### Next Step
- Verify bucket exists in MinIO
- Check bucket naming convention in MinIO console
- Or upload test COG to verify S3 access