docs: update system documentation to reflect current MLOps/GitOps infrastructure

fchinembiri 2026-04-28 21:08:16 +02:00
parent 8fd6c8d4e5
commit dba7d2bf99
3 changed files with 162 additions and 907 deletions

777
AGENTS.md

@@ -1,714 +1,77 @@
# AGENTS.md
# AGENTS.md - GeoCrop Intelligence & Patterns
This file provides guidance to agents when working with code in this repository.
This file provides foundational guidance for AI agents working within this repository. Adhere to these patterns to maintain system integrity.
## Project Stack
- **API**: FastAPI + Redis + RQ job queue
- **Worker**: Python 3.11, rasterio, scikit-learn, XGBoost, LightGBM, CatBoost
- **Storage**: MinIO (S3-compatible) with signed URLs
- **K8s**: Namespace `geocrop`, ingress class `nginx`, ClusterIssuer `letsencrypt-prod`
## 🛠️ Project Stack
- **Frontend**: React 19 + TypeScript + Vite + OpenLayers (Leaflet fallback).
- **API**: FastAPI + Redis + RQ Job Queue.
- **Worker**: Python 3.11, rasterio, scikit-learn, XGBoost, LightGBM, CatBoost.
- **GitOps**: Gitea (Source) + Gitea Actions (CI) + ArgoCD (CD).
- **Storage**: MinIO (S3-compatible) + PostGIS (Metadata).
- **Observability**: MLflow (Experiments) + JupyterLab (Research).
## Build Commands
## 🚀 Build & Dev Commands
### Frontend
`cd apps/web && npm install && npm run dev`
### API
```bash
cd apps/api && pip install -r requirements.txt && uvicorn main:app --host 0.0.0.0 --port 8000
```
`cd apps/api && uvicorn main:app --host 0.0.0.0 --port 8000 --reload`
### Worker
```bash
cd apps/worker && pip install -r requirements.txt && python worker.py
```
### Training
```bash
cd training && python train.py --data /path/to/data.csv --out ./artifacts --variant Scaled
```
### Docker Build
```bash
docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/
```
## Critical Non-Obvious Patterns
### Season Window (Sept → May, NOT Nov-Apr)
[`apps/worker/config.py:135-141`](apps/worker/config.py:135) - Use `InferenceConfig.season_dates(year, "summer")` which returns Sept 1 to May 31 of following year.
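As a minimal sketch of the window described above (the helper name `summer_season_dates` is hypothetical and is not the actual `InferenceConfig.season_dates` implementation), the date arithmetic might look like this:

```python
from datetime import date
from typing import Tuple


def summer_season_dates(year: int) -> Tuple[date, date]:
    """Return (start, end) for the summer season that begins in `year`.

    Hypothetical helper mirroring the documented Sept 1 -> May 31 window.
    """
    return date(year, 9, 1), date(year + 1, 5, 31)


start, end = summer_season_dates(2022)
print(start.isoformat(), end.isoformat())
```

The season is keyed by its starting year, so the end date always falls in `year + 1`.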
### AOI Tuple Format (lon, lat, radius_m)
[`apps/worker/features.py:80`](apps/worker/features.py:80) - AOI is `(lon, lat, radius_m)` NOT `(lat, lon, radius)`.
### Redis Service Name
[`apps/api/main.py:18`](apps/api/main.py:18) - Use `redis.geocrop.svc.cluster.local` (Kubernetes DNS), NOT `localhost`.
### RQ Queue Name
[`apps/api/main.py:20`](apps/api/main.py:20) - Queue name is `geocrop_tasks`.
### Job Timeout
[`apps/api/main.py:96`](apps/api/main.py:96) - Job timeout is 25 minutes (`job_timeout='25m'`).
### Max Radius
[`apps/api/main.py:90`](apps/api/main.py:90) - Radius cannot exceed 5.0 km.
### Zimbabwe Bounds (rough bbox)
[`apps/worker/features.py:97-98`](apps/worker/features.py:97) - Lon: 25.2 to 33.1, Lat: -22.5 to -15.6.
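The AOI ordering, radius limit, and rough bounding box noted above can be checked together. The following is only a sketch: the constant and function names are illustrative, and treating the bounds as inclusive is an assumption.

```python
from typing import Tuple

# Values taken from the notes above; the names are illustrative only.
ZIM_LON_RANGE = (25.2, 33.1)
ZIM_LAT_RANGE = (-22.5, -15.6)
MAX_RADIUS_M = 5000.0


def validate_aoi(aoi: Tuple[float, float, float]) -> None:
    """Raise ValueError if the AOI is outside the rough bbox or too large."""
    lon, lat, radius_m = aoi  # longitude first
    if not ZIM_LON_RANGE[0] <= lon <= ZIM_LON_RANGE[1]:
        raise ValueError(f"lon {lon} outside Zimbabwe bounds")
    if not ZIM_LAT_RANGE[0] <= lat <= ZIM_LAT_RANGE[1]:
        raise ValueError(f"lat {lat} outside Zimbabwe bounds")
    if not 0 < radius_m <= MAX_RADIUS_M:
        raise ValueError(f"radius_m {radius_m} must be in (0, {MAX_RADIUS_M}]")


validate_aoi((31.0, -17.8, 2000))  # passes: (lon, lat, radius_m)
```

A useful side effect is that a swapped `(lat, lon, radius)` tuple for a Zimbabwean location fails the longitude check, so the ordering mistake surfaces early.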
### Model Artifacts Expected
[`apps/worker/inference.py:66-70`](apps/worker/inference.py:66) - `model.joblib`, `label_encoder.joblib`, `scaler.joblib` (optional), `selected_features.json`.
### DEA STAC Endpoint
[`apps/worker/config.py:147-148`](apps/worker/config.py:147) - Use `https://explorer.digitalearth.africa/stac/search`.
### Feature Names
[`apps/worker/features.py:221`](apps/worker/features.py:221) - Currently: `["ndvi_peak", "evi_peak", "savi_peak"]`.
### Majority Filter Kernel
[`apps/worker/features.py:254`](apps/worker/features.py:254) - Must be odd (3, 5, 7).
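The odd-kernel requirement exists so that the window is centered on the pixel being relabelled. A pure-Python sketch of a majority filter follows; it is not the worker's implementation, and the edge clipping and tie-breaking rules are assumptions.

```python
from collections import Counter
from typing import List


def majority_filter(grid: List[List[int]], kernel: int = 3) -> List[List[int]]:
    """Replace each cell with the most common label in its kernel x kernel window.

    Windows are clipped at the raster edge; ties keep the first label
    encountered in row-major order. Both choices are assumptions.
    """
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be odd (e.g. 3, 5, 7)")
    half = kernel // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out
```

On real rasters a vectorized approach would be preferable; the nested loops here only make the windowing explicit.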
### DW Baseline Filename Format
[`Plan/srs.md:173`](Plan/srs.md:173) - `DW_Zim_HighestConf_YYYY_YYYY.tif`
### MinIO Buckets
- `geocrop-models` - trained ML models
- `geocrop-results` - output COGs
- `geocrop-baselines` - DW baseline COGs
- `geocrop-datasets` - training datasets
## Current Kubernetes Cluster State (as of 2026-02-27)
### Namespaces
- `geocrop` - Main application namespace
- `cert-manager` - Certificate management
- `ingress-nginx` - Ingress controller
- `kubernetes-dashboard` - Dashboard
### Deployments (geocrop namespace)
| Deployment | Image | Status | Age |
|------------|-------|--------|-----|
| geocrop-api | frankchine/geocrop-api:v3 | Running (1/1) | 159m |
| geocrop-worker | frankchine/geocrop-worker:v2 | Running (1/1) | 86m |
| redis | redis:alpine | Running (1/1) | 25h |
| minio | minio/minio | Running (1/1) | 25h |
| hello-web | nginx | Running (1/1) | 25h |
### Services (geocrop namespace)
| Service | Type | Cluster IP | Ports |
|---------|------|------------|-------|
| geocrop-api | ClusterIP | 10.43.7.69 | 8000/TCP |
| geocrop-web | ClusterIP | 10.43.101.43 | 80/TCP |
| redis | ClusterIP | 10.43.15.14 | 6379/TCP |
| minio | ClusterIP | 10.43.71.8 | 9000/TCP, 9001/TCP |
### Ingress (geocrop namespace)
| Ingress | Hosts | TLS | Backend |
|---------|-------|-----|---------|
| geocrop-web-api | portfolio.techarvest.co.zw, api.portfolio.techarvest.co.zw | geocrop-web-api-tls | geocrop-web:80, geocrop-api:8000 |
| geocrop-minio | minio.portfolio.techarvest.co.zw, console.minio.portfolio.techarvest.co.zw | minio-api-tls, minio-console-tls | minio:9000, minio:9001 |
### Storage
- MinIO PVC: 30Gi (local-path storage class), bound to pvc-44bf8a0f-cbc9-4336-aa54-edf1c4d0be86
### TLS Certificates
- ClusterIssuer: letsencrypt-prod (cert-manager)
- All TLS certificates are managed by cert-manager with automatic renewal
---
## STEP 0: Alignment Notes (Worker Implementation)
### Current Mock Behavior (apps/worker/*)
| File | Current State | Gap |
|------|--------------|-----|
| `features.py` | [`build_feature_stack_from_dea()`](apps/worker/features.py:193) returns placeholder zeros | **CRITICAL** - Need full DEA STAC loading + feature engineering |
| `inference.py` | Model loading with expected bundle format | Need to adapt to ROOT bucket format |
| `config.py` | [`MinIOStorage`](apps/worker/config.py:130) class exists | May need refinement for ROOT bucket access |
| `worker.py` | Mock handler returning fake results | Need full staged pipeline |
### Training Pipeline Expectations (plan/original_training.py)
#### Feature Engineering (must match exactly):
1. **Smoothing**: [`apply_smoothing()`](plan/original_training.py:69) - Savitzky-Golay (window=5, polyorder=2) + linear interpolation of zeros
2. **Phenology**: [`extract_phenology()`](plan/original_training.py:101) - max, min, mean, std, amplitude, auc, peak_timestep, max_slope_up, max_slope_down
3. **Harmonics**: [`add_harmonics()`](plan/original_training.py:141) - harmonic1_sin/cos, harmonic2_sin/cos
4. **Windows**: [`add_interactions_and_windows()`](plan/original_training.py:177) - early/peak/late windows, interactions
#### Indices Computed:
- ndvi, ndre, evi, savi, ci_re, ndwi
#### Junk Columns Dropped:
```python
['.geo', 'system:index', 'latitude', 'longitude', 'lat', 'lon', 'ID', 'parent_id', 'batch_id', 'is_syn']
```
### Model Storage Convention (FINAL)
**Location**: ROOT of `geocrop-models` bucket (no subfolders)
**Exact Object Names**:
```
geocrop-models/
├── Zimbabwe_XGBoost_Raw_Model.pkl
├── Zimbabwe_XGBoost_Model.pkl
├── Zimbabwe_RandomForest_Raw_Model.pkl
├── Zimbabwe_RandomForest_Model.pkl
├── Zimbabwe_LightGBM_Raw_Model.pkl
├── Zimbabwe_LightGBM_Model.pkl
├── Zimbabwe_Ensemble_Raw_Model.pkl
└── Zimbabwe_CatBoost_Raw_Model.pkl
```
**Model Selection Logic**:
| Job "model" value | MinIO filename | Scaler needed? |
|-------------------|---------------|----------------|
| "Ensemble" | Zimbabwe_Ensemble_Raw_Model.pkl | No |
| "Ensemble_Raw" | Zimbabwe_Ensemble_Raw_Model.pkl | No |
| "Ensemble_Scaled" | Zimbabwe_Ensemble_Model.pkl | Yes |
| "RandomForest" | Zimbabwe_RandomForest_Model.pkl | Yes |
| "XGBoost" | Zimbabwe_XGBoost_Model.pkl | Yes |
| "LightGBM" | Zimbabwe_LightGBM_Model.pkl | Yes |
| "CatBoost" | Zimbabwe_CatBoost_Raw_Model.pkl | No |
**Label Encoder Handling**:
- No separate `label_encoder.joblib` file exists
- Labels encoded in model via `model.classes_` attribute
- Default classes (if not available): `["cropland_rainfed", "cropland_irrigated", "tree_crop", "grassland", "shrubland", "urban", "water", "bare"]`
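A sketch of the label handling described above, assuming the loaded model exposes a scikit-learn style `classes_` attribute. The function name and the `str()` conversion are illustrative choices, not the worker's code.

```python
from typing import Any, List, Sequence


def resolve_class_names(model: Any, default_classes: Sequence[str]) -> List[str]:
    """Prefer labels stored on the model; otherwise use the given defaults."""
    classes = getattr(model, "classes_", None)
    if classes is None or len(classes) == 0:
        return list(default_classes)
    return [str(c) for c in classes]
```

If a model was trained on integer-encoded labels, `classes_` would hold integers rather than names, so the defaults (in training order) would still be needed to translate them.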
### DEA STAC Configuration
| Setting | Value |
|---------|-------|
| STAC Root | `https://explorer.digitalearth.africa/stac` |
| STAC Search | `https://explorer.digitalearth.africa/stac/search` |
| Primary Collection | `s2_l2a` (Sentinel-2 L2A) |
| Required Bands | red, green, blue, nir, nir08 (red-edge), swir16, swir22 |
| Cloud Filter | eo:cloud_cover < 30% |
| Season Window | Sep 1 → May 31 (year → year+1) |
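The settings in this table map naturally onto a STAC API item-search request body. The sketch below only builds the dictionary. The function name and `limit` default are hypothetical, and it assumes the endpoint accepts the STAC `query` extension for the cloud-cover filter.

```python
from typing import Any, Dict, Sequence


def build_stac_search_body(
    bbox: Sequence[float],  # [min_lon, min_lat, max_lon, max_lat]
    year: int,
    collection: str = "s2_l2a",
    cloud_lt: float = 30,
    limit: int = 100,  # hypothetical page size
) -> Dict[str, Any]:
    """Build a POST body for the STAC search endpoint (no network access)."""
    return {
        "collections": [collection],
        "bbox": list(bbox),
        # Season window: Sep 1 of `year` to May 31 of `year + 1`.
        "datetime": f"{year}-09-01T00:00:00Z/{year + 1}-05-31T23:59:59Z",
        # Assumes support for the STAC "query" extension.
        "query": {"eo:cloud_cover": {"lt": cloud_lt}},
        "limit": limit,
    }
```

The body would be sent as JSON to the STAC Search URL listed in the table.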
### Dynamic World Baseline Layout
**Bucket**: `geocrop-baselines`
**Path Pattern**: `dw/zim/summer/<season>/<type>/DW_Zim_<Type>_<year>_<year+1>.tif`
**Tile Format**: COGs with 65536x65536 pixel tiles
- Example: `DW_Zim_HighestConf_2021_2022-0000000000-0000000000.tif`
### Results Layout
**Bucket**: `geocrop-results`
**Path Pattern**: `results/<job_id>/<filename>`
**Output Files**:
- `refined.tif` - Main classification result
- `dw_baseline.tif` - Clipped DW baseline (if requested)
- `truecolor.tif` - RGB composite (if requested)
- `ndvi_peak.tif`, `evi_peak.tif`, `savi_peak.tif` - Index peaks (if requested)
### Job Payload Schema
```json
{
"job_id": "uuid",
"user_id": "uuid",
"lat": -17.8,
"lon": 31.0,
"radius_m": 2000,
"year": 2022,
"season": "summer",
"model": "Ensemble",
"smoothing_kernel": 5,
"outputs": {
"refined": true,
"dw_baseline": false,
"true_color": false,
"indices": []
}
}
```
**Required Fields**: `job_id`, `lat`, `lon`, `radius_m`, `year`
**Defaults**:
- `season`: "summer"
- `model`: "Ensemble"
- `smoothing_kernel`: 5
- `outputs.refined`: true
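The required fields and defaults above could be applied in one normalization step. This is a sketch with a hypothetical function name; it does not validate value ranges.

```python
import copy
from typing import Any, Dict

REQUIRED_FIELDS = ("job_id", "lat", "lon", "radius_m", "year")
DEFAULTS = {"season": "summer", "model": "Ensemble", "smoothing_kernel": 5}


def normalize_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Check required fields and fill the documented defaults."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    job = {**DEFAULTS, **copy.deepcopy(payload)}  # caller values win
    outputs = dict(job.get("outputs") or {})
    outputs.setdefault("refined", True)
    job["outputs"] = outputs
    return job
```

Copying the payload keeps the caller's dictionary untouched, which matters if the same object is also logged or re-queued.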
### Pipeline Stages
| Stage | Description |
|-------|-------------|
| `fetch_stac` | Query DEA STAC for Sentinel-2 scenes |
| `build_features` | Load bands, compute indices, apply feature engineering |
| `load_dw` | Load and clip Dynamic World baseline |
| `infer` | Run ML model inference |
| `smooth` | Apply majority filter post-processing |
| `export_cog` | Write GeoTIFF as COG |
| `upload` | Upload to MinIO |
| `done` | Complete |
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `REDIS_HOST` | `redis.geocrop.svc.cluster.local` | Redis service |
| `MINIO_ENDPOINT` | `minio.geocrop.svc.cluster.local:9000` | MinIO service |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `GEOCROP_CACHE_DIR` | `/tmp/geocrop-cache` | Local cache directory |
### Assumptions / TODOs
1. **EPSG**: Default to UTM Zone 36S (EPSG:32736) for Zimbabwe - compute dynamically from AOI center in production
2. **Feature Names**: Training uses selected features from LightGBM importance - may vary per model
3. **Label Encoder**: No separate file - extract from model or use defaults
4. **Scaler**: Only for non-Raw models; Raw models use unscaled features
5. **DW Tiles**: Must handle 2x2 tile mosaicking for full AOI coverage
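For the first item, computing the UTM zone from the AOI center is a short calculation. The sketch below uses the standard 6-degree zone formula with a hypothetical function name, and ignores the special-case zones around Norway and Svalbard, which are irrelevant here.

```python
import math


def utm_epsg_for(lon: float, lat: float) -> int:
    """Return the WGS84 / UTM EPSG code for a point given as (lon, lat)."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # clamp lon == 180 into zone 60
    return (32600 if lat >= 0 else 32700) + zone


print(utm_epsg_for(31.0, -17.8))
```

Note that Zimbabwe straddles zones 35S and 36S (the boundary is at 30°E), so a fixed EPSG:32736 default is only exact for the eastern part of the country.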
---
## Worker Contracts (STEP 1)
### Job Payload Contract
```python
# Minimal required fields:
{
"job_id": "uuid",
"lat": -17.8,
"lon": 31.0,
"radius_m": 2000, # max 5000m
"year": 2022 # 2015-current
}
# Full with all options:
{
"job_id": "uuid",
"user_id": "uuid", # optional
"lat": -17.8,
"lon": 31.0,
"radius_m": 2000,
"year": 2022,
"season": "summer", # default
"model": "Ensemble", # or RandomForest, XGBoost, LightGBM, CatBoost
"smoothing_kernel": 5, # 3, 5, or 7
"outputs": {
"refined": True,
"dw_baseline": True,
"true_color": True,
"indices": ["ndvi_peak", "evi_peak", "savi_peak"]
},
"stac": {
"cloud_cover_lt": 20,
"max_items": 60
}
}
```
### Worker Stages
```
fetch_stac → build_features → load_dw → infer → smooth → export_cog → upload → done
```
### Default Class List (TEMPORARY V1)
Until class extraction is fully dynamic, use these classes (order matters if the model does not provide classes):
```python
CLASSES_V1 = [
"Avocado","Banana","Bare Surface","Blueberry","Built-Up","Cabbage","Chilli","Citrus","Cotton","Cowpea",
"Finger Millet","Forest","Grassland","Groundnut","Macadamia","Maize","Pasture Legume","Pearl Millet",
"Peas","Potato","Roundnut","Sesame","Shrubland","Sorghum","Soyabean","Sugarbean","Sugarcane","Sunflower",
"Sunhem","Sweet Potato","Tea","Tobacco","Tomato","Water","Woodland"
]
```
Note: This is TEMPORARY - later we will extract class names dynamically from the trained model.
---
## STEP 2: Storage Adapter (MinIO)
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `MINIO_ENDPOINT` | `minio.geocrop.svc.cluster.local:9000` | MinIO service |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin123` | MinIO secret key |
| `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `MINIO_REGION` | `us-east-1` | AWS region |
| `MINIO_BUCKET_MODELS` | `geocrop-models` | Models bucket |
| `MINIO_BUCKET_BASELINES` | `geocrop-baselines` | Baselines bucket |
| `MINIO_BUCKET_RESULTS` | `geocrop-results` | Results bucket |
### Bucket/Key Conventions
- **Models**: ROOT of `geocrop-models` (no subfolders)
- **DW Baselines**: `geocrop-baselines/dw/zim/summer/<season>/<type>/DW_Zim_<Type>_<year>_<year+1>.tif`
- **Results**: `geocrop-results/results/<job_id>/<filename>`
### Model Filename Mapping
| Job model value | Primary filename | Fallback |
|-----------------|-----------------|----------|
| "Ensemble" | Zimbabwe_Ensemble_Model.pkl | Zimbabwe_Ensemble_Raw_Model.pkl |
| "RandomForest" | Zimbabwe_RandomForest_Model.pkl | Zimbabwe_RandomForest_Raw_Model.pkl |
| "XGBoost" | Zimbabwe_XGBoost_Model.pkl | Zimbabwe_XGBoost_Raw_Model.pkl |
| "LightGBM" | Zimbabwe_LightGBM_Model.pkl | Zimbabwe_LightGBM_Raw_Model.pkl |
| "CatBoost" | Zimbabwe_CatBoost_Model.pkl | Zimbabwe_CatBoost_Raw_Model.pkl |
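The primary/fallback rule in this table follows a regular naming pattern, so it can be sketched without a lookup table. The function names are hypothetical, and `available_keys` stands in for whatever listing the storage adapter returns.

```python
from typing import Iterable, List

KNOWN_MODELS = ("Ensemble", "RandomForest", "XGBoost", "LightGBM", "CatBoost")


def model_filename_candidates(model_name: str) -> List[str]:
    """Return [primary, fallback] object names for a job's model value."""
    if model_name not in KNOWN_MODELS:
        raise ValueError(f"unknown model {model_name!r}")
    return [
        f"Zimbabwe_{model_name}_Model.pkl",
        f"Zimbabwe_{model_name}_Raw_Model.pkl",
    ]


def pick_model_key(model_name: str, available_keys: Iterable[str]) -> str:
    """Pick the first candidate that exists in the bucket listing."""
    available = set(available_keys)
    for name in model_filename_candidates(model_name):
        if name in available:
            return name
    raise FileNotFoundError(f"no model object found for {model_name!r}")
```

Whether a scaler is needed would then follow from which candidate was found: the `_Raw_` variants use unscaled features.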
### Methods
- `ping()` → `(bool, str)`: Check MinIO connectivity
- `head_object(bucket, key)` → `dict|None`: Get object metadata
- `list_objects(bucket, prefix)` → `list[str]`: List object keys
- `download_file(bucket, key, dest_path)` → `Path`: Download file
- `download_model_file(model_name, dest_dir)` → `Path`: Download model with fallback
- `upload_file(bucket, key, local_path)` → `str`: Upload file, returns s3:// URI
- `upload_result(job_id, local_path, filename)` → `(s3_uri, key)`: Upload result
- `presign_get(bucket, key, expires)` → `str`: Generate presigned URL
---
## STEP 3: STAC Client (DEA)
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `DEA_STAC_ROOT` | `https://explorer.digitalearth.africa/stac` | STAC root URL |
| `DEA_STAC_SEARCH` | `https://explorer.digitalearth.africa/stac/search` | STAC search URL |
| `DEA_CLOUD_MAX` | `30` | Cloud cover filter (percent) |
| `DEA_TIMEOUT_S` | `30` | Request timeout (seconds) |
### Collection Resolution
Preferred Sentinel-2 collection IDs (in order):
1. `s2_l2a`
2. `s2_l2a_c1`
3. `sentinel-2-l2a`
4. `sentinel_2_l2a`
If none found, raises ValueError with available collections.
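The preference order above amounts to a first-match search. A minimal sketch, using a hypothetical function name rather than the client's actual method:

```python
from typing import Iterable

PREFERRED_S2_COLLECTIONS = ("s2_l2a", "s2_l2a_c1", "sentinel-2-l2a", "sentinel_2_l2a")


def pick_s2_collection(available: Iterable[str]) -> str:
    """Return the first preferred collection ID present in `available`."""
    available_set = set(available)
    for collection_id in PREFERRED_S2_COLLECTIONS:
        if collection_id in available_set:
            return collection_id
    raise ValueError(
        f"No Sentinel-2 collection found; available: {sorted(available_set)}"
    )
```

Including the available IDs in the error message makes a renamed upstream collection easy to spot in worker logs.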
### Methods
- `list_collections()` → `list[str]`: List available collections
- `resolve_s2_collection()` → `str|None`: Resolve best S2 collection
- `search_items(bbox, start_date, end_date)` → `list[pystac.Item]`: Search for items
- `summarize_items(items)` → `dict`: Summarize search results without downloading
### summarize_items() Output Structure
```python
{
"count": int,
"collection": str,
"time_start": "ISO datetime",
"time_end": "ISO datetime",
"items": [
{
"id": str,
"datetime": "ISO datetime",
"bbox": [minx, miny, maxx, maxy],
"cloud_cover": float|None,
"assets": {
"red": {"href": str, "type": str, "roles": list},
...
}
}, ...
]
}
```
**Note**: stackstac loading is NOT implemented in this step. It will come in Step 4/5.
---
## STEP 4A: Feature Computation (Math)
### Features Produced
**Base indices (time-series):**
- ndvi, ndre, evi, savi, ci_re, ndwi
**Smoothed time-series:**
- For every index above, Savitzky-Golay smoothing (window=5, polyorder=2)
- Suffix: *_smooth
**Phenology metrics (computed across time for NDVI, NDRE, EVI):**
- _max, _min, _mean, _std, _amplitude, _auc, _peak_timestep, _max_slope_up, _max_slope_down
**Harmonic features (for NDVI only):**
- ndvi_harmonic1_sin, ndvi_harmonic1_cos, ndvi_harmonic2_sin, ndvi_harmonic2_cos
**Interaction features:**
- ndvi_ndre_peak_diff = ndvi_max - ndre_max
- canopy_density_contrast = evi_mean / (ndvi_mean + 0.001)
### Smoothing Approach
1. **fill_zeros_linear**: Treats 0 as missing, linear interpolates between non-zero neighbors
2. **savgol_smooth_1d**: Uses scipy.signal.savgol_filter if available, falls back to simple moving average
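The zero-filling step can be illustrated in pure Python. This sketch covers only step 1, not the Savitzky-Golay smoothing. Holding the nearest non-zero value at the series edges is an assumption, and the real implementation may differ.

```python
from typing import List, Sequence


def fill_zeros_linear(y: Sequence[float]) -> List[float]:
    """Treat 0 as missing and interpolate linearly between non-zero neighbors."""
    vals = [float(v) for v in y]
    known = [i for i, v in enumerate(vals) if v != 0.0]
    if not known:
        return vals  # nothing to interpolate from
    out = vals[:]
    for i, v in enumerate(vals):
        if v != 0.0:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = vals[right]  # leading gap: hold first known value
        elif right is None:
            out[i] = vals[left]  # trailing gap: hold last known value
        else:
            weight = (i - left) / (right - left)
            out[i] = vals[left] + weight * (vals[right] - vals[left])
    return out


print(fill_zeros_linear([1.0, 0.0, 3.0]))
```

One consequence of treating 0 as missing is that a genuine index value of exactly zero is also overwritten.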
### Phenology Metrics Definitions
| Metric | Formula |
|--------|---------|
| max | np.max(y) |
| min | np.min(y) |
| mean | np.mean(y) |
| std | np.std(y) |
| amplitude | max - min |
| auc | trapezoidal integral (dx=10 days) |
| peak_timestep | argmax(y) |
| max_slope_up | max(diff(y)) |
| max_slope_down | min(diff(y)) |
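The table translates directly into code. The following standard-library sketch mirrors the NumPy formulas (population standard deviation for `np.std`, first maximum for `argmax`); the function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def phenology_metrics(y: Sequence[float], dx: float = 10.0) -> Dict[str, float]:
    """Compute the nine phenology metrics for one index time series."""
    vals = [float(v) for v in y]
    if len(vals) < 2:
        raise ValueError("need at least two timesteps")
    diffs = [b - a for a, b in zip(vals, vals[1:])]
    y_max, y_min = max(vals), min(vals)
    return {
        "max": y_max,
        "min": y_min,
        "mean": statistics.fmean(vals),
        "std": statistics.pstdev(vals),  # matches np.std (ddof=0)
        "amplitude": y_max - y_min,
        # Trapezoidal rule with a fixed spacing of dx days.
        "auc": sum((a + b) / 2.0 * dx for a, b in zip(vals, vals[1:])),
        "peak_timestep": vals.index(y_max),  # first maximum, like np.argmax
        "max_slope_up": max(diffs),
        "max_slope_down": min(diffs),
    }
```

Slopes are per timestep rather than per day, since `diff(y)` is not divided by `dx`.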
### Harmonic Coefficient Definition
For normalized time t = 2*pi*k/N:
- h1_sin = mean(y * sin(t))
- h1_cos = mean(y * cos(t))
- h2_sin = mean(y * sin(2t))
- h2_cos = mean(y * cos(2t))
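A standard-library sketch of these coefficients, assuming `k` runs from `0` to `N-1` over the timesteps (the indexing convention is not stated above) and using a hypothetical function name:

```python
import math
from typing import Callable, Dict, Sequence


def harmonic_coefficients(y: Sequence[float]) -> Dict[str, float]:
    """Project a time series onto first- and second-order sin/cos terms."""
    n = len(y)
    if n == 0:
        raise ValueError("y must contain at least one timestep")
    t = [2.0 * math.pi * k / n for k in range(n)]

    def mean_product(fn: Callable[[float], float], order: int) -> float:
        return sum(v * fn(order * tk) for v, tk in zip(y, t)) / n

    return {
        "h1_sin": mean_product(math.sin, 1),
        "h1_cos": mean_product(math.cos, 1),
        "h2_sin": mean_product(math.sin, 2),
        "h2_cos": mean_product(math.cos, 2),
    }
```

With this definition a pure `sin(t)` series yields `h1_sin` of 0.5 rather than 1, because the mean is not scaled by 2 as in a conventional Fourier amplitude.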
### Note
Step 4B will add seasonal window summaries and final feature vector ordering.
---
## STEP 4B: Window Summaries + Feature Order
### Seasonal Window Features (18 features)
Season window is Oct–Jun, split into:
- **Early**: Oct–Dec
- **Peak**: Jan–Mar
- **Late**: Apr–Jun
For each window, computed for NDVI, NDWI, NDRE:
- `<index>_<window>_mean`
- `<index>_<window>_max`
Total: 3 indices × 3 windows × 2 stats = **18 features**
### Feature Ordering (FEATURE_ORDER_V1)
51 scalar features in order:
1. **Phenology metrics** (27): ndvi, ndre, evi (each with max, min, mean, std, amplitude, auc, peak_timestep, max_slope_up, max_slope_down)
2. **Harmonics** (4): ndvi_harmonic1_sin/cos, ndvi_harmonic2_sin/cos
3. **Interactions** (2): ndvi_ndre_peak_diff, canopy_density_contrast
4. **Window summaries** (18): ndvi/ndwi/ndre × early/peak/late × mean/max
Note: Additional smoothed array features (*_smooth) are not in FEATURE_ORDER_V1 since they are arrays, not scalars.
### Window Splitting Logic
- If `dates` is provided: Use month membership (10,11,12 = early; 1,2,3 = peak; 4,5,6 = late)
- Fallback: Positional split (first 9 steps = early, next 9 = peak, next 9 = late)
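The two splitting rules can be sketched as a function that returns timestep indices per window. The names are hypothetical, and clipping the positional split to the series length is an assumption.

```python
from datetime import date
from typing import Dict, List, Optional, Sequence

WINDOW_MONTHS = {"early": (10, 11, 12), "peak": (1, 2, 3), "late": (4, 5, 6)}


def split_windows(
    n_steps: int,
    dates: Optional[Sequence[date]] = None,
    steps_per_window: int = 9,
) -> Dict[str, List[int]]:
    """Return timestep indices for the early/peak/late windows."""
    if dates is not None:
        if len(dates) != n_steps:
            raise ValueError("dates must have one entry per timestep")
        return {
            name: [i for i, d in enumerate(dates) if d.month in months]
            for name, months in WINDOW_MONTHS.items()
        }
    # Positional fallback: consecutive blocks of `steps_per_window` steps.
    return {
        name: list(
            range(
                min(n_steps, k * steps_per_window),
                min(n_steps, (k + 1) * steps_per_window),
            )
        )
        for k, name in enumerate(WINDOW_MONTHS)
    }
```

With month membership, a September acquisition falls in no window at all, which is worth keeping in mind given that the season elsewhere is described as starting on Sept 1.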
---
## STEP 5: DW Baseline Loading
### DW Object Layout
**Bucket**: `geocrop-baselines`
**Prefix**: `dw/zim/summer/`
**Path Pattern**: `dw/zim/summer/<season>/<type>/DW_Zim_<Type>_<year>_<year+1>.tif`
**Tile Naming**: COGs with 65536x65536 pixel tiles
- Example: `DW_Zim_HighestConf_2021_2022-0000000000-0000000000.tif`
- Format: `{Type}_{Year}_{Year+1}-{TileRow}-{TileCol}.tif`
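Object keys following this layout can be assembled with a small helper. This is a sketch: the function name is hypothetical, the `type_dir` folder name is passed in because only `highest` (for `HighestConf`) appears in these notes, and the two numeric suffixes are treated as opaque zero-padded integers.

```python
def dw_tile_key(
    year: int,
    dw_type: str = "HighestConf",
    type_dir: str = "highest",  # folder for other types is not documented here
    season: str = "summer",
    tile_row: int = 0,
    tile_col: int = 0,
) -> str:
    """Build the object key of one DW baseline tile in `geocrop-baselines`."""
    name = (
        f"DW_Zim_{dw_type}_{year}_{year + 1}"
        f"-{tile_row:010d}-{tile_col:010d}.tif"
    )
    return f"dw/zim/summer/{season}/{type_dir}/{name}"


print(dw_tile_key(2016))
```

In practice, listing the prefix and matching on the `DW_Zim_<Type>_<year>_<year+1>` stem would be more robust than guessing tile suffixes.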
### DW Types
- `HighestConf` - Highest confidence class
- `Agreement` - Class agreement across predictions
- `Mode` - Most common class
### Windowed Reads
The worker MUST use windowed reads to avoid downloading entire huge COG tiles:
1. **Presigned URL**: Get temporary URL via `storage.presign_get(bucket, key, expires=3600)`
2. **AOI Transform**: Convert AOI bbox from WGS84 to tile CRS using `rasterio.warp.transform_bounds`
3. **Window Creation**: Use `rasterio.windows.from_bounds` to compute window from transformed bbox
4. **Selective Read**: Call `src.read(window=window)` to read only the needed portion
5. **Mosaic**: If multiple tiles needed, read each window and mosaic into single array
### CRS Handling
- DW tiles may be in EPSG:3857 (Web Mercator) or UTM - do NOT assume
- Always transform AOI bbox to tile CRS before computing window
- Output profile uses tile's native CRS
### Error Handling
- If no matching tiles found: Raise `FileNotFoundError` with searched prefix
- If window read fails: Retry 3x with exponential backoff
- Nodata value: 0 (preserved from DW)
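The retry rule can be sketched as a generic helper. Here "retry 3x" is read as three retries after the first attempt, which is an assumption, as are the base delay and the broad `Exception` catch (real code would likely narrow it to I/O errors).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay_s * 2**attempt and try again."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay_s * (2 ** attempt))
            attempt += 1
```

The `sleep` parameter is injectable so the delays (1 s, 2 s, 4 s with the defaults) can be checked in tests without waiting.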
### Primary Function
```python
from typing import List, Tuple

import numpy as np


def load_dw_baseline_window(
    storage,
    year: int,
    aoi_bbox_wgs84: List[float],  # [min_lon, min_lat, max_lon, max_lat]
    season: str = "summer",
    dw_type: str = "HighestConf",
    bucket: str = "geocrop-baselines",
    max_retries: int = 3,
) -> Tuple[np.ndarray, dict]:
    """Load DW baseline clipped to AOI window from MinIO.

    Note: aoi_bbox_wgs84 is listed before season because a parameter
    without a default cannot follow one that has a default.

    Returns:
        dw_arr: uint8 or int16 raster clipped to AOI
        profile: rasterio profile for writing outputs aligned to this window
    """
```
---
## Plan 02 - Step 1: TiTiler Deployment+Service
### Files Changed
- Created: [`k8s/25-tiler.yaml`](k8s/25-tiler.yaml)
- Created: Kubernetes Secret `geocrop-secrets` with MinIO credentials
### Commands Run
```bash
kubectl create secret generic geocrop-secrets -n geocrop --from-literal=minio-access-key=minioadmin --from-literal=minio-secret-key=minioadmin123
kubectl -n geocrop apply -f k8s/25-tiler.yaml
kubectl -n geocrop get deploy,svc | grep geocrop-tiler
```
### Expected Output / Acceptance Criteria
- `kubectl -n geocrop apply -f k8s/25-tiler.yaml` succeeds (syntax correct)
- Creates Deployment `geocrop-tiler` with 2 replicas
- Creates Service `geocrop-tiler` (ClusterIP on port 8000 → container port 80)
- TiTiler container reads COGs from MinIO via S3
- Pods are Running and Ready (1/1)
### Actual Output
```
deployment.apps/geocrop-tiler 2/2 2 2 2m
service/geocrop-tiler ClusterIP 10.43.47.225 <none> 8000/TCP 2m
```
### TiTiler Environment Variables
| Variable | Value |
|----------|-------|
| AWS_ACCESS_KEY_ID | from secret geocrop-secrets |
| AWS_SECRET_ACCESS_KEY | from secret geocrop-secrets |
| AWS_REGION | us-east-1 |
| AWS_S3_ENDPOINT_URL | http://minio.geocrop.svc.cluster.local:9000 |
| AWS_HTTPS | NO |
| TILED_READER | cog |
### Notes
- Container listens on port 80 (not 8000) - service maps 8000 → 80
- Health probe path `/healthz` on port 80
- Secret `geocrop-secrets` created for MinIO credentials
### Next Step
- Step 2: Add Ingress for TiTiler (with TLS)
---
## Plan 02 - Step 2: TiTiler Ingress
### Files Changed
- Created: [`k8s/26-tiler-ingress.yaml`](k8s/26-tiler-ingress.yaml)
### Commands Run
```bash
kubectl -n geocrop apply -f k8s/26-tiler-ingress.yaml
kubectl -n geocrop get ingress geocrop-tiler -o wide
kubectl -n geocrop describe ingress geocrop-tiler
```
### Expected Output / Acceptance Criteria
- Ingress object created with host `tiles.portfolio.techarvest.co.zw`
- TLS certificate will remain pending until the DNS A record points to the ingress IP
### Actual Output
```
NAME CLASS HOSTS ADDRESS PORTS AGE
geocrop-tiler nginx tiles.portfolio.techarvest.co.zw 167.86.68.48 80, 443 30s
```
### Ingress Details
- Host: tiles.portfolio.techarvest.co.zw
- Backend: geocrop-tiler:8000
- TLS: geocrop-tiler-tls (cert-manager with letsencrypt-prod)
- Annotations: nginx.ingress.kubernetes.io/proxy-body-size: "50m"
### DNS Requirement
External DNS A record must point to ingress IP (167.86.68.48):
- `tiles.portfolio.techarvest.co.zw` → `167.86.68.48`
---
## Plan 02 - Step 3: TiTiler Smoke Test
### Commands Run
```bash
kubectl -n geocrop port-forward svc/geocrop-tiler 8000:8000 &
curl -sS http://127.0.0.1:8000/ | head
curl -sS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/healthz
```
### Test Results
| Endpoint | Status | Notes |
|----------|--------|-------|
| `/` | 200 | Landing page JSON returned |
| `/healthz` | 200 | Health check passes |
| `/api` | 200 | OpenAPI docs available |
### Final Probe Path
- **Confirmed**: `/healthz` on port 80 works correctly
- No manifest changes needed
---
## Plan 02 - Step 4: MinIO S3 Access Test
### Commands Run
```bash
# With correct credentials (minioadmin/minioadmin123)
curl -sS "http://127.0.0.1:8000/cog/info?url=s3://geocrop-baselines/dw/zim/summer/summer/highest/DW_Zim_HighestConf_2016_2017-0000000000-0000000000.tif"
```
### Test Results
| Test | Result | Notes |
|------|--------|-------|
| S3 Access | ❌ Failed | Error: "The AWS Access Key Id you provided does not exist in our records" |
### Issue Analysis
- MinIO credentials used: `minioadmin` / `minioadmin123`
- The root user is `minioadmin` with password `minioadmin123`
- TiTiler pods have correct env vars set (verified via `kubectl exec`)
- Issue may be: (1) bucket not created, (2) bucket path incorrect, or (3) network policy
### Environment Variables (Verified Working)
| Variable | Value |
|----------|-------|
| AWS_ACCESS_KEY_ID | minioadmin |
| AWS_SECRET_ACCESS_KEY | minioadmin123 |
| AWS_S3_ENDPOINT_URL | http://minio.geocrop.svc.cluster.local:9000 |
| AWS_HTTPS | NO |
| AWS_REGION | us-east-1 |
### Next Step
- Verify bucket exists in MinIO
- Check bucket naming convention in MinIO console
- Or upload test COG to verify S3 access
`cd apps/worker && python worker.py --worker`
### Docker (Local Build)
`docker build -t frankchine/geocrop-web:latest apps/web/`
## 🧠 Critical Patterns (Non-Obvious)
### 🚫 Scoping Mandate
- **Kubernetes Only:** Focus exclusively on resources managed by Kubernetes. **NEVER** modify host-level Nginx, CloudPanel, or system services outside the cluster.
### 🗺️ Geospatial Conventions
- **AOI Format:** Always `(lon, lat, radius_m)` (longitude first!).
- **Season Window:** "Summer" = Sept 1st to May 31st of following year.
- **Zimbabwe Bounds:** Lon 25.2 to 33.1, Lat -22.5 to -15.6.
- **Feature Order:** `FEATURE_ORDER_V1` (51 features) is strictly immutable.
### 🔌 Connectivity
- **Redis Host:** `redis.geocrop.svc.cluster.local` (Port 6379).
- **MinIO Host:** `minio.geocrop.svc.cluster.local` (Port 9000).
- **Queue Name:** `geocrop_tasks`.
### 📦 Storage Layout (MinIO)
- `geocrop-models/`: Serialized ML models (`.pkl`) and MLflow artifacts.
- `geocrop-baselines/`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results/`: Output COGs (`results/<job_id>/...`).
- `geocrop-datasets/`: Training CSVs.
## 🚢 GitOps Workflow
- **CI**: Build and Push via `.gitea/workflows/build-push.yaml`.
- **CD**: ArgoCD tracks `k8s/base/` in the `geocrop-platform` application.
- **Secrets**: Managed via Kubernetes Secrets (e.g., `geocrop-secrets`, `geocrop-db-secret`).
## 📊 Current Kubernetes State (geocrop namespace)
| Deployment | Role | Status |
|------------|------|--------|
| `geocrop-web` | React Frontend | Running (1/1) |
| `geocrop-api` | FastAPI Backend | Running (1/1) |
| `geocrop-worker` | Inference Engine | Running (1/1) |
| `gitea` | Source Control | Running (1/1) |
| `gitea-runner` | CI Runner (Actions) | Running (1/1) |
| `mlflow` | Experiment Tracking | Running (1/1) |
| `jupyter-lab` | Data Science IDE | Running (1/1) |
| `geocrop-db` | PostGIS Database | Running (1/1) |
| `redis` | Job Broker | Running (1/1) |
| `minio` | S3 Storage | Running (1/1) |
| `geocrop-tiler` | Dynamic Tile Server | Running (2/2) |
### 🌐 Endpoints
- **Portfolio**: `portfolio.techarvest.co.zw`
- **API Docs**: `api.portfolio.techarvest.co.zw/docs`
- **Gitea**: `git.techarvest.co.zw`
- **ArgoCD**: `cd.techarvest.co.zw`
- **MLflow**: `ml.techarvest.co.zw`
- **Jupyter**: `lab.techarvest.co.zw`
- **Tiler**: `tiles.portfolio.techarvest.co.zw`

205
CLAUDE.md

@@ -1,176 +1,65 @@
# CLAUDE.md
# CLAUDE.md - GeoCrop Engineering Guide
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
GeoCrop is a production-grade, self-hosted ML platform for crop-type classification in Zimbabwe.
## What This Project Does
GeoCrop is a crop-type classification platform for Zimbabwe. It:
1. Accepts an AOI (lat/lon + radius) and year via REST API
2. Queues an inference job via Redis/RQ
3. Worker fetches Sentinel-2 imagery from DEA STAC, computes 51 spectral features, loads a Dynamic World baseline, runs an ML model (XGBoost/LightGBM/CatBoost/Ensemble), and uploads COG results to MinIO
4. Results are served via TiTiler (tile server reading COGs directly from MinIO over S3)
## Build & Run Commands
## 🚀 Key Commands
```bash
# API
cd apps/api && pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000
# Frontend (React 19 + TypeScript)
cd apps/web && npm install && npm run dev
# Worker
cd apps/worker && pip install -r requirements.txt
python worker.py --worker # start RQ worker
python worker.py --test # syntax/import self-test only
# API (FastAPI)
cd apps/api && uvicorn main:app --reload
# Web frontend (React + Vite + TypeScript)
cd apps/web && npm install
npm run dev # dev server (hot reload)
npm run build # production build → dist/
npm run lint # ESLint check
npm run preview # preview production build locally
# Worker (RQ)
cd apps/worker && python worker.py --worker
# Training
cd training && python train.py --data /path/to/data.csv --out ./artifacts --variant Raw
# With MinIO upload:
MINIO_ENDPOINT=... MINIO_ACCESS_KEY=... MINIO_SECRET_KEY=... \
python train.py --data /path/to/data.csv --out ./artifacts --variant Raw --upload-minio
# Docker
docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/
# Infrastructure (K8s)
# Manifests are managed via ArgoCD. Pushing to 'main' triggers reconciliation.
# Root manifests in k8s/base/
```
## Kubernetes Deployment
## 🚢 CI/CD & GitOps
All k8s manifests are in `k8s/` — numbered for apply order:
- **Source Control**: Gitea (`git.techarvest.co.zw`).
- **CI**: Gitea Actions (`.gitea/workflows/build-push.yaml`) builds and pushes images to Docker Hub.
- **CD**: ArgoCD (`cd.techarvest.co.zw`) tracks `k8s/base/` and auto-syncs to the `geocrop` namespace.
- **Git Repo**: `http://gitea.geocrop.svc.cluster.local:3000/fchinembiri/geocrop-platform.git`
```bash
kubectl apply -f k8s/00-namespace.yaml
kubectl apply -f k8s/ # apply all in order
kubectl -n geocrop rollout restart deployment/geocrop-api
kubectl -n geocrop rollout restart deployment/geocrop-worker
```
## 🌐 Endpoints
Namespace: `geocrop`. Ingress class: `nginx`. ClusterIssuer: `letsencrypt-prod`.
- **Portfolio**: `portfolio.techarvest.co.zw`
- **API**: `api.portfolio.techarvest.co.zw`
- **Gitea**: `git.techarvest.co.zw`
- **ArgoCD**: `cd.techarvest.co.zw`
- **MLflow**: `ml.techarvest.co.zw`
- **Jupyter**: `lab.techarvest.co.zw`
- **Tiler**: `tiles.portfolio.techarvest.co.zw`
- **MinIO**: `minio.portfolio.techarvest.co.zw`
Exposed hosts:
- `portfolio.techarvest.co.zw` → geocrop-web (nginx static)
- `api.portfolio.techarvest.co.zw` → geocrop-api:8000
- `tiles.portfolio.techarvest.co.zw` → geocrop-tiler:8000 (TiTiler)
- `minio.portfolio.techarvest.co.zw` → MinIO API
- `console.minio.portfolio.techarvest.co.zw` → MinIO Console
## 📐 Architecture & Patterns
## Architecture
### Components
- **Web**: React 19 + OpenLayers. UI for portfolio and interactive crop mapping.
- **API**: FastAPI. Handles auth (JWT), job validation, and queueing.
- **Worker**: RQ-based Python worker. Orchestrates STAC fetch → feature extraction → inference → smoothing → COG export.
- **Tiler**: TiTiler. Serves tiles directly from MinIO COGs via S3 protocol.
```
Web (React/Vite/OL) → API (FastAPI) → Redis Queue (geocrop_tasks) → Worker (RQ)
DEA STAC → feature_computation.py (51 features)
MinIO → dw_baseline.py (windowed read)
MinIO → inference.py (model load + predict)
→ postprocess.py (majority filter)
→ cog.py (write COG)
→ MinIO geocrop-results/
TiTiler reads COGs from MinIO via S3 protocol
```
### Storage (MinIO)
- `geocrop-models`: ML models and MLflow artifacts.
- `geocrop-baselines`: Dynamic World COGs.
- `geocrop-results`: Inference outputs (COGs).
Job status is written to Redis at `job:{job_id}:status` with 24h expiry.
### Non-Obvious Constraints
- **Kubernetes Only**: Only modify resources managed by K8s. **Avoid** host-level configs (Nginx/CloudPanel).
- **AOI Format**: `(lon, lat, radius_m)` — Longitude first.
- **Season Window**: Sept 1st to May 31st (Zimbabwe Summer).
- **Feature Order**: `FEATURE_ORDER_V1` (51 features) is immutable.
**Web frontend** (`apps/web/`): React 19 + TypeScript + Vite. Uses OpenLayers for the map (click-to-set-coordinates). Components: `Login`, `Welcome`, `JobForm`, `StatusMonitor`, `MapComponent`, `Admin`. State is in `App.tsx`; JWT token stored in `localStorage`.
## 📂 Repository Structure
**API user store**: Users are stored in an in-memory dict (`USERS` in `apps/api/main.py`) — lost on restart. Admin panel (`/admin/users`) manages users at runtime. Any user additions must be re-done after pod restarts unless the dict is seeded in code.
## Critical Non-Obvious Patterns
**Season window**: Sept 1 → May 31 of the following year. `year=2022` → 2022-09-01 to 2023-05-31. See `InferenceConfig.season_dates()` in `apps/worker/config.py`.
**AOI format**: `(lon, lat, radius_m)` — NOT `(lat, lon)`. Longitude first everywhere in `features.py`.
**Zimbabwe bounds**: Lon 25.2 to 33.1, Lat -22.5 to -15.6 (enforced in `worker.py` validation).
**Radius limit**: Max 5000m enforced in both API (`apps/api/main.py:90`) and worker validation.
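The AOI ordering, bounds, and radius rules above can be combined into one check. The sketch below is illustrative only: `validate_aoi` and the constant names are assumptions, and the real validation in `worker.py` and `apps/api/main.py` may differ in detail.

```python
ZIM_LON_RANGE = (25.2, 33.1)    # longitude bounds
ZIM_LAT_RANGE = (-22.5, -15.6)  # latitude bounds
MAX_RADIUS_M = 5000             # maximum AOI radius in metres


def validate_aoi(aoi):
    """Validate an AOI given as (lon, lat, radius_m), longitude first."""
    lon, lat, radius_m = aoi
    if not ZIM_LON_RANGE[0] <= lon <= ZIM_LON_RANGE[1]:
        raise ValueError(f"lon {lon} is outside {ZIM_LON_RANGE}")
    if not ZIM_LAT_RANGE[0] <= lat <= ZIM_LAT_RANGE[1]:
        raise ValueError(f"lat {lat} is outside {ZIM_LAT_RANGE}")
    if not 0 < radius_m <= MAX_RADIUS_M:
        raise ValueError(f"radius_m {radius_m} must be in (0, {MAX_RADIUS_M}]")
    return lon, lat, radius_m
```

A useful side effect: a tuple passed as `(lat, lon, radius_m)` by mistake usually fails the longitude check, because Zimbabwean latitudes are negative.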
**RQ queue name**: `geocrop_tasks`. Redis service: `redis.geocrop.svc.cluster.local`.
**API vs worker function name mismatch**: `apps/api/main.py` enqueues `'worker.run_inference'` but the worker only defines `run_job`. Any new worker entry point must be named `run_inference` (or the API call must be updated) for end-to-end jobs to work.
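One low-risk way to reconcile the two names, sketched here as an assumption rather than the repository's actual fix, is to keep `run_job` and expose it under the name the API enqueues. The `run_job` body and signature below are stand-ins.

```python
def run_job(job_id: str, payload: dict) -> dict:
    """Stand-in for the existing worker entry point (signature assumed)."""
    return {"job_id": job_id, "status": "done", "payload": payload}


# Module-level alias: the dotted path 'worker.run_inference' now resolves
# to the same callable as 'worker.run_job'.
run_inference = run_job
```

The alternative is to change the string passed to the enqueue call in `apps/api/main.py`; either way, the name on both sides has to match.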
**Smoothing kernel**: Must be odd — 3, 5, or 7 only (`postprocess.py`).
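To show why the kernel must be odd (the window needs a centre pixel), here is a pure-Python majority filter with the same 3/5/7 restriction. It is an illustration only; `postprocess.py` presumably works on arrays, and the tie-breaking and edge handling here are assumptions.

```python
from collections import Counter

ALLOWED_KERNELS = (3, 5, 7)


def majority_filter(grid, kernel=3):
    """Replace each cell with the most common value in its kernel x kernel window.

    Windows are clipped at the grid edges. Ties resolve to the value that
    appears first in row-major order within the window.
    """
    if kernel not in ALLOWED_KERNELS:
        raise ValueError(f"kernel must be one of {ALLOWED_KERNELS}, got {kernel}")
    half = kernel // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out
```

A single isolated pixel of a different class surrounded by one class is replaced by the surrounding class, which is the speckle removal the smoothing stage is meant to provide.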
**Feature order**: `FEATURE_ORDER_V1` in `feature_computation.py` — exactly 51 scalar features. Order matters for model inference. Changing this breaks all existing models.
## MinIO Buckets & Path Conventions
| Bucket | Purpose | Path pattern |
|--------|---------|-------------|
| `geocrop-models` | ML model `.pkl` files | ROOT — no subfolders |
| `geocrop-baselines` | Dynamic World COG tiles | `dw/zim/summer/<season>/<type>/DW_Zim_<Type>_<year>_<year+1>-<row>-<col>.tif` |
| `geocrop-results` | Output COGs | `results/<job_id>/<filename>` |
| `geocrop-datasets` | Training data CSVs | — |
**Model filenames** (ROOT of `geocrop-models`):
- `Zimbabwe_Ensemble_Raw_Model.pkl` — no scaler needed
- `Zimbabwe_XGBoost_Model.pkl`, `Zimbabwe_LightGBM_Model.pkl`, `Zimbabwe_RandomForest_Model.pkl` — require scaler
- `Zimbabwe_CatBoost_Raw_Model.pkl` — no scaler
**DW baseline tiles**: COGs are 65536×65536 pixel tiles. Worker MUST use windowed reads via presigned URL — never download the full tile. Always transform AOI bbox to tile CRS before computing window.
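The window computation can be sketched without any raster library. The function below assumes a north-up tile with square pixels and a bbox that has already been transformed to the tile CRS; `pixel_window` and its parameters are hypothetical names for illustration.

```python
import math


def pixel_window(bbox, origin_x, origin_y, pixel_size, tile_size=65536):
    """Convert a bbox in tile CRS to (col_off, row_off, width, height).

    bbox is (min_x, min_y, max_x, max_y). origin_x / origin_y are the
    coordinates of the tile's top-left corner. The window is expanded to
    whole pixels and clipped to the tile extent.
    """
    min_x, min_y, max_x, max_y = bbox
    col_start = math.floor((min_x - origin_x) / pixel_size)
    col_stop = math.ceil((max_x - origin_x) / pixel_size)
    # Rows count downward from the top edge, so max_y gives the first row.
    row_start = math.floor((origin_y - max_y) / pixel_size)
    row_stop = math.ceil((origin_y - min_y) / pixel_size)
    col_start, col_stop = max(col_start, 0), min(col_stop, tile_size)
    row_start, row_stop = max(row_start, 0), min(row_stop, tile_size)
    if col_start >= col_stop or row_start >= row_stop:
        raise ValueError("bbox does not intersect the tile")
    return col_start, row_start, col_stop - col_start, row_stop - row_start
```

In the worker itself, a library helper such as `rasterio.windows.from_bounds` would normally do this job; the sketch only makes the arithmetic explicit.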
## Environment Variables
| Variable | Default | Notes |
|----------|---------|-------|
| `REDIS_HOST` | `redis.geocrop.svc.cluster.local` | Also supports `REDIS_URL` |
| `MINIO_ENDPOINT` | `minio.geocrop.svc.cluster.local:9000` | |
| `MINIO_ACCESS_KEY` | `minioadmin` | |
| `MINIO_SECRET_KEY` | `minioadmin123` | |
| `MINIO_SECURE` | `false` | |
| `GEOCROP_CACHE_DIR` | `/tmp/geocrop-cache` | |
| `SECRET_KEY` | (change in prod) | API JWT signing |
TiTiler uses `AWS_S3_ENDPOINT_URL=http://minio.geocrop.svc.cluster.local:9000`, `AWS_HTTPS=NO`, credentials from `geocrop-secrets` k8s secret.
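A hedged sketch of reading the non-secret variables from the table with their documented defaults. The `Settings` class and `load_settings` function are hypothetical; credentials are deliberately left out and should come from the Kubernetes secret rather than code defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    redis_host: str
    minio_endpoint: str
    minio_secure: bool
    cache_dir: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    secure_raw = env.get("MINIO_SECURE", "false")
    return Settings(
        redis_host=env.get("REDIS_HOST", "redis.geocrop.svc.cluster.local"),
        minio_endpoint=env.get("MINIO_ENDPOINT", "minio.geocrop.svc.cluster.local:9000"),
        # Treat common truthy spellings as True; anything else is False.
        minio_secure=secure_raw.strip().lower() in ("1", "true", "yes"),
        cache_dir=env.get("GEOCROP_CACHE_DIR", "/tmp/geocrop-cache"),
    )
```

Parsing `MINIO_SECURE` explicitly avoids the common pitfall where the non-empty string `"false"` is treated as truthy.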
## Feature Engineering (must match training exactly)
Pipeline in `feature_computation.py`:
1. Compute indices: ndvi, ndre, evi, savi, ci_re, ndwi
2. Fill zeros linearly, then Savitzky-Golay smooth (window=5, polyorder=2)
3. Phenology metrics for ndvi/ndre/evi: max, min, mean, std, amplitude, auc, peak_timestep, max_slope_up, max_slope_down (27 features)
4. Harmonics for ndvi only: harmonic1_sin/cos, harmonic2_sin/cos (4 features)
5. Interactions: ndvi_ndre_peak_diff, canopy_density_contrast (2 features)
6. Window summaries (early=Oct–Dec, peak=Jan–Mar, late=Apr–Jun) for ndvi/ndwi/ndre × mean/max (18 features)
**Total: 51 features** — see `FEATURE_ORDER_V1` for exact ordering.
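The group sizes listed above can be checked with a few lines of arithmetic. This sketch only counts features per group from the names given in the pipeline description; it does not reproduce the ordering in `FEATURE_ORDER_V1`, and the constant names are assumptions.

```python
PHENOLOGY_INDICES = ("ndvi", "ndre", "evi")
PHENOLOGY_METRICS = (
    "max", "min", "mean", "std", "amplitude", "auc",
    "peak_timestep", "max_slope_up", "max_slope_down",
)
HARMONICS = ("harmonic1_sin", "harmonic1_cos", "harmonic2_sin", "harmonic2_cos")
INTERACTIONS = ("ndvi_ndre_peak_diff", "canopy_density_contrast")
WINDOWS = ("early", "peak", "late")
WINDOW_INDICES = ("ndvi", "ndwi", "ndre")
WINDOW_STATS = ("mean", "max")


def feature_group_sizes():
    """Return the number of features contributed by each group."""
    return {
        "phenology": len(PHENOLOGY_INDICES) * len(PHENOLOGY_METRICS),
        "harmonics": len(HARMONICS),
        "interactions": len(INTERACTIONS),
        "window_summaries": len(WINDOWS) * len(WINDOW_INDICES) * len(WINDOW_STATS),
    }


TOTAL_FEATURES = sum(feature_group_sizes().values())
```

The groups contribute 27 + 4 + 2 + 18 features, which is where the total of 51 comes from. A check like this in a test suite would catch an accidental change to the feature count, though not a reordering.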
Training junk columns dropped: `.geo`, `system:index`, `latitude`, `longitude`, `lat`, `lon`, `ID`, `parent_id`, `batch_id`, `is_syn`.
## DEA STAC
- Search endpoint: `https://explorer.digitalearth.africa/stac/search`
- Primary collection: `s2_l2a` (falls back to `s2_l2a_c1`, `sentinel-2-l2a`, `sentinel_2_l2a`)
- Required bands: red, green, blue, nir, nir08 (red-edge), swir16, swir22
- Cloud filter: `eo:cloud_cover < 30`
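The settings above map naturally onto a STAC item-search request body. The sketch below only builds the payload and does not send it; `build_stac_search` is a hypothetical helper, and it assumes the endpoint accepts the STAC `query` extension for the cloud-cover filter.

```python
COLLECTION_FALLBACKS = ("s2_l2a", "s2_l2a_c1", "sentinel-2-l2a", "sentinel_2_l2a")


def build_stac_search(bbox, start, end, collection=COLLECTION_FALLBACKS[0],
                      max_cloud=30, limit=100):
    """Build a JSON body for a POST to the STAC search endpoint.

    bbox is (min_lon, min_lat, max_lon, max_lat); start and end are
    ISO dates such as "2022-09-01".
    """
    return {
        "collections": [collection],
        "bbox": list(bbox),
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
        # Strictly less than max_cloud, matching `eo:cloud_cover < 30`.
        "query": {"eo:cloud_cover": {"lt": max_cloud}},
        "limit": limit,
    }
```

A caller could try each name in `COLLECTION_FALLBACKS` in turn until a search returns items, which mirrors the fallback order listed above.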
## Worker Pipeline Stages
`fetch_stac → build_features → load_dw → infer → smooth → export_cog → upload → done`
When real DEA STAC data is unavailable, the worker falls back to synthetic features (seeded by year+coords) to allow end-to-end pipeline testing.
## Label Classes (V1 — temporary)
35 classes including Maize, Tobacco, Soyabean, etc. — defined as `CLASSES_V1` in `apps/worker/worker.py`. Extract dynamically from `model.classes_` when available; fall back to this list only if not present.
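The "prefer `model.classes_`, fall back to the static list" rule could be written as below. `resolve_class_names` is a hypothetical helper; note that if the model was trained on label-encoded integers, `classes_` holds integers and the names would still need to come from the label encoder.

```python
def resolve_class_names(model, fallback):
    """Return class names from the model when present, else the fallback list."""
    classes = getattr(model, "classes_", None)
    if classes is None or len(classes) == 0:
        return list(fallback)
    return [str(c) for c in classes]
```

Using `getattr` with a default keeps the lookup safe for model objects that do not follow the scikit-learn attribute convention.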
## Training Artifacts
`train.py --variant Raw` produces `artifacts/model_raw/`:
- `model.joblib` — VotingClassifier (soft) over RF + XGBoost + LightGBM + CatBoost
- `label_encoder.joblib` — sklearn LabelEncoder (maps string class → int)
- `selected_features.json` — feature subset chosen by scout RF (subset of FEATURE_ORDER_V1)
- `meta.json` — class names, n_features, config snapshot
- `metrics.json` — per-model accuracy/F1/classification report
`--variant Scaled` also emits `scaler.joblib`. Models uploaded to MinIO via `--upload-minio` go under `geocrop-models` at the ROOT (no subfolders).
## Plans & Docs
`plan/` contains detailed step-by-step implementation plans (01–05) and an SRS. Read these before making significant architectural changes. `ops/` contains MinIO upload scripts and storage setup docs.
- `apps/`: Source code for web, api, and worker.
- `k8s/base/`: Kubernetes manifests (ArgoCD target).
- `training/`: Model training scripts and research.
- `plan/`: Architectural blueprints and restructuring reports.
- `ops/`: Infrastructure scripts and data migration tools.


@ -1,73 +1,76 @@
# GeoCrop - Crop-Type Classification Platform
# GeoCrop - Sovereign MLOps Platform
GeoCrop is an ML-based platform designed for crop-type classification in Zimbabwe. It utilizes Sentinel-2 satellite imagery from Digital Earth Africa (DEA) STAC, computes advanced spectral and phenological features, and employs multiple ML models (XGBoost, LightGBM, CatBoost, and Soft-Voting Ensembles) to generate high-resolution classification maps.
GeoCrop is a production-grade, self-hosted ML platform designed for crop-type classification in Zimbabwe. It utilizes Sentinel-2 satellite imagery (DEA STAC), computes 51 spectral/phenological features, and employs ensemble ML models to generate high-resolution Cloud Optimized GeoTIFFs (COGs).
## 🚀 Project Overview
## 🚀 System Architecture
- **Architecture**: Distributed system with a FastAPI REST API, Redis/RQ job queue, and Python workers.
- **Data Pipeline**:
1. **DEA STAC**: Fetches Sentinel-2 L2A imagery.
2. **Feature Engineering**: Computes 51 features (NDVI, NDRE, EVI, SAVI, CI_RE, NDWI) including phenology, harmonics, and seasonal window summaries.
3. **Inference**: Loads models from MinIO, runs windowed predictions, and applies a majority filter.
4. **Output**: Generates Cloud Optimized GeoTIFFs (COGs) stored in MinIO and served via TiTiler.
- **Deployment**: Kubernetes (K3s) with automated SSL (cert-manager) and NGINX Ingress.
The platform follows a **Sovereign MLOps** philosophy, hosting the entire lifecycle—from source control and experiment tracking to inference and GitOps—on a private K3s cluster.
- **Frontend**: React 19 + OpenLayers/Leaflet (Portfolio & App).
- **Backend**: FastAPI REST API + Redis/RQ Job Queue.
- **ML Engine**: Python Inference Workers + XGBoost/CatBoost/LightGBM Ensembles.
- **Infrastructure**:
- **GitOps**: ArgoCD (CD) + Gitea (Source Control & CI).
- **Experiment Tracking**: MLflow (Postgres/MinIO backend).
- **Development**: JupyterLab (integrated with MinIO).
- **Storage**: MinIO (S3-compatible) for datasets, models, and results.
- **Database**: Postgres + PostGIS for spatial metadata and app state.
## 🛠️ Building and Running
### Development
```bash
# Frontend Development
cd apps/web && npm install && npm run dev
# API Development
cd apps/api && pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000
uvicorn main:app --reload
# Worker Development
cd apps/worker && pip install -r requirements.txt
python worker.py --worker
# Training Models
cd training && pip install -r requirements.txt
python train.py --data /path/to/data.csv --out ./artifacts --variant Raw
```
### Docker
```bash
docker build -t frankchine/geocrop-api:v1 apps/api/
docker build -t frankchine/geocrop-worker:v1 apps/worker/
```
### GitOps Workflow (CI/CD)
1. **Push** code to Gitea (`git.techarvest.co.zw`).
2. **CI**: Gitea Actions build and push Docker images to Docker Hub.
3. **CD**: ArgoCD detects manifest changes or image updates and reconciles the cluster state.
### Kubernetes
### Kubernetes Deployment
```bash
# Apply manifests in order
kubectl apply -f k8s/00-namespace.yaml
kubectl apply -f k8s/
# Manual apply (if not using ArgoCD auto-sync)
kubectl apply -k k8s/base/
```
## 📐 Development Conventions
### Critical Patterns (Non-Obvious)
- **AOI Format**: Always use `(lon, lat, radius_m)` tuple. Longitude comes first.
- **Season Window**: Sept 1st to May 31st (Zimbabwe Summer Season). `year=2022` implies 2022-09-01 to 2023-05-31.
- **Zimbabwe Bounds**: Lon 25.2 to 33.1, Lat -22.5 to -15.6.
- **Feature Order**: `FEATURE_ORDER_V1` (51 features) is immutable; changing it breaks existing model compatibility.
- **Redis Connection**: Use `redis.geocrop.svc.cluster.local` within the cluster.
- **Queue**: Always use the `geocrop_tasks` queue.
- **Kubernetes Only:** Focus exclusively on resources managed by Kubernetes (pods, services, ingresses, etc.). **NEVER** modify host-level Nginx configurations (`/etc/nginx/`), CloudPanel settings, or system services outside the cluster.
- **AOI Format:** Always use `(lon, lat, radius_m)` tuple. Longitude comes first.
- **Season Window:** Sept 1st to May 31st (Zimbabwe Summer Season). `year=2022` implies 2022-09-01 to 2023-05-31.
- **Feature Order:** `FEATURE_ORDER_V1` (51 features) is immutable; changing it breaks model compatibility.
- **Storage Contract:** Use `geocrop-results` for outputs and `geocrop-models` for serialized artifacts.
### Storage Layout (MinIO)
- `geocrop-models`: ML model `.pkl` files in the root directory.
- `geocrop-baselines`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results`: Output COGs (`results/<job_id>/...`).
- `geocrop-datasets`: Training CSV files.
- `geocrop-models/`: ML model `.pkl` files and MLflow artifacts.
- `geocrop-baselines/`: Dynamic World COGs (`dw/zim/summer/...`).
- `geocrop-results/`: Output COGs (`results/<job_id>/...`).
- `geocrop-datasets/`: Training CSVs and ground-truth labels.
## 📂 Key Files
- `apps/api/main.py`: REST API entry point and job dispatcher.
- `apps/worker/worker.py`: Core orchestration logic for the inference pipeline.
- `apps/worker/feature_computation.py`: Implementation of the 51 spectral features.
- `training/train.py`: Script for training and exporting ML models to MinIO.
- `CLAUDE.md`: Primary guide for Claude Code development patterns.
- `AGENTS.md`: Technical stack details and current cluster state.
- `apps/web/src/App.tsx`: Main React entry point with Portfolio/App view logic.
- `apps/worker/worker.py`: Core orchestration of the inference pipeline.
- `k8s/base/`: GitOps manifests for all services (ArgoCD tracking root).
- `k8s/argocd-app.yaml`: ArgoCD Application definition for GeoCrop.
- `.gitea/workflows/build-push.yaml`: CI pipeline for Docker builds.
## 🌐 Infrastructure
## 🌐 Infrastructure (Endpoints)
- **Frontend**: `portfolio.techarvest.co.zw`
- **API**: `api.portfolio.techarvest.co.zw`
- **Gitea**: `git.techarvest.co.zw`
- **ArgoCD**: `cd.techarvest.co.zw`
- **MLflow**: `ml.techarvest.co.zw`
- **Jupyter**: `lab.techarvest.co.zw`
- **Tiler**: `tiles.portfolio.techarvest.co.zw`
- **MinIO**: `minio.portfolio.techarvest.co.zw`
- **Frontend**: `portfolio.techarvest.co.zw`