Storage Contract
Overview
This document defines the storage layout, naming conventions, and metadata requirements for the GeoCrop project MinIO buckets.
Bucket Structure
| Bucket | Purpose | Example Path |
|---|---|---|
| geocrop-baselines | Dynamic World baseline COGs | dw/zim/summer/YYYY_YYYY/ |
| geocrop-datasets | Training datasets | datasets/{name}/{version}/ |
| geocrop-models | Trained ML models | models/{name}/{version}/ |
| geocrop-results | Inference output COGs | jobs/{job_id}/ |
1. geocrop-baselines
Path Structure
geocrop-baselines/
└── dw/
    └── zim/
        └── summer/
            ├── {season}/
            │   ├── agreement/
            │   │   └── DW_Zim_Agreement_{season}-{tileX}-{tileY}.tif
            │   ├── highest_conf/
            │   │   └── DW_Zim_HighestConf_{season}-{tileX}-{tileY}.tif
            │   └── mode/
            │       └── DW_Zim_Mode_{season}-{tileX}-{tileY}.tif
            └── manifests/
                └── dw_baseline_keys.txt
Naming Convention
- Season format: YYYY_YYYY (e.g., 2015_2016, 2025_2026)
- Tile format: {tileX}-{tileY} (e.g., 0000000000-0000000000)
- Composite types: Agreement, HighestConf, Mode
Example Object Keys
dw/zim/summer/2020_2021/highest_conf/DW_Zim_HighestConf_2020_2021-0000000000-0000000000.tif
dw/zim/summer/2020_2021/highest_conf/DW_Zim_HighestConf_2020_2021-0000000000-0000065536.tif
dw/zim/summer/2020_2021/highest_conf/DW_Zim_HighestConf_2020_2021-0000065536-0000000000.tif
dw/zim/summer/2020_2021/highest_conf/DW_Zim_HighestConf_2020_2021-0000065536-0000065536.tif
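The naming convention above can be captured in a small key builder. The following is a minimal sketch, not the project's actual code: the function name `baseline_key` is hypothetical, and it assumes that tile offsets are zero-padded to 10 digits and that a season always spans two consecutive years, as every example in this document does.

```python
import re

# Composite type as used in the file name -> directory name (from the path structure).
COMPOSITE_DIRS = {
    "Agreement": "agreement",
    "HighestConf": "highest_conf",
    "Mode": "mode",
}

SEASON_RE = re.compile(r"^(\d{4})_(\d{4})$")


def baseline_key(composite: str, season: str, tile_x: int, tile_y: int) -> str:
    """Build the object key of one baseline tile inside geocrop-baselines."""
    if composite not in COMPOSITE_DIRS:
        raise ValueError(f"unknown composite type: {composite!r}")
    match = SEASON_RE.match(season)
    # Assumption: the second year is always the first year plus one.
    if match is None or int(match.group(2)) != int(match.group(1)) + 1:
        raise ValueError(f"season must look like 2020_2021, got {season!r}")
    if tile_x < 0 or tile_y < 0:
        raise ValueError("tile offsets must be non-negative")
    # Assumption: offsets are zero-padded to 10 digits, as in the example keys.
    tile = f"{tile_x:010d}-{tile_y:010d}"
    directory = COMPOSITE_DIRS[composite]
    return f"dw/zim/summer/{season}/{directory}/DW_Zim_{composite}_{season}-{tile}.tif"
```

Calling `baseline_key("HighestConf", "2020_2021", 0, 65536)` reproduces the second example key listed above.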
2. geocrop-datasets
Path Structure
geocrop-datasets/
└── datasets/
    └── {dataset_name}/
        └── {version}/
            ├── data/
            │   └── *.csv
            └── metadata.json
Naming Convention
- Dataset name: Lowercase, alphanumeric with hyphens (e.g., zimbabwe-full, augmented-v2)
- Version: Semantic versioning (e.g., v1, v2.0, v2.1.0)
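One way to enforce this convention before uploading is to validate the name and version and derive the prefix from them. This is a sketch under stated assumptions: `dataset_prefix` and both regular expressions are illustrative, and the version pattern (a `v` followed by one to three dot-separated numbers) is inferred only from the examples above.

```python
import re

# Lowercase letters and digits, in groups separated by single hyphens.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Assumption: "v" plus one to three numeric parts, e.g. v1, v2.0, v2.1.0.
VERSION_RE = re.compile(r"v\d+(?:\.\d+){0,2}")


def dataset_prefix(name: str, version: str) -> str:
    """Return the key prefix for one dataset version, or raise ValueError."""
    if NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid dataset name: {name!r}")
    if VERSION_RE.fullmatch(version) is None:
        raise ValueError(f"invalid version: {version!r}")
    return f"datasets/{name}/{version}/"
```

For example, `dataset_prefix("zimbabwe-full", "v1")` yields `datasets/zimbabwe-full/v1/`, while a name such as `Zimbabwe_Full` is rejected.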
Required Metadata File (metadata.json)
{
  "version": "v1",
  "created": "2026-02-27",
  "description": "Augmented training dataset for GeoCrop crop classification",
  "source": "Manual labeling from high-resolution imagery + augmentation",
  "classes": ["cropland", "grass", "shrubland", "forest", "water", "builtup", "bare"],
  "features": ["ndvi_peak", "evi_peak", "savi_peak"],
  "total_samples": 25000,
  "spatial_extent": "Zimbabwe",
  "batches": 23
}
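A dataset's metadata.json can be checked against this shape before upload. The sketch below assumes that every field shown in the example is required and that `created` is an ISO date; the contract does not state either explicitly, and `check_dataset_metadata` is a hypothetical helper.

```python
from datetime import date

# Assumption: all fields from the example are mandatory, with these JSON types.
REQUIRED_FIELDS = {
    "version": str,
    "created": str,
    "description": str,
    "source": str,
    "classes": list,
    "features": list,
    "total_samples": int,
    "spatial_extent": str,
    "batches": int,
}


def check_dataset_metadata(meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in meta:
            problems.append(f"missing field: {field}")
        elif not isinstance(meta[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    created = meta.get("created")
    if isinstance(created, str):
        try:
            date.fromisoformat(created)
        except ValueError:
            problems.append("created: not an ISO date")
    return problems
```

Returning a list of problems rather than raising on the first one lets an upload script report everything wrong with a file in one pass.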
3. geocrop-models
Path Structure
geocrop-models/
└── models/
    └── {model_name}/
        └── {version}/
            ├── model.joblib
            ├── label_encoder.joblib
            ├── scaler.joblib (optional)
            ├── selected_features.json
            └── metadata.json
Naming Convention
- Model name: Lowercase, alphanumeric with hyphens (e.g., xgboost-crop, ensemble-v1)
- Version: Semantic versioning
Required Metadata File
{
  "name": "xgboost-crop",
  "version": "v1",
  "created": "2026-02-27",
  "model_type": "XGBoost",
  "features": ["ndvi_peak", "evi_peak", "savi_peak"],
  "classes": ["cropland", "grass", "shrubland", "forest", "water", "builtup", "bare"],
  "training_samples": 20000,
  "accuracy": 0.92,
  "scaler": "StandardScaler"
}
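Given a listing of object keys from geocrop-models, a model version can be checked for completeness against the path structure and metadata shown in this section. This is a minimal sketch: the helper names are hypothetical, the key list is assumed to come from whatever listing call the worker already uses, and the rule that `name` and `version` in metadata.json must match the path is an assumption rather than something the contract states.

```python
# Files every model version must contain; scaler.joblib is optional per the layout.
REQUIRED_MODEL_FILES = (
    "model.joblib",
    "label_encoder.joblib",
    "selected_features.json",
    "metadata.json",
)


def missing_model_files(object_keys, name: str, version: str) -> list[str]:
    """Return the required files absent from models/{name}/{version}/."""
    prefix = f"models/{name}/{version}/"
    present = {key[len(prefix):] for key in object_keys if key.startswith(prefix)}
    return [filename for filename in REQUIRED_MODEL_FILES if filename not in present]


def metadata_matches_path(meta: dict, name: str, version: str) -> bool:
    """Assumed rule: metadata.json repeats the name and version used in the key."""
    return meta.get("name") == name and meta.get("version") == version
```

An empty result from `missing_model_files` means the version directory holds every mandatory artifact.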
4. geocrop-results
Path Structure
geocrop-results/
└── jobs/
    └── {job_id}/
        ├── output.tif
        ├── metadata.json
        └── thumbnail.png (optional)
Naming Convention
- Job ID: UUID format (e.g., a1b2c3d4-e5f6-7890-abcd-ef1234567890)
Required Metadata File
{
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "created": "2026-02-27T10:30:00Z",
  "status": "completed",
  "aoi": {
    "lon": 29.0,
    "lat": -19.0,
    "radius_m": 5000
  },
  "season": "2024_2025",
  "model": {
    "name": "xgboost-crop",
    "version": "v1"
  },
  "output": {
    "format": "COG",
    "bounds": [25.0, -22.0, 33.0, -15.0],
    "resolution": 10,
    "classes": ["cropland", "grass", "shrubland", "forest", "water", "builtup", "bare"]
  }
}
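The object keys for a job follow directly from its ID. The sketch below uses Python's standard `uuid` module to validate the ID; the function name `job_keys` is hypothetical, and requiring the canonical lowercase hyphenated form is an assumption based on the example ID.

```python
import uuid


def job_keys(job_id: str) -> dict[str, str]:
    """Return the object keys for one inference job in geocrop-results."""
    parsed = uuid.UUID(job_id)  # raises ValueError for malformed IDs
    # uuid.UUID also accepts braces, a urn: prefix, and missing hyphens.
    # Assumption: only the canonical lowercase form is allowed in keys.
    if str(parsed) != job_id:
        raise ValueError(f"job_id is not in canonical UUID form: {job_id!r}")
    prefix = f"jobs/{job_id}/"
    return {
        "output": prefix + "output.tif",
        "metadata": prefix + "metadata.json",
        "thumbnail": prefix + "thumbnail.png",  # optional per the layout
    }
```

Validating the ID before building keys keeps arbitrary strings, such as ones containing path separators, out of the jobs/ prefix.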
Metadata Requirements Summary
| Resource | Required Metadata Files |
|---|---|
| Baselines | manifests/dw_baseline_keys.txt (optional) |
| Datasets | metadata.json |
| Models | metadata.json + model files |
| Results | metadata.json |
Access Patterns
Worker Access (Internal)
- Read from: geocrop-baselines/
- Read from: geocrop-models/
- Write to: geocrop-results/
API Access
- Read from: geocrop-results/
- Generate signed URLs for downloads
Frontend Access
- Request signed URLs from API for downloads
- Never access MinIO directly
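The access rules above can be written down as a small lookup table, which is convenient for unit tests or for generating bucket policies. This is an illustrative sketch only: the role names and the `is_allowed` helper are hypothetical, and real enforcement would live in MinIO policies and credentials rather than in application code.

```python
# Role -> bucket -> allowed actions, transcribed from the access patterns above.
ACCESS = {
    "worker": {
        "geocrop-baselines": {"read"},
        "geocrop-models": {"read"},
        "geocrop-results": {"write"},
    },
    "api": {
        "geocrop-results": {"read"},
    },
    # The frontend never talks to MinIO directly; it only uses signed URLs.
    "frontend": {},
}


def is_allowed(role: str, bucket: str, action: str) -> bool:
    """Return True if the role may perform the action on the bucket."""
    return action in ACCESS.get(role, {}).get(bucket, set())
```

Unknown roles and buckets fall through to an empty set, so the table denies by default.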
Date: 2026-02-28 Status: ✅ Structure Implemented
Implementation Status (2026-02-28)
✅ geocrop-baselines
- Structure: dw/zim/summer/{season}/ directories created for seasons 2015_2016 through 2025_2026
- Status: Partial - Agreement files exist but need reorganization to {season}/agreement/ subdirectory
- Files: 12 Agreement TIF files in dw/zim/summer/
- Needs: Reorganization script at ops/reorganize_storage.sh
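The reorganization amounts to deriving each file's target key from its name. The sketch below shows that mapping only; it is not the contents of ops/reorganize_storage.sh, the function name `target_key` is hypothetical, and it assumes the misplaced files already follow the DW_Zim_Agreement_{season}-{tileX}-{tileY}.tif naming pattern with 10-digit tile offsets.

```python
import re

ROOT = "dw/zim/summer/"
# Assumption: flat files are named exactly like the documented Agreement tiles.
FLAT_AGREEMENT_RE = re.compile(r"DW_Zim_Agreement_(\d{4}_\d{4})-\d{10}-\d{10}\.tif")


def target_key(source_key: str):
    """Return the key a flat Agreement file should move to, or None to skip it."""
    if not source_key.startswith(ROOT):
        return None
    filename = source_key[len(ROOT):]
    match = FLAT_AGREEMENT_RE.fullmatch(filename)
    if match is None:
        # Already inside a {season}/ directory, or not an Agreement tile.
        return None
    season = match.group(1)
    return f"{ROOT}{season}/agreement/{filename}"
```

Because keys that are already nested return `None`, applying the mapping to a full bucket listing a second time would plan no further moves.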
✅ geocrop-datasets
- Structure: datasets/zimbabwe-full/v1/data/ + metadata.json
- Status: Partial - CSV files exist at root level
- Files: 30 CSV batch files in root
- Metadata: ✅ metadata.json uploaded
✅ geocrop-models
- Structure: models/xgboost-crop/v1/ with metadata
- Status: Partial - .pkl files exist at root level
- Files: 9 model files in root
- Metadata: ✅ metadata.json + selected_features.json uploaded
✅ geocrop-results
- Structure: jobs/ directory created
- Status: Empty (ready for inference outputs)