
# Storage Security Notes
## Overview
All MinIO buckets in the geocrop project are configured as **private** with no public access. Downloads require authenticated access through signed URLs generated by the API.
## Why MinIO Stays Private
### 1. Data Sensitivity
- **Baseline COGs**: Dynamic World data covering Zimbabwe contains land use information that should not be publicly exposed
- **Training Data**: Contains labeled geospatial data that may have privacy considerations
- **Model Artifacts**: Proprietary ML models should be protected
- **Inference Results**: User-generated outputs should only be accessible to the respective users
### 2. Security Best Practices
- **Least Privilege**: Only authenticated services and users can access storage
- **Defense in Depth**: Multiple layers of security (network policies, authentication, bucket policies)
- **Audit Trail**: Access can be logged through MinIO audit logging once it is enabled (see Recommendations for Production)
## Access Model
### Internal Access (Within Kubernetes Cluster)
Services running inside the `geocrop` namespace can access MinIO using:
- **Endpoint**: `minio.geocrop.svc.cluster.local:9000`
- **Credentials**: Stored as Kubernetes secrets
- **Access**: Service account / node IAM
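Here is a minimal sketch of loading these settings inside a pod. It assumes the Kubernetes Secret is injected as the `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` environment variables that the external-access example reads. The `MinioSettings` class and the optional `MINIO_ENDPOINT` override are hypothetical names used only for illustration. Failing fast on a missing key surfaces a mis-mounted Secret at startup, not on the first storage request. Keeping the secret out of `repr` also stops it from leaking into logs.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Default in-cluster endpoint from these notes.
DEFAULT_ENDPOINT = "minio.geocrop.svc.cluster.local:9000"


@dataclass(frozen=True)
class MinioSettings:
    """Hypothetical settings holder for MinIO connection details."""
    endpoint: str
    access_key: str
    # Excluded from repr() so the secret is not printed in logs.
    secret_key: str = field(repr=False)


def load_minio_settings(env: Optional[Mapping[str, str]] = None) -> MinioSettings:
    """Read MinIO settings from environment variables.

    Assumes the Secret is exposed as MINIO_ACCESS_KEY / MINIO_SECRET_KEY;
    MINIO_ENDPOINT is a hypothetical optional override.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("MINIO_ACCESS_KEY", "MINIO_SECRET_KEY") if not env.get(k)]
    if missing:
        # Fail at startup so a mis-mounted Secret is caught immediately.
        raise RuntimeError(f"missing MinIO credentials: {', '.join(missing)}")
    return MinioSettings(
        endpoint=env.get("MINIO_ENDPOINT", DEFAULT_ENDPOINT),
        access_key=env["MINIO_ACCESS_KEY"],
        secret_key=env["MINIO_SECRET_KEY"],
    )
```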
### External Access (Outside Kubernetes)
External clients (web frontend, API consumers) must use **signed URLs**:
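```python
# Example: generate a presigned download URL in the API
import os
from datetime import timedelta

from minio import Minio

# The hostname is part of the SigV4 signature, so a URL signed for this
# in-cluster name cannot be used from outside the cluster; URLs handed to
# external clients must be signed by a client configured with the
# externally reachable MinIO endpoint.
client = Minio(
    "minio.geocrop.svc.cluster.local:9000",
    access_key=os.getenv("MINIO_ACCESS_KEY"),
    secret_key=os.getenv("MINIO_SECRET_KEY"),
    # secure defaults to True; pass secure=False only if this endpoint serves plain HTTP
)

# Generate a presigned GET URL valid for 1 hour (expires takes a timedelta)
url = client.presigned_get_object(
    "geocrop-results",
    "jobs/job-123/result.tif",
    expires=timedelta(hours=1),
)
```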
## Bucket Policies Applied
All buckets have anonymous access disabled:
```bash
mc anonymous set none geocrop-minio/geocrop-baselines
mc anonymous set none geocrop-minio/geocrop-datasets
mc anonymous set none geocrop-minio/geocrop-results
mc anonymous set none geocrop-minio/geocrop-models
```
## Future: Signed URL Workflow
1. **User requests download** via API (`GET /api/v1/results/{job_id}/download`)
2. **API validates** user has permission to access the job
3. **API generates** presigned URL with short expiration (15-60 minutes)
4. **User downloads** directly from MinIO via the signed URL
5. **URL expires** after the specified time
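Steps 2 and 3 of this workflow can be sketched as one framework-agnostic function that the `GET /api/v1/results/{job_id}/download` handler would call. The sketch makes several assumptions. Jobs are looked up in a mapping of hypothetical `Job` records with an `owner_id`. `PermissionDenied` stands in for the API's real error handling. The signer is injected as a callable so it can be tested without a MinIO server. The object key follows the `jobs/{job_id}/result.tif` layout from the presigned URL example above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Mapping

RESULTS_BUCKET = "geocrop-results"
MIN_TTL = timedelta(minutes=15)
MAX_TTL = timedelta(minutes=60)


class PermissionDenied(Exception):
    """Hypothetical error raised when the caller may not access the job."""


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; only ownership matters here."""
    job_id: str
    owner_id: str


# (bucket, object_name, expires) -> URL. In production this could wrap
# client.presigned_get_object(bucket, object_name, expires=expires).
Presigner = Callable[[str, str, timedelta], str]


def download_url(job_id: str, user_id: str, jobs: Mapping[str, Job],
                 presign: Presigner, ttl: timedelta = MAX_TTL) -> str:
    """Validate access to a job, then return a short-lived signed URL."""
    job = jobs.get(job_id)
    # Step 2: unknown jobs and other users' jobs are rejected identically,
    # so the endpoint does not reveal which job IDs exist.
    if job is None or job.owner_id != user_id:
        raise PermissionDenied(job_id)
    # Step 3: clamp the URL lifetime to the 15-60 minute window.
    ttl = max(MIN_TTL, min(ttl, MAX_TTL))
    object_name = f"jobs/{job_id}/result.tif"
    return presign(RESULTS_BUCKET, object_name, ttl)
```

The permission check runs before anything is signed, so a rejected request never produces a usable link. Injecting the signer also keeps the storage client out of the authorization logic.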
## Network Policies
For additional security, Kubernetes NetworkPolicies should be configured to restrict which pods can communicate with MinIO. Recommended:
- Allow only `geocrop-api` and `geocrop-worker` pods to access MinIO
- Deny all other pods by default
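A minimal sketch of such a policy follows. It is built as a Python dict and emitted as JSON, which `kubectl apply -f -` accepts. The pod labels are assumptions: MinIO pods are taken to carry `app: minio`, and client pods `app: geocrop-api` or `app: geocrop-worker`. The policy name is hypothetical. Only the S3 API port 9000 is opened. Because the policy selects the MinIO pods and lists `Ingress` in `policyTypes`, any inbound connection that no rule allows is denied. That covers the "deny all other pods" recommendation for MinIO specifically.

```python
import json


def minio_ingress_policy(namespace: str = "geocrop") -> dict:
    """Build a NetworkPolicy admitting only API and worker pods to MinIO.

    Assumed labels: app=minio on MinIO pods, app=geocrop-api /
    app=geocrop-worker on clients. Adjust to the real manifests.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "minio-allow-api-worker", "namespace": namespace},
        "spec": {
            # Selecting MinIO pods with an Ingress policy isolates them:
            # only traffic matched by a rule below is allowed in.
            "podSelector": {"matchLabels": {"app": "minio"}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # A bare podSelector matches pods in the policy's own namespace.
                "from": [{
                    "podSelector": {"matchExpressions": [{
                        "key": "app",
                        "operator": "In",
                        "values": ["geocrop-api", "geocrop-worker"],
                    }]},
                }],
                "ports": [{"protocol": "TCP", "port": 9000}],
            }],
        },
    }


if __name__ == "__main__":
    # Pipe the output to: kubectl apply -f -
    print(json.dumps(minio_ingress_policy(), indent=2))
```

NetworkPolicies take effect only if the cluster's network plugin enforces them, so it is worth confirming that before relying on this isolation.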
## Verification
To verify bucket policies:
```bash
mc anonymous get geocrop-minio/geocrop-baselines
# Expected: access permission reported as private (no anonymous policy)
mc anonymous list geocrop-minio/geocrop-baselines
# Expected: empty (no public access)
```
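The same check can be automated by inspecting the bucket policy JSON for statements that grant access to anonymous callers. The sketch below uses only the standard library. The policy string could come from the minio package's `Minio.get_bucket_policy(bucket)`, which raises an `S3Error` when no policy is set (treat that as private). This is a deliberately simplified check: it flags `Allow` statements whose `Principal` is `"*"` or `{"AWS": "*"}`, and it ignores `Condition` and `NotPrincipal`.

```python
import json
from typing import Any, Dict, List


def _is_anonymous(principal: Any) -> bool:
    """True if a policy Principal covers anonymous callers ("*")."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS", [])
        aws = [aws] if isinstance(aws, str) else aws
        return "*" in aws
    return False


def public_statements(policy_json: str) -> List[Dict[str, Any]]:
    """Return Allow statements in a bucket policy that grant anonymous access."""
    if not policy_json.strip():
        return []  # no policy at all: the bucket is private
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear bare
        statements = [statements]
    return [s for s in statements
            if s.get("Effect") == "Allow" and _is_anonymous(s.get("Principal"))]
```

An empty result for every geocrop bucket matches the expectation that anonymous access is disabled.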
## Recommendations for Production
1. **Enable MinIO Audit Logs**: Track all API access for compliance
2. **Use TLS**: Ensure all MinIO communication uses TLS 1.2+
3. **Rotate Credentials**: Regularly rotate MinIO root access keys
4. **Implement Bucket Quotas**: Prevent any single bucket from consuming all storage
5. **Enable Versioning**: For critical buckets to prevent accidental deletion
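On the client side, recommendation 2 can be enforced with a standard-library SSL context that rejects anything older than TLS 1.2 and verifies the server certificate. This is a minimal sketch. The optional `cafile` argument assumes the MinIO certificate may be signed by an internal CA. Server-side TLS settings are configured in MinIO itself and are not covered here.

```python
import ssl
from typing import Optional


def tls12_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client SSL context that refuses protocol versions below TLS 1.2.

    cafile: optional path to the CA bundle that signed the MinIO
    certificate (assumed internal CA); None uses the system trust store.
    """
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

With the minio package, such a context should be usable by wrapping it in `urllib3.PoolManager(ssl_context=ctx)` and passing that as the client's `http_client` together with `secure=True`.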
---
**Date**: 2026-02-28
**Status**: ✅ Documented