# Storage Security Notes

## Overview

All MinIO buckets in the geocrop project are configured as **private** with no public access. Downloads require authenticated access through signed URLs generated by the API.

## Why MinIO Stays Private

### 1. Data Sensitivity

- **Baseline COGs**: Dynamic World data covering Zimbabwe contains land use information that should not be publicly exposed
- **Training Data**: Contains labeled geospatial data that may have privacy considerations
- **Model Artifacts**: Proprietary ML models should be protected
- **Inference Results**: User-generated outputs should only be accessible to the respective users

### 2. Security Best Practices

- **Least Privilege**: Only authenticated services and users can access storage
- **Defense in Depth**: Multiple layers of security (network policies, authentication, bucket policies)
- **Audit Trail**: All access can be logged through MinIO audit logs

## Access Model

### Internal Access (Within Kubernetes Cluster)

Services running inside the `geocrop` namespace can access MinIO using:

- **Endpoint**: `minio.geocrop.svc.cluster.local:9000`
- **Credentials**: Stored as Kubernetes secrets
- **Access**: Service account / node IAM

### External Access (Outside Kubernetes)

External clients (web frontend, API consumers) must use **signed URLs**:

```python
# Example: generate a signed URL in the API
import os
from datetime import timedelta

from minio import Minio

# Note: the signed URL embeds this host, so external clients
# need an endpoint they can actually reach.
client = Minio(
    "minio.geocrop.svc.cluster.local:9000",
    access_key=os.getenv("MINIO_ACCESS_KEY"),
    secret_key=os.getenv("MINIO_SECRET_KEY"),
)

# Generate presigned URL (valid for 1 hour); `expires` takes a timedelta
url = client.presigned_get_object(
    "geocrop-results",
    "jobs/job-123/result.tif",
    expires=timedelta(hours=1),
)
```

## Bucket Policies Applied

All buckets have anonymous access disabled:

```bash
mc anonymous set none geocrop-minio/geocrop-baselines
mc anonymous set none geocrop-minio/geocrop-datasets
mc anonymous set none geocrop-minio/geocrop-results
mc anonymous set none geocrop-minio/geocrop-models
```

## Future: Signed URL Workflow

1. **User requests download** via API (`GET /api/v1/results/{job_id}/download`)
2. **API validates** user has permission to access the job
3. **API generates** presigned URL with short expiration (15-60 minutes)
4. **User downloads** directly from MinIO via the signed URL
5. **URL expires** after the specified time

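Steps 2 and 3 of this workflow might look roughly like the sketch below. It is not the project's implementation: `issue_download_url`, `PermissionDenied`, the `job_owners` mapping, and the `jobs/{job_id}/result.tif` key layout are assumptions for illustration (the key layout mirrors the earlier presigned URL example). The signing call is passed in as a function so the permission logic can be exercised without a running MinIO.

```python
from datetime import timedelta
from typing import Callable

# Assumed bucket name, taken from the presigned URL example in this document.
RESULTS_BUCKET = "geocrop-results"
# Short expiry, at the low end of the 15-60 minute range.
URL_TTL = timedelta(minutes=15)

# A signing function: (bucket, object_key, ttl) -> signed URL.
Presign = Callable[[str, str, timedelta], str]


class PermissionDenied(Exception):
    """Raised when the user is not the recorded owner of the job."""


def issue_download_url(
    user_id: str,
    job_id: str,
    job_owners: dict[str, str],  # hypothetical job_id -> owner lookup
    presign: Presign,
) -> str:
    """Validate ownership first, then return a short-lived signed URL."""
    if job_owners.get(job_id) != user_id:
        raise PermissionDenied(f"user {user_id!r} may not access job {job_id!r}")
    # Hypothetical key layout for a job's result file.
    object_key = f"jobs/{job_id}/result.tif"
    return presign(RESULTS_BUCKET, object_key, URL_TTL)
```

In the API, `presign` could be a thin wrapper such as `lambda bucket, key, ttl: client.presigned_get_object(bucket, key, expires=ttl)`, where `client` is a configured `Minio` instance. No URL is signed unless the ownership check passes.
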
## Network Policies

For additional security, Kubernetes NetworkPolicies should be configured to restrict which pods can communicate with MinIO. Recommended:

- Allow only `geocrop-api` and `geocrop-worker` pods to access MinIO
- Deny all other pods by default

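One way to express the recommendation above is a single ingress NetworkPolicy that selects the MinIO pods. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. The pod labels (`app: minio`, `app: geocrop-api`, `app: geocrop-worker`) and the policy name are assumptions, so adjust them to the labels your deployments actually use. Once a pod is selected by a policy of type `Ingress`, any ingress traffic not explicitly allowed is dropped, provided the cluster's network plugin enforces NetworkPolicies.

```python
import json


def minio_network_policy(namespace: str = "geocrop") -> dict:
    """Build a NetworkPolicy admitting only API and worker pods to MinIO."""
    # Assumed `app` label values for the pods allowed to reach MinIO.
    allowed_apps = ["geocrop-api", "geocrop-worker"]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # Hypothetical policy name.
        "metadata": {"name": "minio-allow-api-worker", "namespace": namespace},
        "spec": {
            # Assumed label on the MinIO pods.
            "podSelector": {"matchLabels": {"app": "minio"}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {"podSelector": {"matchLabels": {"app": app}}}
                        for app in allowed_apps
                    ],
                    # MinIO API port from the internal endpoint above.
                    "ports": [{"protocol": "TCP", "port": 9000}],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(minio_network_policy(), indent=2))
```

Since `kubectl apply -f -` accepts JSON as well as YAML, the printed manifest can be piped straight into it.
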
## Verification

To verify bucket policies:

```bash
mc anonymous get geocrop-minio/geocrop-baselines
# Expected: the bucket is reported as private (no anonymous policy set)

mc anonymous list geocrop-minio/geocrop-baselines
# Expected: empty (no public access)
```

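The `mc` checks inspect the configured policy. As a complement, an unauthenticated HTTP request can confirm that anonymous reads are actually rejected. The helper below is a minimal sketch using only the standard library; `anonymous_access_denied` is a hypothetical name, and treating 401 or 403 as "denied" is an assumption about how the server reports a refused anonymous read.

```python
import urllib.error
import urllib.request


def anonymous_access_denied(url: str, timeout: float = 5.0) -> bool:
    """Return True if an unauthenticated GET of `url` is rejected with 401 or 403."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            # The request succeeded, so the object is publicly readable.
            return False
    except urllib.error.HTTPError as err:
        # Other status codes (for example 404) are not counted as a denial.
        return err.code in (401, 403)
```

For example, calling it with a path-style object URL such as `http://<minio-host>:9000/geocrop-baselines/<object-key>` (placeholders, not real values) should return `True` for a private bucket. Connection failures are raised as `urllib.error.URLError` rather than being reported as a denial.
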
## Recommendations for Production

1. **Enable MinIO Audit Logs**: Track all API access for compliance
2. **Use TLS**: Ensure all MinIO communication uses TLS 1.2+
3. **Rotate Credentials**: Regularly rotate MinIO root access keys
4. **Implement Bucket Quotas**: Prevent any single bucket from consuming all storage
5. **Enable Versioning**: For critical buckets to prevent accidental deletion

---

**Date**: 2026-02-28
**Status**: ✅ Documented