OrcaS: Open Ready-to-Use Content Addressable Storage
OrcaS (Open Ready-to-Use Content Addressable Storage) is a lightweight, high-performance object storage system built with Content Addressable Storage (CAS) at its core. It provides enterprise-grade features like instant deduplication, multi-versioning, zero-knowledge encryption, and smart compression - all in a single binary that's ready to deploy.
- 🌐 Open: Open source (MIT license), transparent, community-driven development
- ✅ Ready-to-Use: Production-ready out of the box, with data integrity checks and automatic deduplication built in
- 🎯 Content Addressable Storage: Data is stored by content hash, enabling automatic deduplication and integrity verification
- ⚡ Instant Upload (Deduplication): Upload files in seconds, not minutes - identical files are detected instantly without uploading
- 🔒 Zero-Knowledge Encryption: Your data, your keys - end-to-end encryption with industry-standard algorithms
- 📦 Production Ready: S3-compatible API, VFS mount support, and comprehensive documentation
- 🚀 High Performance: Optimized for both small and large files with intelligent packaging and chunking
What it does: Upload identical files instantly without transferring data.
How it works:
- Calculates multiple checksums (XXH3, SHA-256) for each file
- Before uploading, checks if identical content already exists
- If found, creates a reference to existing data instead of uploading
- Result: Upload time drops from minutes to milliseconds for duplicate files
Use cases:
- Backup systems (same files across multiple backups)
- Version control systems (similar files across versions)
- Multi-user environments (shared files)
- CDN edge storage (cached content)
Benefits:
- 🚀 99%+ faster uploads for duplicate files
- 💾 Massive storage savings - store 1 copy, reference it N times
- ⚡ Bandwidth savings - no redundant data transfer
- 🔍 Automatic integrity verification - content hash ensures data correctness
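The check-before-upload flow described above can be sketched in a few lines of Go. This is a minimal illustration, not OrcaS code: `Store`, `NewStore`, and `Put` are hypothetical names, the store lives in memory, and only SHA-256 is used because XXH3 is not in the Go standard library.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Store is a hypothetical in-memory content-addressed store (not an OrcaS type).
type Store struct {
	blobs map[string][]byte // content hash -> bytes, kept once
	refs  map[string]int    // content hash -> reference count
}

func NewStore() *Store {
	return &Store{blobs: map[string][]byte{}, refs: map[string]int{}}
}

// Put records data under its SHA-256 hash. stored is false when identical
// content already exists, in which case only a reference is added.
func (s *Store) Put(data []byte) (hash string, stored bool) {
	hash = fmt.Sprintf("%x", sha256.Sum256(data))
	if _, exists := s.blobs[hash]; !exists {
		s.blobs[hash] = append([]byte(nil), data...)
		stored = true
	}
	s.refs[hash]++
	return hash, stored
}

func main() {
	s := NewStore()
	h1, first := s.Put([]byte("same bytes"))
	h2, second := s.Put([]byte("same bytes"))
	fmt.Println(h1 == h2, first, second, s.refs[h1]) // true true false 2
}
```

In a client/server setup the hash would presumably be computed on the client and sent first, so that a hit avoids the transfer entirely; the sketch collapses both sides into one call.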
What it does: Efficiently stores many small files together.
How it works:
- Groups small files (< 64 KB) into packages
- Reduces metadata overhead and I/O operations
- Maintains individual file access while optimizing storage
Benefits:
- 📈 10x+ performance improvement for small file operations
- 💰 Reduced storage costs - less metadata overhead
- ⚡ Faster operations - batch metadata writes
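One way to picture packaging is a single byte buffer plus an index of offsets. The sketch below assumes that layout; `Package`, `Entry`, `Add`, and `Get` are hypothetical names and the real on-disk format may differ. Only the 64 KB limit comes from the description above.

```go
package main

import "fmt"

const smallFileLimit = 64 * 1024 // threshold from the description above

// Entry records where one file lives inside the package buffer.
type Entry struct{ Offset, Length int }

// Package is a hypothetical container: many small files, one buffer, one index.
type Package struct {
	Data  []byte
	Index map[string]Entry
}

// Add appends a small file to the buffer and indexes it by name.
func (p *Package) Add(name string, content []byte) error {
	if len(content) >= smallFileLimit {
		return fmt.Errorf("%s is %d bytes, too large to package", name, len(content))
	}
	if p.Index == nil {
		p.Index = map[string]Entry{}
	}
	p.Index[name] = Entry{Offset: len(p.Data), Length: len(content)}
	p.Data = append(p.Data, content...)
	return nil
}

// Get returns one file's bytes without unpacking the others.
func (p *Package) Get(name string) ([]byte, bool) {
	e, ok := p.Index[name]
	if !ok {
		return nil, false
	}
	return p.Data[e.Offset : e.Offset+e.Length], true
}

func main() {
	var p Package
	for name, content := range map[string]string{"a.txt": "alpha", "b.txt": "beta"} {
		if err := p.Add(name, []byte(content)); err != nil {
			fmt.Println(err)
		}
	}
	got, _ := p.Get("b.txt")
	fmt.Printf("%s, %d files in %d bytes\n", got, len(p.Index), len(p.Data))
}
```

Writing the buffer and the index once per package, rather than once per file, is where the reduced metadata and I/O overhead would come from.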
What it does: Splits large files into manageable chunks.
How it works:
- Automatically chunks files larger than a configured threshold (default 10 MB)
- Each chunk stored independently with its own checksum
- Enables parallel upload/download and efficient updates
Benefits:
- 🔄 Parallel processing - upload/download chunks concurrently
- 🛡️ Resumable transfers - retry failed chunks independently
- ✏️ Efficient updates - only modified chunks need re-upload
- 📊 Better resource utilization - process large files efficiently
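The sketch below shows fixed-size chunking with a checksum per chunk, and how comparing checksums identifies the chunks that need re-upload. It is an illustration under assumptions: `Chunk`, `Split`, and `Changed` are hypothetical names, SHA-256 stands in for whatever checksum OrcaS uses per chunk, and the demo uses a 4-byte chunk size instead of the 10 MB default so the result is easy to follow.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Chunk is one independently stored piece of a large file.
type Chunk struct {
	Number   int
	Data     []byte
	Checksum string
}

// Split cuts data into chunks of at most chunkSize bytes, each with its own checksum.
func Split(data []byte, chunkSize int) []Chunk {
	if chunkSize <= 0 {
		return nil
	}
	var chunks []Chunk
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		part := data[start:end]
		chunks = append(chunks, Chunk{
			Number:   len(chunks),
			Data:     part,
			Checksum: fmt.Sprintf("%x", sha256.Sum256(part)),
		})
	}
	return chunks
}

// Changed lists the chunk numbers in next that are new or differ from prev.
func Changed(prev, next []Chunk) []int {
	var out []int
	for i, c := range next {
		if i >= len(prev) || prev[i].Checksum != c.Checksum {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	before := Split([]byte("aaaabbbbcc"), 4)
	after := Split([]byte("aaaabXbbcc"), 4)
	fmt.Println(len(after), Changed(before, after)) // 3 [1]
}
```

Because each chunk carries its own checksum, chunks can be uploaded or retried independently, which is what makes parallel and resumable transfers possible.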
What it does: Automatically maintains file version history.
How it works:
- Each file modification creates a new version
- Old versions preserved automatically
- Configurable retention policies
- Space-efficient through content deduplication
Benefits:
- 🔙 Point-in-time recovery - restore any previous version
- 🛡️ Data protection - accidental deletions are recoverable
- 📚 Audit trail - track all changes over time
- 💾 Space efficient - unchanged data shared across versions
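Versioning on top of content addressing can be pictured as a per-file list of content hashes. The following is a minimal sketch with hypothetical names (`VersionedStore`, `Write`, `Read`); retention policies and the real metadata schema are left out.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// VersionedStore is a hypothetical in-memory model of version history.
type VersionedStore struct {
	blobs    map[string][]byte   // content hash -> bytes, stored once
	versions map[string][]string // file name -> content hash per version, oldest first
}

func NewVersionedStore() *VersionedStore {
	return &VersionedStore{blobs: map[string][]byte{}, versions: map[string][]string{}}
}

// Write appends a new version of name and returns its 1-based version number.
func (s *VersionedStore) Write(name string, data []byte) int {
	hash := fmt.Sprintf("%x", sha256.Sum256(data))
	if _, exists := s.blobs[hash]; !exists {
		s.blobs[hash] = append([]byte(nil), data...)
	}
	s.versions[name] = append(s.versions[name], hash)
	return len(s.versions[name])
}

// Read returns the content of the given 1-based version, if it exists.
func (s *VersionedStore) Read(name string, version int) ([]byte, bool) {
	hashes := s.versions[name]
	if version < 1 || version > len(hashes) {
		return nil, false
	}
	return s.blobs[hashes[version-1]], true
}

func main() {
	s := NewVersionedStore()
	s.Write("report.txt", []byte("draft"))
	s.Write("report.txt", []byte("final"))
	s.Write("report.txt", []byte("draft")) // reverting reuses the first blob
	fmt.Println(len(s.versions["report.txt"]), len(s.blobs)) // 3 2
}
```

Three versions exist, but only two blobs are stored, because the first and third versions share the same content hash.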
What it does: End-to-end encryption where only you hold the keys.
How it works:
- AES-256 encryption (industry standard)
- Encryption keys never leave your control
- Optional per-bucket encryption keys
- Transparent encryption/decryption
Benefits:
- 🔒 Maximum security - even storage admins can't read your data
- ✅ Compliance ready - meets strict security requirements
- 🛡️ Data privacy - your data, your control
- 🌍 International standards - AES-256 encryption
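As an illustration of client-side AES-256 encryption, here is a minimal sketch using Go's standard library. The text above only names AES-256; the GCM mode, the nonce-prefixed layout, and the function names `Encrypt` and `Decrypt` are assumptions made for this example, not a description of OrcaS internals.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newGCM builds an AES-256-GCM cipher from a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, fmt.Errorf("AES-256 needs a 32-byte key, got %d", len(key))
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt seals plaintext and returns nonce || ciphertext.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt reverses Encrypt and fails if the data was modified.
func Decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, body := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, body, nil)
}

func main() {
	key := make([]byte, 32) // in practice the key stays with the data owner
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := Encrypt(key, []byte("secret"))
	if err != nil {
		panic(err)
	}
	plain, err := Decrypt(key, sealed)
	fmt.Println(string(plain), err)
}
```

The storage side only ever sees the sealed bytes; without the key it can neither read nor silently alter them, since GCM authentication makes decryption fail on any modification.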
What it does: Automatically compresses data to save space.
How it works:
- Configurable compression algorithms (zstd, gzip, etc.)
- Compression applied before encryption
- Automatic detection of already-compressed data
- Per-bucket compression settings
Benefits:
- 💾 Storage savings - typically 30-70% reduction
- ⚡ Bandwidth savings - less data to transfer
- 🎯 Smart defaults - works out of the box
- ⚙️ Configurable - adjust per your needs
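A simple way to avoid wasting effort on data that does not compress is to keep the compressed form only when it is actually smaller. The sketch below uses that rule as a stand-in for the automatic detection mentioned above (the real detection logic is not described here), and it uses gzip from the standard library because zstd requires a third-party package. `Compress` and `Decompress` are hypothetical names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// Compress gzips data and keeps the result only if it is smaller than the input.
// The returned flag says whether the output is compressed.
func Compress(data []byte) ([]byte, bool, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, false, fmt.Errorf("gzip write: %w", err)
	}
	if err := zw.Close(); err != nil {
		return nil, false, fmt.Errorf("gzip close: %w", err)
	}
	if buf.Len() >= len(data) {
		return data, false, nil // not worth it: store as-is
	}
	return buf.Bytes(), true, nil
}

// Decompress reverses Compress, using the flag it returned.
func Decompress(data []byte, compressed bool) ([]byte, error) {
	if !compressed {
		return data, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	repetitive := bytes.Repeat([]byte("orcas "), 1000)
	out, compressed, err := Compress(repetitive)
	if err != nil {
		panic(err)
	}
	fmt.Println(compressed, len(out) < len(repetitive))

	_, compressed, _ = Compress([]byte("hi")) // too small to benefit
	fmt.Println(compressed)
}
```

Compression has to run before encryption because well-encrypted data looks random and no longer compresses.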
OrcaS is built on Content Addressable Storage principles, where data is stored and retrieved by its content hash rather than location.
Key Benefits of CAS:
- Automatic Deduplication: Identical content stored once, referenced many times
- Integrity Verification: Content hash ensures data hasn't been corrupted
- Efficient Versioning: New versions only store changed content
- Simplified Backup: Same content = same hash = no re-upload needed
Storage Layout:
├── Metadata (SQLite)
│   ├── Objects (files, directories)
│   ├── DataInfo (content metadata)
│   ├── Versions (version history)
│   └── References (deduplication)
│
└── Data Blocks (File System)
    └── <bucket_id>/
        └── <hash_prefix>/
            └── <hash>/
                └── <dataID>_<chunk_number>
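To make the data block layout concrete, the sketch below assembles a chunk path from its parts. `ChunkPath` is a hypothetical helper, not an OrcaS function; the integer ID types, the root directory, and the prefix length (how many leading hash characters form `<hash_prefix>`) are all assumptions, since the layout above does not specify them.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ChunkPath builds <root>/<bucket_id>/<hash_prefix>/<hash>/<dataID>_<chunk_number>.
// prefixLen is an assumed parameter: the number of leading hash characters
// used as the prefix directory.
func ChunkPath(root string, bucketID int64, hash string, dataID int64, chunk, prefixLen int) string {
	if prefixLen < 0 {
		prefixLen = 0
	}
	if prefixLen > len(hash) {
		prefixLen = len(hash)
	}
	return filepath.Join(
		root,
		fmt.Sprint(bucketID),
		hash[:prefixLen],
		hash,
		fmt.Sprintf("%d_%d", dataID, chunk),
	)
}

func main() {
	// Example values are made up for illustration.
	fmt.Println(ChunkPath("/var/orcas/data", 7, "abcdef0123", 42, 0, 2))
}
```

Fanning out by a short hash prefix is a common way to keep any single directory from accumulating too many entries.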
- Instant Upload: 99%+ faster for duplicate files (milliseconds vs minutes)
- Small Files: 10x+ performance improvement with packaging
- Large Files: Parallel chunk processing for optimal throughput
- Storage Efficiency: 30-70% space savings with compression + deduplication
- Concurrent Operations: Optimized for high concurrency
Performance Test Reports:
OrcaS supports flexible path management, allowing you to use different storage paths within the same process. This is useful for multi-tenant scenarios or when managing multiple storage locations.
NewLocalHandler requires both basePath and dataPath parameters:
import (
    "github.com/orcastor/orcas/core"
)

// Create handler with custom paths
handler := core.NewLocalHandler("/custom/base/path", "/custom/data/path")
defer handler.Close()
// basePath: path for main database and bucket databases
// dataPath: path for data file storage

NewNoAuthHandler requires only the dataPath parameter. The basePath is automatically set to an empty string (no main database):
// Create NoAuthHandler (bypasses authentication)
handler := core.NewNoAuthHandler("/custom/data/path")
defer handler.Close()
// Only dataPath is needed, basePath is always empty for NoAuth mode

NewLocalAdmin requires both basePath and dataPath parameters:
// Create admin with custom paths
admin := core.NewLocalAdmin("/custom/base/path", "/custom/data/path")
// basePath: path for main database and bucket databases
// dataPath: path for data file storage

NewNoAuthAdmin requires only the dataPath parameter. The basePath is automatically set to an empty string (no main database):
// Create NoAuthAdmin (bypasses authentication and permission checks)
admin := core.NewNoAuthAdmin("/custom/data/path")
// Only dataPath is needed, basePath is always empty for NoAuth mode

// Example: Using current directory for both paths
handler := core.NewLocalHandler(".", ".")
admin := core.NewLocalAdmin(".", ".")
// Example: Separate paths for base and data
handler := core.NewLocalHandler("/var/orcas/base", "/var/orcas/data")
admin := core.NewLocalAdmin("/var/orcas/base", "/var/orcas/data")
// Example: NoAuth mode (no main database, only data path)
handler := core.NewNoAuthHandler("/var/orcas/data")
admin := core.NewNoAuthAdmin("/var/orcas/data")

- 🔄 Multi-tenant Support: Different contexts can use different storage paths
- 🎯 Flexible Configuration: Specify paths directly when creating handlers/admins
- ⚙️ NoAuth Mode: Simplified path management for NoAuth handlers/admins (only dataPath needed)
- 🚀 Process Isolation: Multiple storage locations in the same process
- Full Documentation
- VFS Mount Guide - Complete guide for VFS filesystem mounting
- S3 API Documentation
- No Main Database Mode Guide - Run without main database (no user management)
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details.
- 🎯 Production Ready: Battle-tested, actively maintained
- 🚀 High Performance: Optimized for real-world workloads
- 🔒 Security First: Zero-knowledge encryption built-in
- 💾 Storage Efficient: Automatic deduplication saves space and costs
- 🛠️ Easy to Use: S3-compatible API, VFS mount, comprehensive docs
- 🌟 Innovative: Content Addressable Storage with instant deduplication
- 📈 Actively Developed: Regular updates and improvements
- 🤝 Open Source: MIT licensed, community-driven
Star us if you find this project useful! ⭐



