I am researching a build I would like to do from a lot of spare server parts I've hoarded. What I would like to create is a Sandstorm server with the Sandstorm business features, specifically the enterprise features that are still not released. I do not need LDAP or the other "Sandstorm work" features, so a basic Sandstorm install will do, but I would like automatic scale-out, automatic failover, and as much data redundancy and resiliency as possible.
The goal is to provide a Sandstorm service that is nearly equivalent to Sandstorm Oasis, but with massively more storage, for myself, family, and some friends, and to completely replace Google services with as little maintenance as possible. I alone require orders of magnitude more storage than Oasis offers even in its largest subscription, and at what they would charge I might as well run my own.
I am pretty sure I understand everything I need except for three things: the real-time data replication, which requires a distributed filesystem; how a distributed filesystem in userspace functions and what its limitations are; and how to put all the pieces together. I need to be rock-solid sure before I get started because of time limitations.
- FOSS (GPL or compatible preferred)
- As low maintenance as possible
- High Availability
- Load Balancing
- Must run on Linux (cannot use BSD)
- Data Replication
- Data Redundancy
- Data Resiliency
- Real-time synchronization
- Bit Rot detection
- Commodity Hardware
Would be nice:
- Filesystem level encryption
- Automatic bit rot error correction
- Not having to deal with LVM, RAID, and dm-crypt separately
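To make the bit rot detection requirement concrete, here is a minimal sketch of the kind of check I mean. It is not something Sandstorm or GlusterFS provides; the function names (`file_digest`, `build_manifest`, `find_rot`) are hypothetical, and it assumes the files are not supposed to change between scans, since a checksum alone cannot tell a legitimate edit from corruption.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_rot(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose digest no longer matches."""
    damaged = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != expected:
            damaged.append(rel)
    return damaged
```

Detection like this only says that a file changed; the "automatic error correction" wish would additionally need a known-good replica to restore from.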
System Scaling Requirements:
- Scale out
- Ability to scale storage and compute arbitrarily
- Ability to scale storage and compute with non-identical hardware
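The non-identical hardware point matters for capacity planning. As a rough sketch, and assuming a replicated layout in which every brick in a replica set holds a full copy (so each set can store only as much as its smallest brick), usable space could be estimated like this. The function name and the brick sizes are hypothetical and only illustrate the arithmetic.

```python
def usable_capacity_tb(replica_sets: list[list[float]]) -> float:
    """Sum, over replica sets, the size of the smallest brick in each set."""
    total = 0.0
    for bricks in replica_sets:
        if not bricks:
            raise ValueError("a replica set needs at least one brick")
        # Every brick in the set stores the same data, so the smallest one
        # caps what the whole set can hold.
        total += min(bricks)
    return total


# Hypothetical brick sizes in TB: two replica sets of three bricks each.
mismatched = [[4.0, 4.0, 2.0], [8.0, 6.0, 6.0]]
print(usable_capacity_tb(mismatched))  # 2.0 + 6.0 = 8.0
```

Under that assumption, 30 TB of raw mismatched disks yields only 8 TB usable, so grouping similarly sized disks into the same replica set wastes less space.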
High availability is all new territory for me, with the exception of a Bitcoin (and later other cryptocoin) mining farm I ran for a few years, but that had very different requirements. I have read loads of documentation in a very short time, and from that research I have come up with the following network architecture sketch:
To implement it with load balancing and failover, I have read that you must have real-time data synchronization, which can be provided by a distributed filesystem. I settled on GlusterFS for this, although there are others I have not researched thoroughly.
Sandstorm Cluster Choices
Sorry if these are obvious; again, this is mostly new to me:
- Before I get started, am I on the right track? Is this architecture going to work?
- Are there any best practices I should know about?
- Is it possible to simplify anywhere?
Distributed File System Specific
- If I mount the GlusterFS storage on the standard Debian OS, do I simply install Sandstorm on that mount point? Or do I need to install on Debian and configure Sandstorm to use GlusterFS for all of its storage? The mix of object storage and file storage on a distributed, userspace filesystem has me unsure about how this works. All the examples I've found deal with websites, and Sandstorm is very different, with a variety of very different web apps.
- Are there objectively (not as a matter of opinion) better filesystems I could use that work within my design constraints and requirements? Specifically, ones that provide every feature GlusterFS on Btrfs does, plus more that relate directly to the requirements I specified. For example, I know ZFS would currently be a better choice than Btrfs, but not on Linux, which Sandstorm requires, so it is not an option.
Note: I would have provided more links but I'm new here with a reputation limit. Also, I'm not allowed to add a Sandstorm tag.