Since launching FlashNAS ZFS in January, there’s one question we get that sounds like an invitation to religious debate: “Isn’t your ZFS NAS clustered the same way as a cluster stack I can install on any commodity server?”
Most of you would probably expect any vendor in our position to reflexively respond, “of course not,” immediately followed by a tap-dance about open source bundles and heroic efforts to “hide complexity.” In our case you’d be right—but only about the first part.
Yes, FlashNAS ZFS is based on open-source Solaris and the ZFS file system. But to make our NAS elegantly handle controller failover, we chose a completely different path from the norm in ZFS-based storage.
Simplicity Begins With Design
There’s an old saying, usually referred to by its acronym: “Keep It Simple, Stupid (KISS)!” Applying that principle to product engineering, most software-based ZFS NAS vendors build their products on Intel servers running Linux (or OpenSolaris) and ZFS. To add High Availability, most bundle a cluster software product such as RSF-1 that’s designed to protect multiple file systems and applications in a wide variety of cluster configurations.
To flexibly meet all of those needs, such cluster software offers several options and settings. Lots of options means cluster admins have to make lots of decisions. Reliably hiding that complexity is no easy task, especially when customers are also cabling and configuring the hardware being managed. So it’s no surprise most vendors end up passing a lot of that complexity on to their customers.
At Winchester Systems, we took the opposite approach. Instead of bundling commodity hardware and off-the-shelf software, our engineers chose to tightly integrate a product using plug-in hardware modules, passive backplanes and purpose-built software. All of these components were designed specifically for our high-availability NAS product—and only that product.
Wait. Isn’t that overly ambitious? It sure sounds that way. But we had a very ambitious goal: to make a redundant-controller NAS product that’s as easy to install, operate and maintain as a single-controller system. In other words, we were fine making things much harder for engineers if it meant keeping them simple for customers.
Let’s compare the results.
A typical software-based ZFS cluster requires the following hardware and software:
- Two servers running an OS + ZFS file system stack
- One or more JBOD disk enclosures directly connected to each server via SAS cables
- Additional network and/or serial “heartbeat” interconnects between the two servers
- Installed clustering software
- DNS entries for each “service name” in the cluster
- Installed cluster-aware iSCSI-target software
Then an IT staffer has to configure the cluster. Common steps include (an example is demonstrated here):
1. Access the cluster-management page in the GUI (if one is provided; otherwise this gets more complicated).
2. “Initialize” the cluster, setting various parameters/options “if required to meet your needs.”
3. Add a volume to the ZFS cluster.
4. Select separate “heartbeat drives” (at least 2 typically recommended) for the volume.
5. Specify the failover “host” (i.e., storage controller) for the volume.
6. Go back to step 3, and repeat for each volume in the cluster.
7. Troubleshoot any configuration mistakes made.
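To get a feel for how that manual work grows with the number of volumes, here is a minimal Python sketch that simply enumerates the steps above. The function name `cluster_setup_steps` and the step wording are assumptions made for illustration; the code models the checklist only and calls no real cluster software.

```python
def cluster_setup_steps(volumes, heartbeat_drives=2):
    """List the manual steps described above for the given volume names.

    Hypothetical model for illustration only; it calls no real cluster API.
    """
    steps = [
        "Open the cluster-management page",
        "Initialize the cluster and set its options",
    ]
    for name in volumes:
        # Each volume repeats the same three decisions.
        steps.append(f"Add volume {name} to the cluster")
        steps.append(f"Select {heartbeat_drives} heartbeat drives for {name}")
        steps.append(f"Specify the failover host for {name}")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(cluster_setup_steps(["vol1", "vol2", "vol3"]), 1):
        print(f"{number}. {step}")
```

In this model, three volumes already produce 11 separate actions before any troubleshooting begins.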
Contrast this with FlashNAS ZFS, which deeply integrates an entire ZFS cluster within a single 2U or 3U hardware enclosure. No external cables. No software to install. No cluster management. In fact, you can’t tell there’s a cluster running at all.
Invisible Clustering? Really?
At this point, you’re probably wondering just how invisible this really can be. Well, let’s look at the major events all clustered systems must handle:
Automatic Failover. If a FlashNAS ZFS storage controller should stop working, failover is automatic. No operator involvement is required—except for replacing the failed hardware module, of course. Most software ZFS NAS clusters handle this failover case pretty well.
Automatic Fail-Back. Restoring storage services to their original state, also known as “fail-back” in clustering terminology, is completely automatic in FlashNAS ZFS. When a replacement module is inserted and powers up, FlashNAS ZFS puts it into service with no operator intervention. Many software-based ZFS clusters cannot do this at all. Instead, they require manually “failing” each individual service back to its original location.
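The difference between failover and fail-back can be sketched as a small state model. This is a toy illustration under stated assumptions, not the FlashNAS ZFS implementation: the class `DualControllerModel`, the controller labels "A" and "B", and the service names are all hypothetical.

```python
class DualControllerModel:
    """Toy model of two controllers sharing a set of storage services.

    Hypothetical illustration; not the FlashNAS ZFS implementation.
    """

    def __init__(self, home):
        # home maps each service name to its original controller, "A" or "B".
        self.home = dict(home)
        self.owner = dict(home)
        self.healthy = {"A": True, "B": True}

    def controller_failed(self, ctrl):
        """Automatic failover: the surviving controller takes over."""
        self.healthy[ctrl] = False
        survivor = "B" if ctrl == "A" else "A"
        if not self.healthy[survivor]:
            raise RuntimeError("no healthy controller left")
        for service, owner in list(self.owner.items()):
            if owner == ctrl:
                self.owner[service] = survivor

    def controller_replaced(self, ctrl):
        """Automatic fail-back: services return to their home controller."""
        self.healthy[ctrl] = True
        for service, home in self.home.items():
            if home == ctrl:
                self.owner[service] = ctrl


if __name__ == "__main__":
    model = DualControllerModel({"nfs": "A", "iscsi": "B"})
    model.controller_failed("A")
    print(model.owner)   # both services now run on B
    model.controller_replaced("A")
    print(model.owner)   # nfs has moved back to A without operator action
```

A cluster without automatic fail-back corresponds to a model in which `controller_replaced` only marks the controller healthy and leaves every service where it is until an operator moves it.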
Assured Data Integrity. Last, but not least, the ZFS file system ensures data integrity throughout failover and fail-back, checking the ZFS Intent Log (ZIL) while mounting a failed unit’s file systems and properly completing any in-flight I/O for disk writes that had been acknowledged to a client.
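The idea behind intent-log replay can be shown with a deliberately simplified Python model. It assumes a single log of key/value writes; real ZFS groups changes into transaction groups and uses its own on-disk log format, so the class `ToyPool` and its method names are hypothetical and exist only for this sketch.

```python
class ToyPool:
    """Toy model of an intent log; not ZFS's on-disk format or API."""

    def __init__(self):
        self.disk = {}        # data committed to the main pool
        self.intent_log = []  # acknowledged writes not yet committed
        self.pending = {}     # the same writes, held in controller memory

    def sync_write(self, key, value):
        # Log the write durably first; only then acknowledge the client.
        self.intent_log.append((key, value))
        self.pending[key] = value
        return "ack"

    def commit(self):
        # Normal operation: flush memory to disk, then discard the log.
        self.disk.update(self.pending)
        self.pending.clear()
        self.intent_log.clear()

    def controller_crash(self):
        # In-memory state is lost; the disk and the intent log survive.
        self.pending.clear()

    def takeover_mount(self):
        # The surviving controller replays the log in order while mounting.
        for key, value in self.intent_log:
            self.disk[key] = value
        self.intent_log.clear()


if __name__ == "__main__":
    pool = ToyPool()
    pool.sync_write("a", 1)
    pool.commit()
    pool.sync_write("b", 2)   # acknowledged, but not yet committed
    pool.controller_crash()
    pool.takeover_mount()
    print(pool.disk)          # the acknowledged write "b" is present
```

The point of the model is the ordering: because a write is logged before it is acknowledged, anything a client was told succeeded can be recovered by whichever controller mounts the pool next.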
It took significant engineering work, but we managed to make redundant-controller ZFS completely invisible. And replacing a FlashNAS ZFS controller module couldn’t be any simpler: Just slide it in. The rest is automatic.
The result: a ZFS-based NAS with redundant controllers that’s quite literally as easy to set up, configure and operate as a single-controller system. And that enables Winchester Systems to provide a low-cost network-attached storage system with the kind of deeply integrated data availability and integrity that’s been limited to high-end storage products for far too long. You might want to give it a closer look.
Does this kind of tightly-integrated approach make sense? Let us know what you think!