Brocade's big, fat datacenter fabric
- 10 March, 2008 08:14
At 230 pounds, the Brocade DCX Backbone would be on the lighter end of middle linebackers in the NFL, but it's well-built to fill the middle of a storage network. Unveiled in late January, the DCX is the first product to ship under Brocade's DCF (Data Center Fabric), the company's newly designed architecture. DCF promises a more flexible, easier-to-manage, policy-driven network that embraces multiple connectivity protocols, responds better to application demands, and supports new technologies such as server virtualization.
In Brocade's vision, the DCX Backbone is the cornerstone of that architecture, with specs that promise unprecedented performance. In fact, Brocade assures me that the DCX can sustain full 8Gbps transfer rates on all 896 Fibre Channel ports supported in its largest, dual-chassis configuration.
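Taking Brocade's figures at face value, the aggregate numbers are easy to work out. This is a back-of-the-envelope sketch that assumes the nominal 8Gbps per port and the roughly 800MB/s per direction that 8Gbps Fibre Channel is commonly rated at; it says nothing about how the DCX actually provisions its internal bandwidth.

```python
# Back-of-the-envelope aggregate capacity of a fully loaded dual-chassis DCX.
# Assumptions: 896 ports (Brocade's figure), 8Gbps nominal line rate per port,
# and ~800 MB/s usable per direction, the usual rating for 8Gbps Fibre Channel.
PORTS = 896
LINE_RATE_GBPS = 8
USABLE_MB_PER_S = 800


def aggregate_gbps(ports: int, rate_gbps: int) -> int:
    """Total nominal line rate across all ports, in Gbps."""
    return ports * rate_gbps


def aggregate_mb_per_s(ports: int, mb_per_s: int) -> int:
    """Total usable throughput in one direction, in MB/s."""
    return ports * mb_per_s


print(aggregate_gbps(PORTS, LINE_RATE_GBPS))       # 7168
print(aggregate_mb_per_s(PORTS, USABLE_MB_PER_S))  # 716800
```

That works out to roughly 7.2Tbps of nominal capacity, or about 717GB/s in each direction.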
In addition to FC, the DCX supports a broad range of other connectivity protocols, including FICON (Fibre Connection), FCIP (FC over IP), Gigabit Ethernet, and iSCSI. That versatility brings to mind the Silkworm Multiprotocol Router, Brocade's first product aimed at consolidating multiple SANs (see my review, "Building the uber-SAN").
In the belly of the beast
I recently had the chance to visit Brocade's labs to see what the DCX can do. My test configuration had plenty of ports to spare, but it's worth noting that the DCX has dedicated ISL (Inter-Switch Link) ports that don't take away from the ports available for, say, storage arrays or application servers.
As impressive as the raw specs of the DCX may be, the DCX's most innovative features are software functions that provide better control of bandwidth allocation, let you restrict access to specific ports according to security policies, and allow you to create independent domains to separately manage different areas of the fabric.
I started my evaluation with the bandwidth monitoring and control features. In a traditional fabric, each connection acts like a garden hose, a passive conduit with no ability to regulate the flow it carries. With DCX, Brocade's Adaptive Networking option lets you limit the I/O rate on selected ports, a feature Brocade calls Ingress Rate Limiting.
Here is how it works. In my test configuration Brocade had installed two DCX units: one linked to six HBAs on three hosts, the other linked to a storage array. To better show the traffic management capabilities of the DCX, each host HBA was assigned a dedicated LUN (logical unit number) and a dedicated storage port. The two DCX chassis were connected using two 4Gbps ISLs.
Using a simple Iometer script, I generated a significant volume of traffic on each host. To measure how that traffic spread across the fabric, I invoked Top Talkers, Brocade's performance monitoring tool. A new capability of Fabric OS 6.0, which was running on both DCX chassis, lets you define a Top Talkers profile either for specific ports or for the whole fabric.
As the name suggests, Top Talkers lists the source-destination pairs that are carrying the most traffic. It told me that I had four source-destination pairs that were exchanging more than 40MB of data per second, and a fifth that was flowing at a trickle.
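Conceptually, a Top Talkers report boils down to ranking flows by throughput. The sketch below is not Brocade's implementation; it simply sorts hypothetical source-destination pairs by measured MB/s and returns the top N. The pair labels are invented, and all rates except 43.2MBps (the fourth pair's rate before rate limiting, as reported in this test) are illustrative values chosen to match the "four busy pairs and a trickle" pattern.

```python
from typing import Dict, List, Tuple

# A flow is identified by its (source, destination) pair; labels are hypothetical.
Flow = Tuple[str, str]


def top_talkers(rates: Dict[Flow, float], n: int) -> List[Tuple[Flow, float]]:
    """Return the n flows carrying the most traffic, highest rate first."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]


# Illustrative samples in MB/s: four busy pairs and one trickling pair.
samples = {
    ("host1-hba1", "array-port1"): 48.0,
    ("host1-hba2", "array-port2"): 46.1,
    ("host2-hba1", "array-port3"): 44.9,
    ("host2-hba2", "array-port4"): 43.2,
    ("host3-hba1", "array-port5"): 0.8,
}

for (src, dst), rate in top_talkers(samples, 5):
    print(f"{src} -> {dst}: {rate:.1f} MB/s")

# Count the pairs above 40 MB/s, matching the "four busy pairs" observation.
busy = [flow for flow, rate in samples.items() if rate > 40]
print(len(busy))  # 4
```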
The next step was to limit the traffic flowing from one of those hosts, to free bandwidth for higher-priority streams. From the CLI of the host-facing DCX, I typed portcfgqos --setratelimit 3/2 200, setting a maximum data rate of 200Mbps on slot 3, port 2 of the DCX, where the HBA in question was connected.
Back on the storage-facing DCX, Top Talkers showed a much lower traffic rate on that pair (fourth in the list), making more bandwidth available to the other pairs. The first three pairs were now flowing at 51.1MBps, 45.6MBps, and 45.5MBps respectively, while the fourth pair, previously running at 43.2MBps, dropped to 14.5MBps.
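The numbers are easier to compare once the units line up: the limit is set in megabits per second, while Top Talkers reports megabytes per second. A minimal check, assuming a plain divide-by-8 conversion that ignores Fibre Channel encoding and framing overhead, so the converted figure is only a ceiling:

```python
# Convert the Mbps rate limit to MB/s and compare it with the measured rates.
# Simplification: 8 bits per byte, no encoding or framing overhead, so the
# converted value is an upper bound, not an expected throughput.
def mbps_to_mb_per_s(mbps: float) -> float:
    return mbps / 8


limit = mbps_to_mb_per_s(200)  # 25.0 MB/s ceiling for the limited port
before, after = 43.2, 14.5     # measured MB/s for the limited pair
freed = before - after         # bandwidth released to the other pairs

print(limit)            # 25.0
print(after <= limit)   # True
print(round(freed, 1))  # 28.7
```

The limited pair settled comfortably under the 25MB/s ceiling, and the roughly 28.7MB/s it gave up is what the other pairs picked up.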
Zone flow control
The rate limit, which can be set in 200Mbps increments, is an invaluable tool for preventing damaging bursts of data transfer. A typical real-world use would be to rein in bandwidth-intensive applications such as backups. Rate limits are easy to switch on when needed and to reset later with a similar command, returning those ports to their previous, unrestricted flow.
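Because limits come in 200Mbps steps, a script that toggles rate limits should validate the value before sending it to the switch. The hypothetical helper below only builds the command string, reusing the portcfgqos --setratelimit syntax from this test; it does not connect to a switch, and it checks no upper bound, since the maximum allowed value isn't covered here.

```python
# Hypothetical helper: validate a rate limit and build the Fabric OS command.
# Only the "portcfgqos --setratelimit <slot>/<port> <rate>" form used in this
# review is assumed; nothing here talks to a switch.
STEP_MBPS = 200  # limits are applied in 200Mbps increments


def set_rate_limit_cmd(slot: int, port: int, rate_mbps: int) -> str:
    """Return the CLI command, rejecting values that aren't 200Mbps multiples."""
    if rate_mbps <= 0 or rate_mbps % STEP_MBPS != 0:
        raise ValueError(f"rate must be a positive multiple of {STEP_MBPS} Mbps")
    return f"portcfgqos --setratelimit {slot}/{port} {rate_mbps}"


print(set_rate_limit_cmd(3, 2, 200))  # portcfgqos --setratelimit 3/2 200

try:
    set_rate_limit_cmd(3, 2, 250)
except ValueError as err:
    print(err)  # rate must be a positive multiple of 200 Mbps
```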
To prepare for the next test, I needed to reduce the bandwidth between the two DCX chassis so that the link would be easier to saturate. Therefore, I disabled one of the ISL ports and set the other one to 1Gbps.
Almost immediately, Brocade's Enterprise Fabric Connectivity Manager displayed the link between the two DCX chassis in bright red, indicating traffic congestion.
Sure enough, Top Talkers showed that the transfer rate had plunged to about 22MBps on each pair. Of course, no one in their right mind would choke an ISL like this in real life. But it does help show how you can use the DCX to assign a specific service level to each zone in the fabric.
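A quick sanity check shows why the rates collapsed. Assuming the roughly 100MB/s per direction that 1Gbps Fibre Channel is usually rated at, an even split among four busy pairs gives 25MB/s each and among five gives 20MB/s; the observed 22MBps sits between the two, consistent with the handful of pairs in this test sharing one saturated ISL.

```python
# Even split of a 1Gbps FC ISL among busy flows.
# Assumption: ~100 MB/s usable per direction, the usual rating for 1Gbps FC.
ISL_MB_PER_S = 100.0


def fair_share(capacity_mb_s: float, flows: int) -> float:
    """Per-flow rate if the link is split evenly among busy flows."""
    return capacity_mb_s / flows


for n in (4, 5):
    print(n, fair_share(ISL_MB_PER_S, n))  # 4 25.0, then 5 20.0
```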
Strangely enough, Brocade assigns those QoS levels through a zone naming convention: A zone whose name starts with the QOSH prefix gets a high service level, while one starting with QOSL gets a low service level. As you might guess, the QOSM prefix identifies a zone with a medium service level, which is also the default for zones that don't follow the naming convention. High, medium, and low reserve 60, 30, and 10 percent of available bandwidth, respectively, for their zones.
If you think this is an odd way of assigning a QoS level, you are not alone. I would have preferred setting QoS as a zone attribute, so that zone names wouldn't have to change. However, Brocade maintains that the zone name approach better meets customers' expectations because it's simple to understand and monitor. In fact, simple it is.
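The rule really is simple enough to express in a few lines. The sketch maps a zone name to its service level using the QOSH, QOSM, and QOSL prefixes and the 60/30/10 percent split Brocade describes, then applies that split to a 1Gbps (1,000Mbps) link. It models the rule as described in this review, not Fabric OS code; whether the prefix must be followed by a separator such as an underscore is an assumption left out of the sketch, and the zone names are hypothetical.

```python
# Illustrative model of the QoS zone-naming rule; not Fabric OS code.
# Zone names below are hypothetical.
RESERVED_PERCENT = {"high": 60, "medium": 30, "low": 10}
PREFIXES = {"QOSH": "high", "QOSM": "medium", "QOSL": "low"}


def qos_level(zone_name: str) -> str:
    """Return the service level implied by a zone's name prefix."""
    for prefix, level in PREFIXES.items():
        if zone_name.startswith(prefix):
            return level
    return "medium"  # zones without a QoS prefix default to medium


def reserved_mbps(level: str, link_mbps: float) -> float:
    """Bandwidth reserved for a service level on a link of the given speed."""
    return link_mbps * RESERVED_PERCENT[level] / 100


for zone in ("QOSH_oltp", "QOSL_backup", "QOSM_mail", "web_tier"):
    level = qos_level(zone)
    print(zone, level, reserved_mbps(level, 1000))
# QOSH_oltp high 600.0
# QOSL_backup low 100.0
# QOSM_mail medium 300.0
# web_tier medium 300.0
```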
To see the effect of different QoS levels on my bandwidth-constrained fabric, I created new zones following the naming convention and assigned hosts and storage devices to each zone.
Back on the DCX, where Top Talkers was already active, I saw the transfer rates of the two pairs with high QoS jump well above the others, while the pair in the medium range settled around 20MBps. The pair in the low QoS zone fell to 17MBps.
Whatever you think of the naming convention Brocade follows, its QoS mechanism is a very simple and efficient way to set your applications in the proper pecking order and make the best use of the bandwidth available, however limited or abundant it may be.
Naturally, a larger SAN installation, such as one that results from consolidating multiple fabrics with DCX, is more vulnerable than smaller environments to both trivial errors and security breaches. If you want to keep human errors to a minimum, or are concerned that someone could spoof a WWN (worldwide name) to connect a rogue device to the network, the DCX's Fabric OS offers a system of policies that adds a layer of protection.
For example, you can define policies to control the connection of storage targets, switches, and hosts, allowing access only when a device, identified by its WWN, is connected to a specific port.
This screen image shows the commands to define a DCC (Device Connection Control) policy for each of the two devices on ports 133 and 134 and to make those two policies active.
For a large installation, manually setting a policy for each port could be a long and tedious process. For initial deployments, however, a similar command can automatically create policies from the existing configuration, linking each active port to the WWN of the device connected to it.
When a DCC policy is active, trying to connect a device with a different WWN will trigger an error message and access to the port will be denied.
The DCX security policies are not foolproof. Obviously anyone with access to an admin account with proper credentials can modify them, but the system offers an easy-to-audit log of possible violations, which can simplify monitoring and enforcement of those policies.
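The logic of a DCC policy, as described here, can be modeled as a table that binds each port to one allowed WWN. This hypothetical sketch builds such a table from a snapshot of currently connected devices (the automatic approach suited to initial deployments), checks new logins against it, and records every rejection for auditing. Ports 133 and 134 echo the example in this review; the WWNs are made up, leaving unbound ports unchecked is a simplification, and none of this reflects how Fabric OS stores or enforces policies internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DccPolicy:
    """Hypothetical model: each bound port accepts only its bound WWN."""
    bindings: Dict[int, str]
    violations: List[Tuple[int, str]] = field(default_factory=list)

    @classmethod
    def from_snapshot(cls, connected: Dict[int, str]) -> "DccPolicy":
        # Lock every active port to the WWN currently logged in on it.
        return cls(bindings=dict(connected))

    def admit(self, port: int, wwn: str) -> bool:
        """Allow a login only if the WWN matches the port's binding."""
        allowed = self.bindings.get(port)
        if allowed is None:
            return True  # simplification: ports outside the policy are not checked
        if allowed.lower() == wwn.lower():
            return True
        self.violations.append((port, wwn))  # keep an audit trail of rejections
        return False


snapshot = {133: "10:00:00:05:1e:aa:bb:01", 134: "10:00:00:05:1e:aa:bb:02"}
policy = DccPolicy.from_snapshot(snapshot)
print(policy.admit(133, "10:00:00:05:1e:aa:bb:01"))  # True
print(policy.admit(134, "20:00:00:11:22:33:44:55"))  # False: WWN not bound to port 134
print(policy.violations)  # [(134, '20:00:00:11:22:33:44:55')]
```

Note that a port-to-WWN binding alone can't tell a genuine device from one spoofing the bound WWN on that same port; what it adds is that a rogue device must also land on the right port, and every mismatch leaves a record.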
The Virtual Fabric feature of Fabric OS is a perfect complement to device connection policies. Virtual Fabric is neither new nor mandatory, but it becomes nearly indispensable in the large, consolidated networks built on DCX.
In essence, with Virtual Fabric you can divide the physical fabric into isolated administrative domains (the analogy with Cisco's VSANs is too good to pass up), creating software borders that shelter each domain from its neighbors.
Another nice feature of Virtual Fabric is that the implementation of ADs (administrative domains) is granular and nondisruptive. For example, you can migrate all or part of an existing zone to a new AD and assign a device to multiple ADs.
In the large, consolidated environments that the DCX makes possible, the ability to assign separate administrative duties to each AD is also a great feature, because separation of duties makes independent domain updates easier and confines the impact of each change to that AD.
To test that claim, I copied the same zone to two ADs, then assigned a different admin account to each. To create some traffic, I started Iometer on one host connected to the first domain. Next, to simulate a conflicting configuration change, I logged in as admin for the second domain and removed that host port from the zone.
Finally, I switched back to the host on the first domain: Iometer was still running, unaffected by the change made on the parallel AD.
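The behavior I observed can be modeled simply: each administrative domain keeps its own copy of zone membership, so an edit in one domain never touches another. The sketch below is a hypothetical model of that isolation, not Brocade's data structure; the domain, zone, and port names are invented.

```python
import copy
from typing import Dict, Set


class AdminDomain:
    """Hypothetical model of an AD holding its own zone definitions."""

    def __init__(self, name: str, zones: Dict[str, Set[str]]):
        self.name = name
        # Deep copy so that domains never share member sets.
        self.zones = copy.deepcopy(zones)

    def remove_member(self, zone: str, member: str) -> None:
        self.zones[zone].discard(member)

    def members(self, zone: str) -> Set[str]:
        return set(self.zones[zone])


shared_zone = {"zone_db": {"host1-hba1", "array-port1"}}
ad1 = AdminDomain("AD1", shared_zone)
ad2 = AdminDomain("AD2", shared_zone)

# The AD2 admin removes the host port; AD1 still sees it.
ad2.remove_member("zone_db", "host1-hba1")
print(sorted(ad1.members("zone_db")))  # ['array-port1', 'host1-hba1']
print(sorted(ad2.members("zone_db")))  # ['array-port1']
```

The deep copy is what makes the isolation explicit in this model: if both domains referenced the same set, the change made in AD2 would leak into AD1.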
Sharing devices across separate administrative domains is more likely to happen with storage targets than with host ports. However, the type of device shared by domains is irrelevant, because the principle is the same: Changes made on one administrative domain have no impact outside of its borders.
The last action item of my DCX evaluation was less spectacular, consisting simply of connecting a switch to the DCX and proving that the devices connected to that switch were readily available and usable in normal zones.
Usually connecting a switch to a fabric would be a rather boring and straightforward exercise, but the switch I had in mind was a McData 6140 working in native mode.
As McData customers know all too well, connecting McData switches in native mode to a Brocade fabric was not possible before Brocade's acquisition of McData. After seeing it work in practice, I am glad to say that it now can be done, at least for the 6140.
For details on which legacy devices are supported by the DCX, a good reference is the Brocade connectivity matrix.
Obviously there are some limitations when connecting legacy devices. For example, the old interoperability mode is now obsolete, replaced by two new modes: McData native (the mode I tested) and McData open. Further, according to Brocade those two modes work only when connecting devices running Fabric OS 6 and EOS 9.6.
Even with those limits, customers with McData-branded equipment should be in better shape regarding connectivity support than they were before the acquisition. However, note that features such as Top Talkers, which query performance counters embedded in Brocade ASICs, are not available for legacy devices.
The most difficult part of reviewing a complex product such as the DCX Backbone is deciding where to stop. In addition to being a fabric backbone, the DCX is a powerful switch. My evaluation skipped some of the grinding tests you would typically put a switch through, in order to focus on the features that let you pull together a super-fabric of datacenter networks. So while the specs of the DCX are impressive for both performance and capacity, what I like most about the solution is the effort to bring more intelligence to the fabric.
As impressed as I was by my first experience with the DCX, it would be a mistake to think that future requirements will be satisfied by the cocktail of networking muscle and smart management tools that Brocade injected into this debut version. Nor is the shift in focus from storage fabric to datacenter network unique to Brocade. Cisco has been preaching the datacenter gospel perhaps even longer than Brocade, and its new datacenter switch, the Nexus 7000, announced around the same time as the DCX, shows that Cisco is not conceding anything to Brocade.
In addition to faster connectivity (for example, converging both native Ethernet and Fibre Channel traffic on 40 Gigabit Ethernet and 100 Gigabit Ethernet via FCoE, or Fibre Channel over Ethernet), the challenge ahead for these megahubs includes becoming good virtualization mates, offering more networking services such as encryption, and becoming easier to tame and control via policy management and ever more sophisticated rules. The rate limiting, the QoS capability, and the initial, rudimentary attempt at enforcing security from the fabric are a promising beginning, but I can't help thinking that those features are, and must be, only a first step, and that the best of the DCX intelligence is yet to come.