One of the first issues I have to contend with when talking about Coraid storage and its use of the ATA-over-Ethernet (AoE) protocol to transfer data is the response “Ethernet? Oh, so it’s iSCSI then?”
No, it isn’t.
AoE was built from the ground up as an open source data transfer protocol, specifically concerned with finding the most efficient way to transmit raw disk I/O commands over raw Ethernet, and keeping the overhead as low as possible to maximize the throughput.
In many ways AoE is more akin to Fibre Channel (FC) than it is to iSCSI, in that it is a non-routable protocol designed for locally based storage rather than for sending data over the Internet. Like FC, AoE can be made to route over the Internet when it needs to, such as in site-to-site DR applications, but the non-routable nature of the protocol makes accidental exposure of data to unauthorized networks that much harder.
So in order to help differentiate the data transfer protocols upon which all your networked storage systems are based, this blog entry is here to help dispel some of the myths about AoE.
The only real similarity between AoE and iSCSI is that both use Ethernet as the transport medium. iSCSI runs over TCP/IP (Layer 4), whereas AoE runs directly on Ethernet at Layer 2, and from there things get very different.
Data delivery the iSCSI way
The diagram below shows how data is sent from a client to a disk device using the iSCSI protocol.
iSCSI is a connection-based protocol, as is FC, and therefore requires sequenced, serial delivery of the data packets over the network. Each 64K I/O transfer is wrapped in an iSCSI header with a CRC appended and broken up into segments for transmission. Each segment is then placed in a TCP/IP wrapper with another CRC, and the resulting Ethernet frames are sent sequentially down a single connection.
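To put rough numbers on the layering described above, here is a minimal sketch that counts the TCP segments needed for one 64K I/O. It assumes the whole I/O travels as a single iSCSI PDU (as in the description above), a 48-byte iSCSI basic header, a 4-byte data digest, IPv4 and TCP headers without options, and a standard 1500-byte MTU. The function name and constants are illustrative only, and real initiators may split the I/O into several PDUs.

```python
import math

ISCSI_HEADER = 48   # iSCSI basic header segment, bytes
ISCSI_DIGEST = 4    # CRC32C data digest, bytes (assumed enabled)
IP_HEADER = 20      # IPv4 header without options (assumption)
TCP_HEADER = 20     # TCP header without options (assumption)


def iscsi_tcp_segments(io_bytes, mtu=1500):
    """Return (segment_count, total_ip_bytes) for one I/O sent as one PDU."""
    pdu = ISCSI_HEADER + io_bytes + ISCSI_DIGEST
    mss = mtu - IP_HEADER - TCP_HEADER          # payload bytes per TCP segment
    segments = math.ceil(pdu / mss)
    total = pdu + segments * (IP_HEADER + TCP_HEADER)
    return segments, total


segments, total = iscsi_tcp_segments(64 * 1024)
print(segments, total)  # 45 67388
```

Under these assumptions a single 64K I/O becomes 45 segments, all of which must arrive and be reassembled in order before the write can be handed to the disk.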
It must be stressed here that no matter how many iSCSI ports you have on your servers and storage, only one of them is used at any one time to transmit data, due to the connection requirement. The other ports can be utilized in a round-robin fashion to increase throughput, but this basic single-channel transfer method is key to how iSCSI works: iSCSI cannot distribute I/O between multiple ports.
This can be seen clearly in a VMware environment, where looking at the paths to the storage in the hypervisor client shows one active path and all others in standby. On failure of the active iSCSI path (and a look in the VMkernel log shows you do get quite a few!), one of the standby paths is made active.
So iSCSI does a lot of work to get data from initiator to target, and once at the target everything needs to be reassembled in the correct order and checked for integrity. After this the data is still streamed to disk in 512-byte disk sectors (most likely for a 64K I/O). Disk I/O is held off until the entire I/O is reassembled, and TCP imposes significant latency if data is dropped in the network.
FC faces the same connectivity problems and has a difficult time using multiple network paths, although this is made up for by the transmission medium and very expensive switches and controllers. AoE just requires commodity layer 2 switches with the ability to provide jumbo frame support and a decent throughput capability.
Data delivery the AoE way
The diagram below shows how data is sent from a client to a disk device using the AoE protocol.
As you can see, this is significantly different from the iSCSI diagram above. Here the 64K I/O is simply split into 8K blocks, and each of these blocks is placed into an Ethernet frame with an AoE header and a CRC appended. This is why it is important for jumbo frames to be enabled on AoE networks, as it allows each 8K block to occupy a single Ethernet frame for maximum efficiency.
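The effect of jumbo frames on this splitting can be sketched as follows. This is an illustration rather than the Coraid driver's logic: the header sizes (10 bytes of AoE header plus 12 bytes of ATA command header inside the Ethernet payload) are assumptions, as are the function and parameter names.

```python
SECTOR = 512
AOE_HEADER = 10   # assumed AoE common header size, bytes
ATA_HEADER = 12   # assumed AoE ATA command header size, bytes


def aoe_frames(io_bytes, mtu=9000, block_bytes=8192):
    """Split an I/O into sector-aligned payloads, one per Ethernet frame.

    Returns a list of (starting_sector, payload_bytes) tuples.
    """
    room = mtu - AOE_HEADER - ATA_HEADER
    per_frame = min(block_bytes, (room // SECTOR) * SECTOR)
    if per_frame <= 0:
        raise ValueError("MTU too small to carry a single sector")
    frames = []
    offset = 0
    while offset < io_bytes:
        size = min(per_frame, io_bytes - offset)
        frames.append((offset // SECTOR, size))
        offset += size
    return frames


print(len(aoe_frames(64 * 1024)))            # jumbo frames: 8
print(len(aoe_frames(64 * 1024, mtu=1500)))  # standard MTU: 64
```

With a 9000-byte MTU each 8K block fits in one frame, so a 64K I/O needs 8 frames. With a standard 1500-byte MTU only two sectors fit per frame under these assumptions, and the same I/O needs 64.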
AoE then sends these disk I/O datagrams in parallel over the network, utilizing all available ports automatically. AoE does not require sessions or sequence numbers, and each AoE frame is an idempotent 8K disk I/O. Frames can arrive out of order, or not at all – if a frame (or returning ack) is lost, the initiator will resend within microseconds (vs. the standard 200 ms TCP timeout).
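Why idempotent frames make ordering and duplicates harmless can be shown with a small simulation. This is a toy model, not the AoE wire protocol: the loss rate, the shuffle standing in for network reordering, and all names are assumptions made for illustration. Each block carries its own disk offset, so the target can write it whenever it arrives, and writing it twice changes nothing.

```python
import random


def send_io(data, disk, block=8192, loss=0.2, seed=7):
    """Deliver `data` to `disk` (a bytearray) over a lossy, reordering link.

    Returns the total number of frames transmitted, including resends.
    """
    rng = random.Random(seed)
    # Outstanding requests, keyed by byte offset on the disk.
    outstanding = {off: data[off:off + block]
                   for off in range(0, len(data), block)}
    sent = 0
    while outstanding:
        batch = list(outstanding.items())
        rng.shuffle(batch)                  # frames may arrive in any order
        for off, payload in batch:
            sent += 1
            if rng.random() < loss:         # request lost: resend next round
                continue
            disk[off:off + len(payload)] = payload  # same write every time
            if rng.random() < loss:         # ack lost: initiator resends,
                continue                    # target simply rewrites the block
            del outstanding[off]
    return sent


data = bytes(range(256)) * 256              # 64K of test data
disk = bytearray(len(data))
frames_sent = send_io(data, disk)
print(bytes(disk) == data, frames_sent >= 8)  # True True
```

No sequence numbers or reassembly buffer are needed: the disk ends up correct regardless of arrival order, and a lost frame costs only one resend of that block.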
This makes AoE much more efficient than iSCSI (parallel vs. serial I/O), and that is before TCP overhead is taken into account. In practice, throughput for AoE is 2x to 4x that of equivalent iSCSI (4 initiator iSCSI connections compared to 4 AoE ports), allowing peak throughput approaching 2 Gbyte/s for a single Coraid shelf of 15K disks with 2x 10Gbit AoE CX4 ports. This efficiency, along with the low latency design of the Coraid appliances, also allows the maximum amount of disk IOPS to be delivered from the storage network.
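As a sanity check on the 2 Gbyte/s figure, the raw line rate of the two ports can be worked out directly. This is simple arithmetic, ignoring framing overhead, and the function name is illustrative.

```python
def line_rate_gbytes(ports, gbit_per_port):
    """Aggregate raw line rate in Gbyte/s for a number of identical ports."""
    return ports * gbit_per_port / 8


raw = line_rate_gbytes(2, 10)
print(raw)        # 2.5
print(2 / raw)    # 0.8
```

Two 10Gbit ports give 2.5 Gbyte/s of raw bandwidth, so a peak approaching 2 Gbyte/s corresponds to roughly 80% of line rate across both ports.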
It is very important that the parallel nature of transmission is highlighted as this means that simply adding more ports to an AoE network can dramatically increase throughput; and because the use of all ports is automatic, with no MPIO configuration required AT ALL, this increase in throughput becomes a simple plug and play operation that can be performed in minutes.
In a previous blog entry we highlighted some of the many misconceptions that AoE suffers from, being an unknown protocol in a world dominated by FC and iSCSI vendors. The answers there should be read alongside the information given in this blog entry, a summary of which appears below:
Summary of differences discussed between FC/iSCSI and AoE protocols
IO TCP reassembly
Datagram Disk IO
Plug and Play
iSCSI: TCP with Retransmit
FC: Link Flow Control, IO timeout
Out of Order Data
iSCSI: TCP block delivery
FC: Prevented via Link Level Flow Control
In order arrival not mandated
Fibre Channel is up to 2-4x faster performance than iSCSI
AoE is 30% more efficient than Fibre Channel
Source: Coraid Inc
When approaching Coraid opportunities I do try to impress that AoE is an alternative to Fibre Channel, and not another flavour of iSCSI. It is in the FC market that I see the most benefit to be had, especially when comparing to Fibre Channel over Ethernet (FCoE), which is at best an interim fix to allow organisations with heavy investment in FC hardware to migrate to Ethernet without wasting that investment. Once a full hardware refresh becomes viable in an organisation, I expect full Ethernet solutions to really have their day, and AoE is far better placed than iSCSI to take over where FC leaves off.
In conclusion, I hope it can now be seen in what ways AoE is definitely NOT iSCSI, and why, being based on Ethernet, it is a much better choice for connecting your storage networks efficiently and relatively cheaply.
For more details or a discussion on what AoE can do for your Enterprise Storage drop us a line at [email protected] or visit our website via the link in the blog profile opposite.