By NVM Express
Thank you to everyone who joined us for our Q1 2020 webcast “NVMe-oF™ 1.1 Specification: Key Features and Use Cases Audiences.” During the webcast, we discussed how the NVMe-oF 1.1 specification is creating “data center storage disruption,” reviewed its latest features, and presented a couple of key use cases for NVMe/TCP and Asymmetric Namespace Access (ANA).
We received many thought-provoking inquiries from the audience during the webcast, but we were not able to answer all of them live. In this blog, we will answer those remaining questions.
NVMe-oF 1.1 Specification Overview
- For ANA and multipathing, why can’t we just assume both controllers are always identical?
Within a storage array, the internal connectivity means that some access paths to the data are fast and others are slower (this does not typically apply to M.2 cards in your laptop). There are multiple paths (typically 2 or 4) to the data for redundancy: if one path goes down, you can still get to your data. Because there are multiple paths, it is valuable for the host to know how their performance characteristics differ, so it can use the higher-performance path as much as possible and use the lower-performance path only when necessary.
Another example is migration. You have a smaller unit, you roll in a larger unit, and you connect them. The data migrates seamlessly, and then you disconnect the smaller one. You have transparently migrated from the small unit to the large one, even though they are not identical. During the migration, however, performance characteristics may change, and making the host aware of those changes improves performance.
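To make the path-preference idea concrete, here is a minimal Python sketch of how a host might choose among paths based on their ANA state. The five state names follow the ANA states defined in the NVMe specification; the `Path` class, the `select_path` function, the preference ranking, and the path names are hypothetical, and real host multipathing implementations apply their own, more elaborate policies.

```python
from dataclasses import dataclass
from enum import Enum


class AnaState(Enum):
    # The five ANA states defined by the NVMe specification.
    OPTIMIZED = "optimized"
    NON_OPTIMIZED = "non-optimized"
    INACCESSIBLE = "inaccessible"
    PERSISTENT_LOSS = "persistent-loss"
    CHANGE = "change"


@dataclass
class Path:
    # Hypothetical model of one host-to-controller path to a namespace.
    name: str
    state: AnaState


# Assumption for this sketch: only these two states are used for I/O,
# and a lower rank means a more preferred path.
PREFERENCE = {AnaState.OPTIMIZED: 0, AnaState.NON_OPTIMIZED: 1}


def select_path(paths):
    """Return the most preferred usable path, or None if no path can serve I/O."""
    usable = [p for p in paths if p.state in PREFERENCE]
    if not usable:
        return None
    # min() returns the first of several equally ranked paths, so list order breaks ties.
    return min(usable, key=lambda p: PREFERENCE[p.state])


paths = [Path("ctrl-b", AnaState.NON_OPTIMIZED), Path("ctrl-a", AnaState.OPTIMIZED)]
print(select_path(paths).name)  # ctrl-a

# Suppose a migration or failure makes the optimized path inaccessible:
# the host re-runs selection and falls back to the slower path.
paths[1].state = AnaState.INACCESSIBLE
print(select_path(paths).name)  # ctrl-b
```

The second half of the example mirrors the migration case: when the array reports a new ANA state for a path, the host simply re-evaluates which path to use rather than assuming all paths are identical.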
- Can each path be in a separate domain?
Generally, no. A domain can contain multiple namespaces, but each namespace resides in a single domain. In a multipathing configuration, all the paths lead to the same namespace, so they all go to the same domain. You could, however, have paths to other namespaces that go to other domains.
- What is tail latency and why is it important for NVMe and NVMe over Fabrics technologies?
Many people treat latency as an absolute metric and consider only fabric latency. It is important to understand that the software stacks on both the host and target sides, as well as the applications and the media itself, contribute to total end-to-end latency. When you evaluate latency, look beyond the fabric, including any storage services deployed on an enterprise storage network.
Also, many applications are sensitive to IO latency: you cannot deploy an application on storage where some IOs take 10 seconds to complete and others take 20 seconds. That makes performance in the application or database unpredictable. Standard performance characterization tools in your lab or deployment can record and plot the latency of every single IO, which helps you understand your tail latency, that is, the latency of the slowest IOs (for example, the 99th or 99.9th percentile) rather than the average. Don't ignore it.
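As a rough illustration of why the tail matters, the following Python sketch summarizes a list of per-IO latencies using the nearest-rank percentile method (one of several common percentile definitions). The function names and the sample data are hypothetical and are not output from any real characterization tool.

```python
import math


def percentile(sorted_latencies, pct):
    """Nearest-rank percentile of a non-empty ascending list (0 < pct <= 100)."""
    rank = math.ceil(pct * len(sorted_latencies) / 100)
    return sorted_latencies[max(rank, 1) - 1]


def latency_summary(latencies_us):
    """Compare the mean with the median and tail percentiles of per-IO latencies."""
    data = sorted(latencies_us)
    return {
        "mean": sum(data) / len(data),
        "p50": percentile(data, 50),
        "p99": percentile(data, 99),
        "p99.9": percentile(data, 99.9),
        "max": data[-1],
    }


# Hypothetical sample: 980 fast IOs and 20 slow ones, in microseconds.
samples = [100] * 980 + [5000] * 20
print(latency_summary(samples))
# {'mean': 198.0, 'p50': 100, 'p99': 5000, 'p99.9': 5000, 'max': 5000}
```

Here the mean (198 µs) hides the fact that 1 in 50 IOs takes 5 ms, which is exactly what the 99th percentile exposes.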
- Do new versions of the NVMe-oF specification automatically support all features in new versions of the NVMe specification?
Mostly yes. Some new NVMe features could require changes to the NVMe-oF specification, but that is rare, because the protocols are layered specifically to prevent it. By analogy, changes to the TCP or IP layer of a network stack do not stop existing FTP or HTTP applications from working. Occasionally there are dependencies between the transport layer (NVMe-oF architecture) and the higher layers (NVMe architecture), but those dependencies are rare and intentionally kept to a minimum.
- Why is the NVMe specification now at version 1.4 but the NVMe-oF specification only just reached version 1.1?
Because they are different specifications with different histories. Each specification adds new features through Technical Proposals (TPs) as needed, and each is revised on its own schedule. This is similar to how the TCP/IP specifications and the FTP or HTTP specifications evolve independently.
- Will NVMe and NVMe-oF specifications continue to develop separately or be merged?
Some of both. There are parts of the NVMe-oF specification that belong in the base NVMe specification, and there are parts that are specific to fabric transports. NVM Express is currently examining the structure of the various NVMe specifications to determine the best path forward to produce a set of documents that are easily usable by the widest range of consumers.
NVMe/TCP Overview
- What does multipathing look like on NVMe/TCP technology?
Multipathing is a layer above the actual transport driver, so it is transport agnostic. With the NVMe-oF 1.1 specification, you also have ANA, so the system can define preferred paths and more.
- Can you explain how NVMe/TCP architecture handles network congestion?
Network congestion is handled not by NVMe/TCP but by the TCP protocol itself, which provides end-to-end flow control and congestion control. There are ongoing discussions about making NVMe/TCP more efficient with respect to flow control. We believe NVMe/TCP could estimate congestion, but the practical value of doing so has yet to be proven.
- If NVMe/TCP technology was approved in late 2018, how is it part of NVMe-oF 1.1 specification, which was approved in late 2019?
The NVMe/TCP technical proposal (TP) was approved by the NVM Express Board of Directors for publication in late 2018. Then, in late 2019, the revision 1.1 specification was released. The NVMe-oF 1.1 specification contains everything in the revision 1.0 specification plus all the TPs published up to that date.
- If NVMe-oF technology submission queue flow control is now optional, where is the flow control going to be managed?
Management is outside the scope of the NVMe specifications, so it is implementation dependent: Windows, Linux, and other operating systems will each provide their own management tools for controlling this capability.
- How is NVMe/TCP different from NVMe/iWARP? Don’t both use TCP?
Both use TCP, but the semantics are different, because iWARP has RDMA semantics underneath. NVMe/iWARP is a specific protocol optimized for long distances when the TCP environment is not optimal. iWARP requires an adapter, a special Ethernet NIC with iWARP offloads, whereas NVMe/TCP can use NICs with TCP offloads but does not require them.
- Does Windows 2019 support ANA over TCP and NVMe over TCP technology?
Windows does not directly support NVMe-oF. You need a hypervisor or a host that supports NVMe-oF. Linux kernels starting with version 5.0 include support for NVMe/TCP, which makes Linux a good place to start evaluating NVMe/TCP.
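For readers who want to try NVMe/TCP on Linux, the sketch below builds an `nvme connect` command line for the nvme-cli utility. It assumes nvme-cli is installed and the `nvme-tcp` kernel module is available; the helper functions, the address `192.0.2.10` (a documentation-only range), and the NQN are placeholders. Port 4420 is the commonly used NVMe/TCP port, but check your target's configuration.

```python
import shlex
import subprocess


def nvme_tcp_connect_argv(traddr, subsys_nqn, trsvcid=4420):
    """Hypothetical helper: build an nvme-cli command for an NVMe/TCP subsystem."""
    return [
        "nvme", "connect",
        "--transport=tcp",       # use the NVMe/TCP transport
        f"--traddr={traddr}",    # target IP address
        f"--trsvcid={trsvcid}",  # target TCP port (4420 assumed as the default here)
        f"--nqn={subsys_nqn}",   # NVMe Qualified Name of the target subsystem
    ]


def connect(traddr, subsys_nqn, trsvcid=4420, dry_run=True):
    """Hypothetical helper: print the command, or run it when dry_run is False."""
    argv = nvme_tcp_connect_argv(traddr, subsys_nqn, trsvcid)
    if dry_run:
        print(shlex.join(argv))
        return None
    # Running for real requires root privileges, nvme-cli, and the nvme-tcp module.
    return subprocess.run(argv, check=True)


# Placeholder address and NQN; replace them with your target's values.
connect("192.0.2.10", "nqn.2020-01.com.example:subsys1")
# prints: nvme connect --transport=tcp --traddr=192.0.2.10 --trsvcid=4420 --nqn=nqn.2020-01.com.example:subsys1
```

Defaulting to a dry run keeps the sketch safe to execute on a machine without an NVMe/TCP target; switch `dry_run` off only once the printed command looks right for your environment.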
NVMe-oF and MAX Data Solution
- Are the NVMe-oF specification and MAX Data enterprise ready from a data protection, redundancy, and management perspective?
Alongside the NVMe-oF 1.1 specification, the NVMe Management Interface (NVMe-MI) specification is being developed and will be merged into upcoming specifications. Vendors can extend their existing management GUIs using NVMe-MI, so you (the user) can keep using your standard tooling.
MAX Data with ONTAP (FC and iSCSI) has extensive data management capabilities, such as Snapshots and cloning, which cover data protection.
The NVMe-oF 1.1 specification adds multipathing, so if a path fails, I/O fails over to another path, just as with other storage solutions.
- Is MAX Data open source or only available from specific partners or integrators, and does it only work with Intel DCPMM?
MAX Data is not open source; it is licensed software from NetApp. It does not work only with Intel DCPMM; it also works with NVDIMMs or with persistent memory emulated in DRAM. There are ways to emulate different memory models, so you can start experimenting before purchasing actual persistent memory.
- If MAX Data supports write caching to PM (or to DRAM), how does it enforce data consistency in case of a power failure?
The system memory and the host software together ensure that data is consistent. If you want to take a snapshot, you must run the host software for the MAX Data solution to work.
Learn More About NVMe-oF Technology
If you have more questions about the NVMe-oF 1.1 specification, or are a member interested in becoming more involved in the specification development process, please contact us to learn more. If you missed the live webcast or want to re-watch sections, a full recording is available on our BrightTALK channel.