How are storage systems and technologies evolving? How should your strategy change to adapt to the new tech? The answers to these and some other vital questions in the storage domain were explored by global leaders at a recent event. Here are a few excerpts…
Organisations are now dealing with massive volumes of data that need accessible, flexible and secure storage. So they need to build on storage systems and their standards, and this is where open source storage comes in.
Storage technologies and trends
There are five storage technologies in use today — file, block, object, persistent memory and addressable types of storage, and computational storage. An ideal storage solution supports analytics and allows organisations to gather intelligence from the data in their virtual storage units.
A recent survey on ‘Adoption of cloud storage infrastructure services’ indicates a 20 per cent increase in cloud usage over just two years. Speaking during a panel discussion at the SODACON 2021 virtual event, Richelle Ahlvers, storage technology enablement architect, Intel, said: “There is more adoption of public cloud storage today, particularly for file share storage type applications.”
Transition to SSDs: Solid-state drives (SSDs) have seen shifts in technology, and have moved from serial advanced technology attachment (SATA) to serial-attached SCSI (SAS) and non-volatile memory express (NVMe) drives. There is also a shift in the usage of configurations. “We can really expect to see SSDs dominate the desktop environments. The user may actually see a tiered approach between the hard drive, SAS SSDs and NVMe SSDs as appropriate, although likely not all three in the same configuration,” she said.
In the case of external storage, much the same shift is under way, only faster, she noted. Recent trends indicate strong demand for all-flash arrays, as they continue to deliver higher-performance configurations, with users moving towards NVMe SSD-based arrays.
A survey on ‘Types of SSDs organisations use in all-flash arrays’ reveals a 22 per cent increase in the usage of SAS SSDs and a greater increase of 26 per cent in the use of NVMe SSDs over the last two years.
Another emerging area is persistent memory. Ahlvers said, “Users may be planning a 12 to 20 per cent increased adoption of persistent memories over the next couple of years. Each one of these trends in the storage industry has a huge impact on how users are building their storage environments and data management.”
While there are many different technologies of interest in the storage industry, there are some nascent areas too, such as computational storage. Existing technologies are being standardised, and there is renewed interest in standardised storage fabric management.
Cloud architecture: The larger vendors use their own storage technologies most of the time. But the usage pattern, be it block, object or file storage, depends on the convenience of use. If the priority is performance or cost, technologies like object and file are appropriate. “Sometimes clients prefer data locality — local storage and that particular region support kind of storage,” said Krishna Kumar M., technology architect cloud at Accenture, speaking at the event.
“The technologies are being offered by the big cloud providers. And it is the convenience of usage and the business needs that decide what kind of storage we use. This includes cloud native technologies and similar technologies in combination,” he added.
Cloud storage is being mainly used for disaster recovery (DR) and backup. The cloud is heavily used for backing up the data set, which is also backed up in some other region for DR. Kumar pointed to the US model in which data is stored in the western, central and eastern regions to ensure it survives catastrophic incidents. “An increasing number of people are taking backup of additional data sets to meet their RPO (recovery point objective) and RTO (recovery time objective),” Kumar said.
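As a rough illustration of the two objectives Kumar mentions, RPO caps how much recent data (measured in time) an organisation can afford to lose, while RTO caps how long a restore may take. The sketch below checks both against made-up numbers; the region names, timestamps and thresholds are assumptions for illustration and do not come from the panel.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if the time since the last backup is within the allowed data-loss window."""
    return now - last_backup <= rpo


def meets_rto(estimated_restore: timedelta, rto: timedelta) -> bool:
    """True if the estimated restore time fits within the allowed downtime."""
    return estimated_restore <= rto


# Hypothetical last-backup times for three regions (illustrative values only).
now = datetime(2021, 7, 1, 12, 0)
last_backups = {
    "west": datetime(2021, 7, 1, 11, 30),
    "central": datetime(2021, 7, 1, 9, 0),
    "east": datetime(2021, 7, 1, 11, 55),
}
rpo = timedelta(hours=1)

# Regions whose most recent backup is older than the RPO allows.
stale = [region for region, taken in last_backups.items()
         if not meets_rpo(taken, now, rpo)]
print(stale)  # ['central']
```

A tighter RPO means more frequent backups or additional copies, which is one reason more data sets end up being backed up.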
File management is another common trend, which aims at high usability and convenient sharing of files.
Companies like Linbit are actively encouraging users to build their private clouds. Philipp Reisner, CEO of Linbit, said that companies are combining on-premises clouds or private clouds and public clouds. They prefer locating their data in a place that is accessible and secure, at an economical cost.
ActionSpot is a company that focuses on blockchain tools. “We also use a lot of Kubernetes. It will probably be pretty dominant in the next ten years. Blockchain is a part of cryptography and data security,” said Olga Buchonina, CEO and founder, ActionSpot Corp.
The blockchain is on the cloud and has never used onsite infrastructure. “The FinTech sector, such as banks like HSBC, is adopting blockchain technology,” said Buchonina.
She added, “This just indicates that we are moving forward. However, in some cases, having hybrid solutions makes more logical sense than having a pure cloud, especially if it’s a security concern to have an on-premises solution. So people need to understand what’s needed. Blockchain technology uses cloud, Docker, Kubernetes and container based databases.”
Stop obsessing over storage type
Alex McDonald, director, SNIA EMEA, encouraged everybody to stop obsessing about the difference between NVMe and small computer system interface (SCSI), unless they work at the storage hardware level or the layer just above it. “Stop obsessing about the difference between file, block and object,” he said.
He explained further, pointing to an instance that used POSIX semantics for file open, read and write calls. “That is the way we need to think about all data access,” he said. For example, if there is a replication issue, the developer should stop replicating at the wrong level. And the wrong level may be, for instance, in the NoSQL database that you choose, while you let it replicate. “Replication shouldn’t know anything about the underlying storage architecture, its frailties, its fragility or lack of performance or whatever it might be that needs to be dealt with in a layer below, where we need to get the levels of abstraction,” McDonald added.
It is wiser to focus on the application: know what data it uses and what data it really needs to consume, and make sure the data services are abstracted so that virtually all workloads can benefit.
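The POSIX-style access McDonald refers to can be sketched with Python's os module, which exposes thin wrappers over the open, write, read and close calls. The helper name write_then_read and the file name are hypothetical, chosen for illustration. The point of the sketch is that the application code looks the same whether the path sits on a local disk or on a mounted network file system; what lies underneath is the concern of a lower layer.

```python
import os
import tempfile


def write_then_read(path: str, payload: bytes) -> bytes:
    """Write payload to path, then read it back, using POSIX-style calls only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:  # an empty result signals end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


# The temporary directory stands in for any mounted storage.
with tempfile.TemporaryDirectory() as workdir:
    result = write_then_read(os.path.join(workdir, "demo.bin"), b"hello storage")
    print(result)
```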
According to Reisner, edge deployment is on the rise but there is a need for automation. The IoT devices connected at the network edge are being pushed to take on more storage functions. They can process data in small quantities before sending it to a central storage location. The raw data can be processed or managed at the edge.
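A minimal sketch of this kind of edge-side processing, assuming a device that collects numeric sensor readings: instead of shipping every raw value, it sends one compact summary per batch. The function name, the field names and the sample values are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, Sequence


def summarise(readings: Sequence[float]) -> Dict[str, float]:
    """Reduce a batch of raw readings to one small record for central storage."""
    if not readings:
        return {"count": 0}
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


# Hypothetical temperature readings gathered at the edge.
batch = [20.0, 22.0, 24.0]
print(summarise(batch))
```

Whether a summary is enough, or the raw data must also reach the backend, depends on the use case.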
Edge-specific use cases are another area the panelists thought may need a lot of work, specifically replicating data from the edge to the cloud backend, and intercloud replication.
Emerging technologies and use cases
Kumar also talked of access control. “When there is a data set, it has to address General Data Protection Regulation (GDPR) compliance and other requirements, maybe row level or column level data. Only a small part of a data set may need to be accessed. Hence, we need specific access mechanisms,” he said.
Second, if the user is able to access the data in the US, he or she may not be able to access it when travelling to some other country, because the data is restricted to that specific region. “So we are seeing a lot of access mechanisms and technologies in place where the data is not exposed beyond certain boundaries,” Kumar explained. He said this will bring in more stringent technologies that allow data to be granularised and accessed.
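The two mechanisms Kumar describes, column-level access and region-bound access, can be sketched together as follows. This is an illustration under assumed names (Policy, read_row, the region codes and the sample row are all hypothetical), not a description of any particular product.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class Policy:
    allowed_regions: FrozenSet[str]   # regions from which access is permitted
    visible_columns: FrozenSet[str]   # columns the caller is allowed to see


def read_row(row: Dict[str, Any], policy: Policy, caller_region: str) -> Dict[str, Any]:
    """Return only the permitted columns, and only to callers in a permitted region."""
    if caller_region not in policy.allowed_regions:
        raise PermissionError(f"data is not accessible from region {caller_region!r}")
    return {col: val for col, val in row.items() if col in policy.visible_columns}


policy = Policy(allowed_regions=frozenset({"us"}),
                visible_columns=frozenset({"id", "country"}))
row = {"id": 1, "name": "A. Customer", "country": "US"}

print(read_row(row, policy, "us"))  # {'id': 1, 'country': 'US'}
```

The same call made with a caller region outside the allowed set raises PermissionError, so nothing is exposed beyond the boundary.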
He also shed light on application-aware storage of data: “Sometimes, data should be completely separate from the application; but in some cases, the application needs to be aware of the data, for example, when it comes to performance issues.”
Security comes first
If cloud storage can be accessed from anywhere, security must be a focus. Storage vendors are adopting important security technologies to protect data from risks. These include ransomware protection, and ‘air-gapping’ of secondary storage from primary storage to prevent attacks on backup and archived data. In some cases, this effectively makes data storage technology the last line of defence against attacks.
McDonald said that through geolocation based access controls, we give everyone (good or bad) on the Internet an excuse to go around blocking data as and when they see fit. “We need to be really careful when we start talking about securing data, particularly by geolocation; there are some big ethical questions here. It is one of the things that bothers me enormously,” he said.
There is also a concern about where the data is located. “If I have data processing taking place on the network, then the concept of data becomes a bit fuzzy. There are concerns about privacy and the kinds of things we will be able to do in the network to data, without anybody really knowing about it. That’s a serious problem as well,” he added.
He believes that HTTP/3, which is built on the QUIC transport that Google brought to the IETF (Internet Engineering Task Force), will develop further and make its way into the network file system (NFS), just as it reached server message block (SMB). These two major file protocols can provide true end-to-end security. “Data privacy is one of the things. The other is standards,” he added.
Recollecting the five storage technologies, McDonald said that computational storage is set to shift the focus to applications and change the way applications access data. He added that computational storage, along with the emerging pattern of data centres and cloud becoming highly distributed, is what we need to watch out for. “Instead of asking for small parts of the data set, we may actually ask the devices themselves to be smart about the data, and tell us what they know about the data they have collected,” he explained.
The user may have storage devices that are able to process the data, in order to collect metadata. Then it will be possible to work with the storage device instead of asking every object that’s got a metadata tag. “We may be working towards actually asking the device, how many red objects it has got? And if your answer is none, there is no data transfer. It’s a simple NACK-type message that we can send to a device. And I think we may see a lot more of that in future. We are going to see a DPU (data processing unit) in smart network cards that can transform the data. We can actually process data on the network rather than having to move it to a CPU to get processed or to the GPU to get further smashed and computed on,” said McDonald.
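McDonald's "red objects" example can be sketched as a toy model. The SmartDevice class below is hypothetical and simulates a storage device in memory; it is not the API of any real computational storage product. It contrasts pulling every object to the host with asking the device to evaluate the question itself and return only a number.

```python
from typing import Any, Dict, Iterable, List


class SmartDevice:
    """Toy stand-in for a storage device that can run simple queries locally."""

    def __init__(self, objects: Iterable[Dict[str, Any]]) -> None:
        self._objects = list(objects)

    def read_all(self) -> List[Dict[str, Any]]:
        # Conventional path: every object is transferred to the host.
        return list(self._objects)

    def count_where(self, key: str, value: Any) -> int:
        # Computational path: the device scans its own data, only a count leaves it.
        return sum(1 for obj in self._objects if obj.get(key) == value)


device = SmartDevice([
    {"id": 1, "colour": "red"},
    {"id": 2, "colour": "blue"},
    {"id": 3, "colour": "red"},
])

# Host-side filtering moves all three objects before counting.
host_count = sum(1 for obj in device.read_all() if obj["colour"] == "red")

# Device-side counting moves a single integer.
device_count = device.count_where("colour", "red")
print(host_count, device_count)  # 2 2
```

When the answer is zero, as with a query for green objects here, no object data needs to be transferred at all.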
Software defined storage
The services related to storage are defined by a software stack that usually runs on commodity hardware. The vendor’s hardware can be optimised to run its software stacks or it may have some hardware base that affects storage performance.
Reisner said that with Linbit’s focus on software-defined storage and open source, clients are driving the company to adapt its software to the existing high-end hardware. Though these storage devices provide an extremely high number of IOPS (input/output operations per second), there is some concern about how to ensure the software stack can actually exploit their functionality and performance attributes. “We have single PCI devices doing like 1.6 million IOPS, and a server can have eight of these. This was a real challenge for me,” he added.
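A quick back-of-the-envelope calculation shows why these numbers are a challenge for a software stack. The 1.6 million IOPS per device and eight devices per server come from Reisner's remark; the 4 KiB block size and the 64 CPU cores are assumptions added purely for illustration.

```python
DEVICE_IOPS = 1_600_000      # per device, as quoted by Reisner
DEVICES_PER_SERVER = 8       # as quoted by Reisner
BLOCK_SIZE_BYTES = 4096      # assumption: 4 KiB per I/O
CPU_CORES = 64               # assumption: cores available to the storage stack


def total_iops(device_iops: int, devices: int) -> int:
    """Aggregate IOPS the server's devices could deliver."""
    return device_iops * devices


def throughput_gb_per_s(iops: int, block_size: int) -> float:
    """Data rate in gigabytes (10^9 bytes) per second at the given block size."""
    return iops * block_size / 1e9


def per_core_budget_us(iops: int, cores: int) -> float:
    """CPU time, in microseconds, each core can spend per I/O to keep up."""
    return cores / iops * 1e6


server_iops = total_iops(DEVICE_IOPS, DEVICES_PER_SERVER)
print(server_iops)  # 12800000
print(round(throughput_gb_per_s(server_iops, BLOCK_SIZE_BYTES), 1))
print(round(per_core_budget_us(server_iops, CPU_CORES), 1))
```

Under these assumptions the server would have to sustain 12.8 million IOPS, roughly 52 GB per second, with about 5 microseconds of CPU time per I/O on each core. That leaves very little room for overhead in the software layers.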
The future may see more software stacks available from popular storage brands.
Standardisations and SNIA (Storage Networking Industry Association)
Many of the storage-related standardisation problems that users faced earlier remain unresolved to date. McDonald said, “We may have 14 standards but may need another one when we are trying to standardise stuff.”
Storage is all about bytes. “I would like to see more of Kubernetes in storage technology. We need to get the right levels of abstraction and the right types of interfaces to storage. Standards bodies like SNIA have a huge role to play and are absolutely essential in this environment,” he added.
The storage, management and protection of data will continue to be critical in the future. Businesses will collect data at an ever-increasing rate, and need new ways of keeping it accessible and secure. Storage decisions will always be made based on an organisation’s need to grow its operations and generate profit.
The article has been compiled from a panel discussion held at SODACON 2021.