Keeping Up With Microsoft Dynamics 365 Storage
As Microsoft continues to blur the lines between Dynamics and PowerApps, it also continues to redefine how Common Data Service (now Microsoft Dataverse) storage is measured and billed. In the age of Coronavirus and remote work, cloud solutions like Dynamics 365 are more important than ever for business continuity. Keeping up with Dynamics storage will help keep costs in check and may expose opportunities for process improvement. D365 admins have a number of tools at their disposal to manage storage, and it all starts with the Power Platform Admin Center.
The Power Platform Admin Center
D365 administrators can access the Power Platform Admin Center by selecting the Admin option from the waffle menu within Dynamics, or simply by navigating to https://admin.powerplatform.microsoft.com/.
Once inside the admin center, there are a number of different options to help you manage your Dynamics instances. For this article, we’re interested in the Capacity selection under Analytics.
Viewing Storage Usage and Drilling Down
Once you’ve navigated to the Capacity option, a Summary of your tenant’s storage is displayed. Storage is broken into three categories.
- Database – capacity consumed by records within your Dynamics deployment, like Accounts, Contacts, and Activities
- File – capacity consumed by files stored as attachments to notes or emails within Dynamics
- Log – capacity consumed by plugin trace logs or audit logs, if you’ve enabled field auditing
In the example below, you can see that this tenant is over-consuming its database storage by a huge margin.
We can drill into that database consumption by navigating to the Storage capacity tab. In this example, you can see that no one environment is consuming the preponderance of Database storage in this tenant. That doesn’t mean that we can’t take an even closer look at each environment to see exactly how that storage is being consumed. By clicking on the blue-highlighted graph next to each environment, we can view specific information about data consumption.
On the resulting page, an interactive graph breaks down database usage by table over time. Some things are out of your control as a Dynamics admin, like RibbonClientMetadataBase or RuntimeDependencyBase, but other tables can be managed through bulk-delete jobs within the application. For example, the WorkflowLogBase highlighted below indicates that logs from completed workflows are consuming 478 MB of storage. These logs can be reviewed for failures and deleted to free up database space.
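To give a concrete sense of what that cleanup can look like programmatically, here is a minimal sketch that queries and deletes old workflow log records through the Dataverse Web API. The entity set name, attribute names, API version, and authentication details are assumptions for illustration only; in practice, a bulk-delete job configured in the application is the supported route.

```python
# Hypothetical sketch: remove workflow logs older than 90 days via the
# Dataverse Web API. The entity set name ("workflowlogs"), attribute names,
# and API version are assumptions -- verify them for your environment.
from datetime import datetime, timedelta, timezone
import requests

ORG_URL = "https://yourorg.crm.dynamics.com"   # placeholder organization URL
TOKEN = "<OAuth bearer token>"                 # obtained via an Azure AD app registration

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "OData-MaxVersion": "4.0",
    "OData-Version": "4.0",
    "Accept": "application/json",
}

cutoff = (datetime.now(timezone.utc) - timedelta(days=90)).strftime("%Y-%m-%dT%H:%M:%SZ")

# Retrieve old workflow log records (assumed entity set and attribute names).
url = f"{ORG_URL}/api/data/v9.2/workflowlogs?$select=workflowlogid&$filter=createdon lt {cutoff}"
resp = requests.get(url, headers=headers)
resp.raise_for_status()

for record in resp.json().get("value", []):
    record_id = record["workflowlogid"]
    # Review before deleting in a real environment; a bulk-delete job in the
    # application is usually the safer, supported route.
    requests.delete(f"{ORG_URL}/api/data/v9.2/workflowlogs({record_id})", headers=headers)
```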
Taking the Proactive Approach
Reviewing database storage also offers D365 admins the ability to think proactively about data and application management. Maybe the WorkflowLogBase table is inflated because a workflow is consistently failing or configured incorrectly? If a high degree of storage is consumed by activities, a data archival strategy might be necessary. Maybe an abundance of notes indicates that OneNote integration should be enabled, where an overage on attachments might warrant enabling SharePoint integration.
Beringer Can Help!
Whatever storage concerns you may discover under the hood of your Dynamics deployment, Beringer Technology Group is here to help. We have experienced, Microsoft-certified professionals ready to help you rein in your database consumption and optimize your Dynamics experience.
Unlocking Data Storage: The Traditional Data Warehouse vs. Cloud Data Warehouse
We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive.
The data industry has changed drastically over the last 10 years, with perhaps some of the biggest changes happening in the realm of data storage and processing.
The datasphere is expanding at an exponential rate, and companies of all sizes are sitting on immense data stores. And where does all this data live? The cloud.
Modern businesses are born on the cloud: Their systems are built with cloud-native architecture, and their data teams work with cloud data systems instead of on-premises servers.
The proliferation of cloud options has coincided with a lower barrier to entry for younger companies, but businesses of all ages have seen the value of storing their data online instead of on-premises.
The increased interest in cloud storage (and increased volume of data being stored) coincides with an increased demand for data processing engines that can handle more data than ever before.
The shift to the cloud has opened a lot of doors for teams to build bolder products and infuse insights of all kinds into their in-house workflows, user apps, and more.
The cloud is the future, but how did we get here?
Let’s dig into the history of the traditional data warehouse versus cloud data warehouses.

Data warehouse vs. databases
The growing popularity of data warehouses has created a misconception that they are wildly different from databases. While the architecture of traditional data warehouses and cloud data warehouses does differ, the ways in which data professionals interact with them (via SQL or SQL-like languages) are roughly the same.
The primary differentiator is the data workload they serve. Let’s explore:
| Data warehouse (OLAP: online analytical processing) | Database (OLTP: online transaction processing) |
| --- | --- |
| Write once, read many | Write many, read many |
| Best for large table scans | Best for short table scans |
| Typically a collection of many data sources | Usually one source that serves an application |
| Petabyte-level storage | Terabyte-level storage |
| Columnar-based storage | Row-based storage |
| Lower concurrency | Higher concurrency |
| Examples: Redshift, BigQuery, Snowflake | Examples: Postgres, MySQL |
Given that both data warehouses and databases can be queried with SQL, the skillset required to use a data warehouse versus a database is roughly the same. The decision as to which one to use then comes down to what problem you’re looking to solve.
If there’s a need for data storage and processing of transactional data that serves an application, then an OLTP database is great. However, if the goal is to perform complex analytics on large sets of data from disparate sources, a warehouse is the better solution.
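As a rough illustration of that split, here are two hypothetical SQL statements (wrapped in Python for convenience): one shaped like a transactional lookup, the other like an analytical scan. All table and column names are invented.

```python
# Illustrative only: the tables and columns are hypothetical.
# An OLTP query touches a handful of rows by key; an OLAP query scans and
# aggregates large portions of one or more tables.

OLTP_QUERY = """
    SELECT order_id, status
    FROM orders
    WHERE customer_id = %(customer_id)s   -- DB-API named parameter
    ORDER BY created_at DESC
    LIMIT 10;
"""

OLAP_QUERY = """
    SELECT c.region,
           DATE_TRUNC('month', o.created_at) AS month,
           SUM(o.order_total)                AS revenue
    FROM orders o
    JOIN customers c USING (customer_id)
    GROUP BY c.region, DATE_TRUNC('month', o.created_at)
    ORDER BY month, region;
"""

# The OLTP statement belongs on the application's transactional database
# (e.g., Postgres); the OLAP statement is the kind of full-table aggregation
# a columnar cloud warehouse is built to answer quickly.
```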
Before we look at modern data warehouses, it’s important to understand where data warehouses started to see why cloud data warehouses solve many analytics challenges.

Traditional vs. Cloud Explained
Traditional data warehouses
Before the rush to move infrastructure to the cloud, the data being captured and stored by businesses was already increasing, creating a need for an alternative to OLTP databases that could process large volumes of data more efficiently. Businesses began to build what are now seen as traditional data warehouses.
A traditional data warehouse is typically a multi-tiered series of servers, data stores, and applications.
While the organization of these layers has been refined over the years, the interoperability of the technologies, the myriad software involved, and the orchestration of the systems make managing them a challenge.
Further, these traditional data warehouses are typically on-premises solutions, which makes updating and managing their technology an additional layer of support overhead.
Cloud data warehouses
Traditional data warehouses solved the problem of processing and synthesizing large data volumes, but they presented new challenges for the analytics process.
Cloud data warehouses took the benefits of the cloud and applied them to data warehouses — bringing massive parallel processing to data teams of all sizes.
Software updates, hardware, and availability are all managed by a third-party cloud provider.
Scaling the warehouse as business analytics needs grow is as simple as clicking a few buttons (and in some cases, it is even automatic).
The warehouse being hosted in the cloud makes it more accessible, and with a rise in cloud SaaS products, integrating a company’s myriad cloud apps (Salesforce, Marketo, etc.) with a cloud data warehouse is simple.
The reduced overhead and cost of ownership with cloud data warehouses often makes them much cheaper than traditional warehouses.
Cloud data warehouses in your data stack
We know what data warehouses do, but with so many applications that have their own databases and reporting, where does the warehouse fit inside your data stack?
To answer this question, it’s important to consider what a cloud data warehouse does best: efficiently store and analyze large volumes of data. The cloud data warehouse does not replace your OLTP database, but instead serves as a repository in which you can load and store data from your databases and cloud SaaS tools.
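Here is a minimal sketch of that load pattern, assuming a Postgres application database as the source and a Redshift cluster (which speaks the Postgres wire protocol, so the same driver works) as the warehouse. Connection strings, schemas, and table names are placeholders, and real pipelines would normally use a bulk-load path such as COPY rather than row-by-row inserts.

```python
# Minimal ELT sketch: extract recent rows from an OLTP source and land them
# in a warehouse staging table. Credentials and names are placeholders.
import psycopg2

SOURCE_DSN = "host=oltp.internal dbname=app user=reader password=..."
WAREHOUSE_DSN = "host=my-cluster.redshift.amazonaws.com port=5439 dbname=analytics user=loader password=..."

with psycopg2.connect(SOURCE_DSN) as src, psycopg2.connect(WAREHOUSE_DSN) as wh:
    with src.cursor() as read_cur, wh.cursor() as write_cur:
        # Extract yesterday's orders from the application database...
        read_cur.execute(
            "SELECT order_id, customer_id, order_total, created_at "
            "FROM orders WHERE created_at >= CURRENT_DATE - 1"
        )
        rows = read_cur.fetchall()

        # ...and load them into a staging table in the warehouse.
        # (Production pipelines would typically COPY from object storage instead.)
        write_cur.executemany(
            "INSERT INTO staging.orders (order_id, customer_id, order_total, created_at) "
            "VALUES (%s, %s, %s, %s)",
            rows,
        )
    wh.commit()
```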
With all of your data in one place, the warehouse acts as an efficient query engine for cleaning the data, aggregating it, and reporting it — often quickly querying your entire dataset with ease for ad hoc analytics needs.
In recent years, there has been a rise in the use of data lakes, and cloud data warehouses are positioning themselves to be paired well with these. Data lakes are essentially sets of structured and unstructured data living in flat files in some kind of data storage. Cloud data warehouses have the ability to connect directly to lakes, making it easy to pair the two data strategies.
A data-driven future powered by cloud data warehouse technologies
The three most popular cloud data warehouse technologies are Amazon’s Redshift, Snowflake, and Google’s BigQuery. They each handle the same workloads relatively well but differ in how computing and storage are architected within the warehouse.
While they’re all great options, the right choice will be based on the scaling needs and data type requirements of the business. Beyond that, the pricing structure for the three varies slightly, and based on the use case, certain warehouses can be more affordable than others.
As the number of cloud data warehouse options on the market grows, niche players will rise and fall in every industry, with companies choosing this or that cloud option based on its ability to handle their data uniquely well.
Whatever your company does and wherever you’re trying to infuse insights, be it into workflows or customer-facing apps, there’ll be a cloud option that works for you.
The future is in the clouds, and companies that understand this and look for ways to put their data in the right hands at the right time will succeed in amazing ways.

Adam Luba is an Analytics Engineer at Sisense who boasts almost five years in the data and analytics space. He’s passionate about empowering data-driven business decisions and loves working with data across its full life cycle.
Storage 101: The Future of Storage
The series so far:
- Storage 101: Welcome to the Wonderful World of Storage
- Storage 101: The Language of Storage
- Storage 101: Understanding the Hard-Disk Drive
- Storage 101: Understanding the NAND Flash Solid State Drive
- Storage 101: Data Center Storage Configurations
- Storage 101: Modern Storage Technologies
- Storage 101: Convergence and Composability
- Storage 101: Cloud Storage
- Storage 101: Data Security and Privacy
- Storage 101: The Future of Storage
An IDC report published in November 2018 predicted that the world’s data would grow to 175 zettabytes by the year 2025. For those unaccustomed to such amounts, a zettabyte is about 1,000 exabytes, which comes to one billion terabytes or one trillion gigabytes. Given our current trajectory, we’ll likely see those predictions come true. Even if we fall short, there will still be a heap load of data.
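If you want to sanity-check those units yourself, a few lines of Python will do it (using decimal prefixes, where each step is a factor of 1,000):

```python
# Quick sanity check on the units for 175 ZB.
ZB = 175
EB = ZB * 1_000               # exabytes
TB = ZB * 1_000_000_000       # terabytes
GB = ZB * 1_000_000_000_000   # gigabytes

print(f"{ZB} ZB = {EB:,} EB = {TB:,} TB = {GB:,} GB")
# 175 ZB = 175,000 EB = 175,000,000,000 TB = 175,000,000,000,000 GB
```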
Current storage technologies are going to have a tough time keeping up. They’re already having a tough time keeping up. With the explosion of mobile devices, followed by the influx of the Internet of Things (IoT), more data than ever is being generated—by people, by applications, by machines. The only way to derive meaning from all that data is to develop innovative high-performing, high capacity storage solutions.
Figure 1. Data’s exponential growth (image by geralt)
Scientists are pioneering storage solutions that can support our data loads into the future. To this end, they’re searching for ways to improve NAND flash and storage class memory, while experimenting with new storage technologies. In this article—the last in my series on storage—I provide an overview of many of these efforts to give you a sense of what we might expect in the near and, with any luck, not-too-distant future.
What’s up with NAND flash?
NAND flash has captured a significant share of the data center market, offering substantially better performance and durability than hard-disk drives (HDDs) can ever physically achieve. As NAND's popularity has increased, along with its densities, prices have steadily dropped, making it a more viable storage option than ever.
Yet even these improvements are not enough to meet the demands of many of today's data volumes and workloads, which is why vendors are working hard to build solid-state drives (SSDs) that deliver better performance and greater densities while minimizing the cost-per-GB.
The primary strategy for doing so is adding more bits per cell, more layers per chip, or a combination of both. Flash SSDs have gone from one bit per cell to two bits and then three. Now we have quad-level cell (QLC) SSDs, which squeeze four bits into each cell. Initially, QLC flash primarily targeted PCs, but that’s starting to change, with some vendors now offering QLC storage for the data center.
More bits per cell increases the need for error correction, slowing program/erase (P/E) cycles. The additional bits also decrease endurance as cells become more labile. Until significant advances are made in P/E processes such as garbage collection, enterprise QLC flash will be limited to read-intensive workloads. In the meantime, vendors are pushing ahead with more bits per cell, even developing penta-level cell (PLC) SSDs that boast five bits per cell.
At some point, adding more bits per cell will no longer be practical, which is why vendors are also adding more layers to their NAND chips, a technology referred to as 3D NAND. In this type of chip, memory cells are stacked into vertical layers to increase capacity. The first 3D NAND chips had 32 layers. Many vendors now offer SSDs with 96 layers.
In addition, several vendors are ramping up production on 128-layer SSDs, with 256-layer devices on the horizon. Devices featuring 500 and even 800 layers or more are forecast. But additional layers mean thinner materials, amplifying manufacturing challenges and costs. The cost-per-GB is unlikely to decline as quickly as it has been without novel technological advances.
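A quick back-of-the-envelope calculation shows why vendors chase both levers at once. The cell count below is an arbitrary illustrative baseline, not a real die specification, but the multiplication is the point:

```python
# How bits per cell and layer count multiply raw density (illustrative only).
CELLS_PER_LAYER = 1_000_000_000  # hypothetical cells in one planar layer

def raw_capacity_gib(bits_per_cell: int, layers: int) -> float:
    """Raw capacity in GiB, before over-provisioning and error correction."""
    total_bits = CELLS_PER_LAYER * bits_per_cell * layers
    return total_bits / 8 / 1024**3

for label, bits, layers in [("TLC, 32-layer", 3, 32),
                            ("TLC, 96-layer", 3, 96),
                            ("QLC, 96-layer", 4, 96),
                            ("QLC, 128-layer", 4, 128)]:
    print(f"{label:15s} -> {raw_capacity_gib(bits, layers):8.1f} GiB")
```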
Who’s invading the flash space?
While vendors continue to enhance their NAND flash offerings, some are also investing in technologies that could eventually replace flash or be used in conjunction with flash to create a hybrid solution. One of these is Intel’s Optane DC SSD, which is based on the 3D XPoint architecture, a storage-class memory (SCM) technology developed by Intel in partnership with Micron.
The Optane DC SSD provides greater throughput and lower latency than a traditional flash SSD, including Intel's own line of enterprise flash storage. Intel is now working on its second generation of the Optane DC SSD, offering hints that it might nearly double the speed of its first-gen implementation.
Figure 2. Moving into flash territory (image by Fotocitizen)
Not to be outdone, Samsung now offers its own alternative to traditional NAND flash—the Z-SSD drive (or Z-NAND). Although the Z-SSD is based on NAND technologies, it offers a unique circuit design and controller that delivers much better performance. In fact, the Z-SSD is often described as an SCM device and is considered Samsung’s answer to Intel’s Optane DC SSD.
Micron has also released an SSD built on the XPoint architecture—the X100 NVMe SSD. Both Micron and Samsung appear to be planning their next generation of flash alternatives. But they’ve released few details about the devices or how they’ll perform.
In the meantime, Kioxia (formerly Toshiba Memory) is working on its own NAND flash alternative, Twin BiCs FLASH, which the company describes as the “world’s first three-dimensional (3D) semicircular split-gate flash memory cell structure.” That’s quite the mouthful and certainly sounds intriguing. However, the project is still in research and development and will likely not see the light of day for some time to come.
It’s uncertain at this point what the future looks like for NAND flash alternatives such as those from Intel, Micron, Samsung, and Kioxia. Much will depend on how traditional NAND flash evolves and the affordability of these new devices over the long-term. With workload and data demands increasing, organizations will continue to look for whatever solutions can effectively balance performance and capacity against endurance and cost.
Where does storage class memory fit in?
In the last couple of years, storage class memory (SCM) has inspired many headlines, especially with Intel's recent release of the first Optane DC persistent memory modules (PMMs). The modules plug into standard dual in-line memory module (DIMM) slots, allowing the PMMs to connect directly to the server's memory space. The Optane DC modules represent a big step toward the vision of a new storage tier that sits between traditional dynamic RAM (DRAM) and NAND flash storage to support demanding enterprise workloads.
Intel’s Optane DC modules are typically referred to as a type of phase-change memory (PCM)—“typically” because the company’s messaging has been somewhat mixed around this issue and they are sometimes considered to be a type of resistive RAM. However, the consensus is that the Optane DC modules fit neatly into the PCM category.
Phase-change memory is a type of nonvolatile memory that stores data by rapidly changing a material between amorphous and crystalline states. Phase-change memory offers much faster performance and lower latency than NAND flash and has the potential of delivering greater endurance. On the other hand, PCM is also much more expensive.
But PCM is not the only SCM effort under development. Scientists are actively researching other technologies that they believe can also serve as a bridge between DRAM and flash storage. One of these is resistive RAM (RRAM or ReRAM), another type of nonvolatile memory that promises significantly greater performance than NAND flash, with speeds approaching those of DRAM.
Resistive RAM works by applying different voltage levels to a material in order to switch its resistance from one state to another. Compared to NAND flash, RRAM offers much better performance and higher endurance while consuming less power. In fact, the technology shows so much promise that it has been proposed as a possible replacement for both NAND flash and DRAM.
Another nonvolatile memory technology that shows promise is ferroelectric memory (FRAM or FeRAM), which is built on a ferroelectric capacitor architecture that incorporates a mechanism for controlling polarities. Ferroelectric memory offers high read and write speeds, low power consumption, and high endurance. But in its current form, it has a very low density and its processing costs are high.
Nanotube RAM (NRAM) is another nonvolatile memory technology that's being actively researched for its DRAM-like performance, low power consumption, and ability to withstand extreme environmental conditions. Nanotube RAM can also retain data far beyond NAND flash capabilities. An NRAM device is made up of tiny carbon nanotubes that are extremely strong and have conductive properties. The nanotubes sit between two electrodes through which voltage is applied to change the resistance, providing the structure for data storage.
Researchers are also focusing on Magnetic RAM (MRAM), which could potentially deliver speeds on par with static RAM (SRAM). Magnetic RAM—also called magnetoresistive RAM—is a nonvolatile memory technology that uses magnetic states to store data bits, rather than using electrical charges like other memory technologies.
Vendors are pursuing different strategies for implementing MRAM. One of the most promising is spin-transfer torque MRAM (STT-MRAM), which leverages the angular momentum in quantum mechanics to store data. The biggest challenge with MRAM, however, is its extremely low density.
All of these memory types—along with others being investigated—are in various stages of research and development. Although several vendors already offer products based on some of these technologies, today’s research is what will drive them into the future and make it possible to create a memory-storage stack in which all memory is nonvolatile, profoundly changing the way we deliver applications and store data.
What does the future hold?
The memory technologies I’ve discussed so far are mostly works in progress, with vendors looking for ways to make them more practical and profitable beyond a handful of small niche use cases. But researchers are also looking further into the future, working on technologies that are still in their infancy or have been around for a while but are now being infused with new efforts.
One area of research that’s caught the industry’s imagination is silica glass, which can be used to store data much like the crystals that taught Superman about his Krypton roots. This idea of silica glass got its boost in 2013 from researchers at the University of Southampton, who demonstrated storing a 300 KB text file in fused glass.
The storage medium, referred to as 5D memory crystal, or 5D storage, relies on superfast femtosecond laser technology, like that used for refractive surgery. The laser etches microscopic nanogratings into the glass to provide the data bit structure. A special technique is then used to retrieve the data, taking advantage of the light’s polarization and intensity.
According to the researchers, a 25-mm silica disk could store as much as 360 TB of data and remain viable for over 13 billion years, even at temperatures of 190 degrees Celsius, making today's storage media seem like cardboard cutouts. In fact, 5D storage has already received a fair share of notoriety. A silica disk storing Isaac Asimov's Foundation series now orbits the sun, sitting inside Elon Musk's cherry red Tesla Roadster, which launched aboard SpaceX's Falcon Heavy rocket.
Microsoft was so impressed with the 5D storage technology that it has launched its own initiative, dubbed Project Silica, whose stated goal is to develop the “first-ever storage technology designed and built from the media up, for the cloud.” Project Silica uses femtosecond lasers to write data into quartz glass, the same process used for 5D storage. As its first proof of concept, Microsoft teamed up with Warner Bros. to store and retrieve the entire 1978 Superman movie on a piece of glass about the size of a drink coaster.
Another innovative approach to data storage is racetrack memory, which was first proposed by IBM researchers in 2008. Racetrack memory applies electrical current to nanowires to create domain walls with opposite magnetic regions between them (thus the racetrack concept). The domain walls and their regions provide a structure for efficiently storing data. IBM hopes that racetrack technology might eventually yield a nonvolatile, solid-state storage device that can hold 100 times more data than current technologies at a lower cost-per-GB.
Other researchers are pursuing a different approach to racetrack memory, leveraging the inherent properties in skyrmions, which are microscopic swirls found in certain magnetic materials. Skyrmions work in conjunction with anti-skyrmions to create opposing magnetic swirls that can be used to create a three-dimensional structure for hosting digital data. Skyrmion-based storage requires very little current and has the potential for storing large quantities of data while delivering high-speed performance.
Scientists are also researching the potential of storing data at the molecular level. One of the most publicized approaches is DNA, in which data is encoded directly into the genetic material. Corporate, university, and government researchers are actively pursuing DNA’s potential for persisting data. DNA can store massive amounts of information, is millions of times more efficient than anything we have today, requires almost no maintenance, and can endure for many millennia.
Figure 3. Betting on DNA storage (image by geralt)
The challenge with DNA storage, however, is that it's error-prone and expensive to produce. To address these issues, scientists have been experimenting with multiple solutions. For example, researchers at the University of Texas at Austin have come up with error-correcting algorithms that help compensate for the high rate of errors. Using synthetic DNA, they have successfully stored the entire book The Wizard of Oz, translated into Esperanto. But this is nothing compared to DNA's true potential. As many have claimed, DNA could make it possible to store the entire internet in a shoe box.
Despite the enthusiasm around DNA storage, researchers are also investigating different molecular storage techniques, using molecules that are smaller than DNA and other long-chain polymers. The big advantage here is that smaller molecules can be cheaper and easier to produce, and they have the potential for storing more data. If that’s not small enough, scientists are also researching single-atom data storage, with each bit stored in an individual atom. So far, I’ve come across no discussions about going smaller.
Where do we go from here?
If technologies such as molecular storage and silica glass storage can be manufactured in a way that is both efficient and cheap, we'll be better prepared to handle all the data that's expected in the years to come. But we have a long way to go before we get there, and until then, we'll have to rely on the advancements being made with NAND flash and its alternatives, as well as with SCM. What we'll do with all that data once we figure out how to store it is another matter altogether. In terms of storage, however, the sky is indeed the limit.
Storage 101: Data Security and Privacy
The series so far:
- Storage 101: Welcome to the Wonderful World of Storage
- Storage 101: The Language of Storage
- Storage 101: Understanding the Hard-Disk Drive
- Storage 101: Understanding the NAND Flash Solid State Drive
- Storage 101: Data Center Storage Configurations
- Storage 101: Modern Storage Technologies
- Storage 101: Convergence and Composability
- Storage 101: Cloud Storage
- Storage 101: Data Security and Privacy
Most discussions around storage inevitably lead to the topics of data security and privacy. You cannot operate in today’s climate without careful attention to both. If data protection is not built into your storage infrastructure, you’re doing something wrong.
Data protection is an ongoing, organization-wide effort in which storage plays a key role. A secure storage infrastructure is essential to safeguarding sensitive information. Even so, it takes more than secure storage to guarantee the data’s safekeeping throughout its lifespan. For that, an organization needs a comprehensive data protection strategy that takes into account all aspects of data management, including how data is stored.
Securing Data and Protecting Privacy
For many organizations, their most important asset is their data, the bulk of which must be protected against unauthorized access. The data might include intellectual property, legal documents, passwords, encryption keys, personally identifiable information (PII), or a variety of other sensitive material.
An organization that handles sensitive data should have a comprehensive data protection strategy in place to contend with potential threats. Unfortunately, the exact meaning of data protection is not always clearcut and can vary depending on usage and circumstances. It might refer to securing data, safeguarding privacy, protecting storage systems, implementing disaster recovery (DR), or any combination of these.
According to the SNIA (formerly the Storage Networking Industry Association), data protection is the “assurance that data is not corrupted, is accessible for authorized purposes only, and is in compliance with applicable requirements.” In other words, data protection goes beyond just encrypting data or guaranteeing its availability. Data protection ensures that the data remains viable, is safeguarded against all unauthorized access at all times, and is controlled in a way that adheres to applicable compliance laws and regulations, e.g., local, provincial, and federal.
In this view of data protection, storage security is only part of a larger effort to keep sensitive data out of the wrong hands, while ensuring its accuracy and availability to authorized users. To this end, you’ll sometimes see storage security described in terms of confidentiality, integrity, and availability—or CIA—which go hand-in-hand with the larger goal of data protection.
A comprehensive data protection strategy ensures both data security and data privacy. Although the two are related, they’re not the same. Data security protects sensitive information from unauthorized access and from loss and corruption, whether intentional or accidental. Data privacy refers to the appropriate handling of PII and the rights of individuals to control and access their personal information.
With the increasing number of regulations that govern PII, organizations are under greater pressure than ever to protect confidential information and provide a full accounting of how it’s managed. Regulations can vary from region to region and differ significantly. Many organizations operate across multiple regions, making them subject to a mix of governing laws. The General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), and California Consumer Privacy Act (CCPA) are only some of the regulations that organizations now face. Even if an organization operates in only one region, they can still be subject to a confusing mix of laws.
Despite the differences between regulations, however, they all have one thing in common: they require organizations to implement strict security controls that ensure personal information cannot be compromised while under the organization's control. Most also define other obligations, such as retention or auditing requirements, but data protection lies at the heart of each one, which is why data security and privacy are intrinsically linked.
The Cybersecurity Threat Landscape
Data is not only an asset. It’s a burden. A data breach can lead to lost revenue, stiff penalties, downtime, legal liabilities, loss of intellectual property, unexpected expenses, and a tarnished reputation from which a company might never recover. No organization is immune to the potential threats that await them, from both inside and outside their domains.
External threats can come from governments, organized crime, terrorists, cybercriminals, competitors, or everyday hackers looking for a bit of sport or profit. And threats can arrive in many forms, often implemented through social engineering strategies that attempt to introduce malware or ransomware or steal user credentials.
Figure 1. The challenges of data security (image by madartzgraphics)
Attackers might also go after an organization’s networks or systems directly, leveraging vulnerabilities to carry out SQL injections, denial-of-service attacks, or other nefarious acts in an attempt to steal data or bring down operations. Their reasons for attacking an organization can vary. They might want to damage the organization’s credibility, acquire sensitive information for competitive advantages, access government secrets, or make money by selling stolen data or locking up systems until a ransom is paid.
Victims of cybercrime range from individuals, households, and small municipalities to multinational corporations, including Twitter, Facebook, Yahoo, Equifax, eBay, LinkedIn, and Marriott International. And the threats don't only come from external players. Organizations must also guard against internal threats, whether from disgruntled or greedy employees, malicious insiders, or careless staff falling victim to weak IT security policies. Data compromised as a result of internal behavior can be just as devastating as an external attack.
Many organizations are also turning to the cloud to store data and support their workloads. Although cloud platforms can often be more secure than an organization’s own environment, they also add storage and data complexity, while increasing data exposure. An organization must rely completely on the provider to ensure that data is being protected from internal and external threats. At the same time, the cloud raises compliance concerns, especially when spanning multiple regions.
Organizations must also contend with the increased risks that come with a remote workforce, whose numbers have grown dramatically with COVID-19. The more people working offsite, the more difficult it becomes to ensure that sensitive data is not being exposed when it is transferred or stored. A home worker might use an unsanctioned cloud service, mislay a peripheral device that contains business data, collaborate on a project through an insecure platform, or engage in other risky behavior. Even under the best circumstances, few home offices can achieve the same level of physical security you get in a data center.
Implementing a Data Protection Strategy
To ensure data security and privacy, you need a comprehensive plan that specifies how data will be protected both at rest and in motion. As part of this process, you should develop policies that define where data can be stored, who can access it, and what levels of protection the data requires. The policies should also address such issues as when data is deleted, what happens when an employee is terminated, how to handle a data breach and any other issues related to data protection.
Another important part of the planning process is to conduct a thorough assessment of your current data environment to identify potential risks and the steps that must be taken to mitigate those risks. You need to know where sensitive data is located, how it’s being used, and who can access it. You should also look for issues such as whether sensitive data is being transmitted as cleartext, credentials are being sent in an unencrypted format, or users are accessing internal web services via insecure HTTP.
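Even a crude script can surface some of these issues during an assessment. The toy check below flags internal endpoints that are still reached over plain HTTP; the endpoint list is made up for illustration:

```python
# Toy assessment check: flag internal service endpoints still accessed over
# plain HTTP. The endpoint list is hypothetical.
endpoints = [
    "https://hr.internal.example.com/api",
    "http://legacy-reports.example.com/login",   # cleartext!
    "https://files.example.com/share",
]

insecure = [url for url in endpoints if url.startswith("http://")]
for url in insecure:
    print(f"WARNING: {url} is served over HTTP; credentials and data travel in cleartext")
```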
From this assessment, you’ll have a good sense of what data you have and where it’s located. You can then classify the data based on security and compliance requirements. This will help you determine what levels of access to assign to each data type, as reflected in your security policies. Public data, for example, requires far less security than data covered by HIPAA or the GDPR or data governed by national security laws.
If your organization is subject to multiple regulations, you might consider a protection-by-default approach for personal data, rather than trying to create too many data classifications. For example, one regulation might require that you protect user IP addresses, while another does not. It might be better to create a single category that covers both. Too many data categories can complicate data management and increase the risk of regulatory violations.
A simpler category structure can also make it easier to address other compliance-related issues, such as providing users with visibility into their PII or supporting their deletion requests. At the same time, you must still take into account issues such as data retention and disposal requirements, which might force you to create additional categories.
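One way to make protection-by-default concrete is to express the classification scheme as a simple lookup that falls back to the most protective category. The category names, retention periods, and handling rules below are illustrative assumptions, not recommendations:

```python
# A sketch of a protection-by-default classification map.
CLASSIFICATIONS = {
    "public": {
        "encrypt_at_rest": False,
        "access": "all-employees",
        "retention_days": None,          # keep indefinitely
    },
    "internal": {
        "encrypt_at_rest": True,
        "access": "employees-need-to-know",
        "retention_days": 365 * 7,
    },
    "personal-data": {                   # default bucket for anything PII-like
        "encrypt_at_rest": True,
        "access": "least-privilege",
        "retention_days": 365 * 2,       # example figure; verify against your regulations
        "supports_subject_deletion": True,
    },
}

def handling_rules(category: str) -> dict:
    # Anything unclassified falls back to the most protective bucket.
    return CLASSIFICATIONS.get(category, CLASSIFICATIONS["personal-data"])

print(handling_rules("marketing-export"))  # unknown category -> treated as personal data
```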
Another part of the planning process is to ensure that you have the tools you need to safeguard your systems and their data. For example, you might implement a data loss prevention (DLP) solution to help automatically discover, monitor, and protect sensitive data. You might benefit from an intrusion detection system (IDS) that identifies traffic anomalies and warns you if something doesn’t look right.
Essential tools for protecting your data include anti-malware, anti-ransomware, and anti-spyware, as well as protections such as firewalls and proxy servers. And, of course, you want to be sure you deploy the proper storage protections. For example, you might implement RAID and other redundancies to provide storage fault tolerance, which can help protect against intentional or unintentional data destruction.
There are plenty of other tools as well. Just remember that no one solution can address all your data protection requirements, and you’ll have to come up with just the right mix to meet your specific needs.
Protecting Data and Privacy
Data protection must take into account both physical and operational security. Physical security ensures that unauthorized individuals cannot access the physical structures where the data is housed or the equipment within those structures. It also protects against circumstances that could lead to data loss, such as power failures or natural disasters. To implement physical security, an organization might employ backup and restore protocols, CCTV monitoring, biometric readers, geofencing, backup generators, and numerous other protections.
Organizations must also protect the individual systems within their secure structures, such as servers or workstations. No one on the inside should be able to walk off with equipment or get at their internal workings unless they’re authorized to do so. IT teams must also take steps to protect portable devices that leave the premises, such as laptops, tablets, or cell phones. This typically means implementing a mobile device management strategy that supports such features as remote lock or remote wipe.
In addition to ensuring the physical security, organizations must implement operational protections, which provide the technical safeguards necessary to protect the data itself. This starts with using advanced algorithms to encrypt sensitive data both at rest and in motion. In addition, IT teams might consider such tools as tokenization or data masking for further protection. They should also have in place a system for securely storing and managing encryption keys.
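As a simple illustration of encrypting sensitive data at rest, here is a sketch using the widely used Python cryptography package's Fernet recipe (symmetric, authenticated encryption). In a real deployment, the key would be generated and held by a key-management service, not created inline:

```python
# Minimal sketch of encrypting a record at rest with the "cryptography" package.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # store and retrieve via your KMS, not in code
fernet = Fernet(key)

record = b'{"name": "Ada Lovelace", "ssn": "000-00-0000"}'  # fake PII
ciphertext = fernet.encrypt(record)  # safe to write to disk or object storage

# Later, an authorized service holding the key can recover the plaintext.
plaintext = fernet.decrypt(ciphertext)
assert plaintext == record
```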
Another important component of operational security is role-based access control, which determines who can and who cannot view or modify specific sets of data. Access should be based on the principle of least privilege, that is, individuals should be granted only the access they need to do their jobs—and no more. In conjunction with access control, IT should also implement such safeguards as multi-factor authentication or virtual private networks (VPNs), as appropriate, to further protect data access.
An effective data protection strategy also requires a comprehensive infrastructure for continuously monitoring sensitive data, issuing real-time alerts, and generating comprehensive reports on-demand. All data access and modifications should be logged, with an auditing system in place to determine who accessed what data and when that access took place.
Operational protections also include DR systems that ensure data can be made available in the event of data loss or corruption, no matter what the cause. At the same time, IT must be careful that their DR mechanisms, such as backups or replication, don’t violate applicable regulations. Also, they must ensure that PII can be accessed in a timely manner, if required by law, and that it adheres to retention and disposition rules.
The Ongoing Challenges of Data Protection
To implement effective data protections, an organization must take into account the entire data lifecycle, regardless of how the data is being used or where it resides—whether on a remote workstation, on a mobile device, in a data center, on a cloud platform, at a remote facility, or on a server in an office corner. Data protection must be a unified effort that moves beyond infrastructure boundaries to ensure that data is secure, and privacy is protected at all times and under all circumstances.
One of the most important tools that organizations have for protecting data is an effective training and education program that helps employees understand the risks involved with handling data and the steps they can take to minimize those risks. Everyone in an organization should have a clear understanding of that organization’s data usage policies and how best to protect sensitive data. All it takes is one careless act to create a data nightmare.
Data protection is an ongoing, all-encompassing process that extends from the backend storage systems to the smartphones that employees carry in their pockets. Storage security is an integral part of this process and can serve as your last line of defense against intrusion. That said, storage security cannot protect against all risks, just like a firewall alone can’t eliminate all network threats. Any place where data might reside or that provides a means for its access should be considered a potential risk and dealt with accordingly.
Storage 101: Cloud Storage
The series so far:
- Storage 101: Welcome to the Wonderful World of Storage
- Storage 101: The Language of Storage
- Storage 101: Understanding the Hard-Disk Drive
- Storage 101: Understanding the NAND Flash Solid State Drive
- Storage 101: Data Center Storage Configurations
- Storage 101: Modern Storage Technologies
- Storage 101: Convergence and Composability
- Storage 101: Cloud Storage
Throughout this series, I’ve discussed a range of storage-related topics, some of which I mentioned only briefly. One of those is cloud storage, which now plays a vital role in today’s data management strategies. Organizations of all types and sizes now employ cloud storage to varying degrees, either to supplement their on-premises systems or to handle the bulk of their data. Because cloud storage has become so pervasive, this article focuses exclusively on that topic, describing what it is, how it works, and its benefits and challenges.
Cloud storage refers to a system in which data storage is offered as a service by a cloud provider, usually as part of other cloud-based services. The provider stores the data in its own data centers or leases space in third-party data centers. The data centers might be confined to one geographic area or spread out across multiple regions.
With cloud storage, the provider oversees all back-end operations that go with maintaining the storage infrastructure and delivering storage services. A team of experts manages the hosting environments, protects them and their data, and handles administrative tasks such as updating systems, implementing data protections, and ensuring availability.
For now, I’m restricting the context to public cloud storage; I cover private cloud later in this article. In the public model, customers typically pay a monthly rate based on the amount of storage space they use and on selected options such as storage type, data ingress/egress, or performance level. Customers can add or remove capacity on demand, providing them with an extremely flexible data solution.
Cloud Storage Architecture
Cloud storage is delivered through a virtualized infrastructure that logically pools physical storage resources and presents them as services that are accessible through a centralized portal for easy allocation. Customers can also interact with the storage pool through a public API that facilitates data access and management. The storage pool can span multiple servers or even multiple locations, with the data itself distributed across drives. The combination of distributed data and the operation’s inherent redundancy delivers a high degree of fault tolerance and reliability.
Figure 1 provides a conceptual overview of the cloud storage architecture. The exact implementation varies by service provider, with new technologies continuously emerging and existing ones evolving. Even so, the figure should give you a sense of how the physical storage resources are abstracted into logical storage pools that can then be consumed as part of a data management strategy.
Figure 1. Cloud storage architecture (image by Leoinspace, licensed under Creative Commons Attribution-Share Alike 3.0 Unported)
Figure 1 also shows that a provider might support up to three storage types: object, block, and file. I explained each type in the first article in this series, but here's a summary for your convenience:
- Object storage has its origins in the cloud and was developed to address the growing influx of unstructured data. In object storage, data is broken into self-contained, modular units (objects) that include identifiers and customizable metadata, which simplifies access and supports advanced analytics. Object storage does not perform as well as block storage, especially when it comes to writing data, which is why it’s a good fit for workloads that are not latency-sensitive, such as backups and archiving.
- Block storage breaks data into chunks (blocks) that include no metadata, unless you count the block’s unique identifier. Storage area networks (SANs) use block storage because it is fast and efficient, well-suited for workloads such as email servers, relational databases, and virtual desktop infrastructures. Block storage doesn’t scale as well as object storage, yet it’s still the go-to type for latency-sensitive workloads.
- File storage has the familiar hierarchical structure you find in such applications as Windows Explorer or macOS Finder. It’s the type of storage used by your everyday applications. Both direct-attached storage (DAS) and network-attached storage (NAS) commonly use file storage. It’s easy to work with and understand; however, it has strict scalability limitations and supports little metadata.
Some cloud storage providers might offer only one or two storage types, but most of today’s major players now offer all three, with object and block storage supporting the bulk of enterprise workloads.
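To show how little ceremony the object storage model involves, here is a sketch using Amazon S3 through the boto3 SDK as one example of a provider's public API. Bucket names, keys, and files are placeholders, and credentials are assumed to come from the environment or an IAM role:

```python
# Writing and reading an object through a cloud provider's storage API.
import boto3

s3 = boto3.client("s3")

# Write an object: the "file" is just bytes plus a key and optional metadata.
with open("db-dump.sql.gz", "rb") as f:  # placeholder local file
    s3.put_object(
        Bucket="example-backups",
        Key="2024/db-dump.sql.gz",
        Body=f,
        Metadata={"source": "nightly-backup"},
    )

# Read it back from any client with the right permissions.
obj = s3.get_object(Bucket="example-backups", Key="2024/db-dump.sql.gz")
data = obj["Body"].read()
```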
The Tempting World of Cloud Storage
Many organizations turn to cloud storage because of its pay-for-use, consumption-based payment structure, which enables them to move from a capital expense (CapEx) model to an operating expense (OpEx) model. Not only does this eliminate the steep up-front costs that come with on-premises storage solutions, but it also avoids having to over-provision storage resources to accommodate fluctuating workloads or anticipated increases in data volumes.
With cloud storage, customers pay only for the storage they need when they need it. There might be base fees attached to the service, but these may be trivial compared to purchasing, housing, and maintaining a system on-premises. That said, cloud services costs can quickly add up, as I’ll explain in a bit.
Related to cost is the inherent elasticity of cloud storage. Customers can scale capacity up and down as their business requirements change without having to negotiate the hurdles and expenses that come with scaling on-premises systems. This elasticity also eliminates the type of over-provisioning that comes with on-premises solutions. Capacity in the cloud is almost unlimited, as long as you can afford to pay for it.
Cloud storage also has the advantage of 24×7 global availability from just about anywhere connectivity can be established, making it possible for business users and applications with an internet connection to access the data whenever they need it. They can even use protocols such as Web Distributed Authoring and Versioning (WebDAV) to map local drives to cloud storage, providing even more flexibility.
Another reason many organizations use cloud storage is that it’s incredibly user-friendly, whether you’re an administrator, developer, tester, or end-user. The centralized portal and API facilitate quick and efficient operations that make it simple to provision new storage, change existing configurations, collaborate with other users, and work with the data. Cloud storage accelerates and streamlines most of the day-to-day operations that come with on-premises storage.
Cloud storage also includes built-in data protections that address both data loss and security. As already mentioned, the built-in redundancy and distributed data lead to a high degree of fault tolerance. Cloud providers also take steps to protect against data loss and lack of availability in the event of hardware failure, natural disaster, or human error.
In addition, providers implement safeguards to ensure the data is protected from both internal and external threats, implementing such protections as monitoring, firewalls, encryption, intrusion detection, multi-factor authentication, and physical security. Most vendors recognize that their reputations are at stake and will take whatever steps necessary to avoid risking a data breach, often exceeding the protections implemented by many organizations.
The Dark Side of Cloud Storage
Despite the protections that cloud storage providers have in place, security remains one of the top reasons that organizations are hesitant to put their data out on the cloud, in no small part because of the number of big-name breaches to make the headlines. Even under the best circumstances, storing data in the cloud increases the attack surface area, with data crossing more networks, distributed to more locations, and replicated more frequently. The larger the attack surface area, the higher the chance that the data can be compromised.
Cloud providers, especially the big players, have big bullseyes painted on them, with international cybercriminals using the most sophisticated weaponry available to try to penetrate infrastructure defenses. In addition, providers must guard against internal threats, which might come in the form of espionage, rogue employees, or careless in-house practices.
Providers must also take into account privacy concerns and applicable compliance regulations. Customers have a lot at stake when it comes to protecting personal data. Compromised data can result in lawsuits, stiff penalties, and tarnished reputations from which a company might never recover. Although cloud providers have taken steps to address privacy concerns and conform to applicable regulations, many organizations still believe the risks are too great to trust their sensitive data to the cloud.
Even if organizations are comfortable with the protections that a provider offers, they must still take into account the cost of storing data in the cloud. On the surface, the OpEx model might appear to be an affordable alternative to on-premises storage, but a long-range analysis that calculates the true total cost of ownership (TCO) often paints a much different picture, once you take into account ongoing subscription fees, additional charges for extra capacity and performance, data transfer fees, charges for high-speed network connections, and other factors. And unlike equipment you own, you have no assets to sell once you're done with the service.
At the same time, cloud storage also means losing control over the data. Organizations can access, update, and move their data as necessary, but they don’t get to determine when security patches are applied or maintenance windows scheduled, nor can they control how systems are optimized or when newer technologies might be implemented. The provider regulates just about every aspect of every operation related to the platform.
In addition, organizations that access their data through the Internet are at the mercy of the ebbs and flows of the endless traffic streams. To improve connectivity, they might deploy dedicated private WANs (increasing costs). However, even if they bump up the bandwidth, applications running on-premises and getting their data from the cloud can still encounter significant latency issues. Customers might find that they need to move their applications to the same cloud platform, increasing costs and adding to the risks of vendor lock-in.
Another concern with cloud storage is whether an organization can depend on the provider to remain in the cloud storage business for the foreseeable future. The company might go bankrupt, get acquired by another company, change its business strategy, experience a disaster, or disappear for any number of other reasons. Not only does this put your day-to-day operations at risk, but it might also prevent you from being able to access your data.
Private and Hybrid Cloud Storage
Although many organizations have concerns about cloud storage, they like the service-based delivery model that the cloud affords, which is why some deploy private or hybrid clouds. A private cloud is a dedicated platform that offers storage and compute resources as services, similar to a public cloud. The components that make up a private cloud infrastructure might be housed on-premises or in a colocation facility, and in either case, the organization has complete control over the components.
A private cloud offers some of the same flexibility, scalability, efficiency, and ease-of-use as a public cloud, although not nearly to the same degree, especially when it comes to flexibility and scalability. Even so, a private cloud could be a useful solution for organizations that want more control over their storage or that have strict security and compliance requirements, which is often the case for organizations such as government agencies, financial institutions, or healthcare companies.
Despite the benefits of a private cloud, implementing and maintaining the infrastructure can be a complex and costly undertaking, and organizations must be well prepared to take on such a project. Not only does this require careful planning and budgeting, but it might also mean bringing on personnel with the necessary expertise. As an alternative, some companies go with consumption-based storage, which takes a pay-as-you-go approach to delivering managed, on-premises services.
In some cases, an organization will want to deploy a private cloud but still use public cloud services for some of its storage. One option is to maintain separate operations, with the public and private clouds treated as individual efforts. However, this can increase management complexities and contribute to data siloes.
Another approach is the hybrid cloud, in which private and public cloud storage are coordinated by an orchestration layer that integrates operations across multiple platforms. With an effective hybrid cloud solution in place, an organization can maintain strict control over its sensitive data, while maximizing the advantages of both private and public cloud platforms.
There is, of course, a lot more to private and hybrid cloud systems than what I’ve covered here, as is the case with most storage-related concepts, but this should give you a sense of some of the options available for implementing cloud storage.
Making the Most of Cloud Storage
Initially, cloud storage was seen as a vehicle for reducing CapEx for smaller organizations, allowing them to store their data on a public cloud platform without paying for storage they did not need. Now organizations of all sizes and types are leveraging cloud storage, taking advantage of the different deployment options to support a wide range of workloads, such as disaster recovery, file archiving, DevOps development processes, seasonal fluctuations, Internet of Things (IoT) analytics, or any number of other possible use cases.
But as I mentioned earlier, cloud storage also comes with several challenges. For example, there are security and privacy concerns as well as issues related to vendor lock-in and customer control. One concern I didn’t discuss is whether the cloud will be able to handle the anticipated increase in data volumes over the next few years. But that’s a discussion I’ll leave for another time.
Storage 101: Convergence and Composability
The series so far:
- Storage 101: Welcome to the Wonderful World of Storage
- Storage 101: The Language of Storage
- Storage 101: Understanding the Hard-Disk Drive
- Storage 101: Understanding the NAND Flash Solid State Drive
- Storage 101: Data Center Storage Configurations
- Storage 101: Modern Storage Technologies
- Storage 101: Convergence and Composability
These days, discussions around storage inevitably lead to the topics of convergence and composability, approaches to infrastructure that, when done effectively, can simplify administration, improve resource utilization, and reduce operational and capital expenses. These systems currently fall into three categories: converged infrastructure, hyperconverged infrastructure (HCI), and composable infrastructure.
Each infrastructure, to a varying degree, integrates compute, storage, and network resources into a unified solution for supporting various types of applications and workloads. Although every component is essential to operations, storage lies at the heart of each one, often driving the entire architecture. In fact, you’ll sometimes see them referred to as data storage solutions because of the vital role that storage plays in supporting application workloads.
Convergence and composability grew out of the need to address the limitations of the traditional data center, in which systems are siloed and often over-provisioned, leading to higher costs, less flexibility, and more complex management. Converged, hyperconverged, and composable infrastructures all promise to address these issues, but take different approaches to getting there. If you plan to purchase one of these systems, you need to understand those differences in order to choose the most effective solution for your organization.
Before discussing each infrastructure, review this matrix, which summarizes the differences.
Converged Infrastructure
A converged infrastructure consolidates compute, storage, network, and virtualization technologies into an integrated platform optimized for specific workloads, such as a database management system or virtual desktop infrastructure (VDI). Each component is prequalified, preconfigured, and assembled into a highly engineered system to provide a complete data solution that’s easier to implement and maintain than a traditional infrastructure.
The components that make up a converged infrastructure can be confined to a single rack or span multiple racks, depending on the supported workloads. Each component serves as a building block that works in conjunction with the other components to create a unified, integrated platform.
Despite this integration, each component remains a discrete resource. In this way, you can add or remove individual components as necessary, while still retaining the infrastructure’s overall functionality (within reason, of course). In addition, you can reuse a removed component for other purposes, an important distinction from HCI.
A converged infrastructure can support a combination of both hard-disk drives (HDDs) and solid-state drives (SSDs), although many solutions have moved toward all-flash storage. The storage is typically attached directly to the servers, with the physical drives pooled to create a virtual storage area network (SAN).
In addition to the hardware components, a converged infrastructure includes virtualization and management software. The platform uses the virtualization software to create resource pools made up of the compute, storage, and network components so they can be shared and managed collectively. Applications see the resources as pooled capacities, rather than individual components.
Despite the virtualization, the converged infrastructure remains a hardware-based solution, which means it doesn’t offer the agility and simplicity that come with software-defined systems such as hyperconverged and composable infrastructures. Even so, when compared to a traditional infrastructure, a converged infrastructure can help simplify management enough to make it worth serious consideration.
The components that make up a converged infrastructure are deployed as a single platform that’s accessible through a centralized interface for controlling and monitoring the various systems. In many cases, this eliminates the need to use the management interfaces available to the individual components. At the same time, you can still tune those components directly if necessary, adding to the system’s flexibility.
Another advantage to the converged infrastructure is that the components are prevalidated to work seamlessly within the platform. Not only does this help simplify procuring the components, but it also makes it easier to install them into the infrastructure, a process that typically takes only a few minutes. That said, it’s still important to validate any new storage you add to the infrastructure to ensure it delivers on its performance promises. Prevalidated components also reduce deployment risks because you’re less likely to run into surprises when you try to install them.
There are plenty of other advantages as well, such as quick deployments, reduced costs, easy scaling and compatibility with cloud computing environments. Even so, a converged infrastructure is not for everybody. Although you have flexibility within the platform’s structure, you have few options for altering the basic configuration, resulting in less flexibility should you want to implement newer application technologies.
In addition, a converged infrastructure inevitably leads to vendor lock-in. Once you’ve invested in a system, you’ve essentially eliminated products that are not part of the approved package. That said, you’d be hard-pressed to deploy any highly optimized and engineered system without experiencing some vendor lock-in.
If you choose a converged infrastructure, you have two basic deployment options. The first is to purchase or lease a dedicated appliance such as the Dell EMC VxBlock 1000. In this way, you get a fully configured system that you can deploy as soon as it arrives. The other option is to use a reference architecture (RA) such as the HP Converged Infrastructure Reference Architecture Design Guide. The RA provides hardware and configuration recommendations for how to assemble the infrastructure. In some cases, you can even use your existing hardware.
Hyperconverged Infrastructure
The HCI platform takes convergence to the next level, moving from a hardware-based model to a software-defined approach that abstracts the physical compute, storage, and (more recently) network components and presents them as shared resource pools available to the virtualized applications.
Hyperconvergence can reduce data center complexity even further than the converged infrastructure while increasing scalability and facilitating automation. HCI may add features such as data protection, data deduplication, intelligent management, and cloud bursting.
An HCI solution is typically made up of commodity compute, storage, and network components that are assembled into self-contained and highly integrated nodes. Multiple nodes are added together to form a cluster, which serves as the foundation for the HCI platform. Because storage is attached directly to each node, there is no need for a physical SAN or network-attached storage (NAS).
Each node runs a hypervisor, management software, and sometimes other specialized software, which work together to provide a unified platform that pools resources across the entire infrastructure. To scale out the platform, you simply add one or more nodes.
Initially, HCI targeted small-to-midsized organizations that ran specific workloads, such as VDI. Since then, HCI has expanded its reach to organizations of all sizes, while supporting a broader range of workloads, including database management systems, file and print services, email servers, and specialized solutions such as enterprise resource planning.
Like a converged infrastructure, a hyperconverged system offers a variety of benefits, beginning with centralized management. You can control all compute, storage and network resources from a single interface, as well as orchestrate and automate basic operations. In addition, an HCI solution reduces the amount of time and effort needed to manage the environment and carry out administrative tasks. It also makes it easier to implement data protections while ensuring infrastructure resilience.
An HCI solution typically comes as an appliance that is easy to deploy and scale. An organization can start small with two or three nodes (usually three) and then add nodes as needed, with minimal downtime. The HCI software automatically detects the new hardware and configures the resources. Upgrades are also easier because you’re working with a finite set of hardware/software combinations.
An HCI solution can integrate with other systems and cloud environments, although this can come with some fairly rigid constraints. Even so, the integration can make it easier to accommodate the platform in your current operations. The solution can also help reduce costs by simplifying administration and improving resource utilization, while supporting the use of commodity hardware.
Despite these advantages, an HCI solution is not without its challenges. For example, vendor lock-in is difficult to avoid, as with the converged infrastructure, and HCI systems are typically preconfigured for specific workloads, which can limit their flexibility. In addition, nodes are specific to a particular HCI platform. You can’t simply pull a node out of the platform and use it for other purposes, nor can you add nodes from other HCI solutions.
HCI’s node-centric nature can also lead to over-provisioning. For example, you might need more storage but no additional compute, yet you can’t add one without the other. You must purchase a node in its entirety, resulting in more compute resources than you need. Fortunately, many HCI solutions now make it possible to scale compute and storage resources separately or have taken other steps to disaggregate resources. Even so, you’re still limited to a strictly defined upgrade path when scaling your systems.
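To make the node-centric scaling problem concrete, here is a minimal sketch that estimates how much surplus compute a storage-only expansion can strand. The per-node capacity and core count are hypothetical figures, not the specifications of any particular HCI product.

```python
import math

def hci_storage_expansion(extra_storage_tb, node_storage_tb, node_cores, cores_needed=0):
    """Estimate the nodes required to add storage to a node-centric HCI cluster,
    along with the compute capacity that comes with them whether needed or not.
    All node sizes here are hypothetical, for illustration only."""
    nodes = math.ceil(extra_storage_tb / node_storage_tb)
    return {
        "nodes_to_buy": nodes,
        "storage_added_tb": nodes * node_storage_tb,
        "cores_added": nodes * node_cores,
        "surplus_cores": max(nodes * node_cores - cores_needed, 0),
    }

# Example: you need 40 TB more storage but no additional compute.
print(hci_storage_expansion(extra_storage_tb=40, node_storage_tb=12, node_cores=32))
# -> 4 nodes, 48 TB of storage, and 128 cores you didn't ask for
```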
You have three primary options for deploying an HCI solution. The first two are the same as a converged infrastructure. You can purchase or lease a preconfigured appliance, such as HPE SimpliVity, or you can use an RA, such as Lenovo’s Reference Architecture: Red Hat Hyperconverged Infrastructure for Virtualization. The third option is to purchase HCI software and build the platform yourself. For example, Nutanix offers its Acropolis software for deploying HCI solutions.
Composable Infrastructure
A composable infrastructure pushes beyond the limits of convergence and hyperconvergence by offering a software-defined infrastructure made up entirely of disaggregated, commodity components. The composable infrastructure abstracts compute, storage, and network resources and presents them as a set of unified services that can be allocated on-demand to accommodate fluctuating workloads. In this way, resources can be dynamically composed and recomposed as needed to support specific requirements.
You’ll sometimes see the term composable infrastructure used interchangeably with software-defined infrastructure (SDI) or infrastructure as code (IaC), implying that they are one and the same, but this can be misleading. Although a composable solution incorporates the principles of both, it would be more accurate to say that the composable infrastructure is a type of SDI that facilitates development methodologies such as IaC. In this sense, the composable infrastructure is an SDI-based expansion of IaC.
Regardless of the labeling, the important point is that a composable infrastructure provides a fluid pool of resources that can be provisioned on-demand to accommodate multiple types of workloads. A pool can consist of just a couple compute and storage nodes or be made up of multiple racks full of components. Organizations can assemble their infrastructures as needed, without the node-centric restrictions typical of HCI.
The composable infrastructure includes intelligent software for managing, provisioning, and pooling the resources. The software is programmable, self-correcting, and template-driven. The infrastructure also provides a standards-based management API for programmatically allocating (composing) resources. Administrators can use the API to control the environment, and developers can use the API to build resource requirements into their applications. The API also enables integration with third-party tools, making it possible to implement a high degree of automation.
A composable infrastructure can support applications running on bare metal, in containers, or in VMs. The infrastructure’s service-based model also makes it suitable for private or hybrid clouds and for workloads that require dynamic resource allocation, such as artificial intelligence (AI). In theory, a composable infrastructure could be made up of any commodity hardware components, and those components could span multiple locations. In reality, today’s solutions come nowhere close to achieving this level of agility, but like SDI, this remains a goal for many.
That said, today’s composable infrastructure solutions still offer many benefits, with flexibility at the top of the list. The infrastructure can quickly and automatically adapt to changing workload requirements, run applications in multiple environments (bare metal, containers, and VMs), and support multiple application types. This flexibility also goes hand-in-hand with better resource utilization. Resources can be scaled independently and provisioned based on workload requirements, without being tied to predefined blocks or nodes.
The composable platform also simplifies management and streamlines operations by providing a single infrastructure model that’s backed by a comprehensive API, which is available to both administrators and developers. IT teams can add components to the infrastructure in plug-and-play fashion, and development teams can launch applications with just a few clicks.
The composable infrastructure makes it possible to compose and decompose resources on demand, while supporting a high degree of automation and orchestration, as well as DevOps methodologies such as IaC. These benefits—along with increased flexibility and better resource utilization—can lead to lower infrastructure costs, in terms of both CapEx and OpEx.
However, the composable infrastructure is not without its challenges. As already pointed out, these systems have a long way to go to achieve the full SDI vision, leaving customers subject to the same vendor lock-in risks that come with converged and hyperconverged solutions.
In addition, the composable infrastructure represents an emerging market, with the composable software still maturing. The industry lacks agreed-upon standards or even a common definition. For example, HPE, Dell EMC, Cisco, and Liqid all offer products referred to as composable infrastructures, but those products are very different from one another, with each vendor putting its own spin on what composability means.
No doubt the market will settle down at some point, and we’ll get a better sense of where composability is heading. In the meantime, you already have several options for deploying a composable infrastructure. You can purchase an appliance that comes ready to deploy, use a reference architecture like you would a blueprint, or buy composable software and build your own. Just be sure you do your homework before making any decisions so you know exactly what you’re getting for your money.
Converged to Composable and Beyond
Choosing a converged or composable infrastructure is not an all-or-nothing proposition. You can mix and match across your organization as best meets your requirements. For example, you might implement HCI systems in your satellite offices but continue to use a traditional infrastructure in your data center. In this way, you can easily manage the HCI platforms remotely and reduce the time administrators need to spend at those sites, while minimizing the amount of space being used at those locations.
Data and storage will play a pivotal role in any infrastructure decisions. To this end, you must take into account multiple considerations, including application performance, data protection, data quantities, how long data will be stored, expected data growth, and any other factors that can impact storage. You should also consider integration with other systems, which can range from monitoring and development tools to hybrid and public cloud platforms.
Of course, storage is only part of the equation when it comes to planning and choosing IT infrastructure. You must also consider such issues as deployment, management, scalability, resource consolidation, and the physical environment. But storage will drive many of your decisions, and the better you understand your data requirements, the more effectively you can choose an infrastructure that meets your specific needs.
Storage 101: Data Center Storage Configurations

Today’s IT teams are turning to a variety of technologies to provide storage for their users and applications. Not only does this include the wide range of hard disk drives (HDDs) and solid-state drives (SSDs), but also technologies such as cloud storage, software-defined storage, and even converged, hyperconverged, or composable infrastructures.
Despite the various options, many data centers continue to rely on three traditional configurations: direct-attached storage (DAS), network-attached storage (NAS), and the storage area network (SAN). Each approach offers both advantages and disadvantages, but it’s not always clear when to use one over the other or the role they might play in more modern technologies such as cloud storage or hyperconverged infrastructures (HCIs).
This article explores the three configurations in order to provide you with a better sense of how they work and when to use them. Later in this series, I’ll cover the more modern technologies so you have a complete picture of the available options and what storage strategies might be best for your IT infrastructure. Keep in mind, however, that even with these technologies, DAS, NAS, and SAN will likely still play a vital role in the modern data center.
Direct-Attached Storage
As the name suggests, DAS is a storage configuration in which HDDs or SSDs are attached directly to a computer, rather than connecting via a network such as Ethernet, Fibre Channel, or InfiniBand. Other storage types, such as optical or tape drives, can technically qualify as DAS if they connect directly to the computer, but the term nearly always refers to HDDs or SSDs, and that’s how it’s used in this article.
DAS can connect to a computer internally or externally. External DAS can be a single drive or part of an array or RAID configuration. Whether internal or external, the DAS device is dedicated to and controlled by the host computer.
A computer’s DAS drive can be shared so that other systems can access the drive across the network. Even in this case, however, the computer connected to the drive still controls that drive. Other systems cannot connect to the drive directly but must communicate with the host computer to access the stored data.
DAS connects to a computer via an interface such as Serial-Attached SCSI (SAS), Serial Advanced Technology Attachment (SATA), Small Computer System Interface (SCSI), or Peripheral Component Interconnect Express (PCIe). Along with other storage technologies, the interface can have a significant impact on drive performance and is an important consideration when choosing a DAS drive. (See the first article in this series for information about interfaces and related storage technologies.)
Some IT teams turn to DAS because it typically provides better performance than networked storage solutions such as NAS and SAN. When using DAS, the host server does not need to contend with potential network bottlenecks such as sluggish network speed or network congestion, and the data is by definition in close proximity to the server. Other systems that connect to the host might run into network issues, but the host itself—and the applications that run on it—have unencumbered access to data hosted on DAS.
DAS is also cheaper and easier to implement and maintain than networked systems such as NAS or SAN. A DAS device can often be implemented through a simple plug-and-play operation, with little administrative overhead. And because DAS includes few components beyond the SSD or HDD itself, the price tag tends to be much lower than the networked alternatives.
DAS is not without its downsides, however. Because a server can support only a relatively small number of expansion slots or external ports, DAS has limited scalability. In addition, limitations in the server’s compute resources can also impact performance when sharing a drive, as can the data center’s network if contention issues arise. DAS also lacks the type of advanced management and backup features provided by other systems.
Despite these disadvantages, DAS can still play a vital role in some circumstances. For example, high-performing applications or virtualized environments can benefit from DAS because it’s generally the highest performance option, and DAS eliminates potential network bottlenecks. In addition, small-to-medium sized businesses—or departments within larger organizations—might turn to DAS because it’s relatively simple to implement and manage and costs less.
DAS can also be used in hyperscale systems such as Apache Hadoop or Apache Kafka to support large, data-intensive workloads that can be scaled out across a network of distributed computers. More recently, DAS has been gaining traction in HCI appliances, which are made up of multiple server nodes that include both compute and storage resources. The usable storage in each node is combined into a logical storage pool for supporting demanding workloads such as virtual desktop infrastructures (VDIs).
Network-Attached Storage
NAS is a file-level storage device that enables multiple users and applications to access data from a centralized system via the network. With NAS, users have a single access point that is scalable, relatively easy to set up, and cheaper than options such as SAN. NAS also includes built-in fault tolerance, management capabilities, and security protections, and it can support features such as replication and data deduplication.
A NAS device is an independent node on the local area network (LAN) with its own IP address. It is essentially a server that contains multiple HDDs or SSDs, along with processor and memory resources. The device typically runs a lightweight operating system (OS) that manages data storage and file sharing, although in some cases it might run a full OS such as Windows Server or Linux.
Users and applications connect to a NAS device over a TCP/IP network. To facilitate data transport, NAS also employs a file transfer protocol. Some of the more common protocols are Network File System (NFS), Common Internet File System (CIFS), and Server Message Block (SMB). However, a NAS device might also support Internetwork Packet Exchange (IPX), NetBIOS Extended User Interface (NetBEUI), Apple Filing Protocol (AFP), Gigabit Ethernet (GigE), or one of several others. Most NAS devices support multiple protocols.
NAS devices are generally easy to deploy and operate and relatively inexpensive when compared to SANs. In addition, users and applications running on the same network can easily access their files, without the limitations they might encounter if retrieving data from DAS. NAS devices can also be scaled out or integrated with cloud services. In addition, they provide built-in redundancy while offering a great deal of flexibility.
That said, a NAS device must compete with other traffic on the network, so contention can be an issue, especially if bandwidth is limited. Deploying NAS on a private network can help mitigate that contention, but too many concurrent users can still impact storage performance, not only on the network but also within the NAS device itself. Many NAS devices use HDDs rather than SSDs, increasing the risk of I/O contention as more users try to access storage.
Because of the network and concurrency issues, NAS is often best suited for small-to-medium sized businesses or small departments within larger organizations. NAS might be used for distributing email, collaborating on spreadsheets, or streaming media files. NAS can also be used for network printing, private clouds, disaster recovery, backups, file archives, or any other use cases that can work within NAS’s limitations, without overwhelming the network or file system.
When deciding whether to implement a NAS device, you should consider the number of users, types of applications, available network bandwidth, and any other factors specific to your environment. DAS might be the optimal choice because it’s typically more performant, cheaper, and easier to set up than NAS. On the other hand, you might consider looking to a SAN for increased scalability and additional management features.
Storage Area Network
A SAN is a dedicated, high-speed network that interconnects one or more storage systems and presents them as a pool of block-level storage resources. In addition to the storage arrays themselves, a SAN includes multiple application servers for managing data access, storage management software that runs on those servers, host bus adapters (HBAs) to connect to the dedicated network, and the physical components that make up that network’s infrastructure, which include high-speed cabling and special switches for routing traffic.
SAN storage arrays can be made up of HDDs or SSDs or a combination of both in hybrid configurations. A SAN might also include one or more tape drives or optical drives. The management software consolidates the different storage devices into a unified resource pool, which enables each server to access the devices as though they were directly connected to that server. Each server also interfaces with the main LAN so client systems and applications can access the storage.
There is a widespread myth that SANs are high-performing systems, but historically this has rarely been true. In fact, slow-performing SANs are ubiquitous across data centers and are first and foremost optimized for data management, not performance. However, now that SSDs are becoming more common, hybrid or all-flash SANs are bringing performance to the forefront.
Integral to an effective SAN solution is a reliable, high-performing network capable of meeting workload demands. For this reason, many modern SANs are based on Fibre Channel, a technology for building network topologies that can deliver high bandwidth and exceptional throughput, with speeds up to 128 gigabits per second (16 GB per second). Unfortunately, Fibre Channel is also known for being complex and pricey, causing some organizations to turn to alternatives such as Internet SCSI (iSCSI), Fibre Channel over Ethernet (FCoE), or even NVMe over Fabrics (NVMe-oF).
With the right network topology and internal configuration in place, a SAN can deliver a block-level storage solution that offers high availability and scalability, possibly even high performance. A SAN includes centralized management, failover protection, and disaster recovery, and it can improve storage resource utilization. Because a SAN runs on a dedicated network, the LAN doesn’t have to absorb the SAN-related traffic, eliminating potential contention.
However, a SAN is a complex environment that can be difficult to deploy and maintain, often requiring professionals with specialized skill sets. This alone is enough to drive up costs, but the SAN components themselves can also be pricey. An IT team might try to reduce costs by cutting back in such areas as Fibre Channel or licensed management capabilities, but the result could be lower performance or more application maintenance.
For many organizations—typically larger enterprises—the costs and complexities are worth the investment, especially when dealing with numerous or massive datasets and applications that support a large number of users. SANs can benefit use cases such as email programs, media libraries, database management systems, or distributed applications that require centralized storage and management.
Organizations looking for networked storage solutions often weigh SAN against NAS, taking into account complexity, reliability, performance, management features, and overall cost. NAS is certainly cheaper and easier to deploy and maintain, but it’s not nearly as scalable or fast. Part of the difference is that NAS uses file storage, while a SAN uses block storage, which incurs less overhead but isn’t as easy to work with. Your individual circumstances will determine which storage system is the best fit. (For information about the differences between block and file storage, refer to the first article in this series.)
Moving Ahead with DAS, NAS, and SAN
Like any storage technology, SANs are undergoing a transition. For example, vendors now offer something called unified SAN, which can support both block-level and file-level storage in a single solution. Other technologies are also emerging for bridging the gap between NAS and SAN. One example is VMware vSphere, which makes it possible to use NAS and SAN storage in the same cluster as vSAN, VMware’s virtual SAN technology. Another approach to storage is the converged SAN, which implements the SAN environment on the same network used for other traffic, thus eliminating the network redundancy that comes with a more conventional SAN.
For many organizations, traditional DAS, NAS, and SAN solutions properly sized and configured will handle their workloads with ease. If that’s insufficient, they might consider newer technologies that enhance these core configurations, such as converged or hyperconverged infrastructures. Today’s organization can also take advantage of such technologies as cloud storage, object storage, or software-defined storage, as well as the various forms of intelligent storage that are taking hold of the enterprise.
There is, in fact, no shortage of storage options, and those options grow more sophisticated and diverse every day, as technologies continue to evolve and mature in an effort to meet the needs of today’s dynamic and data-intensive workloads.
Storage 101: Understanding the NAND Flash Solid State Drive
Solid-state drives (SSDs) have made significant inroads into enterprise data centers in recent years, supporting workloads once the exclusive domain of hard-disk drives (HDDs). SSDs are faster, smaller, use less energy, and include no moving parts. They’ve also been dropping in price while supporting greater densities, making them suitable for a wide range of applications.
Despite their increasing presence in the enterprise, there’s still a fair amount of confusion about how SSDs work and the features that distinguish one drive from another. Concepts such as NAND chips, multi-level cells, and floating gate technologies can be somewhat daunting if you’re still thinking in terms of rotating disks and moving actuator arms, components that have no place in the SSD.
The better you understand how SSDs operate, the more effectively you can select, deploy, and manage them in your organization. To help with that process, this article introduces you to several important SSD concepts so you have a clearer picture of the components that go into an SSD and how they work together to provide reliable nonvolatile storage.
Bear in mind, however, that an SSD is a complex piece of technology that can easily justify much more in-depth coverage than a single article can offer. Think of this as an introduction, not a complete treatise, and as a starting point for understanding the inner workings of your SSDs.
Introducing the NAND Flash SSD
Like an HDD, an SSD is a nonvolatile storage device that stores data whether or not it is connected to power. An HDD, however, uses magnetic media to store its data, whereas the SSD uses integrated electronic circuitry to retain specific charge states, which in turn map to the data bit patterns.
SSDs are based on flash memory technologies that enable data to be written, read, and erased multiple times. Flash memory comes in two varieties: NOR and NAND. Although each offers advantages and disadvantages (a discussion beyond the scope of this article), NAND has emerged as the favored technology because it delivers faster erase and write times. Most contemporary SSDs are based on NAND flash, which is why it’s the focus of this article.
An enterprise SSD contains multiple NAND flash chips for storing data. Each chip contains one or more dies, and each die contains one or more planes. A plane is divided into blocks, and a block is divided into pages.
Of these, the blocks and pages are the greatest concern, not because you configure or manipulate them directly, but because of how data is written, read, and erased on a NAND chip. Data is read and written at the page level, but erased at the block level, as illustrated in Figure 1.
Figure 1. Writing and erasing data in a NAND flash SSD (image by Dmitry Nosachev, licensed under Creative Commons Attribution-Share Alike 4.0 International)
In this case, each page is 4 kibibytes (KiB) and each block is 256 KiB, which equals 64 pages per block. (A kibibyte is 1024 bytes. Kibibytes are sometimes used instead of kilobytes because they’re more precise. A kilobyte can equal 1000 bytes or 1024 bytes, depending on its usage.) Each time the SSD reads or writes data, it does so in 4-KiB chunks, but each time the drive erases data, it carries out a 256-KiB operation. This write/erase difference has serious consequences when updating data, as you’ll see later in the article.
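To make the geometry concrete, here is a minimal sketch of the arithmetic, using the 4 KiB page and 256 KiB block sizes from the example above (real drives vary):

```python
KIB = 1024  # a kibibyte is exactly 1,024 bytes

page_size_kib = 4      # smallest unit the drive reads or writes
block_size_kib = 256   # smallest unit the drive can erase

pages_per_block = block_size_kib // page_size_kib
print(pages_per_block)  # 64 pages per block

# Updating a single 4 KiB page eventually involves erasing the entire
# 256 KiB block that contains it, which is why small updates are costly.
bytes_erased_per_page_update = block_size_kib * KIB
print(bytes_erased_per_page_update)  # 262,144 bytes erased to refresh one 4 KiB page
```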
Inside the NAND Cell
A page is made up of multiple cells that each hold one or more data bits. A data bit is represented by an electrical charge state, which is determined by the electrons trapped between insulator layers within the cell. Each bit is registered as either charged (0) or not charged (1), providing the binary values needed to represent data.
Today’s NAND flash chips use either floating gate cells or charge trap cells. Until recently most NAND flash relied on floating gate technologies, in which the electrons are trapped between two oxide layers in a region called the floating gate. The bottom oxide layer is thin enough for electrons to pass through when voltage is applied to the underlying substrate. Electrons move into the floating gate during a write operation and out of the floating gate during an erase operation.
The challenge with the floating gate approach is that each time voltage is applied and electrons pass through the oxide layer, the layer degrades slightly. The more write and erase operations, the greater the degradation, until eventually the cell might no longer be viable.
Bear in mind, however, that SSD technologies have come a long way, making drives more reliable and durable while delivering greater performance and storing more data. At the same time, prices keep falling, making SSDs increasingly competitive.
Vendors continue to explore new technologies to improve SSDs. For example, several vendors are now turning to charge trap technologies in their NAND cells. Charge trap cells are similar to floating gate cells except that they use different insulator materials and methodologies to trap the electrons, resulting in cells that are less susceptible to wear. That said, charge trap technologies come with their own reliability issues, so neither approach is ideal.
There is, of course, much more to floating gate and charge trap technologies, but this should give you some idea of what’s going on in the event you come across these terms. Know too that gate technologies are only part of the equation when it comes to understanding the NAND cell structure.
In fact, the bigger concern when evaluating SSDs is the number of bits stored in each cell. Today’s SSDs store between one and four bits per cell, with a corresponding number of charge states per cell, as shown in the following table. Note that vendors are also working on five-bit-per-cell flash—dubbed penta-level cell (PLC)—but the jury is still out on this technology.
| Cell type | # of data bits | # of charge states | Possible binary values per cell |
| --- | --- | --- | --- |
| Single-level cell (SLC) | 1 | 2 | 0, 1 |
| Multi-level cell (MLC) | 2 | 4 | 00, 01, 10, 11 |
| Triple-level cell (TLC) | 3 | 8 | 000, 001, 010, 011, 100, 101, 110, 111 |
| Quad-level cell (QLC) | 4 | 16 | 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111 |
As the table shows, the more bits per cell, the greater the number of available charge states per cell, and the more charge states, the greater the number of available binary values, which translates to greater density. Not only does this mean packing in more data on each chip, but it also means more affordable drives.
Unfortunately, when you start squeezing more bits into each cell, performance suffers and the cells wear out faster. A QLC drive might hold more data and be a lot cheaper, but an SLC drive will deliver the best performance and last the longest, although at a greater price.
In addition to squeezing in more bits per cell, vendors have also been shrinking cells to fit more of them on each chip. Although this increases data density, this can lead to electrical charges leaking from one cell to another, so additional techniques must be leveraged to avoid data corruption and preserve data integrity. But vendors have an answer for this as well: 3-D NAND.
In the 3D approach, vendors back off from shrinking cells and instead stack cells on top of each other in layers, creating chips that can hold significantly more data. When combined with multi-bit technologies such as MLC or TLC, 3D NAND makes it possible to increase chip densities beyond anything previously possible, without sacrificing data integrity. For example, Samsung’s sixth-generation V-NAND chip combines 3D and TLC technologies to store up to 256 Gb of data across 136 layers of cells. (For more information about performance-related concepts, refer to the second article in this series.)
The SSD Components
NAND chips are at the heart of the SSD, carrying out the drive’s main function of storing data. But an SSD also includes several other important components which work together to facilitate the read, write, and erase operations.
Figure 2 shows an HGST Ultrastar SSD that holds 1.6 TB of data. Although the NAND chips are covered by a label, you can see that the circuit board is filled with a variety of other components.
Figure 2. HGST Ultrastar SN150 Series NVMe/PCIe solid-state drive (photo by Dmitry Nosachev, licensed under Creative Commons Attribution-Share Alike 4.0 International)
To the right of the connector pins, the device hosts five Micron DRAM chips, neatly arrayed from bottom to top. The chips serve as a cache for improving write operations and maintaining system data. Unlike the NAND chips, the cache is volatile (non-persistent) and used only as a temporary buffer. In other words, although the buffered data won’t survive a loss of power, the drive will deliver better performance when it’s running.
The HGST drive has a 2.5-inch form factor and provides a Peripheral Component Interconnect Express (PCIe) interface. It also supports the Non-Volatile Memory Express (NVMe) protocol for maximizing the benefits of the PCIe interface. (For more information about form factors, interfaces, and protocols, refer to the first article in this series.)
You can see the PCIe interface connector in Figure 2, jutting out from the side of the circuit board. You can also see it in Figure 3, which shows the front side of the HGST drive, covered mostly by the heat sink.
Figure 3. HGST Ultrastar SN150 Series NVMe/PCIe solid-state drive (photo by Dmitry Nosachev, licensed under Creative Commons Attribution-Share Alike 4.0 International)
SSD configurations can vary considerably from one to the next, so don’t assume that others will look like the HGST drive. I picked this one because it provides a good example of a NAND flash SSD.
Despite the differences between SSDs, they all include NAND chips, conform to specific form factors and interface standards, and typically provide some type of cache to serve as a memory buffer. (All enterprise storage devices—HDD and SSD—provide built-in volatile cache.) An SSD also includes a controller for managing drive operations and firmware for providing the controller with the instruction sets necessary to carry out those operations.
Reading and Writing Data
As noted earlier, reading and writing data occur at the page level. Reading data is a fairly straightforward operation. When the drive receives a request for data, the controller locates the correct cells, determines the charge states, and ensures that the data is properly returned, using buffer memory as necessary. The entire process has little long-term impact on the drive itself.
Writing data is a programming operation that sets the data bits to the desired charge state, a process orchestrated by the controller. Writing data to a page for the first time is nearly as straightforward as reading data. The process grows more complex when modifying that data, which requires that it first be erased and then rewritten, a process commonly referred to as a program/erase cycle (P/E cycle).
During a typical P/E cycle, the entire block containing the targeted pages is written to memory. The block is then marked for deletion and the updated data rewritten to another block. The actual erase operation occurs asynchronously in order to optimize performance.
The controller coordinates the erase and write processes, using advanced data management algorithms. Even if only a single change on a single page needs to be recorded, an entire P/E cycle is launched. The block is marked for deletion and all its data rewritten.
The controller erases the block when it’s needed or as part of an optimization process. When erasing the block, the controller sets every bit in every cell to 1. After that, data can be written to any page in the block. However, if any bits in a page are set to 0—even if only one—the entire page is off-limits to writing data.
As an SSD starts filling up, the writing and rewriting operations become more complex and start to slow down. The controller must find places to store the data, which can involve erasing blocks marked for deletion, moving and consolidating data, or performing multiple P/E cycles. The fuller the drive, the more extensive these operations, which is why performance can start to degrade as a drive reaches capacity.
Because of the many P/E cycles, more data is routinely written to the drive than the amount being modified, a characteristic commonly called write amplification. For example, updating a simple 25-KB text file might result in 250 KB of data being written, causing additional wear on the cells.
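Write amplification is usually expressed as a simple ratio of physical NAND writes to host writes. Here is a minimal sketch of that calculation, using the illustrative 25 KB/250 KB figures above:

```python
def write_amplification_factor(bytes_written_to_nand, bytes_written_by_host):
    """Write amplification factor (WAF): physical NAND writes divided by host writes.
    A WAF of 1.0 means no amplification; higher values mean extra wear on the cells."""
    return bytes_written_to_nand / bytes_written_by_host

# The example from the text: a 25 KB file update that results in 250 KB of NAND writes.
print(write_amplification_factor(250, 25))  # 10.0
```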
A flash SSD can support only a limited number of P/E cycles before it fails. The more bits squeezed into each cell, the fewer that number and the faster the time to failure. For example, an MLC drive might support up to 6,000 P/E cycles per block, but a TLC drive might max out at 3,000.
As P/E cycles start adding up, cells start failing. For this reason, SSDs employ several strategies to extend a drive’s lifespan, assure reliability, and maintain data integrity, including:
- Wear leveling: A controller-based operation for distributing P/E cycles evenly across the NAND chips to prevent any cells from premature failure.
- TRIM command: An operating system command for consolidating a drive’s free space and erasing blocks marked for deletion, which can improve performance and minimize write amplification.
- Over-provisioning: Extra drive space reserved for management processes such as wear leveling and for reducing the extra write amplification that occurs when a drive gets too full (see the sketch after this list for how the over-provisioning percentage is typically calculated).
- Caching: A process of storing data in memory to boost performance and, when used effectively, minimize P/E cycles.
- Error-correction code (ECC): A process for checking data for errors and then, if necessary, correcting those errors.
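Over-provisioning is commonly quoted as a percentage of the user-visible capacity. Here is a minimal sketch of that calculation; the 512 GB raw and 480 GB usable figures are hypothetical, chosen only to illustrate the formula.

```python
def over_provisioning_pct(raw_capacity_gb, usable_capacity_gb):
    """Over-provisioning expressed as a percentage of the user-visible capacity:
    (raw - usable) / usable * 100."""
    return (raw_capacity_gb - usable_capacity_gb) / usable_capacity_gb * 100

# Hypothetical drive: 512 GB of raw NAND exposed as 480 GB of usable space.
print(round(over_provisioning_pct(512, 480), 1))  # about 6.7% over-provisioning
```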
An SSD might also incorporate strategies for improving performance. For example, flash drives implement garbage collection, a background process for moving, consolidating, and erasing data. There’s some debate about whether garbage collection adds write amplification or reduces it. It depends on how the garbage collection operations are implemented and the quality of the algorithms used to carry out these operations.
SSD firmware updates might also address performance, reliability, and data integrity issues. Whenever you install a new SSD, one of the first steps you should take is to ensure that you’re running the latest firmware. These are not the only tactics a drive might employ, but they’re among the most common.
Much More to the SSD
The information covered here should give you a sense of how NAND flash SSDs work. At the same time, you no doubt also realize that SSDs are extremely complex devices and that what I’ve touched upon barely scratches the surface. Even so, you should now have a more solid foundation for moving forward.
Keep in mind, however, that memory technologies are quickly evolving, with some redefining how we think of memory and storage. For example, an Intel Optane drive can store data like NAND but operate nearly as fast as DRAM, bridging the gap between traditional storage and traditional memory. Whether Optane or a similar technology will replace flash is yet to be seen, but no doubt something will, at which point we’ll be having a very different discussion.
Storage 101: The Language of Storage

Storage drives come in many shapes and sizes, and it can be difficult to distinguish one from the other beyond their appearances because vendor-supplied information is sometimes confusing and obscure. Although the material has gotten better over the years, it can still be unclear. Yet understanding this information is essential to knowing how well a drive will perform, how much data it will hold, its expected lifespan, and other important features.
In the first article in this series, I introduced you to a number of storage-related concepts, all of which can play an important role in determining what each drive offers and how they differ. For example, a solid-state drive (SSD) that uses the Peripheral Component Interconnect Express (PCIe) interface will typically perform better than one that uses the Serial Advanced Technology Attachment (SATA) interface, and a SATA-based SSD will likely perform better than a SATA-based hard-disk drive (HDD).
But the interface and drive type are only part of the equation when it comes to choosing storage media. You must also take into account latencies, input/output operations per second (IOPS), throughput, effective and usable capacities, data transfer size or I/O size, endurance, and other factors. Unfortunately, it’s no small matter trying to make sense of all these variables, especially with the inconsistencies among storage vendors in how they present information about their products.
In this article, I dig into concepts commonly used to describe storage media to help make sense of the information you’ll encounter when evaluating HDDs or SSDs for your organization. Many of the concepts are specific to performance, but I also discuss issues related to capacity and lifespan, particularly in how they differ between HDDs and SSDs.
Making Sense of Performance Metrics
When architecting storage solutions to meet enterprise requirements such as performance, you should identify the workloads that the devices must support. To this end, you must understand data access patterns such as read operations versus writes, random operations versus sequential, and block transfer size.
In this regard, storage operations can be divided into four types: random reads, random writes, sequential reads, and sequential writes. In some cases, these operations can be further divided by the block transfer size (small versus large), which depends on the application. Many workloads use a mix of two or more of these types, although they might favor one over the others. Common data access patterns include:
- Random read/write, small block: A wide variety of applications such as transactional business applications and associated databases.
- Sequential write, large block: Loading media, loading data warehouse.
- Sequential read, large block: Reading media, data warehouse aggregations and reporting.
- Sequential write, small block: Database log writes.
- Mixed: Any combination of the above. Note that, when multiple sequential workloads are active concurrently, the workload becomes randomized.
When shopping for storage, you’ll encounter an assortment of metrics that describe how well a drive is expected to perform. Understanding these metrics is essential to ensuring that you’re using the right drives to support your specific workloads.
Latency refers to a drive’s response time, that is, how long it takes for an I/O operation to complete. From an application’s perspective, latency is the time between issuing a request and receiving a response. From a user perspective, latency is the only metric that matters.
Vendors list latency in milliseconds (ms) or microseconds (µs). The lower the number, the shorter the wait times. However, latency can quickly increase as individual I/O requests start piling up, as I/O sizes increase (typical sizes range from 4 KB to 512 KB), or as the nature of the workload changes, such as shifting from read-only to read/write or from sequential to random. For example, a drive’s latency might be listed as 20ms, but if the drive is supporting concurrent read operations, I/O requests could end up in a long queue, causing a dramatic increase in latency.
Latency is nearly always a useful metric when shopping for drives and should be considered in conjunction with IOPS. For example, a storage solution that provides 185 IOPS with an average latency of 5ms might deliver better application performance than a drive that offers 440 IOPS but with 30ms latency. It all depends on the workloads that the drives will need to support.
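To see why the lower-latency drive in this comparison can deliver better application performance, consider a single application thread that issues one synchronous I/O at a time: it can never exceed 1 / latency operations per second, no matter how many IOPS the drive is rated for. Below is a minimal sketch of that reasoning, using the illustrative figures from the paragraph above.

```python
def max_iops_single_thread(latency_seconds):
    """A thread issuing one synchronous I/O at a time completes at most
    1 / latency operations per second, regardless of the drive's rated IOPS."""
    return 1.0 / latency_seconds

# Drive A: rated 185 IOPS at roughly 5 ms average latency.
# Drive B: rated 440 IOPS at roughly 30 ms average latency.
print(round(max_iops_single_thread(0.005)))  # ~200, so drive A's rating is within reach
print(round(max_iops_single_thread(0.030)))  # ~33, so drive B needs deep queues to approach 440
```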
Another common metric is IOPS, which indicates the maximum number of I/O operations per second that the drive is expected to support. An I/O operation is the transfer of data to or from the drive. The higher the number of supported IOPS, the better the performance—at least according to conventional wisdom. In truth, IOPS tells only part of the story and should be considered along with other important factors, such as latency and I/O size. IOPS is most relevant for random data access patterns (and far less important for sequential workloads).
Another important metric is throughput, which measures the amount of data that can be written to or read from a storage drive within a given timeframe. Some resources may refer instead to data transfer rate or simply transfer rate, sometimes according to drive type. For example, you might see transfer rate used more often with HDDs and throughput associated with SSDs. Like other performance metrics, throughput is dictated by the nature of the workload. Throughput is most relevant for sequential data access patterns (and far less relevant for random workloads).
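Throughput, IOPS, and I/O size are tied together by a simple approximation: throughput is roughly IOPS multiplied by the I/O size. The following sketch, with purely illustrative numbers, shows why small random I/O produces modest throughput even at high IOPS, while large sequential transfers produce high throughput at relatively few IOPS.

```python
def throughput_mb_per_s(iops, io_size_kb):
    """Approximate throughput in MB/s as IOPS multiplied by the I/O size."""
    return iops * io_size_kb / 1024

# Small random I/O: many operations, but little data moved per operation.
print(round(throughput_mb_per_s(iops=10_000, io_size_kb=4), 1))  # ~39.1 MB/s
# Large sequential I/O: far fewer operations, far more data per operation.
print(round(throughput_mb_per_s(iops=500, io_size_kb=512), 1))   # 250.0 MB/s
```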
The distinction between sequential and random is particularly important for HDDs because of how data is stored on the platters, although it can also play a role in SSD performance. In fact, there are other differences between the two drive types that you need to understand when evaluating storage, along with device capacity and endurance, all of which are discussed below.
What Sets HDDs Apart
Data is written to an HDD in blocks that are stored sequentially or scattered randomly across a platter. Enterprise drives contain multiple platters with coordinated actuator arms and read/write heads that move across the platters (a topic I’ll be covering more in-depth later in the series).
Whenever an application tries to access the data, the platter’s actuator arm must move the head to the correct track and the platter must be rotated to the correct sector. The time required to do so is referred to as the seek time. The time it takes for the platter to rotate to the correct sector is referred to as the rotational latency.
The duration of an I/O operation depends on the location of the head and platter prior to the request. When the data blocks are saved sequentially on the disk, an application can read and write data in relatively short order, reducing seek times and rotational latencies to practically nothing. If the blocks are strewn randomly across the platters, every operation requires the actuator to move the head to a different area on the platter, resulting in rotational latency and seek time, and therefore slower performance.
Because of these differences, it is essential to evaluate HDDs in terms of the workloads you plan to support, taking into account such factors as file size, concurrency, and data access patterns (random vs. sequential, read vs. write, big block vs. small block, and mixed). By taking your workloads into account, you’ll have a better sense of how to select the right drives for the workloads demanded by your organization’s applications.
For example, Dell offers a 12-TB SATA drive that supports up to 118 IOPS for random reads and 148 IOPS for random operations that include 70% reads and 30% writes at a given latency, whereas the same drive offers a throughput of 228 MB/s for sequential read and write operations. From these metrics, you can start getting a sense of whether the drive can meet the needs of your anticipated workloads.
Suppose you’re looking for a storage solution to support a set of read-intensive web applications whose storage access pattern is primarily random reads, as opposed to sequential reads. You would want to compare the Dell drive against other drives to determine which one might offer the best IOPS, with less emphasis placed on the other types of access patterns.
One characteristic driving HDD performance is revolutions per minute (RPM), that is, the number of times the drive’s platters rotate within a minute. The higher the RPM, the faster the data can be accessed, leading to lower latency and higher performance. Enterprise-class HDDs typically support 10,000 or 15,000 RPM, often written as simply 10K or 15K RPM.
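The link between RPM and latency is simple arithmetic: on average, the platter must spin half a revolution before the target sector reaches the head, so average rotational latency is half of 60/RPM. Here is a minimal sketch (the 7,200 RPM figure is included only for comparison with typical consumer-class drives):

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, expressed in milliseconds."""
    seconds_per_revolution = 60.0 / rpm
    return (seconds_per_revolution / 2) * 1000

print(round(avg_rotational_latency_ms(7_200), 2))   # ~4.17 ms
print(round(avg_rotational_latency_ms(10_000), 2))  # 3.0 ms
print(round(avg_rotational_latency_ms(15_000), 2))  # 2.0 ms
```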
You must also take into account a drive’s available capacity, keeping in mind that you should never fill an HDD to anywhere near its physical capacity.
A drive’s anticipated lifespan is indicated by its mean time between failures (MTBF) rating, the number of operating hours expected before failure. For example, Dell offers several 14-TB and 16-TB drives with MTBF ratings of 2,500,000 hours, which comes to over 285 years. Such ratings are common, yet in reality drives fail far more often than such high MTBF figures suggest. MTBF is only a small part of the reliability equation. Enterprise solutions demand identifying and eliminating single points of failure and building redundancy across components, starting at the drive level.
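To put those MTBF figures in perspective, here is a minimal sketch that converts an MTBF rating into years and into the roughly equivalent annualized failure rate across a large fleet of drives (a statistical average, not a promise about any single drive):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def mtbf_years(mtbf_hours):
    """Convert an MTBF rating in hours to years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR

def annualized_failure_rate_pct(mtbf_hours):
    """Approximate AFR for a fleet of drives running 24x7: hours per year / MTBF."""
    return HOURS_PER_YEAR / mtbf_hours * 100

print(round(mtbf_years(2_500_000)))                      # ~285 years
print(round(annualized_failure_rate_pct(2_500_000), 2))  # ~0.35% of drives failing per year
```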
By looking at the various metrics, you have a foundation for comparing drives. Historically, marketing considerations—each vendor trying to present its drives in the best light—made it difficult to compare performance specs across vendors. Today’s specifications are generally more consistent, at least enough to make reasonable apples-to-apples comparisons. Some vendors also provide insights beyond the basics, such as performance metrics for mixed workloads, for example, 70% reads and 30% writes at a given latency.
What Sets SSDs Apart
Consumer and enterprise SSDs are based on NAND flash technologies, a type of nonvolatile memory in which data is stored by programming integrated circuit chips, rather than manipulating magnetic properties, as with an HDD. Also unlike the HDD, the SSD has no moving parts, which makes reading and writing data much faster operations.
If you’re relatively new to enterprise SSDs, the terminology that surrounds these drives can be confusing. Yet the fundamental performance considerations are exactly the same: latency, IOPS, throughput, capacity, and endurance.
As with an HDD, capacity in an SSD refers to the amount of data it can hold. With SSDs, however, vendors often toss around multiple terms related to capacity and do so inconsistently, so it’s not always clear what each one means. For example, some vendors list a drive’s total capacity as raw capacity or just capacity, both of which refer to the same thing—the maximum amount of raw data that the drive can hold.
Not all of the drive’s raw capacity is available for storing data. The drive must be able to accommodate the system overhead required to support various internal SSD operations. For this reason, the amount of available capacity is always less than the amount of raw capacity. Vendors often refer to available capacity as usable capacity.
The distinction between raw and usable capacity applies to HDDs as well as SSDs. When calculating capacity requirements, keep in mind that some space must be reserved for system data and for recovery configurations such as RAID.
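Vendors commonly express the gap between raw and usable capacity as an over-provisioning percentage, the spare space relative to the user-visible capacity. The figures in this sketch are hypothetical and are only meant to show the arithmetic:

```python
# Over-provisioning is commonly expressed as spare capacity relative to the
# usable (user-visible) capacity. The figures below are hypothetical.
def over_provisioning_pct(raw_gb: float, usable_gb: float) -> float:
    return (raw_gb - usable_gb) / usable_gb * 100

raw_gb, usable_gb = 1_024, 960   # e.g., a nominal 1 TiB of NAND exposing 960 GB
print(f"Over-provisioning: {over_provisioning_pct(raw_gb, usable_gb):.1f}%")
```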
Another characteristic that sets the SSD apart is the way in which bits are stored in the flash memory cells. Today’s SSDs can store up to four bits per cell, with talk of five bits in the wings. Vendors often reference the bit-level in the drive’s specifications. For example, a drive that supports three bits per cell is referred to as triple-level cell (TLC) flash, and a drive that supports four bits per cell is referred to as quad-level cell (QLC) flash.
In addition to squeezing more bits into a cell, vendors are also trying to get more cells onto a NAND chip. The more bits the chip can support, the greater the data density.
As with HDDs, SSD endurance refers to the drive’s longevity or lifespan; however, it’s measured differently. SSD drive endurance is based on the number of program/erase cycles (P/E cycles) it will support. NAND cells can tolerate only a limited number of P/E cycles. The higher that number, the greater the endurance. A drive’s endurance is measured by its write operations because read operations have minimal impact on an SSD. Whereas the HDD endurance metric is MTBF, vendors report an SSD’s endurance by providing the number of drive writes per day (DWPD), terabytes written (TBW), or both. The DWPD metric refers to the number of times you can completely overwrite the drive’s capacity each day during its warranted lifespan. The TBW metric indicates the total number of write operations that the drive will support over its lifetime. The DWPD and TBW are quite useful for comparing drives.
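DWPD and TBW describe the same endurance budget in different units, so you can convert one to the other if you know the drive's capacity and warranty period: TBW ≈ DWPD × capacity (TB) × 365 × warranty years. The drive in this sketch is hypothetical:

```python
# DWPD and TBW describe the same endurance budget in different units.
# Hypothetical drive: 3.84 TB, 5-year warranty, rated at 1 DWPD.
capacity_tb = 3.84
warranty_years = 5
dwpd = 1.0

tbw = dwpd * capacity_tb * 365 * warranty_years
print(f"Equivalent endurance: ~{tbw:,.0f} TB written (TBW)")  # ~7,008 TB
```

Running the conversion both ways is a quick sanity check when two vendors quote endurance in different units.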
As they squeeze more bits into each cell and more cells onto each chip, SSD vendors incorporate sophisticated technologies to maintain data integrity at these higher densities. For example, all SSDs employ wear leveling, over-provisioning, garbage collection, and error correction code (ECC) to extend the drive's life, all of which I'll be covering in more detail later in the series.
Moving toward a better understanding of storage
You should use the vendors’ published information only as a starting point for evaluating drives and identifying candidates. Consider additional research such as community reviews, benchmarks, or other resources to give you a realistic understanding of a drive’s capabilities. Your goal is to get as complete a picture as possible before investing in and testing solutions.
In addition, when comparing storage solutions, be sure to take into account the effective capacity, that is, the amount of data a drive can store after data-reduction technologies such as deduplication and compression are applied. Such data-reduction strategies make it possible for the drive to store far more data than its raw capacity alone suggests. For example, IBM offers a flash-based storage system that provides 36.9 TB of raw capacity but supports up to 110 TB of effective capacity.
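Dividing effective capacity by raw capacity gives the data-reduction ratio the vendor is assuming. Using the IBM figures quoted above:

```python
# Effective capacity divided by raw capacity gives the assumed
# data-reduction ratio. Figures are the IBM example quoted above.
raw_tb = 36.9
effective_tb = 110

reduction_ratio = effective_tb / raw_tb
print(f"Implied data-reduction ratio: ~{reduction_ratio:.1f}:1")  # ~3.0:1
```

Keep in mind that real-world reduction ratios depend heavily on how compressible and duplicated your data actually is.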
Clearly, you need to understand a wide range of concepts to determine what types of storage will best support your workloads. Not only must you choose between HDDs and SSDs, but you must also select from within those categories, while taking into account such factors as latency, IOPS, throughput, density, and numerous other considerations.
In the articles to follow, I’ll be digging into HDDs and SSDs more deeply so you can better understand their architectures and how they differ in terms of capacity, performance, and endurance. With this foundation, you’ll be better equipped to determine what storage solutions you might need for your organization, based on your applications and workloads.
What can I do to Reduce my Dynamics 365 Storage Costs?
Sooner or later, every organization reaches the free storage limit of Dynamics 365. In this post, we will examine possible ways of dealing with running out of Dynamics 365 storage in the cloud, weighing each option's cost and ease of implementation.
A common issue with D365 storage space
Here is one of the requests Dynamics 365 managers often get:
One of our clients has a problem with a database that is growing very quickly. The huge database size is also becoming a performance issue, and the client is looking to reduce storage costs. The current database size is 530 GB. We have checked the free add-on from Microsoft, but we are reluctant to roll it out across our org at full scale. We are also looking for more functionality, such as extracting both old and new documents.
Here's the kicker: we have sensitive documents, so we would prefer that they not run through any external service.
We also cannot employ anything outside the constraints of what we currently have in our Azure tenancy, namely Dynamics 365 (with the option of uploading custom plugins and/or custom workflow tasks) and the PaaS services we already pay for, notably Flow. This eliminates the option of a console-based application, or even a web app.
This request from a D365 admin reflects a typical constraint of the cloud CRM: the 10 GB of storage provided with a first subscription (base license) of the Customer Engagement, Finance, Supply Chain Management, Retail, and Talent applications runs out quickly. When you are out of storage, you have only two options: pay for extra storage or look for ways to free up space.
Try reducing the storage needed in D365
When monitoring D365 health to ensure the system's optimal performance, you should also keep an eye on whether there is enough space for growth.
When you reach 80% of D365 total storage capacity, the system should notify you so that you can take action.
Here is what you can do to reduce used storage space:
- Delete old records, such as cases and opportunities
- Remove unnecessary attachments (from emails and notes) through Advanced Find and bulk deletion (a scripted sketch follows this list)
- Delete audit logs
- Remove suspended workflows you won’t use again
- Delete bulk import instances
- Remove bulk duplicate detection jobs and bulk deletion job instances (strangely enough, those take up space)
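For attachment cleanup in particular, the work can be scripted against the Dataverse Web API rather than clicked through Advanced Find. The sketch below is only an illustration under stated assumptions: ORG_URL and ACCESS_TOKEN are placeholders you must supply, the annotations entity and its filesize/isdocument/filename fields follow the standard Dataverse Web API but should be verified against your environment, and any deletion like this should be dry-run and tested in a sandbox first.

```python
# Minimal sketch: find note attachments over a size threshold via the
# Dataverse Web API and delete them. ORG_URL and ACCESS_TOKEN are
# placeholders; always dry-run and test in a sandbox first.
import requests

ORG_URL = "https://yourorg.crm.dynamics.com"          # placeholder
ACCESS_TOKEN = "<OAuth bearer token for Dataverse>"   # placeholder
HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "OData-MaxVersion": "4.0",
    "OData-Version": "4.0",
    "Accept": "application/json",
}

# Notes (annotations) with file attachments larger than ~1 MB
query = (
    f"{ORG_URL}/api/data/v9.2/annotations"
    "?$select=annotationid,filename,filesize"
    "&$filter=isdocument eq true and filesize gt 1048576"
)

resp = requests.get(query, headers=HEADERS)
resp.raise_for_status()

for note in resp.json().get("value", []):
    print(f"Deleting {note.get('filename')} ({note['filesize']} bytes)")
    delete_url = f"{ORG_URL}/api/data/v9.2/annotations({note['annotationid']})"
    requests.delete(delete_url, headers=HEADERS).raise_for_status()
```

A production version would also follow the @odata.nextLink paging links and skip any notes that must be retained for compliance or audit purposes.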
Are you still lacking space after doing all this? The problem most likely lies in attachments and documents. According to several surveys, documents and attachments take up 70% of storage space in Dynamics 365 on average. If your organization tracks emails in the CRM system, as most companies do, free storage shrinks quickly, and traditional cleanup measures cannot reverse that trend.
Leveraging document management systems
Dynamics 365, like other cloud CRM systems, was designed primarily to manage customer relations, not to store documents. For that reason, the best practice for avoiding extra document-storage costs is moving documents to other systems, which offer cheaper storage and sometimes extra features.
Among the most popular destinations are Document Management Systems (DMS) like SharePoint, as well as cloud storage such as Azure Blob Storage.
SharePoint, in particular, offers not only cheaper storage but also immense collaboration opportunities.
How to synchronize Dynamics 365 with Azure Blob or SharePoint
Before a D365 user can send attachments and other documents to another system, the two systems need to be synchronized. Manually extracting the documents is also a possibility in theory, but it is not feasible in practice because it takes too much time.
Currently, there are a couple of solutions on the market for extracting attachments from Dynamics 365. Let’s have a look.
The first one comes from Microsoft Labs.
However, D365 experts advise using this free add-on with caution.
Firstly, consider that tech support for free apps is usually limited. In case of any trouble, you might be on your own.
Secondly, current customers do not seem too happy. Some can't get it to work: “Could not get it working, Plugins said it succeeded but nothing in Azure after waiting 20 mins! Documentation was not too detailed either”. Others find the automation limited: “in order to migrate all of our 61,000+ attachments, we have to hit a “move to blob” button in 160+ attachment increments. You basically have to micro manage the migration by repetitiously clicking a button that could easily be automated. The monotony is horrendous. The concept is great, the execution is terrible.” or “Report is slow to load (several minutes) and you have to manually execute 100 at a time. Would have taken several days of pushing the button.” (reviews from AppSource).
Thirdly, this add-on only processes attachments created after installation, which is not good if you were already approaching the limit by then.
Finally, this solution works only for Azure Blob, so if you prefer SharePoint and the document collaboration advantages it offers, you need to look for other options.
CB Dynamics 365 Seamless Attachment Extractor
Connecting Software has been working on synchronization solutions since 2009, and we have noticed that limited storage space is a chronic problem in many cloud CRM systems. In 2019, we launched CB Dynamics 365 Seamless Attachment Extractor.
The solution addresses the problem of Dynamics 365's expensive storage. It seamlessly transfers (hence the name) attachment files from the CRM to other configured storage. For the end user, it still looks as if the attachment files are in Dynamics.
The user can still work with the attachments exactly as if they were stored in Dynamics, yet the add-on has actually offloaded them to another file store, namely SharePoint, Azure File Storage, or Azure Blob Storage. The whole process is entirely transparent to the user, who keeps working the same way as before.
Each document no longer takes up storage space in D365, yet it remains reachable to users without any change to their workflows in D365. Any later modification to those files is transferred to the configured external file storage automatically.
It is important to note that, with the CB Dynamics 365 Seamless Attachment Extractor add-on, your documents never leave the Dynamics and attachment storage systems; there is no external service in between. When security is a concern, when you have sensitive data, or when you want to ensure regulatory compliance (GDPR comes to mind…), this is a crucial point, as you can see in the video.
Another convenient feature, unique in the market, is that it can compress and decompress files on the fly. It can also encrypt and decrypt them with AES-256 encryption, which was adopted by the U.S. government and is now used worldwide.
On top of that, if you have reached 80% of the storage space limit, this add-on will automatically go through the attachments that existed before you installed it. You can move those attachments to SharePoint, Azure Blob Storage, or Azure File Storage, and it works exactly the same way as it does for attachments created after the install. Isn't that great news?
Want to know more on how to reduce Dynamics 365 storage costs?
If you want to know more about how CB Dynamics 365 Seamless Attachment Extractor can help your organization save on storage costs and more, our experts will be glad to talk to you and walk you through a demo.
By Ana Neto,
A software engineer since 1997, she is now a technical advisor for Connecting Software.
Connecting Software has been a global producer of integration and synchronization software solutions since 2004.