Tag Archives: Storage

Using Azure Analysis Services on Top of Azure Data Lake Storage

The latest release of SSDT Tabular adds support for Azure Data Lake Store (ADLS) to the modern Get Data experience (see the following screenshot). Now you can augment your big data analytics workloads in Azure Data Lake with Azure Analysis Services and provide rich interactive analysis for selected data subsets at the speed of thought!

Screenshot: ADLS Connector Available

If you are unfamiliar with Azure Data Lake, check out the various articles at the Azure Data Lake product information site. Also read the article “Get started with Azure Data Lake Analytics using Azure portal.”

Following these instructions, I provisioned a Data Lake Analytics account called tpcds for this article and a new Data Lake Store called tpcdsadls. I also added one of my existing Azure Blob Storage accounts, which contains a 1 TB TPC-DS data set that I created and used in the series “Building an Azure Analysis Services Model on Top of Azure Blob Storage.” The idea is to move this data set into Azure Data Lake as a highly scalable and sophisticated analytics backend, from which to serve a variety of Azure Analysis Services models.

For starters, Azure Data Lake can process raw data and put it into targeted output files so that Azure Analysis Services can import the data with less overhead. For example, you can remove any unnecessary columns at the source, which trims about 60 GB from my 1 TB TPC-DS data set and therefore benefits processing performance, as discussed in “Building an Azure Analysis Services Model on Top of Azure Blob Storage – Part 3”.

Moreover, with relatively little effort and a few small changes to a U-SQL script, you can provide multiple targeted data sets to your users, such as a small data set for modelling purposes plus one or more production data sets with the most relevant data. In this way, a data modeler can work efficiently in SSDT Tabular against the small data set prior to deployment, and after production deployment, business users can get the relevant information they need from your Azure Analysis Services models in Microsoft Power BI, Microsoft Office Excel, and Microsoft SQL Server Reporting Services. And if a data scientist still needs more than what’s readily available in your models, you can use Azure Data Lake Analytics (ADLA) to run further U-SQL batch jobs directly against all the terabytes or petabytes of source data you may have. Of course, you can also take advantage of Azure HDInsight as a highly reliable, distributed and parallel programming framework for analyzing big data. The following diagram illustrates a possible combination of technologies on top of Azure Data Lake Store.

Diagram: Analyze Big Data

Azure Data Lake Analytics (ADLA) can process massive volumes of data extremely quickly. Take a look at the following screenshot, which shows a Data Lake job processing approximately 2.8 billion rows of TPC-DS store sales data (~500 GB) in under 7 minutes!

Screenshot: Processing Store Sales

The screen in the background uses source files in Azure Data Lake Storage and the screen in the foreground uses source files in Azure Blob Storage connected to Azure Data Lake. The performance is comparable, so I decided to leave my 1 TB TPC-DS data set in Azure Blob Storage, but if you want to ensure absolute best performance or would like to consolidate your data in one storage location, consider moving all your raw data files into ADLS. It’s straightforward to copy data from Azure Blob Storage to ADLS by using the AdlCopy tool, for example.
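
If you go that route, AdlCopy can be run from a command prompt or PowerShell session. The following is just a sketch with placeholder account, container, and key values; depending on your AdlCopy version, the destination URI may use the adl:// or swebhdfs:// scheme:

AdlCopy /Source https://<blobaccount>.blob.core.windows.net/<container>/ /Dest adl://<adlsaccount>.azuredatalakestore.net/<targetfolder>/ /SourceKey <storage account key>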

With the raw source data in a Data Lake-accessible location, the next step is to define the U-SQL scripts to extract the relevant information and write it along with column names to a series of output files. The following listing shows a general U-SQL pattern that can be used for processing the raw TPC-DS data and putting it into comma-separated values (csv) files with a header row.

@raw_parsed = EXTRACT child_id int,
                      <list of column definitions>,
                      empty string
FROM "<source folder>/{*}_{child_id}_100.dat"
USING Extractors.Text(delimiter: '|');

@filtered_results = SELECT <list of columns>
FROM @raw_parsed
<optional clause to restrict the rows>;

OUTPUT @filtered_results
TO "<output folder>/<output file>.csv"
USING Outputters.Csv(outputHeader:true);

The next listing shows a concrete example based on the small income_band table. Note how the query extracts a portion of the file name into a virtual child_id column in addition to the actual columns from the source files. This child_id column comes in handy later when generating multiple output csv files for the large TPC-DS tables. Also, the ORDER BY … FETCH clause is not strictly needed in this example because the income_band table only has 20 rows, but it’s included to illustrate how to restrict the amount of data per table to a maximum of 100 rows to create a small modelling data set.

@raw_parsed = EXTRACT child_id int,
                      b_income_band_sk string,
                      b_lower_bound string,
                      b_upper_bound string,
                      empty string
FROM "wasb://income-band@aasuseast2/{*}_{child_id}_100.dat"
USING Extractors.Text(delimiter: '|');

@filtered_results = SELECT b_income_band_sk,
                           b_lower_bound,
                           b_upper_bound
FROM @raw_parsed
ORDER BY child_id ASC
FETCH 100 ROWS;

You can find complete sets of U-SQL scripts to generate output files for different scenarios (modelling, single csv file per table, multiple csv files for large tables, and large tables filtered by last available year) at the GitHub repository for Analysis Services.

For instance, for generating the modelling data set, there are 25 U-SQL scripts to generate a separate csv file for each TPC-DS table. You can run each U-SQL script manually in the Microsoft Azure portal, yet it is more convenient to use a small Microsoft PowerShell script for this purpose. Of course, you can also use Azure Data Factory, which among other things enables you to run U-SQL scripts on a scheduled basis. For this article, however, the following Microsoft PowerShell script suffices.

# Folder containing the U-SQL scripts and the name of the ADLA account
$script_folder = "<Path to U-SQL Scripts>"
$adla_account = "<ADLA Account Name>"
Login-AzureRmAccount -SubscriptionName "<Windows Azure Subscription Name>"

# Submit each U-SQL script as a Data Lake Analytics job and wait for it to complete
Get-ChildItem $script_folder -Filter *.usql |
Foreach-Object {
    $job = Submit-AdlJob -Name $_.Name -AccountName $adla_account -ScriptPath $_.FullName -DegreeOfParallelism 100
    Wait-AdlJob -Account $adla_account -JobId $job.JobId
}

Write-Host "Finished processing U-SQL jobs!";

It does not take long for Azure Data Lake to process the requests. You can use the Data Explorer feature in the Azure Portal to double-check that the desired csv files have been generated successfully, as the following screenshot illustrates.

Screenshot: Output CSV Files for Modelling

With the modelling data set in place, you can finally switch over to SSDT and create a new Analysis Services Tabular model at the 1400 compatibility level. Make sure you have the latest version of the Microsoft Analysis Services Projects package installed so that you can pick Azure Data Lake Store from the list of available connectors. You will be prompted for the Azure Data Lake Store URL and you must sign in using an organizational account. Currently, the Azure Data Lake Store connector only supports interactive logons, which is an issue for processing the model in an automated way in Azure Analysis Services, as discussed later in this article. For now, let’s focus on the modelling aspects.

The Azure Data Lake Store connector does not automatically establish an association between the folders or files in the store and the tables in the Tabular model. In other words, you must create each table individually and select the corresponding csv file in Query Editor. This is a minor inconvenience. It also implies that each table expression specifies the folder path to the desired csv file individually. If you are using a small data set from a modelling folder to create the Tabular model, you would need to modify every table expression during production deployment to point to the desired production data set in another folder. Fortunately, there is a way to centralize the folder navigation by using a shared expression so that only a single expression requires an update on production deployment. The following diagram depicts this design.

Diagram: Folder Navigation by Using a Shared Expression

To implement this design in a Tabular model, use the following steps:

  1. Start Visual Studio and check under Tools -> Extensions and Updates that you have the latest version of Microsoft Analysis Services Projects installed.
  2. Create a new Tabular project at the 1400 compatibility level.
  3. Open the Model menu and click on Import From Data Source.
  4. Pick the Azure Data Lake Store connector, provide the storage account URL, and sign in by using an Organizational Account. Click Connect and then OK to create the data source object in the Tabular model.
  5. Because you chose Import From Data Source, SSDT displays Query Editor automatically. In the Content column, click on the Table link next to the desired folder name (such as modelling) to navigate to the desired root folder where the csv files reside.
  6. Right-click the Table object in the right Queries pane, and click Create Function. In the No Parameters Found dialog box, click Create.
  7. In the Create Function dialog box, type GetCsvFileList, and then click OK.
  8. Make sure the GetCsvFileList function is selected, and then on the View menu, click Advanced Editor.
  9. In the Edit Function dialog box informing you that updates from the Table object will no longer propagate to the GetCsvFileList function if you continue, click OK.
  10. In Advanced Editor, note how the GetCsvFileList function navigates to the modelling folder, enter a whitespace character at the end of the last line to modify the expression, and then click Done.
  11. In the right Queries pane, select the Table object, and then in the left Applied Steps pane, delete the Navigation step, so that Source is the only remaining step.
  12. Make sure the Formula Bar is displayed (View menu -> Formula Bar), and then redefine the Source step as = GetCsvFileList() and press Enter. Verify that the list of csv files is displayed in Query Editor, as in the following screenshot.
    Screenshot: Invoke GetCsvFileList
  13. For each table you want to import:
    1. Right-click the existing Table object and click Duplicate.
    2. In the Content column, click on the Binary link next to the desired file name (such as call_center) and verify that Query Editor parses the columns and detects the data types correctly.
    3. Rename the table according to the csv file you selected (such as call_center).
    4. Right-click the renamed table object (such as call_center) in the Queries pane and click Create New Table.
    5. Verify that the renamed table object (such as call_center) is no longer displayed in italic, which indicates that the query will now be imported as a table into the Tabular model.
  14. After you created all desired tables by using the sequence above, delete the original Table object by right-clicking on it and selecting Delete.
  15. In Query Editor, click Import to add the GetCsvFileList expression and the tables to your Tabular model.

During the import, SSDT Tabular pulls in the small modelling data set. And prior to production deployment, it is now a simple matter of updating the shared expression by right-clicking on the Expressions node in Tabular Model Explorer and selecting Edit Expressions, and then changing the folder name in Advanced Editor. The below screenshot highlights the folder name in the GetCsvFileList expression. And if each table can find its corresponding csv file in the new folder location, deployment and processing can succeed.

Screenshot: Changing the CSV Folder

Another option is to deploy the model with the Do Not Process deployment option and use a small TOM application in Azure Functions to process the model on a scheduled basis. Of course, you can also use SSMS to connect to your Azure Analysis Services server and send a processing command, but it might be inconvenient to keep SSDT or SSMS connected for the duration of the processing cycle. Processing against the full 1 TB data set with a single csv file per table took about 15 hours to complete. Processing with four csv files/partitions for the seven large tables and maxActiveConnections on the data source set to 46 concurrent connections took roughly 6 hours. This is remarkably faster in comparison to using general BLOB storage, as in the Building an Azure Analysis Services Model on Top of Azure Blob Storage article, and suggests that there is potential for performance improvements in the Azure BLOB storage connector.

Screenshot: Processing
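
As an alternative to writing a TOM application, you can also trigger processing from PowerShell. The following is only a minimal sketch, assuming the Analysis Services cmdlets from the SqlServer PowerShell module are installed; the server, model, and credential values are placeholders:

# Process the deployed model on the Azure Analysis Services server (names are placeholders)
$asCred = Get-Credential
Invoke-ProcessASDatabase -Server "asazure://<region>.asazure.windows.net/<servername>" -DatabaseName "<modelname>" -RefreshType "Full" -Credential $asCred

Note that no matter how processing is triggered, it still fails if the Data Lake credentials stored in the model have expired, as discussed below.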

Even the processing performance against Azure Data Lake could likely be increased further, as the processor utilization on an S9 Azure Analysis Services server suggests (see the following screenshot). For the first 30 minutes, processor utilization is close to the maximum and then it decreases as the AS engine finishes more and more partitions and tables. Perhaps with an even higher degree of parallelism, such as eight or twelve partitions for each large table, Azure AS could keep processor utilization near the maximum for longer and finish the processing work sooner. But processing optimizations through elaborate table partitioning schemes are beyond the scope of this article. The processing performance achieved with four partitions on each large table suffices to conclude that Azure Data Lake is a very suitable big-data backend for Azure Analysis Services.

Screenshot: QPUs

There is currently only one important caveat: The Azure Data Lake Store connector only supports interactive logons. When you define the Azure Data Lake Store data source, SSDT prompts you to log on to Azure Data Lake. The connector performs the logon and then stores the obtained authentication token in the model. However, this token only has a limited lifetime. Chances are fair that processing succeeds after the initial deployment, but when you come back the next day and want to process again, you get an error stating that “The credentials provided for the DataLake source are invalid.” See the screenshot below. Either you deploy the model again in SSDT, or you right-click the data source in SSMS and select Refresh Credentials to log on to Data Lake again and submit fresh tokens to the model.

Screenshot: Refresh Credentials

A subsequent article is going to cover how to handle authentication tokens programmatically, so stay tuned for more on connecting to Azure Data Lake and other big data sources on the Analysis Services team blog. And as always, please deploy the latest monthly release of SSDT Tabular and send us your feedback and suggestions by using SSASPrev at Microsoft.com or any other available communication channels such as UserVoice or MSDN forums.


Analysis Services Team Blog

Container Storage Support with Cluster Shared Volumes (CSV), Storage Spaces Direct (S2D), SMB Global Mapping

By Amitabh Tamhane

Goals: This topic provides an overview of providing persistent storage for containers with data volumes backed by Cluster Shared Volumes (CSV), Storage Spaces Direct (S2D) and SMB Global Mapping.

Applicable OS releases: Windows Server 2016, Windows Server RS3

Prerequisites:

Blog:

With Windows Server 2016, many new infrastructure and application workload features were added that deliver significant value to our customers today. Among this long list, two very distinct features stand out: Windows Containers & Storage Spaces Direct!

1. Quick Introductions

Let’s review a few technologies that have evolved independently. Together these technologies provide a platform for persistent data store for applications when running inside containers.

1.1 Containers

In the cloud-first world, our industry is going through a fundamental change in how applications are being developed & deployed. New applications are optimized for cloud scale, portability & deployment agility. Existing applications are also transitioning to containers to achieve deployment agility.

Containers provide a virtualized operating system environment where an application can safely & independently run without being aware of other applications running on the same host. With applications running inside containers, customers benefit from the ease of deployment, ability to scale up/down and save costs by better resource utilization.

More about Windows Containers.

1.2 Cluster Shared Volumes

Cluster Shared Volumes (CSV) provides multi-host read/write file system access to a shared disk. Applications can read/write the same shared data from any node of the Failover Cluster. The shared block volume can be provided by various storage technologies such as Storage Spaces Direct (more about it below), traditional SANs, or iSCSI targets.

More about Cluster Shared Volumes (CSV).

1.3 Storage Spaces Direct

Storage Spaces Direct (S2D) enables highly available & scalable replicated storage amongst nodes by providing an easy way to pool locally attached storage across multiple nodes.

Create a virtual disk on top of this single storage pool & any node in the cluster can access this virtual disk. CSV (discussed above) seamlessly integrates with this virtual disk to provide read/write shared storage access for any application deployed on the cluster nodes.

S2D works seamlessly when configured on physical servers or any set of virtual machines. Simply attach data disks to your VMs and configure S2D to get shared storage for your applications. In Azure, S2D can also be configured on Azure VMs that have premium data disks attached for faster performance.

More about Storage Spaces Direct (S2D). S2D Overview Video.

1.4 Container Data Volumes

With containers, any persistent data needed by the application running inside will need to be stored outside of the container or its image. This persistent data can be some shared read-only config state or read-only cached web-pages, or individual instance data (ex: replica of a database) or shared read-write state. A single containerized application instance can access this data from any container host in the fabric or multiple application containers can access this shared state from multiple container hosts.

With data volumes, a folder inside the container is mapped to another folder on the container host using local or remote storage. Using data volumes, applications running inside containers access their persistent data without being aware of the infrastructure storage topology. The application developer can simply assume a well-known directory/path to have the persistent data needed by the application. This enables the same container application to run on various deployment infrastructures.

2. Better Together: Persistent Store for Container Fabric

This data volume functionality is great, but what if a container orchestrator decides to place the application container on a different node? The persistent data needs to be available on all nodes where the container may run. Together, these technologies provide a seamless persistent store for the container fabric.

2.1 Data Volumes with CSV + S2D

Using S2D, you can leverage locally attached storage disks to form a single pool of storage across nodes. After the single pool of storage is created, simply create a new virtual disk, and it automatically gets added as a new Cluster Shared Volume (CSV). Once configured, this CSV volume gives you read/write access to the container persistent data shared across all nodes in your cluster.

With Windows Server 2016 (plus latest updates), we now have enabled support for mapping container data volumes on top of Cluster Shared Volumes (CSV) backed by S2D shared volumes. This gives the application container access to its persistent data no matter which node the container orchestrator places the container instance on.

Configuration Steps

Consider this example (assumes you have Docker & container orchestrator of your choice already installed):

  1. Create a cluster (in this example 4-node cluster)

New-Cluster -Name <ClusterName> -Node <Node1>,<Node2>,<Node3>,<Node4>


(Note: The generic warning text above is referring to the quorum witness configuration which you can add later.)

  2. Enable Cluster S2D Functionality

Enable-ClusterStorageSpacesDirect or Enable-ClusterS2D


(Note: To get the optimal performance from your shared storage, it is recommended to have SSD cache disks. It is not a must have for getting a shared volume created from locally attached storage.)

Verify single storage pool is now configured:

Get-StoragePool S2D*


  3. Create new virtual disk + CSV on top of S2D:

New-Volume -StoragePoolFriendlyName *S2D* -FriendlyName <VolumeName> -FileSystem CSVFS_REFS -Size 50GB


Verify that the new CSV volume has been created:

Get-ClusterSharedVolume


This shared path is now accessible on all nodes in your cluster:

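For example, a quick check from any node confirms the path is visible (the volume name below assumes the default C:\ClusterStorage\Volume1 shown above):

# Returns True on every cluster node once the CSV is online
Test-Path C:\ClusterStorage\Volume1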

  4. Create a folder on this volume & write some data:

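For example (the folder name matches the docker command in the next step, while the file name is just illustrative):

# Create a folder on the CSV volume and write a small test file
New-Item -ItemType Directory -Path C:\ClusterStorage\Volume1\ContainerData
Set-Content -Path C:\ClusterStorage\Volume1\ContainerData\test.txt -Value "Hello from CSV"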

  5. Start a container with data volume linked to the shared path above:

This assumes you have installed Docker & able to run containers. Start a container with data volume:

docker run -it --name demo -v C:\ClusterStorage\Volume1\ContainerData:G:\AppData nanoserver cmd.exe


Once started, the application inside this container will have access to “G:\AppData”, which will be shared across multiple nodes. Multiple containers started with this syntax can get read/write access to this shared data.

Inside the container, G:\AppData will then be mapped to the CSV volume’s “ContainerData” folder. Any data stored on “C:\ClusterStorage\Volume1\ContainerData” will then be accessible to the application running inside the container.

2.2 Data Volumes with SMB Global Mapping (Available in Windows Server RS3 Only)

Now what if the container fabric needs to scale independently of the storage cluster? Typically, this is possible through SMB share remote access. With containers, wouldn’t it be great to support container data volumes mapped to a remote SMB share?

In Windows Server RS3, there is a new support for SMB Global Mapping which allows a remote SMB Share to be mapped to a drive letter. This mapped drive is then accessible to all users on the local host. This is required to enable container I/O on the data volume to traverse the remote mount point.

With Scaleout File Server created on top of the S2D cluster, the same CSV data folder can be made accessible via an SMB share. This remote SMB share can then be mapped locally on a container host by using the new SMB Global Mapping PowerShell cmdlet.

Caution: When using SMB global mapping for containers, all users on the container host can access the remote share. Any application running on the container host will also have access to the mapped remote share.

Configuration Steps

Consider this example (assumes you have Docker & container orchestrator of your choice already installed):

  1. On the container host, globally map the remote SMB share:

$creds = Get-Credential

New-SmbGlobalMapping -RemotePath \\contosofileserver\share1 -Credential $creds -LocalPath G:

This command will use the credentials to authenticate with the remote SMB server and then map the remote share path to the G: drive letter (it can be any other available drive letter). Containers created on this container host can now have their data volumes mapped to a path on the G: drive.


  2. Create containers with data volumes mapped to the local path where the SMB share is globally mapped, as in the example below.

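A possible command, analogous to the CSV example above (the ContainerData folder on the mapped share and the G:\AppData1 path inside the container are assumptions for illustration):

# Map a folder on the globally mapped share (G:) into the container
docker run -it --name demo -v G:\ContainerData:G:\AppData1 nanoserver cmd.exe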

Inside the container, G:\AppData1 will then be mapped to the remote share’s “ContainerData” folder. Any data stored on globally mapped remote share will then be accessible to the application running inside the container. Multiple containers started with this syntax can get read/write access to this shared data.

This SMB global mapping support is an SMB client-side feature which can work on top of any compatible SMB server, including:

  • Scaleout File Server on top of S2D or Traditional SAN
  • Azure Files (SMB share)
  • Traditional File Server
  • 3rd party implementation of SMB protocol (ex: NAS appliances)

Caution: SMB global mapping does not support DFS, DFSN, DFSR shares in Windows Server RS3.

2.3 Data Volumes with CSV + Traditional SANs (iSCSI, FCoE block devices)

In Windows Server 2016, container data volumes are now supported on top of Cluster Shared Volumes (CSV). Because CSV already works with most traditional block storage devices (iSCSI, FCoE), mapping container data volumes to CSV lets you reuse your existing storage topology for your container persistent storage needs.


Clustering and High-Availability

Best Practices in Data Storage (Part 1): What Types of Data Should be Retained?

Even if you analyze your data in real time, storing data for extended periods is important for compliance and other reasons. But what types of data should be retained and how long should you keep it? Keep reading for some insights on data storage.


Data Analytics Requires Data Storage

These days, real-time data analytics should be the foundation of most organizations’ approach to working with data. (For more on just how important real time has become, check out Syncsort’s Hadoop Market Adoption Survey report, which explains best practices for data management and analysis.) But that doesn’t mean that you should interpret data as it streams in, then delete it forever.

On the contrary, keeping data around for a while – even after you’ve interpreted it – is important. It helps keep you compliant by ensuring that data remains available for audits or other reviews. It also provides you an opportunity to review historical data to identify long-term trends, or investigate incidents that you may not discover until long after the data related to them has been generated and processed.


Types of Data to Retain

The first step in building an effective data storage policy is to answer the question: Which types of data should I store for an extended period, and which can I delete instantly?

The short answer is that, to the extent possible, you should retain as much data as your storage capacity can support.


But since most organizations must prioritize some data types for long-term data storage, here’s a general hierarchy that outlines which types of data to keep on hand. The data at the top of the list is the most important to store for as long as possible, while the data at the bottom is least important:

  1. Data that is required to be retained by compliance or regulatory policies. If you’re required by law to store a certain type of data, you should definitely keep that data around.
  2. Data that relates to your customers and helps you engage with them by achieving “customer 360.” Understanding your customers is hard, and you don’t want to give up the data that helps you with that challenge.
  3. Business documents, contracts and so on. These are important to store for as long as possible.
  4. Data that is generated by everyday business operations but is not regulated. This data can be helpful to have on hand for historical reviews or planning purposes, but it’s not essential.
  5. Machine data generated by your networking equipment, servers, sensors or other types of automated sources. Machine data tends to be the least useful type of data to store long term. It is sometimes useful to be able to review machine data when researching a technical incident or planning infrastructure expansions, but for the most part, machine data is only useful in real time, because the state of your infrastructure changes so quickly.

The exact types of data to prioritize for long-term storage will vary from organization to organization, of course. This hierarchy is just a general guide.

But there are practical limitations on how long you can store data. In tomorrow’s blog, we’ll discuss just how long you need to keep it.

Discover the new rules for today’s data landscape – Download this eBook today!



Syncsort + Trillium Software Blog

Hard Disk Storage is Cheaper than Ever: Why Do You Still Use Tape?

Which commodity has for the past several decades, consistently improved in quality while substantially declining in price? If you answered hard disk storage, you’re correct. Storage on hard drives today is much less expensive than it once was, and it performs much better. This means that if your storage strategy dates from a decade ago, it may be time to rethink it.

Allow me to reminisce a bit. My first computer, an IBM PS/2 that my grandparents gave me circa 1992, came with a hard disk that sported 256 megabytes – yes, megabytes – of storage.

After a couple of years of filling up that disk with DOS freeware games, I ran out of space. So, I diligently saved my birthday money and bought a new, 500-megabyte hard disk. It cost about $200, which easily made it the biggest purchase of my life.

Fast forward to the present and that money would easily get you 5 terabytes of storage – which, for those of you keeping track at home, is roughly 10,000 times as much storage as I had in the heyday of my PS/2. And that’s not even adjusting for inflation!


Disk Storage is Cheaper than Ever – and It’s Getting Cheaper Still

My experience was not an anomaly. Disk storage has steadily and dramatically decreased in cost over the past several decades.

If you want to see just how much, check out this fascinating page. It notes that data cost about $193,000 per gigabyte in the early 1990s, compared to $0.07 in 2009. That’s a pretty stunning decline. If only the price of other commodities – like oil, food and gold – had followed the same pattern, we’d all be feeling pretty rich today.


Disk Storage is Also Better

And that’s not all. In addition to declining in cost, data storage has also become much better, no matter how you measure it.

Read/write rates have improved dramatically. Data loss and file system corruption are no longer common problems (OK, that improvement has more to do with better software than better hardware, but the hardware helps).

Innovations like solid-state disks make it possible to toss your laptop out a window, while streaming data to disk, with a decently good chance that your files and hard drive will all come out OK (for the record, I decline all responsibility should you choose to test this).

According to Wikipedia, modern hard drives last about 11 times as long as they did when they were first introduced.

Related: Archiving Your Data for Regulatory Compliance with Hadoop and DMX-h

Making the Most of Modern Storage

Unless you live under a rock, you probably are already aware of everything I have noted above. You may therefore be wondering why this is relevant.

The reason is that it’s a reminder that if you are still clinging to legacy data storage media like tape archives, it’s high time you looked into replacing them with hard disks.

Yes, tape drives had their uses. They provided an affordable form of high-volume data storage at a time when large hard disks were just out of reach for most people. Their legacy will always live on in the venerable Unix tar command.

But today, tape drives no longer cut it in most respects when compared to hard drives. Drive-for-drive, the storage capacity of tapes and hard disks is about the same. Hard disks’ life expectancy is at least as good.

The one place where hard drives come up a bit short when compared to tape is read/write speed. Tapes generally do better on this front – but solid-state disks are closing the gap.

And no matter how you slice it, hard disks are much cheaper gigabyte-for-gigabyte. That’s the biggest kicker.


Modernizing Your Storage with Syncsort

At this point, you may be thinking, “Hard disks sound great, but how do I actually move my data from tape to hard disk?”

That’s a fair question to ask. Moving data from tape to disk is not typically a drag-and-drop affair. You have to run a bunch of arcane commands (like the ones IBM summarizes here). And the transfer will take a long time if you have a lot of data.

A better approach is to take advantage of automated data migration solutions like Syncsort’s. Products like DMX-h handle the dirty work of tape-to-disk transfer for you. They allow you to ingest data automatically, in real time, from mainframe tape storage to a Hadoop environment running on disk.

With solutions like DMX-h, there’s really no excuse not to migrate to hard disks these days. Hard disks are smaller, more reliable and cheaper – and Syncsort makes the data migration as simple as doing any other kind of data transfer. Take the time to browse Syncsort’s available products.


Syncsort blog

New Features – Dynamics 365 Data Storage

With the upgrade to Dynamics 365, Microsoft has doubled the basic data storage included with your subscription. Previously, Dynamics CRM Online subscriptions came with a basic 5 GB that you could use across all instances, including production instances and sandboxes. That 5 GB could be used up pretty quickly, and we helped clients maximize their storage usage in many different ways. With Dynamics 365 CRM, the default storage limit was increased to 10 GB across all instances within a tenant. In addition, organizations can add more storage in two ways. First, they can purchase additional storage from the Office 365 portal at an additional cost per month; today that cost is $9.95 per GB per month. Second, additional storage is earned through licensing – an additional 5 GB of storage is added for every 20 full user licenses. For example, a tenant with 100 full user licenses would receive the 10 GB default plus 25 GB of additional storage, for 35 GB in total. There is no limit on this additional free storage (previously it was limited to 50 GB), up to the technical limit of 5 TB.

via Microsoft Dynamics 365 Licensing Guide

The Data Hogs

While 10 GB of storage seems like a lot of data at first, over the years that data will be consumed, sometimes way faster than might be expected. We have identified several data hogs that should be monitored.

One of these factors is auditing. Lots of companies turn on auditing for the entire CRM instance without thinking of the ramifications of audit logs. If auditing is turned on for too many CRM entities, the audit logs quickly become very large. If you choose to use auditing, then plan to audit only the most relevant entities and keep an eye on those audit logs.

Another factor leading to overuse of storage is workflows. An incorrectly created workflow can result in thousands of workflows in a suspended state; these suspended workflows take up lots of storage as well as slowing your CRM system down. Logs full of completed workflows also take up lots of space that can be recovered.

However, the leading factor that drives data usage is attachments – both email attachments and attachments to notes. While we like to encourage users to attach their relevant emails into CRM, attaching too many emails with large attachments can quickly eat up the available storage. Monitoring large attachments and moving or removing them when they are no longer needed helps keep storage usage under control.

What happens when you use too much storage?

When your system consumes 80% of available storage, System Administrators will begin to receive frequent email notifications as well as nag screens when they access CRM. I have never found a way to shut these off and they do become annoying. The only cure is to reduce the storage used below 80% or purchase additional storage through the Office 365 Administration portal.

So how do I reduce my database usage?

In our next blog, we will talk through several tools and strategies for minimizing and controlling data usage. In the meantime, if you have any questions on managing your storage or licensing Dynamics 365, call our sales team at 1844-264-0729. We love helping you make the best use of your CRM system.


CRM Software Blog | Dynamics 365

Storage Considerations for Dynamics 365 Attachments

Attachments in Dynamics 365 generally consist of important Microsoft Office documents, image files, PDF files and other crucial files for your organization. What you may not know is that you can save money and reduce your Dynamics 365 data usage by storing your attachments outside of CRM. This post will let you know how to do just that and teach you how to change the size limit for any attachments you intend to store inside of CRM.


An often overlooked feature of CRM is the ability to alter the file size limit for attachments. To do this, navigate to System Settings and click on the Email tab. Here you will find a section called ‘Set file size limit for attachments’ in which you can set the maximum file size for attachments in kilobytes. The default size limit for attachments is 5,120 KB (5 MB) but can be increased to 32,768 KB (32 MB) for older versions of CRM and 131,072 KB (128 MB) for newer versions of CRM. This is great for any business that regularly deals with large attachments and needs to ensure that they can be stored in CRM.

Attachments are stored inside the CRM database and any extra storage is priced at a premium rate of $9.99 USD per GB while SharePoint is much cheaper at around $0.20 USD for every extra GB of storage. It is therefore important to consider moving your attachments from the CRM database into a document management and storage system such as SharePoint to save money. A good way to do this is by using the Attachment Extractor for Microsoft Dynamics 365 by mscrm-addons. This tool allows you to quickly extract your attachments to SharePoint and makes great use of the native Dynamics 365 SharePoint integration so you can easily attach and surface SharePoint documents from within CRM.

By following these tips your organization won’t have to pay escalating costs for additional data storage in order to continue making changes to CRM. Don’t forget that you can check how much space your CRM organization is using under the ‘Resources in Use’ section of the administration area. Monitoring this section is a great way to prevent your cloud storage limit from maxing out, so you can continue to make changes to CRM without issues.


Magnetism Solutions Dynamics CRM Blog

ASCP cures storage ailment with digital library, archive management software


Imagine a pathologist in a lab coat, hunched over a microscope looking at a slide, trying to figure out if a patient is infected with a new strain of influenza or the common virus making the rounds this season. Sounds like something out of the 1950s, right?

A few years ago, that could have been the situation at the American Society for Clinical Pathology. But after running out of physical storage space and running into typical glass-slide issues, such as loss, breakage and degradation over time, these medical professionals crucial to U.S. public health decided to digitize the glass slides and implement archive management software. The result was a more usable, accessible and quickly searchable archive of the thousands of samples ASCP has built up.

The problems with storing glass slides are both obvious and unique — they are prone to being lost or damaged, and the iodine dye that is used can fade over time, compromising the educational value of the slide.

Digitizing eases cumbersome process

“It was an inventory and logistic nightmare,” said Ron Swan, CTO at ASCP, based in Chicago, referring to the cumbersome process of mailing out binders of slides and paperwork that corresponded to them. “Some products would be mailed to keep, but others were part of an assessment process where we’d see if a lab could accurately diagnose diseases. They would get a box of slides with a check sheet and be expected to ship them back to us. In a lot of cases, the slides got lost or broken.”

In 2012, ASCP started digitizing its existing slides and storing them on Alfresco’s enterprise content management (ECM) platform. ASCP uses the cloud-based platform to store most of the high-resolution images, because the digitized slides can be as large as 3 GB per image. The move toward digital storage was also due to ASCP moving to a new location, and the cost and resources dedicated to storing and maintaining the glass slides was no longer efficient.

“We have to pay to store the slides in a controlled condition,” Swan said. “Over time, the stain will fade and the integrity will be compromised. A lot of this educational inventory was at risk of fading and not being viable as an educational product.”

More recently, ASCP established an online tool through Alfresco, called ASCPedia, which allows pathologists to network and contribute slides and lab material — all of which goes through a thorough vetting process, according to Swan. The archive management software allows for easy searching among the vast amount of slides.

Digital library serves to educate

“A lot of our new contributions have been digitized images,” Swan said. “We have a lot of volunteers and members that are certified pathologists and will work with whole slide images that have been digitized. Surgical pathologists will wait for a biopsy, freeze it, slice it, stain it and immediately digitize it.”

And while the library of slides isn’t used for diagnosing patients a pathologist is currently seeing, it acts as a useful resource for the pathologist or other laboratory professionals to view similar cases that are digitally stored.


“The FDA [Food and Drug Administration] doesn’t approve primary diagnosis through digital pathology yet; they still use a microscope for that,” Swan said. “But if I’m a pathologist and choose to brush up on kidney abnormalities, for example, I could go to ASCPedia and search by discipline or by organ.”

While many ECM and archive management software companies exist, including Laserfiche, Dell EMC, Oracle and IBM, Swan and ASCP were drawn to Alfresco due to its open source capabilities and Swan’s familiarity with its script.

Hybrid cloud stores large images

“We give enterprises the ability to control their own destiny and not be beholden to a technical vendor,” said Thomas DeMeo, vice president of product management for Alfresco.

While digitizing the lab slides solved the problems surrounding lost or broken slides, Alfresco was paramount in helping ASCP with storing and transferring the digitized lab material, which often took up substantial digital real estate. ASCP uses a hybrid cloud to store larger files, while keeping on-premises storage space clear.

“Alfresco could act as a single platform and is a repository for not only regular documents, but high-end resolution images,” DeMeo said.

There are some areas where Swan and those at ASCP see room for improvement, specifically when it comes to natural language processing — although, that can be difficult in a jargon-filled field like pathology.

“Some of the medical taxonomy — your normal spell-checker won’t work,” Swan said, adding that mobile capabilities could be improved as well, but primarily for tablets due to resolution restrictions. “We’re not looking at taking this down to the level of a phone — it’s not practical. Due to the size of the image, pathology is visual, and no one is pulling out their five-inch-screen iPhones.”

Alfresco declined to comment regarding pricing, other than directing those interested to its pricing page on the Alfresco website.



ECM, collaboration and search news and features

On-premises CRM Maximum Storage Error


Are you running CRM On-premises (not CRM Online) and receiving the following error when attempting to save a record? Today’s blog will go over why this error is occurring and what the solution is.

Here is what the error looks like:

Screenshot: the on-premises CRM maximum storage error dialog

And here is the log file:

<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body><s:Fault><faultcode>s:Client</faultcode><faultstring xmlns:xml="http://www.w3.org/XML/1998/namespace" xml:lang="en-US">Your subscription has the maximum amount of storage available. For additional storage, please contact our sales organization at 1-877-CRM-CHOICE (276-2464).</faultstring><detail><OrganizationServiceFault xmlns="http://schemas.microsoft.com/xrm/2011/Contracts"><ErrorCode>-2147176361</ErrorCode><ErrorDetails /><Message>Your subscription has the maximum amount of storage available. For additional storage, please contact our sales organization at 1-877-CRM-CHOICE (276-2464).</Message><Timestamp>2016-09-07T13:07:59.6346654Z</Timestamp><InnerFault><ErrorCode>-2147176361</ErrorCode><ErrorDetails /><Message>Your subscription has the maximum amount of storage available. For additional storage, please contact our sales organization at 1-877-CRM-CHOICE (276-2464).</Message><Timestamp>2016-09-07T13:07:59.6346654Z</Timestamp><InnerFault><ErrorCode>-2147176361</ErrorCode><ErrorDetails /><Message>Your subscription has the maximum amount of storage available. For additional storage, please contact our sales organization at 1-877-CRM-CHOICE (276-2464).</Message><Timestamp>2016-09-07T13:07:59.6346654Z</Timestamp><InnerFault><ErrorCode>-2147220970</ErrorCode><ErrorDetails /><Message>System.Data.SqlClient.SqlException: Microsoft Dynamics CRM has experienced an error. Reference number for administrators or support: #F1ECD333</Message><Timestamp>2016-09-07T13:07:59.6346654Z</Timestamp><InnerFault xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:nil="true" /><TraceText xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:nil="true" /></InnerFault><TraceText xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:nil="true" /></InnerFault><TraceText xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:nil="true" /></InnerFault><TraceText xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:nil="true" /></OrganizationServiceFault></detail></s:Fault></s:Body></s:Envelope>

The Solution

The likely cause of this issue is that the CRM organization database is set to “Restricted Growth.” By clearing this option, the issue should be resolved.

Rather than manually searching for and changing this value, running the following script (replacing <DB> with the full name of the SQL database, including _MSCRM) should set the value as needed.

ALTER DATABASE <DB> MODIFY FILE (NAME = mscrm, MAXSIZE = UNLIMITED)
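
If you prefer to run this from PowerShell, the following is a minimal sketch; it assumes the SqlServer module is installed and uses placeholder server and database names:

# Remove the size cap on the CRM organization database (names are placeholders)
Invoke-Sqlcmd -ServerInstance "<SQLServerName>" -Query "ALTER DATABASE [<OrgName>_MSCRM] MODIFY FILE (NAME = mscrm, MAXSIZE = UNLIMITED)"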

As always, it is recommended to do a SQL backup or full system image prior to making these types of changes.

For additional tips and tricks, visit our daily blog. And as always, if you need additional CRM assistance for Microsoft Dynamics 365, don’t hesitate to give us a call!

Happy CRM’ing!


PowerObjects- Bringing Focus to Dynamics CRM

AWS cuts S3 standard storage prices by up to 28%


Public cloud market leader Amazon Web Services (AWS) today announced changes to the pricing structure for its Glacier cold storage service, as well as price cuts for its widely used S3 standard cloud storage.

For S3 standard storage, prices are going down as much as 28.13 percent and will be as low as 2.1 cents per GB of data per month, effective December 1, AWS chief evangelist Jeff Barr wrote in a blog post. That price is only available when storing more than 500 TB and in AWS’ US East (Northern Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) data center regions.

On the Glacier side, AWS will no longer charge customers based on the rate at which they retrieve data. Instead, there are new Standard, Expedited, and Bulk tiers. Standard is what was always available from Glacier, and generally allows customers to retrieve data in 3-5 hours for 1 cent per GB and 5 cents for every thousand requests. The Expedited tier, which generally provides data in 1-5 minutes, costs 3 cents per GB and 1 cent per request. (Additionally, customers can pay $100 a month for provisioned capacity in high-demand periods for at least three expedited retrievals every five minutes with no more than 150MB/second of throughput.) And the Bulk tier will generally take 5-12 hours and come out to a quarter of a cent per GB and 2.5 cents for every thousand requests.

AWS regularly cuts cloud prices when it racks up more customers and data; an Elastic Block Store (EBS) price cut was announced in August. Other public cloud infrastructure providers also cut their prices from time to time.

Competitor Google Cloud Platform recently introduced new cold storage tiers.

Earlier today, AWS announced a price cut for Custom Metrics through its CloudWatch monitoring service.


Big Data – VentureBeat

IBM launches new Power8 cloud systems, storage, and software

IBM aims to enable hybrid clouds with the announcement today of cloud-based Power Systems, storage systems, and new software.

Big Blue is unveiling the products and services at its IBM Edge event in Las Vegas today, all aimed at simplifying the movement of data, applications, and services across a hybrid cloud environment. The new systems use the company’s Power8 processors.

Organizations across all industries are investing in cloud technologies for innovation, growth, and efficiency. Those customers are blending public cloud, private cloud, and traditional information technology platforms. IBM wants to deepen the integration across the hybrid environments.

“Today’s business environment is very dynamic and filled with disruption,” said Tom Rosamilia, senior vice president of IBM Systems, in a statement. “A hybrid cloud model enables clients to continuously adapt while also optimizing on-premises investments. IBM is uniquely able to support the flexibility clients need across IBM Systems and the cloud.”

Image: IBM has new cloud analytics. (Image Credit: IBM)

The new systems offerings designed for hybrid cloud include Power Systems for the cloud. With integrated OpenStack-based cloud management and elastic consumption models, these offerings enable clients to transform their IT infrastructure to an on-premises, private cloud for AIX, IBM i, and Linux system of record workloads and extend them securely with rapid access to compute services in the IBM Cloud.

IBM is also launching Z Systems for cloud with Operational Insights and Hybrid Application Performance Management. Cloud-based services analyze Z System performance, provide recommended actions, and enable continuous benchmarking.

IBM is also unveiling IBM Spectrum Copy Data Management and Protect, which allows detailed, easy-to-use management of data copies. Additionally, IBM Spectrum Protect has expanded its extensive hybrid cloud solution integration with cloud object storage options for use in hybrid cloud deployments.

According to IBM, a survey showed that 92 percent of respondents said the most successful cloud projects enable the creation and support of new business models. They expect 45 percent of workloads to remain on dedicated on-premises systems, even as cloud use expands. And 83 percent of the highest performing organizations said cloud initiatives are integrated or highly coordinated.

IBM said it also has a number of new software solutions and partners for an open and collaborative software ecosystem in the cloud. IBM is partnering with Hortonworks, Mirantis, Nginx, and Red Hat.



Big Data – VentureBeat