Tag Archives: model

Change your Power BI report to point to an external SSAS model

A common question I get is how to change the connection string of a report so that it points to SSAS after moving the Power BI Desktop model into SSAS. It turns out this is actually pretty simple, as there is an API that allows you to copy a report and then bind it to ANY dataset. Let's walk through this.

I start with a very simple Power BI desktop file that contains a single table that I imported from SQL:

[Screenshot: the imported table in Power BI Desktop]

I then uploaded it to Power BI and added it to its own app workspace.

Then I created a report that points to my local SSAS instance and uploaded it to the same workspace. I do this to make sure I have a dataset in Power BI that points to SSAS; if you already have such a dataset present, you can skip this step. Of course, you have to set up a data gateway when using a local SSAS instance.

So now I have 2 datasets in my workspace (I use the same workspace but you can also have them live in different ones):

[Screenshot: the two datasets in the workspace]

SalesIt is the dataset that points to my SSAS instance, and SalesPBI is the embedded data model. I also have the two reports:

[Screenshot: the two reports in the workspace]

Now here comes the magic. I am using a PowerShell script created by my colleague Sirui that allows you to copy a report and bind the copy to ANY dataset in Power BI. You can find the script here. The key thing is to make sure the two schemas are the same: they need to have the same tables, columns, measures, and so on, otherwise the report will throw errors. In my example the target wasn't actually an imported model but a SQL Server 2016 RTM model with the same schema, and that also works.

OK, now for the PowerShell script; it does two things:

1. It creates a copy of the report.

2. It binds the new report to a dataset provided in the script.

The script uses the Power BI Clone API to clone the report and rebind it.

First we need to configure the script. I created a new "app" on https://dev.powerbi.com/apps as described in the script to get a client ID, and set the name for my new report to "SalesNowIt". Next I gathered all the IDs needed for the script to run, such as the report and group IDs. The script has step-by-step instructions.
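To give a sense of what the script does under the hood, here is a minimal, hedged sketch of the clone-and-bind call against the Power BI REST API. The variable names are placeholders, and obtaining the OAuth access token (which the script handles through the app registration above) is assumed:

```powershell
# Minimal sketch: clone a report and bind the copy to a different dataset.
# $accessToken, $groupId, $reportId, and $targetDatasetId are assumed to be populated already.
$headers = @{ Authorization = "Bearer $accessToken" }

$body = @{
    name          = "SalesNowIt"       # name of the cloned report
    targetModelId = $targetDatasetId   # dataset the clone should be bound to
} | ConvertTo-Json

Invoke-RestMethod -Method Post `
    -Uri "https://api.powerbi.com/v1.0/myorg/groups/$groupId/reports/$reportId/Clone" `
    -Headers $headers -ContentType "application/json" -Body $body
```

The actual script adds error handling and token acquisition around this core call; the sketch only shows the shape of the request.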

After configuring it, I just run the PowerShell script (no additional changes necessary) and see a new report showing up:

[Screenshot: the new report in the workspace]

And now when I run the report that previously pointed to my embedded model it still works:

[Screenshot: the report rendering as before]

But when running profiler I see queries going to my local SSAS instead of the embedded model:

[Screenshot: Profiler showing queries against the local SSAS instance]

So that's it: pretty straightforward and very useful. Of course, you can extend this PowerShell script yourself to do whatever you want, for example, loop through all the reports in a workspace and rebind them all.
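As a rough illustration of that idea, the loop could look something like the following sketch. Again, the token and IDs are assumed to be in place, and every report in the workspace must match the target dataset's schema:

```powershell
# Sketch: rebind every report in a workspace to one target dataset.
# $accessToken, $groupId, and $targetDatasetId are placeholders.
$headers = @{ Authorization = "Bearer $accessToken" }

$reports = (Invoke-RestMethod -Method Get `
    -Uri "https://api.powerbi.com/v1.0/myorg/groups/$groupId/reports" `
    -Headers $headers).value

foreach ($report in $reports) {
    $body = @{ datasetId = $targetDatasetId } | ConvertTo-Json
    Invoke-RestMethod -Method Post `
        -Uri "https://api.powerbi.com/v1.0/myorg/groups/$groupId/reports/$($report.id)/Rebind" `
        -Headers $headers -ContentType "application/json" -Body $body
}
```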


Kasper On BI

Announcing the Modern Servicing Model for SQL Server

Background to SQL Server servicing

Historically, we have released Cumulative Updates (CUs) every 2 months after a major version is released, and roughly yearly Service Packs (SPs), containing fixes from all previous CUs, plus any feature completeness or supportability enhancements that may require localization. You can read more about the SQL Server Incremental Servicing Model (ISM) here.

Up to and including SQL Server 2016, RTM and any subsequent SPs establish a new product baseline. For each new baseline, CUs are provided for roughly 12 months after the next SP releases, or at the end of the mainstream phase of product lifecycle, whichever comes first.

For the entire product lifecycle, we release General Distribution Releases (GDRs) when needed, containing only security related fixes.

The Modern Servicing Model

Starting with SQL Server 2017, we are adopting a simplified, predictable mainstream servicing lifecycle:

  • SPs will no longer be made available. Only CUs, and GDRs when needed.
  • CUs will now accommodate localized content, allowing new feature completeness and supportability enhancements to be delivered faster.
  • CUs will be delivered more often at first and then less frequently: every month for the first 12 months, and every quarter for the remaining 4 years of the full 5-year mainstream lifecycle.
  • CUs will be delivered in the same week of the month: the week of the 3rd Tuesday.

Note: the Modern Servicing Model (MSM) only applies to SQL Server 2017 and future versions.

Servicing lifecycle

The servicing lifecycle is unchanged from SQL Server 2016:

  • Years 0-5 (Mainstream Support): Security and functional issue resolution through CUs. Security issues through GDRs.
  • Years 6-10 (Extended Support): Security or critical functional issues.
  • Years 11-16 (Premium Assurance): Optional paid extension to Extended Support (no scope change).

FAQ

Questions are expected; please check below to see whether yours is already covered in this FAQ.

Q1: SPs were fully localized, and you released one update file for every supported language. How will this be handled with no SPs?
A1: CUs will be localized starting with SQL Server 2017, handling this requirement while maintaining a single update file.

Q2: When we upgraded from a previous version of SQL Server, we did so at SP1 using slipstream media provided by Microsoft. How will this work with no SPs?
A2: We will provide CU-based slipstream media for CU12, allowing for this.

Q3: My company always waited for SP1 to perform an upgrade from a previous version. What are my options now?
A3: Even before GA, the final SQL Server 2016 CTP versions were considered production-ready having gone through exhaustive testing both internally and with many preview customers. So there is no need to wait for an SP to install the latest SQL Server – you can install confidently as soon as a given version goes GA.
With that, you can still target any CU for Upgrade. For example, you could target CU12 for upgrade, and have slipstream media available.

Q4: I service an instance only with GDRs. I do not apply CUs, but apply SPs. Will I need to move to a CU servicing train if I need a non-critical/security fix?
A4: Yes. While this was previously true only leading up to SPs, now you must apply the latest CU, and there will not be an opportunity to reset back to receiving GDR updates only.

Q5: Assume that after Mainstream Support, you release a security fix. Are these going to be GDRs only? If so, how can I install it, if I’m already on a CU servicing train?
A5: During Extended Support, we will release GDRs and GDR-CUs separately. The same is valid for customers that purchase the additional Premium Assurance.

Q6: Previously, once SP2 was released (for example), if I was on the RTM baseline I would have to upgrade to SP1 or SP2 to get a hotfix. How will this work now?
A6: The only baseline will be RTM, and it will receive CUs for 5 years. There is no need to upgrade to an SP to receive CUs, or to worry about which baseline a CU applies to.

Q7: If I am on RTM baseline, and CU20 (for example) was just released, will I receive technical support?
A7: This may be handled on a case by case basis. If the issue/question is in an area that has received a significant number of updates throughout the years, you may be asked to update to a later CU, yes.

Q8: Will SQL Server on Linux receive CUs and GDRs as well?
A8: Yes, every CU and GDR will have corresponding updates to all current Linux platforms.

Q9: Will CU and GDR KB articles then cover both SQL Server on Windows and Linux?
A9: Yes. Issues addressed in each release will be categorized by impacted platform(s).

Q10: Will SQL Server for Linux CUs and GDRs be updates to an existing installation like SQL Server on Windows?
A10: No, SQL Server on Linux updates will completely replace all binaries in the existing installation.

Q11: On SQL Server on Linux, can I remove an update?
A11: Yes, however this operation is performed by re-installing any desired previous servicing level package.

Q12: Will the testing and resulting quality levels of CUs be the same as SPs?
A12: Yes. CUs for all versions of SQL Server are tested to the same level as Service Packs. As announced in January 2016, you should plan to install a CU with the same level of confidence you plan to install SPs as they are released. You can read more about that here.

Q13: Monthly CU releases are fast, I do not believe my business can keep pace with this, yet you have been proactively encouraging customers to stay current.
A13: Yes, the cadence is fast for the first 12 months. However, with a monthly cadence each CU's payload should in theory be roughly half the size, so these should be easier to consume. Of course, you still have the option to install every other CU, for example, for the first 12 months. As the name suggests, all CUs are cumulative.

Q14: Why release CUs every month only for the first year, then move to quarterly updates for the remaining 4 years?
A14: Data shows that the vast majority of all hotfixes issued for a major release occur in the first 12 months. The monthly cadence brings these fixes to customers much faster, when they have the most impact. Reducing to quarterly updates reduces customer and operational overhead over the course of the remaining 4 years.

Q15: Will the availability of CUs remain unchanged?
A15: For SQL Server on Windows CUs, no changes are planned. The most recent CU will be available on the Download Center, Windows Catalog, and WSUS. Previous CUs will be available in the Windows Catalog.

Q16: Where will I look for SQL Server on Linux CUs and GDRs?
A16: All updates, current and previous, will be maintained and available in repositories.

Q17: I see that Reporting Services (SSRS) is no longer installed by Setup. Where is it and how will it be serviced?
A17: RS is available for download via a link in Setup. Servicing will be independent moving forward.


SQL Server Release Services

New season, New software, New servicing model

Greetings. It feels like a lot of change is in the air! Last Friday marked the autumnal equinox. Depending upon which hemisphere you live in, the changes you see with the new season will differ: people living in the northern hemisphere welcome fall, while folks in the southern hemisphere welcome spring.

If you are working in the database field, you could not have missed the announcements of astronomical proportions coming from the Ignite conference this week. Specifically, for SQL Server, Scott Guthrie and Rohan Kumar announced the general availability of SQL Server 2017. You can read the complete announcement from Rohan Kumar at Microsoft for the Modern Data Estate. Lots of customers, users, and fans of SQL Server will be excited to deploy it across different platforms and experience the amazing new features introduced.

While the engineering team was busy getting ready to release the product, many engineers from the support team participated in reviewing product behavior, providing feedback, filing bugs, tracking changes, getting trained on new technologies, and so on. The support team is all geared up and ready to work with customers who will start deploying the new release.

A new product release provides an opportunity to innovate in how we service the product as well. You might have seen the announcements from SQL Server Release Services about the modern servicing model for SQL Server. My friend Pedro Lopes has blogged about this in great detail, along with a FAQ, at Announcing the Modern Servicing Model for SQL Server. Please take the time to read through all the details and information provided. This will help you prepare to keep your SQL Server install base healthy and up to date on patches, fixes, and improvements. When you are working with members of our support team you will hear about these changes, especially if you need a fix or change for the product.

Let us all welcome the new season, the new software and the new ways in which we will get updates! Looking forward to working with all of you on the new scenarios and possibilities the new software opens for all of us.

Suresh Kandoth / Pradeep M.M. / Arun Kumar K

[On behalf of all SQL Server support engineers, managers and escalation engineers]


CSS SQL Server Engineers

Courier Service Thrives Under Radical Business Model

When outspoken venture capitalist and Netscape co-founder Marc Andreessen wrote in The Wall Street Journal in 2011 that software is eating the world, he was only partly correct. In fact, business services based on software platforms are what’s eating the world.

Companies like Apple, which remade the mobile phone industry by offering app developers easy access to millions of iPhone owners through its iTunes App Store platform, are changing the economy. However, these world-eating companies are not just in the tech world. They are also emerging in industries that you might not expect: retailers, finance companies, transportation firms, and others outside of Silicon Valley are all at the forefront of the platform revolution.

These outsiders are taking platforms to the next level by building them around business services and data, not just apps. Companies are making business services such as logistics, 3D printing, and even roadside assistance for drivers available through a software connection that other companies can plug in to and consume or offer to their own customers.

There are two kinds of players in this business platform revolution: providers and participants. Providers create the platform and create incentives for developers to write apps for it. Developers, meanwhile, are participants; they can extend the reach of their apps by offering them through the platform’s virtual shelves.

Business platforms let companies outside of the technology world become powerful tech players, unleashing a torrent of innovation that they could never produce on their own. Good business platforms create millions in extra revenue for companies by enlisting external developers to innovate for them. It’s as if strangers are handing you entirely new revenue streams and business models on the street.

Powering this movement are application programming interfaces (APIs) and software development kits (SDKs), which enable developers to easily plug their apps into a platform without having to know much about the complex software code that drives it. Developers get more time to focus on what they do best: writing great apps. Platform providers benefit because they can offer many innovative business services to end customers without having to create them themselves.

Any company can leverage APIs and SDKs to create new business models and products that might not, in fact, be its primary method of monetization. However, these platforms give companies new opportunities and let them outflank smaller, more nimble competitors.

Indeed, the platform economy can generate unbelievable revenue streams for companies. According to Platform Revolution authors Geoffrey G. Parker, Marshall W. Van Alstyne, and Sangeet Paul Choudary, travel site Expedia makes approximately 90% of its revenue by making business services available to other travel companies through its API.

In TechCrunch in May 2016, Matt Murphy and Steve Sloane wrote that “the number of SaaS applications has exploded and there is a rising wave of software innovation in APIs that provide critical connective tissue and increasingly important functionality.” ProgrammableWeb.com, an API resource and directory, offers searchable access to more than 15,000 different APIs.

According to Accenture Technology Vision 2016, 82% of executives believe that platforms will be the “glue that brings organizations together in the digital economy.” The top 15 platforms (which include companies built entirely on this software architecture, such as eBay and Priceline.com) have a combined market capitalization of US$2.6 trillion.

It’s time for all companies to join the revolution. Whether working in alliance with partners or launching entirely in-house, companies need to think about platforms now, because they will have a disruptive impact on every major industry.


To the Barricades

Several factors converged to make monetizing a company’s business services easier. Many of the factors come from the rise of smartphones, specifically the rise of Bluetooth and 3G (and then 4G and LTE) connections. These connections turned smartphones into consumption hubs that weren’t feasible when high-speed mobile access was spottier.

One good example of this is PayPal’s rise. In the early 2000s, it functioned primarily as a standalone web site, but as mobile purchasing became more widespread, third-party merchants clamored to integrate PayPal’s payment processing service into their own sites and apps.

In Platform Revolution, Parker, Van Alstyne, and Choudary claim that “platforms are eating pipelines,” with pipelines being the old, direct-to-consumer business methods of the past. The first stage of this takeover involved much more efficient digital pipelines (think of Amazon in the retail space and Grubhub for food delivery) challenging their offline counterparts.

What Makes Great Business Platforms Run?


The quality of the ecosystem that powers your platform is as important as the quality of experience you offer to customers. Here’s how to do it right.

Although the platform economy depends on them, application programming interfaces (APIs) and software development kits (SDKs) aren’t magic buttons. They’re tools that organizations can leverage to attract users and developers.

To succeed, organizations must ensure that APIs include extensive documentation and are easy for developers to add into their own products. Another part of platform success is building a general digital enterprise platform that includes both APIs and SDKs.

A good platform balances ease of use, developer support, security, data architecture (that is, will it play nice with a company’s existing systems?), edge processing (whether analytics are processed locally or in the cloud), and infrastructure (whether a platform provider operates its own data centers and cloud infrastructure or uses public cloud services). The exact formula for which elements to embrace, however, will vary according to the use case, the industry, the organization, and its customers.

In all cases, the platform should offer a value proposition that’s a cut above its competitors. That means a platform should offer a compelling business service that is difficult to duplicate.

By creating open standards and easy-to-work-with tools, organizations can greatly improve the platforms they offer. APIs and SDKs may sound complicated, but they’re just tools for talented people to do their jobs with. Enable these talented people, and your platform will take off.

In the second stage, platforms replace pipelines. Platform Revolution’s authors write: “The Internet no longer acts merely as a distribution channel (a pipeline). It also acts as a creation infrastructure and a coordination mechanism. Platforms are leveraging this new capability to create entirely new business models.” Good examples of second-stage companies include Airbnb, DoubleClick, Spotify, and Uber.

Allstate Takes Advantage of Its Hidden Jewels

Many companies taking advantage of platforms were around long before APIs, or even the internet, existed. Allstate, one of the largest insurers in the United States, has traditionally focused on insurance services. But recently, the company expanded into new markets—including the platform economy.

Allstate companies Allstate Roadside Services (ARS) and Arity, a technology company founded by Allstate in late 2016, have provided their parent company with new sources of revenue, thanks to new offerings. ARS launched Good Hands Rescue APIs, which allow third parties to leverage Allstate’s roadside assistance network in their own apps. Meanwhile, Arity offers a portfolio of APIs that let third parties leverage Allstate’s aggregate data on driver behavior and intellectual property related to risk prediction for uses spanning mobility, consumer, and insurance solutions.

For example, Verizon licenses an Allstate Good Hands Rescue API for its own roadside assistance app. And automakers GM and BMW also offer roadside assistance service through Allstate.

Potential customers for Arity’s API include insurance providers, shared mobility companies, automotive parts makers, telecoms, and others.

“Arity is an acknowledgement that we have to be digital first and think about the services we provide to customers and businesses,” says Chetan Phadnis, Arity’s head of product development. “Thinking about our intellectual property system and software products is a key part of our transformation. We think it will create new ways to make money in the vertical transportation ecosystem.”

One of Allstate’s major challenges is a change in auto ownership that threatens the traditional auto insurance model. No-car and one-car households are on the rise, ridesharing services such as Uber and Lyft work on very different insurance models than passenger cars or traditional taxi companies, and autonomous vehicles could disrupt the traditional auto insurance model entirely.

This means that companies like Allstate are smart to look for revenue streams beyond traditional insurance offerings. The intangible assets that Allstate has accumulated over the years—a massive aggregate collection of driver data, an extensive set of risk models and predictive algorithms, and a network of garages and mechanics to help stranded motorists—can also serve as a new revenue stream for the future.

By offering two distinct API services for the platform economy, Allstate is also able to see what customers might want in the future. While the Good Hands Rescue APIs let third-party users integrate a specific service (such as roadside assistance) into their software tools, Arity instead lets third-party developers leverage huge data sets as a piece of other, less narrowly defined projects, such as auto maintenance. As Arity gains insights into how customers use and respond to those offerings, it gets a preview into potential future directions for its own products and services.


Farmers Harvest Cash from a Platform

Another example of innovation fueling the platform economy doesn’t come from a boldfaced tech name. Instead, it comes from a relatively small startup that has nimbly built its business model around data with an interesting twist: it turns its customers into entrepreneurs.

Farmobile is a Kansas City–based agriculture tech company whose smart device, the Passive Uplink Connection (PUC), can be plugged into tractors, combines, sprayers, and other farm equipment.

Farmobile uses the PUC to enable farmers to monetize data from their fields, which is one of the savviest routes to success with platforms—making your platform so irresistible to end consumers that they foment the revolution for you.

Once installed, says CEO Jason Tatge, the PUC streams second-by-second data to farmers’ Farmobile accounts. This gives them finely detailed reports, called Electronic Field Records (EFRs), that they can use to improve their own business, share with trusted advisors, and sell to third parties.

The PUC gives farmers detailed records for tracking analytics on their crops, farms, and equipment and creates a marketplace where farmers can sell their data to third parties. Farmers benefit because they generate extra income; Farmobile benefits because it makes a commission on each purchase and builds a giant store of aggregated farming data.

This last bit is important if Farmobile is to successfully compete with traditional agricultural equipment manufacturers, which also gather data from farmers. Farmobile’s advantage (at least for now) is that the equipment makers limit their data gathering to their existing customer bases and sell it back to them in the form of services designed to improve crop yields and optimize equipment performance.

Farmobile, meanwhile, is trying to appeal to all farmers by sharing the wealth, which could help it leapfrog the giants that already have large customer bases. “The ability to bring data together easily is good for farmers, so we built API integrations to put data in one place,” says Tatge.

Farmers can resell their data on Farmobile’s Data Store to buyers such as reinsurance firm Guy Carpenter. To encourage farmers to opt in, says Tatge, “we told farmers that if they run our device over planting and harvest season, we can guarantee them $2 per acre for their EFRs.”

So far, Farmobile’s customers have sent the Data Store approximately 4,200 completed EFRs for both planting and harvest, which will serve as the backbone of the company’s data monetization efforts. Eventually, Farmobile hopes to expand the offerings on the Data Store to include records from at least 10 times as many different farm fields.


Under Armour Binges on APIs

Another model for the emerging business platform world comes from Under Armour, the sports apparel giant. Alongside its very successful clothing and shoe lines, Under Armour has put its platform at the heart of its business model.

But rather than build a platform itself, Under Armour has used its growing revenues to create an industry-leading ecosystem. Over the past decade, it has purchased companies that already offer APIs, including MapMyFitness, Endomondo, and MyFitnessPal, and then linked them all together into a massive platform that serves 30 million consumers.

This strategy has made Under Armour an indispensable part of the sprawling mobile fitness economy. According to the company’s 2016 annual results, its business platform ecosystem, known as the Connected Fitness division, generated $80 million in revenue that year—a 51% increase over 2015.

By combining existing APIs from its different apps with original tools built in-house, extensive developer support, and a robust SDK, Under Armour gives third-party developers everything they need to build their own fitness app or web site.

Depending on their needs, third-party developers can sign up for several different payment plans with varying access to Under Armour’s APIs and SDKs. Indeed, the company’s tiered developer pricing plan for Connected Fitness, which is separated into Starter, Pro, and Premium levels, makes Under Armour seem more like a tech company than a sports apparel firm.

As a result, Under Armour’s APIs and SDKs are the underpinnings of a vast platform cooperative. Under Armour’s apps seamlessly integrate with popular services like Fitbit and Garmin (even though Under Armour has a fitness tracker of its own) and are licensed by corporations ranging from Microsoft to Coca-Cola to Purina. They’re even used by fitness app competitors like AthletePath and Lose It.

A large part of Under Armour’s success is the sheer amount of data its fitness apps collect and then make available to developers. MyFitnessPal, for instance, is an industry-leading calorie and food tracker used for weight loss, and Endomondo is an extremely popular running and biking record keeper and route-sharing platform.

One way of looking at the Connected Fitness platform is as a combination of traditional consumer purchasing data with insights gleaned from Under Armour’s suite of apps, as well as from the third-party apps that Under Armour’s products use.

Indeed, Under Armour gets a bonus from the platform economy: it helps the company understand its customers better, creating a virtuous cycle. As end users use different apps fueled by Under Armour’s services and data-sharing capabilities, Under Armour can then use that data to fuel customer engagement and attract additional third-party app developers to add new services to the ecosystem.

What Successful Platforms Have in Common

The most successful business platforms have three things in common: They’re easy to work with, they fulfill a market need, and they offer data that’s useful to customers.

For instance, Farmobile’s marketplace fulfills a valuable need in the market: it lets farmers monetize data and develop a new revenue stream that otherwise would not exist. Similarly, Allstate’s Arity experiment turns large volumes of data collected by Allstate over the years into a revenue stream that drives down costs for Arity’s clients by giving them more accurate data to integrate into their apps and software tools.

Meanwhile, Under Armour’s Connected Fitness platform and API suite encourage users to sign up for more apps in the company’s ecosystem. If you track your meals in MyFitnessPal, you’ll want to track your runs in Endomondo or MapMyRun. Similarly, if you’re an app developer in the health and fitness space, Under Armour has a readily available collection of tools that will make it easy for users to switch over to your app and cheaper for you to develop your app.

As the platform economy grows, all three of these approaches—Allstate’s leveraging of its legacy business data, Farmobile’s marketplace for users to become data entrepreneurs, and Under Armour’s one-stop fitness app ecosystem—are extremely useful examples of what happens next.

In the coming months and years, the platform economy will see other big changes. In 2016 for example, Apple, Microsoft, Facebook, and Google all released APIs for their AI-powered voice assistant platforms, the most famous of which is Apple’s Siri.

The introduction of APIs confirms that the AI technology behind these bots has matured significantly and that a new wave of AI-based platform innovation is nigh. (In fact, Digitalist predicted last year that the emergence of an API for these AIs would open them up beyond conventional uses.) New voice-operated technologies such as Google Home and Amazon Alexa offer exciting opportunities for developers to create full-featured, immersive applications on top of existing platforms.

We will also see AI- and machine learning–based APIs emerge that will allow developers to quickly leverage unstructured data (such as social media posts or texts) for new applications and services. For instance, sentiment analysis APIs can help explore and better understand customers’ interests, emotions, and preferences in social media.

As large providers offer APIs and associated services for smaller organizations to leverage AI and machine learning, these companies can in turn create their own platforms for clients to use unstructured data—everything from insights from uploaded photographs to recognizing a user’s emotion based on facial expression or tone of voice—in their own apps and products. Meanwhile, the ever-increasing power of cloud platforms like Amazon Web Services and Microsoft Azure will give these computing-intensive app platforms the juice they need to become deeper and richer.

These business services will depend on easy ways to exchange and implement data for success. The good news is that finding easy ways to share data isn’t hard and the API and SDK offerings that fuel the platform economy will become increasingly robust. Thanks to the opportunities generated by these new platforms and the new opportunities offered to end users, developers, and platform businesses themselves, everyone stands to win—if they act soon. D!


About the Authors

Bernd Leukert is a member of the Executive Board, Products and Innovation, for SAP.

Björn Goerke is Chief Technology Officer and President, SAP Cloud Platform, for SAP.

Volker Hildebrand is Global Vice President for SAP Hybris solutions.

Sethu M is President, Mobile Services, for SAP.

Neal Ungerleider is a Los Angeles-based technology journalist and consultant.


Read more thought provoking articles in the latest issue of the Digitalist Magazine, Executive Quarterly.



Digitalist Magazine

Online Analysis Services Course: Developing a Multidimensional Model

Check out the excellent new online course by Peter Myers and Chris Randall for Microsoft Learning Experiences (LeX). Learn how to develop multidimensional data models with SQL Server 2016 Analysis Services. The complete course is available on edX at no cost to audit, or you can highlight your new knowledge and skills with a Verified Certificate for a small charge. Enrollment is available at edX.


Analysis Services Team Blog

Model Comparison and Merging for Analysis Services

Relational-database schema comparison and merging is a well-established market. Leading products include SSDT Schema Compare and Redgate SQL Compare, which is partially integrated into Visual Studio. These tools are used by organizations seeking to adopt a DevOps culture to automate build-and-deployment processes and increase the reliability and repeatability of mission critical systems.

Comparison and merging of BI models also introduces opportunities to bridge the gap between self-service and IT-owned “corporate BI”. This helps organizations seeking to adopt a “bi-modal BI” strategy to mitigate the risk of competing IT-owned and business-owned models offering redundant solutions with conflicting definitions.

Such functionality is available for Analysis Services tabular models. Please see the Model Comparison and Merging for Analysis Services whitepaper for detailed usage scenarios, instructions and workflows.

This is made possible with BISM Normalizer, which we are pleased to announce now resides on the Analysis Services Git repo. BISM Normalizer is a popular open-source tool that works with Azure Analysis Services and SQL Server Analysis Services. All tabular model objects and compatibility levels, including the new 1400 compatibility level, are supported. As a Visual Studio extension, it is tightly integrated with source control systems, build and deployment processes, and model management workflows.

[Screenshot: schema diff]

Thanks to Javier Guillen (Blue Granite), Chris Webb (Crossjoin Consulting), Marco Russo (SQLBI), Chris Woolderink (Tabular) and Bill Anton (Opifex Solutions) for their contributions to the whitepaper.


Analysis Services Team Blog

Will Blockchain Technology Finally Lead Us To A Fair Payment Model For Content?


During my journalism internship, I could already see there was going to be a problem with my chosen vocation. The small New England television station where I was interning was struck by a wave of media consolidation and competition from new cable networks. Writers were happy just to be called in as weekend temps. Geez, even my mentor was telling me to go to law school.

That was 1992, and traditional journalism was in deep crisis. Then along came the Internet, that happy digital highway of instant-publishing gratification. This had the effect of turning nearly everyone into a writer and publisher, for better or for worse. But the question ingrained upon me during my internship remained: How does anyone seriously get paid for this?

Different models – subscriptions, sponsorships, embedded ads, SEO, and pay-per-download – have been tried for promoting equitable distribution of digital content, but the question of secure, reliable payment still hovers over the media industry – as it does for most anyone who produces creative content, like books, movies, music, or pictures.

The traditional route for an author has been to present their work to a publisher and hope that their particular flavor of creative genius will be chosen for readers’ mass consumption. Few make the cut, and those who do face exorbitant publishing fees that detract from hard-won earnings, sometimes leaving the author with a slim royalty of 25% on their e-book (source: Business of Publishing).

One of my favorite authors, Elizabeth Gilbert, wrote in her book Big Magic: Creative Living Beyond Fear about the many rejection letters – some helpful, some seemingly capricious – that she received from publishers. Undeterred, she chalked it up as part of the creative process, preferring perseverance to despair. She also kept her day job until her third book, the New York Times bestseller Eat, Pray, Love, won critical acclaim. Oprah promoted it in her book club, and Julia Roberts played the lead role in the movie version.

But what if there’s another path to publishing that’s open, free from media manipulation, and inherently just in matters of financial compensation? The answer might be found in blockchain technology. Underpinning the new “Internet of trust,” blockchains are open, distributed ledgers for recording transactions so they are verifiable and permanent. Inherently secure and resistant to tampering, this technology is ideal for recording events, records management, identity management, and transaction processing. But blockchain technology isn’t just for the likes of banks and insurance companies. It can also help creatives showcase their work, cultivate an audience – and get paid securely and directly.

DECENT, a Swiss tech startup, is applying blockchain technology to build a secured and trusted content distribution platform for authors, artists, and creatives of all calibers. Already in its second round of testing, with a launch date set for June 30, DECENT is attracting interest from technology enthusiasts and authors alike. Content posted on the platform includes e-books, blogs, videos, music, photos, and white papers, as well as independent software packages. Additionally, DECENT is encouraging third-party developers to build their own apps on top of its open-source protocol, opening the way for new publishing opportunities such as blogging spaces, photo galleries, and independent newspapers.

Freedom of speech on the blockchain

What makes DECENT different from other content distribution platforms is both the blockchain technology that supports it and its commitment to freedom of speech. Founded in 2015 by Matej Michalko and Matej Boda, both from Slovakia, DECENT is an independent, nonprofit peer-to-peer network that is wholly owned by its users. As such, it is not affiliated with any economic, media, or political party. In this decentralized network, content is hosted on multiple sites, meaning that it cannot be blocked, manipulated, or tampered with once it is published. This level of security provided by blockchain technology, in effect, also presents an interesting opportunity for independent media and organizations, like political opposition press and dissidents. “We have created a fully integrated and trustworthy worldwide platform system of digital content distribution, where blockchain plays a central role,” says Michalko. “Communication and all payments are done through the blockchain.”

That means no middlemen stand between the content author and the consumer. Whether in the form of publishing fees or content hosting fees, the cut taken by middlemen can be a serious threat to trust and profitability. “Most middlemen are unnecessary,” says Michalko. Citing a key advantage of blockchain compared to other technologies, he adds, “There is a significant reduction of costs of hosting because blockchain is peer-to-peer technology.”

DECENT community: Authors, consumers, publishers

The DECENT community is divided into three groups based on their relationships to the content: authors, consumers, and publishers. Authors comprise all types of content creators, whether they are individuals, organizations, businesses, artists, writers, photographers, software developers, music producers, or videographers. These people can upload their content, regardless of format, to the DECENT platform and set their price for which they will allow someone to access it. “There are no steps needed from the author. Just basically, upload the content to the platform and push the Publish button,” says Michalko. And what about tracking the content downstream? DECENT will tag the content with a digital fingerprint, enabling the author to identify copies of the content, so no unauthorized person can access or distribute it.

Consumers are the people who purchase the content and download it for their use. They can either access DECENT via its app or on the Web. “We built our products aimed at the general public, so no special knowledge is needed,” says Michalko. Purchasing content is like entering into a contract: the consumer makes a promise-to-pay; the author then delivers the content; and the payment is made. No payment, no content.

Payment is made with DCT tokens, a cryptocurrency like Bitcoin that serves as the value denomination in the DECENT community. DCT tokens are exchangeable for Bitcoin (BTC) and fiat currencies like euros, dollars, or pounds.

Which brings us to the role of publishers. Publishers are the members of the community who sustain the network by applying their independent computing resources for activities like processing transactions between authors and consumers and mining of Bitcoins. They typically provide their services in exchange for a fee. To be a publisher requires the necessary hardware and computing resources.

Startup credo: Build fresh, hire smart

DECENT stands out among blockchain startups in that it has its own blockchain built on open-source protocols. This provides the DECENT team with assurance in both the integrity of its technology and the mission of its platform. From September to November of 2016, DECENT held its initial coin offering (ICO) for its DCT tokens, raising more than 5,881 BTC, worth about USD $4.2 million at the end of the ICO and about $15 million in June 2017. More than 4,000 people backed the ICO, often buying in at two bitcoins or less. The funds will be used by the startup to further the development of its technology. DECENT’s team consists of more than 35 people spread among four global locations: Switzerland, Slovakia, China, and Armenia. Recently, DECENT hosted an AMA (Ask-Me-Anything) session to answer questions about its service offering and progress toward launching its main net. You can watch the recording of that AMA session here.

Michalko’s vision for DECENT is to be the content distribution platform of choice. He says, “I have confidence because we have great technology and a great team, and we are able to eliminate all the unnecessary middlemen.”

For more on blockchain, see How Can You Turn Blockchain Into Business Value?



Digitalist Magazine

Building an Azure Analysis Services Model on Top of Azure Blob Storage—Part 3

Part 2 finished with the deployment of a scalable Tabular 1400 model in Azure Analysis Services on top of Azure Blob storage. Part 3 continues the story by attempting to load up to 1 TB of TPC-DS source data into the model–hosted on the largest server that Azure Analysis Services offered at the time of this writing (an S9 with a capacity of 640 Query Processing Units and 400 GB of Cache). Can Azure Analysis Services import 1 TB of source data? How long would processing take and could it be accelerated? Let’s find out!

For this part of the story, I work primarily in SQL Server Management Studio (SSMS). SSMS can connect directly to an Azure Analysis Services server by using the fully qualified server name. For this purpose, it supports Active Directory Universal Authentication as well as Active Directory Password Authentication. SSMS enables you to perform administrative actions against a deployed model, such as incremental data loads and other direct operations, through the user interface as well as through scripts. The main tasks include editing data sources, adding partitions, and processing tables. Recent releases of SSMS also include a DAX Query Editor, as introduced in “SSMS Improvements for Analysis Services in the April 2017 Release,” which is especially convenient if you want to double-check row counts after processing or run other queries. For example, the following screenshot shows a straightforward DAX query to count the rows for each TPC-DS table after a full import.

[Screenshot: DAX query counting the rows in each TPC-DS table]
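If you prefer scripting such sanity checks instead of using the DAX Query Editor, a row count can also be retrieved from PowerShell through the ADOMD.NET client. This is only a sketch; the assembly path, server URI, database name, and credentials are assumptions you would adjust to your environment:

```powershell
# Sketch: run a DAX row count against the deployed model via ADOMD.NET.
# The assembly path and connection details below are assumptions.
Add-Type -Path "C:\Program Files\Microsoft.NET\ADOMD.NET\140\Microsoft.AnalysisServices.AdomdClient.dll"

$connectionString = "Data Source=asazure://<region>.asazure.windows.net/<server>;Initial Catalog=<database>;User ID=<user>;Password=<password>"
$connection = New-Object Microsoft.AnalysisServices.AdomdClient.AdomdConnection($connectionString)
$connection.Open()

$command = $connection.CreateCommand()
# EVALUATE needs a table expression, so ROW wraps the scalar COUNTROWS result.
$command.CommandText = "EVALUATE ROW(""store_sales rows"", COUNTROWS('store_sales'))"

$reader = $command.ExecuteReader()
while ($reader.Read()) {
    Write-Output ("store_sales: {0} rows" -f $reader.GetValue(0))
}
$reader.Close()
$connection.Close()
```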

But before SSMS can show any impressive row counts, it is necessary to get the TPC-DS data into the Tabular model. Initially, I had planned to do this in increments, but I was anxious to see if the full TPC-DS data set could be processed at all, so I decided to go all in at once with an attempt to import the full 1 TB of source data. This required modifying the existing partitions of the large tables in the deployed model (catalog_returns, catalog_sales, inventory, store_returns, store_sales, web_returns, and web_sales) to pull in all 100 data files per table. Accordingly, the ReadBlobData line in the source queries had to be changed from ReadBlobData(Source, “”, 1, 1) to ReadBlobData(Source, “”, 1, 100). By right-clicking on each large table, selecting Partitions, and then clicking on the Edit button in the Partitions dialog box, this task was quickly accomplished. Next, I ran a Tabular Model Scripting Language (TMSL) script to process these seven tables in full, as the following screenshot illustrates.

[Screenshot: TMSL script processing the seven large tables in full]
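For readers who would rather script this step than run it from the SSMS query window, the TMSL command in question is essentially a refresh of type full over the seven tables. The following is a sketch only; the database name, server URI, and credential handling are assumptions:

```powershell
# Sketch: full-process the seven large TPC-DS tables with a TMSL refresh command.
# Database name and server URI are placeholders.
$tmsl = @'
{
  "refresh": {
    "type": "full",
    "objects": [
      { "database": "TPCDS", "table": "catalog_returns" },
      { "database": "TPCDS", "table": "catalog_sales" },
      { "database": "TPCDS", "table": "inventory" },
      { "database": "TPCDS", "table": "store_returns" },
      { "database": "TPCDS", "table": "store_sales" },
      { "database": "TPCDS", "table": "web_returns" },
      { "database": "TPCDS", "table": "web_sales" }
    ]
  }
}
'@

Import-Module SqlServer
Invoke-ASCmd -Server "asazure://<region>.asazure.windows.net/<servername>" `
    -Credential (Get-Credential) -Query $tmsl
```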

Processing took roughly 21 hours to complete (see the script execution time in the lower right corner of the SSMS query window above). Certainly, not an impressive processing performance, but it was exciting to see that an S9 Azure Analysis Services server could take a 1 TB TPC-DS data set. The server overallocated about 25 GB of memory (a total of 425 GB), but processing succeeded. After a manual server restart in the Azure Portal to free up any unused memory, the server reallocated approximately 390 GB to load the model. The following graph shows the memory allocation on the server prior and after the restart.

[Chart: memory allocation on the server before and after the restart]

Note that memory allocation is not necessarily equivalent to model size. In particular, the Intel Threading Building Blocks (Intel TBB) allocator might proactively allocate more memory than is strictly needed. The Intel TBB allocator is enabled by default on large Azure Analysis Services servers for best performance and scalability.

A more detailed view of the memory consumption is available through the DISCOVER_OBJECT_MEMORY_USAGE schema rowset. Again, the numbers are not always exact, but they do provide a sufficient estimate. Kasper de Jonge published a useful workbook called BISMServerMemoryReport.xlsx that relies on the DISCOVER_OBJECT_MEMORY_USAGE rowset to analyze the memory consumption on an Analysis Services server at any desired level of detail. And thanks to the full compatibility and rather seamless exchangeability of Azure Analysis Services with SQL Server Analysis Services, it is straightforward to use Kasper’s workbook to analyze the size of the TPC-DS tables and their columns on an Azure Analysis Services server, as in the screenshot below.

[Screenshot: BISMServerMemoryReport workbook analyzing memory consumption]

So, 1 TB of TPC-DS source data fit into a 350 GB Tabular model. This is not a sensational compression ratio, but the TPC-DS tables are rather wide and not optimized for column-based compression. Still, smaller models are easier to handle, so I looked for low-hanging fruit to reduce the model size and optimize the data import.

In terms of model size, the first and foremost optimization step is to eliminate unnecessary table columns from the model. As far as TPC-DS is concerned, unnecessary columns are those columns that are not referenced in any of the TPC-DS benchmark queries. Why import columns that aren’t participating in any queries? A quick analysis of the benchmark queries revealed that there are quite a few unused columns in the large TPC-DS tables. Furthermore, the BISMServerMemoryReport.xlsx workbook showed that these unused columns consume about 60 GB in the model (see the following spreadsheet). Eliminating these columns would yield nice savings in terms of model size and therefore memory capacity.

[Spreadsheet: unused TPC-DS columns and their memory consumption]

To remove these unnecessary columns, I switched back to SSDT Tabular, deleted the columns one at a time by using Tabular Model Explorer (TME), and then redeployed the model with Processing Options set to Full so that SSDT would fully reprocess the model after deployment. Following the deployment, I continued in SSMS as before to update the source queries of the large tables so that the ReadBlobData function would again include all data files for the large tables, and then ran my TMSL processing script one more time.

As anticipated, the resulting model was about 60 GB smaller than before and the server would allocate about 75 GB less memory, as shown below. Note, however, that the processing time did not decrease because the data transfer still included the full 1 TB of source data. This is because the data files first need to be transferred before file parsing can be performed locally on the Azure Analysis Services server. It is only that fewer parsed columns are mapped to table columns, resulting in a smaller model size. If Azure Blob Storage could filter out the unused columns right away, as a more sophisticated data source such as Azure SQL Data Warehouse can, then the transfer of about 150 GB of raw data could have been avoided and processing time would have improved as well. But this was not an option.

[Screenshot: processing the smaller model]

Given that the files needed to be read from Azure Blob Storage as before, it was not necessary to edit the source queries or modify the ReadBlobData function. As the following diagram illustrates based on the store_sales table, the ReadBlobData function still reads the contents of all the source files and continues to offer the full set of parsed columns to Azure Analysis Services for import. It’s just that Azure Analysis Services ignores the ss_net_paid_inc_tax column because it was deleted from the store_sales table in the model.

[Diagram: column parsing in Azure Analysis Services for the store_sales table]

If Azure Blob Storage does not offer an option to reduce the data volume at the source, then perhaps processing time can be improved by pulling more data into Azure Analysis Services in parallel. For the initial data import, I modified the existing partition on each of the seven large tables to import all 100 files per table. So, Azure Analysis Services processed seven partitions in parallel, which took more than 21 hours to complete. The next test would be to use two partitions per table, each importing 50 files.

In SSMS, connected to the Azure Analysis Services server, I performed the following steps (see also the screenshots below and the TMSL sketch after the list):

  1. Run Process Clear on all seven large tables to purge the existing data.
  2. Edit the existing partition of each table and change the ReadBlobData(Source, "", 1, 100) line to ReadBlobData(Source, "", 1, 50).
  3. Create a copy of that partition to add a second partition to each table and change the ReadBlobData(Source, "", 1, 50) line to ReadBlobData(Source, "", 51, 50).
  4. Run Process Full on the large tables to import the data again.
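A scripted equivalent of steps 2 and 3 is a TMSL create command per new partition. The sketch below covers only store_sales; the database name, the data source reference, and the exact shape of the M expression are assumptions based on the ReadBlobData helper described in this series, so treat it as a template rather than a drop-in script:

```powershell
# Sketch: add a second partition to store_sales that reads source files 51-100.
# Database name, partition name, data source name, and the M expression details are placeholders.
$tmsl = @'
{
  "create": {
    "parentObject": { "database": "TPCDS", "table": "store_sales" },
    "partition": {
      "name": "store_sales - files 51-100",
      "source": {
        "type": "m",
        "expression": "let Source = #\"AzureBlobs\", Data = ReadBlobData(Source, \"<prefix>\", 51, 50) in Data"
      }
    }
  }
}
'@

Invoke-ASCmd -Server "asazure://<region>.asazure.windows.net/<servername>" `
    -Credential (Get-Credential) -Query $tmsl
```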

As you can see in the screenshot below, using two partitions per table helped to reduce the processing time by roughly 7 hours.

[Screenshot: processing with 14 partitions]

If 2 partitions per table have such a positive effect, then 4 partitions might yield perhaps even more gains. There is, however, one more detail to consider: the maxConnections parameter on the data source. By default, maxConnections is not specified explicitly. The default value is 10. So, Analysis Services establishes a maximum of 10 concurrent connections to the data source by default. Yet, with 7 large tables in the model and 4 partitions each, Analysis Services would need to process 28 partitions in parallel. Hence, it is necessary to adjust the maxConnections setting, as in the screenshot below. Note that the user interface currently does not expose the maxConnections parameter for modern data sources. In the current tools, this parameter must be specified through TMSL or programmatically by using the Tabular Object Model. Note also that maxConnections should not exceed the number of processor cores on the server. With 28 partitions and maxConnections set to 28, the S9 Azure Analysis Services server was able to finish processing in 11 hours and 36 minutes.

[Screenshot: 28 partitions processed with maxConnections set to 28]
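For completeness, here is one hedged way to set the parameter programmatically through TOM from PowerShell. The assembly path, server URI, database name, and data source name are assumptions:

```powershell
# Sketch: raise maxConnections on the blob data source via the Tabular Object Model (TOM).
# Assembly path and object names below are placeholders.
Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.AnalysisServices.Tabular.dll"

$server = New-Object Microsoft.AnalysisServices.Tabular.Server
$server.Connect("Data Source=asazure://<region>.asazure.windows.net/<servername>;User ID=<user>;Password=<password>")

$model = $server.Databases.FindByName("TPCDS").Model
$model.DataSources["AzureBlobs"].MaxConnections = 28   # match the partition count, up to the core count
$model.SaveChanges()

$server.Disconnect()
```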

Subsequent experiments with higher partition counts (up to 100 partitions per table — one source file per partition) and additional storage accounts (up to seven — one for each table) did not produce any further noteworthy gains. As mentioned earlier, processing time could be reduced by switching to a more sophisticated data source, such as Azure SQL Data Warehouse, and then excluding the unnecessary columns at the source. A corresponding test showed that it took no more than an amazing 2 hours and 30 minutes to load the entire data set into an Azure SQL Data Warehouse by using PolyBase, following the steps outlined in the tutorial “Load data with PolyBase in SQL Data Warehouse,” and then processing times in Azure Analysis Services could be reduced to around 9 hours. But for the mere joy of processing 1 TB of raw blob data in Azure Analysis Services, 11 hours and 36 minutes was reasonably sufficient.

And that’s it for this rather detailed journey about deploying a Tabular 1400 model in Azure Analysis Services on top of a 1 TB TPC-DS data set in Azure Blob Storage. Thanks to the modern Get Data experience, you can build a flexible data import pipeline directly in the model and process even very large data sets within a reasonable timespan. And as always, please deploy the latest monthly release of SSDT Tabular and SSMS and use these tools to take Tabular 1400 in Azure Analysis Services for a test drive. Send us your feedback and suggestions by using ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. Influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers!


Analysis Services Team Blog

Google advances AI with ‘one model to learn them all’


Google quietly released an academic paper that could provide a blueprint for the future of machine learning. Called “One Model to Learn Them All,” it lays out a template for how to create a single machine learning model that can address multiple tasks well.

The MultiModel, as the Google researchers call it, was trained on a variety of tasks, including translation, language parsing, speech recognition, image recognition, and object detection. While its results don’t show radical improvements over existing approaches, they illustrate that training a machine learning system on a variety of tasks could help boost its overall performance.

For example, the MultiModel improved its accuracy on machine translation, speech, and parsing tasks when trained on all of the operations it was capable of, compared to when the model was just trained on one operation.

Google’s paper could provide a template for the development of future machine learning systems that are more broadly applicable, and potentially more accurate, than the narrow solutions that populate much of the market today. What’s more, these techniques (or those they spawn) could help reduce the amount of training data needed to create a viable machine learning algorithm.

That’s because the team’s results show that when the MultiModel is trained on all the tasks it’s capable of, its accuracy improves on tasks with less training data. That’s important, since it can be difficult to accumulate a sizable enough set of training data in some domains.

However, Google doesn’t claim to have a master algorithm that can learn everything at once. As its name implies, the MultiModel network includes systems that are tailor-made to address different challenges, along with systems that help direct input to those expert algorithms. This research does show that the approach Google took could be useful for future development of similar systems that address different domains.

It’s also worth noting that there’s plenty more testing to be done. Google’s results haven’t been verified, and it’s hard to know how well this research generalizes to other fields. The Google Brain team has released the MultiModel code as part of the TensorFlow open source project, so other people can experiment with it and find out.

Google also has some clear paths to improvement. The team pointed out that they didn’t spend a lot of time optimizing some of the system’s fixed parameters (known as “hyperparameters” in machine learning speak), and going through more extensive tweaking could help improve accuracy in the future.

Updated 10:45: This story initially said that there was not a timetable for releasing the MultiModel code under an open source license. The code was released last week. This story has been updated to note that and include a link to the repository. 

Big Data – VentureBeat

Building an Azure Analysis Services Model on Top of Azure Blob Storage—Part 2

The first part in this series covering Azure Analysis Services models on top of Azure Blob Storage discussed techniques to implement a small Tabular 1400 model based on synthetic TPC-DS source data. This second part continues the journey to take Tabular 1400 in Azure Analysis Services to larger scale factors—up to the maximum capacity that Azure Analysis Services currently provides.

Taking a Tabular 1400 model to large scale requires an efficient approach that mitigates limitations in the tools and maximizes performance in the model. For starters, it would take a long time to generate 1 TB of TPC-DS source data by using a single dsdgen instance. A much better approach is to run multiple dsdgen instances in parallel to save time and create a 1 TB set of smaller source files that are easier to handle individually. Furthermore, having generated and uploaded the source data to Azure Blob storage, it would not be advisable to create the Tabular model directly against the full set of data because SQL Server Data Tools for Analysis Services Tabular (SSDT Tabular) would attempt to download all that data to the workspace server. Even if the workspace server had the capacity, it’s an unnecessarily large data transfer. Instead, it is a common best practice to create a representative subset of the data and then build the data model against that source, and then later switch the source during production deployment. Moreover, the Tabular 1400 model must be designed with data management and performance requirements in mind. Among other things, this includes a partitioning scheme for the tables in the model. And last but not least, the source queries of the table partitions should be optimized to avoid redundancies and keep the model metadata clean and small. The May 2017 release of SSDT Tabular introduces support for named expressions in Tabular 1400 models and this article demonstrates how to use them for source query optimization.

Generating 1 TB of TPC-DS source data by using multiple dsdgen instances is easy thanks to the command line parameters PARALLEL and CHILD. The PARALLEL parameter indicates the overall number of child processes to generate the source data. The CHILD parameter defines which particular chunk of data a particular dsdgen instance generates. For example, I distributed the data generation across 10 virtual machines in Azure, with each VM running 10 child dsdgen instances. Running in parallel, these 10 instances utilized the eight available cores per VM close to 100% and finished the data generation in roughly 2 hours. The following screenshot shows the resource utilization on one of the VMs about half an hour into the processing.

[Screenshot: VM resource utilization during data generation]

Windows PowerShell and the Azure Resource Manager cmdlets make provisioning 10 Azure VMs a blast. For a sample script to create a fully configured virtual machine, see the article “Create a fully configured virtual machine with PowerShell.” I then installed AzCopy via https://aka.ms/downloadazcopypr and copied the TPC-DS tool set and an empty data folder with all the sub-containers to each VM (as discussed in Part 1). Next, I slightly modified the batch file from Part 1 to create and upload the source files to accommodate the different file name format that dsdgen uses when the PARALLEL and CHILD parameters are specified. Instead of <table name>.dat, dsdgen now generates file names in the form <table name>_<child>_<parallel>.dat. An additional batch file then helped to launch the 10 data generation processes, passing in the child ID and the Blob service endpoint URL as parameters. It’s a trivial batch file, as listed below. On the second VM, the loop would start at 11 and go to 20, and so forth (see the following illustration).

@echo off
for /l %%x in (1, 1, 10) do (

start createandupload.bat %%x https://<storage account name>.blob.core.windows.net/

)

[Illustration: parallel source file generation and upload across VMs]

Having finished the data generation and verified that all files were uploaded to Azure Blob storage successfully, I deleted all provisioned VM resources by using a single Remove-AzureRmResourceGroup command. Deleting a resource group deletes all associated Azure resources. Needless to say, the Azure Blob storage account with the generated data must not be associated with this resource group, because it must remain available for the next steps.

The next task is to create a representative sample of the TPC-DS data for modelling purposes. This can be as easy as placing the 1 GB data set generated in Part 1 in a separate Azure Blob storage account. However, dsdgen creates a different set of source files per table for the 1 GB versus the 1 TB scale factor, even if the same PARALLEL and CHILD parameters are specified. If it is important to generate the same set of source files just with less data—and in my case it is because I want to create source queries on top of a large collection of blobs representative of a 1 TB data set—a different approach is needed.

By using the 1 GB data set from Part 1 and the code snippet below, I generated a representative set of sample files identical to the ones that dsdgen generates for a 1 TB data set. The following table summarizes how the code snippet distributed the 1 GB of data across the sample files. The code snippet then used the Azure SDK to upload the files to my second Azure Blob storage account.


Table Name Row Count (1GB) File Count (1TB) Max Rows Per Sample File
call_center 6 1 6
catalog_page 11718 1 11718
catalog_returns 144067 100 1441
catalog_sales 1441548 100 14416
customer 100000 100 1000
customer_address 50000 100 500
customer_demographics 1920800 100 19208
date_dim 73049 1 73049
dbgen_version 1 1 1
household_demographics 7200 1 7200
income_band 20 1 20
inventory 11745000 100 117450
item 18000 1 18000
promotion 300 1 300
reason 35 1 35
ship_mode 20 1 20
store 12 1 12
store_returns 287514 100 2876
store_sales 2880404 100 28805
time_dim 86400 1 86400
warehouse 5 1 5
web_page 60 1 60
web_returns 71763 100 718
web_sales 719384 100 7194
web_site 30 1 30

[Screenshot: code listing for generating the sample source files]

The code is not very efficient, but it does get the job done eventually—giving me enough time to think about the partitioning scheme for the larger tables in the model. If you study the available Performance Guide for Analysis Services Tabular, you will find that the partitioning of tables in a Tabular model does not help to improve query performance. However, starting with SQL Server 2016 Analysis Services, Tabular models can process multiple partitions in parallel, so partitioning can help to improve processing performance. Still, as the performance guide points out, excessive partitioning could result in many small column segments, which could impact query performance. It’s therefore best to be conservative. A main reason for partitioning in Tabular models is to aid in incremental data loading, which is precisely my intention.

The goal is to load as much TPC-DS data as possible into a Tabular 1400 model. The largest Azure Analysis Services server currently has 20 cores and 100 GB of RAM. Even larger servers with 200 GB or 400 GB of RAM will be available soon. So, how large of a Tabular 1400 model can such a server load? The answer depends on, among other things, the compressibility of the source data. Achievable ratios can vary widely. With a cautious assumption of 2:1 compressibility, 1 TB of source data would far exceed 100 GB of RAM. It’s going to be necessary to start with smaller subsets. And even if a larger server could fit all the source data into 400 GB of RAM, it would still be advisable to go for incremental data loading. The data set consists of more than 1,000 blob files. Pulling all these files into a Tabular model at once would likely hit throttling thresholds on the Azure Blob storage side causing substantial delays during processing.

The TPC-DS tables can be categorized as follows:

Category Tables Amount of Data
Small tables with only 1 source file per table. call_center, catalog_page, date_dim, dbgen_version, household_demographics, income_band, item, promotion, reason, ship_mode, store, time_dim, warehouse, web_page, web_site ~0.1 GB
Medium tables with 100 source files per table. customer, customer_address, customer_demographics ~5 GB
Large tables with 100 source files per table. catalog_returns, catalog_sales, inventory, store_returns, store_sales, web_returns, web_sales ~950 GB

The small tables with only 1 source file per table can be imported at once. These tables do not require an incremental loading strategy. Similarly, the medium tables do not add much data and can be loaded in full, but the large tables require a staged approach. So, the first processing cycle will include all files for the small and medium tables, but only the first source file for each large table. This reduces the source data volume to approximately 10 GB for the initial processing cycle. Subsequent processing cycles can then add further partitions to the large tables to import the remaining data until RAM capacity is exhausted on the Azure Analysis Services server. The following diagram illustrates this staged loading process.

[Diagram: staged loading process]

By using the 1 GB sample data set in Azure Blob storage, I can now build a Tabular 1400 model by using the May 2017 release of SSDT Tabular and implement the staged loading process by taking advantage of named expressions in the source queries. Note that previous SSDT Tabular releases are not able to deal with named expressions. The May (or a later) release is an absolute must have.

Regarding source queries, the small tables with a single source file don’t require any special attention. The source queries covered in Part 1 would suffice. So, let’s take one of the more interesting medium tables that comprises 100 source files, such as the customer table, create the source query for that table, and then see how the result could apply in a similar way to all other tables in the model.

The first step is to create a source query for the customer table by using Query Builder in SSDT Tabular. And the first task is to exclude the header file from the list of data files. It will be included later. In Navigator, select the customer table, and then in Query Builder, right-click on the Name cell in the last row (showing a value of “header_customer.dat”), expand Text Filters, and then select Does not start with. Next, in the Applied Steps list, for the resulting Filtered Rows step, click on the Settings button, and then in the Filter Rows dialog box, change the value for “does not begin with” from “header_customer.dat” to just “header” so that this filter can be applied later on in the same way to any header file in the data set. Click OK and verify that the header file has disappeared from the list of source files.
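For reference, the customer query at this point consists of little more than the data source reference, the navigation step, and the Filtered Rows step. The following is a minimal sketch of the M behind it; the data source and step names match those used later in this article, but Query Builder may generate slightly different names in your model:

let
    // Data source reference and navigation to the customer blob container.
    Source = #"AzureBlobs/https://tpcdsfiles.blob.core.windows.net/",
    customer1 = Source{[Name="customer"]}[Data],
    // The Filtered Rows step excludes any blob whose name starts with "header".
    #"Filtered Rows" = Table.SelectRows(customer1, each not Text.StartsWith([Name], "header"))
in
    #"Filtered Rows"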

The next task is to combine the remaining files for the table. In the header cell of the Content column, click on the Combine Files button, as illustrated in the following screenshot, and then in the Combine Files dialog box, click OK.

[Screenshot: combining files in Query Builder]

As you can see in the Queries pane above, this sequence of steps creates quite a few new expressions, which the customer query relies on to combine the 100 source files for the table. However, apart from the fact that the header file still needs to be added to get the column names, it is a good idea to optimize the expressions at this point to keep the model free of clutter. The sample queries are unnecessary and should be eliminated. This is especially notable when importing many tables. In the TPC-DS case with 25 tables, Query Builder would generate 25 different sets of these sample queries, which would amount to a total of 75 avoidable expressions in the model. The only named expression worth keeping is the Transform File from customer function. Again, Query Builder would generate multiple such transform functions (one for each table) where only a single such function suffices.

The first cleanup step is to eliminate the need for the sample queries by editing the customer source query. In Query Builder, in the Applied Steps list, delete all steps after the “Invoke Custom Function 1” step so that this invoke step is the last step in the query, which adds a “Transform File from customer” column to the table. Right-click this column and select Remove other Columns so that it is the only remaining column. Next, click on the Expand button in this column’s header, and then make sure you deselect the Use original column name as prefix checkbox and click OK. At this point, the customer query no longer references any sample expressions so they can be deleted in Query Builder. Also, rename the “Transform File from customer” function and just call it “Transform File” as in the screenshot below so that it can be used across multiple tables without causing confusion. I also shortened the M expression for this function as follows.

let
    Source = (File) => let
        Source = Csv.Document(File, [Delimiter="|"])
    in
        Source
in
    Source

[Screenshot: optimizing the source query]

Note that even the Transform File function could be eliminated by replacing its reference in the Invoke Custom Function1 step with a direct call to the Csv.Document function, as in each Csv.Document([Content], [Delimiter="|"]). But don't eliminate the Transform File function just yet.
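For illustration, if you did eliminate it, the Invoke Custom Function1 step would presumably change along these lines (a sketch only, assuming the preceding step is the Filtered Rows step):

// Hypothetical variant of the Invoke Custom Function1 step that parses each blob directly.
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Rows", "Transform File", each Csv.Document([Content], [Delimiter="|"]))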

The next task is to extend the customer query to discover and apply the header file dynamically. This involves the following high-level steps:

Step M Expression
Get all files from the Blob container for the customer table that start with “header”. #"Get Header File List" = Table.SelectRows(customer1, each Text.StartsWith([Name], "header"))
Read the content of the first file from this list by using the Transform File function. There should only be one header file. Any additional files would be ignored. #"Read First Header File" = #"Transform File"(#"Get Header File List"{0}[Content])
Transform the first row from this header file into a table. #"Headers Table" = Record.ToTable(#"Read First Header File"{0})
Clean up the headers table by removing any rows that have no values. #"Cleaned Headers Table" = Table.SelectRows(#"Headers Table", each [Value] <> null and [Value] <> "")
Modify the Expanded Transform File step and replace the long lists of static column references with the Name and Value lists from the cleaned headers table. #"Expanded Transform File" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File", #"Cleaned Headers Table"[Name], #"Cleaned Headers Table"[Value])

The result is a customer table that includes all source files from the table’s Blob container with the correct header names, as in the following screenshot.

[Screenshot: finished customer table]

This is great progress, but the job is not yet done. The next challenge is to convert this source query into a global function so that it can be used across all tables, not just the customer table. The existing Transform File function can serve as a template for the new function. In Query Builder, right-click on Transform File, and select Duplicate. Give the new function a meaningful name, such as ReadBlobData.

M functions follow the format = (Parameter List) => let statement. As parameters, I use DataSource and BlobContainerName, and the let statement is almost an exact copy of the query for the customer table, except that I replaced the data source and container references with the corresponding DataSource and BlobContainerName parameters. It’s relatively straightforward to copy and adjust the entire source query by using Advanced Editor, as in the screenshot below. Also, make sure to save the original source query in a separate text file because it might be needed again. The next step then is to replace the customer source query and call the ReadBlobData function instead, as follows (note that the data source name is specific to my data model):

let
    Source = #"AzureBlobs/https://tpcdsfiles.blob.core.windows.net/",
    BlobData = ReadBlobData(Source, "customer")
in
    BlobData

[Screenshot: the ReadBlobData function]

The results so far suffice for the customer table, but there is one more requirement to support the staged imports for the large tables. In other words, the ReadBlobData function should not just read all the source files from a given Azure Blob container at once but in specified ranges. In Query Editor, this is easy to add to the original table query. It is not so easy to do in a complex named expression, such as the ReadBlobData function. Unfortunately, editing a complex named expression in Query Editor almost always requires jumping into the Advanced Editor. No doubt, there is room for improvements in future SSDT Tabular releases.

As a workaround, I temporarily reverted the customer query by using my previously saved copy, selected the Filtered Rows step, and then, on the Rows menu, selected Keep Range of Rows. After clicking Insert in the Insert Step dialog box, I specified appropriate values for the First row and Number of rows parameters and clicked OK (see the following screenshot).

[Screenshot: Keep Range of Rows]

The new Kept Range of Rows step then needed to be inserted into the ReadBlobData function between the Filtered Rows and the Invoke Custom Function1 steps in Advanced Editor. The ReadBlobData function also required two additional parameters called FirstRow and NumberOfRows, as in #"Kept Range of Rows" = Table.Range(#"Filtered Rows", Value.Subtract(FirstRow, 1), NumberOfRows). Note that the Query Builder UI treats the value of 1 as referring to row 0, so the ReadBlobData function uses the Value.Subtract function to maintain this behavior for the FirstRow parameter. This completes the ReadBlobData function (see the following code listing). It can now be called from all source queries in the model, as summarized in the table below.

let
    Source = (DataSource, BlobContainerName, FirstRow, NumberOfRows) => let
        customer1 = DataSource{[Name=BlobContainerName]}[Data],
        #"Filtered Rows" = Table.SelectRows(customer1, each not Text.StartsWith([Name], "header")),
        #"Kept Range of Rows" = Table.Range(#"Filtered Rows", Value.Subtract(FirstRow, 1), NumberOfRows),
        #"Invoke Custom Function1" = Table.AddColumn(#"Kept Range of Rows", "Transform File", each #"Transform File"([Content])),
        #"Removed Other Columns" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File"}),
        #"Get Header File List" = Table.SelectRows(customer1, each Text.StartsWith([Name], "header")),
        #"Read First Header File" = #"Transform File"(#"Get Header File List"{0}[Content]),
        #"Headers Table" = Record.ToTable(#"Read First Header File"{0}),
        #"Cleaned Headers Table" = Table.SelectRows(#"Headers Table", each [Value] <> null and [Value] <> ""),
        #"Expanded Transform File" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File", #"Cleaned Headers Table"[Name], #"Cleaned Headers Table"[Value])
    in
        #"Expanded Transform File"
in
    Source

Category Source Query
Small tables with only 1 source file per table.
let
    Source = #"AzureBlobs/https://tpcdsfiles.blob.core.windows.net/",
    BlobData = ReadBlobData(Source, "<blob container name>", 1, 1)
in
    BlobData
Medium tables with 100 source files per table.
let
    Source = #"AzureBlobs/https://tpcdsfiles.blob.core.windows.net/",
    BlobData = ReadBlobData(Source, "<blob container name>", 1, 100)
in
    BlobData
Large tables with 100 source files per table.
let
    Source = #"AzureBlobs/https://tpcdsfiles.blob.core.windows.net/",
    BlobData = ReadBlobData(Source, "<blob container name>", 1, 1)
in
    BlobData
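Looking ahead to the subsequent processing cycles, additional partitions on the large tables would presumably reuse the same pattern with a shifted range. A minimal sketch, assuming a hypothetical second partition on the store_sales table that reads the next ten source files:

let
    Source = #"AzureBlobs/https://tpcdsfiles.blob.core.windows.net/",
    // Skip the first source file (already loaded by the initial partition) and read files 2 through 11.
    BlobData = ReadBlobData(Source, "store_sales", 2, 10)
in
    BlobData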

It is straightforward to create the 25 TPC-DS tables in the model by using the above source query pattern. Still, there is one more issue to address: the source queries do not yet detect the data types for the table columns, which is an important prerequisite for analyzing the data. For each table, I modified the source query as follows (a sketch of the resulting step is shown after the list):

  1. In Query Builder on the Rows menu, select all columns.
  2. On the Transform menu, under Any Column, select Detect Data Type.
  3. Display Advanced Editor and double-check that the detected data type for each column in the Changed Type step is correct.
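For reference, Detect Data Type results in a Changed Type step based on Table.TransformColumnTypes. The following is only a sketch for the customer table; the column names come from the TPC-DS schema, and the types shown are illustrative rather than the full list of detected types:

let
    Source = #"AzureBlobs/https://tpcdsfiles.blob.core.windows.net/",
    BlobData = ReadBlobData(Source, "customer", 1, 100),
    // Hypothetical Changed Type step covering a few customer columns; extend to all columns as detected.
    #"Changed Type" = Table.TransformColumnTypes(BlobData, {
        {"c_customer_sk", Int64.Type},
        {"c_customer_id", type text},
        {"c_birth_year", Int64.Type}
    })
in
    #"Changed Type"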

As a side note, instead of editing the source query of an existing table, it is currently better to delete the table and recreate it from scratch. There are still some work items left to finish before the table editing in SSDT Tabular can work reliably.

And that’s it as far as creating the tables for my TPC-DS Tabular 1400 model is concerned. The initial data load into the workspace model finishes quickly because I’m working against the 1 GB sample data set. The row counts in the following screenshot confirm that only a small subset of the data is imported.

[Screenshot: importing tables]

The Tabular model is now ready for deployment to an Azure Analysis Services server. Apart from updating the deployment settings in the Tabular project properties to point SSDT Tabular to the desired target server, this would require changing the data source definition to import the data from the actual 1 TB data set. In Tabular Model Explorer, this can be accomplished by right-clicking on the existing data source object, and then choosing Change Source, which displays the Azure Blob Storage dialog box to update the Account name or URL, as in the following screenshot.

[Screenshot: changing the data source]

Finally, the model can be deployed by switching to Solution Explorer, right-clicking on the project node, and selecting Deploy. If necessary, SSDT will prompt for the access key before deploying the model. Processing only takes minutes because the source queries only import a few gigabytes at this time (see the screenshots below for processor activity and memory consumption on the Azure Analysis Server during and after processing). Having finished the deployment and initial processing, it is a good idea to change the data source definition again to revert to the 1 GB sample data set. This helps to avoid accidentally downloading large amounts of data to the SSDT workstation.

[Screenshot: QPU utilization on the Azure Analysis Services server]

[Screenshot: memory consumption on the Azure Analysis Services server]

And that’s it for this round of working with a Tabular 1400 model in Azure Analysis Services on top of Azure Blob storage. The data model is now deployed. The next part is to add partitions to load more data. For this part, I am going to switch from SSDT Tabular to SQL Server Management Studio (SSMS), the Tabular Model Scripting Language (TMSL), and the Tabular Object Model (TOM) in Azure Functions. One of the main reasons is that SSDT Tabular does not facilitate incremental modifications to a model. It prefers to deploy models in an all-or-nothing fashion, which is not suitable for the data load strategy discussed in the current article. SSMS, TMSL, and TOM provide finer control. So, stay tuned for part 3 in this series to put Azure Analysis Services under some serious data pressure.

And as always, please take Tabular 1400 for a test drive. Send us your feedback and suggestions by using ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. Influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers!

Analysis Services Team Blog