
Troubleshooting Failed PolyBase Installation in SQL Server 2017

This article shows you how to troubleshoot a failed SQL Server installation and how to implement a workaround that allows SQL Server 2017’s PolyBase feature to be installed when version 9 of the Java Runtime Environment (JRE) is present. A full installation of SQL Server 2017 has three external dependencies: Python, R, and the JRE, all third-party or open-source software. Changes to that external software after the release of SQL Server 2017 can break the installer. Oracle, the company that owns Java, changed how the JRE’s Windows registry keys are named, and that change broke the SQL Server 2017 installer. Version 8 of the JRE is compatible with the SQL Server 2017 installer; version 9 is not. If version 9 of the JRE is the only version of the JRE on a Windows machine, it is not possible to install the PolyBase feature. The same version-detection bug exists in the SQL Server 2016 installer, and the same workaround works for both SQL Server 2016 and 2017.

SQL Server 2017’s PolyBase feature requires the Oracle JRE version 7 update 51 or higher. Version 8 of the JRE works just fine. Version 9 is clearly higher than version 7, but the SQL Server 2017 installer cannot detect that version 9 of the JRE is installed, even when it is. This causes the following error message to appear:


Figure 1. SQL Server 2017 installer blocked when the Oracle JRE is version 9.

When you really want to troubleshoot a failed SQL Server installation, clicking the Failed hyperlinks in SQL Server installer dialog boxes is not enough. You need to go to the installer’s log file named Detail.txt. By default, you find SQL Server 2017 installer log files in C:\Program Files\Microsoft SQL Server\140\Setup Bootstrap\Log, where 140 refers to SQL Server 2017. For SQL Server 2016, you will find the log file in C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log, where 130 refers to SQL Server 2016.

Close examination of the Detail.txt file reveals the following when the JRE major version number is 9:

The installer is telling us that the problem is not actually with the Java runtime itself, but with where the installer is looking in the registry. To understand the root cause of the problem, let’s compare the registry keys for JRE version 8 with the registry keys for JRE version 9.

Notice that the JRE version 8 registry key ends with Java Runtime Environment.


Figure 2. JRE version 8 key name ends with Java Runtime Environment.

Notice that the JRE version 9 registry key ends with JRE. That is why the SQL Server 2016 and 2017 installers cannot find JRE version 9.


Figure 3. JRE version 9 key name ends with JRE.

There are really only three options for installing PolyBase with SQL Server 2016 or 2017.

  1. Install a compatible JRE with a major version number less than 9.
  2. Wait for Microsoft to issue a patch.
  3. Update your machine’s registry to let the SQL Server installer find what it is looking for.

I chose to update the registry to allow SQL Server 2017 to be installed when the installed JRE is version 9. To do this, you need to export the registry key for the JRE, edit it, and import the modified keys into the registry.

To export the registry key, use regedit and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\JRE. Right-click and select Export. Save the registry key to a file which in this example is named jreRegKey.reg. Edit the file with a text editor such as Notepad.


Figure 4. HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\JRE

Be careful when editing the registry file. You can do a global replacement if and only if you match case. You want to replace all occurrences of JRE with Java Runtime Environment. You must not alter the lowercase jre references.
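For example, a key path in the exported .reg file changes from the first form below to the second. (The 9.0.1 subkey shown here is only an illustration; use whatever version subkeys appear in your own export.)

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\JRE\9.0.1]

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\9.0.1]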


Figure 5. Case-sensitive global edit to create new registry key names.


Figure 6. Modified registry file.

Save your changes and import the modified reg file by double-clicking it. With the new registry keys imported, the SQL Server 2017 installer is able to find the JRE version 9 and successfully complete the installation process as evidenced by the following query:


Figure 7. Confirmation of successful installation of PolyBase with JRE version 9.

The query was obtained from the Get started with PolyBase page on Microsoft’s website.
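For reference, the check is a simple SERVERPROPERTY call along the following lines; a return value of 1 indicates that PolyBase is installed.

SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;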

As you can see, SQL Server installer log files provide greater insight allowing you to do root cause analysis of failed installations. Placing dummy entries in the registry to work around a breaking change may be too extreme for some, but it is possible to quickly overcome the PolyBase installation problem by making a simple registry update as described in this article.


SQL – Simple Talk

Cumulative Update #9 for SQL Server 2016 RTM

The 9th cumulative update release for SQL Server 2016 RTM is now available for download at the Microsoft Downloads site. Please note that registration is no longer required to download Cumulative updates. To learn more about the release or servicing model, please visit:


SQL Server Release Services

SQL Server Machine Learning Services – Part 1: Python Basics

With the release of SQL Server 2017, Microsoft changed the name of R Services to Machine Learning Services (MLS) and added support for Python, a widely implemented programming language known for its straightforward syntax and code readability. As with the R language, you can use Python to transform and analyze data within the context of a SQL Server database and then return the modified data to a calling application.

In this article, I explain how to run Python scripts within SQL Server. I start by introducing the sp_execute_external_script stored procedure and then dive into specifics about working with Python scripts. If you’re familiar with running R scripts in SQL Server, you already have the foundation you need to get started with Python. You use the same stored procedure and its parameters, merely swapping out one programming language for the other. The differences between the two approaches are primarily within the languages themselves.

This article is the first in a series about MLS and Python. In it, we’ll review several examples that demonstrate how to run Python scripts, using data from the AdventureWorks2017 database. The examples are very basic and meant only to introduce you to the fundamentals of working with Python in SQL Server. In subsequent articles, we’ll dig into Python’s capabilities in more depth, relying on this article as a foundation for the ones to follow.

Getting started with Python scripting

To run a Python script in SQL Server, use the sp_execute_external_script stored procedure. First you must ensure that the database engine’s MLS and Python components are installed, and the external scripting feature is enabled. For details on how to get all this set up, see the Microsoft document Set up Python Machine Learning Services (In-Database).

After MLS has been installed and enabled, you can use the sp_execute_external_script stored procedure to run Python scripts (or R scripts, for that matter). The following syntax shows the procedure’s basic elements:
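A simplified sketch of that syntax is shown below; the parameter list is abbreviated here, so see the Microsoft documentation for the complete form.

sp_execute_external_script
    @language = N'language',
    @script = N'script'
    [ , @input_data_1 = N'input_data_1' ]
    [ , @input_data_1_name = N'input_data_1_name' ]
    [ , @output_data_1_name = N'output_data_1_name' ]
    [ , @params = N'@parameter_name data_type [ ,...n ]' ]
    [ , @parameter1 = 'value1' [ ,...n ] ]
[ WITH RESULT SETS ( <result_sets_definition> ) ]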

You’ll learn the specifics of the syntax as you work through the examples in this article. Note, however, that the SQL Server document sp_execute_external_script (Transact-SQL) no longer includes the WITH RESULT SETS clause as part of the syntax, as it did in the past. Instead, the document points to the EXECUTE (Transact-SQL) topic for details on how to use this clause. I’ve included it here for completeness, but it’s actually part of the EXECUTE command.

To demonstrate how to run the sp_execute_external_script stored procedure, start with a simple example that retrieves sales data from the AdventureWorks2017 database, passes the data into a Python script, and then returns the data without modifying it:
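Here is a minimal sketch of such a call. The Sales.SalesOrderHeader and Sales.SalesTerritory tables, column choices, and aliases are assumptions made for illustration; any query that returns a Territories column and a Sales column converted to FLOAT behaves the same way.

DECLARE @pscript NVARCHAR(MAX) = N'
df = InputDataSet
OutputDataSet = df';

DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT t.Name AS Territories, CAST(s.TotalDue AS FLOAT) AS Sales
FROM Sales.SalesOrderHeader s
INNER JOIN Sales.SalesTerritory t ON s.TerritoryID = t.TerritoryID';

EXEC sp_execute_external_script
  @language = N'Python',
  @script = @pscript,
  @input_data_1 = @sqlscript;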

The example can be broken into three distinct steps:

  • Declare the @pscript variable, define the Python script, and assign the script to the variable.
  • Declare the @sqlscript variable, define the T-SQL query, and assign the query to the variable.
  • Run the sp_execute_external_script stored procedure, passing in the @pscript and @sqlscript variables. You don’t have to pass the Python script and T-SQL query as variables, but it makes the code more readable.

Begin with the first step. After declaring the @pscript variable, you create the Python script, which starts by assigning the output from the T-SQL query to the df variable:
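In its simplest form, that assignment is a single line:

df = InputDataSet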

By default, the data returned by the T-SQL query can be accessed within the Python script by calling the InputDataSet variable. In this way, you can assign the SQL Server dataset to the df variable and then work with that dataset as necessary. As you’ll see later in the article, you can also specify another name to replace the InputDataSet variable, but for now, stick with the default.

Next, assign the df dataset to the OutputDataSet variable:
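Again, this is a single assignment:

OutputDataSet = df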

The OutputDataSet variable is the default variable used to return the script’s data to the calling application. As with the InputDataSet variable, you can change the name, but for now, use the default.

In this case, the script does nothing more than return the same data that you’re retrieving from SQL Server, something you could have done without using a Python script. In fact, you do not even have to include the df variable and can instead simply assign the input variable to the output variable:
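In other words, the entire script could be reduced to the following line:

OutputDataSet = InputDataSet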

I’ve included the df variable here to help demonstrate how to begin working with SQL Server data within a Python script. More often than not, you’ll want to manipulate and analyze the SQL Server data. Assigning the source data to a separate variable allows you to modify the source data, while still maintaining the original dataset. It’s up to you to decide the best way to move data from one object to the next, based on your specific circumstances.

Next, define the T-SQL query and assign it to the @sqlscript variable. In this case, the query is just a simple join, but you can make it as complex as it needs to be. One item worth noting, however, is that the code converts the Sales column to the FLOAT data type. This is because Python supports a limited number of types when compared to SQL Server, and the Python engine can implicitly convert only some of the SQL Server types.

For example, the Territories column is configured with the NVARCHAR data type. When the Python script runs, the Python engine implicitly converts the column to the Python str type. However, the Sales column is configured with the MONEY type. The Python engine cannot implicitly convert this type, so you must explicitly convert it to one that the engine can handle. In this case, you’re converting the Sales column to FLOAT, which the Python engine can then convert to the Python float64 type. (For more information on the SQL Server types that Python supports, see the Microsoft document Python Libraries and Data Types.)

After defining the Python script and T-SQL query, you can call the sp_execute_external_script stored procedure, passing in the necessary parameter values. For the @language parameter, specify Python. If you were working with an R script, you would pass in R as the value. Currently, those are the only two values that the parameter supports.

For the @script parameter, specify the @pscript variable, and for the @input_data_1 parameter, specify the @sqlscript variable. The @input_data_1 parameter is optional. You do not have to include SQL Server data in the Python script, but if you’re going to be running Python within the SQL Server environment, it seems likely you’ll want to use SQL Server data.

With the parameters in place, you can call the stored procedure. When you do, the Python engine runs the Python script, using the SQL Server data, and returns the dataset to the calling application. The following figure shows the first chunk of rows returned by the script.


As you can see, the Python script returns a basic result set with the data organized into columns and rows, just like the source data. You might have noticed, however, that the returned data includes no column names. For that, you need to specifically assign names to the returned data.

Assigning names to Python script objects

When using the sp_execute_external_script stored procedure to run Python scripts, you can specify names for the default variables assigned to the input and output datasets. You can also specify column names for the outputted dataset. The following procedure call assigns the name RawData to the input variable, the name SumData to the output variable, and the names Territories and Sales to the output columns:
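The following is a sketch of such a call. It reuses the same assumed source query as the earlier example; the renaming parameters and the WITH RESULT SETS clause are the point here.

DECLARE @pscript NVARCHAR(MAX) = N'
df = RawData
SumData = df';

DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT t.Name AS Territories, CAST(s.TotalDue AS FLOAT) AS Sales
FROM Sales.SalesOrderHeader s
INNER JOIN Sales.SalesTerritory t ON s.TerritoryID = t.TerritoryID';

EXEC sp_execute_external_script
  @language = N'Python',
  @script = @pscript,
  @input_data_1 = @sqlscript,
  @input_data_1_name = N'RawData',
  @output_data_1_name = N'SumData'
WITH RESULT SETS
  ((Territories NVARCHAR(50), Sales MONEY));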

To rename the input variable, you must include the @input_data_1_name parameter when calling the sp_execute_external_script stored procedure, assigning the new name to that parameter. You must then use the name within the Python script when referencing the input dataset:

To rename the output variable, you must include the @output_data_1_name parameter when calling the stored procedure, assigning the new name to that parameter. You must then use that name within the Python script when referencing the output dataset:

The next step is to assign names to the columns. To do this, include the WITH RESULT SETS clause when calling the stored procedure, specifying the column names and their data types. The list of column names must be enclosed in double parentheses, with the names separated by commas. SQL Server will then assign those names to the output dataset in the order they’re specified, as shown in the following results.


When assigning column names, keep in mind that the database engine must be able to implicitly convert the returned data to that type. For example, in the WITH RESULT SETS clause above, you assign the Territories name to the first column, specifying the NVARCHAR(50) data type. If you had instead specified CHAR(5), many of the column values would have been truncated. If you had specified INT for Sales, the database engine would have returned an error.

With regard to the Sales column, SQL Server easily converts the Python float64 type to the SQL Server MONEY type without changing the data. You can get away with this because the values are already limited to four decimal places when you import the data into the Python script. If there had been a greater number of decimal places in the data returned by the Python script, the values would have been rounded when they were converted. Often this will not be a problem, depending on the level of precision that’s required, but it points to the fact that, whenever data is implicitly converted, you need to be aware of its potential impact.

When it comes to the output dataset, it’s up to you whether to specify column names. You’ll likely want to decide on a case-by-case basis, depending on your application needs.

The same goes for renaming the input and output variables. There might be times when renaming them can be useful for clarity, especially if you’re importing additional datasets. Often, however, it’s just a matter of preference. Personally, I prefer not to rename the input and output variables if there’s no overriding reason to do so. I find it easier to review code that uses the default names.

Manipulating the Python data frame

If you’ve worked with the R language in SQL Server, you’re no doubt familiar with the concept of the data frame, a two-dimensional data structure similar to a SQL Server table. Data frames make it easier to manipulate and analyze sets of data within your scripts. Python data frames operate much the same way as they do in the R language and are essential to working with the data within the context of SQL Server.

The InputDataSet and OutputDataSet variables both store data as data frames. This is true even if you rename the variables. The variables are both DataFrame objects, based on the DataFrame class. The SQL Server data imported into a Python script is converted to a DataFrame object, and the data returned by a Python script is passed to the output variable as a DataFrame object. And much of what you do with that data in between often involves DataFrame objects.

A simple way to confirm that you’re working with data frames is to modify the first example to return only the data type of the df variable:
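A sketch of the modified script body would be:

df = InputDataSet
print(type(df))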

To get the data type, you call the type function, specifying the df variable, and then call the print function to return the data type to the application. When I ran this script in SQL Server Management Studio, I received the following message:

As you can see, the df variable is a DataFrame object, which it inherited from the InputDataSet variable. Notice, however, that the namespace is actually pandas.core.frame.DataFrame. This is because the DataFrame class is part of the Pandas library which is included in the MLS Python installation. We’ll be covering the Pandas library in more detail in subsequent articles. Until then, you can learn more about Pandas at https://pandas.pydata.org/.

Now try to do something with the data you’re importing from SQL Server. You’ll start by aggregating the Sales values to determine the total amount of sales per territory:
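The following is a sketch of the script body for this step; the grouping call is spelled out in the paragraphs that follow, and the df1 and df2 names match the ones used there.

df1 = InputDataSet
df2 = df1.groupby("Territories", as_index=False).sum()
OutputDataSet = df2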

To aggregate the data in the df1 data frame, call the object’s groupby function which is a member of the Pandas DataFrame class:

When calling the groupby function, specify the df1 variable, followed by a period and then the function name. Although the function can take several arguments, this example includes only two. The first is the Territories column which indicates that the data will be grouped based on values in that column. The second argument is the as_index option which is set to False. This tells the Python engine to return the Territories values, ensuring that the unique Territories names are included in the results along with the aggregated sales data.

After specifying the groupby arguments, tag on the sum function to indicate that the Sales values should be added together for each group. You can use any of the supported aggregate functions, such as mean, max, min, first, or last.

Because the Sales column is the only column in the dataset other than Territories, you do not need to do anything else with the function itself. Had the dataset contained additional columns that you did not want to include, you would have had to specify the Sales column when calling the groupby function.

Next, assign the aggregated results to the df2 variable and then assign that variable to the OutputDataSet variable, giving the following results.


The next step is to add a calculated column to the df2 data frame that rates a territory’s performance based on the aggregated Sales values. To do so, modify the data frame by defining the range of each predefined value:
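Here is a sketch of the three lines described below. The chained-bracket form mirrors the description in the text; newer versions of pandas may warn about chained assignment, in which case df2.loc[condition, "Performance"] is the safer equivalent.

df2["Performance"] = "Acceptable"
df2["Performance"][df2["Sales"] < 7000000] = "Poor"
df2["Performance"][df2["Sales"] > 10000000] = "Excellent"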

Start by defining a new column named Performance and assigning the default value Acceptable to the column:

To add the column, you need only specify the df2 variable, followed by brackets that include the name of the new column in quotes. You then use an equal operator (=) to set the column’s default value to Acceptable.

Next, modify the column definition to include a second possible value, using a conditional expression to define a data range for when to apply that value:

Again, start by specifying the df2 variable and the name of the column in brackets. Next, define the conditional expression, enclosed in a second set of brackets. The expression states that if the Sales value is less than 7 million, the Performance value should be set to Poor. You then repeat this process to add a third value, only this time the conditional expression indicates that, if the Sales value exceeds 10 million, the Performance value should be set to Excellent:

That’s all there is to adding a calculated column to the data frame. The Python script now returns the results shown in the following figure.


Of course, you could have achieved the same results with T-SQL alone, but even such a simple example helps demonstrate some of the ways that you can manipulate data to achieve various results. In practice, your Python scripts will likely be much more complex if you are to perform meaningful analytics.

Running a Python script within a stored procedure

Often when you’re working with Python, you’ll want to call the sp_execute_external_script stored procedure from within a user-defined stored procedure that can then be invoked from a calling application. To do so, simply embed the T-SQL and Python code and sp_execute_external_script procedure call in the procedure definition, as shown in the following example:
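The following is a sketch of what such a definition might look like, reusing the assumed query and Python script from the earlier examples. The procedure name comes from the text; everything else is illustrative.

CREATE PROCEDURE dbo.GetSalesTotals
AS
BEGIN
DECLARE @pscript NVARCHAR(MAX) = N'
df1 = InputDataSet
df2 = df1.groupby("Territories", as_index=False).sum()
df2["Performance"] = "Acceptable"
df2["Performance"][df2["Sales"] < 7000000] = "Poor"
df2["Performance"][df2["Sales"] > 10000000] = "Excellent"
OutputDataSet = df2';

DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT t.Name AS Territories, CAST(s.TotalDue AS FLOAT) AS Sales
FROM Sales.SalesOrderHeader s
INNER JOIN Sales.SalesTerritory t ON s.TerritoryID = t.TerritoryID';

EXEC sp_execute_external_script
  @language = N'Python',
  @script = @pscript,
  @input_data_1 = @sqlscript
WITH RESULT SETS
  ((Territories NVARCHAR(50), Sales MONEY, Performance NVARCHAR(50)));
END;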

There’s no magic here, just a basic definition that creates the GetSalesTotals procedure, which calls the sp_execute_external_script procedure and passes in the Python script and T-SQL query, as we’ve seen in previous examples. You can then use an EXECUTE statement to run the procedure:
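For example (assuming the dbo schema used above):

EXEC dbo.GetSalesTotals;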

The procedure will return the same results as the previous example, only now you’re able to persist the logic. You can also define a stored procedure to include input parameters whose values can then be consumed by the Python script:
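A sketch of the parameterized version follows; it assumes the earlier version of the procedure has been dropped first and reuses the same illustrative query. The @low/@high, @LowSales/@HighSales, and LowSales/HighSales names are taken from the description below.

CREATE PROCEDURE dbo.GetSalesTotals
  @low INT,
  @high INT
AS
BEGIN
DECLARE @pscript NVARCHAR(MAX) = N'
df1 = InputDataSet
df2 = df1.groupby("Territories", as_index=False).sum()
df2["Performance"] = "Acceptable"
df2["Performance"][df2["Sales"] < LowSales] = "Poor"
df2["Performance"][df2["Sales"] > HighSales] = "Excellent"
OutputDataSet = df2';

DECLARE @sqlscript NVARCHAR(MAX) = N'
SELECT t.Name AS Territories, CAST(s.TotalDue AS FLOAT) AS Sales
FROM Sales.SalesOrderHeader s
INNER JOIN Sales.SalesTerritory t ON s.TerritoryID = t.TerritoryID';

EXEC sp_execute_external_script
  @language = N'Python',
  @script = @pscript,
  @input_data_1 = @sqlscript,
  @params = N'@LowSales FLOAT, @HighSales FLOAT',
  @LowSales = @low,
  @HighSales = @high
WITH RESULT SETS
  ((Territories NVARCHAR(50), Sales MONEY, Performance NVARCHAR(50)));
END;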

This time, when defining the stored procedure, include two input parameters, @low and @high, both defined with the INT data type. The first parameter will set the low range for the calculated column, and the second parameter will set the high range.

You then need to tie these parameters to the Python script. To do so, include three additional parameters when calling the sp_execute_external_script stored procedure. The first is the @params parameter, which specifies the @LowSales and @HighSales parameters. These are used to pass values into the Python script. Both parameters are defined with the FLOAT data type.

You must then assign the @low and @high parameters to the @LowSales and @HighSales parameters, respectively:

You can then reference the @LowSales and @HighSales parameters inside the Python script using the names LowSales and HighSales:

Now when you call the stored procedure, you can pass in the low and high values used to define the ranges in the Performance column:
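For example, using the same thresholds as before:

EXEC dbo.GetSalesTotals @low = 7000000, @high = 10000000;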

In this case, you’re specifying the same values that you used in the previous example, so you will receive the same results. However, you can also specify different values:
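For instance (the values here are purely illustrative):

EXEC dbo.GetSalesTotals @low = 6000000, @high = 11000000;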

The stored procedure now returns the results shown in the following figure, which indicate that only one territory is performing poorly and only one above average.


Although this is a fairly trivial example, it demonstrates how you can dynamically interact with the Python script by passing in values when calling stored procedures in order to better control the results.

Getting to know Python and Machine Learning Services

Clearly, we’ve barely skimmed the surface of all we can do with Python in SQL Server. Python is a flexible and extensive language that provides a wide range of options for manipulating and analyzing data. Best of all, being able to run Python within SQL Server makes it possible to analyze a database’s data directly within the context of that database.

As we progress through this series, we’ll dig more into the Python language and the various ways we can use it to manipulate data and perform analytics. In the meantime, this article should provide you with a good foundation for getting started with Python in SQL Server. We’ll be building on this foundation going forward. I also encourage you to dig into Python yourself and play with it in the context of SQL Server. The better you understand that language, the more powerful a tool you’ll have for working with SQL Server data.


SQL – Simple Talk

Cumulative Update #2 for SQL Server 2017 RTM

The 2nd cumulative update release for SQL Server 2017 RTM is now available for download at the Microsoft Downloads site. Please note that registration is no longer required to download Cumulative updates. To learn more about the release or servicing model, please visit:

Starting with SQL Server 2017, we are adopting a new modern servicing model. Please refer to our blog for more details on the Modern Servicing Model for SQL Server.


SQL Server Release Services

Released: Microsoft Kerberos Configuration Manager for SQL Server 4.1

We are pleased to announce the latest generally available (GA) release of Microsoft Kerberos Configuration Manager for SQL Server.

Get it here: Download Microsoft Kerberos Configuration Manager for SQL Server

Note: this replaces the previously released v4.0.

Why Kerberos?

Kerberos authentication provides a highly secure method to authenticate client and server entities (security principals) on a network. To use Kerberos authentication with SQL Server, a Service Principal Name (SPN) must be registered with Active Directory, which plays the role of the Key Distribution Center in a Windows domain. In addition, many customers also enable delegation for multi-tier applications using SQL Server. In such a setup, it may be difficult to troubleshoot the connectivity problems with SQL Server when Kerberos authentication fails.

Here are some additional reading materials for your reference.

Why use this tool?

The Kerberos Configuration Manager for SQL Server is a diagnostic tool that helps troubleshoot Kerberos related connectivity issues with SQL Server, SQL Server Reporting Services, and SQL Server Analysis Services. It can perform the following functions:

  • Gather information on OS and Microsoft SQL Server instances installed on a server.
  • Report on all SPN and delegation configurations and Always On Availability Group Listeners installed on a server.
  • Identify potential problems in SPNs and delegations.
  • Fix potential SPN problems.

This release (v4.1) adds support for Always On Availability Group Listeners, and fixes SPN format incompatibility with Windows Server 2008 and 2008 R2 (introduced in v4.0).

Notes
  • Microsoft Kerberos Configuration Manager for SQL Server requires a user with permission to connect to the WMI service on any machine it’s connecting to. For more information, refer to Securing a Remote WMI Connection.
  • For Always On Availability Group Listeners discovery, run this tool from the owner node.
  • Also, if needed for troubleshooting, the Kerberos Configuration Manager for SQL Server creates a log file in %AppData%\Microsoft\KerberosConfigMgr.


SQL Server Release Services

Still More SQL Server Features that Time Forgot


The series so far:

  1. The SQL Server Features that Time Forgot: Data Quality Services, Master Data Services, Policy-Based Management, Management Data Warehouse and Service Broker
  2. More SQL Server Features that Time Forgot: Auto-shrink, Buffer-pool extension, Database Diagrams, Database Engine Tuning Advisor, and SQL CLR
  3. Even more SQL Server Features that Time forgot: In-Memory OLTP, lightweight pooling, the sql_variant data type, stretch databases, transaction savepoints, and XML indexes
  4. Still More SQL Server Features that Time Forgot: Active Directory Helper Service, Data Transformation Services, DBCC commands, English Query, Native XML Web Services, Northwind and pubs databases, Notification Services, SQL Mail, SQL Server Distributed Management Objects, Surface Area Configuration Tool, utilities, and Web Assistant

In the previous articles of this series, we focused on SQL Server components that are, for better or worse, still part of the product. We covered such features as Service Broker, auto-shrink, database diagrams, XML indexes, and a variety of others. I picked these features because of the buzz they’ve generated over the years and the landslide of opinions that went with it.

Despite all the brouhaha, Microsoft seems determined to keep these components in play, at least in the foreseeable future. Not all features have been so lucky. SQL Server’s history is checkered with memories of features past, components deprecated or dropped during one of the product’s many release cycles, sometimes with little fanfare. Many of these features have generated their own fair share of controversy, either because of how they were implemented or because they were removed. Other components have barely been missed.

Here we look at a number of features that were once part of SQL Server and have since been removed or deprecated, with some being dismissed many years back. For the most part, I’ve listed the features in alphabetical order to avoid prioritizing them or editorializing too much on their departure. You can think of this article as a trip down memory lane, without the nostalgia or remorse that often accompanies such reflection. Mostly it’s just a way to have some fun as we finish up this series.

Active Directory Helper Service

The Active Directory Helper Service, MSSQLServerADHelper, was introduced in SQL Server 2000 to help integrate SQL Server with Active Directory (AD). The service made it possible for the SQL Server service to register itself in an AD domain. In this way, the SQL Server service could run under a domain account with local administrative rights, while being able to add or remove AD objects related to the SQL Server instance.

Only one instance of the Helper Service ran on a host server, regardless of the number of SQL Server instances installed on that host. The service ran only when the SQL Server service needed to access AD. The Helper Service also played a role in replication and SQL Server Analysis Services (SSAS). To support the service, SQL Server included three system stored procedures: sp_ActiveDirectory_Obj, sp_ActiveDirectory_SCP and sp_ActiveDirectory_Start.

Microsoft discontinued the Helper Service in SQL Server 2012, removing the service and its associated stored procedures from the product. The company provided few specifics for why the service was removed, but it appears that the service was simply no longer being used.

Data Transformation Services

Anyone who’s been around SQL Server for any length of time will no doubt remember Data Transformation Services (DTS), that loveable collection of features and tools for carrying out data extract, transform and load (ETL) operations.

First introduced in SQL Server 7, DTS provided the components necessary to connect to SQL Server and other data sources in order to import or export data and transform it along the way. Prior to that, database developers had to rely on utilities such as bcp to move data from one place to another, with few useful tools for efficiently transforming the data.

With DTS, developers could define savable packages that connected to heterogeneous data sources and performed ETL operations. They could then run the packages on demand or schedule them to run at regular intervals.

Unfortunately, DTS had a number of limitations, especially when considered against the backdrop of a rapidly changing data culture. For this reason, Microsoft effectively ditched DTS in SQL Server 2005 and offered in its place SQL Server Integration Services (SSIS), a far more robust ETL tool that included advanced control flow, error handling, and transformation capabilities, along with a number of other new and improved features.

DBCC gang

Over the years, Microsoft has introduced and then removed an assortment of SQL Server DBCC statements. (DBCC is short for Database Console Commands.) One of these statements was DBCC DBREPAIR, which provided a quick way to drop a damaged database. In SQL Server 2005, Microsoft gave this statement the boot, informing customers that they should instead use the DROP DATABASE statement going forward.

Another DBCC statement that Microsoft finally ousted was DBCC NEWALLOC, which could be used to verify data and index page allocation within the extent structure. Starting with SQL Server 2000, Microsoft included the statement only for backward compatibility, removing it altogether in SQL Server 2014.

A couple other DBCC statements that have been laid to rest are DBCC PINTABLE and DBCC UNPINTABLE. The first was used to mark a table as pinned, and the second to mark it as unpinned. If a table were pinned, the database engine would not flush the table’s pages from memory.

Microsoft introduced the ability to pin a table in SQL Server 6.5 as a way to boost performance. Unfortunately, pinning a table resulted in adverse effects, such as damaging the buffer pool or causing the server to run out of memory. It wasn’t long before Microsoft disabled these statements, although they’re still part of the T-SQL lexicon. They just don’t do anything.

DBCC ROWLOCK is another statement that goes back to SQL Server 6.5. The statement enabled Insert Row Locking (IRL) operations on a database’s tables. However, this capability became unnecessary because Microsoft soon automated row locking. In fact, by SQL Server 2000, the statement was included for backward compatibility only, although it wasn’t until SQL Server 2014 that Microsoft finally removed the statement.

Microsoft also removed the DBCC TEXTALL and DBCC TEXTALLOC statements from SQL Server 2014. The DBCC TEXTALL statement verified the integrity of the text, ntext, and image columns for all tables in a database. The DBCC TEXTALLOC statement did the same thing, but only for a specified table. Both statements originated with SQL Server 6.5 and by SQL Server 2000 were included for backward compatibility only.

No doubt, plenty of other T-SQL statements have come and gone, many without leaving a paper trail, but SQL Server 2014 seemed particularly hard on DBCC statements. Perhaps Microsoft saw that as a good time to do a bit of house-cleaning.

English Query

Introduced in SQL Server 6.5, English Query made it possible to automatically transform a question or statement written in English into a T-SQL statement. Microsoft offered English Query as part of SQL Server and as a standalone product.

English Query included a development environment and runtime engine to support the query transformation process. Ideally, an end user could type a question into an application’s text box, English Query would interpret the question and generate the T-SQL query, and the database engine would return the results, just like any other query.

In SQL Server 2005, Microsoft ditched English Query altogether. From then on, customers could no longer install or upgrade the product. However, if they upgraded a SQL Server 2000 instance to SQL Server 2005, and English Query had been implemented in the original installation, the English Query component would still work. In addition, customers with a SQL Server 2005 license could apparently install SQL Server 2000 and then use English Query against a SQL Server 2005 database, but those days are long gone.

Like many SQL Server features, English Query received an assortment of mixed reviews. Some developers liked it and made use of it. Others did not. At some point, Microsoft must have determined there was not enough interest in the feature to bother, so English Query got the axe, which came as a surprise to a number of users.

Perhaps in this case, Microsoft had been ahead of its time. When you consider how far we’ve come with technologies such as Siri, Google Assistant, and even Cortana, the potential for English Query was certainly there.

Native XML Web Services

In SQL Server 2005, Microsoft added Native XML Web Services to provide a standards-based structure for facilitating access to the database engine. Using these services, an application could send a Simple Object Access Protocol (SOAP) request to a SQL Server instance in order to execute T-SQL batch statements, stored procedures, or scalar-valued user-defined functions.

To carry out these operations, a SOAP/HTTP endpoint had to be defined on the server to provide a gateway for HTTP clients issuing SOAP requests. The T-SQL modules (batch statements, procedures, and functions) were made available as web methods to the endpoint users. Together these methods formed the basis of the web service.

Microsoft deprecated the Native XML Web Services in SQL Server 2008, with the recommendation that any SOAP/HTTP endpoints be converted to ASP.NET or Windows Communications Foundation (WCF) endpoints. These newer technologies were considered more robust, scalable, and secure than SOAP. Microsoft eventually removed the Native XML Web Services feature altogether.

Northwind and pubs databases

Who among us does not remember the pubs and Northwind databases? Even today, you can find references to them strewn across the web (mostly in outdated resources). They certainly deserve a mention as we stroll down memory lane.

The pubs database was developed by Sybase and came to Microsoft as part of the Sybase-Microsoft partnership in the early ’90s. The database included about 10 or so tables, based on a bookstore model. For example, the database contained the Titles, Authors, and Publishers tables, among several others. The pubs database provided a clean and simple example for demonstrating such concepts as many-to-many relationships and atomic data modeling.

But the pubs database was too basic to demonstrate more complex data modeling concepts and SQL Server features, so with the release of SQL Server 2000, Microsoft also introduced the Northwind database, which had its origins in Microsoft Access. The SQL Server team coopted the database to provide a more useful example of database concepts, without having to do a lot of the work themselves.

The Northwind database was based on a manufacturing model and included such tables as Customers, Orders and Employees. The database was still relatively simple, offering only a few more tables than the pubs database, but it helped to demonstrate slightly more complex relationships, such as hierarchical data. With the release of SQL Server 2005, the Northwind database was usurped by the now infamous AdventureWorks database.

Notification Services

Microsoft introduced Notification Services in SQL Server 2000 to provide a platform for developing and deploying applications that generated and sent notifications to subscribers. Notification Services allowed developers to build applications that could send critical information to customers, employees, or other types of users when data changed in a specified way.

Developers could set up the service to generate and send notifications whenever triggering events occurred. In addition, subscribers could schedule notifications to be generated and sent at their convenience. The service could be configured to send notifications to a subscriber’s email account, cell phone, personal digital assistant (PDA), or Windows Messenger account.

Microsoft pulled the plug on Notification Services in SQL Server 2008 because the feature was not being used enough, which I doubt surprised many. Notification Services had a reputation for being inflexible, confusing, and difficult to implement, requiring a great deal of patience just to get a solution up and running. That said, some developers were able to make Notification Services work and thought it could do some cool stuff, but they seemed to be the exception. For most, getting to that point wasn’t worth the effort.

After pulling Notification Services from the product, Microsoft recommended that users turn to SQL Server Reporting Services (SSRS) and take advantage of such features as data-driven subscriptions.

SQL Mail

I’m not sure when Microsoft introduced SQL Mail, but it was there in the early days of SQL Server, providing users with a tool for sending, receiving, deleting, and processing email messages. Best of all, the service could send messages that included T-SQL query results.

SQL Mail used the Extended Messaging Application Programming Interface (MAPI) to communicate with an external email server and process email messages. However, to make this possible, an application that supported Extended MAPI also had to be installed on the server that hosted the SQL Server instance. The application would then provide SQL Server with the Extended MAPI components needed to communicate with the email server.

Microsoft introduced Database Mail in SQL Server 2005 as a replacement to SQL Mail because Database Mail was more robust and secure and offered better performance. Database Mail was also based on the Simple Mail Transfer Protocol (SMTP), rather than MAPI, and did not require that a local email application be installed. Microsoft finally dropped SQL Mail in SQL Server 2012.

SQL Server Distributed Management Objects

SQL Server Distributed Management Objects (SQL-DMO) were a collection of programming objects that facilitated database and replication management. The objects made it possible to automate repetitive administrative tasks as well as create and manage SQL Server objects and SQL Server Agent jobs, alerts, and operators. Developers could create SQL-DMO applications using any OLE Automation controller or COM client development platform based on C or C++.

By SQL Server 2005, SQL-DMO could no longer keep up with the new capabilities being introduced into the database platform. The time had come to replace the aging APIs. As a result, Microsoft introduced SQL Management Objects (SMO), a more robust set of APIs for administering SQL Server. SMO offered advanced caching and scripting features, along with a number of other capabilities, such as delayed instantiation.

To support backward compatibility, Microsoft continued to include SQL-DMO until SQL Server 2012, when it was dropped unceremoniously from the product. The thinking, no doubt, was that seven years was long enough for developers to update their apps and move into the 21st century.

Surface Area Configuration Tool

Remember the Surface Area Configuration Tool? It was introduced in SQL Server 2005 and dropped in SQL Server 2008, making it one of the product’s most short-lived features. The idea behind it was to improve security by providing a centralized tool for limiting the number of ways that would-be hackers and cybercriminals could gain access into the SQL Server environment.

The Surface Area Configuration Tool made it possible for administrators to enable, disable, start, or stop SQL Server features and services, as well as control remote connectivity. The tool leveraged WMI to facilitate these capabilities. Microsoft also made a command-line version of the tool available.

After dropping the Surface Area Configuration Tool, Microsoft recommended that users turn to such tools as SQL Server Management Studio (SSMS), SQL Server Configuration Manager, and policy-based management.

Utility hit list

As with many SQL Server components that have come and gone over the years, so too have an assortment of command-line utilities. Take, for example, the isql utility, which would let users run T-SQL statements, stored procedures, and script files from a command prompt. The utility used the old DB-Library protocol to communicate with SQL Server. Microsoft stopped including the isql utility in SQL Server 2005, pointing users to the sqlcmd utility as a replacement.

A similar utility, osql, does pretty much everything the isql utility did, except that it uses the ODBC protocol. However, the osql utility has been deprecated since at least SQL Server 2012 and will likely be pulled from the product in the not-too-distant future.

The same fate is in store for the sqlps utility, which launches a Windows PowerShell session, with the SQL Server PowerShell provider and related cmdlets loaded and registered.

Another deprecated utility is sqlmaint, which is slated to be removed after SQL Server 2017. The sqlmaint utility carries out database maintenance operations, such as performing DBCC checks, backing up database and log files, rebuilding indexes, and updating statistics. Going forward, DBAs should use the SQL Server maintenance plan feature instead.

A couple other deprecated utilities are makepipe and readpipe, which are used to test the integrity of the SQL Server Named Pipe services. Both utilities will soon be removed. In fact, they’re not even installed during setup, although they can still be found on the installation media. Same goes for the odbcping utility, which tests the integrity of an ODBC data source and verifies client connectivity.

Web Assistant

The Web Assistant, which I believe was introduced in SQL Server 7, offered a wizard for generating static HTML pages that contained SQL Server data. The wizard used a set of related system stored procedures to build the pages initially and to rebuild them if the data changed. The pages were fairly rudimentary, even by late ’90s standards, but were adequate enough for simple use cases.

With the release of SQL Server 2005, Microsoft did away with the wizard and kept only the stored procedures, which finally got dumped in SQL Server 2014. Whether anyone used the procedures after the wizard was removed is hard to say. Whether they used the wizard before that is even more of a mystery. I doubt many even noticed the procedures were gone.

To the best of my knowledge, Microsoft has not tried to replace this feature, perhaps deciding that static web pages provided little value, that HTML development has gotten far too sophisticated, that SSRS is more than adequate, or that a relational database management system was not the best place to be playing at HTML development. For whatever reason, Web Assistant and all of its offspring are gone for good.

History in the making

There are undoubtedly plenty of other SQL Server features that have gone missing over the years, in addition to what we’ve covered here. Perhaps you recall a few components that have a special place in your heart. Given how SQL Server has expanded and evolved over the years, it would be difficult to catch them all, especially if you also consider SSRS, SSAS, SSIS, or various other components. Whether or not you agree with their demise is another matter altogether.


SQL – Simple Talk

Released: Microsoft Kerberos Configuration Manager for SQL Server 4.0

We are pleased to announce the latest generally available (GA) release of Microsoft Kerberos Configuration Manager for SQL Server.

Get it here: Download Microsoft Kerberos Configuration Manager for SQL Server

Why Kerberos?

Kerberos authentication provides a highly secure method to authenticate client and server entities (security principals) on a network. To use Kerberos authentication with SQL Server, a Service Principal Name (SPN) must be registered with Active Directory, which plays the role of the Key Distribution Center in a Windows domain. In addition, many customers also enable delegation for multi-tier applications using SQL Server. In such a setup, it may be difficult to troubleshoot the connectivity problems with SQL Server when Kerberos authentication fails.

Here are some additional reading materials for your reference.

Why use this tool?

The Kerberos Configuration Manager for SQL Server is a diagnostic tool that helps troubleshoot Kerberos related connectivity issues with SQL Server, SQL Server Reporting Services, and SQL Server Analysis Services. It can perform the following functions:

  • Gather information on OS and Microsoft SQL Server instances installed on a server.
  • Report on all SPN and delegation configurations and Always On Availability Group Listeners installed on a server.
  • Identify potential problems in SPNs and delegations.
  • Fix potential SPN problems.

This release (v4.0) adds support for Always On Availability Group Listeners.

Notes
  • Microsoft Kerberos Configuration Manager for SQL Server requires a user with permission to connect to the WMI service on any machine it’s connecting to. For more information, refer to Securing a Remote WMI Connection.
  • For Always On Availability Group Listeners discovery, run this tool from the owner node.
  • Also, if needed for troubleshooting, the Kerberos Configuration Manager for SQL Server creates a log file in %AppData%\Microsoft\KerberosConfigMgr.


SQL Server Release Services

Released: SQL Server 2017+ and Replication Management Packs (7.0.0.0)

We are happy to announce the final release of SQL Server 2017+ and Replication Management Packs! These MPs can be used to monitor SQL Server 2017 on both Windows and Linux and SQL Server 2017 Replication on Windows. Below, let’s look at some of the new features we added to these MPs. All the details regarding the new functionality can be found in the Operations Guide that can be downloaded along with the Management Pack.

Cross-platform Monitoring

You can now monitor SQL Server 2017 on Windows and on Linux!

Version Agnostic SQL Server MPs

We understand that with many SQL Server versions in market and with new server releases becoming more frequent, it is becoming harder to manage a separate MP for each server version. We are moving to version agnostic MPs to address this issue. This will be valid going forward. The new MP is named SQL Server 2017+. The ‘+’ in the name indicates that it will be used to monitor SQL Server 2017 and the releases that come after that. Current in-market MPs (2008 through 2016) will not be changed and the 2017+ MP cannot be used to monitor older releases. We are making this change for SQL Server and Replication MPs.

Agentless Monitoring

SQL Server 2017+ Management Pack introduces Agentless Monitoring mode support. This monitoring mode is designed to support SQL Server on Linux, but it also works for Windows deployments. We’ve received multiple requests from our customers to support agentless monitoring for SQL Server, which can be useful for production OLTP workloads or in cases where the organization’s policy prohibits deploying the SCOM agent to the monitored host or making any changes to it.

With the Agentless Monitoring mode, there is no need to deploy the SCOM Agent on the SQL box. Instead, the monitoring workloads are transferred to management servers included in the SQL Server Monitoring Pool. This allows you to remove the SCOM Agent and data processing overhead from the SQL box and move it to the SQL Server Monitoring Pool. Also, with the agentless monitoring mode, you can use SQL credentials for authentication, which simplifies security configuration.

Engineering Improvements

  • Usage of scripts is discontinued in favor of .Net Framework modules, which enables more efficient resource usage.
  • For getting information on health and performance, SQL Server Dynamic Management Views and Functions are now used instead of WMI calls. This provides better efficiency and connectivity.

Downloads available:

Microsoft System Center Management Pack for SQL Server 2017+

Microsoft System Center Management Pack for SQL Server 2017+ Replication

Microsoft System Center Management Pack for SQL Server Dashboards

We are looking forward to your feedback.


SQL Server Release Services

Changes to hashing algorithm for self-signed certificate in SQL Server 2017

Starting with SQL Server 2005, a self-signed certificate is created automatically during startup to be used for channel encryption. By default, credentials in the login packet that are transmitted when a client application connects to SQL Server are always encrypted using this certificate if a certificate has not been explicitly provisioned for SSL/TLS. Optionally, the self-signed certificate can also be used to enable channel encryption. SSL/TLS connections that are encrypted using a self-signed certificate do not provide strong security, so it is strongly recommended that a certificate obtained from a trusted certification authority be used.

Until SQL Server 2016, the self-signed certificate was created using the SHA1 algorithm. However, the SHA1 algorithm and many older algorithms have been deprecated beginning with SQL Server 2016. Refer to this Books Online article for more information.

Beginning with SQL Server 2017, the self-signed certificate uses the SHA256 algorithm, which is more secure than SHA1. Having said that, we still recommend using a certificate obtained from a trusted certification authority for channel encryption.


CSS SQL Server Engineers

SQL Server R Services: Working with Multiple Data Sets

The series so far:

  1. SQL Server R Services: The Basics
  2. SQL Server R Services: Digging into the R Language
  3. SQL Server R Services: Working with ggplot2 Statistical Graphics
  4. SQL Server R Services: Working with Data Frames
  5. SQL Server R Services: Generating Sparklines and Other Types of Spark Graphs
  6. SQL Server R Services: Working with Multiple Data Sets

Throughout this series, we’ve looked at several examples of how to use SQL Server R Services to create R scripts that incorporate SQL Server data. The key to using SQL Server data is to pass in a T-SQL query as an argument when calling the sp_execute_external_script stored procedure. The R script can then incorporate the data returned by the query, while taking advantage of the many elements available to the R language for analyzing data.

Despite the ease with which you can run an R script, the sp_execute_external_script stored procedure has an important limitation. You can specify only one T-SQL query when calling the procedure. Of course, you can create a query that joins multiple tables, but this approach might not always work in your circumstances or might not be appropriate for the analytics you’re trying to perform. Fortunately, you can retrieve additional data directly within the R script.

In this article, we look at how to import data from a SQL Server table and from a .csv file. We also cover how to save data to a .csv file as well as insert that data into a SQL Server table. Being able to incorporate additional data sets or save data in different formats provides us with a great deal of flexibility when working with R Services and allows us to take even greater advantage of the many elements available to the R language for data analytics.

Importing data from SQL Server

When you call the sp_execute_external_script stored procedure, you can pass in a single T-SQL query to the @input_data_1 parameter. This allows you to incorporate the data returned by the query directly within your R script. In some cases, however, you might want to bring in additional data, either from SQL Server or from another source.

For example, the following T-SQL code includes an R script that retrieves data separately from the SalesOrderHeader and SalesTerritory tables in the AdventureWorks2014 database:

If you reviewed the previous articles in the series, you should be familiar with how to call the sp_execute_external_script stored procedure and run an R script. In this article, we focus primarily on the components within the R script. If you’re uncertain about the other language elements, be sure to refer to the previous articles.

The first element in the R script assigns the data retrieved from the SalesOrderHeader table to the sales variable, using the InputDataSet variable to access the SQL Server data:
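Based on that description, this is a single assignment:

sales <- InputDataSet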

This gives us our first data set, which includes the sale amounts associated with the individual territory IDs. Now suppose we want to use territory names rather than IDs. We can do this in our original T-SQL query by joining the two tables, or we can call the tables individually. In this case, the tables are in the same database, so creating one query would be the most expedient approach, but data is not always accessible through a join, or sometimes an R script requires a different approach to working with the data, such as needing to join data after aggregating one of the data sets.

For this example, our primary purpose is to demonstrate how to bring in a second data set from SQL Server, so we’ll retrieve the territory data separately. To do so, we must first define a connection to the SQL Server instance that contains the data:
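A connection string along these lines could be used; the variable name conn, the server name, and the credentials are all placeholders:

```r
# Driver, instance, database, user ID, and password, separated by semicolons
conn <- "Driver=SQL Server;Server=MySqlInstance;Database=AdventureWorks2014;Uid=MySqlLogin;Pwd=MyPassword"
```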

To define the connection, we create a single string that specifies the driver, instance, database, user ID, and password, separated by semi-colons. Note that you should not put spaces around the equal signs within the connection definition. If you do, you will receive an error.

You can also define a connection that uses Windows Authentication, rather than having to provide a user name and password. If you take this approach, your connection definition would look similar to the following:
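For example, assuming the same placeholder server name:

```r
# Windows Authentication: no user name or password in the connection string
conn <- "Driver=SQL Server;Server=MySqlInstance;Database=AdventureWorks2014;Trusted_Connection=True"
```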

To use Windows Authentication, you might have to set up implied authentication on the target SQL Server instance, as described in the Microsoft article Modify the User Account Pool for SQL Server R Services.

After we set up the connection, we can define a SELECT statement to retrieve the data we need from the SalesTerritory table:
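Based on the AdventureWorks2014 schema, the query might be defined as follows:

```r
# Territory IDs and names from the SalesTerritory table
qry <- "SELECT TerritoryID, Name FROM Sales.SalesTerritory"
```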

With our connection and query in place, we can use the RxSqlServerData function to generate a SQL Server data source object:
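Continuing the sketch, the call looks like this:

```r
# Create a SQL Server data source object from the query and connection string
territories <- RxSqlServerData(sqlQuery = qry, connectionString = conn)
```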

The RxSqlServerData function is part of the RevoScaleR library included with R Services. The library includes a collection of functions for importing, transforming, and analyzing data at scale. A subset of those functions, including RxSqlServerData, is specific to SQL Server.

Using the RxSqlServerData function in this way gives us our second data set, which we assign to the territories variable.

Another useful function in the RevoScaleR library is rxDataStep. We can use it in an R script to transform the data returned by the RxSqlServerData function into a more workable format. Our next step, then, is to run the function against the territories data set, assigning the results back to the same variable:
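In the sketch, that step is:

```r
# Materialize the data source as a data frame so base R functions can work with it
territories <- rxDataStep(inData = territories)
```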

Not surprisingly, there’s a lot more to the RevoScaleR library and SQL Server functions than what we’ve covered here, so be sure to check out the rest when you have time. A good place to start is with the Microsoft topic RevoScaleR package.

With our two data sets in place, we can use the merge function to create a single data frame, joining the data sets based on the TerritoryID column in each one:
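In the sketch, the join looks like this:

```r
# Join the sales and territory data sets on their shared TerritoryID column
df1 <- merge(sales, territories, by = "TerritoryID")
```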

We can then use the names function to assign names to the data frame columns, making it easier to work with the data frame later in the script:
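The column names below follow the usage later in this article (df1$Sales and df1$Territory); the order reflects how merge arranges the columns in the sketch:

```r
# Rename the merged columns for readability
names(df1) <- c("TerritoryID", "Sales", "Territory")
```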

We now have a single data frame, df1, that contains the data we need to move forward with our (simple) analytics. The next step is to create a second data frame that aggregates the data based on the sales amounts. The goal is to come up with the mean sales per territory. To do this, we construct the new data frame one column at a time, starting with a column that lists the individual territory names:
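In the sketch, the first column is built like this:

```r
# Distinct territory names, obtained by treating the Territory column as a factor
c1 <- levels(as.factor(df1$Territory))
```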

We begin by using the as.factor function to convert the values in the Territory column to a factor, which provides a way for us to work with categorical data. We can then use the levels function to retrieve the distinct territory names, assigning them to the c1 variable.

The next step is to define the column that will contain the aggregated sales data:
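In the sketch, the second column is built like this:

```r
# Mean sales per territory, rounded to the nearest integer
c2 <- round(tapply(df1$Sales, df1$Territory, mean))
```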

We start by using the tapply function to aggregate the sales totals for each category. The first argument, df1$Sales, indicates that the Sales values are the ones to be aggregated. The second argument, df1$Territory, provides the basis for the aggregation, in this case, the territories. The territory names also serve as an index for the c2 column. The third argument, mean, indicates that the mean sales value should be calculated for each territory. We then use the round function to round the aggregated totals to integers.

With our two column definitions in place, we can use the data.frame function to create a data frame and assign it to the df2 variable:
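The data frame column names used here (Territory and AvgSales) are illustrative; the article does not prescribe them:

```r
# Combine the two columns into the aggregated data frame
df2 <- data.frame(Territory = c1, AvgSales = c2)
```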

At this point, we can take any number of other steps as part of our analytics, but we’ll stop here. The only remaining task is to assign the df2 data frame to the OutputDataSet variable:
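In the sketch, this is simply:

```r
# Return the aggregated results to the calling application
OutputDataSet <- df2
```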

Assigning the data frame to the OutputDataSet variable makes it possible for the R script to return the data to the calling application (in this case, SQL Server Management Studio). The following figure shows the results returned by the script when we run the sp_execute_external_script stored procedure.

Figure 1. Results returned by the sp_execute_external_script stored procedure.

That’s all there is to incorporating additional SQL Server data into an R script when a single T-SQL query is not enough. Although this is only a very basic example, it points to the potential for creating far more complex analytics that can include whatever data is necessary, as long as the target SQL Server instance is accessible from the instance running the script.

Adding conditional logic to an R script

Before we move on to importing data from a .csv file, I want to demonstrate another SQL Server function in the RevoScaleR library: rxSqlServerTableExists. The function checks whether a specified SQL Server table exists, information we can use in our R script to implement conditional logic. For example, the following script uses the function to verify whether the SalesTerritory table exists before trying to retrieve data from that table:
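A sketch of the conditional portion of such a script is shown below, reusing the conn connection string and the sales data set from the earlier example. Whether rxSqlServerTableExists accepts a schema-qualified name, and the exact shape of the else branch, are assumptions here.

```r
# Identify the table whose existence we want to verify
table <- "Sales.SalesTerritory"

if (rxSqlServerTableExists(table, connectionString = conn)) {
  # Table exists: retrieve the territory names and aggregate by name, as before
  qry <- "SELECT TerritoryID, Name FROM Sales.SalesTerritory"
  territories <- rxDataStep(inData = RxSqlServerData(sqlQuery = qry, connectionString = conn))
  df1 <- merge(sales, territories, by = "TerritoryID")
  names(df1) <- c("TerritoryID", "Sales", "Territory")
  df2 <- data.frame(Territory = levels(as.factor(df1$Territory)),
                    AvgSales  = round(tapply(df1$Sales, df1$Territory, mean)))
} else {
  # Table not found: aggregate by territory ID instead of name
  df2 <- data.frame(TerritoryID = levels(as.factor(sales$TerritoryID)),
                    AvgSales    = round(tapply(sales$Sales, sales$TerritoryID, mean)))
}

OutputDataSet <- df2
```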

Most of the script elements should look familiar, but notice that after setting up the connection, we create an if…else structure to define the conditional logic, starting with the following statements:
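Excerpted from the sketch above, those opening statements are (the rest of the if block and the else block follow):

```r
table <- "Sales.SalesTerritory"

if (rxSqlServerTableExists(table, connectionString = conn)) {
```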

We begin by identifying the target table and assigning it to the table variable. We then use the rxSqlServerTableExists function to check if the table exists. If it does, the function returns TRUE, and the if block runs, creating the df2 data frame, just like in the preceding example.

If the function returns FALSE, the else block runs instead, creating the data frame based on territory IDs, rather than names.

In this case, we’re using the rxSqlServerTableExists function to implement conditional logic; however, you might also find it useful for verifying a table’s existence before trying to delete it or perform other operations in the R script.

However, even if you don’t use the function, the example points to something more important: the ability to implement conditional logic within R scripts, making it possible to work with SQL Server data in ways that can get quite difficult with T-SQL alone, especially when trying to perform sophisticated analytics.

Importing data from a .csv file

Rather than bringing additional data in from SQL Server, we might want to create an R script that uses data from another source, in addition to the data provided by the primary T-SQL query. For example, suppose the territory names we retrieved in the preceding examples are in a .csv file, as shown in the following figure.

Figure 2. Territory data stored in a .csv file.

This is the same data we had retrieved from the SalesTerritory table in the preceding examples. We can update that script to instead point to that file:
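Only the changed portion of the script is sketched here; the directory and file name are placeholders:

```r
# Replaces the SQL Server connection, query, and RxSqlServerData/rxDataStep steps
inputcsv <- "C:\\DataFiles\\territories.csv"
territories <- read.csv(file = inputcsv, header = TRUE, sep = ",")
```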

Rather than defining a connection like we do for SQL Server, we can read the data directly from the .csv file. To do so, we first specify the file location and save it to the inputcsv variable:
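Assuming the placeholder path used above:

```r
# Windows path to the .csv file; each backslash must be doubled
inputcsv <- "C:\\DataFiles\\territories.csv"
```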

Note the use of double backslashes to escape the backslashes in the Windows file path. We can then use the read.csv function to retrieve the data, passing in the inputcsv variable for the file argument:
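The call looks like this:

```r
# Read the file: the first row holds column names, and values are comma-separated
territories <- read.csv(file = inputcsv, header = TRUE, sep = ",")
```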

In addition to setting the file argument, we set the header argument to TRUE because the column names are included in the file, and we specify a comma (within quotes) for the sep argument, which indicates that a comma is used to separate the values within the file. That’s all there is to it. We can then create the df1 and df2 data frames just like we did in the preceding examples. The R script will return the same results as before.

Being able to import data from a .csv file can be useful whenever we need data other than what is available in SQL Server. This approach also allows us to get data from a legacy system to which direct access is not possible.

And we’re not limited to .csv files. The R language lets us import data from a wide range of sources, including JSON and XML files, relational databases, non-relational data stores, web-based resources, and many more. That said, I have not tested all these source types from within R Services. The only way to know for sure what you can or cannot do in your particular circumstances is to try it out yourself.

Exporting data to a .csv file

There might be times when you want to export the results of your R script to a file, rather than (or in addition to) returning the data to the calling application. To send the data to a .csv file, for example, we can use the write.csv function, as shown in the following example:
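A sketch of that approach follows; the directory is a placeholder, while the file name territorysales.csv comes from the example:

```r
# Target .csv file for the exported results
outputcsv <- "C:\\DataFiles\\territorysales.csv"

# Write the aggregated data frame to the file (default settings)
write.csv(df2, file = outputcsv)
```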

Sending data to a .csv file is just as easy as retrieving it. In this case, we specify the path and target file (territorysales.csv), assigning it to the outputcsv variable. We then use the write.csv function to write the data in the df2 data frame to the file. The following figure shows the file with the inserted data.

Figure 3. The territorysales.csv file written with the write.csv default settings.

Notice that the first row includes an empty string to represent the first column and that each subsequent row includes two instances of the territory name. In this case, the R engine is also outputting the names used to index the c2 column. Plus, quotes are used to enclose the character data, which we might not want to include. We can get around these issues by modifying the write.csv function call:
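The modified call looks like this:

```r
# Omit quotation marks around character data and drop the row names
write.csv(df2, file = outputcsv, quote = FALSE, row.names = FALSE)
```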

This time, when calling the function, we add two arguments. The first is quote, which we set to FALSE to specify that character data should not be enclosed in quotation marks. (You can omit this argument or set it to TRUE if you want to include the quotes.)

The second argument is row.names, which we set to FALSE so the extra territory names are not included, giving us the results shown in the following figure.

Figure 4. The .csv file written without quotation marks or row names.

It should also be noted that we don’t have to choose between sending data to a file and using the OutputDataSet variable to return data to the calling application. We can do both:
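For example, the last lines of the script can both write the file and populate OutputDataSet:

```r
# Export the results to the .csv file ...
write.csv(df2, file = outputcsv, quote = FALSE, row.names = FALSE)

# ... and also return them to the calling application
OutputDataSet <- df2
```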

We can also return different data to the file and to the calling application, depending on how we set up our script. The point is, R is extremely flexible when it comes to retrieving and returning data. The better you understand the R language, the better you can make use of these features.

Outputting data to a SQL Server table

SQL Server also makes it possible to insert data from the R script into a SQL Server table. The easiest way to do this is to call the sp_execute_external_script stored procedure as part of an INSERT statement:
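A sketch of that pattern is shown below, reusing the abbreviated R script from earlier; the temporary table definition and column types are assumptions based on the results described in this article.

```sql
-- Temporary table to receive the R script's results
CREATE TABLE #TerritorySales (
  Territory NVARCHAR(50),
  AvgSales  FLOAT
);

-- Insert the rows returned by the R script
INSERT INTO #TerritorySales (Territory, AvgSales)
EXEC sp_execute_external_script
  @language = N'R',
  @script = N'
    sales <- InputDataSet
    conn <- "Driver=SQL Server;Server=MySqlInstance;Database=AdventureWorks2014;Trusted_Connection=True"
    qry <- "SELECT TerritoryID, Name FROM Sales.SalesTerritory"
    territories <- rxDataStep(inData = RxSqlServerData(sqlQuery = qry, connectionString = conn))
    df1 <- merge(sales, territories, by = "TerritoryID")
    names(df1) <- c("TerritoryID", "Sales", "Territory")
    df2 <- data.frame(Territory = levels(as.factor(df1$Territory)),
                      AvgSales  = round(tapply(df1$Sales, df1$Territory, mean)))
    OutputDataSet <- df2',
  @input_data_1 = N'SELECT TerritoryID, TotalDue AS Sales FROM Sales.SalesOrderHeader';

-- Verify the results
SELECT * FROM #TerritorySales;
```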

In this case, we’re creating the #TerritorySales temporary table to demonstrate the insert operation. We then run the INSERT statement, calling the sp_execute_external_script stored procedure, which provides us with the data we need to add to the table. We can then run a SELECT statement to verify our results, which are shown in the following figure.

Figure 5. Results of querying the #TerritorySales temporary table after the insert.

Being able to insert data returned from an R script into a temporary table or table variable can be a useful tool for incorporating analytics into more complex T-SQL scripts. We can also return the results to a permanent table when we want to persist the data. Again, it’s all about flexibility.

Much more to the R language

In this series, we’ve covered many of the basics of how to use SQL Server R Services to run R scripts that incorporate data from SQL Server and other sources. Once you understand how to use the sp_execute_external_script stored procedure and construct R scripts, you can take advantage of the broad range of R language elements available for transforming and analyzing data.

The examples we’ve looked at throughout this series provide only a glimpse of the many operations you can perform with R. The language is widely implemented, has a large user community behind it, and deserves far more attention than we’ve been able to give it here. Fortunately, there are plenty of resources out there that describe the various R language elements and how to put them together. Once you understand how to run an R script in SQL Server, you can take advantage of these elements to perform complex analytics and generate various types of visualizations.

With the release of SQL Server 2017, Microsoft changed the name of R Services to Machine Learning Services and added support for the Python language. In future articles, we’ll dig into the Python side of things, taking the same approach we covered in this series, that is, using the sp_execute_external_script stored procedure to run our external scripts. Python is another widely implemented language with a large user community and could prove to be an important addition to SQL Server, so stay tuned as we get ready to dive into the world of SQL Server Machine Learning.


SQL – Simple Talk