Tag Archives: table

Disable Smart Matching EmailHashBase table population for Dataverse environments

April 7, 2021   Microsoft Dynamics CRM

If you have been using Server-Side Synchronization and automatic email correlation, you may be familiar with the term Smart Matching. I won’t go into detail on what Smart Matching is, but you can take a look at this document HERE. Smart Matching is a legacy email correlation method dating as far back as CRM 4.0. There are now newer methods of email correlation that use an Exchange item’s ConversationIndex to correlate an email to an existing thread; these do not require Smart Matching or Tracking Tokens. I cover this setting in more detail HERE.

You can disable Smart Matching, but even with Smart Matching disabled, the EmailHashBase database table is still populated by default, in case anyone who was not using Smart Matching decides to start using it in the future. This allows Smart Matching email correlation to work on historical items. For those who never intend to use Smart Matching, however, the table keeps growing and impacts storage consumption. Additionally, with the newer email correlation option, Smart Matching is not necessary for proper correlation.

What is this setting?

We created a new OrgDbOrgSetting, DisableSmartMatchingEmailHashCreation. Changing this setting to true will stop population of the EmailHashBase table. However, it will not remove the data already in the table. You will need to contact your dedicated Customer Engineer or Microsoft support to complete that process.

Important Note:

If you set DisableSmartMatchingEmailHashCreation to true, or if the EmailHashBase table is cleaned out/truncated, and you later choose to enable Smart Matching (not recommended), email correlation will only work on emails from that day forward.

Verify you are not using Smart Matching

First, go to System Settings, on the Email tab, and double-check that you are not configured for Smart Matching and that only Use correlation to track email conversations is checked. Also check with any relevant team members to ensure there are no plans to enable Smart Matching. If not, you are good to proceed.

[Screenshot: System Settings, Email tab]

Change the setting

 

Once you have confirmed this, you can disable the Smart Matching feature and table population. Go to Settings | Solutions and open the OrganizationSettingsEditor solution. It will likely prompt you that you are not on the latest version and ask you to download version 1.0.2103.1201. Please download and import this solution. If you do not yet have this solution installed, you can install it from here.

Once imported and upgraded, open the solution and find the following two settings.

[Screenshot: the settings in the OrganizationSettingsEditor solution]

Click Add for each of these settings and choose Yes.

[Screenshot]

Then click Edit for each, change the value to true, and click Update.

[Screenshot]

If you have questions, please feel free to comment.

Thank you for reading!

Aaron Richards

Dynamics 365 Customer Engagement in the Field

Use hidden measures and members from #PowerBI dataset in an Excel Pivot table

December 3, 2020   Self-Service BI

When you connect to a Power BI dataset from Power BI Desktop, you might have noticed that you can see and use hidden measures and columns in the dataset.

[Screenshot]

But the hidden fields cannot be seen if you browse the dataset in Excel.

[Screenshot]

But that does not mean that you cannot use the fields in Excel – and here is how you can do it.

Using VBA

You can use VBA by creating a macro

[Screenshot]

The code will add the field AddressLine1 from the DimReseller dimension as a row field if the active cell is inside a PivotTable.

Sub AddField()
    Dim pv As PivotTable
        Set pv = ActiveCell.PivotTable
        pv.CubeFields("[DimReseller].[AddressLine1]").Orientation = xlRowField
End Sub

If you want to add a measure/value to the PivotTable, you need to change the Orientation property to xlDataField.
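
A minimal sketch of that variant, reusing the hidden measure [Measures].[Sales Profit] that appears later in this post (the measure name is otherwise an assumption):

Sub AddMeasure()
    Dim pv As PivotTable
    ' Assumes the active cell sits inside a PivotTable connected to the Power BI dataset
    Set pv = ActiveCell.PivotTable
    ' Add the hidden measure as a value field
    pv.CubeFields("[Measures].[Sales Profit]").Orientation = xlDataField
End Sub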

[Screenshot]

This means that we now have added two hidden fields from the dataset

[Screenshot]

Add hidden measures using OLAP Tools

You can also add hidden measures using the OLAP Tools and MDX Calculated Measure

[Screenshot]

Simply create a new calculated measure by referencing the hidden measure in the MDX

[Screenshot]

This will add a calculated Measure to the measure group you selected

[Screenshot]

And you can add that to your PivotTable.

[Screenshot]

Referencing hidden items using CUBE functions

Notice that you can also reference the hidden measures using CUBE functions

[Screenshot]

Simply specify the name of the measure as the member expression, in this case “[Measures].[Sales Profit]”.

You can also refer to members from hidden fields using the CUBEMEMBER function.

[Screenshot]
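
For reference, the two formula patterns might look roughly like this; "YourConnectionName" stands in for the workbook's actual connection name, and the address member value is made up for illustration:

=CUBEVALUE("YourConnectionName", "[Measures].[Sales Profit]")
=CUBEMEMBER("YourConnectionName", "[DimReseller].[AddressLine1].&[7821 Mockingbird Lane]")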

Hope this can help you too.

Power On!

Erik Svensen – Blog about Power BI, Power Apps, Power Query

Improve Row Count Estimates for Table Variables without Changing Code

May 27, 2020   BI News and Info

You have probably heard that table variables work fine when they contain a small number of records, but that they don’t perform all that well once they contain a large number of records. A solution for this problem has been implemented in version 15.x of SQL Server (Azure SQL Database and SQL Server 2019) with the rollout of a feature called Table Variable Deferred Compilation.

Table Variable Deferred Compilation is one of many new performance features introduced in Azure SQL Database and SQL Server 2019. It is part of the Intelligent Query Processing (IQP) feature family. See Figure 1 for a diagram that shows all the IQP features introduced in Azure SQL Database and SQL Server 2019, as well as the features that were originally part of the Adaptive Query Processing family included in the older generation of Azure SQL Database and SQL Server 2017.


Figure 1: Intelligent Query Processing

In releases of SQL Server prior to 15.x, the database engine made a poor assumption about the number of rows in a table variable. Because of this bad assumption, the execution plan that was generated didn’t work well when a table variable contained lots of rows. With the introduction of SQL Server 2019, the database engine now defers the compilation of a query that uses a table variable until the table variable is used the first time. By doing this, the database engine can more accurately identify cardinality estimates for table variables. With more accurate cardinality numbers, queries that have large numbers of rows in a table variable will perform better. Those queries need to run against a database with the compatibility level set to 150 (version 15.x of SQL Server) to take advantage of this feature. To better understand how deferred compilation improves the performance of table variables that contain a large number of rows, I’ll run through an example, but first, I’ll discuss what the problem with table variables is in versions of SQL Server prior to 15.x.

What is the Problem with Table Variables?

A table variable is defined using a DECLARE statement in a batch or stored procedure. Table variables don’t have distribution statistics and don’t trigger recompiles. Because of this, SQL Server is not able to estimate the number of rows in a table variable like it does for normal tables. When the optimizer compiles code that contains a table variable, prior to 15.x, it assumes the table is empty. This assumption causes the optimizer to compile the query using an expected row count of 1 for the cardinality estimate for a table variable. Because the optimizer thinks a table variable contains only a single row, it picks operators for the execution plan that work well with a small set of records, like the NESTED LOOPS operator for a JOIN operation. The operators that work well on a small number of records do not always scale well when a table variable contains a large number of rows. Microsoft documented this problem and suggests that temp tables might be a better choice than a table variable that contains more than 100 rows. Additionally, Microsoft recommends that if you are joining a table variable with other tables, you consider using the query hint RECOMPILE to make sure that table variables get the correct cardinality estimates. Without the proper cardinality estimates, queries with large table variables are known to perform poorly.
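
As a point of reference only, a minimal sketch of that pre-2019 workaround against the same WideWorldImportersDW tables used later in this article might look like this:

-- Workaround sketch for pre-15.x versions: OPTION (RECOMPILE) forces the statement
-- to be recompiled after the table variable is populated, so the optimizer sees the
-- actual row count instead of assuming a single row.
DECLARE @MyCities TABLE ([City Key] int NOT NULL);
INSERT INTO @MyCities
  SELECT [City Key] FROM Dimension.City;
SELECT O.[Order Key], TV.[City Key]
FROM Fact.[Order] AS O
INNER JOIN @MyCities AS TV
  ON O.[City Key] = TV.[City Key]
OPTION (RECOMPILE);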

With the introduction of version 15.x and the Table Variable Deferred Compilation feature, the optimizer delays the compilation of a query that uses a table variable until just before it is used the first time. This allows the optimizer to know the correct cardinality estimate of a table variable. When the optimizer has an accurate cardinality estimate, it has a good chance of picking execution plan operators that perform well for the number of rows in a table variable. In order for the optimizer to defer the compilation, the database must have its compatibility level set to 150. To show how deferred compilation of table variables works, I’ll show an example of this new feature in action.

Table Variable Deferred Compilation in Action

To understand how deferred compilation works, I will run through some sample code that uses a table variable in a JOIN operation. That sample code can be found in Listing 1.

Listing 1: Sample Test Code that uses Table Variable in JOIN operation

USE WideWorldImportersDW;
GO
DECLARE @MyCities TABLE ([City Key] int not null);
INSERT INTO @MyCities
  SELECT [City Key] FROM Dimension.City;
SELECT O.[Order Key], TV.[City Key]
FROM Fact.[Order] as O INNER JOIN @MyCities as TV
ON O.[City Key] = TV.[City Key];

As you can see, this code uses the WideWorldImportersDW database, which can be downloaded here. In this script, I first declare my table variable @MyCities and then insert 116,295 rows from the Dimension.City table into the variable. That variable is then used in an INNER JOIN operation with the Fact.[Order] table.

To show the deferred compilation in action, I will need to run the code in Listing 1 twice. The first execution will run against WideWorldImportersDW using compatibility level 140, and the second execution will run against this same database using compatibility level 150. The script I will use to compare how table variables work under the two different compatibility levels can be found in Listing 2.

Listing 2: Comparison Test Script

USE WideWorldImportersDW;
GO
-- Turn on time statistics
SET STATISTICS TIME ON;
GO
---------------------------------------------------
-- Test #1 - Using SQL Server 2017 compatibility --
---------------------------------------------------
ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 140;
GO
DECLARE @MyCities TABLE ([City Key] int not null);
INSERT INTO @MyCities
  SELECT [City Key] FROM Dimension.City;
SELECT O.[Order Key], TV.[City Key]
FROM Fact.[Order] as O JOIN @MyCities as TV
ON O.[City Key] = TV.[City Key];
---------------------------------------------------
-- Test #2 - Using SQL Server 2019 compatibility --
---------------------------------------------------
ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 150;
GO
USE WideWorldImportersDW;
GO
DECLARE @MyCities TABLE ([City Key] int not null);
INSERT INTO @MyCities
  SELECT [City Key] FROM Dimension.City;
SELECT O.[Order Key], TV.[City Key]
FROM Fact.[Order] as O JOIN @MyCities as TV
ON O.[City Key] = TV.[City Key];
GO

When I run the code in Listing 2, I run it from a query window in SQL Server Management Studio (SSMS), with the Include Actual Execution Plan query option turned on. The execution plans I get when I run the Test #1 and Test #2 queries can be found in Figure 2 and Figure 3, respectively.


Figure 2: Execution Plan for Test #1 code in Listing 2, using compatibility level 140


Figure 3: Execution Plan for Test #2 code in Listing 2, using compatibility level 150

If you compare the execution plans in Figures 2 and 3, you will see they are a little different. When compatibility level 140 was used, my test query used a NESTED LOOPS operation to join the table variable to the Fact.[Order] table, whereas under compatibility level 150, the optimizer picked a HASH MATCH operator for the join operation. This occurred because the Test #1 query used an estimated row count of 1 for the table variable @MyCities, whereas the Test #2 query was able to use the deferred table variable compilation feature, which allowed the optimizer to use an estimated row count of 116,295 for the table variable. These estimated row count numbers can be verified by looking at the Table Scan operator properties for each execution plan, which are shown in Figure 4 and Figure 5, respectively.


Figure 4: Table Scan properties when Test #1 query ran under compatibility level 140


Figure 5: Table Scan properties when Test #2 query ran under compatibility level 150

By reviewing the table scan properties, you can see that the optimizer used the correct estimated row count when compatibility level 150 was used, whereas under compatibility level 140, it estimated a row count of 1. Also note that the query that ran under compatibility level 150 used BATCH mode for the TABLE SCAN operation, whereas the compatibility level 140 query ran using ROW mode. You may now be asking yourself: how much faster does the test code run under compatibility level 150 than under the older compatibility level 140?

Comparing Performance between Compatibility Mode 140 and 150

In order to compare the performance of my test query under both compatibility levels, I executed the script in Listing 1 ten times under each of the two compatibility levels. I then calculated the average CPU and elapsed time for each compatibility level, and finally graphed the average performance numbers in Figure 6.


Figure 6: Performance Comparison between Compatibility Mode 140 and 150.

When the test query was run under compatibility level 150, it used a fraction of the CPU it needed under compatibility level 140, and its elapsed time was 4.6 times faster than under compatibility level 140. This is a significant performance improvement. But since batch mode processing was used for the compatibility level 150 test, I can’t assume all of this improvement was associated only with the Table Variable Deferred Compilation feature.

In order to remove batch mode from my performance test, I’m going to run my test query under compatibility level 150 one more time, but this time with a query hint that disables the batch mode feature. The script I will use for this additional test can be found in Listing 3.

USE WideWorldImportersDW;
GO
DECLARE @MyCities TABLE ([City Key] int not null);
INSERT INTO @MyCities
  SELECT [City Key] FROM Dimension.City;
SELECT O.[Order Key], TV.[City Key]
FROM Fact.[Order] as O JOIN @MyCities as TV
ON O.[City Key] = TV.[City Key]
OPTION(USE HINT('DISALLOW_BATCH_MODE'));
GO 10

Listing 3: Test #2 query with Batch Mode disabled

The graph in Figure 7 shows the new performance comparison results using deferred compilation and row mode features when my test ran under compatibility level 150.


Figure 7: Table Variable Deferred Compilation Comparison with Batch Mode disabled

With the batch mode feature disabled, CPU time went up significantly from the previous test where batch mode was enabled, but the elapsed time was only slightly different. Deferred compilation seems to provide a significant performance improvement by delaying the compilation of a query until the table variable is used the first time. I have to wonder whether the deferred compilation feature will also improve the cardinality estimate issue caused by parameter sniffing with a parameterized query.

Does Deferred Compilation Help with Parameter Sniffing?

Parameter sniffing has been known to cause performance issues when a compiled execution plan is executed multiple times using different parameter values. But does the deferred table variable compilation feature in 15.x solve this parameter sniffing issue? To find out, let me create a stored procedure named GetOrders to test this. That stored procedure’s CREATE statement can be found in Listing 4.

Listing 4: Code to test out parameter sniffing

USE WideWorldImportersDW;
GO
CREATE OR ALTER PROC GetOrders(@CityKey int)
AS
DECLARE @MyCities TABLE ([City Key] int not null);
INSERT INTO @MyCities
  SELECT [City Key] FROM Dimension.City
  WHERE [City Key] < @CityKey;
SELECT *
FROM Fact.[Order] as O INNER JOIN @MyCities as TV
ON O.[City Key] = TV.[City Key]
GO

The number of rows returned by the stored procedure in Listing 4 is controlled by the value passed in the parameter @CityKey. To test whether the deferred compilation feature solves the parameter sniffing issue, I will run the code in Listing 5.

Listing 5: Code to see if deferred compilation resolves parameter sniffing issue

USE WideWorldImportersDW;
GO
SET STATISTICS IO ON;
DBCC FREEPROCCACHE;
-- First Test
EXEC GetOrders @CityKey = 10;
-- Second Test
EXEC GetOrders @CityKey = 231412;

The code in Listing 5 first runs the test stored procedure using a value of 10 for the parameter. The second execution uses the value 231412. These two different parameter values cause the stored procedure to process drastically different numbers of rows. After I run the code in Listing 5, I will explore the execution plan for each execution of the stored procedure. I will look at the properties of the TABLE SCAN operation to see what the optimizer thinks the estimated and actual row counts for the table variable are for each execution. The table scan properties for each execution can be found in Figure 8 and Figure 9, respectively.


Figure 8: Table Scan Statistics for the first execution of the test stored procedure


Figure 9: Table Scan Statistics for the second execution of the test stored procedure

Both executions got the same estimated row count but considerably different actual row counts. This means that the deferred table compilation feature of version 15.x doesn’t resolve the parameter sniffing problem of a stored procedure.

Which Editions Support Deferred Compilation for Table Variables?

Like many cool new features that have come out with past releases of SQL Server, new functionality is often first introduced in Enterprise edition only and then, over time, might become available in other editions. You will be happy to know that the Deferred Compilation for Table Variables feature doesn’t follow this typical pattern. As of the RTM release of SQL Server 2019, the deferred compilation feature is available in all editions of SQL Server, as documented here.

Improve Performance of Code using Table Variables without Changing Any Code

TSQL code that contains a table variable has been known not to perform well when the variable contains lots of rows. This is because the code that declares the table variable is compiled before the table has been populated with any rows of data. That has all changed when TSQL code is executed in SQL Server 2019 or Azure SQL Database with the database running under compatibility level 150. When using a database at compatibility level 150, the optimizer defers the compilation of code using a table variable until the first time the table variable is used in a query. By deferring the compilation, SQL Server can obtain a more accurate estimate of the number of rows in the table variable. When the optimizer has better cardinality estimates for a table variable, it can pick more appropriate operators for the execution plan, which leads to better performance. Therefore, if you have found code where table variables don’t scale well when they contain a lot of rows, then version 15.x of SQL Server might help. By running TSQL code under compatibility level 150, you can improve the performance of code using table variables without changing any code.

SQL – Simple Talk

Translating a SQL Server Schema into a Cassandra Table: Part III Many-to-Many, Attribute Closure and Solution Space

April 25, 2020   BI News and Info

The series so far:

This last article in the series continues with an exploration of the many-to-many relationship. It then returns to an analysis by functional dependency with attribute closure, which is important in establishing “uniqueness” in the Cassandra key and other table properties. Finally, the four solutions to the original problem introduced in Part I are given. Note that the code used in this article can be found here.

First, though, is a basic definition of redundancy.

Redundancy

Previous articles gave examples of redundancy. This section is part review and part new information. Download all code used in this article here.

Structuring in redundancy is inherent to the Cassandra design process. Redundancy between rows in the same partition or between partitions, though, can result in inconsistent data. Still, it is to be managed not avoided. Now an informal definition:

Redundancy occurs whenever a determinant of a given value and any of its non-trivial dependents could be in more than one row.

A trivial dependent set Y has the form X → Y where Y ⊆ X (reflexivity from Armstrong’s Axioms.)

Recall that form ‘X → Y’ establishes set ‘X’ as determinant, the ‘→’ as symbol for ‘functionally determines,’ and set ‘Y’ as dependent. Whenever rows repeat the exact same values in X columns, the Y columns must be invariant.

This definition is true regardless of whether the determinant and dependent sets are all in the key, all as non-key attributes, or as spanning key and non-key attributes.

(Redundancy, as defined here, is not related to replication, in which a partition and its rows are duplicated and synched over more than one node in the cluster(s).)

As an example, here is a query readout from the CQL shell over a sample table:

[CQL shell query readout]

(Partition columns are red, clustering columns are aqua, and regular columns mauve.)

Figure 5 below is the dependency diagram created for the article series problem (and discussed later). The sample table here is bound by three of its functional dependencies, SC → G, S → N, and CS → P where S: state, C: city, G: city rating, N: state region and P: city population. Of them, only data involving FD S → N can be duplicated between rows in the same partition. The design is viable and the two rows shown don’t violate the FD – although they could.
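
One possible CQL shape for that sample table, with assumed table and column names and an assumed key of ((state) city), is sketched below; only the key structure and the three FDs matter here:

-- Sketch only: names and key structure are assumptions
CREATE TABLE cities_by_state (
    state       text,     -- S
    city        text,     -- C
    city_rating decimal,  -- G: SC -> G
    region      text,     -- N: S -> N (a regular column, so it can repeat within a partition)
    population  int,      -- P: CS -> P
    PRIMARY KEY ((state), city)
);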

[Screenshot]

Erroneous input into a different partition as above, however, causes all three FDs in the table to be violated. As you know from the previous article, there is no mechanism in the Cassandra system to prevent this and finding it with CQL only can be difficult.

FDs can change as requirements are better understood. Say SC → G becomes ISC → G, where “I” means “rating type;” prepend it to the clustering column list:

[Screenshot]

As with SC → G, dependency ISC → G can only be broken in rows spanning partitions, but S → N as before along with CS → P now can break in the same partition:

[Screenshot]

This sketch applies both to one-to-many and one-to-one relationships – the functional dependencies as they occur in the unified relation. It also applies, in a way, to the many-to-many relationship as well.

Many-to-Many

Not discussed in the series so far is the common case of the many-to-many (m:n) relationship. The restrictions placed on attribute sets for functional dependencies in the primary key – see section Placement of Functional Dependencies in Part II – don’t apply to key attributes on either side. Consider this simplified ERD snippet:


Figure 1. Conceptual trial model (Chen notation)

In the unified relation, the “site” one-to-many relationship establishes an FD with trial ID as determinant and courthouse name as dependent.

In contrast, the “participation” binary m:n relationship does not establish the m-side or n-side attributes as FDs in the unified relation. The relationship indeed combines two one-to-many relationships. However, they are not mutually independent and cannot be modeled in ERD as two distinct relationships. As a result, the lawyer identifier does not functionally determine trial attributes, nor vice-versa.


Figure 2. Not equivalent in meaning to Figure 1 “participation” m:n relationship

However, in relational and Cassandra modeling, the two one-to-many relationships are directly applicable. Say “lawyer name” and “trial ID” are both in the Cassandra key. If lawyer Stuart appears K times in combination with different trial IDs in “participation”, then pairings of Stuart and trial ID in the key yield K duplicates for each associated Lawyer entity attribute placed as a regular column. The n:1 relationship with the Trial entity is parallel.

Placing identifiers for both lawyer and trial in any part of the key in either order is always correct. They can both be in the partition, and their dependents made static (if clustering columns exist). If one or both is a clustering attribute(s), the second identifier implies expansion in the conceptual row nesting disc layout (see Part I).

For example, take C to mean courthouse, L lawyer and T trial identifiers. Given they are all in the key, restrictions are that C comes before T because T → C and T and C cannot both be in the partition for the same reason. Functional dependencies and m:n relationships mix easily in the primary key and table. Legal keys, then, include ((C)TL), ((C)LT), ((CL)T) and ((L)CT). Satisfy yourself that every successive clustering attribute in each key expands row nesting.

Any n-ary degree m:n relationship can cause significant redundancy within a table. Say the key is ((S)LCT) where S is the state wherein the courthouse resides (C → S). If lawyer Rudy has appeared in three courthouses and argued in three trials in each, then needed attributes in a relationship with Rudy appear as regular columns in nine rows. Attributes related to each courthouse are duplicated three times. If Rudy has a role in trials in more than one state, this redundancy is spread over multiple partitions.

This section concludes with a final example in logical and physical design based on Figure 1. Here is the access pattern for which to solve:

Qn. Find trials started in a time frame by courthouse and lawyer

Notice first that courthouse and lawyer are in a m:n relationship.

To satisfy the access pattern’s implied range queries, Trial’s non-key attribute “start date time” – lower case “t” – must be a clustering attribute. It must follow lawyer and courthouse IDs in the key since by CQL rules, once a clustering column is sliced by an inequality operator in the WHERE clause, no clustering columns following it in the list may be restricted.

The trial ID (T) could follow t in the key because T → t. Still, it seems that given a lawyer and a point in time, the combination should yield a trial, i.e. the FD Lt → T. This cannot be deduced using Armstrong’s Axioms. It requires a new business rule: a lawyer can start only one trial at a time. Now T must be removed from the key and made a regular column along with its related attributes (precisely because Lt → T).

Logical key possibilities then are ((C)Lt), ((CL)t) / ((LC)t) and ((L)Ct). The first ((C)Lt) would create the largest number of rows on fewest partitions, and the second, ((CL)t), the smallest number on most partitions. These are considerations for the physical stage and somewhat in the logical as well. The decision is made to use key ((L)Ct).


Figure 3. Table for intermediate-sized partition

Access pattern Qn can now be satisfied with equality or range queries over time:

[Screenshot]
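
The query behind that readout might look roughly like the following; the table and column names and the literal values are assumptions:

-- Equality on the partition key (lawyer), equality on the first clustering column
-- (courthouse), then a range over start time, matching key ((L) C, t)
SELECT courthouse, start_time, trial_id
FROM   trials_by_lawyer
WHERE  lawyer = 'Rudy'
  AND  courthouse = 'County Courthouse'
  AND  start_time >= '2020-01-01'
  AND  start_time <  '2020-04-01';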

This table may be part of a two-pass query, the second retrieving documents and other trial details given the trial ID; refer to the application workflow.

Attribute Closure: Implications for Cassandra Design

This section returns to analysis by functional dependencies only.

These next subsections dovetail with techniques for discovering further properties of Cassandra tables and primary key design. The first presents a diagram of functional dependency relationships inferred from the SQL Server normalized physical design in Part I. The second, a code module that references the diagram as input for computing attribute closures given sets of initial attributes.


Figure 4. Normalized physical design (Part I)

The Functional Dependency Diagram

Functional dependencies placed into the unified relation come from the primary and alternate keys, the latter not shown by the SQL Server design tool. One FD not included (the dotted arrow) indicates that additional realtor location data is not relevant to the series problem’s access pattern.


Figure 5. Functional Dependency diagram from normalized schema

I’ve abbreviated attribute names to single letters, intuitive when possible. Herein I’ll often refer to attribute names as their single-letter counterparts.

The arrow recipients in the diagram are the dependents, and the arrows emanate from the determinants. Composite determinants are grouped in boxes; the double arrow implies a (1:1) relationship; I’ve elided redundant arrows in such relationships, and so on. Not surprising is the double arrow between Y and SC (City# and the State/City names). (In the SQL Server logical design, the latter was the key for the City relation; in the physical design, the surrogate City# was added as primary key, and the composite was enforced as an alternate key.) Follow the arrows to see the partial and transitive dependencies.

By visual inspection alone, you know that L, EZ and H functionally determine, directly or transitively, all other attributes. As each is irreducible, each is a candidate key. The diagram is an important artifact. What is needed, though, is an additional tool for proving certain properties of key and table design choices.

The Attribute Closure Module

The elementary code module below does a closure over an initial set of attributes in the unified relation given the relation’s set of functional dependencies. Its purpose is to determine which key attribute additions make for “uniqueness,” and which attributes become static or regular attributes in different Cassandra key design choices. This analysis is essential for larger, more complex relations.


Figure 6. Attribute closure function

I wrote the solution in Scala. If you’ve done C# or C++ or Java or other imperative/object-oriented language, the algorithm shouldn’t be difficult to follow. The one twist is that Scala also fully supports functional programming. This style allows functions to appear wherever a variable could. Thus, I’ve nested the “go” function inside function “attributeClosure” rather than placing it separately as a private function in the wrapper class (or object) where it most certainly wouldn’t be reused by other methods.

Essentially, each recursive invocation of the inner function uses set operations to glean the next set of dependent attributes implied by two structures. The first structure is the invariant set of FDs (“fdependencies”) as passed into the outer function. Second, the inner function’s parameter (“attrClosureSet”): the set of already captured attributes, acting as the source of possible determinant combinations. This set is initialized from parameter “closureAttrs”.

At each recursive step, if new (dependent) attributes are added to the closure, iteration continues. If not, the inner function returns the closure to the outer. The outer then returns a two-tuple consisting of the closure plus the Boolean decision as to whether the initial attribute set closureAttrs is a superkey for the relation.

The dependencies input, “fdependencies”, mirrors the dependency diagram (Figure 5):


Figure 7. Sample structuring of functional dependencies in Scala
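
Since Figures 6 and 7 are shown only as screenshots, here is a rough, self-contained sketch of the same idea – not the author’s exact code – with functional dependencies modeled as (determinant set, single dependent) pairs over single-letter attribute names:

object ClosureSketch {

  // Attribute closure: repeatedly add dependents whose determinants are already
  // in the closure; stop when nothing new is added. Returns the closure and
  // whether the starting attributes form a superkey (closure covers all attributes).
  def attributeClosure(
      fdependencies: Set[(Set[Char], Char)],
      closureAttrs: Set[Char],
      allAttrs: Set[Char]): (Set[Char], Boolean) = {

    @annotation.tailrec
    def go(attrClosureSet: Set[Char]): Set[Char] = {
      val next = attrClosureSet ++ fdependencies.collect {
        case (det, dep) if det.subsetOf(attrClosureSet) => dep
      }
      if (next == attrClosureSet) attrClosureSet else go(next)
    }

    val closure = go(closureAttrs)
    (closure, allAttrs.subsetOf(closure))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical subset of FDs in the style of the article: SC -> G, S -> N, CS -> P, L -> S, L -> C
    val fds: Set[(Set[Char], Char)] = Set(
      (Set('S', 'C'), 'G'),
      (Set('S'), 'N'),
      (Set('C', 'S'), 'P'),
      (Set('L'), 'S'),
      (Set('L'), 'C'))
    val allAttrs = Set('S', 'C', 'G', 'N', 'P', 'L')
    println(attributeClosure(fds, Set('L'), allAttrs)) // L reaches every attribute here
  }
}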

The input dependencies for this implementation are irreducible, meaning they meet these criteria: a) right-irreducible: each dependent is a singleton; b) left-irreducible: each determinant set is minimal; and c) there are no trivial FDs (A → subset(A)) or FDs derivable from other FDs. In this implementation, c) is waivable. For any implementation, b) is required.

Beyond the design stage, functional diagrams and this algorithm can be instrumental in tracking down FD data errors in tables, e.g. via a Spark application.

Sample run readouts are deferred for the next section.

Solution Space

The series began with the stated goal of transforming a normalized SQL Server schema (Figure 4) into a Cassandra table designed for this access pattern:

Qk Find listings for a realtor company in a city

Realtors and cities are in a m:n relationship, so their ordering in the key doesn’t matter.

The solutions can reference the results of applying the FD set in Figure 5 and initial attribute sets to the closure function in Figure 4.


Figure 8a. Sample closure runs (eclipse software)

Incidentally, Scala runs in PowerShell; see readme.txt file included in the downloads for instructions on installing and using Scala.


Figure 8b. Sample closure runs

I’ll repeat a central point. The closure is an important method for deciding whether and which attributes need be appended to the primary key for “uniqueness”: i.e. to cover all attributes using minimal collections. This is not the same as making the key unique; the Cassandra key is always unique. Collection types make this possible. If the key were R, e.g., its singleton row would have collections structured over combinations of CS-Y and L and their related attributes. You saw this in Part II.

To satisfy the access pattern, RSC or RY – not both! – must be in the key as search attributes. There are now two decisions to make. The first, as you expect: is it RSC or RY? In consultation with the analytics team, secondary stakeholders to this OLTP database, the preference is RSC, as it simplifies querying over RS by itself.

The second concerns selecting partition attributes. In the logical model, R by itself is sufficient; in the physical, assume the row count and size are manageable and performant, so the base key remains ((R)SC). Closure results indicate, among other conclusions, that attributes T, B and D must be static.

This discussion ends what has been considerable background, except for one more tool.

Design A: The Minimal Superkey

As per the closure, L is a relational candidate key. Could it be the partition and full primary key (no clustering attributes) for its singleton row while satisfying the access pattern?


Figure 9. Possible logical model from the KDM

The Kashlev Data Modeler (KDM) is a tool for automating Apache Cassandra logical and physical designs (see references). It produced the logical design above given key L and an ERD. KDM uses Chebotko symbols to identify Cassandra column types. The “K” is for partition column.

Here is the actual table definition in CQL I devised (not from KDM):


Figure 10. Physical design featuring minimal superkey and UDTs

This is the query showing the first three rows returned (almost):

[Screenshot]

Well, the CQL shell tried. The error message with terms “data filtering” and “unpredictable performance” means this: the system can’t initially use the WHERE clause predicate conditions but must visit every singleton row in every partition – essentially, a table scan. Only then can it filter out rows not meeting the (RSC) search conditions. Depending on several factors, the query may involve many or most or even all cluster nodes, potentially vitiating the excellent performance of which Cassandra is capable. Then again, performance may be fine – it is just a warning, and you can append the ALLOW FILTERING clause. Still, it’s an unacceptable warning for the main access pattern for which a table was designed.

Secondary indexes help in some cases but not this one. In short, the table is too highly transactional, and the ALLOW FILTERING clause would still be required. I placed indexes on all RSC columns and Cassandra did use the one with the highest selectivity – the one on the city name:

[Screenshot]

Among several examples of data duplication here, one is owing to dependency CS → GP (city to rating and population), or alternatively Y → GP (city UUID to the same). Thus (San Francisco, California) yields (4.7, 884,363) for every listing row in this city across different partitions. This particular redundancy is better managed in the remaining solutions, which build from the base ((R)SC).

I stress again that the major problem is the table scan. Regardless, you may have concluded that the candidate key approach given Qk would work better later in the application workflow with a different access pattern. There, its attributes can be more specific to the listing such as a video, floorplan, and information about schools.

Design B: Key Sans Uniqueness

In this solution, the key is ((R)SC). The redundancy owing to CS → GP is eliminated. However, as per closure results (Figure 8), it can’t reach all attributes – specifically, those for the listing. Since the listing identifier functionally determines RSC, the solution requires placing the listing identifier and its directly reachable attributes coalesced into a collection:


Figure 11. Physical design where key lacks uniqueness column

Read this to mean that a realtor company in a city has zero or more listings.

Notice that given R → BDT and ((R)SC), Design B introduces static (UDT) column “address” (Figure 10). The static designation ensures that address is not repeated into each row in a partition. It further enforces the first rule of functional dependencies as introduced in Part II: a determinant (R) and its dependents (BDT) can never appear together in the partition key.

[Screenshot]

The query result shows listings for a single row. For the testbed, the design collapses six individual listings into one row. A realistic example could have hundreds of listings. In querying, the client potentially can access numerous rows. The size of the return set (each row listing collection being indivisible), the highly transactional nature of the collection and other factors render this solution unsuitable.

The recommendation is to use collections when the data is small, bounded and needs little updating. Think, as a simple example, book authors in a list.

Design C: The Non-Minimal Superkey

More to standard is appending the listing identifier as uniqueness attribute to make ((R)SCL); logical and physical designs are as follows:


Figure 12. Recommended designs for access pattern Qk

Continuing Chebotko logical notation, the C↑ is clustering column ascending order, and the S is static.

By constraining on the prefix of key columns short of listing ID, the query satisfies Qk as the first three rows returned from the sample query show:

[Screenshot]
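
That query might take roughly this form, with assumed table and column names and sample values (San Francisco/California come from the earlier example; the realtor name is made up):

-- Key ((R) S, C, L): equality on the partition key plus the S and C clustering prefix,
-- leaving the listing ID unrestricted so all of the pair's listings come back
SELECT *
FROM   listings_by_realtor
WHERE  realtor = 'Acme Realty'
  AND  state   = 'California'
  AND  city    = 'San Francisco';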

Listings that were compressed in Design B are now each in their own row. No less information is retrieved – in fact, the design accounts for more redundancy as seen in regular columns for state and city. Unlike B, however, row modifications in this highly transactional table are simpler and safer, and clients can take advantage of paging of listings over realtor-city pairings.

Design D: The Non-Minimal Superkey with Additional Ordering

Sometimes a table is correctly designed but still falls short of needs. The problem, then, is the access pattern itself: it needs a fuller description.

Perhaps searching on or organizing data by zip codes, or range querying over asking price is also desired. Say the access pattern is altered thus:

Qk Find listings for a realtor company in a city sorted from least to highest asking price

Starting from Design C, pull the asking price out of the listing standard UDT (Figure 10) and wedge it between the search and uniqueness columns: ((R)SCAL):


Figure 13. Logical and physical designs with ordering on asking price

Client software can now page through the table already sorted by price, or do a range query as seen here:

[Screenshot]
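
A range query of that kind might look roughly like this, again with assumed names and made-up price bounds:

-- Key ((R) S, C, A, L): equality on realtor, state and city, then a slice on asking price
SELECT city, asking_price, listing_id
FROM   listings_by_realtor_price
WHERE  realtor = 'Acme Realty'
  AND  state   = 'California'
  AND  city    = 'San Francisco'
  AND  asking_price >= 500000
  AND  asking_price <= 900000;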

Always think conceptual row nesting on disc. As a general rule, inserting a new attribute(s) into the clustering list must expand row possibilities for key attributes defined before it. You may want to review Part I for specific rules on clustering column type and placement.

Last Word

Relational SQL Server and NoSQL Cassandra are meant for different use cases. As such, there may be more differences than similarities between them. You’ve seen a number of differences, including rules for key design and the ability to enforce integrity constraints. I won’t list them or introduce others here, but I will leave you with one final thought.

If conventional wisdom says that the relational model and theory has no valuable insights for NoSQL systems, do you (still) believe it?

References

J. King, “DS220: Data Modeling,” https://academy.datastax.com/resources/ds220-data-modeling, a DataStax Academy online video series course on Apache Cassandra.

DataStax Documentation, “CQL for DataStax Enterprise 6.7,” https://docs.datastax.com/en/dse/6.7/cql/cql/cql_using/queriesTOC.html, primary key and querying.

J. Carpenter, E. Hewitt, Cassandra: The Definitive Guide – Distributed Data at Web Scale, 2nd ed. Sebastopol: O’Reilly Media Inc., 2016.

A. Kashlev, “The Kashlev Data Modeler,” http://kdm.dataview.org/, an online automated tool for developing Apache Cassandra logical and physical designs.

DataStax, “Data Modeling in Apache Cassandra 5 Steps to an Awesome Data Model,” https://www.datastax.com/sites/default/files/content/whitepaper/files/2019-10/CM2019236%20-%20Data%20Modeling%20in%20Apache%20Cassandra%20%E2%84%A2%20White%20Paper-4.pdf?mkt_tok=eyJpIjoiWmpnMk4yWTVaR0l6WTJOaiIsInQiOiJWTFQyYW5LRkljSWt2Q28xSDR5RXBBNmNlM1pLbThFTFFwSHJLd0E1NXdYM2VmbHJaam1HelFSSThCYW1JRVZVSE9uM1YzTmYzbDVTUlB5c3RESlNyTTJYMSs3bE90QlNiVmRrZkg2ZDZINlYrS1MralJ1Zzh2K1pKNVZ0K0g2aSJ9, White paper .pdf.

C.J. Date, An Introduction to Database Systems, 8th ed. Boston: Pearson Education Inc., 2004, esp. Chapter 11 Functional Dependencies.

D.R. Howe, Data Analysis for Database Design, 3rd ed. Oxford: Butterworth-Heinemann, 2001, esp. Chapters 9 Properties of relationships and 10 Decomposition of many:many relationships.

SQL – Simple Talk

The Art of Table Touch: How a Manager’s Time at the Table Can Create a Guest For Life

February 21, 2020   NetSuite

Posted by Brady Thomason, NetSuite Solution Manager, Restaurant & Hospitality

Consumers today are living in the experience economy. A term first coined more than 20 years ago, the experience economy symbolizes a shift in consumerism where the memory of a product or service becomes the product itself. Nowhere is the focus on customer experience more prevalent than the restaurant industry.

As participants in this experience economy, Americans value experiences over things. Check out these figures from a recent SevenRooms study, “Turning a Meal Into an Experience”:

Almost one-quarter of Americans have visited a restaurant because of the way the food or atmosphere looked on social media.

More than half responded that a waiter or waitress remembering their name would make their visit more memorable.

Nearly one-quarter wouldn’t return if their dining experience wasn’t “memorable or special.”

A key component of the restaurant experience is the interaction with staff. In addition to beautiful decor and ambiance, diners expect restaurants to engage, empower, hear, delight and know them in a personalized way.


Let’s face it: everyone wants to be Norm from Cheers. There’s nothing better than walking into a restaurant, being greeted by name and having exactly what you want (a cold beer in this case) placed in front of you.

With turnover rates hovering around 75% in the restaurant industry, many of these personalized experiences are tough to replicate. Restaurant staff are not sticking around long enough to get to know patrons. However, restaurants do have another important arrow in their quiver: the manager’s process of visiting and checking in on tables throughout service, otherwise known as table touching.

How Does Table Touching Support a Positive Restaurant Experience?

When performed well, a table touch enhances the restaurant experience for diners by bringing an additional human connection and an extra means of support and service to the table. Maybe the manager approaches a chatty group that wants to know more about when the restaurant opened, who the chef is and how the menu was developed. Or the waitstaff is backed up and a manager can step in and pre-bus, refill glasses and take dessert orders.

Even more important is the manager’s ability to correct a problem while guests are still at the table. Once they leave the restaurant, it’s too late. In our world of instant gratification via social media, a seemingly small slip can turn into a PR event that millions read if gone unchecked.

And those reviews and comments on platforms like Yelp can have a big impact:

Whatever the case, a table touch can be a supplement to the dining experience – like a vitamin. The manager might have the opportunity to boost guest interest, share special events, or even turn a negative experience into a positive one.

Consider these scenarios: a couple walks into a restaurant and is seated at a table but ends up waiting roughly 10-15 minutes to be served. After their food arrives, they alert the waitress that there’s a bug in the salad. At another table, a group of four enjoy their meal and hear about some mouth-watering dessert specials, but they’re too full to order.

The first couple lets the manager know about both the waitstaff delay and the bug in the food. They’re given half off their check and a complimentary dessert. The manager also gives the group of four a gift card to come back and try the decadent dessert. But without that manager’s table touch, the opportunity to create an experience—especially one patrons will share with others in a positive light—is lost.

So how do restaurant managers enable these interactions?

Pay Attention to Cues

One of the most important things a manager can do when visiting tables is to follow social cues from diners, including body language. A guest’s interest in attention can range from wanting to be completely left alone, to some conversation, to engagement whenever they see the manager.

Aside from how much guests want to interact, there are also complaints that come up: cold or undercooked food, incorrect orders or long wait times. To ferret out these issues, managers should be scanning the area for faces of concern or patrons trying to get a waiter’s attention.

If managers are going to add to the experience with a table touch, it’s important that they observe these cues before approaching guests. Otherwise their presence might have a negative effect. There’s nothing like being in the middle of telling an engaging story only to be interrupted, or having a manager approach a guest who has only picked at their food and not ask how the meal tastes.

Gather Information Wherever Possible

As managers are making their rounds, it’s important not just to observe before approaching, but to ask questions once they get to the table. Going beyond the standard “How is everything this evening?” will go a long way towards giving managers the tools to create a memorable experience.

For guests who seem open to conversation, asking them if they are local or from out of town can provide an easy icebreaker. If they are local, a manager can share special events, loyalty programs or even follow-up with a discount code or gift card for their next visit. Learning if guests are at the restaurant for a special event like a birthday or anniversary, or attending a concert or play afterwards, can help managers with proactive ideas. Instead of “How does everything taste?” try: “Is your ribeye cooked to your liking?” or “Does that pan-roasted salmon taste as delicious as it smells?”

Going back to our group of four in the previous example, the manager learns that the group enjoyed their meal but unfortunately they’re stuffed, so they won’t be staying for dessert. He or she brings back a gift card to try the dessert on another visit. Sometimes capturing information means listening to guests and making an offer right away; other times managers can step away and think for a moment about how they can add value.

Tailor the Guest Experience

Don’t misinterpret: restaurant managers aren’t all of a sudden going to become mind readers, but there are little things they can do to customize each guest interaction that makes them feel special—and makes them want to return.

In a full-service restaurant, a manager might want to bring a complimentary dessert to a table where there was a delay in service or an error on the order. However, he or she recognizes that the guests are a family with small children. Mom might not want her already rambunctious children to have sugar so late in the evening. Instead of offering to the whole table, the manager asks her quietly if it would be alright to avoid the children overhearing.

In a quick-service restaurant, a manager can use a table touch to go above and beyond typical order-at-the-counter service norms by offering to refill a beverage, get more napkins or pre-bus empty trays.

The occasion or even the age of guests might also factor into how managers tailor the experience. The same restaurant might in one night serve a large group of very dressed-up high school students who are later attending the prom and also a very casually-dressed family with small children. The manager will likely act differently to demonstrate the importance of the prom versus the goal of helping a family have a successful meal out with children.

Table Touch Can Transform the Restaurant Experience

Today’s customers demand more than ever before from dining experiences. Restaurants have many tools in their arsenal to improve guest interactions and a manager’s table touch is just one of them. By paying attention to social cues, gathering information wherever they can and tailoring each guest’s experience, managers can increase the likelihood of a positive experience—and return visits down the line.

Posted on Thu, February 20, 2020 by NetSuite


The NetSuite Blog

Read More

Narrowing down table data to only print if certain conditions are satisfied

February 5, 2020   BI News and Info

Suppose there is an ODE whose solution F is obtained with NDSolve, and the values of F lie in the range [-40, 34]. I only need the points where F == 0, or where the value of F falls in the interval [-1, 1]. How can I achieve this using Table:

Table[If[F <= 10^-2, Print@F, Return["Exit", Table]], {x, 3.2, 3.5, 0.01}, {y, 3.2, 3.5, 0.01}]

Using the above code gives me the entire negative region. I am trying to catch the values where the function F is zero or close to zero, and print them only when that condition is satisfied: a) F has a value close to zero. Thanks in advance for your time, and sorry if this is simple; I couldn’t find a solution on this site.


Recent Questions – Mathematica Stack Exchange

Read More

Translating a SQL Server Schema into a Cassandra Table: Part I Problem Space and Cassandra Primary Key

January 1, 2020   BI News and Info


A subset of related tables in a relational schema can satisfy any number of queries known and unknown at design time. Refactoring the schema into one Cassandra table to answer a specific query, though, will (re)introduce all the data redundancies the original design had sought to avoid.

In this series, I’ll do just that. Starting from a normalized SQL Server design and statement of the Cassandra query, I’ll develop four possible solutions in both logical and physical models. To get there, though, I’ll first lay the foundation.

This initial article focuses on the Cassandra primary key. There are significant differences from those in relational systems, and I’ll cover them in some depth. Each solution (Part III) will have a different key.

Part II uses – and this may surprise you – several techniques from the relational world to gain insight into the primary key choices and their implications as well as redundancy points.

Part III employs the foundation from Parts I and II to design and evaluate each logical and physical model pair.

A background in relational databases and theory is essential, but Cassandra experience is not; I’ll present enough context as the series progresses.

The Problem

The sample transactional database tracks real estate companies and their activities nationwide. Its data is growing into the terabyte range, and the decision was made to port to a NoSQL solution on Azure. The downsides are the loss of the expressive power of T-SQL, joins, procedural modules, fully ACID-compliant transactions and referential integrity, but the gains are scalability and quick read/write response over a cluster of commodity nodes.

Among the SQL Server 2017 artifacts is this greatly simplified, fully normalized four-table diagram:


Figure 1. Normalized SQL Server schema

Both relational and Cassandra database design (should) go through conceptual, logical and physical analysis. They use the same conceptual model, mostly an ER diagram, but veer off starting at the logical design. The diagram for this purpose is an ER replacement; I’ll use it to pick attributes for what I’ll term a “unified relation,” the structure created by merging entities into one, which is to become the Cassandra table. Verified against enterprise rules, Figure 1 is fully normalized (through 5NF); there is no redundancy.

This is the query, aka data access pattern, that the single Cassandra table is to support:

Qk: Find listings for a realtor company in a city

The resultant Cassandra table will be a denormalized, partitioned and sorted dataset on cluster nodes’ discs meant to answer the query efficiently for any given realtor company-city pair(s). More generally, as joins are not supported, a table is a precomputed result set from which data is sliced. The method is moving from organizing data around meaning to query-first design.

Answering the sample query Qk doesn’t require all attributes, or even most, from the diagram. Many of the attributes would have been provided to the app by previous workflow access patterns or needed by later ones. I include them to show dependencies and redundancies in solutions to come.

The Cassandra Primary Key

This primer is meant to be enough to understand key designs in the solutions and a little more. A Cassandra Primary Key consists of two parts: the partition key and the clustering column list.


Figure 2. Two-part primary key

I’ll use this shorthand pattern to represent the primary key: ((AB)CDE). The inner parentheses enclose the partition key column(s), and clustering columns follow. In the solutions, columns in the logical and actual table primary key definitions are in the order presented. Order matters!
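
To ground the shorthand, here is a minimal CQL sketch (table and column names are illustrative only, not part of the series’ schema) of a table whose key would be written ((AB)CDE):

CREATE TABLE demo_by_ab (
    a text,
    b text,
    c text,
    d text,
    e text,
    payload text,
    PRIMARY KEY ((a, b), c, d, e)  -- (AB) is the partition key; C, D and E are clustering columns
);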

The design of the primary key affects data locality, sorting on disc, and performance. Keep in mind that the attributes chosen are those to be constrained in CRUD operations.

In addition, there is no notion of an alternate key and no referential integrity and so no foreign keys. Cassandra allows a limited form of indexing at times useful for retrofit queries but not meant to be used for the primary query for which the table was designed.

As with relational databases, the Cassandra primary key uniquely identifies rows. This does not mean, however, that its key patterns follow rules for unique key design in the relational model – far from it. An important deviation as you’ll see is that the key commonly combines attributes that can appear separately in unique keys and non-unique indexes. Another involves data types not available in relational systems.

Collection Data Types: Extending Key Design Possibilities

Parts II and III explore concepts demonstrated here in more depth. For now, I give a basic example of how a key can be “unique” in a non-traditional manner.

Cassandra Query Language (CQL) data types include collection types, which are not permissible in the relational model (by 1NF): list, set, map, tuple and user-defined type (UDT). A UDT groups related fields in a structure.
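
As a hedged illustration only (the type, table and column names are assumptions, not from the article), a UDT and collection columns might be declared like this:

CREATE TYPE office_address (
    street text,
    city   text,
    state  text
);

CREATE TABLE realtor_profiles (
    realtor_id  uuid PRIMARY KEY,
    head_office frozen<office_address>,  -- user-defined type grouping related fields
    phone_lines set<text>,               -- set collection
    branch_zips map<text, text>          -- map collection keyed by branch name
);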

This graphic shows how two entities associate as per enterprise rules in the “listed by” relationship:


Figure 3. Occurrence diagram: many-to-one from Listing to Realtor

Say the query is this: Find the realtor for a listing. Since CQL has no joins, the listing identifier must be in the key so you can search on it, the realtor identifier and its other data should be non-key columns, and there must be one row per listing. Specifically, the “many” side functionally determines the “one” side.

As for redundancy: this is expected in Cassandra for performance as it obviates the need for multi-pass queries. A major challenge, though, is keeping duplicated data consistent between rows (let alone between tables). Here, any data in a dependency relationship with the realtor identifier can be redundant.

Now turn the query around: Find all listings for a realtor. Realtor is placed in the key for searching, but for demonstration purposes, place listings outside the key in a set as this sample CQL written in DataStax Studio shows:

[Image: CQL table definition in DataStax Studio storing a realtor’s listings in a set column]
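
Since the screenshot does not reproduce here, the following is only a plausible reconstruction of that kind of table (column names and types are assumptions):

CREATE TABLE listings_by_realtor (
    realtor_id   text,
    realtor_name text,
    listing_ids  set<int>,   -- the "many" side kept outside the key in a collection
    PRIMARY KEY ((realtor_id))
);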

Query over two partitions – see next section – and result:

[Images: CQL query across the two partitions and its result set]

I’ve omitted a lot here. The point is this: the one-side key uniquely identifies its row but could not be a unique key in SQL Server. I generalize the concept to mean that any key column(s), used for uniqueness or not – see the section below on clustering columns – can reverse functional dependency with collection types.

A final point: any key column can also be a collection (example later).

Partition Key

This first part of the primary key definition consists of one or more columns and divides the table’s rows into disjoint sets of related rows. The partition also is the atomic unit of storage – its rows cannot be split among nodes – and so column choice is important to physical design size considerations as well as logical row grouping for efficient querying and modification.

Each partition of replicated rows is automatically and transparently placed on nodes in the Cassandra cluster via partition key hashing. During CRUD operations, Cassandra can instantly locate partitions without scanning the potentially large cluster.

For the initial access pattern that a table is to support, you get the best performance if you design for equality comparisons on the partition key, and if multiple columns, you constrain on each column. As an example, here is a CQL query returning all rows on two partitions given partition key columns realtor company and state, abbreviated (RS):

SELECT *
FROM   listings_by_realtor_state
WHERE  R IN('Cirrus', 'Stratus') and S = 'Michigan';

Inequality comparisons or failure to constrain on all partition key columns results in unpredictable (generally poor) performance. Good performance often requires that Cassandra find rows in one partition, or failing that, a very small, knowable set of partitions as does the sample.

Static columns hold data that is not repeated in each row but stored once per partition. This is possible because any given static column is in a dependency relationship with a subset of partition columns.

As an example, let (RSC) be the partition key where “C” is city. The combination (SC) functionally determines city population, and S, the governor, state bird, flag and other attributes when known:

[Image: CQL table definition with static columns]
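
A sketch of such a definition might look like the following (column names are assumed; note that the static columns are stored once per partition):

CREATE TABLE listings_by_realtor_state_city (
    realtor         text,
    state           text,
    city            text,
    listing_id      int,
    city_population int  STATIC,   -- determined by (state, city)
    governor        text STATIC,   -- determined by state alone
    price           decimal,
    PRIMARY KEY ((realtor, state, city), listing_id)
);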

There is much redundancy across partitions when an (SC) or S value is repeated, but much less so than if the statics were in each row in a large partition.

There can be no static columns, however, if there are no clustering columns.

Clustering Columns

Unlike the partition key, the list of clustering columns in the primary key definition is optional. When absent, a table partition has at most one row: the partition column(s) uniquely identifies its row.

When present, clustering columns enable a partition to have multiple rows (and static columns) and establish the ordering of rows within the partition. Just as Cassandra uses the partition key to instantly locate row sets on a node(s) in the cluster, it uses the clustering columns to quickly access slices of data within the partition.

The ordering of clustering columns in the primary key definition follows this sequence:


Figure 4. Clustering column conceptual breakdown

Any subset of these clustering column types may be in the primary key clustering column list, as long as they are placed in this order. CQL, though, has no notion of these types – certainly no syntax – but the distinction matters to the query.

In the opening section for the Cassandra primary key, I stated that the key can be a mix of attributes for uniqueness and others for searching or different ordering from non-unique indexes – Figure 4 is meant to show this. A key having any of searching or ordering attributes accompanying those for uniqueness is, in relational terms, a non-minimal superkey. Such is a near-guarantee for data corruption in relational OLTP systems, but common in the Cassandra key.

In the WHERE clause predicate, a query can filter on no clustering columns to get all rows in a partition or all of them to pinpoint one row. It can get contiguous ranges of rows off disc by constraining on a proper subset of them. CQL, however, will not allow a query to filter on a clustering column if the clustering columns defined in the list before them are not also constrained.

One more important restriction: once a clustering column in the WHERE clause is filtered with a non-equality operator, no clustering column following it in the list may be constrained.

I’ll use this table with two clustering columns to illustrate:

[Image: CQL definition of a table with two clustering columns]

The CQL shell (CQLSH) utility shows two errors in violation of the rules:

[Image: CQLSH output showing the two errors]
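
As the screenshots are not reproduced, queries roughly like these (against an assumed table definition, not the article’s actual one) would raise the two kinds of errors just described:

CREATE TABLE listings_by_realtor_city (
    realtor      text,
    city         text,
    listing_date date,
    listing_id   int,
    PRIMARY KEY ((realtor), city, listing_date, listing_id)
);

-- Error: filters on listing_date without constraining the preceding clustering column city
SELECT * FROM listings_by_realtor_city
WHERE realtor = 'Cirrus' AND listing_date = '2019-06-01';

-- Error: constrains listing_id after an inequality filter on listing_date
SELECT * FROM listings_by_realtor_city
WHERE realtor = 'Cirrus' AND city = 'Portland'
  AND listing_date > '2019-06-01' AND listing_id = 42;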

The first two types of columns shown in Figure 4 form the “search attributes.” Columns meant to be constrained on equality operators “=” and “IN()” may be followed by columns more likely to be filtered with inequality operators such as “≤.” Queries with an inequality operation used on an equality column and conversely equality operations on inequality columns will work, subject to the above restrictions.

Ordering columns can be appended next in the clustering column list to further affect data layout on disc. It is essential to realize that ordering columns – in fact, clustering columns in general – do not order over the entire table but only within the slice of rows in the partition defined by the clustering columns in the list before them. As with previous clustering column types, a query may or may not filter on these columns.

Columns in the last position, if any, are added as necessary to make the primary key unique. The assumption is that these appended columns – assuming there are clustering columns defined before them – are not critical to searching and sorting. Placement higher in the list would hamper this ability. And further, it is often the case that their values, when not constrained upon in the query, are needed in the result set. While this is recommended form, do recall the deviant case from the collection types section in which the key remains unique although the “uniqueness” columns are in an off-key structure.

Finally, another way to affect data layout is to make any clustering column ascending, the default, or descending. Logical models will show each clustering column annotated with the “C↑” or “C↓” arrows, but the actual table definitions will all have ascending columns, so their specifications are not shown.
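
For reference, a descending clustering column is declared with a table property; a minimal sketch (names assumed for illustration):

CREATE TABLE recent_listings_by_realtor (
    realtor      text,
    listing_date date,
    listing_id   int,
    PRIMARY KEY ((realtor), listing_date, listing_id)
) WITH CLUSTERING ORDER BY (listing_date DESC, listing_id ASC);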

The Combined Key


Figure 5. Conceptual row nesting in a partition

The graphic shows the nested layout for possible rows given key ((R)SCL) where abbreviations R and S and C are as before, and “L” for Listing# appended to make the key unique. The static columns for partition key R are not shown. Aside from those columns, you can visualize with a CQL query a three-row result set containing C and L and all non-key columns sliced from the partition:

SELECT *
FROM   listings_by_realtor_city
WHERE  R = 'Stratus' AND
       S IN('Oregon', 'California');

Here is a summary of some properties of the primary key; compare how in each case they differ from the relational primary (or alternate) key:

  • Key columns are immutable
  • The key is often a non-minimal superkey
  • The key may not functionally determine its rows (while remaining unique)
  • Any key column may be a collection
  • Key and non-key column dependencies may fail any level of normalization

As to the third point: in the collection types section, you saw the “one” side of a many-to-one relationship placed in the key and the “many” side in a set of nested collections as a non-key attribute. I illustrate the fourth by reversing this; this valid table definition uses collections for the partition and a clustering column:

[Image: CQL table definition using frozen collections as key columns]
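
Since the screenshot is not reproduced, here is a hedged reconstruction of that kind of definition (names are assumed; key collections must be frozen in CQL):

CREATE TABLE listings_by_realtor_states (
    realtor    text,
    states     frozen<set<text>>,   -- collection used in the partition key
    cities     frozen<list<text>>,  -- collection used as a clustering column
    listing_id int,
    PRIMARY KEY ((realtor, states), cities, listing_id)
);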

Key or not, collections fail 1NF.

During physical design, clustering columns can be moved into the partition key to create more partitions with fewer rows, or moved in the opposite direction to create fewer partitions with more rows. The decision also affects which columns become static.

In most solutions, I will use combinations of attributes ((R)SC) in the key. The final designs, though, could be based on ((RS)C) or even ((RSC)). This is noted but not considered further.

Conclusion

You’ve seen that Cassandra tables have partition columns for data locality among nodes and their attendant static columns. Clustering columns and rigid CQL query restrictions are aimed at optimizing contiguous row retrieval and processing. Every table is a denormalized, precomputed result set from which data is sliced. Tables typically have a non-minimal superkey, if that. Adjusting to collections as key and non-key columns, and to little or none of the indexing we’re accustomed to, takes some effort.

Add to this the redundancy that denormalization implies: you are converting to a system that not only expects data duplication between rows in a single partition, across partitions and across tables, but requires it for responsiveness. Any table can fail normalization levels in multiple places via relationships involving key and non-key columns.

Cassandra does have mechanisms to keep replicated rows in sync and some ACID compliance. It always provides durability, and atomicity and isolation to a partial degree, but none of the other consistency and integrity checking we’re used to, ranging from alternate and foreign keys and transaction isolation levels to stored procedures and triggers. It all implies a tight coupling with the OLTP app and the development of new apps to find integrity problems.

In Part II, I’ll continue foundation with relational analysis using tools such as occurrence and functional dependency diagrams, and a function that computes closure over a set of attributes and determines whether the set would be a superkey for the unified relation.


SQL – Simple Talk

Read More

How to add a column from one table to another?

June 15, 2019   BI News and Info

I have a rather naive question: how do I add a column from one existing table to another existing table?



Recent Questions – Mathematica Stack Exchange

Read More

Converting a History Table into a System-Versioned Temporal Table

March 16, 2019   BI News and Info

SQL Server has had many different methods to track changes to data. There has been old-fashioned trigger-based logging, Change Data Capture, and Change Tracking. All of these features allow you to see how data has changed with varying degrees of how far you can go back historically to see the changes. However, there has never been the ability to see how the entire table looked at any given point in time. That is what Temporal Tables do – they log every change that happens to the table. When you want to query it for any specific period of time, SQL Server will do the hard work of getting the snapshot of how the data in the entire table looked at that time.

A great introduction on how to set up temporal tables in different ways with various limitations can be found here. In this article, you will learn how to set up versioning when creating a new table and how to convert a table with an existing history to a system-versioned temporal table.

Creating a System-Versioned Temporal Table


Table versioning can be created on entirely new tables, or on existing tables. The history table, or the table where changes are logged, can be:

  • An entirely new, ‘anonymous’ table with no name specified in which case SQL Server creates a table and assigns a name,
  • A ‘default’ table with a name as desired by you,
  • An existing table with data that you now want to use as a history log.

To get started, create an entirely new table and version it first.

CREATE TABLE [dbo].[Region]
(RegionID INT IDENTITY(1,1) NOT NULL
     CONSTRAINT PK_Region PRIMARY KEY CLUSTERED,
RegionDescription VARCHAR(100) NULL,
StartDateTime datetime2 generated always as row start NOT NULL,
EndDateTime datetime2 generated always as row end NOT NULL,
PERIOD FOR SYSTEM_TIME (StartDateTime, EndDateTime))
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Region_History));

INSERT INTO [dbo].[Region]
(RegionDescription)
VALUES ('North USA')

INSERT INTO [dbo].[Region]
(RegionDescription)
  VALUES
('South USA'),
('NorthEast USA')

SELECT * FROM [dbo].[Region]

[Image: dbo.Region rows showing the populated StartDateTime and EndDateTime columns]

It is easy to see that the column StartDateTime is populated with the current date and time in UTC, and EndDateTime is the maximum value that can be specified for a datetime2 data type. These are not specified in the insert statements and do not have defaults defined; they get auto-populated. Notice the syntax in the CREATE TABLE statement, generated always as row start and generated always as row end.

Now take a look at what was logged in the history table:

SELECT * FROM [dbo].[Region_History]

This returns nothing. This is because logging is limited to updates and deletes and does not log inserts.

Now, if you run an update and then look at the history table, you will see that the previous row has been logged. The StartDateTime and EndDateTime values specify exactly when this row was active.

UPDATE [dbo].[Region] SET RegionDescription = 'NorthEast US'
WHERE RegionDescription = 'NorthEast USA'

SELECT * FROM [dbo].[Region_History]

[Image: dbo.Region_History row logged for the update]

If you look at the main table for the same row, you can see that it has a new start date that matches the date when previous version of the row was retired.

SELECT * FROM [dbo].[Region]

[Image: dbo.Region row showing the new start date]

Deleting a row works similarly: the end date of the deleted row matches the date when it was deleted, and there is no longer a matching row in the main table.

DELETE  FROM [dbo].[Region] WHERE RegionID = 3

SELECT * FROM [dbo].[Region_History] WHERE RegionID = 3

[Image: dbo.Region_History row for the deleted RegionID 3]

Adding Versioning to an Existing Table

The next scenario is to transition an existing history table to versioning. Trigger-based change tracking is still a very common and easy-to-implement process used at many places. This example explores a simple way this was implemented and how to use the same table, without changing or deleting any data to implement versioning.

IF (EXISTS (SELECT *
                 FROM INFORMATION_SCHEMA.TABLES
                 WHERE TABLE_SCHEMA = 'dbo'
                 AND  TABLE_NAME = 'Region'))
BEGIN
ALTER TABLE dbo.Region SET (system_versioning = off)
DROP TABLE dbo.Region
DROP TABLE dbo.Region_History
END
go

CREATE TABLE  [dbo].[Region](
[RegionID] [int] IDENTITY(1,1) NOT NULL,
[RegionDescription] [varchar](100) NULL,
[CreateUser] [nvarchar](100) NOT NULL default
              (COALESCE(SUSER_NAME(SUSER_ID()),USER_NAME())),
[CreateDate] DateTime NOT NULL default getdate(),
[UpdateUser] [nvarchar](100) NOT NULL default
              (COALESCE(SUSER_NAME(SUSER_ID()),USER_NAME())),
[UpdateDate] DateTime NOT NULL default getdate()
CONSTRAINT [PK_Region] PRIMARY KEY CLUSTERED
(
[RegionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
     IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
     ALLOW_PAGE_LOCKS = ON) )

CREATE TABLE  [dbo].[Region_History](
[RegionHistoryID] [int] IDENTITY(1,1) NOT NULL,
[RegionID] [int] NOT NULL,
[RegionDescription] [varchar](100) NULL,
[CreateUser] [nvarchar](100) NOT NULL ,
[CreateDate] DateTime NOT NULL ,
[UpdateUser] [nvarchar](100) NOT NULL
           default (COALESCE(SUSER_NAME(SUSER_ID()),USER_NAME())),
[UpdateDate] DateTime NOT NULL default getdate()
)
GO

There are two simple triggers, one for updates and one for deletes, to track changes to the table.

CREATE TRIGGER [dbo].[Region_Update] on [dbo].[Region]
AFTER UPDATE
AS
BEGIN
  INSERT INTO dbo.Region_History
  (RegionId,RegionDescription,CreateUser,CreateDate,
      UpdateUser,UpdateDate)
  SELECT i.RegionId,i.RegionDescription,i.CreateUser,i.CreateDate,
     SUSER_SNAME(), getdate()
  from  dbo.Region r
  inner join inserted i on r.RegionID=i.RegionID
END
GO

CREATE TRIGGER [dbo].[Region_Delete]
ON [dbo].[Region]
AFTER DELETE
AS
  INSERT INTO [dbo].[Region_History]
([RegionID],[RegionDescription],[CreateUser],
     [CreateDate],UpdateUser,UpdateDate )
SELECT  [RegionID],[RegionDescription],[CreateUser],[CreateDate],
      SUSER_SNAME(), getdate()  FROM DELETED
GO

--Now insert data into the main table.
INSERT INTO [dbo].[Region]
(RegionDescription)
values
('Northeast')
,('Southwest')
,('West')
,('Southeast')
,('Midwest');

SELECT * FROM [dbo].[Region]

[Image: the five dbo.Region rows just inserted]

Intentionally change the same records several times so that the history table has a decent volume of data. This script will take about 10 minutes to run as you are recreating a history table with several updates with different timestamps on them.

DECLARE @counter INT
SELECT @COUNTER = 100
WHILE @counter > 0
BEGIN
UPDATE [dbo].[Region]
SET RegionDescription = 'NorthEast'
WHERE RegionDescription = 'Northeast'
WAITFOR DELAY '00:00:01'
UPDATE [dbo].[Region]
SET RegionDescription = 'Southwest '
WHERE RegionDescription = 'Southwest'
WAITFOR DELAY '00:00:01'
UPDATE [dbo].[Region]
SET RegionDescription = 'Southeast '
WHERE RegionDescription = 'Southeast'
WAITFOR DELAY '00:00:01'
UPDATE [dbo].[Region]
SET RegionDescription = 'Midwest '
WHERE RegionDescription = 'Midwest'
WAITFOR DELAY '00:00:01'
UPDATE [dbo].[Region]
SET RegionDescription = 'MidWest'
WHERE RegionDescription = 'Midwest '
WAITFOR DELAY '00:00:01'
UPDATE [dbo].[Region]
SET RegionDescription = 'SouthWest'
WHERE RegionDescription = 'Southwest '
WAITFOR DELAY '00:00:01'
UPDATE [dbo].[Region]
SET RegionDescription = 'SouthEast'
WHERE RegionDescription = 'Southeast '
SELECT @counter = @counter - 1
END

Also, delete a couple of records from the main table.

DELETE FROM [dbo].[Region] WHERE RegionDescription = 'West'

DELETE FROM [dbo].[Region] WHERE RegionDescription = 'MidWest'

SELECT * FROM dbo.Region

[Image: dbo.Region rows remaining after the deletes]

You’ll see 702 rows in the history table.

SELECT * FROM dbo.Region_History

[Image: the 702 rows in dbo.Region_History]

The goal is to transition these two tables to temporal tables by keeping this data intact and allowing for traditional querying as well as querying using temporal table methodology.

As a first step, add start and end dates to both tables:

ALTER TABLE dbo.Region
ADD [StartDate] [datetime2] NOT NULL DEFAULT (getdate()),
[EndDate] [datetime2] NOT NULL
    DEFAULT (convert(datetime2, '9999-12-31 23:59:59.9999999'))

ALTER TABLE dbo.Region_History
ADD [StartDate] [datetime2] NOT NULL DEFAULT (getdate()),
[EndDate] [datetime2] NOT NULL
   DEFAULT (convert(datetime2, '9999-12-31 23:59:59.9999999'))

The structures of the history table and main table must be identical for turning versioning on. Since there is one column, regionhistoryid, in the history table that is not in the main table, you can either get rid of it in the history table or add it to the main table. Getting rid of it will mean the history table has no key per the older method. This is not ideal if you want to query older data with that method. Instead, add it to the main table. You won’t be using it, just add it to ensure conformance for this purpose.

ALTER TABLE [dbo].[Region] ADD RegionHistoryId int;

The next step is to add the period to connect two new fields in the main table and then attempt to enable versioning.

ALTER TABLE dbo.Region  

ADD PERIOD FOR SYSTEM_TIME ([StartDate], [EndDate])

ALTER TABLE dbo.Region  

   SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Region_History,

      DATA_CONSISTENCY_CHECK = ON))

This returns an error as shown below:

[Image: error message indicating the history table cannot contain an identity column]

SQL Server will not allow identity columns in a history table. The identity property must be removed, but the data in this column is needed. To solve this, create another column, move data there, drop this column and rename the new column to the old name.

ALTER TABLE [dbo].[Region_History] ADD RegionHistId int;

GO

UPDATE [dbo].[region_history] SET regionhistid = regionhistoryid;

GO

ALTER TABLE [dbo].[region_history] DROP COLUMN regionhistoryid;

GO

EXEC sp_RENAME 'dbo.region_history.RegionHistid'
     , 'RegionHistoryID', 'COLUMN';

GO

Now that the identity column is removed from the history table, try to turn versioning on again. This time you’ll get another error.

ALTER TABLE dbo.Region  

   SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Region_History,

      DATA_CONSISTENCY_CHECK = ON))

[Image: data consistency check error when enabling system versioning]

The data consistency check runs DBCC CHECKCONSTRAINT under the hood and comes up with issues if constraints do not validate. The default value of the new EndDate column is the maximum date of the system which, of course, is in the future.

There are several ways to resolve this problem. One way is to enable versioning but to skip the checks. Don’t run this code, but here it is for your reference:

ALTER TABLE dbo.Region  

   SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Region_History,

      DATA_CONSISTENCY_CHECK = OFF))

This essentially means the data that is currently in the history table cannot be queried on with methods used to query temporal tables. You may also run into issues with querying any new data because older data is bad and has time period overlaps. This method is not recommended since it carries a lot of risk.

Instead, it is better to fix the data for time periods to match what is expected when using temporal table methodologies. Each history record must have a start and end date in the past during which the row was valid. The start date of each history record must match the end date of the one before it and the end date should be the start date of the next one and so on. The start date of the main table record should equal the last end date of its history. Cleaning up the data in this way will ensure that there are no time gaps.

To perform the cleanup, follow these three steps:

Step 1: Find the first history record for each row in the main table and set start date to equal the create date and end date to equal update date.

UPDATE dbo.region_history SET startdate = createdate,

  enddate = updatedate

--select a.regionid,a.regionhistoryid,b.slno

FROM dbo.region_history a INNER JOIN

(SELECT regionid,regionhistoryid,

   RANK() OVER (PARTITION BY regionid ORDER BY regionhistoryid) AS slno

FROM dbo.region_history) b

ON a.regionid = b.regionid AND a.regionhistoryid = b.regionhistoryid

  AND b.slno = 1

Step 2: Find the records that are dated after the first one, and update them in sequence: the start date of each record should equal the end date of the previous one.

UPDATE dbo.region_history SET startdate = b.priorupdatedate,
  enddate = a.updatedate
--select a.*,b.priorupdatedate, b.slno
FROM dbo.region_history a INNER JOIN
(SELECT regionid,regionhistoryid,updatedate,
  LAG(updatedate) OVER (PARTITION BY RegionId order by updatedate)
    AS priorupdatedate,
  RANK() OVER (PARTITION BY regionid ORDER BY regionhistoryid) AS slno
  FROM dbo.region_history) b
ON a.regionid = b.regionid AND a.regionhistoryid = b.regionhistoryid
   AND b.slno > 1 and b.priorupdatedate IS NOT NULL

Step 3: The end date of the very last history record should equal the start date of the same record in the main table. Remember that the old triggers are still enabled, so any changes you make to the main table will be logged again. First, drop those triggers. You also have to temporarily remove the period.

DROP TRIGGER [dbo].[Region_Delete]

DROP TRIGGER [dbo].[Region_Update]

ALTER TABLE dbo.region DROP PERIOD FOR system_time;

Then, run an update to bridge the history on the history table and main table.

WITH RegionCTE AS
(
    SELECT RegionID, maxupdatedate = MAX(updatedate)
    FROM dbo.Region_History GROUP BY regionid
)
UPDATE dbo.region SET startdate = b.maxupdatedate,
   enddate = '9999-12-31 23:59:59.9999999'
--select a.*,b.priorstartdate
FROM dbo.region a INNER JOIN RegionCTE b
ON a.regionid = b.regionid

You may, if you choose, drop the columns createuser, createdate, updateuser, updatedate, and regionhistoryid from both tables at this point. If you have older queries using these columns, though, dropping them might not be desirable.

Now, when you add the period back in and set versioning on, it works like a charm. You have also covered all time gaps involved so that querying using both the older method and the new method in versioning will work the same.

ALTER TABLE dbo.Region  

   ADD PERIOD FOR SYSTEM_TIME ([StartDate], [EndDate])

ALTER TABLE dbo.Region  

   SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Region_History,

   DATA_CONSISTENCY_CHECK = ON))

SELECT * FROM dbo.Region

FOR SYSTEM_TIME AS OF '2019-03-10 14:07:29.2366667' ;

Gives results as below:

[Image: rows returned by the FOR SYSTEM_TIME AS OF query]

DECLARE @ADayAgo datetime2
SET @ADayAgo = DATEADD (day, -2, getdate())
/*Comparison between two points in time for subset of rows*/
SELECT D_1_Ago.[RegionID], D.[RegionID],
D_1_Ago.[RegionDescription], D.RegionDescription,
D_1_Ago.[StartDate], D.[StartDate],
D_1_Ago.[EndDate], D.[EndDate]
FROM [dbo].[Region] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
JOIN [dbo].[Region] AS D ON  D_1_Ago.[RegionID] = [D].[RegionID]
AND D_1_Ago.[RegionID] BETWEEN 1 and 4 ;

(The results returned depend on when you run this query relative to when the data was created, so use an appropriate date for the variable @ADayAgo.)

[Image: results of the two-point-in-time comparison query]
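
Besides AS OF, the FOR SYSTEM_TIME clause also supports range forms such as BETWEEN; a small sketch against the same table (the dates are illustrative only):

SELECT RegionID, RegionDescription, StartDate, EndDate
FROM dbo.Region
FOR SYSTEM_TIME BETWEEN '2019-03-10 00:00:00' AND '2019-03-11 00:00:00'
ORDER BY RegionID, StartDate;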

Converting Your Data

Cleaning up the data to make it conform to system-versioned temporal tables can be quite tricky, and your scenario may be even more complex. Here are a few things to keep in mind (a small validation sketch follows the list):

  • The start date should always be less than the end date in both tables.
  • If you have multiple history records for a single parent record in main table, the start and end dates should also be sequential in ascending order with no period overlaps.
  • The end date for the last row in the history table should match the start date for the active record in the parent table.
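
A validation query along these lines (a sketch only, using the column names from the example above; adapt it to your own schema) can surface history rows that break these rules:

SELECT *
FROM (
    SELECT RegionID, RegionHistoryID, StartDate, EndDate,
           LEAD(StartDate) OVER (PARTITION BY RegionID ORDER BY StartDate) AS NextStart
    FROM dbo.Region_History
) AS h
WHERE h.StartDate >= h.EndDate                                   -- empty or reversed period
   OR (h.NextStart IS NOT NULL AND h.EndDate <> h.NextStart);    -- gap or overlap with the next row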

Deleting data that does not obey these conditions is also a possible solution. Since this defeats the purpose of converting an existing table to a history table, it is not recommended. Instead, you could keep that table as is and use a brand-new table to store versioning history.

It is also noteworthy that table versioning does not capture who made the change. That is something you may have to do manually if you need that information. This trigger-based workaround suggested by MVP Aaron Bertrand is a good way to incorporate it.

Removing Versioning

Eventually you may have a scenario where you need to drop the tables or remove versioning entirely. Reasons might be that the table is gathering too much history or that the storage footprint is no longer affordable.

To do this, you need to set system versioning off and drop the period for SYSTEM_TIME. You can remove the date columns too, since they are of little relevance once the table is not using them, but this is optional.

There are a few steps to go through for this process, and the following script can come in handy. Note that the script’s column-name variables are set to SysStartTime and SysEndTime; for the dbo.Region example above, they would be StartDate and EndDate.

DECLARE @DefaultConstraint nvarchar(200)
DECLARE @Tablename nvarchar(200)
DECLARE @startdatecolumnname nvarchar(200)
DECLARE @enddatecolumnname nvarchar(200)
SELECT @Tablename = 'dbo.Region'
SELECT @startdatecolumnname = 'SysStartTime'
SELECT @enddatecolumnname = 'SysEndTime'
EXEC('ALTER TABLE ' + @Tablename + ' SET (system_versioning = off)')
EXEC('ALTER TABLE ' + @Tablename + ' DROP PERIOD FOR system_time;')
SELECT @DefaultConstraint = Name FROM SYS.DEFAULT_CONSTRAINTS
WHERE PARENT_OBJECT_ID = OBJECT_ID(@Tablename)
AND PARENT_COLUMN_ID = (SELECT column_id FROM sys.columns
WHERE NAME = @startdatecolumnname
AND object_id = OBJECT_ID(@Tablename))
IF @DefaultConstraint IS NOT NULL
EXEC('ALTER TABLE ' + @Tablename +
   ' DROP CONSTRAINT ' + @DefaultConstraint)
EXEC('ALTER TABLE ' + @Tablename +
   ' DROP COLUMN IF EXISTS ' + @startdatecolumnname)
SELECT @DefaultConstraint = Name FROM SYS.DEFAULT_CONSTRAINTS
WHERE PARENT_OBJECT_ID = OBJECT_ID(@Tablename)
AND PARENT_COLUMN_ID = (SELECT column_id FROM sys.columns
WHERE NAME = @enddatecolumnname
AND object_id = OBJECT_ID(@Tablename))
IF @DefaultConstraint IS NOT NULL
EXEC('ALTER TABLE ' + @Tablename +
   ' DROP CONSTRAINT ' + @DefaultConstraint)
EXEC('ALTER TABLE ' + @Tablename +
   ' DROP COLUMN IF EXISTS ' + @enddatecolumnname)

Summary

Data versioning is complex, and no two versioning methods work exactly the same way. There are many situations you may run into if you are transitioning from an older method to temporal tables. Knowing what SQL Server expects will help the transition happen smoothly. Temporal tables are a great feature and very easy to use once you cross the hurdle of setting them up.


SQL – Simple Talk

Read More

Dynamics 365 On-Premises Auditing: Managing Audit Table Size

September 24, 2018   Microsoft Dynamics CRM

With Auditing configured in the system, Dynamics 365 field level changes can be written to the AuditBase table in MSSQL. The challenge is how to maintain audit history without creating an unmanageable database size over time.

With SQL Enterprise, the AuditBase table is managed in partitions. A separate audit partition is created quarterly by the system. With appropriate user security privileges, Dynamics 365 provides functionality to delete the quarterly audit partitions, oldest first. However, deleting a partition removes the record level audit data for the related time period. Audited details for the time period represented by the audit partition will no longer be available in Dynamics 365.

As the AuditBase table doesn’t have any constraints that would be violated by moving data from the table, an audit partition’s records can be copied into an audit archive database, set to the same collation as the Dynamics 365 production database.

As the data in the AuditBase table consists of GUID- and INT-based values, a join between the archive database and the production database will support query-based access and reporting.

Over time, as audit data is archived before each audit partition deletion, the join between the audit and production databases continues to provide reporting.

What is Auditing and what is being Audited?

Designed to meet compliance, security and governance requirements, Dynamics 365 auditing can be configured to capture the following:

  • When a user accesses a record and what changes are made
  • Create, update, deactivate, and delete operation details
  • Original attribute value versus new attribute value
  • Changes to sharing privileges
  • Record relationship changes

With the appropriate user security setting, system auditing configuration is accessible under Auditing in the Settings section of Dynamics 365. Entity and attribute configuration is made in the Customization section where custom fields and entities are created and added to the system.

Over-auditing can impact database size significantly. To validate and adapt audit configuration, use XRMToolBox’s AuditCenter plugin. AuditCenter bypasses the cumbersome process of using system customization to review and edit system auditing configuration; the example below shows editing Case Stage.

[Image: XRMToolBox AuditCenter plugin showing the Case Stage attribute’s audit setting]

Creating an Audit Archive Database in MSSQL Enterprise

Use MSSQL Management Studio to create a new database in which to archive Dynamics 365 audit records prior to purging audit partitions in Dynamics 365. Ensure that the new audit record archive database uses the same Collation as the D365 production database that will be the source of the archived audit records.

“Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types and character width.”

The collation rules between the D365 production database and the audit record archive database must be identical to ensure data consistency when running queries and reports against both datasets concurrently.

An organizational database administrator (DBA) should create the audit archive database as they are in the best position to duplicate the schema, determine collation and ensure any organizationally specific configuration and security is in place.

Archive Audit Records Prior to Purge

Although Dynamics 365 will create a new audit partition quarterly, it may not be necessary to archive audit records to the audit archive database quarterly. With analysis of how much audit data is being generated over time, it may be appropriate to archive and purge audit data biannually or yearly. Depending on analysis, it may be determined that maintaining two years of audit data in Dynamics 365 while maintaining older audit data in the audit archive database is the preferred approach. In such a circumstance, an annual audit data archive / purge procedure may be run at each year end. Regardless of regularity, the key is a documented procedure.

As there are multiple methods of copying a Dynamics 365 audit partition’s records from the production database to the audit archive database, an organizational database administrator (DBA) should identify and document the preferred procedure. MS SQL Management Studio menu commands and SQL scripts are two distinct approaches. The PowerObjects infrastructure team is available to consult and write SQL scripts to ensure a repeatable procedure that maintains ongoing data integrity.
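
As one possible sketch of the script-based route (the database names and the date boundary are assumptions a DBA should adapt, and the AuditBase column list should be verified against the environment), the archive can be populated with an INSERT ... SELECT:

-- Copy audit rows older than the partition boundary into the archive database
INSERT INTO AuditArchive.dbo.AuditBase
SELECT *
FROM   OrgName_MSCRM.dbo.AuditBase
WHERE  CreatedOn < '2018-01-01';   -- boundary of the quarterly partition(s) about to be deleted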

Create a Join Between Audit Archive and Dynamics 365 Production Databases

The critical factor in maintaining continuity and data integrity between Dynamics 365 records and their related archived audit records is a join between the Dynamics 365 and audit archive databases.

Each record in a relational database has a unique identifier that is not associated with any other record. In MSSQL, the unique identifier is referred to as a GUID, or Globally Unique IDentifier. No two records in Dynamics 365 will share a GUID. By extension, the GUID of the archived audit record will also not be associated with any other record in Dynamics 365.

The GUID is often referred to as the record’s primary key. In a relational database, when a record (a contact) is in relation to another record (a case), the GUID of the contact record can be thought of as the primary key and the GUID of the related case record can be thought of as the secondary key. A primary key pointing to secondary keys creates the relationship; in this case a one-to-many relationship, or one contact related to many cases.

As noted above, Dynamics 365 audit records include the GUID of the record on which the changes were made. If a contact’s email address is changed, the audit record (which will have its own GUID), will also reference the GUID of the contact on which the email address was edited.

A join will reference GUIDs between the archived audit record and the Dynamics 365 record on which the audited change was made. From an external reporting tool, all the archived audit records related to the GUID of a specific Dynamics 365 record can be returned by query and reported on.
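
For example, a reporting query might join the archive back to production on the audited record’s GUID (a sketch only; the database names are assumptions and the AuditBase/ContactBase column names should be confirmed against your schema):

SELECT c.FullName,
       a.CreatedOn,
       a.Action,
       a.ChangeData
FROM   AuditArchive.dbo.AuditBase    AS a
JOIN   OrgName_MSCRM.dbo.ContactBase AS c
       ON c.ContactId = a.ObjectId          -- GUID of the record the audit row describes
ORDER BY a.CreatedOn;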

A database administrator (DBA) will be able to create the join to leverage reporting tools used within the data warehouse strategy of the organization.

Purge Dynamics 365 Audit Partitions

Once the Dynamics 365 production audit data has been validated as successfully archived, the final step is to delete the archived audit partition or partitions.

With appropriate security privileges, a Dynamics 365 administrator can navigate to the Auditing section of the Settings area in Dynamics 365 where Audit Log Management is an available option.

As previously noted, Dynamics 365 creates a unique audit log partition for each quarter and the oldest partition must be deleted first. Partition deletion is completed by clicking on the oldest partition (a checkmark will show in the left column to indicate the partition selected) and clicking the ‘Delete Logs’ button.

[Image: Audit Log Management in Dynamics 365 with the oldest partition selected for deletion]

There you have it! For more Dynamics 365 tips and tricks, subscribe to our blog!

Happy Dynamics 365’ing!


PowerObjects- Bringing Focus to Dynamics CRM

Read More