Heaps in SQL Server: Part 4 PFS contention

January 5, 2021   BI News and Info

The series so far:

  1. Heaps in SQL Server: Part 1 The Basics
  2. Heaps in SQL Server: Part 2 Optimizing Reads
  3. Heaps in SQL Server: Part 3 Nonclustered Indexes
  4. Heaps in SQL Server: Part 4 PFS contention

After looking at the internal structures of heaps and at reading data from them in the previous articles, the next articles describe how DML operations on a heap can be optimized.

Demo setup

For all demos in my articles and conference sessions, I use data from the demo database [CustomerOrders]. You can download the database here.

In this article, I’ll use an additional database, demo_db. Run the following script to create the demo_db database, a heap to test inserts, and a view pointing to [CustomerOrders].


CREATE DATABASE demo_db;
GO

USE demo_db;
GO

CREATE TABLE dbo.Customers
(
    Id     INT          NOT NULL,
    Name   VARCHAR(200) NOT NULL,
    CCode  CHAR(3)      NOT NULL,
    State  VARCHAR(200) NOT NULL,
    ZIP    CHAR(10)     NOT NULL,
    City   VARCHAR(200) NOT NULL,
    Street VARCHAR(200) NOT NULL
);
GO

CREATE VIEW dbo.CustomerAddresses
AS
    SELECT C.Id,
           C.Name,
           A.CCode,
           A.State,
           A.ZIP,
           A.City,
           A.Street
    FROM CustomerOrders.dbo.Customers AS C
    INNER JOIN CustomerOrders.dbo.CustomerAddresses AS CA
        ON (C.Id = CA.Customer_Id)
    INNER JOIN CustomerOrders.dbo.Addresses AS A
        ON (CA.Address_Id = A.Id)
    WHERE CA.IsDefault = 1;
GO

Standard Procedure – INSERT

When data records are inserted into a heap, the process consists of several individual steps that are transparent to the application. Knowing these steps opens up possibilities for optimizing the process.

Update of PFS

If a data row is stored in a heap and there is not enough space available on the data page, a new data page must be created. The data record can only be saved after the new page has been created.

In the first demo, insert one row into the previously created table, selecting it from the view.

CHECKPOINT;
GO

INSERT INTO dbo.Customers
SELECT *
FROM dbo.CustomerAddresses
WHERE Id = 1;
GO

The above example adds a new record from an existing data source to the new table. Since the table was previously empty, its storage structures must first be created. The undocumented function sys.fn_dblog() can be used to determine which tasks Microsoft SQL Server had to perform to insert the record into the table. I used CHECKPOINT to eliminate previous operations from the results below.


SELECT ROW_NUMBER() OVER (ORDER BY [Current LSN]) AS [Step #],
       [Current LSN],
       Operation,
       Context,
       AllocUnitName,
       [Page ID],
       [Slot ID]
FROM sys.fn_dblog(NULL, NULL)
WHERE Context <> N'LCX_NULL'
      AND AllocUnitName IS NOT NULL;
GO


Figure 1: Recording from the Transaction Log

| Step(s) | Operation and Context | Description |
| --- | --- | --- |
| 1 and 2 | LOP_MODIFY_ROW / LCX_PFS | Since data pages are created for the table for the first time, each allocation must be "registered" in the PFS page. A data page and the IAM page are created and registered for the table. |
| 3 | LOP_FORMAT_PAGE / LCX_IAM | Creation of the IAM page for the table dbo.Customers |
| 4 | LOP_MODIFY_ROW / LCX_IAM | Registration of the first data page in the IAM page |
| 5 and 6 | LOP_ROOT_CHANGE / LCX_CLUSTERED | Registration of the table metadata in the Microsoft SQL Server system tables |
| 7 | LOP_FORMAT_PAGE / LCX_HEAP | Preparation of the data page of the heap for storing the records |
| 8 | LOP_ROOT_CHANGE / LCX_CLUSTERED | Storage of metadata in the Microsoft SQL Server system tables |
| 9 | LOP_INSERT_ROWS / LCX_HEAP | Insertion of the row into the heap |
| 10 | LOP_SET_FREE_SPACE / LCX_PFS | Update of the fill level of the data page in the PFS page |

Note:  I describe the system pages and their functions in detail in the article “Heaps – The Basics”.

If further records are entered, the existing data page is filled until it is – in percentage terms – so full that no new records can be saved on it and Microsoft SQL Server has to allocate the next data page in the system.
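If you want to see these fill-level buckets for yourself, you can inspect the PFS page directly. The following is a minimal sketch (not part of the original demo) that uses the undocumented DBCC PAGE command; trace flag 3604 sends the output to the client, and page (1:1) is the PFS page of the first data file.

-- Inspect the PFS page (file 1, page 1) of demo_db.
-- DBCC PAGE is undocumented; use it on test systems only.
DBCC TRACEON (3604);
GO
DBCC PAGE ('demo_db', 1, 1, 3);
GO
-- The output lists page ranges together with their allocation status
-- and fill-level bucket, e.g. 0_PCT_FULL, 50_PCT_FULL, 100_PCT_FULL.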

Run this script to add another 10,000 rows.


CHECKPOINT;
GO

DECLARE @I INT = 2;
WHILE @I <= 10000
BEGIN
    INSERT INTO dbo.Customers
    SELECT * FROM dbo.CustomerAddresses
    WHERE Id = @I;

    SET @I += 1;
END
GO

The code above inserts another 10,000 records into the table [dbo].[Customers]. Afterwards, look into the transaction log to see the individual transactional steps.


Figure 2: PFS updates

You can see that Microsoft SQL Server must update the PFS page several times (lines 2, 46, 73, …). This is because the PFS page – only in the case of heaps – needs to be updated every time the next fill-level threshold is reached.

Bottleneck PFS

The PFS page "can" become a bottleneck for a heap if many data records are inserted into the heap in a very short time. How often the PFS page has to be updated depends mostly on the size of the data records being saved.

This procedure does not apply to clustered indexes, since data records in an index must ALWAYS be sorted into the existing data according to the defined index key. The search for a free slot is therefore not carried out via the PFS page but via the value of the key attribute!

Microsoft SQL Server must explicitly check after each insert whether the PFS page needs to be updated. If the result above is reduced to operations on the PFS page, the pattern is easy to recognize, as the following filtered query shows.
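A minimal sketch of such a filter: the fn_dblog query from above, restricted to the PFS context.

-- Only show log records that touched the PFS page
SELECT ROW_NUMBER() OVER (ORDER BY [Current LSN]) AS [Step #],
       [Current LSN],
       Operation,
       Context,
       AllocUnitName,
       [Page ID],
       [Slot ID]
FROM sys.fn_dblog(NULL, NULL)
WHERE Context = N'LCX_PFS'
      AND AllocUnitName IS NOT NULL;
GO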


Figure 3: Filtered operations from the log for PFS activity

In total – due to the short data record length – the PFS page had to be updated 14 times in order to enter 10,000 data records in the heap.

At first glance, that may not seem like a lot – after all, 10,000 records were entered. However, it can become problematic for the PFS page as soon as more than one process wants to insert data into the table at the same time. To derive a trend – imprecise due to the limitations of my test system! – I recorded the latches on the PFS page with an Extended Events session and then ran the workload above (wrapped in a stored procedure) in parallel with a different number of clients.


CREATE OR ALTER PROC dbo.InsertCustomerData
    @NumOfRecords INT
AS
BEGIN
    WHILE @NumOfRecords > 0
    BEGIN
        INSERT INTO dbo.Customers
        SELECT * FROM dbo.CustomerAddresses
        WHERE Id = @NumOfRecords;

        SET @NumOfRecords -= 1;
    END
END
GO

CREATE EVENT SESSION [track pfs contention]
ON SERVER
ADD EVENT sqlserver.latch_suspend_end
(
    ACTION(package0.event_sequence)
    WHERE
    (
        sqlserver.database_name = N'demo_db'
        AND sqlserver.is_system = 0
        AND mode >= 0
        AND mode <= 5
    )
    AND class = 28
    AND
    (
        -- only check for PFS, GAM, SGAM
        page_id = 1
        OR page_id = 2
        OR page_id = 3
        OR package0.divides_by_uint64(page_id, 8088)
        OR package0.divides_by_uint64(page_id, 511232)
    )
)
ADD TARGET package0.event_file
(
    SET filename = N'T:\TraceFiles\PFS_Contention.xel',
        MAX_FILE_SIZE = 1024,
        MAX_ROLLOVER_FILES = 10
)
WITH
(
    MAX_MEMORY = 4096KB,
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 5 SECONDS,
    MAX_EVENT_SIZE = 0KB,
    MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = OFF,
    STARTUP_STATE = OFF
)
GO
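The script above only defines the event session; STARTUP_STATE is OFF, so it does not start automatically. A minimal sketch of how the session could be started before the workload and stopped afterwards:

-- Start the Extended Events session before running the parallel workload ...
ALTER EVENT SESSION [track pfs contention] ON SERVER STATE = START;
GO

-- ... run dbo.InsertCustomerData from the desired number of clients here ...

-- ... and stop the session afterwards so the .xel files can be analysed.
ALTER EVENT SESSION [track pfs contention] ON SERVER STATE = STOP;
GO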

I carried out each series of tests five times to compensate for possible deviations. After each workload, the recordings of the Extended Events session were analysed with the following query:


SELECT CAST(event_data AS xml) AS StatementData
INTO #EventData
FROM sys.fn_xe_file_target_read_file
     ('T:\TraceFiles\PFS*.xel', NULL, NULL, NULL);
GO

SELECT * FROM #EventData;
GO

WITH XE
AS
(
    SELECT StatementData.value('(event/@timestamp)[1]', 'datetime') AS [time],
           StatementData.value('(event/@name)[1]', 'VARCHAR(128)') AS [Event_name],
           StatementData.value('(event/data[@name="mode"]/text)[1]', 'VARCHAR(10)') AS [mode],
           StatementData.value('(event/data[@name="duration"]/value)[1]', 'int') AS [duration],
           StatementData.value('(event/data[@name="page_type_id"]/text)[1]', 'VARCHAR(64)') AS [page_type]
    FROM #EventData
)
SELECT XE.page_type,
       COUNT_BIG(*) AS num_records,
       SUM(XE.duration) AS sum_duration,
       AVG(XE.duration) AS avg_duration
FROM XE
GROUP BY
       XE.page_type;
GO

| Processes | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PFS contention | 0 | 1 | 1 | 7 | 7 | 16 | 68 |
| avg. duration (µsec) | 0 | 0 | 27 | 305 | 790 | 1,113 | 3,446 |
| Runtime (sec) | 4.28 | 5.65 | 7.68 | 13.45 | 23.83 | 55.93 | 165.72 |
| avg (µsec) / row | 428 | 2,825 | 192 | 16,813 | 16,769 | 17,478 | 25,894 |


Figure 4: Dependence on processes to contention(s)

The tests I carried out are not representative because external influences were not properly isolated. Nevertheless, one can deduce from the values that the potential for contention on the PFS page escalates with an increasing number of simultaneous processes.


You know the problem from everyday life: the more people want to use the same resource at the same time (the till in the supermarket), the longer you have to queue. The bottleneck can be alleviated by working with multiple files for the filegroup in which the heap is located – as is also common practice with TEMPDB.


Figure 5: A separate database file for each core
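A minimal sketch of how such additional files could be added to the PRIMARY filegroup; the file names, paths, and sizes are assumptions and must be adjusted to your environment.

-- Add three more data files to the PRIMARY filegroup of demo_db
-- (hypothetical names and paths; together with the .mdf this gives four files).
ALTER DATABASE demo_db
    ADD FILE (NAME = N'demo_db_2', FILENAME = N'T:\SQLData\demo_db_2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB)
    TO FILEGROUP [PRIMARY];
ALTER DATABASE demo_db
    ADD FILE (NAME = N'demo_db_3', FILENAME = N'T:\SQLData\demo_db_3.ndf', SIZE = 1024MB, FILEGROWTH = 256MB)
    TO FILEGROUP [PRIMARY];
ALTER DATABASE demo_db
    ADD FILE (NAME = N'demo_db_4', FILENAME = N'T:\SQLData\demo_db_4.ndf', SIZE = 1024MB, FILEGROWTH = 256MB)
    TO FILEGROUP [PRIMARY];
GO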

I performed the same workload with 4 database files in the PRIMARY filegroup and observed the results with the Windows Resource Monitor:


Figure 6: Relatively even distribution of the write load – better throughput

By the way: now is a good time to learn "a few" German words that appear in the screenshot: Datei = file, Lesen = read, Schreiben = write.

It was to be expected that this would ease the situation. You can think of it like a supermarket where only one till is open at first: as soon as there are many customers in the store, a queue builds up in front of the till. Once several more tills are opened, the situation relaxes again.


Figure 7: Significant relaxation for the PFS pages

Bottleneck data structure

Anyone working with heaps must take these data structures into account. The biggest difference in data storage between an index and a heap is that data in a heap can be stored anywhere, while indexed tables must store the data according to the value of the index attribute. Storing data in a heap can result in several problems:

  • Waste of storage space due to the percentage-based calculation of available space on a data page
  • Waste of memory in the buffer pool, since it is not the data itself that is loaded into the buffer pool but the data pages on which the data is located
  • Increased contention on the PFS page if data records are large and the percentage fill level therefore has to be updated frequently

Unused memory on a data page

Memory is expensive, and for Microsoft SQL Server it is an important component for fast queries. For this reason, you naturally want to avoid a situation where data pages are not completely filled and RAM therefore cannot be used efficiently.
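One way to make this visible is to check how many pages of the heap are currently cached in the buffer pool and how much of each page is unused. This is a minimal sketch (not from the original article), assuming the pages of [dbo].[Customers] are still in the buffer pool:

-- Pages of dbo.Customers currently in the buffer pool and their unused space
SELECT COUNT(*)                    AS cached_pages,
       COUNT(*) * 8 / 1024.0       AS cached_mb,
       AVG(bd.free_space_in_bytes) AS avg_free_bytes_per_page
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN sys.allocation_units AS au
        ON bd.allocation_unit_id = au.allocation_unit_id
INNER JOIN sys.partitions AS p
        ON au.container_id = p.hobt_id
       AND au.type IN (1, 3)  -- IN_ROW_DATA, ROW_OVERFLOW_DATA
WHERE bd.database_id = DB_ID()
      AND p.object_id = OBJECT_ID(N'dbo.Customers', N'U');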

To demonstrate this discrepancy between a heap and a clustered index, the first scenario creates a heap table with a column C2 that holds a fixed payload of 2,000 bytes. Afterwards, a stored procedure inserts 10,000 rows into the heap table.


USE demo_db;
GO

DROP TABLE IF EXISTS dbo.Customers;
GO

-- Create a demo table
CREATE TABLE dbo.Customers
(
    C1 INT NOT NULL IDENTITY (1, 1),
    C2 CHAR(2000) NOT NULL DEFAULT ('Testdata')
);
GO

-- Create the stored procedure for the INSERT process
CREATE OR ALTER PROC dbo.InsertCustomerData
    @NumOfRecords INT
AS
BEGIN
    WHILE @NumOfRecords > 0
    BEGIN
        INSERT INTO dbo.Customers DEFAULT VALUES;
        SET @NumOfRecords -= 1;
    END
END
GO

-- Execute the stored procedure for 10,000 rows
EXEC dbo.InsertCustomerData @NumOfRecords = 10000;
GO

The example above creates the table [dbo].[Customers] and a simple stored procedure that receives the number of rows to be inserted as a parameter. After the insert process, you can get insights into the data distribution with the next query, which retrieves the physical information about the stored data.


SELECT page_count,
       record_count,
       record_count / page_count AS avg_rows_per_page,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),
    OBJECT_ID(N'dbo.Customers', N'U'),
    NULL,
    NULL,
    N'DETAILED'
)
WHERE index_level = 0;
GO


With the table's current design, only two or three records (on average) can be stored on one data page. This means that a data page is filled to approx. 50 – 75%. If you change the heap table to a clustered index table, the results look completely different!


DROP TABLE IF EXISTS dbo.Customers;
GO

-- Create a demo table
CREATE TABLE dbo.Customers
(
    C1 INT NOT NULL IDENTITY (1, 1),
    C2 CHAR(2000) NOT NULL DEFAULT ('Testdata'),
    CONSTRAINT pk_Customers_C1 PRIMARY KEY CLUSTERED (C1)
);
GO

-- Execute the stored procedure for 10,000 rows
EXEC dbo.InsertCustomerData @NumOfRecords = 10000;
GO


The reason for this behaviour is that Microsoft SQL Server consults ONLY the PFS page when storing a record in a heap, while a clustered index always has to follow the restriction of the clustered key and stores the record at the position determined by the key.

A clustered index outperforms the heap in terms of storage consumption because it stores every record based on the key attribute. But keep in mind that – unlike a heap structure – the INSERT process must traverse the B-tree structure before it can save a record on a data page.

Note

Before you go for a heap structure, perform some tests to understand how your data will be distributed across the data pages!

Workload when inserting records

The following demonstration shows the dependencies between the row size and the remaining free space on a data page.


IF OBJECT_ID(N'dbo.demo_table', N'U') IS NOT NULL
    DROP TABLE dbo.demo_table;
GO

-- The size of the column C1 will change with every test!
CREATE TABLE dbo.demo_table (C1 CHAR(100) NOT NULL);
GO

-- Clear the log file for the analysis of PFS updates
CHECKPOINT;
GO

-- This script will run for each test loop and insert
-- 10,000 records into the table
BEGIN TRANSACTION InsertRecord;
GO

DECLARE @I INT = 1;
WHILE @I <= 10000
BEGIN
    INSERT INTO dbo.demo_table (C1) VALUES ('This is a test');
    SET @I += 1;
END

-- Afterwards, count the log entries for the PFS updates
SELECT Context,
       COUNT_BIG(*) AS num_log_records
FROM sys.fn_dblog(NULL, NULL)
WHERE [Transaction ID] IN
(
    SELECT [Transaction ID]
    FROM sys.fn_dblog(NULL, NULL)
    WHERE [Transaction Name] = N'InsertRecord'
          OR Context = N'LCX_PFS'
)
GROUP BY
       Context;

-- ... and have a look at the avg space used in the heap
SELECT page_count,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),
    OBJECT_ID(N'dbo.demo_table', N'U'),
    0,
    NULL,
    N'DETAILED'
);
GO

ROLLBACK TRANSACTION;
GO

The above demonstration was run with different row sizes and produced the following results: while the transaction runtime changes only moderately (157 ms – 1,459 ms), the number of PFS page updates increases sharply as the record length grows (563 – 16,260). Although the PFS page is refreshed quite frequently, the number of data pages grows moderately (149 – 5,000). The average fill level of a data page lies between 75% and 100%, depending on the size of the row.

The frequent updating of the PFS page is explained by the growing size of a data record: fewer records fit on one data page, so the various thresholds are reached more quickly.

| Record Length (bytes) | Time (ms) | PFS Updates | Pages | Avg. Used Space |
| --- | --- | --- | --- | --- |
| 100 | 157 | 563 | 149 | 90.36% |
| 200 | 414 | 1,397 | 271 | 95.26% |
| 500 | 441 | 3,199 | 777 | 80.91% |
| 1000 | 595 | 5,916 | 1,436 | 86.79% |
| 2000 | 920 | 10,625 | 3,339 | 74.31% |
| 3000 | 1,138 | 16,069 | 5,004 | 74.27% |
| 4000 | 1,459 | 16,260 | 5,000 | 99.04% |

Let's do a little maths about how data is stored on a data page.

| Row size (bytes) | 50% | 80% | 95% | 100% |
| --- | --- | --- | --- | --- |
| 100 | 40 | 64 | 76 | 80 |
| 200 | 20 | 32 | 38 | 40 |
| 500 | 8 | 12 | 15 | 16 |
| 1000 | 4 | 6 | 7 | 8 |
| 2000 | 2 | 3 | 3 | 4 |
| 3000 | 1 | 2 | 2 | 2 |
| 4000 | 1 | 1 | 1 | 2 |

The above table shows the maximum number of records that "should" fit on ONE data page before the given fill-level threshold is exceeded. Please note that at a fill level of 95%, only 403 bytes (8,060 * (1 - 95%)) are mathematically left on the data page.
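The table can also be reproduced with a short calculation. This is a minimal sketch, assuming 8,060 bytes of usable space per data page:

-- Maximum number of rows that fit on one data page at each PFS fill-level threshold
SELECT RowSize,
       FLOOR(8060 * 0.50 / RowSize) AS [50%],
       FLOOR(8060 * 0.80 / RowSize) AS [80%],
       FLOOR(8060 * 0.95 / RowSize) AS [95%],
       FLOOR(8060 * 1.00 / RowSize) AS [100%]
FROM (VALUES (100), (200), (500), (1000), (2000), (3000), (4000)) AS r (RowSize);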

If the row size is 100 Bytes, Microsoft SQL Server can store 40 records on ONE data page before the threshold gets updated to 80%. It takes 24 more records before the next update to 95% will happen.

The bigger the row size, the faster the thresholds are reached. Keep in mind that the row size has a direct impact on the possible contention on the PFS page.

Let's take a row size of 1,000 bytes per record. With the 5th record, the PFS value gets updated to the 80% bucket. When the 6th row (1,000 bytes) must be stored on the data page, it still fits. The table above shows, for each row size, how many records can still be stored on a data page at each threshold and at which point a new data page will be requested.

Summary

The aim when inserting new data into a heap is to avoid frequent updates of the PFS pages and to use the available space as fully as possible. The next article will show how you can boost performance when you insert data into a heap.


SQL – Simple Talk
