Changing Data Types on Large Tables: The INT to BIGINT Conundrum

February 28, 2018

During one of our regular SQL Server health checks, using sp_blitz, one of our largest production tables raised a worrying alert. The ID column of a table holding customer order information was an INT datatype, and it was running out of numbers.

The table was around 500GB with over 900 million rows. Based on the average number of inserts a day on that table, I estimated that we had around eight months before inserts would grind to a halt. This was an order entry table, subject to round-the-clock inserts due to customer activity. Any downtime to make the conversion to BIGINT was going to have to be minimal.
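
sp_blitz flags identity columns that are approaching the limit of their data type. If you want to check the headroom yourself, a minimal sketch along these lines works against the system catalog (it assumes the key is an IDENTITY column, as it was on our order table):

SELECT  OBJECT_SCHEMA_NAME(ic.object_id)             AS schema_name,
        OBJECT_NAME(ic.object_id)                    AS table_name,
        ic.name                                      AS column_name,
        CAST(ic.last_value AS BIGINT)                AS last_value,
        2147483647 - CAST(ic.last_value AS BIGINT)   AS values_remaining
FROM    sys.identity_columns AS ic
WHERE   ic.system_type_id = 56          -- INT identity columns only
  AND   ic.last_value IS NOT NULL;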

This article describes how I planned and executed a change from an INT to a BIGINT data type, replicating the process I used in a step-by-step guide for the AdventureWorks database. The technique creates a new copy of the table, with a BIGINT datatype, on a separate SQL Server instance, then uses object level recovery to move it into the production database.

Assessing the options

The obvious option for changing the datatype from INT to BIGINT was simply to ALTER the table directly. However, the requirement for minimal downtime ruled this out immediately. The ID column is a clustered Primary Key, which poses the first issue: you must drop the clustered index before running the ALTER TABLE command.
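
For reference, a sketch of what that direct approach looks like against the AdventureWorks Person table; any foreign keys on or referencing the column would also have to be dropped and recreated first, and the ALTER COLUMN rewrites and logs every row:

-- assumes any foreign keys involving BusinessEntityID have already been dropped
ALTER TABLE Person.Person DROP CONSTRAINT PK_Person_BusinessEntityID;    -- drop the clustered PK

ALTER TABLE Person.Person ALTER COLUMN BusinessEntityID BIGINT NOT NULL; -- size-of-data operation

ALTER TABLE Person.Person
    ADD CONSTRAINT PK_Person_BusinessEntityID
    PRIMARY KEY CLUSTERED (BusinessEntityID);                            -- rebuild the clustered PK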

This was going to cause an outage of around nine hours and a considerable amount of logging. We could add extra disk capacity to cope with the logging requirements, but that length of downtime was not an option.

We didn’t have the luxury of being on Azure SQL Database, or SQL Server 2016 or 2017, each of which offers online rebuild options. There are limitations to online rebuilds though, one being this warning from MSDN:

Online alter column does not reduce the restrictions on when a column can be altered. References by index/stats, etc. might cause the alter to fail.
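
On those newer versions, the online variant is a single statement. A minimal sketch against a hypothetical table (dbo.CustomerOrders is not from the article); as the warning above says, it still fails if the column is referenced by indexes, statistics, or constraints:

-- SQL Server 2016+ / Azure SQL Database only
-- dbo.CustomerOrders and its Id column are hypothetical
ALTER TABLE dbo.CustomerOrders
    ALTER COLUMN Id BIGINT NOT NULL
    WITH (ONLINE = ON);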

Another option involved the use of a trigger. This would entail copying all the data to a new table, creating all indexes and constraints, then creating a trigger to make sure inserts go to both tables. I was worried about many aspects of this idea, including maintenance and performance.

Another option suggested was to reseed the INT to use negative numbers. This means resetting the INT to use the negative range, from -1 down to around -2.147 billion, giving roughly another 2.1 billion values, but I wasn’t a fan of the idea, and, in any event, this was not a long-term answer.
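
A minimal sketch of the reseed idea, again using the hypothetical dbo.CustomerOrders table. Because an identity increment cannot be changed in place, in practice you reseed to the bottom of the INT range and let subsequent inserts count upward through the negative values:

-- Stopgap only: roughly doubles the key space, does not fix the underlying problem.
-- dbo.CustomerOrders is hypothetical; the next insert gets a value just above the reseed point.
DBCC CHECKIDENT ('dbo.CustomerOrders', RESEED, -2147483648);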

A standard approach I’d seen documented was to create a replica table, except with a BIGINT datatype instead of INT, and copy the data across in small batches. You then keep the two tables in sync, by capturing modifications to the original table (using a trigger or Change Data Capture) and applying them to the new table. Finally, during some brief downtime, switch the new table for the old. This was my preferred approach…but with a new twist. I created the replica table on a separate development instance, rather than side-by-side with the original in production. Instead of a trigger or Change Data Capture, I used SSIS packages to keep the tables in sync. Then, I used object-level restore to move the new table to production for the switch over.
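
As an illustration of the batched copy that this standard approach describes (I used SSIS packages instead, as shown later), here is a minimal T-SQL sketch that walks the clustered key in 50,000-row batches, using the AdventureWorks replica table created in Listing 1 below:

DECLARE @LastId BIGINT = 0;
DECLARE @BatchSize INT = 50000;

WHILE 1 = 1
BEGIN
    -- copy the next batch, seeking on the clustered key rather than scanning
    INSERT INTO Person.PersonNEW
           (BusinessEntityID, PersonType, NameStyle, Title, FirstName, MiddleName,
            LastName, Suffix, EmailPromotion, AdditionalContactInfo, Demographics,
            rowguid, ModifiedDate)
    SELECT TOP (@BatchSize)
           BusinessEntityID, PersonType, NameStyle, Title, FirstName, MiddleName,
           LastName, Suffix, EmailPromotion, AdditionalContactInfo, Demographics,
           rowguid, ModifiedDate
    FROM   Person.Person
    WHERE  BusinessEntityID > @LastId
    ORDER  BY BusinessEntityID;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to copy

    -- remember how far we got for the next batch
    SELECT @LastId = MAX(BusinessEntityID) FROM Person.PersonNEW;
END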

I’d not seen this approach written about anywhere, but it seemed like a great way to minimize any possible disruption to production, and my instincts told me it would work!

The proof of concept

I did a lot of work in our test and development environments, making sure this approach was going to work exactly as intended. The following sections summarize the testing work. The demo mimics the steps I took as closely as possible but using the sample AdventureWorks database. I’ll assume you’ve restored the database to a development environment and start by creating the replica table.

Create the ‘replica’ table

In a newly restored AdventureWorks database, create a PersonNEW table alongside the original Person table, using a BIGINT data type for the clustered index column as shown in Listing 1. Note: to more closely follow along with the process used when I made the change in production, create the new table in a database on another instance.

Listing 1: Create the PersonNEW table

CREATE TABLE Person.PersonNEW
(
    BusinessEntityID BIGINT NOT NULL,
    PersonType NCHAR(2) NOT NULL,
    NameStyle dbo.NameStyle NOT NULL,
    Title NVARCHAR(8) NULL,
    FirstName dbo.Name NOT NULL,
    MiddleName dbo.Name NULL,
    LastName dbo.Name NOT NULL,
    Suffix NVARCHAR(10) NULL,
    EmailPromotion INT NOT NULL,
    AdditionalContactInfo XML(CONTENT Person.AdditionalContactInfoSchemaCollection) NULL,
    Demographics XML(CONTENT Person.IndividualSurveySchemaCollection) NULL,
    rowguid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL,
    ModifiedDate DATETIME NOT NULL,
    CONSTRAINT PK_Person_BusinessEntityIDNEW
        PRIMARY KEY CLUSTERED (BusinessEntityID ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
             ) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY];
GO


Transfer across the data, then create indexes

I used SSIS to transfer all data across into the PersonNEW table, and then created all the necessary indexes and constraints. When you create the SSIS package, make sure to click Enable Identity Insert (see the screenshot below). You will find this option under the Edit Mappings tab in Select Source Tables and Views. In my scenario there was an identity column, so this was needed. I also didn’t want any discrepancies, since the ID is a unique number for each order, used by many applications and throughout the company.

[Screenshot: the Enable Identity Insert option on the Edit Mappings dialog in SSIS]
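
If you are scripting the transfer in T-SQL rather than SSIS, Enable Identity Insert corresponds to SET IDENTITY_INSERT. A minimal, self-contained sketch using a hypothetical table (the AdventureWorks replica in Listing 1 does not declare an IDENTITY, but the production table did):

-- Hypothetical target table with an IDENTITY key
CREATE TABLE dbo.DemoTarget
(
    Id      INT IDENTITY(1, 1) PRIMARY KEY,
    Payload NVARCHAR(50) NOT NULL
);

SET IDENTITY_INSERT dbo.DemoTarget ON;      -- allow explicit values in the identity column

INSERT INTO dbo.DemoTarget (Id, Payload)    -- an explicit column list is required
VALUES (6001, N'copied row keeps its original ID');

SET IDENTITY_INSERT dbo.DemoTarget OFF;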

During testing, I periodically updated the data in the BIGINT table using SSIS packages. For example, if the last import stopped at ID 6000, I would create the next SSIS package with a WHERE clause of BusinessEntityID > 6000. Seeking on the cluster key, rather than scanning the table each time, made for the most efficient transfer. I did this every day to keep the data transfer time down. Listing 2 shows the query to use in the SSIS package for the Person table.

Listing 2: The transfer query

/****** Script for SelectTopNRows command from SSMS ******/
SELECT [BusinessEntityID]
      ,[PersonType]
      ,[NameStyle]
      ,[Title]
      ,[FirstName]
      ,[MiddleName]
      ,[LastName]
      ,[Suffix]
      ,[EmailPromotion]
      ,[AdditionalContactInfo]
      ,[Demographics]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks2014].[Person].[Person]
  WHERE BusinessEntityID > 6000

During testing, I also used Redgate’s SQL Data Compare after data transfers to verify that the data was copied exactly as expected.
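
If you don’t have a comparison tool to hand, a rough T-SQL sanity check along these lines can catch gross discrepancies once both tables are reachable from one query window. It is not a substitute for a full column-by-column compare, and the XML columns are left out because BINARY_CHECKSUM cannot consume them:

-- Run against both tables and compare the three values
SELECT  COUNT_BIG(*)          AS row_count,
        MAX(BusinessEntityID) AS max_id,
        CHECKSUM_AGG(BINARY_CHECKSUM(BusinessEntityID, PersonType, ModifiedDate)) AS rough_checksum
FROM    Person.Person;

SELECT  COUNT_BIG(*)          AS row_count,
        MAX(BusinessEntityID) AS max_id,
        CHECKSUM_AGG(BINARY_CHECKSUM(BusinessEntityID, PersonType, ModifiedDate)) AS rough_checksum
FROM    Person.PersonNEW;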

Object level restore

The next step was testing the process on a separate staging server. I wanted to see if I could use object level recovery of the table into a database with a different name. To do this, I had to use a third-party SQL Server backup tool since object level recovery is not supported natively. I restored a fresh copy of AdventureWorks to the staging server, naming it AdventureWorksBIGINT. This represented the Production database in my test. I then restored the new table, PersonNEW, from the backup into the new staging database.

This was a smoke test to make sure that the same object level restore, from Development to Production, would work exactly as expected. When restoring to Production, I restored the table using the object level recovery feature in my SQL Server backup tool.

Create a trigger to stop entries to the original table

During the switch over to the new table, changes to the data must be stopped. I used a trigger (Listing 3) to stop all changes to the production table and stopped the application from being able to insert, update, or delete.

Listing 3: Trigger to stop changes

CREATE TRIGGER trReadOnly_Person ON [Person].[Person]
    INSTEAD OF INSERT,
               UPDATE,
               DELETE
AS
BEGIN
    RAISERROR('Person table is read only.', 16, 1)
    ROLLBACK TRANSACTION
END
GO

--DROP TRIGGER trReadOnly_Person


Swap out the old table for the new one

Now that both the original and replica tables were in the same database, the final step was to do a swap of the tables, swapping the indexes, constraints, table names, foreign keys, a trigger and a couple of database permissions for denying access to certain columns. You can download the test object flip script for AdventureWorks at the bottom of this article, but I won’t show it here. Looking back, I did overcomplicate the index name flip, as only the primary key was needed in my environment. Bear in mind that not all index names need to be changed, since you can reuse the same index name on two different tables.
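
The full flip script isn’t reproduced here, but the heart of the swap is a set of renames along these lines. This minimal sketch only handles the table and primary key names; the real script also dealt with foreign keys, the trigger, other indexes, and the column-level permissions:

BEGIN TRANSACTION;

    -- park the original INT table and its primary key under *OLD names
    EXEC sp_rename 'Person.Person', 'PersonOLD';
    EXEC sp_rename 'Person.PK_Person_BusinessEntityID', 'PK_Person_BusinessEntityIDOLD';

    -- promote the BIGINT replica to the production names
    EXEC sp_rename 'Person.PersonNEW', 'Person';
    EXEC sp_rename 'Person.PK_Person_BusinessEntityIDNEW', 'PK_Person_BusinessEntityID';

COMMIT TRANSACTION;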

Before I did the swap, I checked which rows were last added to the table, so when I did the last transfer, I knew what the most recent row should be. I used Redgate’s SQL Data Compare to make sure none of the rows were different.

While I could still amend the replica table as I pleased, since it was on our development environment, I also decided to compress it, saving around 200GB. Finally, I took a differential backup to make sure we had an up-to-date backup of the production database on hand, in case any issues occurred.
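
The article doesn’t say which compression level was used; as an example, page compression of the replica and its indexes would look something like this (data compression requires Enterprise edition before SQL Server 2016 SP1):

-- compress the clustered index of the replica
ALTER TABLE Person.PersonNEW
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- compress any nonclustered indexes as well
ALTER INDEX ALL ON Person.PersonNEW
    REBUILD WITH (DATA_COMPRESSION = PAGE);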

Everything looked good, so I ‘flicked the object flip switch’.

Once all objects are renamed, you can remove the trigger to reopen the table. Listing 4 shows how to check that you can access the table and that new entries can be made.

Listing 4: Insert check query

INSERT INTO Person.Person (
           BusinessEntityID,
           PersonType,
           NameStyle,
           Title,
           FirstName,
           MiddleName,
           LastName,
           Suffix,
           EmailPromotion,
           AdditionalContactInfo,
           Demographics,
           rowguid,
           ModifiedDate
                          )
    SELECT TOP 1 BusinessEntityID + 1,
           PersonType,
           NameStyle,
           Title,
           FirstName,
           MiddleName,
           LastName,
           Suffix,
           EmailPromotion,
           AdditionalContactInfo,
           Demographics,
           NEWID(),
           ModifiedDate
    FROM Person.Person
    ORDER BY BusinessEntityID DESC

Check your tables and you will see that the Person table now has a BIGINT datatype!

Moving onto Production…

To my surprise, my method worked like a charm. My boss Sjors Takes (blog | Simple-Talk articles) was impressed.

We ran a pilot in our Acceptance environment, which mimics our production setup, and it also worked fine.

When running on Acceptance and Production, the process was performed in the following steps:

  1. Restore a full database backup of the production database to a dev/test environment
  2. In the restored database, create the replica table with a BIGINT instead of INT
  3. Create the SSIS package and, with IDENTITY INSERT enabled, transfer the data
  4. Create all the indexes and constraints on the replica table
  5. Compress the new table (optional)
  6. Perform the object restore into the production database, keeping the table named PersonNEW.
  7. Update the PersonNew table periodically using SSIS packages to transfer data from a reporting instance in an availability group
  8. In the planned maintenance window, do one more SSIS transfer and then create the trigger to make the table read-only. We also shut down the applications that access this table
  9. Do a differential backup (a minimal sketch follows this list)
  10. Do the table swap
  11. Check data consistency using SQL Data Compare
  12. Remove the trigger and bring the APIs back online
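
The differential backup in step 9 is quick relative to a full backup and gives a restore point taken immediately before the swap; a minimal sketch, with a hypothetical backup path:

-- the path and file name are hypothetical
BACKUP DATABASE AdventureWorks
    TO DISK = N'D:\Backups\AdventureWorks_PreSwap.dif'
    WITH DIFFERENTIAL, CHECKSUM, STATS = 10;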

This method cut down the downtime from a potential nine hours to less than fifteen minutes, and a lot of the intensive work was done away from the production instance. I did not see much impact on performance from the restore of the table using object level recovery.

Summary

There are many ways to make this and other datatype changes to production databases. The option you choose will often depend on the downtime window you have available. The shorter that is, the more ‘creative’ you need to be with your solution, while always ensuring data integrity as the number one priority.

The method I’ve described minimized both downtime, and the potential to affect production server performance, since it allowed me to do most of the work to the table on a separate development instance.
