SQL Server: How to delete a large volume of data from a very large table

Issue

We have a very large database that is running out of server disk space. I was asked to remove millions of historical rows from some huge tables. When I ran a DELETE with a WHERE condition directly, I hit two problems. First, the whole table was blocked for a long time (longer than 12 hours). Second, the database log file (.ldf) grew rapidly while the DELETE ran, and the script ended with an error because the disk was full.

Solution

Depending on the server's situation, we can solve this issue in different ways.

1. If we want to delete all the data from the table, the best option is to truncate the table directly, as below:

TRUNCATE TABLE huge_history_table


This TRUNCATE TABLE command will remove all rows from a table, without logging the individual row deletions. TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause; however, TRUNCATE TABLE is faster and uses fewer system and transaction log resources.

2. If we want to delete more than 80-90 percent of the data from the huge table, and the table can be offline for a short period, we can use a SELECT INTO script to copy the data we need to keep into a staging table, then truncate the large table and insert the staged data back, as below:

-- Keep only the rows created within the last month
SELECT * INTO staging_table
FROM huge_table
WHERE create_date > DATEADD(month, -1, GETDATE())


TRUNCATE TABLE huge_table


INSERT INTO huge_table (column_name)
SELECT column_name
FROM staging_table

3. If we want to delete a large amount of historical data, but the table must stay accessible to the system 24×7 and cannot be taken offline, we can delete rows in smaller batches using a WHILE loop. If point-in-time recovery is not required, changing the database's recovery model to SIMPLE first lets log space be reused between batches instead of growing with every delete. The loop looks something like this:

DECLARE @Deleted_Rows INT;
SET @Deleted_Rows = 1;

WHILE (@Deleted_Rows > 0)
BEGIN
-- Delete a small number of rows at a time
DELETE TOP (10000) FROM LargeTable
WHERE create_date < DATEADD(month, -1, GETDATE());

SET @Deleted_Rows = @@ROWCOUNT;
END
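
The batching pattern is not specific to T-SQL. As a minimal sketch, assuming a SQLite in-memory database stands in for SQL Server, the loop below deletes old rows one batch at a time and commits after each batch, so no single transaction holds locks or log space for long. The delete_in_batches helper and the sample rows are hypothetical. SQLite has no DELETE TOP, so a rowid subquery with LIMIT plays that role.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=10000):
    """Delete rows older than cutoff in small committed batches; return the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM LargeTable WHERE rowid IN ("
            "  SELECT rowid FROM LargeTable WHERE create_date < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:  # nothing left to delete
            break
        total += cur.rowcount
    return total


# Demo data (hypothetical): 25 old rows and 5 rows that must survive
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LargeTable (id INTEGER PRIMARY KEY, create_date TEXT)")
conn.executemany(
    "INSERT INTO LargeTable (create_date) VALUES (?)",
    [("2020-01-01",)] * 25 + [("2999-01-01",)] * 5,
)
conn.commit()

deleted = delete_in_batches(conn, "2024-01-01", batch_size=10)
remaining = conn.execute("SELECT COUNT(*) FROM LargeTable").fetchone()[0]
print(deleted, remaining)
```

The batch size is a trade-off: smaller batches hold locks for less time but need more round trips, so it is worth testing a few values against the real table.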

PowerShell: How to replace a string in files

Issue

How can we quickly replace text in multiple files and folders when the folder names and file names change dynamically?

Solution

Using PowerShell built-in cmdlets we can easily read and replace any kind of content inside of a text file.

First, we can use Get-ChildItem to get all the files. Because the folder and file names are dynamic, we need to use wildcards in the path, as below:

$files = Get-ChildItem 'D:\test\*\files**.txt' -Recurse

Then we loop over $files, read the content of each file, and write the replaced text back.

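$files | ForEach-Object {
    # The parentheses read the whole file before Set-Content overwrites it
    (Get-Content $_.FullName -Encoding UTF8) -replace 'test', '' | Set-Content $_.FullName -Encoding UTF8
}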
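
For comparison, the same read, replace, write loop can be sketched in Python. This is only an illustration under stated assumptions: the replace_in_files helper, the temporary directory and the sample file names are hypothetical. Unlike PowerShell's -replace, which treats its pattern as a regular expression, str.replace does a literal substitution.

```python
import tempfile
from pathlib import Path


def replace_in_files(root, pattern, old, new):
    """Replace old with new in every file under root that matches the glob pattern."""
    changed = []
    for path in sorted(Path(root).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        if old in text:  # only rewrite files that actually contain the string
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path.name)
    return changed


# Demo on a throwaway directory with two dynamically named subfolders
root = Path(tempfile.mkdtemp())
(root / "a").mkdir()
(root / "b").mkdir()
(root / "a" / "files1.txt").write_text("test one", encoding="utf-8")
(root / "b" / "files2.txt").write_text("nothing here", encoding="utf-8")

changed = replace_in_files(root, "*/files*.txt", "test", "")
print(changed)
```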

Why is LEFT JOIN not returning NULLs?

Issue

I am trying to get all products, together with the category name if the category is still active, using the LEFT JOIN script below. But I do not get the product information when the category is not active. What is wrong with this LEFT JOIN script?

select p.name, c.name
from products p
left join categories c on p.categoryid = c.categoryid
where c.isactive = 1 or c.isactive is null

Reason

To find out what is wrong with this script, we need to understand the logical order in which a SQL query is evaluated. The script above uses SELECT, FROM, LEFT JOIN and WHERE:

  1. FROM and JOIN: The FROM clause, and subsequent JOINs are first executed to determine the total working set of data that is being queried
  2. WHERE: Once we have the total working set of data, the WHERE constraints are applied to the individual rows, and rows that do not satisfy the constraint are discarded.
  3. SELECT: Any expressions in the SELECT part of the query are finally computed.

The WHERE clause is evaluated after the ON clause. The predicates in the ON clause decide which rows of categories match each product while the join is being built, whereas the WHERE clause is logically applied to the result of the join.

So the script below returns every row from the products table and leaves the category columns NULL when no matching category is found.

select p.name, c.name
from products p
left join categories c on p.categoryid = c.categoryid

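The WHERE clause is then applied to the join result. A product whose category exists but is inactive has c.isactive = 0, which fails both conditions below, so that product row is removed from the result: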

where c.isactive = 1 or c.isactive is null

Solution

If we want to get all product information even when the category is missing or not active, we need to move the condition from the WHERE clause to the ON clause, as below:

select p.name, c.name
from products p
left join categories c on p.categoryid = c.categoryid
and (c.isactive = 1 or c.isactive is null )
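
The difference between the two placements is easy to check on a small data set. The sketch below assumes a SQLite in-memory database with three hypothetical products: one in an active category, one in an inactive category, and one whose category does not exist. It runs both versions of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (categoryid INTEGER PRIMARY KEY, name TEXT, isactive INTEGER);
CREATE TABLE products (name TEXT, categoryid INTEGER);
INSERT INTO categories VALUES (1, 'Active', 1), (2, 'Retired', 0);
INSERT INTO products VALUES ('Widget', 1), ('Gadget', 2), ('Orphan', 3);
""")

# Condition in WHERE: filters the join result, so 'Gadget' (inactive category) is dropped
where_rows = conn.execute("""
    SELECT p.name, c.name FROM products p
    LEFT JOIN categories c ON p.categoryid = c.categoryid
    WHERE c.isactive = 1 OR c.isactive IS NULL
    ORDER BY p.name
""").fetchall()

# Condition in ON: only controls which category rows match, so every product is kept
on_rows = conn.execute("""
    SELECT p.name, c.name FROM products p
    LEFT JOIN categories c ON p.categoryid = c.categoryid
        AND (c.isactive = 1 OR c.isactive IS NULL)
    ORDER BY p.name
""").fetchall()

print(where_rows)  # [('Orphan', None), ('Widget', 'Active')]
print(on_rows)     # [('Gadget', None), ('Orphan', None), ('Widget', 'Active')]
```

With the condition in the ON clause, the product in the inactive category is still returned, but its category name comes back as NULL.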

SSIS Converting date to YYYYMMDD

SSIS Expression format date to YYYYMMDD

Often, when we create an SSIS package that generates files automatically, we want a date stamp in the file name. To achieve this, we can use the SSIS expression below to produce a file name containing the current date.

"D:\\OutPut\\Transactions_"
+ (DT_STR, 4, 1252) DATEPART("yyyy" , GETDATE())
+ RIGHT("0" + (DT_STR, 2, 1252) DATEPART("mm" , GETDATE()), 2)
+ RIGHT("0" + (DT_STR, 2, 1252) DATEPART("dd" , GETDATE()), 2)
+ ".csv"

SSIS Expression format YYYYMMDD to Date

(DT_DATE)(SUBSTRING(@[yourDateAsStr],1,4)
+ "-" + SUBSTRING(@[yourDateAsStr],5,2)
+ "-" + SUBSTRING(@[yourDateAsStr],7,2))
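
To sanity-check what these two expressions should produce, the same conversions can be sketched outside SSIS. The Python helpers below, to_yyyymmdd and from_yyyymmdd, are hypothetical names and are not part of SSIS. They zero-pad the month and day the same way the RIGHT("0" + ..., 2) trick does, and parse the string back into a date.

```python
from datetime import date, datetime


def to_yyyymmdd(d):
    """Format a date as YYYYMMDD with zero-padded month and day."""
    return d.strftime("%Y%m%d")


def from_yyyymmdd(s):
    """Parse a YYYYMMDD string back into a date."""
    return datetime.strptime(s, "%Y%m%d").date()


stamp = to_yyyymmdd(date(2024, 3, 7))
file_name = "D:\\OutPut\\Transactions_" + stamp + ".csv"
parsed = from_yyyymmdd("20240307")

print(file_name)  # D:\OutPut\Transactions_20240307.csv
print(parsed)     # 2024-03-07
```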

Set SSIS Expression to format File Connection Manager Property