SSIS: Foreach File Enumerator variable folder path and file filter

When we use SSIS to import files to Database, there is a very comm issues for most of user. How we can dynamically set the folder path and file filters for Foreach File Enumerator in a Foreach Loop. There are lots of website mentioned how to import different type files to database, how to loop folders and most of those website are hard code folder path and file filters. But for one of my project, I need to dynamically loop the folders and pick up file dynamically per parameter.

It is quite easy to implement the feature if you know the property Directory of the Foreach File Enumerator. Now let us look at the steps:

1. Create two SSIS package parameters CSVFileFilter and FolderPathTest2.

Create two variables CSVFileLists and CSVFileName

2. Create a Foreach Loop Container task to the Control Flow panel, double click it to pop up the property window.

3. Switch to the Collection tab, choose the Enumerator as “Foreach File Enumerator”, expand Expressions and add two properties “Directory” for folder path and “FileSpec” for filtering specific types of files. Specify “@[$Package::FolderPathTest2]” to Directory and ” @[$Package::CSVFileFilter]” to FileSpec.

4. Mapping the variable

5. Click OK to finished Foreach Loop Container configuration

6. Add a Data Flow Task in the Foreach Loop Container

7. Add a Flat File Source inside the Data Flow Task,configure Flat File Source

8. Assign variable @[User::CSVFileName] ‘Flat File Connection Manager’ ConnectionString property

9. Add ADO NET Destination then we can dynamically import file to table now.

SQL performance tuning: variable in where condition cause performance issue

I am seeing a hug performance difference between two quires that are almost identical.

Query 1:

declare @type varchar(10) = ‘OR-‘
select * from client_details t1, Orders t2
where t1.client_code = @type +t2.client_code

Query 2:

select * from client_details t1, Orders t2
where t1.client_code = ‘OR-‘+t2.client_code

client_details has around 400 thousands records, Orders table has around 200 thousands.

Query 1 take long then 5 hours to processing, but Query 2 is finished within few minutes.

I did lots of search and testing, it seems SQL SERVER optimizer issue.

Please the difference of Literal Values and local variable from Kendra Little’s Blog:

https://www.brentozar.com/archive/2014/06/tuning-stored-procedures-local-variables-problems/

Solution to fix my issue, I created an additional column called type and then use the value from this new column:

select * from client_details t1, Orders t2
where t1.client_code = t2.OrderType +t2.client_code

SSIS: ScriptMain.CreateNewOutputRows error after copying script components

In my SSIS project, there are lots of packages used the similar code and logic. So I just copied the component and changed the things that need to be changed. Then I got a very weird error :

at ScriptMain.CreateNewOutputRows()
at UserComponent.PrimeOutput(Int32 Outputs, Int32[] OutputIDs, PipelineBuffer[] Buffers, OutputNameMap OutputMap)
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)

After lots of search and investigation, I realized it’s SSIS component copy issue.

https://kohera.be/blog/sql-server/the-dangers-of-copying-script-components-in-ssis/

I followed Stefan’s steps and try to fix it. But it’s not working for mine. I recreated a new script component from the scratch.

There is ticket in Visual Studio Developer community as well, the last update at 06/02/2020 mentioned it’s a VS bug and still working on it.

https://developercommunity.visualstudio.com/content/problem/225039/ssis-copying-script-component-to-another-data-flow.html

SSIS: Reduce SSISDB database size

The SSISDB database is automatically created when we create SSIS catalog. All the SSIS projects, packages and execution history are saved in this database. So if you have a package that run every few minutes you could see the database size is growing exponentially. I deployed one project at AWS EC2 server with limited disk storage size (100GB SSD), the package was automatically executed every 3 minutes by SQL Agent Job. After one weeks later, I noticed the disk size almost full. After investigation, I notice the size of SSISDB is very big already. How can I control SSISDB to a minimum size?

SSIS created a maintenance job already when we create the catalog, it’s SIS Server Operation Records Maintenance.

Looking the job steps, you will notice it calls a stored procedure internal.cleanup_server_retention_window, in this stored procedure you will see it is using catalog.catalog_properties to do database maintain.

Now is solution is easy, I just need to update RETENTION_WINDOW property of catalog_properties .

exec catalog.configure_catalog RETENTION_WINDOW , 1

After this change, then just waiting for SSIS Maintenance Job to do database cleaning task.

If you don’t want to do this via T-SQL. You can right click on your catalog then click Property to change settings from Catalog Properties window:

After you reduced Database size, you maybe will notice the log file size is going up now. For the log file, if it’s not very critical system, you can used DBCC SHRINKFILE to shrink files directly:

USE SSISDB;
GO
–use below script to find ssisdb log file id
–SELECT file_id, name FROM sys.database_files;
— Truncate the log by changing the database recovery model to SIMPLE.
ALTER DATABASE SSISDB
SET RECOVERY SIMPLE;
GO
— Shrink the truncated log file to 1 MB.
DBCC SHRINKFILE (2, 1);
GO
— Reset the database recovery model.
ALTER DATABASE SSISDB
SET RECOVERY FULL;
GO

Best practice to run SSIS Package within store-procedure (T-SQL)

Once SSIS packages deployed to production server, there are several way to execute it. Execute it by SQL Job or performing it from SQL script.

For most of case in my project need to execute SSIS package from SQL procedure. There are few challenges come up when executing SSIS package from SQL procedure.

By default start_execution is async function. The SQL procedure move to the following script once SSIS Package is called. But some of case we need to the data which processed in SSIS Package . In this case, asynchronous execution should be used.

The syntax to add SYNCHRONIZED parameter is shown below:

EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id, @object_type = 50, @parameter_name = N’SYNCHRONIZED’, @parameter_value = 1

start_execution will return success message even if SSIS Package is failed. For most of case, the whole store procedure should be stopped once SSIS package execution failed. The workaround solution to solve this issue, we need to get package execution status flag from SSISDB then raise error. Code show as below:

IF 7 <> (SELECT [status] FROM [SSISDB].[catalog].[executions] WHERE execution_id = @execution_id)
RAISERROR(‘Json file import exception, please contact support team’, 16, 1)

Install DLL to GAC without Visual Studio

I was working on one SSIS project, everything was good till published to production server.

I was using Newtonsoft.Json in one of package and dll must installed to GAC, unfortunately production server don’t have Visual Studio. Therefore I cannot use gacutil.exe tool

Finally, I found Powershell can register dll to GAC as well.

[System.Reflection.Assembly]::Load(“System.EnterpriseServices,Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a”)
$publish = New-Object System.EnterpriseServices.Internal.Publish
$publish.GacInstall(“C:\Users\admin\Desktop\Newtonsoft.Json.dll”)