Saturday, March 14, 2015

How to use Unpivot Transformation in SSIS

The Unpivot Transformation creates rows in the Data Flow from columns; it is the opposite of the Pivot Transformation. This example shows how to configure it.


Package design
Source File
Unpivot settings
Execute the package. You’ll see that the Valid Customer output goes to the customer table, and the NULL data condition is just thrown out.
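The unpivot idea itself can be sketched outside SSIS in a few lines of Python: each quarter column of a pivoted row becomes its own output row. The sample data and column names below are made up for illustration.

```python
# Sample "pivoted" rows: one column per quarter (hypothetical data).
wide_rows = [
    {"Customer": "Acme", "Q1": 100, "Q2": 150},
    {"Customer": "Globex", "Q1": 40, "Q2": 55},
]

# Unpivot: emit one row per (customer, quarter) pair -- columns become rows,
# the opposite of what a Pivot Transformation does.
long_rows = [
    {"Customer": r["Customer"], "Quarter": q, "Sales": r[q]}
    for r in wide_rows
    for q in ("Q1", "Q2")
]
for row in long_rows:
    print(row)
```

Each input row yields one output row per unpivoted column, so two input rows with two quarter columns produce four output rows.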



Cheers!
Uma

How to use Fuzzy Grouping in SSIS

The Fuzzy Grouping Transformation can look through a list of similar text values and group the results, using the same logic as the Fuzzy Lookup. You can use these groupings in a transformation table to clean up source and destination data, or to crunch fact tables into more meaningful results without altering the underlying data. The Fuzzy Grouping Transformation also expects an input stream of text, and it requires a connection to an OLE DB data source because it creates a set of structures in that source to use during analysis of the input stream.
Create a new SSIS project named Fuzzy Grouping Example. Drop a Data Flow Task on the
Control Flow design surface and click the Data Flow tab. Add a Flat File Connection to the Connection Manager and a Flat File Source to the Data Flow. Add a Fuzzy Grouping Transformation to the Data Flow design surface and connect the output of the Flat File Source to it.


Open the Fuzzy Grouping Editor and set the OLE DB Connection Manager to a new AdventureWorksDW connection.


In the Columns tab, select the Title column in the Available Input Columns. Accept the other defaults.


In the Advanced tab, set the Similarity threshold to 0.50. This will be your starting point for similarity comparisons.


The output at various similarity thresholds will look similar to the image below.
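To illustrate the idea behind the transformation (not its actual algorithm), here is a minimal Python sketch that groups values whose difflib similarity ratio meets a 0.50 threshold. The company names are made up.

```python
from difflib import SequenceMatcher

# Sample input values to be grouped (hypothetical data).
names = ["Acme Corp", "Acme Corporation", "ACME Corp.", "Globex", "Globex Inc"]

def similarity(a: str, b: str) -> float:
    # Case-insensitive similarity ratio between 0.0 and 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

threshold = 0.50  # same idea as the Similarity threshold on the Advanced tab
groups = []       # the first member of each group acts as the canonical value
for name in names:
    for group in groups:
        if similarity(name, group[0]) >= threshold:
            group.append(name)
            break
    else:
        groups.append([name])

print(groups)
```

Raising the threshold splits groups apart; lowering it merges more variants together, which mirrors what you see when you change the Similarity threshold in the editor.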


Cheers!
Uma

How to use Fuzzy Lookup in SSIS

The Fuzzy Lookup transformation performs data cleaning tasks such as standardizing data, correcting data, and providing missing values.
The following example shows how to use it to look up and clean bad data.
The package design:
Flat file source
Lookup Table


Flat File Source connection
Fuzzy Lookup configuration: Open the Fuzzy Lookup Transformation Editor. Set the OLE DB Connection Manager in the Reference tab to use the AdventureWorksDW database connection and the Occupation table. Set up the Columns tab by connecting the input to the reference table columns as shown in the image below, dragging the Title column to the occupationLabel column on the right. Set up the Advanced tab with a Similarity threshold of 0.50.

Output:
The output will be similar to the image below. You can compare the source value with the lookup value and the other Fuzzy Lookup related columns.
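As an illustration of the concept (not the transformation's actual algorithm), a Python sketch using difflib can match dirty source values against a reference list at a 0.50 cutoff. The occupation titles and typos below are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical reference (lookup) table of clean occupation titles.
occupations = ["Manager", "Engineer", "Technician", "Clerical"]

# Dirty source values containing typos (sample data).
dirty = ["Mangaer", "Enginere", "Tehcnician"]

matches = {}
for value in dirty:
    # Return the closest reference value with similarity >= 0.50, if any.
    found = get_close_matches(value, occupations, n=1, cutoff=0.50)
    matches[value] = found[0] if found else None

print(matches)
```

Each misspelled title resolves to its closest reference value; anything below the cutoff would map to None, analogous to a failed fuzzy match.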

Cheers!
Uma

How to use Audit component in SSIS

The Audit Transformation allows you to add auditing data to your Data Flow.
The task is easy to configure. For example, to track which package inserted data into a table, you can add the relevant audit columns to the Data Flow path with this transformation. The functionality of the Audit Transformation can be achieved with a Derived Column Transformation, but the Audit Transformation provides an easier interface.
Following are some of the available options:
  • Execution instance GUID: GUID that identifies the execution instance of the package
  • Package ID: Unique ID for the package
  • Package name: Name of the package
  • Version ID: Version GUID of the package
  • Execution start time: Time the package began
  • Machine name: Machine on which the package ran
  • User name: User who started the package
  • Task name: Name of the Data Flow Task that holds the Audit Transformation
  • Task ID: Unique identifier for the Data Flow Task that holds the Audit Transformation
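A rough Python sketch of the same idea, gathering a few of the audit values above from the environment and appending them to every row. The package name and sample rows are hypothetical stand-ins.

```python
import os
import socket
import uuid
from datetime import datetime

# Audit columns analogous to the options listed above, gathered once per run.
audit = {
    "ExecutionInstanceGUID": str(uuid.uuid4()),
    "PackageName": "LoadCustomers",  # hypothetical package name
    "ExecutionStartTime": datetime.now().isoformat(),
    "MachineName": socket.gethostname(),
    "UserName": os.environ.get("USERNAME") or os.environ.get("USER") or "unknown",
}

# Append the audit columns to every row flowing through (sample rows).
rows = [{"CustomerID": 1}, {"CustomerID": 2}]
rows = [{**row, **audit} for row in rows]
print(sorted(rows[0]))
```

Every row in the flow carries the same audit values for a given execution, which is exactly why the transformation is useful for tracing which run produced which records.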


The selected audit columns will be appended to the Data Flow.
Cheers!
Uma

How to use tFileProperties in Talend

tFileProperties obtains information about the main properties of a defined file.
A schema is a row description; it defines the fields to be processed and passed on to the next component.
The schema of this component is read-only. It describes the main properties of the specified file. You can click the [...] button next to Edit schema to view the predefined schema which contains the following fields:
  • abs_path: the absolute path of the file.
  • dirname: the directory of the file.
  • basename: the name of the file.
  • mode_string: the access mode of the file, r and w for read and write permissions respectively.
  • size: the file size in bytes.
  • mtime: the timestamp indicating when the file was last modified, in milliseconds that have elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970).
  • mtime_string: the date and time the file was last modified.
For example, a tJavaRow component can copy these fields from the input row to the output row:
output_row.abs_path = input_row.abs_path;
output_row.dirname = input_row.dirname;
output_row.basename = input_row.basename;
output_row.mode_string = input_row.mode_string;
output_row.size = input_row.size;
output_row.mtime = input_row.mtime;
output_row.mtime_string = context.Timestamp;
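Outside Talend, the same properties can be gathered with Python's standard library. This sketch inspects a throwaway temp file so it is self-contained.

```python
import os
import stat
import tempfile
import time

# Create a throwaway file to inspect (a stand-in for the "defined file").
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"hello")
    path = f.name

st = os.stat(path)
props = {
    "abs_path": os.path.abspath(path),
    "dirname": os.path.dirname(os.path.abspath(path)),
    "basename": os.path.basename(path),
    "mode_string": stat.filemode(st.st_mode),  # e.g. "-rw-------"
    "size": st.st_size,                        # size in bytes
    "mtime": int(st.st_mtime * 1000),          # ms since the Unix epoch
    "mtime_string": time.ctime(st.st_mtime),   # human-readable timestamp
}
print(props)
os.remove(path)
```

The dictionary keys deliberately mirror the read-only schema fields listed above.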

Cheers!
Uma

Monday, March 9, 2015

How to find / search objects and text and analyze dependencies in MSSQL Server

This article covers how to find or search for objects and text in a MSSQL Server database. I recently found a free tool / SSMS add-in that performs this task very effectively. Before that, I had many complex scripts to perform each task separately, and dependency analysis was not possible via T-SQL at all. There are many tools and add-ins available; in this article I am covering ApexSQL Search, a free SQL Server Management Studio add-in for searching database objects and text and for dependency analysis. Even system objects can be searched.


You can download the most recent version of ApexSQL Search from this link and try it out.
Once installed, right-click the database or any other object in Object Explorer to see the ApexSQL Search menu.
Database Text Search
Database Object Search
View Dependencies

Edit Extended Properties
Cheers!
Uma

Friday, March 6, 2015

How to use MultiFlatFile Connection Manager in SSIS

The Multiple Flat File Connection Manager gives you the same view as a Flat File Connection Manager, but it enables you to point to multiple files. You can point to a list of files by placing a vertical bar (|) between each filename.




In the Data Flow, the Multiple Flat File Connection Manager combines the records from all the files you have pointed to, so they appear as a single merged file.


Using this option initiates the Data Flow process only once for all the files, whereas a Foreach Loop Container initiates the process once per file. In either case, the metadata of each file must match in order to use them in the Data Flow.
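The combining behavior can be sketched in Python: read every file that matches a pattern and append the records into one stream. The two sample files are created on the fly so the sketch is self-contained.

```python
import csv
import glob
import os
import tempfile

# Create two sample flat files with matching metadata (same columns).
folder = tempfile.mkdtemp()
for name, rows in [("a.csv", [["1", "AA"]]), ("b.csv", [["2", "BB"]])]:
    with open(os.path.join(folder, name), "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Like the Multiple Flat File Connection Manager, read every matching
# file and combine the records into a single stream.
combined = []
for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
    with open(path, newline="") as f:
        combined.extend(csv.reader(f))

print(combined)
```

The downstream consumer sees one record stream and cannot tell where one file ended and the next began, which is exactly the "single merged file" behavior described above.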

Cheers!
Uma

How to use Bulk Insert Task in SSIS

The Bulk Insert Task enables you to insert data from a text or flat file into a SQL Server database table in the same high-octane manner as using a BULK INSERT statement or the bcp.exe command-line tool.

This example shows how to load data from a CSV file.
Source File:
Destination Table
CREATE TABLE PROSSIS_ZIPCODE (
ZipCode CHAR(5),
State CHAR(2),
ZipName VARCHAR(16)
)


Configuration
*** The Bulk Insert Task does not log error-causing rows. If you want bad records to be written to an error file or table, it’s better to use the Data Flow Task.
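The task behaves like issuing a BULK INSERT statement yourself. This Python sketch simply assembles the equivalent T-SQL as a string for the table above; the file path and the WITH options are hypothetical, so adjust them for your own file.

```python
# The Bulk Insert Task is roughly equivalent to issuing this T-SQL yourself.
# The file path and the terminator options below are hypothetical examples.
table = "PROSSIS_ZIPCODE"
source_file = r"C:\demo\zipcodes.csv"

sql = (
    f"BULK INSERT {table} "
    f"FROM '{source_file}' "
    "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 1)"
)
print(sql)
```

If your CSV has a header row, FIRSTROW = 2 would skip it; the task's editor exposes the same options graphically.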


Cheers!

Uma

How to zip and unzip / compress and uncompress files in SSIS using 7-Zip

The Execute Process Task will execute a Windows or console application inside of the Control Flow. You’ll find great uses for this task to run command-line-based programs and utilities prior to performing other ETL tasks. The most common example would have to be unzipping packed or encrypted data files with a command-line tool.


A classic example is zipping and unzipping files. This example shows how to zip and unzip files in SSIS using the 7-Zip tool.


Compress files or folders in SSIS

Executable: C:\Program Files\7-Zip\7z.exe
Arguments: a "C:\demo\doc.7z" "C:\demo\doc" (these can be constructed dynamically using variables and a Script Task)


Uncompress files or folders in SSIS


Executable: C:\Program Files\7-Zip\7z.exe
Arguments: e "C:\demo\doc.7z"
Working Directory: C:\demo\doc

You can specify a password and other options:

e @[User::FullPathtoZip] -o@[User::TargetFolder] -pPASSWORD -aOVERWRITE
Ex: e C:\demo\myfiles.zip -oC:\demo\extract\*.* -pXXXXXXXX -aoa

The overwrite options are:
-aoa Overwrite all destination files.
-aos Skip extracting files that already exist in the destination.
-aou Auto-rename the extracted file to avoid name collisions with existing files.
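As a standard-library analogue of the same round trip (Python's zipfile handles plain .zip archives rather than .7z), here is a self-contained sketch; the paths mimic the 7-Zip commands above but live in a temp folder.

```python
import os
import tempfile
import zipfile

# Set up a sample file to archive.
work = tempfile.mkdtemp()
src = os.path.join(work, "doc.txt")
with open(src, "w") as f:
    f.write("sample document")

# Compress (analogue of: 7z.exe a "C:\demo\doc.7z" "C:\demo\doc").
archive = os.path.join(work, "doc.zip")
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as z:
    z.write(src, arcname="doc.txt")

# Extract (analogue of: 7z.exe e "C:\demo\doc.7z").
out = os.path.join(work, "extract")
with zipfile.ZipFile(archive) as z:
    z.extractall(out)

print(open(os.path.join(out, "doc.txt")).read())
```

In SSIS you would instead wire the equivalent 7z.exe command line into an Execute Process Task, as shown above.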

Please refer to the following for further details:
https://www.dotnetperls.com/7-zip-examples

Cheers!
Uma

SQL Server Data Type VS SSIS Connection Manager Data Type

The following table contains some of the common SQL Server data types and what they are mapped into in a Flat File Connection Manager.
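Since the table is shown as an image, a few of the common mappings can also be sketched as a Python lookup. This list is not exhaustive, so verify against your own Connection Manager.

```python
# Common SQL Server -> SSIS (Flat File Connection Manager) type mappings.
# Not exhaustive; confirm in the Advanced tab of your own connection manager.
type_map = {
    "char/varchar": "string [DT_STR]",
    "nchar/nvarchar": "Unicode string [DT_WSTR]",
    "int": "four-byte signed integer [DT_I4]",
    "bigint": "eight-byte signed integer [DT_I8]",
    "smallint": "two-byte signed integer [DT_I2]",
    "tinyint": "single-byte unsigned integer [DT_UI1]",
    "bit": "Boolean [DT_BOOL]",
    "float": "double-precision float [DT_R8]",
    "decimal/numeric": "numeric [DT_NUMERIC]",
    "datetime": "database timestamp [DT_DBTIMESTAMP]",
}
print(type_map["int"])
```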

Cheers!
Uma

Wednesday, March 4, 2015

How to use Data Profiling Task in SSIS 2014

Data profiling is the process of examining data and collecting metadata about its quality: the frequency of statistical patterns, interdependencies, uniqueness, and redundancy.
The Data Profiling Task is located in the SSIS Toolbox.


In this example, the DimCustomer table is used for profiling. First you need to create the output file, using a File Connection as shown below.


There are two ways to activate these profiles. The first is to click the Quick Profile button on the Data Profiling Task Editor. This creates a set of profiles to run against the same table. Note that the ConnectionManager property must be set to an ADO.NET-based Connection Manager, like the one here connected to AdventureWorksDW.




You can also skip the Quick Profile option and create the profiles one by one. Either way, you can navigate to the Profile Requests list to configure each request and add regular expressions or other parameter values to the task properties.
The task provides a set of defined profile request types that can be modified like the other tasks in specific properties. The following list describes the different request types and how you can use them to profile your data:
  • Candidate Key Profile Request: This profile request examines a column or set of columns to determine the likelihood of there being a unique candidate key for the data set. Use this to determine whether you have duplicate key values or whether it is possible to build a natural key with the data.
  • Column Length Distribution Profile Request: This profile request enables you to analyze the statistical profile of all the data in a column, with the percentage of incidence for each length. You can use this to determine whether your data column length settings are correct or to look for bad data in attributes that are known to be one fixed size.
  • Column Null Ratio Profile Request: This profile request looks at the ratio of NULL values in a column. Use this to determine whether you have a data quality problem in your source system for critical data elements.
  • Column Pattern Profile Request: This profile request enables you to apply regular expressions to a string column to determine the pass/fail ratio across all the rows. Use this to evaluate business data using business formatting rules.
  • Column Statistics Profile Request: This profile request can analyze all the rows and provide statistical information about the unique values across the entire source. This can help you find low incidence values that may indicate bad data. For example, a finding of only one color type in a set of 1 million rows may indicate that you have a bad color attribute value.
  • Functional Dependency Profile Request: This is one of two profile requests that enable you to examine relationships between tables and columns to look for discrepancies within a known dependency. For example, you can use this request to find countries with incorrect currency codes.
  • Value Inclusion Profile Request: This profile request tests to determine whether the values in one column are all included in a separate lookup or dimension table. Use this to test foreign key relationships.
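A few of these profiles are simple enough to sketch in plain Python on made-up sample rows, just to show what each one computes.

```python
from collections import Counter

# Sample customer rows (hypothetical data).
rows = [
    {"CustomerKey": 1, "Title": "Mr.", "Phone": "555-0100"},
    {"CustomerKey": 2, "Title": None,  "Phone": "555-0101"},
    {"CustomerKey": 3, "Title": "Ms.", "Phone": "555-0101"},
]

# Column Null Ratio: fraction of NULL values in a column.
null_ratio = sum(r["Title"] is None for r in rows) / len(rows)

# Column Length Distribution: incidence of each value length.
lengths = Counter(len(r["Phone"]) for r in rows)

# Candidate Key: a column qualifies if all of its values are unique.
is_candidate_key = len({r["CustomerKey"] for r in rows}) == len(rows)

print(null_ratio, dict(lengths), is_candidate_key)
```

Here one of three Title values is NULL, every phone number is eight characters long, and CustomerKey has no duplicates, so it is a candidate key.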
Once you run the task, the output file is created. You can view this structured output file in a special Data Profile Viewer that provides drill-downs back to the detail level.
To access the viewer, select SQL Server Integration Services from the Start menu. Once the tool is loaded, use the Open button to browse to the output file generated by the Data Profiling Task.



Cheers!
Uma