Saturday, March 14, 2015

How to use Fuzzy Grouping in SSIS

The Fuzzy Grouping Transformation can look through a list of similar text and group the results using the same logic as the Fuzzy Lookup. You can use these groupings in a transformation table to clean up source and destination data or to crunch fact tables into more meaningful results without altering the underlying data. The Fuzzy Group Transformation also expects an input stream of text, and it requires a connection to an OLE DB Data Source because it creates in that source a set of structures to use during analysis of the input stream.
Create a new SSIS project named Fuzzy Grouping Example. Drop a Data Flow Task on the
Control Flow design surface and click the Data Flow tab. Add a Flat File Connection to the Connection Manager. Add a Fuzzy Grouping Transformation to the Data Flow design surface. Connect the output of the Flat File Source to the Fuzzy Group.


Open the Fuzzy Grouping Editor and set the OLE DB Connection Manager to a new AdventureWorksDW connection.


In the Columns tab, select the Title column in the Available Input Columns. Accept the other defaults.


In the Advanced tab, set the Similarity threshold to 0.50. This will be your starting point for similarity comparisons


The output shown at various similarity thresholds would look similar to below image.


Cheers!
Uma

1 comment: