In this blog, I would like to
share or republish a article about "Pushdown Optimization" which is a new terminology in ETL. I found that the following article gives much more clear explanation about Pushdown Optimization.
Pushdown Optimization which is a
new concept in Informatica PowerCentre, allows developers to balance data transformation
load among servers. This article describes pushdown techniques.
What is Pushdown Optimization?
Pushdown
optimization is a way of load-balancing among servers in order to achieve
optimal performance. Veteran ETL developers often come across issues when they
need to determine the appropriate place to perform ETL logic. Suppose an ETL
logic needs to filter out data based on some condition. One can either do it in
database by using WHERE condition in the SQL query or inside Informatica by
using Informatica Filter transformation.
Sometimes,
we can even "push" some transformation logic to the target database
instead of doing it in the source side (Especially in the case of EL-T rather
than ETL). Such optimization is crucial for overall ETL performance.
How does Push-Down
Optimization work?
One can push transformation logic to the source or target
database using pushdown optimization. The Integration Service translates the
transformation logic into SQL queries and sends the SQL queries to the source
or the target database which executes the SQL queries to process the
transformations. The amount of transformation logic one can push to the
database depends on the database, transformation logic, and mapping and session
configuration. The Integration Service analyzes the transformation logic it can
push to the database and executes the SQL statement generated against the
source or target tables, and it processes any transformation logic that it
cannot push to the database.
Using Pushdown
Optimization
Use the Pushdown Optimization Viewer to preview
the SQL statements and mapping logic that the Integration Service can push to
the source or target database. You can also use the Pushdown Optimization
Viewer to view the messages related to pushdown optimization.
Let us take an example:
Filter Condition used in this mapping is: DEPTNO>40
Suppose a mapping contains a Filter transformation that filters
out all employees except those with a DEPTNO greater than 40. The Integration
Service can push the transformation logic to the database. It generates the following
SQL statement to process the transformation logic:
INSERT INTO EMP_TGT(EMPNO, ENAME, SAL, COMM, DEPTNO)
SELECT
EMP_SRC.EMPNO,
EMP_SRC.ENAME,
EMP_SRC.SAL,
EMP_SRC.COMM,
EMP_SRC.DEPTNO
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO >40)
The Integration Service generates an INSERT SELECT statement and
it filters the data using a WHERE clause. The Integration Service does not
extract data from the database at this time.
We can configure pushdown optimization in the following ways:
Using source-side
pushdown optimization:
The Integration Service pushes as much transformation logic as
possible to the source database. The Integration Service analyzes the mapping
from the source to the target or until it reaches a downstream transformation
it cannot push to the source database and executes the corresponding SELECT
statement.
Using target-side
pushdown optimization:
The Integration Service pushes as much transformation logic as
possible to the target database. The Integration Service analyzes the mapping
from the target to the source or until it reaches an upstream transformation it
cannot push to the target database. It generates an INSERT, DELETE, or UPDATE
statement based on the transformation logic for each transformation it can push
to the database and executes the DML.
Using full pushdown
optimization:
The Integration Service pushes as much transformation logic as
possible to both source and target databases. If you configure a session for
full pushdown optimization, and the Integration Service cannot push all the
transformation logic to the database, it performs source-side or target-side
pushdown optimization instead. Also the source and target must be on the same
database. The Integration Service analyzes the mapping starting with the source
and analyzes each transformation in the pipeline until it analyzes the target.
When it can push all transformation logic to the database, it
generates an INSERT SELECT statement to run on the database. The statement
incorporates transformation logic from all the transformations in the mapping.
If the Integration Service can push only part of the transformation logic to
the database, it does not fail the session, it pushes as much transformation
logic to the source and target database as possible and then processes the
remaining transformation logic.
Please read the full article in the below link:
http://dwbi.org/etl/informatica/162-pushdown-optimization-in-informatica