Tuesday, September 29, 2015

Save File data to cache memory using tFileFetch in Talend

In tFileFetch, select the Use cache to save the resource check box to store the fetched file data in cache memory. This option allows you to transfer the data in streaming mode, so you don't need to provide a destination file name and location. You can then use the ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM")) variable to access the file data.
https://help.talend.com/download/attachments/9311144/Use_Case_tFileInputDelimited2_2.png?version=1&modificationDate=1355444870000&api=v2
For example, my tFileInputDelimited component uses a tFileFetch as its source, and its File name/Stream field looks like this. You can use the same approach for other file formats such as XML or JSON.
((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM"))
https://help.talend.com/download/attachments/9311144/Use_Case_tFileInputDelimited2_3.png?version=1&modificationDate=1355444871000&api=v2
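If you want to process the fetched content yourself (for example in a tJava component) instead of feeding it to a file input component, a minimal sketch could look like this; the component name tFileFetch_1 is an assumption, so match it to the label of your own component:

// Read the cached content of tFileFetch_1 line by line (illustrative only)
java.io.InputStream in = (java.io.InputStream) globalMap.get("tFileFetch_1_INPUT_STREAM");
try {
    java.io.BufferedReader reader = new java.io.BufferedReader(new java.io.InputStreamReader(in, "UTF-8"));
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line); // replace with your own processing
    }
    reader.close();
} catch (java.io.IOException e) {
    e.printStackTrace();
}

Note that the stream can only be consumed once, so use either this approach or the File name/Stream field shown above, not both.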

Cheers!
Uma

How to exclude some files in tFileList in Talend


Advanced settings: Use Exclude Filemask

Select this check box to enable the Exclude Filemask field, which excludes files based on a filemask (file type):
Exclude Filemask: Fill in the field with the file types to be excluded from the Filemasks set in the Basic settings view (file types in this field should be quoted with double quotation marks and separated by commas).
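For example, to pick up every CSV file in a directory except temporary and backup copies, the two fields could be set as follows (the masks are only illustrative):
Filemask: "*.csv"
Exclude Filemask: "*_backup.csv","*.tmp"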

Cheers!
Uma

Friday, September 25, 2015

Oracle and Datanomic

Datanomic is one of the older ETL and data quality tools. Oracle has acquired Datanomic, a leading provider of customer data quality software and related applications for risk and compliance screening. Datanomic technology combined with Oracle Product Data Quality is expected to deliver the most complete data quality solution to reduce the cost and complexity of managing data across our customers' businesses.
The following steps show how to create a Datanomic job:
Create staged data for the source.
Create processes; in Datanomic, most of the processing is done with JavaScript. You can use the help link to look up the available commands.
Create jobs using the tasks. If you want to save the data produced from staging, you need to export it.
Cheers! Uma

Thursday, September 24, 2015

How to Pass context values from a parent Job to a child Job in Talend

In the parent Job, set the context values and make sure you have checked "Transmit whole context" on the tRunJob component.
Select this check box to get all the context variables from the parent Job. Deselect it to get all the context variables from the child Job.
If this check box is selected when the parent and child Jobs have the same context variables defined:
  •  Variable values for the parent Job will be used during the child Job execution if no relevant values are defined in the Context Param table.
  • Otherwise, values defined in the Context Param table will be used during the child Job execution.
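As a quick check, you can print a context variable from a tJava component inside the child Job. This is only a sketch: it assumes the child Job defines a String context variable named fileName (the name is just an example) and that the parent Job passes a value for it.

// tJava in the child Job - prints the value received from the parent Job
System.out.println("fileName received from parent: " + context.fileName);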

Cheers!
Uma

Monday, September 21, 2015

What is Pushdown Optimization?

In this blog post, I would like to share (republish) an article about "Pushdown Optimization", which is a fairly new term in ETL. I found that the following article gives a much clearer explanation of pushdown optimization.

Pushdown Optimization, which is a new concept in Informatica PowerCenter, allows developers to balance the data transformation load among servers. This article describes pushdown techniques.

What is Pushdown Optimization?
Pushdown optimization is a way of load-balancing among servers in order to achieve optimal performance. Veteran ETL developers often come across issues when they need to determine the appropriate place to perform ETL logic. Suppose some ETL logic needs to filter out data based on a condition. One can either do it in the database by using a WHERE condition in the SQL query, or inside Informatica by using the Filter transformation.
Sometimes we can even "push" some transformation logic to the target database instead of doing it on the source side (especially in the case of ELT rather than ETL). Such optimization is crucial for overall ETL performance.
How does Push-Down Optimization work?
One can push transformation logic to the source or target database using pushdown optimization. The Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the source or the target database which executes the SQL queries to process the transformations. The amount of transformation logic one can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service analyzes the transformation logic it can push to the database and executes the SQL statement generated against the source or target tables, and it processes any transformation logic that it cannot push to the database.
Using Pushdown Optimization
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.
Let us take an example. Suppose a mapping contains a Filter transformation that filters out all employees except those with a DEPTNO greater than 40; the filter condition used in this mapping is DEPTNO > 40. The Integration Service can push this transformation logic to the database, and it generates the following SQL statement to process it:
INSERT INTO EMP_TGT(EMPNO, ENAME, SAL, COMM, DEPTNO)     
SELECT
EMP_SRC.EMPNO,
EMP_SRC.ENAME,
EMP_SRC.SAL,
EMP_SRC.COMM,
EMP_SRC.DEPTNO
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO >40)
The Integration Service generates an INSERT SELECT statement and it filters the data using a WHERE clause. The Integration Service does not extract data from the database at this time.
We can configure pushdown optimization in the following ways:
Using source-side pushdown optimization:
The Integration Service pushes as much transformation logic as possible to the source database. The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database and executes the corresponding SELECT statement.
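Using the Filter example above, source-side pushdown would typically push only the SELECT to the source database, along the lines of the sketch below; the Integration Service then reads the filtered rows and processes the rest of the mapping, including loading EMP_TGT, itself (the generated SQL is an approximation, not the exact statement):
SELECT EMP_SRC.EMPNO, EMP_SRC.ENAME, EMP_SRC.SAL, EMP_SRC.COMM, EMP_SRC.DEPTNO
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO > 40)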
Using target-side pushdown optimization:
The Integration Service pushes as much transformation logic as possible to the target database. The Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the database and executes the DML.
Using full pushdown optimization:
The Integration Service pushes as much transformation logic as possible to both source and target databases. If you configure a session for full pushdown optimization, and the Integration Service cannot push all the transformation logic to the database, it performs source-side or target-side pushdown optimization instead. Also the source and target must be on the same database. The Integration Service analyzes the mapping starting with the source and analyzes each transformation in the pipeline until it analyzes the target.
When it can push all transformation logic to the database, it generates an INSERT SELECT statement to run on the database. The statement incorporates transformation logic from all the transformations in the mapping. If the Integration Service can push only part of the transformation logic to the database, it does not fail the session, it pushes as much transformation logic to the source and target database as possible and then processes the remaining transformation logic.
Please read the full article at the link below:

http://dwbi.org/etl/informatica/162-pushdown-optimization-in-informatica

Friday, September 18, 2015

How to roll back a query in Oracle SQL Developer

You don't need to run SET AUTOCOMMIT OFF or START TRANSACTION in SQL Developer. After running your script, just run a ROLLBACK or COMMIT command; this works for INSERT, DELETE, and UPDATE statements, because SQL Developer does not auto-commit by default.

If you give 
rollback;
Then the whole transaction is rolled back.

If you give 
commit;
Then the whole transaction is committed and all savepoints are removed.

If you give,
rollback to a; 
Then the transaction is rolled back to the specified savepoint (a).
  
Example
insert into emp (empno,ename,sal) values (109,'Sami',3000);
savepoint a;
insert into dept values (10,'Sales','Hyd');
savepoint b;
insert into salgrade values ('III',9000,12000);
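Continuing the example above, rolling back to savepoint a undoes the dept and salgrade inserts while keeping the emp insert pending (it still needs a commit to become permanent):
rollback to a;
-- the inserts into dept and salgrade are undone;
-- the insert into emp remains uncommitted until you commit or roll back fully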

Cheers!

Uma

Thursday, September 17, 2015

How to download multiple files at the same time (in parallel) using tFileFetch in Talend

You can download files in parallel using the tFlowToIterate component. Click the Iterate link between tFlowToIterate and tFileFetch, select the Enable parallel execution check box, and set the required number of parallel executions. The number of parallel executions should be decided based on your environment, such as available internet bandwidth.


Cheers!
Uma

Wednesday, September 9, 2015

Types of Fact Tables

In this blog post, I write about the types of fact tables with brief explanations.
Additive – facts that can be summed across all of the dimensions in the fact table.
Example for Additive:
FactSales
DateKey
StoreKey
ProductKey
SalesAmount


Semi-Additive – facts that can be summed across some of the dimensions in the fact table, but not others.
Example for Semi-Additive: this fact table holds the current balance of each account at the end of the day, so CurrentBalance cannot be summed across DateKey, but summing it across AccountKey for a single date is meaningful.
FactAccountBalance
DateKey
AccountKey
CurrentBalance
ProfitMargin


Non-Additive – facts that cannot be summed across any of the dimensions in the fact table.
For example, in the fact table below ProfitMargin is non-additive; it doesn't make sense to add it up across AccountKey or DateKey.
FactAccountBalance
DateKey
AccountKey
CurrentBalance
ProfitMargin
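To make the difference concrete, here is a small SQL sketch against the FactAccountBalance table above (the DateKey value is only an illustration): summing CurrentBalance across accounts for a single day is meaningful, while summing it across days, or summing ProfitMargin at all, is not.
-- Meaningful: total balance across all accounts for one day
SELECT DateKey, SUM(CurrentBalance) AS TotalBalance
FROM FactAccountBalance
WHERE DateKey = 20150929
GROUP BY DateKey;
-- Not meaningful: SUM(CurrentBalance) per account across many days double-counts money;
-- for ProfitMargin, use AVG or a weighted calculation instead of SUM.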


Another way of categorizing data warehouse / fact table types
Cumulative Data Warehouse: This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product, by store, by day. The facts in this type of fact table are mostly additive.
FactSales
DateKey
StoreKey
ProductKey
SalesAmount


Snapshot Data Warehouse: This type of fact table describes the state of things at a particular instant in time and usually includes more semi-additive and non-additive facts. It is mostly used in banking and insurance data warehouse designs.
FactAccountBalance
DateKey
AccountKey
CurrentBalance
ProfitMargin

Cheers!
Uma