Wednesday, 27 April 2016

Is it worth putting a Sorter in front of an Aggregator so that the latter always works on sorted input?

In short... yes, to my great surprise. My personal argumentation against it was (sadly wrong):
  • Number of sort operations: One sort takes place in any case, either in the AGG or in the SRT (for the case that the sorted-input option of the AGG is switched on). -> no advantage for the SRT-AGG combination
  • Data flow: An AGG can forward data groups one by one only if it receives pre-sorted data and the corresponding switch is active (SRT-AGG combination); otherwise it forwards all data at the same time, after having processed the last incoming record (AGG only). The latter behavior also holds for the SRT, so downstream processing waits until the last record has passed such an element. -> no advantage for the SRT-AGG combination
  • The SRT is an additional element, and data needs to be transported to and from it. -> disadvantage for the SRT-AGG combination if the order of the data does not get used downstream again
However, go for the Oracle sorting if you can. I do not know whether other databases sort as efficiently.
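The data-flow point can be illustrated outside PowerCenter. Here is a minimal Python sketch (my own illustration, not PowerCenter code): with sorted input an aggregator can emit each group as soon as the key changes, while with unsorted input it must buffer everything until the last record has arrived.

```python
from itertools import groupby
from operator import itemgetter

records = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

# Unsorted input: all groups stay open until the very last record,
# so nothing can be forwarded downstream before the end.
totals = {}
for key, value in records:
    totals[key] = totals.get(key, 0) + value

# Sorted input: a group is complete (and can be forwarded immediately)
# as soon as the key changes.
def streamed_totals(sorted_records):
    for key, group in groupby(sorted_records, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

streamed = dict(streamed_totals(sorted(records)))
assert streamed == totals  # same result, but groups become available one by one
```

Same aggregation result either way; the difference is only when the downstream can start consuming.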

Setup

Test case code

A) Pure aggregator



B) Sorter in front of aggregator



C) As A but the source qualifier delegates sorting to the database


Results



Interpretation

  • Wow, Oracle sorts way faster than PowerCenter!
  • For the first couple of thousands of rows the pure aggregator was faster, but it was clearly beaten by the sorter solution even with only 20'000 records.
  • I have not the faintest idea why Informatica does not use the Sorter's sorting mechanism in the Aggregator as well, given that it is so much faster. I guess if your application has an aggregator crunching only a couple of thousands of records, you do not care about the aggregator speed anyway.

Efficient change detection

What is the efficiency of the different methods?

Setup

Test case code

A) Direct comparison between the payload attribute pairs

DECODE(
TRUE,
ISNULL(PK_THIEMO_MANY_STRINGS) AND NOT(ISNULL(PK_THIEMO_MANY_STRINGS1)), 'New',
NOT(ISNULL(PK_THIEMO_MANY_STRINGS)) AND ISNULL(PK_THIEMO_MANY_STRINGS1), 'Deleted',
-- when joining, it is not possible for both join attribute tuples to be NULL if at least one attribute of them is not nullable
STRING_1 != STRING_110 OR
STRING_2 != STRING_21 OR
STRING_3 != STRING_31 OR
STRING_4 != STRING_41 OR
STRING_5 != STRING_51 OR
STRING_6 != STRING_61 OR
STRING_7 != STRING_71 OR
STRING_8 != STRING_81 OR
STRING_9 != STRING_91 OR
STRING_10 != STRING_101 OR
STRING_11 != STRING_111 OR
STRING_12 != STRING_121 OR
STRING_13 != STRING_131 OR
STRING_14 != STRING_141 OR
STRING_15 != STRING_151 OR
STRING_16 != STRING_161 OR
STRING_17 != STRING_171 OR
STRING_18 != STRING_181 OR
STRING_19 != STRING_191 OR
STRING_20 != STRING_201, 'Changed',
'Unchanged'
)

B) As A, but first concatenate the payload attributes with ; as delimiter and compare those strings

v_compare_string_1 := STRING_1 || ';' ||
STRING_2 || ';' ||
STRING_3 || ';' ||
STRING_4 || ';' ||
STRING_5 || ';' ||
STRING_6 || ';' ||
STRING_7 || ';' ||
STRING_8 || ';' ||
STRING_9 || ';' ||
STRING_10 || ';' ||
STRING_11 || ';' ||
STRING_12 || ';' ||
STRING_13 || ';' ||
STRING_14 || ';' ||
STRING_15 || ';' ||
STRING_16 || ';' ||
STRING_17 || ';' ||
STRING_18 || ';' ||
STRING_19 || ';' ||
STRING_20


v_compare_string_2 := STRING_110 || ';' ||
STRING_21 || ';' ||
STRING_31 || ';' ||
STRING_41 || ';' ||
STRING_51 || ';' ||
STRING_61 || ';' ||
STRING_71 || ';' ||
STRING_81 || ';' ||
STRING_91 || ';' ||
STRING_101 || ';' ||
STRING_111 || ';' ||
STRING_121 || ';' ||
STRING_131 || ';' ||
STRING_141 || ';' ||
STRING_151 || ';' ||
STRING_161 || ';' ||
STRING_171 || ';' ||
STRING_181 || ';' ||
STRING_191 || ';' ||
STRING_201


DECODE(
TRUE,
ISNULL(PK_THIEMO_MANY_STRINGS) AND NOT(ISNULL(PK_THIEMO_MANY_STRINGS1)), 'New',
NOT(ISNULL(PK_THIEMO_MANY_STRINGS)) AND ISNULL(PK_THIEMO_MANY_STRINGS1), 'Deleted',
-- when joining, it is not possible for both join attribute tuples to be NULL if at least one attribute of them is not nullable
v_compare_string_1 != v_compare_string_2, 'Changed',
'Unchanged'
)


C) As B, but calculate the MD5 hash of the strings and compare those values instead

v_compare_string_1 := MD5(
STRING_1 || ';' ||
STRING_2 || ';' ||
STRING_3 || ';' ||
STRING_4 || ';' ||
STRING_5 || ';' ||
STRING_6 || ';' ||
STRING_7 || ';' ||
STRING_8 || ';' ||
STRING_9 || ';' ||
STRING_10 || ';' ||
STRING_11 || ';' ||
STRING_12 || ';' ||
STRING_13 || ';' ||
STRING_14 || ';' ||
STRING_15 || ';' ||
STRING_16 || ';' ||
STRING_17 || ';' ||
STRING_18 || ';' ||
STRING_19 || ';' ||
STRING_20
)
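The same idea in a Python sketch (my own illustration, not the PowerCenter MD5 function): hash the delimiter-joined payload once per side and compare the two digests instead of twenty string pairs. Note the implicit NULL handling, NULL becomes the empty string, just as concatenation does in the expression above.

```python
import hashlib

def payload_digest(strings, delimiter=";"):
    """MD5 digest of the delimiter-joined payload; NULLs become empty strings."""
    joined = delimiter.join("" if s is None else s for s in strings)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

old_row = ["a", "b", None]
new_row = ["a", "b", "c"]

# One digest comparison replaces the pairwise attribute comparisons.
changed = payload_digest(old_row) != payload_digest(new_row)
```

Remember that any hash comparison trades exactness for speed: two different payloads can in principle collide.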


D) As B but use chr(31) as delimiter instead of ;

v_compare_string_1 := STRING_1 || chr(31) ||
STRING_2 || chr(31) ||
STRING_3 || chr(31) ||
STRING_4 || chr(31) ||
STRING_5 || chr(31) ||
STRING_6 || chr(31) ||
STRING_7 || chr(31) ||
STRING_8 || chr(31) ||
STRING_9 || chr(31) ||
STRING_10 || chr(31) ||
STRING_11 || chr(31) ||
STRING_12 || chr(31) ||
STRING_13 || chr(31) ||
STRING_14 || chr(31) ||
STRING_15 || chr(31) ||
STRING_16 || chr(31) ||
STRING_17 || chr(31) ||
STRING_18 || chr(31) ||
STRING_19 || chr(31) ||
STRING_20




E) As D, but have the delimiter provided via a mapping parameter and the parameter file


v_compare_string_1 :=
STRING_1 || $$delimiter ||
STRING_2 || $$delimiter ||
STRING_3 || $$delimiter ||
STRING_4 || $$delimiter ||
STRING_5 || $$delimiter ||
STRING_6 || $$delimiter ||
STRING_7 || $$delimiter ||
STRING_8 || $$delimiter ||
STRING_9 || $$delimiter ||
STRING_10 || $$delimiter ||
STRING_11 || $$delimiter ||
STRING_12 || $$delimiter ||
STRING_13 || $$delimiter ||
STRING_14 || $$delimiter ||
STRING_15 || $$delimiter ||
STRING_16 || $$delimiter ||
STRING_17 || $$delimiter ||
STRING_18 || $$delimiter ||
STRING_19 || $$delimiter ||
STRING_20



F) First detect new/deleted/others, and afterwards detect, among the others only, the ones with changes, the same way as in E

DECODE(
TRUE,
ISNULL(PK_THIEMO_MANY_STRINGS) AND NOT(ISNULL(PK_THIEMO_MANY_STRINGS1)), 'New',
NOT(ISNULL(PK_THIEMO_MANY_STRINGS)) AND ISNULL(PK_THIEMO_MANY_STRINGS1), 'Deleted',
-- when joining, it is not possible for both join attribute tuples to be NULL if at least one attribute of them is not nullable
'Changed?'
)

IIF(
STRING_1 || $$delimiter ||
STRING_2 || $$delimiter ||
STRING_3 || $$delimiter ||
STRING_4 || $$delimiter ||
STRING_5 || $$delimiter ||
STRING_6 || $$delimiter ||
STRING_7 || $$delimiter ||
STRING_8 || $$delimiter ||
STRING_9 || $$delimiter ||
STRING_10 || $$delimiter ||
STRING_11 || $$delimiter ||
STRING_12 || $$delimiter ||
STRING_13 || $$delimiter ||
STRING_14 || $$delimiter ||
STRING_15 || $$delimiter ||
STRING_16 || $$delimiter ||
STRING_17 || $$delimiter ||
STRING_18 || $$delimiter ||
STRING_19 || $$delimiter ||
STRING_20
!=
STRING_110 || $$delimiter ||
STRING_21 || $$delimiter ||
STRING_31 || $$delimiter ||
STRING_41 || $$delimiter ||
STRING_51 || $$delimiter ||
STRING_61 || $$delimiter ||
STRING_71 || $$delimiter ||
STRING_81 || $$delimiter ||
STRING_91 || $$delimiter ||
STRING_101 || $$delimiter ||
STRING_111 || $$delimiter ||
STRING_121 || $$delimiter ||
STRING_131 || $$delimiter ||
STRING_141 || $$delimiter ||
STRING_151 || $$delimiter ||
STRING_161 || $$delimiter ||
STRING_171 || $$delimiter ||
STRING_181 || $$delimiter ||
STRING_191 || $$delimiter ||
STRING_201
, 'Changed'
, 'Unchanged'
)
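Outside PowerCenter, the whole New/Deleted/Changed/Unchanged classification after the full outer join can be sketched in a few lines of Python (my own illustration; the dictionaries stand in for the joined incoming and existing record sets, and which side counts as "new" is my assumption):

```python
def classify(incoming, existing):
    """incoming and existing map primary key -> tuple of payload strings."""
    result = {}
    for key in incoming.keys() | existing.keys():
        if key not in existing:
            result[key] = "New"        # only on the incoming side of the join
        elif key not in incoming:
            result[key] = "Deleted"    # only on the existing side of the join
        elif incoming[key] != existing[key]:
            result[key] = "Changed"
        else:
            result[key] = "Unchanged"
    return result

incoming = {1: ("a", "b"), 2: ("x", "y"), 4: ("n", "n")}
existing = {1: ("a", "b"), 2: ("x", "z"), 3: ("m", "m")}
flags = classify(incoming, existing)
```

Variant F above does the same thing in two stages: the key-based New/Deleted split first, then the payload comparison for the survivors only.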

Test set 1

2'000'000 records compared to 2'000'000 records. The second set contains about 1'500'000 changes in the payload, and for 50 records the primary key was changed, such that 50 are detected as new and 50 as deleted.

Test set 2

2'000'000 records compared to 2'000'000 records. The second set contains about 50 changes in the payload, and for 1'500'000 records the primary key was changed, such that 1'500'000 are detected as new and 1'500'000 as deleted.

Results

Interpretation

  • The first run takes more time, probably because the table was not yet in the Oracle cache.
  • The direct comparison (OR-ing the comparisons of the single attribute pairs) seems to cost a lot. On top of that, the problem of NULL handling is not taken care of there and would cost even more time! The concatenation solutions do an implicit NULL handling by converting NULL to the empty string.
  • Using chr(31) hard-coded is not a problem, possibly because at instantiation time it is already replaced by its value, like a constant. Trying to force PowerCenter to do this by using a parameter does not help; on the contrary, it somehow slows things down.
  • I cannot imagine why the use of ; instead of chr(31) would be less efficient. And since it is not, there is no justification for the increased danger of finding false matches, given that string port values can easily contain ; themselves.
  • Encapsulating the comparison in the MD5 hash function costs quite a bit of time. If one does not need the hash value afterwards, I cannot see the point of using it for a comparison. Every hashing implies a loss of information and therewith the danger of falsely found matches (collisions). If the hash were stored along with the actual records for later use in the comparison, I presume it would cut the execution time loss only by half. I would not run the risk of collisions, however.
  • The overhead of shifting data through more transformations for the changed case only pays off with really few new/deleted records compared to the number of changed/unchanged ones. In the test setup the ratio was 0.00005. Real world? Rather not!
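The false-match danger with ; can be demonstrated in a few lines (Python, my own illustration): two different payload tuples concatenate to the same string when the delimiter also occurs in the values, while chr(31), the ASCII unit separator, practically never appears in business data.

```python
US = chr(31)  # ASCII unit separator, practically never part of business data

row_a = ("a;b", "c")
row_b = ("a", "b;c")

# With ';' as delimiter the two different rows concatenate to the same
# string -> a false 'Unchanged' in the comparison.
semicolon_collision = ";".join(row_a) == ";".join(row_b)

# With chr(31) the concatenations stay distinct.
us_collision = US.join(row_a) == US.join(row_b)
```

Both joined strings read "a;b;c" in the first case, so the comparison cannot tell the rows apart.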

Thursday, 21 April 2016

Cross-grained error 7023 ... error constructing sql statement for target table

Cross-grained as I am, I believe in well-formatted code, so I put formatted SQL into the update override field of a target.
As long as I did not remove the line breaks, the error persisted. The Informatica chaps seem to be fond of spaghetti code.

Inexistent session parameter makes workflow fail

I just had a workflow fail because a session parameter was not given any value at run time. I figured out that this parameter was used in the error-logging part of the session.
I switched off logging to the DB, both in the session and in the default session config.
However, the error persisted.


I exported the workflow from the Repository Manager and checked the resulting XML for the string of the offending parameter, but it was not there!


Solution: In the session, I finally changed the DB connection to an existing DB connection and switched DB logging off after that.


Remark: Maybe re-importing the XML would do the trick too.