Setup
Test case codeA) Direct comparison between the payload attribute pairs
DECODE(TRUE,
ISNULL(PK_THIEMO_MANY_STRINGS) AND NOT(ISNULL(PK_THIEMO_MANY_STRINGS1)), 'New',
NOT(ISNULL(PK_THIEMO_MANY_STRINGS)) AND ISNULL(PK_THIEMO_MANY_STRINGS1), 'Deleted',
-- it is not possible in joining to get both join attribute tuple being null if at least one attribute out of them is not nullable
STRING_1 != STRING_110 OR
STRING_2 != STRING_21 OR
STRING_3 != STRING_31 OR
STRING_4 != STRING_41 OR
STRING_5 != STRING_51 OR
STRING_6 != STRING_61 OR
STRING_7 != STRING_71 OR
STRING_8 != STRING_81 OR
STRING_9 != STRING_91 OR
STRING_10 != STRING_101 OR
STRING_11 != STRING_111 OR
STRING_12 != STRING_121 OR
STRING_13 != STRING_131 OR
STRING_14 != STRING_141 OR
STRING_15 != STRING_151 OR
STRING_16 != STRING_161 OR
STRING_17 != STRING_171 OR
STRING_18 != STRING_181 OR
STRING_19 != STRING_191 OR
STRING_20 != STRING_201, 'Changed',
'Unchanged'
)
B) As A but concatenate the payload attributes with ; as delimiter first and compare those strings
v_compare_string_1 := STRING_1 || ';' ||STRING_2 || ';' ||
STRING_3 || ';' ||
STRING_4 || ';' ||
STRING_5 || ';' ||
STRING_6 || ';' ||
STRING_7 || ';' ||
STRING_8 || ';' ||
STRING_9 || ';' ||
STRING_10 || ';' ||
STRING_11 || ';' ||
STRING_12 || ';' ||
STRING_13 || ';' ||
STRING_14 || ';' ||
STRING_15 || ';' ||
STRING_16 || ';' ||
STRING_17 || ';' ||
STRING_18 || ';' ||
STRING_19 || ';' ||
STRING_20
v_compare_string_2 := STRING_110 || ';' ||
STRING_21 || ';' ||
STRING_31 || ';' ||
STRING_41 || ';' ||
STRING_51 || ';' ||
STRING_61 || ';' ||
STRING_71 || ';' ||
STRING_81 || ';' ||
STRING_91 || ';' ||
STRING_101 || ';' ||
STRING_111 || ';' ||
STRING_121 || ';' ||
STRING_131 || ';' ||
STRING_141 || ';' ||
STRING_151 || ';' ||
STRING_161 || ';' ||
STRING_171 || ';' ||
STRING_181 || ';' ||
STRING_191 || ';' ||
STRING_201
DECODE(
TRUE,
ISNULL(PK_THIEMO_MANY_STRINGS) AND NOT(ISNULL(PK_THIEMO_MANY_STRINGS1)), 'New',
NOT(ISNULL(PK_THIEMO_MANY_STRINGS)) AND ISNULL(PK_THIEMO_MANY_STRINGS1), 'Deleted',
-- it is not possible in joining to get both join attribute tuple being null if at least one attribute out of them is not nullable
v_compare_string_1 != v_compare_string_2, 'Changed',
'Unchanged'
)
C) As B but calculate the md5 hash of the strings and compare these values instead
v_compare_string_1 := MD5(STRING_1 || ';' ||
STRING_2 || ';' ||
STRING_3 || ';' ||
STRING_4 || ';' ||
STRING_5 || ';' ||
STRING_6 || ';' ||
STRING_7 || ';' ||
STRING_8 || ';' ||
STRING_9 || ';' ||
STRING_10 || ';' ||
STRING_11 || ';' ||
STRING_12 || ';' ||
STRING_13 || ';' ||
STRING_14 || ';' ||
STRING_15 || ';' ||
STRING_16 || ';' ||
STRING_17 || ';' ||
STRING_18 || ';' ||
STRING_19 || ';' ||
STRING_20
)
D) As B but use chr(31) as delimiter instead of ;
v_compare_string_1 := STRING_1 || chr(31) ||STRING_2 || chr(31) ||
STRING_3 || chr(31) ||
STRING_4 || chr(31) ||
STRING_5 || chr(31) ||
STRING_6 || chr(31) ||
STRING_7 || chr(31) ||
STRING_8 || chr(31) ||
STRING_9 || chr(31) ||
STRING_10 || chr(31) ||
STRING_11 || chr(31) ||
STRING_12 || chr(31) ||
STRING_13 || chr(31) ||
STRING_14 || chr(31) ||
STRING_15 || chr(31) ||
STRING_16 || chr(31) ||
STRING_17 || chr(31) ||
STRING_18 || chr(31) ||
STRING_19 || chr(31) ||
STRING_20
E) As D but make the delimiter be provided over mapping parameter and parameter file
v_compare_string_1 :=
STRING_1 || $$delimiter ||
STRING_2 || $$delimiter ||
STRING_3 || $$delimiter ||
STRING_4 || $$delimiter ||
STRING_5 || $$delimiter ||
STRING_6 || $$delimiter ||
STRING_7 || $$delimiter ||
STRING_8 || $$delimiter ||
STRING_9 || $$delimiter ||
STRING_10 || $$delimiter ||
STRING_11 || $$delimiter ||
STRING_12 || $$delimiter ||
STRING_13 || $$delimiter ||
STRING_14 || $$delimiter ||
STRING_15 || $$delimiter ||
STRING_16 || $$delimiter ||
STRING_17 || $$delimiter ||
STRING_18 || $$delimiter ||
STRING_19 || $$delimiter ||
STRING_20
F) First detect new/old/others and afterwards detect from the others only the ones with changes as it is done in E
DECODE(TRUE,
ISNULL(PK_THIEMO_MANY_STRINGS) AND NOT(ISNULL(PK_THIEMO_MANY_STRINGS1)), 'New',
NOT(ISNULL(PK_THIEMO_MANY_STRINGS)) AND ISNULL(PK_THIEMO_MANY_STRINGS1), 'Deleted',
-- it is not possible in joining to get both join attribute tuple being null if at least one attribute out of them is not nullable
'Changed?'
)
IIF(
STRING_1 || $$delimiter ||
STRING_2 || $$delimiter ||
STRING_3 || $$delimiter ||
STRING_4 || $$delimiter ||
STRING_5 || $$delimiter ||
STRING_6 || $$delimiter ||
STRING_7 || $$delimiter ||
STRING_8 || $$delimiter ||
STRING_9 || $$delimiter ||
STRING_10 || $$delimiter ||
STRING_11 || $$delimiter ||
STRING_12 || $$delimiter ||
STRING_13 || $$delimiter ||
STRING_14 || $$delimiter ||
STRING_15 || $$delimiter ||
STRING_16 || $$delimiter ||
STRING_17 || $$delimiter ||
STRING_18 || $$delimiter ||
STRING_19 || $$delimiter ||
STRING_20
!=
STRING_110 || $$delimiter ||
STRING_21 || $$delimiter ||
STRING_31 || $$delimiter ||
STRING_41 || $$delimiter ||
STRING_51 || $$delimiter ||
STRING_61 || $$delimiter ||
STRING_71 || $$delimiter ||
STRING_81 || $$delimiter ||
STRING_91 || $$delimiter ||
STRING_101 || $$delimiter ||
STRING_111 || $$delimiter ||
STRING_121 || $$delimiter ||
STRING_131 || $$delimiter ||
STRING_141 || $$delimiter ||
STRING_151 || $$delimiter ||
STRING_161 || $$delimiter ||
STRING_171 || $$delimiter ||
STRING_181 || $$delimiter ||
STRING_191 || $$delimiter ||
STRING_201
, 'Changed'
, 'Unchanged'
)
Test set 1
2'000'000 records compared to 2'000'000 records. The second set contains about 1'500'000 changes in the payload and of 50 the primary key was changed such there are 50 detected as new and 50 as old.Test set 2
2'000'000 records compared to 2'000'000 records. The second set contains about 50 changes in the payload and of 1'500'000 the primary key was changed such there are 1'500'000 detected as new and 1'500'000 as old.Results
Interpretation
- First run takes more time probably because the table was not in the Oracle cache yet
- The direct comparison (or-ing the comparison of the single attribute pairs) seems to cost a lot. There problem of NULL handling is not taken care of and would cost even more time! The concatation solutions do an implicit NULL handling by converting it to the empty string.
- Using chr(31) hard coded is not a problem. Possibly because on instantiation time it already is replaced by its value like a constant. To try to force PoC do it with the usage of a parameter does not help. On the contrary, somehow it slows it down.
- I cannot imagine why the use of ; instead of chr(31) is less efficient. And having said that there is not justification for increased danger of finding false matches if string port lists can easily contain ; in its values.
- The use of the md5 hash function to encapsulate costs quite a bit of time. If one does not need the value afterwards I cannot see a point using it for a comparison. Every hashing implies a loss of information and therewith the danger of falsly found matches (collision). If the hash would be stored along the actual records for the later use in the comparison, I presume, would cut the execution time loss only by half. I would not run the risk of collision however.
- The overhead of shifting data through more transformations for the change case is only made up with really few new/old records compared to the number of changes/no-changes. In the test setup the ratio was 0.00005. Real world? Rather not!
Keine Kommentare:
Kommentar veröffentlichen