
How do you identify long-running jobs in Informatica?

Most likely this question will strike your mind if you have ever supported a LARGE pool of Informatica jobs. If you are only asked to watch the jobs and report errors, you are blessed, provided your sessions are properly set up to handle/report errors and your command tasks are correctly configured to catch errors from shell/batch scripts.

But that will not be your only job while you support ETL jobs - you also need to make sure you don't miss the SLAs. So you may need to keep a close eye on EVERY task (sessions, command tasks, etc.) to be sure that nothing is stuck or running more slowly than usual, and you need to identify such jobs without losing much time.

I recently wrote a simple SQL query for Oracle 10g. It pulls the historical data of ALL tasks from the repository tables, calculates a benchmark value for each task, and compares that benchmark with the currently running instance to flag whether the running task has crossed it.


SELECT b.subject_area sa, b.workflow_name wf, b.instance_name sn,
       CEIL (a.expected_time) ed, CEIL (b.elapsed) et
  FROM
       -- Inline view A: benchmark per task = median duration (in minutes)
       -- of its last 10 successful runs
       (SELECT subject_id, workflow_id, task_id, instance_id, instance_name,
               MEDIAN (end_time - start_time) * 24 * 60 expected_time
          FROM (SELECT subject_id, workflow_id, task_id, instance_id,
                       instance_name, start_time, end_time,
                       RANK () OVER (PARTITION BY subject_id, workflow_id,
                                                  task_id, instance_id
                                     ORDER BY start_time DESC) older_runs
                  FROM rep_task_inst_run
                 WHERE task_type IN (58, 60, 68)
                   AND run_err_code = 0
                   AND run_status_code = 1)       -- succeeded
         WHERE older_runs <= 10
         GROUP BY subject_id, workflow_id, task_id, instance_id,
                  instance_name) a,
       -- Inline view B: tasks running right now, with elapsed minutes
       (SELECT subject_area, subject_id, workflow_name, workflow_id,
               task_id, instance_id, instance_name,
               (SYSDATE - start_time) * 24 * 60 elapsed
          FROM rep_task_inst_run
         WHERE task_type IN (58, 60, 68)
           AND run_err_code = 0
           AND run_status_code = 6) b             -- running
 WHERE a.subject_id = b.subject_id
   AND a.workflow_id = b.workflow_id
   AND a.task_id = b.task_id
   AND a.instance_id = b.instance_id
   -- Flag the task when its elapsed time exceeds the benchmark by a
   -- band that widens as the benchmark shrinks
   AND b.elapsed > a.expected_time
                   * (CASE
                         WHEN a.expected_time < 5 THEN 4
                         WHEN a.expected_time < 10 THEN 3
                         WHEN a.expected_time < 20 THEN 1.5
                         ELSE 1.3
                      END)
 ORDER BY b.subject_area, b.workflow_name;

Here are the details:
  1. Only session tasks, command tasks, and wait tasks are included (the task_type IN (58, 60, 68) filter).
  2. The median duration of the last ten successful runs is the benchmark, which is the fundamental idea behind this process.
  3. You could use the median directly as the benchmark, or add some headroom to it to arrive at a benchmark value. I chose the latter: in my query, a task that runs more than 30% over its median is considered long running. I then went further and created a few bands instead of a flat 30%, for the simple reason that for a task which usually runs for just 1 minute, say, a 30% allowance makes no sense. So pick your percentage based on the median value - the smaller the median, the larger the allowance you need to give.
CASE WHEN A.EXPECTED_TIME < 5 THEN 4
WHEN A.EXPECTED_TIME < 10 THEN 3
WHEN A.EXPECTED_TIME < 20 THEN 1.5
ELSE 1.3
END
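To make the banding concrete, here is a small sketch of the same logic in Python (illustrative only - the function and variable names are mine, not from the Informatica repository): the benchmark is the median of the recent successful run durations, and the multiplier grows as the median shrinks.

```python
# Sketch of the benchmark logic from the query above. Durations are in
# minutes, mirroring the (end_time - start_time) * 24 * 60 expressions.
from statistics import median

def band_multiplier(expected_minutes):
    """Mirror the CASE expression: smaller medians get a larger allowance."""
    if expected_minutes < 5:
        return 4
    if expected_minutes < 10:
        return 3
    if expected_minutes < 20:
        return 1.5
    return 1.3

def is_long_running(recent_durations_min, elapsed_min):
    """recent_durations_min: durations of the last (up to 10) good runs."""
    expected = median(recent_durations_min)
    return elapsed_min > expected * band_multiplier(expected)

# A task with a 3-minute median is flagged only past 12 minutes (4x),
# while a 60-minute task is flagged past 78 minutes (1.3x).
print(is_long_running([3, 3, 3], 10))      # not flagged yet
print(is_long_running([60, 58, 62], 80))   # flagged
```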

Performance-wise, this query is not very efficient; I didn't find a way to make better use of the indexes on the underlying tables. But here is one attempt: the query below can replace the innermost SELECT inside inline view A, so that run history is pulled only for tasks that are currently running.

SELECT c.subject_id, c.workflow_id, c.task_id, c.instance_id,
       c.instance_name, c.start_time, c.end_time,
       RANK () OVER (PARTITION BY c.subject_id, c.workflow_id,
                                  c.task_id, c.instance_id
                     ORDER BY c.start_time DESC) older_runs
  FROM rep_task_inst_run c, rep_task_inst_run d
 WHERE c.task_type IN (58, 60, 68)
   AND c.run_err_code = 0
   AND c.run_status_code = 1          -- c: successful historical runs
   AND c.subject_id = d.subject_id
   AND c.workflow_id = d.workflow_id
   AND c.instance_id = d.instance_id
   AND c.task_name = d.task_name
   AND d.run_status_code = 6          -- d: the same task, currently running
   AND d.end_time IS NULL
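For clarity, here is a sketch of how that replacement slots into inline view A of the main query; the MEDIAN / GROUP BY wrapper around it stays exactly as before:

```sql
-- Inline view A with the self-join variant as its innermost SELECT:
-- history is read only for tasks that are running right now.
(SELECT subject_id, workflow_id, task_id, instance_id, instance_name,
        MEDIAN (end_time - start_time) * 24 * 60 expected_time
   FROM (SELECT c.subject_id, c.workflow_id, c.task_id, c.instance_id,
                c.instance_name, c.start_time, c.end_time,
                RANK () OVER (PARTITION BY c.subject_id, c.workflow_id,
                                           c.task_id, c.instance_id
                              ORDER BY c.start_time DESC) older_runs
           FROM rep_task_inst_run c, rep_task_inst_run d
          WHERE c.task_type IN (58, 60, 68)
            AND c.run_err_code = 0
            AND c.run_status_code = 1
            AND c.subject_id = d.subject_id
            AND c.workflow_id = d.workflow_id
            AND c.instance_id = d.instance_id
            AND c.task_name = d.task_name
            AND d.run_status_code = 6
            AND d.end_time IS NULL)
  WHERE older_runs <= 10
  GROUP BY subject_id, workflow_id, task_id, instance_id, instance_name) a
```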


Thanks!
