Thoughts and Musings...: Informatica - Identify the drop/improvement in performance of runs

Informatica - Identify the drop/improvement in performance of runs

Posted by Karteek at 11/30/2008

This is going to be an addition to my last post "Is the task running for more time...". That post talked what's wrong happening at moment i.e that query tells if there is any task that IS running for more time that it usually takes so that you can act on it immediately. That helps you in avoiding the last minute surprises, ah!... job stuck up somewhere or is running slowly and it will not finish with in SLA.

Now I wanted to share something you can work off-line. It would identify the jobs that STARTED running slowly or fast. If there is a task that usually finishes in 30 min, and today it finished in 70 min. Might be still...your system is good today !! but what if it takes approx same extra time to finish daily. I'm sure that's not good. Time to act and see why it suddenly bounced from 30 to 70 min. Any useful indexes are not unusable or dropped but not recreated...like this anything unusual. So, you may wish to run this query once in day to see what are the jobs that started showing up abnormality in elapsed time.

Recently one of the projects I've been with migrated from 9i to 10g RAC. We started noticing few jobs that were finishing extremely faster, and few other jobs equally slower. I helped a lot in listing jobs and finding there source/target tables and finding the route causes.

I suggest you not to just stick to this query. You could comfortably make adjustment so as to obtain some interesting results. Like we did in our case, you can change the date ranges to compare the October month jobs against the last n day jobs (change total_runs and start_dt for this).

Query (Tested in Informatica 7.1.2 Repository and Oracle 10.2.0.4.0):

--start_dt - date default sysdate-1, -- to only consider the tasks that finished until yesterday
--total_runs - number default 11, -- total calculation is performed on last 11 successful runs of each task
--latest_runs - number default 3, -- Of 11 runs, oldest 8 runs' median is compared against latest 3 runs
--min_runs - number default 5 -- A task should atleast run for 5 times successfuly in its lifetime

WITH NAMED_QUERY
AS
(
    SELECT SUBJECT_AREA, WORKFLOW_NAME, INSTANCE_NAME, ELAPSED, MED, EXPECTED_HIGH, EXPECTED_LOW,
        CASE WHEN ELAPSED > EXPECTED_HIGH THEN 'H'
             WHEN ELAPSED < EXPECTED_LOW THEN 'L'
             ELSE NULL
        END VARIANCE,
             RUN_DAY, MAX_RUN, HISTORY
    FROM
    (
        SELECT B.*,
         LTRIM(SYS_CONNECT_BY_PATH(ELAPSED, ','),',') HISTORY,
         CEIL(MED*(CASE WHEN MED < 5 THEN 4
                 WHEN MED < 10 THEN 3
                 WHEN MED < 20 THEN 1.5
                 ELSE 1.3 END)) EXPECTED_HIGH,
         CEIL(MED*(CASE WHEN MED < 5 THEN 0.20
                 WHEN MED < 10 THEN 0.30
                 WHEN MED < 20 THEN 0.60
                 ELSE 0.70 END)) EXPECTED_LOW
         FROM
         (
            SELECT SUBJECT_AREA, WORKFLOW_NAME, INSTANCE_NAME, START_TIME, END_TIME, ELAPSED,
             CEIL(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY CASE WHEN RUN_DAY > latest_runs THEN ELAPSED ELSE NULL END) OVER(PARTITION BY SUBJECT_AREA, WORKFLOW_NAME, INSTANCE_NAME)) MED,
             RUN_DAY, MAX_RUN
             FROM
             (
                 SELECT A.SUBJECT_AREA, A.WORKFLOW_NAME, A.INSTANCE_NAME, START_TIME, END_TIME,
                  CEIL((END_TIME-START_TIME)*24*60) ELAPSED,
                  ROW_NUMBER() OVER (PARTITION BY A.SUBJECT_AREA, A.WORKFLOW_NAME, A.INSTANCE_NAME ORDER BY START_TIME DESC) "RUN_DAY",
                  LEAST(COUNT(1) OVER (PARTITION BY A.SUBJECT_AREA, A.WORKFLOW_NAME, A.INSTANCE_NAME), total_runs) MAX_RUN,
                  MAX(START_TIME) OVER (PARTITION BY A.SUBJECT_AREA, A.WORKFLOW_NAME, A.INSTANCE_NAME) "LAST_RUN"
                  FROM DATA_PROFILER.INFAWFLOWTASKRUNS A
                  WHERE TASK_TYPE IN (58, 68) AND RUN_ERR_CODE = 0 AND RUN_STATUS_CODE= 1
            )
            WHERE RUN_DAY <= total_runs
            AND LAST_RUN>=TRUNC(start_dt)
            AND MAX_RUN >= min_runs
        ) B
        START WITH RUN_DAY = latest_runs
        CONNECT BY
        SUBJECT_AREA = PRIOR SUBJECT_AREA AND
        WORKFLOW_NAME = PRIOR WORKFLOW_NAME AND
        INSTANCE_NAME = PRIOR INSTANCE_NAME AND
        RUN_DAY+1 = PRIOR RUN_DAY
    )
)
SELECT SUBJECT_AREA sa, WORKFLOW_NAME wf, INSTANCE_NAME sn, MED et, VARIANCE va, HISTORY lr
FROM NAMED_QUERY A
WHERE
A.RUN_DAY = 1
AND
(
A.EXPECTED_HIGH < ALL (SELECT B.ELAPSED FROM NAMED_QUERY B
                         WHERE A.SUBJECT_AREA = B.SUBJECT_AREA
                         AND A.WORKFLOW_NAME = B.WORKFLOW_NAME AND A.INSTANCE_NAME = B.INSTANCE_NAME AND B.RUN_DAY <= latest_runs
            )
OR
A.EXPECTED_LOW > ALL (SELECT B.ELAPSED FROM NAMED_QUERY B
                         WHERE A.SUBJECT_AREA = B.SUBJECT_AREA
                         AND A.WORKFLOW_NAME = B.WORKFLOW_NAME AND A.INSTANCE_NAME = B.INSTANCE_NAME AND B.RUN_DAY <= latest_runs
                     )
)
ORDER BY va, sa, wf, sn;

Hope it helps...
Karteek

Thoughts and Musings...

Site menu:

Informatica - Identify the drop/improvement in performance of runs

0 Comments:

Contributors

Labels

Blog Archive

Page Stats