
ORA-00257: Archiver error. Connect internal only, until freed
This is something really not good. It usually occurs when the database runs in ARCHIVELOG mode and space runs out in the archive destination directory. Oracle archives the redo logs to a specified archive location in a timely manner so that recovery is guaranteed at a later time. But we should keep a check on space utilization in the destination directory: if we miss it, Oracle will most likely stop accepting any new external connections until the DBA clears space in the destination directory.
ORA-00257: archiver error. Connect internal only, until freed. Cause: The archiver process received an error while trying to archive a redo log. If the problem is not resolved soon, the database will stop executing transactions. The most likely cause of this message is the destination device is out of space to store the redo log file. Action: Check archiver trace file for a detailed description of the problem. Also verify that the device specified in the initialization parameter ARCHIVE_LOG_DEST is set up properly for archiving.
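The preventive check mentioned above, watching space utilization in the archive destination, can be scripted and scheduled. Here is a minimal sketch in Python; the destination path and the 85% threshold are hypothetical and should be adjusted to your environment:

```python
import shutil

THRESHOLD_PCT = 85  # alert threshold; an assumption, tune to your environment

def archive_dest_usage(path):
    """Percent of the file system used at `path`
    (e.g. the directory pointed to by LOG_ARCHIVE_DEST)."""
    u = shutil.disk_usage(path)
    return 100.0 * u.used / u.total

def needs_cleanup(used_pct, threshold=THRESHOLD_PCT):
    """True when the destination is getting close to full,
    i.e. time to act before the archiver hits ORA-00257."""
    return used_pct >= threshold

# Example with a hypothetical destination:
#   needs_cleanup(archive_dest_usage("/u01/arch"))
```

Run from cron (or, better, an OEM job as discussed below), this raises the flag while there is still time to back up and purge the archive logs.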
A fix for this was briefly explained by Donald Burleson in the pages below...
We ran into this problem recently in one of our production environments, and only afterwards did I start digging into the details of this error. The 10g database in question had recently been moved to a new box, and the error occurred because the backup maintenance cron scripts from the old box were never migrated. With no job backing up the archive logs on a scheduled basis, the file system ran out of space and the database finally hung. What surprises me most is why we would still need cron to manage some of the Oracle jobs. We are likely to lose control when things are scattered. The moment I realized cron jobs were being used for archive backup, I felt Oracle must already have a better way to handle these tasks. I searched and found there is one: OEM can do this for us. The link below explains it...
It's just that people have to leverage the features they have already paid for. At this time I don't have much insight into the cons of using OEM, but it is an advantage unless there are hidden side effects. - Karteek

India mourns...

Long live India!!! Jai Hind!!!

This is going to be an addition to my last post, "Is the task running for more time...". That post dealt with what's going wrong at the moment, i.e., the query tells you if any task IS currently running longer than it usually takes, so that you can act on it immediately. It helps you avoid last-minute surprises: a job stuck somewhere, or running slowly, that will not finish within the SLA.

Now I want to share something you can work with offline. It identifies the jobs that have STARTED running slowly or fast. Suppose a task usually finishes in 30 minutes, and today it finished in 70. Maybe your system is still fine today, but what if it takes roughly the same extra time every day? I'm sure that's not good. It's time to act and see why it suddenly jumped from 30 to 70 minutes: perhaps a useful index went unusable, or was dropped and not recreated; anything unusual like that. So you may wish to run this query once a day to see which jobs have started showing abnormal elapsed times.

Recently, one of the projects I've been with migrated from 9i to 10g RAC. We started noticing a few jobs finishing much faster, and a few others equally slower. I spent a lot of time listing those jobs, finding their source/target tables, and tracking down the root causes.

I suggest you not just stick to this query as-is. You can comfortably adjust it to obtain other interesting results. For example, as we did in our case, you can change the date ranges to compare the October runs against the last n days (change total_runs and start_dt for this).


Query (Tested in Informatica 7.1.2 Repository and Oracle 10.2.0.4.0):


--start_dt - date default sysdate-1, -- only consider the tasks that finished up to yesterday
--total_runs - number default 11, -- the calculation is performed over the last 11 successful runs of each task
--latest_runs - number default 3, -- of the 11 runs, the oldest 8 runs' median is compared against the latest 3 runs
--min_runs - number default 5 -- a task should have run successfully at least 5 times in its lifetime

WITH NAMED_QUERY
  AS
  (
    SELECT  SUBJECT_AREA, WORKFLOW_NAME, INSTANCE_NAME, ELAPSED, MED, EXPECTED_HIGH, EXPECTED_LOW,
        CASE WHEN ELAPSED > EXPECTED_HIGH THEN 'H'
             WHEN ELAPSED < EXPECTED_LOW THEN 'L'
             ELSE NULL
        END VARIANCE,
             RUN_DAY, MAX_RUN, HISTORY
    FROM
    (
        SELECT  B.*,
         LTRIM(SYS_CONNECT_BY_PATH(ELAPSED, ','),',') HISTORY,
         CEIL(MED*(CASE WHEN MED < 5 THEN 4
                 WHEN MED < 10 THEN 3
                 WHEN MED < 20 THEN 1.5
                 ELSE 1.3 END)) EXPECTED_HIGH,
         CEIL(MED*(CASE WHEN MED < 5 THEN 0.20
                 WHEN MED < 10 THEN 0.30
                 WHEN MED < 20 THEN 0.60
                 ELSE 0.70 END)) EXPECTED_LOW
         FROM
         (
            SELECT  SUBJECT_AREA, WORKFLOW_NAME, INSTANCE_NAME, START_TIME, END_TIME, ELAPSED,
             CEIL(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY CASE WHEN RUN_DAY > latest_runs THEN ELAPSED ELSE NULL END) OVER(PARTITION BY SUBJECT_AREA, WORKFLOW_NAME, INSTANCE_NAME)) MED,
             RUN_DAY, MAX_RUN
             FROM
             (
                 SELECT A.SUBJECT_AREA, A.WORKFLOW_NAME, A.INSTANCE_NAME, START_TIME, END_TIME,
                  CEIL((END_TIME-START_TIME)*24*60) ELAPSED,
                  ROW_NUMBER() OVER (PARTITION BY A.SUBJECT_AREA, A.WORKFLOW_NAME, A.INSTANCE_NAME ORDER BY START_TIME DESC) "RUN_DAY",
                  LEAST(COUNT(1) OVER (PARTITION BY A.SUBJECT_AREA, A.WORKFLOW_NAME, A.INSTANCE_NAME), total_runs) MAX_RUN,
                  MAX(START_TIME) OVER (PARTITION BY A.SUBJECT_AREA, A.WORKFLOW_NAME, A.INSTANCE_NAME) "LAST_RUN"
                  FROM DATA_PROFILER.INFAWFLOWTASKRUNS A
                  WHERE TASK_TYPE IN (58, 68) AND RUN_ERR_CODE = 0 AND RUN_STATUS_CODE= 1
            )
            WHERE RUN_DAY <= total_runs
            AND LAST_RUN>=TRUNC(start_dt)
            AND MAX_RUN >=  min_runs
        ) B
        START WITH RUN_DAY = latest_runs
        CONNECT BY
        SUBJECT_AREA =  PRIOR SUBJECT_AREA AND
        WORKFLOW_NAME = PRIOR WORKFLOW_NAME AND
        INSTANCE_NAME = PRIOR INSTANCE_NAME AND
        RUN_DAY+1 = PRIOR RUN_DAY
    )
)
SELECT SUBJECT_AREA sa, WORKFLOW_NAME wf, INSTANCE_NAME sn, MED et, VARIANCE va, HISTORY lr
FROM NAMED_QUERY A
WHERE
A.RUN_DAY = 1
AND
(
A.EXPECTED_HIGH < ALL (SELECT B.ELAPSED FROM NAMED_QUERY B
                         WHERE A.SUBJECT_AREA = B.SUBJECT_AREA
                         AND A.WORKFLOW_NAME = B.WORKFLOW_NAME AND A.INSTANCE_NAME = B.INSTANCE_NAME AND B.RUN_DAY <= latest_runs
            )
OR
A.EXPECTED_LOW > ALL (SELECT B.ELAPSED FROM NAMED_QUERY B
                         WHERE A.SUBJECT_AREA = B.SUBJECT_AREA
                         AND A.WORKFLOW_NAME = B.WORKFLOW_NAME AND A.INSTANCE_NAME = B.INSTANCE_NAME AND B.RUN_DAY <= latest_runs
                     )
)
ORDER BY va, sa, wf, sn;
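For readers who want to sanity-check the logic outside the database, here is a minimal Python sketch of the same idea, with a hypothetical elapsed-time history (minutes, newest run first). The band multipliers are copied from the query above; the median of the older runs is the expected value, and all of the latest runs must fall outside the high/low band before a task is flagged, just as the `ALL` subqueries require:

```python
import math
import statistics

def high_multiplier(med):
    # Same bands as the SQL CASE: the smaller the median, the wider the leniency.
    if med < 5:  return 4
    if med < 10: return 3
    if med < 20: return 1.5
    return 1.3

def low_multiplier(med):
    if med < 5:  return 0.20
    if med < 10: return 0.30
    if med < 20: return 0.60
    return 0.70

def variance_flag(elapsed_history, latest_runs=3):
    """elapsed_history: elapsed minutes per run, newest first
    (index 0 is RUN_DAY = 1). Returns 'H', 'L', or None,
    mirroring the query's VARIANCE column."""
    latest = elapsed_history[:latest_runs]
    older = elapsed_history[latest_runs:]
    med = statistics.median(older)
    expected_high = math.ceil(med * high_multiplier(med))
    expected_low = math.ceil(med * low_multiplier(med))
    if all(e > expected_high for e in latest):
        return 'H'
    if all(e < expected_low for e in latest):
        return 'L'
    return None
```

So a task with eight older runs around 30 minutes and three latest runs around 70 comes back 'H'; this is only an illustration of the logic, not a replacement for the repository query.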



Hope it helps...
Karteek


Is the task running for more time... in Informatica?

Most likely this question will strike your mind if you have ever supported a LARGE pool of Informatica jobs. If you are only asked to watch the jobs and report errors, you are blessed, provided your sessions are properly set up to handle and report errors, and your command tasks are correctly configured to catch errors from shell/batch scripts.

But that will not be your only job while you support ETL jobs: you must also make sure you don't miss the SLAs. So you may need to keep a close eye on EVERY task (sessions, command tasks, etc.) to be sure that nothing is stuck or running more slowly than usual, and you need to identify such jobs without losing much time.

I recently wrote a simple SQL query on Oracle 10g. It pulls the historical information for ALL tasks from the repository tables to calculate a benchmark value for each task, and compares that against each currently running task to flag whether it has crossed its benchmark.


SELECT b.subject_area sa, b.workflow_name wf, b.instance_name sn,
       CEIL (a.expected_time) ed, CEIL (b.elapsed) et
FROM (SELECT subject_id, workflow_id, task_id, instance_id, instance_name,
             MEDIAN (end_time - start_time) * 24 * 60 expected_time
      FROM (SELECT subject_id, workflow_id, task_id, instance_id,
                   instance_name, start_time, end_time,
                   RANK () OVER (PARTITION BY subject_id, workflow_id, task_id, instance_id
                                 ORDER BY start_time DESC) older_runs
            FROM rep_task_inst_run
            WHERE task_type IN (58, 60, 68)
              AND run_err_code = 0
              AND run_status_code = 1)
      WHERE older_runs <= 10
      GROUP BY subject_id, workflow_id, task_id, instance_id, instance_name) a,
     (SELECT subject_area, subject_id, workflow_name, workflow_id,
             task_id, instance_id, instance_name,
             (SYSDATE - start_time) * 24 * 60 elapsed
      FROM rep_task_inst_run
      WHERE task_type IN (58, 60, 68)
        AND run_err_code = 0
        AND run_status_code = 6) b
WHERE a.subject_id = b.subject_id
  AND a.workflow_id = b.workflow_id
  AND a.task_id = b.task_id
  AND a.instance_id = b.instance_id
  AND b.elapsed > a.expected_time * (CASE WHEN a.expected_time < 5 THEN 4
                                          WHEN a.expected_time < 10 THEN 3
                                          WHEN a.expected_time < 20 THEN 1.5
                                          ELSE 1.3
                                     END)
ORDER BY b.subject_area, b.workflow_name;

Here go the details...
  1. Only command tasks, wait tasks and session tasks are included.
  2. The median value of the last successful runs is evaluated; this is the fundamental idea behind the process.
  3. You could use the median directly as the benchmark, or add some extra time to it to arrive at a benchmark value. I chose the latter option: in my query, if a task runs more than 30% over its median value, I consider it long-running. I then went a bit further and created a few bands instead of a flat 30%, for the simple reason that allowing 30% leniency makes no sense for a task that usually runs for just 1 minute. So pick your percentage based on the median value: the smaller the median, the larger the percentage you need to allow.
CASE WHEN a.expected_time < 5 THEN 4
     WHEN a.expected_time < 10 THEN 3
     WHEN a.expected_time < 20 THEN 1.5
     ELSE 1.3
END
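The band rule can also be sketched as a small function, here in Python purely for illustration (the numbers mirror the CASE expression above; nothing here is Informatica-specific):

```python
def benchmark(median_minutes):
    """Benchmark = median * band multiplier; the smaller the median,
    the larger the leniency (same bands as the CASE expression)."""
    if median_minutes < 5:
        mult = 4
    elif median_minutes < 10:
        mult = 3
    elif median_minutes < 20:
        mult = 1.5
    else:
        mult = 1.3
    return median_minutes * mult

def is_long_running(elapsed_minutes, median_minutes):
    """Mirrors the query's final predicate: flag a running task once its
    current elapsed time exceeds the benchmark."""
    return elapsed_minutes > benchmark(median_minutes)
```

With these bands, a task whose median is 1 minute is flagged only past 4 minutes, while a 60-minute task is flagged past 78 minutes.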

Performance-wise, this query is not very efficient, though. I didn't find a way to make better use of the indexes on the underlying tables. However, here is an attempt: the query below can replace the inline view A in the query above.

SELECT c.subject_id, c.workflow_id, c.task_id, c.instance_id, c.instance_name, c.start_time, c.end_time,
RANK () OVER (PARTITION BY c.subject_id, c.workflow_id, c.task_id, c.instance_id ORDER BY c.start_time DESC) "OLDER_RUNS"
FROM rep_task_inst_run c, rep_task_inst_run d
WHERE c.task_type IN (58, 60, 68)
AND c.run_err_code = 0 AND c.run_status_code = 1
AND c.subject_id = d.subject_id AND c.workflow_id = d.workflow_id
AND c.instance_id = d.instance_id AND c.task_name = d.task_name
AND d.run_status_code = 6 AND d.end_time IS NULL


Thanks!