Last week, we were trying to improve one of our ETL processes. The requirement is that the Hyperion cube load must start right after certain ETL loads complete. Informatica and Hyperion are hosted on different servers, both running Linux. So some dependency has to be established between the ETL process on the Informatica server and the cube load process, so that the cube load is kicked off immediately after the ETL finishes.
If you do not have time to go through the full details below, you can skip ahead to the part about the ssh command, as the main purpose of this post is to explain the difficulty I faced in using ssh from Informatica.
We could think of an approach in which the ETL sends a trigger file (a zero-byte file) to notify the cube load job. But this is very loosely coupled: the ETL has to send the trigger to the Hyperion server, and a cube load process that has already started on the Hyperion server has to be waiting for that trigger before it kicks off the cube load. There are some serious drawbacks in this approach:
Both the ETL and the cube load jobs have to be scheduled on their own servers. How does an already started cube load process know that the ETL job failed and that the current run should be skipped, which I believe is a fairly plausible scenario? As a workaround, the cube load job on the Hyperion server can wait for a defined time period and, if it still doesn't find the trigger file, stop the cube load. That is workable to a degree, but still not good when the ETL schedule changes or the ETL takes longer than usual. (Releasing the trigger file onto a shared file system instead of sftp-ing it is still just the same trigger-file dependency.)
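The timed wait described above can be sketched as follows. This is only a minimal illustration of the idea, not what we ran: the function name wait_for_trigger, its arguments and the 30-second default poll interval are all assumptions.

```shell
#!/bin/sh
# Hypothetical polling loop for the cube-load side of the trigger-file approach.
# Usage: wait_for_trigger FILE TIMEOUT_SECONDS [POLL_SECONDS]
# Returns 0 once FILE exists, 1 if TIMEOUT_SECONDS pass without it appearing.
wait_for_trigger() {
    trigger_file=$1
    timeout=$2
    interval=${3:-30}       # assumed default poll interval
    waited=0
    while [ ! -e "$trigger_file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1        # gave up: ETL failed, ran long, or was rescheduled
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    # Consume the trigger so a stale file cannot start the next run.
    rm -f "$trigger_file"
    return 0
}
```

The cube load script would call it first and stop if it returns non-zero. The weakness is visible in the code itself: the timeout is a guess about how long the ETL takes, which is exactly what breaks when the ETL schedule or duration changes.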
A good workaround I could think of is an enterprise job scheduler, which reduces, if not fully eliminates, the process dependency barriers between servers. If, in addition to the Informatica and Hyperion servers, there is a third server that controls the jobs on the other servers, this issue is solved: the job scheduler starts the cube load process only after the ETL finishes.
But not every company uses a job server/scheduler; it tends to appear only where the IT landscape is relatively complex, and not where its value for money has yet to be established. My client is not using a job scheduler, at least not that I know of, in their otherwise mature BI department. Dropping the classic scheduler that Informatica provides just for some benefits that are rare (at least for them) is also not a good idea. So we stay with it.
At this point, the only option I can think of is remote invocation of the cube load process on the Hyperion server from the Informatica server, as part of the ETL itself. Now it is tightly coupled: the process on the Hyperion server doesn't even start unless the ETL finishes.
So the point is clear by now: I need to start the cube load process from Informatica, specifically through a command task that calls a remote shell script. The command looks something like this:
ssh -q -l hyp01 hypsrvr '/opt/app/Hyperion/cube_load.ksh'
I noticed weird behaviour when this was invoked from Informatica 7.1. Even after cube_load.ksh had finished on the Hyperion server, the ssh process on the Informatica 7.1 server stayed in the process list (ps -f) forever, and the task stayed running in the Informatica monitor as well. The very same command worked fine when executed directly on the server (from PuTTY). Even a simple ssh command for "date" gave the same result when run from Informatica:
ssh -q -l hyp01 hypsrvr 'date'
For the first time, I tried strace on the PID of the ssh process, though not very successfully: all it showed was "select(16, [0 13], [], NULL, NULL", which was not much help.
After breaking my head for a couple of hours, I partially suspected the pmserver process of Informatica 7.1 (which means Informatica 7.1 itself), since it becomes the parent process of every process that Informatica invokes (session, shell command, SQL client or anything else).
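One experiment this suggests, which I have not verified on 7.1: the strace output shows ssh sitting in select() with descriptor 0 (stdin) in its read set, and under Informatica that stdin is whatever pmserver hands down to its children. Below is a minimal sketch of a wrapper that detaches stdin, using the OpenSSH -n option plus an explicit redirect from /dev/null. The function name run_remote and the variable names are made up for illustration, and whether this actually clears the hang is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper for the Informatica command task.
# The values mirror the user, host and script path used in this post.
REMOTE_USER="hyp01"
REMOTE_HOST="hypsrvr"
REMOTE_SCRIPT="/opt/app/Hyperion/cube_load.ksh"

# Usage: run_remote COMMAND
# Runs COMMAND on the remote host and returns ssh's exit status.
run_remote() {
    # -n tells ssh to take stdin from /dev/null; the explicit redirect does
    # the same at the shell level, so no descriptor from the parent process
    # (pmserver, in our case) is left open for ssh to wait on.
    ssh -q -n -l "$REMOTE_USER" "$REMOTE_HOST" "$1" </dev/null
}
```

In the command task this would be called as run_remote "$REMOTE_SCRIPT", and run_remote date makes a quick test. If the process still hangs with stdin detached, then inherited stdin is ruled out and the suspicion moves back to pmserver or the environment.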
I then tried the same command in our other, upgraded Informatica 8.6 environment, and it worked. I suspect 7.1 has a bug that was fixed in later releases. I'm not confident enough to say for sure that Informatica 7.1 has this bug, as the Linux machine's environment settings are also a possible cause. So I would deeply appreciate hearing about similar experiences. This is our Informatica Linux server version:
$ uname -svm
Linux #1 SMP Wed Jul 12 23:36:53 EDT 2006 i686
With my boss's help I also tested rsh instead of ssh. We knew rsh was not the way to go because of the risks involved, but we gave it a try anyway, and it worked in both 7.1 and 8.6. Only ssh had the problem.
Something off-topic that I learned from him was how to set up .rhosts for rsh. I already knew how to do the equivalent for ssh (generating the key pair on the client machine and adding the public key to the host machine's authorized keys file), and the rsh setup is somewhat similar. No key generation is required: you just add the client user name to $HOME/.rhosts on the remote machine, and it works.
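For reference, a .rhosts entry is one line holding the client host name followed by the client user name. Here is a minimal sketch of adding such an entry safely. The helper name add_rhosts_entry and the example names infasrvr and infa01 are hypothetical, not our real host and user.

```shell
#!/bin/sh
# Hypothetical helper, to be run as the target user on the remote machine.
# Usage: add_rhosts_entry RHOSTS_FILE CLIENT_HOST CLIENT_USER
add_rhosts_entry() {
    rhosts_file=$1
    entry="$2 $3"           # .rhosts line format: "hostname username"
    touch "$rhosts_file"
    # Append only if the exact line is not already present.
    grep -qxF -- "$entry" "$rhosts_file" || printf '%s\n' "$entry" >> "$rhosts_file"
    # Keep the file private to its owner; rsh daemons commonly refuse a
    # .rhosts file that other users can write to.
    chmod 600 "$rhosts_file"
}
```

For example, add_rhosts_entry "$HOME/.rhosts" infasrvr infa01 on the Hyperion server would let user infa01 on host infasrvr run rsh commands as that user without a password, which is also why rsh is risky.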
As I requested, please do share your experiences with Informatica and ssh together.
Karteek
Labels: bug, informatica, Linux