
SQL*Loader basically has two data load modes - the conventional path load method and the direct path load method. By default it is the first one... which is what we've been using in our project to load the data. In my previous posts, I think I explained how we do the data load and which strategy we follow in distributing the files across the 3 instances. Let me just quickly take you through it...
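(Quick reference before that - this is roughly how the two paths are picked on the sqlldr command line; the file names here are made up, not our actual ones:)

sqlldr userid=scott/tiger control=offers.ctl data=offers1.dat log=offers1.log
sqlldr userid=scott/tiger control=offers.ctl data=offers1.dat log=offers1.log direct=true

The first is a conventional path load (the default); adding direct=true makes it a direct path load.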

  • Ours is a RAC setup with 3 instances.
  • We need to utilize the maximum h/w capacity.
  • The data is being loaded into a partitioned table (LIST)... so if we randomly picked the files, split them into 3 groups and started the data load, there would obviously be a lot of contention at the partition level. So how do we do it here?
Each file also goes into a single partition, and we know which partition each file goes into. So we distribute the files carefully, in such a way that no two instances, at any point during the load, load data into the same partition. Confused!! Even I can't get my words out clearly... let me put it simply: indirectly, we make every instance use only its own set of partitions, exclusively. Then what? Absolutely tremendous throughput - total time was cut down from 15 hrs to 5 hrs!!

Now, everything is going to change. In my last post (about JB's meeting) I talked a little about our new design, where we are going to use HASH partitioning instead of LIST. List partitioning is fully under our control - we know which partition a record will be put into, so distributing the files was possible. But hash is different - only Oracle knows. Hash is also inherently slow when it comes to data load, because Oracle needs to apply the hash algorithm to every record to decide its partition. So implementing something similar for hash became a challenge: we don't even know the partition of a record, so how can we distribute the files?? Absolutely not possible!!

So, I tried to use the DIRECT PATH load option in SQL*Loader. But again, this did not work. We have a sequence generator (for the primary key column) in SQL*Loader, and this is not allowed in a direct path load - it simply inserts NULLs. I tried to find alternatives, but no use. http://www.orafaq.com/forum/t/50054/0/ - this thread has a few discussions on the same problem. There it was suggested to use the SEQUENCE() function - but that is not recommended either, as performance would again be bad. And I also felt that it is not easily maintainable between tables. For reference, here is what SEQUENCE() looks like in a control file (an excerpt from the docs example):
....
deptno CHAR TERMINATED BY ':',
projno,
loadseq  SEQUENCE(MAX,1))
The SEQUENCE function generates a unique value in the column loadseq. It finds the current maximum value in the loadseq column and adds the increment (1) to it to obtain the value of loadseq for each row inserted.
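And for contrast, here is a rough sketch of the kind of control file we use today, where the primary key comes from a database sequence through a SQL string (the table, column and sequence names here are made up, not our actual ones). Under a conventional path load the sequence is applied per record; under direct path, as I said above, the column simply comes back NULL:

LOAD DATA
INFILE 'offers1.dat'
APPEND
INTO TABLE cross_sell_offers
FIELDS TERMINATED BY ':'
-- offer_id: hypothetical PK column fed by a hypothetical sequence;
-- populated per record in conventional path, NULL under direct path
(offer_id    "offers_seq.nextval",
 cust_id,
 offer_code)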
- Karteek

with JB...

This must be a long-awaited post. After being inactive here for a long time, it's time to blog my experience of meeting JB Sastry, GE Money's Data Warehousing Chief Architect. He is a very aggressive person - at least that's what I heard from many. But he was very jovial on the day I met him, and luckily I didn't get to see his other side. I was really surprised when he showed me an email he had received from GE Money's CRM chief (Sandeep), asking about the status of the Cross Sell Offer DB's incremental loads. As per JB, Cross Sell is one of the very good and growing businesses of GE Money, as it involves tie-ups with so many clients, including giants like Walmart, JC Penny, Dillards, Mervyn etc... I hope by now you've got the reason for my meeting with JB.

Till now we have been doing full-refresh data loads of customer offers every week. But for a growing business like this, which interacts heavily with customers and their needs in real time, incremental loads are obviously the better solution. The existing design is not going to support incremental loads. The total data is huge - more than 300MM rows - and the data to be purged and added each week is also in the millions - more than 30MM. Apparently, the normal delete-and-insert approaches would not suffice. JB and I discussed the new design. He gave a few suggestions... mainly,

  • do the delta operations in the OS layer, instead of doing them in the database (a rough sketch follows this list)
  • use hash partitions, and avoid global indexes so that partition pruning works
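To make the first point concrete, here is a minimal sketch of what a delta in the OS layer could look like - comparing last week's and this week's key extracts with sort and comm (the file names and layout are made up; this is just the idea, not our final design):

# one offer key per line in each extract (hypothetical files)
sort last_week.keys -o last_week.srt
sort this_week.keys -o this_week.srt

# keys only in last week's extract -> to be purged
comm -23 last_week.srt this_week.srt > purge.keys

# keys only in this week's extract -> to be loaded
comm -13 last_week.srt this_week.srt > load.keys

The purge.keys and load.keys files would then drive the delete and the SQL*Loader insert respectively, so the database never has to compute the delta itself.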
It seemed like he was happy with my understanding, and he wanted me to send him the minutes of that meeting. He wants to put the design in front of others - Deepak and Kaustubh. The design seemed OK to me, and in my reply email to JB I even suggested a point about using Index Organized Tables (they reduce physical reads, since related records are placed close together on a few blocks). Now, waiting for his response..... - Karteek