SQL*Loader basically has two data load modes:
- Conventional path load
- Direct path load
By default it uses the first one (conventional path), which is what we've been using in our project to load the data. In my previous posts, I think I explained how we do the data load and which strategy we follow in distributing the files across the 3 instances. Let me just quickly take you through it...
- Ours is a RAC setup with 3 instances.
- We need to utilize the maximum h/w capacity.
- The data is being loaded into a LIST-partitioned table, so if we randomly pick the files, split them into 3 groups and start the data load, there would obviously be a lot of contention at the partition level. So how do we handle it here? Roughly as sketched below.
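A minimal sketch of how this could look (the file names, connect strings and control file name here are hypothetical, not our actual setup): the idea, as I understand it, is to group the input files so that each group maps to a disjoint set of LIST partitions, and then run one SQL*Loader session per instance.

# hypothetical sqlldr runs, one per RAC instance; each data file group holds
# only rows whose partition key values belong to that group, so the three
# sessions do not contend on the same partitions
sqlldr userid=loader/xxxx@inst1 control=offers.ctl data=offers_group1.dat log=offers_group1.log
sqlldr userid=loader/xxxx@inst2 control=offers.ctl data=offers_group2.dat log=offers_group2.log
sqlldr userid=loader/xxxx@inst3 control=offers.ctl data=offers_group3.dat log=offers_group3.log
# these are default conventional path loads; adding direct=true to each
# command would switch them to direct path loads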
....
deptno CHAR TERMINATED BY ':',
projno,
loadseq SEQUENCE(MAX,1))
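For context, a complete control file using the same SEQUENCE clause could look something like this (the data file name, table name and the terminator on projno are my own hypothetical reconstruction; only the field names and the SEQUENCE clause come from the fragment above):

LOAD DATA
INFILE 'dept_proj.dat'            -- hypothetical data file
APPEND
INTO TABLE dept_proj              -- hypothetical target table
(deptno  CHAR TERMINATED BY ':',
 projno  CHAR TERMINATED BY ':',  -- terminator assumed here
 loadseq SEQUENCE(MAX,1))         -- current max(loadseq) + 1 per inserted row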
The SEQUENCE function generates a unique value in the loadseq column: it finds the current maximum value in loadseq and adds the increment (1) to it to obtain the loadseq value for each row inserted. - Karteek
This must be a long-awaited post. After being inactive here for a long time, it's time to blog my experience after meeting JB Sastry, GE Money's Data Warehousing Chief Architect. I had heard from many people that he is a very aggressive person, but he was very jovial on the day I met him, and luckily I didn't get to see his other side. I was really surprised when he showed me an email he had received from GE Money's CRM chief (Sandeep) asking about the status of the Cross Sell Offer DB's incremental loads. As per JB, Cross Sell is one of the really good and growing businesses of GE Money, as it involves tie-ups with so many clients, including giants like Walmart, JC Penny, Dillards, Mervyn etc... I hope by now you've got the reason for my meeting with JB.

Till now we have been doing full-refresh data loads of the customer offers every week. But for a growing business like this, which interacts heavily with customers and their needs in real time, incremental loads are obviously the better solution. The existing design is not going to support incremental loads. The total data is huge - more than 300MM rows - and the data to be purged and added is also in the millions - more than 30MM a week. So plain Delete and Insert approaches would not suffice. JB and I discussed the new design. He gave a few suggestions... mainly,
- do the delta operations at the OS layer, instead of doing them in the database
- use hash partitions, and avoid global indexes, so that partition pruning can do the work (a rough sketch of this follows below)
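Just to make the second suggestion concrete, here is a rough sketch (the table, column and index names are hypothetical, not the actual Cross Sell schema) of a hash-partitioned table with a local index, so that lookups on the partition key can be pruned to a single partition and partition-level maintenance never has to touch a global index:

-- hypothetical offers table, hash partitioned on the customer key
CREATE TABLE cross_sell_offers
( customer_id  NUMBER        NOT NULL,
  offer_code   VARCHAR2(20)  NOT NULL,
  offer_date   DATE
)
PARTITION BY HASH (customer_id)
PARTITIONS 16;

-- local index: one index partition per table partition, so truncating or
-- exchanging a table partition never invalidates the whole index
CREATE INDEX cross_sell_offers_ix
  ON cross_sell_offers (customer_id) LOCAL;

-- an equality predicate on the partition key is pruned to one hash partition
SELECT offer_code, offer_date
  FROM cross_sell_offers
 WHERE customer_id = 12345;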