This must be a long waiting post. After being inactive here for long time, it's time to blog my experience after I met JB Sastry, GE Money's Data warehousing Chief Architect. He is very aggressive person which I heard from many. But he was very jovial on the day I met and luckily I didn't see his other side. I was really surprised when he showed me an email which he received from GE Money's CRM chief (Sandeep) asking about the status of Cross Sell Offer DB's incremental loads. As per JB, Cross Sell is one of the very good and growing businesses of GE Money, as it involves tie-up with so many clients, including giants like Walmart, JC Penny, Dillards, Mervyn etc... I hope by now you got the reason for my meet with JB. Till now we have been doing Full refresh data loads of customer offers every week. But for a growing business like this which highly interacts with customers and theirs needs in real time, obviously incremental loads can be a better solution. Existing design is no more going to support incremental loads. Total data is huge - more than 300MM, and the data to be purged and added is also in millions - more than 30MM a week. Apparently, a normal Delete and Insert approaches would not suffice. Me and JB discussed about the new design. He gave few suggestions...mainly,
- do delta operations in OS layer, instead doing in database
- use hash partitions and avoid global indexes to do partition pruning