A day with Ralph Kimball, Part 1

I had the opportunity to spend a day in a seminar with Ralph Kimball. If you don’t know who that is, he is a guru of data warehousing. If you’re involved in data warehouses, I hope you are at least familiar with his work. Currently in the industry there are two primary, competing warehousing methodologies, practically religions to some: Kimball vs. Inmon. I think that’s kind of silly. A methodology is like a hammer or a drill; choose the best one for the job. If I absolutely have to pick, I’m in the Kimball camp. I have used Inmon’s methodology in the past and it works, but I find that Kimball’s work applies better in my most recent projects, particularly when implementing an enterprise bus architecture. Having said that, if I were on a project and the requirements made me think Inmon’s methodology would work better, I wouldn’t hesitate to use it. For that matter, just using flat reporting tables is appropriate in the right circumstances. But I’m getting ahead of myself. Back to the seminar.

An extract from the marketing literature: Ralph Kimball is the founder of Kimball Group, a company that teaches dimensional modeling and critiques data warehouses. He founded Red Brick, a super speedy warehouse RDBMS that was purchased by IBM. Before that he was a vice president with Metaphor Systems, which sold data warehouse systems, and before that he was at Xerox PARC, where he co-invented the Xerox Star Workstation, the system that used a mouse and GUI. He also authored The Data Warehouse Toolkit, which in my opinion is indispensable for warehousing professionals.

This guy is rock solid. He doesn’t just write books. He’s been implementing these ideas for years and proving they do work in the real world. The point I would like to get across is to follow Kimball’s guidelines fully. If you only implement a piece of them, don’t complain that Kimball’s methods don’t work. Combining star schemas and fully relational schemas, or taking what you believe are shortcuts to make things simpler, will probably fail. Stick to the proven method. But that’s my soapbox and I’ll step down now.

When I got up that morning it was raining. It had been raining the evening before and my normally 35-minute or so commute became an hour-and-a-half commute. The Marriott in downtown Tampa is twice as far from my house as my employer is. The combination of rain, traffic and the fear that it would just be a marketing gimmick almost made me decide not to attend. At the last minute I decided that this might be a one-time opportunity, so I dragged myself out of bed and got on the road 2 hours before show time. I’m glad I made that decision.

The traffic was bad; as bad as I figured it would be. But it flowed and I didn’t hit any wrecks. I managed to make it in 1 hour, which is good time for me. There were a lot of friendly people waving to each other with one finger but no road rage as far as I could tell. One of my co-workers who also attended started off about 10 minutes behind me. It took her a lot longer to arrive though. I must have left at the right time.

The downtown Marriott, actually called the Tampa Marriott Waterside Hotel and Marina, is a NICE hotel. The meeting room was high enough up that the hallway outside had great views. The view over the water and the marina is amazing. If you’re ever coming to Tampa, and someone else is paying, you should consider it. Here’s the link: http://marriott.com/property/propertypage/TPAMC

The seminar was arranged, and I guess paid for, by Informatica, Sybase and Trillium Software. They put on a good spread. When I got there, an hour early, after I registered and got my badge, I had my choice of breakfast. They provided orange juice and coffee to drink and bagels, muffins, fruit, etc. to eat. I give four stars for the munchy factor. Food is important to me. Food is my friend. I ate way more than I should have. I think I might have had one of each. Or maybe I had more than one.

My co-workers started showing up about half an hour later, at 8am. I passed the time reading the latest Dr Dobbs Journal and eating. I thought the seminar was supposed to start at 8:30 but it turned out that breakfast was supposed to start at 8:30. I was full by then.

Finally at 9, the Informatica rep got the proceedings started. The schedule we were provided was:

8:30-9:00 Breakfast and Registration
9:00-10:30 Ralph Kimball
10:30-10:45 Break
10:45-12:00 Informatica, Sybase and Trillium
12:00-13:00 Lunch and Q&A with Ralph
13:00-14:30 Afternoon Workshop Part 2
14:30-14:45 Break
14:45-16:00 Afternoon Workshop Part 3

Not bad, I thought. If Ralph had the majority of the day, it might be worthwhile after all. The rep introduced Ralph and Ralph got started. It turns out that the daylong seminar was based on his classes. Basically it was to give us a taste of what his classes provide. The taste was enough that I really, really want to take his classes. The classes aren’t cheap though. The two I would like the most are Dimensional Modeling in Depth and Data Warehouse ETL in Depth. You can see his web site and info about the classes at: www.kimballgroup.com

Let me say that if you have already read and studied his book, The Data Warehouse Lifecycle Toolkit, this one-day seminar will not teach you anything new. Still, having him say it the way he does and having him present his experiences and opinions really solidifies what the book says. Having the opportunity to speak with him on breaks and at the Q&A is worth the parking fee anytime. He also said his classes are like boot camps where you are dealing with multiple real world scenarios and actually implementing data marts and doing ETL (in the ETL class).

The morning session covered the basic lifecycle of a warehouse, a bit about dimensional modeling and some definitions. It was the executive overview of the day. He made some very good points. One particular point is that he considers himself an engineer not a scientist. A scientist can ponder theory forever while an engineer delivers working products. I see myself in the same light. I like considering the theories and bouncing ideas around but at the end of the day, I apply theories. I like to design and build but then I want to move on to the next project, not coddle a particular project forever.

Another comment he made is that if you think after interviewing your users, you have defined all of the requirements, then what you really have is foofoo dust. I’m not sure if I spelled that correctly but I don’t think it really matters. I WILL be using that one in a meeting sometime in the future.

His presentation was an analogy to the publishing industry. Like the publishing industry, the warehouse group collects input from a variety of sources and edits for quality. I think the two most critical things he spoke about are trust and user satisfaction. Data warehouse professionals must maintain the end users’ trust: data quality, timeliness and just being honest about expectations are paramount. And at the end of the day, success is measured by the happiness of your end users. If it’s fast but nobody uses it, it’s not a success. If it’s too slow to be usable or the data is too stale or the interface, i.e. the schema, is too confusing, there are no excuses.

One thing that he said that I asked him to clarify during a break was about aggregating. He said pre-aggregating was the kiss of death. That is the exact opposite of what I believe, so I asked him to confirm his statement. What he meant was aggregating and then throwing away the details. Some people in the industry say that you should aggregate your data and there is no reason to keep the details. I would agree with Ralph that that would be the kiss of death. In my experience, users ALWAYS want the ability to drill down to detail. How can you research things like data skew without details? What he confirmed to me is that pre-aggregating your details is a surefire time saver for querying, and we also agreed that keeping the detail is the way to go.
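To make the "aggregate, but keep the detail" idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table names, columns and sample rows are all made up by me for illustration; none of this comes from Ralph’s material, and a real warehouse would of course look different.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory example with a detail table and a summary table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical detail-level fact table: one row per sale.
    conn.execute(
        "CREATE TABLE sales_detail "
        "(sale_date TEXT, store TEXT, product TEXT, amount REAL)"
    )
    rows = [
        ("2006-01-01", "Tampa", "widget", 10.0),
        ("2006-01-01", "Tampa", "gadget", 25.0),
        ("2006-01-01", "Orlando", "widget", 12.0),
        ("2006-01-02", "Tampa", "widget", 8.0),
    ]
    conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?, ?)", rows)
    # Pre-aggregate for fast queries. The detail table is NOT dropped.
    conn.execute(
        "CREATE TABLE sales_by_store_day AS "
        "SELECT sale_date, store, SUM(amount) AS total_amount, "
        "COUNT(*) AS sale_count "
        "FROM sales_detail GROUP BY sale_date, store"
    )
    return conn


def daily_total(conn, sale_date, store):
    """Answer the common question from the small summary table."""
    row = conn.execute(
        "SELECT total_amount FROM sales_by_store_day "
        "WHERE sale_date = ? AND store = ?",
        (sale_date, store),
    ).fetchone()
    return row[0] if row else 0.0


def drill_down(conn, sale_date, store):
    """Go back to the detail rows when a user asks 'why is that number so high?'"""
    return conn.execute(
        "SELECT product, amount FROM sales_detail "
        "WHERE sale_date = ? AND store = ? ORDER BY product",
        (sale_date, store),
    ).fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    print(daily_total(conn, "2006-01-01", "Tampa"))
    print(drill_down(conn, "2006-01-01", "Tampa"))
```

The summary table answers the everyday question quickly, and the detail table is still there when someone wants to drill down. Throw away sales_detail and drill_down has nothing to work with, which is the kiss of death Ralph was talking about.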

He went on to cover some of the history of warehousing and the challenges and realities of the current environments. Of course he talked about his specific contributions, the data warehouse bus architecture and conformed dimensions, and tied them back to simplicity and performance. He introduced the Bus Matrix, which when tied to the data warehouse bus concept is very useful and simple to read. One of the last things he covered was centralization and the risks of being too centralized. Probably the biggest risk of being too centralized is the desire to do things in a big bang instead of incrementally.
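If you haven’t seen a bus matrix, it is basically a grid with business processes down the side and dimensions across the top, with a mark wherever a process uses a dimension. Here is a rough sketch in Python. The processes and dimensions are hypothetical examples of my own, not the ones Ralph used, and the function names are just mine.

```python
# Hypothetical bus matrix: business process -> dimensions it uses.
BUS_MATRIX = {
    "retail_sales": {"date", "product", "store", "customer"},
    "inventory": {"date", "product", "store"},
    "purchase_orders": {"date", "product", "vendor"},
}


def conformed_dimensions(matrix):
    """Return the dimensions shared by more than one business process.

    These are the ones that have to be conformed (same keys, same
    attributes) so the separate data marts plug into the same bus.
    """
    usage = {}
    for process, dims in matrix.items():
        for dim in dims:
            usage.setdefault(dim, []).append(process)
    return {dim: sorted(procs) for dim, procs in usage.items() if len(procs) > 1}


def render_matrix(matrix):
    """Render the matrix as a plain-text grid, one row per business process."""
    dims = sorted({dim for used in matrix.values() for dim in used})
    width = max(len(process) for process in matrix)
    lines = [" " * width + " | " + " | ".join(dims)]
    for process, used in matrix.items():
        cells = [("X" if dim in used else "").center(len(dim)) for dim in dims]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_matrix(BUS_MATRIX))
    print(conformed_dimensions(BUS_MATRIX))
```

The nice thing about keeping it this simple is that each row can be built as its own incremental project, as long as the shared columns are conformed. That fits with the point about avoiding the big bang.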

The presentations by Informatica, Sybase and Trillium Software were mostly marketing. That’s fair enough. They were paying for the day after all. I will speak about ETL tools in the near future. I got the chance to evaluate Informatica and two others some time back and I’ll share my findings.

Sybase covered a concept I hadn’t heard of and plan to research: columnar data storage. They call their implementation Sybase IQ. Instead of storing data in tables, they store everything in single column bitmaps. It reduces storage requirements and speeds access. It really seems fascinating. I’m still not sure of the details though, so I want to spend some time researching this. One of the funniest moments came when the Informatica or Sybase rep, I forget which, said the columnar storage makes access infinitely faster. Ralph’s engineering background kicked in and he broke in with the comment that “infinitely faster is a marketing phrase”. The rep admitted that he should have said “noticeably faster”. Ralph agreed that would be more appropriate. It reminded me of the saying: never bring an engineer to a sales call. Heh.
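Since I don’t know the details of Sybase IQ yet, the following is only my own toy illustration of the general idea of column-oriented storage with a bitmap index, written in Python. It is NOT how Sybase IQ is actually implemented, and the sample data and function names are invented.

```python
# Hypothetical row-oriented data: one dict per row.
ROWS = [
    {"store": "Tampa", "product": "widget", "amount": 10},
    {"store": "Tampa", "product": "gadget", "amount": 25},
    {"store": "Orlando", "product": "widget", "amount": 12},
    {"store": "Tampa", "product": "widget", "amount": 8},
]


def to_columns(rows):
    """Pivot rows into one list per column (assumes every row has the same keys)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for i, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << i)
    return index


def sum_where(columns, indexes, conditions, measure):
    """Sum one measure column over the rows matching all equality conditions.

    The filter is a bitwise AND of the bitmaps, so only the columns named
    in the query are touched, never the whole row.
    """
    row_count = len(columns[measure])
    mask = (1 << row_count) - 1  # start with every row selected
    for column_name, wanted in conditions.items():
        mask &= indexes[column_name].get(wanted, 0)
    return sum(v for i, v in enumerate(columns[measure]) if (mask >> i) & 1)


COLUMNS = to_columns(ROWS)
INDEXES = {name: bitmap_index(COLUMNS[name]) for name in ("store", "product")}

if __name__ == "__main__":
    print(sum_where(COLUMNS, INDEXES, {"store": "Tampa", "product": "widget"}, "amount"))
```

Even in this toy version you can see where the appeal comes from: a column with few distinct values collapses into a handful of bitmaps, and a query only reads the columns it mentions. Noticeably faster, maybe. Infinitely faster, no.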

And that got us to lunch. I had a turkey sandwich and then went back for a roast beef sandwich. We got to listen to Ralph and hear others ask questions. All in all a very good morning.

Well, that’s the setup for my next blog. In that one I will talk about the afternoon sessions, which covered dimensional modeling and ETL. Instead of just talking about the seminar, I will get a bit more technical and define some of the attributes of a warehouse as well as share some of what Ralph talked about. In the very near future, I will actually walk through designing a mini-mart using a different set of examples than Ralph did but building on what I will talk about in the next blog.

As always, if you have any comments, questions or critiques, please post them. Ideas for future blogs are also encouraged.


