Designing the Data Mart – Part 1
As I mentioned a while back (a loooong while back), I have been thinking about writing up how I design data marts. The problem is that it is a huge topic. Even converting an existing schema (which doesn't always exist) to a star-schema data mart takes plenty of behind-the-scenes data analysis and prep work. Still, I am going to take a shot at it.
I could start with a laundry list of requirements, but I don't think that would be interesting to very many people. Instead, I will start with an existing schema and talk about the requirements as I model the mart. This is kind of cheating, as many times you may have only a report from a mainframe (or a verbal description of a mainframe report), and you spend most of your time discovering and analyzing data sources. I'll save those topics for another series.
I've decided to attack this as an agile project: take a user story, work through some analysis, and then implement part of the overall data mart. Over time, this will get the mart built, sort of like a series of sprints would. While I don't personally think an agile approach is the best fit for a data warehouse or data mart, it is an approach being taken by many, including me at my current job.
So, today, I am going to provide the existing schema and a little bit of descriptive text. I went back and forth on what would make a good example. Should I use something as simple as emp/dept? Nah, really too simple. The next step up is basic order entry. Still too simple, though. I decided to settle on an order entry and fulfillment model. That means we have tables for orders, customers, products, shipping, inventory, etc. A bit more than the basic OE. I won't take it to extremes, though.
This model will still be fairly basic. A complete model for this could be 1000 tables. I’ll stick with as few as possible to make the point. It is easy in an inventory/parts/MRP type environment to take normalization to an amazing level (and sometimes it’s even necessary). For my purposes, I will keep it at a higher level. I am purposely leaving out taxes, sales reps, etc. If I need them for a future entry, I can always add those things in.
So there's the basic transactional model. Let me repeat: I realize I left plenty out. I'm not trying to show how to do sales fulfillment, I am trying to show how to create a data mart.
A description of each item:
- Locations is a generic set of addresses. It contains US location info such as street address, state and zip.
- Customers are the people who create orders. They have addresses (ship to and bill to). They also have a credit rating (good, bad, sucks). They have a somewhat informal status with us (repeat buyer, main corporate purchase officer, 1st time buyer, never sell to again). We sometimes give customers discounts.
- We sell products. Products are sold in units (individual, pallet, case, etc). Each product has a cost and a price. A SKU is a standard code for a product.
- Vendors sell us products. Vendors are located somewhere.
- Products are stored in warehouses, which are located somewhere. Warehouses store a certain quantity of each product, which is our inventory.
- When we create a sales order, each product gets its own order line. A product might be on sale, which would give the buyer a discount.
- Once we make a sale, the order enters a fulfillment cycle: pull line item, box line item, ship order or line item, item or order is returned, etc.
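Since the schema diagram itself isn't reproduced here, the bullets above can be sketched as DDL. This is a rough SQLite version of the transactional model; every table and column name is my own guess at what the diagram shows, not the actual model, and I've left out the fulfillment tables.

```python
import sqlite3

# A minimal, hypothetical rendering of the transactional schema described above.
ddl = """
CREATE TABLE locations (          -- generic US addresses
    location_id    INTEGER PRIMARY KEY,
    street_address TEXT,
    city           TEXT,
    state          TEXT,
    zip            TEXT
);
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    ship_to_id    INTEGER REFERENCES locations(location_id),
    bill_to_id    INTEGER REFERENCES locations(location_id),
    credit_rating TEXT,            -- good, bad, sucks
    status        TEXT,            -- repeat buyer, 1st time buyer, ...
    discount_pct  REAL DEFAULT 0   -- occasional customer discount
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    sku        TEXT UNIQUE,        -- standard code for a product
    unit       TEXT,               -- individual, case, pallet, ...
    cost       REAL,
    price      REAL
);
CREATE TABLE vendors (
    vendor_id   INTEGER PRIMARY KEY,
    name        TEXT,
    location_id INTEGER REFERENCES locations(location_id)
);
CREATE TABLE warehouses (
    warehouse_id INTEGER PRIMARY KEY,
    location_id  INTEGER REFERENCES locations(location_id)
);
CREATE TABLE inventory (           -- product quantities per warehouse
    warehouse_id INTEGER REFERENCES warehouses(warehouse_id),
    product_id   INTEGER REFERENCES products(product_id),
    quantity     INTEGER,
    PRIMARY KEY (warehouse_id, product_id)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date  TEXT
);
CREATE TABLE order_lines (         -- one line per product on the order
    order_id      INTEGER REFERENCES orders(order_id),
    line_no       INTEGER,
    product_id    INTEGER REFERENCES products(product_id),
    quantity      INTEGER,
    sale_discount REAL DEFAULT 0,  -- discount if the product is on sale
    PRIMARY KEY (order_id, line_no)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```

Nothing fancy, but it gives us something concrete to point at when we start carving out facts and dimensions.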
Our first request from the business for our data mart is that they want to be able to query all of the orders by date, by customer and/or by region (state, city or country). They want to be able to aggregate (sum and average) across those items.
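To make that request concrete, here's what answering it looks like against the raw transactional tables (a stripped-down, hypothetical version of the schema, with made-up sample data): every question means re-joining orders to customers to locations. The star schema we build in part 2 exists precisely to pre-bake these joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stripped-down stand-ins for the transactional tables; names are assumptions.
conn.executescript("""
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT,
                        ship_to_id INTEGER REFERENCES locations(location_id));
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(customer_id),
                     order_date TEXT, order_total REAL);
""")
conn.executemany("INSERT INTO locations VALUES (?,?,?)",
                 [(1, "Austin", "TX"), (2, "Reno", "NV")])
conn.executemany("INSERT INTO customers VALUES (?,?,?)",
                 [(1, "Acme", 1), (2, "Globex", 2)])
conn.executemany("INSERT INTO orders VALUES (?,?,?,?)",
                 [(1, 1, "2009-01-15", 100.0),
                  (2, 1, "2009-01-20", 200.0),
                  (3, 2, "2009-02-01", 50.0)])

# Orders aggregated (sum and average) by region and month -- the kind of
# question the business asked for, answered the hard way with joins.
rows = conn.execute("""
    SELECT l.state,
           strftime('%Y-%m', o.order_date) AS order_month,
           SUM(o.order_total) AS total_sales,
           AVG(o.order_total) AS avg_sale
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN locations l ON l.location_id = c.ship_to_id
    GROUP BY l.state, order_month
    ORDER BY l.state
""").fetchall()
```

With a proper fact table and date/customer/region dimensions, this becomes a simple star join instead of a chain through the OLTP tables.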
In part 2 we will start our star schema which will eventually fulfill that request. We will discuss just what a data mart and star schema are as well as facts, dimensions and conformed data. The entities we create in this first model will be designed so that they can be used and reused in the future.