Data Model Examples to Help You Predict the Future

by DJ SpiessJul 9, 2015

DeegeU Quick Hit

The video “Data model examples to help you predict the future” discusses how it’s important to not delete data when changing state. This video is not part of a series, however, if you would like to see more videos on this topic, please comment below!

Transcript – “Data model examples to help you predict the future”

Hi, my name is DJ Spiess. One common theme I often see in data model examples is a data model design that forces you to update records. Let’s say you have an instrument that collects data for you. It can be online, offline, initializing, collecting, sleeping, waking up, and in error. The status is kept in a field on the instrument record. Everytime the status changes, you update the record. Easy.

Then the instrument goes down, and after a week of being in error someone finally notices. You look at the instrument record, and yep, it’s in error. The question is inevitably why? You’re not sure so the next question is, what was it doing before the error occurred. If you’re writing over your status every time you update it, you can’t answer that. You can’t even tell when it went down.

In this video, we’re going to look at different ways you can track status in your applications. Ultimately what I’d like to show you, is why updating your database is a bad idea in general. There are exceptions, but for most cases you want to just append data. Finally we’ll look at some of the benefits of storing a state history in your data model, including predicting the future with data mining!

Data model examples for status

So in our first data model example we have an instrument with a status. It also has an id, name, manufacturer number, description, and installation location. We just saw that updating the record is really a bad idea. We can’t see a history of the statuses, when it changed status, etc. Unlike the id, name, description, and maunufacturer number, the status is something that is likely to change. In fact, I’d argue the installation location is just like the status. It should not be part of the instrument record. Status and location are things that are likely to change.

Data Model Examples adding status - DeegeU

Data model example of adding status without touching the main entity

One data model pattern I’ve seen for status is to create a status table like this. The status table has the name of the status, the timestamp, the previous state, and a reference back to the entity for which this is a record. This is great because you can perform a query where the instrument id is the entity you want. You can even narrow your search for the last week, or what ever you need. The advantage of this approach is it doesn’t affect your Instrument table. The concept of a status is in a different table.

The downside to this data model example is when you have many things you need to status, and you want to easily add more. You wouldn’t want to duplicate the status table for every type, so you include a second pointer back to the next entity type. This doesn’t scale well, and gets ugly as you get more statusable things to track.

This means you have to move the reference back into your statusable entity. Here’s another approach. Let’s add a status history table id in our instrument table. We’ll make the business rule that the status history is created once when the instrument is created. You can enforce this many ways, but that’s another topic.

Data model examples adding status with list in-between

The advantage of this data model example is when we need to add new statusable entities, we just include a mandatory key reference back to the status history. How we do statuses never changes, and it doesn’t grow as we add more things to status.

The other important thing to note, is this doesn’t just apply for status. When we introduced this problem earlier, we also mentioned we would not like location in our instrument definition. You would break out location in a similar way to how we broke out status. You could even add two new statuses, installed and uninstalled, to work with your location data.

Which data model example is better really depends on how likely you are to add statusable things, and how many. If you’re often add things to your data model or if you have more than 10 statusable things, I’d use the status history. If your list of statusable things is small and unlikely to change, use the single status table. Both patterns can grow to include allowable statuses, status types, etc. Anything to make your data mining easier.

Why delete?

What we’re breaking up is definitional data vs operational data. Things that define our instrument go into the instrument table. Things that define it’s operational state should be tracked in a state table. Every record in your operational table should define a single fact. In the example earlier, we are defining the state of the instrument for a single datetime. Repeats in the data will not change the current status of our instrument. The most recent status is the status of the instrument.

Of course we could do this without keeping the history of the state. We’re really just normalizing the data. Why would we want to keep the history? To predict the future, and any future questions you don’t know you need to answer.

Whenever you update data, you’re really deleting data and then doing a new insert. When you delete data, you lose information, including information you didn’t know you needed. If someone comes to you in the future and says “hey the 90210 instrument type seems sketchy, how often are they going into an error state”, you can’t answer the question unless you’re keeping the history of statuses.

If you only append new data, and never update or delete, this allows you to answer questions you didn’t know you needed to. You could tell your boss, the 90210 instruments seem to fail more than the 80205s. In fact, you could even make a future prediction when 50% will have failed and need to be replaced. That helps you predict required budgets, time needed, and so on. It’s one of the ideas behind big data.

When I work on database modeling, if I come across a point where I need an update – that’s a red flag for me. This doesn’t mean updates are completely evil, it means I need to think about the problem more. Data should not be deleted until triggered by your data retention policy. Keep everything until you’re certain you don’t need it. Even then it’s part of a data deletion process, not your day-to-day business as usual activities when you do delete data.

Hopefully this has given you a few ideas on how to track status in your data. Make sure to subscribe so you can keep up to date with new videos. See you in the next video!



Tools Used

none



Media Credits

All media created and owned by DJ Spiess unless listed below.

Background image from Pixabay.com

Final Notes

If you are interested in a free online programming course for Java, it can be found on the course syllabus page.

Don’t miss another video!

New videos come out every week. Make sure you subscribe!

Comments

comments

DJ Spiess

Your personal instructor

My name is DJ Spiess and I’m a developer with a Masters degree in Computer Science working in Colorado, USA. I primarily work with Java server applications. I started programming as a kid in the 1980s, and I’ve programmed professionally since 1996. My main focus are REST APIs, large-scale data, and mobile development. The last six years I’ve worked on large National Science Foundation projects. You can read more about my development experience on my LinkedIn account.