Normalization is a very important process of designing tables to improve data integrity and reduce data redundancy. We were introduced to the concept of Normalization in our previous tutorials about First Normal Form and Second Normal Form . If you missed those tutorials, you should definitely read up on them first!
In this tutorial, we’re going to learn about the third and final normal form: Third Normal Form. We’ll be introduced to a new set of rules you should follow when designing tables in your database.
These are the key points about Third Normal Form we’ll discuss:
Everything discussed in this tutorial can also be found in the following FREE Ebook :
This FREE Ebook provides an excellent summary of all the rules you need to know when normalizing your tables to First, Second, and Third Normal Form. It will definitely be a great resource to keep and reference throughout your career as a data professional. Make sure you download the guide today!
Let’s start from the top:
The first part to Third Normal Form is easy. We need to first make sure our table is already in Second Normal Form.
I have a full tutorial on Second Normal Form found here:
Basically, Second Normal Form states these two rules:
In that tutorial, we learned about what a dependency is in the first place. You need to understand what a dependency is in order to understand Third Normal Form, too.
So definitely check out the full tutorial on Second Normal Form to learn more. Assuming you got it, we’ll move on to the next (and most important) rule about Third Normal Form.
In order for a table to meet Third Normal Form, it must not contain any transitive dependencies.
Again, we need to understand what a dependency is in the first place. Basically, a dependency means if one column value changes, you must also update another column value for everything to make sense.
In the case of a transitive dependency, it means one non-key column depends on another non-key column.
The easiest way to show you what I mean is to work through an example.
Let’s say we own a veterinarian hospital. We want to create a table to keep track of important information we have for each pet we treat, and also some basic information about their owners. Let’s start with the owners table. We’ll call it PetOwners. Here is the definition:
CREATE TABLE PetOwners ( OwnerID INT PRIMARY KEY IDENTITY(10,10), FirstName VARCHAR(20), LastName VARCHAR(20) )
So we have a PRIMARY KEY column called OwnerID which gives each person a unique ID in our table. We also use the handy IDENTITY property to populate the column automatically on inserts.
Let’s populate the table with a few rows:
INSERT INTO PetOwners (FirstName, LastName) VALUES ('Doug','Bluefield'), ('Todd','Snickers'), ('Margaret','Driver')
Let’s take a look at the table:
Cool. Now let’s create a table to store information we have on the pets we treat. We’ll call it PetInfo:
CREATE TABLE PetInfo ( PetID INT PRIMARY KEY IDENTITY, PetName VARCHAR(20), OwnerID INT, OwnerPhoneNumber CHAR(12) )
Nice, so we have another PRIMARY KEY on the PetID column. It uses the handy IDENTITY property also.
Also notice the OwnerID column. This will contain a reference to the OwnerID in our PetOwners table. We use this column to link back to the PetOwners table to find out who owns the pet.
Let’s populate the table with some information:
INSERT INTO PetInfo (PetName, OwnerID, OwnerPhoneNumber) VALUES ('Barnabus',10,'988-445-0989'), ('Billy',20,'988-322-7334'), ('Rosie',30,'850-341-9234')
Here’s what the data should look like:
We need to talk about the columns in the PetInfo table. Remember how we said a transitive dependency means one non-key column depends on another non-key column. So to understand that, let’s first identity what columns are non-key columns.
That’s going to be all these columns:
Basically it’s all the columns except the primary key column, which in our case is the PetID column.
So in our case, we need to think about if there are any transitive dependencies between PetName, OwnerID, and OwnerPhoneNumber. Does the value of one depend on the value of another?
Well, let’s think about it. What if we messed up, and meant to specify the owner of Billy is actually “Doug Bluefield” (who has an OwnerID of 10).
Right now, the data says the owner of Billy is OwnerID # 20, who is Todd Snickers.
It’s easy enough to change the OwnerID:
UPDATE PetInfo SET OwnerID = 10 WHERE PetID = 2
But think about the OwnerPhoneNumber column…..
We likely need to change this value, too. Our new owner “Doug Bluefield” has a different phone number than the person who was there earlier.
In order for everything to be consistent, the OwnerPhoneNumber also needs to change. The phone number for this record ought to be changed to the correct phone number for Doug Bluefield .
So, we need to do another quick update. Doug’s phone number is ‘ 998-445-0989 ‘:
Great, now the data is accurate….
But see how the phone number depends on the owner? If the owner changes, the phone number will likely need to change, too.
This is why we need to apply the rules of Third Normal Form. It will save us from experiencing update anomalies like what we just saw. An update anomaly is when updating one piece of data causes an inconsistency with another piece of data, usually resulting in another UPDATE statement needing to be ran.
Ok, now that you know what the problem is, let’s talk about how to fix it. When it comes to normalization, you should get comfortable with the idea that you’ll need to move columns around, or maybe create a separate table completely. In this case, what we should do is move some columns around.
A table should really only contain details about one thing. Our PetInfo presently table contains information about two things, right? It has details about the Pet, such as it’s name and it’s mom or dad. But it also contains a detail that is solely about the mom or dad, namely the OwnerPhoneNumber column.
The PetInfo table should be dedicated to storing details about a Pet, while the PetOwners table should be dedicated to storing details about an Owner. The OwnerPhoneNumber is really an attribute of an Owner, so we should really put this column in the PetOwners table.
So, let’s add a new column to the PetOwners table called PhoneNumber:
ALTER TABLE PetOwners ADD PhoneNumber CHAR(12)
Then we’ll populate the new column with the correct data:
UPDATE PetOwners SET PhoneNumber = '988-445-0989' WHERE OwnerID = 10 UPDATE PetOwners SET PhoneNumber = '988-322-7334' WHERE OwnerID = 20 UPDATE PetOwners SET PhoneNumber = '850-341-9234' WHERE OwnerID = 30
Here’s the data in the table:
Then, we can make a change to the PetInfo table. Basically we’ll remove the OwnerPhoneNumber column:
--Remove the OwnerPhoneNumber column from PetInfo ALTER TABLE PetInfo DROP COLUMN OwnerPhoneNumber
If we look at the PetInfo table now, we see the OwnerPhoneNumber column is gone:
This will make things much easier. If we need to update details about an owner, we can do it without effecting details about the pet!
And it’s still perfectly possible to get all the information we have about a pet and it’s owner using a single query. It’s just a very simple INNER JOIN :
Now that we have discussed the third and final normal form, there are two things about normalization I’d like you to know.
I know I just said Third Normal Form is the “final” normal form, but it actually isn’t.
There is Fourth Normal Form and then a Fifth Normal Form. There is even a normal form called Boyce-Codd Normal Form.
Here’s a link that discusses Fourth and Fifth Normal Forms , if you’re curious.
Most of the time, your tables will function just fine if they’re normalized up to Third Normal Form (or even just up to Second Normal Form). The thing about normal forms is that you won’t be breaking any laws if your tables aren’t 100% perfectly normalized. You should normalize your tables to the point where it makes sense for your application. Again, this usually means your tables will be normalized up to Third Normal Form.
Which leads us to our next point.
Let’s think about the PetOwners table:
In looking at this table, while thinking about transitive dependencies, maybe you have one of these thoughts:
Folks, don’t drag yourself in the weeds with these thoughts. I mean, sure, maybe Phone Numbers ought to be in their own separate table, and maybe people’s Names should also be in their own separate table, too. That would certainly normalize things even further.
But folks, you might make things more confusing than they need to be.
Right now, the PetOwners table contains details about only one thing : Pet owners.
A great goal you should try to achieve when normalizing tables is to make sure your tables contain attributes about only one thing.
You should normalize to the point that you and your boss are satisfied. If we have an update anomaly here or there, maybe that’s ok as long as we all agree. Again, you won’t get arrested if your tables aren’t 100% perfectly normalized!
….I don’t think, anyway….
Leave a comment if you found this tutorial helpful!
And don’t forget to download your FREE Ebook :
This FREE Ebook provides an excellent summary of all the rules you need to know when normalizing your tables to First, Second, and Third Normal Form. It will definitely be a great resource to keep and reference throughout your career as a data professional. Make sure you download the guide today!
If you skipped my other tutorials about First and Second Normal Form, shame on you.
You need to understand how those work too in order to properly normalize your tables. As we know, normalization is very important for making sure your tables preserve data integrity and reduce data redundancy.
Check out the tutorials to learn more!:
Thank you very much for reading!
If you have any questions, or if you are struggling with a different topic related to SQL Server, I’d be happy to discuss it. Leave a comment or visit my contact page and send me an email!