Database Normalization is the process of reducing or eliminating data redundancy in a database table as well as reducing the likelihood of data insertion anomalies. There are many articles online that talk about, in theory, how to normalize database tables…This post is intended to serve as a practical application demonstration.
First Thing’s First…
Before we really get started, you need to understand what we are going to be accomplishing here. We will be taking a table of data and putting it into Third Normal Form (3NF). However, before we can get to that 3NF mark, we need to go through first and second normal forms (1NF, 2NF respectively). So what are these forms?
1NF –1NF means your table exist with no duplicative columns (e.g. the table is atomic).
2NF – First, you must have met 1NF, then we go about identifying and separating subsets of data that can be reused (we’ll see more of this later).
3NF – First, 1NF and 2NF must have been met, then you’ll remove all data that is not directly related to the primary key.
The Sample Data
|
Last Name |
First Name |
Address |
City |
State |
Zip |
Credits Attempted |
Points Earned |
GPA |
|
Smith |
Joe |
123 Main St |
Orlando |
FL |
32810 |
36 |
126 |
3.5 |
1NF
This is the easy one! You’ll notice from this table that there are no duplicative data columns.
What’s a duplicative (non-atomic) column? Well, suppose we put in the table values “4.0, 3.8, 3.5” because we were tracking Joe’s GPA over time. The problem here is that adding a GPA to the table requires text parsing to add a value. This may not be much for a small data set, but what happens when a semester ends and 5000 students update 5000 GPAs based on 1-50+ GPA’s. Bad things…that’s what…
OR, let’s suppose that we decide to get around that issue, and we create a fixed number of rows for the number of semesters a student SHOULD be in class. What happens if a student exceeds that threshold by 1 semester? The entire database structure would need to be updated in order to fix the problem.
Anyways, this sample is in 1NF form because the table is atomic (has no duplicative columns). Each row describes a single person, event, object, whatever.
One last thing…We need to create a primary key to keep track of unique data rows, so we’re going to modify the table like so:
|
Student ID (P) |
Last Name |
First Name |
Address |
City |
State |
Zip |
Credits Attempted |
Points Earned |
GPA |
|
1234 |
Smith |
Joe |
123 Main St |
Orlando |
FL |
32810 |
36 |
126 |
3.5 |
2NF
In 2NF, we start looking for subsets of data that we can pull out. In this example, there is one pretty obvious set of data that could be removed: the City, State, Zip. While this seems small, think about the context of this table. If 3000 of the 5000 students are in Orlando, FL 32810…that’s a ton of duplicated data and wasted space. So, lets fix the problem by removing that subset.
Student Table
|
Student ID (P) |
Last Name |
First Name |
Address |
Zip |
Credits Attempted |
Points Earned |
GPA |
|
1234 |
Smith |
Joe |
123 Main St |
32810 |
36 |
126 |
3.5 |
Zipcode Table
|
Zip |
City |
State |
|
32810 |
Orlando |
FL |
In this case, we created a second table called “Zipcode Table”. The Zip field acts as a foreign key to relate the Student record to a Zip code.
This is a very simple example, but it’s not uncommon to find several smaller subsets inside a data table. Pull them all out, and relate them to the primary key with a foreign key, just like we did here.
3NF
We are now at the 3NF step! What we want to do here is ensure that all data in the table is fully dependant on the primary key. In the student table, we have a Primary Key (Student ID). The Student ID relates to a specific student, with a specific address, zip, credit count, and point count…right? Right!
But what about the GPA column? The GPA column is simply a calculation of (Points Earned)/(Credits Attempted). It’s not directly related to the key, it’s related to other values in the row. So, let’s get rid of it!
Student Table
|
Student ID (P) |
Last Name |
First Name |
Address |
Zip |
Credits Attempted |
Points Earned |
|
1234 |
Smith |
Joe |
123 Main St |
32810 |
36 |
126 |
Zipcode Table
|
Zip |
City |
State |
|
32810 |
Orlando |
FL |
BUT WAIT! How do we get our GPA, now!? EASY! When we do our SQL select query, we’ll simply do the following:
Select Student ID, LastName, FirstName, Address, Zip, CreditsAttempted, PointsEarned, PointsEarned / CreditsAttempted AS GPA FROM Students table;
(it should also be noted that in a real situation, all the grade information would be in another separate table(s), related by the StudentID)
Conclusion
So what did we accomplish here, other than ensuring that we can say “Yep! It’s Normalized!” We designed a database that saves on storage space, is broken down to ensure that data anomalies are less likely (e.g. you can’t have two cities for the same zip…we designed that!), and we removed data that didn’t need to be stored in the first place!
If you take some time to analyze all your database requirements like this, the world will be a better place… Well, maybe not…
[...] blog has a very good post about database normalization. Read: practical application of database normalization that will help you understand database normalization in [...]
Buy:Prevacid.Nexium.Petcam (Metacam) Oral Suspension.Mega Hoodia.Actos.Arimidex.Valtrex.Accutane.Zyban.100% Pure Okinawan Coral Calcium.Retin-A.Prednisolone.Synthroid.Zovirax.Human Growth Hormone.Lumigan….