
Is database normalization dead? [closed]


I grew up in the old school - where we learned to design the database schema BEFORE the application's business layer (or used OOAD for everything else). I was pretty good at designing schemas (IMHO :) and normalized only to remove unnecessary redundancy, backing off when normalization started slowing things down. But most of the time it didn't.

With the advent of some ORM frameworks like Ruby's ActiveRecord or ActiveJDBC (and a few others I can't remember, but I'm sure there are many), it seems they prefer to have a surrogate key for every table, even if some already have natural primary keys like "email" - breaking 2NF outright. Okay, I understand not too much, but it gets on my nerves (almost) when some of these ORMs (or developers) don't acknowledge 1-1 or 1-0|1 (i.e. one-to-zero-or-one) relationships. They stipulate that it's just better to have everything in one big table, no matter whether it has a ton of NULLs. "Today's systems can handle it" is the comment I have heard many times.

I agree that memory constraints bore a direct correlation to normalization (there are other benefits too :), but in today's time of cheap memory and quad-core machines, is the concept of DB normalization left only to the texts? As DBAs, do you still practice normalization to 3NF (if not BCNF :)? Does it matter? Is "dirty schema" design good for production systems? How should one make the case for normalization, if it is still relevant?

(Note: I'm not talking about the star/snowflake schemas of data warehouses, which have redundancy as a part/need of the design, but about commercial systems with a backend database, such as StackExchange.)


One reason for normalization is to remove data modification anomalies.
ORMs usually don't support this.

I have many examples of databases designed by Hibernate that violate this principle:

  • bloated (strings repeated over 100 million rows)
  • no look-up tables (see above)
  • no DRI (constraints, keys)
  • varchar clustered indexes
  • Unnecessary link tables (e.g. enforcing 1..0:1 relationships when a nullable FK column would suffice)
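The last bullet - a nullable FK column in place of a link table for a 1..0:1 relationship - can be sketched like this. This is a minimal illustration with invented table names (Employee/ParkingSpace are not from the answer), using SQLite in Python:

```python
import sqlite3

# Hypothetical sketch: a 1..0:1 relationship (each Employee has at most one
# ParkingSpace) modelled with a nullable FK column instead of a link table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ParkingSpace (
    space_id INTEGER PRIMARY KEY,
    location TEXT NOT NULL
);
CREATE TABLE Employee (
    emp_id   INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    -- nullable FK: NULL is the "0" side of 1..0:1;
    -- UNIQUE enforces at most one employee per space.
    space_id INTEGER UNIQUE REFERENCES ParkingSpace(space_id)
);
""")
conn.execute("INSERT INTO ParkingSpace VALUES (1, 'B2-17')")
conn.execute("INSERT INTO Employee VALUES (1, 'Ada', 1)")
conn.execute("INSERT INTO Employee VALUES (2, 'Ben', NULL)")  # no space

rows = conn.execute("""
    SELECT e.name, p.location
    FROM Employee e LEFT JOIN ParkingSpace p USING (space_id)
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # [('Ada', 'B2-17'), ('Ben', None)]
```

No third table is needed, and the UNIQUE constraint plus the FK give the same cardinality guarantee declaratively.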

The worst thing I've seen is a 1TB MySQL database that was perhaps 75-80% bigger than it needed to be.
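A back-of-the-envelope calculation shows how a repeated string across 100 million rows produces bloat on that scale. The numbers below (average string length, cardinality) are purely illustrative assumptions, not figures from the answer:

```python
# Illustrative arithmetic: storing a repeated string inline on every row
# versus an integer FK into a lookup table. All sizes are assumptions.
ROWS = 100_000_000
avg_string_bytes = 40      # assumed average length of the repeated text
int_fk_bytes = 4           # 32-bit surrogate key
distinct_values = 10_000   # assumed cardinality of the repeated column

denormalized = ROWS * avg_string_bytes
normalized = ROWS * int_fk_bytes + distinct_values * (int_fk_bytes + avg_string_bytes)

saving = 1 - normalized / denormalized
print(f"denormalized: {denormalized / 1e9:.1f} GB")  # 4.0 GB
print(f"normalized:   {normalized / 1e9:.1f} GB")    # 0.4 GB
print(f"saving:       {saving:.0%}")                 # 90%
```

Under these assumptions a single repeated column accounts for a ~90% size reduction, which is the right order of magnitude for the 75-80% figure above.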

I would also suggest that the statement "today's systems can handle it" is true only for Mickey Mouse systems. If you scale, today's systems won't.

In my example above, there was no appetite to refactor or change keys or fix data: just complaints about the database's growth rates and the inability to build a meaningful DW on top of it.

It seems they prefer to have a surrogate key for every table, even if some have primary keys like "email" - which completely breaks 2NF.

Surrogate keys don't break 2NF. 2NF says: "If a column depends on only part of a multi-valued key, remove that column into a separate table."
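A small sketch of that rule, with invented table names (OrderLine/Product are not from the answer). In OrderLine the natural key is (order_id, product_id); a product's name depends only on product_id, so 2NF puts it in its own table - and adding a surrogate key changes nothing about that analysis:

```python
import sqlite3

# Hypothetical 2NF illustration: product_name depends on product_id alone
# (part of the natural key), so it lives in Product. The surrogate key
# line_id is harmless; the natural key is still enforced with UNIQUE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    product_id   INTEGER PRIMARY KEY,  -- full determinant of product_name
    product_name TEXT NOT NULL
);
CREATE TABLE OrderLine (
    line_id    INTEGER PRIMARY KEY,    -- surrogate key, irrelevant to 2NF
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES Product(product_id),
    quantity   INTEGER NOT NULL,
    UNIQUE (order_id, product_id)      -- the natural key still holds
);
""")
conn.execute("INSERT INTO Product VALUES (7, 'widget')")
conn.executemany(
    "INSERT INTO OrderLine(order_id, product_id, quantity) VALUES (?,?,?)",
    [(1, 7, 3), (2, 7, 5)])

# 'widget' is stored once, however many order lines reference it.
count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(count)  # 1
```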

They dictate that it's just better to have everything in one big table, regardless of whether it has a ton of NULLs

Multiple columns in a table are fine as long as the normalization rules are followed. Merging tables without analysis is not correct if you want to reap the benefits of SQL and normalization.

I agree that memory constraints bore a direct correlation to normalization. But relational normal forms are a mathematical concept and have nothing to do with memory.

Normalization does not just save memory or disk; it increases integrity. After all, it is a hardware-independent mathematical concept.

Simple example: suppose you maintain school information as follows:

Rec 1: North Ridge High School, California, USA

Rec 2: South Toronto Braves High School, Ontario, Canada

If you ask your system where Ontario is, it can tell you it is in Canada. A few days later you delete the 2nd row, ask the system the same question, and you get nothing. In this example, no matter how much disk, memory, or CPU you have, you will not get an answer.

This is a deletion anomaly, which normalizing the relations prevents.
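The school example above, normalized: the province-to-country fact lives in its own table, so deleting the last Ontario school no longer deletes the knowledge that Ontario is in Canada. A runnable sketch with SQLite (table and column names are illustrative):

```python
import sqlite3

# Normalized version of the school example: Province holds the
# province -> country fact once; School references it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Province (
    province TEXT PRIMARY KEY,
    country  TEXT NOT NULL
);
CREATE TABLE School (
    school_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    province  TEXT NOT NULL REFERENCES Province(province)
);
""")
conn.executemany("INSERT INTO Province VALUES (?,?)",
                 [("California", "USA"), ("Ontario", "Canada")])
conn.executemany("INSERT INTO School(name, province) VALUES (?,?)",
                 [("North Ridge High School", "California"),
                  ("South Toronto Braves High School", "Ontario")])

conn.execute("DELETE FROM School WHERE province = 'Ontario'")  # Rec 2 gone

# "Where is Ontario?" still has an answer after the delete:
row = conn.execute(
    "SELECT country FROM Province WHERE province = 'Ontario'").fetchone()
print(row[0])  # Canada
```

In the single-table version, that DELETE would have destroyed the fact along with the school row.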

Edit: Changed the word "Toronto" to "Ontario" (see comment below).

The more things change, the more they stay the same. There have always been lazy developers who cut corners or just didn't know or didn't want to follow best practices. Much of the time they can get away with it in smaller applications.

It used to be COBOL-inspired data structures shoehorned into early RDBMSs, or the god-awful mess that was dBase. Now it's ORMs and "Code-First". In the end, these are all just ways for people to chase the silver bullet of getting a working system without "wasting" time thinking about what you want and need to do. Being in a hurry has always been a problem and always will be.

For those with the common sense (and luck) to take the time to design properly, the data model is always the most logical place to start. The database stores information about the things (tangible and intangible) that matter to your business. What your business cares about changes much less quickly than how your business operates. This is why your database is generally much more stable than your code.

The database is the rightful foundation of any system, and if you take the time to lay your foundations right, you will inevitably benefit in the long run. This means that normalization is always an important and useful step for any OLTP application.

I agree that storage limits were directly related to normalization ...

Memory limits still play a role. Quantity is not the problem; speed is.

  • CPUs aren't getting faster at the moment (we are getting more cores, not more cycles per second)
  • Modern CPU architectures try to overcome the speed limit by providing separate memory per processor (NUMA).
  • On-die cache sizes are not growing at a pace comparable to main memory.
  • Memory throughput isn't as high as most people expect. QPI is in the range of 25 GB/s.
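One concrete way these limits connect back to normalization: with a fixed database page size, narrower (normalized) rows mean more rows per page, so a scan touches fewer pages and pushes fewer bytes through the memory bus and caches. The sketch below assumes an 8 KB page (as in SQL Server) and invented row widths:

```python
# Rough sketch, illustrative numbers only: page count for a scan of
# 10M rows at two row widths, assuming an 8 KB page and ignoring
# per-page/per-row overhead.
PAGE_BYTES = 8192
ROWS = 10_000_000

def pages_needed(row_bytes: int, rows: int = ROWS) -> int:
    rows_per_page = PAGE_BYTES // row_bytes
    return -(-rows // rows_per_page)  # ceiling division

wide = pages_needed(row_bytes=400)   # e.g. repeated strings stored inline
narrow = pages_needed(row_bytes=60)  # e.g. integer FKs into lookup tables

print(f"wide rows:   {wide:,} pages ({wide * PAGE_BYTES / 1e9:.1f} GB scanned)")
print(f"narrow rows: {narrow:,} pages ({narrow * PAGE_BYTES / 1e9:.1f} GB scanned)")
```

At a fixed ~25 GB/s of memory throughput, scanning roughly 7x fewer bytes is roughly 7x less wall-clock time spent on the bus; that is the "quantity vs speed" point in table-layout terms.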

Some of these reasons were discussed in When should TINYINT be used over INT?, which you might find useful. I would also suggest following @ThomasKejser (blog), formerly of the SQLCAT team, since his posts tend to be at the cutting edge of database performance. His recent post on the effects of CPU caches and memory access patterns and the SQLBits presentation on relational modelling for extreme DW scale are good examples.

In my opinion, it's still all about the balance between normalization and de-normalization. I totally agree that ORM frameworks are just approaches to get things done, but I don't think these frameworks triggered the trend towards de-normalization.

It is still the debate of time efficiency versus space efficiency. When relational database theory emerged, disk storage was expensive, and people obviously didn't want to spend that much money on it; that is why relational databases prevailed at the time.

Nowadays things are very different; storage is very, very cheap, so clearly we can tolerate more redundancy than before. This is also why the BIG_TABLE approach appeared: to gain time efficiency, space efficiency is sacrificed.
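The trade-off described here can be made concrete: a normalized pair of tables needs a join at read time, while the big-table variant answers the same query from one table at the cost of repeating an attribute on every row. A minimal sketch with invented tables (Customer/Orders/BigOrders are illustrative, not from the answer):

```python
import sqlite3

# Time-vs-space sketch: the normalized pair joins at read time; the
# de-normalized BigOrders repeats country on every order row but
# answers the same question without a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (cust_id INTEGER PRIMARY KEY, country TEXT NOT NULL);
CREATE TABLE Orders   (order_id INTEGER PRIMARY KEY, cust_id INTEGER NOT NULL,
                       amount INTEGER NOT NULL);
-- de-normalized "big table": country copied onto every order
CREATE TABLE BigOrders (order_id INTEGER PRIMARY KEY, country TEXT NOT NULL,
                        amount INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO Customer VALUES (?,?)", [(1, "CA"), (2, "US")])
conn.executemany("INSERT INTO Orders VALUES (?,?,?)",
                 [(10, 1, 5), (11, 1, 7), (12, 2, 9)])
conn.executemany("INSERT INTO BigOrders VALUES (?,?,?)",
                 [(10, "CA", 5), (11, "CA", 7), (12, "US", 9)])

joined = conn.execute("""
    SELECT c.country, SUM(o.amount) FROM Orders o
    JOIN Customer c USING (cust_id) GROUP BY c.country ORDER BY c.country
""").fetchall()
flat = conn.execute("""
    SELECT country, SUM(amount) FROM BigOrders GROUP BY country ORDER BY country
""").fetchall()
print(joined, flat)  # the same answer either way
```

The flat query avoids the join (time), but every copied country value is redundant storage and an update anomaly waiting to happen (space and integrity) - which is exactly the balance being weighed.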

But the big-table approach is not the end of the story either; it is still a balance between time and space. With PB-scale data volumes to manage, some developers have started to shift the balance back towards space efficiency, which is why there is work being done to normalize some data in BIG-TABLE-like structures.

In a word, the normalization approach is definitely not dead, but it is definitely neglected compared to the old days.

CJ Date answers your question here - the normalization video (a preview) is free.

The short answer: normalization is the mathematically correct approach. If you don't normalize properly, your data model is just wrong.
