By Donavon Gooldy
Normalization is NOT an engineering exercise. The central principle that underlies its rules is that data redundancy, whether repeating groups of attributes or repeating attribute values, is evidence that more than one function’s descriptive pattern exists within a data set.
The result of normalization is a model in which each attribute, each entity, and each relationship speaks business truth about the things that data is evidence of.
In Logical Entity Relationship (ER) Modeling, as Peter Chen defined it, Normalization isn’t the elimination of data redundancy in a database. There is no database yet. The goal is to semantically identify and describe the exact things that data describes, and the interactive context of those things. If that’s done, there will be no redundancy in the data solution.
Normalization according to the semantics of ER and Knowledge Modeling begins by recognizing that the abstract realm of modelled business subjects is that of human business action. The semantic context of these subjects is as things of doing, whether performers of action, receivers of action, controllers of action, the action itself, places at which action occurs, or things exchanged or utilized in action.
Attributes: The Evidence of Action
The things of action expressed in modeling by entities should be recognized as functions, defined by the actions they perform. In a model, these actions are graphically illustrated by an entity’s relationships. It’s the actions expressed by an entity’s relationships that are the basis of its taxonomy. What it does defines what it is.
The attributes or properties of an entity are evidence of the action performed by its function. Attributes reflect when the action happened, what the action’s status is, how much of something was involved in the action, what the controls of the action are, what the description of the actor who acted is, and so on.
When data pattern redundancy exists, whether repeating group attributes, or repeating attribute values, the redundancy reflects the fact that a function described by those attributes is repeated within the dataset.
Normalization doesn’t just impact the evidence of business action; it reveals the function (entity) that the evidence describes, as well as actions performed (relationships) that the data is evidence of. This is why normalization isn’t “data” normalization; it is function normalization.
Normalization & Action Semantics Reveal Function

Consider the simple First Normal Form example of Figure 1. Eliminating the repeating groups of skill-related columns in Employee on the left, we create an Employee Skill entity on the right. Normalization identifies a new entity, but it’s the predicate semantics of the newly created relationship that reveal the function of that entity.
In our example, the action of Employee Skill is how it supports the parent function of Employee.
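The First Normal Form step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`skill_1`, `skill_1_level`, and so on), the sample rows, and the helper `split_repeating_skills` are hypothetical and are not taken from Figure 1.

```python
# Hypothetical denormalized Employee rows with a repeating group of
# skill columns. Names and values are illustrative, not from Figure 1.
employees_flat = [
    {"employee_id": 1, "name": "Ada",
     "skill_1": "SQL", "skill_1_level": "Expert",
     "skill_2": "Modeling", "skill_2_level": "Intermediate"},
    {"employee_id": 2, "name": "Ben",
     "skill_1": "Underwriting", "skill_1_level": "Expert",
     "skill_2": None, "skill_2_level": None},
]


def split_repeating_skills(rows, max_slots=2):
    """Move the repeating skill columns into their own Employee Skill rows."""
    employee = []
    employee_skill = []
    for row in rows:
        # The parent function keeps only the attributes that describe it.
        employee.append({"employee_id": row["employee_id"], "name": row["name"]})
        for slot in range(1, max_slots + 1):
            skill = row.get(f"skill_{slot}")
            if skill is not None:
                # Each filled slot is evidence of one Employee Skill.
                employee_skill.append({
                    "employee_id": row["employee_id"],
                    "skill": skill,
                    "level": row.get(f"skill_{slot}_level"),
                })
    return employee, employee_skill


employee, employee_skill = split_repeating_skills(employees_flat)
```

The code only separates the evidence. Naming the relationship between the two result sets, and so stating what Employee Skill does for Employee, remains a semantic task that no mechanical split can perform.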
The knowledge representation of the semantics in Figure 1 is expressed in Figure 2 below, unadulterated by constructs demanded by RDBMS design.

A significant knowledge miss in most knowledge models is the absence of reciprocating fact statements: statements of the action performed in response to the action of the other relationship. In our simple example, because an Employee practices an Employee Skill in fulfillment of their duties, that Employee Skill gives value to the labor of the Employee.
Expressing relationship predicates according to an action performed by the entity’s function is key to developing the knowledge necessary to write quality entity definitions. The semantics of a function’s actions, represented by an entity’s relationships, are the true basis of its definition. For instance: An Employee Skill is a specialized proficiency that an Employee is qualified to practice in fulfillment of their duties, which gives value to their labor. An entity’s action defines its function, which means that what it does defines what it is.
For those who argue against complex predicate phrasing in favor of single verbs, consider the information about the unmodeled functions Duty and Labor represented in Figure 2’s predicate phrases. Think of it as “conceptualization” that may indicate a need for expanded knowledge scope.
Refined Understanding Through Action Semantics

Often, a relationship’s action semantics provides necessary insight to correctly name an entity according to its actual function, rather than initial perceptions of it.
Transitive data dependency often obscures the understanding of business functions. It exists when one or more attributes rely on another set of attributes within the same data set.
In Figure 3, attributes with repeating values centering around the Policy Transaction Effective Date attribute are highlighted by the upper callout.
They do so because premium charges for one or more Coverages, which are highlighted by the lower callout, are transactionally grouped by policy change transactions, such as a new Policy Term, a Policy Endorsement (contract amendment), or a Policy Cancelation.
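To make the transitive dependency concrete, here is a minimal sketch that separates the transaction-dependent attributes from the coverage premium charges. All row values, the column names other than the transaction effective date, and the helper `split_transaction_groups` are assumptions made for illustration. They do not reproduce the attributes shown in Figure 3.

```python
# Hypothetical flattened premium rows. The transaction attributes repeat
# for every coverage charged by the same policy change transaction.
premium_rows = [
    {"policy_number": "P-100", "coverage": "Collision", "premium": 400.0,
     "transaction_effective_date": "2024-01-01",
     "transaction_type": "New Policy Term"},
    {"policy_number": "P-100", "coverage": "Liability", "premium": 600.0,
     "transaction_effective_date": "2024-01-01",
     "transaction_type": "New Policy Term"},
    {"policy_number": "P-100", "coverage": "Glass", "premium": 50.0,
     "transaction_effective_date": "2024-07-01",
     "transaction_type": "Policy Endorsement"},
]


def split_transaction_groups(rows):
    """Pull attributes that depend on the transaction into their own rows."""
    groups = {}
    charges = []
    for row in rows:
        key = (row["policy_number"], row["transaction_effective_date"])
        # One row per distinct transaction, however many coverages it charged.
        groups.setdefault(key, {
            "policy_number": row["policy_number"],
            "transaction_effective_date": row["transaction_effective_date"],
            "transaction_type": row["transaction_type"],
        })
        # The charge keeps only its own evidence plus a reference to the group.
        charges.append({
            "policy_number": row["policy_number"],
            "transaction_effective_date": row["transaction_effective_date"],
            "coverage": row["coverage"],
            "premium": row["premium"],
        })
    return list(groups.values()), charges


transaction_groups, coverage_charges = split_transaction_groups(premium_rows)
```

Three charge rows collapse into two transaction groups here. What that extracted grouping should be called depends on the action it performs, which is the question the text takes up next.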

From a business knowledge standpoint, the semantics of this denormalization doesn’t just hide an entity. It eliminates our ability to semantically express important business knowledge about how Premium is charged during a Policy Term.
In Figure 4, the newly created entity is named Policy Period, rather than Policy Term Transaction or Policy Transaction, because the semantics of its true function indicate it to be a thing of time duration, rather than a pure temporal event.
While the Policy Period is created by a premium bearing transaction, premiums charged are prorated for the days in the period between the transaction date and the end of the Policy Term.
Further, reversals of unearned premium are based on the proration period between the time of a coverage cancellation, or coverage risk change, and the end of the Policy Term. For this reason, the new entity has been named Policy Period, since its real definition is: A Policy Period prorates Policy Coverage Premium according to a specific definition of Coverage risk, based on the ratio of a Policy Period’s duration divided by the duration of its Policy Term.
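The proration ratio in that definition can be worked through in a short sketch. It assumes that durations are counted in whole days with the term end date excluded, and that the full-term premium is known. The function name `prorate_premium`, its parameters, and the sample figures are hypothetical; real rating rules vary by carrier and jurisdiction.

```python
from datetime import date


def prorate_premium(full_term_premium, period_start, term_start, term_end):
    """Prorate premium by Policy Period duration over Policy Term duration.

    Assumption: the Policy Period runs from period_start to the end of
    the Policy Term, and both durations are counted in days.
    """
    term_days = (term_end - term_start).days
    if term_days <= 0:
        raise ValueError("Policy Term must have a positive duration")
    if not term_start <= period_start <= term_end:
        raise ValueError("Policy Period must start within the Policy Term")
    period_days = (term_end - period_start).days
    return round(full_term_premium * period_days / term_days, 2)


# A 366-day term (2024 is a leap year) with a change effective at its midpoint.
charged = prorate_premium(
    full_term_premium=1200.00,
    period_start=date(2024, 7, 2),
    term_start=date(2024, 1, 1),
    term_end=date(2025, 1, 1),
)
print(charged)  # 600.0
```

In this example the Policy Period covers 183 of the term's 366 days, so half of the full-term premium is charged. The same ratio, applied to a cancellation date, would give the unearned portion to reverse.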

Figure 5 illustrates the same semantics in knowledge model form. It should be noted that the direction of knowledge graph relationship notation is quite different from that of an ER model. The direction of the relationship in an ER model is based on functional dependence.
In a Knowledge Model, however, the notation direction indicates the direction of the action.
Any knowledge of functional dependence must be conveyed in the relationship predicate. Single verb phrases fail to provide this.
For instance, in our example, “Policy Period prorates premium according to its risk duration, relative to the duration of a Policy Term”. The implication in the predicate is that the function of the Policy Period is dependent on the Policy Term, which is a commitment of protection for a set duration of time.
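One way to picture action-directed notation is as a list of subject, predicate, object facts, where the subject is always the performer of the action. The sketch below is an assumption-laden illustration: the `Fact` type and both helpers are hypothetical, and the fact list contains only predicates paraphrased from the text, not the full content of Figures 2 and 5.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str    # the function performing the action
    predicate: str  # the full action phrase, not a single verb
    obj: str        # the function receiving the action


facts = [
    Fact("Employee", "practices in fulfillment of their duties an", "Employee Skill"),
    Fact("Employee Skill", "gives value to the labor of an", "Employee"),
    Fact("Policy Period",
         "prorates premium according to its risk duration, "
         "relative to the duration of a",
         "Policy Term"),
]


def statement(fact):
    """Render a fact as a readable sentence in the direction of the action."""
    return f"{fact.subject} {fact.predicate} {fact.obj}"


def missing_reciprocals(all_facts):
    """List facts whose object has no fact stated back toward the subject."""
    directed_pairs = {(f.subject, f.obj) for f in all_facts}
    return [f for f in all_facts if (f.obj, f.subject) not in directed_pairs]


for fact in facts:
    print(statement(fact))

unanswered = missing_reciprocals(facts)
```

Because this list states nothing from Policy Term back toward Policy Period, `missing_reciprocals` flags that fact. A check of this kind is only a prompt for the modeler to ask what the reciprocating action is.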
Conclusion
Because most things that are subjects of ER and Knowledge Modeling are things of business action, it is the action they perform that defines them. And the semantics of their actions is the visualization tool used to eliminate obfuscation, generalization, incongruity, and gaps in our knowledge of them.
Normalization is key to establishing the correct taxonomy of business functions.
Most consider Normalization to be the normalization of data, but only because data isn’t recognized as evidence of action. Giving normalized data the context of business action semantics extends and perfects business knowledge. So, Normalization is truly the normalization of business knowledge.

Business Semantics and Enterprise Modeling: A Profile of Donavon Gooldy
Don is a Senior Manager Consultant in Accenture’s Thought Leadership and Expert program.
He advocates that human action controlled by business process is the true context of business data. And that logical modeling’s purpose as originally envisioned by, is to set data according to the functional structure and definition of semantics describing this action.
Over the past 18 years, Don has modeled the enterprise business architectures of clients in the Cellular Telecom, Wealth Management, Retail, Healthcare Insurance, P&C Insurance, and Consumer Goods Manufacturing industries, as well as those of Human Resources and Campaign Management.
He is Principal Modeler/Business Architect and Product Manager for Accenture’s AUDM P&C Insurance Business Model, AUDM Healthcare Payer Business Model and AUDM Human Resource Model products.
He is the author of numerous articles on the discipline of business semantic modeling, which can be found at Donavon Gooldy Articles | LinkedIn.
Don lives in rural southwest Michigan with his wife Ginny, on twenty acres he’s known since childhood.
