By Pete Stiglich
In a dimensional model, how do you handle missing or invalid values in a fact table source record that are used to look into a dimension to find the dim key? Do you just leave that dimension foreign key value null in the fact table? Are there impacts to doing so? I say a definite “Yes”!
One of the main impacts is that you now have to do outer joins whenever joining the fact and dimension. Forcing outer joins significantly complicates joins – the main reason for a dimensional model is to enable simpler queries for business analytics. Forcing outer joins also increases risks – if a developer/user forgets to include the outer join syntax in the SQL, this can lead to an incorrect result set. Of course, as the dimensional model is what most analytics are based on – if queries don’t bring back the correct result set, i.e., number of rows, this can have a tremendous impact on users being able to trust the data/analytics.
Also, when there is a null dim key in the fact foreign key, the users won’t know if its null because the dimension doesn’t apply to that particular row (for example if a return dim key isn’t populated on an
order fact because the order hasn’t been returned), whether no value was received from the source (but a value expected e.g., on the order fact) there should be an order date which would tie to the date dimension, or if a value was received but it was an invalid value (e.g., the order date in the source record had an invalid value of “ABC”). Of course, if the user wants to understand what happened with that fact record, they’d have to reach out to someone in IT to find out…
The alternative I propose is to add placeholder records in the dimensions so that there would NEVER be a null dim key in a fact table, and so no outer joins would be required. The placeholder records would look something like the following. I use 0, -1, and -2 as the dim key for these placeholder records as they more easily indicate a special circumstance and there would need to probably be some special processing to insert these as these wouldn’t come from the source normally – i.e., would probably have 3 insert statements that would be ran manually to insert these (turning off identity insert (SQL Server) or similar so that the dim key isn’t auto generated for these placeholders).

Now, when users submit a query that needs to return a value(s) from the dim – e.g., they want to get a count of returns by RETURN_STATUS_NM – they would be able to quickly see how many returns have a value of Unknown or Invalid Value. They could then investigate in the source system as to why there are returns without a return id or returns with an invalid value, so they could correct them. These placeholders can also be used for data quality reporting in order to be proactive, or could even be used as a constraint in the fact table. For example, if the business determines that there should NEVER be a fact record loaded where the RETURN_KEY is 0 or -2, then a database constraint can be placed on RETURN_KEY in the fact table, which would cause a constraint violation and halt the load when a 0 or -2 is attempted to be inserted into the fact table.
Using dimension placeholder records is a valuable technique for simplifying dimensional queries and for providing more insight into why a fact record doesn’t tie to a “real” dimension record. It can also provide more insight in reporting, e.g., can have a value like “Not Applicable” rather than a blank. It also enables better data quality reporting and can be used as a constraint to prevent invalid records from being loaded into the fact table.

Pete Stiglich: Trusted Expert in Data Architecture & Modeling
Pete Stiglich has over 30 years of data architecture, data management, and analytics experience, most of that time as a consultant in industries such as government, finance, healthcare, insurance, and more. He is an industry thought leader in data architecture and data modeling and has developed and taught many courses on these topics.
Pete enjoys helping clients solve complex data problems leveraging proven approaches such as “Modeling the business before modeling the solution” which provides a benefit to clients that many IT professionals miss.
Join Our Data Community
At Data Principles, we believe in making data powerful and accessible. Get monthly insights, practical advice, and company updates delivered straight to your inbox. Subscribe and be part of the journey!
