{"id":917,"date":"2025-12-22T22:33:28","date_gmt":"2025-12-23T03:33:28","guid":{"rendered":"https:\/\/blog.data-principles.com\/?p=917"},"modified":"2026-01-06T18:20:10","modified_gmt":"2026-01-06T23:20:10","slug":"know-your-keys","status":"publish","type":"post","link":"https:\/\/blog.data-principles.com\/index.php\/2025\/12\/22\/know-your-keys\/","title":{"rendered":"Know your keys!"},"content":{"rendered":"\n<p class=\"has-orange-color has-text-color has-link-color wp-elements-e358becb889ef645eca72e62b736ffc4\"><em>By Pete Stiglich<\/em><\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-52e477bc34a22d7f5bee9b46a896210b\">One question I ask of data architects and data engineers that I\u2019m interviewing is \u201cWhat is the most important thing to know about your data?\u201d&nbsp;It\u2019s a bit of a trick question, but it is valid.&nbsp;I frame the question a bit to give some context.&nbsp;For example, I might say \u201cSuppose you have a spreadsheet called \u201cCustomerSales.xlsx\u201d with one tab and 300 columns (with column headings) and you\u2019ve been tasked with understanding how to integrate that with other data in your data warehouse.&nbsp;What is the most important thing you need to know about that data?\u201d<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-af2fa389189d3e2ca923b02689667be2\">The answer I\u2019m looking for is \u201cThe most important thing I need to know about that data is what its key(s) are\u201d.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-e821805a1df5a16941aa39200c990d92\">If you don\u2019t know the keys (primary <strong><em>AND <\/em><\/strong>natural \/ alternate \/ business), then you really don\u2019t know much about the data \u2013 you\u2019re left making potentially catastrophic assumptions about the meaning and grain \/ level of detail.\u00a0For example, given the name CustomerSales.xlsx you 
would probably assume the grain is sales by customer \u2013 but that\u2019s just an assumption.\u00a0Not knowing the key(s) means that you can make poor design decisions (which will need to get fixed eventually \u2013 of course, it\u2019s much more expensive to fix once your application and data are in production\u2026)\u00a0and let poor data into your system (e.g., duplicates).\u00a0If you don\u2019t know what the key(s) are, how can you know whether you have a unique record or not?\u00a0\u00a0<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-a477b4c212620d78cb65d848d563eadd\">Even if you know, for example, that the key of CustomerSales.xlsx is SaleId (which for the sake of argument we\u2019ll assume is a surrogate key) you still don\u2019t know what the business key is.&nbsp;A surrogate key is a system-generated unique identifier (often a sequential number) with no inherent business meaning in itself.&nbsp;All it tells you is that you have a unique identifier representing a record \u2013 but that same record could still be duplicated (just with a different SaleId).&nbsp;You still need to know the natural \/ business key in order to understand the meaning and grain of the data.&nbsp;For example, is the natural key CustomerId, SaleDate, LocationId, SaleNumber or is it CustomerId, SaleDate, LocationId, SaleNumber, and ProductId \u2013 i.e., is it a header or a line item? Or are there other, perhaps non-obvious, columns that should be part of the natural \/ business key, e.g., VersionStartDt (in which case each row represents a version of a sale record)?&nbsp;&nbsp;You should, whenever possible, enforce unique indexes on the natural \/ business key.&nbsp;Of course, for some large cloud databases or NoSQL platforms unique indexes on an alternate key might not be possible, or the data volumes might require that unique indexes be dropped.&nbsp;If that is the case, you should develop programs to at least occasionally test your data to ensure you don\u2019t have 
dups.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-853064ae65889722a63845edc51aad2c\">Nearly every data set should have at least one unique key \u2013 but I\u2019ve seen cases where having what looks like a duplicate record is actually acceptable e.g., the same customer executes the same transaction multiple times for the exact same dollar amount.&nbsp;This is poor design but might be outside of your control \u2013 there should at least be a unique surrogate key to differentiate the rows.<\/p>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text is-stacked-on-mobile\" style=\"grid-template-columns:33% auto\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"744\" height=\"746\" src=\"https:\/\/blog.data-principles.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-06-02-at-3.39.28-PM.png\" alt=\"\" class=\"wp-image-886 size-full\" srcset=\"https:\/\/blog.data-principles.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-06-02-at-3.39.28-PM.png 744w, https:\/\/blog.data-principles.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-06-02-at-3.39.28-PM-300x300.png 300w, https:\/\/blog.data-principles.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-06-02-at-3.39.28-PM-150x150.png 150w\" sizes=\"auto, (max-width: 744px) 100vw, 744px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p class=\"has-orange-color has-text-color has-link-color wp-elements-70de9c359e2a8475a8854d53e5de18d8\"><strong>Pete Stiglich: Trusted Expert in Data Architecture &amp; Modeling<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-2038d5acb0e29419a92597632a3925da\">Pete has over 30 years of data architecture, data management, and analytics experience, most of that time as a consultant in industries such as government, finance, healthcare, insurance, and more.&nbsp;He is 
an industry thought leader in data architecture and data modeling and has developed and taught many courses on these topics. Pete enjoys helping clients solve complex data problems leveraging proven approaches such as \u201cModeling the business before modeling the solution\u201d which provides a benefit to clients that many IT professionals miss.<br><\/p>\n<\/div><\/div>\n\n\n\n<ul class=\"wp-block-social-links is-layout-flex wp-block-social-links-is-layout-flex\"><li class=\"wp-social-link wp-social-link-linkedin  wp-block-social-link\"><a href=\"https:\/\/www.linkedin.com\/in\/petestiglich\/\" class=\"wp-block-social-link-anchor\"><svg width=\"24\" height=\"24\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M19.7,3H4.3C3.582,3,3,3.582,3,4.3v15.4C3,20.418,3.582,21,4.3,21h15.4c0.718,0,1.3-0.582,1.3-1.3V4.3 C21,3.582,20.418,3,19.7,3z M8.339,18.338H5.667v-8.59h2.672V18.338z M7.004,8.574c-0.857,0-1.549-0.694-1.549-1.548 c0-0.855,0.691-1.548,1.549-1.548c0.854,0,1.547,0.694,1.547,1.548C8.551,7.881,7.858,8.574,7.004,8.574z M18.339,18.338h-2.669 v-4.177c0-0.996-0.017-2.278-1.387-2.278c-1.389,0-1.601,1.086-1.601,2.206v4.249h-2.667v-8.59h2.559v1.174h0.037 c0.356-0.675,1.227-1.387,2.526-1.387c2.703,0,3.203,1.779,3.203,4.092V18.338z\"><\/path><\/svg><span class=\"wp-block-social-link-label screen-reader-text\">LinkedIn<\/span><\/a><\/li>\n\n<li class=\"wp-social-link wp-social-link-mail  wp-block-social-link\"><a href=\"mailto:&#112;&#115;&#116;&#105;g&#108;i&#099;h&#064;da&#116;a&#045;p&#114;i&#110;c&#105;p&#108;&#101;s&#046;co&#109;\" class=\"wp-block-social-link-anchor\"><svg width=\"24\" height=\"24\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" aria-hidden=\"true\" focusable=\"false\"><path 
d=\"M19,5H5c-1.1,0-2,.9-2,2v10c0,1.1.9,2,2,2h14c1.1,0,2-.9,2-2V7c0-1.1-.9-2-2-2zm.5,12c0,.3-.2.5-.5.5H5c-.3,0-.5-.2-.5-.5V9.8l7.5,5.6,7.5-5.6V17zm0-9.1L12,13.6,4.5,7.9V7c0-.3.2-.5.5-.5h14c.3,0,.5.2.5.5v.9z\"><\/path><\/svg><span class=\"wp-block-social-link-label screen-reader-text\">Mail<\/span><\/a><\/li><\/ul>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-text-align-center has-blue-color has-text-color has-link-color wp-elements-4eed7f44c4b19c87b4545dc9bf46a4ba\" style=\"font-size:26px\"><strong><em><br>Join Our Data Community<\/em><\/strong><\/p>\n\n\n\n<p class=\"has-text-align-center has-black-color has-text-color has-link-color wp-elements-9bdac29360d2b62aa9e765a3bc163366\">At Data Principles, we believe in making data powerful and accessible. Get monthly insights, practical advice, and company updates delivered straight to your inbox. Subscribe and be part of the journey!<\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link has-orange-background-color has-background has-text-align-center wp-element-button\" href=\"https:\/\/lp.constantcontactpages.com\/sl\/XIYDUv9\/DataDecisionsPathways\">Subscribe Now<\/a><\/div>\n<\/div>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"946\" height=\"630\" src=\"https:\/\/blog.data-principles.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-06-02-at-6.34.01-PM.png\" alt=\"\" class=\"wp-image-1087\" style=\"width:450px\" srcset=\"https:\/\/blog.data-principles.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-06-02-at-6.34.01-PM.png 946w, https:\/\/blog.data-principles.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-06-02-at-6.34.01-PM-300x200.png 300w, 
https:\/\/blog.data-principles.com\/wp-content\/uploads\/2025\/12\/Screenshot-2025-06-02-at-6.34.01-PM-768x511.png 768w\" sizes=\"auto, (max-width: 946px) 100vw, 946px\" \/><\/figure><\/div>","protected":false},"excerpt":{"rendered":"<p>By Pete Stiglich One question I ask of data architects and data engineers that I\u2019m interviewing is \u201cWhat is the most important thing to know&hellip;<\/p>\n","protected":false},"author":5,"featured_media":930,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,258,260],"tags":[88,86,83,90,85,92,89,87,84,91,82],"class_list":["post-917","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-architecture-modeling","category-data-modeling","category-hot-topic","tag-analytics-engineering","tag-business-keys","tag-data-architecture","tag-data-integration","tag-data-modeling","tag-data-principles","tag-data-quality","tag-enterprise-data","tag-know-your-data","tag-modeling-best-practices","tag-primary-keys"],"_links":{"self":[{"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/posts\/917","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/comments?post=917"}],"version-history":[{"count":12,"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/posts\/917\/revisions"}],"predecessor-version":[{"id":1246,"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/posts\/917\/revisions\/1246"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.data-principles.com\
/index.php\/wp-json\/wp\/v2\/media\/930"}],"wp:attachment":[{"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/media?parent=917"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/categories?post=917"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.data-principles.com\/index.php\/wp-json\/wp\/v2\/tags?post=917"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}