Master Data and Metadata: Why Both Matter
Last week, we looked at the risk side of AI in master data.
AI can help stewards move faster, but it should not govern trusted records by itself. Master data needs proof, review paths, ownership, and clear rules before automation gets anywhere near the golden record.
The next problem usually shows up more quietly. Nobody argues with the hub at first. They just stop using it.
A data team may standardize billing names, merge three supplier records, and fix inactive customer statuses. Then a BI analyst still exports from CRM. Finance still asks where the billing name came from. Procurement still keeps a spreadsheet for supplier parentage because the hub does not explain why three vendor records were merged.
When cleanup removes the old clues people relied on, trust can fall instead of rise.
Master data gives the business shared entities. Metadata gives those entities meaning, history, ownership, and context.
Master Data Gives the Business a Shared Record
A sales order usually carries a customer ID and assumes the record behind that ID is right.
Purchase orders, shipments, margin reports, access workflows, and compliance reports all depend on master data being right. For most teams, that means customer, supplier, product, employee, location, asset, account, and hierarchy data.
These records travel. A supplier record may feed procurement, ERP, risk reporting, tax validation, and a vendor portal before anyone notices a bad status.
A bad value in master data travels farther than a bad value in a local app table. A customer marked inactive in billing but active in CRM can freeze renewals while sales still sees an open account. A product moved from accessory to equipment may change tax handling, freight rules, and how it rolls into margin reports. Procurement may roll three supplier names into one parent while Finance still pays them through separate vendor IDs.
People build work around those records, often long before the MDM team knows how far the records travel.
The record often arrives without its backstory. A record may show the customer name, but not who defined “customer.” It may show the product hierarchy, but not who approved the rollup. It may store a supplier tax ID without showing which source supplied it, when it changed, or why one value survived over another.
That missing backstory is where metadata starts to earn its keep.
Metadata Tells People What the Data Means
Metadata is the set of clues that keep people from guessing.
The textbook definition is “data about data.” It is technically true, but not always helpful.
Inside an actual hub, metadata answers the questions people ask before they trust a field:
- What does this value mean?
- Who owns it?
- Where did it come from?
- Which rule changed it?
- Can I use it for this report or process?
The list looks simple until the first dispute starts.
CustomerStatusCode says active in CRM, inactive in billing, and unknown in the warehouse. Support shows recent tickets. The warehouse shows no orders in two years. Each system may be telling the truth for its own process.
Without answers to those questions, users guess. That can work for a week or two, but it usually ends with every team keeping its own version.
Metadata includes business meaning, technical structure, process history, quality signals, security rules, and usage context. Many organizations already have pieces of this, scattered across documents, catalogs, code comments, tickets, and people’s heads.
| Metadata Type | What It Answers | Example |
|---|---|---|
| Business metadata | What does it mean? | Customer Status means the current commercial relationship. |
| Technical metadata | How is it stored? | CustomerStatusCode is varchar(20), not nullable. |
| Operational metadata | How does it run? | Updated nightly from CRM and ERP. |
| Governance metadata | Who owns it? | Customer domain owner is Sales Ops. |
| Lineage metadata | Where did it come from? | Billing address came from ERP. |
| Quality metadata | Can we trust it? | 98.6 percent complete, 112 open exceptions. |
| Security metadata | Who can use it? | Tax ID is restricted. |
| Usage metadata | Where is it used? | Used by billing, customer portal, and revenue dashboards. |
A field value only gets you so far. People still need enough context to decide whether they can use it.
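One way to keep that context next to the value is to store it as a structured entry per field. The sketch below is only an illustration in Python, not a catalog schema: the class `FieldMetadata`, its attribute names, the `trust_card` helper, and the "internal" classification are assumptions, and the sample values reuse the examples from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class FieldMetadata:
    """Illustrative entry: one group of attributes per metadata type."""
    name: str
    business_definition: str       # business: what does it mean?
    data_type: str                 # technical: how is it stored?
    nullable: bool
    refresh: str                   # operational: how does it run?
    owner: str                     # governance: who owns it?
    sources: list[str]             # lineage: where did it come from?
    completeness_pct: float        # quality: can we trust it?
    open_exceptions: int
    classification: str            # security: who can use it?
    consumers: list[str] = field(default_factory=list)  # usage: where is it used?

    def trust_card(self) -> str:
        """Render the answers a user needs before relying on the field."""
        return "\n".join([
            f"{self.name}: {self.business_definition}",
            f"Owner: {self.owner} | Sources: {', '.join(self.sources)}",
            f"Quality: {self.completeness_pct}% complete, "
            f"{self.open_exceptions} open exceptions",
            f"Classification: {self.classification} | "
            f"Used by: {', '.join(self.consumers)}",
        ])


status = FieldMetadata(
    name="CustomerStatusCode",
    business_definition="Current commercial relationship",
    data_type="varchar(20)",
    nullable=False,
    refresh="nightly",
    owner="Sales Ops",
    sources=["CRM", "ERP"],
    completeness_pct=98.6,
    open_exceptions=112,
    classification="internal",   # assumed value, not from the table
    consumers=["billing", "customer portal", "revenue dashboards"],
)
print(status.trust_card())
```

The same shape fits a spreadsheet row just as well; the point is that every column answers one of the questions users ask before they trust the value.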
Metadata Makes Master Data Easier to Find
When people cannot find the approved customer master, they export something else and keep moving.
The workaround often becomes visible only when a report uses the wrong extract and the numbers no longer match the certified dashboard.
Discoverability sounds basic, which may be why teams skip it. The hub, catalog, and glossary may all exist. On paper, everything is covered.
Then a user searches for “client region” and finds nothing useful.
The field exists in a certified model, but the analyst only sees CustomerSalesTerritory in a warehouse view. Sales searches for account. Finance searches for customer. Support calls the same thing a client. Legal may care about the party. If the catalog only recognizes one label, people miss the approved data and keep using what they already know.
The certified field may sit three clicks deep in the catalog, while the stale CRM export is already bookmarked in someone’s browser.
A better catalog tool will not fix that by itself. The search terms and business definitions have to be designed.
The metadata has to connect business words to real assets. It maps synonyms. It shows which dataset is approved. It tells users who owns the data, how fresh it is, and what it is meant for.
A BI analyst searching for “client region” should be able to find the approved customer territory field even if the physical column name is CustomerSalesTerritory. The catalog should show the business definition, source, steward, certified dataset, and known limits.
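A rough sketch of that synonym mapping, in Python, might look like the following. The synonym table, the catalog entries, the steward value, and the function names are all hypothetical; a real catalog would hold them as managed metadata, not as literals in code.

```python
# Hypothetical synonym map: words users type -> glossary term.
SYNONYMS = {
    "client": "customer",
    "account": "customer",
    "party": "customer",
    "region": "territory",
}

# Hypothetical catalog entries, each tagged with glossary terms.
CATALOG = [
    {"column": "crm_export_region", "terms": {"customer", "territory"},
     "certified": False, "steward": None},
    {"column": "CustomerSalesTerritory", "terms": {"customer", "territory"},
     "certified": True, "steward": "Sales Ops"},
]


def normalize(query):
    """Map each search word to its glossary term, if one is defined."""
    return {SYNONYMS.get(word, word) for word in query.lower().split()}


def search(query):
    """Return entries matching every term, certified datasets first."""
    wanted = normalize(query)
    hits = [entry for entry in CATALOG if wanted <= entry["terms"]]
    return sorted(hits, key=lambda entry: not entry["certified"])


for hit in search("client region"):
    label = "certified" if hit["certified"] else "not certified"
    print(hit["column"], "-", label)
```

Ranking certified assets first is a design choice in this sketch: the stale export still appears, but the approved field is the first thing the analyst sees.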
A catalog can look complete and still fail if users cannot find the field before they build around the wrong one.
Metadata Makes Master Data Traceable
A supplier parent changes, and the next month procurement’s spend report moves millions into a different rollup. Finance asks why last month changed. The integration team checks the current hub record, but the current value only shows where the data landed, not how it got there.
The path matters.
Names, addresses, statuses, and hierarchies all move. Sometimes the source changes. Sometimes a steward overrides the value. Sometimes a survivorship rule gets updated and nobody remembers which records it touched.
Traceability lets the team answer follow-up questions without digging through old tickets, job logs, and message threads.
At minimum, the team should be able to see where the value came from, what changed it, who approved it, when it moved, and which downstream systems received it.
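Those five facts can be captured as one record per change. The Python sketch below assumes an append-only list of change events; the `ChangeEvent` class, its attribute names, and all sample values (record IDs, dates, names) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangeEvent:
    record_id: str
    attribute: str
    old_value: str
    new_value: str
    source: str                     # where the new value came from
    changed_by: str                 # rule or steward that changed it
    approved_by: str                # who approved it
    changed_on: date                # when it moved
    delivered_to: tuple[str, ...]   # downstream systems that received it


def trail(events, record_id, attribute):
    """Return the changes to one attribute of one record, oldest first."""
    matches = [e for e in events
               if e.record_id == record_id and e.attribute == attribute]
    return sorted(matches, key=lambda e: e.changed_on)


# Invented sample data: two changes to a supplier's parent.
events = [
    ChangeEvent("SUP-001", "parent", "Parent B", "Parent C",
                "steward override", "steward review", "procurement lead",
                date(2024, 5, 14), ("ERP", "spend reporting")),
    ChangeEvent("SUP-001", "parent", "", "Parent B",
                "ERP", "survivorship rule", "auto-approved",
                date(2024, 1, 9), ("ERP",)),
]

for e in trail(events, "SUP-001", "parent"):
    print(f"{e.changed_on}: {e.old_value or '(none)'} -> {e.new_value} "
          f"via {e.source}, approved by {e.approved_by}")
```

With a trail like this, the question of why a rollup moved becomes a lookup instead of a search through tickets.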
People treat the hub like business truth, even when the path to that truth is messy.
A supplier parent change hits procurement. A customer merge hits sales when two account teams suddenly see the same parent. Product category changes show up in finance reports. HR status changes can touch access, payroll, and reporting.
The answer “we think it came from the source system” does not help when Finance needs to explain why last month’s numbers changed.
Traceability matters even more during match and merge.
| Field | Source A | Source B | Golden Record Value |
|---|---|---|---|
| Supplier Name | ABC Industrial Supply LLC | A.B.C. Industrial Supplies | ABC Industrial Supply LLC |
| Tax ID | 12-3456789 | NULL | 12-3456789 |
| Address | 100 Main St | 100 Main Street | 100 Main St |
| Status | Active | Inactive | Active |
The golden record shows the final values. Metadata should show how those values were selected.
Source A might win for tax ID. Status may come from a different source. A steward may approve the name after comparing contract records. The address may have passed through a standardization rule.
If the merge later breaks 1099 reporting, payment terms, or vendor portal access, the team needs the trail. That context is what users ask for when the numbers move.
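As a minimal sketch of keeping that trail, the Python below merges the two sources from the table and stores the winning source and rule next to each value. The per-field trust order is an assumption made for the example (the table does not say which rule produced each value), and the function and key names are hypothetical.

```python
SOURCE_A = {"supplier_name": "ABC Industrial Supply LLC",
            "tax_id": "12-3456789",
            "address": "100 Main St",
            "status": "Active"}
SOURCE_B = {"supplier_name": "A.B.C. Industrial Supplies",
            "tax_id": None,
            "address": "100 Main Street",
            "status": "Inactive"}

# Assumed trust order per field; a real hub would hold this as
# governed survivorship metadata.
TRUST_ORDER = {
    "supplier_name": ["A", "B"],
    "tax_id": ["A", "B"],
    "address": ["A", "B"],
    "status": ["A", "B"],
}


def merge(sources, trust_order):
    """Pick the first non-null value per field and record why it won."""
    golden = {}
    for field_name, order in trust_order.items():
        for source_name in order:
            value = sources[source_name].get(field_name)
            if value is not None:
                golden[field_name] = {
                    "value": value,
                    "source": source_name,
                    "rule": "trust order " + " > ".join(order),
                }
                break
    return golden


golden = merge({"A": SOURCE_A, "B": SOURCE_B}, TRUST_ORDER)
for field_name, info in golden.items():
    print(f"{field_name}: {info['value']} "
          f"(source {info['source']}, {info['rule']})")
```

Steward approvals and standardization rules would add further entries to the same provenance record; the sketch covers only source selection.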
Metadata Makes Governance Real
Governance often gets talked about as if it were a meeting structure: councils, policies, working groups, approval boards, and escalation paths.
Some of that has a place, but governance only works when it reaches the data.
A policy that says customer data must have an owner does not help much if the field in question still has no visible owner in the catalog, the hub, or the stewardship workflow.
A sensitive data policy does not protect Tax ID unless the field is tagged and tied to access rules.
A quality standard does not mean much unless critical attributes are labeled, measured, and reviewed.
A governance slide may say the right thing while the actual field still has no owner.
Metadata turns governance into something people can see and use.
| Governance Question | Metadata Needed |
|---|---|
| Who owns this field? | Domain owner and steward |
| Who approved this value? | Approval workflow metadata |
| Which rule applies? | Validation and survivorship metadata |
| Who can access it? | Security classification |
| Is it fit for reporting? | Quality score and usage notes |
| What breaks if it changes? | Lineage and impact metadata |
| Which system wins? | Source trust metadata |
Many MDM efforts stall right here.
They define owners in a slide deck, but not in the catalog. Quality rules live in a document, but not in the workflow. Source trust gets settled in a meeting, then the survivorship logic tells a different story six months later.
Eventually, the same pattern shows up again. Governance exists somewhere, but it never reaches the workbench, the pipeline, or the report.
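The sensitive data example earlier in this section shows what reaching the data can look like: the tag on the field, not the policy document, drives the access decision. The Python sketch below assumes a simple tag-to-clearance lookup; the field tags, role names, and clearance sets are hypothetical.

```python
# Hypothetical classification tags stored as field metadata.
FIELD_CLASSIFICATION = {
    "tax_id": "restricted",
    "supplier_name": "internal",
}

# Hypothetical clearances per role.
ROLE_CLEARANCE = {
    "ap_specialist": {"internal", "restricted"},
    "bi_analyst": {"internal"},
}


def can_read(role, field_name):
    """Allow access only when the field is tagged and the role is cleared."""
    tag = FIELD_CLASSIFICATION.get(field_name)
    if tag is None:
        # Untagged fields are denied, so a missing tag surfaces as a
        # visible gap instead of silent access.
        return False
    return tag in ROLE_CLEARANCE.get(role, set())


print(can_read("ap_specialist", "tax_id"))     # cleared for restricted data
print(can_read("bi_analyst", "tax_id"))        # not cleared
print(can_read("bi_analyst", "remit_to"))      # field has no tag yet
```

Denying untagged fields is one possible default. It is strict, but it makes the missing tag someone's problem on day one.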
Master Data Needs Its Backstory
The master record is only part of the product. People need the explanation around it so they know how to use it without inventing their own rules.
A customer record needs more than a customer ID. It needs a definition of customer. It needs source trust rules for legal name, billing name, display name, address, status, parent account, and consent fields. It needs owners who can settle disputes. It needs lineage that shows which systems feed it and which systems depend on it.
Waiting until the hub is built usually turns metadata into archaeology.
Someone has to reverse engineer definitions, trace feeds, map owners, decode rules, and ask why fields exist. By then, the answers are usually worse because people have forgotten why decisions were made.
Capture the context while the decision is still fresh.
New domains need terms. New sources need trust rules. New consumers need lineage. Steward overrides need reasons. New validation rules need owners and approval history.
No one needs a perfect catalog on day one. The team does need to treat context as part of the product.
Common Failure Patterns
Metadata problems usually show up as trust problems.
One team says the hub is wrong. Another says the field should never have been used for reporting. A third team did not know anyone depended on that value at all.
Plenty of those arguments trace back to missing metadata, not bad values.
| Failure Pattern | What Happens |
|---|---|
| Glossary is separate from MDM | Terms drift away from actual fields. |
| Lineage stops at the table level | Users cannot trace critical attributes. |
| Ownership is undocumented | Issues bounce between teams. |
| Quality rules lack context | Stewards see failures but not business impact. |
| Catalog is tool-owned | Metadata gets stale because the business is not involved. |
| Hub has no usage notes | Consumers do not know which view to trust. |
| Source trust is unclear | Survivorship rules turn into arguments. |
| Sensitive data is not tagged | Access control depends on tribal knowledge. |
Sometimes the glossary says customer, the table says party, and the report says account. The people using the data are left to figure out which label owns the meaning.
So teams solve the problem locally. They add local definitions, local extracts, local fixes, and local rules. A few months later, every team can defend its version.
That is how master data drift starts.
What Metadata Every MDM Domain Needs
Do not start by documenting every field in every system.
Start with the fields people fight over, depend on, or quietly work around.
For most domains, the minimum set should include the business definition, domain owner, steward, source systems, system of entry, and system of record. It should also cover key identifiers, validation rules, security classification, approved uses, downstream consumers, and refresh frequency.
Next, add the layer people need when a value is challenged. That usually means match rules, survivorship rules, lineage paths, quality scores, known limits, open issue counts, and last review dates.
This list will not fit every organization exactly. Adjust it, starting with the questions people ask before they trust the data.
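The two layers above can be written down as a checklist and run against whatever a domain has documented so far. This Python sketch is one possible encoding: the key names are a straightforward rendering of the items listed above, while the function name and the sample entry are hypothetical.

```python
BASELINE = (
    "business_definition", "domain_owner", "steward", "source_systems",
    "system_of_entry", "system_of_record", "key_identifiers",
    "validation_rules", "security_classification", "approved_uses",
    "downstream_consumers", "refresh_frequency",
)
CHALLENGE_LAYER = (
    "match_rules", "survivorship_rules", "lineage_paths", "quality_scores",
    "known_limits", "open_issue_count", "last_review_date",
)


def missing(entry, required):
    """List required items that are absent or empty (a count of 0 is valid)."""
    return [key for key in required if entry.get(key) in (None, "", [])]


# Hypothetical, partially documented supplier domain.
supplier_domain = {
    "business_definition": "A party we buy goods or services from",
    "domain_owner": "Procurement",
    "source_systems": ["ERP", "vendor portal"],
    "security_classification": "internal",
    "open_issue_count": 0,
}

print("Baseline gaps:", missing(supplier_domain, BASELINE))
print("Challenge-layer gaps:", missing(supplier_domain, CHALLENGE_LAYER))
```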
Customer data may need special care around legal name, billing name, display name, tax ID, customer type, sales region, account owner, status, parent account, and consent flags.
Product data may need clearer metadata around item number, product family, category, lifecycle status, hazardous material indicator, unit of measure, sellable flag, stocked flag, and customer-facing description.
Supplier data may need stronger context around tax ID, legal name, remit-to address, payment terms, risk rating, active status, and parent company.
A generic metadata template misses the fields where the business actually gets hurt.
How to Start Without Buying Another Tool
A catalog can store metadata, but it cannot decide what customer status means for billing.
Start with one master data domain. Customer, product, or supplier usually works well. Then answer five questions:
- What are the top 10 fields people depend on?
- What does each field mean?
- Who owns each field?
- Where does each value come from?
- Where is each value used downstream?
Those five answers will expose more than most teams expect.
You will find fields with no owner. You will find definitions that differ by team. Some reports may depend on fields nobody monitors. Source conflicts may be buried inside survivorship logic. Downstream users may exist outside the original design.
That is useful.
You are trying to make the hidden parts visible, not blame the team that inherited the mess.
The first pass can be ugly. That is fine. A useful spreadsheet beats a beautiful catalog nobody maintains.
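Even an ugly spreadsheet can be checked mechanically. The sketch below, in Python, reads a first-pass sheet exported as CSV and reports fields with no owner, no known downstream use, or definitions that differ by team. The column names and sample rows are assumptions for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical first-pass spreadsheet, exported as CSV.
SHEET = """field,team,definition,owner,source,used_by
customer_status,Sales,Current commercial relationship,Sales Ops,CRM,revenue dashboards
customer_status,Billing,Account can be invoiced,,ERP,billing
sales_region,Sales,Assigned sales territory,Sales Ops,CRM,
"""


def review(sheet_text):
    """Return a list of findings from the five-question spreadsheet."""
    definitions = defaultdict(set)
    findings = []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        definitions[row["field"]].add(row["definition"].strip().lower())
        if not row["owner"].strip():
            findings.append(f"{row['field']} ({row['team']}): no owner")
        if not row["used_by"].strip():
            findings.append(
                f"{row['field']} ({row['team']}): no known downstream use")
    for name, distinct in definitions.items():
        if len(distinct) > 1:
            findings.append(f"{name}: {len(distinct)} different definitions")
    return findings


for finding in review(SHEET):
    print(finding)
```

Comparing definitions as lowercased text is a crude test. It catches outright disagreement between teams, not subtle differences in wording.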
Once the first domain is mapped, expand the metadata set. Add quality rules where bad values cause real damage. Add sensitivity tags where access needs control. Add lineage where changes create downstream risk. Add usage notes where people keep misusing the same field.
The catalog can come later.
The discipline matters more than the tool.
Metadata Also Needs Governance
Metadata can become stale too, usually in ways that look harmless at first.
The owner changes roles. The source system moves from one feed to another. A field once used only for reporting starts driving an API. A glossary definition survives long after the business process changed.
Nobody notices until someone trusts the old context.
A steward leaves, the ownership field stays the same, and every issue ticket keeps routing to a person who no longer owns the domain.
At minimum, decide who approves business definitions, who maintains lineage, who reviews sensitivity tags, and who checks metadata quality. Keep it light, but make it real.
Metadata quality matters because stale metadata is worse than missing metadata: it gives people confidence they did not earn.
A stale owner sends issues to the wrong person. With bad lineage, impact analysis starts from a false map. An old definition can sit in the catalog long after the process changed, and the damage may not show up until someone builds a report from it.
If people are going to rely on metadata, someone has to keep it current.
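A light version of that upkeep is a periodic check for overdue reviews and owners who have moved on. The Python sketch below assumes each entry carries an owner and a last review date; the 180-day threshold, the names, and the dates are invented for illustration.

```python
from datetime import date, timedelta


def stale_entries(entries, active_people, today, max_age_days=180):
    """Flag entries with an overdue review or an owner who is no longer active."""
    cutoff = today - timedelta(days=max_age_days)
    findings = []
    for entry in entries:
        if entry["last_reviewed"] < cutoff:
            findings.append((entry["field"], "review overdue"))
        if entry["owner"] not in active_people:
            findings.append((entry["field"], "owner no longer active"))
    return findings


# Invented sample metadata entries.
entries = [
    {"field": "customer_status", "owner": "a.rivera",
     "last_reviewed": date(2023, 3, 1)},
    {"field": "supplier_parent", "owner": "m.chen",
     "last_reviewed": date(2024, 5, 1)},
]
active_people = {"a.rivera"}   # m.chen has left the steward role

for field_name, problem in stale_entries(entries, active_people,
                                         today=date(2024, 6, 30)):
    print(f"{field_name}: {problem}")
```

Passing `today` in explicitly keeps the check repeatable, which makes it easier to run on a schedule and compare results over time.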
Final Thought: The Record Needs a Story
Most MDM programs start because the business wants one place to trust.
Clean records help, but they do not answer every question people ask before using the data.
A customer name might be clean, but Finance still needs to know whether it is the legal name, billing name, display name, or a name standardized by a rule. A product may have a valid category, but supply chain may need to know whether that category drives stocking, shipping, tax, or reporting.
Without that context, the hub becomes one more dataset people argue about.
The record matters because people build work around it. The story behind it matters because it tells them whether they should trust what they are using.


