Validate semantic model relationships

Test that your relationships are set up and work as you expect in your semantic model.

In this series, we provide tips for how to effectively build a semantic model. In the previous article, we described how you create relationships in semantic model. Relationships are one of the most important parts of a semantic model, as they’re the foundation of your business logic.

In this article, we describe how you can validate the relationships in your semantic model to ensure that they’re set up correctly. Validating your relationships helps ensure that you don’t have unexpected results or performance issues in your reports.

NOTE

There are many equally valid ways to build a semantic model that fits your data needs. This series presents one approach, sharing tips and good practices from our experience. See below for the other, published articles in this series:

How to validate relationships

Once you create a relationship, you should test that it works as you expect. This is particularly important when you use atypical properties in the relationship or if you are using the relationship in uncommon scenarios. An example of an uncommon scenario might be inactive relationships with a role-playing dimension, or relationships with a many-to-many cardinality. Once you create your relationships and process your model, you’re ready to validate these new relationships.

NOTE

If you’re developing your semantic model in Tabular Editor by using workspace mode, you need to perform a Calculate Refresh of your model to process the relationship before it can be validated and used. That’s because the relationship is an in-memory object that takes up space in your model, and is calculated at process time. Tabular Editor will warn you if you have any objects that require calculation in your model.

Figure 1: Tabular Editor warns you when you have an object like a relationship that must be calculated in your model.

This calculate refresh occurs automatically when you add a relationship to a local model by using Tabular Editor or Power BI Desktop.

The following diagram depicts examples of issues you encounter with relationships.

Figure 2: Generally, the types of issues you encounter include referential integrity violations, unexpected query results, or slow query evaluation times. Note that for the unexpected query results, the table/matrix total is returning exactly the correct result, because it’s using a calculation with a filter context from a flawed model that’s poorly designed.

When you validate a relationship, you should look for the following:

  • Referential integrity (RI) violations: Missing values on the “from” side (typically the dimension table) which are present on the “to” side (typically the fact table). The most common issue caused by RI violations in a semantic model is that (Blank) values will appear in Power BI reports, which confuse users and reflect poor data quality. RI violations are also commonly thought to cause performance issues in a semantic model, although there’s no concrete, consistent evidence of this; it depends upon your data and your semantic model.
  • Unexpected query results: Relationships that aren’t properly configured might produce results that differ from your expected baselines. This is more likely when you use atypical properties for your relationships. Testing these atypical relationships is particularly important so that you’re sure that these specific relationships don’t negatively impact the perceived quality of your models. To identify these issues, you typically use report visuals or DAX queries.
  • Slow query evaluation times: High cardinality relationships can sometimes result in performance issues. You may need to validate these relationships to see if you need to reshape your data or change your model design. To identify these issues, you can use the performance analyser in Power BI Desktop or DAX Studio.

NOTE

You’ll need to validate relationships again once you create DAX measures and row-level security (RLS) or object-level security (OLS) for your model, since both of these objects often rely upon relationships to function properly.
You can also validate relationships as part of regular, automated testing to check the quality of your model and data.

You can validate relationships by using Power BI Desktop, Tabular Editor, or Microsoft Fabric (Semantic Link in Notebooks).

Figure 3: You can take different approaches to validate relationships, using either Power BI Desktop, Tabular Editor, or Microsoft Fabric. Note that you can also use the VertiPaq Analyzer from DAX Studio.

Generally, your choice of tool depends upon your preferred workflow and needs.

  • Power BI Desktop: You can validate your relationship by using the visual canvas or DAX queries in Power BI Desktop. You should use Power BI Desktop to validate your relationships if it’s the main tool that you use to build your semantic model.
  • Tabular Editor: You can validate your relationship by using the VertiPaq Analyzer or DAX queries in Tabular Editor. You can automate this validation by using certain Best Practice Analyzer rules and C# scripts. You should use Tabular Editor to validate relationships when you are using Tabular Editor as an external tool for Power BI Desktop, or if it’s the main tool you use to build your semantic model. You should also use Tabular Editor to validate your relationships when you need automation, or in more complex scenarios and models.
  • Notebooks: You can validate your relationship by using various features of Semantic Link in notebooks in Microsoft Fabric. Semantic link is unique because it has several built-in functions to analyse and visualize relationships. You can also schedule this notebook validation. However, you can’t (currently) write back to your model, so this analysis is read-only and an extra step external to the model development, itself. It also only works if your model is published to a workspace in a Fabric capacity; you can’t analyse local models open in Power BI Desktop, or model metadata files that aren’t uploaded to a Fabric capacity. You should use notebooks to validate relationships when you are comfortable with working in Python and using notebooks, or you’ll be using your semantic model (and its data or logic) for downstream consumption/analysis in notebooks and ML models.

The following sections describe examples of how you can use each of these tools to validate relationships, with particular focus upon identifying and resolving RI violations.

How to identify relationship RI violations

RI violations are straightforward to objectively identify, while unexpected or slow queries require a baseline or context to test. That’s why in this article, we’ll focus on how you can identify RI violations by finding invalid rows and missing keys.

The following diagram depicts an example of RI violation in a one-to-many relationship between a customer dimension table and a sales fact table.

Figure 4: RI violations are a result of keys present on the “to” side (typically a Fact table) that are missing on the “from” side (typically a Dimension table). There’s a blank row automatically added to the dimension table to handle these cases (which you can’t see) and that’s why unmapped values end up as blank in report visuals.

To identify these RI violations, you can take various approaches and use various tools. To reiterate, the choice of tool depends upon your preference and workflow. What’s most important is that you can identify and resolve the RI violations, where possible.

NOTE

You mainly resolve RI violations to improve the quality of your data and your reports. However, depending upon the business process that the data reflects, the RI violations might represent a real circumstance. For instance, a relationship between ‘Date’[Date] and ‘Sales’[Billing Date] might have an RI violation if the product is shipped, but not yet billed. In this case, ‘Sales’[Shipping Date] could have a value, but ‘Sales’[Billing Date] is blank. The blank reflects a business truth; changing it to a de facto date would thus be incorrect.
This is illustrated in the following image.

Figure 5: The RI violation between ‘Date’[Date] and ‘Sales’[Billing Date] is only a technical problem, as it reflects a business process reality. Resolving it with an arbitrary date like 2099-01-01 would be incorrect and also confusing for the business. However, blanks are also confusing. There’s no cookie-cutter answer, here.

Johnny Winter has an excellent blog discussing this topic in detail over at Greyskull Analytics that we recommend you read: The “Unknown Date” Dilemma.

Validate a relationship by using Power BI Desktop

Figure 6: You can validate relationships by using Power BI visuals or the new Power BI Desktop DAX queries. You can do this either while having the semantic model open in Power BI Desktop, or connecting to the published model that’s in a workspace.

You can use the following approaches to validate relationships in Power BI Desktop.

  1. Power BI Visuals: The easiest way to do this is to use a matrix visual, creating an arbitrary hierarchy of the primary key (on the “From” side; usually the dimension table) and beneath it the foreign key (on the “To” side; usually the fact table). You then add a measure that counts the values of the foreign key (and also potentially the number of rows in the fact table). The measures help you identify the magnitude impact. Expand the matrix and sort it alphabetically ascending, looking to identify any blank rows in the primary key. The presence of these blanks indicates an RI violation, and the lower level of the matrix will reveal which keys are missing (and how many rows are affected).   
  2. DAX Queries: You can achieve the same result as (1) by using a DAX query. The simplest DAX query uses the EXCEPT function to identify keys that are present in VALUES ( ‘Fact’[Foreign Key] ) but not in VALUES ( ‘Dimension’[Primary Key] ). You can export these results to use in investigation and data cleaning steps with your source data.

TIP

If you set up a Power BI visual to count invalid rows of a relationship (or to list the invalid members), you can link these to a data alert in Power BI or a Reflex in Data Activator. This can automatically notify you when new issues arise during the lifecycle of your semantic model.
If you want to visualize relationships in Power BI Desktop, just use the model view. Power BI Desktop automatically saves any new layouts you make.

Validate a relationship by using Tabular Editor

Figure 7: You can validate relationships in Tabular Editor by using the VertiPaq Analyzer, DAX Queries, Best Practice Analyzer (BPA), and C# scripts. Note that you don’t typically use the BPA to validate RI violations, but it’s mentioned here for completion.

You can validate relationships by using Tabular Editor in four ways:

  1. VertiPaq Analyzer: The VertiPaq Analyzer is a helpful tool that will identify and raise any RI violations found in your model. It will provide supplemental information about the violating relationships, and also sample values of the keys that are missing. You can change how many samples are shown from the preferences menu of Tabular Editor. The .vpax results can also be saved as documentation, for later reference.
  2. DAX Queries: You can use DAX queries identically as you would in Power BI Desktop. In Tabular Editor, you can also save these DAX queries to re-use in other models or to automate in C# scripts.
  3. Best Practice Analyzer (BPA): You can define rules in the BPA which identify potential issues with relationships. Custom rules that you define with C# can perform more complicated steps, though the BPA can’t query the model to identify RI violations. Instead, you can use automated C# scripts to flag RI violations and save them as annotations in your relationships, which the BPA can flag as an issue.
  4. C# Scripts: C# Scripts can automate the detection of problematic relationships or RI violations. They can execute DAX queries, but also use functions that retrieve VertiPaq statistics or other metadata to inform downstream steps that can immediately correct these issues or flag the relationship for later investigation and validation (once the source data is corrected).

TIP

You can programmatically validate relationships (and other parts of your semantic model) for any of these methods by using the Tabular Editor CLI. Then, you can use this CLI in Azure Pipelines as part of your CI/CD process to automate testing of your semantic model during its lifecycle.
If you want to visualize relationships in Tabular Editor, create a new model diagram, and add the tables with the relationships you want to see. You can save and re-use these diagrams in your documentation.

NOTE

You can also use DAX Studio to use the VertiPaq Analyzer and execute DAX Queries.

Validate a relationship by using Semantic Link in Fabric Notebooks

Figure 8: You can view, discover, and validate relationships by using Semantic Link in notebooks for one or more semantic models. This only works for models deployed to a workspace on Fabric capacity. Note that this example shows how to view and work with relationships in a notebook, not simply detect RI violations.

TIP

In notebooks, you can also use the fabric_cat_tools library from Michael Kovalsky, which allows you to programmatically analyze semantic models using both the VertiPaq Analyzer and BPA.

See this example notebook from Microsoft for a demonstration of how to use Semantic Link to view and validate relationships in a notebook.

In Fabric Notebooks, you can use Semantic Link to connect to a published semantic model and interact with it programmatically by using Python.

  1. DAX Queries: You can execute DAX Queries within notebooks, taking a similar approach as you would by using Power BI Desktop or Tabular Editor DAX queries.
  2. Visualize relationships: You can create a model-like view of specific relationships (from a relationships dataframe obtained by list_relationships) by using the plot_relationship_metadata function.
  3. sempy.relationships: You can retrieve and validate relationships by using SemPy library. There’s different functions that you can use, depending upon your needs:
    1. list_relationships: Returns the existing relationships in a relationships DataFrame.
    1. find_relationships: Attempts to find potential relationships in a relationships DataFrame with better performance than the autodetection of Power BI Desktop. Note that you can’t yet write these discovered potential relationships back to the model from the notebook.
    1. list_relationship_violations: Returns a DataFrame with a summary of all the RI violations from a relationships dataframe (either of physical relationships in the model found with list_relationships, or potential relationships discovered by find_relationships).

You can scale and schedule notebooks so that they run these validations and analyses over multiple semantic models. This can be helpful to automatically detect data quality issues in deployed models caused by issues with the underlying source data. Conditional logic in scheduled notebooks can also trigger downstream actions or updates in response to invalid relationships, such as modifying a table or triggering another tool to notify a data steward.

TIP

In notebooks, you can read data by using Spark in not just Python, but also SQL, R, and Scala.

WARNING

Using a notebook will consume Fabric capacity units (CUs). Ensure that you take the appropriate steps to optimize your notebook schedules and the duration of your compute sessions to avoid unnecessary cost.

In conclusion

It’s essential that you validate relationships both after you create and process them, as well as once you’ve finished developing your DAX and RLS/OLS. Typically, you want to identify common issues like RI violations (missing keys) or more subtle issues in performance or query results. You can test your relationships by using Power BI Desktop, Tabular Editor, and notebooks in Fabric. What’s most important is that you do these tests and ensure that your relationships behave as you expect before you introduce additional logic on top of them in your model.

2 thoughts on “Validate semantic model relationships”

    1. Morten Lønskov

      Hi Bernat
      Great article! Love the use of C# scripts to automate the development of this kind of audit measures.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top