The Spaceparts Co. Dataset 

A learning resource for Tabular Editor and more. 

The Tabular Editor Enterprise Trainings have recently been released as a free community learning resource. These trainings provide an interactive way to learn about Tabular Editor and data modeling for AAS, SSAS, and Power BI datasets. They include over 25 hours of learning material covering a wide range of topics, taking you from a beginner to an advanced user of Tabular Editor. You also get a feature-complete dataset to practice and apply your knowledge.

The purpose of this article is to introduce the Spaceparts dataset and explain how it'll help you learn Tabular Editor and more. This article also highlights why this dataset is different from other popular sample datasets, like AdventureWorks.

Note

The Spaceparts dataset is intended as a learning resource only. The dataset is the intellectual property of Tabular Editor ApS.
It's intended for personal, non-commercial use, as described in its license terms.

Learning with Sample Data 

Effective learning requires both theoretical knowledge and the application of that knowledge. You apply theoretical knowledge by working through scenarios where you exercise critical thinking to solve realistic problems. In business intelligence, this means you learn best by working with data. Since you typically can't use organizational data for personal learning, you need sample datasets.

However, it can be challenging to find good sample datasets. You can use readily available sample models like AdventureWorks, or create your own from personal sample data.

Unfortunately, these datasets often have issues that limit how useful they are for learning and testing purposes.

  • Low complexity: Real-world datasets are often complex, as they represent specific business rules and processes. Since these sample datasets don’t originate from a complex environment, they’re often over-simplifications that don’t adequately represent real-world business problems or data needs.
  • Too small: High data volume and cardinality are typical challenges you face when working with data. However, sample datasets often provide a limited number of rows with only a few attributes. If a sample dataset does contain millions (or even billions) of rows, it's often because it's been artificially inflated, or because it comes from a real-world scenario that isn't a business. This makes it difficult to test performance-tuning solutions and approaches, for example.
  • Too clean: Real-world datasets often have many data quality issues, exceptions, or anomalies that must be handled. Many sample datasets don’t sufficiently reflect these data challenges, resulting in a misrepresentation of typical data scenarios.
  • Too boring: Many sample datasets just aren't interesting or fun to use, as they have dull attributes and little that you can relate to.


For this reason, we’ve created the Spaceparts dataset. This dataset addresses these challenges by using a ‘business process’ perspective to make the dataset both useful and engaging for your learning:

  • Designed by business logic: From the ground up, this dataset was designed by creating a fictional business, Spaceparts Co. We defined the operating model and business logic of this fictitious company, which manufactures and distributes spaceship parts, drawing on real-world experience in relevant industries. This way, the dataset reflects specific business processes and problems that Spaceparts employees are trying to address with their data.
  • Designed with complexity: The calculation and business logic of Spaceparts is complex, with plenty of exceptions and oddities that you might encounter in a real dataset. For example, their data is expressed in local currency, and they require dynamic currency conversion with not one but two (sometimes three, depending on the region) types of exchange rates.
  • Designed to be a challenge: Spaceparts has a lot of data, with four fact tables that present different challenges, like different levels of granularity, data quality issues, and data volumes of upwards of 15M rows.
  • Designed to be fun: Spaceparts is a company that sells spaceship parts across the universe. The dataset is full of Easter eggs, little jokes, and even fun facts and insights for you to discover.

Some facts about the dataset

The Spaceparts dataset provides information about sales, profitability, and supply chain demand planning. It includes features such as the following:

  • 14 tables: 4 fact tables and 10 dimension tables
  • Currency conversion with multiple exchange rate types
  • Trends that differ by region, customer, and product groups
  • Hierarchies for regions, customers, and products
  • Personally identifiable information for the (fictional) Spaceparts employees
  • Security rules for setting up data security
  • Table schemas or layouts that aren't optimal for an ideal star schema
  • Subtle data quality issues, exceptions, or anomalies to find when performing standard QA/QC

Where can you get it?

The Spaceparts dataset is available to anyone who participates in the Tabular Editor Enterprise Trainings. These trainings introduce you to Tabular Editor so that you can build better data models, faster. In these trainings, you’ll be able to apply your knowledge in business cases that use the Spaceparts dataset. There, you will receive the credentials to access the dataset.

However, you can use this dataset for more than learning Tabular Editor. For example, you can also use it to practice:

  • Writing and optimizing DAX code
  • Transforming data in Power Query
  • Creating a good star schema and semantic model
  • Visualizing data effectively in a Power BI report
  • Any of the preview workloads or tools in Microsoft Fabric

Note

The Spaceparts dataset will also receive regular updates and additional features. Occasionally, there may also be challenges and requirements documents for you to practice data modeling, DAX, and data visualization.

In conclusion

Good sample datasets are necessary to learn data tools, methods, and patterns. However, finding these datasets can be a challenge. The Spaceparts dataset is an example of a dataset you can use to learn not only Tabular Editor but also other tools, like the various workloads in Microsoft Fabric. For more information about the dataset and how you can access it, start the Tabular Editor Enterprise Training course today.
