Star Schema Design in Oracle: Fundamentals

Data Warehousing with Oracle

Data Warehousing with Oracle

What are the design rules for good performance in a star schema in an Oracle database? This blog post series introduces some recommendations for the physical database design. This first post is about constraints and indexes.

For the last few years, I was supporting one of my customers in solving performance problems on their Oracle databases. I have been involved with all kind of applications and sometimes really complex SQL statements. But the very last performance ticket I had to solve was a classical one: Several queries on a star schema (with a very large fact table) used a lot of database resources. When I looked at the schema design of the star schema, I was not surprised about the bad performance. It seems that the developers didn’t know about some typical design recommendations for a star schema in an Oracle database. This inspired me for this blog post. I think there are more people out there that aren’t aware of the basic rules to design a star schema in an Oracle database.

When I started to write down some of these general recommendations, I realized that there quite a lot of options and additional notes to mention. Instead of one long post, I decided to split it into multiple parts. This first blog post is about the fundamental concepts of a star schema and the basic rules for constraints and indexes on dimension and fact tables in Oracle.

Star Schema in a Nutshell

A star schema is the implementation of a dimensional data model in a relational database. It usually consists of a fact table that references multiple dimension tables. The fact table contains numerical measurements (or “facts”) and dimension keys that refer the corresponding dimensions. The dimension tables contain all information that describes the facts, their attributes are used for filtering and grouping the facts.

For this blog post series, I will use an example of a star schema with one fact table and four dimension tables. We want to have an overview of beer deliveries of a craft beer brewery to multiple customers. All customer related descriptions are stored in the dimension table DIM_CUSTOMER. The different beers of the craft beer brewery are defined in dimension table DIM_BEER. Each delivery contains a set of bottles from the bottling of a specific brew batch on a particular day. The bottling information is stored in dimension table DIM_BOTTLING. Finally, we have a dimension table DIM_DATE. This kind of calendar dimension typically exists in each star schema. The fact table FACT_BEER_DELIVERY consists of dimension keys to refer the four dimension table, as well as some measurements, e.g. the number of bottles.

Example of a Star Schema


Figure 1: Star schema for a craft beer brewery

A typical query in a star schema joins the fact table with one or more dimension tables and group the data by descriptive columns of the dimensions. The dimension columns are also used to filter the facts. The measurements are aggregated, usually with a SUM function.

With the following query, we want to find out how many bottles of India Pale Ale were delivered in 2020 to private customers. The facts is grouped by beer name, city of the delivery address and calendar month:


SELECT b.beer_name
, c.delivery_city
, d.calendar_month_desc
, SUM(f.number_of_bottles)
FROM fact_beer_delivery f
JOIN dim_beer b ON (b.beer_id = f.beer_id)
JOIN dim_customer c ON (c.customer_id = f.customer_id)
JOIN dim_date d ON (d.date_id = f.delivery_date)
WHERE d.calendar_year = 2020
AND c.private_person = 'Y'
AND b.style = 'India Pale Ale'
GROUP BY
b.beer_name
, c.delivery_city
, d.calendar_month_desc

For this example star schema, we now want to have a look how the physical database design in an Oracle database should be implemented. First, let’s have a look at constraints and indexes.

Physical Database Design

An appropriate logical design of the data model for the star schema is the prerequisite for a good data mart. However, another important task is the physical design of the database objects. In addition to the dimension and fact tables, we need to take care of constraints and indexes on the tables. Other features like partitioning, materialized views, etc., I will focus on the next blog posts.

Constraints

Database constraints ensure data integrity within the tables and the validity of the relationships between the tables. Additionally, they are useful for documentation of the data model and can be used by the optimizer to in some cases to improve query performance. However, constraints can have a negative effect in load performance. The different types of constraints and how they are used in a data warehouse environment are described here.

Indexes

For each primary key and unique constraint on the dimension tables, a unique index is created automatically. The purpose of these indexes is to guarantee uniqueness of the constraints. But what additional indexes are required in a star schema?

Indexes are useful for queries with a strong selectivity, i.e. when only a small percentage of the data is selected from a table. This is usually not the case in data warehouses. Typical queries on a star schema read many facts and aggregate them by several dimension columns. So, does it make sense at all to create indexes on a star schema? Here some general rules:

A star transformation will take place when we have WHERE conditions on two or more dimension tables. The selectivity for each of the filter conditions is too weak to use an index, but the combination returns only a small percentage of the data from the fact table. In our example above, let’s say we sold many bottles of India Pale Ale (IPA) in 2020, and also many beers to private customers, but only a small percentage of the IPAs were sold to private customers. With a star transformation, the optimizer will combine the bitmap indexes for the foreign keys on DIM_BEER and DIM_CUSTOMER to find all valid combinations for the query. With the small result set of this BITMAP AND operation, the corresponding rows from the fact table can be selected efficiently. The principle of a Star Transformation is to combine two or more bitmap indexes on the fact table to derive the facts that satisfy the filter conditions on all dimensions. In our example and in figure 2, this is shown for two dimensions, but the same principle works also for three or more dimension tables. A more detailed description of how Star Transformation works, you can find in the Oracle blog post Optimizer Transformations: Star Transformation by Sunil Chakkappen.

Star Transformation

Figure 2: Principle of star transformation (with two dimensions)

Note: To allow star transformations in your Oracle database, you have to enable it by setting the configuration parameter star_transformation_enabled to TRUE.

Some developers prefer to use B-tree indexes instead of bitmap indexes for dimension keys with many distinct values. But this is an exceptional case. Even though it is often claimed otherwise, bitmap indexes also work fine for a high number of distinct values. I use them for all dimension keys in fact tables, independent on the number of distinct values.

Another reason for B-tree indexes is when you use Standard Edition instead of Enterprise Edition. In Oracle Database Standard Edition, bitmap indexes are not available, but star transformation is supported anyway. In this case, you can create B-tree indexes on the fact table. The star transformation also works with B-tree indexes.

Summary

A short summary of the basic rules for the physical database design of star schemas in Oracle:

Advanced Oracle Features

This blog post introduced only the basic rules for the design of a star schema. Oracle provides many additional features that are helpful for star schemas. In my next blog posts, I will provide additional information about Partitioning, Materialized Views and Query Rewrite, Oracle Database In-Memory and how relevant all these rules are for Autonomous Data Warehouses. As soon as these posts are online, I will add the links here.