Normalizing Your Redux State

September 24th, 2019

The popularity of single-page applications built with technologies like React and Redux has made the job of front-end developers more complex than ever. Where we used to rely on the server to handle all of our data for us, we are now also managing data on the client side. Because of this, there are some concepts that have become important to know when handling your data. One of those concepts is normalization.

What is normalization?

The following is the Wikipedia definition of data normalization:

Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity.

The key terms we are concerned with here are data redundancy and data integrity. Reducing data redundancy means we aren't storing the same information multiple times in our system. This is important because it helps us with the second point, data integrity. When our data is stored only once, it is easier to make sure our application is getting the most accurate and up-to-date version of that data.

For example, suppose we have an e-commerce store that can issue invoices and also take orders. Our data might look something like this.

// Order
{
    id: "1",
    customer: {
        name: "David",
        address: "123 S Figueroa Ave"
    },
    items: [...]
}

// Invoice
{
    id: "1",
    customer: {
        name: "David",
        address: "123 S Figueroa Ave"
    },
    items: [...]
}

Now imagine that the customer changes their address. The application has to make sure the correct address is used across all of its data. In this format, we have to keep track of everywhere the customer data is used and update it in multiple places. If we normalize the data, it would look something like this.

// Order
{
    id: "1",
    customer: "1",
    items: [...]
}

// Invoice
{
    id: "1",
    customer: "1",
    items: [...]
}

// Customers
{
    "1": {
        name: "David",
        address: "123 S Figueroa Ave"   
    }
}

Now the orders and invoices reference a separate entity: the customer data. When the customer updates their address, we only have to worry about updating this single source of truth.
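
To make the benefit concrete, here is a minimal sketch of what an address change now involves, assuming the customers data above is held in a customers object and using a made-up new address.

// One update, in one place; every order and invoice that references
// customer "1" automatically reads the new address.
customers["1"] = {
    ...customers["1"],
    address: "456 W Olympic Blvd"
}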

When software engineers talk about normalization, they will often talk about the level of normalization achieved, which is called the normal form. I won't go in-depth on this topic as it's outside the scope of this article, but generally speaking, normal forms are a system for determining the level of normalization you have achieved in your data by grading it against a set of criteria.

Normalizing with normalizr

The most popular library for achieving normalization on the client side is normalizr. Normalizr simply takes an input of data that is not normalized and normalizes it. It does this by using a schema, pre-defined by you, to parse the data and extract entities that can be referenced. If we continue the example from above, our schema would look like this.

// schema.js

import { schema } from 'normalizr'

const customer = new schema.Entity('customers')
export const orderSchema = { customer: customer }
export const invoiceSchema = { customer: customer }

We have defined a customer entity. The entity is data we want to extract to reduce redundancy, since it is used by both orders and invoices. Next, we create schemas that tell normalizr where that entity lives in the input data.
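
As an aside, the items arrays in the original data could be extracted the same way using normalizr's array shorthand. A sketch might look like this; item is a hypothetical entity, and the rest of this article keeps the simpler customer-only schema.

const item = new schema.Entity('items')

const orderSchema = {
    customer: customer,
    items: [item]   // array shorthand: a list of item entities
}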

Let's use this configuration we've created to normalize our data using normalizr.

// entities.js

import { normalize } from 'normalizr'
import { orderSchema } from './schema'

const orderData = {
    id: "1",
    customer: {
        id: "1",
        name: "David",
        address: "123 S Figueroa"
    }
}

const normalizedData = normalize(orderData, orderSchema)

The output for normalizedData would look like this.

{
    result: {
        id: "1",
        customer: "1"
    },
    entities: {
        customers: {
            "1": { id: "1", name: "David", address: "123 S Figueroa" }
        }
    }
}

The result is our normalized order, and the entities are the pieces of data that have been extracted and replaced with a reference. By default, the reference is the id key of the data, though this can be changed in your schema.
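
For example, if customers were identified by a customerId field instead of id, you could point the entity at that field with normalizr's idAttribute option; customerId here is a hypothetical field name.

// References and entity keys will now come from customerId instead of id
const customer = new schema.Entity('customers', {}, { idAttribute: 'customerId' })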

Using normalizr with Redux

Now that we know what normalization is and how it is achieved using normalizr, let's go over how to implement it with Redux. For a detailed example, check out the normalizr examples on GitHub. For now, I'll run through the main part where normalization happens and explain what's going on. This code snippet is from the normalizr examples linked above.

import * as Repo from './repos';
import { commit } from '../../api/schema';
import { ADD_ENTITIES, addEntities } from '../actions';
import { denormalize, normalize } from '../../../../../src'; // resolves to normalizr inside the example repo; in your own app, import from 'normalizr'

export const STATE_KEY = 'commits';

export default function reducer(state = {}, action) {
  switch (action.type) {
    case ADD_ENTITIES:
      return {
        ...state,
        ...action.payload.commits
      };

    default:
      return state;
  }
}

export const getCommits = ({ page = 0 } = {}) => (dispatch, getState, { api, schema }) => {
  const state = getState();
  const owner = Repo.selectOwner(state);
  const repo = Repo.selectRepo(state);
  return api.repos
    .getCommits({
      owner,
      repo
    })
    .then((response) => {
      const data = normalize(response, [schema.commit]);
      dispatch(addEntities(data.entities));
      return response;
    })
    .catch((error) => {
      console.error(error);
    });
};

export const selectHydrated = (state, id) => denormalize(id, commit, state);

Here we have the Redux functionality for GitHub commits. It has been broken out into its own module containing the reducer, an action creator, and a selector. It uses the redux-thunk middleware so that action creators can return functions when handling async logic.
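
You may have noticed that the thunk receives api and schema as a third argument. That comes from redux-thunk's withExtraArgument helper when the store is created. A minimal sketch of that wiring might look like this; the api module and root reducer paths are assumptions, not part of the example above.

// store.js

import { createStore, applyMiddleware } from 'redux'
import thunk from 'redux-thunk'
import * as api from './api'            // hypothetical API client wrapping the GitHub endpoints
import * as schema from './api/schema'
import rootReducer from './redux/modules'

// Inject api and schema so every thunk receives them as a third argument
const store = createStore(
  rootReducer,
  applyMiddleware(thunk.withExtraArgument({ api, schema }))
)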

If we take a look inside the getCommits action creator, we'll see these lines of code.

const data = normalize(response, [schema.commit]);
dispatch(addEntities(data.entities));
return response;

The action creator makes a call to the GitHub API and, once the Promise resolves, normalizes the response data using the commit schema. You'll notice that the schema is wrapped in brackets. That's because we're expecting an array of multiple commits. Normalizr returns an object with the keys result and entities. In this example, we're only concerned with the entities, so only the data under that key is passed to addEntities.
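
For a response containing several commits, data would have roughly this shape; the exact entity keys depend on how the commit schema is defined, so treat this as illustrative.

{
  result: [ /* the commits' ids, in response order */ ],
  entities: {
    commits: { /* each commit keyed by its id */ }
    // ...plus any nested entities (such as users) defined by the commit schema
  }
}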

If we look at the reducer, we can see that it handles the action type ADD_ENTITIES. In the normalizr example, every module handles this action type in its own similar way. For commits, since the entity used by the schema is configured to output under the key commits, this reducer looks for commits in the payload and merges them with the commits already in the state.

case ADD_ENTITIES:
  return {
    ...state,
    ...action.payload.commits
  };
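
The ADD_ENTITIES constant and addEntities action creator imported at the top of the module are shared by every entity module. In the normalizr example they are roughly this simple.

// actions.js

export const ADD_ENTITIES = 'ADD_ENTITIES'

// Wraps the entities produced by normalize() in an action any module can handle
export const addEntities = (entities) => ({
  type: ADD_ENTITIES,
  payload: entities
})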

You've now got normalized data in your Redux state. Since a commit is composed of multiple normalized entities, you can use a selector to gather the different pieces of data and denormalize them.

export const selectHydrated = (state, id) => denormalize(id, commit, state);

The function is passed the id of the commit, the schema for a commit, and the Redux state. What is returned is a GitHub commit that has been put back together in the shape it was received in from the GitHub API. It does this by looking inside the state for the entity key commits, finding the entry for the id that was passed, and using the commit schema to determine which entities were normalized.
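
As a quick usage sketch, anywhere you have access to the state you can hydrate a single commit by its id; the commitId prop below is hypothetical.

import { selectHydrated } from './redux/modules/commits'

// e.g. in a react-redux mapStateToProps
const mapStateToProps = (state, ownProps) => ({
  commit: selectHydrated(state, ownProps.commitId)
})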

Normalization can add a hefty amount of complexity and boilerplate to a project, as does Redux itself. However, when it comes to managing applications with large amounts of data on the client side, normalization adds a level of sanity that is imperative to the architecture.