SQL or NoSQL? Why not use both (with PostgreSQL)?

2022-11-24

It's a tough decision for any developer starting a new project. Should you store your data in a standard, time-tested SQL database, or go with one of the newer NoSQL document-based databases? This seemingly simple decision can make or break your project down the line. Choose correctly and structure your data well, and you may sail smoothly into production and watch your app take off. Make the wrong choice and you could be headed for nightmares (and maybe even some major re-writes) before your app ever makes it out the door.

Simplicity vs Power

There are tradeoffs with both SQL and NoSQL solutions. Typically it's easier to get started with NoSQL data structures, especially when the data is complex or hierarchical. You can just take a JSON data object from your front-end code and throw it in the database and be done with it. But later when you need to access that data to answer some basic business questions, it's much more difficult. A SQL solution makes it easier to gather data and draw conclusions down the line. Let's look at an example:

Each day I track the food I eat, along with the number of calories in each item:

Day     Food Item      Calories   Meal
01 Jan  Apple          72         Breakfast
01 Jan  Oatmeal        146        Breakfast
01 Jan  Sandwich       445        Lunch
01 Jan  Chips          280        Lunch
01 Jan  Cookie         108        Lunch
01 Jan  Mixed Nuts     175        Snack
01 Jan  Pasta/Sauce    380        Dinner
01 Jan  Garlic Bread   200        Dinner
01 Jan  Broccoli       32         Dinner

I also track the number of cups of water I drink and when I drink them:

Day     Time   Cups
Jan 01  08:15  1
Jan 01  09:31  1
Jan 01  10:42  2
Jan 01  12:07  2
Jan 01  14:58  1
Jan 01  17:15  1
Jan 01  18:40  1
Jan 01  19:05  1

And finally, I track my exercise:

Day     Time   Duration   Exercise
Jan 01  11:02  0.5        Walking
Jan 02  09:44  0.75       Bicycling
Jan 02  17:00  0.25       Walking

For each day I also track my current weight along with any notes for the day:

Day     Weight  Notes
Jan 01  172.6   This new diet is awesome!
Jan 14  170.2   Not sure all this is worth it.
Jan 22  169.8   Jogged past a McDonald's today. It was hard.
Feb 01  168.0   I feel better, but sure miss all that greasy food.

Gathering All That Data

That's a lot of different data that needs to be gathered, stored, retrieved, and later analyzed. It's organized simply and easily, but the number of records varies from day to day. On any given day I may have zero or more entries for food, water, and exercise, and I may have zero or one entry for weight & notes.

In my app, I gather all the data for a single day on one page, to make it easier for my users. So, I get a JSON object for each day that looks like this:

{ "date": "2022-01-01", 
  "weight": 172.6, 
  "notes": "This new diet is awesome!", 
  "food": [
      { "title": "Apple", "calories": 72, "meal": "Breakfast"},
      { "title": "Oatmeal", "calories": 146, "meal": "Breakfast"},
      { "title": "Sandwich", "calories": 445, "meal": "Lunch"},
      { "title": "Chips", "calories": 280, "meal": "Lunch"},
      { "title": "Cookie", "calories": 108, "meal": "Lunch"},
      { "title": "Mixed Nuts", "calories": 175, "meal": "Snack"},
      { "title": "Pasta/Sauce", "calories": 380, "meal": "Dinner"},
      { "title": "Garlic Bread", "calories": 200, "meal": "Dinner"},
      { "title": "Broccoli", "calories": 32, "meal": "Dinner"}
  ],
  "water": [
      {"time": "08:15", "qty": 1},
      {"time": "09:31", "qty": 1},
      {"time": "10:42", "qty": 2},
      {"time": "10:42", "qty": 2},
      {"time": "12:07", "qty": 1},
      {"time": "14:58", "qty": 1},
      {"time": "17:15", "qty": 1},
      {"time": "18:40", "qty": 1},
      {"time": "19:05", "qty": 1}
  ],
  "exercise": [
      {"time": "11:02", "duration": 0.5, "type": "Walking"}
   ]
}

Saving the Data

Once we've gathered all the data for a day, we need to store it in our database. In a NoSQL database, this is a pretty easy process: we create a record (document) for a specific user and date, throw that document into a collection, and we're done. With SQL, we have some structure we have to work within, and in this case that looks like 4 separate tables: food, water, exercise, and notes. We'd want to do 4 separate inserts here, one for each table. If we don't have data for a specific table (say no exercise was recorded today) then we can skip that table.
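
For reference, those 4 tables might look something like this. This is only a sketch: the exact table and column names are my own assumption, chosen so the food_log table lines up with the one used in the queries later in this post.

CREATE TABLE food_log (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    user_id uuid NOT NULL,
    date date,
    title text,
    calories integer,
    meal text
);
CREATE TABLE water_log (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    user_id uuid NOT NULL,
    date date,
    time time,
    qty integer
);
CREATE TABLE exercise_log (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    user_id uuid NOT NULL,
    date date,
    time time,
    duration numeric,
    type text
);
CREATE TABLE daily_notes (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    user_id uuid NOT NULL,
    date date,
    weight numeric,
    notes text
);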

If you're using SQL to store this data, you might want to save each table's data as it's entered in your data entry form (rather than waiting until all the data is entered). Or you might want to create a database function that takes all the JSON data, parses it, and writes it to all the related tables in a single transaction. There are a lot of ways to handle this, but suffice it to say: it's a bit more complicated than saving the data in a NoSQL database.
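
If we went the database-function route, a rough sketch might look like the following. This is only an illustration that assumes the normalized tables sketched above; the function name and signature are made up for this example.

-- Sketch: take the day's JSON document and write all four normalized tables.
-- Because a function body runs in a single transaction, either every row for
-- the day is saved or none of them are.
create or replace function save_day(p_user_id uuid, p_day jsonb)
returns void
language plpgsql
as $$
begin
  -- one row of basic daily data
  insert into daily_notes (user_id, date, weight, notes)
  values (p_user_id, (p_day->>'date')::date,
          (p_day->>'weight')::numeric, p_day->>'notes');

  -- one row per food item
  insert into food_log (user_id, date, title, calories, meal)
  select p_user_id, (p_day->>'date')::date,
         f->>'title', (f->>'calories')::integer, f->>'meal'
  from jsonb_array_elements(p_day->'food') as f;

  -- one row per water entry
  insert into water_log (user_id, date, time, qty)
  select p_user_id, (p_day->>'date')::date,
         (w->>'time')::time, (w->>'qty')::integer
  from jsonb_array_elements(p_day->'water') as w;

  -- one row per exercise session
  insert into exercise_log (user_id, date, time, duration, type)
  select p_user_id, (p_day->>'date')::date,
         (e->>'time')::time, (e->>'duration')::numeric, e->>'type'
  from jsonb_array_elements(p_day->'exercise') as e;
end;
$$;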

Retrieving the Data

If we want to display all the data for a single day, it's pretty much the same story. With NoSQL you can grab the document for the user's day and use it directly in your application. Nice! With SQL we need to query 4 tables to get all the data (or we could gather it all in a single call). Of course, when displaying the data, we'd need to break our JSON document into the pieces needed by each section of our dashboard screen, and you could argue that it's simpler to map each SQL table to its own dashboard section, but that's a trivial point.
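
Even with 4 normalized tables, PostgreSQL's JSON functions can reassemble the whole day into a single JSON object in one query. Here's a sketch, again assuming the illustrative tables from above:

-- Rebuild the day's document from the normalized tables in one query (a sketch)
select jsonb_build_object(
  'date', n.date,
  'weight', n.weight,
  'notes', n.notes,
  'food', (select coalesce(jsonb_agg(jsonb_build_object(
             'title', f.title, 'calories', f.calories, 'meal', f.meal)), '[]'::jsonb)
           from food_log f
           where f.user_id = n.user_id and f.date = n.date),
  'water', (select coalesce(jsonb_agg(jsonb_build_object(
             'time', w.time, 'qty', w.qty)), '[]'::jsonb)
            from water_log w
            where w.user_id = n.user_id and w.date = n.date),
  'exercise', (select coalesce(jsonb_agg(jsonb_build_object(
             'time', e.time, 'duration', e.duration, 'type', e.type)), '[]'::jsonb)
               from exercise_log e
               where e.user_id = n.user_id and e.date = n.date)
) as day
from daily_notes n
where n.user_id = 'xyz'
  and n.date = '2022-01-01';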

Analyzing the Data

Now that we've saved the data and we can retrieve it and display it, let's use it for some analysis. Let's display a graph of how many total calories I've eaten over the past month. With SQL, this is a simple task:

select 
   date, sum(calories) as total_calories 
from food_log 
where user_id = 'xyz' 
   and date between '2022-01-01' and '2022-01-31' 
group by date 
order by date;

Bam! Done! Now we can send those results to our graphing library and make a nice pretty picture of my eating habits.

But if we've stored this data in NoSQL, it gets a little more complicated. We'll need to:

  • grab all the data for the user for the month
  • parse each day's data to get the food log information
  • loop through each day and total the calories
  • send the aggregate data to our graphing module

If this is something we're going to do regularly, it makes sense to calculate the total calories for each day and store it in the day's document so we can get at that data faster. But that requires more work up front, and we still need to pull the data for each day and parse out that calorie total first. And if we update the data we still need to recalculate things and update that total. Eventually we'll want to do that with the water and exercise totals as well. The code will eventually start to get longer and more complex.

SQL and NoSQL Together - FTW

Let's see how we can use the power of SQL together with the ease-of-use of NoSQL in the same database to make this all a bit easier. We'll create a calendar table that holds one record per user per day, storing the basic fields such as weight and notes as regular columns. Then we'll just throw the food_log, water_log, and exercise_log data into JSONB fields.

CREATE TABLE calendar (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    date date,
    user_id uuid NOT NULL,
    weight numeric,
    notes text,
    food_log jsonb,
    water_log jsonb,
    exercise_log jsonb
);
-- (Optional) - create a foreign key relationship for the user_id field 
ALTER TABLE ONLY calendar
    ADD CONSTRAINT calendar_user_id_fkey FOREIGN KEY (user_id) REFERENCES auth.users(id);
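-- (Optional) - since the queries below always filter by user_id and date,
-- an index on those two columns keeps those lookups fast as the table grows
-- (the index name here is just a suggestion)
CREATE INDEX calendar_user_id_date_idx ON calendar (user_id, date);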

Now let's insert some data into the table. PostgreSQL offers both JSON and JSONB fields, and since the latter is more optimized by the database and much faster for query processing, we'll almost always want to use JSONB. We'll use JSONB fields for food_log, water_log, and exercise_log and just dump the data we got from our app right into those fields as a string:

insert into calendar (date, user_id, weight, notes, food_log, water_log, exercise_log)
values (
   '2022-01-01', 
   'xyz', 
   172.6, 
   'This new diet is awesome!',
   '[
      { "title": "Apple", "calories": 72, "meal": "Breakfast"},
      { "title": "Oatmeal", "calories": 146, "meal": "Breakfast"},
      { "title": "Sandwich", "calories": 445, "meal": "Lunch"},
      { "title": "Chips", "calories": 280, "meal": "Lunch"},
      { "title": "Cookie", "calories": 108, "meal": "Lunch"},
      { "title": "Mixed Nuts", "calories": 175, "meal": "Snack"},
      { "title": "Pasta/Sauce", "calories": 380, "meal": "Dinner"},
      { "title": "Garlic Bread", "calories": 200, "meal": "Dinner"},
      { "title": "Broccoli", "calories": 32, "meal": "Dinner"}
     ]',
   '[
      {"time": "08:15", "qty": 1},
      {"time": "09:31", "qty": 1},
      {"time": "10:42", "qty": 2},
      {"time": "10:42", "qty": 2},
      {"time": "12:07", "qty": 1},
      {"time": "14:58", "qty": 1},
      {"time": "17:15", "qty": 1},
      {"time": "18:40", "qty": 1},
      {"time": "19:05", "qty": 1}
    ]',
   '[
      {"time": "11:02", "duration": 0.5, "type": "Walking"}
    ]'
);

While that's a big insert statement, it sure beats doing inserts on 4 separate tables. With all those food entries and water log entries, we would have had to make 1 entry in the main table, then 9 food_log entries, 8 water_log entries, and 1 exercise_log entry, for a total of 19 database records. We've wrapped all that into a single record.

But How Do We Query This Data?

Great, we're collecting the data now, and it's easy to insert the data into the database. Editing the data isn't too bad either because we're just downloading the data to the client, updating the JSON field(s) as needed, and throwing them back into the database. Not too hard. But how can I query this data? What about that task from before? Let's display a graph of how many total calories I've eaten over the past month.
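
As a side note, small edits don't always need a full round trip. PostgreSQL's JSONB operators let us modify the stored document in place; for example, appending one more (made-up) snack to an existing day's food_log with the || concatenation operator:

-- Append a snack to an existing day's food_log without downloading the row first
update calendar
set food_log = food_log || '[{"title": "Granola Bar", "calories": 120, "meal": "Snack"}]'::jsonb
where user_id = 'xyz'
  and date = '2022-01-01';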

In this case, that data is stored inside the food_log field in the calendar table. If only PostgreSQL had a way of converting JSONB arrays into individual database records (recordsets). Well, it does! The jsonb_array_elements function will do this for us, allowing us to create a simple table we can use to calculate our caloric intake.

Here's some SQL to turn that food_log array into individual output records:

select 
  user_id,
  date,
  jsonb_array_elements(food_log)->>'title' as title,
  jsonb_array_elements(food_log)->'calories' as calories,
  jsonb_array_elements(food_log)->'meal' as meal
from calendar 
where user_id = 'xyz' 
   and date between 
   '2022-01-01' and '2022-01-31';

This returns a table that looks like this:

date        title         calories   meal
2022-01-01  Apple         72         Breakfast
2022-01-01  Oatmeal       146        Breakfast
2022-01-01  Sandwich      445        Lunch
2022-01-01  Chips         280        Lunch
2022-01-01  Cookie        108        Lunch
2022-01-01  Mixed Nuts    175        Snack
2022-01-01  Pasta/Sauce   380        Dinner
2022-01-01  Garlic Bread  200        Dinner
2022-01-01  Broccoli      32         Dinner

A couple things to note:

  • jsonb_array_elements(food_log)->>'title' as title this returns a text field, since the ->> operator returns TEXT
  • jsonb_array_elements(food_log)->'calories' as calories this returns a JSON object, since the -> operator return JSON

If we want to sum the calories to get some totals, we can't work with a JSONB value, so we need to cast it to something more useful, like an integer:

  • (jsonb_array_elements(food_log)->'calories')::integer as calories returns an integer

But we can't just wrap the sum function around this to get the total calories by day. If we try this:

select 
  date,
  sum((jsonb_array_elements(food_log)->'calories')::integer) as total_calories
from calendar where user_id = 'xyz' 
      and date between '2022-01-01' and '2022-01-31'
group by date;

we get an error back from PostgreSQL: Failed to run sql query: aggregate function calls cannot contain set-returning function calls.

Instead, we need to think of this as a set of building blocks, where our first SQL statement returns a table:

select 
  date,
  (jsonb_array_elements(food_log)->'calories')::integer as calories
from calendar where user_id = 'xyz' 
      and date between 
      '2022-01-01' and '2022-01-31';

Now we can take that "table" statement, wrap it in a common table expression (a WITH clause), and query it:

with data as
   (
      select 
        date,
        (jsonb_array_elements(food_log)->'calories')::integer as calories
      from calendar 
      where user_id = 'xyz' 
         and date between 
         '2022-01-01' and '2022-01-31'
   ) 
select date, sum(calories) from 
   data group by date;

This gives us exactly what we want:

date        sum
2022-01-01  1838

If we add more data for the rest of the days of the month, we'll have all the data we need for a beautiful graph.
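
As an aside, the same aggregation can also be written by expanding the JSONB array in the from clause with a lateral join, which lets us aggregate directly without the CTE. It's just an equivalent formulation, a matter of taste:

-- Alternative: expand the array with a lateral join instead of in the select list
select c.date, sum((item->>'calories')::integer) as total_calories
from calendar c
cross join lateral jsonb_array_elements(c.food_log) as item
where c.user_id = 'xyz'
  and c.date between '2022-01-01' and '2022-01-31'
group by c.date
order by c.date;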

Searching the Data

What if we want to answer the question: How many calories were in the garlic bread I ate last month? This data is stored inside the food_log field in the calendar table. We can use the same type of query we used before to "flatten" the food_log data so we can search it.

To get every item I ate during the month of January, we can use:

select 
  date,
  jsonb_array_elements(food_log)->>'title' as title,
  (jsonb_array_elements(food_log)->'calories')::integer as calories
from calendar 
where user_id = 'xyz' 
   and date between 
   '2022-01-01' and '2022-01-31'

Now, to search for the garlic bread, we can wrap this query in a CTE to make a "table" and then search it for the item we want:

with my_food as 
(
   select 
     date,
     jsonb_array_elements(food_log)->>'title' as title,
     (jsonb_array_elements(food_log)->'calories')::integer as calories
   from calendar 
   where user_id = 'xyz' 
   and date between 
   '2022-01-01' and '2022-01-31'
) 
select title, calories 
from my_food 
where title = 'Garlic Bread';

which gives us:

title         calories
Garlic Bread  200
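
One more optional trick: the JSONB containment operator @> can answer "on which days did I eat garlic bread?" without flattening the array at all, and it can take advantage of a GIN index on the food_log column if we ever add one:

-- Find days whose food_log contains an entry matching the given fragment
select date
from calendar
where user_id = 'xyz'
  and food_log @> '[{"title": "Garlic Bread"}]';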

Conclusion

If we take a little time to study the JSON Functions and Operators that PostgreSQL offers, we can turn Postgres into an easy-to-use NoSQL database that still retains all the power of SQL. This gives us a super easy way to store our complex JSON data coming from our application code in our database. Then we can use powerful SQL capabilities to analyze and present that data in our application. It's the best of both worlds!
