Looking for the right database technology to use? Luckily there are many database technologies to choose from, including relational databases (MySQL, Postgres), NoSQL (MongoDB), columnar databases (Amazon Redshift, BigQuery), and others. Each choice has its own pros and cons, but today let’s walk through how columnar databases are unique, by comparing them against the more traditional row-oriented database (e.g., MySQL). In this article, we'll cover what columnar databases are and give the advantages and disadvantages of columnar databases.

Table of Contents

Row-Oriented Database

Traditional databases store data by each row. The fields for each record are sequentially stored. Let’s say you have a table like this:

| ID | name         | address              | zip code | phone       | city       | country   | age |
|  1 | Benny Smith  | 23 Workhaven Lane    | 52683    | 14033335568 | Lethbridge | Canada    |  43 |
|  2 | Keith Page   | 1411 Lillydale Drive | 18529    | 16172235589 | Woodridge  | Australia |  26 |
|  3 | John Doe     | 1936 Paper Blvd.     | 92512    | 14082384788 | Santa Clara| USA       |  33 | 

This two-dimensional table would be stored in a row-oriented database like this:

1,Benny Smith,23 Workhaven Lane,52683,14033335568,Lethbridge,Canada,43;2,Keith Page,1411 Lillydale Drive,18529,16172235589,Woodridge,Australia,26;3,John Doe,1936 Paper Blvd.,92512,14082384788,Santa Clara,USA,33;

As you can see, a record’s fields are stored one by one, then the next record’s fields are stored, then the next, and on and on…

What is a Columnar Database?

Contrast the above with how a columnar database would store this data:

1,2,3;Benny Smith,Keith Page,John Doe;23 Workhaven Lane,1411 Lillydale Drive,1936 Paper Blvd.;52683,18529,92512;14033335578,16172235589,14082384788;Lethbridge,Woodridge,Santa Clara;Canada,Australia,USA;43,26,33;

Each field is stored by the column so that each ‘id’ is stored, then the ‘name’ column, then the ‘zip codes’, etc. So what implications are there when storing data in a column-oriented fashion?

The Advantages and Disadvantages of Columnar Databases

The primary benefit you can get by storing data in a column-oriented database is that some of your queries could become really fast. Imagine, for example, that you wanted to know the average age of all of your users. Instead of looking up the age for each record row by row (row-oriented database), you can simply jump to the area where the “age” data is stored and read just the data you need. So when querying, columnar storage lets you skip over all the non-relevant data very quickly.

Row-oriented database:

1,Benny Smith,23 Workhaven Lane,52683,14033335568,Lethbridge,Canada,43;2,Keith Page,1411 Lillydale Drive,18529,16172235589,Woodridge,Australia,26;3,John Doe,1936 Paper Blvd.,92512,14082384788,Santa Clara,USA,33;

Columnar database:

(skip) 43,26,33;

Hence, aggregation queries (queries where you only need to lookup subsets of your total data) could become really fast compared to row-oriented databases. Further, since the data type for each column is similar, you get better compression when running compression algorithms on each column (which would make queries even faster). And this is accentuated as your dataset becomes larger and larger. You might be thinking, “Well, what if I needed to query multiple values for each user? Aggregation is great, but my app needs to show data for each individual user??” You would be spot on. There are many cases where you actually do need multiple fields from each row. And columnar databases are generally not great for these types of queries. The more fields you need to read per record, the fewer benefits you get from storing in a column-oriented fashion. In fact, if your queries are for looking up user-specific values only, row-oriented databases usually perform those queries much faster.

Another thing to consider about columnar storage is that writing new data could take more time. If you’re inserting a new record into a row-oriented database, you can simply write that in one operation. But if you’re inserting a new record to a columnar database, you need to write to each column one by one. As a result, loading new data or updating many values in a columnar database could take much more time (perhaps, more than you expect). That’s why you would usually want a row-oriented database like MySQL running the back-end of your web app, etc. And once your app becomes huge, you would also want to consider having a columnar database like Amazon Redshift to run your BI (business intelligence) analytics queries (which usually consist of aggregation queries). We’ve seen many companies that make mobile games or web apps go through this same transition.


Pros of a Columnar Database

  • Queries that involve only Row-Oriented Database columns
  • Aggregation queries against vast amounts of data
  • Column-wise compression

Cons of a Columnar Database

  • Incremental data loading
  • Online Transaction Processing (OLTP) usage
  • Queries against only a few rows

In this post, we very quickly skimmed through the differences between columnar and row-oriented databases, but for those of you that want to dive in further, visit these resources that have in-depth explanations into columnar databases:

How Integrate.io Can Help

To integrate the transactional functions of your RDS, MySQL, or PostgreSQL databases with the analytical performance of Amazon Redshift, Integrate.io provides continuous, near real-time replication between your various endpoints. Integrate.io is an extremely intuitive yet powerful, cost-effective way to integrate your transactional databases and analytical data stores in a single interface with no manual scripting.

You can start a 14-day Free Trial and begin syncing your data within minutes. For questions about Integrate.io and how we can help accelerate your use-case and journey on Amazon Redshift or Snowflake, connect with us here.

Image Source