RDBMS ,NoSQL ; When to use what !

arjun dhar
5 min readAug 13, 2022

With advancements in NoSQL; and being the relatively newer entrant a lot of people make RDBMS sound like slow sloths that are outdated and for anything fast or the need of the hour being unstructured or diverse data we need to use NoSQL.

The underlying thinking/subtle message tending to RDBMS is for small data and when you want to scale NoSQL is better is utterly wrong way to put it.

I’ve used RDBMS and NoSQL over many many years. So here goes…

Firstly “Diversity of data” is subjective to the design of Schema. If your data is DIVERSE but known you can plan for it in advance. The issue is when you try to dump Heterogeneous data (like documents) into cells then its bad but that's a result of not being able to plan the schema.

In theory and reality any NoSQL can be mapped to a schema based design.

And here is the thing, any experienced Data Engineer will tell you whether you plan for it upfront or later; you need to clean and partition your data.

In fact data cleaning becomes an ETL process with NoSQL, while with an RDBMS you can mitigate a lot of the risks upfront.

One of the reasons why NoSQL are more attractive is development and agility for systems that do not understand their data completely. Schema design requires a complete understanding of data now and future. Thats a hard and time consuming process.

Another reason is where one does not desire ACID and trades it off for read performance or as per CAP theorem trade off consistency for availability or partition tolerance.

Many applications for POC , Web or data assimilation do not have visibility into what the data will look like 6 months from now, so naturally designing a schema and an entire software around it is a waste of time. However, if you are an enterprise or product application and your mission critical pieces future rest on it, then RDBMS provide a efficient and stable alternative.

In both cases there is diversity of data and scale involved, yet the decisions are motivated by other factors.

The cost is simply in terms of time required to design the Schema. That becomes a bottleneck for reasons explained above already.

Time Series, data is another example where High volumes of STRUCTURED DATA flow in. Typically even NoSQL DBs like MongoDB 5.x that now support time series; omit the fields. Well you can only omit the fields for efficient storage if you know which data is tagged to which field. YES! its in a way schema based.

From MongoDB:

Time series collections efficiently store time series data. In time series collections, writes are organized so that data from the same source is stored alongside other data points from a similar point in time.

Another very important consideration is ACID and reliability. Transactional data that requires ACID; you cant ignore ACID for Diversity or Volume. While some NoSQL DBs like MongoDB do claim ACID compliance its not their forte and comes at a cost. It’s still not a way you want to go if you really want ACID. A database like PostgreSQL is more suited. PostgreSQL also has great support for Spatial querying.

Before I conclude I’d also like to mention personal experiences working with large organizational data, where multiple organizations had to make the decision to move from NoSQL to RDBMS or OLAP* DBs that do require a strong schema representation. The unstructured data was too much to manage and too inefficient for analytics. Worse was the migration of NoSQL data to structured requiring custom ETL workflows that had to perform well on top of that.

  • Critique: It is important to note OLAP DBs may or may not be RDBMs (OLTP) also; however they fall in the paradigm of structured data and conceptually are aligned when dealing with the diversity of data. I also use the word “may” fluidly, to indicate several OLTP providers layer OLAP capabilities on their traditional OLTP offerings. They however differ in the purpose of storage and OLAP DBs tend to be more de normalized; that consideration is moot for the current conversation.

NoSQL does not eliminate joins, it just discourages their use

A lot of beginners think that NoSQL does not use joins hence is faster. Well, its true that it encourages to avoid using joins as it supports nested objects and hence de-serialize the data into a single document. However, this does not mean that there is no need or use for joins. And if there is, then you are in a similar or worse situation than in an RDBMS. Some RDBMS also support nested tables.

Scalability

I’ve seen a lot of content tout NoSQL as readily scalable and suited to horizontal scaling, while RDBMS (SQL) are suited for vertical scaling. This is rubbish.

  1. Firstly, NoSQL and SQL are languages and not indicative of implementation behind it. E.g. Hive uses a limited set of SQL and is sitting over HBase or HDFS; none of which are RDBMS. HDFS scales magnificently but yet in its default state not necessary suited for OLTP or real time queries.
  2. SQL or NoSQL implementations, all need to do a lot to horizontally scale. Sharding, Replication (master-master, master-slave). The pain and headache to implement varies from implementation to implementation; however what is common is that this is no trivial. Even if the database boasts about horizontal scale; understanding the nuances of how it happens, where it fits with CAP, how to deal with synchronization failures are things you need to deal in all databases.

Conclusion

While NoSQL and big data have a strong correlation; this is not due to “Diversity of Data” or “Scale” as primary factors but rather a correlated consequence; due to :

  1. Typically end of the data journey, where we are aggregating data for READ purpose more than write or transaction keeping. You also have no need for ACID.
  2. There is an inherent need for de-normalization. Specially, when dealing with data across multiple data sources. Anyway a Join is not recommended across two or more databases.
  3. ETL is a consideration; and we wan to ingest the data first and then work on it (ELT) rather than transform it before loading.
  4. CAP applies to all databases; Most RDBMS are `CA` and thats its relation to ACID. However, don’t read too much into something being ACID Vs BASE or what aspect of CAP. E.g. MongoDB has some level of transaction support over and above BASE. A lot of improvements and enhancements in databases are causing crosspollination of concepts.

--

--

arjun dhar

Software development enthusiast since I was 8 yrs old. Love communicating on anything regarding innovation, community development … ∞