MongoDB vs DynamoDB: How do you choose between them? Whether you are a two-person team bootstrapping a proof of concept or an established company battling high throughput and heavy load, this post can serve as a guidepost in your decision process. Before going into the details, a brief history lesson on how these technologies emerged is pertinent: you should understand the conditions these systems were designed for, and how they operate in the wild, before making an informed choice.
For more information on Integrate.io's native MongoDB connector, visit our Integration page.
Table of Contents
- The Emergence of NoSQL
- CAP Theorem
- DynamoDB vs MongoDB: 7 Critical Differences
- How to Decide Between the Two
The Emergence of NoSQL
Before the era of Big Data, relational database management systems (RDBMS) were king. The relational model pairs well with traditional client-server business applications that inherently operate on structured data. Classical relational databases follow the ACID properties: a database transaction must be Atomic, Consistent, Isolated, and Durable. In a nutshell, this guarantees consistency; every modification transfers the database from one consistent state to another.
However, many of these systems could not cost-effectively scale with massive volumes of unstructured data, and engineering teams began looking for alternatives. NoSQL ("Not Only SQL") came to the fore with technologies such as MapReduce, Bigtable, Cassandra, MongoDB, and DynamoDB. The real advantage of NoSQL is horizontal scaling (often via "sharding"), meaning that one scales by adding more machines to the pool of resources: each row is stored independently, allowing even distribution across the nodes in a cluster. This is opposed to vertical scaling, where one increases the size and computing power of a single instance or node without increasing the number of nodes or instances.
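To make the sharding idea concrete, here is a minimal Python sketch of hash-based row placement (the function name and key format are invented for illustration): each row's key is hashed to pick a node, so rows spread roughly evenly across the cluster, and adding machines adds capacity.

```python
import hashlib

def shard_for(key: str, num_nodes: int) -> int:
    """Map a row key to a node by hashing, so rows spread evenly
    across the cluster without any central coordinator."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

# With 4 nodes, 1000 keys land on each node in roughly equal numbers:
nodes = [shard_for(f"user-{i}", 4) for i in range(1000)]
counts = [nodes.count(n) for n in range(4)]
```

Real systems use more elaborate schemes (consistent hashing, range partitioning) so that adding a node does not reshuffle every key, but the principle is the same.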
CAP Theorem
In the year 2000, Dr. Eric Brewer gave a keynote speech at the Principles of Distributed Computing conference called "Towards Robust Distributed Systems". Here he posed the CAP theorem, which states that a distributed (i.e., scalable) system cannot guarantee Consistency, Availability, and Partition tolerance in unison; it can assure only two of the three:
- AP: Highly available and partition tolerant, but not consistent.
- CP: Consistent and partition tolerant, but not highly available.
- CA: Highly available and consistent, but not partition tolerant.
RDBMS systems are mainly characterized as CA systems. There is no partition tolerance, and therefore they are usually implemented as a single node, resulting in expensive vertical scaling.
If a NoSQL distributed database chooses availability over consistency (it is an AP system), it cannot provide ACID transactions. Instead, systems like this typically offer a set of properties known as BASE (Basically Available, Soft state, Eventual consistency) which provides a weaker degree of reliability for transactions.
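The BASE trade-off is easier to see in a toy model. The sketch below (class and method names are invented for illustration, not any real driver API) shows why an eventually consistent read can return stale data until replication catches up:

```python
class EventuallyConsistentStore:
    """Toy model of BASE: writes land on the primary immediately,
    while replicas only catch up when replication runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        self.primary[key] = value          # acknowledged before replication

    def replicate(self):
        self.replica.update(self.primary)  # async catch-up, modeled as a manual step

    def read(self, key, consistent=False):
        store = self.primary if consistent else self.replica
        return store.get(key)

store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read("balance")              # replica hasn't caught up yet
store.replicate()
fresh = store.read("balance")              # converges after replication
```

Real systems replicate asynchronously in the background; the manual `replicate()` step simply makes the window of staleness visible.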
Daniel J. Abadi of Yale University wrote a paper called "Consistency Tradeoffs in Modern Distributed Database System Design", which outlined some of the shortcomings of CAP and proposed the PACELC theorem, covering the scenario where there is no partitioning (i.e., when the network is healthy). The acronym means: if we suffer network partitioning (P), we have to choose between availability (A) and consistency (C); else (E), we have to choose between latency (L) and consistency (C). PAC is CAP backward, and ELC is the extension.
It is worth mentioning the emergence of NewSQL relational database management systems, which aim to match the elastic scalability and performance of NoSQL systems for OLTP (Online Transaction Processing) while giving RDBMS-level ACID compliance for transactions. VoltDB is a good example: a CP system that chooses consistency over availability while remaining partition tolerant.
DynamoDB vs MongoDB: 7 Critical Differences
1) Fully Managed
DynamoDB is a fully managed solution. Using a fully managed service reduces the time a team spends on operations: no pager-duty alerts, no servers to update, no kernel patches to roll out, no SSDs to replace, and no hardware provisioning, setup and configuration, throughput capacity planning, replication, software patching, or cluster scaling. The focus shifts to application logic, where the real value lies. The general rule of thumb is to choose DynamoDB for low-throughput apps, as writes are expensive and strongly consistent reads cost twice as much as eventually consistent reads. MongoDB Atlas pricing, by contrast, is driven by infrastructure, availability, and backups; throughput is included in the price. If you do not have a dedicated operations person on your team, DynamoDB is the better choice.
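DynamoDB's provisioned pricing follows published capacity-unit rules: one read capacity unit (RCU) covers a strongly consistent read of up to 4 KB per second, an eventually consistent read costs half an RCU, and one write capacity unit (WCU) covers a write of up to 1 KB. A small sketch of the arithmetic:

```python
import math

def read_capacity_units(item_size_bytes: int, strongly_consistent: bool) -> float:
    """RCUs for one read: 1 RCU per 4 KB (rounded up) for a strongly
    consistent read; an eventually consistent read costs half as much."""
    units = math.ceil(item_size_bytes / 4096)
    return units if strongly_consistent else units / 2

def write_capacity_units(item_size_bytes: int) -> int:
    """WCUs for one write: 1 WCU per 1 KB, rounded up."""
    return math.ceil(item_size_bytes / 1024)

# A 6 KB item costs 2 RCUs strongly consistent, 1 RCU eventually
# consistent, and 6 WCUs to write.
```

This is why write-heavy workloads with large items get expensive on DynamoDB, and why eventually consistent reads are the default.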
2) Out Of The Box Security
DynamoDB provides out-of-the-box security; the security model is based on Identity and Access Management (IAM), enabling one to manage access to AWS services and resources securely. One can create and manage AWS users and groups and use permissions to allow and deny their access to AWS resources. IAM is battle-tested, intuitive, and works with limited configuration. It is not possible to access DynamoDB from the open internet, as it is not directly addressable: requests route through an API gateway, and AWS manages authorization from there. MongoDB is secure, but its default configuration is not. Because it does not provide out-of-the-box security, it can be particularly vulnerable to breaches.
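In practice, access to a table is scoped with an IAM policy document. A sketch of a read-only policy, built as plain Python data (the account ID, region, and table name are placeholders):

```python
import json

# Hypothetical table ARN; substitute your own account, region, and table.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:BatchGetItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
    }],
}

# Serialized form, ready to attach to a user, group, or role.
policy_json = json.dumps(read_only_policy)
```

Because write actions such as `dynamodb:PutItem` are simply absent from the statement, any principal carrying this policy can read the table but not modify it.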
3) Queries
DynamoDB supports key-value queries. For queries requiring aggregations, graph traversals, or search, data must be shipped to complementary AWS services such as Elastic MapReduce or Redshift; this inherently increases latency, cost, and cognitive load for developers. As DynamoDB is a managed service, it is not possible to mitigate this by tuning database elements such as index use, query structure, data models, system configuration (e.g., hardware and OS settings), and application design, all of which can significantly impact the overall performance of an application. MongoDB's query language allows developers to query and analyze data in many ways: single key, graph traversal, geospatial queries, range queries, faceted search, and much more. There is minimal latency, and deep levels of performance-metric granularity are available for optimization and tuning if necessary: throughput metrics, database performance, resource utilization, resource saturation, and errors (asserts).
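The contrast shows up in the shape of the requests themselves. Below, a MongoDB aggregation pipeline expresses a group-and-sort entirely in the query language, while the DynamoDB request is a single key-value lookup; both are shown as plain Python data structures (collection, table, and field names are illustrative):

```python
# MongoDB: total order value per shipped customer, highest first,
# computed entirely inside the database by the aggregation pipeline.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]

# DynamoDB: the native operation is a key-value lookup of one item.
# Any aggregation across items has to happen in EMR/Redshift or app code.
get_request = {
    "TableName": "Orders",
    "Key": {"customer_id": {"S": "c-42"}, "order_id": {"S": "o-1001"}},
}
```

The pipeline would be passed to `collection.aggregate(pipeline)` in a MongoDB driver, and the request dict mirrors the parameters of DynamoDB's low-level `GetItem` call.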
4) Mutable Indexes
MongoDB supports mutable indexes, allowing the structure of a document to be altered based on dynamic development conditions. It is possible to change the structure of a document without having to update the collection schema on the backend. DynamoDB's schema is far more rigid: the primary key (and any local secondary indexes) are fixed at table creation, so changing them means creating a new table under a new name and dropping the old one, which is not possible in production systems without considerable resources for a safe transition.
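With MongoDB (via pymongo, for example) an index specification is ordinary data that can be applied to a live collection at any time; the field names below are illustrative:

```python
ASCENDING, DESCENDING = 1, -1   # pymongo's index direction constants

# An index specification is plain data: a list of (field, direction) pairs.
order_index = [("customer_id", ASCENDING), ("created_at", DESCENDING)]

# On a live deployment this would be applied without any table rebuild:
#   db.orders.create_index(order_index)
```

The same change on DynamoDB's primary key would mean provisioning a new table and migrating every item into it.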
5) Data Types
In comparison with MongoDB, DynamoDB has limited support for different data types, and items are restricted to 400 KB, as opposed to MongoDB, which supports document sizes up to 16 MB. AWS charges significantly higher operating prices when items exceed 1 KB in size and suggests persisting larger objects in S3. Depending on one's usage, this may or may not be viable: S3 writes can be slow, and high throughput might not be possible. DynamoDB supports only one numeric type and has no native date type.
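Because of the 400 KB hard limit and the 1 KB write-pricing step, teams often gate writes by size before choosing where an item lives. A sketch of such a check (the function name, return labels, and S3 hand-off rule are illustrative, not an AWS API):

```python
import json

DYNAMO_ITEM_LIMIT = 400 * 1024   # DynamoDB's hard per-item limit
PRICING_STEP = 1024              # write capacity is billed per 1 KB step

def plan_storage(item: dict) -> str:
    """Decide (illustratively) where an item should live based on its size."""
    size = len(json.dumps(item).encode())
    if size > DYNAMO_ITEM_LIMIT:
        return "s3"                  # too big for DynamoDB; store a pointer instead
    if size > PRICING_STEP:
        return "dynamodb-multi-wcu"  # storable, but each write costs multiple WCUs
    return "dynamodb"

small = plan_storage({"id": "a", "note": "ok"})
large = plan_storage({"id": "b", "blob": "x" * 500_000})
```

The common pattern is to keep the large payload in S3 and store only its key in the DynamoDB item.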
6) Connections
Communication with MongoDB requires socket connections, which, depending on your use case, can be a bottleneck in application performance. DynamoDB, on the other hand, relies on the widely used method of HTTPS API endpoints. As such, the concurrency model is simpler and performance is more predictable with DynamoDB.
7) Replication & Distribution
As an AWS managed service, DynamoDB provides Multi-AZ and Multi-Region data replication out of the box. MongoDB can support multi-node clusters; however, they can be challenging to set up. MongoDB Atlas can simplify the process, but it still lacks the simplicity offered by DynamoDB.
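With DynamoDB global tables, a multi-Region setup is declared rather than administered. The replica list below is an example configuration (the regions chosen are illustrative, not a recommendation):

```python
# Replica configuration in the shape DynamoDB global tables expect;
# the regions listed here are examples only.
replication_group = [
    {"RegionName": "us-east-1"},
    {"RegionName": "eu-west-1"},
    {"RegionName": "ap-southeast-2"},
]

# With boto3 this would be passed along the lines of:
#   client.create_global_table(GlobalTableName="Orders",
#                              ReplicationGroup=replication_group)
```

Achieving the equivalent with self-managed MongoDB means standing up and operating replica set members across regions yourself.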
How to Decide Between the Two
Deciding between these two capable technologies isn’t about which is better than the other. The decision is ultimately about which provides the best features for your specific use cases.
As of version 4.0, MongoDB does support ACID transactions. However, this is a new feature, and many developers may find its transactional capabilities less mature than those of other databases. If transactional operations are important for your application, you may be better off sticking with DynamoDB and the extensive AWS experience behind it.
MongoDB runs with defaults that permit unrestricted, unauthenticated access to data. Developers will need to spend extra time upfront reconfiguring security on a MongoDB deployment. DynamoDB, on the other hand, encrypts data at rest and in transit by default, so developers won't have as much upfront security configuration to deal with when getting started.
MongoDB keeps its working set (indexes and frequently accessed documents) in RAM, which means query performance can be significantly better than what you would find with DynamoDB. So if speed is a concern for your app, MongoDB may be the best choice.
Does your team have the skill to implement Mongo? Will they be able to support it and make sure it runs smoothly? If not, you may be better off going with Dynamo as a managed solution.
Using DynamoDB may lead you to vendor lock-in. AWS uses a proprietary database model; moving to an alternative cloud provider would require significant resources to architect a new database system. Moreover, once you are dependent on multiple AWS services, it becomes increasingly difficult to focus on a multi-cloud strategy.
The core issue is the cost of changing technologies and the resulting risk of disruption to the business. MongoDB is a transparent, open-source solution that runs anywhere; although MongoDB's SSPL ("Server Side Public License") has yet to be approved by the OSI ("Open Source Initiative"), much of the broader FOSS ("free and open-source software") community accepts it. As the Fedora project put it: "Would you buy a car where the hood cannot be opened, and you will not be able to fix what's wrong or know what's happening?"
A Note on AWS Integration
If you are already heavily invested in the AWS ecosystem, DynamoDB is the better choice. It provides seamless integration with services such as Redshift (large-scale data analysis), Cognito (identity pools), Elastic MapReduce (EMR), Data Pipeline, Kinesis, and S3. DynamoDB has tight integration with AWS Lambda via Streams and aligns with the serverless philosophy: automatic scaling according to your application load, pay-for-what-you-use pricing, an easy start, and no servers to manage.
There is no one-size-fits-all solution, and every production system is different, with its own needs and quirks; the following questions should lead to answers that will cement your position on which one to choose.
Is your team deploying a mission-critical application that must be highly available at all times without manual intervention?
Are you comfortable running on proprietary software, without control over, or insight into, what's going on under the hood?
No matter which database system you choose, migrating your data into it could present serious challenges. If you're suffering from a data migration bottleneck, Integrate.io's automated ETL platform can help. Integrate.io offers a visual, no-code interface that makes data migration a snap. Check out our hundreds of out-of-the-box integrations or schedule a demo to find out how Integrate.io can help you with your unique ETL challenges.