Microservices are another architectural approach that uses numerous databases, with each microservice having a database tuned for the duties of that service. For example, you can use MySQL for primary storage, Redis and Memcached for caching, and Elasticsearch or native Sphinx for searching. You can use Kafka to send data to the analytics system, a task previously handled by Hadoop.
For primary operational storage, there are two possibilities: we can choose a relational database with the SQL language, or we can use a non-relational database and select one of the available kinds.
There are many NoSQL data models to choose from. Key-value, document, and wide-column databases are the most common; Memcached, MongoDB, and Cassandra are respective examples.
The DB-Engines Ranking shows that open-source databases have become more popular over time, while commercial databases have steadily declined. What's more noteworthy is that the same pattern holds across several database kinds: open-source databases are the most popular for many categories, including columnar, time series, and document stores. Commercial licensing now dominates only traditional technologies, such as relational databases, or even older ones like multivalue databases.
We deal with many clients at Percona, and we work closely with the most popular relational and non-relational open-source databases (MySQL, PostgreSQL, and MongoDB). We assist them in making decisions and provide the best advice for each case.
With that in mind, the goal of this post is to demonstrate scenarios worth examining before implementing MongoDB, helping you decide when and when not to use it. Additionally, if you already run MongoDB, this article may still be of interest, as some of the following topics may have been missed during the product review process.
Table of Contents
Here’s a list of subjects I’ll go through in more depth in this article:
- Team experience and preferences
- Application lifecycle and development methodology
- Data model
- Transactions and consistency (ACID)
- Scalability
- Administration
1. Team Preferences and Experience
Before digging into MongoDB, the most crucial thing to consider is the team’s experience and preferences.
MongoDB's advantage is its flexible JSON document format, which is helpful for some tasks and some developers. It can be difficult for other teams, primarily if they have worked with SQL databases for a long time and are well-versed in relational algebra and the SQL language.
You may quickly become familiar with CRUD operations in MongoDB, such as:
- find()
- insert()
- update()
- delete()
Simple queries have a lower chance of causing issues. Still, when a daily task requires more in-depth data processing, you will need a powerful tool to handle it, such as the MongoDB aggregation pipeline and map-reduce, which we'll cover in more detail later in this article.
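As a minimal sketch, here is how those CRUD operations look in the mongosh shell, assuming a hypothetical contacts collection:

```javascript
// Create: insert a new document (no predefined schema required)
db.contacts.insertOne({ firstName: "Jane", lastName: "Doe" });

// Read: find documents matching a filter
db.contacts.find({ lastName: "Doe" });

// Update: modify a matching document with an update operator
db.contacts.updateOne(
  { firstName: "Jane", lastName: "Doe" },
  { $set: { city: "Austin" } }
);

// Delete: remove a matching document
db.contacts.deleteOne({ firstName: "Jane", lastName: "Doe" });
```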
MongoDB University offers excellent and accessible classes that will benefit the team's expertise. Still, keep in mind that reaching the top of the learning curve may take some time if the team isn't entirely comfortable with it.
2. Application Lifecycle and Development Methodology
When it comes to MongoDB-powered applications, the emphasis is on rapid development because anything may be changed at any moment. You won’t have to worry about the document’s strict format.
The second thing to think about is the data schema. It's important to remember that data always has a schema; the only question is where it's implemented. You can implement the schema in your application, since that is the code that uses the data, or you can enforce it at the database level.
A single application that is the sole consumer of the database's data is a frequent case, and an application-level schema works well there, since the application controls what it saves. However, if multiple apps use the same data, an application-level schema becomes unpleasant and difficult to manage.
The application development cycle can be viewed from the following perspective:
- The rate of development
- There is no requirement to synchronize the database and application schemas.
- It is obvious how to scale up even more.
- Simple, predetermined solutions
3. Data Model
As noted in the first section, the data model depends on the application and the team's experience.
Many web apps' data is simple to represent: when the program stores a structure such as an associative array, serializing it into a JSON document is straightforward for the developer.
Let's look at an example: we want to save a phone's contact list. Data such as first and last names is easy to store in a single relational table. However, one person may have multiple phone numbers or email addresses. To keep the data relational, we should put those in separate tables and join them together, which is less convenient than putting everything in a single collection with hierarchical documents, as the sketch after the comparison below shows.
Example of a Data Model – Contact List
Database with Relationships
- Date of birth, first and last names
- Multiple phone numbers and email addresses are possible for a single person.
- Separate tables should be created for them.
- JSON arrays are not a classical relational extension.
Database with a Document-Oriented Approach
- Everything is kept together in a single “collection.”
- Arrays and documents that are embedded
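As an illustrative sketch (the field names are hypothetical), the whole contact fits in one document, with phone numbers and emails as embedded arrays:

```javascript
db.contacts.insertOne({
  firstName: "Jane",
  lastName: "Doe",
  dateOfBirth: ISODate("1990-05-14"),
  // One-to-many data lives inside the document instead of separate tables
  phones: [
    { type: "mobile", number: "+1-555-0100" },
    { type: "work",   number: "+1-555-0101" }
  ],
  emails: [ "jane@example.com", "j.doe@work.example.com" ]
});
```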
However, it's essential to remember that a more flexible solution will result in a list of documents with wildly varied shapes. "With great power comes great responsibility," as someone once stated.
Unfortunately, it is common to find deployments struggling with documents approaching the 16 MB size limit, single collections holding terabytes of data, or, in the worst-case scenario, incorrectly designed shard keys.
These anomalies could indicate that your database is transforming into a Data Swamp. It’s a term often used in Big Data deployment to describe data that’s been poorly designed, documented, or maintained.
You don't have to normalize your data strictly. Still, it is worth taking the time to consider how you will structure your information when using MongoDB, so you get the best of both worlds and avoid the drawbacks.
For a deeper understanding of data modeling and how it differs, see the blog post "Schema Design in MongoDB vs. Schema Design in MySQL." The schema validation feature, which can be applied during updates and insertions, is worth emphasizing: on a per-collection basis, you can specify validation rules that limit the kind of content that can be stored.
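For instance, here is a minimal validator sketch for the hypothetical contacts collection, requiring first and last names and constraining the phone entries:

```javascript
db.createCollection("contacts", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "firstName", "lastName" ],
      properties: {
        firstName: { bsonType: "string" },
        lastName:  { bsonType: "string" },
        phones: {
          bsonType: "array",
          items: { bsonType: "object" }
        }
      }
    }
  },
  validationAction: "error"  // reject writes that fail validation
});
```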
Terms
There is a lot in common between relational and non-relational DBMSs regarding modeling and querying. In both cases, we’re talking about databases; however, in a non-relational database, a table is commonly referred to as a collection. A field in MongoDB is a column in SQL, and so on.
MongoDB has no concept of the JOIN we mentioned earlier. In an aggregation pipeline, however, you can use $lookup, which performs what is effectively a left outer join; frequent use of $lookup can indicate a data modeling problem.
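A sketch of $lookup in an aggregation pipeline, assuming hypothetical orders and customers collections:

```javascript
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",          // collection to join against
      localField: "customerId",   // field in orders
      foreignField: "_id",        // field in customers
      as: "customer"              // output array holding the matches
    }
  }
]);
```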
For relational data, we use SQL for access. For MongoDB and many other NoSQL databases, we use the CRUD standard: operations for creating, reading, updating, and deleting documents.
However, the Aggregation Framework becomes necessary for more complex MongoDB queries, such as the equivalent of GROUP BY. It is a more complicated interface that expresses how we wish to filter, group, and so forth.
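As a sketch, here is a rough equivalent of SQL's WHERE, GROUP BY, and ORDER BY, again on the hypothetical contacts collection:

```javascript
// SQL analogue:
//   SELECT city, COUNT(*) AS total FROM contacts
//   WHERE country = 'US' GROUP BY city ORDER BY total DESC;
db.contacts.aggregate([
  { $match: { country: "US" } },                      // WHERE
  { $group: { _id: "$city", total: { $sum: 1 } } },   // GROUP BY + COUNT
  { $sort:  { total: -1 } }                           // ORDER BY
]);
```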
4. Consistency and Transactions (ACID)
The reason for bringing this up is that, depending on the business requirements, the database system may need to be ACID-compliant. Relational databases are far ahead in this game; money-related operations are the classic example of ACID requirements.
Assume you're creating a feature that moves money from one account to another. If you withdraw funds from the source account but never credit them to the destination, or credit the destination without withdrawing from the source to cover it, the system is left inconsistent. To keep the system sane, these two writes must either both happen or both not happen, commonly known as "all or nothing."
MongoDB did not enable transactions before version 4.0, but it did offer atomic operations within a single document.
From the perspective of a single document, the operation is atomic. But if an operation changes many documents and a failure happens during the change, some documents will be modified while others will not.
With the release of MongoDB 4.0 and later, this constraint was lifted. MongoDB supports multi-document transactions for cases where atomicity of reads and writes to several documents (in a single or multiple collections) is required. They can also be used as distributed transactions spanning multiple operations, collections, databases, documents, and shards.
- MongoDB now enables multi-document transactions on replica sets in version 4.0.
- MongoDB 4.2 extends this to multi-document transactions on sharded clusters, combined with the existing support on replica sets; a sketch follows this list.
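Here is a minimal sketch of the money transfer above as a multi-document transaction in mongosh (the bank database and accounts collection are hypothetical):

```javascript
const session = db.getMongo().startSession();
const accounts = session.getDatabase("bank").accounts;

session.startTransaction();
try {
  // Withdraw from the source and credit the destination atomically
  accounts.updateOne({ _id: "alice" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "bob" },   { $inc: { balance:  100 } });
  session.commitTransaction();   // both writes become visible together
} catch (error) {
  session.abortTransaction();    // neither write is applied
  throw error;
} finally {
  session.endSession();
}
```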
5. Scalability
In this context, what is scalability? It refers to how readily a small program may be scaled to millions or even billions of users.
When we discuss the scalability of a cluster, the application is already large enough that one computer, even the most powerful, will not suffice.
It’s also a good idea to discuss if we should scale reads, writes, or data volume. Priorities may differ amongst applications, but they will almost certainly have to deal with all of these issues if the application is large enough.
From the start, MongoDB's focus was on scalability across several nodes. We can see that in the Sharding feature, which shipped early in the product's life and has since been improved and refined.
Vertical scalability can be achieved in MongoDB through a Replica Set configuration. Your database can be scaled up and down in a few simple steps, but the important point is that only your availability and reads are scaled; writes remain confined to a single location, the primary.
However, we know that at some point the application will need more write capacity, or the dataset will grow too large for a Replica Set; horizontal scaling is then recommended by using Sharding, splitting the dataset, and spreading writes across multiple shards, as sketched below.
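A sketch of enabling sharding, assuming a hypothetical appdb.contacts namespace and a hashed shard key on userId:

```javascript
// Run against a mongos router of the sharded cluster, after `use appdb`
sh.enableSharding("appdb");

// A hashed key spreads writes evenly across shards
db.contacts.createIndex({ userId: "hashed" });
sh.shardCollection("appdb.contacts", { userId: "hashed" });
```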
MongoDB sharding has significant shortcomings: not all operations work with it, and a bad shard-key design can disturb internal cluster processes such as automated data splitting or, in the worst-case scenario, necessitate a manual re-shard, which is a time-consuming and error-prone procedure.
A resharding feature was introduced with the release of MongoDB 5.0. Like any new feature, my advice is to test it thoroughly before putting it into production. The post "Refining Shard Keys in MongoDB 4.4 and Above" may help you make a better judgment if you're considering refining your shard key and then resharding with the new feature.
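A sketch of the MongoDB 5.0 resharding command, with a hypothetical namespace and new shard key; try it on a staging cluster first:

```javascript
// Changes the shard key of an existing sharded collection (MongoDB 5.0+)
db.adminCommand({
  reshardCollection: "appdb.contacts",
  key: { region: 1, userId: 1 }   // the new shard key
});
```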
6. Administration
Administration covers all the details that developers tend to overlook; at the very least, it isn't their top priority. It is about the need to back up, update, monitor, and restore a system in the event of a failure.
MongoDB takes the more traditional approach: administration is kept to a minimum, at the expense of flexibility. There is a much smaller ecosystem of open-source MongoDB solutions. The DB-Engines Ranking mentioned at the beginning of this article and the yearly StackOverflow survey show that MongoDB is without question the most popular NoSQL database, yet it lacks a strong community.
Furthermore, many MongoDB recommendations are inextricably linked to the Ops Manager and Atlas services, MongoDB’s commercial platforms.
Until recently, running backup/restore processes for a Sharded Cluster or Replica Set was not a simple task; DBAs were forced to rely on the mongodump/mongorestore tools or file system snapshot methods.
This situation began to improve with capabilities such as Percona Hot-Backup and the Percona Backup for MongoDB tool.
When we look at the most popular relational database, MySQL, we can see that it is quite adaptable in various ways. There are exemplary open-source implementations for everything, areas that remain weak spots in MongoDB.
Conclusion
I’ve covered a few things that will assist you in your daily tasks, giving you a broad picture of how MongoDB might aid you. It’s important to note that this article is based on MongoDB 5.0, the most recent available release; if your deployment is based on older or deprecated releases, some of the observations and features may not be valid.
Check out our blog if you have an issue or a specific inquiry; we might have written an article about it. We also recommend that you read our white paper, which covers more scenarios and cases where MongoDB is appropriate and when it is not.
Percona Distribution for MongoDB is a free MongoDB database option that combines the best and most essential enterprise components from the open-source community into a single solution that has been built and tested to function together.