Introduction
Handling a high volume of requests per second while managing an extensive product database remains a key challenge for modern e-commerce platforms. In this post, I describe the solutions we implemented in our 'catalog' microservice to address these challenges. The service was originally built with Django; we chose it for rapid development and for the convenience of the Django admin interface, which greatly simplified the initial setup.
Keeping the whole product inventory in a relational database works well for smaller online shops with moderate traffic, but it becomes a significant hurdle as the platform scales to a larger audience. The simple approach of storing all data in a relational database with many relations becomes impractical once both traffic and the number of products grow substantially.
In our 'catalog' microservice, we ran into exactly these limitations: a relational database alone could not efficiently handle the growing request rate together with a large product inventory. As the platform grew, we needed a more scalable and better-optimized solution.
To that end, our team explored alternative ways to manage the data, including storage options beyond the traditional relational model.
The rest of this post covers the specific challenges we faced, the limitations we hit, and the steps we took to improve the scalability and performance of the 'catalog' microservice.
Today's discussion centers on the relational database as a bottleneck and on the strategies we used to resolve it. A common rule of thumb says that as a dataset grows, a non-relational database often becomes the more suitable choice. Even so, we wanted to keep the advantages that a relational database offers.
Our initial idea was to combine the strengths of both kinds of database: a solution that pairs the robust relational structure of SQL databases with the flexibility and speed typical of non-relational databases.
Our objective is to strike a balance: preserve relational integrity where the data needs it, while gaining the efficiency and speed of a non-relational database. Integrating the two paradigms lets us optimize performance and scalability while keeping the critical relationships in the data intact.
Furthermore, the solution must remain compatible with Django models and integrate seamlessly with the Django admin and Django REST Framework.
Solution
The core idea is to link each Django model instance to a document in MongoDB. When we read an object, its MongoDB-backed fields are automatically populated from that document; when we change the model and save it, the changes are written back to the MongoDB document.
This can be achieved by dynamically creating descriptors on the Django model. These descriptors act as proxies for the MongoDB fields and are generated by a metaclass. Let's walk through the code that implements this.
The descriptor reads field values directly from the MongoDB document. Writes, however, are not sent to MongoDB immediately; they are stored in a private attribute named _m_<field>. This lets us collect all changed fields and save them together in a single operation.
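As a rough illustration, here is a minimal pure-Python sketch of a descriptor with this read/write behavior. It is not the service's actual code: it assumes the linked document is available on the instance as `_mongo_doc` (a hypothetical attribute name), and a dataclass stands in for the MongoDB document so the example runs without Django or MongoDB. Returning the pending `_m_<field>` value on reads after a write is also an assumption of this sketch, chosen so that a model never shows a stale value before it is saved.

```python
from dataclasses import dataclass


class MongoField:
    """Descriptor proxying one field of the model's linked MongoDB document."""

    def __init__(self, name):
        self.name = name
        self.pending = f"_m_{name}"  # private attribute holding an unsaved write

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        # Assumption: an unsaved write wins; otherwise read from the document.
        if self.pending in instance.__dict__:
            return instance.__dict__[self.pending]
        return getattr(instance._mongo_doc, self.name)

    def __set__(self, instance, value):
        # Buffer the write in _m_<field> so all changes can be saved together.
        instance.__dict__[self.pending] = value


@dataclass
class ProductDocument:  # hypothetical stand-in for a MongoDB document
    title: str = ""


class Item:
    title = MongoField("title")

    def __init__(self, doc):
        self._mongo_doc = doc  # the linked document


item = Item(ProductDocument(title="Desk lamp"))
item.title = "LED desk lamp"
print(item.title)             # LED desk lamp (pending value)
print(item._mongo_doc.title)  # Desk lamp (document unchanged until saved)
```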
Now, let's transition to the model metaclass, which plays a pivotal role in generating these descriptors.
Here, we iterate over the field names of the MongoDB document. For each name, we create a descriptor with exactly that name and attach it to the model class.
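A minimal sketch of such a metaclass follows, again in plain Python. It assumes each model class names its document class in a hypothetical `mongo_document` attribute, and a dataclass stands in for the MongoDB document class so that `dataclasses.fields()` can supply the field names. In a real Django project the metaclass would have to derive from Django's `ModelBase` (in `django.db.models.base`) rather than plain `type`, otherwise it would conflict with the metaclass of `models.Model`.

```python
import dataclasses


class MongoField:
    """Reads a field from the linked document; buffers writes in _m_<name>."""

    def __init__(self, name):
        self.name, self.pending = name, f"_m_{name}"

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self.pending in instance.__dict__:
            return instance.__dict__[self.pending]
        return getattr(instance._mongo_doc, self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.pending] = value


class HybridModelMeta(type):
    """Creates one MongoField per field of the class's `mongo_document`."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        document = namespace.get("mongo_document")
        if document is not None:
            for field in dataclasses.fields(document):
                # Same name as the document field, so product.price just works.
                setattr(cls, field.name, MongoField(field.name))
        return cls


@dataclasses.dataclass
class ProductDocument:  # hypothetical stand-in for the MongoDB document class
    title: str = ""
    price: float = 0.0


class Product(metaclass=HybridModelMeta):
    mongo_document = ProductDocument

    def __init__(self, doc):
        self._mongo_doc = doc


product = Product(ProductDocument(title="Desk lamp", price=19.9))
product.price = 17.5
print(product.title, product.price)  # Desk lamp 17.5
```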
Now, let's examine the structure of our base model:
This is the abstract base class for our models. On initialization, it links the model instance to a specific MongoDB document, and on save it automatically writes the values held in attributes prefixed with _m_ to MongoDB.
Finally, let's look at the concrete implementation of our product model:
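The base class can be sketched in the same spirit. The names `HybridModel`, `collection`, and `mongo_id` are hypothetical; an in-memory dict stands in for the MongoDB collection and a dataclass for each document, and the descriptor and metaclass are repeated in compact form so the block runs on its own. Against a real MongoDB collection, `save()` would presumably send all collected values in a single update with the `$set` operator instead of assigning attributes.

```python
import dataclasses
import itertools


class MongoField:
    def __init__(self, name):
        self.name, self.pending = name, f"_m_{name}"

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self.pending in instance.__dict__:
            return instance.__dict__[self.pending]
        return getattr(instance._mongo_doc, self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.pending] = value


class HybridModelMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        document = namespace.get("mongo_document")
        if document is not None:
            for field in dataclasses.fields(document):
                setattr(cls, field.name, MongoField(field.name))
        return cls


class HybridModel(metaclass=HybridModelMeta):
    """Abstract base: each instance is linked to one document in `collection`."""

    mongo_document = None  # set by concrete subclasses
    collection = None      # hypothetical stand-in for a collection: {id: document}
    _next_id = itertools.count(1)

    def __init__(self, mongo_id=None):
        if mongo_id is None:
            # New object: create an empty document and link to it.
            mongo_id = next(self._next_id)
            self.collection[mongo_id] = self.mongo_document()
        self.mongo_id = mongo_id
        self._mongo_doc = self.collection[mongo_id]

    def save(self):
        # Collect every buffered _m_<field> value and apply them in one pass.
        pending = {key[3:]: value for key, value in vars(self).items()
                   if key.startswith("_m_")}
        for field_name, value in pending.items():
            setattr(self._mongo_doc, field_name, value)
            del self.__dict__["_m_" + field_name]


@dataclasses.dataclass
class NoteDocument:  # hypothetical document used only to exercise the base class
    text: str = ""


class Note(HybridModel):
    mongo_document = NoteDocument
    collection = {}


note = Note()
note.text = "hello"
note.save()
print(Note(note.mongo_id).text)  # hello
```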
The implementation here is straightforward: we define a MongoDB document containing the product fields and connect it to the Product model. The Product model can, of course, still define ordinary relational fields and relationships.
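Continuing the same self-contained sketch, a concrete product model only has to declare its document and, if needed, ordinary relational data. `ProductDocument`, its fields, and the plain `category` attribute (standing in for a Django `ForeignKey`) are hypothetical examples, not the service's real schema; the base machinery is repeated compactly so the block runs on its own.

```python
import dataclasses
import itertools


class MongoField:
    """Proxy for one document field; unsaved writes live in _m_<name>."""

    def __init__(self, name):
        self.name, self.pending = name, f"_m_{name}"

    def __get__(self, obj, owner):
        if obj is None:
            return self
        if self.pending in obj.__dict__:
            return obj.__dict__[self.pending]
        return getattr(obj._mongo_doc, self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.pending] = value


class HybridModelMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        if namespace.get("mongo_document") is not None:
            for field in dataclasses.fields(namespace["mongo_document"]):
                setattr(cls, field.name, MongoField(field.name))
        return cls


class HybridModel(metaclass=HybridModelMeta):
    mongo_document = None
    collection = None
    _next_id = itertools.count(1)

    def __init__(self, mongo_id=None):
        if mongo_id is None:
            mongo_id = next(self._next_id)
            self.collection[mongo_id] = self.mongo_document()
        self.mongo_id, self._mongo_doc = mongo_id, self.collection[mongo_id]

    def save(self):
        pending = {k[3:]: v for k, v in vars(self).items() if k.startswith("_m_")}
        for name, value in pending.items():
            setattr(self._mongo_doc, name, value)
            del self.__dict__["_m_" + name]


@dataclasses.dataclass
class ProductDocument:
    """Hypothetical product fields kept in MongoDB."""
    title: str = ""
    price: float = 0.0
    description: str = ""


class Product(HybridModel):
    mongo_document = ProductDocument
    collection = {}

    def __init__(self, category, mongo_id=None):
        super().__init__(mongo_id)
        # Relational side: in Django this would be a ForeignKey to Category.
        self.category = category


chair = Product(category="furniture")
chair.title, chair.price = "Oak chair", 89.0
chair.save()

products = [chair, Product(category="lighting")]
# Rough analogue of a "products in one category" query.
print([p.title for p in products if p.category == "furniture"])  # ['Oak chair']
```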
Performance comparison
To assess performance, we created an endpoint that retrieves the products in a specific category. To measure the difference, we used the OHA load-testing tool, configured to send 10,000 requests using 50 threads. Below are the performance metrics for the standard Django implementation:
After conducting the identical tests on our hybrid-model implementation, we have compiled the following results:
Conclusion
In this blog post, I've introduced the basic idea of implementing a hybrid model in Django. Hybrid models improved performance notably: in our tests, the endpoint became roughly 8 to 10 times faster. There is still room for improvement, though. The current implementation makes a separate MongoDB call for each object in the queryset, which hurts efficiency. This could be addressed by loading all MongoDB fields into the model object at once, or by loading the documents for an entire Django queryset in a single request.
Further improvements, such as caching, could boost performance even more. Even without these optimizations, I hope this demonstration has shown an interesting and effective way to implement hybrid models in Django, one that balances relational and non-relational databases and can significantly improve application performance.
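The bulk-loading idea can be sketched independently of Django. Instead of issuing one query per object, the ids of all objects are collected first and the documents are fetched in one round trip; with pymongo this would typically be a `find()` call with an `{"_id": {"$in": ids}}` filter. `FakeCollection`, `find_by_ids`, and `Row` are hypothetical in-memory stand-ins that count round trips so the difference is visible.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a MongoDB collection.

    It counts round trips so the two loading strategies can be compared.
    """

    def __init__(self, docs):
        self.docs = docs  # {document id: document dict}
        self.queries = 0

    def find_one(self, doc_id):
        self.queries += 1
        return self.docs.get(doc_id)

    def find_by_ids(self, doc_ids):
        # One round trip; with pymongo roughly find({"_id": {"$in": doc_ids}}).
        self.queries += 1
        return {i: self.docs[i] for i in doc_ids if i in self.docs}


class Row:
    """Stand-in for a model instance that knows the id of its document."""

    def __init__(self, mongo_id):
        self.mongo_id = mongo_id
        self._mongo_doc = None


def attach_one_by_one(rows, collection):
    # Current behavior: one MongoDB call per object in the queryset.
    for row in rows:
        row._mongo_doc = collection.find_one(row.mongo_id)


def attach_in_bulk(rows, collection):
    # Improvement: collect all ids, fetch the documents once, then distribute them.
    docs = collection.find_by_ids([row.mongo_id for row in rows])
    for row in rows:
        row._mongo_doc = docs.get(row.mongo_id)


store = FakeCollection({i: {"title": f"Product {i}"} for i in range(1, 51)})
rows = [Row(i) for i in range(1, 51)]

attach_one_by_one(rows, store)
print(store.queries)  # 50

store.queries = 0
attach_in_bulk(rows, store)
print(store.queries)  # 1
```

With 50 objects, the per-object approach makes 50 round trips to the store, while the bulk approach makes one.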