Improve QuerySet performance by removing limit on server-side chunking #347


Merged
merged 1 commit into main from remove-get-default-chunk-size
Jul 26, 2025

Conversation

@Jibola Jibola commented Jul 24, 2025

Summary

We noticed large dips in performance when attempting to retrieve many documents (n > 10_000) for any query. Digging into our logic, we found the cause was our call to cursor.batch_size(100), which limited each database fetch to at most 100 documents. Fetching 10_000+ documents would therefore generate at least 100 individual queries to the database.

We do not need to configure batch sizing, as PyMongo already handles performant chunking via its cursor iterator. This PR removes all batching on the iterator and prevents end-user configuration of batch sizing. This significantly reduces the number of requests and leaves it to the server to determine how many documents to return per batch.
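
To illustrate at the driver level, here is a minimal sketch of why the capped batch size was expensive (the client, database, and collection names below are placeholders, not project code):

from pymongo import MongoClient

client = MongoClient()  # placeholder connection; assumes a local MongoDB
coll = client.example_db.authors

# Old behavior: capping the batch size forces one round trip per 100
# documents, so retrieving 10_000 documents costs ~100 getMore requests.
capped = coll.find({})
capped.batch_size(100)

# New behavior: with no batch_size() call, the server decides how many
# documents each batch contains (bounded by the 16MB message limit), so
# large result sets need far fewer round trips.
for doc in coll.find({}):
    pass  # PyMongo fetches subsequent batches transparently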

Changes in this PR

  • Removed GET_ITERATOR_CHUNK_SIZE import and usage from the SQL compiler
  • Yield the _make_result() output for each row (document) directly from the cursor object (see the sketch below)
  • Updated documentation to reflect the change in cursor iteration behavior
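
As a rough illustration of the change, here is a minimal sketch of the before and after (simplified, not the actual compiler source; cursor_iter and _make_result are the names touched by this PR):

# Before: a fixed chunk size capped each server fetch and grouped rows
# client-side before yielding them.
def cursor_iter(self, cursor, chunk_size, columns):
    cursor.batch_size(chunk_size)  # at most chunk_size docs per getMore
    chunk = []
    for row in cursor:
        chunk.append(self._make_result(row, columns))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    yield chunk

# After: no batch_size() call; the driver and server choose the batch
# size, and each row is yielded as soon as it arrives.
def cursor_iter(self, cursor, _, columns):
    for row in cursor:
        yield [self._make_result(row, columns)]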

Screenshots

>>> import time
>>> def timer(func, *args, **kwargs):
...      """Time a function call and return (result, execution_time in SECONDS)"""  
...      start = time.perf_counter()  
...      result = func(*args, **kwargs)  
...      end = time.perf_counter()  
...      exec_time = end - start  
...      return result, exec_time  
... 

Before

>>> timer(lambda: Author.objects.filter())
(<MongoQuerySet [<Author: >, *REMOVED FOR EASY READING*, '...(remaining elements truncated)...']>, 0.0006403750012395903)
>>> timer(lambda: list(Author.objects.filter()[:101])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 0.04128820799815003)
>>> timer(lambda: list(Author.objects.filter()[:10_001])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 1.9592289169959258)
>>> timer(lambda: list(Author.objects.filter()[:100_001])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 20.37065891599923)
>>> timer(lambda: list(Author.objects.filter()[:1_000_001])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 20.618212166999)

# JOIN LOGIC
>>> timer(lambda: list(Author.objects.filter(address__city="London")[:1_000_001])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 34.79546758400102)

After

>>> import time
>>> timer(lambda: Author.objects.filter())
(<MongoQuerySet [<Author: >, *REMOVED FOR EASY READING* '...(remaining elements truncated)...']>, 0.0009210830030497164)
>>> timer(lambda: list(Author.objects.filter()[:101])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 0.04708974999812199)
>>> timer(lambda: list(Author.objects.filter()[:10_001])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 0.23040095800388372)
>>> timer(lambda: list(Author.objects.filter()[:100_001])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 1.136685874997056)
>>> timer(lambda: list(Author.objects.filter()[:1_000_001])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 1.4290711669964367)
>>> 

# JOIN LOGIC
>>> timer(lambda: list(Author.objects.filter(address__city="London")[:1_000_001])[0])
(<Author: Author object (6877c9c7d04f764442793292)>, 14.903681124997092)

@Copilot Copilot AI left a comment

Pull Request Overview

This PR removes the use of Django's GET_ITERATOR_CHUNK_SIZE constant in favor of MongoDB's native document fetching policy. The change allows MongoDB to handle cursor batch sizing according to its own optimization strategies rather than forcing a fixed chunk size.

Key changes:

  • Removed GET_ITERATOR_CHUNK_SIZE import and usage from the SQL compiler
  • Implemented adaptive chunk sizing that starts with MongoDB's default and adjusts based on document size
  • Updated documentation to reflect the change in cursor iteration behavior

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

  • django_mongodb_backend/compiler.py: Updated execute_sql and cursor_iter methods to remove fixed chunk sizing and implement adaptive batching
  • django_mongodb_backend/query.py: Minor formatting change to a method call
  • docs/source/releases/5.2.x.rst: Added release note documenting the removal of GET_ITERATOR_CHUNK_SIZE
  • docs/source/releases/5.1.x.rst: Added release note documenting the removal of GET_ITERATOR_CHUNK_SIZE
Comments suppressed due to low confidence (1)

django_mongodb_backend/compiler.py:324

  • Using underscore '_' for the chunk_size parameter is unclear. Consider using a more descriptive name like 'unused_chunk_size' or remove the parameter entirely if it's not needed.
    def cursor_iter(self, cursor, _, columns):

@timgraham timgraham left a comment

Is there some context for this PR to better understand the desired behavior and the reasoning behind it?

@@ -11,6 +11,10 @@ Django MongoDB Backend 5.1.x
the ``base_field`` uses a database converter.
- Fixed ``RecursionError`` when using ``Trunc`` database functions on non-MongoDB
databases.
- Removed ``GET_ITERATOR_CHUNK_SIZE`` as default from
:meth:`SQLCompiler.execute_sql() <django_mongodb_backend.compiler.SQLCompiler.execute_sql>`.
Collaborator

SQLCompiler is a private API and generally not mentioned in the docs, thus linking to it using ":meth:" won't work. You should instead try to describe the practical effect of this change at a higher level.

Also, add a section for QuerySet.iterator() after explain() and explain how chunk_size differs from the usual Django behavior.

Contributor Author

Added. Let me know if it's clearer.

Jibola commented Jul 24, 2025

> Is there some context for this PR to better understand the desired behavior and the reasoning behind it?

Yep. I'll attach all of the context shortly.

@Jibola Jibola changed the title from "Remove use of GET_ITERATOR_CHUNK_SIZE in favor of MongoDB native document fetching policy" to "Ignore chunk_size usage in favor of MongoDB's native cursor iteration" Jul 24, 2025
-            yield chunk
-            chunk = []
-        yield chunk
+        yield [self._make_result(row, columns)]
@WaVEV WaVEV Jul 24, 2025

🤔 Trying to understand: with the change made at line 261, removing cursor.batch_size(chunk_size) should be sufficient, right? Is this part only for consistency?

I mean: removing line 261 is the fix.

Collaborator

I think what Emanuel is getting at is that chunk size can be used in two ways, server-side and client-side. I believe cursor.batch_size(chunk_size) is about the server, while this method (among others) controls how the rows that the server returns are turned into model instances. I'm not sure if we want to or need to change the behavior client-side, and I'm not certain whether this change is sufficient for that, but I have to study it more.
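
A hypothetical illustration of that distinction (the helper below is invented for this example, not project code):

from itertools import islice

def client_side_chunks(cursor, chunk_size):
    # Client-side chunking: group rows the driver has *already* fetched
    # into fixed-size lists before converting them to model instances.
    it = iter(cursor)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

# Server-side chunking is independent of the above: cursor.batch_size(n)
# tells the server how many documents to send per getMore, regardless of
# how the client groups rows afterwards.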

Contributor Author

I'm fine with deferring the changes to cursor_iter since this part is client-side model conversion.

This does result in more yields, meaning more trips up and down the stack, but we can experiment with how that affects performance in a different PR. (I imagine this is not as major.)

The benchmark suite may be the best proxy for deciphering that. If that's the case, we may stand to benefit from an optimization that loosely estimates the number of documents already fetched and then loads them into models. Once they exist on the cursor, they might as well get flushed!

Collaborator

In the PR description you wrote, "PyMongo already handles its own performant chunking policy." But I'm assuming that MongoDB uses server-side cursors, so it's the server, not PyMongo, that does the chunking of up to 16MB of data?

So if we have no chunking in cursor_iter(), we could have some 16MB of model instances instantiated at a time. This seems like it would be quite a bit different from QuerySet.iterator()'s default behavior of chunk_size=2000, which, according to the documentation for QuerySet.iterator():

> Assuming rows of 10-20 columns with a mix of textual and numeric data, 2000 is going to fetch less than 100KB of data, which seems a good compromise between the number of rows transferred and the data discarded if the loop is exited early.

I might have some details wrong here, but at least we can clarify the desired behavior before confirming that the implementation matches it.
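
For reference, the standard Django API being compared against (the model name is illustrative):

# On SQL backends, chunk_size bounds how many rows the driver fetches at
# a time, so memory use stays roughly constant during iteration.
for author in Author.objects.iterator(chunk_size=2000):
    ...  # process one instance at a time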

Member

I'd agree with Emanuel and Tim. The major perf issue seems like it should be resolved by just the cursor.batch_size(chunk_size) change, right? If so, I'd suggest deferring the cursor_iter changes to a future ticket (if at all).

@Jibola Jibola Jul 25, 2025

Sounds good to me. I'll revert the function.
Funny enough, I just finished running the perf analysis on it. We'll probably fold it in, still (just not for this PR).

Here are both results. It looks like just iterating on the cursor without client-side chunking gives slightly better performance than client-side chunking. However, this doesn't factor in more nuanced iterator use.

no_chunking.json
with_chunking.json

@Jibola Jibola requested review from timgraham, WaVEV and NoahStapp July 25, 2025 19:51
@timgraham timgraham changed the title from "Ignore chunk_size usage in favor of MongoDB's native cursor iteration" to "Improve QuerySet performance by removing limit on server-side chunking" Jul 25, 2025
@timgraham timgraham force-pushed the remove-get-default-chunk-size branch from c65b7e5 to defec37 Compare July 25, 2025 20:01
@Jibola Jibola dismissed NoahStapp’s stale review July 26, 2025 01:48

issues addressed and client-side chunking will be deferred to a later PR

@Jibola Jibola merged commit 8077041 into main Jul 26, 2025
17 checks passed
@timgraham timgraham deleted the remove-get-default-chunk-size branch July 26, 2025 13:57
timgraham pushed a commit to timgraham/django-mongodb-backend that referenced this pull request Jul 28, 2025