Improve downsample performance by buffering docids and do bulk processing. #124477
Conversation
Hi @martijnvg, I've created a changelog YAML for you.

Force-pushed from ff95cf7 to 0f70d99
The result without this change, downsampling a tsdb index to 1m interval buckets:
The result without this change, downsampling a tsdb index to 1h interval buckets:
The result with this change, downsampling a tsdb index to 1m interval buckets:
The result with this change, downsampling a tsdb index to 1h interval buckets:
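The approach named in the PR title can be illustrated with a minimal sketch (hypothetical class and field names, not the actual Elasticsearch code): the collector only buffers doc ids per document, and field values are read in bulk once a batch is full.

```java
import java.util.ArrayList;
import java.util.List;

class DocIdBufferSketch {
    // Assumed batch size for illustration; the real constant may differ.
    static final int DOCID_BUFFER_SIZE = 8096;

    final List<Integer> docIdBuffer = new ArrayList<>(DOCID_BUFFER_SIZE);
    int bulkProcessed;

    // Called once per matching document; cheap, no doc-values reads here.
    void collect(int docId) {
        docIdBuffer.add(docId);
        if (docIdBuffer.size() == DOCID_BUFFER_SIZE) {
            bulkCollection();
        }
    }

    // Processes the whole batch at once. In the real change this is where
    // timestamps, metrics and labels are read for all buffered doc ids,
    // which keeps doc-values readers on a forward-only access pattern.
    void bulkCollection() {
        bulkProcessed += docIdBuffer.size();
        docIdBuffer.clear();
    }
}
```

The win comes from amortizing per-field setup cost over a batch of documents instead of paying it per document.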
Force-pushed from 0f70d99 to 04c909d
Pinging @elastic/es-storage-engine (Team:StorageEngine)
}

void bulkCollection() throws IOException {
    // The leaf bucked collectors newer timestamp go first, other we capture the incorrect last value for counters and labels.
The leaf bucket collectors with newer timestamp go first, to correctly capture the last value for counters and labels.
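The ordering requirement discussed here can be sketched as follows (a simplified, hypothetical collector, not the actual Elasticsearch class): a last-value collector keeps the first value it sees, which is only correct if values arrive newest-first, as they do in a tsdb index where segments are sorted by descending @timestamp.

```java
// Sketch of a "last value" collector for counters and labels. Because input
// arrives in descending timestamp order, the first collected value is the
// most recent one; everything after it is older and must be ignored.
class LastValueCollector {
    private boolean seen;
    private long value;

    void collect(long v) {
        if (!seen) {
            value = v;
            seen = true;
        }
    }

    long get() {
        return value;
    }
}
```

This is why the leaf bucket collectors with newer timestamps must be processed first: if an older leaf were processed first, its value would be locked in and the truly latest value would be dropped.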
if (docValuesCount == 1) {
    label.collect(docValues.nextValue());
} else {
    var values = new Object[docValuesCount];
How large can this be? Do we need to use bigarrays to track its memory?
Seems like it's smaller than DOCID_BUFFER_SIZE. Should be fine.
It is for multivalued fields. If a document has a JSON array, then this is the size of that array (more precisely, the number of unique values the document has for the field being read). Typically this is small.
Note that it worked this way before this change as well; I don't recall issues around this.
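The pattern under discussion can be sketched like this (a simplified stand-in for the doc-values read path; the iterator-based API here is assumed for illustration): the temporary array is sized per document, so its length is bounded by that document's value count for the field, not by DOCID_BUFFER_SIZE.

```java
import java.util.Iterator;

class LabelReadSketch {
    // Returns the single value directly in the common single-valued case,
    // or an Object[] with one slot per value for multivalued fields.
    static Object readLabel(int docValuesCount, Iterator<Object> docValues) {
        if (docValuesCount == 1) {
            return docValues.next(); // no array allocation needed
        } else {
            var values = new Object[docValuesCount];
            for (int i = 0; i < docValuesCount; i++) {
                values[i] = docValues.next();
            }
            return values;
        }
    }
}
```

Since the allocation is per document and freed immediately after use, tracking it with big arrays would add overhead without much safety benefit for typical documents.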
firstTimeStampForBulkCollection = aggCtx.getTimestamp();
}
// buffer.add() always delegates to System.arraycopy() and checks buffer size for resizing purposes:
buffer.buffer[buffer.elementsCount++] = docId;
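The trade-off behind that quoted line can be shown with a small sketch (an HPPC-style growable int list is assumed here; this is not the actual buffer class): the generic add() pays a capacity check, and possibly a resize, on every call, while a direct write into the backing array skips both when the caller already guarantees there is room.

```java
import java.util.Arrays;

// Minimal stand-in for an HPPC-style IntArrayList with a public backing array.
class IntBufferSketch {
    int[] buffer;
    int elementsCount;

    IntBufferSketch(int capacity) {
        buffer = new int[capacity];
    }

    // Generic path: checks capacity on every call and grows via
    // Arrays.copyOf (which uses System.arraycopy) when full.
    void add(int v) {
        if (elementsCount == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[elementsCount++] = v;
    }
}
```

When the surrounding loop flushes the buffer before it can reach capacity, writing buffer.buffer[buffer.elementsCount++] = docId directly is safe and avoids the per-element check in this hot path.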
Nit: let's rename buffer to docIdBuffer, to avoid the buffer.buffer occurrence.
LGTM, added Nhat too to play it safe.