Speed up block serialization #124394
Hi @dnhatn, I've created a changelog YAML for you.
Pinging @elastic/es-analytical-engine (Team:Analytics)
👍
```diff
+    BOOLEAN(0, "Boolean", BlockFactory::newBooleanBlockBuilder, BooleanBlock::readFrom),
+    INT(1, "Int", BlockFactory::newIntBlockBuilder, IntBlock::readFrom),
+    LONG(2, "Long", BlockFactory::newLongBlockBuilder, LongBlock::readFrom),
+    FLOAT(3, "Float", BlockFactory::newFloatBlockBuilder, FloatBlock::readFrom),
+    DOUBLE(4, "Double", BlockFactory::newDoubleBlockBuilder, DoubleBlock::readFrom),
     /**
      * Blocks containing only null values.
      */
-    NULL("Null", (blockFactory, estimatedSize) -> new ConstantNullBlock.Builder(blockFactory)),
+    NULL(5, "Null", (blockFactory, estimatedSize) -> new ConstantNullBlock.Builder(blockFactory), BlockStreamInput::readConstantNullBlock),
 
-    BYTES_REF("BytesRef", BlockFactory::newBytesRefBlockBuilder),
+    BYTES_REF(6, "BytesRef", BlockFactory::newBytesRefBlockBuilder, BytesRefBlock::readFrom),
 
     /**
      * Blocks that reference individual lucene documents.
      */
-    DOC("Doc", DocBlock::newBlockBuilder),
+    DOC(7, "Doc", DocBlock::newBlockBuilder, in -> { throw new UnsupportedOperationException("can't read doc blocks"); }),
 
     /**
      * Composite blocks which contain array of sub-blocks.
      */
-    COMPOSITE("Composite", BlockFactory::newAggregateMetricDoubleBlockBuilder),
+    COMPOSITE(8, "Composite", BlockFactory::newAggregateMetricDoubleBlockBuilder, CompositeBlock::readFrom),
 
     /**
      * Intermediate blocks which don't support retrieving elements.
      */
-    UNKNOWN("Unknown", (blockFactory, estimatedSize) -> { throw new UnsupportedOperationException("can't build null blocks"); });
+    UNKNOWN(9, "Unknown", (blockFactory, estimatedSize) -> { throw new UnsupportedOperationException("can't build null blocks"); }, in -> {
```
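The diff gives every `ElementType` a small numeric id next to its reader. As an illustration only, here is a minimal, self-contained sketch of that dispatch idea, assuming plain `java.io` streams in place of Elasticsearch's `StreamOutput`/`StreamInput`; `SketchElementType` and `TypeCodeDemo` are hypothetical names, not the PR's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical, simplified stand-in for ElementType: each constant carries a
// stable one-byte wire code, mirroring the ids 0..9 in the diff.
enum SketchElementType {
    BOOLEAN(0), INT(1), LONG(2), FLOAT(3), DOUBLE(4),
    NULL(5), BYTES_REF(6), DOC(7), COMPOSITE(8), UNKNOWN(9);

    final byte code;

    // Lookup table indexed by wire code, so reading is an array access
    // instead of a registry lookup by name.
    private static final SketchElementType[] BY_CODE = new SketchElementType[values().length];

    static {
        for (SketchElementType t : values()) {
            if (BY_CODE[t.code] != null) {
                throw new IllegalStateException("duplicate code " + t.code);
            }
            BY_CODE[t.code] = t;
        }
    }

    SketchElementType(int code) {
        this.code = (byte) code;
    }

    // One byte on the wire per block, instead of a name string.
    void writeTo(DataOutputStream out) throws IOException {
        out.writeByte(code);
    }

    static SketchElementType readFrom(DataInputStream in) throws IOException {
        int code = in.readByte();
        if (code < 0 || code >= BY_CODE.length || BY_CODE[code] == null) {
            throw new IOException("unknown element type code " + code);
        }
        return BY_CODE[code];
    }
}

public class TypeCodeDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            SketchElementType.LONG.writeTo(out);
            SketchElementType.BYTES_REF.writeTo(out);
        }
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        // prints "2 LONG BYTES_REF": two blocks tagged with two bytes
        System.out.println(bytes.size() + " " + SketchElementType.readFrom(in) + " " + SketchElementType.readFrom(in));
    }
}
```

Because these codes go over the wire, they have to stay stable once released: a new type should get the next unused id rather than renumbering existing ones.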
Minor point: for future extensibility, maybe space out the writeable codes (multiples of two) instead of using consecutive numbers, e.g.:
- 0: null
- 1: unknown
- 2-3: unused
- 4-15: Java primitives (including those not supported yet, such as byte)
- 16-32: the rest of the objects (doc, composite, etc.)
I think we can add new element types with the next ids.
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/ElementType.java
LGTM, thanks Nhat!
Thanks everyone!
Currently, we use `NamedWriteable` for serializing blocks. While convenient, it incurs a noticeable performance penalty when pages contain thousands of blocks. Since the set of block types is small and already centralized in `ElementType`, we can safely switch from `NamedWriteable` to a compact type code. For example, the `NamedWriteable` overhead alone for a small page with 10K fields would be 180KB, whereas the new method reduces it to 10KB. Below are the serialization improvements with `FROM idx | LIMIT 10000`, where the target index has 10K fields:

- `write_exchange_response` executed 173 times took: 73.2ms -> 26.7ms
- `read_exchange_response` executed 173 times took: 49.4ms -> 25.8ms
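To see why a per-block tag matters at 10K blocks, the sketch below compares tagging every block with a name string against tagging it with a single type-code byte. It is an illustration under stated assumptions: it uses `java.io.DataOutputStream.writeUTF` (a 2-byte length prefix followed by the characters) as a stand-in for however the real stream encodes names, and `"IntBlock"` and `TagSizeDemo` are hypothetical, so the numbers are not meant to reproduce the 180KB figure above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical comparison of per-block tag sizes for a page with 10K blocks.
public class TagSizeDemo {
    static final int BLOCKS = 10_000;

    // Name-based tagging: every block carries its type name as a string.
    static int bytesWithNames(String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < BLOCKS; i++) {
                out.writeUTF(name); // 2-byte length prefix + 1 byte per ASCII char
            }
        }
        return bytes.size();
    }

    // Code-based tagging: every block carries a single byte.
    static int bytesWithCodes(byte code) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < BLOCKS; i++) {
                out.writeByte(code);
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("names: " + bytesWithNames("IntBlock") + " bytes"); // names: 100000 bytes
        System.out.println("codes: " + bytesWithCodes((byte) 1) + " bytes");   // codes: 10000 bytes
    }
}
```

With one byte per block, 10K blocks cost exactly 10K bytes of tags, which lines up with the 10KB figure in the description; the name-based cost grows with the length of each name.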
💚 All backports created successfully
(cherry picked from commit 79a1626)
Adjust wire version after backporting to 8.x. Relates #124394
… `positionCount` as we should already have it from the page.