[Bug] Speed up counting entities for copy/graft #5475

@lutter

We need to do something about the entity_count for grafts. Right now, once all data has been copied, graph-node fires off one big query that counts the entities in the graft; on very large subgraphs, that query can take hours.

There are a few different ways to handle that:

  • give up on accurate entity counts and set the count for copies/grafts to some fast estimate (either the count from the source, or the estimate that analyze comes up with)
  • count entities while we copy them. We'd have to turn queries of the form `insert into dst select * from src` into `with ranges as (insert into .. returning block_range) select count(*) from ranges where block_range @> int32::MAX`, and then store the count for each batch in copy_table_state. After data copying has finished, the entity count is a simple aggregation over copy_table_state
  • keep counting entities as a separate step, but break it into batches along vid just like the actual copying does. That would require quite a bit more bookkeeping, as counting can now be interrupted by node restarts
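
To make the second option concrete, here is a minimal in-memory sketch in Python of counting while copying. It is not graph-node code: `CopyTableState`, `copy_batch`, `copy_table`, and the row layout are hypothetical stand-ins, and `is_current` only models the `block_range @> int32::MAX` check on a half-open range. The idea it illustrates is that each batch records its own count next to the copy progress, so a restart resumes from the stored state and the final entity count is just a sum.

```python
from dataclasses import dataclass, field

INT32_MAX = 2**31 - 1


def is_current(block_range):
    """Model of `block_range @> int32::MAX` for a half-open range [lower, upper).

    An upper bound of None stands for an unbounded range.
    """
    lower, upper = block_range
    return lower <= INT32_MAX and (upper is None or upper > INT32_MAX)


@dataclass
class CopyTableState:
    """Hypothetical stand-in for one row of copy_table_state."""
    next_vid: int = 0
    batch_counts: list = field(default_factory=list)


def copy_batch(src, dst, state, batch_size):
    """Copy one vid range and record how many copied rows are current versions."""
    lo, hi = state.next_vid, state.next_vid + batch_size
    batch = [row for row in src if lo <= row["vid"] < hi]
    dst.extend(batch)
    # Progress and count are updated together, mirroring a single transaction.
    state.batch_counts.append(sum(is_current(r["block_range"]) for r in batch))
    state.next_vid = hi


def copy_table(src, dst, state, batch_size=2):
    """Copy all remaining batches, then aggregate the per-batch counts."""
    max_vid = max((row["vid"] for row in src), default=-1)
    while state.next_vid <= max_vid:
        copy_batch(src, dst, state, batch_size)
    return sum(state.batch_counts)


src = [
    {"vid": 0, "id": "a", "block_range": (1, 5)},     # superseded version
    {"vid": 1, "id": "a", "block_range": (5, None)},  # current version
    {"vid": 2, "id": "b", "block_range": (2, None)},
    {"vid": 3, "id": "c", "block_range": (3, 9)},     # deleted at block 9
    {"vid": 4, "id": "d", "block_range": (7, None)},
]
dst = []
state = CopyTableState()

copy_batch(src, dst, state, batch_size=2)   # first batch, then a simulated restart
entity_count = copy_table(src, dst, state)  # resumes from state.next_vid
print(entity_count)
```

Because the count is stored with the batch progress, an interrupted copy never needs to recount rows it has already copied, which is the extra bookkeeping the third option would otherwise have to add separately.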

Labels: Stale, bug (Something isn't working)
