Closed
Description
We need to do something about the entity_count for grafts. Right now, when all data has been copied, graph-node will fire off a big query that counts the entities in the graft; that query can take hours in very large subgraphs.
There are a few different ways to handle that:
- give up on accurate entity counts and set the count for copies/grafts to some fast estimate (either the count from the source, or the estimate that analyze comes up with)
- count entities while we copy them. We'd have to turn queries of the form
  insert into dst select * from src
  into
  with ranges as (insert into .. returning block_range) select count(*) from ranges where block_range @> int32::MAX
  and then store the counts for each batch in copy_table_state. After data copying has finished, the entity count is a simple aggregation over copy_table_state
- keep counting entities as a separate step, but break it into batches along vid, just like the actual copying does. That would require quite a bit more bookkeeping, as counting can now be interrupted by node restarts
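
To make the second option a bit more concrete, here is a rough in-memory sketch in Python of counting while copying. It is not graph-node code: `Row`, `copy_batch`, `copy_table` and the list used for `copy_table_state` are hypothetical stand-ins, and the assumption that an open-ended block range (modelled as `block_end = None`) is what `block_range @> int32::MAX` selects is mine. Each batch records how many of the rows it just copied are current versions, and the final entity count is only a sum over those per-batch records.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

INT32_MAX = 2**31 - 1


@dataclass
class Row:
    vid: int
    block_start: int
    block_end: Optional[int]  # None models an unbounded upper end (assumption)


def is_current(row: Row) -> bool:
    # Stand-in for `block_range @> int32::MAX` on a half-open range [start, end)
    return row.block_start <= INT32_MAX and (
        row.block_end is None or row.block_end > INT32_MAX
    )


def copy_batch(src: List[Row], dst: List[Row], start_vid: int, batch_size: int) -> int:
    # Copy one vid range and return the number of current rows in it,
    # which is what the `with ranges as (insert .. returning ..)` query would yield
    batch = [r for r in src if start_vid <= r.vid < start_vid + batch_size]
    dst.extend(batch)
    return sum(1 for r in batch if is_current(r))


def copy_table(src: List[Row], batch_size: int) -> Tuple[List[Row], List[Tuple[int, int]]]:
    dst: List[Row] = []
    # Hypothetical stand-in for copy_table_state: one (start_vid, count) per batch
    copy_table_state: List[Tuple[int, int]] = []
    if not src:
        return dst, copy_table_state
    start = min(r.vid for r in src)
    max_vid = max(r.vid for r in src)
    while start <= max_vid:
        count = copy_batch(src, dst, start, batch_size)
        copy_table_state.append((start, count))
        start += batch_size
    return dst, copy_table_state


rows = [
    Row(vid=1, block_start=0, block_end=5),      # superseded version
    Row(vid=2, block_start=5, block_end=None),   # current version
    Row(vid=3, block_start=2, block_end=7),      # superseded version
    Row(vid=4, block_start=7, block_end=None),   # current version
    Row(vid=5, block_start=3, block_end=None),   # current version
]
copied, state = copy_table(rows, batch_size=2)

# After copying has finished, the entity count is a simple aggregation
entity_count = sum(count for _, count in state)
```

Because the count for a batch is stored together with that batch's progress, a restart in the middle of copying would only lose the batch in flight, and no separate counting pass over the destination is needed.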