-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Rust: Add predicate for certain type information #20155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
21f2c08
to
f5f7b61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new "phase" to Rust type inference by adding an inferCertainType
predicate that contains only type inference rules guaranteed to be 100% correct. This addresses performance issues caused by cycles in type inference that can occur when inaccurate approximations lead to multiple types being inferred for the same term.
Key changes include:
- Implementation of
inferCertainType
predicate for accurate type inference on annotated terms and simple calls - Integration of certain type information into the main
inferType
predicate - Prevention of inaccurate type propagation when certain information is available
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
rust/ql/lib/codeql/rust/internal/TypeInference.qll | Adds CertainTypeInference module and integrates it into main type inference |
rust/ql/lib/codeql/rust/internal/TypeInferenceConsistency.qll | Adds consistency check for non-unique certain types |
rust/ql/test/library-tests/type-inference/dereference.rs | Adds test case demonstrating the cycle-causing scenario |
rust/ql/test/library-tests/type-inference/main.rs | Updates test expectation comment |
*.expected files | Updates test expectations reflecting improved type inference results |
Comments suppressed due to low confidence (3)
rust/ql/lib/codeql/rust/internal/TypeInference.qll:1
- The predicate
typeMentionIsComplete
should have a documentation comment explaining what constitutes a 'complete' type mention and why this check is important for certain type inference.
/** Provides functionality for inferring types. */
rust/ql/lib/codeql/rust/internal/TypeInference.qll:1
- This complex predicate should have comprehensive documentation explaining its purpose, the rationale behind each constraint, and providing examples of calls that would and wouldn't be considered 'certain'.
/** Provides functionality for inferring types. */
rust/ql/lib/codeql/rust/internal/TypeInference.qll:1
- This guard condition is critical for preventing cycles. The comment should be expanded to explain how this prevents the specific cycle described in the PR description and why it's safe to block all uncertain type propagation when certain information exists.
/** Provides functionality for inferring types. */
f5f7b61
to
3ba285c
Compare
The last DCA report contains the new "Nodes With Type At Length Limit" metric: https://github.com/github/codeql-dca-main/blob/data/paldepind/PR-20155-0-rust__2/reports/readme.md#nodes-with-type-at-length-limit. The changes look pretty good, especially on Databend (though overall analysis time is still down a bit). |
Motivated by performance problems seen in #20140 this PR introduces a new "phase" to type inference in the form of a new
inferCertainType
predicate.Motivation
There's a lot of places where we approximate things in type inference. These inaccuracies cause us to sometimes infer multiple types for the same term. Inaccurate types can lead to other inaccurate types and things can spiral and blow up from there. In the worst case we end up with cycles in type inference.
In #20140 performance blew up in Sway at a place where several cycles in inference happened. This PR adds a reduced version of that problem, which was caused by a type coercion that we don't handle. I don't see an easy way to handle such coercions correctly, and more generally think it'll be hard to make our entire type inference 100% correct.
Proposed solution
This PR adds a new "phase" to type inference, which is just a new predicate called
inferCertainType
. The idea is thatinferCertainType
contains a subset of the type inference that is sure to be 100% correct. This implies thatinferCertainType
(ideally) never contains multiple types for the same node and path.inferType
uses (and is a superset) ofinferCertainType
, but not the other way around. This means that we can useinferCertainType
negated ininferType
to eliminate or block inaccuracies when we actually have accurate information.This PR implements a minimum viable version of the idea which is sufficient to fix the cycle in Sway (and the test). We infer certain type information only for 1/ annotated terms and 2/ very simple calls and let the information propagate along a few equalities. The PR also uses a negated
inferCertainType
to blockinferTypeEquality
from letting (wrong) type information flow to a node where we have complete correct type information. This is what cuts the cycle in the example.Pros of this approach:
not inferCertainType(n, _)
, i.e., if the simple and correct rules give us a type we can stop inference there.Cons:
Results on #20140
With this PR the number of nodes with a type at the type path lenght limit in Sway is reduced from 76 and to 63.
When quick eval'ing
inferType
on Sway I see the following:main
So this PR seems to remove the blowup caused by #20140 for Sway.
Future work
As mentioned this is just a minimum viable implementation.
We should be able to move more stuff into
inferCertainType
.Right now
inferCertainType(n, path)
is only allowed to give results if we have complete information of the entire type tree forn
. It would be nice to lift this requirement s.t. it is allowed to have results as long as the type specifically atpath
is certain.This will make it much easier to soundly include things in
inferCertainType
. For instance, then we can have a certain root type forSome(a)
without necessarily having a certain type for the nested type.However, we still need to know when we have complete certain information of a node (to implement the block in
inferTypeEquality
). Ideally that could be derived frominferCertainType
by checking if the type tree is complete, but that is more work to implement.