-
-
Notifications
You must be signed in to change notification settings - Fork 3k
[mypyc] Optionally log a sampled operation trace to a file #19457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Logging executed ops is useful for performance analysis. For example, we can look for functions which perform many slow operations and try to optimize them. I've already used this successfully to implement several optimizations. A typical optimization that this helps with is replacing a generic Python function call operation with a native call. Compile using `MYPYC_LOG_TRACE=1 mypyc ...` to enable trace logging. The log will be written to `mypyc_trace.txt`. Roughly 1/1000 of ops of certain kinds (e.g. primitive calls) are logged. This can also be enabled by passing `log_trace=True` to `mypycify`. Compared to profiling, this logging data is frequency-oriented instead of CPU time oriented, and it's mostly helpful for micro-optimizations. It also needs some understanding of mypyc internals to be useful. It's not generally possible to reconstruct call stacks from the event data (but we could improve this). However, there is very little noise in the data and even small improvements can be visible. Logging isn't impacted by C compiler optimizations, so for a faster iteration loop, optimizations can be disabled. In the future this could possibly be used for profile-guided optimizations, but we'd probably need to adjust the data collection a bit for this use case. This is currently not documented and mostly intended for mypy or mypyc maintainers for now. Also no tests yet, since this is not a user-evel feature and it's disabled by default. Random example of log entries from mypy self check: ``` mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend mypy.semanal.SemanticAnalyzer.lookup::primitive_op:list_get_item_unsafe mypy.expandtype.ExpandTypeVisitor.visit_type_var__TypeVisitor_glue:239:call:mypy.expandtype.ExpandTypeVisitor.visit_type_var mypy.applytype.apply_generic_arguments:111:call_c:CPy_NoErrOccurred mypy.indirection.TypeIndirectionVisitor.visit_callable_type__TypeVisitor_glue:118:call:mypy.indirection.TypeIndirectionVisitor.visit_callable_type mypy.fastparse.ASTConverter.visit_Call::primitive_op:buf_init_item mypy.semanal.SemanticAnalyzer.is_func_scope::primitive_op:int_eq mypy.meet.is_overlapping_types:397:call:mypy.meet._is_subtype_is_overlapping_types_obj mypy.types.CallableType.serialize:2287:call_c:CPyList_SetItemUnsafe mypy.checkexpr.ExpressionChecker.check_argument_types:2576:call_c:CPyList_SetItemUnsafe ``` For example, let's look at this line: ``` mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend ``` In method `TypeVarExtractor._merge`, on line 1146 of `mypy/typeops.py`, the C primitive CPyList_Extend was called (corresponds to `list.extend`). I'll later add some documentation to the wiki or other developer docs and give examples of using and analyzing the data.
p-sawicki
reviewed
Jul 16, 2025
p-sawicki
approved these changes
Jul 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks useful!
JukkaL
added a commit
that referenced
this pull request
Jul 16, 2025
Specialize for concrete sequence types (`list` and `tuple`) for faster iteration, since the traversal code is very hot. I used log tracing (#19457) to identify functions where PyObject_GetIter was called frequently. This is a aart of a set of micro-optimizations that improve self check performance by ~5.5%.
JukkaL
added a commit
that referenced
this pull request
Jul 16, 2025
Use precalculated set objects in more places. This is similar to #19385. Some cases were still unoptimized. I used trace logging (#19457) to identify functions where we were creating many set objects, and I noticed that these were unncessary. This is a part of a set of micro-optimizations that improve self check performance by ~5.5%.
JukkaL
added a commit
that referenced
this pull request
Jul 16, 2025
Mypyc doesn't generate very efficient code for enums, so switch to a regular class. We can later revert the change if/when we can improve enum support in mypyc. Operations related to ArgKind were pretty prominent in the op trace log (#19457). By itself this improves performance by ~1.7%, based on `perf_compare.py`, which is significant: ``` master 4.168s (0.0%) | stdev 0.037s HEAD 4.098s (-1.7%) | stdev 0.028s ``` This is a part of a set of micro-optimizations that improve self check performance by ~5.5%.
JukkaL
added a commit
that referenced
this pull request
Jul 16, 2025
Use precalculated set objects in more places. This is similar to #19385. Some cases were still unoptimized. I used trace logging (#19457) to identify functions where we were creating many set objects, and I noticed that these were unnecessary. This is a part of a set of micro-optimizations that improve self check performance by ~5.5%.
JukkaL
added a commit
that referenced
this pull request
Jul 17, 2025
Specialize for concrete sequence types (`list` and `tuple`) for faster iteration, since the traversal code is very hot. I used trace logging (#19457) to identify functions where `PyObject_GetIter` was called frequently. This is a part of a set of micro-optimizations that improve self check performance by ~5.5%.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Logging executed ops is useful for performance analysis. For example, we can look for functions which perform many slow operations and try to optimize them. I've already used this successfully to implement several optimizations. A typical optimization that this helps with is replacing a generic Python function call operation with a native call. This has also helped me identify inefficient code generated by mypyc.
Compile using
MYPYC_LOG_TRACE=1 mypyc ...
to enable trace logging. The log will be written tomypyc_trace.txt
. Roughly 1/1000 of ops of certain kinds (e.g. primitive calls) are logged.This can also be enabled by passing
log_trace=True
tomypycify
.Compared to profiling, this logging data is frequency-oriented instead of CPU time oriented, and it's mostly helpful for micro-optimizations. It also needs some understanding of mypyc internals to be useful. It's not generally possible to reconstruct call stacks from the event data (but we could improve this). However, there is very little noise in the data and even small improvements can be visible.
Logging isn't impacted by C compiler optimizations, so for a faster iteration loop, optimizations can be disabled.
In the future this could possibly be used for profile-guided optimizations, but we'd probably need to adjust the data collection a bit for this use case.
This is currently not documented and mostly intended for mypy or mypyc maintainers for now. Also no tests yet, since this is not a user-evel feature and it's disabled by default.
Random example of log entries from mypy self check:
For example, let's look at this line:
In method
TypeVarExtractor._merge
, on line 1146 ofmypy/typeops.py
, the C primitive CPyList_Extend was called (corresponds tolist.extend
).I'll later add some documentation to the wiki or other developer docs and give examples of using and analyzing the data.