Skip to content

[mypyc] Optionally log a sampled operation trace to a file #19457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 16, 2025

Conversation

JukkaL
Copy link
Collaborator

@JukkaL JukkaL commented Jul 16, 2025

Logging executed ops is useful for performance analysis. For example, we can look for functions which perform many slow operations and try to optimize them. I've already used this successfully to implement several optimizations. A typical optimization that this helps with is replacing a generic Python function call operation with a native call. This has also helped me identify inefficient code generated by mypyc.

Compile using MYPYC_LOG_TRACE=1 mypyc ... to enable trace logging. The log will be written to mypyc_trace.txt. Roughly 1/1000 of ops of certain kinds (e.g. primitive calls) are logged.

This can also be enabled by passing log_trace=True to mypycify.

Compared to profiling, this logging data is frequency-oriented instead of CPU time oriented, and it's mostly helpful for micro-optimizations. It also needs some understanding of mypyc internals to be useful. It's not generally possible to reconstruct call stacks from the event data (but we could improve this). However, there is very little noise in the data and even small improvements can be visible.

Logging isn't impacted by C compiler optimizations, so for a faster iteration loop, optimizations can be disabled.

In the future this could possibly be used for profile-guided optimizations, but we'd probably need to adjust the data collection a bit for this use case.

This is currently not documented and mostly intended for mypy or mypyc maintainers for now. Also no tests yet, since this is not a user-evel feature and it's disabled by default.

Random example of log entries from mypy self check:

mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend
mypy.semanal.SemanticAnalyzer.lookup::primitive_op:list_get_item_unsafe
mypy.expandtype.ExpandTypeVisitor.visit_type_var__TypeVisitor_glue:239:call:mypy.expandtype.ExpandTypeVisitor.visit_type_var
mypy.applytype.apply_generic_arguments:111:call_c:CPy_NoErrOccurred
mypy.indirection.TypeIndirectionVisitor.visit_callable_type__TypeVisitor_glue:118:call:mypy.indirection.TypeIndirectionVisitor.visit_callable_type
mypy.fastparse.ASTConverter.visit_Call::primitive_op:buf_init_item
mypy.semanal.SemanticAnalyzer.is_func_scope::primitive_op:int_eq
mypy.meet.is_overlapping_types:397:call:mypy.meet._is_subtype_is_overlapping_types_obj
mypy.types.CallableType.serialize:2287:call_c:CPyList_SetItemUnsafe
mypy.checkexpr.ExpressionChecker.check_argument_types:2576:call_c:CPyList_SetItemUnsafe

For example, let's look at this line:

mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend

In method TypeVarExtractor._merge, on line 1146 of mypy/typeops.py, the C primitive CPyList_Extend was called (corresponds to list.extend).

I'll later add some documentation to the wiki or other developer docs and give examples of using and analyzing the data.

Logging executed ops is useful for performance analysis. For example,
we can look for functions which perform many slow operations and try
to optimize them. I've already used this successfully to implement
several optimizations. A typical optimization that this helps with is
replacing a generic Python function call operation with a native call.

Compile using `MYPYC_LOG_TRACE=1 mypyc ...` to enable trace logging.
The log will be written to `mypyc_trace.txt`. Roughly 1/1000 of ops
of certain kinds (e.g. primitive calls) are logged.

This can also be enabled by passing `log_trace=True` to `mypycify`.

Compared to profiling, this logging data is frequency-oriented instead
of CPU time oriented, and it's mostly helpful for micro-optimizations.
It also needs some understanding of mypyc internals to be useful. It's
not generally possible to reconstruct call stacks from the event data
(but we could improve this). However, there is very little noise in
the data and even small improvements can be visible.

Logging isn't impacted by C compiler optimizations, so for a faster
iteration loop, optimizations can be disabled.

In the future this could possibly be used for profile-guided
optimizations, but we'd probably need to adjust the data collection a
bit for this use case.

This is currently not documented and mostly intended for mypy or mypyc
maintainers for now. Also no tests yet, since this is not a user-evel
feature and it's disabled by default.

Random example of log entries from mypy self check:
```
mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend
mypy.semanal.SemanticAnalyzer.lookup::primitive_op:list_get_item_unsafe
mypy.expandtype.ExpandTypeVisitor.visit_type_var__TypeVisitor_glue:239:call:mypy.expandtype.ExpandTypeVisitor.visit_type_var
mypy.applytype.apply_generic_arguments:111:call_c:CPy_NoErrOccurred
mypy.indirection.TypeIndirectionVisitor.visit_callable_type__TypeVisitor_glue:118:call:mypy.indirection.TypeIndirectionVisitor.visit_callable_type
mypy.fastparse.ASTConverter.visit_Call::primitive_op:buf_init_item
mypy.semanal.SemanticAnalyzer.is_func_scope::primitive_op:int_eq
mypy.meet.is_overlapping_types:397:call:mypy.meet._is_subtype_is_overlapping_types_obj
mypy.types.CallableType.serialize:2287:call_c:CPyList_SetItemUnsafe
mypy.checkexpr.ExpressionChecker.check_argument_types:2576:call_c:CPyList_SetItemUnsafe
```

For example, let's look at this line:
```
mypy.typeops.TypeVarExtractor._merge:1146:call_c:CPyList_Extend
```
In method `TypeVarExtractor._merge`, on line 1146 of
`mypy/typeops.py`, the C primitive CPyList_Extend was called
(corresponds to `list.extend`).

I'll later add some documentation to the wiki or other developer docs
and give examples of using and analyzing the data.
@JukkaL JukkaL changed the title [mypyc] Add optional logging of sampled operation trace to a file [mypyc] Optionally log a sampled operation trace to a file Jul 16, 2025
Copy link
Collaborator

@p-sawicki p-sawicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks useful!

@JukkaL JukkaL merged commit 02a472a into master Jul 16, 2025
13 checks passed
@JukkaL JukkaL deleted the mypyc-op-trace-2 branch July 16, 2025 12:43
JukkaL added a commit that referenced this pull request Jul 16, 2025
Specialize for concrete sequence types (`list` and `tuple`) for faster
iteration, since the traversal code is very hot.

I used log tracing (#19457) to identify functions where
PyObject_GetIter was called frequently.

This is a aart of a set of micro-optimizations that improve self check
performance by ~5.5%.
JukkaL added a commit that referenced this pull request Jul 16, 2025
Use precalculated set objects in more places.

This is similar to #19385. Some cases were still unoptimized.

I used trace logging (#19457) to identify functions where we were
creating many set objects, and I noticed that these were unncessary.

This is a part of a set of micro-optimizations that improve self check
performance by ~5.5%.
JukkaL added a commit that referenced this pull request Jul 16, 2025
Mypyc doesn't generate very efficient code for enums, so switch to a
regular class. We can later revert the change if/when we can improve
enum support in mypyc.

Operations related to ArgKind were pretty prominent in the op trace
log (#19457).

By itself this improves performance by ~1.7%, based on
`perf_compare.py`, which is significant:
```
master                    4.168s (0.0%) | stdev 0.037s
HEAD                      4.098s (-1.7%) | stdev 0.028s
```

This is a part of a set of micro-optimizations that improve self check
performance by ~5.5%.
JukkaL added a commit that referenced this pull request Jul 16, 2025
Use precalculated set objects in more places.

This is similar to #19385. Some cases were still unoptimized.

I used trace logging (#19457) to identify functions where we were
creating many set objects, and I noticed that these were unnecessary.

This is a part of a set of micro-optimizations that improve self check
performance by ~5.5%.
JukkaL added a commit that referenced this pull request Jul 17, 2025
Specialize for concrete sequence types (`list` and `tuple`) for faster
iteration, since the traversal code is very hot.

I used trace logging (#19457) to identify functions where
`PyObject_GetIter` was called frequently.

This is a part of a set of micro-optimizations that improve self check
performance by ~5.5%.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants