When a non-BMP character such as '𣄃' (U+23103) is passed into an AssemblyScript function, it arrives as two UTF-16 code units (a surrogate pair), which is expected. The problem appears when encoding the string into a UTF-8 byte stream for hashing (e.g., MD5): AssemblyScript provides no built-in way to treat a surrogate pair as a single Unicode code point, so each half of the pair ends up being encoded separately.
This leads to incorrect results wherever the byte encoding must exactly match the UTF-8 produced by JavaScript's TextEncoder. Hashing is the clearest case: an MD5 hash of a string containing such characters will differ from the one computed in JavaScript if even one byte differs.
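
For illustration, here is a minimal sketch of the encoding behavior I would expect: surrogate pairs are combined into one code point before the UTF-8 bytes are emitted. It is written in plain TypeScript rather than AssemblyScript, and `utf8Encode` is a hypothetical helper name, not an existing API. Replacing unpaired surrogates with U+FFFD is my assumption, chosen to mirror what TextEncoder does; a port to AssemblyScript would likely need explicit integer types and a typed array for the output.

```typescript
// Hypothetical helper (not an AssemblyScript or library API):
// encodes a UTF-16 string into UTF-8 bytes, combining surrogate pairs.
function utf8Encode(str: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);

    if (cp >= 0xd800 && cp <= 0xdbff) {
      // High surrogate: look for the matching low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Combine the pair into a single code point above U+FFFF.
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++; // the low surrogate has been consumed
      } else {
        cp = 0xfffd; // assumption: unpaired surrogate becomes U+FFFD
      }
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      cp = 0xfffd; // assumption: stray low surrogate becomes U+FFFD
    }

    if (cp < 0x80) {
      bytes.push(cp); // 1 byte
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      bytes.push(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      ); // 3 bytes
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      ); // 4 bytes
    }
  }
  return bytes;
}
```

With this approach, U+23103 (code units 0xD84C 0xDD03) encodes to the four bytes F0 A3 84 83. Encoding each surrogate on its own would instead produce two 3-byte sequences, which is where the hash mismatch comes from.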