Documentation
¶
Overview ¶
Windows UTF-16 strings can contain unpaired surrogates, which can't be decoded into a valid UTF-8 string. This file defines a set of functions that can be used to encode and decode potentially ill-formed UTF-16 strings by using the [the WTF-8 encoding](https://simonsapin.github.io/wtf-8/).
WTF-8 is a strict superset of UTF-8, i.e. any string that is well-formed in UTF-8 is also well-formed in WTF-8 and the content is unchanged. Also, the conversion never fails and is lossless.
The benefit of using WTF-8 instead of UTF-8 when decoding a UTF-16 string is that the conversion is lossless even for ill-formed UTF-16 strings. This property allows to read an ill-formed UTF-16 string, convert it to a Go string, and convert it back to the same original UTF-16 string.
See go.dev/issues/59971 for more info.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func WTF8Decode ¶
WTF8Decode returns the WTF-8 encoding of the potentially ill-formed UTF-16 src.