Documentation ¶
Overview ¶
Package unicodedata provides additional lookup functions for Unicode properties that are not covered by the standard package unicode.
Index ¶
- Constants
- func Compose(a, b rune) (rune, bool)
- func Decompose(ab rune) (a, b rune, ok bool)
- func IsExtendedPictographic(ch rune) bool
- func IsLargeEastAsian(ch rune) bool
- func IsWord(ch rune) bool
- func LookupCombiningClass(ch rune) uint8
- func LookupMirrorChar(ch rune) rune
- type GeneralCategory
- type GraphemeBreak
- type IndicConjunctBreak
- type LineBreak
- type ScriptVerticalOrientation
- type WordBreak
Constants ¶
const (
	HangulSBase  = 0xAC00
	HangulLBase  = 0x1100
	HangulVBase  = 0x1161
	HangulTBase  = 0x11A7
	HangulSCount = 11172
	HangulLCount = 19
	HangulVCount = 21
	HangulTCount = 28
	HangulNCount = HangulVCount * HangulTCount
)
Constants for the algorithmic composition and decomposition of Hangul syllables. They are used in Compose and Decompose, and are also exported for additional shaper processing.
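As an illustration of how these constants are typically used, here is a minimal sketch of the standard Unicode arithmetic for Hangul syllables. It is not taken from this package's source: `composeHangul` and `decomposeHangul` are hypothetical helpers, the constants are local copies of the values listed above, and returning zero runes on failure is a choice of the sketch.

```go
package main

import "fmt"

// Local copies of the documented constants, so the sketch compiles on its own.
const (
	sBase  = 0xAC00
	lBase  = 0x1100
	vBase  = 0x1161
	tBase  = 0x11A7
	sCount = 11172
	lCount = 19
	vCount = 21
	tCount = 28
	nCount = vCount * tCount
)

// composeHangul (hypothetical) combines L+V into an LV syllable,
// or LV+T into an LVT syllable.
func composeHangul(a, b rune) (rune, bool) {
	// Leading consonant + vowel -> LV syllable.
	if a >= lBase && a < lBase+lCount && b >= vBase && b < vBase+vCount {
		return sBase + (a-lBase)*nCount + (b-vBase)*tCount, true
	}
	// LV syllable + trailing consonant -> LVT syllable.
	// tBase itself stands for "no trailing consonant", hence b > tBase.
	if a >= sBase && a < sBase+sCount && (a-sBase)%tCount == 0 &&
		b > tBase && b < tBase+tCount {
		return a + (b - tBase), true
	}
	return 0, false
}

// decomposeHangul (hypothetical) splits a precomposed syllable
// into LV+T when it has a trailing consonant, or into L+V otherwise.
func decomposeHangul(ab rune) (a, b rune, ok bool) {
	if ab < sBase || ab >= sBase+sCount {
		return 0, 0, false
	}
	sIndex := ab - sBase
	if t := sIndex % tCount; t != 0 {
		return ab - t, tBase + t, true
	}
	return lBase + sIndex/nCount, vBase + (sIndex%nCount)/tCount, true
}

func main() {
	lv, _ := composeHangul(0x1112, 0x1161) // HIEUH + A
	lvt, _ := composeHangul(lv, 0x11AB)    // + final NIEUN
	fmt.Printf("%c %c\n", lv, lvt)

	a, b, ok := decomposeHangul(lvt)
	fmt.Printf("%U %U %v\n", a, b, ok)
}
```

The pairwise form (L+V, then LV+T) matches the two-rune signatures of Compose and Decompose, which is presumably why a shaper can chain the calls to build or split a full syllable.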
Functions ¶
func Compose ¶
Compose composes a sequence of two input Unicode code points by canonical equivalence, returning the composed code point, if successful. It returns `false` otherwise.
func Decompose ¶
Decompose decomposes an input Unicode code point, returning the two decomposed code points, if successful. It returns `false` otherwise.
func IsExtendedPictographic ¶
func IsLargeEastAsian ¶
IsLargeEastAsian matches runes with East_Asian_Width property of F, W or H, and is used for UAX14, rule LB30.
func IsWord ¶
IsWord returns true for all the runes that may be found in a word, that is, either a Number (Nd, Nl, No) or a rune with the Alphabetic property.
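For readers who want to see what this definition amounts to, the following sketch approximates it with the standard unicode package only. It is an assumption-laden stand-in, not this package's implementation: `isWord` is a hypothetical helper, and the Alphabetic property is rebuilt here from the letter categories plus the Other_Alphabetic, Other_Lowercase and Other_Uppercase tables, whose contents depend on the Unicode version shipped with the Go toolchain.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWord (hypothetical) reports whether r is a Number (Nd, Nl, No)
// or has the Alphabetic property, approximated from stdlib tables.
func isWord(r rune) bool {
	return unicode.In(r,
		unicode.N, // Nd, Nl, No
		unicode.L, // Lu, Ll, Lt, Lm, Lo
		unicode.Other_Alphabetic,
		unicode.Other_Lowercase,
		unicode.Other_Uppercase,
	)
}

func main() {
	for _, r := range "a5 -é" {
		fmt.Printf("%q %v\n", r, isWord(r))
	}
}
```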
func LookupCombiningClass ¶
LookupCombiningClass returns the class used for the Canonical Ordering Algorithm in the Unicode Standard, defaulting to 0.
From http://www.unicode.org/reports/tr44/#Canonical_Combining_Class: "This property could be considered either an enumerated property or a numeric property: the principal use of the property is in terms of the numeric values. For the property value names associated with different numeric values, see DerivedCombiningClass.txt and Canonical Combining Class Values."
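To make the role of the numeric value concrete, here is a small sketch of the reordering step of the Canonical Ordering Algorithm: within each run of runes whose class is non-zero, marks are stably sorted by ascending class. The `combiningClass` function is a hypothetical stand-in for LookupCombiningClass that only knows three marks, and `reorder` is likewise illustrative, not part of this package.

```go
package main

import (
	"fmt"
	"sort"
)

// combiningClass is a hypothetical stand-in for LookupCombiningClass.
// It only knows three marks; every other rune gets the default class 0.
func combiningClass(r rune) uint8 {
	switch r {
	case 0x0327: // COMBINING CEDILLA
		return 202
	case 0x0323: // COMBINING DOT BELOW
		return 220
	case 0x0301: // COMBINING ACUTE ACCENT
		return 230
	}
	return 0
}

// reorder stably sorts each maximal run of runes with a non-zero class
// by ascending class. Runes with class 0 stay where they are.
func reorder(rs []rune) {
	for i := 0; i < len(rs); {
		if combiningClass(rs[i]) == 0 {
			i++
			continue
		}
		j := i
		for j < len(rs) && combiningClass(rs[j]) != 0 {
			j++
		}
		run := rs[i:j]
		sort.SliceStable(run, func(a, b int) bool {
			return combiningClass(run[a]) < combiningClass(run[b])
		})
		i = j
	}
}

func main() {
	rs := []rune{'a', 0x0301, 0x0323, 'b'}
	reorder(rs)
	fmt.Printf("%U\n", rs)
}
```

In this input the dot below (class 220) moves in front of the acute accent (class 230), while the two base letters, which have class 0, keep their positions.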
func LookupMirrorChar ¶
LookupMirrorChar finds the mirrored equivalent of a character as defined in the file BidiMirroring.txt of the Unicode Character Database available at http://www.unicode.org/Public/UNIDATA/BidiMirroring.txt.
If the input character is declared as a mirroring character in the Unicode standard and has a mirrored equivalent, it is returned. Otherwise the input character itself is returned.
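The fallback behaviour described above can be sketched with a plain map lookup. The table below is a tiny hand-picked excerpt of pairs listed in BidiMirroring.txt, and `mirrorChar` is a hypothetical helper; the real function covers the full data file and may store it differently.

```go
package main

import "fmt"

// mirrorPairs is a small illustrative excerpt, not the full table.
var mirrorPairs = map[rune]rune{
	'(': ')', ')': '(',
	'<': '>', '>': '<',
	'[': ']', ']': '[',
	'{': '}', '}': '{',
	'«': '»', '»': '«',
}

// mirrorChar (hypothetical) returns the mirrored equivalent of ch,
// or ch itself when no mirrored equivalent is known.
func mirrorChar(ch rune) rune {
	if m, ok := mirrorPairs[ch]; ok {
		return m
	}
	return ch
}

func main() {
	fmt.Printf("%c %c %c\n", mirrorChar('('), mirrorChar('»'), mirrorChar('a'))
}
```

Returning the input unchanged means callers can apply the lookup to every rune of a right-to-left run without checking first whether the rune mirrors.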
Types ¶
type GeneralCategory ¶
type GeneralCategory uint8
GeneralCategory is an enum storing the Unicode General Category of a rune.
const (
	Unassigned GeneralCategory = iota
	Cc // Control
	Cf // Format
	Co // Private_Use
	Cs // Surrogate
	Ll // Lowercase_Letter
	Lm // Modifier_Letter
	Lo // Other_Letter
	Lt // Titlecase_Letter
	Lu // Uppercase_Letter
	Mc // Spacing_Mark
	Me // Enclosing_Mark
	Mn // Nonspacing_Mark
	Nd // Decimal_Number
	Nl // Letter_Number
	No // Other_Number
	Pc // Connector_Punctuation
	Pd // Dash_Punctuation
	Pe // Close_Punctuation
	Pf // Final_Punctuation
	Pi // Initial_Punctuation
	Po // Other_Punctuation
	Ps // Open_Punctuation
	Sc // Currency_Symbol
	Sk // Modifier_Symbol
	Sm // Math_Symbol
	So // Other_Symbol
	Zl // Line_Separator
	Zp // Paragraph_Separator
	Zs // Space_Separator
)
func LookupGeneralCategory ¶
func LookupGeneralCategory(r rune) GeneralCategory
LookupGeneralCategory returns the Unicode general category of the rune, or Unassigned if not found.
func (GeneralCategory) IsLetter ¶
func (gc GeneralCategory) IsLetter() bool
IsLetter returns true for Lowercase_Letter, Modifier_Letter, Other_Letter, Titlecase_Letter and Uppercase_Letter.
func (GeneralCategory) IsMark ¶
func (gc GeneralCategory) IsMark() bool
IsMark returns true for Spacing_Mark, Enclosing_Mark and Nonspacing_Mark.
type GraphemeBreak ¶
type GraphemeBreak uint16
GraphemeBreak is a flag storing the Unicode Grapheme Cluster Break Property.
See https://unicode.org/reports/tr29/#Grapheme_Cluster_Break_Property_Values
const (
	GB_CR                 GraphemeBreak = 1 << 0
	GB_Control            GraphemeBreak = 1 << 1
	GB_Extend             GraphemeBreak = 1 << 2
	GB_L                  GraphemeBreak = 1 << 3
	GB_LF                 GraphemeBreak = 1 << 4
	GB_LV                 GraphemeBreak = 1 << 5
	GB_LVT                GraphemeBreak = 1 << 6
	GB_Prepend            GraphemeBreak = 1 << 7
	GB_Regional_Indicator GraphemeBreak = 1 << 8
	GB_SpacingMark        GraphemeBreak = 1 << 9
	GB_T                  GraphemeBreak = 1 << 10
	GB_V                  GraphemeBreak = 1 << 11
	GB_ZWJ                GraphemeBreak = 1 << 12
)
func LookupGraphemeBreak ¶
func LookupGraphemeBreak(ch rune) GraphemeBreak
LookupGraphemeBreak returns the grapheme break property for the rune, or 0.
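Since every class occupies its own bit, a caller can test a property value against several classes with a single mask. The sketch below shows that pattern with local copies of a few of the constants above; `graphemeBreak`, `hangulMask` and `isHangulClass` are hypothetical names used for illustration only, and the grouping chosen is just an example.

```go
package main

import "fmt"

// graphemeBreak mirrors the documented flag type for this sketch.
type graphemeBreak uint16

// Local copies of a few documented flag values.
const (
	gbCR  graphemeBreak = 1 << 0
	gbL   graphemeBreak = 1 << 3
	gbLV  graphemeBreak = 1 << 5
	gbLVT graphemeBreak = 1 << 6
	gbT   graphemeBreak = 1 << 10
	gbV   graphemeBreak = 1 << 11
)

// hangulMask groups the Hangul-related classes into one value.
const hangulMask = gbL | gbV | gbT | gbLV | gbLVT

// isHangulClass (hypothetical) reports whether p is one of the
// classes in hangulMask. A zero value (no property) never matches.
func isHangulClass(p graphemeBreak) bool {
	return p&hangulMask != 0
}

func main() {
	fmt.Println(isHangulClass(gbLV), isHangulClass(gbCR), isHangulClass(0))
}
```

The same masking idea applies to the other flag types on this page, which is a likely reason they are declared as bit sets instead of sequential enums.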
type IndicConjunctBreak ¶
type IndicConjunctBreak uint8
IndicConjunctBreak is a flag storing the Indic_Conjunct_Break property used for UAX29, rule GB9c.
const (
	ICBLinker IndicConjunctBreak = 1 << iota
	ICBConsonant
	ICBExtend
)
func LookupIndicConjunctBreak ¶
func LookupIndicConjunctBreak(r rune) IndicConjunctBreak
LookupIndicConjunctBreak returns the value of the Indic_Conjunct_Break property, or zero.
type LineBreak ¶
type LineBreak uint64
LineBreak is a flag storing the Unicode Line Break Property.
See https://unicode.org/reports/tr14/#Properties
const (
	LB_AI  LineBreak = 1 << 0  // Ambiguous
	LB_AK  LineBreak = 1 << 1  // Aksara
	LB_AL  LineBreak = 1 << 2  // Alphabetic
	LB_AP  LineBreak = 1 << 3  // Aksara_Prebase
	LB_AS  LineBreak = 1 << 4  // Aksara_Start
	LB_B2  LineBreak = 1 << 5  // Break_Both
	LB_BA  LineBreak = 1 << 6  // Break_After
	LB_BB  LineBreak = 1 << 7  // Break_Before
	LB_BK  LineBreak = 1 << 8  // Mandatory_Break
	LB_CB  LineBreak = 1 << 9  // Contingent_Break
	LB_CJ  LineBreak = 1 << 10 // Conditional_Japanese_Starter
	LB_CL  LineBreak = 1 << 11 // Close_Punctuation
	LB_CM  LineBreak = 1 << 12 // Combining_Mark
	LB_CP  LineBreak = 1 << 13 // Close_Parenthesis
	LB_CR  LineBreak = 1 << 14 // Carriage_Return
	LB_EB  LineBreak = 1 << 15 // E_Base
	LB_EM  LineBreak = 1 << 16 // E_Modifier
	LB_EX  LineBreak = 1 << 17 // Exclamation
	LB_GL  LineBreak = 1 << 18 // Glue
	LB_H2  LineBreak = 1 << 19 // H2
	LB_H3  LineBreak = 1 << 20 // H3
	LB_HH  LineBreak = 1 << 21 // Unambiguous_Hyphen
	LB_HL  LineBreak = 1 << 22 // Hebrew_Letter
	LB_HY  LineBreak = 1 << 23 // Hyphen
	LB_ID  LineBreak = 1 << 24 // Ideographic
	LB_IN  LineBreak = 1 << 25 // Inseparable
	LB_IS  LineBreak = 1 << 26 // Infix_Numeric
	LB_JL  LineBreak = 1 << 27 // JL
	LB_JT  LineBreak = 1 << 28 // JT
	LB_JV  LineBreak = 1 << 29 // JV
	LB_LF  LineBreak = 1 << 30 // Line_Feed
	LB_NL  LineBreak = 1 << 31 // Next_Line
	LB_NS  LineBreak = 1 << 32 // Nonstarter
	LB_NU  LineBreak = 1 << 33 // Numeric
	LB_OP  LineBreak = 1 << 34 // Open_Punctuation
	LB_PO  LineBreak = 1 << 35 // Postfix_Numeric
	LB_PR  LineBreak = 1 << 36 // Prefix_Numeric
	LB_QU  LineBreak = 1 << 37 // Quotation
	LB_RI  LineBreak = 1 << 38 // Regional_Indicator
	LB_SA  LineBreak = 1 << 39 // Complex_Context
	LB_SG  LineBreak = 1 << 40 // Surrogate
	LB_SP  LineBreak = 1 << 41 // Space
	LB_SY  LineBreak = 1 << 42 // Break_Symbols
	LB_VF  LineBreak = 1 << 43 // Virama_Final
	LB_VI  LineBreak = 1 << 44 // Virama
	LB_WJ  LineBreak = 1 << 45 // Word_Joiner
	LB_ZW  LineBreak = 1 << 46 // ZWSpace
	LB_ZWJ LineBreak = 1 << 47 // ZWJ
)
func LookupLineBreak ¶
LookupLineBreak returns the break class for the rune, or 0 (the XX, Unknown class).
type ScriptVerticalOrientation ¶
type ScriptVerticalOrientation struct {
// contains filtered or unexported fields
}
ScriptVerticalOrientation provides the glyph orientation to use for vertical text.
func LookupVerticalOrientation ¶
func LookupVerticalOrientation(s language.Script) ScriptVerticalOrientation
LookupVerticalOrientation returns the preferred orientation for the given script.
func (ScriptVerticalOrientation) Orientation ¶
func (sv ScriptVerticalOrientation) Orientation(r rune) (isSideways bool)
Orientation returns the preferred orientation for the given rune. If the rune does not belong to this script, the default orientation of this script is returned (regardless of the actual script of the given rune).
type WordBreak ¶
type WordBreak uint16
WordBreak is a flag storing the Unicode Word Break Property.
See https://unicode.org/reports/tr29/#Table_Word_Break_Property_Values
const (
	WB_ALetter            WordBreak = 1 << 0
	WB_Double_Quote       WordBreak = 1 << 1
	WB_ExtendFormat       WordBreak = 1 << 2
	WB_ExtendNumLet       WordBreak = 1 << 3
	WB_Hebrew_Letter      WordBreak = 1 << 4
	WB_Katakana           WordBreak = 1 << 5
	WB_MidLetter          WordBreak = 1 << 6
	WB_MidNum             WordBreak = 1 << 7
	WB_MidNumLet          WordBreak = 1 << 8
	WB_NewlineCRLF        WordBreak = 1 << 9
	WB_Numeric            WordBreak = 1 << 10
	WB_Regional_Indicator WordBreak = 1 << 11
	WB_Single_Quote       WordBreak = 1 << 12
	WB_WSegSpace          WordBreak = 1 << 13
)
func LookupWordBreak ¶
LookupWordBreak returns the word break property for the rune, or 0.