unicodedata

package
v0.3.4
Published: Feb 25, 2026 License: BSD-3-Clause, Unlicense Imports: 3 Imported by: 0

Documentation

Overview

Package unicodedata provides additional lookup functions for Unicode properties that are not covered by the standard library package unicode.

Constants

const (
	HangulSBase  = 0xAC00
	HangulLBase  = 0x1100
	HangulVBase  = 0x1161
	HangulTBase  = 0x11A7
	HangulSCount = 11172
	HangulLCount = 19
	HangulVCount = 21
	HangulTCount = 28
	HangulNCount = HangulVCount * HangulTCount
)

Constants for the algorithmic composition and decomposition of Hangul syllables. They are used by Compose and Decompose, and are also exported for additional shaper processing.
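
As an illustration of how these constants fit together, here is a minimal sketch of the Hangul syllable arithmetic described in the Unicode Standard. It is not the package's source: the constants are local copies, and `composeHangul` and `decomposeHangul` are hypothetical helpers that work on pairs of code points (L+V or LV+T), an assumption modelled on the two-rune shape of Compose and Decompose.

```go
package main

import "fmt"

// Local copies of the exported constants, so the sketch compiles on its own.
const (
	sBase  = 0xAC00
	lBase  = 0x1100
	vBase  = 0x1161
	tBase  = 0x11A7
	lCount = 19
	vCount = 21
	tCount = 28
	nCount = vCount * tCount // 588
	sCount = lCount * nCount // 11172
)

// composeHangul (hypothetical) combines L+V into an LV syllable,
// or LV+T into an LVT syllable.
func composeHangul(a, b rune) (rune, bool) {
	// Leading consonant + vowel.
	if a >= lBase && a < lBase+lCount && b >= vBase && b < vBase+vCount {
		return sBase + ((a-lBase)*vCount+(b-vBase))*tCount, true
	}
	// LV syllable + trailing consonant. Index 0 means "no trailing
	// consonant", so valid trailing jamo start just after tBase.
	if a >= sBase && a < sBase+sCount && (a-sBase)%tCount == 0 &&
		b > tBase && b < tBase+tCount {
		return a + (b - tBase), true
	}
	return 0, false
}

// decomposeHangul (hypothetical) splits an LVT syllable into (LV, T)
// and an LV syllable into (L, V).
func decomposeHangul(s rune) (a, b rune, ok bool) {
	if s < sBase || s >= sBase+sCount {
		return 0, 0, false
	}
	sIndex := s - sBase
	if t := sIndex % tCount; t != 0 {
		return s - t, tBase + t, true
	}
	return lBase + sIndex/nCount, vBase + (sIndex%nCount)/tCount, true
}

func main() {
	lv, _ := composeHangul(0x1112, 0x1161)
	lvt, _ := composeHangul(lv, 0x11AB)
	fmt.Printf("%U %U\n", lv, lvt)
	a, b, ok := decomposeHangul(lvt)
	fmt.Printf("%U %U %v\n", a, b, ok)
}
```

The trailing-consonant index is what distinguishes the two syllable shapes: a syllable whose index is a multiple of 28 has no trailing consonant and therefore splits into L and V.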

Variables

This section is empty.

Functions

func Compose

func Compose(a, b rune) (rune, bool)

Compose composes a sequence of two input Unicode code points by canonical equivalence. It returns the composed code point and `true` if successful, and `false` otherwise.

func Decompose

func Decompose(ab rune) (a, b rune, ok bool)

Decompose decomposes an input Unicode code point. It returns the two decomposed code points and `true` if successful, and `false` otherwise.

func IsExtendedPictographic

func IsExtendedPictographic(ch rune) bool

func IsLargeEastAsian

func IsLargeEastAsian(ch rune) bool

IsLargeEastAsian matches runes with East_Asian_Width property of F, W or H, and is used for UAX14, rule LB30.

func IsWord

func IsWord(ch rune) bool

IsWord returns true for all the runes that may be found in a word, that is, either a Number (Nd, Nl, No) or a rune with the Alphabetic property.
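
For comparison, a similar predicate can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in, not the package's implementation: `isWordApprox` is a hypothetical name, and it rebuilds the Alphabetic property from its documented derivation (Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic) using the tables shipped with Go's `unicode` package, which may target a different Unicode version.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordApprox (hypothetical) approximates the documented behaviour of
// IsWord with standard library tables only.
func isWordApprox(r rune) bool {
	// Number covers the general categories Nd, Nl and No.
	if unicode.IsNumber(r) {
		return true
	}
	// Letters plus the "Other_*" contributory properties approximate
	// the derived Alphabetic property.
	return unicode.IsLetter(r) ||
		unicode.In(r, unicode.Other_Alphabetic, unicode.Other_Lowercase, unicode.Other_Uppercase)
}

func main() {
	for _, r := range "a7中 !" {
		fmt.Printf("%q %v\n", r, isWordApprox(r))
	}
}
```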

func LookupCombiningClass

func LookupCombiningClass(ch rune) uint8

LookupCombiningClass returns the class used for the Canonical Ordering Algorithm in the Unicode Standard, defaulting to 0.

From http://www.unicode.org/reports/tr44/#Canonical_Combining_Class: "This property could be considered either an enumerated property or a numeric property: the principal use of the property is in terms of the numeric values. For the property value names associated with different numeric values, see DerivedCombiningClass.txt and Canonical Combining Class Values."
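
To show what the numeric values are for, here is a minimal sketch of the reordering step of the Canonical Ordering Algorithm: each run of runes with a non-zero class is stably sorted by class. The function `combiningClass` is a hypothetical stand-in for LookupCombiningClass that knows only three marks, and `canonicalOrder` is likewise an illustrative name.

```go
package main

import (
	"fmt"
	"sort"
)

// combiningClass is a hypothetical stand-in for LookupCombiningClass.
// It covers three marks only and defaults to 0 like the real lookup.
func combiningClass(r rune) uint8 {
	switch r {
	case 0x0301: // COMBINING ACUTE ACCENT (Above)
		return 230
	case 0x0323: // COMBINING DOT BELOW (Below)
		return 220
	case 0x0327: // COMBINING CEDILLA (Attached_Below)
		return 202
	}
	return 0
}

// canonicalOrder stably sorts every maximal run of runes with a
// non-zero combining class, leaving starters (class 0) in place.
func canonicalOrder(rs []rune) {
	for i := 0; i < len(rs); {
		if combiningClass(rs[i]) == 0 {
			i++
			continue
		}
		j := i
		for j < len(rs) && combiningClass(rs[j]) != 0 {
			j++
		}
		run := rs[i:j]
		sort.SliceStable(run, func(a, b int) bool {
			return combiningClass(run[a]) < combiningClass(run[b])
		})
		i = j
	}
}

func main() {
	rs := []rune{'a', 0x0301, 0x0323, 'b'}
	canonicalOrder(rs)
	fmt.Printf("%U\n", rs)
}
```

The sort must be stable: marks that share a class interact typographically, so their relative order carries meaning and has to be preserved.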

func LookupMirrorChar

func LookupMirrorChar(ch rune) rune

LookupMirrorChar finds the mirrored equivalent of a character as defined in the file BidiMirroring.txt of the Unicode Character Database available at http://www.unicode.org/Public/UNIDATA/BidiMirroring.txt.

If the input character is declared as a mirroring character in the Unicode standard and has a mirrored equivalent, that equivalent is returned. Otherwise, the input character itself is returned.
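
The fallback behaviour can be pictured with a small table-driven sketch. The names `mirrors`, `buildMirrors` and `mirrorChar` are hypothetical, and the table holds only a handful of pairs from BidiMirroring.txt rather than the full data set.

```go
package main

import "fmt"

// mirrors maps each character of a pair to its counterpart.
var mirrors = buildMirrors()

// buildMirrors (hypothetical) fills the table in both directions from
// a few pairs listed in BidiMirroring.txt.
func buildMirrors() map[rune]rune {
	m := make(map[rune]rune)
	for _, p := range [][2]rune{{'(', ')'}, {'<', '>'}, {'[', ']'}, {'{', '}'}, {'«', '»'}} {
		m[p[0]], m[p[1]] = p[1], p[0]
	}
	return m
}

// mirrorChar (hypothetical) returns the mirrored character when one is
// known, and the input itself otherwise.
func mirrorChar(ch rune) rune {
	if m, ok := mirrors[ch]; ok {
		return m
	}
	return ch
}

func main() {
	for _, r := range "(]a" {
		fmt.Printf("%q -> %q\n", r, mirrorChar(r))
	}
}
```

Returning the input unchanged means a caller can apply the lookup to every rune of a right-to-left run without checking first whether the rune is mirrored.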

Types

type GeneralCategory

type GeneralCategory uint8

GeneralCategory is an enum storing the Unicode General Category of a rune.

const (
	Unassigned GeneralCategory = iota
	Cc                         // Control
	Cf                         // Format
	Co                         // Private_Use
	Cs                         // Surrogate
	Ll                         // Lowercase_Letter
	Lm                         // Modifier_Letter
	Lo                         // Other_Letter
	Lt                         // Titlecase_Letter
	Lu                         // Uppercase_Letter
	Mc                         // Spacing_Mark
	Me                         // Enclosing_Mark
	Mn                         // Nonspacing_Mark
	Nd                         // Decimal_Number
	Nl                         // Letter_Number
	No                         // Other_Number
	Pc                         // Connector_Punctuation
	Pd                         // Dash_Punctuation
	Pe                         // Close_Punctuation
	Pf                         // Final_Punctuation
	Pi                         // Initial_Punctuation
	Po                         // Other_Punctuation
	Ps                         // Open_Punctuation
	Sc                         // Currency_Symbol
	Sk                         // Modifier_Symbol
	Sm                         // Math_Symbol
	So                         // Other_Symbol
	Zl                         // Line_Separator
	Zp                         // Paragraph_Separator
	Zs                         // Space_Separator
)

func LookupGeneralCategory

func LookupGeneralCategory(r rune) GeneralCategory

LookupGeneralCategory returns the Unicode general category of the rune, or Unassigned if not found.

func (GeneralCategory) IsLetter

func (gc GeneralCategory) IsLetter() bool

IsLetter returns true for Lowercase_Letter, Modifier_Letter, Other_Letter, Titlecase_Letter and Uppercase_Letter.

func (GeneralCategory) IsMark

func (gc GeneralCategory) IsMark() bool

IsMark returns true for Spacing_Mark, Enclosing_Mark and Nonspacing_Mark.
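
Because the constants are declared in alphabetical order with `iota`, the letter categories (Ll to Lu) and the mark categories (Mc to Mn) each occupy a contiguous range. The sketch below shows how such predicates could be written as range checks. Whether the package does it this way is an assumption; the type `generalCategory` and its methods are local, hypothetical copies.

```go
package main

import "fmt"

type generalCategory uint8

// Same order as the exported constants, truncated after Nd for brevity.
const (
	unassigned generalCategory = iota
	cc
	cf
	co
	cs
	ll
	lm
	lo
	lt
	lu
	mc
	me
	mn
	nd
)

// isLetter reports whether gc lies in the contiguous range Ll..Lu.
func (gc generalCategory) isLetter() bool { return gc >= ll && gc <= lu }

// isMark reports whether gc lies in the contiguous range Mc..Mn.
func (gc generalCategory) isMark() bool { return gc >= mc && gc <= mn }

func main() {
	fmt.Println(lu.isLetter(), mn.isLetter(), mn.isMark(), nd.isMark())
}
```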

type GraphemeBreak

type GraphemeBreak uint16

GraphemeBreak is a flag storing the Unicode Grapheme Cluster Break Property.

See https://unicode.org/reports/tr29/#Grapheme_Cluster_Break_Property_Values

const (
	GB_CR                 GraphemeBreak = 1 << 0
	GB_Control            GraphemeBreak = 1 << 1
	GB_Extend             GraphemeBreak = 1 << 2
	GB_L                  GraphemeBreak = 1 << 3
	GB_LF                 GraphemeBreak = 1 << 4
	GB_LV                 GraphemeBreak = 1 << 5
	GB_LVT                GraphemeBreak = 1 << 6
	GB_Prepend            GraphemeBreak = 1 << 7
	GB_Regional_Indicator GraphemeBreak = 1 << 8
	GB_SpacingMark        GraphemeBreak = 1 << 9
	GB_T                  GraphemeBreak = 1 << 10
	GB_V                  GraphemeBreak = 1 << 11
	GB_ZWJ                GraphemeBreak = 1 << 12
)

func LookupGraphemeBreak

func LookupGraphemeBreak(ch rune) GraphemeBreak

LookupGraphemeBreak returns the grapheme break property for the rune, or 0.
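
Storing each property value as a distinct bit lets a caller test membership in a set of values with a single AND. As a sketch of that idea, the code below applies rules GB3 to GB5 of UAX #29 to a pair of adjacent properties. The type, the constants and `breakGB3to5` are local, hypothetical names that mirror the bit positions of the GB_* constants; this is not the package's segmenter.

```go
package main

import "fmt"

type graphemeBreak uint16

// Local copies of a few flags, with the same bit positions as GB_*.
const (
	gbCR      graphemeBreak = 1 << 0
	gbControl graphemeBreak = 1 << 1
	gbExtend  graphemeBreak = 1 << 2
	gbL       graphemeBreak = 1 << 3
	gbLF      graphemeBreak = 1 << 4
)

// breakGB3to5 (hypothetical) applies rules GB3, GB4 and GB5.
// decided is false when none of these rules applies to the pair.
func breakGB3to5(prev, next graphemeBreak) (allowed, decided bool) {
	const controls = gbCR | gbLF | gbControl
	switch {
	case prev == gbCR && next == gbLF: // GB3: no break inside CR LF
		return false, true
	case prev&controls != 0: // GB4: break after controls
		return true, true
	case next&controls != 0: // GB5: break before controls
		return true, true
	}
	return false, false
}

func main() {
	fmt.Println(breakGB3to5(gbCR, gbLF))
	fmt.Println(breakGB3to5(gbLF, gbL))
	fmt.Println(breakGB3to5(gbL, gbExtend))
}
```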

type IndicConjunctBreak

type IndicConjunctBreak uint8

IndicConjunctBreak is a flag storing the Indic_Conjunct_Break property used for UAX29, rule GB9c.

const (
	ICBLinker IndicConjunctBreak = 1 << iota
	ICBConsonant
	ICBExtend
)

func LookupIndicConjunctBreak

func LookupIndicConjunctBreak(r rune) IndicConjunctBreak

LookupIndicConjunctBreak returns the value of the Indic_Conjunct_Break property, or zero.
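
Rule GB9c forbids a break before a consonant when it is preceded by another consonant followed by any mix of Extend and Linker characters containing at least one Linker. The following sketch checks that condition on a slice of already looked-up property values. The type `icb`, its constants and `noBreakGB9c` are hypothetical local names, and the code is an illustration of the rule rather than the package's implementation.

```go
package main

import "fmt"

type icb uint8

// Same layout as the exported constants: one bit per value.
const (
	icbLinker icb = 1 << iota
	icbConsonant
	icbExtend
)

// noBreakGB9c (hypothetical) reports whether rule GB9c forbids a break
// just before props[i].
func noBreakGB9c(props []icb, i int) bool {
	if i <= 0 || i >= len(props) || props[i] != icbConsonant {
		return false
	}
	sawLinker := false
	j := i - 1
	// Walk back over the Extend/Linker run, noting any Linker.
	for ; j >= 0 && props[j]&(icbExtend|icbLinker) != 0; j-- {
		if props[j] == icbLinker {
			sawLinker = true
		}
	}
	// The run must contain a Linker and be preceded by a consonant.
	return sawLinker && j >= 0 && props[j] == icbConsonant
}

func main() {
	conjunct := []icb{icbConsonant, icbLinker, icbConsonant}
	plain := []icb{icbConsonant, icbExtend, icbConsonant}
	fmt.Println(noBreakGB9c(conjunct, 2), noBreakGB9c(plain, 2))
}
```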

type LineBreak

type LineBreak uint64

LineBreak is a flag storing the Unicode Line Break Property.

See https://unicode.org/reports/tr14/#Properties

const (
	LB_AI  LineBreak = 1 << 0  // Ambiguous
	LB_AK  LineBreak = 1 << 1  // Aksara
	LB_AL  LineBreak = 1 << 2  // Alphabetic
	LB_AP  LineBreak = 1 << 3  // Aksara_Prebase
	LB_AS  LineBreak = 1 << 4  // Aksara_Start
	LB_B2  LineBreak = 1 << 5  // Break_Both
	LB_BA  LineBreak = 1 << 6  // Break_After
	LB_BB  LineBreak = 1 << 7  // Break_Before
	LB_BK  LineBreak = 1 << 8  // Mandatory_Break
	LB_CB  LineBreak = 1 << 9  // Contingent_Break
	LB_CJ  LineBreak = 1 << 10 // Conditional_Japanese_Starter
	LB_CL  LineBreak = 1 << 11 // Close_Punctuation
	LB_CM  LineBreak = 1 << 12 // Combining_Mark
	LB_CP  LineBreak = 1 << 13 // Close_Parenthesis
	LB_CR  LineBreak = 1 << 14 // Carriage_Return
	LB_EB  LineBreak = 1 << 15 // E_Base
	LB_EM  LineBreak = 1 << 16 // E_Modifier
	LB_EX  LineBreak = 1 << 17 // Exclamation
	LB_GL  LineBreak = 1 << 18 // Glue
	LB_H2  LineBreak = 1 << 19 // H2
	LB_H3  LineBreak = 1 << 20 // H3
	LB_HH  LineBreak = 1 << 21 // Unambiguous_Hyphen
	LB_HL  LineBreak = 1 << 22 // Hebrew_Letter
	LB_HY  LineBreak = 1 << 23 // Hyphen
	LB_ID  LineBreak = 1 << 24 // Ideographic
	LB_IN  LineBreak = 1 << 25 // Inseparable
	LB_IS  LineBreak = 1 << 26 // Infix_Numeric
	LB_JL  LineBreak = 1 << 27 // JL
	LB_JT  LineBreak = 1 << 28 // JT
	LB_JV  LineBreak = 1 << 29 // JV
	LB_LF  LineBreak = 1 << 30 // Line_Feed
	LB_NL  LineBreak = 1 << 31 // Next_Line
	LB_NS  LineBreak = 1 << 32 // Nonstarter
	LB_NU  LineBreak = 1 << 33 // Numeric
	LB_OP  LineBreak = 1 << 34 // Open_Punctuation
	LB_PO  LineBreak = 1 << 35 // Postfix_Numeric
	LB_PR  LineBreak = 1 << 36 // Prefix_Numeric
	LB_QU  LineBreak = 1 << 37 // Quotation
	LB_RI  LineBreak = 1 << 38 // Regional_Indicator
	LB_SA  LineBreak = 1 << 39 // Complex_Context
	LB_SG  LineBreak = 1 << 40 // Surrogate
	LB_SP  LineBreak = 1 << 41 // Space
	LB_SY  LineBreak = 1 << 42 // Break_Symbols
	LB_VF  LineBreak = 1 << 43 // Virama_Final
	LB_VI  LineBreak = 1 << 44 // Virama
	LB_WJ  LineBreak = 1 << 45 // Word_Joiner
	LB_ZW  LineBreak = 1 << 46 // ZWSpace
	LB_ZWJ LineBreak = 1 << 47 // ZWJ
)

func LookupLineBreak

func LookupLineBreak(ch rune) LineBreak

LookupLineBreak returns the line break class for the rune, or 0 (the XX, Unknown class).
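
A zero result stands for the XX class, which UAX #14 rule LB1 resolves to AL by default, together with AI and SG. The sketch below applies that default resolution to a looked-up value. The type `lineBreak`, its constants and `resolveLB1` are hypothetical local names that reuse the bit positions of the LB_* constants, and the `isMark` parameter is assumed to come from a separate general category lookup.

```go
package main

import "fmt"

type lineBreak uint64

// Local copies of a few flags, with the same bit positions as LB_*.
const (
	lbAI lineBreak = 1 << 0
	lbAL lineBreak = 1 << 2
	lbCJ lineBreak = 1 << 10
	lbCM lineBreak = 1 << 12
	lbNS lineBreak = 1 << 32
	lbSA lineBreak = 1 << 39
	lbSG lineBreak = 1 << 40
)

// resolveLB1 (hypothetical) applies the default resolution of rule LB1.
// isMark must be true when the rune's general category is Mn or Mc.
func resolveLB1(lb lineBreak, isMark bool) lineBreak {
	switch {
	case lb == 0 || lb&(lbAI|lbSG) != 0: // XX, AI, SG -> AL
		return lbAL
	case lb == lbSA: // SA -> CM for marks, AL otherwise
		if isMark {
			return lbCM
		}
		return lbAL
	case lb == lbCJ: // CJ -> NS
		return lbNS
	}
	return lb
}

func main() {
	fmt.Println(resolveLB1(0, false) == lbAL)
	fmt.Println(resolveLB1(lbSA, true) == lbCM)
	fmt.Println(resolveLB1(lbCJ, false) == lbNS)
}
```

The 48 flags do not fit in 32 bits, which is why the type is a `uint64`: values such as NS sit above bit 31.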

type ScriptVerticalOrientation

type ScriptVerticalOrientation struct {
	// contains filtered or unexported fields
}

ScriptVerticalOrientation provides the glyph orientation to use for vertical text.

func LookupVerticalOrientation

func LookupVerticalOrientation(s language.Script) ScriptVerticalOrientation

LookupVerticalOrientation returns the preferred orientation for the given script.

func (ScriptVerticalOrientation) Orientation

func (sv ScriptVerticalOrientation) Orientation(r rune) (isSideways bool)

Orientation returns the preferred orientation for the given rune. If the rune does not belong to this script, the default orientation of this script is returned (regardless of the actual script of the given rune).

type WordBreak

type WordBreak uint16

WordBreak is a flag storing the Unicode Word Break Property.

See https://unicode.org/reports/tr29/#Table_Word_Break_Property_Values

const (
	WB_ALetter            WordBreak = 1 << 0
	WB_Double_Quote       WordBreak = 1 << 1
	WB_ExtendFormat       WordBreak = 1 << 2
	WB_ExtendNumLet       WordBreak = 1 << 3
	WB_Hebrew_Letter      WordBreak = 1 << 4
	WB_Katakana           WordBreak = 1 << 5
	WB_MidLetter          WordBreak = 1 << 6
	WB_MidNum             WordBreak = 1 << 7
	WB_MidNumLet          WordBreak = 1 << 8
	WB_NewlineCRLF        WordBreak = 1 << 9
	WB_Numeric            WordBreak = 1 << 10
	WB_Regional_Indicator WordBreak = 1 << 11
	WB_Single_Quote       WordBreak = 1 << 12
	WB_WSegSpace          WordBreak = 1 << 13
)

func LookupWordBreak

func LookupWordBreak(ch rune) WordBreak

LookupWordBreak returns the word break property for the rune, or 0.
