Documentation
Overview ¶
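Package gtoken defines GToken, a token type that simplifies and unifies tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown.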
Index ¶
- func DumpTo(rGTkns []*GToken, w io.Writer)
- func HasDoctype(GTs []*GToken) (bool, string)
- type GToken
- func DeleteNils(inGTzn []*GToken) (outGTzn []*GToken)
- func DoGTokens_html(pCPR *PU.ParserResults_html) ([]*GToken, error)
- func DoGTokens_mkdn(pCPR *PU.ParserResults_mkdn) ([]*GToken, error)
- func DoGTokens_xml(pCPR *XU.ParserResults_xml) ([]*GToken, error)
- func GetAllByTag(gTkzn []*GToken, s string) []*GToken
- func GetFirstByTag(gTkzn []*GToken, s string) *GToken
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DumpTo ¶
DumpTo writes out the `GToken`s to the `io.Writer`, one per line, and each line is prefixed with the token type. The output should parse the same as the input file, except perhaps for the treatment of all-whitespace CDATA.
func HasDoctype ¶
Types ¶
type GToken ¶
type GToken struct {
// ==========================================
// CToken has all the info about the original
// source token, when considered in isolation.
// ==========================================
// Fields:
// - CT.SourceToken interface{}: "source code" token
// - SU.MarkupType: one of SU.MU_type_(XML/HTML/MKDN/BIN)
// - CT.FilePosition: char position, and line nr & column nr
// - CT.TDType: type of [xml.Token] or subtype of [xml.Directive]
// - CT.CName: alias of [xml.Name], only for elements
// - CT.CAtts: alias of slice of [xml.Attr], only for start-elm
// - Text string: CDATA / PI Instr / DOCTYPE root elm decl
// - ControlStrings []string: XML PI Target / XML Drctv subtype
CT.CToken
// Depth is the level of nesting of the source tag.
Depth int
// IsBlock and IsInline are
// dupes of TagalogEntry ?
IsBlock, IsInline bool
NodeLevel int
// Key stuff
*lwdx.TagalogEntry
// DitaTag and HtmlTag are
// dupes of TagalogEntry ?
NodeKind, DitaTag, HtmlTag, NodeText string
}
GToken is meant to simplify & unify tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown. It also serves to represent all the various kinds of XML Directives, including DTDs(!).
To do this, the tokens produced by each parsing API are reduced to their essentials:
- tag/token type (defined by the enumeration [GTagTokType], named TT_type_*, values are strings)
- tag name (iff a markup element; is stored in a [GName], incl. NS)
- token text (non-tag text content)
- tag attributes
- whatever additional stuff is available for Markdown tokens (to include Pandoc-style attributes)
NOTE that XML Directives are later "normalized", but that's another story.
func DeleteNils ¶
func DoGTokens_html ¶
func DoGTokens_html(pCPR *PU.ParserResults_html) ([]*GToken, error)
DoGTokens_html turns every html.Node (from golang.org/x/net/html, which is not part of the standard library) into a GToken. It's pretty simple because no tree building is done yet. Basically it just copies in the Node type and the Node's data, and sets the [TTType] field. For reference, html.Node is declared as:
type Node struct {
Parent, FirstChild, LastChild, PrevSibling, NextSibling *Node
Type NodeType
DataAtom atom.Atom
Data string
Namespace string
Attr []Attribute
}
Data is unescaped, so that it looks like "a<b" rather than "a&lt;b". For element nodes, DataAtom is the atom for Data, or zero if Data is not a known tag name.
type Attribute struct {
Namespace, Key, Val string
}
func DoGTokens_mkdn ¶
func DoGTokens_mkdn(pCPR *PU.ParserResults_mkdn) ([]*GToken, error)
DoGTokens_mkdn turns every Goldmark ast.Node Markdown token into a GToken. It's pretty simple, because no tree building is done yet. However, it does merge text tokens into their preceding tokens, which leaves some nils in the list of tokens.
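A minimal sketch of that merge-then-compact pattern, assuming a simplified token type. `mdTok`, `mergeText`, and `deleteNils` are hypothetical names for illustration; the package's real merge logic may differ in which tokens it treats as text.

```go
package main

import "fmt"

// mdTok is a hypothetical, stripped-down stand-in for GToken.
type mdTok struct {
	Kind string // e.g. "Heading", "Paragraph", "Text"
	Text string
}

// mergeText appends the text of each "Text" token to the nearest
// preceding surviving token and leaves nil in the text token's slot.
func mergeText(toks []*mdTok) {
	last := -1 // index of the most recent surviving token
	for i, t := range toks {
		if t == nil {
			continue
		}
		if t.Kind == "Text" && last >= 0 {
			toks[last].Text += t.Text
			toks[i] = nil
			continue
		}
		last = i
	}
}

// deleteNils returns a new slice holding only the non-nil tokens.
func deleteNils(in []*mdTok) (out []*mdTok) {
	for _, t := range in {
		if t != nil {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	toks := []*mdTok{
		{Kind: "Heading"},
		{Kind: "Text", Text: "Title"},
		{Kind: "Paragraph"},
		{Kind: "Text", Text: "Hello, "},
		{Kind: "Text", Text: "world"},
	}
	mergeText(toks)
	for _, t := range deleteNils(toks) {
		fmt.Printf("%s %q\n", t.Kind, t.Text)
	}
}
```

Leaving nils in place during the merge keeps indices stable while iterating; compaction is then a separate single pass.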
func DoGTokens_xml ¶
func DoGTokens_xml(pCPR *XU.ParserResults_xml) ([]*GToken, error)
DoGTokens_xml turns every xml.Token (from stdlib) into a GToken. It's pretty simple because no tree building is done yet. Basically it just copies in the token type and the token's data, and sets the [TDType] field.
xml.Token is an "any" interface holding one of the token types: StartElement, EndElement, CharData, Comment, ProcInst, Directive. Note that gtoken.TDType is a superset of these types.
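The reduction of xml.Token values to a uniform token can be sketched with a type switch over the six token types listed above. The `encoding/xml` calls are the standard library's; `simpleTok` and `tokenize` are hypothetical stand-ins for GToken and DoGTokens_xml, not the package's actual implementation.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// simpleTok is a hypothetical, stripped-down stand-in for GToken.
type simpleTok struct {
	Kind string // name of the xml.Token type
	Name string // local tag name, or PI target
	Text string // character data, comment, PI instruction, or directive
}

// tokenize reads every xml.Token from src and reduces it to a simpleTok.
// No tree building is done; tokens are returned in document order.
func tokenize(src string) ([]simpleTok, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out []simpleTok
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		// Converting to string copies the bytes, which the decoder
		// may otherwise reuse on the next call to Token.
		switch t := tok.(type) {
		case xml.StartElement:
			out = append(out, simpleTok{Kind: "StartElement", Name: t.Name.Local})
		case xml.EndElement:
			out = append(out, simpleTok{Kind: "EndElement", Name: t.Name.Local})
		case xml.CharData:
			out = append(out, simpleTok{Kind: "CharData", Text: string(t)})
		case xml.Comment:
			out = append(out, simpleTok{Kind: "Comment", Text: string(t)})
		case xml.ProcInst:
			out = append(out, simpleTok{Kind: "ProcInst", Name: t.Target, Text: string(t.Inst)})
		case xml.Directive:
			out = append(out, simpleTok{Kind: "Directive", Text: string(t)})
		}
	}
}

func main() {
	toks, err := tokenize(`<!DOCTYPE topic><topic id="t1">Hello</topic>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range toks {
		fmt.Printf("%-12s name=%q text=%q\n", t.Kind, t.Name, t.Text)
	}
}
```

A real implementation would also carry attributes, namespaces, and file position, which this sketch omits for brevity.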
func GetAllByTag ¶
GetAllByTag returns a new GTokenization (a slice of GTokens) containing the tokens whose tag matches s. It checks the basic tag only, not any namespace.
func GetFirstByTag ¶
GetFirstByTag checks the basic tag only, not any namespace.
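Matching on "the basic tag only" can be read as comparing the local part of the name and ignoring the namespace. The sketch below illustrates that reading; `tagName`, `tagTok`, `getAllByTag`, and `getFirstByTag` are hypothetical stand-ins, and the nil-skipping and nil return value are assumptions rather than documented behaviour.

```go
package main

import "fmt"

// tagName mirrors the shape of xml.Name: a namespace plus a local name.
type tagName struct {
	Space, Local string
}

// tagTok is a hypothetical, stripped-down stand-in for GToken.
type tagTok struct {
	Name tagName
}

// getAllByTag returns a new slice of every token whose local name equals
// tag. The namespace is ignored, and nil entries are skipped.
func getAllByTag(toks []*tagTok, tag string) []*tagTok {
	var out []*tagTok
	for _, t := range toks {
		if t != nil && t.Name.Local == tag {
			out = append(out, t)
		}
	}
	return out
}

// getFirstByTag returns the first token whose local name equals tag,
// or nil if there is none (an assumption for this sketch).
func getFirstByTag(toks []*tagTok, tag string) *tagTok {
	for _, t := range toks {
		if t != nil && t.Name.Local == tag {
			return t
		}
	}
	return nil
}

func main() {
	toks := []*tagTok{
		{Name: tagName{Local: "topic"}},
		{Name: tagName{Space: "svg", Local: "title"}},
		nil,
		{Name: tagName{Local: "title"}},
	}
	fmt.Println(len(getAllByTag(toks, "title")))
	fmt.Println(getFirstByTag(toks, "title").Name.Space)
	fmt.Println(getFirstByTag(toks, "body") == nil)
}
```

Skipping nil entries makes the lookups safe to run on a token list that has not yet been compacted with DeleteNils.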
func (*GToken) SourceTokenType ¶
SourceTokenType returns `XML`, `MKDN`, `HTML`, or future stuff TBD.