gtoken

Version: v0.0.0-...-4003614 (latest) | Published: Feb 2, 2025 | License: MIT | Imports: 12 | Imported by: 3

Repository

github.com/fbaube/gtoken

README

gtoken

Generic markup tokens for mixed content (such as LwDITA)

Documentation

Overview

Package gtoken provides generic markup tokens for mixed content (such as LwDITA).

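Index

func DumpTo(rGTkns []*GToken, w io.Writer)
func HasDoctype(GTs []*GToken) (bool, string)
type GToken
    func DeleteNils(inGTzn []*GToken) (outGTzn []*GToken)
    func DoGTokens_html(pCPR *PU.ParserResults_html) ([]*GToken, error)
    func DoGTokens_mkdn(pCPR *PU.ParserResults_mkdn) ([]*GToken, error)
    func DoGTokens_xml(pCPR *XU.ParserResults_xml) ([]*GToken, error)
    func GetAllByTag(gTkzn []*GToken, s string) []*GToken
    func GetFirstByTag(gTkzn []*GToken, s string) *GToken
    func (T GToken) DumpTo(w io.Writer)
    func (T GToken) Echo() string
    func (T GToken) EchoTo(w io.Writer)
    func (p *GToken) SourceTokenType() string
    func (T GToken) String() string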

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions

func DumpTo

func DumpTo(rGTkns []*GToken, w io.Writer)

DumpTo writes the `GToken`s to the `io.Writer` one per line, prefixing each line with the token type. The output should parse the same as the input file, except perhaps in its treatment of all-whitespace CDATA.

func HasDoctype

func HasDoctype(GTs []*GToken) (bool, string)

Types

type GToken

type GToken struct {
	// ==========================================
	// CToken has all the info about the original
	// source token, when considered in isolation.
	// ==========================================
	// Fields:
	//  - CT.SourceToken interface{}: "source code" token
	//  - SU.MarkupType: one of SU.MU_type_(XML/HTML/MKDN/BIN)
	//  - CT.FilePosition: char position, and line nr & column nr
	//  - CT.TDType: type of [xml.Token] or subtype of [xml.Directive]
	//  - CT.CName: alias of [xml.Name], only for elements
	//  - CT.CAtts: alias of slice of [xml.Attr], only for start-elm
	//  - Text string: CDATA / PI Instr / DOCTYPE root elm decl
	//  - ControlStrings []string: XML PI Target / XML Drctv subtype
	CT.CToken

	// Depth is the level of nesting of the source tag.
	Depth int
	// IsBlock and IsInline are
	// dupes of TagalogEntry ?
	IsBlock, IsInline bool
	NodeLevel         int
	// Key stuff
	*lwdx.TagalogEntry
	// DitaTag and HtmlTag are
	// dupes of TagalogEntry ?
	NodeKind, DitaTag, HtmlTag, NodeText string
}

GToken is meant to simplify & unify tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown. It also serves to represent all the various kinds of XML Directives, including DTDs(!).

To do this, the tokens produced by each parsing API are reduced to their essentials:

- tag/token type (defined by the enumeration [GTagTokType], named TT_type_*, values are strings)
- tag name (present only for markup elements; stored in a [GName], including the namespace)
- token text (non-tag text content)
- tag attributes
- whatever additional information is available for Markdown tokens (including Pandoc-style attributes)

NOTE that XML Directives are later "normalized", but that's another story.

func DeleteNils

func DeleteNils(inGTzn []*GToken) (outGTzn []*GToken)

func DoGTokens_html

func DoGTokens_html(pCPR *PU.ParserResults_html) ([]*GToken, error)

DoGTokens_html turns every html.Node (from golang.org/x/net/html) into a GToken. It's pretty simple because no tree building is done yet. Basically it just copies in the Node type and the Node's data, and sets the [TTType] field. For reference, html.Node is defined as:

type Node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *Node
	Type      NodeType
	DataAtom  atom.Atom
	Data      string
	Namespace string
	Attr      []Attribute
}

Data is unescaped, so that it looks like "a<b" rather than "a&lt;b". For element nodes, DataAtom is the atom for Data, or zero if Data is not a known tag name.

type Attribute struct {
	Namespace, Key, Val string
}

func DoGTokens_mkdn

func DoGTokens_mkdn(pCPR *PU.ParserResults_mkdn) ([]*GToken, error)

DoGTokens_mkdn turns every Goldmark ast.Node Markdown token into a GToken. It's pretty simple, because no tree building is done yet. However, it does merge text tokens into their preceding tokens, which leaves some nils in the list of tokens.
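The merge step described above can be sketched as follows, assuming a hypothetical `node` type whose text tokens have `Kind == "Text"`. Each text token's text is appended to the nearest preceding non-nil token and its own slot is set to nil; a second pass then drops the nils. That second pass is what the name `DeleteNils` suggests, but pairing the two this way is an assumption, and none of this is the package's actual code.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a converted Markdown token.
type node struct {
	Kind string // illustrative values: "Heading", "Paragraph", "Text"
	Text string
}

// mergeText appends each "Text" token's text to the nearest preceding
// non-nil token, then sets the text token's own slot to nil. A text token
// with no predecessor is left in place.
func mergeText(toks []*node) {
	for i, t := range toks {
		if t == nil || t.Kind != "Text" {
			continue
		}
		for j := i - 1; j >= 0; j-- {
			if toks[j] != nil {
				toks[j].Text += t.Text
				toks[i] = nil
				break
			}
		}
	}
}

// deleteNils returns a new slice holding only the non-nil tokens.
func deleteNils(in []*node) []*node {
	out := make([]*node, 0, len(in))
	for _, t := range in {
		if t != nil {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	toks := []*node{
		{"Heading", ""}, {"Text", "Intro"},
		{"Paragraph", ""}, {"Text", "Hello, "}, {"Text", "world"},
	}
	mergeText(toks)
	for _, t := range deleteNils(toks) {
		fmt.Printf("%s %q\n", t.Kind, t.Text)
	}
}
```

Setting slots to nil instead of deleting them during the merge keeps indices stable while the loop runs; compaction is deferred to one linear pass afterwards.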

func DoGTokens_xml

func DoGTokens_xml(pCPR *XU.ParserResults_xml) ([]*GToken, error)

DoGTokens_xml turns every xml.Token (from the standard library's encoding/xml) into a GToken. It's pretty simple because no tree building is done yet. Basically it just copies in the token's type and data, and sets the [TDType] field.

xml.Token is an interface (declared as `type Token any`) holding one of the token types StartElement, EndElement, CharData, Comment, ProcInst, or Directive. Note that gtoken.TDType is a superset of these types.

func GetAllByTag

func GetAllByTag(gTkzn []*GToken, s string) []*GToken

GetAllByTag returns a new GTokenization containing every token whose tag matches s. It checks the basic tag only, not any namespace.

func GetFirstByTag

func GetFirstByTag(gTkzn []*GToken, s string) *GToken

GetFirstByTag returns the first token whose tag matches s. Like GetAllByTag, it checks the basic tag only, not any namespace.

func (GToken) DumpTo

func (T GToken) DumpTo(w io.Writer)

DumpTo implements Markupper.

func (GToken) Echo

func (T GToken) Echo() string

Echo implements Markupper.

func (GToken) EchoTo

func (T GToken) EchoTo(w io.Writer)

EchoTo implements Markupper.

func (*GToken) SourceTokenType

func (p *GToken) SourceTokenType() string

SourceTokenType returns `XML`, `MKDN`, `HTML`, or, in future, other values still to be determined.

func (GToken) String

func (T GToken) String() string

String implements Markupper.
