Documentation
¶
Overview ¶
Package cnty defines a per-file structure Contentity that holds all relevant per-file information. This includes:
- file path info
- file content (UTF-8, of course)
- file type information (MIME and more)
- the results of markup-specific file analysis (in the most analysable case, i.e. XML, this comprises tokens, gtokens, gelms, gtree)
For a discussion of tree walk functions, see `doc_wfn.go`
Note that if we do not get an explicit XML DOCTYPE declaration, there is some educated guesswork required.
The first workflow was based on XML, and comprises: `text => XML tokens => GTokens => GTags => GTree`
First, package `gparse` gets as far as the `GToken`s, which can only be in a list: they have no tree structure. Then package `gtree` handles the rest.
XML analysis starts off with tokenization (by the stdlib), so it makes sense to then have separate steps for making `GToken`s, `GTag`s, and the `GTree`.
MKDN and HTML analyses use higher-level libraries that deliver CSTs (Concrete Syntax Trees, i.e. parse trees). We choose to do this processing in `package gparse` rather than in `package gtree`.
MKDN gets a tree of `yuin/goldmark/ast/Node`, and HTML gets a tree of `golang.org/x/net/html/Node`. Since a CST is delivered fully-formed, it makes sense to have Step 1 that attaches to each node its `GToken` and `GTag`, and then Step 2 that builds a `GTree`.
There are three major types of `Contentity`, corresponding to how we process the file content:
- "XML"
  - (§1) Use stdlib `encoding/xml` to get `[]XU.XToken`
  - (§1) Convert `[]XU.XToken` to `[]gparse.GToken`
  - (§2) Build `GTree`
- "MKDN"
  - (§1) Use `yuin/goldmark` to get tree of `yuin/goldmark/ast/Node`
  - (§1) From each Node make a `MkdnToken` (in a list?) incl. `GToken` and `GTag`
  - (§2) Build `GTree`
- "HTML"
  - (§1) Use `golang.org/x/net/html` to get a tree of `html.Node`
  - (§1) From each Node make a `HtmlToken` (in a list?) incl. `GToken` and `GTag`
  - (§2) Build `GTree`
In general, all Go files in this protocol stack should be organised as:
- struct definition
- constructors (named `New*`)
- printf stuff (Raw(), Echo(), String())
Some characteristic methods:
- Raw() returns the original string passed from the Go XML parser (with whitespace trimmed)
- Echo() returns a string of the item in normalised form, although be aware that the presence of terminating newlines is not treated uniformly
- String() returns a string suitable for runtime monitoring and debugging
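The three methods can be illustrated with a small sketch. The type `item` and its normalisation rule (collapsing whitespace runs) are assumptions made for illustration; they are not types or rules taken from this package.

```go
package main

import (
	"fmt"
	"strings"
)

// item is a hypothetical token-like type; it is not a type from this package.
type item struct {
	src string // the text exactly as received from the parser
}

// Raw returns the original text with surrounding whitespace trimmed.
func (it item) Raw() string { return strings.TrimSpace(it.src) }

// Echo returns a normalised form: here, runs of whitespace collapse
// to a single space (an assumed rule, chosen for the example).
func (it item) Echo() string { return strings.Join(strings.Fields(it.src), " ") }

// String returns a form meant for logs and debugging, not for output files.
func (it item) String() string { return fmt.Sprintf("item{raw:%q}", it.Raw()) }

func main() {
	it := item{src: "  some   text \n"}
	fmt.Println(it.Raw())
	fmt.Println(it.Echo())
	fmt.Println(it) // fmt picks up String() via fmt.Stringer
}
```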
NOTE The use of shorthand in variable names: Doc, Elm, Att.
NOTE We use `godoc2md`, so we can use Markdown in these code comments.
Index ¶
- Variables
- func AddInXName(ElmT StringTally, AttT StringTally, gT *gtoken.GToken)
- func DumpGElm(p AST.Node) string
- func KidsAsSlice(p AST.Node) []AST.Node
- func ListKids(p AST.Node) string
- func NormalizeTextLeaves(rootNode AST.Node)
- type Contentity
- func (p *Contentity) DoBlockList() *Contentity
- func (p *Contentity) DoEntitiesList() error
- func (p *Contentity) DoGLinks() *Contentity
- func (p *Contentity) DoTableOfContents() *Contentity
- func (p *Contentity) DoValidation(pXCF *XU.XmlCatalogFile) (dtdS string, docS string, errS string)
- func (p *Contentity) ExecuteStages() *Contentity
- func (p *Contentity) GatherLinks() error
- func (p *Contentity) GatherXmlGLinks() *Contentity
- func (p *Contentity) L(level LL, format string, a ...interface{})
- func (p *Contentity) LogPrefix(mid string) string
- func (p *Contentity) NewEntitiesList() (gEnts map[string]*gparse.GEnt, err error)
- func (p *Contentity) ProcessEntities_() error
- func (p *Contentity) RefineDirectives() error
- func (p Contentity) String() string
- func (p *Contentity) SubstituteEntities() error
- func (p *Contentity) TallyTags()
- func (p *Contentity) WrapError(s string, e error)
- type ContentityError
- type ContentityFS
- func (p *ContentityFS) AsSlice() []*Contentity
- func (p *ContentityFS) DirCount() int
- func (p *ContentityFS) DoForEvery(stgprocsr ContentityStage)
- func (p *ContentityFS) FileCount() int
- func (p *ContentityFS) ItemCount() int
- func (p *ContentityFS) RootAbsPath() string
- func (p *ContentityFS) RootContentity() *Contentity
- func (p *ContentityFS) Size() int
- type ContentityFactory
- type ContentityStage
- type Flags
- type GLink
- type GLinks
- type LL
- type LinkInfo
- type LinkInfos
- type LogInfo
- type StringTally
Constants ¶
This section is empty.
Variables ¶
var GlobalAttCount int
var GlobalTagCount int
var LwDitaAttsForGLinks = []string{
"name",
"href",
"id",
"idref",
"idrefs",
"conref",
"data-conref",
"keys",
"data-keys",
"keyref",
"data-keyref",
}
Functions ¶
func AddInXName ¶
func AddInXName(ElmT StringTally, AttT StringTally, gT *gtoken.GToken)
func NormalizeTextLeaves ¶
func NormalizeTextLeaves(rootNode AST.Node)
Types ¶
type Contentity ¶
type Contentity struct {
// Nork provides hierarchical structure, only.
// It is embedded as an instance, not as a pointer.
N.Nork
// =========================================
// Substructures for: Data persisted to DB
// =========================================
// ContentityRow includes all fields that get persisted
// to the DB. It contains the field Raw (deeply embedded),
// and embeds [FSObject], which embeds [Errer].
m5db.ContentityRow
// LogInfo is (the index of the Contentity in
// the larger slice) + (the processing stage ID)
LogInfo
// ParserResults is parseutils.ParserResults_ffs
// (ffs = file-format-specific = "html" or "mkdn", but not
// "xml" because Go's XML parser does not produce a tree structure)
ParserResults interface{}
GTokens []*gtoken.GToken
GTags []*gtree.GTag
*gtree.GTree // maybe not need GRootTag or RootOfASTptr
GTknsWriter, GTreeWriter,
GEchoWriter io.Writer
GLinks
// GEnts is "ENTITY" directives (both with "%" and without).
GEnts map[string]*gparse.GEnt
TagTally StringTally
AttTally StringTally
// contains filtered or unexported fields
}
Contentity embeds [Nork], and embeds [ContentityRow] (which embeds [FSObject], which embeds [Errer]), but Contentity does NOT embed [FSONork].
func NewContentity ¶
func NewContentity(aPath string) *Contentity
NewContentity returns a Contentity-nork (i.e. a [Nork] node with content and an embedded [FSObject]) that can NOT be the root of a Contentity tree. For errors, see the embedded [Errer].
FIXME: Should we have Tree and Lone versions and Factory ?
It should accept either an absolute or a relative FP, although relative is preferred for various reasons, mainly because of security-related preferences of the path and filepath stdlibs.
TODO: Maybe it needs two boolean arguments:
- One to say whether to be strict about security (using os.Root and Valid/Local), and
- One to say whether to follow symlinks.
These two flags might have some interesting interactions. Since this func could (but does not) use os.Root, these can be left as calling options, rather than implementing higher security using funcs io/fs.ValidPath and path/filepath.IsLocal.
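For reference, the two stdlib checks named above can be combined as in the sketch below. The helper `isSafeRelPath` and its policy (require both checks to pass) are hypothetical; this is not what NewContentity currently does.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

// isSafeRelPath is a hypothetical helper showing one possible "strict"
// policy: the path must pass both lexical checks.
//   - fs.ValidPath wants an unrooted, slash-separated path with no
//     "." or ".." elements (the root "." itself is allowed).
//   - filepath.IsLocal wants a non-empty, unrooted path that stays
//     inside the directory in which it is evaluated.
//
// Neither check touches the file system, so symlinks are not considered.
func isSafeRelPath(p string) bool {
	return fs.ValidPath(filepath.ToSlash(p)) && filepath.IsLocal(p)
}

func main() {
	for _, p := range []string{"docs/a.xml", "../secret", "/etc/passwd"} {
		fmt.Printf("%-12s safe=%v\n", p, isSafeRelPath(p))
	}
}
```

Because both functions are purely lexical, a caller that also cares about symlinks would still need something like os.Root on top of them.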
We want everything to be in a nice tree of Norks, which means that we have to create Contentities for directories too (where `Raw_type == SU.Raw_type_DIRLIKE`), so we have to handle that case too.
func NewContentityFromString ¶
func NewContentityFromString(inString, filext string) *Contentity
NewContentityFromString returns a Contentity Nord where the Nord (i.e. parent and ordered children) is empty. This func does not require file system access. If there is an error and the returned value is non-nil, then the error is returned in the embedded Errer. However, this func should only be called from a well-defined context, so error reporting can be curt.
func (*Contentity) DoBlockList ¶
func (p *Contentity) DoBlockList() *Contentity
DoBlockList makes a list of all the nodes that are blocks, so that they can be traversed for rendering, and targeted for references.
func (*Contentity) DoEntitiesList ¶
func (p *Contentity) DoEntitiesList() error
DoEntitiesList collects all entity definitions. Note that each Token has been normalized.
- rtType:ENTITY string1:foo string2:"FOO" entityIsParameter:false
- rtType:ENTITY string1:bar string2:"BAR" entityIsParameter:true
func (*Contentity) DoTableOfContents ¶
func (p *Contentity) DoTableOfContents() *Contentity
DoTableOfContents makes a ToC.
func (*Contentity) DoValidation ¶
func (p *Contentity) DoValidation(pXCF *XU.XmlCatalogFile) (dtdS string, docS string, errS string)
DoValidation: TODO If there is no DOCTYPE, make a guess based on Filext, but it can't be fatal.
func (*Contentity) ExecuteStages ¶
func (p *Contentity) ExecuteStages() *Contentity
ExecuteStages processes a Contentity to completion in an isolated thread, and can easily be converted to run as a goroutine. Summary:
- st0_Init()
- st1_Read()
- st2_Tree()
- st3_Refs()
- st4_Done() (not currently called, but will work on all input files at once !)
An interesting question is, how can we indicate an error and terminate a thread prematurely ? The method currently chosen is to use interface github.com/fbaube/miscutils/Errer. This has to be checked for at the start of a func. But then we can chain functions by writing them left-to-right. Winning!
(If functions accept and return a ptr+error pair then they chain right-to-left, which is a big fail for readability.)
We could also pass in a `Context` and use its cancellation capability. Yet another way might be simply to `panic`, and so this function already has code to catch panics.
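The chaining style described above can be sketched as follows. Everything here is a stand-in: `doc` replaces Contentity, the `err` field replaces the embedded Errer (whose real method set is not shown in this documentation), and the stage names are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// doc is a hypothetical stand-in for Contentity: it carries its own
// error, so stages can return just the pointer and still be chained.
type doc struct {
	err   error
	trace []string // which stages actually did work
}

func (p *doc) stRead() *doc {
	if p.err != nil { // checked at the start of every stage
		return p
	}
	p.trace = append(p.trace, "read")
	return p
}

func (p *doc) stTree() *doc {
	if p.err != nil {
		return p
	}
	p.trace = append(p.trace, "tree")
	p.err = errors.New("tree: malformed input") // simulated failure
	return p
}

func (p *doc) stRefs() *doc {
	if p.err != nil {
		return p
	}
	p.trace = append(p.trace, "refs")
	return p
}

// run chains the stages left-to-right and converts a panic into an error.
func run(p *doc) (out *doc) {
	defer func() {
		if r := recover(); r != nil {
			p.err = fmt.Errorf("panic: %v", r)
			out = p
		}
	}()
	return p.stRead().stTree().stRefs()
}

func main() {
	d := run(&doc{})
	fmt.Println(d.trace, d.err)
}
```

Once a stage records an error, every later stage returns immediately, so the chain ends early without any explicit branching at the call site.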
func (*Contentity) GatherLinks ¶
func (p *Contentity) GatherLinks() error
GatherLinks is: @conref to reuse block-level content, @keyref to reuse phrase-level content.
TODO Each type of link (i.e. elm/att where it occurs) has to be categorised.
TODO Each format of link target has to be categorised.
- Cross ref : <xref> : <a href> : [link](/URI "title")
- Key def : <keydef> : <div data-class="keydef"> : <div data-class="keydef"> in HDITA syntax
- Map : <map> : <nav> : See Example of an MDITA map (20)
- Topic ref : <topicref> : <a href> inside a <li> : [link](/URI "title") inside a list item
TODO Stuff to get:
- XDITA map
  - topicref @href (w @format)
  - task @id
- HDITA
  - article @id
  - span @data-keyref
  - p @data-conref
- MDITA
  - has YAML "id"
  - uses <p @data-conref>
  - uses <span @data-keyref>
  - uses MD [link_text](link_target.dita)
  - uses
- XDITA
  - topic @id
  - ph @keyref
  - image @href
  - p @id
  - video/source @value
  - section @id
  - p @conref
func (*Contentity) GatherXmlGLinks ¶
func (p *Contentity) GatherXmlGLinks() *Contentity
GatherXmlGLinks is: XmlItems is (DOCS) IDs & IDREFs, (DTDs) Elm defs (incl. Att defs) & Ent defs *xmlfile.XmlItems // *IDinfo
func (p *MCFile) GatherXmlGLinks() *MCFile {
func (*Contentity) L ¶
func (p *Contentity) L(level LL, format string, a ...interface{})
func (*Contentity) LogPrefix ¶
func (p *Contentity) LogPrefix(mid string) string
func (*Contentity) NewEntitiesList ¶
func (p *Contentity) NewEntitiesList() (gEnts map[string]*gparse.GEnt, err error)
NewEntitiesList collects all entity definitions. Note that each Token is normalized.
- rtType:ENTITY string1:foo string2:"FOO" entityIsParameter:false
- rtType:ENTITY string1:bar string2:"BAR" entityIsParameter:true
Called by ProcessEntities only.
func (*Contentity) ProcessEntities_ ¶
func (p *Contentity) ProcessEntities_() error
func (*Contentity) RefineDirectives ¶
func (p *Contentity) RefineDirectives() error
RefineDirectives scans to patch Directives with correct keyword.
func (Contentity) String ¶
func (p Contentity) String() string
String is developer output. Hafta dump: FU.InputFile, FU.OutputFiles, GTree, GRefs, *XmlFileMeta, *XmlItems, *DitaInfo
func (*Contentity) SubstituteEntities ¶
func (p *Contentity) SubstituteEntities() error
SubstituteEntities does replacement in Entities for simple (single-token) entity references, i.e. that begin with "%" or "&".
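A minimal sketch of single-token substitution, under the assumption that tokens are plain strings and entity definitions live in a name-to-value map. The helper `substituteRefs` and the string representation are hypothetical; the real method works on this package's own token and `GEnt` types.

```go
package main

import "fmt"

// substituteRefs is a hypothetical helper: any token that is exactly
// one reference of the form "&name;" or "%name;" is replaced by the
// value defined for name. Unknown references are left untouched.
func substituteRefs(tokens []string, ents map[string]string) []string {
	out := make([]string, len(tokens))
	for i, t := range tokens {
		out[i] = t
		isRef := len(t) > 2 && (t[0] == '&' || t[0] == '%') && t[len(t)-1] == ';'
		if !isRef {
			continue
		}
		if v, ok := ents[t[1:len(t)-1]]; ok {
			out[i] = v
		}
	}
	return out
}

func main() {
	ents := map[string]string{"foo": "FOO", "bar": "BAR"}
	fmt.Println(substituteRefs([]string{"x", "&foo;", "%bar;", "&unknown;"}, ents))
}
```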
func (*Contentity) TallyTags ¶
func (p *Contentity) TallyTags()
func (*Contentity) WrapError ¶
func (p *Contentity) WrapError(s string, e error)
type ContentityError ¶
type ContentityError struct {
PE fs.PathError
*Contentity
}
ContentityError is Contentity + SrcLoc (in source code) + PathError struct { Op, Path string; Err error }
# Maybe use the format pkg.filename.methodname.Lnn .
func NewContentityError ¶
func NewContentityError(ermsg string, op string, cty *Contentity) ContentityError
func WrapAsContentityError ¶
func WrapAsContentityError(e error, op string, cty *Contentity) ContentityError
func (ContentityError) Error ¶
func (ce ContentityError) Error() string
func (*ContentityError) String ¶
func (ce *ContentityError) String() string
type ContentityFS ¶
type ContentityFS struct {
// FS will be set from func [os.DirFS]
fs.FS
// contains filtered or unexported fields
}
ContentityFS is an instance of an fs.FS where every node is a cnty.Contentity.
Note that directories ARE included in the tree, because the instances of [orderednodes.Nord] in each Contentity must properly interconnect in forming a complete tree.
Note that the file system is stored as a tree AND as a slice AND as a map. If any of these is modified without also modifying the others to match, there WILL be problems. For that reason, [asSlice] and [asMapOfAbsFP] are unexported instance variables that are accessible only via getters.
It ain't bulletproof tho. In any case, users of a ContentityFS should feel free to use the functions on the embedded [Nord] ordered nodes.
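The reasoning behind the unexported fields can be sketched like this. The types `store` and `item` are hypothetical, and returning a copy from the getter is a design choice made for this example; it is not a claim about what the real AsSlice does.

```go
package main

import "fmt"

// item is a hypothetical stand-in for a Contentity.
type item struct{ absFP string }

// store keeps two views of the same items. Both fields are unexported
// so that every modification goes through add, which updates both.
type store struct {
	asSlice      []*item
	asMapOfAbsFP map[string]*item
}

// add inserts it into both views, or reports false for a duplicate path.
func (s *store) add(it *item) bool {
	if s.asMapOfAbsFP == nil {
		s.asMapOfAbsFP = map[string]*item{}
	}
	if _, dup := s.asMapOfAbsFP[it.absFP]; dup {
		return false
	}
	s.asSlice = append(s.asSlice, it)
	s.asMapOfAbsFP[it.absFP] = it
	return true
}

// AsSlice returns a copy, so callers cannot reorder or truncate
// the internal slice behind the map's back.
func (s *store) AsSlice() []*item {
	return append([]*item(nil), s.asSlice...)
}

// Size reports the number of stored items.
func (s *store) Size() int { return len(s.asSlice) }

func main() {
	s := &store{}
	fmt.Println(s.add(&item{absFP: "/root/a.xml"}))
	fmt.Println(s.add(&item{absFP: "/root/a.xml"}))
	fmt.Println(s.Size())
}
```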
func NewContentityFS ¶
func NewContentityFS(aPath string, okayFilexts []string) (*ContentityFS, error)
NewContentityFS is the entrypoint for processing an input directory tree of files and it proceeds as follows:
- initialize
- create an os.Root (see https://pkg.go.dev/os@go1.26.3#Root )
- create a [nork.NorkFactory]
- walk the RootFS, creating Contentities and appending them to a slice
- process the list to identify and make parent/child links
ContentityFS embeds [FSObject] embeds [Errer].
About os.Root and RootFS:
- Root may be used to only access files within a single directory tree.
- Methods on Root can only access files and directories beneath a root directory. If any component of a file name passed to a method of Root references a location outside the root, the method returns an error. File names may reference the directory itself (.).
- Methods on Root will follow symbolic links, but symbolic links may not reference a location outside the root. Symbolic links must not be absolute.
- Methods on Root do not prohibit traversal of filesystem boundaries, Linux bind mounts, /proc special files, or access to Unix device files.
- Methods on Root are OK for use from multiple goroutines simultaneously.
- On most platforms, creating a Root opens a file descriptor or handle referencing the directory. If the directory is moved, methods on Root reference the original directory in its new location.
The path argument should probably be an absolute FP, because a relative FP might cause problems. Note that this is the opposite of the advice for lower-level items.
Note that when we used os.DirFS, it appeared to make no difference whether path
- is relative or absolute
- ends with a trailing slash or not
- is a directory or a symlink to a directory
It probably needs serious testing, especially now with RootFS.
The only error returns for this func are:
- a bad path, rejected by FU func [NewFilepaths]
- the path is not a directory (although it can be a symlink to a directory ?)
- TBD: What happens if os.Root barfs on something ?
TODO Add two flags ? Maybe it needs two boolean arguments:
- One to say whether to be strict about security (using os.Root (YES!!) and Valid/Local), and
- One to say whether to follow symlinks.
These two flags might have some interesting interactions. OTOH since this func can (and does?) use os.Root, it can easily (and should probably) also default to higher security using funcs io/fs.ValidPath and path/filepath.IsLocal.
Accumulated NewContentity errors are counted in the field ContentityFS.nErrors.
func (*ContentityFS) AsSlice ¶
func (p *ContentityFS) AsSlice() []*Contentity
func (*ContentityFS) DirCount ¶
func (p *ContentityFS) DirCount() int
func (*ContentityFS) DoForEvery ¶
func (p *ContentityFS) DoForEvery(stgprocsr ContentityStage)
func (*ContentityFS) FileCount ¶
func (p *ContentityFS) FileCount() int
func (*ContentityFS) ItemCount ¶
func (p *ContentityFS) ItemCount() int
func (*ContentityFS) RootAbsPath ¶
func (p *ContentityFS) RootAbsPath() string
func (*ContentityFS) RootContentity ¶
func (p *ContentityFS) RootContentity() *Contentity
func (*ContentityFS) Size ¶
func (p *ContentityFS) Size() int
type ContentityFactory ¶
type ContentityFactory struct {
// contains filtered or unexported fields
}
ContentityFactory tracks the state of a ContentityFS tree being assembled, for example when a directory is specified for recursive analysis.
FIXME: ID assignment should be offloaded to the DB ?
type ContentityStage ¶
type ContentityStage func(*Contentity) *Contentity
type GLink ¶
type GLink struct {
// IsRefnc - else is Refnt (Referents are much more numerous)
IsRefnc bool
// IsExtl - else is Intl (which are more numerous)
IsExtl bool
// AddressMode is "http", "key", "idref", "uri"
AddressMode string
// Att is the XML attribute - id, idref, href, xref, keyref, etc.
Att string
// Tag is the tag that has this link-related attribute of interest
Tag string
// Link_raw is as read in during parsing
Link_raw string
// RelFP can be a URI or the resolution of a keyref.
// "" if target is in same file; NOTE This is relative to the
// sourcing file, NOT to the current working directory during parsing!
RelFP string
// AbsFP can be a URI or the resolution of a keyref.
// "" if target is in same file
AbsFP FU.AbsFilePath
// TopicID iff present (but isn't it mandatory ?)
TopicID string
// FragID is peeled off from Raw
FragID string
// Resolved is used to narrow in on difficult cases
Resolved bool
// LinkedFrom is the GTag where the GLink is defined
LinkedFrom *gtree.GTag
// Original can be nil: it is the tag where the GLink is resolved to,
// i.e. the REFERENT, and is quite possibly in another file, which we
// hope we also have available in memory.
Original *gtree.GTag
}
GLink summarizes a link (or key) (or reference) found in markup content. It is either URI-based (`href conref id`) or key-based (`key keyref`). It applies to all LwDITA formats, but not all fields apply to all LwDITA formats.
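As a small illustration of how a fragment identifier can be peeled off a raw link, the sketch below splits on the first `#`. The helper `splitLink` is hypothetical, and it does not attempt to separate a topic ID from an element ID inside the fragment.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLink is a hypothetical helper: it separates the target part of
// a raw link from its fragment identifier. An empty target means the
// link points into the same file.
func splitLink(raw string) (target, fragID string) {
	target, fragID, _ = strings.Cut(raw, "#")
	return target, fragID
}

func main() {
	for _, raw := range []string{"topics/intro.dita#intro/p1", "#p1", "a.md"} {
		target, frag := splitLink(raw)
		fmt.Printf("raw=%q target=%q frag=%q\n", raw, target, frag)
	}
}
```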
type GLinks ¶
type GLinks struct {
// OwnerP points back to the owning struct, so that
// `GLink`s can be processed easily as simple data structures.
OwnerP interface{}
// KeyRefncs are outgoing key-based links/references
KeyRefncs []*GLink // (Extl|Intl)KeyReferences
// KeyRefnts are unique key-based definitions that are possible
// referents (resolution targets) of same or other files' [KeyRefncs]
KeyRefnts []*GLink // (Extl|Intl)KeyDefs
// UriRefncs are outgoing URI-based links/references
UriRefncs []*GLink // (Extl|Intl)UriReferences
// UriRefnts are unique URI-based definitions that are possible
// referents (resolution targets) of same or other files' [UriRefncs]
UriRefnts []*GLink // (Extl|Intl)UriDefs
}
GLinks is used for (1) intra-file ref resolution, (2) inter-file ptr resolution, (3) ToC generation.
type LinkInfos ¶
type LinkInfos struct {
Conrefs []LinkInfo
Keyrefs []LinkInfo
Datarefs []LinkInfo
// contains filtered or unexported fields
}
LinkInfos is: @conref to reuse block-level content, @keyref to reuse phrase-level content.
TODO Each type of link (i.e. elm/att where it occurs) has to be categorised.
TODO Each format of link target has to be categorised.
- Cross ref : <xref> : <a href> : [link](/URI "title")
- Key def : <keydef> : <div data-class="keydef"> : <div data-class="keydef"> in HDITA syntax
- Map : <map> : <nav> : See Example of an MDITA map (20)
- Topic ref : <topicref> : <a href> inside a <li> : [link](/URI "title") inside a list item
TODO Stuff to get:
- XDITA map
  - topicref @href (w @format)
  - task @id
- HDITA
  - article @id
  - span @data-keyref
  - p @data-conref
- MDITA
  - has YAML "id"
  - uses <p @data-conref>
  - uses <span @data-keyref>
  - uses MD [link_text](link_target.dita)
  - uses
- XDITA
  - topic @id
  - ph @keyref
  - image @href
  - p @id
  - video/source @value
  - section @id
  - p @conref
In GFile: LinkInfos:
type LogInfo ¶
LogInfo exists mainly to provide a grep'able string: for example "(01:4a)", where 01 is the index of the Contentity and 4a is the processing stage. This is obv a candidate for replacement by stdlib's slog.
The io.Writer field W exists outside of the github.com/fbaube/mlog logging subsystem and should only be used if `mlog` is not.
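The grep'able string could be produced along these lines. The function `logPrefix` and its parameters are hypothetical; the actual fields of LogInfo are not shown in this documentation.

```go
package main

import "fmt"

// logPrefix is a hypothetical helper: it formats a Contentity index
// (zero-padded to two digits) and a stage ID as "(NN:stage)".
func logPrefix(index int, stage string) string {
	return fmt.Sprintf("(%02d:%s)", index, stage)
}

func main() {
	fmt.Println(logPrefix(1, "4a"))
}
```

A fixed-width prefix like this makes it easy to filter a combined log for one file or one stage with a plain text search.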
type StringTally ¶
var GlobalAttTally StringTally
var GlobalTagTally StringTally
func (StringTally) StringSortedValues ¶
func (st StringTally) StringSortedValues() string
Source Files
¶
- contentity.go
- contentity_new.go
- contentity_newfromstring.go
- contentityerror.go
- contentityfactory.go
- contentityfs.go
- contentityfs_mapfs.go
- contentityfs_new.go
- doc.go
- doc_wfn.go
- getglinks-mkdn.go
- getglinks-xml.go
- glink.go
- handlewalkerrarg.go
- log.go
- mkdn-textleaves.go
- pathexclusions.go
- seterror.go
- st-exec.go
- st0-init.go
- st1-read.go
- st2-tree.go
- st3-refs.go
- st4-done.go
- tallytags.go
- utils-mkdn.go
- validation.go
- xmldoentities.go
- xmlprocentities.go
- xmlprocmeta.go