Documentation
Constants ¶
This section is empty.
Variables ¶
var DefaultDOMTransformations = []string{
"style, script, path",
"input[type='hidden']",
"meta[content]",
"link[rel='stylesheet']",
"svg",
"grammarly-desktop-integration",
"div[class*='ad'], div[id*='ad'], div[class*='banner'], div[id*='banner'], div[class*='pixel'], div[id*='pixel']",
"input[name*='csrf'], input[name*='token']",
}
DefaultDOMTransformations is the default list of CSS selectors for elements to remove from the DOM.
var DefaultTextPatterns = []string{
`\b(?i)[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`,
`\b(?:25[0-5]|2[0-4]\d|1?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|1?\d?\d)){3}\b`,
`\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b`,
`\b(?:[0-9]{1,2}\s(?:days?|weeks?|months?|years?)\s(?:ago|from\s+now))\b`,
`[\$€£¥]\s*\d+(?:\.\d{1,2})?\b`,
`\b\+?\d{7,15}\b`,
`\b\d{3}-\d{2}-\d{4}\b`,
`\b(?:(?:[0-9]{4}-[0-9]{2}-[0-9]{2})|(?:(?:[0-9]{2}\/){2}[0-9]{4}))\s(?:[0-9]{2}:[0-9]{2}:[0-9]{2})\b`,
}
DefaultTextPatterns is the default list of regex patterns used by the text normalizer.
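To illustrate what these patterns target, here is a minimal sketch that compiles three of them (email address, IPv4 address, UUID) with the standard regexp package and lists what each one matches in a sample string. The samplePatterns variable and the findMatches helper are hypothetical names introduced only for this example; they are not part of the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// samplePatterns copies three entries from DefaultTextPatterns verbatim:
// email address, IPv4 address, and UUID.
var samplePatterns = []string{
	`\b(?i)[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`,
	`\b(?:25[0-5]|2[0-4]\d|1?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|1?\d?\d)){3}\b`,
	`\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b`,
}

// findMatches is a hypothetical helper (not part of the package). It
// compiles each pattern and collects every match in text, in pattern order.
func findMatches(patterns []string, text string) ([]string, error) {
	var out []string
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("compile %q: %w", p, err)
		}
		out = append(out, re.FindAllString(text, -1)...)
	}
	return out, nil
}

func main() {
	text := "mail admin@example.com from 10.0.0.1 id 123e4567-e89b-12d3-a456-426614174000"
	matches, err := findMatches(samplePatterns, text)
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		fmt.Println(m)
	}
}
```

Each pattern is anchored with \b word boundaries, so a value embedded inside a longer alphanumeric token is not matched.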
var NoChildrenDomTransformations = []string{
"div",
"span",
"form",
"iframe",
}
NoChildrenDomTransformations is a list of selectors for elements that are removed when they have no children.
Functions ¶
This section is empty.
Types ¶
type DOMNormalizer ¶
type DOMNormalizer struct {
// contains filtered or unexported fields
}
DOMNormalizer is a normalizer for DOM content
func NewDOMNormalizer ¶
func NewDOMNormalizer() *DOMNormalizer
NewDOMNormalizer returns a new DOMNormalizer.
transformations is a list of CSS selectors for elements to remove from the DOM.
type Normalizer ¶
type Normalizer struct {
// contains filtered or unexported fields
}
type TextNormalizer ¶
type TextNormalizer struct {
// contains filtered or unexported fields
}
TextNormalizer is a normalizer for text
func NewTextNormalizer ¶
func NewTextNormalizer() (*TextNormalizer, error)
NewTextNormalizer returns a new TextNormalizer
patterns is a list of regex patterns for the text normalizer. DefaultTextPatterns is used if patterns is nil. See DefaultTextPatterns for more info.
func (*TextNormalizer) Apply ¶
func (n *TextNormalizer) Apply(text string) string
Apply applies the patterns to the text and returns the normalized text.
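The constructor-plus-Apply shape can be sketched with the standard library alone. The sketchNormalizer type below is a hypothetical stand-in, not the package's TextNormalizer, whose fields are unexported and not shown here. It assumes that "normalizing" means deleting every match, which is one plausible reading of the description above and not confirmed behavior. Its constructor also takes the pattern list as an argument, which the documented NewTextNormalizer signature does not.

```go
package main

import (
	"fmt"
	"regexp"
)

// sketchNormalizer is a hypothetical stand-in for TextNormalizer.
type sketchNormalizer struct {
	patterns []*regexp.Regexp
}

// newSketchNormalizer compiles every pattern up front and reports the first
// invalid one, mirroring the (value, error) return shape of NewTextNormalizer.
func newSketchNormalizer(patterns []string) (*sketchNormalizer, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	return &sketchNormalizer{patterns: compiled}, nil
}

// Apply deletes every match of every pattern, in order.
// Assumption: the real Apply may substitute matches differently.
func (n *sketchNormalizer) Apply(text string) string {
	for _, re := range n.patterns {
		text = re.ReplaceAllString(text, "")
	}
	return text
}

func main() {
	// Both patterns are copied verbatim from DefaultTextPatterns.
	n, err := newSketchNormalizer([]string{
		`\b\d{3}-\d{2}-\d{4}\b`,
		`\b(?:[0-9]{1,2}\s(?:days?|weeks?|months?|years?)\s(?:ago|from\s+now))\b`,
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", n.Apply("posted 3 days ago by 123-45-6789"))
}
```

Compiling once in the constructor keeps Apply cheap when it is called repeatedly, and it surfaces an invalid pattern as an error at construction time instead of at first use.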