css

package module
v1.0.5
Published: Mar 25, 2026 License: BSD-3-Clause Imports: 7 Imported by: 0

README

css/scanner

A fast CSS3 tokenizer for Go.

This package tokenizes CSS input into a stream of typed tokens (identifiers, strings, numbers, dimensions, URLs, comments, etc.) following the CSS Syntax specification. It is intended to be used by a lexer or parser.

Origin

Originally based on the Gorilla CSS scanner, significantly reworked by thejerf/css (Barracuda Networks), then forked by speedata with further changes:

  • CSS Syntax Level 3 support: custom properties (--my-var), signed numbers (-42px, +3em)
  • Hand-written scanner replacing all regex-based tokenization (roughly 10x faster than the regex version)
  • Support for local(), format(), and tech() function tokens

Usage

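package main

import (
    "fmt"

    scanner "github.com/speedata/css"
)

func main() {
    s := scanner.New("a { color: red }")
    for {
        token := s.Next()
        if token.Type == scanner.EOF || token.Type == scanner.Error {
            break
        }
        // Each token carries its type, processed value, line and column.
        fmt.Println(token.Type, token.Value, token.Line, token.Column)
    }
}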

Token types

Token         Example input                  .Value
Ident         color, -webkit-foo, --my-var   color, -webkit-foo, --my-var
Function      rgb(                           rgb
AtKeyword     @media                         media
Hash          #fff                           fff
String        "hello"                        hello
Number        42, -3.14, +0.5                42, -3.14, +0.5
Percentage    50%                            50
Dimension     12px, -1.5em                   12px, -1.5em
URI           url('bg.png')                  bg.png
Local         local('Font')                  Font
Format        format('woff2')                woff2
Tech          tech('color-SVG')              color-SVG
UnicodeRange  U+0042                         U+0042
S             (whitespace)                   (the same whitespace)
Comment       /* text */                     text
Delim         : or , or {                    : or , or {

Tokens are post-processed to contain semantic values: CSS escapes are resolved, quotes and delimiters are stripped. Tokens can be re-emitted to valid CSS via token.Emit(w).
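
Resolving escapes means turning a backslash sequence into the character it stands for. The following is a simplified, hypothetical unescapeCSS helper written from the CSS Syntax rules, not the package's own code. It ignores the special case of a backslash followed by a newline.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// isHex reports whether b is an ASCII hex digit.
func isHex(b byte) bool {
	return ('0' <= b && b <= '9') || ('a' <= b && b <= 'f') || ('A' <= b && b <= 'F')
}

// unescapeCSS resolves CSS escapes in s (hypothetical helper, not the package's code):
// "\" plus 1-6 hex digits, optionally followed by one whitespace character,
// becomes that code point; "\" plus any other character becomes that character.
func unescapeCSS(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		if s[i] != '\\' {
			b.WriteByte(s[i]) // copy ordinary bytes unchanged
			i++
			continue
		}
		i++ // skip the backslash
		if i >= len(s) {
			b.WriteRune(utf8.RuneError) // backslash at end of input -> U+FFFD
			break
		}
		if isHex(s[i]) {
			j := i
			for j < len(s) && j-i < 6 && isHex(s[j]) {
				j++
			}
			cp, _ := strconv.ParseUint(s[i:j], 16, 32) // at most 6 hex digits, cannot fail
			r := rune(cp)
			if cp == 0 || cp > utf8.MaxRune || (cp >= 0xD800 && cp <= 0xDFFF) {
				r = utf8.RuneError // null, surrogate or out of range -> U+FFFD
			}
			b.WriteRune(r)
			i = j
			if i < len(s) && (s[i] == ' ' || s[i] == '\t' || s[i] == '\n') {
				i++ // one whitespace character terminates the escape and is dropped
			}
			continue
		}
		r, size := utf8.DecodeRuneInString(s[i:]) // escaped non-hex character, taken literally
		b.WriteRune(r)
		i += size
	}
	return b.String()
}

func main() {
	fmt.Println(unescapeCSS(`\31 23`)) // 123
	fmt.Println(unescapeCSS(`a\:b`))   // a:b
}
```

The space after `\31` is consumed as part of the escape, which is why `\31 23` yields `123` rather than `1 23`.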

Error handling

Following the CSS specification, errors only occur for unclosed quotes or unclosed comments. Everything else is tokenizable; it is up to a parser to make sense of the token stream.
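
To make the two error cases concrete, here is a hypothetical checkClosed helper that only looks for a quoted string or a /* comment that runs to the end of the input. It is a simplified illustration, not how the scanner itself detects errors.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkClosed reports whether input contains a quoted string or a comment that
// is never closed. Hypothetical helper, not the scanner's own code.
func checkClosed(input string) error {
	for i := 0; i < len(input); i++ {
		switch {
		case strings.HasPrefix(input[i:], "/*"):
			end := strings.Index(input[i+2:], "*/")
			if end < 0 {
				return errors.New("unclosed comment")
			}
			i += 2 + end + 1 // move to the '/' of "*/"; the loop's i++ steps past it
		case input[i] == '"' || input[i] == '\'':
			quote := input[i]
			j := i + 1
			for j < len(input) && input[j] != quote {
				if input[j] == '\\' {
					j++ // skip the escaped character, e.g. \" inside "..."
				}
				j++
			}
			if j >= len(input) {
				return errors.New("unclosed string")
			}
			i = j // closing quote; the loop's i++ steps past it
		}
	}
	return nil
}

func main() {
	fmt.Println(checkClosed(`a { color: red } /* ok */`)) // <nil>
	fmt.Println(checkClosed(`a { content: "x }`))         // unclosed string
}
```

Everything else, such as a stray `}` or an unknown character, passes this check, matching the idea that malformed structure is left for the parser to handle.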

License

BSD 3-Clause. See LICENSE for details.

Documentation

Overview

Package scanner tokenizes CSS input following the CSS Syntax specification.

To use it, create a new scanner for a given CSS string and call Next() until the token returned has type scanner.EOF or scanner.Error:

s := scanner.New(input)
for {
	token := s.Next()
	if token.Type == scanner.EOF || token.Type == scanner.Error {
		break
	}
	// Use token.Type, token.Value, token.Line, token.Column
}

Token values are post-processed to contain semantic content: CSS escapes are resolved, quotes are stripped from strings, and delimiters are removed from functions and URLs. Tokens can be re-emitted to valid CSS via token.Emit(w).

Following the CSS specification, an error can only occur when the scanner finds an unclosed quote or unclosed comment. Everything else is tokenizable and it is up to a parser to make sense of the token stream.

Index

Variables

var AtKeyword = Type{3}

AtKeyword token type is for things like @import. The .Value has the @ removed.

var BOM = Type{22}

BOM token type refers to Byte Order Marks.

var CDC = Type{12}

CDC token type represents the --> string.

var CDO = Type{11}

CDO token type represents the <!-- string.

var Comment = Type{14}

Comment token type is for comments. The .Value holds the text between /* and */, with no additional processing.

var DashMatch = Type{17}

DashMatch token type refers to |=.

var Delim = Type{21}

Delim token type refers to a single character that the scanner cannot tokenize as any other token type.

var Dimension = Type{8}

Dimension token type is for dimensions such as 12px. The number and unit are not parsed further: the .Value holds both together, for example -1.5em.
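
Since the Dimension .Value is not split, a caller that needs the number and the unit separately has to do it. A minimal sketch, assuming the value is a CSS number (optional sign, digits, optional fraction and exponent) followed directly by the unit. splitDimension is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"strconv"
)

func isDigit(b byte) bool { return '0' <= b && b <= '9' }

// splitDimension splits a Dimension .Value such as "-1.5em" into its numeric
// part and its unit (hypothetical helper, not part of the package).
func splitDimension(v string) (float64, string, error) {
	i := 0
	if i < len(v) && (v[i] == '+' || v[i] == '-') {
		i++ // optional sign
	}
	for i < len(v) && isDigit(v[i]) {
		i++ // integer digits
	}
	if i+1 < len(v) && v[i] == '.' && isDigit(v[i+1]) {
		i += 2 // "." must be followed by at least one digit
		for i < len(v) && isDigit(v[i]) {
			i++
		}
	}
	// Exponent: "e"/"E", optional sign, then at least one digit. Otherwise the
	// "e" belongs to the unit, as in "1em".
	if i+1 < len(v) && (v[i] == 'e' || v[i] == 'E') {
		j := i + 1
		if j < len(v) && (v[j] == '+' || v[j] == '-') {
			j++
		}
		if j < len(v) && isDigit(v[j]) {
			for j < len(v) && isDigit(v[j]) {
				j++
			}
			i = j
		}
	}
	n, err := strconv.ParseFloat(v[:i], 64)
	if err != nil {
		return 0, "", fmt.Errorf("no number in %q: %w", v, err)
	}
	return n, v[i:], nil
}

func main() {
	fmt.Println(splitDimension("-1.5em")) // -1.5 em <nil>
}
```

The exponent check is the subtle part: `2e3px` is 2000 with unit `px`, while `1em` is 1 with unit `em`.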

var EOF = Type{1}

EOF token type marks the end of the input.

var Error = Type{0}

Error token type is returned when the input cannot be tokenized.

CSS tokenization is designed to avoid errors; the only error cases are unclosed strings and unclosed comments.

var Format = Type{102}

Format token type is for format(). The .Value is the format name, for example woff2 for format('woff2').

var Function = Type{15}

Function token type refers to a function invocation, like "rgb(". The .Value does not include the opening parenthesis.

var Hash = Type{5}

Hash token type is for hash tokens such as the color #fff. The .Value does not contain the #.

var Ident = Type{2}

Ident token type is for identifiers, including vendor-prefixed names such as -webkit-foo and custom properties such as --my-var.
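
Whether input such as -webkit-foo, --my-var or -42px starts an identifier is decided by the CSS Syntax Level 3 rule for checking whether three code points would start an ident sequence. The hypothetical startsIdent helper sketches that rule in Go. It is written from the specification and is not taken from this package.

```go
package main

import "fmt"

const eof = rune(-1) // sentinel for "past the end of the input"

// isIdentStart reports whether r is an ident-start code point in CSS Syntax
// Level 3: an ASCII letter, "_", or any non-ASCII code point.
func isIdentStart(r rune) bool {
	return ('a' <= r && r <= 'z') || ('A' <= r && r <= 'Z') || r == '_' || r >= 0x80
}

// validEscape reports whether a and b form a valid escape: a backslash that
// is not followed by a newline.
func validEscape(a, b rune) bool {
	return a == '\\' && b != '\n'
}

// startsIdent applies the "would start an ident sequence" check to the first
// three code points of s (hypothetical helper).
func startsIdent(s string) bool {
	cp := [3]rune{eof, eof, eof}
	n := 0
	for _, r := range s {
		if n == 3 {
			break
		}
		cp[n] = r
		n++
	}
	switch {
	case cp[0] == '-':
		// "-" must be followed by an ident-start code point, another "-", or an escape.
		return isIdentStart(cp[1]) || cp[1] == '-' || validEscape(cp[1], cp[2])
	case isIdentStart(cp[0]):
		return true
	case cp[0] == '\\':
		return validEscape(cp[0], cp[1])
	}
	return false
}

func main() {
	for _, s := range []string{"color", "-webkit-foo", "--my-var", "-42px"} {
		fmt.Println(s, startsIdent(s))
	}
}
```

A leading `-` followed by a digit fails the check, which is why `-42px` is scanned as a signed Dimension rather than an Ident.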

var Includes = Type{16}

Includes token type refers to ~=.

var Local = Type{101}

Local token type is for local(). The .Value is the processed contents, for example Font for local('Font').

var Number = Type{6}

Number token type is for numbers that are not percentages or dimensions.

var Percentage = Type{7}

Percentage token type is for percentages. The .Value does not include the %.

var PrefixMatch = Type{18}

PrefixMatch token type refers to ^=.

var S = Type{13}

S token type is for whitespace. The original whitespace characters are kept in .Value.

var String = Type{4}

String token type is for strings delimited by double or single quotes. The .Value has escapes resolved and does not contain the quotes.

var SubstringMatch = Type{20}

SubstringMatch token type refers to *=.

var SuffixMatch = Type{19}

SuffixMatch token type refers to $=.
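
The scanner only produces the Includes, DashMatch, PrefixMatch, SuffixMatch and SubstringMatch tokens; what they mean is defined by the Selectors specification. The hypothetical matchAttr helper sketches how a consumer might evaluate each operator against an attribute value.

```go
package main

import (
	"fmt"
	"strings"
)

// isCSSSpace reports whether r is CSS whitespace.
func isCSSSpace(r rune) bool {
	return r == ' ' || r == '\t' || r == '\n' || r == '\r' || r == '\f'
}

// matchAttr evaluates an attribute-selector operator against an attribute
// value as defined by the Selectors specification (hypothetical helper;
// the scanner itself only emits the operator tokens).
func matchAttr(op, attr, val string) bool {
	switch op {
	case "~=": // Includes: val is one of the whitespace-separated words
		if val == "" || strings.IndexFunc(val, isCSSSpace) >= 0 {
			return false
		}
		for _, w := range strings.FieldsFunc(attr, isCSSSpace) {
			if w == val {
				return true
			}
		}
		return false
	case "|=": // DashMatch: exactly val, or val followed by "-"
		return attr == val || strings.HasPrefix(attr, val+"-")
	case "^=": // PrefixMatch; an empty val matches nothing
		return val != "" && strings.HasPrefix(attr, val)
	case "$=": // SuffixMatch; an empty val matches nothing
		return val != "" && strings.HasSuffix(attr, val)
	case "*=": // SubstringMatch; an empty val matches nothing
		return val != "" && strings.Contains(attr, val)
	}
	return false
}

func main() {
	fmt.Println(matchAttr("|=", "en-US", "en"))        // true
	fmt.Println(matchAttr("~=", "btn-primary", "btn")) // false
}
```

Note that `~=` compares whole words, so `btn` does not match `btn-primary`, whereas `|=` is designed exactly for such hyphenated prefixes.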

var Tech = Type{103}

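Tech token type is for tech() in the src descriptor of @font-face. The .Value is the technology name, for example color-SVG for tech('color-SVG').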

var URI = Type{100}

URI token type is for URIs. The .Value will be the processed URI.

var UnicodeRange = Type{10}

UnicodeRange token type is for Unicode ranges.
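
A UnicodeRange value can be a single code point (U+0042), an explicit range (U+0025-00FF) or a range with ? wildcards (U+4??). The hypothetical parseUnicodeRange helper sketches how a consumer could turn such a value into an inclusive range, assuming the .Value keeps the U+ prefix as shown in the token table. Validation is simplified.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUnicodeRange converts a value such as "U+0042", "U+0025-00FF" or
// "U+4??" into an inclusive code point range (hypothetical helper).
func parseUnicodeRange(v string) (lo, hi rune, err error) {
	if len(v) < 3 || (v[0] != 'U' && v[0] != 'u') || v[1] != '+' {
		return 0, 0, fmt.Errorf("not a unicode-range: %q", v)
	}
	body := v[2:]
	startStr, endStr, isRange := strings.Cut(body, "-")
	if !isRange {
		// Wildcards: the lowest value replaces each "?" with 0, the highest with F.
		// Without "?", start and end are the same single code point.
		startStr = strings.ReplaceAll(body, "?", "0")
		endStr = strings.ReplaceAll(body, "?", "F")
	}
	start, err := strconv.ParseUint(startStr, 16, 32)
	if err != nil {
		return 0, 0, err
	}
	end, err := strconv.ParseUint(endStr, 16, 32)
	if err != nil {
		return 0, 0, err
	}
	if len(startStr) > 6 || len(endStr) > 6 || end > 0x10FFFF || start > end {
		return 0, 0, fmt.Errorf("invalid unicode-range: %q", v)
	}
	return rune(start), rune(end), nil
}

func main() {
	fmt.Println(parseUnicodeRange("U+4??")) // 1024 1279 <nil>
}
```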

Types

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

Scanner scans an input and emits tokens following the CSS3 specification.

func New

func New(input string) *Scanner

New returns a new CSS scanner for the given input.

func (*Scanner) Next

func (s *Scanner) Next() *Token

Next returns the next token from the input.

At the end of the input the token type is EOF.

If the input can't be tokenized the token type is Error. This occurs in case of unclosed quotation marks or comments.

type Token

type Token struct {
	Type   Type
	Value  string
	Line   int
	Column int
}

Token represents a single token: its type, its processed value, and the line and column at which it was found.

func (*Token) Emit

func (t *Token) Emit(w io.Writer) (err error)

Emit writes the CSS text for the token to the target io.Writer. It returns an error if the token's type is Error or EOF, or if the Writer returns an error.

Emit will make many small writes to the io.Writer.

Emit does not validate .Value for most token types. For instance, if you manually take a Number token and set its .Value to "sometext", Emit writes something that is not a number.

func (*Token) String

func (t *Token) String() string

String returns a string representation of the token.

type Type

type Type struct {
	// contains filtered or unexported fields
}

Type identifies the type of a token; it wraps an integer in a struct. Only the types defined as variables in this package may be used.
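
Wrapping an integer in a struct with an unexported field means code outside the package cannot write scanner.Type{42}, because a composite literal cannot set an unexported field. Only the exported variables are usable, and values still compare with ==. Here is a minimal single-file sketch of the pattern; the field name id and the names map are hypothetical. In this sketch the zero value Type{} equals Error, because Error is Type{0}.

```go
package main

import "fmt"

// Type wraps an int in a struct; outside the defining package the
// unexported field id cannot be set, so new values cannot be invented.
type Type struct{ id int }

var (
	Error = Type{0}
	EOF   = Type{1}
	Ident = Type{2}
)

// names maps each known type to its printable name (hypothetical).
var names = map[Type]string{Error: "Error", EOF: "EOF", Ident: "Ident"}

// String returns the name of the token type, or a numeric fallback.
func (t Type) String() string {
	if n, ok := names[t]; ok {
		return n
	}
	return fmt.Sprintf("Type(%d)", t.id)
}

func main() {
	t := EOF
	fmt.Println(t == EOF, t, Type{7}) // true EOF Type(7)
}
```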

func (Type) GoString

func (t Type) GoString() string

GoString returns a string representation of the token type.

func (Type) String

func (t Type) String() string

String returns a string representation of the token type.

Directories

Path Synopsis
scanner module
