Documentation
¶
Overview ¶
Package encoding handles the binary .pulse file format: reading, writing, and schema management.
Index ¶
- Constants
- Variables
- func ReadBit(r io.Reader, bitPos uint) (bool, error)
- func ReadDescription(r io.Reader) (string, error)
- func ReadFieldValue(r io.Reader, ft FieldType) (uint64, error)
- func ReadHeader(r io.Reader) error
- func ReadNibble(r io.Reader, high bool) (uint8, error)
- func WriteBit(w io.Writer, bitPos uint, val bool) error
- func WriteDescription(w io.Writer, desc string) error
- func WriteFieldValue(w io.Writer, ft FieldType, val uint64) error
- func WriteHeader(w io.Writer) error
- func WriteNibble(w io.Writer, high bool, val uint8) error
- func WriteSchema(w io.Writer, s *Schema) error
- type Dictionary
- func (d *Dictionary) Add(s string) (uint32, error)
- func (d *Dictionary) AddWithLimit(s string, maxEntries uint32) (uint32, error)
- func (d *Dictionary) Count() int
- func (d *Dictionary) IDFor(s string) (uint32, bool)
- func (d *Dictionary) ReadFrom(r io.Reader) error
- func (d *Dictionary) Resolve(id uint32) string
- func (d *Dictionary) Values() []string
- func (d *Dictionary) WriteTo(w io.Writer) error
- type Field
- type FieldType
- type RecordReader
- type Schema
Constants ¶
const FormatVersion byte = 0x01
FormatVersion is the current .pulse format version.
const HeaderSize = 9
HeaderSize is the total byte size of the file header (magic + version).
const MaxDescriptionBytes = 1000
MaxDescriptionBytes is the maximum allowed byte length for a field description.
Variables ¶
var MagicBytes = [8]byte{'P', 'U', 'L', 'S', 'E', 0x00, 0x00, 0x00}
MagicBytes identifies a .pulse file. 8 bytes: "PULSE\x00\x00\x00"
Functions ¶
func ReadBit ¶
ReadBit reads a single bit from the byte at the current position in r. bitPos is 0-7 within that byte.
func ReadDescription ¶
ReadDescription reads a field description from r. Format: u16 length + utf8 bytes.
func ReadFieldValue ¶
ReadFieldValue reads a single field value from r, returning raw bits as uint64. For packed types (PackedBool, NullableBool, NullableU4), use ReadBit/ReadNibble instead.
func ReadHeader ¶
ReadHeader reads and validates the .pulse file header from r.
func ReadNibble ¶
ReadNibble reads a 4-bit value from a byte. If high is true, it reads bits 4-7; otherwise bits 0-3.
func WriteDescription ¶
WriteDescription writes a field description to w. Format: u16 length + utf8 bytes. Returns PULSE_IMPORT_DESCRIPTION_TOO_LONG if the UTF-8 byte length exceeds MaxDescriptionBytes.
func WriteFieldValue ¶
WriteFieldValue writes a single field value (as raw bits in uint64) to w. For packed types (PackedBool, NullableBool, NullableU4), use WriteBit/WriteNibble instead.
func WriteHeader ¶
WriteHeader writes the .pulse file header to w.
func WriteNibble ¶
WriteNibble writes a 4-bit value into a byte. If high is true, it writes to bits 4-7; otherwise bits 0-3. The other nibble is zero.
func WriteSchema ¶
WriteSchema serializes a schema to w. Format:
u16 field_count per field: u8 type u16 name_length + utf8 name u32 byte_offset u8 bit_position u16 csv_column_idx u16 description_length + utf8 description (if categorical) dictionary block
Types ¶
type Dictionary ¶
type Dictionary struct {
// contains filtered or unexported fields
}
Dictionary maps string values to sequential uint32 IDs. It is used for categorical field types to encode string categories as compact integer IDs in the binary record format.
func (*Dictionary) Add ¶
func (d *Dictionary) Add(s string) (uint32, error)
Add inserts a string into the dictionary, returning its ID. If the string already exists, the existing ID is returned. There is no capacity limit with Add; use AddWithLimit to enforce one.
func (*Dictionary) AddWithLimit ¶
func (d *Dictionary) AddWithLimit(s string, maxEntries uint32) (uint32, error)
AddWithLimit inserts a string, enforcing a maximum entry count. Returns PULSE_IMPORT_CATEGORICAL_OVERFLOW if the dictionary is full.
func (*Dictionary) Count ¶
func (d *Dictionary) Count() int
Count returns the number of entries in the dictionary.
func (*Dictionary) IDFor ¶
func (d *Dictionary) IDFor(s string) (uint32, bool)
IDFor looks up the ID for a string. Returns the ID and true if found, or 0 and false otherwise.
func (*Dictionary) ReadFrom ¶
func (d *Dictionary) ReadFrom(r io.Reader) error
ReadFrom deserializes a dictionary from r, replacing current contents. Format: u32 count + (u16 strlen + utf8 bytes) x count
func (*Dictionary) Resolve ¶
func (d *Dictionary) Resolve(id uint32) string
Resolve returns the string for a given ID. Returns "" if the ID is out of range.
func (*Dictionary) Values ¶
func (d *Dictionary) Values() []string
Values returns a copy of all dictionary values in insertion order.
type Field ¶
type Field struct {
Name string
Type FieldType
ByteOffset int
BitPosition int
CsvColumnIdx int
Description string // empty = synthesized at inspect time
Dictionary *Dictionary // non-nil only for categorical types
}
Field describes a single column in a .pulse schema.
type FieldType ¶
type FieldType byte
FieldType identifies the data type stored in a schema field.
const ( FieldTypeU8 FieldType = iota // 0 FieldTypeU16 // 1 FieldTypeU32 // 2 FieldTypeU64 // 3 FieldTypeF32 // 4 FieldTypeF64 // 5 FieldTypeNullableBool // 6 FieldTypeNullableU4 // 7 FieldTypeNullableU8 // 8 FieldTypeNullableU16 // 9 FieldTypeDate // 10 FieldTypePackedBool // 11 FieldTypeCategoricalU8 // 12 FieldTypeCategoricalU16 // 13 FieldTypeCategoricalU32 // 14 )
All 15 field types supported by the .pulse format.
func (FieldType) ByteSize ¶
ByteSize returns the number of bytes this field type occupies in a record. Packed types (PackedBool, NullableBool, NullableU4) share bytes with adjacent fields via bit packing and return 0 here.
func (FieldType) IsCategorical ¶
IsCategorical reports whether the field type is one of the categorical types.
func (FieldType) MaxCategoricalEntries ¶
MaxCategoricalEntries returns the maximum dictionary size for a categorical type. Returns 0 for non-categorical types.
type RecordReader ¶
type RecordReader struct {
// contains filtered or unexported fields
}
RecordReader reads records one at a time from a binary stream. It reads directly from the io.Reader without buffering the entire file.
func NewRecordReader ¶
func NewRecordReader(r io.Reader, schema *Schema) *RecordReader
NewRecordReader creates a RecordReader. The reader must be positioned immediately after the header and schema (i.e., at the first record byte).
func (*RecordReader) ReadRecord ¶
ReadRecord reads a single record from the stream, populating the values and nulls maps. Returns io.EOF when no more records are available.
The caller provides pre-allocated maps to avoid per-record allocation. Maps are cleared at the start of each call.
type Schema ¶
type Schema struct {
Fields []Field
}
Schema holds all field descriptors for a .pulse file.
func ReadSchema ¶
ReadSchema deserializes a schema from r.
func (*Schema) Categorical ¶
func (s *Schema) Categorical(name string) (*Dictionary, bool)
Categorical returns the dictionary for a named categorical field. Returns nil, false if the field is not found or is not categorical.