filefilter

package filefilter
Version: v0.0.0-...-f06bde7 (latest)
Warning: This package is not in the latest version of its module.

Published: Jun 3, 2026 | License: Apache-2.0 | Imports: 14 | Imported by: 0

Documentation

Overview

Package filefilter provides a concurrent pipeline for filtering files by size, content type, and glob patterns.

Constants

This section is empty.

Variables

var ErrPathNotAllowed = errors.New("paths are not allowed in exclude rules")

ErrPathNotAllowed is returned when an exclude rule contains a path separator.

Functions

func ExpandExcludeNames

func ExpandExcludeNames(names []string) ([]string, error)

ExpandExcludeNames validates user-provided names and converts them into global glob patterns. Each input must be a basename (for example, "node_modules"), not a path; a name that contains a path separator is rejected with ErrPathNotAllowed.

func IsTextContent

func IsTextContent(data []byte) bool

IsTextContent reports whether data appears to be text, using the null-byte heuristic: data that contains a NUL (0x00) byte is treated as binary. See: https://docs.google.com/document/d/1GYir_j0ITTxg_CqyAw8BeUZYCCUyNMAePbGw5nsTGYE/
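The null-byte heuristic fits in one line. This is a sketch of the technique, not the package's source; `isTextContent` is a hypothetical local stand-in for IsTextContent.

```go
package main

import "bytes"

// isTextContent applies the null-byte heuristic: any NUL (0x00) byte
// marks the data as binary; otherwise the data is treated as text.
func isTextContent(data []byte) bool {
	return bytes.IndexByte(data, 0) == -1
}
```

The heuristic is cheap but coarse: UTF-16 encoded text contains NUL bytes for ASCII characters and is therefore classified as binary.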

func ReadFileHeader

func ReadFileHeader(path string, n int64) ([]byte, error)

ReadFileHeader reads up to n bytes from the beginning of the file at path.

Types

type Analytics

type Analytics interface {
	RecordSizeFiltered(total int)
	RecordFileFilterTimeMs(startTime time.Time)
}

Analytics defines the metrics recording interface used by the filter pipeline.

type FileFilter

type FileFilter interface {
	FilterOut(path string) bool
	RecordMetrics(analytics Analytics)
}

FileFilter defines the contract for any logic that decides if a file should be dropped.
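Any type with these two methods can act as a filter. The sketch below is a hypothetical extension-based filter; it assumes that FilterOut returning true means "drop this file", which the method name suggests but the documentation does not state outright. Both interfaces are repeated locally so the block compiles alone.

```go
package main

import (
	"path/filepath"
	"strings"
	"time"
)

// Local copies of the package's interfaces.
type Analytics interface {
	RecordSizeFiltered(total int)
	RecordFileFilterTimeMs(startTime time.Time)
}

type FileFilter interface {
	FilterOut(path string) bool
	RecordMetrics(analytics Analytics)
}

// extensionFilter is a hypothetical filter that drops files by extension.
type extensionFilter struct {
	drop map[string]bool // lower-cased extensions, including the dot
}

func newExtensionFilter(exts ...string) *extensionFilter {
	f := &extensionFilter{drop: make(map[string]bool)}
	for _, e := range exts {
		f.drop[strings.ToLower(e)] = true
	}
	return f
}

// FilterOut returns true when the file should be dropped (assumed contract).
func (f *extensionFilter) FilterOut(path string) bool {
	return f.drop[strings.ToLower(filepath.Ext(path))]
}

// RecordMetrics is a no-op: Analytics only exposes size and timing hooks.
func (f *extensionFilter) RecordMetrics(Analytics) {}

var _ FileFilter = (*extensionFilter)(nil)
```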

func FileSizeFilter

func FileSizeFilter(logger *zerolog.Logger) FileFilter

FileSizeFilter returns a filter that drops empty files and files larger than 1 MB.

func TextFileOnlyFilter

func TextFileOnlyFilter(logger *zerolog.Logger) FileFilter

TextFileOnlyFilter returns a filter that drops binary files based on a null-byte heuristic.

type Option

type Option func(*Pipeline)

Option is a functional option that configures a Pipeline.

func WithAnalytics

func WithAnalytics(analytics Analytics) Option

WithAnalytics sets the analytics recorder for filter metrics.

func WithConcurrency

func WithConcurrency(n int) Option

WithConcurrency overrides the default number of concurrent workers, which is runtime.NumCPU().

func WithExcludeGlobs

func WithExcludeGlobs(userPatterns []string) Option

WithExcludeGlobs adds user-defined patterns to the pipeline's exclude list.
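The documentation does not say how exclude patterns are matched against paths. One reasonable approach, sketched here as an assumption, is to test every path component with `filepath.Match`, so that a basename pattern such as "node_modules" also excludes everything beneath that directory. `isExcluded` is a hypothetical helper.

```go
package main

import (
	"path/filepath"
	"strings"
)

// isExcluded reports whether any path component matches any pattern.
// ASSUMPTION: component-wise matching; the package's rules are undocumented.
func isExcluded(path string, patterns []string) bool {
	for _, part := range strings.Split(filepath.ToSlash(path), "/") {
		for _, pat := range patterns {
			// Malformed patterns (filepath.ErrBadPattern) are ignored here.
			if ok, err := filepath.Match(pat, part); err == nil && ok {
				return true
			}
		}
	}
	return false
}
```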

func WithFilters

func WithFilters(filters ...FileFilter) Option

WithFilters configures the filters applied to each file. It accepts any number of FileFilter values, such as those returned by FileSizeFilter and TextFileOnlyFilter.

func WithLogger

func WithLogger(logger *zerolog.Logger) Option

WithLogger sets the logger for the pipeline.

type Pipeline

type Pipeline struct {
	// contains filtered or unexported fields
}

Pipeline holds the configuration for the filtering process.

func NewPipeline

func NewPipeline(opts ...Option) *Pipeline

NewPipeline creates a filter pipeline, applying opts on top of the defaults. The default concurrency is runtime.NumCPU().

func (*Pipeline) Filter

func (p *Pipeline) Filter(ctx context.Context, inputPaths []string) chan string

Filter runs inputPaths through the configured filters concurrently and returns a channel that yields only the paths that pass every filter.
