pdftable

package
v0.1.0
Published: Mar 31, 2026 License: MIT Imports: 4 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var TestBoundaryPositions = PDFTablePositions{
	OriginalDateStart: 0, OriginalDateEnd: 2,
	ReceiptStart: 3, ReceiptEnd: 5,
	DetailStart: 6, DetailEnd: 8,
	ARSAmountStart: 9, ARSAmountEnd: 11,
	USDAmountStart: 12, USDAmountEnd: 13,
}

TestBoundaryPositions defines positions that match exactly with TestBoundaryText.

var TestBoundaryRow = Row{
	RawText:                        TestBoundaryText,
	RawOriginalDate:                "abc",
	RawReceiptNumber:               "deF",
	RawDetailWithMaybeInstallments: "GHI",
	RawAmountARS:                   "JKL",
	RawAmountUSD:                   "MN",
}

TestBoundaryRow represents a row with text exactly matching column boundaries.

var TestBoundaryText = "abcdeFGHIJKLMN"

TestBoundaryText is the raw text that produces TestBoundaryRow when parsed. It fills every column exactly, with no padding between columns, so it exercises the boundary conditions of the column positions.

var TestCardMovementRow = Row{
	RawText:                        TestCardMovementText,
	RawOriginalDate:                "23 Diciem. 30",
	RawReceiptNumber:               "111111 *",
	RawDetailWithMaybeInstallments: "AN ESTABLISHMENT A          C.07/12",
	RawAmountARS:                   "11.111,11",
	RawAmountUSD:                   "",
}

TestCardMovementRow represents a typical credit card movement row with every field populated except the USD amount.

var TestCardMovementText = "23 Diciem. 30 111111 *  AN ESTABLISHMENT A          C.07/12                       11.111,11                    "

TestCardMovementText is the raw text that produces TestCardMovementRow when parsed.

var TestSaldoAnteriorRow = Row{
	RawText:                        TestSaldoAnteriorText,
	RawOriginalDate:                "",
	RawReceiptNumber:               "",
	RawDetailWithMaybeInstallments: "SALDO ANTERIOR",
	RawAmountARS:                   "222.111,66",
	RawAmountUSD:                   "110,00",
}

TestSaldoAnteriorRow represents a "SALDO ANTERIOR" row with ARS and USD amounts.

var TestSaldoAnteriorText = "                        SALDO ANTERIOR                                           222.111,66             110,00 "

TestSaldoAnteriorText is the raw text that produces TestSaldoAnteriorRow when parsed.

var TestShortText = "short text"

TestShortText is the raw text that produces TestShortTextRow when parsed.

var TestShortTextRow = Row{
	RawText:                        TestShortText,
	RawOriginalDate:                "short text",
	RawReceiptNumber:               "",
	RawDetailWithMaybeInstallments: "",
	RawAmountARS:                   "",
	RawAmountUSD:                   "",
}

TestShortTextRow represents a row with text too short to contain all fields.

var TestTablePositions = PDFTablePositions{
	OriginalDateStart: 0,
	OriginalDateEnd:   12,
	ReceiptStart:      14,
	ReceiptEnd:        22,
	DetailStart:       24,
	DetailEnd:         74,
	ARSAmountStart:    76,
	ARSAmountEnd:      91,
	USDAmountStart:    93,
	USDAmountEnd:      110,
}

TestTablePositions defines the standard table positions used in tests. These positions match the Santander PDF format used in most test cases.

var TestWhitespaceRow = Row{
	RawText:                        TestWhitespaceText,
	RawOriginalDate:                "01 Enero",
	RawReceiptNumber:               "1  123456",
	RawDetailWithMaybeInstallments: "*  Detalle con espacios   y\tcaracteres!@#   1.234,5",
	RawAmountARS:                   "78,90",
	RawAmountUSD:                   "",
}

TestWhitespaceRow represents a row with various whitespace and special characters.

var TestWhitespaceText = "   01 Enero  01  123456 *  Detalle con espacios   y\tcaracteres!@#   1.234,56     78,90   "

TestWhitespaceText is the raw text that produces TestWhitespaceRow when parsed.

Functions

This section is empty.

Types

type FakeTableIterator

type FakeTableIterator struct {
	// contains filtered or unexported fields
}

FakeTableIterator is a test-friendly implementation of a table row iterator that allows preloading with a slice of Rows to be returned in sequence.

func NewFakeTableIterator

func NewFakeTableIterator(rows []Row) *FakeTableIterator

NewFakeTableIterator creates a new FakeTableIterator with the given rows.

func (*FakeTableIterator) Next

func (f *FakeTableIterator) Next() (Row, bool)

Next returns the next Row in the sequence and a boolean indicating if there are more rows to return.

func (*FakeTableIterator) NextUtilRegexIsMatched

func (f *FakeTableIterator) NextUtilRegexIsMatched(regex *regexp.Regexp) (Row, bool)

NextUtilRegexIsMatched returns the next Row whose RawText matches the given regex.

type PDFTablePositions

type PDFTablePositions struct {
	OriginalDateStart int
	OriginalDateEnd   int

	ReceiptStart int
	ReceiptEnd   int

	DetailStart int
	DetailEnd   int

	ARSAmountStart int
	ARSAmountEnd   int

	USDAmountStart int
	USDAmountEnd   int
}

PDFTablePositions holds the known positions of the columns in the PDF table. Positions are zero-based, so the first character is at position 0, and both the start and the end of each column are inclusive.
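To make the inclusive, zero-based convention concrete, here is a minimal sketch of how one column could be cut out of a raw line. It is not the package's implementation: `columnSpan` and `sliceInclusive` are hypothetical names, and the clamping to the text length and the whitespace trimming are assumptions inferred from the test fixtures above (TestShortTextRow and TestWhitespaceRow). It also indexes by byte, which only matches character positions for ASCII text.

```go
package main

import (
	"fmt"
	"strings"
)

// columnSpan is a hypothetical stand-in for one Start/End pair of
// PDFTablePositions. Both bounds are zero-based and inclusive.
type columnSpan struct {
	Start, End int
}

// sliceInclusive returns the bytes of text from span.Start through span.End
// (both included), clamped to the text length, with surrounding whitespace
// trimmed. Clamping and trimming are assumptions, not documented behavior.
func sliceInclusive(text string, span columnSpan) string {
	if span.Start >= len(text) {
		return "" // the column lies entirely past the end of the text
	}
	end := span.End + 1 // convert the inclusive end to Go's exclusive bound
	if end > len(text) {
		end = len(text)
	}
	return strings.TrimSpace(text[span.Start:end])
}

func main() {
	text := "abcdeFGHIJKLMN" // same value as TestBoundaryText
	spans := []columnSpan{{0, 2}, {3, 5}, {6, 8}, {9, 11}, {12, 13}}
	for _, s := range spans {
		fmt.Printf("%d-%d: %q\n", s.Start, s.End, sliceInclusive(text, s))
	}
}
```

With the spans taken from TestBoundaryPositions, the five pieces line up with the fields of TestBoundaryRow, which is why an end position of 2 covers three characters.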

type RealTableIterator

type RealTableIterator struct {
	// contains filtered or unexported fields
}

RealTableIterator iterates over the rows of a PDF table. It is meant to be shared between functions, so that each one continues where the previous one stopped. The "Real" prefix only distinguishes it from the TableIterator interface and from FakeTableIterator.

func NewRealTableIterator

func NewRealTableIterator(rows []Row) *RealTableIterator

func (*RealTableIterator) Next

func (it *RealTableIterator) Next() (Row, bool)

Next implements the TableIterator interface. It returns the next Row in the sequence and a boolean indicating if there are more rows to return.

func (*RealTableIterator) NextUtilRegexIsMatched

func (it *RealTableIterator) NextUtilRegexIsMatched(regex *regexp.Regexp) (Row, bool)

NextUtilRegexIsMatched implements the TableIterator interface. It iterates through the rows until a row matches the regex and returns it.

type Row

type Row struct {
	// Full text of the row
	RawText string

	// The columns of the row
	RawOriginalDate                string
	RawReceiptNumber               string
	RawDetailWithMaybeInstallments string
	RawAmountARS                   string
	RawAmountUSD                   string
}

func NewRow

func NewRow(rawText, originalDate, receiptNumber, detail, arsAmount, usdAmount string) Row

func (Row) MatchesMovementWithoutYearAndMonth

func (m Row) MatchesMovementWithoutYearAndMonth() bool

type RowFactory

type RowFactory struct {
	// contains filtered or unexported fields
}

RowFactory is responsible for creating Row instances with specific table position configurations.

func NewRowFactory

func NewRowFactory(positions PDFTablePositions) *RowFactory

NewRowFactory creates a new RowFactory with the given table positions.

func (*RowFactory) CreateRow

func (f *RowFactory) CreateRow(rawText string) Row

CreateRow creates a new Row instance using the factory's stored positions and the provided text.

type TableIterator

type TableIterator interface {
	// Next returns the next Row in the sequence and a boolean indicating if there are more rows to return.
	Next() (Row, bool)
	// NextUtilRegexIsMatched iterates through the rows until a row matches the regex and returns it
	NextUtilRegexIsMatched(regex *regexp.Regexp) (Row, bool)
}
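One reason to depend on an interface here is that parsing code can be tested against preloaded rows instead of a real PDF. The sketch below shows that pattern with hypothetical names (`row`, `rowSource`, `fakeSource`, `collectTexts`); it mirrors only the Next method and assumes the boolean reports whether a row was returned.

```go
package main

import "fmt"

// row is a hypothetical, trimmed-down stand-in for Row.
type row struct{ RawText string }

// rowSource mirrors only the Next method of TableIterator.
type rowSource interface {
	Next() (row, bool)
}

// fakeSource replays preloaded rows, in the spirit of FakeTableIterator.
type fakeSource struct {
	rows []row
	pos  int
}

// Next returns the next preloaded row, or false when none is left.
func (f *fakeSource) Next() (row, bool) {
	if f.pos >= len(f.rows) {
		return row{}, false
	}
	r := f.rows[f.pos]
	f.pos++
	return r, true
}

// collectTexts drains src and returns the raw text of every row, in order.
// It depends only on the interface, so any implementation can be passed in.
func collectTexts(src rowSource) []string {
	var texts []string
	for r, ok := src.Next(); ok; r, ok = src.Next() {
		texts = append(texts, r.RawText)
	}
	return texts
}

func main() {
	src := &fakeSource{rows: []row{{"SALDO ANTERIOR"}, {"short text"}}}
	fmt.Println(collectTexts(src))
}
```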

type TableIteratorFactory

type TableIteratorFactory struct {
	// contains filtered or unexported fields
}

TableIteratorFactory is responsible for creating table iterators with specific configurations.

func NewTableIteratorFactory

func NewTableIteratorFactory(rowFactory *RowFactory) *TableIteratorFactory

func (*TableIteratorFactory) CreateIterator

func (f *TableIteratorFactory) CreateIterator(docIterator pdfwrapper.DocumentIterator) *RealTableIterator

CreateIterator creates a new RealTableIterator instance using the factory's configuration and the provided document iterator. It pre-parses all rows from the DocumentIterator during construction for efficient iteration.
