Documentation ¶
Index ¶
- func ExtractDocxText(path string) (string, error)
- func ExtractPptxText(path string) (string, error)
- func ExtractTextFromTags(data []byte, tagPrefix string) string
- func FindFileInZip(reader *zip.ReadCloser, name string) *zip.File
- func FindFilesWithPrefix(reader *zip.ReadCloser, prefix, suffix string) []*zip.File
- func OpenOfficeFile(path string) (*zip.ReadCloser, error)
- func ReadZipFile(file *zip.File) ([]byte, error)
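- type DocxMetadata
  - func ExtractDocxMetadata(path string) (*DocxMetadata, error)
- type PptxMetadata
  - func ExtractPptxMetadata(path string) (*PptxMetadata, error)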
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ExtractDocxText ¶
func ExtractDocxText(path string) (string, error)
ExtractDocxText extracts all text content from a DOCX file. Text is extracted from <w:t> tags within word/document.xml.
func ExtractPptxText ¶
func ExtractPptxText(path string) (string, error)
ExtractPptxText extracts all text content from a PPTX file. Text is extracted from <a:t> tags within slide XML files.
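The documentation above does not say in which order slide files are visited. If presentation order matters, one option is to sort slide entry names by their numeric suffix, since a plain string sort places slide10.xml before slide2.xml. The sketch below assumes the common ppt/slides/slideN.xml naming; slideNumber and sortSlides are hypothetical helpers, not part of this package.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strconv"
	"strings"
)

// slideNumber extracts N from a name like "ppt/slides/slideN.xml".
// It returns -1 for names that do not follow that pattern.
func slideNumber(name string) int {
	base := path.Base(name)
	if !strings.HasPrefix(base, "slide") || !strings.HasSuffix(base, ".xml") {
		return -1
	}
	digits := strings.TrimSuffix(strings.TrimPrefix(base, "slide"), ".xml")
	n, err := strconv.Atoi(digits)
	if err != nil {
		return -1
	}
	return n
}

// sortSlides orders slide entry names by slide number, in place.
func sortSlides(names []string) {
	sort.Slice(names, func(i, j int) bool {
		return slideNumber(names[i]) < slideNumber(names[j])
	})
}

func main() {
	names := []string{
		"ppt/slides/slide10.xml",
		"ppt/slides/slide2.xml",
		"ppt/slides/slide1.xml",
	}
	sortSlides(names)
	for _, n := range names {
		fmt.Println(n)
	}
}
```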
func ExtractTextFromTags ¶
func ExtractTextFromTags(data []byte, tagPrefix string) string
ExtractTextFromTags extracts text content between XML tags with the given prefix. For example, ExtractTextFromTags(data, "a:t") extracts text from <a:t>content</a:t>.
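To make the tag-scanning idea concrete, here is a minimal sketch of one way such a function could work. extractTextBetweenTags is a hypothetical stand-in, and two of its behaviors are assumptions not confirmed by the documentation above: pieces are joined with a single space, and XML entities are left undecoded.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// extractTextBetweenTags is a hypothetical stand-in for ExtractTextFromTags.
// It collects the text between <tagPrefix ...> and </tagPrefix> and joins the
// pieces with a single space (the separator is an assumption).
func extractTextBetweenTags(data []byte, tagPrefix string) string {
	open := []byte("<" + tagPrefix)
	closeTag := []byte("</" + tagPrefix + ">")
	var parts []string
	for {
		i := bytes.Index(data, open)
		if i < 0 {
			break
		}
		data = data[i+len(open):]
		if len(data) == 0 {
			break
		}
		// The tag name must end here, so "w:t" does not match "<w:tab/>".
		if c := data[0]; c != '>' && c != ' ' {
			continue
		}
		gt := bytes.IndexByte(data, '>')
		if gt < 0 {
			break
		}
		if gt > 0 && data[gt-1] == '/' {
			data = data[gt+1:] // self-closing element carries no text
			continue
		}
		data = data[gt+1:]
		end := bytes.Index(data, closeTag)
		if end < 0 {
			break
		}
		parts = append(parts, string(data[:end]))
		data = data[end+len(closeTag):]
	}
	return strings.Join(parts, " ")
}

func main() {
	slide := []byte(`<a:p><a:r><a:t>Hello</a:t></a:r><a:r><a:t>world</a:t></a:r></a:p>`)
	fmt.Println(extractTextBetweenTags(slide, "a:t")) // prints "Hello world"
}
```

Byte scanning like this is fast and tolerant of malformed input, but it does not decode entities such as &amp;amp; a decoder-based approach handles those at the cost of stricter parsing.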
func FindFileInZip ¶
func FindFileInZip(reader *zip.ReadCloser, name string) *zip.File
FindFileInZip finds a file in a ZIP archive by exact name.
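An exact-name lookup can be sketched as a linear scan over the archive's entries. The helper below is hypothetical: it takes a *zip.Reader so the demo can run on an in-memory archive (a *zip.ReadCloser embeds zip.Reader, so a caller holding one could pass &rc.Reader), and returning nil when nothing matches is an assumption about the behavior, not something the documentation states.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// findByName is a hypothetical helper mirroring FindFileInZip.
// It returns nil when no entry has exactly the given name (assumed behavior).
func findByName(r *zip.Reader, name string) *zip.File {
	for _, f := range r.File {
		if f.Name == name {
			return f
		}
	}
	return nil
}

// buildArchive creates an in-memory ZIP with empty entries for the demo.
func buildArchive(names ...string) (*zip.Reader, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range names {
		if _, err := w.Create(name); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	r, err := buildArchive("[Content_Types].xml", "word/document.xml")
	if err != nil {
		log.Fatal(err)
	}
	if f := findByName(r, "word/document.xml"); f != nil {
		fmt.Println("found", f.Name)
	}
	// The comparison is case-sensitive, so this lookup finds nothing.
	fmt.Println(findByName(r, "word/Document.xml") == nil)
}
```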
func FindFilesWithPrefix ¶
func FindFilesWithPrefix(reader *zip.ReadCloser, prefix, suffix string) []*zip.File
FindFilesWithPrefix finds all files in a ZIP archive matching a prefix and suffix.
func OpenOfficeFile ¶
func OpenOfficeFile(path string) (*zip.ReadCloser, error)
OpenOfficeFile opens an Office file (PPTX, DOCX, or XLSX) as a ZIP archive. The caller is responsible for closing the returned reader.
Types ¶
type DocxMetadata ¶
DocxMetadata contains extracted metadata from a DOCX file.
func ExtractDocxMetadata ¶
func ExtractDocxMetadata(path string) (*DocxMetadata, error)
ExtractDocxMetadata extracts metadata from a DOCX file.
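The fields of DocxMetadata are not listed here, and the documentation does not say which part of the archive the metadata is read from. As a rough sketch under those caveats: Office Open XML packages conventionally store core properties (title, creator, creation time) in docProps/core.xml, and such a part can be decoded with encoding/xml. The coreProps struct and parseCoreProps function are hypothetical and may not match what this package does.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// coreProps is a hypothetical struct; it is NOT the package's DocxMetadata.
// Tags without a namespace match elements by local name in any namespace.
type coreProps struct {
	Title   string `xml:"title"`
	Creator string `xml:"creator"`
	Created string `xml:"created"`
}

// parseCoreProps decodes the bytes of a docProps/core.xml part.
func parseCoreProps(data []byte) (*coreProps, error) {
	var p coreProps
	if err := xml.Unmarshal(data, &p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	core := []byte(`<cp:coreProperties` +
		` xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"` +
		` xmlns:dc="http://purl.org/dc/elements/1.1/"` +
		` xmlns:dcterms="http://purl.org/dc/terms/">` +
		`<dc:title>Quarterly report</dc:title>` +
		`<dc:creator>A. Author</dc:creator>` +
		`<dcterms:created>2024-01-15T09:30:00Z</dcterms:created>` +
		`</cp:coreProperties>`)
	p, err := parseCoreProps(core)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s by %s (created %s)\n", p.Title, p.Creator, p.Created)
}
```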
type PptxMetadata ¶
PptxMetadata contains extracted metadata from a PPTX file.
func ExtractPptxMetadata ¶
func ExtractPptxMetadata(path string) (*PptxMetadata, error)
ExtractPptxMetadata extracts metadata from a PPTX file.