Go library that converts files in multiple formats into text that LLMs can understand
File2LLM is designed specifically for use with LLMs. Unlike other Go solutions, it preserves text location, padding, and formatting, and it adds structural boundaries that LLMs can recognize. It also post-processes the extracted text so that LLMs can interpret it correctly.
File2LLM can handle nested file formats (such as archives) by recursively reading them and creating structured file information suitable for LLM input.
File2LLM is optimized with custom cgo code and assembly.
Example
Get the main file2llm library
go get -u github.com/opengs/file2llm
Optionally, install the dependencies needed to work with PDFs and images (OCR).