archives

v0.3.0 Latest
Published: May 3, 2026 License: MIT Imports: 18 Imported by: 0

README

archives

A Go library for reading and browsing archive files in memory. Supports ZIP, TAR (with gzip, bzip2, xz compression), and Ruby gem formats.

Installation

go get github.com/git-pkgs/archives

Usage

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/git-pkgs/archives"
)

func main() {
	f, err := os.Open("package.tar.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The filename is used to detect the archive format.
	reader, err := archives.Open("package.tar.gz", f)
	if err != nil {
		log.Fatal(err)
	}
	defer reader.Close()

	// List all files
	files, err := reader.List()
	if err != nil {
		log.Fatal(err)
	}
	for _, fi := range files {
		fmt.Println(fi.Path, fi.Size)
	}

	// List a specific directory
	dirFiles, err := reader.ListDir("src")
	if err != nil {
		log.Fatal(err)
	}
	for _, fi := range dirFiles {
		fmt.Println(fi.Name, fi.IsDir)
	}

	// Extract a file
	rc, err := reader.Extract("README.md")
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()
	// read from rc...
}

Hashing

Open buffers the raw archive bytes in memory, so the reader can compute checksums of the original artifact without re-reading from the source. This is useful for verifying a downloaded package against the digest published by its registry.

reader, _ := archives.Open("rails-7.1.0.gem", f)
defer reader.Close()

sha, _ := reader.Hash(archives.SHA256)
fmt.Println(sha) // hex-encoded sha256 of the .gem file

// also available: archives.SHA512, archives.SHA1, archives.MD5

The hash is computed over the archive as it was passed to Open, not the decompressed contents. For nested formats like gems this means the outer .gem file, which is what rubygems.org publishes. If you already have the bytes in hand, OpenBytes skips the extra read:

reader, _ := archives.OpenBytes("pkg.tgz", data)
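
For the registry verification use case mentioned above, a check might look roughly like the following sketch. The `hasher` interface only restates the `Hash` method of `archives.Reader`; `verifyDigest` and `fakeReader` are hypothetical names introduced here so the example runs without the module, and the case-insensitive comparison is an assumption about how registries format their digests.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hasher is the subset of archives.Reader this sketch relies on.
type hasher interface {
	Hash(algo string) (string, error)
}

// verifyDigest (hypothetical helper) compares the archive digest with the
// hex digest published by a registry, ignoring hex case and stray whitespace.
func verifyDigest(r hasher, algo, want string) error {
	got, err := r.Hash(algo)
	if err != nil {
		return fmt.Errorf("hash %s: %w", algo, err)
	}
	if !strings.EqualFold(got, strings.TrimSpace(want)) {
		return fmt.Errorf("%s mismatch: got %s, want %s", algo, got, want)
	}
	return nil
}

// fakeReader stands in for an archives.Reader so the sketch is self-contained.
type fakeReader struct{ raw []byte }

func (f fakeReader) Hash(algo string) (string, error) {
	if algo != "sha256" {
		return "", fmt.Errorf("unsupported algorithm %q", algo)
	}
	sum := sha256.Sum256(f.raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	r := fakeReader{raw: []byte("abc")}
	// Well-known SHA-256 test vector for the input "abc".
	const want = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	if err := verifyDigest(r, "sha256", want); err != nil {
		fmt.Println("verification failed:", err)
		return
	}
	fmt.Println("digest verified")
}
```

With the real library, a Reader returned by Open or OpenBytes could be passed in place of `fakeReader`, together with archives.SHA256.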

Prefix stripping

Some package formats wrap content in a directory (npm uses package/). OpenWithPrefix strips a path prefix from all entries:

reader, _ := archives.OpenWithPrefix("pkg.tgz", f, "package/")
// files are now accessible without the package/ prefix
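
The effect of prefix stripping on entry paths can be illustrated with plain strings. This is a sketch of the idea, not the library's implementation: `stripPrefix` is a hypothetical helper, and dropping entries that fall outside the prefix is an assumption, since the README does not say how such entries are treated.

```go
package main

import (
	"fmt"
	"strings"
)

// stripPrefix (hypothetical) removes prefix from every path that has it.
// Assumption: the prefix directory itself and entries outside the prefix
// are dropped from the result.
func stripPrefix(paths []string, prefix string) []string {
	var out []string
	for _, p := range paths {
		if !strings.HasPrefix(p, prefix) {
			continue
		}
		rest := strings.TrimPrefix(p, prefix)
		if rest == "" {
			continue // the wrapper directory entry, e.g. "package/"
		}
		out = append(out, rest)
	}
	return out
}

func main() {
	paths := []string{"package/", "package/package.json", "package/lib/index.js", "README.md"}
	fmt.Println(stripPrefix(paths, "package/")) // [package.json lib/index.js]
}
```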

Comparing versions

The diff subpackage compares two archives and produces unified diffs. It classifies each file as added, deleted, modified, or binary, and includes line-level diff output for text files.

import "github.com/git-pkgs/archives/diff"

result, _ := diff.Compare(oldReader, newReader)
for _, f := range result.Files {
	fmt.Printf("%s %s (+%d -%d)\n", f.Type, f.Path, f.LinesAdded, f.LinesDeleted)
	if f.Diff != "" {
		fmt.Println(f.Diff)
	}
}
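
To make the four categories concrete, here is a minimal sketch of how two sets of file contents could be classified. It is not the diff subpackage's code: `classify`, `change` and `isBinary` are hypothetical names, the NUL-byte test for binary content is a common heuristic assumed here, and reporting "binary" only for changed files is also an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// change records one differing path and its category.
type change struct {
	Path string
	Type string // "added", "deleted", "modified", or "binary"
}

// isBinary uses a NUL-byte heuristic (an assumption, not the library's rule).
func isBinary(b []byte) bool { return bytes.IndexByte(b, 0) >= 0 }

// classify compares two path-to-content maps and returns changes sorted by path.
func classify(oldFiles, newFiles map[string][]byte) []change {
	var out []change
	for p, nb := range newFiles {
		ob, ok := oldFiles[p]
		switch {
		case !ok:
			out = append(out, change{p, "added"})
		case bytes.Equal(ob, nb):
			// unchanged files are not reported
		case isBinary(ob) || isBinary(nb):
			out = append(out, change{p, "binary"})
		default:
			out = append(out, change{p, "modified"})
		}
	}
	for p := range oldFiles {
		if _, ok := newFiles[p]; !ok {
			out = append(out, change{p, "deleted"})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	oldFiles := map[string][]byte{"a.txt": []byte("one\n"), "b.txt": []byte("x"), "img.png": {0, 1}}
	newFiles := map[string][]byte{"a.txt": []byte("two\n"), "c.txt": []byte("new"), "img.png": {0, 2}}
	for _, c := range classify(oldFiles, newFiles) {
		fmt.Println(c.Type, c.Path)
	}
}
```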

Supported formats

  • .zip, .jar, .whl, .nupkg, .egg (ZIP-based)
  • .tar, .tar.gz, .tgz, .tar.bz2, .tar.xz
  • .gem (Ruby gems with nested data.tar.gz)

License

MIT

Documentation

Overview

Package archives provides in-memory archive reading and browsing capabilities.

It supports multiple archive formats including:

  • ZIP (.zip, .jar, .whl, .nupkg)
  • TAR (.tar, .tar.gz, .tgz, .tar.bz2, .tar.xz)
  • GEM (.gem - Ruby gems with nested tar structure)

The package is designed to work entirely in memory without writing to disk, making it suitable for browsing cached artifacts on demand.

Constants

const (
	SHA256 = "sha256"
	SHA512 = "sha512"
	SHA1   = "sha1"
	MD5    = "md5"
)

Supported hash algorithm names for Reader.Hash.
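
One way such string names can be mapped to hash implementations is a lookup table of constructors from the standard library. The sketch below is an assumption about the general approach, not the package's source; `hashers` and `hexDigest` are hypothetical names.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
)

// hashers maps the algorithm names listed above to standard library constructors.
var hashers = map[string]func() hash.Hash{
	"sha256": sha256.New,
	"sha512": sha512.New,
	"sha1":   sha1.New,
	"md5":    md5.New,
}

// hexDigest (hypothetical) returns the hex-encoded digest of raw for the named algorithm.
func hexDigest(algo string, raw []byte) (string, error) {
	newHash, ok := hashers[algo]
	if !ok {
		return "", fmt.Errorf("unsupported hash algorithm %q", algo)
	}
	h := newHash()
	h.Write(raw) // hash.Hash.Write never returns an error
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	sum, err := hexDigest("md5", []byte("abc"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(sum) // 900150983cd24fb0d6963f7d28e17f72
}
```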

Variables

var ErrDecompressLimit = errors.New("decompressed content exceeds size limit")
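
The documentation does not say where this limit is enforced or how large it is. As a general illustration of guarding against oversized decompressed content, the sketch below caps a read with io.LimitReader. The names `errLimit`, `readLimited` and `readString` are hypothetical, and `errLimit` is a local stand-in for ErrDecompressLimit.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errLimit is a local stand-in for archives.ErrDecompressLimit.
var errLimit = errors.New("decompressed content exceeds size limit")

// readLimited reads at most limit bytes from r and fails if more are available.
func readLimited(r io.Reader, limit int64) ([]byte, error) {
	// Allow one byte past the limit so an overflow can be detected.
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errLimit
	}
	return data, nil
}

// readString is a small convenience wrapper used by the demo below.
func readString(s string, limit int64) (string, error) {
	b, err := readLimited(strings.NewReader(s), limit)
	return string(b), err
}

func main() {
	if _, err := readString("hello", 4); errors.Is(err, errLimit) {
		fmt.Println("rejected:", err)
	}
	s, _ := readString("hello", 5)
	fmt.Println(s) // hello
}
```

Callers of the real package would presumably test for the condition with errors.Is(err, archives.ErrDecompressLimit).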

Functions

This section is empty.

Types

type FileInfo

type FileInfo struct {
	Path           string    // Full path within archive
	Name           string    // Base name
	Size           int64     // Uncompressed size in bytes
	ModTime        time.Time // Modification time
	IsDir          bool      // Whether this is a directory
	Mode           uint32    // File mode/permissions
	CompressedSize int64     // Compressed size (if available)
}

FileInfo represents metadata about a file in an archive.

type Reader

type Reader interface {
	// List returns all files in the archive.
	List() ([]FileInfo, error)

	// ListDir returns files in a specific directory path.
	// Use "" or "/" for root directory.
	ListDir(dirPath string) ([]FileInfo, error)

	// Extract reads a specific file from the archive.
	// Returns io.ReadCloser for the file content.
	Extract(filePath string) (io.ReadCloser, error)

	// Hash returns the hex-encoded digest of the raw archive bytes using
	// the named algorithm. Supported algorithms are SHA256, SHA512, SHA1
	// and MD5. The hash is computed over the original archive as passed
	// to Open, not the decompressed contents.
	Hash(algo string) (string, error)

	// Close releases resources associated with the reader.
	Close() error
}

Reader provides methods to browse and extract files from archives.
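
The difference between List and ListDir can be pictured as deriving one directory level from a flat list of paths. The following sketch shows that idea under stated assumptions: `entry` mirrors only three FileInfo fields, `listDir` is a hypothetical helper, and synthesizing directory entries from file paths is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// entry mirrors the Path, Name and IsDir fields of FileInfo.
type entry struct {
	Path  string
	Name  string
	IsDir bool
}

// listDir returns the immediate children of dir, given a flat list of paths.
// Both "" and "/" select the root, matching the ListDir comment above.
func listDir(paths []string, dir string) []entry {
	dir = strings.Trim(dir, "/")
	prefix := ""
	if dir != "" {
		prefix = dir + "/"
	}
	seen := map[string]bool{}
	var out []entry
	for _, p := range paths {
		if !strings.HasPrefix(p, prefix) {
			continue
		}
		rest := strings.TrimPrefix(p, prefix)
		if rest == "" {
			continue
		}
		// Anything after the first slash lives deeper, so name is a directory.
		name, _, nested := strings.Cut(rest, "/")
		if seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, entry{Path: path.Join(dir, name), Name: name, IsDir: nested})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	paths := []string{"README.md", "src/main.go", "src/util/strings.go", "src/util/io.go"}
	for _, e := range listDir(paths, "src") {
		fmt.Println(e.Name, e.IsDir)
	}
}
```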

func Open

func Open(filename string, content io.Reader) (Reader, error)

Open creates an archive reader for the given content. The filename is used to detect the archive format. The content reader will be read entirely into memory.
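
Because the format is chosen from the filename, the name passed to Open matters even when the content comes from memory or a network stream. A suffix-based detector could be sketched as follows; `detectFormat` and its return labels are hypothetical, the extension list is taken from the supported formats in the README, and case-insensitive matching is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// detectFormat (hypothetical) maps a filename to a format label, or "" if unknown.
func detectFormat(filename string) string {
	name := strings.ToLower(filename)
	// Longer suffixes come first so ".tar.gz" is not mistaken for another format.
	suffixes := []struct{ ext, format string }{
		{".tar.gz", "tar+gzip"},
		{".tgz", "tar+gzip"},
		{".tar.bz2", "tar+bzip2"},
		{".tar.xz", "tar+xz"},
		{".tar", "tar"},
		{".zip", "zip"},
		{".jar", "zip"},
		{".whl", "zip"},
		{".nupkg", "zip"},
		{".egg", "zip"},
		{".gem", "gem"},
	}
	for _, s := range suffixes {
		if strings.HasSuffix(name, s.ext) {
			return s.format
		}
	}
	return ""
}

func main() {
	for _, name := range []string{"package.tar.gz", "rails-7.1.0.gem", "lib.whl", "notes.txt"} {
		fmt.Printf("%s -> %q\n", name, detectFormat(name))
	}
}
```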

func OpenBytes (added in v0.3.0)

func OpenBytes(filename string, content []byte) (Reader, error)

OpenBytes is like Open but accepts the archive content as a byte slice. The slice is retained (not copied) for the lifetime of the Reader and must not be modified by the caller after this call.
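
The retention rule matters because a Go slice shares its backing array with every copy of the slice header. The sketch below demonstrates the hazard with a stand-in type; `holder` and `cloneBytes` are hypothetical and only model "keeps the slice without copying".

```go
package main

import "fmt"

// holder models a reader that retains the caller's slice without copying it.
type holder struct{ data []byte }

func (h holder) firstByte() byte { return h.data[0] }

// cloneBytes returns an independent copy of b.
func cloneBytes(b []byte) []byte {
	return append([]byte(nil), b...)
}

func main() {
	data := []byte("PK\x03\x04") // ZIP local file header magic
	shared := holder{data}
	private := holder{cloneBytes(data)}

	data[0] = 'X' // caller reuses the buffer after handing it over

	fmt.Printf("%c %c\n", shared.firstByte(), private.firstByte()) // X P
}
```

If the buffer has to be reused after the call, passing a copy keeps the Reader's view of the archive stable, at the cost of the extra allocation that OpenBytes otherwise avoids.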

func OpenBytesWithPrefix (added in v0.3.0)

func OpenBytesWithPrefix(filename string, content []byte, stripPrefix string) (Reader, error)

OpenBytesWithPrefix is like OpenWithPrefix but accepts the archive content as a byte slice. The slice is retained (not copied) for the lifetime of the Reader and must not be modified by the caller after this call.

func OpenWithPrefix

func OpenWithPrefix(filename string, content io.Reader, stripPrefix string) (Reader, error)

OpenWithPrefix opens an archive and strips the given prefix from all paths. This is useful for npm packages which wrap content in a "package/" directory.

Directories

Path    Synopsis
diff    Package diff provides utilities for comparing package versions.
