package document
v0.6.1 Latest
Published: Nov 28, 2025 License: MIT Imports: 14 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var BatchCmd = &cobra.Command{
	Use:     "batch",
	Aliases: []string{"b"},
	Short:   "Batch create documents from a directory",
	Long: `Batch create documents from all files in a directory.

Supported file types:
- Text files (.txt, .md, .json, etc.)
- Image files (.jpg, .jpeg, .png, .gif, etc.)
- PDF files (.pdf) with text and image extraction

Features:
- Parallel processing with --parallel flag
- Automatic retry on failure with --retry flag
- Skip already processed files (tracks in .processed files)
- Visual progress indicators with emojis and colors
- Time estimation and progress tracking
- Optional CSV report generation with --create-report

Examples:
  weave docs batch --directory ./docs --collection MyCollection
  weave docs batch --dir ./docs --collection MyCollection --parallel 3
  weave docs batch --dir ./docs --collection MyCollection --create-report
  weave docs batch --dir ./docs --collection MyCollection --retry 3 --parallel 5`,
	Run: runBatchCreate,
}

BatchCmd represents the batch command

var CountCmd = &cobra.Command{
	Use:     "count COLLECTION_NAME [COLLECTION_NAME...]",
	Aliases: []string{"C"},
	Short:   "Count documents in one or more collections",
	Long: `Count the number of documents in one or more collections.

This command returns the total number of documents in the specified collection(s).
You can specify multiple collections to get counts for each one.

Examples:
  weave docs C MyCollection
  weave docs C RagMeDocs RagMeImages
  weave docs C Collection1 Collection2 Collection3`,
	Args: cobra.MinimumNArgs(1),
	Run:  runDocumentCount,
}

CountCmd represents the document count command

var CreateCmd = &cobra.Command{
	Use:     "create COLLECTION_NAME FILE_PATH",
	Aliases: []string{"c"},
	Short:   "Create a document from a file",
	Long: `Create a document in a collection from a file.

Supported file types:
- Text files (.txt, .md, .json, etc.) - Content goes to 'text' field
- Image files (.jpg, .jpeg, .png, .gif, etc.) - Base64 data goes to 'image_data' field
- PDF files (.pdf) - Text extracted and chunked, images extracted separately

The command will automatically:
- Detect file type and process accordingly
- Generate appropriate metadata
- Chunk text content (default 5000 chars, configurable with --chunk-size)
- Extract images from PDFs with OCR and EXIF data
- Create documents following WeaveDocs/WeaveImages schema (default) or RagMeDocs/RagMeImages (legacy)

For PDF files with images:
- Text chunks go to the main collection
- Extracted images go to a separate collection (use --image-collection)
- Images include OCR text, EXIF data, and captions when available
- Use --skip-all-images for text-only extraction (no image processing)

Examples:
  weave docs create MyCollection document.txt
  weave docs create MyCollection image.jpg
  weave docs create MyCollection document.pdf --chunk-size 500
  weave docs create WeaveDocs document.pdf --image-collection WeaveImages
  weave docs create RagMeDocs document.pdf --image-col RagMeImages
  weave docs create MyDocs document.pdf --skip-all-images  # text only
  weave docs create MyCollection document.txt --embedding text-embedding-3-small`,
	Args: cobra.ExactArgs(2),
	Run:  runDocumentCreate,
}

CreateCmd represents the document create command
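
The chunking step mentioned in the help text (default 5000 chars, set with --chunk-size) can be pictured with the sketch below. It is only an illustration: chunkText is a hypothetical helper, and it assumes plain fixed-size splitting on rune boundaries. The package's actual chunker is not shown in this documentation and may split on sentences or paragraphs instead.

```go
package main

import "fmt"

// chunkText splits text into pieces of at most size runes.
// Hypothetical helper: the real chunker in this package may behave
// differently (for example, by respecting sentence boundaries).
func chunkText(text string, size int) []string {
	if size <= 0 {
		return nil
	}
	// Work on runes so multi-byte characters are never cut in half.
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); start += size {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

func main() {
	for i, c := range chunkText("héllo wörld", 4) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Splitting on runes rather than bytes keeps each chunk valid UTF-8, which matters when the chunks are sent on for embedding.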

var DeleteAllCmd = &cobra.Command{
	Use:     "delete-all COLLECTION_NAME",
	Aliases: []string{"del-all", "da"},
	Short:   "Delete all documents in a collection",
	Long: `Delete all documents from a specific collection.

⚠️  WARNING: This is a destructive operation that will permanently
delete ALL documents in the specified collection. Use with caution!

This command requires double confirmation:
1. First confirmation: Standard y/N prompt
2. Second confirmation: Red warning requiring exact "yes" input

Use --force to skip confirmations in scripts.

Example:
  weave document delete-all MyCollection`,
	Args: cobra.ExactArgs(1),
	Run:  runDocumentDeleteAll,
}

DeleteAllCmd represents the document delete-all command

var DeleteCmd = &cobra.Command{
	Use:     "delete COLLECTION_NAME [DOCUMENT_ID] [DOCUMENT_ID...]",
	Aliases: []string{"del", "d"},
	Short:   "Delete documents from a collection",
	Long: `Delete documents from a collection.

You can delete documents in six ways:
1. By single document ID: weave doc delete COLLECTION_NAME DOCUMENT_ID
2. By multiple document IDs: weave doc delete COLLECTION_NAME DOC_ID1 DOC_ID2 DOC_ID3
3. By metadata filter: weave doc delete COLLECTION_NAME --metadata key=value
4. By filename/name: weave doc delete COLLECTION_NAME --name filename.pdf
   (or use --filename as an alias)
5. By original filename (virtual): weave doc delete COLLECTION_NAME ORIGINAL_FILENAME --virtual
6. By pattern: weave doc delete COLLECTION_NAME --pattern "tmp*.png"

Pattern types (auto-detected):
- Shell glob: tmp*.png, tmp?.png, tmp[0-9].png
- Regex: tmp.*\.png, ^tmp.*\.png$, .*\.(png|jpg)$

When using --virtual flag, all chunks and images associated with the original filename
will be deleted in one operation.

Examples:
  weave docs delete MyCollection doc123
  weave docs d MyCollection doc123 doc456 doc789
  weave docs delete MyCollection --metadata filename=test.pdf
  weave docs delete MyCollection --name test_image.png
  weave docs delete MyCollection --filename test_image.png
  weave docs delete MyCollection test.pdf --virtual
  weave docs delete MyCollection --pattern "tmp*.png"
  weave docs delete MyCollection --pattern "tmp.*\.png"

⚠️  WARNING: This is a destructive operation that will permanently
delete the specified documents. Use with caution!`,
	Args: cobra.MinimumNArgs(1),
	Run:  runDocumentDelete,
}

DeleteCmd represents the document delete command
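
The help text says the --pattern type is auto-detected but does not say how. The sketch below shows one possible heuristic, offered as an assumption only: looksLikeRegex and matchName are hypothetical names, and the rule used (anchors, ".*", an escaped dot, or grouping characters imply a regex) is a guess that happens to classify the listed examples correctly.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// looksLikeRegex is an assumed heuristic: it flags constructs that are
// common in regular expressions but unusual in shell globs.
// Known limitation: a glob such as "*.*" contains ".*" and would be
// misclassified as a regex.
func looksLikeRegex(pattern string) bool {
	return strings.HasPrefix(pattern, "^") ||
		strings.HasSuffix(pattern, "$") ||
		strings.Contains(pattern, ".*") ||
		strings.Contains(pattern, `\.`) ||
		strings.ContainsAny(pattern, "(|)")
}

// matchName reports whether name matches pattern, treating the pattern
// as a regex or a shell glob depending on looksLikeRegex.
func matchName(pattern, name string) (bool, error) {
	if looksLikeRegex(pattern) {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return false, err
		}
		return re.MatchString(name), nil
	}
	return path.Match(pattern, name)
}

func main() {
	for _, p := range []string{"tmp*.png", "tmp[0-9].png", `^tmp.*\.png$`} {
		ok, err := matchName(p, "tmp7.png")
		fmt.Printf("%-16s regex=%-5v match=%v err=%v\n", p, looksLikeRegex(p), ok, err)
	}
}
```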

var DocumentCmd = &cobra.Command{
	Use:     "document",
	Aliases: []string{"doc", "docs"},
	Short:   "Document management",
	Long: `Manage documents in vector database collections.

This command provides subcommands to list, show, and delete documents.`,
}

DocumentCmd represents the document command

var ListCmd = &cobra.Command{
	Use:     "list COLLECTION_NAME",
	Aliases: []string{"ls", "l"},
	Short:   "List documents in a collection",
	Long: `List documents in a specific collection.

This command shows:
- Document IDs
- Content previews (truncated)
- Metadata information
- Document counts`,
	Args: cobra.ExactArgs(1),
	Run:  runDocumentList,
}

ListCmd represents the document list command

var PdfConvertCmd = &cobra.Command{
	Use:   "pdf-convert [PDF_FILENAME]",
	Short: "Convert PDF with CMYK images to RGB format",
	Long: `Convert a PDF file with CMYK images to RGB format using Ghostscript or ImageMagick.

This command is useful when you encounter PDFs with CMYK JPEG images that fail
during image extraction. The conversion process creates a new PDF with RGB images
that are compatible with standard image processing libraries.

Conversion Methods:
- Ghostscript (default, recommended) - High-quality RGB conversion
- ImageMagick - Alternative conversion method

The converted PDF will have the same text content and structure, but with RGB images
that can be successfully extracted and processed.

Examples:
  # Convert a single file using Ghostscript (default)
  weave docs pdf-convert document.pdf --ghostscript

  # Convert all PDFs in a directory (non-recursive)
  weave docs pdf-convert --directory /path/to/pdfs

  # Convert all PDFs in a directory and subdirectories
  weave docs pdf-convert --directory /path/to/pdfs --recurse

  # Convert using ImageMagick
  weave docs pdf-convert document.pdf --imagemagick

  # Specify custom output filename
  weave docs pdf-convert document.pdf --ghostscript --converted-filename output.pdf

  # Convert to RGB (auto-detects tool)
  weave docs pdf-convert document.pdf --rgb`,
	Args: cobra.MaximumNArgs(1),
	Run:  runPdfConvert,
}

PdfConvertCmd represents the pdf-convert command

var ShowCmd = &cobra.Command{
	Use:     "show COLLECTION_NAME [DOCUMENT_ID]",
	Aliases: []string{"s"},
	Short:   "Show documents from a collection",
	Long: `Show detailed information about documents from a collection.

You can show documents in three ways:
1. By document ID: weave doc show COLLECTION_NAME DOCUMENT_ID
2. By metadata filter: weave doc show COLLECTION_NAME --metadata key=value
3. By filename/name: weave doc show COLLECTION_NAME --name filename.pdf
   (or use --filename as an alias)

This command displays:
- Full document content
- Complete metadata
- Document ID and collection information

Use --schema to show the document schema including metadata structure.
Use --expand-metadata to show expanded metadata information.`,
	Args: cobra.RangeArgs(1, 2),
	Run:  runDocumentShow,
}

ShowCmd represents the document show command

var UpdateCmd = &cobra.Command{
	Use:     "update COLLECTION_NAME DOCUMENT_ID",
	Aliases: []string{"u"},
	Short:   "Update a document's content or metadata",
	Long: `Update an existing document in a collection.

You can update:
- Document content (using --content or --file)
- Document metadata fields (using --metadata)
- Both content and metadata together

The document is identified by its UUID or by metadata filter (--name, --metadata-filter).

Examples:
  # Update content from string
  weave docs update MyCollection abc-123 --content "New content here"

  # Update content from file
  weave docs update MyCollection abc-123 --file updated-doc.txt

  # Update metadata fields
  weave docs update MyCollection abc-123 --metadata title="Updated Title"
  weave docs update MyCollection abc-123 --metadata version=2,author="John Doe"

  # Update by document name (instead of ID)
  weave docs update MyCollection --name README.md --content "Updated README"

  # Update both content and metadata
  weave docs update MyCollection abc-123 --file new.txt --metadata version=2`,
	Args: cobra.MinimumNArgs(1),
	Run:  runDocumentUpdate,
}

UpdateCmd represents the document update command
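
The --metadata examples pass comma-separated key=value pairs such as version=2,author="John Doe" (the shell removes the quotes before the program sees the value). A minimal parsing sketch follows. parseMetadata is a hypothetical name, and this sketch assumes values never contain commas. The package's real parser may handle quoting or escaping differently.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMetadata turns "k1=v1,k2=v2" into a map.
// Hypothetical helper: values containing commas are not supported here.
func parseMetadata(s string) (map[string]string, error) {
	out := make(map[string]string)
	for _, pair := range strings.Split(s, ",") {
		// Split on the first "=" only, so values may contain "=".
		key, value, found := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !found || key == "" {
			return nil, fmt.Errorf("invalid metadata pair %q, want key=value", pair)
		}
		out[key] = strings.TrimSpace(value)
	}
	return out, nil
}

func main() {
	m, err := parseMetadata("version=2,author=John Doe")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m["version"], m["author"])
}
```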

Functions

This section is empty.

Types

type BatchProgress added in v0.3.0

type BatchProgress struct {
	TotalFiles         int
	ProcessedFiles     int
	SuccessFiles       int
	FailedFiles        int
	SkippedFiles       int
	TotalChunks        int
	TotalImages        int
	StartTime          time.Time
	EstimatedRemaining time.Duration
	CurrentFile        string
}

BatchProgress tracks overall batch processing progress

type ConversionTool added in v0.3.0

type ConversionTool int

ConversionTool represents the tool to use for PDF conversion

const (
	ToolGhostscript ConversionTool = iota
	ToolImageMagick
)
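
Because ConversionTool is an iota-based enum, ToolGhostscript is 0 and ToolImageMagick is 1. The sketch below adds a String method for readable log output. That method is hypothetical: this documentation lists no methods on the type, and the names returned are illustrative.

```go
package main

import "fmt"

// ConversionTool and its constants are copied from the package so the
// sketch is self-contained.
type ConversionTool int

const (
	ToolGhostscript ConversionTool = iota
	ToolImageMagick
)

// String is a hypothetical addition; the package documents no such method.
func (t ConversionTool) String() string {
	switch t {
	case ToolGhostscript:
		return "ghostscript"
	case ToolImageMagick:
		return "imagemagick"
	default:
		return fmt.Sprintf("ConversionTool(%d)", int(t))
	}
}

func main() {
	fmt.Println(ToolGhostscript, ToolImageMagick, ConversionTool(7))
}
```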

type ProcessedFileStatus added in v0.3.0

type ProcessedFileStatus struct {
	FilePath         string    `json:"file_path"`
	ProcessedAt      time.Time `json:"processed_at"`
	Success          bool      `json:"success"`
	Error            string    `json:"error,omitempty"`
	TextChunks       int       `json:"text_chunks"`
	Images           int       `json:"images"`
	ProcessingTimeMs int64     `json:"processing_time_ms"`
	FileSize         int64     `json:"file_size"`
	RetryCount       int       `json:"retry_count"`
	ChunksFailed     int       `json:"chunks_failed,omitempty"`
	ImagesFailed     int       `json:"images_failed,omitempty"`
}

ProcessedFileStatus represents the status of a processed file
