Documentation
Constants
This section is empty.
Variables
var BatchCmd = &cobra.Command{
	Use:     "batch",
	Aliases: []string{"b"},
	Short:   "Batch create documents from a directory",
	Long: `Batch create documents from all files in a directory.

Supported file types:
- Text files (.txt, .md, .json, etc.)
- Image files (.jpg, .jpeg, .png, .gif, etc.)
- PDF files (.pdf) with text and image extraction

Features:
- Parallel processing with --parallel flag
- Automatic retry on failure with --retry flag
- Skip already processed files (tracks in .processed files)
- Visual progress indicators with emojis and colors
- Time estimation and progress tracking
- Optional CSV report generation with --create-report

Examples:
  weave docs batch --directory ./docs --collection MyCollection
  weave docs batch --dir ./docs --collection MyCollection --parallel 3
  weave docs batch --dir ./docs --collection MyCollection --create-report
  weave docs batch --dir ./docs --collection MyCollection --retry 3 --parallel 5`,
	Run: runBatchCreate,
}
BatchCmd represents the batch command
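The `--retry` behavior described in the help text can be illustrated with a small retry loop. This is only a sketch: `withRetry` and `errTransient` are hypothetical names, and it assumes `--retry N` means "up to N additional attempts after the first failure", which this documentation does not confirm.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a hypothetical error used only for this illustration.
var errTransient = errors.New("transient failure")

// withRetry calls fn once and then up to maxRetries more times while it
// keeps failing. It returns the number of retries used and the last error
// (nil on success). Hypothetical helper, not part of the package.
func withRetry(maxRetries int, fn func() error) (int, error) {
	if maxRetries < 0 {
		maxRetries = 0
	}
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = fn(); err == nil {
			return attempt, nil
		}
	}
	return maxRetries, err
}

func main() {
	calls := 0
	retries, err := withRetry(3, func() error {
		calls++
		if calls < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println("retries:", retries, "err:", err)
}
```

The returned retry count is the kind of value that could be recorded in a field such as `ProcessedFileStatus.RetryCount`.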
var CountCmd = &cobra.Command{
	Use:     "count COLLECTION_NAME [COLLECTION_NAME...]",
	Aliases: []string{"C"},
	Short:   "Count documents in one or more collections",
	Long: `Count the number of documents in one or more collections.

This command returns the total number of documents in the specified collection(s).
You can specify multiple collections to get counts for each one.

Examples:
  weave docs C MyCollection
  weave docs C RagMeDocs RagMeImages
  weave docs C Collection1 Collection2 Collection3`,
	Args: cobra.MinimumNArgs(1),
	Run:  runDocumentCount,
}
CountCmd represents the document count command
var CreateCmd = &cobra.Command{
	Use:     "create COLLECTION_NAME FILE_PATH",
	Aliases: []string{"c"},
	Short:   "Create a document from a file",
	Long: `Create a document in a collection from a file.

Supported file types:
- Text files (.txt, .md, .json, etc.) - Content goes to 'text' field
- Image files (.jpg, .jpeg, .png, .gif, etc.) - Base64 data goes to 'image_data' field
- PDF files (.pdf) - Text extracted and chunked, images extracted separately

The command will automatically:
- Detect file type and process accordingly
- Generate appropriate metadata
- Chunk text content (default 5000 chars, configurable with --chunk-size)
- Extract images from PDFs with OCR and EXIF data
- Create documents following WeaveDocs/WeaveImages schema (default) or RagMeDocs/RagMeImages (legacy)

For PDF files with images:
- Text chunks go to the main collection
- Extracted images go to a separate collection (use --image-collection)
- Images include OCR text, EXIF data, and captions when available
- Use --skip-all-images for text-only extraction (no image processing)

Examples:
  weave docs create MyCollection document.txt
  weave docs create MyCollection image.jpg
  weave docs create MyCollection document.pdf --chunk-size 500
  weave docs create WeaveDocs document.pdf --image-collection WeaveImages
  weave docs create RagMeDocs document.pdf --image-col RagMeImages
  weave docs create MyDocs document.pdf --skip-all-images  # text only
  weave docs create MyCollection document.txt --embedding text-embedding-3-small`,
	Args: cobra.ExactArgs(2),
	Run:  runDocumentCreate,
}
CreateCmd represents the document create command
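The help text says text content is chunked at a default of 5000 characters, configurable with `--chunk-size`. A minimal sketch of fixed-size chunking follows. It assumes chunks are counted in runes and do not overlap; the package's actual splitting logic is not shown on this page and may differ (for example, it may respect sentence boundaries). `chunkText` is a hypothetical name.

```go
package main

import "fmt"

// chunkText splits s into pieces of at most size runes each.
// Hypothetical helper: the real chunking code is not part of this page.
func chunkText(s string, size int) []string {
	if size <= 0 {
		return nil
	}
	runes := []rune(s) // split on runes so multi-byte characters stay intact
	var chunks []string
	for start := 0; start < len(runes); start += size {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

func main() {
	for i, c := range chunkText("héllo wörld", 4) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```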
var DeleteAllCmd = &cobra.Command{
	Use:     "delete-all COLLECTION_NAME",
	Aliases: []string{"del-all", "da"},
	Short:   "Delete all documents in a collection",
	Long: `Delete all documents from a specific collection.

⚠️ WARNING: This is a destructive operation that will permanently
delete ALL documents in the specified collection. Use with caution!

This command requires double confirmation:
1. First confirmation: Standard y/N prompt
2. Second confirmation: Red warning requiring exact "yes" input

Use --force to skip confirmations in scripts.

Example:
  weave document delete-all MyCollection`,
	Args: cobra.ExactArgs(1),
	Run:  runDocumentDeleteAll,
}
DeleteAllCmd represents the document delete-all command
var DeleteCmd = &cobra.Command{
	Use:     "delete COLLECTION_NAME [DOCUMENT_ID] [DOCUMENT_ID...]",
	Aliases: []string{"del", "d"},
	Short:   "Delete documents from a collection",
	Long: `Delete documents from a collection.

You can delete documents in six ways:
1. By single document ID: weave doc delete COLLECTION_NAME DOCUMENT_ID
2. By multiple document IDs: weave doc delete COLLECTION_NAME DOC_ID1 DOC_ID2 DOC_ID3
3. By metadata filter: weave doc delete COLLECTION_NAME --metadata key=value
4. By filename/name: weave doc delete COLLECTION_NAME --name filename.pdf
   (or use --filename as an alias)
5. By original filename (virtual): weave doc delete COLLECTION_NAME ORIGINAL_FILENAME --virtual
6. By pattern: weave doc delete COLLECTION_NAME --pattern "tmp*.png"

Pattern types (auto-detected):
- Shell glob: tmp*.png, tmp?.png, tmp[0-9].png
- Regex: tmp.*\.png, ^tmp.*\.png$, .*\.(png|jpg)$

When using --virtual flag, all chunks and images associated with the original filename
will be deleted in one operation.

Examples:
  weave docs delete MyCollection doc123
  weave docs d MyCollection doc123 doc456 doc789
  weave docs delete MyCollection --metadata filename=test.pdf
  weave docs delete MyCollection --name test_image.png
  weave docs delete MyCollection --filename test_image.png
  weave docs delete MyCollection test.pdf --virtual
  weave docs delete MyCollection --pattern "tmp*.png"
  weave docs delete MyCollection --pattern "tmp.*\.png"

⚠️ WARNING: This is a destructive operation that will permanently
delete the specified documents. Use with caution!`,
	Args: cobra.MinimumNArgs(1),
	Run:  runDocumentDelete,
}
DeleteCmd represents the document delete command
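The help text states that `--pattern` auto-detects whether a pattern is a shell glob or a regex, but does not say how. One possible heuristic is sketched below: treat the pattern as a regex if it contains tokens that are unusual in globs (`.*`, `\.`, `^`, `$`, `(`, `|`), and as a glob otherwise. The function names and the token list are assumptions for illustration; the package's real detection rule may differ, and this heuristic would misclassify a glob such as `[^a]*.png`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// looksLikeRegex is a hypothetical heuristic: it reports whether pattern
// contains a token that suggests regex rather than shell-glob syntax.
func looksLikeRegex(pattern string) bool {
	for _, tok := range []string{".*", ".+", `\.`, "^", "$", "(", "|"} {
		if strings.Contains(pattern, tok) {
			return true
		}
	}
	return false
}

// matchName reports whether name matches pattern, using regexp for
// regex-looking patterns and filepath.Match for everything else.
func matchName(pattern, name string) (bool, error) {
	if looksLikeRegex(pattern) {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return false, err
		}
		return re.MatchString(name), nil
	}
	return filepath.Match(pattern, name)
}

func main() {
	for _, p := range []string{"tmp*.png", "tmp[0-9].png", `^tmp.*\.png$`} {
		ok, err := matchName(p, "tmp7.png")
		fmt.Printf("%-16s regex=%-5v match=%v err=%v\n", p, looksLikeRegex(p), ok, err)
	}
}
```

All of the glob and regex examples listed in the help text are classified as expected by this heuristic.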
var DocumentCmd = &cobra.Command{
	Use:     "document",
	Aliases: []string{"doc", "docs"},
	Short:   "Document management",
	Long: `Manage documents in vector database collections.

This command provides subcommands to list, show, and delete documents.`,
}
DocumentCmd represents the document command
var ListCmd = &cobra.Command{
	Use:     "list COLLECTION_NAME",
	Aliases: []string{"ls", "l"},
	Short:   "List documents in a collection",
	Long: `List documents in a specific collection.

This command shows:
- Document IDs
- Content previews (truncated)
- Metadata information
- Document counts`,
	Args: cobra.ExactArgs(1),
	Run:  runDocumentList,
}
ListCmd represents the document list command
var PdfConvertCmd = &cobra.Command{
	Use:   "pdf-convert [PDF_FILENAME]",
	Short: "Convert PDF with CMYK images to RGB format",
	Long: `Convert a PDF file with CMYK images to RGB format using Ghostscript or ImageMagick.

This command is useful when you encounter PDFs with CMYK JPEG images that fail
during image extraction. The conversion process creates a new PDF with RGB images
that are compatible with standard image processing libraries.

Conversion Methods:
- Ghostscript (default, recommended) - High-quality RGB conversion
- ImageMagick - Alternative conversion method

The converted PDF will have the same text content and structure, but with RGB
images that can be successfully extracted and processed.

Examples:
  # Convert a single file using Ghostscript (default)
  weave docs pdf-convert document.pdf --ghostscript

  # Convert all PDFs in a directory (non-recursive)
  weave docs pdf-convert --directory /path/to/pdfs

  # Convert all PDFs in a directory and subdirectories
  weave docs pdf-convert --directory /path/to/pdfs --recurse

  # Convert using ImageMagick
  weave docs pdf-convert document.pdf --imagemagick

  # Specify custom output filename
  weave docs pdf-convert document.pdf --ghostscript --converted-filename output.pdf

  # Convert to RGB (auto-detects tool)
  weave docs pdf-convert document.pdf --rgb`,
	Args: cobra.MaximumNArgs(1),
	Run:  runPdfConvert,
}
PdfConvertCmd represents the pdf-convert command
var ShowCmd = &cobra.Command{
	Use:     "show COLLECTION_NAME [DOCUMENT_ID]",
	Aliases: []string{"s"},
	Short:   "Show documents from a collection",
	Long: `Show detailed information about documents from a collection.

You can show documents in three ways:
1. By document ID: weave doc show COLLECTION_NAME DOCUMENT_ID
2. By metadata filter: weave doc show COLLECTION_NAME --metadata key=value
3. By filename/name: weave doc show COLLECTION_NAME --name filename.pdf
   (or use --filename as an alias)

This command displays:
- Full document content
- Complete metadata
- Document ID and collection information

Use --schema to show the document schema including metadata structure.
Use --expand-metadata to show expanded metadata information.`,
	Args: cobra.RangeArgs(1, 2),
	Run:  runDocumentShow,
}
ShowCmd represents the document show command
var UpdateCmd = &cobra.Command{
	Use:     "update COLLECTION_NAME DOCUMENT_ID",
	Aliases: []string{"u"},
	Short:   "Update a document's content or metadata",
	Long: `Update an existing document in a collection.

You can update:
- Document content (using --content or --file)
- Document metadata fields (using --metadata)
- Both content and metadata together

The document is identified by its UUID or by metadata filter (--name, --metadata-filter).

Examples:
  # Update content from string
  weave docs update MyCollection abc-123 --content "New content here"

  # Update content from file
  weave docs update MyCollection abc-123 --file updated-doc.txt

  # Update metadata fields
  weave docs update MyCollection abc-123 --metadata title="Updated Title"
  weave docs update MyCollection abc-123 --metadata version=2,author="John Doe"

  # Update by document name (instead of ID)
  weave docs update MyCollection --name README.md --content "Updated README"

  # Update both content and metadata
  weave docs update MyCollection abc-123 --file new.txt --metadata version=2`,
	Args: cobra.MinimumNArgs(1),
	Run:  runDocumentUpdate,
}
UpdateCmd represents the document update command
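The examples pass several metadata fields in one flag value, such as `--metadata version=2,author="John Doe"`. After the shell strips the quotes, the command receives `version=2,author=John Doe`. The sketch below shows one way such a value could be split into key/value pairs. `parseMetadata` is a hypothetical name, and the sketch assumes values never contain commas; how the command actually parses the flag is not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMetadata splits "k1=v1,k2=v2" into a map.
// Hypothetical helper; assumes values contain no commas.
func parseMetadata(arg string) (map[string]string, error) {
	fields := make(map[string]string)
	for _, pair := range strings.Split(arg, ",") {
		key, value, ok := strings.Cut(pair, "=") // split at the first '=' only
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid metadata pair %q, want key=value", pair)
		}
		fields[key] = strings.TrimSpace(value)
	}
	return fields, nil
}

func main() {
	fields, err := parseMetadata("version=2,author=John Doe")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fields["version"], fields["author"])
}
```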
Functions
This section is empty.
Types
type BatchProgress (added in v0.3.0)
type BatchProgress struct {
	TotalFiles         int
	ProcessedFiles     int
	SuccessFiles       int
	FailedFiles        int
	SkippedFiles       int
	TotalChunks        int
	TotalImages        int
	StartTime          time.Time
	EstimatedRemaining time.Duration
	CurrentFile        string
}
BatchProgress tracks overall batch processing progress
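The `StartTime`, `ProcessedFiles`, `TotalFiles`, and `EstimatedRemaining` fields suggest a simple running-average estimate. The sketch below shows one way to compute it, assuming the estimate is the average time per processed file multiplied by the number of files left. `estimateRemaining` is a hypothetical function; the package's own estimation logic is not shown on this page and may weight files differently (for example, by size).

```go
package main

import (
	"fmt"
	"time"
)

// BatchProgress mirrors the struct documented above.
type BatchProgress struct {
	TotalFiles         int
	ProcessedFiles     int
	SuccessFiles       int
	FailedFiles        int
	SkippedFiles       int
	TotalChunks        int
	TotalImages        int
	StartTime          time.Time
	EstimatedRemaining time.Duration
	CurrentFile        string
}

// estimateRemaining returns (elapsed / processed) * remaining files.
// Hypothetical helper; it returns 0 when no estimate can be made yet
// or when nothing is left to process.
func estimateRemaining(p BatchProgress, now time.Time) time.Duration {
	if p.ProcessedFiles <= 0 || p.TotalFiles <= p.ProcessedFiles {
		return 0
	}
	elapsed := now.Sub(p.StartTime)
	perFile := elapsed / time.Duration(p.ProcessedFiles)
	return perFile * time.Duration(p.TotalFiles-p.ProcessedFiles)
}

func main() {
	p := BatchProgress{
		TotalFiles:     10,
		ProcessedFiles: 4,
		StartTime:      time.Now().Add(-20 * time.Second),
	}
	p.EstimatedRemaining = estimateRemaining(p, time.Now())
	fmt.Println("estimated remaining:", p.EstimatedRemaining.Round(time.Second))
}
```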
type ConversionTool (added in v0.3.0)
type ConversionTool int
ConversionTool represents the tool to use for PDF conversion
const (
	ToolGhostscript ConversionTool = iota
	ToolImageMagick
)
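Because the constants are declared with `iota`, `ToolGhostscript` is 0 (and therefore the zero value of `ConversionTool`) and `ToolImageMagick` is 1. The sketch below adds a `String` method to make the values readable in logs. The method and the names it returns are assumptions for illustration; this page does not list any methods on `ConversionTool`.

```go
package main

import "fmt"

// ConversionTool and its constants mirror the declarations documented above.
type ConversionTool int

const (
	ToolGhostscript ConversionTool = iota // 0, also the zero value
	ToolImageMagick                       // 1
)

// String is a hypothetical method; the documented type does not declare it.
func (t ConversionTool) String() string {
	switch t {
	case ToolGhostscript:
		return "ghostscript"
	case ToolImageMagick:
		return "imagemagick"
	default:
		return fmt.Sprintf("ConversionTool(%d)", int(t))
	}
}

func main() {
	var tool ConversionTool // zero value selects Ghostscript
	fmt.Println(tool, ToolImageMagick)
}
```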
type ProcessedFileStatus (added in v0.3.0)
type ProcessedFileStatus struct {
	FilePath         string    `json:"file_path"`
	ProcessedAt      time.Time `json:"processed_at"`
	Success          bool      `json:"success"`
	Error            string    `json:"error,omitempty"`
	TextChunks       int       `json:"text_chunks"`
	Images           int       `json:"images"`
	ProcessingTimeMs int64     `json:"processing_time_ms"`
	FileSize         int64     `json:"file_size"`
	RetryCount       int       `json:"retry_count"`
	ChunksFailed     int       `json:"chunks_failed,omitempty"`
	ImagesFailed     int       `json:"images_failed,omitempty"`
}
ProcessedFileStatus represents the status of a processed file
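The struct tags determine how a status record looks when serialized with `encoding/json`. The sketch below shows the effect of `omitempty`: `error`, `chunks_failed`, and `images_failed` are dropped from the output when they hold zero values, while the other fields are always written. `encodeStatus` is a hypothetical helper, and whether the batch command stores these records exactly this way in its `.processed` files is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ProcessedFileStatus mirrors the struct documented above.
type ProcessedFileStatus struct {
	FilePath         string    `json:"file_path"`
	ProcessedAt      time.Time `json:"processed_at"`
	Success          bool      `json:"success"`
	Error            string    `json:"error,omitempty"`
	TextChunks       int       `json:"text_chunks"`
	Images           int       `json:"images"`
	ProcessingTimeMs int64     `json:"processing_time_ms"`
	FileSize         int64     `json:"file_size"`
	RetryCount       int       `json:"retry_count"`
	ChunksFailed     int       `json:"chunks_failed,omitempty"`
	ImagesFailed     int       `json:"images_failed,omitempty"`
}

// encodeStatus serializes a status record to compact JSON.
// Hypothetical helper, not part of the documented package.
func encodeStatus(s ProcessedFileStatus) (string, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	ok := ProcessedFileStatus{
		FilePath:    "docs/report.pdf",
		ProcessedAt: time.Date(2025, 1, 2, 3, 4, 5, 0, time.UTC),
		Success:     true,
		TextChunks:  3,
		Images:      1,
	}
	failed := ProcessedFileStatus{FilePath: "docs/bad.pdf", Error: "extraction failed", RetryCount: 2}
	for _, s := range []ProcessedFileStatus{ok, failed} {
		out, err := encodeStatus(s)
		fmt.Println(out, err)
	}
}
```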