sync

package
v1.3.1
Warning

This package is not in the latest version of its module.

Published: Jan 16, 2026 License: AGPL-3.0 Imports: 13 Imported by: 0

README

Sync Orchestrator

Package sync coordinates the complete synchronization workflow for reMarkable documents.

Overview

The Sync Orchestrator is the core component that ties together all the pieces of the legible application. It manages the end-to-end workflow from downloading documents from the reMarkable API to generating searchable PDFs with OCR text layers.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     Sync Orchestrator                             │
└────────┬──────────────┬──────────────┬──────────────┬────────────┘
         │              │              │              │
         ▼              ▼              ▼              ▼
   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
   │ RMClient │  │  State   │  │Converter │  │   OCR    │
   └──────────┘  └──────────┘  └──────────┘  └──────────┘
         │              │              │              │
         ▼              ▼              ▼              ▼
    reMarkable      Local DB      .rmdoc → PDF   Tesseract
       API                                          OCR

Workflow

The orchestrator follows this workflow for each sync operation:

1. List Documents
  • Fetch document list from reMarkable API
  • Retrieve document metadata (ID, title, version, modified time, labels)
2. Filter by Labels (Optional)
  • If label filters are configured, filter documents to sync
  • Only documents matching configured labels are processed
  • If no filters, all documents are processed
3. Load Sync State
  • Load previous sync state from local storage
  • State contains document IDs, versions, and last sync times
  • Initialize fresh state if none exists
4. Identify New/Changed Documents
  • Compare API documents with local state
  • Documents to sync:
    • New documents (not in state)
    • Changed documents (version mismatch)
  • Skip unchanged documents (same version in state)
5. Process Each Document

For each document to sync, the orchestrator runs this pipeline:

a. Download

  • Download .rmdoc file from reMarkable API
  • Save to temporary directory

b. Convert

  • Convert .rmdoc to PDF format
  • Extract pages and render content
  • Generate standard PDF file

c. OCR (Optional, if enabled)

  • The converter handles OCR internally when configured with EnableOCR: true
  • Renders PDF pages to images at 300 DPI to improve OCR accuracy
  • Performs OCR using Ollama vision models (llava, mistral-small3.1, etc.)
  • Extracts text with bounding boxes and confidence scores
  • Adds invisible searchable text layer to PDF automatically
  • Makes PDF searchable without changing visual appearance

d. Save

  • Move final PDF to configured output directory
  • Sanitize filename (replace invalid characters)
  • Update sync state with document info
6. Error Handling
  • Continue processing even if individual documents fail
  • Collect errors for failed documents
  • Save state after each successful document so progress is not lost
  • Report all errors at completion
7. Final Summary
  • Calculate totals and statistics
  • Report: total, processed, successful, failed
  • Display duration and any failures
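Steps 4 to 6 can be sketched in a few lines of Go. This is a minimal sketch, not the package's implementation: the Document type, the map of document ID to version standing in for sync state, and the identifyDocuments, runSync, and process names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Document is a hypothetical stand-in for the metadata returned by the API.
type Document struct {
	ID      string
	Title   string
	Version int
}

// identifyDocuments keeps documents that are new (absent from state) or
// changed (version differs from the one recorded in state).
func identifyDocuments(docs []Document, state map[string]int) []Document {
	var toSync []Document
	for _, d := range docs {
		if v, seen := state[d.ID]; !seen || v != d.Version {
			toSync = append(toSync, d)
		}
	}
	return toSync
}

// runSync processes every new or changed document, collecting failures
// instead of stopping, and records a document's version in state only
// after it succeeds.
func runSync(docs []Document, state map[string]int, process func(Document) error) (int, map[string]error) {
	succeeded := 0
	failed := make(map[string]error)
	for _, d := range identifyDocuments(docs, state) {
		if err := process(d); err != nil {
			failed[d.ID] = err // collect the error and move on
			continue
		}
		state[d.ID] = d.Version // record progress immediately
		succeeded++
	}
	return succeeded, failed
}

func main() {
	state := map[string]int{"a": 1, "b": 1}
	docs := []Document{
		{ID: "a", Title: "Unchanged", Version: 1},
		{ID: "b", Title: "Changed", Version: 2},
		{ID: "c", Title: "New", Version: 1},
	}
	succeeded, failed := runSync(docs, state, func(d Document) error {
		if d.ID == "c" {
			return errors.New("conversion failed")
		}
		return nil
	})
	fmt.Println(succeeded, len(failed), state["b"]) // 1 1 2
}
```

Document "a" is skipped as unchanged, "b" is re-synced because its version changed, and the failure on "c" is collected without stopping the run or touching its state entry.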

Usage

Basic Usage
package main

import (
	"context"
	"fmt"

	// Packages under internal/ are importable only from within the legible module.
	"github.com/platinummonkey/legible/internal/config"
	"github.com/platinummonkey/legible/internal/converter"
	"github.com/platinummonkey/legible/internal/pdfenhancer"
	"github.com/platinummonkey/legible/internal/rmclient"
	"github.com/platinummonkey/legible/internal/state"
	"github.com/platinummonkey/legible/internal/sync"
)

func main() {
	// Create configuration
	cfg := &config.Config{
		OutputDir:  "/path/to/output",
		Labels:     []string{"work", "personal"}, // Optional filter
		OCREnabled: true,
	}

	// Initialize components
	rmClient := rmclient.New(&rmclient.Config{})
	stateStore := state.New(&state.Config{})
	converter := converter.New(&converter.Config{})
	pdfEnhancer := pdfenhancer.New(&pdfenhancer.Config{})

	// Create orchestrator
	orch, err := sync.New(&sync.Config{
		Config:      cfg,
		RMClient:    rmClient,
		StateStore:  stateStore,
		Converter:   converter,
		PDFEnhancer: pdfEnhancer,
	})
	if err != nil {
		panic(err)
	}

	// Run sync
	ctx := context.Background()
	result, err := orch.Sync(ctx)
	if err != nil {
		panic(err)
	}

	// Display results
	fmt.Println(result.Summary())
}
With OCR Processor
// Add OCR processor if Tesseract is installed
ocrProc := ocr.New(&ocr.Config{
	Languages: []string{"eng"},
})

orch, err := sync.New(&sync.Config{
	Config:       cfg,
	RMClient:     rmClient,
	StateStore:   stateStore,
	Converter:    converter,
	OCRProcessor: ocrProc,      // Optional
	PDFEnhancer:  pdfEnhancer,
})
Result Handling
result, err := orch.Sync(ctx)
if err != nil {
	log.Fatal(err)
}

// Check for failures
if result.HasFailures() {
	fmt.Println("Some documents failed:")
	for _, failure := range result.Failures {
		fmt.Printf("  - %s: %v\n", failure.Title, failure.Error)
	}
}

// Display statistics
fmt.Printf("Successfully processed %d of %d documents in %v\n",
	result.SuccessCount,
	result.ProcessedDocuments,
	result.Duration)

Configuration

The orchestrator uses the config.Config struct for configuration:

type Config struct {
	OutputDir  string   // Directory for output PDFs
	Labels     []string // Label filters (empty = all documents)
	OCREnabled bool     // Enable OCR processing
}

Label Filtering:

  • When Labels is empty or nil, all documents are synced
  • When Labels contains values, only documents with matching labels are synced
  • A document matches if it has any of the configured labels (OR logic)
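The OR-style matching above can be expressed as a set lookup. This is a hedged sketch: the Doc type and filterByLabels function are hypothetical, and it assumes labels are compared as exact, case-sensitive strings, which the package may not do.

```go
package main

import "fmt"

// Doc is a hypothetical document carrying its reMarkable labels.
type Doc struct {
	Title  string
	Labels []string
}

// filterByLabels keeps documents that carry at least one wanted label
// (OR logic). An empty or nil filter keeps every document.
func filterByLabels(docs []Doc, wanted []string) []Doc {
	if len(wanted) == 0 {
		return docs
	}
	want := make(map[string]struct{}, len(wanted))
	for _, l := range wanted {
		want[l] = struct{}{}
	}
	var out []Doc
	for _, d := range docs {
		for _, l := range d.Labels {
			if _, ok := want[l]; ok {
				out = append(out, d)
				break // one matching label is enough
			}
		}
	}
	return out
}

func main() {
	docs := []Doc{
		{Title: "Standup", Labels: []string{"work"}},
		{Title: "Groceries", Labels: []string{"home"}},
		{Title: "Journal", Labels: []string{"personal", "daily"}},
	}
	for _, d := range filterByLabels(docs, []string{"work", "personal"}) {
		fmt.Println(d.Title) // Standup, then Journal
	}
}
```

Building the set once keeps each document check proportional to its own label count rather than to the size of the filter.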

Progress Tracking

The orchestrator logs progress at INFO level:

INFO: Listing documents from reMarkable API
INFO: Retrieved documents from API count=25
INFO: Filtered documents by labels original=25 filtered=15
INFO: Identified documents to sync count=5
INFO: Processing document document=1 total=5 id=abc-123 title="Meeting Notes"
INFO: Downloading document document=1 total=5
INFO: Converting to PDF document=1 total=5
INFO: Performing OCR document=1 total=5
INFO: Adding OCR text layer document=1 total=5
INFO: Document processing completed document=1 total=5 output=/path/to/output/Meeting Notes.pdf duration=15s
...
INFO: Sync workflow completed total=15 processed=5 successful=5 failed=0 duration=1m30s

Error Handling

The orchestrator handles errors in three ways:

Graceful Degradation
  • If state loading fails, starts with empty state
  • If OCR is requested but processor is nil, logs warning and skips OCR
  • If adding the text layer fails, falls back to the converted PDF without a text layer
Continue on Failure
  • Individual document failures don't stop the sync
  • Errors are collected and reported at the end
  • State is updated incrementally after each successful document
Error Collection
result, err := orch.Sync(ctx)
if err != nil {
	// The workflow itself failed; don't rely on result here.
	log.Fatal(err)
}

for _, failure := range result.Failures {
	fmt.Printf("Failed: %s (%s)\n", failure.Title, failure.DocumentID)
	fmt.Printf("  Error: %v\n", failure.Error)
}

Testing

The package includes the following tests:

Unit Tests:

  • Orchestrator initialization and validation
  • Label filtering logic
  • Document identification (new/changed detection)
  • Filename sanitization
  • File copying utilities

Result Tests:

  • Result creation and modification
  • Success and failure tracking
  • Summary generation
  • String formatting

Integration Tests:

  • Full workflow tests require mocking of dependencies
  • Individual components have their own test suites

Run tests:

go test ./internal/sync
go test -v ./internal/sync

Implementation Details

Filename Sanitization

The orchestrator sanitizes document titles for use as filenames:

// Input: "Project: Design/Architecture (v2)"
// Output: "Project- Design-Architecture (v2).pdf"

Replaced Characters:

  • / → - (slash)
  • \ → - (backslash)
  • : → - (colon)
  • * → _ (asterisk)
  • ? → _ (question mark)
  • " → ' (double quote)
  • < → _ (less than)
  • > → _ (greater than)
  • | → - (pipe)

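The replacement table maps directly onto the standard library's strings.NewReplacer. This is a sketch under the assumption that sanitization is exactly this table plus a ".pdf" suffix; outputFilename is a hypothetical name, and the real code may also trim whitespace or limit length.

```go
package main

import (
	"fmt"
	"strings"
)

// filenameReplacer applies the character replacements from the table.
var filenameReplacer = strings.NewReplacer(
	"/", "-",
	`\`, "-",
	":", "-",
	"*", "_",
	"?", "_",
	`"`, "'",
	"<", "_",
	">", "_",
	"|", "-",
)

// outputFilename turns a document title into a PDF filename.
func outputFilename(title string) string {
	return filenameReplacer.Replace(title) + ".pdf"
}

func main() {
	fmt.Println(outputFilename("Project: Design/Architecture (v2)"))
	// Project- Design-Architecture (v2).pdf
}
```

A single Replacer performs all substitutions in one pass over the title.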
Incremental State Updates

State is saved after each successful document:

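for _, doc := range docsToSync {
	// Process doc (download, convert, OCR)...
	// On failure, record the error in the result and `continue`,
	// leaving the saved state untouched for this document.

	// On success, update and persist state immediately
	currentState.UpdateDocument(docState)
	stateStore.Save(currentState)
}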

Benefits:

  • Progress is not lost if sync is interrupted
  • Failed documents don't prevent state updates for successful ones
  • Next sync picks up where previous sync left off
Temporary File Handling

Each document is processed in its own temporary directory:

tmpDir, err := os.MkdirTemp("", fmt.Sprintf("rmsync-%s-*", doc.ID))
if err != nil {
	return fmt.Errorf("create temp dir: %w", err)
}
defer os.RemoveAll(tmpDir)

Benefits:

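  • No file conflicts between documents once concurrent processing is added (planned)
  • Cleanup happens automatically via defer, provided the defer runs in a per-document function rather than directly in the sync loop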
  • Isolated processing environment per document

Future Enhancements

Immediate Priorities
  1. PDF-to-Image Rendering

    • Implement PDF page rendering for OCR input
    • Support multiple image formats (PNG, JPEG)
    • Configurable resolution/DPI
  2. Parallel Processing

    • Process multiple documents concurrently
    • Configurable worker pool size
    • Maintain progress tracking across workers
Long-term Enhancements
  1. Retry Logic

    • Exponential backoff for transient failures
    • Configurable retry attempts and delays
    • Distinguish between retryable and permanent errors
  2. Webhooks/Notifications

    • Callback hooks for sync events
    • Email/Slack notifications on completion
    • Custom notification plugins
  3. Incremental Sync Optimization

    • Skip downloading if local file exists with same version
    • Hash-based change detection
    • Resume interrupted downloads
  4. Dry Run Mode

    • Preview what would be synced without syncing
    • Estimate download sizes and durations
    • Validate configuration before actual sync
  5. Selective Sync

    • Sync specific documents by ID
    • Sync documents modified after a date
    • Sync only new documents (skip updates)
  6. Progress Callbacks

    • Programmatic progress reporting
    • Real-time UI updates
    • Progress bar integration

Dependencies

The orchestrator depends on these internal packages:

  • config: Application configuration
  • logger: Structured logging
  • rmclient: reMarkable API client
  • state: Sync state management
  • converter: .rmdoc to PDF conversion
  • ocr: OCR processing (optional)
  • pdfenhancer: PDF text layer addition
  • types: Shared type definitions

License

Part of the legible project. See the project LICENSE for details.

Documentation

Overview

Package sync provides document synchronization and orchestration.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	Config       *config.Config
	Logger       *logger.Logger
	RMClient     *rmclient.Client
	StateStore   *state.Manager
	Converter    *converter.Converter
	OCRProcessor *ocr.Processor
	PDFEnhancer  *pdfenhancer.PDFEnhancer
}

Config holds configuration for the sync orchestrator

type DocumentFailure

type DocumentFailure struct {
	DocumentID string
	Title      string
	Error      error
}

DocumentFailure contains information about a failed document

type DocumentResult

type DocumentResult struct {
	DocumentID string
	Title      string
	PageCount  int
	OutputPath string
	StartTime  time.Time
	Duration   time.Duration
}

DocumentResult contains the results of processing a single document

type Orchestrator

type Orchestrator struct {
	// contains filtered or unexported fields
}

Orchestrator coordinates the complete sync workflow

func New

func New(cfg *Config) (*Orchestrator, error)

New creates a new sync orchestrator

func (*Orchestrator) Sync

func (o *Orchestrator) Sync(ctx context.Context) (*Result, error)

Sync performs a complete synchronization workflow

type Result

type Result struct {
	TotalDocuments     int
	ProcessedDocuments int
	SuccessCount       int
	FailureCount       int
	Duration           time.Duration
	Successes          []DocumentResult
	Failures           []DocumentFailure
}

Result contains the results of a complete sync operation

func NewResult

func NewResult() *Result

NewResult creates a new sync result

func (*Result) AddError

func (sr *Result) AddError(docID, title string, err error)

AddError adds a failed document

func (*Result) AddSuccess

func (sr *Result) AddSuccess(result *DocumentResult)

AddSuccess adds a successful document result

func (*Result) HasFailures

func (sr *Result) HasFailures() bool

HasFailures returns true if there were any failures

func (*Result) String

func (sr *Result) String() string

String returns a string representation of the sync result

func (*Result) Summary

func (sr *Result) Summary() string

Summary returns a human-readable summary of the sync result
