sync

package

v1.3.1 | Published: Jan 16, 2026 | License: AGPL-3.0 | Imports: 13 | Imported by: 0

Repository

github.com/platinummonkey/legible

README

Sync Orchestrator

Package sync coordinates the complete synchronization workflow for reMarkable documents.

Overview

The Sync Orchestrator is the core component that ties together all the pieces of the legible application. It manages the end-to-end workflow from downloading documents from the reMarkable API to generating searchable PDFs with OCR text layers.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     Sync Orchestrator                             │
└────────┬──────────────┬──────────────┬──────────────┬────────────┘
         │              │              │              │
         ▼              ▼              ▼              ▼
   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
   │ RMClient │  │  State   │  │Converter │  │   OCR    │
   └──────────┘  └──────────┘  └──────────┘  └──────────┘
         │              │              │              │
         ▼              ▼              ▼              ▼
    reMarkable      Local DB      .rmdoc → PDF   Tesseract
       API                                          OCR

Workflow

The orchestrator follows this workflow for each sync operation:

1. List Documents

Fetch document list from reMarkable API
Retrieve document metadata (ID, title, version, modified time, labels)

2. Filter by Labels (Optional)

If label filters are configured, filter documents to sync
Only documents matching configured labels are processed
If no filters, all documents are processed

3. Load Sync State

Load previous sync state from local storage
State contains document IDs, versions, and last sync times
Initialize fresh state if none exists

4. Identify New/Changed Documents

Compare API documents with local state
Documents to sync:
- New documents (not in state)
- Changed documents (version mismatch)
Skip unchanged documents (same version in state)
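
The comparison in step 4 can be sketched as follows. This is an illustration only, not the package's actual code: the RemoteDoc type, the integer Version field, the map used as a stand-in for the sync state, and the docsToSync name are all assumptions made for the example.

```go
package main

import "fmt"

// RemoteDoc is a hypothetical stand-in for the metadata returned by the API.
type RemoteDoc struct {
	ID      string
	Version int // assumed to be an integer for this sketch
}

// docsToSync returns the documents that are new (ID not present in synced)
// or changed (version differs from the one recorded in synced).
// synced maps document ID to the version stored at the last sync.
func docsToSync(remote []RemoteDoc, synced map[string]int) []RemoteDoc {
	var out []RemoteDoc
	for _, d := range remote {
		version, seen := synced[d.ID]
		if !seen || version != d.Version {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	remote := []RemoteDoc{
		{ID: "a", Version: 1}, // unchanged, skipped
		{ID: "b", Version: 3}, // changed
		{ID: "c", Version: 1}, // new
	}
	synced := map[string]int{"a": 1, "b": 2}

	for _, d := range docsToSync(remote, synced) {
		fmt.Printf("sync %s (version %d)\n", d.ID, d.Version)
	}
}
```

Comparing for inequality rather than "greater than" means a document is also re-synced if its recorded version is somehow ahead of the API's, which matches the "version mismatch" wording above.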

5. Process Each Document

For each document to sync, the orchestrator runs this pipeline:

a. Download

Download .rmdoc file from reMarkable API
Save to temporary directory

b. Convert

Convert .rmdoc to PDF format
Extract pages and render content
Generate standard PDF file

c. OCR (Optional, if enabled)

The converter handles OCR internally when configured with EnableOCR: true
Renders PDF pages to images at 300 DPI for optimal OCR accuracy
Performs OCR using Ollama vision models (llava, mistral-small3.1, etc.)
Extracts text with bounding boxes and confidence scores
Adds invisible searchable text layer to PDF automatically
Makes PDF searchable without changing visual appearance

d. Save

Move final PDF to configured output directory
Sanitize filename (remove invalid characters)
Update sync state with document info

6. Error Handling

Continue processing even if individual documents fail
Collect errors for failed documents
Update state incrementally (don't lose progress)
Report all errors at completion
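
A minimal sketch of this continue-on-failure pattern is shown below. It is not the orchestrator's implementation: the failure type, the processAll function, and the errSimulated value are hypothetical names introduced only to show the control flow.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a hypothetical error used to fake a failing document.
var errSimulated = errors.New("simulated download failure")

// failure records which document failed and why.
type failure struct {
	ID  string
	Err error
}

// processAll runs process for every ID. A failing document is recorded and
// skipped; it never aborts the loop, so later documents are still handled.
func processAll(ids []string, process func(id string) error) (succeeded []string, failed []failure) {
	for _, id := range ids {
		if err := process(id); err != nil {
			failed = append(failed, failure{ID: id, Err: err})
			continue
		}
		succeeded = append(succeeded, id)
	}
	return succeeded, failed
}

func main() {
	succeeded, failed := processAll([]string{"a", "b", "c"}, func(id string) error {
		if id == "b" {
			return errSimulated
		}
		return nil
	})

	fmt.Printf("successful=%d failed=%d\n", len(succeeded), len(failed))
	for _, f := range failed {
		fmt.Printf("  %s: %v\n", f.ID, f.Err)
	}
}
```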

7. Final Summary

Calculate totals and statistics
Report: total, processed, successful, failed
Display duration and any failures

Usage

Basic Usage

package main

import (
	"context"
	"fmt"

	"github.com/platinummonkey/legible/internal/config"
	"github.com/platinummonkey/legible/internal/converter"
	"github.com/platinummonkey/legible/internal/pdfenhancer"
	"github.com/platinummonkey/legible/internal/rmclient"
	"github.com/platinummonkey/legible/internal/state"
	"github.com/platinummonkey/legible/internal/sync"
)

func main() {
	// Create configuration
	cfg := &config.Config{
		OutputDir:  "/path/to/output",
		Labels:     []string{"work", "personal"}, // Optional filter
		OCREnabled: true,
	}

	// Initialize components
	rmClient := rmclient.New(&rmclient.Config{})
	stateStore := state.New(&state.Config{})
	converter := converter.New(&converter.Config{})
	pdfEnhancer := pdfenhancer.New(&pdfenhancer.Config{})

	// Create orchestrator
	orch, err := sync.New(&sync.Config{
		Config:      cfg,
		RMClient:    rmClient,
		StateStore:  stateStore,
		Converter:   converter,
		PDFEnhancer: pdfEnhancer,
	})
	if err != nil {
		panic(err)
	}

	// Run sync
	ctx := context.Background()
	result, err := orch.Sync(ctx)
	if err != nil {
		panic(err)
	}

	// Display results
	fmt.Println(result.Summary())
}

With OCR Processor

// Add OCR processor if Tesseract is installed
ocrProc := ocr.New(&ocr.Config{
	Languages: []string{"eng"},
})

orch, err := sync.New(&sync.Config{
	Config:       cfg,
	RMClient:     rmClient,
	StateStore:   stateStore,
	Converter:    converter,
	OCRProcessor: ocrProc,      // Optional
	PDFEnhancer:  pdfEnhancer,
})

Result Handling

result, err := orch.Sync(ctx)
if err != nil {
	log.Fatal(err)
}

// Check for failures
if result.HasFailures() {
	fmt.Println("Some documents failed:")
	for _, failure := range result.Failures {
		fmt.Printf("  - %s: %v\n", failure.Title, failure.Error)
	}
}

// Display statistics
fmt.Printf("Succeeded %d/%d documents in %v\n",
	result.SuccessCount,
	result.ProcessedDocuments,
	result.Duration)

Configuration

The orchestrator uses the config.Config struct for configuration:

type Config struct {
	OutputDir  string   // Directory for output PDFs
	Labels     []string // Label filters (empty = all documents)
	OCREnabled bool     // Enable OCR processing
}

Label Filtering:

When Labels is empty or nil, all documents are synced
When Labels contains values, only documents with matching labels are synced
A document matches if it has any of the configured labels (OR logic)
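
The OR rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the Document type and the filterByLabels function are hypothetical names.

```go
package main

import "fmt"

// Document is a hypothetical stand-in for the metadata returned by the API.
type Document struct {
	ID     string
	Title  string
	Labels []string
}

// filterByLabels keeps the documents that carry at least one wanted label.
// An empty or nil filter keeps every document.
func filterByLabels(docs []Document, wanted []string) []Document {
	if len(wanted) == 0 {
		return docs
	}
	want := make(map[string]struct{}, len(wanted))
	for _, label := range wanted {
		want[label] = struct{}{}
	}
	var out []Document
	for _, d := range docs {
		for _, label := range d.Labels {
			if _, ok := want[label]; ok {
				out = append(out, d)
				break // one matching label is enough (OR logic)
			}
		}
	}
	return out
}

func main() {
	docs := []Document{
		{ID: "1", Title: "Meeting Notes", Labels: []string{"work"}},
		{ID: "2", Title: "Recipes", Labels: []string{"home"}},
		{ID: "3", Title: "Journal", Labels: []string{"personal", "home"}},
	}
	for _, d := range filterByLabels(docs, []string{"work", "personal"}) {
		fmt.Println(d.Title)
	}
}
```

With the filter `work, personal`, the sketch keeps "Meeting Notes" and "Journal" and drops "Recipes", which has neither label.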

Progress Tracking

The orchestrator logs progress at INFO level:

INFO: Listing documents from reMarkable API
INFO: Retrieved documents from API count=25
INFO: Filtered documents by labels original=25 filtered=15
INFO: Identified documents to sync count=5
INFO: Processing document document=1 total=5 id=abc-123 title="Meeting Notes"
INFO: Downloading document document=1 total=5
INFO: Converting to PDF document=1 total=5
INFO: Performing OCR document=1 total=5
INFO: Adding OCR text layer document=1 total=5
INFO: Document processing completed document=1 total=5 output=/path/to/output/Meeting Notes.pdf duration=15s
...
INFO: Sync workflow completed total=15 processed=5 successful=5 failed=0 duration=1m30s

Error Handling

The orchestrator is designed to keep a sync run going when individual steps or documents fail:

Graceful Degradation

If state loading fails, starts with empty state
If OCR is requested but processor is nil, logs warning and skips OCR
If text layer addition fails, uses original PDF

Continue on Failure

Individual document failures don't stop the sync
Errors are collected and reported at the end
State is updated incrementally after each successful document

Error Collection

result, err := orch.Sync(ctx)
if err != nil {
	log.Fatal(err)
}

for _, failure := range result.Failures {
	fmt.Printf("Failed: %s (%s)\n", failure.Title, failure.DocumentID)
	fmt.Printf("  Error: %v\n", failure.Error)
}

Testing

The package's tests cover the following areas:

Unit Tests:

Orchestrator initialization and validation
Label filtering logic
Document identification (new/changed detection)
Filename sanitization
File copying utilities

Result Tests:

Result creation and modification
Success and failure tracking
Summary generation
String formatting

Integration Tests:

Full workflow tests require mocking of dependencies
Individual components have their own test suites

Run tests:

go test ./internal/sync
go test -v ./internal/sync

Implementation Details

Filename Sanitization

The orchestrator sanitizes document titles for use as filenames:

// Input: "Project: Design/Architecture (v2)"
// Output: "Project- Design-Architecture (v2).pdf"

Replaced Characters:

/ → - (slash)
\ → - (backslash)
: → - (colon)
* → _ (asterisk)
? → _ (question mark)
" → ' (quote)
< → _ (less than)
> → _ (greater than)
| → - (pipe)
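
The replacement table above maps naturally onto strings.NewReplacer from the standard library. The sketch below is one possible way to write it, not necessarily how the orchestrator does; the sanitizeFilename name and the choice to append ".pdf" inside the function are assumptions based on the example input and output shown earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// filenameReplacer encodes the replacement table: old, new, old, new, ...
var filenameReplacer = strings.NewReplacer(
	"/", "-",
	`\`, "-",
	":", "-",
	"*", "_",
	"?", "_",
	`"`, "'",
	"<", "_",
	">", "_",
	"|", "-",
)

// sanitizeFilename replaces characters that are invalid in filenames and
// appends the .pdf extension.
func sanitizeFilename(title string) string {
	return filenameReplacer.Replace(title) + ".pdf"
}

func main() {
	fmt.Println(sanitizeFilename("Project: Design/Architecture (v2)"))
	// prints: Project- Design-Architecture (v2).pdf
}
```

A Replacer built once at package level is safe for concurrent use, so the same value could be shared if documents are later processed in parallel.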

Incremental State Updates

State is saved after each successful document:

for _, doc := range docsToSync {
	// Process document...

	// Update state immediately
	currentState.UpdateDocument(docState)
	stateStore.Save(currentState)
}

Benefits:

Progress is not lost if sync is interrupted
Failed documents don't prevent state updates for successful ones
Next sync picks up where previous sync left off

Temporary File Handling

Each document is processed in its own temporary directory:

tmpDir, err := os.MkdirTemp("", fmt.Sprintf("rmsync-%s-*", doc.ID))
if err != nil {
	return err
}
defer os.RemoveAll(tmpDir)

Benefits:

No conflicts between documents once concurrent processing is added (planned)
Clean up happens automatically via defer
Isolated processing environment per document
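
One way to get per-document cleanup is to wrap the temporary directory in a helper function, so the deferred removal runs as soon as that document is finished rather than when an enclosing loop's function returns. The sketch below assumes this structure; withTempDir and exists are hypothetical helpers, and the document ID is assumed to contain no path separators (os.MkdirTemp rejects patterns that do).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// withTempDir creates a dedicated temporary directory for one document,
// passes it to fn, and removes it when fn returns, whether or not fn failed.
func withTempDir(docID string, fn func(dir string) error) error {
	dir, err := os.MkdirTemp("", fmt.Sprintf("rmsync-%s-*", docID))
	if err != nil {
		return fmt.Errorf("create temp dir: %w", err)
	}
	defer os.RemoveAll(dir)
	return fn(dir)
}

// exists reports whether path can be stat'ed.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	var used string
	err := withTempDir("abc-123", func(dir string) error {
		used = dir
		// Stand-in for the download step: write a placeholder file.
		return os.WriteFile(filepath.Join(dir, "doc.rmdoc"), []byte("placeholder"), 0o600)
	})
	fmt.Println("error:", err)
	fmt.Println("directory still exists:", exists(used))
}
```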

Future Enhancements

Immediate Priorities

PDF-to-Image Rendering
- Implement PDF page rendering for OCR input
- Support multiple image formats (PNG, JPEG)
- Configurable resolution/DPI
Parallel Processing
- Process multiple documents concurrently
- Configurable worker pool size
- Maintain progress tracking across workers

Long-term Enhancements

Retry Logic
- Exponential backoff for transient failures
- Configurable retry attempts and delays
- Distinguish between retryable and permanent errors
Webhooks/Notifications
- Callback hooks for sync events
- Email/Slack notifications on completion
- Custom notification plugins
Incremental Sync Optimization
- Skip downloading if local file exists with same version
- Hash-based change detection
- Resume interrupted downloads
Dry Run Mode
- Preview what would be synced without syncing
- Estimate download sizes and durations
- Validate configuration before actual sync
Selective Sync
- Sync specific documents by ID
- Sync documents modified after a date
- Sync only new documents (skip updates)
Progress Callbacks
- Programmatic progress reporting
- Real-time UI updates
- Progress bar integration

Dependencies

The orchestrator depends on these internal packages:

config: Application configuration
logger: Structured logging
rmclient: reMarkable API client
state: Sync state management
converter: .rmdoc to PDF conversion
ocr: OCR processing (optional)
pdfenhancer: PDF text layer addition
types: Shared type definitions

License

Part of the legible project. See the project LICENSE for details.

Documentation

Overview

Package sync provides document synchronization and orchestration.

Index

type Config
type DocumentFailure
type DocumentResult
type Orchestrator
- func New(cfg *Config) (*Orchestrator, error)
- func (o *Orchestrator) Sync(ctx context.Context) (*Result, error)
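type Result
- func NewResult() *Result
- func (sr *Result) AddError(docID, title string, err error)
- func (sr *Result) AddSuccess(result *DocumentResult)
- func (sr *Result) HasFailures() bool
- func (sr *Result) String() string
- func (sr *Result) Summary() string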

Types

type Config

type Config struct {
	Config       *config.Config
	Logger       *logger.Logger
	RMClient     *rmclient.Client
	StateStore   *state.Manager
	Converter    *converter.Converter
	OCRProcessor *ocr.Processor
	PDFEnhancer  *pdfenhancer.PDFEnhancer
}

Config holds configuration for the sync orchestrator

type DocumentFailure

type DocumentFailure struct {
	DocumentID string
	Title      string
	Error      error
}

DocumentFailure contains information about a failed document

type DocumentResult

type DocumentResult struct {
	DocumentID string
	Title      string
	PageCount  int
	OutputPath string
	StartTime  time.Time
	Duration   time.Duration
}

DocumentResult contains the results of processing a single document

type Orchestrator

type Orchestrator struct {
	// contains filtered or unexported fields
}

Orchestrator coordinates the complete sync workflow

func New

func New(cfg *Config) (*Orchestrator, error)

New creates a new sync orchestrator

func (*Orchestrator) Sync

func (o *Orchestrator) Sync(ctx context.Context) (*Result, error)

Sync performs a complete synchronization workflow

type Result

type Result struct {
	TotalDocuments     int
	ProcessedDocuments int
	SuccessCount       int
	FailureCount       int
	Duration           time.Duration
	Successes          []DocumentResult
	Failures           []DocumentFailure
}

Result contains the results of a complete sync operation

func NewResult

func NewResult() *Result

NewResult creates a new sync result

func (*Result) AddError

func (sr *Result) AddError(docID, title string, err error)

AddError adds a failed document

func (*Result) AddSuccess

func (sr *Result) AddSuccess(result *DocumentResult)

AddSuccess adds a successful document result

func (*Result) HasFailures

func (sr *Result) HasFailures() bool

HasFailures returns true if there were any failures

func (*Result) String

func (sr *Result) String() string

String returns a string representation of the sync result

func (*Result) Summary

func (sr *Result) Summary() string

Summary returns a human-readable summary of the sync result
