sync

package
v0.3.0
Warning

This package is not in the latest version of its module.

Published: Jan 25, 2026 | License: MIT | Imports: 22 | Imported by: 0

Documentation

Overview

Package sync provides synchronization logic between Notion and local storage.

Index

Constants

This section is empty.

Variables

var ErrFileTooLarge = errors.New("file exceeds maximum size limit")

ErrFileTooLarge is returned when a file exceeds the maximum size limit.
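
Because ErrFileTooLarge is a sentinel error, callers would normally test for it with errors.Is rather than by comparing strings. The sketch below redeclares the sentinel locally so that it compiles on its own; checkSize and isTooLarge are hypothetical helpers written for illustration and are not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFileTooLarge mirrors the sentinel documented above so the sketch is self-contained.
var ErrFileTooLarge = errors.New("file exceeds maximum size limit")

// checkSize is a hypothetical helper: it wraps the sentinel with context
// when size is strictly greater than limit.
func checkSize(name string, size, limit int64) error {
	if size > limit {
		return fmt.Errorf("%s (%d bytes): %w", name, size, ErrFileTooLarge)
	}
	return nil
}

// isTooLarge reports whether err is, or wraps, ErrFileTooLarge.
func isTooLarge(err error) bool {
	return errors.Is(err, ErrFileTooLarge)
}

func main() {
	err := checkSize("diagram.png", 11, 10)
	if isTooLarge(err) {
		fmt.Println("skipping oversized file:", err)
	}
}
```

Wrapping with %w keeps the sentinel detectable even after a caller adds context such as the file name.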

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

Crawler synchronizes Notion pages to local storage using folder-based organization.

func NewCrawler

func NewCrawler(client *notion.Client, st store.Store, opts ...CrawlerOption) *Crawler

NewCrawler creates a new crawler.

func (*Crawler) AddDatabase

func (c *Crawler) AddDatabase(ctx context.Context, databaseID, folder string, forceUpdate bool) error

AddDatabase adds all pages from a database to a folder.

func (*Crawler) AddRootPage

func (c *Crawler) AddRootPage(ctx context.Context, pageID, folder string, forceUpdate bool) error

AddRootPage adds a page as a root page in a folder and queues it for syncing.

func (*Crawler) CommitChanges

func (c *Crawler) CommitChanges(ctx context.Context, message string) error

CommitChanges commits pending changes to git.

func (*Crawler) GetPage

func (c *Crawler) GetPage(ctx context.Context, pageID string, folder string) error

GetPage fetches a single page and places it in the correct location based on its parent hierarchy. Unlike AddRootPage, this does not mark the page as a root page. If folder is empty, it will be determined from the parent chain.

func (*Crawler) GetStatus

func (c *Crawler) GetStatus(ctx context.Context, folderFilter string) (*StatusInfo, error)

GetStatus returns status information.

func (*Crawler) ListPages

func (c *Crawler) ListPages(ctx context.Context, folderFilter string, asTree bool) ([]*FolderInfo, error)

ListPages returns page information for display.

func (*Crawler) ProcessQueue

func (c *Crawler) ProcessQueue(
	ctx context.Context, folderFilter string, maxPages int, maxFiles int, maxQueueFiles int, maxTime time.Duration,
) error

ProcessQueue processes all queue entries, optionally filtered by folder. maxPages limits the number of pages to fetch (0 = unlimited). maxTime limits the duration of the sync (0 = unlimited).

func (*Crawler) ProcessQueueWithCallback

func (c *Crawler) ProcessQueueWithCallback(
	ctx context.Context,
	folderFilter string,
	maxPages, maxFiles, maxQueueFiles int,
	maxTime time.Duration,
	callback QueueCallback,
) error

ProcessQueueWithCallback is like ProcessQueue but calls the callback after each queue file is processed.

func (*Crawler) Pull

func (c *Crawler) Pull(ctx context.Context, opts PullOptions) (*PullResult, error)

Pull fetches all pages changed since the last pull and queues them for sync.

func (*Crawler) Reindex

func (c *Crawler) Reindex(ctx context.Context, dryRun bool) error

Reindex rebuilds the registry from markdown files.

func (*Crawler) ScanPage

func (c *Crawler) ScanPage(ctx context.Context, pageID string) error

ScanPage re-scans a page to discover all child pages and queues them. This is useful for discovering new child pages under an existing root page.

type CrawlerOption

type CrawlerOption func(*Crawler)

CrawlerOption configures the crawler.

func WithCrawlerLogger

func WithCrawlerLogger(l *slog.Logger) CrawlerOption

WithCrawlerLogger sets a custom logger.
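
CrawlerOption follows the functional options pattern: each option is a function that mutates the crawler during construction. The sketch below illustrates the pattern with lowercase stand-in types. The crawler's real fields are unexported, so the logger field, the slog.Default fallback, and the helper usesCustomLogger are all assumptions made for illustration.

```go
package main

import (
	"io"
	"log/slog"
	"os"
)

// crawler is a stand-in for Crawler; the logger field is an assumption.
type crawler struct {
	logger *slog.Logger
}

// option has the same shape as the documented CrawlerOption.
type option func(*crawler)

// withLogger mirrors the documented WithCrawlerLogger option.
func withLogger(l *slog.Logger) option {
	return func(c *crawler) { c.logger = l }
}

// newCrawler sets defaults first, then applies each option in order,
// so later options override earlier ones.
func newCrawler(opts ...option) *crawler {
	c := &crawler{logger: slog.Default()} // assumed default
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// usesCustomLogger is a hypothetical check that the option takes effect.
func usesCustomLogger() bool {
	custom := slog.New(slog.NewTextHandler(io.Discard, nil))
	return newCrawler(withLogger(custom)).logger == custom
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	c := newCrawler(withLogger(logger))
	c.logger.Info("crawler configured")
}
```

The variadic opts parameter lets callers omit options entirely, which matches the NewCrawler signature shown above.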

type FileManifest

type FileManifest struct {
	FileID       string    `json:"file_id"`
	ParentPageID string    `json:"parent_page_id"`
	DownloadedAt time.Time `json:"downloaded_at"`
}

FileManifest is stored alongside downloaded files as {filename}.meta.json. It contains metadata for local file identification.
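
As an illustration of the on-disk shape, the sketch below encodes a manifest with encoding/json. The struct is copied from the definition above. manifestPath and sampleManifestJSON are hypothetical helpers, the IDs are placeholders, and the sketch assumes that {filename} includes the file's extension.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// FileManifest is copied from the type documented above.
type FileManifest struct {
	FileID       string    `json:"file_id"`
	ParentPageID string    `json:"parent_page_id"`
	DownloadedAt time.Time `json:"downloaded_at"`
}

// manifestPath is a hypothetical helper applying the {filename}.meta.json
// convention, assuming the extension is part of {filename}.
func manifestPath(filePath string) string {
	return filePath + ".meta.json"
}

// sampleManifestJSON encodes a manifest with placeholder values and a fixed timestamp.
func sampleManifestJSON() string {
	m := FileManifest{
		FileID:       "abc123",
		ParentPageID: "page-1",
		DownloadedAt: time.Date(2026, 1, 25, 12, 0, 0, 0, time.UTC),
	}
	b, err := json.Marshal(m)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(manifestPath("images/diagram.png"))
	fmt.Println(sampleManifestJSON())
}
```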

type FileRegistry

type FileRegistry struct {
	ID         string    `json:"id"`         // File ID extracted from S3 URL
	FilePath   string    `json:"file_path"`  // Local file path (directory + name)
	SourceURL  string    `json:"source_url"` // Original S3 URL
	LastSynced time.Time `json:"last_synced"`
}

FileRegistry is stored in .notion-sync/ids/file-{id}.json. It contains metadata for tracking downloaded files (images, attachments, etc.).

type FolderInfo

type FolderInfo struct {
	Name          string
	RootPages     int
	TotalPages    int
	OrphanedPages int
	Pages         []*PageInfo
}

FolderInfo contains information about a folder.

type FolderStatus

type FolderStatus struct {
	Name        string
	PageCount   int
	RootPages   int
	LastSynced  *time.Time
	QueuedPages int
}

FolderStatus contains status for a specific folder.

type PageInfo

type PageInfo struct {
	ID         string
	Title      string
	Path       string
	LastSynced time.Time
	IsRoot     bool
	IsOrphaned bool
	ParentID   string
	Children   []*PageInfo
}

PageInfo contains displayable information about a page.
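
Since PageInfo nests its children, a tree view can be produced with a depth-first walk. The sketch below is one way a caller might render such a tree. The struct is copied from the definition above; renderTree, samplePages, and the page titles are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// PageInfo is copied from the type documented above.
type PageInfo struct {
	ID         string
	Title      string
	Path       string
	LastSynced time.Time
	IsRoot     bool
	IsOrphaned bool
	ParentID   string
	Children   []*PageInfo
}

// renderTree is a hypothetical helper that prints one line per page,
// indented by two spaces per level of nesting.
func renderTree(pages []*PageInfo) string {
	var sb strings.Builder
	var walk func(p *PageInfo, depth int)
	walk = func(p *PageInfo, depth int) {
		sb.WriteString(strings.Repeat("  ", depth))
		sb.WriteString(p.Title)
		if p.IsRoot {
			sb.WriteString(" (root)")
		}
		if p.IsOrphaned {
			sb.WriteString(" (orphaned)")
		}
		sb.WriteString("\n")
		for _, child := range p.Children {
			walk(child, depth+1)
		}
	}
	for _, p := range pages {
		walk(p, 0)
	}
	return sb.String()
}

// samplePages builds a small placeholder hierarchy.
func samplePages() []*PageInfo {
	setup := &PageInfo{ID: "3", Title: "Laptop setup", ParentID: "2"}
	onboarding := &PageInfo{ID: "2", Title: "Onboarding", ParentID: "1", Children: []*PageInfo{setup}}
	policies := &PageInfo{ID: "4", Title: "Policies", ParentID: "1"}
	root := &PageInfo{ID: "1", Title: "Handbook", IsRoot: true, Children: []*PageInfo{onboarding, policies}}
	return []*PageInfo{root}
}

func main() {
	fmt.Print(renderTree(samplePages()))
}
```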

type PageRegistry

type PageRegistry struct {
	ID          string    `json:"id"`
	Type        string    `json:"type"` // "page" or "database"
	Folder      string    `json:"folder"`
	FilePath    string    `json:"file_path"`
	Title       string    `json:"title"`
	LastEdited  time.Time `json:"last_edited"`
	LastSynced  time.Time `json:"last_synced"`
	IsRoot      bool      `json:"is_root"`
	ParentID    string    `json:"parent_id,omitempty"`
	Children    []string  `json:"children,omitempty"`
	ContentHash string    `json:"content_hash,omitempty"`
}

PageRegistry is stored in .notion-sync/ids/page-{id}.json. It contains all metadata needed to locate and identify a page or database.
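
The omitempty tags mean that parent_id, children, and content_hash are left out of the JSON when they are empty, while the remaining fields are always written. The sketch below shows this with a copy of the struct. registryPath, slashPath, and encodedKeys are hypothetical helpers, and the field values are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
	"time"
)

// PageRegistry is copied from the type documented above.
type PageRegistry struct {
	ID          string    `json:"id"`
	Type        string    `json:"type"` // "page" or "database"
	Folder      string    `json:"folder"`
	FilePath    string    `json:"file_path"`
	Title       string    `json:"title"`
	LastEdited  time.Time `json:"last_edited"`
	LastSynced  time.Time `json:"last_synced"`
	IsRoot      bool      `json:"is_root"`
	ParentID    string    `json:"parent_id,omitempty"`
	Children    []string  `json:"children,omitempty"`
	ContentHash string    `json:"content_hash,omitempty"`
}

// registryPath is a hypothetical helper for the documented location.
func registryPath(id string) string {
	return filepath.Join(".notion-sync", "ids", "page-"+id+".json")
}

// slashPath normalizes separators so the result reads the same on every OS.
func slashPath(p string) string {
	return filepath.ToSlash(p)
}

// encodedKeys marshals r and decodes it into a map to expose which keys were written.
func encodedKeys(r PageRegistry) (map[string]any, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	var m map[string]any
	err = json.Unmarshal(b, &m)
	return m, err
}

func main() {
	r := PageRegistry{ID: "abc123", Type: "page", Folder: "docs", Title: "Handbook", IsRoot: true}
	b, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(slashPath(registryPath(r.ID)))
	fmt.Println(string(b))
}
```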

type PullOptions

type PullOptions struct {
	Folder   string        // Filter to specific folder (empty = all folders)
	Since    time.Duration // Override for cutoff time (0 = use LastPullTime)
	MaxPages int           // Maximum number of pages to queue (0 = unlimited)
	All      bool          // Include pages not yet tracked
	DryRun   bool          // Preview without modifying
	Verbose  bool          // Show detailed output
}

PullOptions configures the pull operation.
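
The comment on Since says that zero falls back to LastPullTime. One plausible reading is that a positive Since is subtracted from the current time to obtain the cutoff. The sketch below encodes that reading only as an assumption. resolveCutoff and demoCutoffs are hypothetical and may not match how the package computes PullResult.CutoffTime.

```go
package main

import (
	"fmt"
	"time"
)

// resolveCutoff is a hypothetical helper. Assumed semantics:
//   - since > 0: cutoff is now minus since
//   - since == 0 and lastPull set: cutoff is the last pull time
//   - otherwise: the zero time, meaning no lower bound
func resolveCutoff(now time.Time, since time.Duration, lastPull *time.Time) time.Time {
	if since > 0 {
		return now.Add(-since)
	}
	if lastPull != nil {
		return *lastPull
	}
	return time.Time{}
}

// demoCutoffs evaluates both branches with fixed placeholder timestamps.
func demoCutoffs() (withSince, withLastPull string) {
	now := time.Date(2026, 1, 25, 12, 0, 0, 0, time.UTC)
	last := time.Date(2026, 1, 20, 8, 30, 0, 0, time.UTC)
	withSince = resolveCutoff(now, 24*time.Hour, &last).Format(time.RFC3339)
	withLastPull = resolveCutoff(now, 0, &last).Format(time.RFC3339)
	return withSince, withLastPull
}

func main() {
	a, b := demoCutoffs()
	fmt.Println("with Since=24h:", a)
	fmt.Println("with Since=0:  ", b)
}
```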

type PullResult

type PullResult struct {
	PagesFound   int
	PagesQueued  int
	PagesSkipped int
	NewPages     int
	UpdatedPages int
	CutoffTime   time.Time
}

PullResult contains the result of a pull operation.

type QueueCallback

type QueueCallback func() error

QueueCallback is called after each queue file is processed (written or deleted).
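
A QueueCallback takes no arguments, so any state it needs has to be captured in a closure. The sketch below shows a callback that runs an action on every nth invocation, for example to batch commits. processAll is a stand-in driver, not the package's ProcessQueueWithCallback, and stopping on the first callback error is an assumption. everyN is likewise hypothetical.

```go
package main

import "fmt"

// QueueCallback has the same shape as the documented type.
type QueueCallback func() error

// processAll is a stand-in driver: it invokes cb once per queue file and
// stops at the first callback error (assumed behavior).
func processAll(queueFiles []string, cb QueueCallback) error {
	for _, f := range queueFiles {
		// Real work on the queue file would happen here.
		if cb == nil {
			continue
		}
		if err := cb(); err != nil {
			return fmt.Errorf("callback after %s: %w", f, err)
		}
	}
	return nil
}

// everyN returns a QueueCallback that runs action on every nth call.
// Values of n below 1 are treated as 1.
func everyN(n int, action func() error) QueueCallback {
	if n < 1 {
		n = 1
	}
	count := 0
	return func() error {
		count++
		if count%n != 0 {
			return nil
		}
		return action()
	}
}

func main() {
	commits := 0
	files := []string{"q1.json", "q2.json", "q3.json", "q4.json", "q5.json"}
	err := processAll(files, everyN(2, func() error {
		commits++
		return nil
	}))
	fmt.Println("commits:", commits, "err:", err)
}
```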

type QueueInfo

type QueueInfo struct {
	Folder    string
	Type      string
	PageCount int
	QueueFile string
}

QueueInfo contains information about queue entries.

type State

type State struct {
	Version          int        `json:"version"`
	Folders          []string   `json:"folders"`
	LastPullTime     *time.Time `json:"last_pull_time,omitempty"`
	OldestPullResult *time.Time `json:"oldest_pull_result,omitempty"` // Oldest page seen in last pull
}

State is persisted in .notion-sync/state.json. It is simplified to contain only folder names and pull bookkeeping. Page metadata is stored in:

- Frontmatter of markdown files (last_synced, file_path)
- Page registries (.notion-sync/ids/page-{id}.json)

func NewState

func NewState() *State

NewState creates a new empty state.

func (*State) AddFolder

func (s *State) AddFolder(folder string)

AddFolder adds a folder to state if not already present.

func (*State) HasFolder

func (s *State) HasFolder(folder string) bool

HasFolder checks if a folder exists in state.
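
AddFolder is documented as adding a folder only if it is not already present, so repeated calls with the same name should leave one entry. The sketch below is one plausible way to satisfy that contract using a local copy of the struct. The package's actual implementation is not shown on this page, so the method bodies here are assumptions.

```go
package main

import (
	"fmt"
	"slices"
	"time"
)

// State is copied from the type documented above.
type State struct {
	Version          int        `json:"version"`
	Folders          []string   `json:"folders"`
	LastPullTime     *time.Time `json:"last_pull_time,omitempty"`
	OldestPullResult *time.Time `json:"oldest_pull_result,omitempty"`
}

// NewState returns an empty state. The initial Version value is not
// documented, so it is left at zero here.
func NewState() *State {
	return &State{Folders: []string{}}
}

// HasFolder reports whether folder is already recorded.
func (s *State) HasFolder(folder string) bool {
	return slices.Contains(s.Folders, folder)
}

// AddFolder appends folder only when it is not already present.
func (s *State) AddFolder(folder string) {
	if !s.HasFolder(folder) {
		s.Folders = append(s.Folders, folder)
	}
}

func main() {
	s := NewState()
	s.AddFolder("docs")
	s.AddFolder("docs") // no effect: already present
	s.AddFolder("wiki")
	fmt.Println(s.Folders, s.HasFolder("docs"), s.HasFolder("notes"))
}
```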

type StatusInfo

type StatusInfo struct {
	FolderCount    int
	TotalPages     int
	TotalRootPages int
	QueueEntries   []*QueueInfo
	Folders        map[string]*FolderStatus
}

StatusInfo contains sync status information.
