sync

package
v0.6.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 4, 2026 License: MIT Imports: 25 Imported by: 0

Documentation

Overview

Package sync provides synchronization logic between Notion and local storage.

Variables

var ErrFileTooLarge = errors.New("file exceeds maximum size limit")

ErrFileTooLarge is returned when a file exceeds the maximum size limit.
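
Because ErrFileTooLarge is a sentinel value, callers can test for it with errors.Is even when it has been wrapped with extra context. The following is a minimal sketch: the variable is redeclared locally so the example is self-contained, and checkSize and isTooLarge are hypothetical helpers, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFileTooLarge mirrors the package's sentinel error for this sketch.
var ErrFileTooLarge = errors.New("file exceeds maximum size limit")

// checkSize is a hypothetical helper: it wraps the sentinel with context
// when size is larger than limit.
func checkSize(name string, size, limit int64) error {
	if size > limit {
		return fmt.Errorf("download %s (%d bytes): %w", name, size, ErrFileTooLarge)
	}
	return nil
}

// isTooLarge reports whether err is, or wraps, ErrFileTooLarge.
func isTooLarge(err error) bool {
	return errors.Is(err, ErrFileTooLarge)
}

func main() {
	err := checkSize("diagram.png", 10<<20, 5<<20)
	if isTooLarge(err) {
		fmt.Println("skipping oversized file:", err)
	}
}
```

Comparing with errors.Is rather than == keeps the check working if the error is wrapped along the way.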

Functions

func LoadConfig added in v0.5.0

func LoadConfig() error

LoadConfig loads configuration from environment variables. It should be called once at application startup.

func ResetConfig added in v0.5.0

func ResetConfig()

ResetConfig resets the global configuration, forcing a reload on next access. This is primarily used for testing.

Types

type CleanupResult added in v0.6.0

type CleanupResult struct {
	OrphanedPages     int
	DeletedRegistries int
	DeletedFiles      int
}

CleanupResult contains the result of a cleanup operation.

type Config added in v0.5.0

type Config struct {
	// BlockDepth is the maximum depth for block discovery (0 = unlimited).
	BlockDepth int
	// QueueDelay is the delay between processing queue files.
	QueueDelay time.Duration
	// MaxFileSize is the maximum file size to download in bytes.
	MaxFileSize int64
}

Config holds sync-related configuration loaded from environment variables.

func GetConfig added in v0.5.0

func GetConfig() *Config

GetConfig returns the global configuration. If not loaded, it loads with defaults.
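
The load, get, and reset behavior described for LoadConfig, GetConfig, and ResetConfig can be pictured roughly as below. This is a sketch under assumptions, not the package's source: the EXAMPLE_* variable names and the default values are invented for illustration, and falling back to defaults on a parse error inside GetConfig is also an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
	"time"
)

// Config mirrors the documented struct.
type Config struct {
	BlockDepth  int
	QueueDelay  time.Duration
	MaxFileSize int64
}

var (
	mu  sync.Mutex
	cfg *Config
)

// defaults returns illustrative default values (assumed, not documented).
func defaults() *Config {
	return &Config{BlockDepth: 0, QueueDelay: 0, MaxFileSize: 5 << 20}
}

// loadFromEnv reads hypothetical EXAMPLE_* variables; unset values keep
// their defaults.
func loadFromEnv() (*Config, error) {
	c := defaults()
	if v := os.Getenv("EXAMPLE_BLOCK_DEPTH"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return nil, fmt.Errorf("EXAMPLE_BLOCK_DEPTH: %w", err)
		}
		c.BlockDepth = n
	}
	if v := os.Getenv("EXAMPLE_QUEUE_DELAY"); v != "" {
		d, err := time.ParseDuration(v)
		if err != nil {
			return nil, fmt.Errorf("EXAMPLE_QUEUE_DELAY: %w", err)
		}
		c.QueueDelay = d
	}
	if v := os.Getenv("EXAMPLE_MAX_FILE_SIZE"); v != "" {
		n, err := strconv.ParseInt(v, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("EXAMPLE_MAX_FILE_SIZE: %w", err)
		}
		c.MaxFileSize = n
	}
	return c, nil
}

// LoadConfig parses the environment and caches the result.
func LoadConfig() error {
	c, err := loadFromEnv()
	if err != nil {
		return err
	}
	mu.Lock()
	cfg = c
	mu.Unlock()
	return nil
}

// ResetConfig drops the cached value so the next GetConfig reloads it.
func ResetConfig() {
	mu.Lock()
	cfg = nil
	mu.Unlock()
}

// GetConfig returns the cached config, loading it lazily if needed.
func GetConfig() *Config {
	mu.Lock()
	defer mu.Unlock()
	if cfg == nil {
		c, err := loadFromEnv()
		if err != nil {
			c = defaults() // assumption: fall back to defaults on error
		}
		cfg = c
	}
	return cfg
}

func main() {
	os.Setenv("EXAMPLE_QUEUE_DELAY", "250ms")
	if err := LoadConfig(); err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(GetConfig().QueueDelay)
}
```

In tests, calling ResetConfig between cases lets each case set its own environment before the next GetConfig.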

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

Crawler synchronizes Notion pages to local storage using folder-based organization.

func NewCrawler

func NewCrawler(client *notion.Client, st store.Store, opts ...CrawlerOption) *Crawler

NewCrawler creates a new crawler.

func (*Crawler) AddDatabase

func (c *Crawler) AddDatabase(ctx context.Context, databaseID, folder string, forceUpdate bool) error

AddDatabase adds all pages from a database to a folder.

func (*Crawler) AddRootPage

func (c *Crawler) AddRootPage(ctx context.Context, pageID, folder string, forceUpdate bool) error

AddRootPage adds a page as a root page in a folder and queues it for syncing.

func (*Crawler) Cleanup added in v0.6.0

func (c *Crawler) Cleanup(ctx context.Context, dryRun bool) (*CleanupResult, error)

Cleanup deletes orphaned pages that don't trace back to a root in root.md.
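
One way to picture the orphan check is a walk up each page's parent chain until a root ID is found or the chain ends. The sketch below assumes that model; the page type and both functions are hypothetical, and only the idea of "traces back to a root" comes from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// page holds the registry fields needed to walk the parent chain.
type page struct {
	ID       string
	ParentID string
}

// reachesRoot follows ParentID links from id until it finds a root,
// runs out of known pages, or detects a cycle.
func reachesRoot(id string, pages map[string]page, roots map[string]bool) bool {
	seen := map[string]bool{}
	for id != "" && !seen[id] {
		if roots[id] {
			return true
		}
		seen[id] = true
		p, ok := pages[id]
		if !ok {
			return false
		}
		id = p.ParentID
	}
	return false
}

// findOrphans returns the sorted IDs of pages that never reach a root.
// roots has the same shape as the result of GetRootPageIDs.
func findOrphans(pages map[string]page, roots map[string]bool) []string {
	var orphans []string
	for id := range pages {
		if !reachesRoot(id, pages, roots) {
			orphans = append(orphans, id)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	pages := map[string]page{
		"a": {ID: "a"},
		"b": {ID: "b", ParentID: "a"},
		"c": {ID: "c", ParentID: "missing"},
		"d": {ID: "d", ParentID: "e"},
		"e": {ID: "e", ParentID: "d"},
	}
	roots := map[string]bool{"a": true}
	fmt.Println(findOrphans(pages, roots)) // prints [c d e]
}
```

A dry run would stop after computing this list and report the counts without deleting anything.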

func (*Crawler) Commit added in v0.4.0

func (c *Crawler) Commit(ctx context.Context, message string) error

Commit commits the current transaction with the given message. After commit, a new transaction is automatically started.

func (*Crawler) CommitChanges

func (c *Crawler) CommitChanges(ctx context.Context, message string) error

CommitChanges commits pending changes to git.

func (*Crawler) EnsureTransaction added in v0.4.0

func (c *Crawler) EnsureTransaction(ctx context.Context) error

EnsureTransaction ensures a transaction is available. If no transaction exists, creates a new one.

func (*Crawler) GetPage

func (c *Crawler) GetPage(ctx context.Context, pageID string, folder string) error

GetPage fetches a single page and places it in the correct location based on its parent hierarchy. Unlike AddRootPage, this does not mark the page as a root page. If folder is empty, it will be determined from the parent chain.

func (*Crawler) GetRootPageIDs added in v0.6.0

func (c *Crawler) GetRootPageIDs(ctx context.Context) (map[string]bool, error)

GetRootPageIDs returns the page IDs of all roots in root.md.

func (*Crawler) GetStatus

func (c *Crawler) GetStatus(ctx context.Context, folderFilter string) (*StatusInfo, error)

GetStatus returns status information.

func (*Crawler) ListPages

func (c *Crawler) ListPages(ctx context.Context, folderFilter string, asTree bool) ([]*FolderInfo, error)

ListPages returns page information for display.

func (*Crawler) ParseRootMd added in v0.6.0

func (c *Crawler) ParseRootMd(ctx context.Context) (*RootManifest, error)

ParseRootMd reads and parses root.md from the repository root. Returns nil manifest and nil error if the file doesn't exist.

func (*Crawler) ProcessQueue

func (c *Crawler) ProcessQueue(
	ctx context.Context, folderFilter string, maxPages int, maxFiles int, maxQueueFiles int, maxTime time.Duration,
) error

ProcessQueue processes all queue entries, optionally filtered by folder. maxPages limits the number of pages to fetch (0 = unlimited). maxTime limits the duration of the sync (0 = unlimited).

func (*Crawler) ProcessQueueWithCallback

func (c *Crawler) ProcessQueueWithCallback(
	ctx context.Context,
	folderFilter string,
	maxPages, maxFiles, maxQueueFiles int,
	maxTime time.Duration,
	callback QueueCallback,
) error

ProcessQueueWithCallback is like ProcessQueue but calls the callback after each queue file is processed.
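
The interplay of the limits and the callback can be sketched with a simplified loop. The code below is an illustration only: processQueue is a hypothetical stand-in that models each queue file as a slice of page IDs, handles just maxPages and maxTime (with 0 meaning unlimited, as documented), and invokes the callback once per queue file it touched.

```go
package main

import (
	"fmt"
	"time"
)

// QueueCallback mirrors the documented callback type.
type QueueCallback func() error

// processQueue walks the queue files in order and returns the number of
// pages processed before a limit was reached or the queue was drained.
func processQueue(files [][]string, maxPages int, maxTime time.Duration, cb QueueCallback) (int, error) {
	start := time.Now()
	processed := 0
	limitReached := func() bool {
		if maxPages > 0 && processed >= maxPages {
			return true
		}
		return maxTime > 0 && time.Since(start) >= maxTime
	}
	for _, file := range files {
		if limitReached() {
			break
		}
		for range file {
			if limitReached() {
				break
			}
			// A real crawler would fetch and write the page here.
			processed++
		}
		if cb != nil {
			if err := cb(); err != nil {
				return processed, fmt.Errorf("queue callback: %w", err)
			}
		}
	}
	return processed, nil
}

func main() {
	files := [][]string{{"a", "b"}, {"c", "d"}, {"e"}}
	calls := 0
	n, err := processQueue(files, 3, 0, func() error {
		calls++ // e.g. commit after each queue file
		return nil
	})
	fmt.Println(n, calls, err) // prints 3 2 <nil>
}
```

A callback of this shape is a natural place to call Commit so that progress is persisted incrementally, although whether the package's own tooling does so is not stated here.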

func (*Crawler) Pull

func (c *Crawler) Pull(ctx context.Context, opts PullOptions) (*PullResult, error)

Pull fetches all pages changed since the last pull and queues them for sync.

func (*Crawler) ReconcileRootMd added in v0.6.0

func (c *Crawler) ReconcileRootMd(ctx context.Context) error

ReconcileRootMd syncs root.md with registries on startup:

  - Creates an empty root.md if it doesn't exist
  - Removes duplicates (by page ID)
  - Creates or updates registries to match root.md
  - Queues enabled root pages that haven't been synced yet
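
The duplicate-removal step can be sketched as an order-preserving filter keyed on PageID. Keeping the first occurrence is an assumption made for this example; the documentation only says duplicates are removed by page ID. dedupeByPageID is a hypothetical helper.

```go
package main

import "fmt"

// RootEntry mirrors the documented struct.
type RootEntry struct {
	Folder  string
	Enabled bool
	URL     string
	PageID  string
}

// dedupeByPageID keeps the first entry seen for each PageID and
// preserves the original order of the remaining entries.
func dedupeByPageID(entries []RootEntry) []RootEntry {
	seen := make(map[string]bool, len(entries))
	out := make([]RootEntry, 0, len(entries))
	for _, e := range entries {
		if seen[e.PageID] {
			continue
		}
		seen[e.PageID] = true
		out = append(out, e)
	}
	return out
}

func main() {
	entries := []RootEntry{
		{Folder: "docs", Enabled: true, PageID: "p1"},
		{Folder: "wiki", Enabled: true, PageID: "p2"},
		{Folder: "archive", Enabled: false, PageID: "p1"},
	}
	for _, e := range dedupeByPageID(entries) {
		fmt.Println(e.Folder, e.PageID)
	}
}
```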

func (*Crawler) Reindex

func (c *Crawler) Reindex(ctx context.Context, dryRun bool) error

Reindex rebuilds the registry from markdown files.

func (*Crawler) ScanPage

func (c *Crawler) ScanPage(ctx context.Context, pageID string) error

ScanPage re-scans a page to discover all child pages and queues them. This is useful for discovering new child pages under an existing root page.

func (*Crawler) SetTransaction added in v0.4.0

func (c *Crawler) SetTransaction(tx store.Transaction)

SetTransaction sets an external transaction.

func (*Crawler) Transaction added in v0.4.0

func (c *Crawler) Transaction() store.Transaction

Transaction returns the current transaction.

func (*Crawler) WriteRootMd added in v0.6.0

func (c *Crawler) WriteRootMd(ctx context.Context, manifest *RootManifest) error

WriteRootMd writes the manifest to root.md.

type CrawlerOption

type CrawlerOption func(*Crawler)

CrawlerOption configures the crawler.

func WithCrawlerLogger

func WithCrawlerLogger(l *slog.Logger) CrawlerOption

WithCrawlerLogger sets a custom logger.
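
CrawlerOption follows the functional options pattern. The sketch below shows how such an option is typically applied inside a constructor. The Crawler fields, newCrawler, and exampleLogger are stand-ins invented for this example (the real Crawler has unexported fields, and NewCrawler also takes a Notion client and a store), and defaulting to slog.Default is an assumption.

```go
package main

import (
	"log/slog"
	"os"
)

// Crawler is a stand-in with a single field for illustration.
type Crawler struct {
	logger *slog.Logger
}

// CrawlerOption mirrors the documented option type.
type CrawlerOption func(*Crawler)

// WithCrawlerLogger returns an option that replaces the logger.
func WithCrawlerLogger(l *slog.Logger) CrawlerOption {
	return func(c *Crawler) { c.logger = l }
}

// newCrawler applies each option in order on top of the defaults.
func newCrawler(opts ...CrawlerOption) *Crawler {
	c := &Crawler{logger: slog.Default()} // assumed default
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// exampleLogger builds a JSON logger that writes to stderr.
func exampleLogger() *slog.Logger {
	return slog.New(slog.NewJSONHandler(os.Stderr, nil))
}

func main() {
	c := newCrawler(WithCrawlerLogger(exampleLogger()))
	c.logger.Info("crawler ready")
}
```

The pattern lets new options be added later without changing the constructor's signature.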

type FileManifest

type FileManifest struct {
	NtnsyncVersion string    `json:"ntnsync_version"`
	FileID         string    `json:"file_id"`
	ParentPageID   string    `json:"parent_page_id"`
	DownloadedAt   time.Time `json:"downloaded_at"`
}

FileManifest is stored alongside downloaded files as {filename}.meta.json. It contains metadata for local file identification.
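
To show what such a .meta.json file might look like on disk, the sketch below marshals a FileManifest with encoding/json and reads it back. The struct is copied from the documentation; the sample values and the helper functions are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// FileManifest mirrors the documented struct and JSON tags.
type FileManifest struct {
	NtnsyncVersion string    `json:"ntnsync_version"`
	FileID         string    `json:"file_id"`
	ParentPageID   string    `json:"parent_page_id"`
	DownloadedAt   time.Time `json:"downloaded_at"`
}

// sampleManifest returns illustrative values only.
func sampleManifest() FileManifest {
	return FileManifest{
		NtnsyncVersion: "0.6.0",
		FileID:         "file-123",
		ParentPageID:   "page-456",
		DownloadedAt:   time.Date(2026, time.February, 4, 12, 0, 0, 0, time.UTC),
	}
}

// encodeManifest renders the manifest as indented JSON.
func encodeManifest(m FileManifest) ([]byte, error) {
	return json.MarshalIndent(m, "", "  ")
}

// decodeManifest parses JSON produced by encodeManifest.
func decodeManifest(data []byte) (FileManifest, error) {
	var m FileManifest
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	data, err := encodeManifest(sampleManifest())
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(data))
}
```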

type FileRegistry

type FileRegistry struct {
	NtnsyncVersion string    `json:"ntnsync_version"`
	ID             string    `json:"id"`         // File ID extracted from S3 URL
	FilePath       string    `json:"file_path"`  // Local file path (directory + name)
	SourceURL      string    `json:"source_url"` // Original S3 URL
	LastSynced     time.Time `json:"last_synced"`
}

FileRegistry is stored in .notion-sync/ids/file-{id}.json. It contains metadata for tracking downloaded files (images, attachments, etc.).

type FolderInfo

type FolderInfo struct {
	Name          string
	RootPages     int
	TotalPages    int
	OrphanedPages int
	Pages         []*PageInfo
}

FolderInfo contains information about a folder.

type FolderStatus

type FolderStatus struct {
	Name        string
	PageCount   int
	RootPages   int
	LastSynced  *time.Time
	QueuedPages int
}

FolderStatus contains status for a specific folder.

type PageInfo

type PageInfo struct {
	ID         string
	Title      string
	Path       string
	LastSynced time.Time
	IsRoot     bool
	IsOrphaned bool
	ParentID   string
	Children   []*PageInfo
}

PageInfo contains displayable information about a page.
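
When pages are displayed as a tree, the Children field can be filled by linking each page to its parent via ParentID. The sketch below shows one way to do that. It is an assumption about how a tree view could be assembled, not a description of ListPages itself. buildTree and printTree are hypothetical, and the struct is trimmed to the fields the example needs.

```go
package main

import (
	"fmt"
	"strings"
)

// PageInfo is a trimmed copy of the documented struct.
type PageInfo struct {
	ID       string
	Title    string
	ParentID string
	Children []*PageInfo
}

// buildTree attaches each page to its parent and returns the pages that
// have no known parent (roots, or pages whose parent is missing).
func buildTree(pages []*PageInfo) []*PageInfo {
	byID := make(map[string]*PageInfo, len(pages))
	for _, p := range pages {
		byID[p.ID] = p
	}
	var top []*PageInfo
	for _, p := range pages {
		parent, ok := byID[p.ParentID]
		if ok && p.ParentID != p.ID {
			parent.Children = append(parent.Children, p)
		} else {
			top = append(top, p)
		}
	}
	return top
}

// printTree prints titles indented by depth.
func printTree(pages []*PageInfo, depth int) {
	for _, p := range pages {
		fmt.Printf("%s%s\n", strings.Repeat("  ", depth), p.Title)
		printTree(p.Children, depth+1)
	}
}

func main() {
	pages := []*PageInfo{
		{ID: "r", Title: "Handbook"},
		{ID: "c1", Title: "Onboarding", ParentID: "r"},
		{ID: "g", Title: "First week", ParentID: "c1"},
	}
	printTree(buildTree(pages), 0)
}
```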

type PageRegistry

type PageRegistry struct {
	NtnsyncVersion string    `json:"ntnsync_version"`
	ID             string    `json:"id"`
	Type           string    `json:"type"` // "page" or "database"
	Folder         string    `json:"folder"`
	FilePath       string    `json:"file_path"`
	Title          string    `json:"title"`
	LastEdited     time.Time `json:"last_edited"`
	LastSynced     time.Time `json:"last_synced"`
	IsRoot         bool      `json:"is_root"`
	Enabled        bool      `json:"enabled,omitempty"` // Only meaningful for root pages
	ParentID       string    `json:"parent_id,omitempty"`
	Children       []string  `json:"children,omitempty"`
	ContentHash    string    `json:"content_hash,omitempty"`
}

PageRegistry is stored in .notion-sync/ids/page-{id}.json. It contains all metadata needed to locate and identify a page or database.
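
The registry types in this package share one naming scheme: .notion-sync/ids/{kind}-{id}.json, where kind is page, file, or user. A small helper makes the layout concrete. registryPath is hypothetical, and the exact form of the ID in the file name (for example, with or without dashes) is not specified here.

```go
package main

import (
	"fmt"
	"path"
)

// registryPath builds the documented location of a registry file,
// using forward slashes as in the documentation.
func registryPath(kind, id string) string {
	return path.Join(".notion-sync", "ids", kind+"-"+id+".json")
}

func main() {
	fmt.Println(registryPath("page", "abc123")) // prints .notion-sync/ids/page-abc123.json
	fmt.Println(registryPath("file", "f42"))    // prints .notion-sync/ids/file-f42.json
	fmt.Println(registryPath("user", "u7"))     // prints .notion-sync/ids/user-u7.json
}
```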

type PullOptions

type PullOptions struct {
	Folder   string        // Filter to specific folder (empty = all folders)
	Since    time.Duration // Override for cutoff time (0 = use LastPullTime)
	MaxPages int           // Maximum number of pages to queue (0 = unlimited)
	All      bool          // Include pages not yet tracked
	DryRun   bool          // Preview without modifying
	Verbose  bool          // Show detailed output
}

PullOptions configures the pull operation.
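
The Since field overrides the cutoff only when it is non-zero. Otherwise the stored LastPullTime is used. That rule can be sketched as follows. cutoff is a hypothetical helper, and returning the zero time when there is no previous pull is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Fixed values so the example is deterministic.
var exampleNow = time.Date(2026, time.February, 4, 12, 0, 0, 0, time.UTC)

const exampleSince = 2 * time.Hour

// cutoff returns the time after which pages count as changed.
func cutoff(now time.Time, since time.Duration, lastPull *time.Time) time.Time {
	if since > 0 {
		return now.Add(-since) // explicit override
	}
	if lastPull != nil {
		return *lastPull // documented default: LastPullTime
	}
	return time.Time{} // assumption: no previous pull, consider everything
}

func main() {
	last := exampleNow.Add(-24 * time.Hour)
	fmt.Println(cutoff(exampleNow, exampleSince, &last)) // two hours before exampleNow
	fmt.Println(cutoff(exampleNow, 0, &last))            // the last pull time
}
```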

type PullResult

type PullResult struct {
	PagesFound   int
	PagesQueued  int
	PagesSkipped int
	NewPages     int
	UpdatedPages int
	CutoffTime   time.Time
}

PullResult contains the result of a pull operation.

type QueueCallback

type QueueCallback func() error

QueueCallback is called after each queue file is processed (written or deleted).

type QueueInfo

type QueueInfo struct {
	Folder    string
	Type      string
	PageCount int
	QueueFile string
}

QueueInfo contains information about queue entries.

type RootEntry added in v0.6.0

type RootEntry struct {
	Folder  string
	Enabled bool
	URL     string
	PageID  string // Normalized, extracted from URL
}

RootEntry represents a row in root.md.
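
The PageID field is described as normalized and extracted from the URL. Notion page URLs usually end in a 32-character hexadecimal ID, sometimes written in dashed UUID form, so one plausible extraction looks like the sketch below. extractPageID is hypothetical, and the choice of lowercase without dashes as the normalized form is an assumption. The package's actual normalization is not documented here.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/url"
	"path"
	"strings"
)

// extractPageID returns the 32-hex-digit ID at the end of the URL path,
// lowercased and without dashes. ok is false if no ID is found.
func extractPageID(rawURL string) (id string, ok bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", false
	}
	// Work on the last path segment only; query strings are ignored.
	compact := strings.ReplaceAll(path.Base(u.Path), "-", "")
	if len(compact) < 32 {
		return "", false
	}
	id = strings.ToLower(compact[len(compact)-32:])
	if _, err := hex.DecodeString(id); err != nil {
		return "", false
	}
	return id, true
}

func main() {
	id, ok := extractPageID("https://www.notion.so/acme/Project-Plan-0123456789ABCDEF0123456789abcdef")
	fmt.Println(id, ok) // prints 0123456789abcdef0123456789abcdef true
}
```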

type RootManifest added in v0.6.0

type RootManifest struct {
	Entries []RootEntry
}

RootManifest represents root.md contents.

type State

type State struct {
	NtnsyncVersion   string     `json:"ntnsync_version"`
	Version          int        `json:"version"`
	Folders          []string   `json:"folders"`
	LastPullTime     *time.Time `json:"last_pull_time,omitempty"`
	OldestPullResult *time.Time `json:"oldest_pull_result,omitempty"` // Oldest page seen in last pull
}

State is persisted in .notion-sync/state.json. It is simplified to only contain folder names. Page metadata is stored in:

  - Frontmatter of markdown files (last_synced, file_path)
  - Page registries (.notion-sync/ids/page-{id}.json)

func NewState

func NewState() *State

NewState creates a new empty state.

func (*State) AddFolder

func (s *State) AddFolder(folder string)

AddFolder adds a folder to state if not already present.

func (*State) HasFolder

func (s *State) HasFolder(folder string) bool

HasFolder checks if a folder exists in state.
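
AddFolder and HasFolder together give the folder list set-like behavior. A minimal sketch of that contract follows. The method bodies are a plausible implementation written for this example, not the package's source, and the struct is trimmed to the Folders field.

```go
package main

import "fmt"

// State is a trimmed copy of the documented struct.
type State struct {
	Folders []string `json:"folders"`
}

// NewState returns a state with an empty, non-nil folder list.
func NewState() *State {
	return &State{Folders: []string{}}
}

// HasFolder reports whether folder is already recorded.
func (s *State) HasFolder(folder string) bool {
	for _, f := range s.Folders {
		if f == folder {
			return true
		}
	}
	return false
}

// AddFolder appends folder unless it is already present.
func (s *State) AddFolder(folder string) {
	if !s.HasFolder(folder) {
		s.Folders = append(s.Folders, folder)
	}
}

func main() {
	s := NewState()
	s.AddFolder("docs")
	s.AddFolder("docs") // no effect, already present
	s.AddFolder("wiki")
	fmt.Println(s.Folders) // prints [docs wiki]
}
```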

type StatusInfo

type StatusInfo struct {
	FolderCount    int
	TotalPages     int
	TotalRootPages int
	QueueEntries   []*QueueInfo
	Folders        map[string]*FolderStatus
}

StatusInfo contains sync status information.

type UserRegistry added in v0.6.0

type UserRegistry struct {
	NtnsyncVersion string    `json:"ntnsync_version"`
	ID             string    `json:"id"`
	Name           string    `json:"name"`
	Type           string    `json:"type,omitempty"`
	Email          string    `json:"email,omitempty"`
	LastFetched    time.Time `json:"last_fetched"`
}

UserRegistry is stored in .notion-sync/ids/user-{id}.json. It contains cached user information to avoid repeated API calls.
