Documentation
Overview
Package threading provides generic email thread reconstruction. It can reconstruct threading relationships even when the In-Reply-To and References headers are missing, falling back on subject matching, date proximity, participant overlap, and hints about embedded (quoted or forwarded) messages.
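When the headers are present, the parent can be read from them directly, and the heuristics only matter when they are not. Here is a minimal sketch of that header step. headerParent is a hypothetical helper, and the precedence (In-Reply-To first, then the last References entry) is an assumption, since the package does not document its order.

```go
package main

import "fmt"

// headerParent picks a parent message ID from standard headers.
// Assumption: In-Reply-To wins; otherwise the last References entry
// (the most recent ancestor) is used. An empty result means a
// subject/date/embedded-hint heuristic would have to take over.
func headerParent(inReplyTo string, references []string) string {
	if inReplyTo != "" {
		return inReplyTo
	}
	if n := len(references); n > 0 {
		return references[n-1]
	}
	return ""
}

func main() {
	fmt.Println(headerParent("", []string{"<a@x>", "<b@x>"}))
}
```

Returning an empty string rather than an error keeps the fallback decision with the caller, which is where the subject and date heuristics would live.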
Index
- func DefaultSubjectNormalizer(subject string) string
- type Config
- func DefaultConfig() Config
- type EmbeddedHint
- type Reconstructor
- func NewReconstructor() *Reconstructor
- func NewReconstructorWithConfig(config Config) *Reconstructor
- func (r *Reconstructor) AddMessage(msg ThreadableMessage)
- func (r *Reconstructor) AddMessages(msgs []ThreadableMessage)
- func (r *Reconstructor) GetThreadingInfo(msgID string) (ThreadingInfo, bool)
- func (r *Reconstructor) GetThreads() []*Thread
- func (r *Reconstructor) Reconstruct()
- func (r *Reconstructor) Stats() Stats
- type Stats
- type Thread
- type ThreadableMessage
- type ThreadingInfo
Constants
This section is empty.
Variables
This section is empty.
Functions
func DefaultSubjectNormalizer
func DefaultSubjectNormalizer(subject string) string
DefaultSubjectNormalizer removes common reply/forward prefixes from a subject line.
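The prefix list used by DefaultSubjectNormalizer is not documented here. As an illustration of a function that could be passed as Config.SubjectNormalizer, this minimal sketch repeatedly strips "Re:", "Fw:", and "Fwd:" prefixes case-insensitively. normalizeSubject and replyPrefixes are hypothetical names, and the prefix list is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// replyPrefixes is a hypothetical prefix list; the package's own list is not documented.
var replyPrefixes = []string{"re:", "fw:", "fwd:"}

// normalizeSubject strips stacked reply/forward prefixes (case-insensitively)
// and surrounding whitespace, leaving the rest of the subject's case intact.
func normalizeSubject(subject string) string {
	s := strings.TrimSpace(subject)
	for {
		stripped := false
		for _, p := range replyPrefixes {
			if len(s) >= len(p) && strings.EqualFold(s[:len(p)], p) {
				s = strings.TrimSpace(s[len(p):])
				stripped = true
				break
			}
		}
		if !stripped {
			return s
		}
	}
}

func main() {
	fmt.Println(normalizeSubject("Re: FW: re:  Q3 budget "))
}
```

Stripping in a loop handles stacked prefixes such as "Re: Fwd: Re:", which are common in long reply chains.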
Types
type Config
type Config struct {
	// MaxParentAge is the maximum time difference between a message and a
	// candidate parent for subject-based matching. Default: 7 days.
	MaxParentAge time.Duration
	// RequireParticipantOverlap requires messages to share at least one
	// participant for subject-based matching. Default: true.
	RequireParticipantOverlap bool
	// SubjectNormalizer is a custom function to normalize subjects.
	// If nil, DefaultSubjectNormalizer is used.
	SubjectNormalizer func(string) string
}
Config contains configuration options for thread reconstruction.
func DefaultConfig
func DefaultConfig() Config
DefaultConfig returns the default reconstruction configuration: a MaxParentAge of 7 days, with RequireParticipantOverlap set to true.
type EmbeddedHint
type EmbeddedHint struct {
	// SenderPattern is a pattern to match against participant addresses
	// (e.g., "john.smith" or "john.smith@enron.com").
	SenderPattern string
	// Date is the date of the embedded message (if parseable).
	Date time.Time
	// Subject is the subject of the embedded message (if available).
	Subject string
	// Type indicates the kind of embedding: "reply", "forward", or "quoted".
	Type string
}
EmbeddedHint represents information about a message embedded in the body, such as a quoted reply or forwarded message.
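SenderPattern is documented as something to match against participant addresses, with both a bare local part ("john.smith") and a full address as examples. One plausible interpretation, assumed here and not confirmed by the package, is a case-insensitive substring test; hintMatches is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// hintMatches reports whether a SenderPattern matches any participant address.
// Assumption: case-insensitive substring matching; an empty pattern never matches.
func hintMatches(senderPattern string, participants []string) bool {
	pat := strings.ToLower(strings.TrimSpace(senderPattern))
	if pat == "" {
		return false
	}
	for _, addr := range participants {
		if strings.Contains(strings.ToLower(addr), pat) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hintMatches("john.smith", []string{"John.Smith@enron.com"}))
}
```

A substring test is permissive: "john.smith" would also match "john.smithers@enron.com", so a stricter matcher might compare local parts exactly instead.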
type Reconstructor
type Reconstructor struct {
	// contains filtered or unexported fields
}
Reconstructor builds thread relationships across messages.
func NewReconstructor
func NewReconstructor() *Reconstructor
NewReconstructor creates a new thread reconstructor that uses the default configuration (see DefaultConfig).
func NewReconstructorWithConfig
func NewReconstructorWithConfig(config Config) *Reconstructor
NewReconstructorWithConfig creates a new thread reconstructor with a custom configuration.
func (*Reconstructor) AddMessage
func (r *Reconstructor) AddMessage(msg ThreadableMessage)
AddMessage adds a message to the reconstruction pool.
func (*Reconstructor) AddMessages
func (r *Reconstructor) AddMessages(msgs []ThreadableMessage)
AddMessages adds multiple messages to the reconstruction pool.
func (*Reconstructor) GetThreadingInfo
func (r *Reconstructor) GetThreadingInfo(msgID string) (ThreadingInfo, bool)
GetThreadingInfo returns the threading info for the message with ID msgID. The boolean result reports whether info for that message was found.
func (*Reconstructor) GetThreads
func (r *Reconstructor) GetThreads() []*Thread
GetThreads returns all reconstructed threads.
func (*Reconstructor) Reconstruct
func (r *Reconstructor) Reconstruct()
Reconstruct performs thread reconstruction across all messages in the pool.
func (*Reconstructor) Stats
func (r *Reconstructor) Stats() Stats
Stats returns threading statistics.
type Stats
type Stats struct {
	TotalMessages        int `json:"total_messages"`
	TotalThreads         int `json:"total_threads"`
	UniqueSubjects       int `json:"unique_subjects"`
	MessagesWithParent   int `json:"messages_with_parent"`
	MessagesWithRefs     int `json:"messages_with_refs"`
	SingleMessageThreads int `json:"single_message_threads"`
	SmallThreads         int `json:"small_threads"`  // 2-5 messages
	MediumThreads        int `json:"medium_threads"` // 6-20 messages
	LargeThreads         int `json:"large_threads"`  // 21+ messages
}
Stats contains statistics about thread reconstruction.
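The size buckets in Stats follow the ranges given in the field comments: 1, 2-5, 6-20, and 21+ messages. Here is a minimal sketch of tallying thread sizes into those buckets; sizeCounts and bucketSizes are hypothetical names, not part of the package.

```go
package main

import "fmt"

// sizeCounts is a hypothetical mirror of the four size fields in Stats.
type sizeCounts struct {
	Single, Small, Medium, Large int
}

// bucketSizes tallies thread sizes using the documented ranges.
func bucketSizes(sizes []int) sizeCounts {
	var c sizeCounts
	for _, n := range sizes {
		switch {
		case n <= 0:
			// An empty thread should not occur; skip it defensively.
		case n == 1:
			c.Single++
		case n <= 5:
			c.Small++
		case n <= 20:
			c.Medium++
		default:
			c.Large++
		}
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", bucketSizes([]int{1, 3, 12, 40}))
}
```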
type Thread
type Thread struct {
	// ID is a unique identifier for the thread.
	ID string `json:"id"`
	// Subject is the normalized subject of the thread.
	Subject string `json:"subject"`
	// RootMessageID is the MessageID of the first message in the thread.
	RootMessageID string `json:"root_message_id"`
	// MessageIDs contains all message IDs in the thread, sorted by date.
	MessageIDs []string `json:"message_ids"`
	// Participants contains all unique email addresses in the thread.
	Participants []string `json:"participants"`
	// StartDate is the date of the first message.
	StartDate time.Time `json:"start_date"`
	// EndDate is the date of the last message.
	EndDate time.Time `json:"end_date"`
	// Size is the number of messages in the thread.
	Size int `json:"size"`
}
Thread represents a collection of related messages.
type ThreadableMessage
type ThreadableMessage interface {
	// GetMessageID returns the unique message identifier.
	GetMessageID() string
	// GetDate returns the message date.
	GetDate() time.Time
	// GetSubject returns the message subject.
	GetSubject() string
	// GetInReplyTo returns the In-Reply-To header value (may be empty).
	GetInReplyTo() string
	// GetReferences returns the References header values (may be empty).
	GetReferences() []string
	// GetParticipants returns all email addresses involved in the message
	// (From, To, Cc, Bcc).
	GetParticipants() []string
	// GetEmbeddedMessageHints returns hints about embedded/quoted messages
	// that can be used for threading when headers are missing.
	GetEmbeddedMessageHints() []EmbeddedHint
	// SetThreadingInfo is called after reconstruction to provide
	// the computed threading information back to the message.
	SetThreadingInfo(info ThreadingInfo)
}
ThreadableMessage is the interface that messages must implement for thread reconstruction.
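Here is a minimal sketch of a message type that satisfies ThreadableMessage. The interface and its supporting types are copied locally from the declarations in this package so the example compiles on its own; Email and its fields are hypothetical. SetThreadingInfo uses a pointer receiver so the computed info is stored on the message rather than on a copy.

```go
package main

import (
	"fmt"
	"time"
)

// Local copies of the documented types so this sketch is self-contained.
type EmbeddedHint struct {
	SenderPattern string
	Date          time.Time
	Subject       string
	Type          string
}

type ThreadingInfo struct {
	ThreadID   string
	ParentID   string
	References []string
	Depth      int
}

type ThreadableMessage interface {
	GetMessageID() string
	GetDate() time.Time
	GetSubject() string
	GetInReplyTo() string
	GetReferences() []string
	GetParticipants() []string
	GetEmbeddedMessageHints() []EmbeddedHint
	SetThreadingInfo(info ThreadingInfo)
}

// Email is a hypothetical message type backed by already-parsed headers.
type Email struct {
	ID, Subject, InReplyTo string
	Date                   time.Time
	From                   string
	To, Cc, Bcc            []string
	References             []string
	Hints                  []EmbeddedHint
	Threading              ThreadingInfo // filled in by SetThreadingInfo
}

func (e *Email) GetMessageID() string                    { return e.ID }
func (e *Email) GetDate() time.Time                      { return e.Date }
func (e *Email) GetSubject() string                      { return e.Subject }
func (e *Email) GetInReplyTo() string                    { return e.InReplyTo }
func (e *Email) GetReferences() []string                 { return e.References }
func (e *Email) GetEmbeddedMessageHints() []EmbeddedHint { return e.Hints }

// GetParticipants returns From, To, Cc, and Bcc addresses in one slice.
func (e *Email) GetParticipants() []string {
	out := make([]string, 0, 1+len(e.To)+len(e.Cc)+len(e.Bcc))
	if e.From != "" {
		out = append(out, e.From)
	}
	out = append(out, e.To...)
	out = append(out, e.Cc...)
	return append(out, e.Bcc...)
}

// SetThreadingInfo stores the computed info on the message itself.
func (e *Email) SetThreadingInfo(info ThreadingInfo) { e.Threading = info }

// Compile-time check that *Email satisfies the interface.
var _ ThreadableMessage = (*Email)(nil)

func main() {
	e := &Email{ID: "<2@x>", From: "a@x.com", To: []string{"b@x.com"}}
	var m ThreadableMessage = e
	m.SetThreadingInfo(ThreadingInfo{ThreadID: "t1", ParentID: "<1@x>", Depth: 1})
	fmt.Println(e.Threading.ThreadID, e.GetParticipants())
}
```

Because of the pointer receiver, *Email (not Email) implements the interface, so messages would be passed as pointers, for example to AddMessages.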
type ThreadingInfo
type ThreadingInfo struct {
	// ThreadID is a unique identifier for the thread this message belongs to.
	ThreadID string
	// ParentID is the MessageID of the parent message in the thread.
	// Empty if this is a root message.
	ParentID string
	// References is the reconstructed chain of message IDs leading to this message.
	References []string
	// Depth is the nesting depth in the thread (0 for root messages).
	Depth int
}
ThreadingInfo contains the computed threading information for a message.
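References and Depth can both be derived from parent links. This sketch assumes References lists only the ancestors of a message, root first (the same order as a References header), which makes Depth equal to len(References) and 0 for a root. ancestry is a hypothetical helper, and the cycle guard is a defensive addition rather than documented behavior.

```go
package main

import "fmt"

// ancestry follows parent links from msgID up to the root and returns the
// ancestor IDs ordered root-first, plus the resulting depth. A missing map
// entry (empty parent) marks a root.
func ancestry(parent map[string]string, msgID string) (refs []string, depth int) {
	seen := map[string]bool{msgID: true}
	for p := parent[msgID]; p != ""; p = parent[p] {
		if seen[p] {
			break // cycle guard: stop instead of looping forever
		}
		seen[p] = true
		refs = append(refs, p)
	}
	// Reverse so the root comes first, matching References header order.
	for i, j := 0, len(refs)-1; i < j; i, j = i+1, j-1 {
		refs[i], refs[j] = refs[j], refs[i]
	}
	return refs, len(refs)
}

func main() {
	parent := map[string]string{"<b>": "<a>", "<c>": "<b>"}
	refs, depth := ancestry(parent, "<c>")
	fmt.Println(refs, depth)
}
```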