Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type DocumentRow ¶ added in v0.3.0
type DocumentRow struct {
	DocumentMd5 string
	UrlMd5      string
	StatusCode  int        // the HTTP status code returned during fetch
	AccessedAt  *time.Time // Nullable
	Body        *string    // Fulltext of the webpage as markdown
}
DocumentRow represents the full text of a web page. The raw HTML body is not stored (for now); the page is distilled to plain text, and a markdown version is stored on disk (again, for now).
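As a minimal sketch of how a DocumentRow might be populated after a fetch: the helper names `md5Hex` and `newDocumentRow` are hypothetical, and the example assumes that UrlMd5 is the hex MD5 of the URL string and DocumentMd5 the hex MD5 of the markdown body. The package itself may derive these values differently.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"time"
)

// DocumentRow mirrors the struct documented above.
type DocumentRow struct {
	DocumentMd5 string
	UrlMd5      string
	StatusCode  int
	AccessedAt  *time.Time
	Body        *string
}

// md5Hex is a hypothetical helper returning the hex-encoded MD5 of s.
func md5Hex(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

// newDocumentRow is a hypothetical constructor. It assumes UrlMd5 hashes the
// URL string and DocumentMd5 hashes the markdown body. accessedAt may be nil
// because the field is nullable.
func newDocumentRow(url string, status int, markdown string, accessedAt *time.Time) DocumentRow {
	return DocumentRow{
		DocumentMd5: md5Hex(markdown),
		UrlMd5:      md5Hex(url),
		StatusCode:  status,
		AccessedAt:  accessedAt,
		Body:        &markdown,
	}
}

func main() {
	now := time.Now()
	row := newDocumentRow("https://example.com", 200, "# Example\n\nHello.", &now)
	fmt.Println(row.UrlMd5, row.DocumentMd5, row.StatusCode, len(*row.Body))
}
```

Pointer fields keep "not fetched yet" distinct from "fetched but empty", which a plain string or time.Time cannot express.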
type Extractor ¶
type Extractor interface {
	GetName() string
	GetDBPath() string
	SetDBPath(string)
	GetAllUrlsSince(ctx context.Context, conn *sql.DB, since time.Time) ([]UrlRow, error)
	GetAllVisitsSince(ctx context.Context, conn *sql.DB, since time.Time) ([]VisitRow, error)
	// Verify that the passed db can actually be connected to. In the case of
	// sqlite, it's not uncommon for a db to be locked. The Open call will work
	// but the db cannot be read.
	VerifyConnection(ctx context.Context, conn *sql.DB) (bool, error)
}
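The following sketch shows one way a type could satisfy Extractor. `stubExtractor` is hypothetical, `UrlRow` and `VisitRow` are empty placeholders because their fields are not shown in this section, and the VerifyConnection body assumes an SQLite database: it reads from `sqlite_master` so that a locked file surfaces as an error even though opening it succeeded. The package's real extractors may verify the connection differently.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"time"
)

// UrlRow and VisitRow are placeholders; their real fields are not shown here.
type UrlRow struct{}
type VisitRow struct{}

// Extractor mirrors the interface documented above.
type Extractor interface {
	GetName() string
	GetDBPath() string
	SetDBPath(string)
	GetAllUrlsSince(ctx context.Context, conn *sql.DB, since time.Time) ([]UrlRow, error)
	GetAllVisitsSince(ctx context.Context, conn *sql.DB, since time.Time) ([]VisitRow, error)
	VerifyConnection(ctx context.Context, conn *sql.DB) (bool, error)
}

// stubExtractor is a hypothetical implementation used only for illustration.
type stubExtractor struct {
	name   string
	dbPath string
}

func (e *stubExtractor) GetName() string    { return e.name }
func (e *stubExtractor) GetDBPath() string  { return e.dbPath }
func (e *stubExtractor) SetDBPath(p string) { e.dbPath = p }

// A real extractor would query the browser's history tables here.
func (e *stubExtractor) GetAllUrlsSince(ctx context.Context, conn *sql.DB, since time.Time) ([]UrlRow, error) {
	return nil, nil
}

func (e *stubExtractor) GetAllVisitsSince(ctx context.Context, conn *sql.DB, since time.Time) ([]VisitRow, error) {
	return nil, nil
}

// VerifyConnection forces an actual read. Assumption: the database is SQLite,
// so sqlite_master exists and reading it fails if the file is locked.
func (e *stubExtractor) VerifyConnection(ctx context.Context, conn *sql.DB) (bool, error) {
	if conn == nil {
		return false, errors.New("nil connection")
	}
	var n int
	err := conn.QueryRowContext(ctx, "SELECT count(*) FROM sqlite_master").Scan(&n)
	if err != nil {
		return false, fmt.Errorf("db opened but not readable: %w", err)
	}
	return true, nil
}

// Compile-time check that stubExtractor satisfies Extractor.
var _ Extractor = (*stubExtractor)(nil)

func main() {
	var e Extractor = &stubExtractor{name: "stub"}
	e.SetDBPath("/tmp/history.sqlite") // hypothetical path
	ok, err := e.VerifyConnection(context.Background(), nil)
	fmt.Println(e.GetName(), e.GetDBPath(), ok, err)
}
```

The methods use pointer receivers because SetDBPath mutates the extractor; with value receivers the new path would be lost.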
type SearchableEntity ¶ added in v0.3.0
type SearchableEntity struct {
	Id          string     `json:"id"`
	Url         string     `json:"url"`
	Title       *string    `json:"title"`
	Description *string    `json:"description"`
	LastVisit   *time.Time `json:"last_visit"`
	Match       *string    `json:"match"`
	MatchCount  *int       `json:"match_count"`
	SumRank     *float64   `json:"sum_rank"`
}
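Because the struct tags carry no `omitempty`, nil pointer fields are encoded as JSON `null` rather than dropped. The sketch below illustrates this with the standard `encoding/json` package; the `encode` helper and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// SearchableEntity mirrors the struct documented above.
type SearchableEntity struct {
	Id          string     `json:"id"`
	Url         string     `json:"url"`
	Title       *string    `json:"title"`
	Description *string    `json:"description"`
	LastVisit   *time.Time `json:"last_visit"`
	Match       *string    `json:"match"`
	MatchCount  *int       `json:"match_count"`
	SumRank     *float64   `json:"sum_rank"`
}

// encode is a hypothetical helper that marshals an entity to a JSON string.
func encode(e SearchableEntity) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	title := "Example"
	out, err := encode(SearchableEntity{Id: "abc", Url: "https://example.com", Title: &title})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Unset pointer fields appear as null in the output.
	fmt.Println(out)
}
```

A client consuming this JSON should therefore treat every field other than `id` and `url` as possibly null.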
func UrlDbEntityToSearchableEntity ¶ added in v0.3.0
func UrlDbEntityToSearchableEntity(x UrlDbEntity) SearchableEntity
func UrlDbSearchEntityToSearchableEntity ¶ added in v0.4.0
func UrlDbSearchEntityToSearchableEntity(x UrlDbSearchEntity) SearchableEntity
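The bodies of these conversion functions are not shown here. As a rough sketch of what a UrlDbEntity conversion could look like, the hypothetical `toSearchable` below assumes that Id is taken from UrlMd5 and that the search-only fields (Match, MatchCount, SumRank) stay nil for a plain URL row. The actual mapping in the package may differ.

```go
package main

import (
	"fmt"
	"time"
)

// UrlDbEntity and SearchableEntity mirror the structs documented on this page.
type UrlDbEntity struct {
	UrlMd5      string
	Url         string
	Title       *string
	Description *string
	LastVisit   *time.Time
	Body        *string
	BodyMd5     *string
}

type SearchableEntity struct {
	Id          string     `json:"id"`
	Url         string     `json:"url"`
	Title       *string    `json:"title"`
	Description *string    `json:"description"`
	LastVisit   *time.Time `json:"last_visit"`
	Match       *string    `json:"match"`
	MatchCount  *int       `json:"match_count"`
	SumRank     *float64   `json:"sum_rank"`
}

// toSearchable is a hypothetical stand-in for UrlDbEntityToSearchableEntity.
// Assumption: Id comes from UrlMd5; ranking fields are left nil because a
// plain URL row carries no search match information.
func toSearchable(x UrlDbEntity) SearchableEntity {
	return SearchableEntity{
		Id:          x.UrlMd5,
		Url:         x.Url,
		Title:       x.Title,
		Description: x.Description,
		LastVisit:   x.LastVisit,
	}
}

func main() {
	title := "Example"
	s := toSearchable(UrlDbEntity{UrlMd5: "abc", Url: "https://example.com", Title: &title})
	fmt.Println(s.Id, s.Url, *s.Title, s.Match == nil)
}
```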
type UrlDbEntity ¶
type UrlDbEntity struct {
	UrlMd5      string
	Url         string
	Title       *string
	Description *string
	LastVisit   *time.Time
	Body        *string
	BodyMd5     *string
}
UrlDbEntity was initially a representation of a URL row. It was later augmented with the body, which is only available via a join.
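Since the body comes from a join, a URL with no matching document yields NULL for those columns. One common way to handle this with `database/sql` is to scan into `sql.NullString` and convert to a pointer afterwards. In the sketch below, `joinedRow`, `nullStringPtr`, and `toEntity` are hypothetical names, and the assumption that the join is a LEFT JOIN is not confirmed by this page.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// UrlDbEntity mirrors the struct documented above.
type UrlDbEntity struct {
	UrlMd5      string
	Url         string
	Title       *string
	Description *string
	LastVisit   *time.Time
	Body        *string
	BodyMd5     *string
}

// joinedRow is a hypothetical scan target for a URL row joined to its
// document. Body and BodyMd5 are NULL when no document exists.
type joinedRow struct {
	UrlMd5  string
	Url     string
	Body    sql.NullString
	BodyMd5 sql.NullString
}

// nullStringPtr converts a nullable column to *string; NULL becomes nil.
func nullStringPtr(ns sql.NullString) *string {
	if !ns.Valid {
		return nil
	}
	s := ns.String
	return &s
}

// toEntity copies the scanned columns into a UrlDbEntity.
func (r joinedRow) toEntity() UrlDbEntity {
	return UrlDbEntity{
		UrlMd5:  r.UrlMd5,
		Url:     r.Url,
		Body:    nullStringPtr(r.Body),
		BodyMd5: nullStringPtr(r.BodyMd5),
	}
}

func main() {
	withBody := joinedRow{
		UrlMd5: "abc",
		Url:    "https://example.com",
		Body:   sql.NullString{String: "# Example", Valid: true},
	}
	noBody := joinedRow{UrlMd5: "def", Url: "https://example.org"}
	fmt.Println(*withBody.toEntity().Body, noBody.toEntity().Body == nil)
}
```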
type UrlDbSearchEntity ¶ added in v0.4.0
type UrlMetaRow ¶
UrlMetaRow holds meta information about the URL.