grab

package
v0.0.0-...-16800bc
Warning

This package is not in the latest version of its module.

Published: May 7, 2025 | License: MIT | Imports: 15 | Imported by: 0

Documentation

Constants

const (
	// charset
	UTF8    = Charset("UTF-8")
	GB18030 = Charset("GB18030")
	UNKNOWN = Charset("UNKNOWN")
	// host set (keys of HostMap)
	CNBLOGS = Charset("CNBLOGS")
	CSDN    = Charset("CSDN")
	JB51    = Charset("JB51")
	CTO51   = Charset("CTO51")
	JIANSHU = Charset("JIANSHU")
	ZHIHU   = Charset("ZHIHU")
	WEIXIN  = Charset("WEIXIN")
)

Variables

var HostMap = map[Charset]string{
	CNBLOGS: "cnblogs.com",
	CSDN:    "blog.csdn.net",
	JB51:    "jb51.net",
	CTO51:   "51cto.com",
	JIANSHU: "jianshu.com",
	ZHIHU:   "zhihu.com",
	WEIXIN:  "mp.weixin.qq.com",
}
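
The page does not show how HostMap is consumed. One plausible use is mapping a page URL to a known host key. The sketch below assumes that use: it redeclares `Charset` and copies part of the map so it compiles on its own, and `matchHost` is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Charset mirrors the documented string type; it is redeclared here only so
// the sketch compiles standalone.
type Charset string

const (
	CNBLOGS = Charset("CNBLOGS")
	CSDN    = Charset("CSDN")
	WEIXIN  = Charset("WEIXIN")
)

// hostMap is a local copy of part of the documented HostMap.
var hostMap = map[Charset]string{
	CNBLOGS: "cnblogs.com",
	CSDN:    "blog.csdn.net",
	WEIXIN:  "mp.weixin.qq.com",
}

// matchHost is a hypothetical helper: it returns the key whose domain equals
// the URL's hostname or is a parent domain of it.
func matchHost(rawURL string) (Charset, bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", false
	}
	host := strings.ToLower(u.Hostname())
	for key, domain := range hostMap {
		if host == domain || strings.HasSuffix(host, "."+domain) {
			return key, true
		}
	}
	return "", false
}

func main() {
	key, ok := matchHost("https://www.cnblogs.com/someone/p/123.html")
	fmt.Println(key, ok)
}
```

Matching on a dot-prefixed suffix keeps a lookalike host such as `notcnblogs.com` from matching `cnblogs.com`.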

Functions

func AutoSummary

func AutoSummary(body string, l int) string

AutoSummary automatically extracts a summary of the article.
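
The documentation does not say how `l` is interpreted or whether `body` may contain markup. As a minimal sketch, assuming `l` is a maximum length in runes and `body` is already plain text, a summary could be produced as follows. `summarize` is a hypothetical stand-in, not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// summarize is a hypothetical stand-in for AutoSummary. It collapses runs of
// whitespace and keeps at most l runes, so multi-byte characters such as
// Chinese text are never cut in the middle.
func summarize(body string, l int) string {
	if l <= 0 {
		return ""
	}
	text := strings.Join(strings.Fields(body), " ")
	runes := []rune(text)
	if len(runes) <= l {
		return text
	}
	return string(runes[:l])
}

func main() {
	fmt.Println(summarize("Go  makes it easy\nto build simple software.", 16))
}
```

Truncating by runes rather than bytes matters for the Chinese-language sites listed in HostMap, where each character occupies several bytes in UTF-8.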

func FirstImgProcessor

func FirstImgProcessor(html string) string

FirstImgProcessor extracts the first image to use as the cover.
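
The return value is presumably the URL of the first image, though the page does not confirm this. The sketch below assumes so and uses a regular expression to pull the first `src` attribute. `firstImg` is a hypothetical name, and a real implementation might use an HTML parser instead.

```go
package main

import (
	"fmt"
	"regexp"
)

// imgSrcRe matches the first quoted src attribute inside an <img> tag.
// Requiring whitespace before "src" avoids matching attributes like data-src.
var imgSrcRe = regexp.MustCompile(`(?is)<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']`)

// firstImg is a hypothetical stand-in for FirstImgProcessor: it returns the
// src of the first <img> tag, or an empty string when there is none.
func firstImg(html string) string {
	m := imgSrcRe.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	page := `<p>intro</p><img class="hero" src="https://example.com/a.png"><img src="b.png">`
	fmt.Println(firstImg(page))
}
```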

func PreCodeLayoutProcessor

func PreCodeLayoutProcessor(html string) string

func SafetyProcessor

func SafetyProcessor(html string) string

SafetyProcessor sanitizes an HTML document, filtering out dangerous tags and attributes.

func StripTags

func StripTags(s string) string
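
StripTags has no doc comment. Going by the name alone, it likely removes HTML tags and returns the remaining text. The following is a rough sketch under that assumption. `stripTags` is hypothetical, and decoding entities and trimming whitespace are extra choices made here, not documented behavior.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// tagRe matches anything that looks like an HTML tag.
var tagRe = regexp.MustCompile(`<[^>]*>`)

// stripTags is a hypothetical stand-in for StripTags: it drops tags, decodes
// HTML entities, and trims surrounding whitespace.
func stripTags(s string) string {
	return strings.TrimSpace(html.UnescapeString(tagRe.ReplaceAllString(s, "")))
}

func main() {
	fmt.Println(stripTags("<p>Hello <b>world</b> &amp; co</p>"))
}
```

A regex like this is adequate for producing plain-text summaries but is not a sanitizer; untrusted HTML still needs a dedicated pass such as SafetyProcessor.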

Types

type Charset

type Charset string

type Grab

type Grab struct{}

func (Grab) GetHtml

func (g Grab) GetHtml(url, host string) (title string, body string, err error)

GetHtml visits the URL and fetches its HTML.
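
The page does not describe how GetHtml fetches or decodes a page, or how it uses the `host` argument. As an illustration only, a fetch that returns a title and body with the same result shape could look like the sketch below. `fetchPage` and `demo` are hypothetical, the sketch ignores `host` and charset conversion, and it runs against a local test server so it needs no network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"regexp"
	"strings"
	"time"
)

// titleRe captures the contents of the first <title> element.
var titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// fetchPage is a hypothetical stand-in for GetHtml: it downloads pageURL and
// returns the page title together with the raw HTML body.
func fetchPage(client *http.Client, pageURL string) (title, body string, err error) {
	resp, err := client.Get(pageURL)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	body = string(raw)
	if m := titleRe.FindStringSubmatch(body); m != nil {
		title = strings.TrimSpace(m[1])
	}
	return title, body, nil
}

// demo serves a fixed page from a local test server and fetches it.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html><head><title> Demo page </title></head><body><p>hi</p></body></html>")
	}))
	defer srv.Close()
	client := &http.Client{Timeout: 5 * time.Second}
	title, _, err := fetchPage(client, srv.URL)
	return title, err
}

func main() {
	title, err := demo()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(title)
}
```

Pages served as GB18030 would additionally need decoding to UTF-8 before the title regex is applied; that step is omitted here.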

func (Grab) Html2Markdown

func (g Grab) Html2Markdown(htmlstr string) (md string)

Html2Markdown converts HTML to Markdown.
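
The conversion rules are not documented. To give a feel for what an HTML-to-Markdown pass involves, here is a deliberately small sketch that handles only headings, bold text, links, and paragraphs with regular expressions. `toMarkdown` and its rule table are hypothetical; the real method may rely on a full converter library.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a pattern with its Markdown replacement template.
type rule struct {
	re   *regexp.Regexp
	repl string
}

// rules are applied in order; the last one removes any tags left over.
var rules = []rule{
	{regexp.MustCompile(`(?is)<h1\b[^>]*>(.*?)</h1>`), "# ${1}\n\n"},
	{regexp.MustCompile(`(?is)<h2\b[^>]*>(.*?)</h2>`), "## ${1}\n\n"},
	{regexp.MustCompile(`(?is)<(?:strong|b)\b[^>]*>(.*?)</(?:strong|b)>`), "**${1}**"},
	{regexp.MustCompile(`(?is)<a\b[^>]*?\shref="([^"]*)"[^>]*>(.*?)</a>`), "[${2}](${1})"},
	{regexp.MustCompile(`(?is)<p\b[^>]*>(.*?)</p>`), "${1}\n\n"},
	{regexp.MustCompile(`<[^>]*>`), ""},
}

// toMarkdown is a hypothetical stand-in for Html2Markdown covering only a
// handful of tags.
func toMarkdown(htmlstr string) string {
	md := htmlstr
	for _, r := range rules {
		md = r.re.ReplaceAllString(md, r.repl)
	}
	return strings.TrimSpace(md)
}

func main() {
	fmt.Println(toMarkdown(`<h1>Title</h1><p>Hello <strong>Go</strong>, see <a href="https://go.dev">site</a>.</p>`))
}
```

Regex rules like these break on nested or malformed markup, which is why production converters usually walk a parsed DOM instead.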
