scpr
scpr is a simple and straightforward web-scraping CLI tool that scrapes pages as markdown content, designed to be used both by humans and by coding agents (either as an MCP server or as a skill).
scpr is written in Go and based on colly for web scraping and html-to-markdown for converting HTML pages to markdown.
Installation
Install with Go (v1.24+ required):
go install github.com/AstraBert/scpr@latest
Install with NPM:
npm install @cle-does-things/scpr
Usage
Basic usage (scrape a single page):
scpr --url https://example.com --output ./scraped
This will scrape the page and save it as a markdown file in the ./scraped folder.
Recursive scraping
To scrape a page and all linked pages within the same domain:
scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 3
Parallel scraping
Speed up recursive scraping with multiple threads:
scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 2 --parallel 5
Additional options
--log - Set logging level (info, debug, warn, error)
--max - Maximum depth of pages to follow (default: 1)
--parallel - Number of concurrent threads (default: 1)
--allowed - Allowed domains for recursive scraping (can be specified multiple times)
For more details, run:
scpr --help
As a stdio MCP server
Start the MCP server with:
scpr mcp
And configure it in agents using:
{
  "mcpServers": {
    "web-scraping": {
      "type": "stdio",
      "command": "scpr",
      "args": ["mcp"],
      "env": {}
    }
  }
}
The JSON snippet above uses the configuration format expected by Claude Code; adapt it to your agent before using it.
Contributing
Contributions are welcome! Please read the Contributing Guide to get started.
License
This project is licensed under the MIT License.