Published: Oct 20, 2015 · License: MIT
Sitemap Generator

Summary

Given a starting URL, this little utility crawls a website and reports all the static assets and links it finds on that site.

Design Goals

  • Crawl an entire site and report on its structure
  • Flexible output formats (i.e. json, tab, digraph)
  • Customize performance characteristics

Design Decisions

  • The utility will stay within the same domain
  • When the utility finds a duplicate URL, it will not traverse that URL's links again, but it will still report the links found.

Features

  • Ability to save results to a file
  • Set number of worker threads/goroutines to crawl a site
  • Set rate limiter, if desired
  • Set inactivity timeout
  • Read in saved results and redisplay in different formats

How to get it

(1) If you have Docker installed:
docker run mkboudreau/sitemap ....
(2) If you have Go installed:
go get github.com/mkboudreau/sitemap
make install

Example Usage

Crawl site with sensible defaults

sitemap www.microsoft.com

Crawl site with 50 workers

sitemap -w 50 www.microsoft.com

Crawl site with rate limiting turned off

sitemap -r 0s www.microsoft.com

Crawl site and output JSON

sitemap -f json www.microsoft.com

Crawl site and output tabular format (default)

sitemap -f tab www.microsoft.com

Crawl site and output digraph (dot)

sitemap -f digraph www.microsoft.com

Crawl site and save results to file

sitemap -o saved.json www.microsoft.com

Use saved results and output as a digraph

sitemap -i saved.json -f digraph
