Analyzer 🕵️‍♂️
Analyzer is a simple web application written in Go. It allows you to gather some basic analytics about any given website. Supported analytics:
- Document Type Definition (extracting it from the HTML if possible)
- Website's title
- Details about headings
- Whether the website contains a login form (detected by searching for inputs of type password)
- List of links on the website
  - URL
  - Is it an internal or external link?
  - Number of occurrences of this link
  - Whether it's reachable (HEAD request must return 200)
  - HTTP status code for reachability check
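
The per-link details above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the names isInternal, countLinks and checkReachable, the 5 second timeout, and the example URLs are all assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// isInternal reports whether link, once resolved against pageURL,
// stays on the same host as the analyzed page (hypothetical helper).
func isInternal(pageURL, link string) (bool, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return false, err
	}
	ref, err := url.Parse(link)
	if err != nil {
		return false, err
	}
	return base.ResolveReference(ref).Host == base.Host, nil
}

// countLinks maps each distinct link to its number of occurrences.
func countLinks(links []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range links {
		counts[l]++
	}
	return counts
}

// client has a timeout (an assumed value) so that one slow host
// cannot stall the whole report.
var client = &http.Client{Timeout: 5 * time.Second}

// checkReachable sends a HEAD request; only a 200 response counts as
// reachable. A status code of 0 means the request itself failed.
func checkReachable(target string) (bool, int) {
	resp, err := client.Head(target)
	if err != nil {
		return false, 0
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, resp.StatusCode
}

func main() {
	page := "https://example.com/docs/"
	links := []string{"/about", "https://golang.org/", "/about"}
	for link, n := range countLinks(links) {
		internal, err := isInternal(page, link)
		if err != nil {
			fmt.Println("skipping unparsable link:", link)
			continue
		}
		fmt.Printf("%s internal=%t count=%d\n", link, internal, n)
	}
}
```

The main function deliberately skips checkReachable so that the sketch runs without network access. Note that http.Client follows redirects by default, so the status code returned here is the one from the final response.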
How to build and run
It's a very simple application, so not much is required to build and run this project. The necessary steps are listed below.
- Clone this repository to your local machine.
- Navigate to the project's root directory.
- Build and run the application using: go run main.go
- Open a browser and navigate to localhost:8080
- Have fun
Project details
This project uses Go modules as its dependency management tool. There's just one 3rd party library (not counting golang.org/x/net) called goquery. It's used for easy jQuery-like traversal of HTML.
Directory structure:
.
├── LICENSE
├── README.md
├── config
│   └── tmpl.go
├── go.mod
├── go.sum
├── internal
│   ├── handler
│   │   └── handlers.go
│   ├── models
│   │   ├── error.go
│   │   └── report.go
│   └── website
│       ├── analyze.go
│       ├── analyze_test.go
│       └── fetch.go
├── main.go
└── web
    └── template
        ├── index.gohtml
        └── report.gohtml
Possible suggestions and improvements 🩹
What's available right now is a nice starter with many possibilities for future improvements and new features. This section is split into two categories: one for technological improvements and one for product-related improvements.
Technology
- Dockerize this app.
- Replace goquery with golang.org/x/net/html, so that we can get rid of 3rd party dependencies.
- Better modularization.
- Improving test coverage and adding benchmark tests.
- Creating a better UI and UX.
- Meaningful logs.
- Better routing.
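
As one possible take on the "Meaningful logs" item, here is a minimal sketch of a logging middleware built on the standard library. It is not part of the project: withLogging, statusRecorder, demo, the log format and the /report path are all hypothetical choices for illustration.

```go
package main

import (
	"bytes"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// statusRecorder wraps a ResponseWriter to remember the status code written.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

// WriteHeader records the status code before passing it on.
func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// withLogging logs method, path, status and duration for every request
// handled by next.
func withLogging(logger *log.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		start := time.Now()
		// Default to 200, which is what net/http sends when a handler
		// never calls WriteHeader explicitly.
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)
		logger.Printf("method=%s path=%s status=%d duration=%s",
			req.Method, req.URL.Path, rec.status, time.Since(start))
	})
}

// demo sends one in-memory request through the middleware and returns
// the line that was logged. No real network listener is involved.
func demo() string {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0)
	h := withLogging(logger, http.NotFoundHandler())
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/report", nil))
	return strings.TrimSpace(buf.String())
}

func main() {
	log.Println(demo())
}
```

Wrapping the existing handlers this way would keep the logging in one place instead of scattering log calls across handlers.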
Product
- Gathering more analytics
  - Whether website uses JS etc.
- Extending login form checking to also include social network login buttons.
- Create a DTD extractor which would be able to determine HTML version instead of just returning whole <!DOCTYPE>.
- Improving URL parsing (for links), so that it takes local paths into account, strips whitespace etc.
- A CLI tool or just a plain API instead of a web app with a frontend.
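
Two of the product ideas above, the DTD extractor and the improved URL parsing, could start from something like the following sketch. It uses only the standard library. The function names htmlVersion and normalizeLink are hypothetical, and the doctype-to-version mapping is a simplified assumption that covers only a few common doctypes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// htmlVersion guesses the HTML version from a raw <!DOCTYPE ...> string.
// The mapping is intentionally small and not exhaustive.
func htmlVersion(doctype string) string {
	// Collapse runs of whitespace and ignore case before matching.
	d := strings.ToLower(strings.Join(strings.Fields(doctype), " "))
	switch {
	case d == "<!doctype html>":
		return "HTML5"
	case strings.Contains(d, "xhtml 1.1"):
		return "XHTML 1.1"
	case strings.Contains(d, "xhtml 1.0"):
		return "XHTML 1.0"
	case strings.Contains(d, "html 4.01"):
		return "HTML 4.01"
	default:
		return "unknown"
	}
}

// normalizeLink trims surrounding whitespace and resolves link against
// the page URL, so that local paths such as "../faq" become absolute URLs.
func normalizeLink(pageURL, link string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(strings.TrimSpace(link))
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	fmt.Println(htmlVersion("<!DOCTYPE html>"))
	fmt.Println(htmlVersion(`<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">`))

	abs, err := normalizeLink("https://example.com/docs/guide.html", "  ../faq ")
	if err != nil {
		fmt.Println("could not normalize link:", err)
		return
	}
	fmt.Println(abs)
}
```

Resolving every link against the page URL before counting occurrences would also let relative and absolute spellings of the same address be treated as one link.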