scraperlite

command module
v0.0.0-...-b154aa7
Published: Jan 26, 2025 License: BSD-3-Clause Imports: 7 Imported by: 0


Scrape text and HTML based on CSS selectors and save contents to a SQLite database.

On repeated runs, it saves any content that has changed, along with the timestamp of the observation.
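
As a rough illustration of the idea of saving changed content alongside observation timestamps, here is a minimal in-memory sketch in Go. It is not the tool's actual implementation: the `Store` and `Observation` types, the SHA-256 lookup, and the choice to log an observation on every run are all assumptions made for the example. Only the two table names, `contents` and `observations`, come from the query shown in this README.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// Observation links a point in time to a stored content row.
// (Hypothetical type; it mirrors the observations table only loosely.)
type Observation struct {
	T         time.Time
	ContentID int
}

// Store is a hypothetical in-memory stand-in for the contents and
// observations tables.
type Store struct {
	contents     []string       // index+1 plays the role of contents.id
	byHash       map[string]int // content hash -> contents.id
	observations []Observation
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{byHash: make(map[string]int)}
}

// Observe records one observation at time t. The content is stored only if
// it has not been seen before; otherwise the existing row is reused.
// It reports whether a new content row was created.
func (s *Store) Observe(t time.Time, content string) bool {
	sum := sha256.Sum256([]byte(content))
	key := hex.EncodeToString(sum[:])
	id, seen := s.byHash[key]
	if !seen {
		s.contents = append(s.contents, content)
		id = len(s.contents)
		s.byHash[key] = id
	}
	s.observations = append(s.observations, Observation{T: t, ContentID: id})
	return !seen
}

// ObserveNow is a convenience wrapper that timestamps with the current time.
func (s *Store) ObserveNow(content string) bool {
	return s.Observe(time.Now(), content)
}

func main() {
	s := NewStore()
	s.ObserveNow(`{"greeting":{"txt":"hello"}}`)
	s.ObserveNow(`{"greeting":{"txt":"hello"}}`) // unchanged: no new content row
	s.ObserveNow(`{"greeting":{"txt":"hello, world"}}`)
	fmt.Println(len(s.contents), "content rows,", len(s.observations), "observations")
}
```

Keeping content and observations separate means an unchanged page costs one small observation record per run rather than a second copy of the content.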

Example

scraperlite https://go.dev \
    popularCLIPackages.html '#main-content > section.WhyGo > div > ul > li:nth-child(2) > div.WhyGo-reasonFooter > div.WhyGo-reasonPackages > ul' \
    whyWebDevelopment.txt '#main-content > section.WhyGo > div > ul > li:nth-child(3) > div.WhyGo-reasonDetails > div.WhyGo-reasonText > p'
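
Judging from the example, the arguments after the URL come in pairs: a name with an `.html` or `.txt` extension, followed by a CSS selector. The sketch below shows one way such pairs could be parsed in Go. The `Target` type, the `parseTargets` function, and the rule that only those two extensions are accepted are assumptions made for illustration, not the tool's actual code, and the selectors in `main` are shortened placeholders.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Target is a hypothetical type pairing an output name and format
// with the CSS selector that produces it.
type Target struct {
	Name     string // e.g. "popularCLIPackages"
	Format   string // "html" or "txt"
	Selector string
}

// parseTargets interprets args as repeated "<name>.<ext> <selector>" pairs.
func parseTargets(args []string) ([]Target, error) {
	if len(args) == 0 || len(args)%2 != 0 {
		return nil, fmt.Errorf("expected name/selector pairs, got %d arguments", len(args))
	}
	targets := make([]Target, 0, len(args)/2)
	for i := 0; i < len(args); i += 2 {
		ext := filepath.Ext(args[i]) // includes the leading dot, e.g. ".html"
		format := strings.TrimPrefix(ext, ".")
		if format != "html" && format != "txt" {
			return nil, fmt.Errorf("unsupported extension in %q", args[i])
		}
		targets = append(targets, Target{
			Name:     strings.TrimSuffix(args[i], ext),
			Format:   format,
			Selector: args[i+1],
		})
	}
	return targets, nil
}

func main() {
	targets, err := parseTargets([]string{
		"popularCLIPackages.html", "#main-content ul",
		"whyWebDevelopment.txt", "#main-content p",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range targets {
		fmt.Printf("%s (%s): %s\n", t.Name, t.Format, t.Selector)
	}
}
```

Under this reading, the name and extension correspond to the JSON keys used in the query, for example `content->'popularCLIPackages'->>'html'`.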

In a sqlite3 shell:

sqlite> select t, substr(content->'popularCLIPackages'->>'html', 1, 20) || '...' as popular_packages_html,
  content->'whyWebDevelopment'->>'txt' as why_web_development
  from observations join contents on (contents.id=content_id)
  order by t;
+----------------------------------+-------------------------+-----------------------------------------------------------+
|                t                 |  popular_packages_html  |                    why_web_development                    |
+----------------------------------+-------------------------+-----------------------------------------------------------+
| 2025-01-05T18:59:27.496327-04:00 | <div class="WhyGo-re... | With enhanced memory performance and support for several  |
|                                  |                         | IDEs, Go powers fast and scalable web applications.       |
+----------------------------------+-------------------------+-----------------------------------------------------------+
