cloudcat

command module
v0.0.1
Published: Mar 10, 2023 License: AGPL-3.0 Imports: 1 Imported by: 2

README

Cloudcat

Cloudcat is a tool for extracting structured data from websites using YAML configuration. Its rule syntax is extensible.

CLI example

Analyze a model

cat << EOF | cloudcat run -m -
source:
  name: HackerNews
  http: https://news.ycombinator.com/best
  timeout: 60s
schema:
  type: array
  init:
    - gq: "#hnmain tbody -> slice(2) -> child('tr:not(.spacer,.morespace,:last-child)')"
      js: |
        content?.reduce((acc, v, i, arr) => {
          if (i % 2 === 0) {
            acc.push(arr.slice(i, i + 2).join(''));
          }
          return acc;
        }, []);
  properties:
    index:
      type: integer
      rule:
        - gq: .rank
          regex: /[^\d]/
    title: { gq: .titleline>:first-child }
    by: { gq: .hnuser }
    age: { gq: .age }
    comments:
      type: integer
      rule:
        - gq: .subline>:last-child
          regex: /[^\d]/
EOF
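
The `js` step under `init` in the model above can be tried in isolation. The following is a minimal sketch in plain JavaScript, assuming `content` is an array of strings with one entry per matched table row; the sample rows and the `pairRows` name are hypothetical and not part of cloudcat. It shows how the reduce joins every two consecutive rows into one item.

```javascript
// Hypothetical sample data: each string stands in for the HTML of one <tr>.
const rows = [
  '<tr class="athing">story 1</tr>',
  '<tr>subtext 1</tr>',
  '<tr class="athing">story 2</tr>',
  '<tr>subtext 2</tr>',
];

// Same reduce as in the model's `js` step: at every even index, join the
// current row with the row that follows it. `pairRows` is an illustrative name.
function pairRows(content) {
  return content?.reduce((acc, v, i, arr) => {
    if (i % 2 === 0) {
      acc.push(arr.slice(i, i + 2).join(''));
    }
    return acc;
  }, []);
}

const items = pairRows(rows);
console.log(items.length); // 2
console.log(items[0]);     // <tr class="athing">story 1</tr><tr>subtext 1</tr>
```

If the input has an odd number of rows, the last row ends up as an item on its own, and an undefined input yields undefined because of the optional chaining.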

Run a JS script

cat << EOF | cloudcat run -s -
const http = require('cloudcat/http');
let res = http.get('https://news.ycombinator.com/best');
let stories = cat.getElements('gq', res.string(), "#hnmain tbody -> slice(2) -> child('tr:not(.spacer,.morespace,:last-child)')");
stories?.reduce((acc, v, i, arr) => {
    if (i % 2 === 0) {
        let item = arr.slice(i, i + 2).join('');
        let index = cat.getString('gq', item, '.rank');
        let title = cat.getString('gq', item, '.titleline>:first-child');
        let by = cat.getString('gq', item, '.hnuser');
        let age = cat.getString('gq', item, '.age');
        let comments = cat.getString('gq', item, '.subline>:last-child');
        acc.push({
            index: parseInt(index?.replace(/[^\d]+/g, ''), 10),
            title: title,
            by: by,
            age: age,
            comments: parseInt(comments?.replace(/[^\d]+/g, ''), 10)
        });
    }
    return acc;
}, []);
EOF
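
The numeric cleanup used for `index` and `comments` in the script above can be checked on its own. This is a small sketch in plain JavaScript; the `toInt` helper name and the sample strings are hypothetical. It strips every non-digit character before parsing, and it shows that text without digits produces `NaN`, which a caller may want to handle.

```javascript
// Hypothetical helper; mirrors parseInt(text?.replace(/[^\d]+/g, ''), 10)
// from the script above.
function toInt(text) {
  return parseInt(text?.replace(/[^\d]+/g, ''), 10);
}

console.log(toInt('12.'));                    // 12
console.log(toInt('345 comments'));           // 345
console.log(Number.isNaN(toInt('discuss')));  // true: no digits left to parse
console.log(Number.isNaN(toInt(undefined)));  // true: missing text
```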

Documentation

See Wiki

License

cloudcat is distributed under the AGPL-3.0 license.

Directories

Path               Synopsis
analyzer           Package analyzer analyze the schema
api                Package api the api service
v1                 Package v1 the version 1 api
cache              Package cache the data store
bolt               Package bolt a low-level key/value store in pure Go
memory             Package memory the memory key/value store
cmd                Package cmd implements the command-line.
core               (module)
ctl                (module)
di                 Package di a simple dependencies injection
fetch              Package fetch the http resource
internal
ext                Package ext the extension manager
js                 Package js the JavaScript implementation
common             Package common the js common
modules            Package modules the JS module
modules/cache      Package cache the cache JS implementation
modules/cookie     Package cookie the cookie JS implementation
modules/crypto     Package crypto the crypto JS implementation
modules/encoding   Package encoding the encoding JS implementation
modules/http       Package http the http JS implementation
modulestest        Package modulestest the module test vm
jsmodules          (module)
lib
config             Package config the configuration
consts             Package consts the standard constants
logger             Package logger the log helpers
utils              Package utils the helpers
parser             Package parser the schema parser
parsers/gq         Package gq the goquery parser
parsers/js         Package js the js parser
parsers/json       Package json the json parser
parsers/regex      Package regex the regexp parser
parsers/xpath      Package xpath the xpath parser
parsers            (module)
plugin             (module)
schema             Package schema the data structure definition
