README
¶
htmlq v1.1.1
htmlq is a command-line tool that allows you to query HTML using CSS selectors or XPATH and retrieve the corresponding text content (similar to JavaScript's document.querySelector(query).textContent).
Installation
go install github.com/avan06/htmlq@latest
Usage
usage: htmlq 1.1.1 [-h|--help] [-f|--file "<value>"] [-t|--text "<value>"]
[-u|--url "<value>"] [-x|--XPATH] [-a|--SelectorAll]
[-r|--ResultAsNode] [--Trim] [--RemoveNewLine]
[--RemoveConsecutiveWhiteChars] [-l|--PrintLastResult]
[--PrintLastResultTemp] [--PrintQuerysInAllResult]
[--PrintTextInAllResult "<value>"] [-H|--headers "<value>"
[-H|--headers "<value>" ...]] [-v|--verbose] -q|--querys
"<value>" [-q|--querys "<value>" ...]
A command-line tool that allows you to query HTML using CSS
selectors or XPATH and retrieve the corresponding text
content (similar to JavaScript's
`document.querySelector(query).textContent`)
Arguments:
-h --help Print help information
-f --file Enter the relative or absolute path of the
HTML file
-t --text Enter the HTML text content
-u --url Enter the URL of the HTML
-x --XPATH Enable default XPATH query syntax instead
of CSS Selectors. Default: false
-a --SelectorAll Enable the SelectorAll mechanism. When
there are multiple results for a single
query, it will return all the results
(similar to querySelectorAll). By default,
it is not enabled, and only a single
result is returned (similar to
querySelector).. Default: false
-r --ResultAsNode Enable using the Node from the previous
query result as the current query's root
Node. Default: false
--Trim trim the result text
--RemoveNewLine remove new line in the result text
--RemoveConsecutiveWhiteChars remove consecutive white characters in the
result text
-l --PrintLastResult Enable printing the content of the last
result in the output when using the
"#lastresult" syntax in the query.
Default: false
--PrintLastResultTemp Enable printing the temporary content of
the source data as last result in the
output, when using the "#lastresult"
syntax in query. Default: false
--PrintQuerysInAllResult Enable printing of query conditions in
results.. Default: false
--PrintTextInAllResult Enter a text content for printing in all
results. If the content contains "#{serial
number}", the content of the specified
serial number's result will be
automatically printed in all results
following that serial number.
-H --headers When the query item is a table or multiple
td fields, you can enter corresponding
names for each individual field in a
single query using the format "#{serial
number}:header1Name;header2Name;header3Name;...",
where the serial number represents the Nth
query starting from zero.
-v --verbose verbose
-q --querys Enter a query value to retrieve the Node
Text of the target HTML. The default query
method is CSS Selectors, but it can be
changed to XPATH using "-x" or "--XPATH".
Both query methods can be mixed. When a
query starts with "C:", it represents CSS
Selectors, and when it starts with "X:",
it represents XPATH.
You can enter multiple query values separated by spaces and return all the
results.
* When the "--ResultNode" is enabled, you can use the following special query
values to change the current position of
the HTML Node:
- Parent: Move the Node up one level
- NextSibling: Move the Node to the next sibling in the same level
- PrevSibling: Move the Node to the previous sibling in the same level
- FirstChild: Move the Node to the first child in the same level
- LastChild: Move the Node to the last child in the same level
- reset: Restore the Node to the root node of the original input
* When using the "#lastresult" query syntax, the text content of the previous
query will automatically replace
"#lastresult".
Example: --ResultNode -q Query1 Parent Query2 NextSibling Query3 LastChild
C:Query4 reset X:Query5
Examples
Note: In command-line, to break a line in Linux you use "\", while in Windows you use "^".
htmlq --XPATH --SelectorAll -vv --PrintLastResult
-t "<table class='ws-table-all notranslate'><tbody><tr><th>Tag</th><th>Description</th></tr><tr><td><table></td><td>Defines a table</td></tr><tr><td><th></td><td>Defines a header cell in a table</td></tr><tr><td><tr></td><td>Defines a row in a table</td></tr><tr><td><td></td><td>Defines a cell in a table</td></tr><tr><td><caption></td><td>Defines a table caption</td></tr><tr><td><colgroup></td><td>Specifies a group of one or more columns in a table for formatting</td></tr><tr><td><col></td><td>Specifies column properties for each column within a colgroup element</td></tr><tr><td><thead></td><td>Groups the header content in a table</td></tr><tr><td><tbody></td><td>Groups the body content in a table</td></tr><tr><td><tfoot></td><td>Groups the footer content in a table</td></tr></tbody></table>"
--headers "#1:Tag1;Description1" --headers "#3:Tag3;Description3"
-q "//table[contains(@class,'ws-table-all')]//tr[5]"
"C:table.ws-table-all tr:nth-child(5)"
"C:table.ws-table-all tr"
"//table[contains(@class,'ws-table-all')]//tr[not(th)]"
"//table[contains(@class,'ws-table-all')]//tr/td[1]"
"//table[contains(@class,'ws-table-all')]//tr[contains(.,'#lastresult')]/td[2]"
0: //table[contains(@class,'ws-table-all')]//tr[5]
0-0:
<td>
Defines a cell in a table
1: table.ws-table-all tr:nth-child(5)
1-0:
Tag1:<td>
Description1:Defines a cell in a table
2: table.ws-table-all tr
2-0:
Tag
Description
2-1:
<table>
Defines a table
2-2:
<th>
Defines a header cell in a table
2-3:
<tr>
Defines a row in a table
2-4:
<td>
Defines a cell in a table
2-5:
<caption>
Defines a table caption
2-6:
<colgroup>
Specifies a group of one or more columns in a table for formatting
2-7:
<col>
Specifies column properties for each column within a colgroup element
2-8:
<thead>
Groups the header content in a table
2-9:
<tbody>
Groups the body content in a table
2-10:
<tfoot>
Groups the footer content in a table
3: //table[contains(@class,'ws-table-all')]//tr[not(th)]
3-0:
Tag3:<table>
Description3:Defines a table
3-1:
Tag3:<th>
Description3:Defines a header cell in a table
3-2:
Tag3:<tr>
Description3:Defines a row in a table
3-3:
Tag3:<td>
Description3:Defines a cell in a table
3-4:
Tag3:<caption>
Description3:Defines a table caption
3-5:
Tag3:<colgroup>
Description3:Specifies a group of one or more columns in a table for formatting
3-6:
Tag3:<col>
Description3:Specifies column properties for each column within a colgroup element
3-7:
Tag3:<thead>
Description3:Groups the header content in a table
3-8:
Tag3:<tbody>
Description3:Groups the body content in a table
3-9:
Tag3:<tfoot>
Description3:Groups the footer content in a table
4: //table[contains(@class,'ws-table-all')]//tr/td[1]
4-0:
<table>
4-1:
<th>
4-2:
<tr>
4-3:
<td>
4-4:
<caption>
4-5:
<colgroup>
4-6:
<col>
4-7:
<thead>
4-8:
<tbody>
4-9:
<tfoot>
5: //table[contains(@class,'ws-table-all')]//tr[contains(.,'#lastresult')]/td[2]
5-0:
<table>
Defines a table
5-1:
<th>
Defines a header cell in a table
5-2:
<tr>
Defines a row in a table
5-3:
<td>
Defines a cell in a table
5-4:
<caption>
Defines a table caption
5-5:
<colgroup>
Specifies a group of one or more columns in a table for formatting
5-6:
<col>
Specifies column properties for each column within a colgroup element
5-7:
<thead>
Groups the header content in a table
5-8:
<tbody>
Groups the body content in a table
5-9:
<tfoot>
Groups the footer content in a table
Credits
Documentation
¶
There is no documentation for this package.
Click to show internal directories.
Click to hide internal directories.