hxextract - extract selected elements from an HTML or XML file
Bugs
Remote files (specified with a URL) are currently only supported for HTTP. Password-protected files or
files that depend on HTTP "cookies" are not handled. (You can use tools such as curl(1) or wget(1) to
retrieve such files.)
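As a rough sketch of that workaround, the page can be fetched with curl(1) and piped to hxextract, which reads standard input when the file operand is "-". The function name fetch_and_extract, the cookie file name cookies.txt and the URL in the usage line are illustrative assumptions, not part of hxextract. The retrieved page must still be well-formed.

```shell
# Hypothetical helper: download a protected page with curl and extract its h2 elements.
#   $1 = URL, $2 = credentials in the form user:password
fetch_and_extract() {
    # curl handles authentication (--user) and cookies (--cookie reads them
    # from the assumed file cookies.txt); "-" makes hxextract read standard input.
    curl --silent --fail --user "$2" --cookie cookies.txt "$1" |
        hxextract h2 -
}

# Usage (illustrative URL and credentials):
#   fetch_and_extract "https://example.org/private/report.html" "alice:secret"
```

wget(1) with -O - to write to standard output could be used in the same way.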
Description
hxextract outputs all elements with a certain name and/or class.
Input must be well-formed, since no HTML heuristics are applied.
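One possible way to satisfy the well-formedness requirement is to run the input through a normalizer first. The sketch below assumes that hxnormalize(1) from the same tool collection is installed and uses its -x option to emit XML-style output. The sample markup and the temporary file are made up for illustration.

```shell
# Create a sample file that is NOT well-formed (unquoted attribute, unclosed <li>).
sample=$(mktemp)
printf '%s\n' '<ul><li class=example>One<li class=example>Two</ul>' > "$sample"

# Normalize first, then extract; skipped when the tools are not installed.
if command -v hxnormalize >/dev/null 2>&1 && command -v hxextract >/dev/null 2>&1; then
    hxnormalize -x "$sample" | hxextract -x li.example -
fi
```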
Environment
To use a proxy to retrieve remote files, set the environment variables http_proxy and ftp_proxy. E.g.,
http_proxy="http://localhost:8080/"
Name
hxextract - extract selected elements from an HTML or XML file
Operands
The following operands are supported:
element-or-class
The name of an element to extract (e.g., "H2"), or the name of a class preceded by "." (e.g.,
".example") or a combination of both (e.g., "H2.example").
file-or-URL
A file name or a URL. To read from standard input, use "-".
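The operand forms described above might be used as follows. This is a minimal sketch: the sample document and its temporary file name are invented for illustration, and lowercase element names are used to match the sample markup.

```shell
# Write a small, well-formed sample document to a temporary file.
doc=$(mktemp)
cat > "$doc" <<'EOF'
<html><body>
<h2 class="example">First</h2>
<h2>Second</h2>
<p class="example">Third</p>
</body></html>
EOF

# Run the examples only when hxextract is installed.
if command -v hxextract >/dev/null 2>&1; then
    hxextract h2 "$doc"            # select by element name
    hxextract .example "$doc"      # select by class
    hxextract h2.example "$doc"    # select by element name and class
    hxextract h2 - < "$doc"        # "-" reads the document from standard input
fi
```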
Options
The following options are supported:
-x Use XML format conventions.
-stext Insert text at the start of the output.
-etext Insert text at the end of the output.
-bbase Use base as the URL base.
-cconfigfile
Read @chapter lines from configfile (lines must be of the form "@chapter filename") and extract
elements from each of those files.
-h, -? Print command usage.
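The options can be combined. The sketch below builds a config file of "@chapter filename" lines and wraps the extracted elements in text given with -s and -e. The chapter files, the config file name chapters.conf and the wrapper markup are assumptions made for illustration. Absolute paths are used in the config file to avoid relying on how relative names are resolved.

```shell
# Create two small chapter files in a scratch directory.
work=$(mktemp -d)
printf '%s\n' '<html><body><h2>Intro</h2></body></html>' > "$work/ch1.html"
printf '%s\n' '<html><body><h2>Details</h2></body></html>' > "$work/ch2.html"

# One "@chapter filename" line per file, as the -c option expects.
printf '@chapter %s\n' "$work/ch1.html" "$work/ch2.html" > "$work/chapters.conf"

# Extract all h2 elements from the listed files, with text inserted
# before and after the output; skipped when hxextract is not installed.
if command -v hxextract >/dev/null 2>&1; then
    hxextract -x -s'<div class="toc">' -e'</div>' h2 -c"$work/chapters.conf"
fi
```

The option values are attached directly to the option letters and the -c option follows the element operand, as in the synopsis.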
See Also
hxselect(1)
7.x 10 Jul 2011 HXEXTRACT(1)
Synopsis
hxextract [ -h | -? ] [ -x ] [ -stext ] [ -etext ] [ -bbase ] element-or-class [ -cconfigfile |
file-or-URL ]
