$h=HTML::Clean->new($dataorfile,[$level]);
This creates a new HTML::Clean object, which is a prerequisite for all other functions in this module.
The $dataorfile parameter supplies the input HTML as either a filename or a reference to a scalar
holding the HTML. For example:
$h = HTML::Clean->new("/htdocs/index.html");
$html = "<strong>Hello!</strong>";
$h = HTML::Clean->new(\$html);
An optional 'level' parameter controls the level of optimization performed. Levels range from 1 to 9.
Level 1 includes only simple fast optimizations. Level 9 includes all optimizations.
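The constructor with an explicit level can be sketched as follows. The sample markup and the choice of level 1 are illustrative only. Treating a false return value from new as a failure is an assumption, since the text above does not say what new returns on error.

```perl
use strict;
use warnings;
use HTML::Clean;

# Illustrative input; any HTML string would do.
my $html = "<strong>Hello!</strong>\n";

# Pass a reference to the scalar, plus an optional level
# (1 = simple, fast optimizations only).
my $h = HTML::Clean->new(\$html, 1);

# Assumption: a false return value means the input could not be loaded.
die "could not create HTML::Clean object\n" unless $h;

print "optimization level: ", $h->level, "\n";
```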
$h->initialize($dataorfile)
This function allows you to reinitialize the HTML data used by the current object. This is useful if you
are processing many files.
$dataorfile has the same usage as the new method.
Returns 0 on error, 1 on success.
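Reusing one object across many files might look like the sketch below. The file list taken from @ARGV, the placeholder markup, and the byte-count report are hypothetical choices made for illustration. Only the 0/1 return value of initialize comes from the description above.

```perl
use strict;
use warnings;
use HTML::Clean;

# Hypothetical file list supplied on the command line.
my @files = @ARGV;

# Start from an in-memory placeholder so that one object can be reused.
my $placeholder = "<p>placeholder</p>\n";
my $h = HTML::Clean->new(\$placeholder);
die "could not create HTML::Clean object\n" unless $h;   # assumed failure check

my $processed = 0;
for my $file (@files) {
    # initialize() returns 0 on error and 1 on success.
    unless ($h->initialize($file)) {
        warn "skipping $file: could not load it\n";
        next;
    }
    $h->strip({ comments => 1 });
    my $ref = $h->data;
    print "$file: ", length($$ref), " bytes after cleaning\n";
    $processed++;
}
print "processed $processed file(s)\n";
```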
$h->level([$level])
Get/set the optimization level. $level is a number from 1 to 9.
$myref=$h->data()
Returns the current HTML data as a scalar reference.
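A short sketch of the two accessors follows. The markup and the level value 3 are arbitrary, and the check on the constructor's return value is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;
use HTML::Clean;

my $html = "<strong>Hello!</strong>\n";
my $h = HTML::Clean->new(\$html);
die "could not create HTML::Clean object\n" unless $h;   # assumed failure check

$h->level(3);             # set the optimization level
my $level = $h->level;    # read it back

# data() returns a reference, so dereference it to get the HTML text.
my $ref = $h->data;
print "level $level, ", length($$ref), " bytes of HTML\n";
```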
$h->strip(\%options)
Removes excess space from the HTML.
You can control the optimizations used by specifying them in the %options hash reference.
The following options are recognized:
Boolean values (0 or 1):
    whitespace     Remove excess whitespace.
    shortertags    <strong> -> <b>, etc.
    blink          No blink tags.
    contenttype    Remove default contenttype.
    comments       Remove excess comments.
    entities       &quot; -> ", etc.
    dequote        Remove quotes from tag parameters where possible.
    defcolor       Recode colors in shorter form (#ffffff -> white, etc.).
    javascript     Remove excess spaces and newlines in JavaScript code.
    htmldefaults   Remove default values for some HTML tags.
    lowercasetags  Translate all HTML tags to lowercase.
Parameterized values:
    meta           Takes a space-separated list of meta tags to remove.
                   The default is "GENERATOR FORMATTER".
    emptytags      Takes a space-separated list of tags to remove when there is
                   no content between the start and end tag, like this: <b></b>.
                   The default is 'b i font center'.
Please note that if your HTML includes preformatted regions (that is, <pre>...</pre> blocks),
we do not recommend removing whitespace, as doing so will alter the rendered output.
HTML::Clean prints a warning if it finds a preformatted region and is asked to strip
whitespace. To prevent this, specify that you do not want whitespace stripped, e.g.:
$h->strip( {whitespace => 0} );
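Combining boolean and parameterized options could look like the sketch below. The sample markup and the particular option values are illustrative. The option names are the ones listed above, and the exact output depends on the module's implementation, so none is shown.

```perl
use strict;
use warnings;
use HTML::Clean;

# Illustrative markup with a comment, an empty tag, and a generator meta tag.
my $html = <<'HTML';
<html><head>
<meta name="GENERATOR" content="SomeEditor">
<title>Example</title></head>
<body>
<!-- draft note -->
<strong>Hello!</strong>   <b></b>
</body></html>
HTML

# Record the size first, in case the module rewrites the scalar in place.
my $before = length $html;

my $h = HTML::Clean->new(\$html);
die "could not create HTML::Clean object\n" unless $h;   # assumed failure check

$h->strip({
    whitespace  => 1,            # boolean options take 0 or 1
    comments    => 1,
    shortertags => 1,
    meta        => 'GENERATOR',  # parameterized: space-separated list
    emptytags   => 'b i',
});

my $cleaned = ${ $h->data };
printf "%d bytes before, %d bytes after\n", $before, length($cleaned);
```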
$h->compat()
This function improves the cross-platform compatibility of your HTML. It currently checks for the
following problems:
Ensuring all IMG tags have ALT attributes.
Use of Arial, Futura, or Verdana as a font face.
Positioning the <TITLE> tag immediately after the <head> tag.
$h->defrontpage()
This function converts pages created with Microsoft FrontPage to something a Unix server will understand
a bit better. This function currently does the following:
Converts FrontPage 'hit counters' into a Unix-specific format.
Removes some FrontPage-specific HTML comments.
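One way to chain defrontpage with compat and strip is sketched below. The order of the three calls is an assumption made for illustration, not something the module requires, and the sample page is a small stand-in for a real FrontPage export.

```perl
use strict;
use warnings;
use HTML::Clean;

# Illustrative page; a real FrontPage export would contain far more markup.
my $html = <<'HTML';
<html><head><title>Home</title></head>
<body>
<font face="Arial">Welcome</font>
<img src="logo.gif">
</body></html>
HTML

my $h = HTML::Clean->new(\$html);
die "could not create HTML::Clean object\n" unless $h;   # assumed failure check

$h->defrontpage;                  # FrontPage-specific cleanup
$h->compat;                       # cross-platform compatibility checks
$h->strip({ whitespace => 1 });   # then shrink the result

# data() returns a scalar reference to the processed HTML.
my $result = ${ $h->data };
print $result;
```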