HTML::Clean - Cleans up HTML code for web browsers, not humans

Authors And Co-Authors

       Paul Lindner for the International Telecommunication Union (ITU)

       Pavel Kuptsov <admin@modernperl.ru>

Description

       The HTML::Clean module encapsulates a number of common techniques for minimizing the size of HTML files.
       These methods typically reduce the size of an HTML file by 10% to 50%.  The module provides the
       following features:

       Remove unneeded whitespace (beginning of line, etc.).
       Remove unneeded META elements.
       Remove HTML comments (except for styles, JavaScript and SSI).
       Replace tags with equivalent shorter tags (<strong> --> <b>).
       etc.

       The entire process is configurable, so you can pick and choose what you want to clean.

Name

       HTML::Clean - Cleans up HTML code for web browsers, not humans

See Also

   Modules
       FrontPage::Web, FrontPage::File

   Web Sites
       Distribution Site - http://people.itu.int/~lindner/

Synopsis

         use HTML::Clean;
         $h = HTML::Clean->new($filename);   # or..
         $h = HTML::Clean->new(\$htmlcode);  # reference to a scalar holding the HTML

         $h->compat();
         $h->strip();
         $data = $h->data();
         print $$data;

The HTML::Clean Class

   $h = HTML::Clean->new($dataorfile, [$level])
       This creates a new HTML::Clean object.  It is a prerequisite for all other methods in this module.

       The $dataorfile parameter supplies the input HTML, as either a filename or a reference to a scalar
       value holding the HTML.  For example:

         $h = HTML::Clean->new("/htdocs/index.html");
         $html = "<strong>Hello!</strong>";
         $h = HTML::Clean->new(\$html);

       An optional 'level' parameter controls the level of optimization performed.  Levels range from 1 to 9.
       Level 1 includes only simple, fast optimizations.  Level 9 includes all optimizations.

   $h->initialize($dataorfile)
       This function reinitializes the HTML data used by the current object, which is useful when you are
       processing many files with a single object.

       $dataorfile is interpreted the same way as in the new method.

       Returns 0 on error, 1 on success.
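
       A minimal sketch of reusing one object across several files.  The clean_files helper and the file
       names are hypothetical, and the sketch assumes that new returns a false value when the input cannot
       be loaded.

```perl
use strict;
use warnings;
use HTML::Clean;

# Hypothetical helper: clean each file and return a hash reference of
# filename => cleaned HTML.  Files that cannot be loaded are skipped.
sub clean_files {
    my @files = @_;
    my (%cleaned, $h);

    for my $file (@files) {
        # Build the object on first use, then reuse it via initialize().
        my $ok = $h ? $h->initialize($file)
                    : ($h = HTML::Clean->new($file));
        unless ($ok) {
            warn "skipping $file: could not load\n";
            next;
        }
        $h->strip();
        $cleaned{$file} = ${ $h->data() };   # copy the cleaned text
    }
    return \%cleaned;
}

# Hypothetical file names; substitute your own paths.
my $result = clean_files('index.html', 'about.html');
print "$_: ", length($result->{$_}), " bytes\n" for sort keys %$result;
```

       Copying the dereferenced text into the hash keeps each result independent of the object, which is
       reloaded on the next iteration.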

   $h->level([$level])
       Get/set the optimization level.  $level is a number from 1 to 9.
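
       A short sketch of reading and changing the level on an existing object.  The sample markup is made
       up for illustration.

```perl
use strict;
use warnings;
use HTML::Clean;

my $html = "<p>\n   <strong>Hello!</strong>\n</p>\n";

# Start with only the simple, fast optimizations.
my $h = HTML::Clean->new(\$html, 1) or die "could not create object\n";
my $initial = $h->level();    # read the current level

# Switch to the full set of optimizations before stripping.
$h->level(9);
$h->strip();
print ${ $h->data() };
```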

   $myref=$h->data()
       Returns the current HTML data as a scalar reference.
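
       Because data() returns a reference, dereference it to get the text.  The sketch below, using made-up
       sample markup, compares the size before and after cleaning.  It records the original length first,
       on the assumption that the module may modify the referenced scalar in place.

```perl
use strict;
use warnings;
use HTML::Clean;

my $html = "<html>\n<body>\n   <strong>Hello!</strong>   \n</body>\n</html>\n";
my $before = length $html;    # record the size before cleaning

my $h = HTML::Clean->new(\$html) or die "could not create object\n";
$h->strip();

my $ref   = $h->data();       # scalar reference to the cleaned HTML
my $after = length $$ref;
printf "%d bytes -> %d bytes\n", $before, $after;
```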

   $h->strip(\%options)
       Removes excess space from the HTML.

       You can control the optimizations used by specifying them in the %options hash reference.

       The following options are recognized:

       Boolean values (0 or 1)
                 whitespace    Remove excess whitespace.
                 shortertags   Replace tags with shorter equivalents (<strong> -> <b>, etc.).
                 blink         Remove blink tags.
                 contenttype   Remove the default content type.
                 comments      Remove excess comments.
                 entities      Replace entities with literal characters (&quot; -> ", etc.).
                 dequote       Remove quotes from tag parameters where possible.
                 defcolor      Recode colors in shorter form (#ffffff -> white, etc.).
                 javascript    Remove excess spaces and newlines in JavaScript code.
                 htmldefaults  Remove default values for some HTML tags.
                 lowercasetags Translate all HTML tags to lowercase.

       Parameterized values
                 meta        Takes a space-separated list of meta tags to remove.
                             The default is "GENERATOR FORMATTER".

                 emptytags   Takes a space-separated list of tags to remove when there is no
                             content between the start and end tag, like this: <b></b>.
                             The default is 'b i font center'.
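
       As an illustration, several options can be combined in a single hash reference.  The particular
       choices here (keeping comments, adding AUTHOR to the meta list, adding span to the empty-tag list)
       are arbitrary examples, not recommended settings, and the sample markup is made up.

```perl
use strict;
use warnings;
use HTML::Clean;

my $html = <<'END';
<html>
<head>
<meta name="GENERATOR" content="Some Editor">
</head>
<body>
<!-- keep this comment -->
<strong>Hello!</strong> <b></b>
</body>
</html>
END

my $h = HTML::Clean->new(\$html) or die "could not create object\n";

$h->strip({
    comments  => 0,                              # leave comments alone
    meta      => 'GENERATOR FORMATTER AUTHOR',   # meta tags to remove
    emptytags => 'b i font center span',         # empty tags to remove
});

print ${ $h->data() };
```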

       Please note that if your HTML includes preformatted regions (that is, <pre>...</pre> elements), we
       do not suggest removing whitespace, as it will alter the rendered output.

       HTML::Clean prints a warning if it finds a preformatted region and is asked to strip whitespace.
       To avoid the warning, specify that you do not want to strip whitespace:

         $h->strip( {whitespace => 0} );

   $h->compat()
       This function improves the cross-platform compatibility of your HTML.  It currently checks for the
       following problems:

       Ensuring all IMG tags have ALT attributes.
       Use of Arial, Futura, or Verdana as a font face.
       Positioning the <TITLE> tag immediately after the <head> tag.

   $h->defrontpage()
       This function converts pages created with Microsoft FrontPage to something a Unix server will
       understand a bit better.  It currently does the following:

       Converts FrontPage 'hit counters' into a Unix-specific format.
       Removes some FrontPage-specific HTML comments.
