Html2Wml -- Program that can convert HTML pages to WML pages

Acknowledgements

       Werner Heuser, for his numerous ideas, advice, and help with debugging

       Igor Khristophorov, for his numerous suggestions and patches

       And all the people who sent me bug reports: Daniele Frijia, Axel Jerabek, Ouyang

Actions

       Actions are a feature similar to (but with far fewer capabilities than!) the SSI (Server Side Includes)
       available on good servers like Apache. To avoid interfering with real SSI while keeping the syntax easy
       to learn, the Actions syntax differs from SSI in only a few points.

       Syntax

       Basically, the syntax to execute an action is:

           <!-- [action param1="value" param2='value'] -->

       Note that the square brackets are part of the syntax. Apart from that, Actions syntax is very
       similar to SSI syntax.
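
       As an illustration only, the comment form above can be recognised with a regular expression. The
       sketch below is written in Python and is not taken from Html2Wml itself; the pattern, the function
       name `parse_action' and the set of characters accepted in names are all assumptions.

```python
import re

# Hypothetical patterns for the comment form shown above; the parser inside
# Html2Wml may accept more (or less) than this.
ACTION_RE = re.compile(
    r"<!--\s*\[(\w+)((?:\s+[\w-]+=(?:\"[^\"]*\"|'[^']*'))*)\s*\]\s*-->"
)
PARAM_RE = re.compile(r"([\w-]+)=(?:\"([^\"]*)\"|'([^']*)')")


def parse_action(comment):
    """Return (name, params) for an action comment, or None if it does not match."""
    match = ACTION_RE.fullmatch(comment.strip())
    if match is None:
        return None
    name, raw_params = match.group(1), match.group(2)
    params = {}
    for key, double_quoted, single_quoted in PARAM_RE.findall(raw_params):
        # Exactly one of the two quoted groups is filled in; the other is "".
        params[key] = double_quoted if double_quoted else single_quoted
    return name, params
```

       Both quoting styles are accepted, and an ordinary HTML comment without the bracketed part is
       rejected by returning None.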

       Available actions

       Only a few actions are currently available, but more can be implemented on request.

       include

           Description
                   Includes a file in the document at the current point. Please note that Html2Wml neither
                   checks nor parses the file, and if the file cannot be found, it dies silently (this is the
                   same behavior as SSI).

           Parameters
                   `virtual=url' -- The file is fetched over HTTP.

                   `file=path' -- The file is read from the local disk.

       fsize

           Description
                   Returns the size of a file at the current point of the document.

           Parameters
                   `virtual=url' -- The file is fetched over HTTP.

                   `file=path' -- The file is read from the local disk.

           Notes   If you use the file parameter, an absolute path is recommended.

       skip

           Description
                   Skips everything until the first `end_skip' action.

       Generic parameters

       The following parameters can be used for any action.

       for=outputformat
           This parameter restricts the action to the given output format. Currently, the only available
           format is "`wml'" (when using `html2chtml' the format is "`chtml'").

       Examples

       If you want to share a navigation bar between several WML pages, you can `include' it this way:

           <!-- [include virtual="nav.wml"] -->

       Of course, you have to write this navigation bar first :-)

       If you want to use your current HTML pages to create your WML pages, but they contain complex
       tables, unnecessary navigation tables, etc., you can simply `skip' the complex parts and keep the rest.

           <body>
           <!--[skip for="wml"]-->
           unnecessary parts for the WML pages
           <!--[end_skip]-->
           useful parts for the WML pages
           </body>
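
       To make the effect of `skip' and the `for' parameter concrete, here is a minimal Python sketch of
       the filtering step. It is an assumption about the behaviour described above, not Html2Wml's own
       code: the function name `apply_skip' and the regular expression are invented for the illustration.

```python
import re

# Assumed comment forms, modelled on the example above.
SKIP_RE = re.compile(
    r"<!--\s*\[skip(?:\s+for=[\"']([^\"']*)[\"'])?\s*\]\s*-->"
    r".*?"                                  # non-greedy: stop at the first end_skip
    r"<!--\s*\[end_skip\]\s*-->",
    re.DOTALL,
)


def apply_skip(html, output_format="wml"):
    """Remove skipped regions that apply to output_format; leave other regions alone."""
    def replace(match):
        target = match.group(1)             # None when no for= parameter was given
        if target is None or target == output_format:
            return ""                       # the region is skipped for this format
        return match.group(0)               # restricted to another format: keep as is
    return SKIP_RE.sub(replace, html)
```

       With the page above, the result for "wml" keeps only the useful part, while the same page
       processed for "chtml" is left unchanged.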

Author

       Sebastien Aperghis-Tramoni <sebastien@aperghis.net>

Caveats

       Html2Wml tries to make correct WML documents, but the well-formedness and the validity of the document
       are not guaranteed.

       Inverted tags (like "<b>bold <i>italic</b></i>") may produce unexpected results. But only bad software
       produces markup like this.

Deck Slicing

       Deck slicing is a feature that Html2Wml provides to cope with the low memory capabilities of
       most Wap devices. Many can't handle cards larger than 2,000 bytes, so the cards must be
       sufficiently small to be viewed by all Wap devices. To achieve this, you should compile your WML deck,
       which reduces the size of the deck by about 50%, but even then your cards may be too big. This is where
       the deck slicing feature of Html2Wml comes in. It allows you to limit the size of the cards, currently
       only before the compilation stage.
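
       The idea can be sketched as a greedy packing of output fragments into cards. The Python code below
       is only an illustration under stated assumptions: the function name `slice_into_cards', the use of
       UTF-8 byte lengths and the greedy strategy are invented here, and the real slicer also takes a
       split threshold into account (see the splitting options).

```python
def slice_into_cards(fragments, max_card_size=1500):
    """Greedily group text fragments into cards of at most max_card_size bytes.

    A single fragment larger than the limit still gets a card of its own.
    """
    cards, current, size = [], [], 0
    for fragment in fragments:
        length = len(fragment.encode("utf-8"))   # size estimate: plain byte length
        if current and size + length > max_card_size:
            cards.append("".join(current))       # close the current card
            current, size = [], 0
        current.append(fragment)
        size += length
    if current:
        cards.append("".join(current))
    return cards
```

       The default of 1500 mirrors the default of the --max-card-size option.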

       Slice by cards or by decks

       On some Wap phones, slicing the deck is not sufficient: the WML browser still tries to download the whole
       deck  instead  of just picking one card at a time. A solution is to slice the WML document by decks.  See
       the figure below.

            _____________          _____________
           ⎪    deck     ⎪        ⎪   deck #1   ⎪
           ⎪  _________  ⎪        ⎪  _________  ⎪
           ⎪ ⎪ card #1 ⎪ ⎪        ⎪ ⎪  card   ⎪ ⎪
           ⎪ ⎪_________⎪ ⎪        ⎪ ⎪_________⎪ ⎪
           ⎪  _________  ⎪        ⎪_____________⎪
           ⎪ ⎪ card #2 ⎪ ⎪
           ⎪ ⎪_________⎪ ⎪             . . .
           ⎪  _________  ⎪
           ⎪ ⎪   ...   ⎪ ⎪         _____________
           ⎪ ⎪_________⎪ ⎪        ⎪   deck #n   ⎪
           ⎪  _________  ⎪        ⎪  _________  ⎪
           ⎪ ⎪ card #n ⎪ ⎪        ⎪ ⎪  card   ⎪ ⎪
           ⎪ ⎪_________⎪ ⎪        ⎪ ⎪_________⎪ ⎪
           ⎪_____________⎪        ⎪_____________⎪

             WML document           WML document
           sliced by cards        sliced by decks

       What this means is that Html2Wml generates several WML documents. In CGI mode, only the appropriate deck
       is sent, selected by the id given as a parameter. If no id was given, the first deck is sent.

       Note on size calculation

       Currently, Html2Wml estimates the size of the card on the fly, by summing the lengths of the strings that
       compose the WML output, text and tags. I say "estimates" and not "calculates" because computing the
       exact size would require many more calculations than the way it is done now. One may object that there
       are only additions, which is correct, but knowing the exact size is not necessary. Indeed, if you compile
       the WML, most of the strings of the tags will be removed, but not all.

       For example, take an image tag: `<img src="images/dog.jpg" alt="Photo of a dog">'. When compiled, the
       string `"img"' will be replaced by a one-byte value. The same goes for the strings `"src"' and `"alt"',
       and the spaces, double quotes and equal signs will be stripped. Only the text between double quotes will
       be preserved... but not in every case. Indeed, to go a step further, the compiler can also encode
       parts of the arguments as binary. For example, the string `"http://www."' can be encoded as a single
       byte (`8F' in this case). Or, if the attribute is `href', the string `href="http://' can become the byte
       `4B'.

       As you can see, knowing the exact size of the textual form of the WML does not matter, as it will always
       be far larger than the size of the compiled form. That's why I don't count all the characters that may be
       actually written.

       Also, it's because I'm quite lazy ;-)

       Why compile the WML deck?

       If you intend to create real WML pages, you should really consider always compiling them. If you're not
       convinced, here is an illustration.

       Take the following WML code snippet:

           <a href='http://www.yahoo.com/'>Yahoo!</a>

       It's the basic and classical way to code a hyperlink. It takes 42 bytes to code this, because it is
       presented in a human-readable form.

       The WAP Forum has defined a compact binary representation of WML in its specification,  which  is  called
       "compiled  WML".  It's  a  binary format, therefore you, a mere human, can't read that, but your computer
       can. And it's much faster for it to read a binary format than to read a textual format.

       The previous example would be, once compiled (and printed here as hexadecimal):

           1C 4A 8F 03 y a h o o 00 85 01 03 Y a h o o ! 00 01

       This only takes 21 bytes, half the size of the human-readable form. For a Wap device, this means both
       less to download and an easier format to read. The document can therefore be processed in a shorter
       time than the textual version of the same document.
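
       The two byte counts can be checked with a few lines of Python. This is only a counting aid: the
       helper `listing_to_bytes' is invented for the illustration and simply reads the listing the way it
       is printed above (two-character tokens as hexadecimal bytes, single characters as literal ASCII).
       It is not a WML compiler.

```python
textual = "<a href='http://www.yahoo.com/'>Yahoo!</a>"

# The compiled form, copied from the listing above.
listing = "1C 4A 8F 03 y a h o o 00 85 01 03 Y a h o o ! 00 01"


def listing_to_bytes(text):
    """Turn the printed listing back into raw bytes."""
    out = bytearray()
    for token in text.split():
        if len(token) == 2:
            out.append(int(token, 16))      # a byte written in hexadecimal
        else:
            out += token.encode("ascii")    # a literal character
    return bytes(out)


compiled = listing_to_bytes(listing)
print(len(textual.encode("ascii")), len(compiled))  # prints "42 21"
```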

       There is a last argument, and not the least important: many Wap devices only read binary WML.

Description

       Html2Wml  converts HTML pages to WML decks, suitable for being viewed on a Wap device. The program can be
       launched from a shell to statically convert a set  of  pages,  or  as  a  CGI  to  convert  a  particular
       (potentially dynamic) HTML resource.

       Although the result is not guaranteed to be valid WML, it should be the case for most pages. Good HTML
       pages will most probably produce valid WML decks. To check and correct your pages, you can use W3C's
       software: the HTML Validator, available online at http://validator.w3.org, and HTML Tidy, written by Dave
       Raggett.

       Html2Wml provides the following features:

       •   translation of the links

       •   limitation of the card size by splitting the result into several cards

       •   inclusion of files (similar to the SSI)

       •   compilation of the result (using the WML Tools, see the section on "LINKS")

       •   a debug mode to check the result using validation functions

Name

       Html2Wml -- Program that can convert HTML pages to WML pages

Options

       Please note that most of these options are also available when calling Html2Wml as a CGI. In  this  case,
       boolean  options  are given the value "1" or "0", and other options simply receive the value they expect.
       For example, `--ascii' becomes `?ascii=1' or `?a=1'. See the file t/form.html for an example  on  how  to
       call Html2Wml as a CGI.
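
       As a sketch of this mapping, the following Python helper builds such a CGI URL. The helper name
       `cgi_url' is invented for the example; the script path and the `url' parameter are taken from the
       Synopsis, and which option names the CGI actually accepts should be checked against t/form.html.

```python
from urllib.parse import urlencode


def cgi_url(script, url, options=None):
    """Build a CGI request URL; boolean options are sent as "1" or "0"."""
    params = {"url": url}
    for name, value in (options or {}).items():
        if isinstance(value, bool):
            value = int(value)              # True -> 1, False -> 0
        params[name] = value
    # safe="/" keeps slashes readable instead of encoding them as %2F.
    return script + "?" + urlencode(params, safe="/")
```

       For instance, cgi_url("/cgi-bin/html2wml.cgi", "/index.html", {"ascii": True}) gives
       /cgi-bin/html2wml.cgi?url=/index.html&ascii=1.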

       Conversion Options

       -a, --ascii
           When  this  option  is  on,  named  HTML  entities and non-ASCII characters are converted to US-ASCII
           characters using the same 7 bit approximations as Lynx. For example, `&copy;' is translated to "(c)",
           and `&szlig;' is translated to "ss". This option is off by default.

       --[no]collapse
           This option tells Html2Wml to collapse redundant whitespace, tabs, carriage returns, line
           feeds and empty paragraphs. The aim is to reduce the size of the WML document as much as possible.
           Collapsing empty paragraphs is necessary for two reasons. First, this avoids empty screens (and on a
           device with only 4 lines of display, an empty screen can be quite annoying). Second, Html2Wml creates
           many empty paragraphs when converting, because of the way the syntax reconstructor is programmed.
           Deleting these empty paragraphs is as necessary as cleaning the kitchen :-)

           If this really bothers you, you can deactivate this behaviour with the --nocollapse option.
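
           A rough Python sketch of what such a collapsing pass does is given below. It is an
           approximation written for this page, not the program's own code; in particular, the function
           name `collapse' is invented and only plain `<p>' tags without attributes are handled.

```python
import re


def collapse(wml):
    """Drop empty paragraphs, then squeeze every whitespace run into one space."""
    wml = re.sub(r"<p>\s*</p>", "", wml)    # paragraphs holding only whitespace
    wml = re.sub(r"\s+", " ", wml)          # spaces, tabs, CR and LF alike
    return wml.strip()
```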

       --ignore-images
           This option tells Html2Wml to completely ignore all image links.

       --[no]img-alt-text
           This option tells Html2Wml to replace the image tags with their corresponding  alternative  text  (as
           with a text mode web browser).  This option is on by default.

       --[no]linearize
           This option is on by default. It makes Html2Wml flatten the HTML tables (they are linearized), as
           Lynx does. I think this is better than trying to use the native WML tables. First, they have
           extremely limited features and possibilities compared to HTML tables. In particular, they can't be
           nested. This is to be expected, because Wap devices are not supposed to have a big CPU running at
           some zillions of hertz, and the calculations needed to render tables are the most complicated and
           CPU-hungry part of HTML.

           Second, as they can't be nested, and as typical HTML pages rely heavily on nested tables to create
           their layout, it's impossible to decide which one could be kept. So the best thing is to keep none of
           them.

           [Note] Although you can deactivate this behaviour, and although there is internal support for tables,
           the unlinearized mode has not been heavily tested with nested tables, and it may  produce  unexpected
           results.

       -n, --numeric-non-ascii
           This option tells Html2Wml to convert all non-ASCII characters to numeric entities, i.e., "é" becomes
           `&#233;', and "ß" becomes `&#223;'. By default, this option is off.
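
           The conversion itself is easy to reproduce. The Python sketch below uses the standard
           `xmlcharrefreplace' error handler, which writes each non-ASCII character as a decimal numeric
           entity (é is code point 233, ß is code point 223). The function name is invented for the
           example, and this is not how Html2Wml implements the option.

```python
def numeric_non_ascii(text):
    """Replace every non-ASCII character with its decimal numeric entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```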

       -p, --nopre
           This option tells Html2Wml not to use the <pre> tag. This option was added because the compiler from
           WML Tools 0.0.4 doesn't support this tag.

       Links Reconstruction Options

       --hreftmpl=TEMPLATE
           This option sets the template that will be used to reconstruct the `href'-type links. See the
           section on "LINKS RECONSTRUCTION" for more information.

       --srctmpl=TEMPLATE
           This option sets the template that will be used to reconstruct the `src'-type links. See the  section
           on "LINKS RECONSTRUCTION" for more information.

       Splitting Options

       -s, --max-card-size=SIZE
           This  option  allows you to limit the size (in bytes) of the generated cards. Default is 1,500 bytes,
           which should be small enough to be loaded on most Wap devices. See the section on "DECK SLICING"  for
           more information.

       -t, --card-split-threshold=SIZE
           This  option sets the threshold of the split event, which can occur when the size of the current card
           is between `max-card-size' - `card-split-threshold' and `max-card-size'. Default value is 50. See the
           section on "DECK SLICING" for more information.
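
           Read literally, the split window can be written as a simple range test. The Python sketch
           below is one interpretation of the sentence above: the function name `in_split_window' is
           invented, and whether the bounds are inclusive in the real program is an assumption.

```python
def in_split_window(card_size, max_card_size=1500, threshold=50):
    """True when a split may occur for a card of the given estimated size."""
    # Bounds are assumed to be inclusive on both ends.
    return max_card_size - threshold <= card_size <= max_card_size
```

           With the default values, the window runs from 1,450 to 1,500 bytes.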

       --next-card-label=STRING
           This option sets the label of the link that points to the next card. Default is "[&gt;&gt;]", which
           will be rendered as "[>>]".

       --prev-card-label=STRING
           This option sets the label of the link that points to the previous card. Default is "[&lt;&lt;]",
           which will be rendered as "[<<]".

       HTTP Authentication

       -U, --http-user=USERNAME
           Use this option to set the username for an authenticated request.

       -P, --http-passwd=PASSWORD
           Use this option to set the password for an authenticated request.

       Proxy Support

       -[no]Y, --[no]proxy
           Use this option to activate proxy support. By default, proxy support is activated. See the section on
           "PROXY SUPPORT".

       Output Options

       -k, --compile
           Setting this option tells Html2Wml to use the compiler from WML Tools to compile the WML deck. If you
           want to create a real Wap site, you should seriously consider using this option in order to reduce
           the size of the WML decks. Remember that Wap devices have very little memory. If this is not enough,
           use the splitting options.

           Take a look in wml_compilation/ for more information on how to use a WML compiler with Html2Wml.

       -o, --output
           Use this option (in shell mode) to specify an output file.  By default, Html2Wml prints the result to
           standard output.

       Debugging Options

       -d, --debug[=LEVEL]
           This option activates the debug mode. This prints the output result with line numbering and with the
           result of the XML check. If the WML compiler was called, the result is also printed in hexadecimal and
           ASCII forms. When called as a CGI, all of this is printed as HTML, so that you can use any web browser
           for that purpose.

       --xmlcheck
           When this option is on, Html2Wml sends the WML output to XML::Parser to check its well-formedness.

Proxy Support

       Html2Wml uses LWP's built-in proxy support. It is activated by default, and loads the proxy settings from
       environment variables, using the same variables as many other programs. Each protocol (http, ftp,
       etc.) can be mapped to use a proxy server by setting a variable of the form `PROTOCOL_proxy'. For example,
       use `http_proxy' to define the proxy for http access and `ftp_proxy' for ftp access. In the shell, this is
       only a matter of defining the variable.

       For Bourne shell:

           $ export http_proxy="http://proxy.domain.com:8080/"

       For C-shell:

           % setenv http_proxy "http://proxy.domain.com:8080/"
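
       To illustrate the `PROTOCOL_proxy' naming convention, here is a small Python sketch that collects
       such variables into a protocol-to-proxy mapping. It only mirrors the convention described above
       and is not how LWP reads its settings; the function name `proxies_from_env' is invented, and
       skipping `no_proxy' (commonly used for exclusions rather than for a protocol) is an assumption.

```python
import os


def proxies_from_env(environ=None):
    """Map protocol names to proxy URLs found in PROTOCOL_proxy variables."""
    environ = os.environ if environ is None else environ
    suffix = "_proxy"
    proxies = {}
    for name, value in environ.items():
        if name.endswith(suffix) and name != "no_proxy" and value:
            proxies[name[:-len(suffix)]] = value   # "http_proxy" -> "http"
    return proxies
```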

       Under Apache, you can add this directive to your configuration file:

           SetEnv http_proxy "http://proxy.domain.com:8080"

       but this has the drawback that another CGI, or another program, can use this to access external resources.
       A better way is to edit Html2Wml and fill the option `proxy-server' with the appropriate value.

Synopsis

       Html2Wml can be used as either a shell command:

         $ html2wml file.html

       or as a CGI:

         /cgi-bin/html2wml.cgi?url=/index.html

       In both cases, the file can be either a local file or a URL.

See Also