logo
Free, unlimited AI code reviews that run on commit
git-lrc git-lrc GitHub Install Now We'd appreciate a star git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt

URI::Normalize - Normalize URIs according to RFC 3986

Author

       Andrew Sterling Hanenkamp <hanenkamp@cpan.org>

Caveats

       As RFC 3986 notes, normalization is a tool used to help identify whether one URI is equivalent to
       another. This does not, however, imply that the resources identified by two URIs that are different byte-
       for-byte but normalize to the same value will be the same. For example, the presence of "/./" in the path
       might be significant to an implementation, or using octet-encoding (e.g., "%3a") instead of the character
       represented might represent a different actual resource. So use normalization judiciously.

       This implementation of normalization is far from comprehensive. There are many normalizations you may
       wish to perform. In that case, you may want to look into URL::Normalize, which provides a more
       comprehensive list of normalizations, some of which go against the letter of RFC 3986, but can be
       valuable in certain applications.

       This implementation does not include the full gamut of what is keeping with the letter of RFC 3986, but
       might be expanded to include additional normalizations in the future.

       If you know of a normalization that could be implemented here: patches welcome.

Description

       Section 6 of RFC 3986 describes a process of URI normalization. This implements syntax-based
       normalization and may include some schema-based and protocol-based normalization. This includes
       implementing the "remove_dot_segments" algorithm described in Section 5.2.3 of the RFC.

       This has a number of useful applications in allowing URIs to be compared with fewer false negatives. For
       example, all of the following URIs will normalize to the same value:

           HTTPS://www.example.com:443/../test/../foo/index.html
           https://WWW.EXAMPLE.COM/./foo/index.html
           https://www.example.com/%66%6f%6f/index.html
           https://www.example.com/foo/index.html

       That is, they will all be normalized into the last value.

Name

       URI::Normalize - Normalize URIs according to RFC 3986

Subroutines

normalize_uri
           $normal_uri = normalize_uri($uri);
           $normal_uri = normalize_uri($str);

       Given a URI object or a string, this routine basically just calls the "canonical" method and
       "remove_dot_segments" on the URI and returns the result as a URI object.

       The original URI is left unchanged.

   remove_dot_segments
           $clean_path_uri = remove_dot_segments($uri);
           $clean_path_uri = remove_dot_segments($str);

       Given a URI object or a string, this routine will remove dot segments (i.e, "."  and "..") from the path
       of the URI.

Synopsis

           use URI;
           use URI::Normalize qw( normalize_uri remove_dot_segments );
           my $uri = URI->new('HTTPS://www.Example.com:443/../test/../foo/index.html');

           say normalize_uri($uri);       #> https://www.example.com/foo/index.html
           say remove_dot_segments($uri); #> HTTPS://www.Example.com:443/foo/index.html

Version

       version 0.002

See Also