BasicUrlNormalizer Notes
The Basic URL Normalizer class normalizes a URL in several ways:
- Trims white space from both ends of the URL (java.lang.String.trim()).
- May lowercase the protocol (java.net.URL).
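- If the protocol is http or ftp:
  - Lowercases the host.
  - Removes the port if it is the default for the protocol.
  - Adds a trailing slash if no file is specified.
  - Removes any reference text (the fragment after "#").
  - Removes any relative path segments (such as "somedirectory/../").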
For example:
http://wiKI.apache.ORG:80/somedirectory/../DevelopmentCommandLineOptions
would be rewritten as:
http://wiki.apache.org/DevelopmentCommandLineOptions
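The steps above can be sketched in Java roughly as follows. This is a minimal illustration built on java.net.URL, not the actual BasicUrlNormalizer source: the class name UrlNormalizerSketch and the helper removeRelativePaths are hypothetical, and the regular expressions are a simplified assumption about how relative path segments might be collapsed.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

// Hypothetical sketch of the normalization steps; not the real class.
final class UrlNormalizerSketch {

    // Hypothetical helper: collapses "/./" and "/dir/../" segments.
    static String removeRelativePaths(String file) {
        String previous;
        do {
            previous = file;
            file = file.replaceAll("/\\./", "/");
            // Drop one "segment/../" pair per pass; never treat ".." as the segment.
            file = file.replaceFirst("/(?!\\.\\./)[^/]+/\\.\\./", "/");
        } while (!file.equals(previous));
        return file;
    }

    static String normalize(String urlString) {
        urlString = urlString.trim();            // trim white space
        URL url;
        try {
            url = new URL(urlString);            // parsing may lowercase the protocol
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + urlString, e);
        }
        String protocol = url.getProtocol();

        // Assumption: other protocols are returned as trimmed, with no further changes.
        if (!"http".equals(protocol) && !"ftp".equals(protocol)) {
            return urlString;
        }

        String host = url.getHost().toLowerCase(Locale.ROOT);   // lowercase host
        int port = url.getPort();
        if (port == url.getDefaultPort()) {
            port = -1;                                           // drop default port
        }
        String file = url.getFile();   // path plus query; excludes the reference
        if (file == null || file.isEmpty()) {
            file = "/";                                          // add trailing slash
        }
        file = removeRelativePaths(file);

        StringBuilder sb = new StringBuilder();
        sb.append(protocol).append("://").append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        return sb.append(file).toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize(
            "http://wiKI.apache.ORG:80/somedirectory/../DevelopmentCommandLineOptions"));
    }
}
```

Because the sketch rebuilds the URL from getFile(), the reference text is dropped without an explicit removal step. Note that new URL(String) is deprecated in recent Java releases; it is used here only because these notes describe normalization in terms of java.net.URL.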
Notes
Other than trimming white space and the normalization performed by java.net.URL, protocols other than http and ftp are not normalized further.