Rewrite rules in Apache Knox can be difficult to follow if you are just starting to use Apache Knox. This blog covers the basics of Apache Knox rewrite rules and then goes in depth on more advanced rules and how to use them. It builds upon the blog "Adding a service to Apache Knox" by Kevin Minder.
Rules are defined in the rewrite.xml file. For example:
<rules>
    <rule dir="IN" name="WEATHER/weather/inbound" pattern="*://*:*/**/weather/{path=**}?{**}">
        <rewrite template="{$serviceUrl[WEATHER]}/{path=**}?{**}"/>
    </rule>
</rules>
A sample service.xml entry:
<service role="WEATHER" name="weather" version="0.0.1">
    <routes>
        <route path="/weather/**"/>
    </routes>
</service>
The service.xml file defines the high-level URL pattern that the gateway exposes for a service.
<service role="WEATHER">
<service name="weather">
<service version="0.0.1">
<service><routes><route path="/weather/**"/></routes></service>
** means zero or more path segments, similar to Ant.
<rules><rule pattern="*://*:*/**/weather/{path=**}?{**}"/></rules>
<rules><rule><rewrite template="{$serviceUrl[WEATHER]}/{path=**}?{**}"/></rule></rules>
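To make the ** wildcard concrete, here is a minimal Python sketch of Ant-style matching on path segments. It is an illustration only, not Knox's own matcher: the function name path_matches is hypothetical, it assumes that * stands for exactly one segment (the usual Ant convention), and it ignores the scheme, host, port and query parts of the pattern.

```python
def _segments(path):
    """Split a path on '/' and drop empty segments."""
    return [segment for segment in path.split("/") if segment]


def path_matches(pattern, path):
    """Return True if path matches an Ant-style pattern.

    Assumed semantics (hypothetical, for illustration):
      *   matches exactly one path segment
      **  matches zero or more path segments
    """
    def walk(pattern_parts, path_parts):
        if not pattern_parts:
            # Pattern exhausted: match only if the path is exhausted too.
            return not path_parts
        head, rest = pattern_parts[0], pattern_parts[1:]
        if head == "**":
            # Try letting ** swallow 0, 1, 2, ... segments.
            return any(walk(rest, path_parts[i:])
                       for i in range(len(path_parts) + 1))
        if not path_parts:
            return False
        if head == "*" or head == path_parts[0]:
            return walk(rest, path_parts[1:])
        return False

    return walk(_segments(pattern), _segments(path))


print(path_matches("/weather/**", "/weather"))         # True
print(path_matches("/weather/**", "/weather/a/b"))     # True
print(path_matches("/weather/*", "/weather/a/b"))      # False
print(path_matches("/weather/**", "/forecast/today"))  # False
```

The first call shows the "zero" case: /weather/** still matches when nothing follows /weather.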
Rewrite rules can be global or local to the service they are defined in. After Apache Knox 0.6.0, all rewrite rules are local unless they are explicitly defined as global.
To define global rules, use the property 'gateway.global.rules.services' in 'gateway-site.xml'. It takes a list of services whose rewrite rules are made global, e.g.
<property>
    <name>gateway.global.rules.services</name>
    <value>"NAMENODE","JOBTRACKER", "WEBHDFS", "WEBHCAT", "OOZIE", "WEBHBASE", "HIVE", "RESOURCEMANAGER"</value>
</property>
Note: Rewrite rules for the services "NAMENODE","JOBTRACKER", "WEBHDFS", "WEBHCAT", "OOZIE", "WEBHBASE", "HIVE", "RESOURCEMANAGER" are global by default.
If you want to limit a single rule to one service within a set of global rewrite rules, you can do so with the 'scope' attribute, e.g.
<!-- Limit the scope of this rule just to WEBHDFS service -->
<rule dir="OUT" scope="WEBHDFS" name="WEBHDFS/webhdfs/outbound" pattern="hdfs://*:*/{path=**}?{**}">
    <rewrite template="{$frontend[url]}/webhdfs/v1/{path=**}?{**}"/>
</rule>
Rewrite rules can be applied to inbound requests (requests going to the Gateway from a browser, curl, etc.) or to outbound responses (responses going from the Gateway towards the browser). The direction is indicated by the "dir" attribute:
<rule dir="IN">
The possible values are IN for inbound requests and OUT for outbound responses.
Flows are the logical AND, OR and ALL operators on the rules. A rewrite rule can match pattern A OR pattern B, it can match pattern A AND pattern B, or it can match ALL the given patterns.
Valid flow values are AND, OR and ALL.
For example, with flow="OR":
<rule name="test-rule-with-complex-flow" flow="OR">
    <match pattern="*://*:*/~/{path=**}?{**}">
        <rewrite template="test-scheme-output://test-host-output:777/test-path-output/test-home/{path}?{**}"/>
    </match>
    <match pattern="*://*:*/{path=**}?{**}">
        <rewrite template="test-scheme-output://test-host-output:42/test-path-output/{path}?{**}"/>
    </match>
</rule>
These variables can be used with the rewrite function.
$username
Username of the authenticated user.
<rule name="OOZIE/oozie/user-name">
    <rewrite template="{$username}"/>
</rule>
$inboundurl
<rule dir="OUT" name="NODEUI/node/static" pattern="/static/{**}">
    <rewrite template="{$frontend[url]}/node/static/{**}?host={$inboundurl[host]}"/>
</rule>
$serviceAddr
<rule name="hdfs-addr">
    <rewrite template="hdfs://{$serviceAddr[NAMENODE]}"/>
</rule>
$serviceHost
<rule name="nn-host">
    <rewrite template="{$serviceHost[NAMENODE]}"/>
</rule>
$serviceMappedAddr
<rule name="OOZIE/oozie/name-node-url">
    <rewrite template="hdfs://{$serviceMappedAddr[NAMENODE]}"/>
</rule>
$serviceMappedHost
$serviceMappedUrl
<match pattern="{path=**}">
    <rewrite template="{$serviceMappedUrl[NAMENODE]}/{path=**}"/>
</match>
$servicePath
<rule name="nn-path">
    <rewrite template="{$servicePath[NAMENODE]}"/>
</rule>
$servicePort
<rule name="hdfs-path">
    <match pattern="{path=**}"/>
    <rewrite template="hdfs://{$serviceHost[NAMENODE]}:{$servicePort[NAMENODE]}/{path=**}"/>
</rule>
$serviceScheme
<rule dir="IN" name="NODEUI/logs" pattern="*://*:*/**/node/logs/?{host}?{port}">
    <rewrite template="{$serviceScheme[NODEUI]}://{host}:{port}/logs/"/>
</rule>
$import - This function enhances the $frontend function by adding an '@import' prefix to the $frontend path, e.g.
<rewrite template="{$import[&quot;, url]}/stylesheets/pretty.css&quot;;"/>
It takes the following parameters as options:
$username - This variable is used when we need to get the impersonated principal name (or the primary principal when no impersonated principal is present).
<rewrite template="test-output-scheme://{host}:{port}/test-output-path/{path=**}?user.name={$username}?{**}?test-query-output-name=test-query-output-value"/>
$prefix - This function enhances the $frontend function just like $import, but lets you choose the prefix (unlike the constant @import used by $import), e.g.
<rewrite template="{$prefix[',url]}/zeppelin/components/{**}?{**}"/>
zeppelin/components/navbar/navbar.html?v=1498928142479' (mind the single tick ' )
$postfix - Just like $prefix, the $postfix function appends a character or string to the gateway URL (including the topology path).
usage - {$postfix[url,<customString>]}
<rewrite template="{scheme}://{host}:{port}/{gateway}/{knoxsso}/{api}/{v1}/{websso}?originalUrl={$postfix[url,/sparkhistory/]}"/>
usage - {$infix[<customString>,url,<customString>]}
<rewrite template="{scheme}://{host}:{port}/{gateway}/{sandbox}/?query={$infix[',url,/sparkhistory/']}"/>
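The effect of $prefix, $postfix and $infix is easiest to see as plain string concatenation around the gateway URL. The Python sketch below only mimics that idea. The function names and the frontend URL https://knox.example.com:8443/gateway/sandbox are assumptions for illustration, and any URL encoding Knox may apply is not modelled.

```python
# Hypothetical gateway URL, including the topology path (an assumption).
FRONTEND_URL = "https://knox.example.com:8443/gateway/sandbox"


def prefix(custom, url):
    """Mimic {$prefix[<customString>,url]}: custom string before the URL."""
    return custom + url


def postfix(url, custom):
    """Mimic {$postfix[url,<customString>]}: custom string after the URL."""
    return url + custom


def infix(before, url, after):
    """Mimic {$infix[<customString>,url,<customString>]}: strings on both sides."""
    return before + url + after


print(prefix("'", FRONTEND_URL))
print(postfix(FRONTEND_URL, "/sparkhistory/"))
print(infix("'", FRONTEND_URL, "/sparkhistory/'"))
```

With the assumed URL, the three calls yield the URL with a leading single tick, the URL followed by /sparkhistory/, and the URL followed by /sparkhistory/ wrapped in single ticks.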
The purpose of the Hostmap provider is to handle situations where hosts are known by one name within the cluster and by another name externally. This frequently occurs when virtual machines are used, in particular with cloud hosting services. Currently, the Hostmap provider is configured as part of the topology file.
For more information, see the Knox User Guide.
Rewrite rule example:
<rewrite template="{gateway.url}/hdfs/logs?{scheme}?host={$hostmap(host)}?{port}?{**}"/>
Topology declaration example:
<topology>
    <gateway>
        ...
        <provider>
            <role>hostmap</role>
            <name>static</name>
            <enabled>true</enabled>
            <param><name>external-host-name</name><value>internal-host-name</value></param>
        </provider>
        ...
    </gateway>
    ...
</topology>
$inboundurl - Only used by outbound rules.
<rewrite template="{gateway.url}/datanode/static/{**}?host={$inboundurl[host]}"/>
Sometimes you want the ability to rewrite *.js, *.css and other non-HTML pages. Filters are a way to rewrite these non-HTML files. Filters are based on the content type of the page.
These are the different types of filters that are supported by Apache Knox.
Three declarations are needed for filters: the rewrite rules, the filter that applies them (both in rewrite.xml), and the route that applies the filter (in service.xml).
This is an example of filters used in proxying the Zeppelin UI. The relevant code snippets in the rewrite.xml and service.xml files are:
<!-- Filters -->
<rule dir="OUT" name="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/home">
    <rewrite template="{$frontend[path]}/zeppelin/app/home/home.html"/>
</rule>
<rule dir="OUT" name="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/notebook">
    <rewrite template="{$frontend[path]}/zeppelin/app/notebook/notebook.html"/>
</rule>
<rule dir="OUT" name="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/jobmanager">
    <rewrite template="{$frontend[path]}/zeppelin/app/jobmanager/jobmanager.html"/>
</rule>
<filter name="ZEPPELINUI/zeppelin/outbound/javascript/filter">
    <content type="application/javascript">
        <apply path="app/home/home.html" rule="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/home"/>
        <apply path="app/notebook/notebook.html" rule="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/notebook"/>
        <apply path="app/jobmanager/jobmanager.html" rule="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/jobmanager"/>
    </content>
</filter>
<!-- Filter -->
<route path="/zeppelin/scripts/**">
    <rewrite apply="ZEPPELINUI/zeppelin/outbound/javascript/filter" to="response.body"/>
</route>
A good example of how to use the filters is Proxying a UI using Knox.
The following are the different Content-Types supported by Apache Knox.
Form URL-encoded: uses Content-Type "application/x-www-form-urlencoded", "*/x-www-form-urlencoded"
HTML: uses Content-Type "application/html", "text/html", "*/html"
JavaScript: uses Content-Type "application/javascript", "text/javascript", "*/javascript", "application/x-javascript", "text/x-javascript", "*/x-javascript"
JSON: uses Content-Type "application/json", "text/json", "*/json"
XML: uses Content-Type "application/xml", "text/xml", "*/xml"
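To see how wildcard entries such as "*/html" line up with real response headers, the following Python sketch classifies a Content-Type header against the lists above. It is an illustration under stated assumptions, not Knox code: the group labels (form, html, javascript, json, xml), the function name filter_kind and the use of shell-style wildcard matching are all hypothetical.

```python
from fnmatch import fnmatchcase

# Content-Type patterns copied from the lists above; the keys are
# hypothetical labels chosen for this sketch.
FILTER_CONTENT_TYPES = {
    "form": ["application/x-www-form-urlencoded", "*/x-www-form-urlencoded"],
    "html": ["application/html", "text/html", "*/html"],
    "javascript": ["application/javascript", "text/javascript", "*/javascript",
                   "application/x-javascript", "text/x-javascript",
                   "*/x-javascript"],
    "json": ["application/json", "text/json", "*/json"],
    "xml": ["application/xml", "text/xml", "*/xml"],
}


def filter_kind(content_type):
    """Return the label whose patterns match the header, or None."""
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    for kind, patterns in FILTER_CONTENT_TYPES.items():
        if any(fnmatchcase(media_type, pattern) for pattern in patterns):
            return kind
    return None


print(filter_kind("text/html; charset=UTF-8"))  # html
print(filter_kind("application/x-javascript"))  # javascript
print(filter_kind("image/png"))                 # None
```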
Pattern matching in Knox unfortunately does not follow the standard regex format. The following is how pattern matching works in some cases.
The following format is used for parsing URIs:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to http://www.ics.uci.edu/pub/ietf/uri/#Related results in the following subexpression matches:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related
where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the five components as:
scheme = $2
authority = $4
path = $5
query = $7
fragment = $9
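The mapping from subexpressions to components can be checked with a few lines of Python. This sketch simply feeds the expression above to Python's re module. The function name split_uri is hypothetical, and Python is used here only for illustration, not because Knox parses URIs this way internally.

```python
import re

# The URI-parsing expression quoted above, unchanged.
URI_PATTERN = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split a URI into its five components using groups 2, 4, 5, 7 and 9."""
    # Every part of the expression is optional, so it matches any string.
    match = URI_PATTERN.match(uri)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),     # None when there is no query
        "fragment": match.group(9),  # None when there is no fragment
    }


parts = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related")
for name, value in parts.items():
    print(name, "=", value)
# scheme = http
# authority = www.ics.uci.edu
# path = /pub/ietf/uri/
# query = None
# fragment = Related
```

Python reports an unmatched group as None, which corresponds to <undefined> in the listing above.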
The example URI used for the subexpression matches above is http://www.ics.uci.edu/pub/ietf/uri/#Related
For parsing JSON documents, Knox uses JSONPath.