RewriteFilter
The task of the RewriteFilter is to manipulate the URI of the request during processing. The configuration parameter contains several regular expressions, substitution patterns and actions, as the following example shows:
<init-param>
<param-name>RequestURI</param-name>
<param-value>
^/test/proxy/(.*)$:/unsecure/$1?hallo=true:S=ProxyServlet
^/test/(.*)$:/othertest/$1?hallo=true:PT
^/test/invalidate/(.*)$:/test/$1:R
^/test/images/(.*)$:/images/$1:F
^/$:/index_generic.html:F
Condition:HTTP_USER_AGENT:^.*MSIE.*$
^/$:/index_msie.html:F
Condition:HTTP_USER_AGENT:^.*Firefox.*$
^/$:/index_firefox.html:F
</param-value>
<description>Support request routing and URL manipulation inside servlet container</description>
</init-param>
The request URI can also be rewritten depending on other parts of the request, such as header values or the remote address. The examples below show how.
Rewriting dependent on some request headers:
<init-param>
<param-name>RequestURI</param-name>
<param-value>
^/$:/index_generic.html:F
Condition:HTTP_USER_AGENT:^.*MSIE.*$
^/$:/index_msie.html:F
Condition:HTTP_USER_AGENT:^.*Firefox.*$
^/$:/index_firefox.html:F
</param-value>
</init-param>
Rewriting dependent on the remote address:
<!-- white model -->
<init-param>
<param-name>RequestURI</param-name>
<param-value>
<!-- allow -->
Condition:REMOTE_ADDR:^194\.69\.(.*)$
^.*$:$0:PT
Condition:REMOTE_ADDR:^138\.190\.(.*)$
^.*$:$0:PT
<!-- block -->
Condition:REMOTE_ADDR:^.*$
^.*$:/forbidden.html:F
</param-value>
</init-param>
<!-- black model -->
<init-param>
<param-name>RequestURI</param-name>
<param-value>
<!-- block -->
Condition:REMOTE_ADDR:^81\.221\.(.*)$
^.*$:/forbidden.html:F
</param-value>
</init-param>
Request information can be added to the substitutions of the rewriting. The syntax is similar to that of the conditions:
<substitution>[;ANY|<substitution>]*
substitution: <source>\:<value>
<source>\:<value>[;<source>\:<value>]
<source>: ENV, AUTH, HEADER, PARAM, or SESSION.
Some samples:
^Cookie1$:value:^.*matching(.*)$:ENV\:HTTP_HEADER_COOKIE;_$1
https?\://www.nevis.net/?:https\://ENV\:HTTP_Host;/de
The : after ENV needs to be escaped because it is also used as a delimiter.
With the CIDR notation you can define conditions matching subnets, for example: Condition:REMOTE_ADDR:CIDR/10.4.12.0/22/, which matches the address range 10.4.12.0 - 10.4.15.255 (network mask 255.255.252.0).
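As a sketch, a CIDR condition can take the place of the regular expression in the allowlist model shown earlier. The subnet 10.4.12.0/22 covers 10.4.12.0 - 10.4.15.255, and the page /forbidden.html is reused from the earlier samples. It is an assumption that CIDR conditions combine with the catch-all condition in the same way as regular-expression conditions do.

```xml
<init-param>
<param-name>RequestURI</param-name>
<param-value>
<!-- allow: clients in 10.4.12.0 - 10.4.15.255 pass through unchanged -->
Condition:REMOTE_ADDR:CIDR/10.4.12.0/22/
^.*$:$0:PT
<!-- block: every other remote address -->
Condition:REMOTE_ADDR:^.*$
^.*$:/forbidden.html:F
</param-value>
</init-param>
```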
ch::nevis::isiweb4::filter::rewrite::RewriteFilter
libRewriteFilters.so.1
+PRUNE_ACCEPT_ENCODING
Configuration
RequestURI
Type: whitespace or newline-separated list
Syntax: <regexp>:<substitution>:<action>
Usage Constraint: optional, basic, conditional
Default: empty, no rewriting is done
A list of rewrite rules that can be made conditional. Conditions can be added to individual rules, so the request URI can be rewritten according to values from other parts of the request, such as request headers.
The regular expression is applied to the URI of the request, that is, the path starting with /.
The <action> part has the following syntax:
[PT,R,F[=<statusCode>],S=<servlet name>]
- PT (or passthrough): Replaces the URI and the query string of the request by the substitute URI and query string. However, the filter chain and target servlet are not recomputed. The regular processing goes on. That is, nevisProxy proceeds with the next filter or servlet in the chain that was selected in the beginning.
- S (or servlet): Directly calls the specified target servlet (after the manipulation of the request URI and query as described above for the PT flag). Remaining filters are ignored.
- R (or redirect): Redirects a request to the address in the substitution. A redirect breaks the current filter/servlet chain and sends a 302 HTTP redirection status code with the rewritten location information to the client. The client will then reenter nevisProxy with the rewritten URI.
- F (or file): This action is similar to the "S" or "servlet" action, but instead of addressing a servlet, the specified file is read from the "work" directory of the reverse proxy. An optional HTTP response status code may be appended to serve error pages with a status other than 200 as follows: "F=503".
- H (or home): This special action is only used to enforce a specific entry URL to a content provider when a user accesses the reverse proxy for the first time or after the user’s session has timed out. A redirect to the specified URL is sent to the client, which makes it possible to enforce a homepage for the content provider or to remove query arguments.
The first matching rule will be executed. All subsequent ones will be skipped.
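Because only the first matching rule is executed, more specific rules should appear before more general ones. The following sketch illustrates this ordering together with the optional status code of the F action. All paths (/shop/checkout, /maintenance.html, /store, /legacy) are hypothetical.

```xml
<init-param>
<param-name>RequestURI</param-name>
<param-value>
<!-- most specific first: serve a local page with status 503 -->
^/shop/checkout/(.*)$:/maintenance.html:F=503
<!-- reached only if the rule above did not match -->
^/shop/(.*)$:/store/$1:PT
<!-- unrelated prefix: answered with a redirect to the new location -->
^/legacy/(.*)$:/shop/$1:R
</param-value>
</init-param>
```

With this order, a request for /shop/checkout/pay is answered by the first rule, while /shop/cart falls through to the second. Swapping the first two rules would make the 503 rule unreachable.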
For an application’s logout request, we recommend using the Http(s)ConnectorServlet’s configuration attributes LogoutURI and LogoutURI.Interception rather than manually adding redirects using the RequestURI attribute (for more information, see the chapter "HttpConnectorServlet" or the chapter "HttpsConnectorServlet").
RequestURL
Type: whitespace or newline-separated list
Syntax: <regexp>:<substitution>:<action>
Usage Constraint: optional, advanced
The attribute RequestURL has the same meaning as RequestURI, except that the regular expressions are applied to the URL of the request, that is, the fully qualified URL starting with http(s)://.
This feature is used, for example, to send redirects
- from the default virtual host to other hosts, or
- from HTTP to HTTPS (to selectively deny HTTP access for specific locations).
In case of the "passthrough" and "servlet" actions ("PT" and "S"), the system uses the resulting substitution as RequestURI for the subsequent filters.
Therefore, make sure that it starts with a "/". For this reason, we recommend using the parameter RequestURI (instead of the parameter RequestURL).
RequestHeader
Type: whitespace-separated list
Syntax: <header_name>:<regexp>:<substitution>
Usage Constraint: optional, advanced
A list of request header rewrite rules. Several request header rules can be written and are supported for the same HTTP header.
If the value of an HTTP header with the configured <header_name> matches the regular expression, the value will be substituted with the configured <substitution>.
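A minimal sketch of such a configuration, following the <header_name>:<regexp>:<substitution> syntax given above. The choice of the Accept-Language header and the values are illustrative assumptions; the two rules show that several rules may target the same header.

```xml
<init-param>
<param-name>RequestHeader</param-name>
<param-value>
<!-- normalize any German variant (de, de-CH, de-DE;q=0.8, ...) to "de" -->
Accept-Language:^de.*$:de
<!-- normalize any French variant to "fr" -->
Accept-Language:^fr.*$:fr
</param-value>
</init-param>
```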
RequestCookie
Type: newline-separated list
Syntax: <cookie-name-regex>:<cookie-value-regex>:<cookie-value-substitution>
Usage Constraint: optional, advanced, conditional
A list of request cookie rewrite rules. The default pragma is continue.
ResponseHeader
Type: whitespace-separated list
Syntax: <header name>:<regexp>:<substitution>
Usage Constraint: optional, advanced, conditional
A whitespace-separated list of response header rewrite rules. Several response header rules can be written and are supported for the same HTTP header.
If the value of an HTTP header with the configured <header name> matches the regular expression, the value will be substituted with the configured <substitution>.
A response header XYZ must be addressed with the key bcx.servlet.response.Header.XYZ (variable "header name").
Rewriting the Location header may not work in certain cases. In such cases, you could use the LuaFilter for rewriting the Location header.
ResponseCookie
Type: newline-separated list
Usage Constraint: optional, advanced
A list of response cookie rewrite rules (rewrite of the "Set-Cookie" header). A response cookie rewrite rule has the following format:
<cookieName-regex>[:value:<regex>:<substitution>][:maxAge:<regex>:<substitution>][:expires:<regex>:<substitution>][:path:<regex>:<substitution>][:domain:<regex>:<substitution>][:[!]secure][:[!]httpOnly][:version:<regex>:remove][:+<extra attribute name to add>][:!<extra attribute name to remove>][:<extra attribute name>:<regex for the extra attribute value>:<substitution for the extra attribute value>]
If one of [value, maxAge, expires, path, domain] of a new cookie whose name matches the configured <cookieName-regex> matches the corresponding regular expression, that value will be substituted with the configured <substitution>. Only the first rule with a match will be executed.
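The following sketch shows how a single rule could combine several parts of this format. The cookie name SESSIONID and the path /portal are hypothetical. The sketch assumes that each part is introduced by a colon and that the secure and httpOnly keywords (without "!") add the respective attribute.

```xml
<init-param>
<param-name>ResponseCookie</param-name>
<param-value>
<!-- for cookies named SESSIONID: set the path to /portal, add Secure and HttpOnly -->
^SESSIONID$:path:^.*$:/portal:secure:httpOnly
</param-value>
</init-param>
```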
ResponseBody.Mode
Type: enum
Possible values: replacement, action
Usage Constraint: optional
Default: action
Defines how the response body is processed. If set to replacement, the body is tokenized using the value of ResponseBody.Delimiters and the configured ResponseBody regexp is then applied to each token.
If ResponseBody.Mode is set to action, the response body is buffered using ResponseBody.BufferSize and the regexp/action configured in ResponseBody is applied to that buffer.
ResponseBody
Type: newline-separated list
Usage Constraint: optional, advanced
Enables configuration of a list of rewrite rules and conditions. If the content type of the response matches ResponseBody.ContentTypes (text/, among others, by default), the configured ResponseBody.Mode defines how the configured regexp is applied to the body.
If ResponseBody.Mode is set to ’replacement’, the action that is part of the rule must always be ’PT’. The rules are evaluated in the configured order. Whenever multiple rules would match, the earlier rule takes precedence. No rule can match if the matching data would overlap with a string already matched earlier. In other words, all the rules available are applied as long as they rewrite distinct parts of the input line.
ResponseBody.Delimiters
Type: whitespace- or comma-separated list
Usage Constraint: optional, advanced
Default: 10,13 (\n \r)
Specifies the delimiters that will be used for tokenizing the response body. The configured regex will be executed on the resulting parts of the body (lines by default). The delimiters must be configured as a whitespace- or comma-separated list of decimal ASCII codes.
This parameter is relevant only if ResponseBody.Mode is set to replacement.
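As an illustration, the sketch below adds the character > (decimal ASCII code 62) to the default delimiters. This may help with markup that contains few line breaks, since tokens then also end at each tag. Whether this is a sensible choice depends on the content being rewritten; the value is an assumption for demonstration only.

```xml
<init-param>
<param-name>ResponseBody.Mode</param-name>
<param-value>replacement</param-value>
</init-param>
<init-param>
<param-name>ResponseBody.Delimiters</param-name>
<!-- 10 = line feed, 13 = carriage return, 62 = ">" -->
<param-value>10,13,62</param-value>
</init-param>
```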
ResponseBody.IgnoreQuotes
Type: boolean
Usage Constraint: optional, advanced
Default: true
If set to 'true', a delimiter inside a quoted string will also be included when tokenizing.
This parameter is relevant only if ResponseBody.Mode is set to replacement.
ResponseBody.BufferSize
Type: integer
Usage Constraint: optional, advanced
Default: 2048
Configures the size of the internal buffer for response body processing.
If ResponseBody.Mode is set to ’action’, no more than ResponseBody.BufferSize bytes of the response are read and/or processed.
In ’replacement’ mode, the parameter defines the length of the data examined at any single time. The total amount of the data buffered may, however, exceed the configured length to account for long matched strings.
ResponseBody.CaseInsensitive
Type: boolean
Usage Constraint: optional, advanced
Default: false
Configures whether regular expressions are matched case-insensitively.
The default setting is false, i.e., matching is case-sensitive.
ResponseBody.ContentTypes
Type: newline-separated list
Usage Constraint: optional, advanced
Default:
^text/
^application/javascript
^application/x-javascript
^application/xhtml
A list of regular expressions that defines the content types to be rewritten. Responses whose Content-Type header does not match any of the configured regular expressions will not be rewritten.
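A sketch of how JSON responses could be included in body rewriting. It assumes that a configured list replaces the defaults rather than extending them, which is why the default entries are repeated. The ^application/json entry is the only addition.

```xml
<init-param>
<param-name>ResponseBody.ContentTypes</param-name>
<param-value>
^text/
^application/javascript
^application/x-javascript
^application/xhtml
^application/json
</param-value>
</init-param>
```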
ResponseBody.MaxMatchLen
Type: integer
Usage Constraint: required, advanced
Default: 2048
Maximum length of a string a regular expression rule may match.
The amount of memory needed for response body rewriting grows with the configured value. For performance reasons, the maximum should be kept as low as possible.
When set to zero, there is no limit to the length of a matched string. Such a setting is not recommended.
UseSingleState
Type: boolean
Usage Constraint: optional
Default: false
Supports configuration of multiple correlated home/H rules.
UseQueryString
Type: boolean
Usage Constraint: optional
Default: false
Defines whether the query of the request will be used for matching/substitution.
ModRewriteConfig
Type: string
Usage Constraint: optional, advanced
This parameter enables delegation of rewriting to Apache. In other words, the rewriting will be done by Apache, not by the RewriteFilter. However, if this feature is used, the information related to the rewriting is not written to the proxy log files. You should therefore only use this feature for rewriting functionality that is not available in nevisProxy.
RewriteFilter configuration
This filter is used to address mapping problems between the reverse proxy's namespace and the namespaces of content providers. Here are a few samples:
^/test/proxy/(.*)$:/unsecure/$1?hello=true:S=ProxyServlet
This rewrite directly sends requests matching '/test/proxy' (e.g. /test/proxy/logo.gif) to the connector servlet with the name 'ProxyServlet'. The corresponding content provider receives a request for the resource '/unsecure/logo.gif?hello=true'.
^/test/(.*)$:/othertest/$1?hallo=true:PT
This rewrite just replaces the request URI (and query string), but does not alter the processing chain (used to retrieve resources from content providers residing at locations that cannot be addressed by just using the HTTP connectors 'pathinfo' or 'requesturi' mapping). If a query is omitted in the substitution part, the existing query is passed on.
^/test/invalidate/(.*)$:/test/$1:R
Accessing e.g. '/test/invalidate/logo.gif' results in a redirect to the location '/test/logo.gif'.
^/test/images/(.*)$:/images/$1:F
Accessing e.g. '/test/images/logo.gif' directs the request to the reverse proxy's local (static) resources. The file /var/opt/nevisproxy/<nevisProxy_instance>/work/images/logo.gif is read from the disk.
Virtual host redirects
To allow name-based addressing of a portal, the following 'RequestURL' rule is useful:
^http\://(.*).company.com/(.*)$:http\://portal.company.com/$1/$2:R
This redirects access from http://www.company.com/index.html to http://portal.company.com/www/index.html.
HTTPS redirects
To deny access to specified URLs using HTTP, the following HTTPS-redirect is used:
^http\://www.company.com/(secure/.*)$:https\://www.company.com/$1:R
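Embedded in a complete filter definition, the HTTPS redirect rule could look like the sketch below. The filter name HttpsRedirectFilter is hypothetical, and the mapping of the filter to the relevant URLs is not shown.

```xml
<filter>
<filter-name>HttpsRedirectFilter</filter-name>
<filter-class>ch::nevis::isiweb4::filter::rewrite::RewriteFilter</filter-class>
<init-param>
<!-- RequestURL rules match the fully qualified URL, including the scheme -->
<param-name>RequestURL</param-name>
<param-value>
^http\://www.company.com/(secure/.*)$:https\://www.company.com/$1:R
</param-value>
</init-param>
</filter>
```

Because the action is R, the client receives a redirect and re-enters the proxy over HTTPS, so the rule no longer matches on the second request.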
Using the RewriteFilter's 'RequestURI' attribute usually signals that too much functionality is implemented on the proxy instead of on content providers. It can also signal that content providers are not cleanly designed (i.e. not proxy aware and/or no clean namespace design that can be mapped to a reverse proxy). It is preferable to adapt content providers to the reverse proxy pattern (if possible) before using the RewriteFilter. In addition, the RewriteFilter may lead to security holes: when using the 'servlet' action, a client request may directly address a content provider and authentication (by an IdentityCreationFilter) may be skipped.
A note on regular expressions: Since the list of rewrite rules is whitespace- or newline-separated, it is not possible to have a whitespace as part of a regular expression. You can bypass this limitation by using the appropriate character class "\s". For example:
<init-param>
<param-name>RequestURI</param-name>
<param-value>
Condition:HTTP_USER_AGENT:MSIE[\s]?[45]
^/.*$:/legacy_browser:F
</param-value>
</init-param>
RewriteFilter example
This filter rewrites the request URI '/whatever/' to '/hello/'. Because the action is PT, the request is passed on with the new URI rather than redirected.
<filter>
<filter-name>RequestURIRewriteFilter</filter-name>
<filter-class>ch::nevis::isiweb4::filter::rewrite::RewriteFilter</filter-class>
<init-param>
<param-name>RequestURI</param-name>
<param-value>
PCRE/^/whatever/$/:/hello/:PT
</param-value>
</init-param>
</filter>
This filter parses the response body and replaces every occurrence of the strings matched by the rules in the ResponseBody init-param.
<filter>
<filter-name>BodyRewriteFilter</filter-name>
<filter-class>ch::nevis::isiweb4::filter::rewrite::RewriteFilter</filter-class>
<init-param>
<param-name>ResponseBody.MaxMatchLen</param-name>
<param-value>128</param-value>
</init-param>
<init-param>
<param-name>ResponseBody.Mode</param-name>
<param-value>replacement</param-value>
</init-param>
<init-param>
<param-name>ResponseBody</param-name>
<param-value>
PCRE/pleasereplaceme/:hereyouare:PT
PCRE/\s[0-9]+\s/: *number* :PT
</param-value>
</init-param>
</filter>