RewriteFilter
The task of the RewriteFilter is to manipulate the URI of the request during processing. The configuration parameter contains several regular expressions, substitution patterns and actions, as the following example shows:
<init-param>
<param-name>RequestURI</param-name>
<param-value>
^/test/proxy/(.*)$:/unsecure/$1?hallo=true:S=ProxyServlet
^/test/(.*)$:/othertest/$1?hallo=true:PT
^/test/invalidate/(.*)$:/test/$1:R
^/test/images/(.*)$:/images/$1:F
^/$:/index_generic.html:F
Condition:HTTP_USER_AGENT:^.*MSIE.*$
^/$:/index_msie.html:F
Condition:HTTP_USER_AGENT:^.*Firefox.*$
^/$:/index_firefox.html:F
</param-value>
<description>Support request routing and URL manipulation inside servlet container</description>
</init-param>
Note that the request URI can be rewritten depending on other properties of the request, such as request headers or the remote address. The examples below show how:
Rewriting dependent on some request headers:
<init-param>
<param-name>RequestURI</param-name>
<param-value>
^/$:/index_generic.html:F
Condition:HTTP_USER_AGENT:^.*MSIE.*$
^/$:/index_msie.html:F
Condition:HTTP_USER_AGENT:^.*Firefox.*$
^/$:/index_firefox.html:F
</param-value>
</init-param>
Rewriting dependent on the remote address:
<!-- white model -->
<init-param>
<param-name>RequestURI</param-name>
<param-value>
<!-- allow -->
Condition:REMOTE_ADDR:^194\.69\.(.*)$
^.*$:$0:PT
Condition:REMOTE_ADDR:^138\.190\.(.*)$
^.*$:$0:PT
<!-- block -->
Condition:REMOTE_ADDR:^.*$
^.*$:/forbidden.html:F
</param-value>
</init-param>
<!-- black model -->
<init-param>
<param-name>RequestURI</param-name>
<param-value>
<!-- block -->
Condition:REMOTE_ADDR:^81\.221\.(.*)$
^.*$:/forbidden.html:F
</param-value>
</init-param>
The syntax of the Condition is: [<SOURCE>]:<VARIABLE>:<expression>, where the optional <SOURCE> parameter is one of the supported sources:
- AUTH: Any parameter returned by the authentication process.
- ENV: The environment of Apache (CGI-style).
- CONST: Constant values.
- PARAM: Parameters from ParameterFilter.
- HEADER: Request headers.
- SESSION: Session attributes.
If no <SOURCE> is given, it defaults to ENV. Several condition statements can be logically ANDed together as follows:
Condition:REMOTE_ADDR:^138\.190\.(.*)$
Condition:REQUEST_METHOD:^POST$
^.*$:/forbidden.html:F
The syntax for a regular expression is described in: Regular expressions. For example:
PCRE/^.*foo.*$/
The result of a condition can be negated by adding the "!" character in front of the condition:
!Condition:REQUEST_METHOD:^POST$
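As a sketch of how these pieces combine, the following hypothetical rule serves /forbidden.html for every request from the 138.190.x.x range that is not a GET. It assumes that negated and plain conditions can be mixed within one rule, which the description above suggests but does not state explicitly:

```xml
<init-param>
    <param-name>RequestURI</param-name>
    <param-value>
        <!-- both conditions must hold: address in range AND method is not GET -->
        Condition:REMOTE_ADDR:^138\.190\.(.*)$
        !Condition:REQUEST_METHOD:^GET$
        ^.*$:/forbidden.html:F
    </param-value>
</init-param>
```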
Request information can be added to the substitutions of the rewriting. The syntax is similar to that of the conditions:
<substitution>[;ANY|<substitution>]*
substitution: <source>\:<value>
<source>\:<value>[;<source>\:<value>]
<source>: ENV, AUTH, HEADER, PARAM, or SESSION.
Some samples:
^Cookie1$:value:^.*matching(.*)$:ENV\:HTTP_HEADER_COOKIE;_$1
https?\://www.nevis.net/?:https\://ENV\:HTTP_Host;/de
The : after ENV needs to be escaped because it is also used as the delimiter.
With the new CIDR notation you can define conditions matching subnets, for example: Condition:REMOTE_ADDR:CIDR/10.4.12.0/22/, which matches the address range 10.4.12.0 - 10.4.15.255 (network mask 255.255.252.0).
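The address-based allow/block example shown earlier could presumably be written with CIDR notation instead of regular expressions. This is only a sketch; the subnet is an illustrative value, and the catch-all block rule is copied from the earlier example:

```xml
<init-param>
    <param-name>RequestURI</param-name>
    <param-value>
        <!-- allow: clients in 10.4.12.0 - 10.4.15.255 pass through unchanged -->
        Condition:REMOTE_ADDR:CIDR/10.4.12.0/22/
        ^.*$:$0:PT
        <!-- block: everyone else -->
        Condition:REMOTE_ADDR:^.*$
        ^.*$:/forbidden.html:F
    </param-value>
</init-param>
```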
ch::nevis::isiweb4::filter::rewrite::RewriteFilter
libRewriteFilters.so.1
Configuration
RequestURI
- Type: string array
- Usage Constraints: optional, basic feature
- Default: empty, no rewriting done
For an application’s logout request, we recommend using the Http(s)ConnectorServlet’s configuration attributes LogoutURI and LogoutURI.Interception rather than manually adding redirects using the RequestURI attribute (for more information, see the chapter: HttpConnectorServlet, or the chapter: HttpsConnectorServlet).
A whitespace- or newline-separated list of rewrite rules that can be made conditional. Conditions can be added to individual rules. Thus, the request URI can be rewritten according to values from other parts of the request.
A rewrite rule has the following format:
<condition>:<CGIvariablename>:<regexp>
<regexp>:<substitution>:<action>
The regular expression is executed on the URI of the request (i.e., the path of the request, starting with /). The default type for a <regexp> is "PCRE(da)".
The <action> part has the following syntax: [PT,R,F[=<statusCode>],S=<servlet name>].
- PT (or passthrough): Replaces the URI and the query string of the request by the substitute URI and query string. However, the filter chain and target servlet are not recomputed. The regular processing goes on. That is, nevisProxy proceeds with the next filter or servlet in the chain that was selected in the beginning.
- S (or servlet): Directly calls the specified target servlet (after the manipulation of the request URI and query as described above for the PT flag). Remaining filters are ignored.
- R (or redirect): Redirects a request to the address in the substitution. A redirect breaks the current filter/servlet chain and sends a 302 HTTP redirection status code with the rewritten location information to the client. The client will then reenter nevisProxy with the rewritten URI.
- F (or file): This action is similar to the "S" or "servlet" action, but instead of addressing a servlet, the specified file is read from the "work" directory of the reverse proxy. An optional HTTP response status code may be appended to serve error pages with a status other than 200 as follows: "F=503".
- H (or home): This special action is only used to enforce a specific entry URL to a content provider when a user accesses the reverse proxy for the first time or when the user's session has timed out. A redirect to the specified URL is sent to the client, which makes it possible to enforce a homepage for the content provider or to remove query arguments.
The first matching rule will be executed. All subsequent ones will be skipped.
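Because only the first matching rule is executed, the order of the rules matters. The sketch below reuses patterns from the introductory example and places the more specific pattern first, so that it is not shadowed by the general one. The last rule is hypothetical: /shop/ and /maintenance.html are made-up names used to illustrate the optional status code of the F action:

```xml
<init-param>
    <param-name>RequestURI</param-name>
    <param-value>
        <!-- specific rule first: /test/invalidate/... is redirected -->
        ^/test/invalidate/(.*)$:/test/$1:R
        <!-- general rule second: everything else below /test/ is passed through -->
        ^/test/(.*)$:/othertest/$1:PT
        <!-- hypothetical: serve a local page with HTTP status 503 -->
        ^/shop/.*$:/maintenance.html:F=503
    </param-value>
</init-param>
```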
RequestURL
- Type: string array
- Usage Constraints: optional, advanced
- Default: empty
The attribute RequestURL has the same meaning as RequestURI, except that the regular expressions are executed on the URL of the request (i.e., fully qualified, starting with http(s)://).
This feature is used, for example, to send redirects
- from the default virtual host to other hosts, or
- from HTTP to HTTPS (to selectively deny HTTP access for specific locations).
For the syntax of the rules, see the description of the attribute RequestURI.
In case of the "passthrough" and "servlet" actions ("PT" and "S"), the system uses the resulting substitution as RequestURI for the subsequent filters. Therefore, make sure that it starts with a /. For this reason, we recommend using the parameter RequestURI (instead of the parameter RequestURL).
RequestHeader
- Type: string array
- Usage Constraints: optional, advanced
- Default: empty
A whitespace-separated list of request header rewrite rules. Several request header rules can be written and are supported for the same HTTP header.
A request header rewrite rule has the following format: <header name>:<regexp>:<substitution>
The default type for a <regexp> is "PCRE(da)".
If the value of an HTTP header with the configured <header name> matches the regular expression, the value will be substituted with the configured <substitution>.
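A minimal sketch of a request header rule following the format above. The header and values are purely illustrative: any Accept-Language value starting with "de-" would be replaced by "de" before the request is passed on:

```xml
<init-param>
    <param-name>RequestHeader</param-name>
    <param-value>
        <!-- <header name>:<regexp>:<substitution> -->
        Accept-Language:^de-.*$:de
    </param-value>
</init-param>
```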
RequestCookie
- Type: string
- Usage Constraints: optional, advanced
Defines a newline-separated list of request cookie rewrite rules. A request cookie rewrite rule has the following format: <cookie-name-regex>:<cookie-value-regex>:<cookie-value-substitution>
This parameter supports conditions. The default pragma is 'continue'.
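A sketch of a request cookie rule following this format. The cookie name "lang" and its values are hypothetical; the rule would replace the value "de-CH" of that cookie with "de":

```xml
<init-param>
    <param-name>RequestCookie</param-name>
    <param-value>
        <!-- <cookie-name-regex>:<cookie-value-regex>:<cookie-value-substitution> -->
        ^lang$:^de-CH$:de
    </param-value>
</init-param>
```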
ResponseHeader
- Type: string array
- Usage Constraints: optional, advanced
- Default: empty
A whitespace-separated list of response header rewrite rules. Several response header rules can be written and are supported for the same HTTP header. A response header rewrite rule has the following format: <header name>:<regexp>:<substitution>
The default type for a <regexp> is "PCRE(da)".
If the value of an HTTP header with the configured <header name> matches the regular expression, the value will be substituted with the configured <substitution>.
Optionally, it is possible to specify in which cases request and/or response headers have to be rewritten:
Condition: <header name>:<expression>
CIDR example: REMOTE_ADDR:CIDR/10.4.12.0/22/
A response header XYZ must be addressed with the key bcx.servlet.response.Header.XYZ (variable "header name").
Rewriting the Location header may not work in certain cases. In such cases, you could use the LuaFilter for rewriting the Location header.
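A sketch of a conditional response header rule. It assumes that a condition is placed directly before the rule it guards, as in the RequestURI examples; the header name X-Backend-Version is hypothetical. For clients in the given subnet the header value is left untouched by this rule only if the condition does not hold, otherwise it is replaced by "hidden":

```xml
<init-param>
    <param-name>ResponseHeader</param-name>
    <param-value>
        <!-- assumption: the condition applies to the rule that follows it -->
        Condition:REMOTE_ADDR:CIDR/10.4.12.0/22/
        X-Backend-Version:^.*$:hidden
    </param-value>
</init-param>
```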
ResponseCookie
- Type: string array
- Usage Constraints: optional, advanced
- Default: empty
A newline-separated list of response cookie rewrite rules (rewrite of the "Set-Cookie" header). A response cookie rewrite rule has the following format:
<cookieName-regex>[:value:<regex>:<substitution>][:maxAge:<regex>:<substitution>][:expires:<regex>:<substitution>][path:<regex>:<substitution>][:domain:<regex>:<substitution>][:[!]secure][:[!]httpOnly] [:version:<regex>:remove] [:+<extra attribute name to add>][:!<extra attribute name to remove>][<extra attribute name>:<regex for the extra attribute value>:<substitution for the extra attribute value>]
If one of [value, maxAge, expires, path, domain] of a new Cookie with the configured <cookieName-regex> matches the corresponding regular expression, the corresponding value will be substituted with the configured <substitution>. Only the first rule with a match will be executed.
The default type for a <regexp> is "PCRE(da)".
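A sketch of a response cookie rule built from the format above. The cookie name and paths are hypothetical, and two points are assumptions rather than documented behavior: that the path element is separated by a colon like the other elements, and that the "secure" and "httpOnly" elements add the corresponding cookie attributes:

```xml
<init-param>
    <param-name>ResponseCookie</param-name>
    <param-value>
        <!-- rewrite the cookie path from /app to /portal and request the Secure and HttpOnly flags -->
        ^JSESSIONID$:path:^/app$:/portal:secure:httpOnly
    </param-value>
</init-param>
```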
ResponseBody.Mode
- Type: enum: replacement, action
- Usage Constraints: optional
- Default: action
Defines how the response body is processed. If set to ’replacement’, the body is tokenized using the value of ’ResponseBody.Delimiters’ and the configured ’ResponseBody’ regexp is then applied to each token. If ResponseBody.Mode is set to ’action’, the response body is buffered using ’ResponseBody.BufferSize’ and the regexp/action configured in ’ResponseBody’ is applied to that buffer.
ResponseBody
- Type: string array
- Usage Constraints: optional, advanced
- Default: empty
Enables configuration of a newline-separated list of rewrite rules and conditions. If the response body is of the content-type text/, the configured ’ResponseBody.Mode’ defines how the configured regexp is applied to the body.
If ’ResponseBody.Mode’ is set to ’replacement’, the action that is part of the regexp must always be ’PT’. The rules are evaluated in the configured order. Whenever multiple rules would match, the "earlier" rule takes precedence. No rule can match if the matching data would overlap with a string already matched earlier. In other words, all the rules available are applied as long as they rewrite distinct parts of the input line.
The default type for a <regexp> is "PCRE(da)".
ResponseBody.Delimiters
- Type: string array
- Usage Constraints: optional, advanced
- Default: 10,13 (\n \r)
ResponseBody.Delimiters specifies the delimiters that will be used for tokenizing the response body. The configured regex will be executed on the resulting parts of the body (lines by default). The delimiters must be configured as a whitespace- or comma-separated list of decimal ASCII codes.
This parameter is relevant only if ResponseBody.Mode is set to ’replacement’.
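As an illustration of the decimal notation, the following sketch keeps the default delimiters and additionally splits the body at semicolons (ASCII code 59). Whether splitting at semicolons is useful depends on the content being rewritten; the value is only an example:

```xml
<init-param>
    <param-name>ResponseBody.Mode</param-name>
    <param-value>replacement</param-value>
</init-param>
<init-param>
    <param-name>ResponseBody.Delimiters</param-name>
    <!-- 10 = line feed, 13 = carriage return, 59 = ';' -->
    <param-value>10,13,59</param-value>
</init-param>
```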
ResponseBody.IgnoreQuotes
- Type: boolean
- Usage Constraints: optional, advanced
- Default: true
If set to 'true', quotes are ignored during tokenizing, so a delimiter inside a quoted string also splits the body.
This parameter is relevant only if ResponseBody.Mode is set to ’replacement’.
ResponseBody.BufferSize
- Type: integer
- Usage Constraints: optional, advanced
- Default: 2048
Configures the size of the internal buffer for response body processing.
If ResponseBody.Mode is set to ’action’, no more than ResponseBody.BufferSize bytes of the response are read and/or processed.
In ’replacement’ mode, the parameter defines the length of the data examined at any single time. The total amount of the data buffered may, however, exceed the configured length to account for long matched strings.
ResponseBody.CaseInsensitive
- Type: boolean
- Usage Constraints: optional, advanced
- Default: false
Permits the configuration of case sensitivity for regular expressions. The default setting is false, i.e., matching is case-sensitive.
ResponseBody.ContentTypes
- Type: string
- Usage Constraints: optional, advanced
- Default: ^text/, ^application/javascript, ^application/x-javascript, ^application/xhtml (four expressions, one per line)
A newline-separated list of regular expressions that defines the content types to be rewritten. Responses whose Content-Type header does not match any of the configured regular expressions will not be rewritten.
The default type for a <regexp> is "PCRE(da)".
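A sketch of a custom content type list, one regular expression per line. It assumes that a configured value replaces the default list rather than extending it, so ^text/ is repeated; application/json is an illustrative addition:

```xml
<init-param>
    <param-name>ResponseBody.ContentTypes</param-name>
    <param-value>
        ^text/
        ^application/json
    </param-value>
</init-param>
```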
ResponseBody.MaxMatchLen
- Type: integer
- Usage Constraints: required, advanced
- Default: 2048
Maximum length of a string a regular expression rule may match.
The amount of memory needed for response body rewriting grows with the configured value. For performance reasons, the maximum should be kept as low as possible.
When set to zero, there is no limit to the length of a matched string. Such a setting is not recommended.
UseSingleState
- Type: boolean
- Default: false
Supports configuration of multiple correlated home/H rules.
UseQueryString
- Type: boolean
- Default: false
Defines whether the query of the request will be used for matching/substitution.
ModRewriteConfig
- Type: string
- Usage Constraints: optional, advanced
This attribute enables delegation of rewriting to Apache. In other words, the rewriting will be done by Apache, not by the RewriteFilter. However, if this feature is used, the information related to the rewriting is not written to the proxy log files. You should therefore only use this feature for rewriting functionality that is not available in nevisProxy.
Use cases
This filter is used to address mapping problems between the reverse proxy's namespace and the namespaces of content providers. Here are a few samples:
^/test/proxy/(.*)$:/unsecure/$1?hello=true:S=ProxyServlet
This rewrite directly sends requests matching '/test/proxy' (e.g. /test/proxy/logo.gif) to the connector servlet with the name 'ProxyServlet'. The corresponding content provider receives a request for the resource '/unsecure/logo.gif?hello=true'.
^/test/(.*)$:/othertest/$1?hallo=true:PT
This rewrite just replaces the request URI (and query string), but does not alter the processing chain (used to retrieve resources from content providers residing at locations that cannot be addressed by just using the HTTP connectors 'pathinfo' or 'requesturi' mapping). If a query is omitted in the substitution part, the existing query is passed on.
^/test/invalidate/(.*)$:/test/$1:R
Accessing e.g. '/test/invalidate/logo.gif' results in a redirect to the location '/test/logo.gif'.
^/test/images/(.*)$:/images/$1:F
Accessing e.g. '/test/images/logo.gif' directs the request to the reverse proxy's local (static) resources. The file /var/opt/nevisproxy/<nevisProxy_instance>/work/images/logo.gif is read from the disk.
Virtual host redirects
To allow name-based addressing of a portal, the following 'RequestURL' rule is useful:
^http\://(.*).company.com/(.*)$:http\://portal.company.com/$1/$2:R
This redirects access from http://www.company.com/index.html to http://portal.company.com/www/index.html.
HTTPS redirects
To deny access to specified URLs using HTTP, the following HTTPS-redirect is used:
^http\://www.company.com/(secure/.*)$:https\://www.company.com/$1:R
Using the RewriteFilter's 'RequestURI' attribute usually signals that too much functionality is implemented on the proxy instead of on content providers. It can also signal that content providers are not cleanly designed (i.e. not proxy aware and/or no clean namespace design that can be mapped to a reverse proxy). It is preferable to adapt content providers to the reverse proxy pattern (if possible) before using the RewriteFilter. In addition, the RewriteFilter may lead to security holes: when using the 'servlet' action, a client request may directly address a content provider and authentication (by an IdentityCreationFilter) may be skipped.
A note on regular expressions: Since the list of rewrite rules is whitespace- or newline-separated, it is not possible to have a whitespace as part of a regular expression. You can bypass this limitation by using the appropriate character class "\s". For example:
<init-param>
<param-name>RequestURI</param-name>
<param-value>
Condition:HTTP_USER_AGENT:MSIE[\s]?[45]
^/.*$:/legacy_browser:F
</param-value>
</init-param>
RewriteFilter example
This filter rewrites the request URI '/whatever/' to '/hello/'. Because the action is PT (passthrough), processing continues with the rewritten URI; no redirect is sent to the client.
<filter>
<filter-name>RequestURIRewriteFilter</filter-name>
<filter-class>ch::nevis::isiweb4::filter::rewrite::RewriteFilter</filter-class>
<init-param>
<param-name>RequestURI</param-name>
<param-value>
PCRE/^/whatever/$/:/hello/:PT
</param-value>
</init-param>
</filter>
This filter parses the response body and replaces every occurrence of the patterns configured in the ResponseBody init-param.
<filter>
<filter-name>BodyRewriteFilter</filter-name>
<filter-class>ch::nevis::isiweb4::filter::rewrite::RewriteFilter</filter-class>
<init-param>
<param-name>ResponseBody.MaxMatchLen</param-name>
<param-value>128</param-value>
</init-param>
<init-param>
<param-name>ResponseBody.Mode</param-name>
<param-value>replacement</param-value>
</init-param>
<init-param>
<param-name>ResponseBody</param-name>
<param-value>
PCRE/pleasereplaceme/:hereyouare:PT
PCRE/\s[0-9]+\s/: *number* :PT
</param-value>
</init-param>
</filter>