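Original article. If you repost it, please credit the source and keep the URL of the original.
This article gives a brief introduction to the nginx ngx_http_charset_module module and covers the following directives:
charset, charset_map, charset_types, override_charset, source_charset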
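The ngx_http_charset_module module adds the specified charset to the “Content-Type” response header field sent to the client. In addition, the module can convert data from one charset to another, with some limitations:
conversion is performed in one direction only, from server to client;
only single-byte charsets can be converted,
or single-byte charsets to and from UTF-8.
Example configuration: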
include conf/koi-win;
charset windows-1251;
source_charset koi8-r;
Original nginx documentation:
The ngx_http_charset_module module adds the specified charset to the “Content-Type” response header field. In addition, the module can convert data from one charset to another, with some limitations:
conversion is performed one way — from server to client,
only single-byte charsets can be converted
or single-byte charsets to/from UTF-8.
Example Configuration
include conf/koi-win;
charset windows-1251;
source_charset koi8-r;
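The second limitation (single-byte charsets to and from UTF-8) can be illustrated with a minimal sketch that assumes the site's files are stored in koi8-r and should be served as UTF-8. It relies on the conf/koi-utf table shipped with nginx; the location path is an illustrative assumption, and the include path simply follows the style of the example above, so adjust it to wherever your koi-utf file actually lives.

```nginx
# These lines belong in the http { } block, because charset_map
# (which conf/koi-utf contains) is only allowed in the http context.
include conf/koi-utf;   # defines charset_map koi8-r utf-8 { ... }

server {
    listen 80;

    location / {
        source_charset koi8-r;   # encoding of the files on disk (assumption)
        charset        utf-8;    # output charset, also added to “Content-Type”
    }
}
```

Only responses whose MIME type is covered by charset_types (by default text/html, text/xml, text/plain and a few others) are converted.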
1. charset
syntax: charset charset | off;
default: charset off;
context: http, server, location, if in location
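Adds the specified charset to the “Content-Type” field of the response header sent to the client. If this charset differs from the charset set by the source_charset directive, a conversion is performed.
The off parameter cancels adding the charset to the “Content-Type” response header field.
A charset can be defined with a variable: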
charset $charset;
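In such a case, every possible value of the variable must appear in the configuration at least once, in a charset_map, charset, or source_charset directive. For the utf-8, windows-1251, and koi8-r charsets it is sufficient to include the files conf/koi-win, conf/koi-utf, and conf/win-utf in the configuration. For other charsets, simply creating a dummy conversion table works, for example: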
charset_map iso-8859-5 _ { }
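In addition, the charset can be set by the “X-Accel-Charset” field in the response header returned by a proxied server. This ability can be disabled with the proxy_ignore_headers or fastcgi_ignore_headers directive.
Original nginx documentation: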
Adds the specified charset to the “Content-Type” response header field. If this charset is different from the charset specified in the source_charset directive, a conversion is performed.
The parameter off cancels the addition of charset to the “Content-Type” response header field.
A charset can be defined with a variable:
charset $charset;
In such a case, all possible values of a variable need to be present in the configuration at least once in the form of the charset_map, charset, or source_charset directives. For utf-8, windows-1251, and koi8-r charsets it is sufficient to include the files conf/koi-win, conf/koi-utf, and conf/win-utf into configuration. For other charsets, simply making a fictitious conversion table works, for example:
charset_map iso-8859-5 _ { }
In addition, charset can also be set in the “X-Accel-Charset” response header field. This ability can be disabled using the proxy_ignore_headers and fastcgi_ignore_headers directives.
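A sketch combining the three points above: a charset taken from a variable, the off parameter, and ignoring “X-Accel-Charset” from a backend. The variable name $out_charset, the query argument cs, the location paths, and the upstream address are hypothetical. Both values the variable can take (utf-8 and windows-1251) are declared by the included conf/win-utf table, which satisfies the requirement that every possible value appear in the configuration.

```nginx
# http context
include conf/win-utf;   # charset_map windows-1251 utf-8 { ... }

# Hypothetical mapping: ?cs=cp1251 selects windows-1251, anything else utf-8
map $arg_cs $out_charset {
    default  utf-8;
    cp1251   windows-1251;
}

server {
    listen 80;
    source_charset utf-8;
    charset        $out_charset;   # converts utf-8 -> windows-1251 when selected

    location /downloads/ {
        charset off;               # no charset added to “Content-Type” here
    }

    location /app/ {
        proxy_pass           http://127.0.0.1:8080;   # hypothetical upstream
        proxy_ignore_headers X-Accel-Charset;         # backend cannot set the charset
    }
}
```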
2. charset_map
syntax: charset_map charset1 charset2 { ... }
default: none
context: http
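Describes a conversion table from one charset to another. The reverse conversion table is built from the same data. Character codes are given in hexadecimal. Missing characters in the range 80-FF are replaced with “?”. When converting from UTF-8, characters that are missing in the single-byte charset are replaced with “&#XXXX;”.
Example: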
charset_map koi8-r windows-1251 {
C0 FE ; # small yu
C1 E0 ; # small a
C2 E1 ; # small b
C3 F6 ; # small ts
...
}
When describing a conversion table to UTF-8, the UTF-8 codes should be given in the second column, for example:
charset_map koi8-r utf-8 {
C0 D18E ; # small yu
C1 D0B0 ; # small a
C2 D0B1 ; # small b
C3 D186 ; # small ts
...
}
Full conversion tables from koi8-r to windows-1251, and from koi8-r and windows-1251 to UTF-8, are included in the nginx distribution as conf/koi-win, conf/koi-utf, and conf/win-utf.
Original nginx documentation:
Describes the conversion table from one charset to another. A reverse conversion table is built using the same data. Character codes are given in hexadecimal. Missing characters in the range 80-FF are replaced with “?”. When converting from UTF-8, characters missing in a one-byte charset are replaced with “&#XXXX;”.
Example:
charset_map koi8-r windows-1251 {
C0 FE ; # small yu
C1 E0 ; # small a
C2 E1 ; # small b
C3 F6 ; # small ts
...
}
When describing a conversion table to UTF-8, codes for the UTF-8 charset should be given in the second column, for example:
charset_map koi8-r utf-8 {
C0 D18E ; # small yu
C1 D0B0 ; # small a
C2 D0B1 ; # small b
C3 D186 ; # small ts
...
}
Full conversion tables from koi8-r to windows-1251, and from koi8-r and windows-1251 to utf-8 are provided in the distribution files conf/koi-win, conf/koi-utf, and conf/win-utf.
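Because the reverse table is built from the same data, the shipped conf/koi-win table (which describes koi8-r to windows-1251) should also serve conversions in the opposite direction. A minimal sketch, assuming the files under a hypothetical /legacy/ location are stored in windows-1251 and should be served as koi8-r:

```nginx
# http context
include conf/koi-win;   # charset_map koi8-r windows-1251 { ... }

server {
    listen 80;

    location /legacy/ {
        source_charset windows-1251;   # encoding of the files on disk (assumption)
        charset        koi8-r;         # uses the reverse of the koi8-r -> windows-1251 table
    }
}
```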
3. charset_types
syntax: charset_types mime-type ...;
default: charset_types text/html text/xml text/plain text/vnd.wap.wml application/x-javascript application/rss+xml;
context: http, server, location
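This directive appeared in version 0.7.9.
Enables module processing in responses with the specified MIME types, in addition to “text/html”. The special value “*” matches any MIME type (since 0.8.29).
Original nginx documentation: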
This directive appeared in version 0.7.9.
Enables module processing in responses with the specified MIME types in addition to “text/html”. The special value “*” matches any MIME type (0.8.29).
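A sketch extending charset processing to stylesheets and JavaScript; the MIME types and locations are only examples. As with other nginx list directives, a charset_types setting at a lower level is assumed here to replace the inherited list rather than extend it, so default types you still want are listed again.

```nginx
# server context
location / {
    charset       utf-8;
    # text/html is always processed; these types are added on top of it.
    charset_types text/plain text/xml text/css application/javascript;
}

location /api/ {
    charset       utf-8;
    charset_types *;    # any MIME type (0.8.29 and later)
}
```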
4. override_charset
syntax: override_charset on | off;
default: override_charset off;
context: http, server, location, if in location
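Determines whether a conversion should be performed for responses received from a proxied or FastCGI server when those responses already carry a charset in the “Content-Type” response header field. (An open question of mine: what does this directive do if the response carries no charset information? I have not tested it, so I cannot say for certain.) If conversion is enabled, the charset specified in the received response is used as the source charset.
Note that if a response was received in a subrequest, conversion from the response charset to the main request charset is always performed, regardless of the override_charset setting.
Original nginx documentation: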
Determines if a conversion should be performed for answers received from a proxied or FastCGI server, if the answers already carry a charset in the “Content-Type” response header field. If conversion is enabled, a charset specified in the received response is used as a source charset.
It should be noted that if a response was received in a subrequest then conversion from the response charset to the main request charset is always performed regardless of the override_charset directive setting.
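A sketch of override_charset for a proxied backend; the upstream address and location are hypothetical. With override_charset on, a backend response labelled, for instance, “Content-Type: text/html; charset=koi8-r” is treated as koi8-r and converted to the charset set by the charset directive, with conf/koi-utf supplying the koi8-r/utf-8 mapping.

```nginx
# http context
include conf/koi-utf;   # charset_map koi8-r utf-8 { ... }

server {
    listen 80;

    location /backend/ {
        proxy_pass       http://127.0.0.1:8080;   # hypothetical upstream
        charset          utf-8;                   # charset sent to the client
        override_charset on;                      # backend's charset becomes the source
    }
}
```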
5. source_charset
syntax: source_charset charset;
default: none
context: http, server, location, if in location
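Defines the source charset of a response. If this charset differs from the charset set by the charset directive, a conversion is performed.
Original nginx documentation: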
Defines the source charset of a response. If this charset is different from the charset specified in the charset directive, a conversion is performed.