Nginx ngx_http_charset_module: A Summary of the Basic Directives

Original article. When reposting, please credit the source and keep the original URL.

This article gives a brief introduction to nginx's ngx_http_charset_module module. It covers the following directives:

charset, charset_map, charset_types, override_charset, source_charset

The ngx_http_charset_module module adds the specified charset to the “Content-Type” response header field. The module can also convert data from one charset to another, with the following limitations:

conversion is performed in one direction only, from server to client;

only single-byte charsets can be converted to one another,
or single-byte charsets to and from UTF-8.

Example configuration:

include        conf/koi-win;

charset        windows-1251;

source_charset  koi8-r;

Original nginx documentation:

The ngx_http_charset_module module adds the specified charset to the “Content-Type” response header field. In addition, the module can convert data from one charset to another, with some limitations:

conversion is performed one way — from server to client,

only single-byte charsets can be converted

or single-byte charsets to/from UTF-8.

Example Configuration

include        conf/koi-win;

charset        windows-1251;

source_charset koi8-r;
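
For context, the fragment above might sit in a larger configuration roughly as follows. This is only a minimal sketch: the listen port, server name, and document root are placeholders rather than values from the nginx documentation, and the include path assumes the conversion tables live under conf/ relative to the configuration prefix.

```nginx
# Fragment of nginx.conf (the mandatory events block is omitted here).
http {
    # Conversion table between koi8-r and windows-1251 from the distribution.
    # charset_map, and therefore this include, is only valid at http level.
    include conf/koi-win;

    server {
        listen      8080;                 # placeholder port
        server_name example.com;          # placeholder name
        root        /var/www/legacy;      # placeholder path; files stored as koi8-r

        location / {
            source_charset koi8-r;        # charset of the content on the server
            charset        windows-1251;  # charset sent to the client
        }
    }
}
```

With a setup like this, a text/html response should go out with “Content-Type: text/html; charset=windows-1251”, and the body is recoded from koi8-r on its way to the client.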

1. charset

syntax: charset charset | off;
default: charset off;
context: http, server, location, if in location

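Adds the specified charset to the “Content-Type” header field of the response returned to the client. If this charset differs from the one set by the source_charset directive, a conversion is performed.

The off parameter cancels the addition of a charset to the “Content-Type” response header field.

A charset can also be specified with a variable, as follows: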

charset $charset;

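In this case, every possible value of the variable must appear in the configuration at least once, in a charset_map, charset, or source_charset directive. For the utf-8, windows-1251, and koi8-r charsets it is enough to include the files conf/koi-win, conf/koi-utf, and conf/win-utf in the configuration. For other charsets, simply declare a fictitious conversion table, for example: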

charset_map iso-8859-5 _ { }

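In addition, a charset can be set through the “X-Accel-Charset” response header field of a proxied server. This capability can be disabled with the proxy_ignore_headers or fastcgi_ignore_headers directive.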

Original nginx documentation:

Adds the specified charset to the “Content-Type” response header field. If this charset is different from the charset specified in the source_charset directive, a conversion is performed.

The parameter off cancels the addition of charset to the “Content-Type” response header field.

A charset can be defined with a variable:

charset $charset;

In such a case, all possible values of a variable need to be present in the configuration at least once in the form of the charset_map, charset, or source_charset directives. For utf-8, windows-1251, and koi8-r charsets it is sufficient to include the files conf/koi-win, conf/koi-utf, and conf/win-utf into configuration. For other charsets, simply making a fictitious conversion table works, for example:

charset_map iso-8859-5 _ { }
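
One way to feed such a variable is the map directive. The sketch below is an illustration under assumptions, not something taken from the nginx documentation: the "enc" query argument (read through the built-in $arg_enc variable) and its values "win" and "utf" are hypothetical, and the listen port is a placeholder. Every charset the variable can take is made known to the configuration by the three included tables.

```nginx
http {
    # Tables shipped with nginx; together they declare utf-8,
    # windows-1251 and koi8-r in the configuration.
    include conf/koi-win;
    include conf/koi-utf;
    include conf/win-utf;

    # $arg_enc holds the value of a hypothetical "enc" query argument.
    map $arg_enc $charset {
        default koi8-r;         # same as source_charset, so no recoding
        win     windows-1251;
        utf     utf-8;
    }

    server {
        listen 8080;            # placeholder port

        location / {
            source_charset koi8-r;
            charset        $charset;
        }
    }
}
```

Under these assumptions a request ending in ?enc=utf would be answered in utf-8, while a request without the argument would be served as koi8-r without conversion.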

In addition, charset can also be set in the “X-Accel-Charset” response header field. This ability can be disabled using the proxy_ignore_headers and fastcgi_ignore_headers directives.
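
A minimal sketch of disabling that header for a proxied location follows. The location path and backend address are placeholders; the FastCGI case would use fastcgi_ignore_headers with the same header name.

```nginx
location /app/ {
    proxy_pass http://127.0.0.1:9000;      # placeholder backend address

    # Do not let the backend choose the charset through X-Accel-Charset.
    proxy_ignore_headers X-Accel-Charset;

    charset utf-8;                         # charset added by nginx itself
}
```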

2. charset_map

syntax: charset_map charset1 charset2 { ... }
default: none
context: http

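Describes the conversion table from one charset to another. The reverse conversion table is built from the same data. Character codes are given in hexadecimal. Characters missing in the range 80-FF are replaced with “?”. When converting from UTF-8, characters missing from the single-byte charset are replaced with “&#XXXX;”.

Example: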

charset_map koi8-r windows-1251 {

C0 FE ; # small yu

C1 E0 ; # small a

C2 E1 ; # small b

C3 F6 ; # small ts

...

}

When describing a conversion table to UTF-8, the UTF-8 codes should be given in the second column, for example:

charset_map koi8-r utf-8 {

C0 D18E ; # small yu

C1 D0B0 ; # small a

C2 D0B1 ; # small b

C3 D186 ; # small ts

...

}

The complete conversion tables from koi8-r to windows-1251, and from koi8-r and windows-1251 to utf-8, ship with the distribution in the files conf/koi-win, conf/koi-utf, and conf/win-utf.

Original nginx documentation:

Describes the conversion table from one charset to another. A reverse conversion table is built using the same data. Character codes are given in hexadecimal. Missing characters in the range 80-FF are replaced with “?”. When converting from UTF-8, characters missing in a one-byte charset are replaced with “&#XXXX;”.

Example:

charset_map koi8-r windows-1251 {

C0 FE ; # small yu

C1 E0 ; # small a

C2 E1 ; # small b

C3 F6 ; # small ts

...

}

When describing a conversion table to UTF-8, codes for the UTF-8 charset should be given in the second column, for example:

charset_map koi8-r utf-8 {

C0 D18E ; # small yu

C1 D0B0 ; # small a

C2 D0B1 ; # small b

C3 D186 ; # small ts

...

}

Full conversion tables from koi8-r to windows-1251, and from koi8-r and windows-1251 to utf-8 are provided in the distribution files conf/koi-win, conf/koi-utf, and conf/win-utf.
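
Combining the two example tables above gives the matching rows of a windows-1251 to utf-8 table. The sketch below is derived only from the four characters shown and is meant as an illustration of the column layout; the complete table is the one in conf/win-utf, and a real configuration would normally include that file rather than redefine it.

```nginx
# Partial table, for illustration only.
charset_map windows-1251 utf-8 {
    FE D18E ; # small yu
    E0 D0B0 ; # small a
    E1 D0B1 ; # small b
    F6 D186 ; # small ts
}
```

The first column holds the single-byte windows-1251 code and the second the UTF-8 byte sequence, which matches the rule that UTF-8 codes go in the second column.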

3. charset_types

syntax: charset_types mime-type ...;
default: charset_types text/html text/xml text/plain text/vnd.wap.wml application/x-javascript application/rss+xml;
context: http, server, location

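This directive appeared in version 0.7.9.

Enables module processing of responses with the specified MIME types, in addition to “text/html”. The special value “*” matches any MIME type (since 0.8.29).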

Original nginx documentation:

This directive appeared in version 0.7.9.

Enables module processing in responses with the specified MIME types in addition to “text/html”. The special value “*” matches any MIME type (0.8.29).
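
As a rough sketch of how the directive might be used, the two locations below widen the set of processed types. The location paths and the chosen MIME types are hypothetical examples, not recommendations from the nginx documentation.

```nginx
location /static/ {
    charset utf-8;

    # text/html is always processed; the types listed here are handled
    # in addition to it. An explicit list replaces the default list.
    charset_types text/plain text/css application/json;
}

location /any/ {
    charset utf-8;
    charset_types *;    # any MIME type (0.8.29 and later)
}
```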

4. override_charset

syntax: override_charset on | off;
default: override_charset off;
context: http, server, location, if in location

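Determines whether a conversion should be performed on responses received from a proxied or FastCGI server when those responses already carry a charset in the “Content-Type” header field. (An open question: what happens when the response carries no charset information? I have not tested this, so I cannot say for certain.) If conversion is enabled, the charset specified in the received response is used as the source charset.

Note that if a response was received in a subrequest, conversion from the response charset to the main request charset is always performed, regardless of the override_charset setting.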

Original nginx documentation:

Determines if a conversion should be performed for answers received from a proxied or FastCGI server, if the answers already carry a charset in the “Content-Type” response header field. If conversion is enabled, a charset specified in the received response is used as a source charset.

It should be noted that if a response was received in a subrequest then conversion from the response charset to the main request charset is always performed regardless of the override_charset directive setting.
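
The following sketch shows one plausible setup for this directive. It assumes a backend that answers with “Content-Type: text/html; charset=windows-1251”; the backend address and listen port are placeholders, and the described behavior is my reading of the documentation rather than a tested result.

```nginx
http {
    include conf/win-utf;    # table between windows-1251 and utf-8

    server {
        listen 8080;         # placeholder port

        location / {
            proxy_pass http://127.0.0.1:9000;   # placeholder backend

            charset utf-8;

            # With "on", the charset announced by the backend is taken as
            # the source charset and the body is recoded to utf-8.
            # With the default "off", a response that already carries a
            # charset is not converted.
            override_charset on;
        }
    }
}
```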

5. source_charset

syntax: source_charset charset;
default: none
context: http, server, location, if in location

Defines the source charset of a response. If this charset differs from the one set by the charset directive, a conversion is performed.

Original nginx documentation:

Defines the source charset of a response. If this charset is different from the charset specified in the charset directive, a conversion is performed.
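
Because source_charset can be set per location, content stored in different legacy encodings can be served in a single output charset. The sketch below assumes two hypothetical directories, one holding koi8-r files and one holding windows-1251 files; the paths and the listen port are placeholders.

```nginx
http {
    include conf/koi-utf;    # table between koi8-r and utf-8
    include conf/win-utf;    # table between windows-1251 and utf-8

    server {
        listen  8080;        # placeholder port
        charset utf-8;       # inherited by both locations

        location /koi/ {     # hypothetical path; files stored as koi8-r
            source_charset koi8-r;
        }

        location /win/ {     # hypothetical path; files stored as windows-1251
            source_charset windows-1251;
        }
    }
}
```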
