UTF-8, mod_perl, CGI, etc.

While digging around for a way to avoid the weird handling of UTF-8 strings in Perl scripts running under Apache mod_perl, I found a really nice link:
perl, UNICODE/utf8, CGI.pm, apache, mod_perl and MySQL.

To my surprise, I had completely missed the mysql_enable_utf8 setting for the DBD::mysql driver!

I'm keeping a copy of that page here for reference.

1. The plain perl/CGI script case

NOTE: This information also applies when you run perl scripts normally from the shell (not through CGI).

1.1. First, perl needs to be invoked with the "-C" switch, so you need to add it to the hash-bang line that starts every script, like this:

#!/usr/bin/perl -C
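
To check what the switch actually did, a small diagnostic script can help. The sketch below is my own illustration, not part of the original page: the helper name unicode_io_report is hypothetical, while ${^UNICODE} and PerlIO::get_layers are standard Perl. According to perlrun, a bare -C behaves like -CSDL, and since Perl 5.10.1 a -C on the hash-bang line must also be given on the command line. For that reason the sketch leaves the switch off its own hash-bang line and is meant to be run as "perl -C script.pl".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: summarize the interpreter's Unicode I/O settings.
sub unicode_io_report {
    # Layers currently attached to STDOUT, e.g. "unix", "perlio", "utf8".
    my @layers   = PerlIO::get_layers(STDOUT);
    my $has_utf8 = grep { $_ eq 'utf8' } @layers;

    # ${^UNICODE} holds the numeric flags set by -C or PERL_UNICODE.
    return sprintf '${^UNICODE}=%d, STDOUT utf8 layer: %s',
        ${^UNICODE}, $has_utf8 ? 'yes' : 'no';
}

print unicode_io_report(), "\n";
```

Without -C (and without PERL_UNICODE set) I would expect the flags value to be 0 and no utf8 layer on STDOUT. With -C and a UTF-8 locale, the flags should be non-zero and the layer should show up.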

1.2. Perl also needs the system environment variable LANG to be set, so that it can automatically detect that all I/O should be handled as Unicode.

If it is not inherited by Apache from your system (which it is not in Fedora Core 8), this is probably most easily done by adding the following statement to your Apache config file httpd.conf (assuming it is UTF-8 you're after, that is):

SetEnv LANG en_US.UTF-8

This is preferably done in the global context of your httpd.conf to enable it in all virtual hosts etc. (assuming that is what you want).
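
As for why LANG matters: per perlrun, the "L" in the default -CSDL makes the I/O settings conditional on the locale environment variables LC_ALL, LC_CTYPE and LANG, in that order of precedence. The sketch below only approximates that check so you can see from inside a CGI script what Apache passed along. The helper name locale_wants_utf8 and the regular expression are my assumptions, not Perl's actual implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess whether the locale variables in the given
# environment hash ask for UTF-8. The first variable that is set wins,
# following the precedence LC_ALL > LC_CTYPE > LANG.
sub locale_wants_utf8 {
    my ($env) = @_;
    for my $name (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$name};
        next unless defined $value && length $value;
        return $value =~ /utf-?8/i ? 1 : 0;
    }
    return 0;    # nothing set: no UTF-8 locale
}

printf "LANG=%s, UTF-8 locale: %s\n",
    defined $ENV{LANG} ? $ENV{LANG} : '(unset)',
    locale_wants_utf8(\%ENV) ? 'yes' : 'no';
```

If this reports "(unset)" when run through Apache, the SetEnv line has not taken effect for that script.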

1.3. Then we need to do something about CGI.pm, since it does not automatically decode UTF-8 CGI parameters (like when you use $cgi->param("paramname")).

The newer versions of CGI.pm (~3.30 something) have some UTF-8 decoding capabilities through the -utf8 switch (see resources #2), but setting it globally may corrupt file uploads, which should be handled as binary data without any decoding. Besides, the Fedora distribution I'm using (FC8) does not ship with that version yet anyway (and I like using the stuff shipped with the distro, to avoid having to patch things manually to the greatest extent possible).

Thus I use a wrapper for CGI.pm that someone published at perlmonks.org called 'as_utf8', which also takes care of the file-upload case and does not perform any conversion there.

See resources #3 for the wrapper script. (Just copy the text into a file and put it somewhere in your @INC, for instance /usr/lib/perl5/site_perl/CGI/as_utf8.pm. The file name has to match the module name used in the "use" statement.)

To use it, just replace the normal "use CGI;" with a "use CGI::as_utf8;" in your scripts, and it should transparently decode your $cgi->param("paramname") for you (without screwing up file uploads etc.).
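
To make it clearer what "decoding a parameter" means, here is a minimal sketch using only the core Encode module. It is not the as_utf8 wrapper from perlmonks.org. The helper name decode_utf8_param is hypothetical, and the raw value is hard-coded to stand in for what an undecoded $cgi->param("paramname") would return.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn the raw UTF-8 bytes of one text parameter
# into a Perl character string. File-upload data should not be passed
# through this, since it must stay binary.
sub decode_utf8_param {
    my ($raw) = @_;
    return undef unless defined $raw;
    return decode('UTF-8', $raw);
}

my $raw  = "\xC3\xA5";                  # the two UTF-8 bytes of U+00E5
my $text = decode_utf8_param($raw);

printf "bytes: %d, characters: %d\n", length($raw), length($text);
# prints "bytes: 2, characters: 1"
```

Before decoding, perl sees two separate bytes. After decoding, it sees one character, which is what string functions like length, substr and regular expressions need in order to work correctly.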

1.4. Specifying charset in your HTML output 

To have your browser understand your UTF-8 encoded text and display it correctly, you must somehow make it aware of the encoding. This can be done in several ways.

Either specify it in your Apache config file using the following statement in the global context:

AddDefaultCharset UTF-8

Note that this will of course affect ALL directories and all virtual servers, which might not always be what you want.

To specify it on a per-script basis, you can set it in the HTTP response header.
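
A per-script declaration could look like the following minimal sketch. The helper name content_type_header is hypothetical, and the script sets the output encoding explicitly with binmode so that it does not rely on the -C switch or on LANG.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a Content-Type header that declares the
# charset, followed by the blank line that ends the header section.
sub content_type_header {
    my ($charset) = @_;
    return "Content-Type: text/html; charset=$charset\r\n\r\n";
}

# Encode everything printed to STDOUT as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

print content_type_header('UTF-8');
print "<p>\x{e5}\x{e4}\x{f6}</p>\n";    # three non-ASCII Latin letters
```

The charset named in the header has to match the encoding the script actually writes, otherwise the browser will still display garbage.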

Comments

Say what?

Nah, I don't need this
