Tips and Tricks for implementing multilingual websites
The UTF-8 encoding has been around since the early 1990s, and Unicode for even longer, yet support for it still isn't ubiquitous. For your website to properly serve and store UTF-8 data, you need to ensure support is enabled all the way up your server stack, from the operating system to the end user's browser.
Operating system
Linux has supported UTF-8 for years; you can run the locale command to verify the encoding your system is using.
Web Server
The default installation of Apache 2.0 and 2.2 does not have UTF-8 set as the default. Fix this by opening httpd.conf and uncommenting this line:
AddDefaultCharset UTF-8
Database engine
A fresh installation of MySQL 5 needs the following in /etc/my.cnf:
[mysqld]
character-set-server = utf8
# sets charset for new databases
collation-server = utf8_general_ci
# sets collation for new databases http://dev.mysql.com/doc/refman/5.1/en/charset-server.html
skip-character-set-client-handshake
# ignores the character set requested by the client
#init-connect = 'SET NAMES utf8;'
# runs once when each client connects (skipped for users with the SUPER privilege); probably not necessary http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html
[mysql]
default-character-set = utf8
# http://dev.mysql.com/doc/refman/5.1/en/mysql-command-options.html#option_mysql_default-character-set
[client]
default-character-set = utf8
# http://dev.mysql.com/doc/refman/5.1/en/charset-configuration.html
Restart mysqld and verify the settings by running:
SHOW VARIABLES LIKE '%character_set%';
SHOW VARIABLES LIKE 'collation%';
Database data
If you're migrating existing data, things can get difficult. Essentially, you need to dump your database, convert the dump file to UTF-8, change every DEFAULT CHARSET clause to utf8, reload the data, and then fix the bugs that turn up, which can be many and varied. Every situation is unique, so expect to search for fixes specific to yours. One write-up: http://www.bluebox.net/news/2009/07/mysql_encoding
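The convert-and-rewrite step can be sketched in Python. This is a minimal illustration under stated assumptions, not a complete migration tool: the function name convert_dump and the latin-1 default are hypothetical, and the approach only helps when the dump's bytes really are in the source encoding (double-encoded data needs a different repair).

```python
import re


def convert_dump(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode a SQL dump as UTF-8 and point table defaults at utf8.

    Assumption: `raw` is genuinely encoded in `source_encoding`.
    """
    # Decode using the encoding the dump was written in.
    text = raw.decode(source_encoding)

    # Rewrite table-level charset declarations. An optional table-level
    # COLLATE clause is dropped so the server's default collation for
    # utf8 applies. Column-level CHARACTER SET clauses are NOT handled.
    text = re.sub(
        r"DEFAULT CHARSET=\w+(\s+COLLATE=\w+)?",
        "DEFAULT CHARSET=utf8",
        text,
    )

    # Encode the result as UTF-8 for loading into the new database.
    return text.encode("utf-8")
```

For a real dump you would read the file in binary mode, pass the bytes through convert_dump, and write the result to a new file, keeping the original until the reloaded data has been checked.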
Website
For a browser to properly render a site, it needs to be told what encoding the site is in. This is done either by changing the server configuration so that the encoding is specified in the HTTP Content-Type header (the better way), or by including the appropriate meta tag in the page (which some browsers respond to by re-requesting the page and reading it again in the specified encoding). To see what encoding is being sent, open Firefox's Page Info dialog on the page.
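The encoding declared in the HTTP headers can also be checked from a script. The following Python sketch uses only the standard library; the helper names charset_from_content_type and declared_charset are hypothetical, and it inspects only the Content-Type header, not any meta tag in the page body.

```python
from email.message import Message
from typing import Optional
from urllib.request import urlopen


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def declared_charset(url: str) -> Optional[str]:
    """Fetch `url` and report the charset its Content-Type header declares."""
    with urlopen(url) as response:
        header_value = response.headers.get("Content-Type", "")
    return charset_from_content_type(header_value)
```

Calling charset_from_content_type("text/html; charset=UTF-8") returns "utf-8". A result of None means the server sent no charset in the header, so the browser has to fall back on the meta tag or guess.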
Fonts
All this work comes to naught if the end user has no installed font that covers the characters your website is serving. This is most likely to happen if your page has Traditional Chinese characters. In that case, to make absolutely sure your page renders correctly, you can either serve a 20MB font with your page via @font-face, or rely on fonts.com's web fonts service, the only one which supports Far Eastern languages.