Tuesday, January 20, 2015

Unicode in JSF

Unicode, what's it all about?

Let's go back in the history of character encoding. Most of you may be familiar with the term "ASCII". This was more or less the first widely used character encoding. In the days when a byte was very expensive and 1 MHz was extremely fast, only the characters which appeared on those ancient US typewriters (as well as on the average US International keyboard nowadays) were covered by the charset of the ASCII character encoding. This includes the complete Latin alphabet (A-Z, in both lowercase and uppercase), the digits (0-9), the punctuation characters (space, dot, comma, colon, etcetera) and some special characters (the at sign, the hash sign, the dollar sign, etcetera), plus a set of non-printable control characters. All those characters fill up the space of 7 bits, half of the room a byte provides, with a total of 128 characters.
Later, the remaining bit of the byte was used for Extended ASCII, which provides room for a total of 256 characters. Most of the additional room is used by special characters, such as characters with diacritics and line drawing characters. But because everyone used the additional room in their own way (IBM, Commodore, universities, etcetera), the result was not interchangeable. ISO then came up with standard character encoding definitions for 8-bit ASCII extensions, resulting in the well-known ISO 8859 character encoding standards such as ISO 8859-1.
8 bits may be enough for the languages using the Latin alphabet, but it is certainly not enough for the remaining non-Latin languages and scripts in the world, such as Chinese, Japanese, Hebrew, Cyrillic, Sanskrit, Arabic, etcetera. Their users developed their own non-ISO character encodings which were, again, not interchangeable, such as Guobiao, BIG5, JIS, KOI, MIK, TSCII, etcetera. Finally, a new character encoding standard was established to cover all of the characters used in the world, so that text is interchangeable everywhere: Unicode. It started out 16 bits wide, and its first 256 code points are identical to ISO 8859-1. It has since grown beyond 16 bits, and UTF-8, the encoding used in the rest of this post, is one way of storing Unicode characters as bytes. You can find all of those linguistic characters in the Unicode code charts, which also cover many special characters (symbols) such as punctuation and mathematical operators.
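To make the difference between an 8-bit encoding and UTF-8 concrete, here is a minimal Java sketch. The class name EncodingDemo and its helper methods are made up for this illustration, and the Turkish letter is only an example. It shows that a character outside ISO 8859-1 cannot be encoded in it, and what happens when UTF-8 bytes are decoded with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class EncodingDemo {

    // True if every character of the text exists in ISO 8859-1.
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    // Number of bytes the text needs when stored as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Simulates a receiver that wrongly decodes UTF-8 bytes as ISO 8859-1.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sCedilla = "\u015f"; // Turkish s with cedilla, not in ISO 8859-1
        String uUmlaut = "\u00fc";  // u with diaeresis, present in ISO 8859-1

        System.out.println(fitsLatin1(sCedilla));
        System.out.println(fitsLatin1(uUmlaut));
        System.out.println(utf8Length(sCedilla));
        // One character turns into two wrong ones when misread.
        System.out.println(misread(uUmlaut).length());
    }
}
```

The misread case is the typical source of garbled characters in web pages: the bytes are fine, but one side of the connection assumes a different encoding than the other.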

Setting up Unicode in your project:

To use Unicode in your project, you must complete the following steps.

Your MySQL database and tables must be using UTF-8

CREATE TABLE tbl_name (...) CHARACTER SET utf8;

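Wherever you define the MySQL JDBC URL, you must also specify "UTF-8" as the connection encoding, for example through the characterEncoding parameter of MySQL Connector/J.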

Here is an (unchecked) example.
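A minimal sketch of such a URL in Java, assuming the MySQL Connector/J driver and its useUnicode and characterEncoding parameters. The class name, host, port and database name are hypothetical placeholders, and the code has not been run against a real database.

```java
// Hypothetical helper, only meant to show the shape of the URL.
class MysqlUrlExample {

    // Builds a Connector/J style URL that asks the driver to use UTF-8.
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // "localhost", 3306 and "mydb" are placeholder values.
        System.out.println(utf8Url("localhost", 3306, "mydb"));
    }
}
```

The resulting string would be passed to DriverManager.getConnection together with your own user name and password. If the URL is written in an XML configuration file instead, the & has to be escaped as &amp;.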


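Your XHTML pages should be structured as follows:

<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:f="http://java.sun.com/jsf/core">
  <f:view encoding="utf-8" contentType="application/xhtml+xml">
    <!-- page content goes here -->
  </f:view>
</html>

UTF-8 Encoding with JSF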
If you are having character problems with JSF, you can implement a custom filter that sets the character encoding to UTF-8.
Problems with Turkish characters, for example, can be overcome with a filter like this.

package yourpackagename;

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CustomCharacterEncodingFilter implements Filter {

    public void init(FilterConfig config) throws ServletException {
        // No initialization needed.
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Set the encoding before any request parameter is read
        // and before any output is written.
        request.setCharacterEncoding("UTF-8");
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {
        // Nothing to clean up.
    }
}

The filter and its mapping definitions should be placed in your web.xml.


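The resource bundles in the project need attention as well. Java traditionally reads .properties files as ISO 8859-1, so characters outside that range have to be written as Unicode escapes. The resource files with the properties extension that reside in the ${resources.dir}/native directory are therefore converted and written to "${classes.dir}/${resources.dir}". Here is the Ant target that achieves this.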

<target name="compile-resources" description="">
    <native2ascii dest="${classes.dir}/${resources.dir}" src="${resources.dir}/native" includes="**/*" ext=".properties" />
</target>
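To show what this conversion does to the text, here is a rough Java sketch. It is not the native2ascii implementation, only an illustration of the idea, and the class and method names are made up. Every character outside 7-bit ASCII is replaced by a backslash-u hex escape, and java.util.Properties turns the escape back into the original character when the file is loaded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of the escaping idea, not the real tool.
class AsciiEscapeSketch {

    // Replaces every char outside 7-bit ASCII with a backslash-u hex escape.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Loads one "key=value" line the way Java reads a .properties file.
    static String loadValue(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String original = "Te\u015fekk\u00fcrler"; // a Turkish word with two non-ASCII letters
        String escaped = escape(original);
        System.out.println(escaped);
        // The escaped form loads back to the original text.
        System.out.println(loadValue("thanks=" + escaped, "thanks").equals(original));
    }
}
```

The escaped file contains only ASCII characters, so it survives any tool or editor that does not understand UTF-8.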




Copyright @ 2014 Tech Tutorial .