unicode - Understanding character encoding in PHP

- June 15, 2011

I am struggling to understand character encoding in PHP.

Consider the following script (you can run it here):

$string = "\xe2\x82\xac";

var_dump(mb_internal_encoding());
var_dump($string);
var_dump(unpack('c*', $string));
$utf8string = mb_convert_encoding($string, "utf-8");
var_dump($utf8string);
var_dump(unpack('c*', $utf8string));

mb_internal_encoding("utf-8");

var_dump($string);
var_dump($utf8string);

I have a string, the € character, represented with Unicode code points. PHP 5.5 uses ISO-8859-1 as its internal encoding, hence I think my string is encoded using that encoding. With unpack I can see the byte representation of my string, and it corresponds to the hexadecimal codes I use to define the string.

Then I convert the encoding of the string to UTF-8, using mb_convert_encoding. At this point the string displays differently on screen, and its byte representation changes (which is expected).

If I then change the PHP internal encoding to UTF-8 as well, I'd expect $utf8string to be displayed correctly on screen, but this doesn't happen.

What am I missing?

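The script you show doesn't contain any non-ASCII characters, so the internal encoding does not make any difference to how the source is read. mb_internal_encoding does not by itself convert your data on output; it sets the default encoding that the mbstring functions assume when none is passed explicitly. This question will tell you more about how it works; it will also tell you that it's better not to use it.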
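As a rough illustration of the point above, here is a minimal sketch, assuming the mbstring extension is loaded. The variable names are only illustrative. It shows that changing the internal encoding changes how mb_strlen interprets the bytes, while the bytes of the string stay the same:

```php
<?php
// The euro sign written as raw UTF-8 bytes (ASCII-only source, as in the question).
$euro = "\xe2\x82\xac";

// strlen() counts bytes and ignores the internal encoding.
$byteLength = strlen($euro);

// mb_strlen() without an explicit encoding falls back to the internal encoding.
mb_internal_encoding("ISO-8859-1");
$lengthAsLatin1 = mb_strlen($euro); // every byte is treated as one character

mb_internal_encoding("UTF-8");
$lengthAsUtf8 = mb_strlen($euro);   // the three bytes are read as one character

// The stored bytes never changed along the way.
$hex = bin2hex($euro);

var_dump($byteLength, $lengthAsLatin1, $lengthAsUtf8, $hex);
// int(3), int(3), int(1), string(6) "e282ac"
```

Passing the encoding explicitly, as in mb_strlen($euro, "UTF-8"), avoids depending on the global setting at all.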

The three-byte string $string in your code is the UTF-8 representation of the euro symbol, not its "Unicode code point" (which is U+20AC, a value that fits in 2 bytes, as do most common Unicode characters).
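To make this concrete, here is a small sketch, again assuming mbstring is available and with illustrative variable names. It passes the source encoding explicitly instead of relying on the internal encoding. Treating the three bytes as ISO-8859-1 re-encodes each byte separately, which is most likely the "different" output seen in the question. Starting from the actual code point (here written as UTF-16BE) yields the original three bytes:

```php
<?php
// Already the UTF-8 bytes of the euro sign.
$string = "\xe2\x82\xac";

// Treat each byte as an ISO-8859-1 character and encode it as UTF-8:
// E2 -> C3 A2, 82 -> C2 82, AC -> C2 AC (double encoding).
$doubleEncoded = mb_convert_encoding($string, "UTF-8", "ISO-8859-1");
echo bin2hex($doubleEncoded), "\n"; // c3a2c282c2ac

// Declaring the correct source encoding leaves the bytes untouched.
$unchanged = mb_convert_encoding($string, "UTF-8", "UTF-8");
echo bin2hex($unchanged), "\n";     // e282ac

// The code point U+20AC as two bytes (UTF-16BE), converted to UTF-8.
$fromCodePoint = mb_convert_encoding("\x20\xac", "UTF-8", "UTF-16BE");
echo bin2hex($fromCodePoint), "\n"; // e282ac
```

In other words, $string needs no conversion to be valid UTF-8; it only has to be sent to a terminal or browser that interprets the output as UTF-8.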

Does this clear up the behavior you see?
