unicode - Understanding character encoding in PHP
I am struggling to understand character encoding in PHP.
Consider the following script (you can run it here):
$string = "\xe2\x82\xac";
var_dump(mb_internal_encoding());
var_dump($string);
var_dump(unpack('c*', $string));

$utf8string = mb_convert_encoding($string, "utf-8");
var_dump($utf8string);
var_dump(unpack('c*', $utf8string));

mb_internal_encoding("utf-8");
var_dump($string);
var_dump($utf8string);
I have a string, the € character, represented by its Unicode code points. PHP 5.5 uses ISO-8859-1 as its internal encoding, hence I think my string is encoded using that encoding. With unpack I can see the byte representation of my string, and it corresponds to the hexadecimal codes I used to define the string.
Then I convert the encoding of the string to UTF-8, using mb_convert_encoding. At this point the string displays differently on screen and its byte representation changes (as expected).
If I change the PHP internal encoding to UTF-8, I'd expect $utf8string to be displayed correctly on screen, but that doesn't happen.
What am I missing?
The script you show doesn't use any non-ASCII characters, so its internal encoding does not make any difference. mb_internal_encoding does play a role in converting your data on output. This question will tell you more about how it works; it will also tell you it's better not to use it.
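As a rough illustration of why the internal encoding setting makes no difference to the string itself, here is a minimal sketch. It assumes the mbstring extension is loaded; the variable names are made up for the example. The stored bytes stay the same, and only the way the mb_* functions interpret them changes.

```php
<?php
// The same three bytes as in the question.
$string = "\xe2\x82\xac";

// Interpret strings as ISO-8859-1: every byte counts as one character.
mb_internal_encoding('ISO-8859-1');
$bytesBefore  = bin2hex($string);     // "e282ac"
$lengthLatin1 = mb_strlen($string);   // 3

// Switch to UTF-8: the bytes are untouched, but they now count as one character.
mb_internal_encoding('UTF-8');
$bytesAfter = bin2hex($string);       // "e282ac"
$lengthUtf8 = mb_strlen($string);     // 1

echo $bytesBefore, ' ', $lengthLatin1, "\n";
echo $bytesAfter, ' ', $lengthUtf8, "\n";
```

Calling mb_internal_encoding never rewrites an existing string, so var_dump prints the same bytes before and after the call.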
The three-byte string $string in your code is the UTF-8 representation of the euro symbol, not its "Unicode code point" (which is 2 bytes wide, like most common Unicode characters: 0x20AC).
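The difference between the UTF-8 bytes and the code point can be checked with a small sketch. It assumes mbstring is available, and it uses a conversion to UTF-16BE only as a convenient way to make the 2-byte code point value visible; the variable names are illustrative.

```php
<?php
// The three bytes defined by hand in the question.
$string = "\xe2\x82\xac";

// Byte-level view: this is the UTF-8 encoding of the euro sign.
$utf8Hex = bin2hex($string);    // "e282ac"

// Re-encode as UTF-16BE, stating explicitly that the input is UTF-8.
// For this character the result is the code point itself, in two bytes.
$utf16Hex = bin2hex(mb_convert_encoding($string, 'UTF-16BE', 'UTF-8'));    // "20ac"

echo $utf8Hex, "\n", $utf16Hex, "\n";
```

The source and target encodings are both passed explicitly here, so the result does not depend on the current internal encoding.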
Does this clear up the behavior you see?
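For reference, the conversion step from the question can be reproduced as a sketch. It assumes, as the question states, that the internal encoding is ISO-8859-1, and it passes that as an explicit third argument so the result is the same on any PHP version with mbstring. Each of the three bytes is treated as a separate Latin-1 character and encoded again as UTF-8, which is why the string grows to six bytes and no longer displays as €.

```php
<?php
// Already valid UTF-8 for the euro sign.
$string = "\xe2\x82\xac";

// Equivalent to mb_convert_encoding($string, "utf-8") when the internal
// encoding is ISO-8859-1: the source encoding is assumed to be Latin-1.
$utf8string = mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');

// 0xE2 -> C3 A2, 0x82 -> C2 82, 0xAC -> C2 AC
$convertedHex = bin2hex($utf8string);    // "c3a2c282c2ac"

echo bin2hex($string), "\n", $convertedHex, "\n";
```

Printing $string unchanged on a page or terminal that expects UTF-8 should show the euro sign, since its bytes were UTF-8 from the start.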