To display an HTML page correctly, a web browser must know which character set to use.
From ASCII to UTF-8
ASCII was the earliest widely used character encoding standard. ASCII defined 128 different characters (values 0 to 127): control characters, digits (0-9), uppercase and lowercase English letters (A-Z, a-z), and some special characters such as ! $ + - ( ) @ < >.
ISO-8859-1 was the default character set for HTML 4. It uses one byte per character, which gives 256 different character codes (0 to 255). HTML 4 also supported UTF-8.
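ANSI (Windows-1252) was the original Windows character set. ANSI is identical to ISO-8859-1 except for the values 128 to 159: ISO-8859-1 reserves these 32 values for control codes, while ANSI assigns printable characters, such as the euro sign (€), to 27 of them.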
The HTML5 specification encourages web developers to use the UTF-8 character set, which covers almost all of the characters and symbols in the world!
The HTML charset Attribute
The character set used in a page is declared with the charset attribute of the <meta> tag, inside the <head> element:
<meta charset="UTF-8">
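As a minimal sketch, assuming the page is produced by a Python script, the bytes written out should use the same encoding that the <meta> tag declares. The function name encode_page is hypothetical. The final check reflects the HTML rule that the charset declaration must appear within the first 1024 bytes of the document.

```python
def encode_page(title: str, body: str) -> bytes:
    """Hypothetical helper: build a small HTML page and encode it as UTF-8.

    title and body are inserted as-is (they are not HTML-escaped).
    """
    html = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="UTF-8">\n'  # the declared character set
        f"<title>{title}</title>\n"
        "</head>\n"
        f"<body>{body}</body>\n"
        "</html>\n"
    )
    # Encode with the same character set that the <meta> tag declares.
    return html.encode("utf-8")


page = encode_page("Café", "<p>Price: 5 €</p>")

# The declaration must sit within the first 1024 bytes of the document.
assert b'<meta charset="UTF-8">' in page[:1024]

# To save the page: pathlib.Path("index.html").write_bytes(page)
```

Declaring one character set while encoding the file in another is a common cause of garbled accented letters and symbols.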
Differences Between Character Sets
The following table compares a selection of character codes across the character sets described above (for UTF-8, the number is the Unicode code point):
Number  ASCII  ANSI  8859-1  UTF-8  Description
32                                  space
33      !      !     !       !      exclamation mark
34      "      "     "       "      quotation mark
35      #      #     #       #      number sign
36      $      $     $       $      dollar sign
37      %      %     %       %      percent sign
38      &      &     &       &      ampersand
39      '      '     '       '      apostrophe
40      (      (     (       (      left parenthesis
41      )      )     )       )      right parenthesis
42      *      *     *       *      asterisk
43      +      +     +       +      plus sign
44      ,      ,     ,       ,      comma
45      -      -     -       -      hyphen-minus
46      .      .     .       .      full stop
47      /      /     /       /      solidus
48      0      0     0       0      digit zero
49      1      1     1       1      digit one
50      2      2     2       2      digit two
51      3      3     3       3      digit three
52      4      4     4       4      digit four
53      5      5     5       5      digit five
54      6      6     6       6      digit six
55      7      7     7       7      digit seven
56      8      8     8       8      digit eight
57      9      9     9       9      digit nine
91      [      [     [       [      left square bracket
92      \      \     \       \      reverse solidus
93      ]      ]     ]       ]      right square bracket
94      ^      ^     ^       ^      circumflex accent
95      _      _     _       _      low line
96      `      `     `       `      grave accent
123     {      {     {       {      left curly bracket
124     |      |     |       |      vertical line
125     }      }     }       }      right curly bracket
126     ~      ~     ~       ~      tilde
128            €                    euro sign
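As a sketch, the rows of this table can be reproduced with Python's built-in codecs: "ascii", "cp1252" (Windows-1252, the ANSI column) and "latin-1" (ISO-8859-1). The function name describe is hypothetical. For the UTF-8 column the number is read as a Unicode code point, because a lone byte from 128 upward is not valid UTF-8 on its own. Cells with no printable character come back as None.

```python
def describe(number: int) -> dict:
    """Hypothetical helper: return one table row, with None for blank cells."""
    raw = bytes([number])  # number must be in the range 0-255
    row = {}
    # Single-byte character sets: decode the byte directly.
    for label, codec in (("ASCII", "ascii"), ("ANSI", "cp1252"), ("8859-1", "latin-1")):
        try:
            ch = raw.decode(codec)
        except UnicodeDecodeError:
            ch = None  # byte value not defined in this character set
        row[label] = ch if ch is not None and ch.isprintable() else None
    # UTF-8 column: read the number as a Unicode code point.
    ch = chr(number)
    row["UTF-8"] = ch if ch.isprintable() else None
    return row


for n in (33, 92, 126, 128):
    print(n, describe(n))
```

The row for 128 shows the one difference in this selection: only the ANSI column has a printable character there.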
The ASCII Character Set
ASCII uses the values from 0 to 31 (and 127) for control characters.
ASCII uses the values from 32 to 126 for letters, digits, and symbols.
ASCII is a 7-bit code, so it does not use the values from 128 to 255.
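The three ASCII ranges described above can be sketched as a small classifier; classify_ascii is a hypothetical helper name. Python's built-in "ascii" codec enforces the same 0 to 127 limit when encoding text.

```python
def classify_ascii(value: int) -> str:
    """Hypothetical helper: classify a byte value by the ASCII ranges."""
    if not 0 <= value <= 255:
        raise ValueError("expected a byte value from 0 to 255")
    if value <= 31 or value == 127:
        return "control"
    if value <= 126:
        return "printable"  # letters, digits, symbols, and space
    return "not ASCII"  # 128-255 are outside the 7-bit range


print("A".encode("ascii"))  # b'A'
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)  # ordinal not in range(128)
```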
The ANSI Character Set (Windows-1252)
ANSI is identical to ASCII for the values from 0 to 127.
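ANSI has a proprietary set of characters for the values from 128 to 159 (for example, € at 128 and ™ at 153).
ANSI is identical to ISO-8859-1 for the values from 160 to 255. These are the same characters as the Unicode code points 160 to 255.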
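Python's "cp1252" codec can be used to check which of the values 128 to 159 Windows-1252 actually defines. This is a sketch that relies on Python's codec tables, which leave five of these values undefined; it also compares the range 160 to 255 against ISO-8859-1 ("latin-1").

```python
defined = {}
for value in range(128, 160):
    try:
        defined[value] = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        pass  # Windows-1252 leaves this value undefined

undefined = sorted(set(range(128, 160)) - set(defined))
print(len(defined), "printable characters; undefined:", undefined)
print(defined[128], defined[153])  # € ™

# From 160 to 255, Windows-1252 and ISO-8859-1 decode to the same characters.
same_upper_half = all(
    bytes([v]).decode("cp1252") == bytes([v]).decode("latin-1")
    for v in range(160, 256)
)
print(same_upper_half)  # True
```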
The ISO-8859-1 Character Set
ISO-8859-1 is identical to ASCII for the values from 0 to 127.
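ISO-8859-1 reserves the values from 128 to 159 for control codes and has no printable characters there.
ISO-8859-1 is identical to ANSI for the values from 160 to 255. These are the same characters as the Unicode code points 160 to 255.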
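A quick check of this mapping, as a sketch using Python's built-in "latin-1" codec (Python's name for ISO-8859-1): every byte value decodes to the Unicode code point with the same number, and the values 128 to 159 decode only to control characters.

```python
# Each byte value n decodes to the code point n.
maps_directly = all(bytes([n]).decode("latin-1") == chr(n) for n in range(256))
print(maps_directly)  # True

# The values 128-159 decode to control characters, none of them printable.
controls_only = not any(chr(n).isprintable() for n in range(128, 160))
print(controls_only)  # True

print(bytes([233]).decode("latin-1"))  # é
```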
The UTF-8 Character Set
UTF-8 is identical to ASCII for the values from 0 to 127.
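In Unicode, the values from 128 to 159 (U+0080 to U+009F) are control characters, so UTF-8 has no printable characters there.
UTF-8 has the same characters as ANSI and ISO-8859-1 for the values from 160 to 255, but it stores each of them as two bytes instead of one. For example, é (value 233) is the single byte E9 in ISO-8859-1 but the two bytes C3 A9 in UTF-8.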
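Although the code points 160 to 255 name the same characters in all three sets, UTF-8 stores them with different bytes than the single-byte sets do. This sketch, using Python's built-in codecs, shows what happens when a page's bytes are read with the wrong character set.

```python
text = "é"  # U+00E9, value 233 in ISO-8859-1 and Windows-1252

latin1_bytes = text.encode("latin-1")  # b'\xe9' (one byte)
utf8_bytes = text.encode("utf-8")  # b'\xc3\xa9' (two bytes)

# Reading UTF-8 bytes as ISO-8859-1 produces garbled text ("mojibake").
wrong = utf8_bytes.decode("latin-1")
print(wrong)  # Ã©

# Reading the single ISO-8859-1 byte as UTF-8 fails outright.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("0xE9 on its own is not valid UTF-8")
```

This is why the charset declared in the <meta> tag must match the encoding the file was actually saved in.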
From the value 256 upward, UTF-8 can encode every other Unicode character, well over 100 000 in total, using two to four bytes per character.
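The number of bytes UTF-8 uses depends only on the code point's range. The sketch below writes those ranges out by hand (utf8_length is a hypothetical helper; it does not reject the surrogate code points U+D800 to U+DFFF) and checks them against Python's own UTF-8 encoder.

```python
def utf8_length(code_point: int) -> int:
    """Hypothetical helper: bytes needed to encode a code point in UTF-8."""
    if code_point < 0x80:
        return 1  # 0-127: same single byte as ASCII
    if code_point < 0x800:
        return 2  # includes 128-255 (é, ü, ...)
    if code_point < 0x10000:
        return 3  # includes € (U+20AC)
    if code_point <= 0x10FFFF:
        return 4  # includes emoji
    raise ValueError("beyond the Unicode range")


for ch in ("A", "é", "€", "😀"):
    encoded = ch.encode("utf-8")
    # The hand-written rule agrees with Python's built-in encoder.
    assert len(encoded) == utf8_length(ord(ch))
    print(f"U+{ord(ch):04X}", encoded.hex(" "), len(encoded), "byte(s)")
```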