
HTML Character Sets


To display an HTML page correctly, the browser must know which character-set (character encoding) to use:

Example

<meta charset="UTF-8">
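The effect of a wrong character-set can be sketched in a few lines of Python. This is only an illustration, not how a browser is implemented: the sample text and the variable names are made up for the example, and "cp1252" is Python's codec name for Windows-1252.

```python
# The same bytes read with two different character-sets.
text = "café €"

# The bytes a server would send for a page saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# Correct: decode with the encoding the page was saved in.
right = utf8_bytes.decode("utf-8")

# Wrong: read the UTF-8 bytes as if they were Windows-1252.
wrong = utf8_bytes.decode("cp1252")

print(right)  # café €
print(wrong)  # cafÃ© â‚¬
```

The bytes are identical in both cases. Only the character-set used to read them differs, which is why the page must declare it.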

HTML Character Sets

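For HTML5, the recommended character encoding is UTF-8. A browser may not assume UTF-8 when a page does not declare it, so the declaration should always be included.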

This has not always been the case. The character encoding for the early web was ASCII.

Later, from HTML 2.0 to HTML 4.01, ISO-8859-1 was considered the standard.

With XML and HTML5, UTF-8 finally arrived and solved a lot of character encoding problems.


In the Beginning: ASCII

Computers store data as binary codes (for example, 01000101).

To standardize the storing of text, the American Standard Code for Information Interchange (ASCII) was created. It defines a unique binary number for each character: the digits 0-9, the upper and lower case letters (a-z, A-Z), and special characters like ! $ + - ( ) @ < > , .

Because ASCII uses 7 bits for each character, it can represent only 128 different characters.

The biggest weakness of ASCII is that it excludes non-English letters.
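As a small sketch of these two points, the Python lines below look up the ASCII number for the letter E and then try to encode a non-English letter. The variable names are chosen for the example only.

```python
# "E" is stored as the number 69. Written as a full byte that is 01000101;
# the leading 0 is the eighth bit that 7-bit ASCII does not use.
code = ord("E")
bits = format(code, "08b")
print(code, bits)  # 69 01000101

# Every ASCII character has a number in the range 0-127.
ascii_bytes = "Hello!".encode("ascii")
print(list(ascii_bytes))

# A non-English letter has no ASCII number, so encoding it fails.
try:
    "é".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)  # False
```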

ASCII is still in use today: its 128 characters are also the first 128 characters of Windows-1252, ISO-8859-1, and UTF-8.

For a closer look, please study our Complete ASCII Reference.


In Windows: Windows-1252

Windows-1252 was the default character-set in Windows, up to Windows 95.

It is an extension to ASCII, with added international characters.

It uses a full byte (8 bits) for each character, so it can represent up to 256 different characters.

Because Windows-1252 was the default in Windows for so long, it is supported by all browsers.
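A minimal sketch of the one-byte-per-character idea, using Python's "cp1252" codec (Python's name for Windows-1252). The sample text is made up for the example.

```python
text = "café €"

# Each character, including é and €, becomes exactly one byte.
encoded = text.encode("cp1252")
print(len(text), len(encoded))  # 6 6
print(encoded)                  # b'caf\xe9 \x80'

# Plain English text gets the same bytes as in ASCII.
same_as_ascii = "Hello".encode("cp1252") == "Hello".encode("ascii")
print(same_as_ascii)            # True
```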

For a closer look, please study: The Complete Windows-1252 Reference.



In HTML 4: ISO-8859-1

The default character-set in HTML 4 is ISO-8859-1.

ISO-8859-1 is an extension to ASCII, with added international characters.

Example

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

In HTML 4, a character-set different from ISO-8859-1 can be specified in the <meta> tag:

Example

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">

All HTML 4 processors also support UTF-8:

Example

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

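When a browser detects ISO-8859-1, it normally decodes the page as Windows-1252 instead. The two character-sets are the same except for the 32 byte values from 0x80 to 0x9F: ISO-8859-1 reserves them for control codes, while Windows-1252 uses them for printable characters such as the euro sign and curly quotes.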
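The difference between the two character-sets can be sketched in Python by decoding the same bytes both ways. The three byte values are picked for illustration, and "cp1252" is Python's codec name for Windows-1252.

```python
import unicodedata

# Three bytes from the range 0x80-0x9F, where the two character-sets differ.
raw = bytes([0x80, 0x93, 0x94])

as_cp1252 = raw.decode("cp1252")      # euro sign and curly quotes
as_latin1 = raw.decode("iso-8859-1")  # invisible control codes

print(as_cp1252)  # €“”
print([unicodedata.category(ch) for ch in as_latin1])  # ['Cc', 'Cc', 'Cc']

# Outside that range both character-sets give the same result.
same_above = bytes([0xE9]).decode("cp1252") == bytes([0xE9]).decode("iso-8859-1")
print(same_above)  # True
```

Decoding as Windows-1252 loses nothing that a page author is likely to want, because the control codes in that range are not useful in HTML text.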

For a closer look, please study: The Complete ISO-8859-1 Reference.


In HTML5: Unicode UTF-8

The recommended character-set for HTML5 is UTF-8.

Example

<meta charset="UTF-8">

A character-set different from UTF-8 can be specified in the <meta> tag:

Example

<meta charset="ISO-8859-1">
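To show how a program could read these declarations, here is a rough Python sketch. The helper name declared_charset and the regular expression are assumptions made for this example. Real browsers follow a more detailed algorithm and also look at the HTTP Content-Type header, so treat this only as an approximation.

```python
import re

# Hypothetical pattern: finds charset=... inside a <meta> tag.
# It covers both the HTML5 form and the HTML 4 http-equiv form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_-]+)',
    re.IGNORECASE,
)

def declared_charset(html_bytes):
    """Return the declared character-set in lower case, or None."""
    # Only the start of the page is searched, since the declaration
    # is expected near the top of the document.
    match = META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()

html5_page = b'<!DOCTYPE html><html><head><meta charset="UTF-8"></head></html>'
html4_page = (b'<html><head><meta http-equiv="Content-Type" '
              b'content="text/html;charset=ISO-8859-1"></head></html>')

print(declared_charset(html5_page))      # utf-8
print(declared_charset(html4_page))      # iso-8859-1
print(declared_charset(b"<p>Hello</p>")) # None
```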

The Unicode Standard, which defines the UTF-8 and UTF-16 encodings, was developed because the ISO-8859 character-sets are limited in size and not compatible with a multilingual environment.

The Unicode Standard covers (almost) all the characters, punctuation marks, and symbols in the world.

All HTML5 browsers support UTF-8, UTF-16, Windows-1252, and the ISO-8859 character-sets. XML processors are only required to support UTF-8 and UTF-16.
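Unlike the one-byte character-sets, UTF-8 and UTF-16 use a different number of bytes for different characters. The Python sketch below counts the bytes for four sample characters. The samples and variable names are chosen for the example, and "utf-16-le" is used so that no byte order mark is added to the count.

```python
samples = ["A", "é", "€", "😀"]

# UTF-8 uses 1 to 4 bytes per character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)   # [1, 2, 3, 4]

# UTF-16 uses 2 or 4 bytes per character.
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]
print(utf16_lengths)  # [2, 2, 2, 4]

# Plain ASCII text has exactly the same bytes in UTF-8.
ascii_compatible = "Hello".encode("utf-8") == "Hello".encode("ascii")
print(ascii_compatible)  # True
```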

For a closer look, please study: The Complete Unicode Reference.




W3Schools is optimized for learning, testing, and training. Examples might be simplified to improve reading and basic understanding. Tutorials, references, and examples are constantly reviewed to avoid errors, but we cannot warrant full correctness of all content. While using this site, you agree to have read and accepted our terms of use, cookie and privacy policy. Copyright 1999-2020 by Refsnes Data. All Rights Reserved.