Small Business Resources, Business Advice and Forms from AllBusiness.com

Business Definition for: Unicode

Unicode

a system for representing characters using up to 21 bits, allowing for 1,114,112 different character codes (U+0000 through U+10FFFF), enough to represent all the written languages of the world, including Japanese and Chinese. This contrasts with the 128 characters of ASCII and the 256 characters possible in its 8-bit extensions and similar systems.

The Unicode standard is not yet complete. Originally, Unicode characters were 16 bits, as in the UTF-16 format described later, and only 65,536 characters were possible. Unicode version 3 goes beyond this limit and defines over 90,000 characters. Complete information is available at www.unicode.org.

The first 128 Unicode character codes are the same as ASCII, including end-of-line marks (see CR LF). In various programming languages and editors, Unicode character codes are written as U+XXXX or \uXXXX, where XXXX stands for a series of hexadecimal digits; thus, the letter A, ASCII hexadecimal 41, is Unicode U+0041. Figure 283 shows examples of Unicode characters.
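
As a small illustration of the notation just described, the following Python sketch checks that the letter A has the same code in ASCII and in Unicode and formats character codes in U+XXXX form. The helper name u_plus is hypothetical, introduced only for this example.

```python
def u_plus(ch: str) -> str:
    """Format a single character's code in U+XXXX notation (hypothetical helper)."""
    return f"U+{ord(ch):04X}"

# The letter A is hexadecimal 41 in both ASCII and Unicode.
assert ord("A") == 0x41
assert "\u0041" == "A"
assert u_plus("A") == "U+0041"

# Decoding the 128 ASCII byte values yields the Unicode characters
# with the same numbers, including the end-of-line marks CR and LF.
ascii_bytes = bytes(range(128))
assert ascii_bytes.decode("ascii") == "".join(chr(i) for i in range(128))
assert u_plus("\r") == "U+000D" and u_plus("\n") == "U+000A"
```

The \uXXXX escape used here is Python's own string syntax; other languages and editors may spell it differently.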

[Figure 283: Examples of Unicode characters]

There are several kinds of Unicode text files. The most important are:

  • UTF-8 – Same as ASCII for codes up to 127; thus, a UTF-8 file that contains only those codes is also an ASCII file. Higher-numbered codes are represented by sequences of 2 to 4 bytes.
  • UTF-16 big-endian – Most characters occupy 2 bytes (16 bits), high-order byte first. The file may begin with the marker bytes hexadecimal FE FF, or it may begin directly with the first character of the text. Codes that do not fit in 16 bits are represented by pairs of 16-bit values (surrogate pairs).
  • UTF-16 little-endian – Just like UTF-16 big-endian, except that each pair of bytes has the low-order byte first, and the marker at the start of the file appears as hexadecimal FF FE (representing the value FEFF). This is the Unicode system normally used in Microsoft Windows.
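
The differences among these file formats can be seen by encoding one short string three ways. The following is a minimal Python sketch, assuming the standard codecs module; the sample string is arbitrary and was chosen to include one ASCII character, one character above 127, and one character whose code does not fit in 16 bits.

```python
import codecs

# A, e with acute accent (U+00E9), and a character above U+FFFF (U+1F600)
text = "A\u00e9\U0001F600"

# UTF-8: the ASCII character stays 1 byte; higher codes take 2 to 4 bytes.
utf8 = text.encode("utf-8")
assert utf8[:1] == b"A"
assert [len(ch.encode("utf-8")) for ch in text] == [1, 2, 4]

# UTF-16 big-endian: marker FE FF, then high-order byte first (00 41 for A).
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
assert be[:4].hex() == "feff0041"

# UTF-16 little-endian: marker FF FE, then low-order byte first (41 00 for A).
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
assert le[:4].hex() == "fffe4100"

# A code that does not fit in 16 bits becomes a pair of 16-bit values.
assert "\U0001F600".encode("utf-16-be").hex() == "d83dde00"

# With the marker present, the generic "utf-16" decoder detects the byte order.
assert be.decode("utf-16") == text
assert le.decode("utf-16") == text
```

The marker is prepended by hand here because Python's "utf-16-be" and "utf-16-le" codecs do not write one on their own.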

Unicode is used internally by the Java programming language and many newer software packages. However, the characters that you will actually see on your machine are limited by the fonts installed.

Hint: When you open a UTF-16 file in an ASCII text editor on a PC, you generally see characters separated by blanks ("l i k e t h i s"; the blanks are really ASCII 0). The remedy is to open the file in a Unicode editor, such as Windows Notepad, and save it as ASCII.
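
The effect described in this hint, and the same remedy done programmatically, can be sketched in Python as follows. The function name utf16_to_ascii is hypothetical, and the sketch assumes the data begins with the FF FE or FE FF marker and contains only characters that exist in ASCII.

```python
import codecs

def utf16_to_ascii(data: bytes) -> bytes:
    """Decode UTF-16 bytes (byte order taken from the leading marker)
    and re-encode the text as ASCII.

    Raises UnicodeEncodeError if the text contains non-ASCII characters.
    """
    return data.decode("utf-16").encode("ascii")

# A UTF-16 little-endian file as it might be written on Windows.
raw = codecs.BOM_UTF16_LE + "like this".encode("utf-16-le")

# Read byte by byte, every letter is followed by a zero byte (ASCII 0),
# which an ASCII editor typically displays as a blank.
assert raw[2:8] == b"l\x00i\x00k\x00"

# After conversion, the zero bytes and the marker are gone.
assert utf16_to_ascii(raw) == b"like this"
```

To convert a file on disk, the same function could be applied to the bytes read from it in binary mode.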

See also big-endian, character set, little-endian, ANSI (American National Standards Institute), ASCII (American Standard Code for Information Interchange)
Copyright © 2006, 2003, 2000, 1998, 1996, 1995, 1992, 1989, 1986 by Barron's Educational Series, Inc. Reprinted by arrangement with Publisher.