Business Definition for: Unicode
Unicode
a system for representing characters using up to 21 bits, allowing for 1,114,112 different code points (hexadecimal 0 through 10FFFF), enough to represent all the written languages of the world, including Japanese and Chinese. This contrasts with the 256 characters possible in 8-bit systems such as extended ASCII.
The Unicode standard is still being extended. Originally, Unicode characters were 16 bits, as in the UTF-16 format described below, so only 65,536 characters were possible. Unicode version 3.1 went beyond this limit and defines over 90,000 characters. Complete information is available at www.unicode.org.
The first 128 Unicode character codes are the same as ASCII, including the end-of-line marks (see CR; LF). In various programming languages and editors, Unicode character codes are written as U+XXXX or \uXXXX, where XXXX stands for a series of hexadecimal digits; thus, the letter A, ASCII hexadecimal 41, is Unicode U+0041. Figure 283 shows examples of Unicode characters.
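The U+XXXX and \uXXXX notations are simply hexadecimal spellings of a character's code number. The following is a minimal Python sketch, assuming Python 3 (where ord returns the code point); the helper names u_plus and backslash_u are hypothetical. Python itself writes code points above FFFF with \U followed by eight digits.

```python
def u_plus(ch: str) -> str:
    """Return the U+XXXX form of a single character's code point."""
    # At least four uppercase hex digits; code points above FFFF simply use more.
    return f"U+{ord(ch):04X}"


def backslash_u(ch: str) -> str:
    """Return the \\uXXXX escape, which holds exactly four hex digits."""
    cp = ord(ch)
    if cp > 0xFFFF:
        # Four digits cannot reach past FFFF (Java would need a surrogate pair here).
        raise ValueError(f"{u_plus(ch)} does not fit in a single \\uXXXX escape")
    return f"\\u{cp:04X}"


print(u_plus("A"), backslash_u("A"))   # U+0041 \u0041
print(u_plus("\u00e9"))                # U+00E9 (é)
print(u_plus("\U0001F600"))            # U+1F600 (more than four digits)
```

Raising an error for characters above FFFF is a design choice for the sketch; a fuller version could emit the two-escape surrogate form instead.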
There are several kinds of Unicode text files. The most important are:
- UTF-8 – Same as ASCII for codes up to 127; thus, any ASCII file is also a valid UTF-8 file, and a UTF-8 file that uses only these codes is also an ASCII file. Higher-numbered codes are represented by sequences of 2 to 4 bytes.
- UTF-16 big-endian – Each character occupies 2 bytes (16 bits), high-order byte first. The file may begin with the byte-order mark FE FF (the character U+FEFF), or it may begin directly with any Unicode character. Codes that need more than 16 bits are represented by pairs of 16-bit units, so those characters occupy 4 bytes.
- UTF-16 little-endian – Just like UTF-16 big-endian, except that each pair of bytes has the low-order byte first, and the file begins with hexadecimal FF FE (representing the value FEFF). This is the Unicode system normally used in Microsoft Windows.
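To see the three formats side by side, the following minimal Python sketch encodes the same short string each way. It assumes Python 3.8 or later (for bytes.hex with a separator); the function name encode_variants is hypothetical, and the byte-order mark is prepended by hand because Python's explicit-endian UTF-16 codecs do not add one.

```python
from typing import Dict

BOM = "\ufeff"  # U+FEFF, the byte-order mark


def encode_variants(text: str) -> Dict[str, bytes]:
    """Encode text as UTF-8 and as UTF-16 in both byte orders (UTF-16 with a leading BOM)."""
    return {
        "utf-8": text.encode("utf-8"),
        # "utf-16-be"/"utf-16-le" never write a BOM themselves, so add it explicitly
        "utf-16-be": (BOM + text).encode("utf-16-be"),
        "utf-16-le": (BOM + text).encode("utf-16-le"),
    }


sample = "A\u00e9\U0001F600"  # 'A', 'é', and a code point above FFFF (U+1F600)
for name, data in encode_variants(sample).items():
    print(f"{name:9} {data.hex(' ')}")

# utf-8     41 c3 a9 f0 9f 98 80
# utf-16-be fe ff 00 41 00 e9 d8 3d de 00
# utf-16-le ff fe 41 00 e9 00 3d d8 00 de
```

The output shows the points made in the list: A is the single byte 41 in UTF-8, the two UTF-16 variants differ only in byte order, and U+1F600 becomes the 16-bit pair D83D DE00 in UTF-16.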
Unicode is used internally by the Java programming language and many newer software packages. However, the characters that you will actually see on your machine are limited by the fonts installed.
Hint: When you open a UTF-16 file in an ASCII text editor on a PC, you generally see characters separated by blanks ("l i k e t h i s"); the blanks are really ASCII 0. The remedy is to open the file in a Unicode editor, such as Windows Notepad, and save it as ASCII (any characters outside the ASCII range cannot be preserved).
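The blanks in this hint are the zero high-order bytes of ASCII characters stored as UTF-16. The following minimal Python sketch reproduces both the symptom and the remedy, with a standard-library conversion standing in for the editor's "save as ASCII"; the function names are hypothetical, and replacing unconvertible characters with '?' is an assumption about what an editor would do.

```python
def naive_8bit_view(data: bytes) -> str:
    """Mimic a plain 8-bit editor: one character per byte, with NUL (ASCII 0) shown as a blank."""
    return "".join(" " if b == 0 else chr(b) for b in data)


def utf16_to_ascii(data: bytes) -> bytes:
    """Decode UTF-16 text and re-encode it as ASCII; non-ASCII characters become '?'."""
    # The "utf-16" codec uses a leading FE FF / FF FE byte-order mark and strips it.
    text = data.decode("utf-16")
    return text.encode("ascii", errors="replace")


raw = b"\xff\xfe" + "likethis".encode("utf-16-le")  # little-endian with a BOM
print(naive_8bit_view(raw[2:]))   # l i k e t h i s   (each blank is really a 0 byte)
print(utf16_to_ascii(raw))        # b'likethis'
```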
See also big-endian; character set; little-endian; ANSI (American National Standards Institute); ASCII (American Standard Code for Information Interchange)
Related Terms:
big-endian
a system of memory addressing in which numbers that occupy more than one byte in memory are stored "big end first," with the uppermost 8 bits at the lowest address.
For example, the 16-bit binary number 1010111010110110 occupies two 8-bit bytes in memory. On a big-endian computer such as the PowerPC-based Macintosh, the upper byte, 10101110, is stored at the first address and the lower byte, 10110110, is stored at the next higher address. On a little-endian machine, the order is reversed. Contrast little-endian.
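The byte layout in this example can be checked with a few lines of Python; this is a minimal sketch using the built-in int.to_bytes and int.from_bytes, with the example number written as a binary literal.

```python
value = 0b1010111010110110  # the 16-bit number from the example (hexadecimal AEB6)

big = value.to_bytes(2, byteorder="big")        # upper byte at the first address
little = value.to_bytes(2, byteorder="little")  # lower byte at the first address

print([f"{b:08b}" for b in big])     # ['10101110', '10110110']
print([f"{b:08b}" for b in little])  # ['10110110', '10101110']

# Reading the bytes back with the same byte order recovers the original number
assert int.from_bytes(big, byteorder="big") == value
```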
The terms big-endian and little-endian are from Gulliver's Travels; they originally referred to the parties in a dispute over which end of a boiled egg should be broken first.
character set
the set of characters that can be printed or displayed on a computer. For examples see ASCII; ANSI; IBM PC; Unicode; EBCDIC.
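Because each character set assigns its own meaning to each code, the same byte can stand for different characters. The following minimal Python sketch assumes that Python's standard codecs cp1252 (the Windows "ANSI" set for Western languages), cp437 (the original IBM PC set), and cp037 (one EBCDIC variant) are fair stand-ins for the sets named above.

```python
byte = b"\xe9"  # one byte, decimal 233

as_ansi = byte.decode("cp1252")   # 'é' in the Windows "ANSI" set
as_ibm_pc = byte.decode("cp437")  # 'Θ' in the original IBM PC set
print(as_ansi, as_ibm_pc)

# Conversely, the same character gets different codes in different sets:
print("A".encode("ascii"))  # b'A'     (hexadecimal 41)
print("A".encode("cp037"))  # b'\xc1'  (hexadecimal C1 in EBCDIC)
```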
little-endian
a system of memory addressing in which numbers that occupy more than one byte in memory are stored "little-end-first," with the lowest 8 bits at the lowest address.
For example, the 16-bit binary number 1010111010110110 occupies two 8-bit bytes in memory. On a little-endian computer such as the IBM PC, the lower byte, 10110110, is stored at the first address and the upper byte, 10101110, is stored at the next higher address. On a big-endian machine, the order is reversed. Contrast big-endian.
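Byte order matters whenever binary data moves between machines or files. The following minimal Python sketch uses the standard struct module ("<" for little-endian, ">" for big-endian, "H" for an unsigned 16-bit number) to show what happens when bytes written in one order are read in the other; sys.byteorder reports the order of the machine running the code.

```python
import struct
import sys

value = 0b1010111010110110  # the example number, hexadecimal AEB6

little = struct.pack("<H", value)  # lower byte first: b'\xb6\xae'
big = struct.pack(">H", value)     # upper byte first: b'\xae\xb6'

# Reading little-endian bytes as if they were big-endian silently gives a different number.
wrong = struct.unpack(">H", little)[0]
print(hex(wrong))     # 0xb6ae instead of 0xaeb6

print(sys.byteorder)  # 'little' on x86-based PCs
```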
The terms "big-endian" and "little-endian" are from Gulliver's Travels; they originally referred to the parties in a dispute over which end of a boiled egg should be broken first.
ANSI (American National Standards Institute)
the main industrial standardization organization in the United States. There are official ANSI standards in almost all industries, and many of them concern computers. In computer programming, ANSI most often refers to one of the following:
- ANSI standard versions of C, FORTRAN, COBOL, or other programming languages. Typically, a particular manufacturer's version of a language will include all of the features defined in the ANSI standard, plus additional features devised by the manufacturer. To be easily portable from one computer to another, a program should not use any features that are not in the ANSI standard. The programmer can then produce executable versions of it for different types of computers by compiling the same program with different compilers.
- ANSI standard escape sequences for controlling the screen of a computer terminal or microcomputer. An escape sequence is a series of character codes that, when sent to the screen, causes the screen to do something other than simply display the characters to which the codes correspond. The ANSI escape sequences all begin with the ASCII Escape character (code 27). See ANSI screen control.
- The ANSI extended character set used in Microsoft Windows, and shown in Table 2. It includes all the ASCII characters plus many others. See ASCII; IBM PC; Unicode; Windows (Microsoft).
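The escape sequences in the second item above are ordinary strings that start with character code 27. The following minimal Python sketch builds a few widely supported ones (bold, red text, reset, clear screen, cursor home); the helper name sgr is hypothetical, and whether the effects appear depends on the terminal interpreting these sequences.

```python
ESC = "\x1b"  # the ASCII Escape character, code 27


def sgr(*params: int) -> str:
    """Build a 'Select Graphic Rendition' sequence: ESC [ params m."""
    return f"{ESC}[{';'.join(str(p) for p in params)}m"


CLEAR_SCREEN = f"{ESC}[2J"  # erase the whole display
CURSOR_HOME = f"{ESC}[H"    # move the cursor to row 1, column 1

print(sgr(1) + "bold text" + sgr(0))  # 1 = bold, 0 = reset all attributes
print(sgr(31) + "red text" + sgr(0))  # 31 = red foreground
```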
To type any ANSI character in Microsoft Windows, hold down the Alt key while typing 0 followed by the character code number on the numeric keypad at the right-hand side of the keyboard. For example, to type é, hold down Alt and type 0233. You may prefer to use the Character Map utility to select characters and copy them to the Clipboard, and then paste them into your application.
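The number typed after Alt is simply the character's code in the ANSI set. The following minimal Python sketch makes that lookup explicit, assuming the Windows "ANSI" set is code page 1252 (Python's cp1252 codec); the function name alt_code_char is hypothetical.

```python
def alt_code_char(code: int, charset: str = "cp1252") -> str:
    """Return the character selected by Alt+0<code>, assuming the Windows-1252 'ANSI' set."""
    if not 0 <= code <= 255:
        raise ValueError("ANSI character codes run from 0 to 255")
    # A few codes (e.g. 129) are unassigned in cp1252; Python raises UnicodeDecodeError for them.
    return bytes([code]).decode(charset)


print(alt_code_char(233))         # é, matching Alt+0233
print("é".encode("cp1252")[0])    # 233, the same code in reverse
```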
ASCII (American Standard Code for Information Interchange)
computer term. A code that converts each character into a binary number; it is used by most microcomputers and information services (online databases) so that different makes of microcomputers can communicate with each other. ASCII is used on most microcomputers, computer terminals, and printers. ASCII codes also include control characters that information services use. Many computer books and some software programs (e.g., Borland International's Sidekick) include a table of ASCII characters. ASCII also allows data files generated by one type of program (e.g., a database management system) to be used in another type of program (e.g., a spreadsheet). For example, data may be downloaded from an information service (e.g., Dow Jones News/Retrieval) in ASCII, loaded into a word processing program, edited, and printed out, or even sent to another computer using a telecommunications program. ASCII is also helpful in electronic mail; with MCI, for example, an accountant can upload an ASCII file as electronic mail to clients.
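Whether a data file can be exchanged as plain ASCII comes down to every byte being a code from 0 to 127, control characters included. The following minimal Python sketch performs that check; the function name is_plain_ascii and the sample record are hypothetical, and Python 3.7 and later also offer the built-in bytes.isascii() for the same test.

```python
def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is an ASCII code (0-127), so any ASCII-reading program can use it."""
    return all(b < 128 for b in data)


# Control characters are ASCII codes too, for example:
CONTROL_NAMES = {9: "TAB", 10: "LF", 13: "CR", 27: "ESC"}

record = b"Smith,1200.50\r\n"  # hypothetical data line ending in CR LF
print(is_plain_ascii(record))                              # True
print([CONTROL_NAMES.get(b, chr(b)) for b in record[-2:]])  # ['CR', 'LF']
```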
Copyright © 2006, 2003, 2000, 1998, 1996, 1995, 1992, 1989, 1986 by Barron's Educational Series, Inc. Reprinted by arrangement with Publisher.