ASCII (American Standard Code for Information Interchange) is the basis of character sets used in almost all present-day computers. US-ASCII uses only the lower seven bits (code points 0 to 127) of each byte to convey control codes, the space, digits, most basic punctuation, and the unaccented letters a-z and A-Z. A system is described as being "8-bit clean" if it does not mangle text containing byte values above 127, as some older systems did.
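A minimal Python sketch of that 7-bit property (the helper name is illustrative, not a standard API):

```python
# Sketch: data is 7-bit US-ASCII if every byte falls in the range 0-127.
# An "8-bit clean" channel also passes bytes 128-255 through unmangled.

def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits within US-ASCII (values 0-127)."""
    return all(b < 128 for b in data)

print(is_seven_bit(b"plain ASCII text"))            # True
print(is_seven_bit("naïve".encode("latin-1")))      # False: 0xEF (239) is above 127
```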

More modern coded character sets (e.g., Latin-1, Unicode) define extensions to ASCII that use values above 127 to convey special Latin characters (such as accented letters or the German eszett ß), characters from non-Latin writing systems (e.g., Cyrillic or Han characters), and such desirable glyphs as distinct open and close quotation marks. ASCII largely displaced earlier encodings such as IBM's 8-bit EBCDIC and the 5-bit Baudot code, which were each broken in their own way.[1]
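For instance, a character such as ß falls outside US-ASCII, and the extended sets encode it differently; a short Python sketch:

```python
# Sketch: the eszett is not representable in US-ASCII; Latin-1 stores it
# as a single byte above 127, while UTF-8 (Unicode) uses two bytes.
# Plain ASCII characters are encoded identically in all three.

ch = "ß"
print(ch.encode("latin-1"))   # b'\xdf'      -> one byte, value 223
print(ch.encode("utf-8"))     # b'\xc3\x9f'  -> two bytes

print("A".encode("ascii") == "A".encode("latin-1") == "A".encode("utf-8"))  # True
```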

Background

Computers are pickier about spelling than humans. Thus, hackers need to be very precise when talking about characters, and have developed a considerable amount of verbal shorthand for them. Every character has one or more names - some formal (ITU-T), some concise (Usenet), some silly (INTERCAL).

Some other common usages cause odd overlaps. The "#", "$", ">", and "&" characters, for example, were all pronounced "hex" in different communities because various assemblers used them as prefix tags for hexadecimal constants (in particular, "#" in many assembly-programming cultures, "$" in the 6502 world, ">" at Texas Instruments, and "&" on the BBC Micro, Acorn Archimedes, Sinclair Research, and some Zilog Z80 machines).
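A hypothetical Python helper (the function and its name are illustrative, not taken from any real assembler) makes the overlap concrete: each prefix tags the same hexadecimal constant, here decimal 65, the ASCII code for "A":

```python
# Sketch: the same hex constant written with the prefix conventions
# mentioned above ("#", "$", ">", "&"), plus C-style "0x" for comparison.

PREFIXES = ("0x", "#", "$", ">", "&")

def parse_hex(token: str) -> int:
    """Strip a recognised hex prefix and parse the remainder as base 16."""
    for prefix in PREFIXES:
        if token.startswith(prefix):
            return int(token[len(prefix):], 16)
    raise ValueError(f"no recognised hex prefix in {token!r}")

for token in ("#41", "$41", ">41", "&41", "0x41"):
    print(token, "->", parse_hex(token))   # each prints 65, ASCII 'A'
```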

Extensions

The inability of US-ASCII to correctly represent nearly any language other than English became an obvious and intolerable misfeature as computer use outside the USA and UK became the rule rather than the exception. In response, national extensions to US-ASCII were developed, such as Latin-1.

Hardware and software from the USA continued for some time to embody the assumption that US-ASCII is the universal character set and that words of text consist entirely of byte values 65-90 and 97-122 (A-Z and a-z); this is a major irritant to people who want to use a character set suited to their own languages. Perversely, though, efforts to solve this problem by proliferating sets of national characters produced an evolutionary pressure (especially in protocol design, e.g., the URL standard) to stick to US-ASCII as a subset common to all those in use, and therefore to stick to English as the language encodable with the common subset of all the ASCII dialects. This basic problem with having a multiplicity of national character sets ended up being a prime justification for Unicode, which was designed, ostensibly, to become the universal character set that anyone would need.[1]
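The URL case can be made concrete with a short Python sketch: characters outside the US-ASCII subset only travel in a URL after being percent-encoded into ASCII bytes (via UTF-8 here, as is modern practice):

```python
# Sketch: percent-encoding turns non-ASCII characters into pure ASCII,
# illustrating why the URL standard pushed toward US-ASCII as the
# common subset of all character sets in use.

from urllib.parse import quote, unquote

path = "/straße/naïve"
encoded = quote(path)             # -> "/stra%C3%9Fe/na%C3%AFve"
print(encoded)
print(encoded.isascii())          # True: only ASCII bytes appear in the URL
print(unquote(encoded) == path)   # True: the original text round-trips
```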

References

See also

External links

This page uses GFDL-licensed content from the Free On-line Dictionary of Computing.