THE ARABIC LANGUAGE ON THE INTERNET


For the Internet to be truly international, it must support the diverse languages of the world. Arabic, a language spoken by millions of people worldwide, is increasingly used on the Internet despite the obstacles it faces. A major obstacle to the Arabization of the Internet is the lack of standards, particularly for character sets. The Internet is a heterogeneous environment composed of many different configurations of hardware and software (transport and network equipment). Standards are the means by which the different parties on the Internet agree on how to format and exchange information.
Other obstacles were uncovered by a survey, conducted at the beginning of 1997, of the problems perceived by Arabic-speaking Internet and intranet users [1]. Users ranked the obstacles to greater Internet usage in the Arab world in the following order: weak telecom infrastructure, lack of Arabic content on the Internet, and lack of Arabic Internet access programs for the Web and for e-mail. (Most of the survey respondents were in Saudi Arabia.)
The support required for Arabic on the Internet falls into four categories: content, transport, client processing, and server processing. A certain level of support is required in each category, and not all of it is unique to Arabic. In fact, internationalization (i18n) is an active field of research in Internet technology. Arabic content (textual content, to be specific) involves both representing the data itself, using character sets, and formatting it. Formatting is specified by Internet standards such as HTML for World Wide Web pages and RFC 822 and MIME for e-mail messages. The transport protocol is HTTP (HyperText Transfer Protocol) for the Web and SMTP (Simple Mail Transfer Protocol) for e-mail. Client processing includes generating, displaying, and interacting with Arabic text, while server processing includes storing, processing, searching, and providing Arabic content.
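To make the content and transport categories concrete, the following is a minimal sketch in Python, which is an illustrative choice and not something the article prescribes. It assumes three encodings commonly used for Arabic (ISO-8859-6, Windows-1256, and UTF-8) and uses Python's standard email package to label a message body with its character set through MIME. The function names and the example.com addresses are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Three encodings that can represent Arabic letters (Python codec aliases).
CHARSETS = ("iso-8859-6", "windows-1256", "utf-8")


def encode_variants(text):
    """Return the byte sequence the same text produces under each charset."""
    return {name: text.encode(name) for name in CHARSETS}


def build_arabic_mail(body, charset="utf-8"):
    """Build a MIME message whose body is labeled with its charset.

    Base64 keeps the body within 7-bit ASCII on the wire.
    The addresses are placeholders, not taken from the article.
    """
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Arabic text sample"
    msg.set_content(body, charset=charset, cte="base64")
    return msg.as_bytes()


if __name__ == "__main__":
    greeting = "مرحبا"

    # The same five letters become different bytes under each charset.
    for name, data in encode_variants(greeting).items():
        print(f"{name:>13}: {data.hex(' ')}")

    # The receiver relies on the Content-Type label to decode the body.
    raw = build_arabic_mail(greeting, charset="windows-1256")
    parsed = message_from_bytes(raw, policy=policy.default)
    print(parsed["Content-Type"])
    print(parsed.get_content().strip())
```

Because each character set maps the same letters to different bytes, a receiver that is not told which one was used can only guess. The MIME charset label removes that guesswork, and the base64 transfer encoding lets the non-ASCII body pass through mail transport as plain ASCII.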
Most of these issues are addressed below. One of the major problems facing the use of Arabic is the plurality of character sets. Transporting Arabic text over the Internet is also problematic because its character sets are not ASCII.
Chief among the client processing issues is the display of Arabic text. Its display features set Arabic apart from other languages in several ways. Arabic text is cursive: most Arabic characters connect to one another when written in the same word, and the shape of each character depends on its position in the word. The directionality of Arabic text is also peculiar: while Arabic text is written right-to-left, Arabic numbers are written left-to-right. This feature, together with the frequent everyday need to combine Arabic and Latin text on the same line, makes handling of bidirectional text necessary. These features affect the display of Arabic text in mail programs and Web browsers.
One of the most important server processing issues for Arabic text is searching and indexing. These operations are more involved in Arabic than in many other languages.
The representation and transport problems are external to Arabic, meaning that they are not related to the features of Arabic text. Rather, they are byproducts of Internet protocols that originated in the Western world, which uses Latin characters. These problems are shared with many other languages of the world. The display problems, by contrast, do stem from the features of Arabic, but they do not affect the transport of Arabic text.
Solutions have started to emerge, with browsers and mail programs building on new Internet standards such as MIME. The trend toward Unicode also helps the exchange of Arabic on the Internet. Other interim solutions are frequently used, such as encoding text as graphics, or relying on ad hoc rules in Web servers to guess the Arabic capabilities of browsers and send information accordingly.
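The display features described above, position-dependent shapes and mixed directionality, can be observed in the Unicode character database. The sketch below uses Python's standard unicodedata module as one possible illustration; it is not the method of any particular browser or mail program. The function names are hypothetical, and the choice of the Arabic Presentation Forms-B block (U+FE70 to U+FEFF) as the place to look for contextual shapes is an assumption made for this example.

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def positional_forms(letter):
    """Map position name -> presentation-form character for one Arabic letter.

    Scans the Arabic Presentation Forms-B block (U+FE70..U+FEFF) for
    compatibility characters that decompose to the given base letter.
    """
    target = f"{ord(letter):04X}"
    forms = {}
    for code_point in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(code_point)).split()
        # A single-letter form looks like "<initial> 0628".
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(code_point)
    return forms


if __name__ == "__main__":
    beh = "\u0628"  # ARABIC LETTER BEH

    # One letter, several shapes depending on position in the word.
    for position, shape in positional_forms(beh).items():
        print(f"{position:>8}: U+{ord(shape):04X} {unicodedata.name(shape)}")

    # Mixed Latin text, Arabic letters, and both kinds of digits on one line.
    sample = "Web \u0628\u0627\u0628 12 \u0661\u0662"
    for ch, bidi in bidi_classes(sample):
        print(f"U+{ord(ch):04X} {bidi}")
```

A display engine has to pick the right shape for each stored letter and reorder runs of differing direction before drawing a line. The stored text keeps one code point per letter in logical order, which is consistent with the point that display issues do not affect transport.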
The trend among Internet standards setters toward internationalizing Internet protocols is also very encouraging.