Guidelines for using XML for Electronic Data Interchange

Version 0.05

25th January 1998

Editor: Martin Bryan, The SGML Centre

Contributors: Members of the XML/EDI working group, including Benoît Marchal, Norbert H Mikula, Bruce Peat and David RR Webber.

XML/EDI Group Home Page URL: http://www.xmledi.org

Copyright © 1998. XML/EDI Group. All rights reserved, no part of this document may be commercially reproduced in part or in whole without consent and prior approval.

Changes made to this version

Addition of figures used in presentation to W3C in January 1998.

Brief explanation of differences in business processes between client-centric electronic business transactions and server-centric web retailing.

Rules templates now linked to messages via a processing instruction rather than XLL simple link (for conformance to the way in which style sheets are linked to the message).

The examples in Annex A have been updated to show an XML book order that can be displayed using Microsoft's MSXSL beta add-on to Internet Explorer 4.0. On-line link to demonstration software now provided.


Contents

  1. Purpose & Goal of the XML/EDI Guidelines
  2. Definitions for XML/EDI
  3. Scope of XML/EDI
  4. Base Technologies of XML/EDI
  5. XML/EDI Components
  6. The Implementation Process

1. Purpose & Goal of the XML/EDI Guidelines

Put simply, the goal of XML/EDI is to deliver unambiguous and durable business transactions via electronic means.

Associated with this is a goal to establish a standard for commercial electronic data interchange that is open and accessible to all, and which delivers a broad spectrum of capabilities suitable to meet the full breadth of business needs.

To achieve this requires the use of a methodology that is not only extensible enough to meet future requirements but also adaptable enough to incorporate new technologies and requirements as they emerge. To ensure broad adoption the technology selected needs to be widely and freely available. The Extensible Markup Language (XML) developed by the World Wide Web Consortium (W3C) provides such a freely available, widely transportable, methodology for well-controlled data interchange.

XML was designed principally for the exchange of information in the form of computer displayable "documents". Not all commercial data is interchanged in a displayable format. In particular data designed for electronic data interchange typically needs to be processed before it can be displayed. For this to be possible the data must be mapped, using some form of template, to a set of processing rules. These XML/EDI guidelines provide a standardized way in which such rules templates can be added to interchanged data.

These XML/EDI guidelines begin by formally defining the terms used in the text. This is followed by an impact statement that makes predictions from various viewpoints. The guidelines then give a background on the tools and standards on which XML/EDI is built.

Note: These guidelines form the basis for development work on XML/EDI. They form a precursor to a formal "Specification of an EDI Application for XML". As a document designed to be a lightning rod for ideas, this working document has been, and will continue to be, released in draft form. Comments on this draft should be sent to the XML/EDI working group at xml-edi@riv.be.

2. Definitions for XML/EDI

Electronic commerce has been defined in the European Workshop on Open System's Technical Guide on Electronic Commerce (EWOS ETG 066) as "Electronic exchange of data to support business transactions, i.e. the exchange of value through the delivery of a product from a seller to a buyer". As such it encompasses much more than what has been possible using traditional methods of Electronic Data Interchange (EDI) such as EDIFACT. Electronic commerce is defined by EWOS as covering activities such as marketing, contract exchange, logistics support, settlement and interaction with administrative bodies (e.g. tax and custom data interchange). Electronic commerce covers all industrial and service operations, including services such as insurance, healthcare, travel and interactive home shopping.

Many people use the term EDI to refer to the set of messages developed for business-to-business communication as part of the United Nations Standard Messages Directory for Electronic Data Interchange for Administration, Commerce and Transport (EDIFACT). EDIFACT messages are transmitted in compressed form, using predefined field identifiers, which must occur in a predefined sequence. While EDI is, strictly speaking, wider in scope than EDIFACT, for the purposes of these guidelines EDI will be used in this restricted sense when not otherwise qualified.

The basic unit of information in an EDI message is the data element. For an EDI invoice, each item being invoiced would be represented by a data element. Data elements can be grouped into compound data elements, and data elements and/or compound data elements may be grouped into data segments. Data segments can be grouped into loops; and loops and/or data segments form business documents.

The EDIFACT standards define whether data segments are mandatory, optional, or conditional, and indicate whether, how many times, and in what order a particular data segment can be repeated. For each EDI message, a field definition table exists. For each data segment, the field definition table includes a key field identifier string to indicate the data elements to be included in the data segment, the sequence of the elements, whether each element is mandatory, optional, or conditional, and the form of each element in terms of the number of characters and whether the characters are numeric or alphabetic. Similarly, field definition tables include data element identifier strings to describe individual data elements. Element identifier strings define an element's name, a reference designator, a data dictionary reference number specifying the location in a data dictionary where information on the data element can be found, a requirement designator (either mandatory, optional, or conditional), a type (such as numeric, decimal, or alphanumeric), and a length (minimum and maximum number of characters). A data element dictionary gives the content and meaning for each data element.
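As a rough illustration of the element identifier strings described above, the following Python sketch models one row of a field definition table and checks a value against it. The class name, the field names, the single-letter codes and the sample definition are all assumptions made for this example; none of them is taken from an EDIFACT directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementDefinition:
    """One row of a hypothetical field definition table."""
    name: str
    reference_designator: str
    dictionary_number: int
    requirement: str   # "M" mandatory, "O" optional, "C" conditional
    data_type: str     # "n" numeric, "a" alphabetic, "an" alphanumeric
    min_length: int
    max_length: int

    def check(self, value):
        """Return a list of problems found in value (empty if it conforms)."""
        problems = []
        if value is None or value == "":
            if self.requirement == "M":
                problems.append(f"{self.name}: mandatory element is missing")
            return problems
        if not self.min_length <= len(value) <= self.max_length:
            problems.append(
                f"{self.name}: length {len(value)} outside "
                f"{self.min_length}..{self.max_length}"
            )
        if self.data_type == "n" and not value.isdigit():
            problems.append(f"{self.name}: expected numeric characters")
        if self.data_type == "a" and not value.isalpha():
            problems.append(f"{self.name}: expected alphabetic characters")
        return problems


# Illustrative definition only; the designator and dictionary number are invented.
ORDER_NUMBER = ElementDefinition("order-number", "X01", 9999, "M", "n", 1, 6)
```

Calling ORDER_NUMBER.check("123456") returns an empty list, while an empty value is reported as a missing mandatory element. Conditional elements are treated like optional ones here, because evaluating a condition needs the rest of the segment.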

Originally, EDI translation software was developed to support a variety of private system formats. Most often, the sender and receiver were required to contract in advance for a tailored software program that would be dedicated to mapping between their two types of datasets. Each time a new sender or receiver was added to the client list, a new translation program would be needed by the new party to format their data to conform to the standards in use by the participants. Of course, this becomes expensive. Such static systems do not easily allow synchronization of business transactions in distributed business processes that involve global rules, but with participants and actions that are not predetermined. To solve these issues it is desirable to develop automated tools and techniques that are easy to use and allow decomposition of transactions into actions to be performed locally and mapping of local actions onto efficient protocol exchanges.

[Figure: The Electronic Enterprise]

The concept of the Electronic Enterprise requires a transition away from paper form based EDI. Key concepts that are required are the encapsulation of agreed sets of business rules (in EDI parlance the Implementation Guidelines) and also mechanisms to handle state and flow control (such as those provided by hyperlink anchors in HTML files). Also message sets must be able to handle partial information, where the complete information is not yet available, or simply is not required for the particular business process. This allows different parts of an enterprise to selectively contribute only the information that is germane to their business functions.

A fundamental difference between the proposals in these XML/EDI Guidelines and those found in other proposals for XML-based web retailing, such as those covered in the Open Trading Protocol (OTP), is the client-centric nature of the business processes, as contrasted with the server-centric nature of electronic retailing. To distinguish these two terms, we use the term "Electronic Business" to refer to the processes of fulfilling customer requirements through the application of negotiated business processes leading to the supply of manufactured goods to retailers and service providers, and "Web Commerce" to describe the process of selling manufactured goods to consumers.

Electronic business is client-centric in that it starts with a specification of a client's requirements, rather than a statement of what the supplier has to offer. The specification of requirements gets sent to a number of potential suppliers, who are asked to tender for the business by a predefined date/time. The purchaser is, as a result of this process, provided with more than one choice, and must determine which quotation to accept. This may require a period of contract negotiation to ensure that adequate terms and conditions, including delivery criteria, are met. This may require a looping of the processes, with a need to cross-refer between successive documents.

Once the purchaser has selected a supplier the business processes involved are very similar to those involved in web commerce, but there are subtle differences. For example, electronic payment before delivery is unlikely to be required for electronic business transactions. Instead of being an integral part of the negotiation phase, with payment being made at the time the order is placed, payment in the electronic business scenario is a separate process that occurs immediately after delivery. This introduces concepts such as statements, which do not occur in web commerce scenarios.

The standards involved in XML/EDI

XML is the Extensible Markup Language subset of ISO's Standard Generalized Markup Language (SGML) developed by the World Wide Web Consortium (W3C) SGML on the Web working party during the latter half of 1996 and early 1997. The formal recommendation was submitted for approval by W3C members on 8th December 1997.

On 10th September 1997 a proposal for a new form of XML Style Language (XSL), which incorporates the ECMAScript standardized variant of JavaScript, was published by a consortium led by Microsoft, ArborText and the Inso Corporation. This version of the XML/EDI specification uses the power provided by this new advanced language combination to show how control of XML/EDI document processes can be achieved in a distributed manner.

In October 1997 a specification for a formal Document Object Model (DOM) for XML documents was published by W3C. This model provides a standardized API for XML-based tools.

Combining XML and EDI to develop XML/EDI suggests that the main method of capturing and coding EDI information will be through XML-coded electronic forms. At present the form handling characteristics of XML are yet to be fully agreed (agreement is expected during 1998). To allow interaction with existing systems the XML/EDI Guidelines show how EDIFACT messages can be generated from XML/EDI forms, and vice versa.
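The two-way conversion mentioned above can be sketched in Python as follows. This is a minimal illustration, not the mapping defined by these guidelines: the XML element names (segment, element, component) are invented for the example, the default EDIFACT separators are assumed ("+" between data elements, ":" between components, "'" as segment terminator), and the "?" release character is ignored.

```python
import xml.etree.ElementTree as ET


def segment_to_xml(segment):
    """Convert one EDIFACT-style segment into a (hypothetical) XML element."""
    tag, *elements = segment.rstrip("'").split("+")
    node = ET.Element("segment", {"tag": tag})
    for element in elements:
        child = ET.SubElement(node, "element")
        for component in element.split(":"):
            ET.SubElement(child, "component").text = component
    return node


def xml_to_segment(node):
    """Inverse of segment_to_xml for elements that function produced."""
    parts = [node.get("tag")]
    for child in node.findall("element"):
        parts.append(":".join(c.text or "" for c in child.findall("component")))
    return "+".join(parts) + "'"


# Illustrative segment only.
quantity = segment_to_xml("QTY+21:5'")
```

Passing the result back through xml_to_segment reproduces the original segment string, which is the property a real converter would need to preserve for every message type it supports.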

XML/EDI isn't creating a new standard. XML/EDI is defining how companies can use current standards to solve their business problems.

3. Scope of XML/EDI

Details of the scope of XML/EDI, and the impact it is expected to have on business communities, are covered in Introducing XML/EDI.... To help readers of this document to appreciate the differences in practice between traditional EDIFACT-based web transactions and XML/EDI this section discusses some of the differences between traditional business-to-business electronic data interchange systems and the new breed of interactive electronic business tools being provided through the Internet.

Business-to-business Electronic Data Interchange

Electronic Data Interchange (EDI) has been used for business-to-business communication for almost a quarter of a century. Initial efforts involved inter-company agreements on how to exchange commercial data, initially as information stored on tape and later as messages sent over dedicated data lines. To avoid having to use different protocols to move data between different companies, various industry groups identified sets of data that could form the basis of individual agreements. The industry groups also sought to agree the format in which fields in such data sets were interchanged so that a company only needed to develop one methodology for decoding information received without recourse to human intervention.

The Achilles heel of this approach has always been twofold. Firstly, companies require flexibility in, and wish to deviate from, doctrinaire standards that do not fully meet their business needs. Secondly, because the standards are pre-ordained there is no mechanism provided to transfer processing rules and associated information. It is assumed that the data meets the defined constraints and if not, has been duly modified to conform. This means that companies must conduct exacting analysis to determine precisely how they are going to move their business data to and from the predefined EDI formats. The cost of these constraints has been borne as excessively long and complex implementation cycles for traditional EDI systems.

The world has changed from thirty years ago, and now requires more dynamic and vibrant services that match the organized yet ad hoc nature presented by both modern business practice, and particularly its manifestations on the Internet. The Internet is re-writing the rules on how people interact, buy and sell, and exchange goods and services. In particular the Internet is showing us that EDI is not only relevant for business-to-business communications. The same concepts are also relevant for all consumer-to-supplier relationships, whether the consumer is an end-user, a manufacturer, a service organization such as a hospital or a hotel, a governmental organization or a virtual organization.

[Figure: Electronic Commerce at end of 1997]

Electronic business transactions

With the arrival of the Internet in the last decade of the 20th century the pattern of electronic commerce has dramatically changed. In particular, the Internet has introduced many new ways of trading, allowing interaction between groups that previously could not economically afford to trade with one another.

Whereas previously commercial data interchange involved mainly the movement of data fields from one computer to another, without human intervention, the new model for web-based commerce introduced by the Internet is typically dependent on human interaction for the transaction to take place. The new model is based principally on the use of interactive selection of a set of options, and on the completion of "electronic forms", to specify user requirements.

As this new model develops there has been a fundamental shift in how data used for commerce should be processed. The original create-->transmit-->receive-->process cycle of information processing, using individual programs, is beginning to be replaced by the concept of active objects which have inherent processes associated with them, based on the class of information they contain. Today an invoice may no longer contain a copy of the information stored in the database it was generated from: instead it contains a pointer that says where it expects to get the data from, and this data will be fetched from its managed source each time the invoice is processed.

Such interactive programs require us to review the underlying philosophy of electronic commerce. What are the characteristics of a system designed for "electronic business transactions" in an international marketplace?

To be truly interactive you need to be able to:

  1. Understand the business concepts represented in the interchanged data.
  2. Apply business-specific rules to the interchanged data to identify what class(es) of data it contains and formulate appropriate responses.

To do this you need to be able to:

Because these interactions can be complex, and potentially require specialized knowledge, the rule templates can be supplemented by XML/EDI data manipulation agents (DataBots) to ensure that users can express their requirements in high-level, natural language, terms. DataBots automatically create appropriate rule templates and XML syntax to match user requirements and broker the entire interchange.

When DataBots are being used XML/EDI is identified as being robot generated by adding an R to its name to become XML/EDI-R.

At this point in time the ECMAScript standardized variant of JavaScript provides the vehicle that permits the DataBots to be deployed and received along with XML/EDI messages.

[Figure: Future based on XML/EDI]

4. Base Technologies of XML/EDI

XML/EDI is a synthesis of many concepts. XML/EDI:

XML/EDI can be seen as the fusion of five existing technologies:

  1. Web data interchange based on the new XML specification
  2. Existing EDI business methods and message structures
  3. Knowledge templates that provide process control logic
  4. Data manipulation agents (DataBots) that perform specialist functions
  5. Data repositories that allow relationships to be maintained.

[Figure: The Five Technologies of XML/EDI]

Why use XML?

XML will be a native language for the next generation of most of the popular WWW browsers. XML/EDI seeks to leverage the work and support (technically and financially) which XML is receiving. With traditional EDI, the infrastructure was built from the ground up, without being able to share resources with other programs. This paradigm is no longer appropriate in today's world of shared software development. By adopting XML/EDI, the EDI community can share the cost of extension and future development.

In 1986 the International Organization for Standardization (ISO) published an international standard defining a Standard Generalized Markup Language (SGML) that allowed its users to:

SGML has formed the basis of many of the large, multinational, documentation projects that have developed in the decade since its publication. It also formed the basis for the formalization of the HyperText Markup Language (HTML) that led to the formation of the World Wide Web of documentation that has become available on the Internet.

Key to the success of HTML was the development of the concept of Uniform Resource Locators (URLs) that allow users to identify the source of each piece of shared data in a consistent manner. Whilst the original concept has limitations as to the granularity of data access, its universality has greatly improved computer-to-computer communications.

In July 1996 the World Wide Web Consortium (W3C) set up a working group to study how SGML could be simplified to allow for its efficient use over the Internet. The result was the development of an Extensible Markup Language (XML) that combined the expressive power of SGML with the Internet-aware functionality of HTML.

XML provides an ideal methodology for electronic business because:

Integrating XML with EDI

XML can be integrated with existing EDI systems by:

XML can extend existing EDI applications by:

5. XML/EDI Components

The following figure illustrates the main layers of a fully integrated XML/EDI system.

[Figure: The layers of an XML/EDI system]

The XML/EDI specific components are built on top of existing standards for transmitting and processing XML-encoded data. These standards define shared features such as:

XML parsers, document browsers, page markup programs and related software functions are available off-the-shelf today. XML/EDI isn't, therefore, a new standard; it simply provides a framework for using existing standards to tackle existing problems in a new way.

XML/EDI specific components will either manifest themselves as built-in components into existing products, plug-in programs to existing tools or standalone applications. It is anticipated that new applications will be created from the spark of XML/EDI implementation.

Types of applications

The following list of examples of the type of facilities that could be built into an XML/EDI implementation is not comprehensive, but is a starting place for discussion:

Each of these options is explained in more detail in the following subsections.

Lexicon Repositories

A primary component of XML/EDI is its dynamic common language and syntax repository. The various types of repositories include:

XML/EDI Data Manipulation Agents (DataBots)

The central goals behind the development of the concept of DataBots are:

All these goals are realizable using XML/EDI-R.

DataBots and their associated XSL scripts provide facilities that allow XML/EDI systems to:

It should also be noted that the template method that XML/EDI DataBots implement is extremely compact and concise. This means that it is a low-bandwidth, efficient protocol, which is required to meet high volume constraints in batch EDI delivery systems.

Some additional considerations that also need to be taken into account include Process Control and Object Oriented support. Process Control can be easily accommodated through the trend towards the use of the Integrated Computer Aided Manufacturing (ICAM) Definition Language (IDEF) process modelling language or Documented Petri Nets. Developers can either assign XML tokens to IDEF entities, with process control lines then added to the template format, or IDEF can be defined as a notation that can be processed by an XML/EDI-aware browser. Object oriented support can be provided through W3C's Document Object Model (DOM), which provides a CORBA IDL definition for XML objects.

Editor's Note: To do - explain how Documented Petri Nets could be used.

In summary, the optional DataBots component provides the agent that brokers, controls, corrects, directs and ensures that the XML/EDI-R method can progress information transfers correctly.

XML/EDI Business Objects

XML/EDI business objects will be available off-the-shelf, created by developers, with rule sequences devised by users. The usage of these objects can be defined by their sphere of influence. Business objects can be:

Business objects, in most but not all cases, will be invoked by the XML/EDI Data Manipulation Agents. It is anticipated that for efficiency these object manipulation DataBots will be written in Java, or using similarly distributed programming language tools. End-users will be supplied with tools that automatically generate the relevant agents from information provided about the application.

Below are just a few examples of the many possible classes of XML/EDI business objects:

XML/EDItors

Used for the interactive creation and completion of form-based EDI, XML/EDItors are predicted to become the front-end for business applications. XML/EDI editors will reference Lexicon Repositories to prompt users for appropriate data using XML parse trees to request related fields.

XML/EDI extensions for message stores

It is anticipated that message stores will require extensions to provide the types of complex workflow management needed to ensure the correct delivery and processing of XML/EDI messages. For example, a message store should not be able to acknowledge receipt of a message until its contents have been parsed by an XML parser to ensure that the unencrypted data stream still forms a valid message.
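The acknowledgement rule in the example above might be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, decryption is assumed to have happened already, and Python's standard parser checks well-formedness only, so a validating parser would be needed to confirm that the message is valid against its DTD.

```python
import xml.etree.ElementTree as ET


def acknowledge(raw_message):
    """Return True only if the decrypted message parses as XML.

    A message store following the rule above would send its receipt
    only when this function returns True.
    """
    try:
        ET.fromstring(raw_message)
    except ET.ParseError:
        return False
    return True


intact = "<order><order-number>123456</order-number></order>"
damaged = "<order><order-number>123456</order>"
```

Here acknowledge(intact) succeeds, while acknowledge(damaged) fails because the order-number element is never closed, so no receipt would be issued for the damaged stream.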

In time it is anticipated that message stores will mutate to use XML natively. This is not because of XML/EDI directly but because message stores that know how to identify, search for and process objects within multimedia streams or business messages will be required for a wide range of application scenarios.

Search Agents

Based on ad-hoc, learned or profiled information, search engines will recognize XML/EDI specific tagging and be able to reference suitable private and public message stores, using standard WWW interfacing, to extract data intelligently. This will allow for the best combination of free-text and fielded search. Catalogs and buyer agents will be among the first to use XML/EDI technology in this way.

Trading Partner Pages

XML/EDI will use a mix of today's X.500 technology, security certificates, "yellow pages", Email look-up, and verified characteristics of entities. This is a critical component of performing business, much more so when employing electronic means. Subsystems will undoubtedly develop along these lines: they will have to support XML/EDI interfacing of basic CRUD functions (Create, Read, Update, Delete) as a minimum. XML/EDI Data Manipulation Agents shall be able to draw upon these resources to validate transactions.

6. The Implementation Process

Using XML for Electronic Data Interchange

The following stages are involved in using XML for the interchange of commercial EDI messages:

An application does not need to use all of the levels of processing shown in Figure 1 and the above list: it can stop at whichever level in the hierarchy suits it. For example, an application can confine itself to checking incoming and outgoing EDI messages using a document object model that has been formally defined in an XML DTD.

Identifying data sets

Identification of data sets for electronic business transactions will often be the responsibility of industry associations and various standardization bodies such as UN/EDIFACT and EBES (the European Board for EDI standardization).

Whereas existing EDI definitions are primarily concerned with the way in which a set of fields forms a message, the concepts required for XML/EDI are based more on the definition of independent classes of information that can be combined together with other classes of information to form interchangeable messages. As such the concepts are more akin to the idea of a Basic Semantic Repository (BSR) being proposed by ISO, and of the Business Systems Interconnection (BSI) proposal from the University of Melbourne.

There is, however, one basic difference between using XML/EDI for defining data classes and using the BSR or BSI methodologies. In XML/EDI the order and number of subclasses of a data class can be altered by message creators without having to formally register that fact with any centralized organization. For example, if it was necessary for an application to separate building numbers or names from information about the street the building is located within, XML/EDI would allow system developers to define two new subclasses that would be combined to provide the information needed for an existing EDI address component.

One of the advantages that accrues from XML/EDI's ability to subclass fields is that such fields can be developed interactively using information supplied from more than one location. For example, telephone order processing systems in today's world of electronic business transactions often start by asking users for their postcode. This tells the system which region, town and street the user is located in, but not which building they are in. To find this out you need to ask the user for a number or name that uniquely identifies the building within the street identified by the postcode. Using these two related pieces of information it is possible to interactively complete a standardized class of information, an address, that can then be shared by an order, its delivery note, and the invoice required for settlement.
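The postcode example above can be sketched in Python as follows. The directory contents, the function name and the element names are assumptions made for this illustration; a real system would query a postal database rather than a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical postcode directory with a single illustrative entry.
POSTCODE_DIRECTORY = {
    "GL3 2PU": {
        "street": "Oldbury Orchard",
        "town": "Churchdown",
        "region": "Glos.",
    },
}


def complete_address(postcode, building):
    """Combine a postcode lookup with a user-supplied building identifier."""
    located = POSTCODE_DIRECTORY[postcode]
    address = ET.Element("address")
    # The building subclass comes from the user; the rest from the lookup.
    ET.SubElement(address, "building").text = building
    for field in ("street", "town", "region"):
        ET.SubElement(address, field).text = located[field]
    ET.SubElement(address, "postcode").text = postcode
    return address


address = complete_address("GL3 2PU", "29")
```

The resulting address element carries the building and street as separate subclasses, which is the kind of local refinement the preceding paragraphs describe.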

Once information has been captured once, and used to create an instance of the relevant class of data, it should not be necessary to recreate the information each time it is required. All that should be needed is that business processes that need this information reference the point at which the data was originally captured, e.g. the address associated with the order for the goods.
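One way to picture this reuse is a message set in which later documents point back to the address captured with the order. The sketch below is illustrative only: the element names, the id and target attributes and the resolve_address helper are hypothetical and are not defined by these guidelines.

```python
import xml.etree.ElementTree as ET

MESSAGES = """
<transaction>
  <order id="order-123456">
    <address id="addr-1">
      <building>29</building>
      <street>Oldbury Orchard</street>
    </address>
  </order>
  <delivery-note><address-ref target="addr-1"/></delivery-note>
  <invoice><address-ref target="addr-1"/></invoice>
</transaction>
"""


def resolve_address(root, document):
    """Follow the address-ref in document back to the captured address."""
    target = document.find("address-ref").get("target")
    for candidate in root.iter("address"):
        if candidate.get("id") == target:
            return candidate
    raise KeyError(target)


root = ET.fromstring(MESSAGES)
invoice_address = resolve_address(root, root.find("invoice"))
delivery_address = resolve_address(root, root.find("delivery-note"))
```

Both lookups return the same element object, so a correction made to the address held by the order is seen by the delivery note and the invoice without any copying.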

An essential precursor to the design process of an XML/EDI application is a study of how business processes re-utilize stored information. Where suitable business models already exist, these can be represented in XML form. Where there is no existing model, or the existing models do not meet the requirements of the trading partners for some reason, developers should perform a full analysis of the relevant business processes, and seek to identify similarities between these processes and those already formally documented for use by other applications. Knowledge of the source and contents of public repositories of reusable data segments will help to simplify this process. One of the goals of XML/EDI, therefore, is to encourage the setting up of such repositories of knowledge.

To ensure that users can guarantee the long-term maintenance of data set components, repositories of formal XML definitions will need to be created, and unique object identifiers will need to be assigned to each set of components. While initially testing can be done using system identifiers that resolve to Internet Uniform Resource Locators (URLs), in the longer term a mechanism for identifying shared data sets using formally registered SGML public identifiers associated with URLs will need to be developed. A system for resolving public identifiers to obtain copies of the registered definitions will also be required.

Developing DTDs

Messages that pass between systems will typically conform to a previously agreed XML document type definition (DTD) that formally describes, in terms interpretable by both humans and computers, an internationally accepted message type.

Note: The structure of XML DTDs and document instances is formally defined in Extensible Markup Language (XML). A brief introduction to the components of XML can be found in An Introduction to the Extensible Markup Language. More complete information on the structure of SGML DTDs, including those that implement the Web SGML extensions, can be found in Web SGML and HTML 4.0 Explained, which contains examples of the use of each of the constructs used in SGML and XML, and explains how these facilities are used within HTML.

Warning: The following text presumes some knowledge of SGML and/or XML.

XML DTDs can be developed by:

Declarations that form a standardized XML DTD will typically be stored in separate files, which can be referenced, as an XML external subset, by those wishing to use it through the Internet Uniform Resource Locator that its originator has assigned to a publicly available copy of the data. Alternatively, if public access is to be restricted, the document type definition can be stored as the internal subset within the document type definition sent with the message.

Where the document type definition is based on classes of information shared by more than one message, each class of information can be defined in a separate file, known in XML as an external entity, these files being referenced in a suitable sequence from within the external or internal subset of the XML DTD.

For example, an XML DTD could have the form:









%address;

%items;

This DTD fragment defines two external and one internal parameter entity, four locally defined elements and contains two parameter entity references (%address; and %items;) that call in the contents of the external entities at appropriate points in the definition. Both of the parameter entity references are preceded by explanatory comments.

Note that the source of each class of information is identified not in the call to the class itself (%address;) but within a formal definition of the data storage entities required to process the class definition references (e.g. the first two lines of the DTD). This technique allows files to be moved without having to change the main definition of the DTD.

Typically the entity definitions will be stored outside the DTD, which will contain a reference to the URL of the point at which the latest details of library file locations can be found. For example:


%library;





%address;

%items;

where %library; references a file containing the entity definitions given at the start of the previous example.
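As a hedged sketch of this arrangement, the DTD below fixes only the location of a library index file. All element names and URLs are hypothetical; the file locations.ent is assumed to contain the declarations of the %address; and %items; entities.

```xml
<!-- order.dtd: only the location of the library index is fixed here -->
<!ENTITY % library SYSTEM "http://www.example.org/library/locations.ent">
%library;

<!ENTITY % text    "(#PCDATA)">
<!ELEMENT order    (order-no, buyer, supplier?, item+)>
<!ELEMENT order-no %text;>
<!ELEMENT buyer    (address)>
<!ELEMENT supplier (address)>

<!-- Shared definition of the address class, located via %library; -->
%address;

<!-- Shared definition of the item class, located via %library; -->
%items;
```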

XML provides (experimental) facilities for ensuring that data modules taken from libraries do not introduce name clashes in their elements. The names of elements within each module can be qualified by a module (namespace) identifier. Each namespace identifier can be associated with a URL that uniquely identifies where the module is formally defined. For example, the contents of the library file referenced above could be defined as:
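One way such a library file could be written is sketched below, assuming the prefixes addr and item and the element names shown (only item:database-key and the identifier field are named elsewhere in these guidelines). The mechanism that binds each prefix to a URL was still experimental and is deliberately left out; here the prefix is simply part of each element name, which XML names permit.

```xml
<!-- locations.ent: each class is held in a parameter entity -->
<!ENTITY % address "
  <!ELEMENT addr:address  (addr:name, addr:street+, addr:town, addr:county?, addr:postcode)>
  <!ELEMENT addr:name     (#PCDATA)>
  <!ELEMENT addr:street   (#PCDATA)>
  <!ELEMENT addr:town     (#PCDATA)>
  <!ELEMENT addr:county   (#PCDATA)>
  <!ELEMENT addr:postcode (#PCDATA)>
">

<!ENTITY % items "
  <!ELEMENT item:item       (item:identifier, item:name, item:quantity, item:price)>
  <!ELEMENT item:identifier (#PCDATA)>
  <!ELEMENT item:name       (#PCDATA)>
  <!ELEMENT item:quantity   (#PCDATA)>
  <!ELEMENT item:price      (#PCDATA)>
">
```

Because every name carries its module prefix, an addr:name and an item:name can coexist in one message without clashing.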













">




">

Application-specific extensions

XML permits entities and attributes that are defined in the external subset to be redefined in the internal subset. This facility allows XML/EDI users to develop locally significant subclasses. It can also be used to create subsets of messages by removing unused fields from the data model.

For example, the internal subset of a DTD based on the above standardized DTD could contain the following local redefinition for the %items; parameter entity:







">

In this case the optional item:database-key field could contain a direct pointer to the database entry from which the EAN and associated product name were obtained. A DataBot could use this key to process the item information directly, instead of running a slower database query based on the EAN normally provided by the identifier field.
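A local redefinition of this kind could be sketched as follows. Only item:database-key and the identifier field are named in the text; the other element names are assumptions.

```xml
<!-- Placed in the internal subset, so it is read before, and takes
     priority over, the declaration of %items; in the external subset -->
<!ENTITY % items "
  <!ELEMENT item:item         (item:database-key?, item:identifier, item:name, item:quantity, item:price)>
  <!ELEMENT item:database-key (#PCDATA)>
  <!ELEMENT item:identifier   (#PCDATA)>
  <!ELEMENT item:name         (#PCDATA)>
  <!ELEMENT item:quantity     (#PCDATA)>
  <!ELEMENT item:price        (#PCDATA)>
">
```

The same technique can produce a subset of a message: a local model that omits an optional field removes it from the data model for that user.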

Creating message instances

An XML/EDI electronic business message consists of a pointer to the document type definition, any definitions required in the internal subset of the DTD, and entries for each of the fields required for the message. For example, the following document type declaration could be used to extend the external DTD shown in the first of the examples above, which is identified by its Internet Uniform Resource Locator:







">
]>

123456

The SGML Centre 29 Oldbury Orchard Churchdown Glos. GL3 2PU
key151235 15356378797 Special Offer 16 12

Note that, because of the prioritization SGML gives to local definitions, the definition for the %items; parameter entity provided in the local subset will replace the reference to the external source for the same entity provided as part of the file referenced using the external subset.
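A complete message built this way might look like the sketch below. The data values are those of the example above; the DTD URL, all element names, and the reading of the last two item values as a quantity and a price are assumptions. Prefixes are treated as part of the element names, so a namespace-aware parser would additionally need them to be bound.

```xml
<?xml version="1.0"?>
<!DOCTYPE order SYSTEM "http://www.example.org/dtds/order.dtd" [
<!-- Local definition: replaces the %items; entity of the external subset -->
<!ENTITY % items "
  <!ELEMENT item:item         (item:database-key?, item:identifier, item:name, item:quantity, item:price)>
  <!ELEMENT item:database-key (#PCDATA)>
  <!ELEMENT item:identifier   (#PCDATA)>
  <!ELEMENT item:name         (#PCDATA)>
  <!ELEMENT item:quantity     (#PCDATA)>
  <!ELEMENT item:price        (#PCDATA)>
">
]>
<order>
  <order-no>123456</order-no>
  <buyer>
    <addr:address>
      <addr:name>The SGML Centre</addr:name>
      <addr:street>29 Oldbury Orchard</addr:street>
      <addr:town>Churchdown</addr:town>
      <addr:county>Glos.</addr:county>
      <addr:postcode>GL3 2PU</addr:postcode>
    </addr:address>
  </buyer>
  <item:item>
    <item:database-key>key151235</item:database-key>
    <item:identifier>15356378797</item:identifier>
    <item:name>Special Offer</item:name>
    <item:quantity>16</item:quantity>
    <item:price>12</item:price>
  </item:item>
</order>
```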

Validating messages

XML/EDI messages can be validated by a validating XML document instance processor (known as an XML parser) to ensure they contain all required elements from the specified data set, and that the fields are in the required sequence. When the document is found to be valid the parser can generate a document tree that conforms to the rules laid down in the Document Object Model (DOM) specification that provides a standardized API between XML parsers and browsers and other forms of program.

XML elements can be assigned attributes that point to processors that can undertake relevant data validity checks. This can be done either by associating notation processors with an element, or by associating an ECMAScript specification with the element as part of an XSL "action" associated with the specific element types used in specific contexts, or with particular attribute values.

Where the XML Style Language (XSL) is not being used (e.g. because the browser does not yet support it) the basic XML language allows user-defined notation processors to be used to validate the contents of specific XML elements. This is done by adding definitions of the following form to the external or internal subset of the DTD:


...

The predefined check attribute of the EAN element will cause the contents of the element to be passed to the program identified by the declaration for the notation assigned the local name EAN-validator which is stored at the location indicated by the URL given in the notation declaration. This processor would typically pass back a message indicating whether or not the EAN is valid within the context of the relevant message.
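Declarations of this kind could be sketched as follows. The names EAN, check and EAN-validator come from the description above; the URL is a placeholder.

```xml
<!-- The notation names the external program that checks EAN values -->
<!NOTATION EAN-validator SYSTEM "http://www.example.org/validators/ean-check">

<!ELEMENT EAN (#PCDATA)>
<!-- The fixed attribute ties every EAN element to that notation -->
<!ATTLIST EAN
  check NOTATION (EAN-validator) #FIXED "EAN-validator">
```

An XML parser only reports the notation name and its identifier; it is the receiving application that decides to pass the element content to the program found at that location.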

XSL provides an alternative, and more generally applicable method that allows ECMAScript to be used to validate the contents of XML elements. Details of this method are given below under the heading "Processing messages".

Note: In December 1997 an extension to SGML allowed typed data attributes to be used in standard SGML files. As soon as this new functionality is absorbed into XML it will be possible to greatly simplify the validation of message contents.

Exchanging messages

Data captured in XML/EDI messages can be exchanged:

Where conversion into a known EDIFACT format is required the DTD can be extended to provide additional attributes that can guide the transformation process. For example, the following additional properties could be added to the list of attributes assigned to the EAN element:
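As an illustration only, the attribute names EDIFACT-prefix and EDIFACT-suffix below are hypothetical, and the values are modelled on an EDIFACT name-and-address (NAD) segment identifying a buyer by EAN; they should be checked against the relevant message implementation guidelines.

```xml
<!-- A second attribute list for EAN is merged with any existing one -->
<!ATTLIST EAN
  EDIFACT-prefix CDATA #FIXED "NAD+BY+"
  EDIFACT-suffix CDATA #FIXED "::9'">
```

With these values a converter could wrap the content of an EAN element such as 5412345000176 to give NAD+BY+5412345000176::9' without any knowledge of EDIFACT being built into the message itself.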

Messages exchanged as XML/EDI files can be re-validated on receipt by running them through an XML/EDI validating parser. Where messages have been converted into non-XML files prior to transmission the conversion should be reversed to allow re-validation of the received message.

During re-validation any linked parts of messages should be retrieved to ensure that the full contents of the message have been checked. When re-validation has been confirmed the Document Object Model created as part of the validation process can be used to create an auditable copy of the received message in a message store/database.

Processing messages

The way in which a received message would be processed would depend on which of the available methods for exchanging messages was chosen. If the message was received in a format that provided the XML/EDI message generated by the originator, the XML Style Language (XSL) can be used to associate different processes with individual element classes so that elements can be processed by one or more local processors.

XML/EDI message instances are specifically designed to make the selection of data fields and classes at the receiver as easy as possible. Each field starts with a "start-tag" that clearly identifies the class (element type in SGML/XML parlance) of the following data or embedded subelement set, and specifies any non-default properties to be associated with the data. The end of each data element is clearly identified by an "end-tag", which consists of the name of the element (class) preceded by a slash between a matched pair of outward pointing angle brackets. Fields that contain no data, and no embedded subelements, (e.g. fields that are only present to point to other data sources) have the slash indicating their end point immediately before the last angle bracket of the start-tag rather than immediately after the first one of the end-tag. (See the example for the element above.) Classes that contain subclasses of information have embedded elements between their start-tag and end-tag.
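The three cases can be seen in the short sketch below, in which all element and attribute names are assumptions chosen for illustration.

```xml
<!-- A class containing subclasses: embedded elements between its tags -->
<order-line>
  <!-- A data field: start-tag, data, end-tag -->
  <ISBN>0201403943</ISBN>
  <!-- A start-tag that specifies a non-default property -->
  <quantity unit="copies">1</quantity>
  <!-- An empty field that only points to another data source -->
  <catalogue-entry href="http://www.example.org/catalogue/0201403943"/>
</order-line>
```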

XSL allows sets of actions to be associated with particular XML elements. Actions can be defined in terms of values to be assigned to a set of data presentation attributes (styles), or in terms of a data processing script that users can define using a define-script object. XSL scripts are written in ECMAScript, the standardized form of the JavaScript scripting language.

Which actions are associated with which elements can be defined using XML element sets known as XSL rules. A simplified set of style-rules allow presentation properties to be applied to element classes. Rules can be associated with elements that have been assigned a unique identifier (id) attribute or that have been assigned a particular value for a class attribute.

Sets of rules and actions can be defined in macros. Macros can be associated with style processing attributes associated with specific instances of an element. The default set of style properties defined in XSL can be extended using define-style objects.

The component parts of an XML Style Sheet can be:

A typical XML/EDI XSL description will contain:

XSL actions are typically associated with the way in which objects should be presented to users. This process is typically controlled through the use of flow objects. XSL provides two default sets of flow objects, one based on the elements typically found in HTML files, and the other based on the flow objects defined in ISO/IEC 10179 (DSSSL). The set of DSSSL flow objects supported by XSL includes:

The element can be used to indicate points at which macros and scripts are to be evaluated as a result of applying a rule.

For an example of the use of XSL specifications based on the use of HTML flow objects refer to Appendix A.

Activating rules

The XML link process can be used to associate XML/EDI rules with a file. Normally the Simple Link format will be used to identify one or more files containing the relevant rules. Typically this will result in a processing instruction of the following form being added to the start of the document instance:
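One possible form is sketched below, assuming that a rules file is linked in the same way as a style sheet. The target name xml-stylesheet is the one referred to in Appendix A1; the URL, the type value and the Supplier value are assumptions.

```xml
<?xml version="1.0"?>
<!-- Associates the rules in book-order.xsl with this document instance -->
<?xml-stylesheet href="http://www.example.org/rules/book-order.xsl" type="text/xsl"?>
<Book-Order Supplier="0000000000000">
  <Order-No>967634</Order-No>
</Book-Order>
```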

Appendix A1: Using XML/EDI for Book Ordering

The following statement of the current role of EDI in Book Ordering was made to the European Board of EDI Standardization by the UK Book Industry Communication (BIC) manager, Brian Green, in May 1997:

"The nature of the book trade has encouraged its adoption of various forms of Electronic Commerce over the last 20 years. The introduction of a national UK standard book numbering system in the 1960's and an international standard (ISBN) in the early 70's together with central catalogues of books in print in nearly all countries was essential for an industry where even the smallest retail outlet offered customers the facility to order any one of around 600,000 books currently in print (in the UK) from 20,000 publishers with, currently, one hundred thousand new titles appearing every year. There was no hub in the traditional sense since, although WH Smith in the UK has always had a large market share, the number of book titles stocked is relatively low and they have not, until very recently, been much concerned with customers special orders.

In the late 1970's, the UK book trade set up Teleordering as a centralized ordering service using a simple non-standard order format, providing dedicated terminals on which booksellers simply keyed quantity and ISBN (their location number was installed on the form as a default). The orders were polled overnight by Teleordering and automatically routed to the correct publisher either electronically or, in the case of small publishers, by mail or fax. The bookseller received a basic confirmation of receipt of the order by Teleordering with an indication from the Teleordering database whether the book was recorded as available or out of print. Today Teleordering has an annual throughput of some 27 million orders, runs on PCs and is owned by J Whitaker & Sons, who also publish a 'books in print' CD-ROM and provide a sales data monitoring service. Teleordering has also established itself as an EDI VAN with a full range of Tradacoms and EDIFACT messages. The two services run side by side and will convert the non-standard Teleordering format orders coming from booksellers to EDIFACT or Tradacoms for transmission to publishers.

Similar services were set up in other European countries, the US, Canada etc., although the UK service has always been the largest in the world.

A second book trade EDI service, called First EDItion, was set up in 1992 in the UK. This is a pure EDI service based on INS and is particularly strong in the library sector. Both First EDItion and Teleordering are being used for international trade, mainly between UK publishers and European wholesalers who, e.g. in the Netherlands and Germany, operate their own dedicated electronic ordering services for booksellers in their countries. First EDItion has announced that it will introduce a book trade service based on GE's "TradeWeb", which offers a forms-based Internet service linking to the GEIS VAN.

There has been an interesting 'light EDI' scheme running in the UK for the last four years. Following publication of the book trade Tradacoms messages by Book Industry Communication, the UK book trade EDI body, the major UK wholesalers, who had until then been offering dedicated electronic ordering services, decided to collaborate in a service called BUYLINE. They provided all their bookseller customers, at a nominal cost, with simple forms-based ordering software that links in with either the 'book bank' books in print CD-ROM or a wholesaler's own stockist, enabling the bookseller to select the books required and choose their supplier from a pull-down list. BUYLINE includes communications software that dials up the selected supplier and transmits the order in Tradacoms format. The software will also accept Tradacoms acknowledgments and present these to the user in a simple user-friendly format. The rights in this product have now reverted to the systems house, Triptych, who developed it, and they are extending the service to the major distributors as well as wholesalers. Their software is also included in a number of the book shop computer systems. It is generally expected that the BUYLINE system will migrate to EDIFACT and use Internet rather than direct dial-up communications in due course.

A further development is the regular monthly production of multimedia CD-ROM stock catalogues by major European wholesalers. Most of these allow users to build order files and output them in EDI formats, normally using direct dial-up. It is anticipated that data compression and increased bandwidth will soon allow these facilities to be available over Internet. An important point, however, is that BIC in the UK and EDItEUR in Europe have managed to produce a consensus on the book trade implementation of the messages that ensures that all recent services use standard message formats."

BIC feel that trials of standard forms freely available over the Internet, outputting EDIFACT messages to any trading partner able to receive them, would be very helpful.

Applying XML/EDI to Book Ordering

The form shown in Figure A.1 has been designed for displaying a book order based on the EDItEUR Book Ordering Message as described in the EDItEUR EDI Implementation Guidelines for Book Trade Distribution.

Note: The use of form fields in the following table is gratuitous: it is intended to indicate that user interaction with XML displays is possible. The form was produced using beta software for an add-on to Internet Explorer 4.0 (MSXSL) released by Microsoft in January 1998 to demonstrate the power of their XML Style Language (XSL) proposal.

[Form with two parts: "EDItEUR Lite EDI Book Order" and "User options associated with order"]

Figure A.1: Form for displaying EDItEUR Lite-EDI Book Order Messages

Figure A.2 shows how XML is used to code the form shown in Figure A.1.





EDItEUR Lite-EDI Book Ordering
967634
19990308
5412345000176

0201403943
Bryan, Martin/SGML and HTML Explained
1


0856674427
Light, Richard/Presenting XML
1

Figure A.2: XML encoding of Book Order Message
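The data values of Figure A.2 could be marked up along the following lines. This is a sketch under stated assumptions: Book-Order, its Supplier attribute, Message-Date, Order-Line and its Reference-No attribute are named in this appendix, while the remaining element names, the DTD URL, and the Supplier and Reference-No values are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book-Order SYSTEM "http://www.example.org/dtds/book-order.dtd">
<Book-Order Supplier="0000000000000">
  <Title>EDItEUR Lite-EDI Book Ordering</Title>
  <Order-No>967634</Order-No>
  <Message-Date>19990308</Message-Date>
  <Buyer-EAN>5412345000176</Buyer-EAN>
  <Order-Line Reference-No="001">
    <ISBN>0201403943</ISBN>
    <Author-Title>Bryan, Martin/SGML and HTML Explained</Author-Title>
    <Quantity>1</Quantity>
  </Order-Line>
  <Order-Line Reference-No="002">
    <ISBN>0856674427</ISBN>
    <Author-Title>Light, Richard/Presenting XML</Author-Title>
    <Quantity>1</Quantity>
  </Order-Line>
</Book-Order>
```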

A typical reaction to comparing this file with the displayed example and its EDI-based EDItEUR equivalent is "Where has all the EDI information gone?", and "Where has all the material under the table come from?". The answer is that all immutable information goes into the document type definition (DTD) referenced in the statement that starts the coding, or into the associated style sheet.

Note: The beta release of MSXSL does not support the xml-stylesheet processing instruction, and requires the use of another technique to associate the stylesheet with the document instance.

Figure A.3 shows the contents of the DTD used for this example. The single line reference to this DTD is sufficient to provide the browser with all the additional information it needs to process the message.

Note how the definition of each element defined in Figure A.3 contains attributes whose fixed values contain the prefixes and suffixes of each of the EDIFACT fields that may need to be generated in response to the messages.

The message format generated from the completed form could be a pure EDIFACT message of the type shown on Page II-2-2 of the EDItEUR EDI Implementation Guidelines for Book Trade Distribution.

Note: The beta release of the MSXSL add-on to Internet Explorer 4.0 is only capable of submitting form data in HTTP's Common Gateway Interface (CGI) message format. Conversion to EDI format would require additional program modules, which are not shown in this simple example.









































%ISOlat1; %ISOnum;
]>

Figure A.3: XML Document Type Definition for Lite-EDI Book Order
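A much reduced DTD in this style might look like the sketch below. Element names other than Book-Order, Message-Date and Order-Line, the attribute names EDIFACT-prefix and EDIFACT-suffix, and the segment values are all assumptions; the values are modelled on typical EDIFACT order segments and are not taken from the EDItEUR guidelines.

```xml
<!ELEMENT Book-Order   (Title, Order-No, Message-Date, Buyer-EAN, Order-Line+)>
<!ATTLIST Book-Order
  Supplier       CDATA #REQUIRED>

<!ELEMENT Title        (#PCDATA)>

<!ELEMENT Order-No     (#PCDATA)>
<!ATTLIST Order-No
  EDIFACT-prefix CDATA #FIXED "BGM+220+"
  EDIFACT-suffix CDATA #FIXED "+9'">

<!ELEMENT Message-Date (#PCDATA)>
<!ATTLIST Message-Date
  EDIFACT-prefix CDATA #FIXED "DTM+137:"
  EDIFACT-suffix CDATA #FIXED ":102'">

<!ELEMENT Buyer-EAN    (#PCDATA)>
<!ATTLIST Buyer-EAN
  EDIFACT-prefix CDATA #FIXED "NAD+BY+"
  EDIFACT-suffix CDATA #FIXED "::9'">

<!ELEMENT Order-Line   (ISBN, Author-Title, Quantity)>
<!ATTLIST Order-Line
  Reference-No   CDATA #REQUIRED>

<!ELEMENT ISBN         (#PCDATA)>
<!ELEMENT Author-Title (#PCDATA)>
<!ELEMENT Quantity     (#PCDATA)>
```

Because the attribute values are fixed in the DTD, they never appear in the message itself, which is why the instance in Figure A.2 carries no visible EDI syntax.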

The XSL style sheet used to create the displayed version of the form shown in Figure A.1 took the form shown in Figure A.4:


 

 
  EDItEUR Book Order
  
  


 Tick here if a delayed/partial supply of order is acceptable
 Tick here if Confirmation of Acceptance of Order is to be returned by e-mail
 Tick here if e-mail Delivery Note is required to confirm details of delivery

E-mail address:

Please respond in:

Book Order No: Your reference number for order
Message Date: (ISO8601DateCheck(this.text)) Date in CCYYMMDD format
Buyer EAN: Your unique identification number
Supplier EAN: ancestor("Book-Order", this).getAttribute("Supplier") Book Supplies Incorporated
ISBN:
Order line reference number: String(Number(ancestor("Order-Line",this).getAttribute("Reference-No")))
Author/Title:
Quantity:

Figure A.4: XSL Processing Rules for Example Lite-EDI Book Order

The style sheet is itself coded in XML, conforming to an unidentified meta-DTD specified by the XSL protocol. The first element within this document shows how developers can define functions using the ECMAScript language embedded within XSL. This example converts ISO 8601 format dates to a form that is easier for users to check. This is a simple example of the powerful client-side functionality that can be added to XSL style-sheets.

Note: The comments in the initial function indicate some problems encountered with using the beta release of the MSXSL software.

The remainder of the style sheet consists of a set of elements that identify a sequence of actions associated with target elements. The actions create HTML flow objects that Internet Explorer 4.0 displays.

Note: These HTML elements have to conform to XML syntax. This is most evident in the case of the empty line break elements, which are entered as <BR/>.

Other significant features that should be noted from this example include:

  1. The use of the contents of an attribute of the Book-Order element as the contents of the Supplier EAN row of the table.
  2. The call to the ISO8601DateCheck function associated with the rules for displaying the Message-Date element.
  3. The use of an attribute associated with the Order-Line element to assign information to a set of fields in the displayed form.
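The first two of these features can be sketched as follows. The element names xsl, rule, target-element, children and eval follow the 1997 XSL proposal as best recalled and should be checked against the MSXSL documentation; the body of ISO8601DateCheck, its output format and the Buyer-EAN element name are assumptions, as only the function's name and purpose are given above.

```xml
<xsl>
  <define-script><![CDATA[
    // Hypothetical re-creation: turns "19990308" into "1999-03-08"
    function ISO8601DateCheck(date) {
      if (date.length != 8) {
        return date + " (not in CCYYMMDD format)";
      }
      return date.substring(0, 4) + "-" +
             date.substring(4, 6) + "-" +
             date.substring(6, 8);
    }
  ]]></define-script>

  <!-- Feature 2: the script is called while displaying Message-Date -->
  <rule>
    <target-element type="Message-Date"/>
    <TR>
      <TD>Message Date:</TD>
      <TD><children/> (<eval>ISO8601DateCheck(this.text)</eval>)</TD>
      <TD>Date in CCYYMMDD format</TD>
    </TR>
  </rule>

  <!-- Feature 1: an attribute of Book-Order supplies the Supplier EAN row -->
  <rule>
    <target-element type="Buyer-EAN"/>
    <TR>
      <TD>Buyer EAN:</TD>
      <TD><children/></TD>
    </TR>
    <TR>
      <TD>Supplier EAN:</TD>
      <TD><eval>ancestor("Book-Order", this).getAttribute("Supplier")</eval></TD>
    </TR>
  </rule>
</xsl>
```

The third feature follows the same pattern, with the expression ancestor("Order-Line", this).getAttribute("Reference-No") evaluated inside the rules for the fields of each order line.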

Users of Version 4.0 of Microsoft's Internet Explorer web browser will find a demonstration of the application of the above simple examples at http://www.xmledi.net/edi-test.htm

Glossary

DataBots - XML/EDI Data Manipulation Agents ("Bot" is a software term for a component that acts as an agent).

XML/EDI-R - the combination of XML message syntax and rule based EDI.

Bibliography

Bons, R (1997) Designing Trustworthy Trade Procedures for EC.

To be developed