Version of 16 October 2007 (K&R 4th edn. references)
© Xamax Consultancy Pty Ltd, 2002-2007
This document is at http://www.xamax.com.au/ECOM6001/WWW.html
The home-page for this unit is at http://www.xamax.com.au/ECOM6001/index.html
This page supports the third section of the ECOM6001 unit.
This module ensures that candidates are sufficiently familiar with the various levels of Internet infrastructure. Its purpose is to lay the foundation for the many applications-oriented modules that are available within the program.
This section examines in greater detail one of the most significant application layer protocol families, those which deliver the Internet's second 'killer app' after email, the World Wide Web.
This Introduction comprises the following sub-topics:
The term 'the World Wide Web' is used highly ambiguously. It primarily refers to a service available over the Internet, but is sometimes applied to the set of servers that manage 'web-sites' and make 'web-pages' available to people using 'web-browsers'.
The Web depends on an application-layer protocol called HyperText Transfer Protocol (HTTP). This is a straightforward, stateless, request-and-response protocol, designed to enable a client to acquire a named document from a server. This first sub-section provides an informal overview of the process.
Here are the PowerPoint slides.
The following are Required Readings for this sub-topic:
The following are Strongly Recommended Readings on specific sub-topics. They should be at least skimmed, in order to provide greater depth of appreciation of the technology than can be gleaned from a few slides within a single lecture:
HyperText Transfer Protocol (HTTP) is an application-layer protocol, running over TCP/IP, that enables client software to request the download of named documents.
An HTTP GET request is sent by the client to the server, usually at port 80, to nominate the document required. The server replies by sending the requested document, or an error-message.
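The exchange described above can be sketched in a few lines of Python. This is a minimal illustration rather than a complete HTTP client: the function names `build_get_request` and `parse_response`, and the host name used in the usage note, are assumptions made for the example. It shows only the text of a GET request and how the status code, headers and document are separated in the server's reply.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Compose the text of an HTTP/1.1 GET request for one named document."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, document, version
        f"Host: {host}",          # mandatory header in HTTP/1.1
        "Connection: close",      # ask the server to close after replying
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body
```

For instance, `build_get_request("www.example.org", "/index.html")` yields the bytes that a browser would send to port 80 of that (hypothetical) server. Passing the server's reply to `parse_response` gives either the requested document or an error status such as 404.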
In addition to the GET request, two other so-called 'methods' exist:
Here are the PowerPoint slides.
The following are Required Readings for this sub-topic:
The following are Strongly Recommended Readings on specific sub-topics. They should be at least skimmed, in order to provide greater depth of appreciation of the technology than can be gleaned from a few slides within a single lecture:
The following are Further Readings on this sub-topic:
HyperText Markup Language (HTML) is an open format definition, understood by web-browsers, which specifies hypertext documents for publication on the World Wide Web. It is based on a complex and powerful markup metalanguage called Standard Generalised Markup Language (SGML). HTML has existed in several versions, most importantly 2.0 (1995), 3.2 (1997) and 4.01 (1999).
HTML documents are plain-text / ASCII files that can be maintained using any text editor, or with word-processing software if saved-as 'text only'. HTML-specific editors include Macromedia Dreamweaver, Adobe PageMill, Claris Home Page, and Microsoft FrontPage.
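Because an HTML document is just text with embedded tags, it can be processed by quite simple programs. The following sketch, which uses Python's standard `html.parser` module, collects the hyperlink targets from a small page. The sample page, the class name `LinkCollector` and the function name `extract_links` are assumptions made for the illustration.

```python
from html.parser import HTMLParser

# A small, hypothetical page of the kind any text editor could produce.
PAGE = """<html>
<head><title>Sample Page</title></head>
<body>
<h1>Sample Page</h1>
<p>Return to the <a href="index.html">home-page</a>, or visit
<a href="http://www.example.org/">another site</a>.</p>
</body>
</html>"""


class LinkCollector(HTMLParser):
    """Record the href attribute of every anchor tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the link targets in the order they appear in the document."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

Calling `extract_links(PAGE)` returns the two link targets in document order.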
A major problem with HTML has been that designers have used it as though it were a display-formatting language, whereas it was intended as a document-structuring language. In order to provide a more effective display-formatting capability, Cascading Style Sheets (CSS) were developed. The level of compliance of the main browsers with CSS is, however, only moderate.
eXtensible Markup Language (XML) is the intended (near-)future format for structured documents and data on the Web. "XML is actually a 'metalanguage' - a language for describing other languages - which lets you design your own customized markup languages for limitless different types of documents. XML can do this because it's written in SGML, the international standard metalanguage for text markup systems" (from Peter Flynn's XML FAQ). The equivalent to CSS in the new context is eXtensible Stylesheet Language (XSL).
The eXtensible HyperText Markup Language (XHTML) is a bridging format, intended to move HTML in the future direction of XML.
XML has been exploited in a number of additional standards that address particular needs. For example, XML data schemas for mainstream business documents such as invoices are provided by the Universal Business Language (UBL) standards.
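To give a feel for how a business document might be expressed and processed as XML, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`invoice`, `line`, `quantity`, `unitPrice`) are invented for the illustration; they are not taken from UBL or any other published schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A made-up invoice format, NOT a UBL document.
INVOICE = """<?xml version="1.0"?>
<invoice number="A-1001">
  <line description="Widget" quantity="3" unitPrice="2.50"/>
  <line description="Gadget" quantity="1" unitPrice="10.00"/>
</invoice>"""


def invoice_total(xml_text):
    """Sum quantity x unit price over every <line> element in the invoice."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for line in root.findall("line"):
        total += Decimal(line.get("quantity")) * Decimal(line.get("unitPrice"))
    return total
```

The point of a shared schema such as UBL is that both trading partners agree on the element names in advance, so that a program of this kind can be written once and applied to documents from many senders.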
Here are the PowerPoint slides.
The following are Required Readings for this sub-topic:
The following are Strongly Recommended Readings on this sub-topic:
The following are Further Readings on this sub-topic:
The following provide some guidance in relation to the design of web-sites and web-pages:
Web-browsers have a number of features that enable processing to be performed within the client. All of these features are dangerous and error-prone, and as a result are disabled by a proportion of users, and are blocked by many firewalls.
Cookies are not, strictly speaking, a form of processing, but are data stored on a user's machine, as a result of a web-server instructing the web-browser to do so. Each time a user requests a web-page, the web-browser checks whether a cookie exists that has been designated to be sent to such a web-server. If so, the browser transmits the data to the web-server, along with the GET request for the page. The server is therefore able to use the data in the cookie in order to 'remember' something about the user. This may be done with or without the consent of the user. And it might be something useful, such as the user's password (hashed, for security); or it could be a privacy-abusive transfer of personal data, or of a consumer id that enables a club of marketers to build up a profile of the user's behaviour.
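The two halves of the cookie exchange can be sketched using Python's standard `http.cookies` module. The function names and the cookie name `visitor` are assumptions made for the example. The first function produces the value that a server would place in a `Set-Cookie` response header. The second reads the `Cookie` header that the browser sends back with later requests.

```python
from http.cookies import SimpleCookie


def make_set_cookie_value(name, value, path="/"):
    """Build the value of a Set-Cookie header asking the browser to store data."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path   # limit which pages the cookie is sent to
    return cookie[name].OutputString()


def read_cookie_header(header_value):
    """Parse the Cookie header returned by the browser into a dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}
```

A server that earlier issued a `visitor` cookie can therefore recognise the same browser on a later request by calling `read_cookie_header` on the incoming header and looking up `"visitor"`.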
JavaScript is a scripting language that can be embedded in HTML pages. It is not part of the HTML specification itself, but is standardised separately as ECMAScript, and is implemented in almost all browsers in one form or another. It enables the web-page designer to cause the web-browser to perform some kinds of processing. It is very effective when used to check whether a form that the user is about to send to the server contains all of the required data. It is used for a vast array of additional purposes, however, many of them ill-advised, and some of them quite harmful to the user's interests.
A plug-in is a program that can be downloaded and added into a browser, in order to extend the browser's capabilities. Examples include the Adobe, QuickTime and Flash plug-ins, which enable a variety of additional formats to be displayed in the browser window. Plug-ins originated with Netscape, and are supported by some other browsers, such as Opera. They were supported for a time by IE, but not from version 5.5 onwards; since then, equivalent ActiveX components have been needed.
Java is in many respects just another programming language. Many browsers, however, incorporate a Java Run-Time Interpreter (called a 'Java Virtual Machine', or JVM) or a Just-in-Time compiler. (In the case of the latest version of IE, it has to be downloaded and installed.) A Java program designed to be downloaded to and run by a browser is called a Java applet. Java applets are meant to be secure, reliable and robust. They are meant to play only within a limited 'sandbox', rather than having access to the whole workstation. There are doubts about those assurances, however. Java applets provide a powerful client-side capability. But, depending on how they are used and how well they are designed and constructed, they may be dangerous and invasive.
"ActiveX is a Windows component technique used to run arbitrary code in other programs (such as a browser). While this is a very powerful programming technique, it is also a potential security nightmare as access is given to the operating system as well as the browser itself. ... Netscape type plug-ins (NP4) create a much smaller, but not entirely negligible security risk" (Opera and Security, viewed July 2003). Like a Java applet, an ActiveX 'control' can be embedded in a Web page. ActiveX controls are distributed as executable binaries, and must be separately compiled for each target machine and operating system. ActiveX therefore enables a variety of languages to be used to produce code roughly comparable with Java applets. ActiveX is implemented primarily by MS's own browser Internet Explorer (although the Mozilla open-source-browser community is working on support for it). Because ActiveX is not restricted in relation to its access to the workstation's resources, it is much more threatening than Java.
Here are the PowerPoint slides.
The following are Required Readings for this sub-topic:
The following are Strongly Recommended Readings on this sub-topic:
The following are Further Readings on this sub-topic:
Web-browsers have quite limited capacity to process data. So if processing is to be performed on data, it probably needs to be done by the server, not the client. This depends on some extensions to the mark-up language, usually referred to as 'web-forms'.
"A web-form is simply a web page with some additional markup tags to instruct a web browser how to display the various form elements, such as checkboxes, selection lists, buttons and user-editable text areas. However, the web page itself does not process the data, nor does the web server, which doesn't know what you'd like to do with the user's answers. A separate program or script, must process that data, in whatever way you wish. ...
"CGI (Common Gateway Interface) is the language or protocol that the browser uses to communicate the data from the form to the web server. When the user submits her answers on a form, the browser bundles them up and sends them to the web server [usually by means of the POST method, although a GET request can also be used], which passes them on to your script for processing. A CGI script is any program which knows how to read that bundle of data" (Sanford Morton's Tour).
In principle, the script that processes the contents of the web-form can be written in any programming language. Some that are much-used for this purpose include Perl, PHP, C and Java, plus proprietary languages and tools such as MS VisualBasic and ASP, Applescript, and Cold Fusion.
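As an illustration of what such a script does, the following Python sketch reads the submitted fields in the way a CGI program conventionally receives them: from the `QUERY_STRING` environment variable for a GET request, or from standard input (with the length given by `CONTENT_LENGTH`) for a POST. The function names `read_form` and `respond` are assumptions made for the example, and the reply is plain text in order to keep the sketch short.

```python
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Return the submitted form fields as a dict of name -> first value."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the URL-encoded fields arrive on standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the fields follow the '?' in the requested URL.
        raw = environ.get("QUERY_STRING", "")
    return {name: values[0] for name, values in parse_qs(raw).items()}


def respond(fields):
    """Build a complete CGI reply: a header, a blank line, then the body."""
    body = "".join(f"{name} = {value}\n" for name, value in sorted(fields.items()))
    return "Content-Type: text/plain\r\n\r\n" + body
```

When run by a web server, the script would pass `os.environ` and `sys.stdin` to `read_form`, and write the result of `respond` to standard output, which the server relays to the browser.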
Server Side Includes (SSI) are directives that are placed in an HTML page, and processed by the server before the page is despatched to the client. They enable content to be dynamically generated, depending for example on the date or time, the IP-address-range, or the browser-version.
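The idea can be illustrated with a toy processor written in Python. It handles only one directive, in the form used by the Apache web server, `<!--#echo var="NAME" -->`, and replaces it with a value supplied by the caller. Real servers support many more directives, and the function name `expand_includes` and the choice of placeholder for unknown variables are assumptions made for the sketch.

```python
import re

# Matches directives such as <!--#echo var="DATE_LOCAL" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([A-Za-z_]+)"\s*-->')


def expand_includes(page, variables):
    """Replace each echo directive with the value of the named variable."""

    def substitute(match):
        # Unknown variables are shown as "(none)" in this sketch.
        return variables.get(match.group(1), "(none)")

    return ECHO_DIRECTIVE.sub(substitute, page)
```

For example, a page containing `<p>Today is <!--#echo var="DATE_LOCAL" -->.</p>` would be despatched to the client with the directive replaced by whatever date the server supplies, so the browser never sees the directive itself.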
Here are the PowerPoint slides.
The following are Required Readings for this sub-topic:
The following are Further Readings on this sub-topic:
In the preceding sub-sections, a variety of server-side and client-side processing alternatives were introduced. As was no doubt apparent, these were all developed opportunistically, without any overall architecture or grand design. The available tools suffer a wide range of deficiencies.
A number of initiatives are under way, that are intended to produce a coherent and cohesive framework for collaboration between web-browsers and the web-servers and back-end processing and database servers that are run by organisations and other people. These initiatives are generally referred to using the broad title of 'web services'. The term is defined in various ways by various authors and companies, and there are many problems with many of the definitions on offer. The buzz-phrase 'service-oriented architecture (SOA)' has also 'come into vague'.
The longest-standing family of initiatives is centred on the Java programming language, and additional components such as J2EE and EJB. Common Object Request Broker Architecture (CORBA) attempts to achieve similar objectives.
The early versions of the Microsoft DCOM and .NET cluster of development tools represent another overlapping family.
Some of the framework surrounding these technologies is being provided by collaborative standards-setting processes, such as ebXML and UBL. The coordination challenges are huge, and the ability of industry to bring the ambitious plans to fruition has to be in some doubt.
There are many frameworks for web services that are competing for attention, mind-share, and (they hope) market-share. All of them are built around open protocols. Among them are:
Underlying the whole idea are protocols for invoking code using 'remote procedure calls' expressed as messages structured using XML and communicated over HTTP. One contender is the XML Remote Procedure Call (XML-RPC) protocol, but the dominant one is Simple Object Access Protocol (SOAP). Each SOAP message is a one-way transmission of an XML document, usually carried in an HTTP message, from a sender to a receiver, which requests that a program be executed. An additional facility is the Universal Description, Discovery and Integration (UDDI) initiative, which is intended to provide a 'yellow-pages' directory for web-services. A complementary standard is Web Services Description Language (WSDL), which enables the directory entries to be composed.
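The notion of a remote procedure call expressed in XML can be seen by using Python's standard `xmlrpc.client` module to build and read XML-RPC messages, without any network connection. The function names below, and the method name `getQuote` in the usage note, are assumptions made for the illustration. SOAP messages follow the same general idea, but with a more elaborate envelope.

```python
import xmlrpc.client


def build_call(method, *params):
    """Encode a procedure call as the XML body of an XML-RPC request."""
    return xmlrpc.client.dumps(params, methodname=method)


def read_call(xml_text):
    """Decode a request body into the method name and its parameters."""
    params, method = xmlrpc.client.loads(xml_text)
    return method, params


def build_reply(result):
    """Encode a single return value as the XML body of an XML-RPC response."""
    return xmlrpc.client.dumps((result,), methodresponse=True)


def read_reply(xml_text):
    """Decode a response body and return the single value it carries."""
    params, _method = xmlrpc.client.loads(xml_text)
    return params[0]
```

A client would send the text produced by `build_call("getQuote", "XYZ", 3)` as the body of an HTTP POST. The server would decode it with `read_call`, run the named procedure, and send back the text produced by `build_reply`.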
Each of the various web-services initiatives has the intention to deliver a comprehensive and cohesive framework within which applications can be delivered that involve collaboration between a client and a server (or, indeed, among one or more clients and multiple servers). Moreover, the hope exists that there will be many common services, so that applications can build on existing elements rather than each application team having to re-invent the wheel.
One early attempt at a common web-service was Microsoft Passport. This was originally devised for use with Hotmail. But its use was extended to other Microsoft services, and then to third-party services such as eBay. It is a user authentication mechanism intended to support single-signon by a remote browser-user to multiple applications. It was quite seriously privacy-invasive. It was also seriously threatening to Microsoft's competitors, and another, less closed scheme is in development, under the auspices of the cutely-named Liberty Alliance. The term 'identity management' has been coined to refer to this class of web-services.
Here are the PowerPoint slides.
The following are Required Readings for this sub-topic:
The following are Further Readings on this sub-topic:
The following are Further Readings on this whole topic:
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, a Visiting Professor in the Baker & McKenzie Cyberspace Law & Policy Centre at the University of N.S.W., and a Visiting Professor in the Department of Computer Science at the Australian National University.
Created: 7 July 2002
Last Amended: 16 October 2007 (Kurose & Ross 4th Ed. page-references)
Xamax Consultancy Pty Ltd, ACN: 002 360 456
78 Sidaway St
Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 1472, 6288 6916
Roger.Clarke@xamax.com.au