Chapter 2, The Internet and World Wide Web

Discovering Computers 2004

Modified 11 May 04 1526 hrs.

Topics

The Internet: Worldwide collection of interconnected networks that use TCP/IP.
Network: Collection of 2 or more computers connected together via communications devices and media.
Uses: information resource, communications, financial, commerce, passive entertainment, active entertainment, education, document processing.
History: 
A Brief History of the Internet http://www.isoc.org/internet-history/brief.html, Barry M. Leiner, Vinton G. Cerf, David D. Clark, Robert E. Kahn, Leonard Kleinrock, Daniel C. Lynch, Jon Postel, Larry G. Roberts, Stephen Wolff
 
Computer networks have been around for a long time.  More important than the Internet's past is its future, which will bring tremendous capability, with both great opportunity and perhaps chaos.  We are still laying foundations for its use, management, and funding.  Legislators are still considering its taxation and permitted use.  This is the point in history at which knowledgeable voters need to be thinking about how this resource can serve the public, and communicating those thoughts to government officials.
AUTODIN (mid 1960s to 15 SEP 2000)
ARPANET (1969) to Internet: telnet, FTP.  Goal: reliable, robust digital communications network
BitNet: "Because It's Time Network", was created in 1981 and is operated by the Corporation for Research and Educational Networking (CREN).  [Encarta Encyclopedia online]  {BitNet was well established and in use between University of North Florida (Jacksonville) and University of Florida (Gainesville) by 1981.}
NSFNET (1985 - 1995): gopher.  Network linking 5 supercomputer centers.
Internet: 1986
Internet 2 (I2), (academic). Goal: Develop and test advanced technologies before being placed into general use.
National Science Foundation
190 universities
>60 companies
US Government
Next Generation Internet (US Government)
Department of Defense
Other Networks: Global Multiprotocol Open Internet, Bitnet (predates Internet), Internet, Earn, NetNorth, GulfNet, UUCP, FidoNet, OSI, CompuServe (predates Internet), Prodigy.
Packet-Switching Concept
Goals of packet switching: 
speed: Transmit packets in parallel.
reliability: Reroute around disabled links.
security: Randomize path selection to decrease vulnerability in transit.
Transmission Control Protocol / Internet Protocol (TCP/IP), a packet-switching protocol suite.
Protocol: A set of rules and procedures.  Protocols are followed in medical procedures, politics, and communications systems.  Example: telephone conversation.
Packet-switching network: View this presentation online if you missed it in class.
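The reliability goal above (reroute around disabled links) can be sketched as a fewest-hop path search that simply skips failed links.  This is a minimal sketch, assuming a small hypothetical network of routers named A through E; real Internet routing protocols are far more elaborate.

```python
from collections import deque

# Hypothetical network: each router lists the routers it is directly linked to.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

def find_route(src, dst, failed=frozenset()):
    """Breadth-first search for the fewest-hop path from src to dst.

    `failed` holds disabled links as frozensets of two router names,
    e.g. frozenset({"B", "D"}).  Returns a list of routers, or None.
    """
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back from dst to src to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in LINKS[node]:
            link = frozenset({node, neighbor})
            if neighbor not in previous and link not in failed:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # every route is cut

print(find_route("A", "D"))                                   # ['A', 'B', 'D']
print(find_route("A", "D", failed={frozenset({"B", "D"})}))   # ['A', 'C', 'E', 'D']
```

When the B-D link is disabled, traffic from A still reaches D by the longer path through C and E.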
Streaming Concept
Send, receive, buffer, play, discard (bitbucket) -> streaming.  
Packets are made available to the user upon packet arrival rather than waiting for complete file transmission.
No beginning or end of message is necessary.  Can transmit continuous broadcast, such as a radio station, over the Internet.  Example: WTOP (Washington, DC news station) http://www.wtop.com/listenlive.shtm 
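The send, receive, buffer, play, discard cycle can be sketched with a small buffer: playback begins once a few packets have arrived, and each packet is discarded from the buffer after it is played.  This is a minimal sketch, assuming packets arrive in order; "playing" a packet here just means recording it, and the buffer size of 3 is an arbitrary illustrative choice.

```python
from collections import deque

def stream(packets, buffer_size=3):
    """Simulate streaming playback.

    Returns a list of (packet played, packets received so far), showing that
    playback starts long before the whole transmission has arrived.
    """
    buffer = deque()
    log = []
    received = 0
    for packet in packets:            # receive
        received += 1
        buffer.append(packet)         # buffer
        if len(buffer) >= buffer_size:
            log.append((buffer.popleft(), received))  # play, then discard
    while buffer:                     # finite transmission ended: drain the buffer
        log.append((buffer.popleft(), received))
    return log

log = stream([f"pkt{i}" for i in range(6)])
print(log[0])   # ('pkt0', 3): playing began after only 3 of 6 packets arrived
```

For a continuous broadcast the loop simply never ends, which is why no beginning or end of message is needed.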
Navigating the Web http://www.usna.edu/Library/Navweb.htm 
Network terminology
node: A terminal or connecting point in a network.
backbone: Long distance links between local networks.
host: A network device that is assigned an IP address.
traffic: Flow of messages on a network. Think of traffic on the Interstate highways.
Connecting to the Internet
Rapidly changing market.  Shop around.  Avoid long term contracts.  Beware of buying a computer at a discounted price that requires an agreement to use a particular ISP for several years.  Your overall cost of computer plus service will be higher than just paying for the computer and separately shopping for service.
Connect via LAN or via modem (dial-up telephone modem or cable modem) to a Point of Presence (POP) of an Internet Service Provider (ISP).    Advanced Internet Technologies (AIT).
An Online Service Provider is an ISP that charges more money to include additional services, presumably of additional value.  America Online (AOL), Microsoft Network (MSN).
A Wireless Service provider (WSP) is an ISP that permits wireless connection to a POP.
ISDN (Integrated Services Digital Network), DSL (Digital Subscriber Line, Sprint), CATV, T1, T3, modems, etc.
Road Runner: Time Warner Cable
How the Internet works
Client / Server
Client software, such as a web browser or email client, issues task requests to a server.
Server software, such as a web server, print server, file server, or email server, provides services to clients.  Services may include access to hardware (expensive printer,  computer, external storage), software, data, or communications.
(Components: wires and cables, hub, concentrators, bridges, switches, routers, signal processing issues.)
Open System Interconnection (OSI) Model: OSI_Model.pdf [People going into networking should download this table.]
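The client/server split can be seen in a few lines of socket code: the server waits for a request and provides a service, and the client issues the request and uses the reply.  This is a minimal sketch, assuming both run on one machine over TCP; the request text "GET report.txt" and the "SERVED:" reply are made-up formats, not a real protocol.

```python
import socket
import threading

def serve_once(server_sock):
    """Server: accept one client, read its request, send back a reply."""
    conn, _addr = server_sock.accept()
    with conn:
        # A single recv is enough for this tiny message; a real server
        # would keep reading until the whole request has arrived.
        request = conn.recv(1024).decode()
        conn.sendall(f"SERVED: {request}".encode())

def client_request(port, message):
    """Client: connect to the server, send a request, return the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(message.encode())
        return sock.recv(1024).decode()

# Port 0 asks the operating system for any free port.
with socket.create_server(("127.0.0.1", 0)) as server:
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    reply = client_request(port, "GET report.txt")
    worker.join()

print(reply)   # SERVED: GET report.txt
```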
Internet Protocol (IP) addresses
Logical address
Internet Protocol (IP) address uniquely identifies a computer on the Internet.
IPv4  (dotted decimal form: 255.255.255.255): Each number is 8 bits, called an octet, and can have a value from 0 to 255 decimal.  The 32-bit address size limits the number of addresses available (2^32, about 4.3 billion), which is why the system is being modified.  This is similar to the problem of running out of area codes in the world of telephone communications.
IPv6 Internet Protocol (IP) address (form: eight groups of four hexadecimal digits separated by colons, such as 2001:0db8:0000:0000:0000:ff00:0042:8329) is a 128-bit address.  The much larger address space (2^128, about 3.4 × 10^38 addresses) will solve the address assignment problem for the foreseeable future.  IPv4 addresses can be represented inside the IPv6 address space (IPv4-mapped IPv6 addresses).
IPv6 is already in use in some locations.  The USA has been slow to adopt it.  Changing systems is a capital investment and training issue.
Internet Corporation for Assigned Names and Numbers (ICANN) assigns and manages IP addresses.
Hosts permanently connected to the Internet have "permanent" (static) IP addresses.
Dial-up users are assigned an IP address from the ISP's allocated IP address pool for temporary use while connected.  An address assigned this way is called a dynamic IP address.  If you disconnect and later reconnect, you will probably have a different IP address from that pool.
Local Area Networks might manage their own internal addressing scheme in a way that does not require terminals to have an IP address that is unique over the Internet.
To find your current IP address while online:
WIN 2000 and WIN XP: Select "Run" from the Start menu, type cmd, and press Enter to open a command prompt window.  Type ipconfig and press Enter.
WIN 98: Select "Run" from the Start menu. Type winipcfg and press Enter.
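The octet arithmetic above can be checked in code: each 8-bit octet is shifted into place to form one 32-bit number.  This is a minimal sketch using a hand-written conversion alongside Python's standard ipaddress module; the IPv6 address below comes from the 2001:db8::/32 range reserved for documentation examples.

```python
import ipaddress

def octets_to_int(dotted):
    """Combine the four 8-bit octets of a dotted-decimal IPv4 address into one 32-bit number."""
    octets = dotted.split(".")
    if len(octets) != 4:
        raise ValueError("expected four octets")
    value = 0
    for octet in octets:
        n = int(octet)
        if not 0 <= n <= 255:
            raise ValueError(f"octet out of range: {n}")
        value = (value << 8) | n    # shift earlier octets left 8 bits, then add this one
    return value

addr = "131.122.220.30"             # the address used in the URI examples of this chapter
print(octets_to_int(addr) == int(ipaddress.IPv4Address(addr)))   # True
print(2 ** 32)                      # size of the IPv4 address space
print(2 ** 128)                     # size of the IPv6 address space

v6 = ipaddress.IPv6Address("2001:db8::ff00:42:8329")
print(v6.exploded)                  # 2001:0db8:0000:0000:0000:ff00:0042:8329
mapped = ipaddress.IPv6Address("::ffff:" + addr)
print(mapped.ipv4_mapped)           # 131.122.220.30 (IPv4 address carried inside IPv6)
```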
Media Access Control (MAC) Address
Physical address or hardware address.  48-bit code, 12 hexadecimal digits.  These address codes are necessary, and are burned into network devices by the manufacturer.  Network cards have MAC addresses, for example.
IEEE assigns the first 6 hex digits to identify the manufacturer.  This is called the Organizationally Unique Identifier (OUI).
The manufacturer assigns the last 6 hex digits.
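Splitting a MAC address into its manufacturer part and its device part is a simple string operation.  This is a minimal sketch; the address "00-0A-95-9D-68-16" is an arbitrary example, and the function accepts either colon or hyphen separators.

```python
def split_mac(mac):
    """Split a 48-bit MAC address into (OUI, device-specific part).

    The first 6 hex digits identify the manufacturer (assigned by IEEE);
    the last 6 are assigned by the manufacturer.
    """
    digits = mac.replace(":", "").replace("-", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError("expected 12 hexadecimal digits (48 bits)")
    return digits[:6], digits[6:]

oui, device = split_mac("00-0A-95-9D-68-16")
print(oui, device)   # 000A95 9D6816
```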
Domain Name System
Domain Name Server: table look-up by domain name to produce an IP address
Top level domain: .com, .gov, .edu, .mil, .net, .org; transition: physical  -->  logical  -->  arbitrary
Candidate top-level domain names discussed around 2000-2001:
.shop  .mp3  .inc  .kids  .sport  .family  .chat  .video  .club  .hola
.soc  .med  .law  .travel  .game  .free  .ltd  .gmbh  .tech
Approved by ICANN in November 2000 and introduced beginning in 2001: .museum  .biz  .info  .name  .pro  .aero  .coop
Another proposal was a top-level domain devoted to pornography (.smut, .crud?)
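A Domain Name Server's job, table look-up by domain name to produce an IP address, can be sketched with a dictionary.  This is a toy model: the host names and addresses below are hypothetical (the addresses come from ranges reserved for documentation), and real DNS distributes this table across many servers organized by top-level domain.

```python
# Hypothetical name-to-address table.
DNS_TABLE = {
    "www.example.edu": "192.0.2.10",
    "mail.example.edu": "192.0.2.25",
    "www.example.com": "198.51.100.7",
}

def resolve(name):
    """Look up a domain name (case-insensitive) and return its IP address, or None."""
    return DNS_TABLE.get(name.lower().rstrip("."))

def top_level_domain(name):
    """Return the top-level domain: the last dot-separated label, e.g. 'edu'."""
    return name.rstrip(".").rsplit(".", 1)[-1].lower()

print(resolve("WWW.Example.edu"))            # 192.0.2.10
print(top_level_domain("www.usna.edu"))      # edu
```

On a computer connected to the Internet, the standard library call socket.gethostbyname("www.usna.edu") asks the real DNS instead of a local table.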
Ports and Port Numbers:  A port number is a way to identify a specific process to which an Internet or other network message is to be forwarded when it arrives at a server. http://whatis.techtarget.com/WhatIs_Definition_Page/0,4152,212811,00.html 
System Port Numbers:  ftp://ftp.isi.edu/in-notes/iana/assignments/port-numbers . Ports are used in TCP [RFC793] to name the ends of logical connections which carry long term conversations. For the purpose of providing services to unknown callers, a service contact port is defined. 
The contact port is sometimes called the "well-known port". The Well Known Ports are assigned by the IANA and on most systems can only be used by system (or root) processes or by programs executed by privileged users. The range for assigned ports managed by the Internet Corporation for Assigned Names and Numbers (ICANN) is 0-1023.
The Registered Ports are listed by ICANN and on most systems can be used by ordinary user processes or programs executed by ordinary users.  Registered Ports are in the range 1024-49151.
The Dynamic and/or Private Ports are those from 49152 through 65535.
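The three port ranges above translate directly into a classification function.  This is a minimal sketch; the small table of well-known ports lists a few common services (FTP 21, Telnet 23, SMTP 25, Gopher 70, HTTP 80, POP3 110, HTTPS 443) and is not the complete IANA list.

```python
# A few well-known ports (not the complete IANA registry).
WELL_KNOWN = {21: "FTP", 23: "Telnet", 25: "SMTP", 70: "Gopher",
              80: "HTTP", 110: "POP3", 443: "HTTPS"}

def port_category(port):
    """Classify a port number into the ranges described above."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit values: 0-65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"

for port in (80, 9876, 50000):
    print(port, port_category(port), WELL_KNOWN.get(port, ""))
```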

Protocol Number: ftp://ftp.isi.edu/in-notes/iana/assignments/protocol-numbers (the value in the IP header that identifies the protocol carried inside the packet)
4 IP: Internet Protocol (IP in IP encapsulation)
5 ST: Stream
6 TCP: Transmission Control
17 UDP: User Datagram
45 IDRP: Inter-Domain Routing Protocol
46 RSVP: Reservation Protocol
47 GRE: Generic Routing Encapsulation
103 PIM: Protocol Independent Multicast
The following are well-known port numbers (see the port list above), not protocol numbers:
21 FTP: File Transfer Protocol
80 HTTP: Hypertext Transfer Protocol
110 POP3: Post Office Protocol Version 3

WWW
The World Wide Web is a collection of Web servers that use the Internet for communication.
The World Wide Web is NOT the Internet.  WWW is a user of the Internet.
W3C (World Wide Web Consortium) is the standards authority for the World Wide Web (WWW = W3, hence W3C)
Web, internet, hypermedia informal glossary of terms: http://www.w3.org/Glossary 
Web architecture terms: http://www.w3.org/Architecture/Terms 
Uniform Resource Identifier (URI): 
generic set of names and addresses for referring to resources http://www.w3.org/Addressing/schemes.html 
The generic set of all names/addresses that are short strings that refer to resources. http://www.ietf.org/rfc/rfc2396.txt
The following examples illustrate URIs that are in common use.

ftp://ftp.is.co.za/rfc/rfc1808.txt
-- ftp scheme for File Transfer Protocol services

gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
-- gopher scheme for Gopher and Gopher+ Protocol services.  Gopher is deprecated. Very few Gopher servers are still supported.

http://www.math.uio.no/faq/compression-faq/part1.html or
http://131.122.220.30/Library for the US Naval Academy Library (great site!)
-- http scheme for Hypertext Transfer Protocol services

mailto:mduerst@ifi.unizh.ch
-- mailto scheme for electronic mail addresses

news:comp.infosystems.www.servers.unix
-- news scheme for USENET news groups and articles

telnet://melvyl.ucop.edu/
-- telnet scheme for interactive services via the TELNET Protocol.  This is the usual method for connecting to a supercomputer.
Uniform (or Universal) Resource Locator (URL): access method, hostname, port number, directory name.
URL is an informal term (no longer used in technical specifications) associated with popular URI (Uniform Resource Identifier) schemes: http, ftp, mailto, etc. http://www.w3.org/Addressing/Overview.html#URL94 
Universal was the original definition of choice but was deemed by most to be too ambitious, and the more frequently used Uniform was instated by the now-defunct URI Working Group. http://malaysia.cnet.com/Briefs/Glossary/Terms/url.html 
A Uniform Resource Locator (URL) is a compact string representation of the location for a resource that is available via the Internet. [Masinter, Alvestrand, Zigmond, Petke, "Guidelines for new URL Schemes", Network Working Group, The Internet Society (1999). http://www.ietf.org/rfc/rfc2718.txt ]  
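FTP URI, for files on an ftp server: "ftp://ftp.ABC.com/directory_path/myfile.txt".  (A file URI, such as "file:///directory_path/myfile.txt", refers to a file on a particular host, usually the local computer.)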
Gopher URI, for files on a gopher server (now considered obsolete): "gopher://gopher.myschool.edu:9876/" where "9876" is an example port number. 70 is the default Gopher port number.
USENET newsgroup: news:alt.computing
HTTP, used for web pages: "http://www.myhost.com:9876/directory_path/mypage.html", where "9876" is an example port number. 80 is the default HTTP port number.
Uniform Resource Name (URN): persistent, location-independent resource identifiers
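The URL parts listed above (access method, hostname, port number, path) can be pulled apart with Python's standard urllib.parse module.  This is a minimal sketch; the DEFAULT_PORTS table is an assumption covering only the schemes whose defaults this chapter mentions, used when the URL omits a port.

```python
from urllib.parse import urlsplit

# Default ports for schemes discussed in this chapter.
DEFAULT_PORTS = {"ftp": 21, "gopher": 70, "http": 80}

def url_parts(url):
    """Return (access method, host name, port number, path) for a URL."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

print(url_parts("http://www.myhost.com:9876/directory_path/mypage.html"))
# ('http', 'www.myhost.com', 9876, '/directory_path/mypage.html')
print(url_parts("http://131.122.220.30/Library"))
# ('http', '131.122.220.30', 80, '/Library')
```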
HyperText Transfer Protocol (http:// )
Web server, webmaster, web administrator
A web server resides on a host computer and is assigned a URI.  This server is software that responds to HTTP requests from Web clients.  These requests include requests to transmit files and perform related tasks; the server provides error messages when requested files are not available.  It is not unusual for a host machine to run several servers, such as Web, ftp, mail, news, and chat servers, simultaneously.
A web page is an electronic document that is written in Hypertext Markup Language (HTML).  A web page may include files of other types by reference, such as graphics or sound.  A web page may also provide access to other files by hyperlinks.  
A web site is a directory accessible to a web server, to which the server has assigned the name of a file as the default web page, called a home page.  A web site usually contains one or more web pages that are linked together, plus related data files.  A requested data file is transmitted to the client even if it is not an HTML file.  In addition to files that may be referenced (directly or indirectly) by a home page, a web site may contain other files.  Sometimes non-referenced files are placed on a web site for document sharing between people who know which file names to request, without having to use hyperlinks.  Non-referenced files may also be used for related tasks, such as capturing and recording product order information, or grading on-line quizzes.  Designing the directory structure and access controls for these files requires knowledge of the server and the host operating system, and attention to detail.
A home page is a file on a web site with a specific file name specified by the web server administrator.  Often, the name is "default" or "index".   The default file name is appended to a received URI if the incoming request does not specify a file name.  A home page usually acts as an index to other documents on a web site.  While a usual and nice feature of most web sites, a home page is not a necessary component of a site accessible via the World Wide Web.
WWW server software: http://www.w3.org/Servers.html 
A popular, free, and good server is Apache, which runs under UNIX (and also under other operating systems, including Windows).
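The server behavior described above (append the default file name when none is given, send the file whatever its type, return an error message when it is missing) can be sketched as a request handler.  This is a simplified sketch, not how Apache is implemented: the site contents, the default name "index.html", and the handle_request function are all hypothetical.

```python
# Hypothetical site contents: path -> file body.
SITE = {
    "/index.html": "<html>...home page...</html>",
    "/courses/index.html": "<html>...course list...</html>",
    "/notes/chapter2.txt": "plain text notes",   # non-HTML files are served too
}
DEFAULT_FILE = "index.html"   # chosen by the server administrator (assumption)

def handle_request(request_line):
    """Return (status code, body) for a request line such as 'GET /courses/ HTTP/1.0'."""
    method, path, _version = request_line.split()
    if method != "GET":
        return 501, "Not Implemented"
    if path.endswith("/"):
        path += DEFAULT_FILE          # no file name given: append the default file name
    if path in SITE:
        return 200, SITE[path]
    return 404, "Not Found"           # error message for a file that is not available

print(handle_request("GET /courses/ HTTP/1.0"))    # (200, '<html>...course list...</html>')
print(handle_request("GET /missing.html HTTP/1.0"))  # (404, 'Not Found')
```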
Web Browser
A Web browser is the client software in the World Wide Web client-server system.  This is the software used by people to issue requests to retrieve files from Web sites, view Web pages, and do related tasks.  The activity of issuing requests is called surfing the Web.  The client browser determines if, and how, a received file is handled locally.
Mosaic.  Written by the National Center for Supercomputing Applications (NCSA).  Progenitor of Netscape.
Internet Explorer
Netscape Navigator
Opera http://www.opera.com (Smaller, loads faster than Netscape or Internet Explorer)
Who is using what browsers and operating systems
http://www.ews.uiuc.edu/bstats/latest-month.html
http://www.netatlantic.com/traffic/agents.htm 
Browsing the Internet
FTP: the earliest method, and still the fastest method for retrieving a file if you know where it is.
Gopher. Text based. Very few Gopher sites are still actively maintained.
Connecting to the Internet and starting a browser.
Uniform Resource Identifier (URI): scheme (such as http), domain name (or IP address), path, file name.
Hyperlink
A reference that can request a file on a web site, if the hyperlink is activated.
Sometimes can locate a position within a file.
Navigating Web pages using hyperlinks.
Searching for information on the Web.
Kinds of Search Engines
Catalogs
Good for broad searches of established sites
Yahoo: http://www.yahoo.com 
Search Engines
A search engine is a computer program with an associated database of key words and web addresses.
A response from a search query is a list prepared by a search engine from its own database and transmitted to you.
Search Engines update their database to increase its usefulness or profitability.
Commercial: Alta Vista, Infoseek, Yahoo, Excite, Lycos
Search results are heavily influenced by advertising.  For a fee, you can get your site listed closer to the top of a search result.
For information searches, particularly for academic use, you must view many pages of responses before you start viewing ones that are not given preferential position in the listing based upon that site's fee paid to the search engine owner.
Lycos ftp search http://ftpsearch.lycos.com/?form=normal 
20 Feb 2002, 0708 hrs, NPR News: AltaVista announced it is dropping its free email account service to focus on profits from its search engine business.
Academic ftp search: http://sunsite.cnlab-switch.ch:8000/ 
Veronica: Text based. Predates the WWW. Used to search Gopher sites.
Archie is shut down, and therefore unsupported and obsolete.
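A search engine's "database of key words and web addresses" is commonly built as an inverted index: each word maps to the set of pages containing it.  This is a minimal sketch with hypothetical page addresses and text; it returns pages containing every query word, in alphabetical order, and leaves out ranking (which, as noted above, commercial engines may base partly on fees).

```python
import re
from collections import defaultdict

def words(text):
    """Lower-case the text and split it into simple alphanumeric key words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each key word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in words(text):
            index[word].add(address)
    return index

def search(index, query):
    """Return addresses containing every word of the query, sorted alphabetically."""
    terms = words(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))

pages = {   # hypothetical pages
    "http://example.edu/tcpip.html": "TCP/IP is a packet switching protocol",
    "http://example.edu/streaming.html": "Streaming plays packets as they arrive",
    "http://example.com/history.html": "ARPANET used packet switching",
}
index = build_index(pages)
print(search(index, "packet switching"))
# ['http://example.com/history.html', 'http://example.edu/tcpip.html']
```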
Metasearch Engines
Often search smaller, less well known search engines and specialized sites.
DogPile, Metacrawler
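A metasearch engine has to merge ranked lists from several engines into one list.  One simple way, shown here as an assumption rather than how DogPile or Metacrawler actually work, is a Borda count: a result earns more points the higher it ranks in each list.  The engine results below are hypothetical.

```python
from collections import Counter

def metasearch(result_lists):
    """Merge ranked result lists with a Borda count.

    In a list of n results, 1st place earns n points, 2nd earns n - 1, and so on.
    Results are returned by total score, ties broken alphabetically.
    """
    scores = Counter()
    for results in result_lists:
        n = len(results)
        for rank, address in enumerate(results):
            scores[address] += n - rank
    return sorted(scores, key=lambda a: (-scores[a], a))

engine_a = ["a.example.com", "b.example.com", "c.example.com"]   # hypothetical results
engine_b = ["b.example.com", "d.example.com"]
print(metasearch([engine_a, engine_b]))
# ['b.example.com', 'a.example.com', 'c.example.com', 'd.example.com']
```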
Terminology
search text, key words
spider, crawler, bot [buzz... ]
hit, directory
Types of Web Pages (You could come up with your own classification system. The list is not exhaustive.)
Advocacy, business and marketing, information, news, portal, personal
Multipurpose Internet Mail Extensions (MIME) Types, Multimedia: 
graphics: JPEG, GIF, PNG
animation: animated GIF
An animation is originated in the mind of its creator. It is a sequence of static images displayed quickly, much in the same manner as a movie on film.  It is not a continuous image reproduced automatically from a continuous recording process.
A motion picture is a sequence of still pictures displayed rapidly, such as seen at a movie theater.
A video is a continuous recording of the scan of an image, such as a video tape.
audio: WAV, AU, MP3 (MPEG Layer 3)
video: MOV, AVI, MPEG
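Servers label each file with a MIME type so the browser knows how to handle it, usually by looking at the file name extension.  This is a minimal sketch using Python's standard mimetypes module; building a MimeTypes object without extra files uses only Python's built-in table, so results do not depend on the local system's configuration.

```python
import mimetypes

table = mimetypes.MimeTypes()   # built-in extension table only

for name in ["photo.jpg", "logo.gif", "chart.png", "song.mp3", "clip.mpeg", "index.html"]:
    mime_type, _encoding = table.guess_type(name)
    print(f"{name:12} -> {mime_type}")
```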
File compression [computer majors should know this]
Text: 
exact reconstruction
remove white space, compress duplication, encode
index words, make dictionary of duplicate words, replace duplicated words with code or links.
Static Images: 
exact reconstruction: bitmap, compress duplication
parameterization: extract and encode geometric shapes
approximation: transforms (Fourier).  Any approximation method removes information that cannot be restored.  A goal is to have an approximation that retains information you are interested in, and omits information (such as noise) that you are not interested in.
Moving images, sound: not an easy problem.  Can compress and send packets.  Still want to have smooth reception.  Cannot hold all data. 
approximation: Wavelet transformation for compression.
transmit just changes in image after initial image is sent.
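Two of the ideas above can be shown in a few lines: run-length encoding "compresses duplication" with exact reconstruction, and sending only the pixels that changed since the previous frame cuts the data needed for moving images.  This is a minimal sketch; the pixel rows are hypothetical, and the frame functions assume both frames have the same length.

```python
def rle_encode(data):
    """Run-length encode: replace each run of a repeated value with (value, count)."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]

def rle_decode(runs):
    """Exact reconstruction: expand each (value, count) pair back into a run."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

def frame_changes(previous, current):
    """List (position, new value) for each pixel that differs from the previous frame."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]

def apply_changes(previous, changes):
    """Rebuild the current frame from the previous one plus the transmitted changes."""
    frame = list(previous)
    for i, value in changes:
        frame[i] = value
    return frame

row = [0, 0, 0, 0, 255, 255, 0, 0]        # one row of a bitmap (hypothetical)
print(rle_encode(row))                     # [(0, 4), (255, 2), (0, 2)]
next_row = [0, 0, 0, 0, 0, 255, 255, 0]   # the same row in the next frame
print(frame_changes(row, next_row))        # [(4, 0), (6, 255)]
```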
Smart Download
Packets stored.  Local record kept of which packets are received.
Interrupted download can be restarted, fetching next packet in sequence.
Very good for downloading very large files.
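The smart download idea, keeping a local record of received packets so an interrupted transfer resumes at the next missing packet, can be sketched as a small class.  This is a hypothetical illustration; the class name, the 8-byte packet size, and the simulated interruption are assumptions.

```python
class SmartDownload:
    """Track which numbered packets of a file have arrived."""

    def __init__(self, total_packets):
        self.total = total_packets
        self.received = {}            # local record: packet number -> data

    def store(self, number, data):
        self.received[number] = data

    def next_needed(self):
        """Lowest-numbered packet not yet received, or None when complete."""
        for n in range(self.total):
            if n not in self.received:
                return n
        return None

    def assemble(self):
        if self.next_needed() is not None:
            raise RuntimeError("download incomplete")
        return b"".join(self.received[n] for n in range(self.total))

data = b"The Internet is a network of networks."
packets = [data[i:i + 8] for i in range(0, len(data), 8)]
dl = SmartDownload(len(packets))
for n in range(2):                    # connection drops after two packets
    dl.store(n, packets[n])
print(dl.next_needed())               # 2: resume here instead of starting over
while (n := dl.next_needed()) is not None:
    dl.store(n, packets[n])
print(dl.assemble() == data)          # True
```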
Coding: HTML, XML, Java, ActiveX control
Web Page Design Tutorial http://www.officeport.com/enrich/webdesign/slide1.htm 
NCSA (at UIUC) Beginner's Guide to HTML. http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html  
See "Bare Bones HTML" at end of lesson plan.
XML: http://www.biztalk.org  
JavaScript Reference: http://developer.netscape.com/docs/manuals/js/client/jsref/index.htm 
Java Platform 1.1 API Specification: http://www.javasoft.com/products/jdk/1.1/docs/api/packages.html 
Interaction
Who bears the computational burden? client versus server.
HTML: HyperText Markup Language.
See the bottom of this lesson plan for a simple example.  Many students graduating from public high schools have now programmed in HTML.
Static, no interaction other than hyperlinks
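HTTP transactions are memoryless (stateless): each request is handled independently of earlier ones.
Apparent "memory" is embedded in hidden statements (such as hidden form fields) sent to your browser and retransmitted back to the server with your next request.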
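The hidden-statement technique can be sketched from the server's side: the current state is written into a hidden form field, and on the next request the server rebuilds the state from what the browser sent back.  This is a minimal sketch; the "cart" and "item" field names, the "/order" action, and both functions are hypothetical.

```python
from html import escape
from urllib.parse import parse_qs

def render_form(cart_items):
    """Embed the current state in a hidden field of the page sent to the browser."""
    state = ",".join(cart_items)
    return (
        '<form action="/order" method="get">\n'
        f'  <input type="hidden" name="cart" value="{escape(state)}">\n'
        '  <input type="text" name="item">\n'
        '  <input type="submit" value="Add">\n'
        '</form>'
    )

def handle_submission(query_string):
    """Next request: rebuild the state from the fields the browser sent back."""
    fields = parse_qs(query_string)
    saved = fields.get("cart", [""])[0]
    cart = saved.split(",") if saved else []
    return cart + fields.get("item", [])

# The browser submits the form; its query string carries the hidden field too.
cart = handle_submission("cart=modem&item=router")
print(cart)                   # ['modem', 'router']
print(render_form(cart))      # the hidden field now holds "modem,router"
```

Because the user can edit hidden fields, a server should not trust them for anything important such as prices.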
XML: Extensible Markup Language is the universal format for structured documents and data on the Web. http://www.w3.org/XML/
CGI: executed on the host server
NCSA CGI Primer http://hoohoo.ncsa.uiuc.edu/cgi/ 
Executable, security problems, fast:  C++, Fortran, Cobol, assembly language
Interpreted, more secure, slower: Perl, TCL, Unix shell, MS Visual Basic, Applescript
Collecting data centrally, interaction with other software, particularly e-commerce.  Multiplayer games.
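A CGI program receives request data from the web server in environment variables (QUERY_STRING holds the part of the URL after "?"), and writes a header line, a blank line, and then the document to standard output.  The text lists Perl among CGI languages; this sketch swaps in Python, and the "name" parameter and greeting page are hypothetical.

```python
import os
from html import escape
from urllib.parse import parse_qs

def cgi_response(environ):
    """Build a CGI response from the environment variables set by the web server."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["visitor"])[0]
    body = f"<html><body><p>Hello, {escape(name)}!</p></body></html>"
    # Header line, blank line, then the document itself.
    return "Content-Type: text/html\n\n" + body

if __name__ == "__main__":
    print(cgi_response(os.environ), end="")
```

Escaping the user's input before placing it in the page is one small example of the security care that server-side programs need.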
Java: compiled to bytecode and run by an interpreter (the Java Virtual Machine), executed on the client computer
Demonstrations, animation, educational software for exercises
Java will significantly slow the responsiveness on a slow computer.  Its use should be carefully planned so that it adds real value to the interaction when it is used.
Teleconferencing, Internet telephone service
Data is compressed.
Data is partitioned into packets.
Packets are transmitted over internet.
Packets are received, reassembled, and displayed.
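The four steps above (compress, partition into packets, transmit, reassemble) can be traced end to end.  This is a minimal sketch: zlib's lossless compression stands in for the lossy audio and video codecs real systems use, the 16-byte packet size is arbitrary, and shuffling the list simulates packets arriving out of order.

```python
import random
import zlib

PACKET_SIZE = 16   # bytes per packet (arbitrary, for illustration)

def send(data):
    """Compress the data, then partition it into numbered packets."""
    compressed = zlib.compress(data)
    return [(seq, compressed[i:i + PACKET_SIZE])
            for seq, i in enumerate(range(0, len(compressed), PACKET_SIZE))]

def receive(packets):
    """Reassemble packets by sequence number, then decompress for display."""
    ordered = sorted(packets, key=lambda p: p[0])
    return zlib.decompress(b"".join(chunk for _seq, chunk in ordered))

audio = b"hello " * 50            # stand-in for a chunk of digitized audio
packets = send(audio)
random.shuffle(packets)           # packets may arrive in any order
print(receive(packets) == audio)  # True
```

Live conferencing systems often skip a lost packet rather than wait for it, since a late packet is useless for smooth playback.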
Virtual reality, VRML
Medical world: extend expert knowledge to field hospitals globally
Star Trek holodeck
"Push" technology
Preposition files on servers closer to users to make them more quickly available when demanded. This is similar to the grocery store ordering and stocking food on shelves close to your home. The gasoline distributors attempt to do this.
"Push" technology: Viewable online and later off-line.
Good for stock brokers and traders
News junkies
Military command and control force status information
Webcasting
Webcasting is the use of the Internet for broadcasting by means of streaming.
Webcasting, combined with "Push" technology, is used to distribute syndicated programs for radio broadcast.  This ensures programs are available ahead of time and not as much at risk for delay due to communication systems disruptions.
audio and video programs: CNN, Focus on the Family
Security, privacy
authentication, firewalls
https: use of http over secure sockets layer
Electronic commerce
EDI: Electronic Data Interchange
electronic money
CA: Certificate Authority
digital certificate
Web publishing
Five major steps to Web publishing:
Plan the web site: Purpose, characteristics of people you want to visit the site, how to differentiate your site from similar ones.
Analyze and design the web site: 
Layout: text, graphics, audio, video, virtual reality.
Do you have the resources to meet your design requirements? Equipment, software, training
Create the web site
HTML editors allow fast coding, but often produce inefficient, slow-loading HTML.
Adobe GoLive, Macromedia Dreamweaver, Macromedia Flash
MS Front Page, Lotus FastSite.  These can do most of what you would want as a beginner.
HTML editors: Arachnophilia (free), Hot Dog
Many productivity applications now can generate HTML. MS Office 2000 components can generate web pages that are functional and convenient, at the expense of large file sizes and inefficient code.
Use a text editor to cut out unnecessary HTML code manually.  Some things are easier to write by hand.
Deploy the web site
Issues: Passive versus Active sites (my terminology).  Do you need to capture responses, maintain an automated database, do computations, and interact with customers?  Do you need to manage financial transactions?  These require more sophisticated permissions and protections than passive websites that display static content.  You may need to get advice and services from a company such as Verisign, http://www.verisign.com 
Web hosting at
free: http://www.oocities.org
low cost: http://www.bravenet.com/samples/mybravenet.php 
professional: Do your homework, and shop around.  If you are a business, you need professional service.
Maintain the web site
Webmaster, Web Administrator, Server Administrator
Email: Exchange of text messages and attachments
Mail server: POP3 server, holds email
Email program, address book, mailbox
Email address: UserName@DomainName
SMTP: Simple Mail Transfer Protocol, for sending messages and relaying them between mail servers.
POP3: Post Office Protocol 3, for retrieving email from a mail server.
Free email at http://www.hotmail.com and http://www.yahoo.com.
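An email address splits into UserName and DomainName at the "@", and a message is a set of header lines plus a body.  This is a minimal sketch using Python's standard email package; the addresses and subject are hypothetical, and split_address is an illustrative helper, not a library function.

```python
from email.message import EmailMessage

def split_address(address):
    """Split UserName@DomainName into its two parts."""
    user, sep, domain = address.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return user, domain

msg = EmailMessage()
msg["From"] = "student@example.edu"       # hypothetical addresses
msg["To"] = "instructor@example.edu"
msg["Subject"] = "Chapter 2 question"
msg.set_content("Is the World Wide Web the same as the Internet?")

print(split_address(msg["To"]))   # ('instructor', 'example.edu')
print(msg.as_string())            # header lines, a blank line, then the body
```

Actually sending the message would go through an SMTP server (for example with smtplib.SMTP(host).send_message(msg)), and the recipient would later retrieve it from a POP3 server (Python's poplib module).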
Instant Messaging: notifies you when people are on line and allows you to exchange messages or files.
FTP: http://hoohoo.ncsa.uiuc.edu/ftp/faq.html 
File Transfer Protocol
Fast and efficient download of files from an FTP site.
Anonymous ftp.  Usually use your email address as the password to log on.
telnet (remote terminal emulation): 
Connect as a terminal to a remote computer
This is the usual way to use a supercomputer
Bulletin Boards and Discussions
USENET: newsgroups, news server, news reader
mailing lists, list servers.  "Majordomo" is a popular list server.
postings, thread
Forum: moderated, unmoderated; discussion threads
chat rooms, instant messaging, chat client and chat server
Portal
Netiquette = Network Etiquette: 
Conserve bandwidth.  Be polite.  Avoid generating flame wars or spam.
Do not assume material is accurate or up to date.  Be forgiving of others who innocently pass on false material when it was not intended to cause harm.
If you must be critical, check your facts first.
Assume anything you say on the Internet is public, and can and will be used against you, perhaps even in a court of law.  This is not a forum for truly private conversations.  Use a password protected forum to increase the privacy.  Use encrypted communications to greatly increase privacy.
Cookies: 
Small static files placed on your computer by a web server. 
Contrary to urban legends, cookies cannot transmit a computer virus.
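A cookie is only a short name=value string that the server sends in a Set-Cookie header and the browser returns in a Cookie header on later requests; it is data, not a program.  This is a minimal sketch using Python's standard http.cookies module; the cookie names and values are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header to send along with a page.
outgoing = SimpleCookie()
outgoing["visitor_id"] = "12345"            # hypothetical name and value
outgoing["visitor_id"]["max-age"] = 3600    # browser keeps it for one hour
print(outgoing.output())
# Set-Cookie: visitor_id=12345; Max-Age=3600

# Later request: the browser returns the cookie, and the server parses it.
incoming = SimpleCookie("visitor_id=12345; theme=dark")
print(incoming["visitor_id"].value)   # 12345
```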
Issues
Porn and stealth URIs: Whitehouse.gov versus Whitehouse.com (a similar site with a different top level domain name), scsite.com versus scite.com (a similar site with one letter dropped).

Bare Bones Basic HTML

<html>
<head>
  <title>Here is a title for the Title Bar</title>
</head>

<body>
  <p>Here is a simple web page.</p>
  <Center><p>Here is centered text.</p></Center>

  <p><B>Here is bold text.</B><br>
  <I>Here is italic text.</I><br>
  Normal size.&nbsp;
  <big>Bigger size.&nbsp; <big>Even gibber.&nbsp;
  <big>Neve bigreb.</big></big></big></p>

  <H1>Heading 1 Text</H1>
  <H2>Heading 2 Text</H2>
  <H3>Heading 3 Text</H3>
  <H4>Heading 4 Text</H4>
  <H5>Heading 5 Text</H5>
  <H6>Heading 6 Text</H6>
  <p>Normal Size Text</p>

  <p>Here is an enumeration.</p>
  <ol>
    <li>List Item one.</li>
    <li>List Item two.</li>
    <li>List Item three.</li>
  </ol>

  <p>Bullets</p>
  <ul>
    <li>Bang 1</li>
    <li>Ouch 2</li>
    <li>Bandaid 3</li>
  </ul>

  <p>Now, a table with 3 rows and 4 columns.</p>
  <table border="1" width="100%">
    <tr>
      <td colspan="4">Row 1, spanning all 4 columns</td>
    </tr>
    <tr>
      <td>Row 2, Column 1</td>
      <td>Row 2, Column 2</td>
      <td>Row 2, Column 3</td>
      <td>Row 2, Column 4</td>
    </tr>
    <tr>
      <td>Row 3, Column 1</td>
      <td>Row 3, Column 2</td>
      <td>Row 3, Column 3</td>
      <td>Row 3, Column 4</td>
    </tr>
  </table>

  <p>Here is <a href="here_is_a_page_to_reference.htm">A page</a> that is referenced.</p>

  <p>A picture.</p>
  <p><img src="Gridlock.gif" width="500" height="300" alt="Gridlock.gif (11281 bytes)"></p>

</body>

</html>