A Guide to Understanding URLs
- Working with Uniform Resource Locators on the Internet
by Michael R. Irwin, copyright 1994-1997
Contents --
1. Introduction
2. What Is a URL?
3. Computers on the Internet
4. Parts of a URL
5. Types of URLs
6. Bookmarking URLs
7. Errors and URLs
8. In Summary
note : NO PICTURES ARE INCLUDED WITH THIS VERSION
===============
1. Introduction
===============
This document is written to help people understand and work with URL
addresses. It is not a paper on the mechanics of using URLs; rather, it
focuses on explaining URLs and how people use them when working on the
Internet. Specifically:
How to go from one location to another on the Internet.
The URL concept is really pretty simple, as you will learn. This guide
is just a quick tour through some of the more common URL types, and it
will have you working with and understanding URLs in a variety of
contexts very quickly.
=================
2. What is a URL?
=================
URL is the acronym for Uniform Resource Locator. A URL is an address
that is used on the Internet. Every time you want to view or get a file
on the World Wide Web, you access the file via its URL.
A URL is like your complete mailing address: it specifies all the
information necessary for someone to address an envelope to you.
However, a URL is much more than that, since URLs can refer to a variety
of very different types of resources. A more fitting analogy would be a
system for specifying your mailing address, your phone number, or the
location of the book you just read from the public library, all in the
same format.
In short, a URL is a very convenient and succinct way to direct people
and applications to a file or other electronic resources. Learning how
to interpret, use, and construct URLs will greatly assist your
exploration of the Internet.
The idea behind URLs is actually a good one -- create a universal system
for accessing information on the Internet, whether it is a single
document (HTML page) on a server, a file on an anonymous FTP site, a
query from a database, an entire gopher server, or even a Web image.
In other words ... “if it’s out there, you can point to it!”
Unfortunately, that means that to access files on the World Wide Web,
you have to get used to seeing, and typing, things like:
http://www.germany.eu.net/books/eegtti/eegti.html
This is the actual Web address of a great electronic book named
“Everybody’s Guide to the Internet”.
Where do you use URLs?
Whenever you work with your browser, you will use URLs. When you want
to go to a specific resource on the Internet, you type in its URL. For
instance, in Netscape, you may want to go directly to Yahoo, a search
engine on the Internet. To do this, you would type the URL in the
Location field of Netscape (below the menu and main program buttons).
It would look something like this:
*** PICTURE WOULD BE INSERTED HERE
Most browsers have a similar location for typing in the URL you want to
go to. Once you type in the URL and press Enter, the browser will
connect to the Internet and attempt to locate that resource. If it can
find it, it will display the resource or, if the browser does not
recognize the format, offer to download it.
----------------------------
Comparing URLs to file names
----------------------------
The URL above is the address (location and filename) for a specific
document. When the URL concept was introduced, users of the Internet
agreed that a single methodology needed to be created that would allow
anyone on the Web to find and access anything on the Internet.
To understand what a URL is and how you use one, let us compare a URL
to a filename on a computer.
For instance, you may want to copy a text file named myFile.txt that is
on your D: hard drive to a floppy disk. You know that the file is
sitting on drive D: in a sub-directory named
D:\DOCUMENT\WINWORD6\TEXTFILE\.
To copy this file you can move to the drive using the change drive
command (D:) and then move to the directory using the change directory
(CHDIR or CD) command and finally use the copy command to copy the file
to drive A:. This would take a minimum of three commands:
D:
cd \document\winword6\textfile
copy myFile.txt a:\myFile.txt
An alternative is to copy the file using one command that references the
drive, directory and filename:
copy d:\document\winword6\textfile\myFile.txt a:\myFile.txt
Although the second method uses a longer command, it is prone to the
same kind of error as the first method: specifically, typos.
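The drive, directory, and filename pieces of that single command can
also be taken apart in a program. The following is only a minimal
sketch in Python, not part of the original DOS example; it assumes the
standard library's PureWindowsPath class and reuses the full path from
the copy command above.

```python
from pathlib import PureWindowsPath

# The full path used in the one-command copy above.
full_path = PureWindowsPath(r"d:\document\winword6\textfile\myFile.txt")

drive = full_path.drive        # the drive letter, with its colon
directory = full_path.parent   # the drive plus the directory path
filename = full_path.name      # the file name alone

print(drive)
print(directory)
print(filename)
```

A URL is split into parts in much the same spirit, as the following
sections explain.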
In this case, the file was sitting on your own computer. If the file
was on a network that you were a part of, you could have just as easily
copied the file from the network. To copy the file from the network,
you replace the local drive and directory name with the network drive
and directory name where the file actually resides.
============================
3. Computers on the Internet
============================
In the IBM PC world commands like copy are not case sensitive. This
means that you can type the command using any combination of upper and
lower case letters.
When working on the Internet, you need a way to access a document that
is sitting on some computer in some directory. This document can be on
a Mainframe, Mini or PC computer. It can be found on a computer that
has an operating system different than yours -- UNIX, Windows NT, Mac
OS, or one of many other operating systems. Each system stores and
names files differently. To overcome this problem, the URL concept was
instituted. Every document, query, graphic, FTP file, Gopher site, etc.
is assigned a unique Uniform Resource Locator, or address.
A URL can point to a file in a directory; that file and directory can
exist on any machine on the Internet, and the file can be served via any
of several different methods. As pointed out, a URL can point to more
than a file: it can be a query, the contents of a gopher server, and so
on.
---------------------------
The servers of the Internet
---------------------------
There are several types of computers on the Internet, all joined
together as one single network. Each computer that exchanges or
transfers information on the Internet is known as a server. In fact,
the Internet is the largest Client/Server database in the World. There
are thousands of servers available on the Internet.
There are several different types of servers on the Internet, each
running its own server software. The four basic types of servers are:
HTTP, or HyperText Transfer Protocol, server
Used to store and send standard World Wide Web
hypertext documents. HTTP is a simple protocol
which is the basis of the Web.
FTP, or File Transfer Protocol, server
Used to store and transfer files across the Internet.
These servers are the file libraries or archives, which
can be used by the public. These can be program files
or documents.
Gopher server
Used to accept requests for information and then scan
the Net for it. A Gopher server lets the user work
through menus, instead of typing in long sequences of
characters. It works in conjunction with FTP sites
letting a user select a file from a menu. It is normally
based on a single database.
WAIS, or Wide Area Information Servers, database server
WAIS is similar to a Gopher server; however, it lets
users access and find files using a single interface.
The WAIS server program worries about how to access
information on hundreds of different databases.
Each of these servers can store different types of documents and files.
Once a server is on the Internet, it can be accessed by the millions of
users of the Internet.
=================
4. Parts of a URL
=================
A URL, like a file on your local computer, has several parts.
Since the document, or file, can be on any type of Internet server, and
on any type of computer, accessing the file via a URL requires
specifying several pieces of information:
---------------
Parts of a URL:
---------------
Specify the type of server, or method needed, to retrieve the document.
Telling the browser or program the type of server or method it will
connect to lets it know what to do with the information once it gets
it. This is the only part of a URL that does not directly relate to
locating a file on your local machine or on a network that you are
attached to.
Specify the machine name where the document is located. This portion
identifies the computer and where it is located on the Internet. This
is equivalent to specifying a drive on your local computer or on a
network.
Specify the path and document name that you want. This is exactly the
same as specifying a path and file name in commands on your local
network or drive.
Understanding these three actions explains the three parts of a URL.
Each URL consists of three parts:
First, the type of resource to access
Second, the name of the site where the resource is located
Third, the directory path and resource name, or directory path alone
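The three parts listed above can be pulled out of a URL by a program.
The following is a minimal sketch in Python, assuming the standard
library's urlsplit function; the variable names are chosen here for
illustration only, and the URL is the one for the electronic book
mentioned earlier.

```python
from urllib.parse import urlsplit

url = "http://www.germany.eu.net/books/eegtti/eegti.html"
parts = urlsplit(url)

resource_type = parts.scheme   # first part: the type of resource
site_name = parts.netloc       # second part: the name of the site
path = parts.path              # third part: directory path and resource name

print(resource_type)
print(site_name)
print(path)
```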
--------------------------------
>> WARNING: URLs are Case Sensitive <<
--------------------------------
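Since many Internet servers run in a UNIX environment, you must pay
attention to the case of the letters in a URL. File and directory names
in the UNIX environment are case sensitive. Strictly speaking, the type
of resource and the site name are not case sensitive, but the directory
path and resource name often are. Because of this you should be very
careful when typing URLs. Always assume that the URL is case sensitive.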
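To see which parts of a URL a program may safely treat as case
insensitive, consider the following minimal sketch in Python. It
assumes the standard library's urlsplit function, which lower-cases the
type of resource and the site name but leaves the path untouched. The
mixed-case URL is a made-up variation used only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical mixed-case spelling of a URL, for illustration only.
parts = urlsplit("HTTP://WWW.Europa.COM/~ria/Links.html")

scheme = parts.scheme    # lower-cased by urlsplit
host = parts.hostname    # lower-cased by the hostname attribute
path = parts.path        # left exactly as typed

print(scheme, host, path)
```

Because the path is passed to the server as typed, "Links.html" and
"links.html" may well be two different files there.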
-----------------
Working with URLs
-----------------
Look at the following URL --
http://www.europa.com/~ria/links.html
This URL points to an HTML document titled “A Collection of Philippine
Pages Picturesque Philippines World Wide Web Links”. It is a great link
page for finding resources and information about the Philippines --
government, education, Internet, business, even newspapers and ezines.
The URL is made up of three parts:
http://
The “http” means that you are dealing with a World
Wide Web resource. It stands for “HyperText Transfer
Protocol”. This is the way that the Web moves information
around the world. This information is critical to your
browser. It tells the browser how to connect to the system.
www.europa.com
This is the next part of the URL. It is the name of the
site where the resource is located. It is the name given
to the actual server that sits somewhere in the world on
the Internet.
/~ria/links.html
The final part of the URL is the directory path and resource
name. Notice that the path is separated with forward slashes.
In the example above, notice how the last item ends in “.html”. That
stands for HyperText Markup Language, the coding that is used to create
hypertext documents. Many Web addresses end in it.
If you connect to this URL, you will see a page that begins similar to
the following:
*** PICTURE WOULD BE INSERTED HERE
Some other URLs may use numbers in them, as in the following --
http://204.146.46.134:80/prev/explore/wtools/
This URL points to an HTML document titled “World Wide Web Search
Tools”, a part of the IBM Global Network pages. It is a great link page
for finding the different search tools available on the Internet for
locating URL resources.
Like the previous example, this URL consists of three parts:
http://
The “http” means that you are dealing with a World
Wide Web resource.
204.146.46.134:80
This is the name of the site where the resource is located.
Notice that it has a numeric name instead of an English name.
The “:80” that follows the numeric name is a port number.
/prev/explore/wtools/
This is the directory path and resource name. Notice that
it does not end with a document name.
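A program can separate the numeric site name from the port number that
follows the colon. The following is a minimal sketch in Python,
assuming the standard library's urlsplit function and ipaddress module;
the helper name is_numeric_host is made up for this example.

```python
import ipaddress
from urllib.parse import urlsplit

parts = urlsplit("http://204.146.46.134:80/prev/explore/wtools/")
host = parts.hostname   # the site name without the port
port = parts.port       # the number after the colon, as an integer


def is_numeric_host(name):
    """Return True if the site name is written as a numeric address."""
    try:
        ipaddress.IPv4Address(name)
        return True
    except ValueError:
        return False


print(host, port, is_numeric_host(host))
```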
------------------------------
>> NOTE: Ending a URL in a slash <<
------------------------------
When using FTP, HTTP, and Gopher URLs, the "directory path and resource
name" will sometimes end in a slash. This simply means that the URL is
not pointing to a specific file, but a directory. In this case, the
server generally returns the "default index" of that directory. This
might be just a listing of the files available within that directory, or
a default file that the server automatically looks for in the directory.
With HTTP servers, this default index file is generally called
"index.html", but is also frequently seen as "default.html",
"home.html", or "welcome.html".
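The default index idea for URLs ending in a slash can be sketched as
follows. This is only an illustration in Python of which files a server
might look for; the actual choice is made by the server, not by the
browser, and the function name candidate_index_urls is made up for this
example.

```python
from urllib.parse import urljoin, urlsplit

# Default index file names mentioned in the note above.
DEFAULT_INDEX_NAMES = ["index.html", "default.html", "home.html", "welcome.html"]


def candidate_index_urls(url):
    """List the files a server might return for a URL ending in a slash."""
    if not urlsplit(url).path.endswith("/"):
        return []  # the URL already names a specific resource
    return [urljoin(url, name) for name in DEFAULT_INDEX_NAMES]


for candidate in candidate_index_urls("http://204.146.46.134:80/prev/explore/wtools/"):
    print(candidate)
```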
================
5. Types of URLs
================
There are several different types of URLs. The one we have seen so far
in this paper is the HTTP URL. When the World Wide Web first came into
wide use on the Internet, in late 1993, it offered an easy, single,
consistent user interface that could be used to browse, or view, text
and graphics at the same time. With the introduction of the Web came a
new kind of server known as the HTTP server.
Prior to the introduction of the HTTP server, there were several other
servers already in use on the Internet. These servers allowed users to
(1) transfer files via archaic UNIX commands, like ls or get, (2) read
news via programs like rn and nn that use commands like j or sz, and (3)
use menus for finding things in gopher and WAIS servers.
To work with the different types of resources found on the Internet, you
need a way to tell your browser how to find the resource and the type of
resource you want to work with. It can be a file that you want to
download, a news article you want to read, or a gopher site that has a
menu that you want to view. Each type of resource will reside on its
own type of server.
--------------------------------
>> NOTE: Different types of servers <<
--------------------------------
To review the types of Internet servers, see the section “The servers of
the Internet” found earlier in this paper.
There are many different types of URLs; however, the most common schemes
are:
HTTP URLs
FTP URLs
Gopher URLs
News URLs
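Since the first part of a URL names its type, a program can tell these
schemes apart just by looking at that part. The following is a minimal
sketch in Python, assuming the standard library's urlsplit function;
the table of descriptions and the function name describe are made up
for this example.

```python
from urllib.parse import urlsplit

# Hypothetical one-line descriptions of the four common schemes.
SCHEME_DESCRIPTIONS = {
    "http": "World Wide Web hypertext document",
    "ftp": "file or directory on an FTP server",
    "gopher": "Gopher menu or document",
    "news": "Usenet newsgroup",
}


def describe(url):
    """Return a short description of the type of resource a URL names."""
    scheme = urlsplit(url).scheme
    return SCHEME_DESCRIPTIONS.get(scheme, "other type of resource")


print(describe("http://www.europa.com/~ria/links.html"))
print(describe("ftp://ftp.eff.org/"))
```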
---------
HTTP URLs
---------
HTTP is the Internet protocol specifically created for the World Wide
Web, thus it will be the most common scheme you are likely to use.
HTTP URLs point to the HyperText documents of the World Wide Web. HTTP,
as pointed out previously, stands for HyperText Transfer Protocol. HTTP
servers are commonly used for storing and serving hypertext documents.
These types of documents tend to be extremely efficient, containing
navigational information within themselves. Moving from one document to
another is handled via an embedded reference. This means that the
server protocol does not have to contain support for navigational
features, as the Gopher and FTP protocols do.
For instance, if you want to go to a page that gives you information on
creating a home page, you can enter an address like:
http://www.goliath.org/makepage/index.html
This URL is the Welcome to "Make Your Own Home Page" page. Notice that
it is an HTTP type URL.
HTTP URLs have become the most common type of URLs on the Net today.
---------------------------------
File Transfer Protocol (FTP) URLs
---------------------------------
The FTP URL scheme is used to access files and directories on Internet
hosts using the FTP protocol. The FTP protocol is one of the oldest
ways of transmitting files over the Internet. While there are many
advantages to using HTTP instead, many servers don't offer the full
support of HTTP. In addition, many client programs are developed for
FTP. This is especially true if you are accessing the Internet via
terminal emulation, as many UNIX clients still do. In addition, many
files are distributed only via FTP on the Internet.
Connecting to an FTP site works basically the same way as connecting to
an HTTP site. For example, to connect to the Electronic Frontier
Foundation’s FTP computer, you would use the URL:
ftp://ftp.eff.org/
Notice that the URL is very similar to an HTTP URL. Instead of
specifying the type of server as http://, you specified ftp://. The
name of the site where the resource is located is ftp.eff.org. Notice
in this case that the URL ends with a forward slash. This FTP URL does
not specify a path and document name. Therefore, the browser displays
the contents of the top-level directory on the FTP server.
Another example will specify a specific file that you want to locate.
Once located, your browser will either display it, if it recognizes the
format, or notify you that it doesn’t recognize the format and offer to
save it to disk for you. If you want to find and display the file named
cda_approved.gif on the same ftp server we just connected to, you would
enter the following URL:
ftp://ftp.eff.org/pub/EFF/Graphics/cda_approved.gif
Your browser will display a graphic similar to the following:
*** PICTURE WOULD BE INSERTED HERE
Notice that the above URL looks very similar to the URLs that you
specified when working with HTTP. In this case it has a directory and
file name as part of the URL.
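The directory and file name can be separated by a program, which can
also guess from the file name whether the format is one it recognizes.
The following is a minimal sketch in Python, assuming the standard
library's urlsplit function and the posixpath and mimetypes modules; it
only imitates the decision a browser makes and does not connect to the
FTP server.

```python
import mimetypes
import posixpath
from urllib.parse import urlsplit

url = "ftp://ftp.eff.org/pub/EFF/Graphics/cda_approved.gif"

# Split the third part of the URL into a directory and a file name.
directory, filename = posixpath.split(urlsplit(url).path)

# Guess the format from the file name extension, as a browser might.
file_type, _ = mimetypes.guess_type(filename)

print(directory)
print(filename)
print(file_type)
```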
----------------------
>> Note: Case sensitivity <<
----------------------
Notice that the above URL has both lower and upper case letters in it.
URLs are case sensitive and must be entered with exactly the same case
as the directory and file names on the server.
-----------
Gopher URLs
-----------
As you work with FTP URLs you begin to realize FTP sites can be very
frustrating to work with. You have to remember all of those FTP site
names and, oh, many of the FTP sites have weird directory and file
names. This is where a gopher URL can help. Gophers (and WAIS servers)
are essentially menu systems. They take a request for information and
then scan the Net for it. This eliminates the need for you to search
for it yourself. Once a menu is displayed, you can select files and
programs from FTP sites for downloading or displaying.
The Gopher protocol syntax is very similar to FTP and HTTP. Instead of
using http:// or ftp:// you specify gopher://. For example, to connect
to the National Cancer Center gopher site in Tokyo, Japan, the URL is:
gopher://gopher.ncc.go.jp/
Another site you may be interested in is the United Nations Criminal
Justice Country Profiles gopher site. This site is maintained at the
university in Albany, NY. The gopher URL is:
gopher://UACSC2.ALBANY.EDU:70/11/newman
Once on the server, you will see a menu similar to the following:
*** PICTURE WOULD BE INSERTED HERE
This menu comes from a Gopher server. Notice that it is a series of
menu choices. Since you are using a browser, like Netscape, it shows
all the choices as underlined text.
To select any of the choices, all you have to do is double-click on
the menu choice you want.
Using this Gopher menu, you can click on the UN Criminal Justice Country
Profiles and then select any country whose information you want to
review or copy.
---------------------------------------------------------------
>> Warning: Port numbers when connecting to FTP/Gopher servers <<
---------------------------------------------------------------
Sometimes, you may have to specify a port number for the FTP or Gopher
site you are trying to connect to. Usually the default port number
works and you do not need to type one. If you are connecting to an FTP
or Gopher site via a link in another Web document, the port number will
be passed at the same time, automatically.
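The default-port behavior can be sketched in a few lines. The
following is a minimal illustration in Python, assuming the standard
library's urlsplit function and the well-known port numbers 80 for
HTTP, 21 for FTP, and 70 for Gopher; the function name effective_port
is made up for this example.

```python
from urllib.parse import urlsplit

# Well-known default ports for three of the common URL types.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def effective_port(url):
    """Return the port in the URL, or the default for its type if none."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("gopher://UACSC2.ALBANY.EDU:70/11/newman"))
print(effective_port("ftp://ftp.eff.org/"))
```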
---------
News URLs
---------
The last of the most common types of URL is used to connect to a Usenet
newsgroup. These URLs are known as News URLs.
Before demonstrating how to connect to a News URL, we need to quickly
discuss Usenet:
What is Usenet?
---------------
Usenet is a large collection of computers that share data with each
other. It is the people who use these computers that make Usenet worth
the effort. Imagine a conversation that is carried on over days, where
anyone can put their two cents in. Usenet is like email, except that it
is many-to-many instead of one-to-one. It is the international meeting
place where people gather to meet their friends, discuss events, or
talk about anything they want. Many people believe that Usenet is the
Internet. However, it is a totally separate system. All Internet sites
CAN carry Usenet. Usenet has millions of messages posted each day -- it
is HUGE.
The basic building block of Usenet is the newsgroup, which is a
collection of messages related to a theme. There are almost 10,000 of
these newsgroups, in a wide range of languages, covering any subject you
can imagine. Which Usenet groups you have access to depends upon your
Internet service provider. Each newsgroup usually has a fee attached to
it and requires that the provider pay this fee. Therefore the services
available are those that your provider subscribes to.
To connect to a newsgroup you use its URL. Unlike the previous URLs,
you do not specify the news service the same way you do for other
URLs. Specifically, you do not type the double forward slashes.
Before you use any news services, you will have to specify the news
server used by your Internet provider. In Netscape this is done by
specifying your NNTP (Network News Transfer Protocol) server in the
Preferences dialog box, under the Options box. In Mosaic you set an
environment variable NNTPSERVER to the name of the news server. Most
browsers will let you set the news server via a file menu choice like
Options.
Once your news server has been specified, you can point to a Usenet
newsgroup by referencing the URL. For instance, to connect to the US
jobs offered newsgroup you would type:
news:us.jobs.offered
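Because a News URL has no double forward slashes, it has no site name
of its own; the news server set in your browser fills that role. The
following is a minimal sketch in Python, assuming the standard
library's urlsplit function, that shows how such a URL splits.

```python
from urllib.parse import urlsplit

parts = urlsplit("news:us.jobs.offered")

scheme = parts.scheme     # the type of URL
site = parts.netloc       # empty, since there is no // part
newsgroup = parts.path    # the name of the newsgroup

print(repr(scheme), repr(site), repr(newsgroup))
```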
----------
Other URLs
----------
There are several other URLs that you can reference from your browser.
Each can be referenced similar to the way you have worked with HTTP, FTP
and GOPHER URLs.
Some of the other URLs you may come across are:
File URLs      file://ftp.unt.edu/README
WAIS URLs      wais://wais.free.net/
NNTP URLs      nntp:////<# doc>
Telnet URLs    telnet://none@edlis.ied.edu.hk:23/
Mailto URLs    mailto:mrirwin@ibm.net
Although you may not come in contact with these URLs often, you may find
them as links in other WWW documents. For instance, if you see a link
like “send message to page owner” it will probably use a MAILTO URL.
Some URLs, like Telnet, require that you have a Telnet application
linked to your browser. Since Telnet allows you to log in to a server
as a terminal, you will need some sort of program that lets you act as a
terminal. This application will be run by your browser when you log in
to the server.
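The Telnet and Mailto examples listed above carry a little more
information than the earlier URLs: a login name and a port number in
one case, and a mail address in the other. The following is a minimal
sketch in Python, assuming the standard library's urlsplit function,
that pulls those pieces out.

```python
from urllib.parse import urlsplit

telnet = urlsplit("telnet://none@edlis.ied.edu.hk:23/")
login_name = telnet.username   # the part before the @ sign
host = telnet.hostname         # the site name
port = telnet.port             # the number after the colon

mailto = urlsplit("mailto:mrirwin@ibm.net")
address = mailto.path          # a Mailto URL has no // part, only an address

print(login_name, host, port)
print(address)
```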
===================
6. Bookmarking URLs
===================
Although URLs are frustrating to work with, there is an easy way to
return to a URL resource. Nearly all of the Internet Web browsers today
have a feature which is like an automated address book. Some browsers
call it “Book Marking”; others call it “Hot Listing”. In both cases the
effect is the same.
Bookmarking allows you to grab a copy of a URL and store it so that you
can easily go back to the site at a future time.
Once you understand the action of bookmarking, the definition of a
bookmark becomes obvious. A bookmark is a Web page tag or reference
that you place in a list that can be accessed later to return to the URL.
Following are instructions for bookmarking (or hot listing) using
several popular internet web browsers:
Netscape’s Navigator (Version 2.0x and 3.0x)
Go to the first page of the site you want to reference
Select BookMarks >> Add Bookmark from the main menu
Microsoft’s Internet Explorer (Version 2.0x and 3.0x)
Go to the first page of the site you want to reference
Select Favorites >> Add to Favorites from the main menu
Spry’s Mosaic (Version 4.00.xx)
Go to the first page of the site you want to reference
Select Navigate >> Add Web page to Hot list or click on the ADD
button on the button bar.
As you can see, adding URLs to a browser is relatively easy. All
require the same basic action -- go to the URL resource that you want to
add to the list. Once you are at the resource, add it to the Bookmark
or Hot list.
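In essence, a bookmark list is just a table of names and the URLs they
stand for. The following is a minimal sketch in Python of that idea
only; it does not describe how any particular browser stores its
bookmarks, and the function names and bookmark titles are made up for
this example.

```python
# A bookmark list modeled as a table from a title to a URL.
bookmarks = {}


def add_bookmark(title, url):
    """Store a URL under a title so it can be found again later."""
    bookmarks[title] = url


def lookup(title):
    """Return the URL saved under a title, or None if there is none."""
    return bookmarks.get(title)


add_bookmark("Philippine links", "http://www.europa.com/~ria/links.html")
add_bookmark("EFF files", "ftp://ftp.eff.org/")

print(lookup("Philippine links"))
```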
==================
7. Errors and URLs
==================
If you receive an error when attempting to connect to a URL, first check
to see if you entered the correct URL -- in other words, check the
typing.
If it is OK, then perhaps the Web server is busy; simply try again later.
Finally, if you connect and it tells you that you must specify a port
number and/or a user name and password, you will need to obtain the
appropriate information and add it to your URL before accessing the
resource.
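In the common URL syntax, a user name and password go before the site
name, followed by an @ sign, and a port number goes after the site
name, following a colon. The following is a minimal sketch in Python of
adding them to a URL, assuming the standard library's urlsplit and
urlunsplit functions. The function name with_login and the login
details are made up for this example, and the sketch assumes a plain
site name and a user name and password without special characters.

```python
from urllib.parse import urlsplit, urlunsplit


def with_login(url, user=None, password=None, port=None):
    """Return the URL with a user name, password, and/or port added."""
    parts = urlsplit(url)
    netloc = parts.hostname or ""
    if port is None:
        port = parts.port          # keep a port already in the URL
    if port is not None:
        netloc = f"{netloc}:{port}"
    if user is not None:
        login = user if password is None else f"{user}:{password}"
        netloc = f"{login}@{netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical login details, for illustration only.
print(with_login("ftp://ftp.eff.org/", user="guest", password="secret", port=21))
```

Keep in mind that a password typed into a URL is visible to anyone who
can see the URL.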
=============
8. In Summary
=============
Using URLs is relatively easy as long as you remember one simple rule:
RULE: URLs are case sensitive.
There are several different types of URLs; however, they all tend to
work the same way:
First - you put in the type of URL you want to connect
to (e.g., http, ftp).
Second - after this, you place the host server name.
Third - the path and resource you want to access.
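The three steps above can be sketched as a small program that puts a
URL together from its parts. This is a minimal illustration in Python,
assuming the standard library's urlunsplit function; the function name
build_url is made up for this example.

```python
from urllib.parse import urlunsplit


def build_url(url_type, host, path):
    """Join the type of URL, the host server name, and the path."""
    return urlunsplit((url_type, host, path, "", ""))


example = build_url("http", "www.europa.com", "/~ria/links.html")
print(example)
```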
That is all there is to URLs. Using URLs lets you move from one Web
resource to another quickly and easily.