Using search
The Netscape Enterprise Server search function
provides you with the ability to search the contents and attributes of
documents on the server. As the server administrator, you can create a
customized text search interface that's tailored to your user community.
Server documents can be in a variety of formats,
such as HTML, Microsoft Word, Adobe PDF, and WordPerfect. The server converts
many types of non-HTML documents into HTML as it indexes them so that users
can use their web browser to view the documents that are found for their
search.
Users can search through server documents for
a specific word or attribute value, obtaining a set of search results that
list all documents that match the query. They can then select a document
from the list to browse it in its entirety. This provides easy access to
server content.
As the server administrator, you can restrict
which users and groups are authorized to use text search and which documents
they can access, you can modify the configuration files that govern how
text search operates, and you can customize the search query and results
pages.
To enable searching on your server, begin by identifying
the special configuration needs of your server and entering them in the
search configuration forms. Then identify the directory or directories of
documents that you want prepared for searching and index the document
information into a searchable database, called a collection. The next
several sections discuss the details of configuring search and indexing
collections.
Note
Search cannot work if the web publishing collection (web_htm)
does not yet exist or has been deleted. If search does not work, restart
the server with the web publishing function turned on (the default), and
try searching again.
Configuring text search
You can configure several aspects of the search function
for your specific server. Some settings are collection-specific; others
apply across all collections during a search. Collection-specific settings
affect how documents are indexed into a particular collection, so you
must define them before creating the collection. The other settings
can be defined at any time because they affect only the searches themselves.
Collection-specific configuration actions:
-
define URL mappings for the document directories to be indexed
-
define the pattern files to display for searches on a particular collection
Configurations that affect all collections:
-
establish access control for files and directories
-
define any words you want dropped from the search
-
define the search parameters
-
turn the search function off and on
-
restrict the amount of memory available for indexing operations
Controlling search access
The search function accesses the ACL database that
is the default for your server. You can restrict access to the documents
and directories on your server by defining explicit access control (ACL)
rules or you can rely on the default access control definitions. You can
add users to your server's access control database through the Administration
Server's Users & Groups function. See Chapter
6, "Controlling access to your server" for more information about setting
access control.
You can set your server to check access permissions
before displaying search results (through the Agents & Search | Search
Configuration form discussed in "Configuring
the search parameters"). When this is set, before returning the results
of a search query, the server checks a user's access privileges and challenges
the user to identify themselves before displaying any results.
Mapping URLs
When users search through a collection's files, the
documents that are returned as search results use a partial URL, called
a URI (or Uniform Resource Identifier), to identify them.
This is a security feature that prevents users from knowing the complete
physical pathname for a file. A URI is set up by mapping a URL to an additional
document directory.
For example, if the path for a file is server_root/Docs/marketing/bizplans/planB.doc,
you could set up a mapping that prevents users from seeing all but the
last directory by defining a URL prefix of plans and mapping it
to server_root/Docs/marketing/bizplans. From then on, users
need only type /plans/planB.doc to locate the file. For more information,
see Chapter 4, "Managing server content."
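The mapping in this example can be sketched in a few lines of Python. This is only an illustration of how a URL prefix resolves to a physical directory, not the server's actual implementation. The function name resolve_uri and the MAPPINGS dictionary are hypothetical, and the sketch assumes that the longest matching prefix wins.

```python
import posixpath

# Hypothetical mapping table: URL prefix -> physical directory.
MAPPINGS = {
    "/": "server_root/docs",
    "/plans": "server_root/Docs/marketing/bizplans",
}

def resolve_uri(uri, mappings=MAPPINGS):
    """Return the physical path for a URI, or None if no prefix matches."""
    best = None
    for prefix in mappings:
        normalized = prefix.rstrip("/")
        # A prefix matches only on a whole path segment.
        if uri == normalized or uri.startswith(normalized + "/"):
            if best is None or len(normalized) > len(best.rstrip("/")):
                best = prefix  # assumption: longest matching prefix wins
    if best is None:
        return None
    remainder = uri[len(best.rstrip("/")):].lstrip("/")
    if not remainder:
        return mappings[best]
    return posixpath.join(mappings[best], remainder)

print(resolve_uri("/plans/planB.doc"))
```

With these sample mappings, a request for /plans/planB.doc resolves to the bizplans directory, while any URI outside /plans falls back to the primary document directory.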
The Enterprise Server provides the following default
mappings:
-
/--the primary document directory (sometimes called the document
root), which initially maps to server_root/docs
-
/help--the directory for the help files
-
/search-ui--the directory for most of the search interface files
-
/webpub-ui--the directory for most of the Web Publisher interface
files
-
/publisher--the directory for most of the Web Publisher files
-
/help--the directory for the online Web Publisher help files
When you create a collection, you must specify which
document directory to index. You can only choose a directory that has a
URL mapping or a subdirectory within such a mapped directory. You can create
your own mappings to define specific directories. To do this, follow these
steps:
-
From the Server Manager, choose Content Management.
-
Click the Additional Document Directories link.
-
Type in a nickname that maps the URL to the additional document directory
you want to define. For example, type in the word plans.
-
Type the absolute physical path of the directory you want the URL mapping
to map to. For example, C:/Netscape/SuiteSpot/Docs/marketing/bizplans.
-
If you want to apply a style to the directory, select the style in the
Apply Style drop-down list. See Chapter 4,
"Managing server content" for more information about styles.
-
Click OK to create the additional document directory.
Note
Once you create a collection based on an additional document
directory, do not change the URL mapping. If you do, the collection's entries
map the URL to the wrong physical file location.
Deciding which words not to search
You can specify words the search engine should not
index or search against. These words are sometimes referred to as stop
words or drop words and typically include articles, conjunctions,
and prepositions such as at, and, be, for,
and the.
To specify stop words, you need to edit the file named style.stp.
This file resides in each of the subdirectories html, pdf,
mail, and news (for each collection type) in the directory
server_root\plugins\search\common\style. Each style.stp
file controls stop words for that collection type; for example, the style.stp
file in server_root\plugins\search\common\style\html controls
stop words for html files in the collection.
Add the stop words to style.stp, one per line and left justified.
You can use operators such as square brackets ([]) to indicate character
classes, periods (.) to indicate any character, and plus notation (+) to
indicate repeats. For example, the style.stp file might contain
the following lines:
........................................+
at
and
be
[0-9a-zA-Z]
[0-9][0-9][0-9][0-9]+
In this example, the first line of periods (in the file by default)
indicates that words with 40 or more characters are not to be indexed. The
next three lines drop the words at, and, and be. [0-9a-zA-Z] indicates
that all one-character words (a single letter or digit) are not to be indexed.
[0-9][0-9][0-9][0-9]+ indicates that all integers with 4 or more digits are
not to be indexed.
The words you specify are case sensitive, so if you want to stop all
the case variations of a word, you need to enter them all. For instance,
for the you might enter the, THE, and The.
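To check a stop list before you build a collection, you can approximate its behavior in Python. This is a rough sketch only. It assumes that the square bracket, period, and plus operators behave like the regular-expression operators of the same name, and that a word is dropped only when the whole word matches a line. The name is_stop_word is hypothetical and is not part of the server.

```python
import re

# Lines as they might appear in style.stp (taken from the example above).
STOP_PATTERNS = [
    "." * 40 + "+",            # forty periods followed by a plus sign
    "at",
    "and",
    "be",
    "[0-9a-zA-Z]",             # any single letter or digit
    "[0-9][0-9][0-9][0-9]+",   # integers with four or more digits
]

def is_stop_word(word, patterns=STOP_PATTERNS):
    """Return True if the whole word matches any stop pattern (case sensitive)."""
    return any(re.fullmatch(pattern, word) for pattern in patterns)

for candidate in ["at", "The", "a", "1997", "123"]:
    print(candidate, is_stop_word(candidate))
```

Under these assumptions, The is still indexed because only the lowercase entries are present, which is why each case variation needs its own line.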
Make sure you have the stop list you want before you create a collection.
If you need to change the stop list after a collection has been created,
you need to delete the collection, change the stop list for the collection
type, recreate the collection, and reindex all the documents in the collection.
Turning search on or off
You can turn search capabilities on and off for your
server. Turning search off for a server where users do not use this function
can improve server performance. You may also want to turn off the search
function at certain times when you know the server will have heavy traffic,
reserving this function for times when traffic is lighter.
If you turn it off, the search plug-in is not
loaded when the HTTP server starts up. The default is for search to be
turned on.
Note
If search is turned off, the Find Broken Links function in Web Publisher
is not available because it executes a search as part of its operation.
To turn off the search function, use these steps:
-
From the Server Manager, choose Agents & Search.
-
Click the Search State link.
-
To turn the search function off, click the Off button.
-
Click OK to turn search off.
You can turn search back on with these steps:
-
From the Server Manager, choose Agents & Search.
-
Click the Search State link.
-
To turn the search function on, click the On button.
-
Click OK to turn search on.
Configuring the search parameters
As server administrator, you can set the default
parameters that govern what users see when they get search results.
To configure search parameters:
-
From the Server Manager, choose Agents & Search.
-
Click the Search Configuration link.
-
Type the default maximum number of search result items displayed to users
at a time. This cannot be larger than the value for the largest possible
result set size, as defined in Step 4. The default is 20.
-
Type the maximum number of items in a result set. The default is 5000.
For example, if you type 250 as the value and 1000 documents
match the search criteria, users can see only the first
250 or the 250 top-ranked documents (for searches that rank their results).
-
Type the format of the date/time string in POSIX format. This is how dates
and times are displayed to users in the search results page. For example,
the format %b-%d-%y %H:%M produces Oct-01-97 14:24. You can use the symbols
listed in Table 11.1.
Table 11.1 Common POSIX date and time formats
Format | Displayed result (example)
%a | Abbreviated week day (for example, Wed)
%A | Full week day (for example, Wednesday)
%b | Abbreviated month (for example, Oct)
%B | Full month (for example, October)
%c | Date and time formatted for current locale
%d | Day of the month as a decimal number (for example, 01-31)
%H | Hour as a decimal number, 24-hour format (for example, 00-23)
%m | Month as a decimal number (for example, 01-12)
%M | Minute as a decimal number (for example, 00-59)
%x | Date
%X | Time
%y | Year without century (for example, 00-99)
%Y | Year with century (for example, 1999)
-
Type a default title for the document that is to be used if the document's
author has not included a title as part of the document, tagged with the
HTML Title tag. The typical default is (Untitled), which appears
in the search results page for HTML files.
-
If you want the user's access permission to be checked on a collection
before displaying the search results, click the Yes radio button under
the label "Check access permissions on collection root before doing a search?"
If you click Yes, the server checks the user's
access privileges for each collection before displaying the documents found
as a result of the search. Only the documents in a collection that the user
has permission to view are displayed.
-
If you want the user's access permission to be checked before displaying
the search results, click the Yes radio button under the label "Check access
permissions on search results?" If you click Yes, the server checks
the user's access privileges for each file before displaying the documents
found as a result of the search. Only the documents that the user has permission
to view are displayed.
-
Click OK to set your new search configuration.
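You can preview a date/time format string before typing it into the Search Configuration form. The following sketch uses Python's strftime, which accepts the same format symbols as those in the table of date and time formats. It assumes an English (or default C) locale, because %b and %a are locale dependent. The sample date is arbitrary.

```python
from datetime import datetime

# Arbitrary sample date: 1 October 1997, 2:24 p.m.
sample = datetime(1997, 10, 1, 14, 24)

# The format string from the example in the date/time format step.
formatted = sample.strftime("%b-%d-%y %H:%M")
print(formatted)

# A longer variant using the full weekday, full month, and four-digit year.
long_form = sample.strftime("%A, %B %d, %Y")
print(long_form)
```

Note that %d is zero-padded, so the first day of a month is displayed as 01.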
Configuring your pattern files
Pattern files are HTML files that define the layout
of the text search interface. You can associate a pattern file with a search
function and a set of pattern variables to create a specific portion of
the interface. In the pattern file, you define the look, feel, and function
of the text search interface. Pattern files use pattern variables that
you can use to customize background color, help text, banners, and so on.
In some cases, the values are pathnames to the files that contain the actual
text and graphics that these variables represent; in other cases, the values
represent text and HTML.
You can use the default pattern files, or you
can create your own customized set of files and point to them from here.
See "Customizing the search interface"
for more information about how to change the user interface.
To define where the search function is to look
for default pattern files associated with a particular search request,
you have to specify the paths for the files.
-
From the Server Manager, choose Agents & Search.
-
Click the Search Pattern Files link.
-
Type the absolute path for the directory where you store your pattern files.
The default start (header), end (footer), and query page pattern files
are located in this directory.
-
Type in the relative pathname for the default pattern file you want to
use for the top of the search results page when a collection has no defined
header file or when more than one collection is being searched. Specify
the path relative to the pattern file directory, as defined in Step 3.
-
Type in the relative pathname for the default pattern file you want to
use for the footer of the search results page when a collection has
no defined footer file or when more than one collection is being searched.
Specify the path relative to the pattern file directory, as defined in
Step 3.
-
Type in the relative pathname for the pattern file you want to use for
the search query page that appears when you start up the search function.
Specify the path relative to the pattern file directory, as defined in
Step 3.
-
Click OK to configure your search pattern files.
Configuring manually
The search function examines several configuration
files to determine how search is configured on your server. These files
define system settings, user-defined variables, and information about your
search collections. You normally change this information through the Server
Manager's Agents & Search forms, but you can also modify the files
manually with your own text editor. Some of the implications of changing
the configuration files in order to customize the user interface are discussed
in "Customizing the search interface."
Note
It is not recommended that you make any manual modifications
to your configuration files, but if you do, you must restart the server
for your modifications to take effect.
The configuration files
The configuration files that govern searching are:
-
webpub.conf--This system configuration file contains system settings
and file paths. In your server's obj.conf file, the search system
initialization is mapped to the webpub.conf file. When you use
the Search Configuration and Search Pattern Files forms, the data you input
is reflected in the webpub.conf file. You can customize your server's
search configuration by changing some of the settings in the webpub.conf
file, but in general, you can make the changes you need through the Server
Manager's forms.
-
userdefs.ini--This user definitions file defines the user-defined
pattern variables. In the webpub.conf file, this is mapped to
the userdefs.ini file for your language (English, German, Japanese,
and so on).
You can customize a search interface by creating
and defining your own pattern variables in the userdefs.ini file
that can be used throughout your pattern files (see
"User-defined pattern variables" for details).
-
dblist.ini--This collection contents file describes collection-specific
information. When you create and maintain collections, the dblist.ini
file is updated for you with information about your collections.
Adjusting the maximum number of attributes
Collections have different sets of default attributes
that depend on the format of the files they contain. For example, HTML files have
Title and SourceType. You can also define META-tagged
HTML attributes in your HTML files. Some file formats, such as PDF, have
a great many default attributes. See "About
collection attributes" and Table 11.2
for more information about the attributes for each format.
You can use the Add Custom Property form to add
additional properties for the Web Publishing collection. These are the
default maximum settings:
-
Text (a maximum of 30, including all META-tagged attributes)
-
Numeric (a maximum of 5)
-
Date (a maximum of 5)
You can change the maximum settings for these in
the webpub.conf file, although larger sets of attributes impact
the performance of your server. You cannot set the maximums beyond 100
for text and 50 for dates and numbers.
To do this, you need to manually edit the
[NS-loader] section of the webpub.conf file to define
maximum numbers of attributes. For example, to change all three values,
you could use these lines:
NS-max-text-attr = 50
NS-max-numeric-attr = 10
NS-max-date-attr = 10
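Before you edit webpub.conf by hand, it can help to check the values you plan to use against the upper limits stated above (100 for text, 50 for numbers and dates). The following is a minimal sketch of such a check. The function name check_attr_settings is hypothetical, and the sketch does not read or write webpub.conf itself.

```python
# Upper limits taken from the text: 100 for text, 50 for numbers and dates.
LIMITS = {
    "NS-max-text-attr": 100,
    "NS-max-numeric-attr": 50,
    "NS-max-date-attr": 50,
}

def check_attr_settings(settings):
    """Return a list of problems found in the proposed settings."""
    problems = []
    for key, value in settings.items():
        ceiling = LIMITS.get(key)
        if ceiling is None:
            problems.append(f"{key}: not an attribute-limit setting")
        elif value > ceiling:
            problems.append(f"{key}: {value} exceeds the maximum of {ceiling}")
    return problems

proposed = {
    "NS-max-text-attr": 50,
    "NS-max-numeric-attr": 10,
    "NS-max-date-attr": 10,
}
print(check_attr_settings(proposed))
```

An empty list means that none of the proposed values exceeds its limit.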
Note
You cannot use the additional attributes in existing collections,
only in subsequently created collections. To use them in a search collection,
you must use the Agents & Search | Maintain Collection form to remove
the collection and then use the Agents & Search | New Collection form
to create a new collection. If you want to use the new attributes in the
web publishing collection, you must use your file system to remove both
the web_htm and link_mgr collection files from the search
collections directory and then restart your server.
Restricting memory for indexing
You can set a limit on the amount of RAM available
for indexing operations. To do this, you need to manually edit the [NS-loader]
section of the webpub.conf file to add a line defining a maximum
memory amount. For example,
NS-max-memory = 32000000
The default is for the server to use all of the available
memory that the system can offer. Most typically, you need to limit the
RAM used for indexing in these two cases:
-
The Enterprise Server 3.0 is installed on a machine that has less than
the suggested minimum RAM requirement, 32MB.
-
The server runs on a Windows NT machine that requires a great deal
of indexing, but you want to set aside some memory for other server operations.
Note
When you are indexing large collections of documents, set the virtual
memory availability to the current system maximum or the indexing operation
may fail due to lack of available disk space.
Restricting your index file size
You can limit how much disk space an index file can
consume. To do this, you need to manually edit the [NS-loader]
section of the webpub.conf file to define a maximum index file
size. For example,
NS-max-idx-file-size = 1500000
Typically, an indexing operation requires approximately
1.5MB per file, and since there are two files, one of which is temporary,
you may need as much as 3MB of disk space for indexing. Setting the file
size to 1.5MB per file puts a cap on how large each file can become.
Removing access to the Web Publishing collection
Web Publishing appears in the Search In field of the user's standard search
query page. To remove the Web Publishing collection from this field, you
need to edit the dblist.ini file as follows:
-
In the "[web_htm]" section, change "NS-display-select=YES" to "NS-display-select=NO".
-
Restart the server.
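If you prefer to script the dblist.ini change rather than make it in a text editor, the edit could look like the following sketch. It works on the file's text line by line and assumes that sections are introduced by bracketed names such as [web_htm], as shown in the step above. The function name hide_collection is hypothetical. Back up the file first, and restart the server afterward as described.

```python
def hide_collection(ini_text, section="web_htm"):
    """Change NS-display-select=YES to NO inside one section only."""
    output = []
    in_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track which section the following lines belong to.
            in_section = stripped[1:-1] == section
        elif in_section and stripped.replace(" ", "").upper() == "NS-DISPLAY-SELECT=YES":
            line = "NS-display-select=NO"
        output.append(line)
    result = "\n".join(output)
    if ini_text.endswith("\n"):
        result += "\n"
    return result

# Made-up sample content; a real dblist.ini contains more settings.
sample = "[web_htm]\nNS-display-select=YES\n[plans]\nNS-display-select=YES"
print(hide_collection(sample))
```

Only the named section is changed, so other collections remain visible in the Search In field.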
Indexing your documents
Before users can execute searches, they need a database
of searchable data against which they can target their searches. To provide
this, you create a database, called a collection, that indexes and stores
information about the documents such as their content and file properties.
Once the documents are indexed, their contents
and file properties, such as their titles, creation dates, and authors,
are available for searching.
You can add documents to or delete documents from a collection,
and you can optimize, update, and manage your collections as needed.
About collections
When your server administrator indexes all or some
of a server's documents, information about the documents is stored in a
collection. Collections contain such information as the format of the documents,
the language they are in, their searchable attributes, the number of documents
in the collection, the collection's status, and a brief description of
the collection. For more details, see the section "Displaying
collection contents."
When you create a collection, you indicate
the type of files that it contains: HTML, ASCII, news, email, PDF, or multiple
formats. This determines what happens during indexing: which attributes
are indexed and what, if any, file conversion has to be done. Files in
multiformat collections are converted to HTML. You can index all the files
in a directory or only those with a specific extension--for example, all
the HTML, PDF, or *.doc documents.
A collection has records with information about
each document that has been indexed. If the document is deleted from the
collection, only the collection's entry for that document is removed. The
original document is not deleted.
When you have multiple server instances, the collection you create is
only associated with the server instance on which the collection was created.
Therefore, users can only search collections for that server instance.
About collection attributes
Server documents can be in a variety of formats,
such as HTML, Microsoft Excel, Adobe PDF, and WordPerfect. If there is
a conversion filter available for a particular file format, the server
converts the documents into HTML as it indexes them so that you can use
your web browser to view the documents that are found for your search.
You can also add new convertors to support new document formats. Enterprise
Server 3.0 allows you to plug in new convertors as they become available,
provided you place them in the <server-root>/plugins/search/filters
directory.
Note
Complex PDF files, such as those that are password protected or that
contain graphical navigation elements, cannot be correctly converted when
they are indexed as part of a multi-format collection. The file data converts
correctly when they are part of a PDF-only collection. Graphic elements
are not converted.
There are conversion filters for documents in
these formats:
-
HTML
-
ASCII
-
MS Rich Text Format (RTF)
-
Interleaf 5.2-6.0
-
MS Word (DOS) 3.0-6.0
-
MS Word (Macintosh) 3-6
-
MS Word (Windows) 2.0, 6.0, 7.0
-
MS Excel 2-5
-
MS Excel (Macintosh) 3-4
-
MS PowerPoint 7.0
-
Adobe PDF to ASCII
-
Adobe FrameMaker (MIF) 3.0-5.0
-
Ami Pro 1.x-3.1
-
WordPerfect (Macintosh) 2-3.5
-
WordPerfect (Windows) 5.x-6.1
-
news and mail file formats
Certain file formats have a default set of attributes
that are indexed for files of that type, as shown in Table
11.2.
Table 11.2 The default attributes indexed for each file format
File format | Attribute | Type | Description
ASCII | (none) | - | -
HTML | Title | text | The user-defined title of the file.
HTML | SourceType | text | The original format of the document.
NEWS | From | text | The source userID of the news item.
NEWS | Subject | text | The text from the subject field of the news item.
NEWS | Keywords | text | Any keywords defined for the news item.
NEWS | Date | date | The date the news item was created.
EMAIL | From | text | The source userID of the email.
EMAIL | To | text | The destination userID of the email.
EMAIL | Subject | text | The text from the email's subject field.
EMAIL | Date | date | The date the email was created.
PDF | InstanceID | text | An internal ID number.
PDF | PermanentID | text | An internal ID number.
PDF | NumPages | integer | The number of pages in the document.
PDF | DirID | text | The directory where the PDF file exists.
PDF | FTS_ModificationDate | date | The document's last modification date.
PDF | FTS_CreationDate | date | The document's creation date.
PDF | WXEVersion | integer | The version of Adobe Word Finder used to extract the text from the PDF document.
PDF | FileName | text | The Adobe filename specification.
PDF | FTS_Title | text | The document's title.
PDF | FTS_Subject | text | The document's subject.
PDF | FTS_Author | text | The document's author.
PDF | FTS_Creator | text | The document's creator.
PDF | FTS_Producer | text | The document's producer.
PDF | FTS_Keywords | text | The document's keywords.
PDF | PageMap | text | The page map, describing the word instances for the page.
By default, HTML collections have Title
and SourceType attributes, but they can be indexed to permit searching
and sorting by up to 30 file attributes tagged with the HTML <META>
tag. You can change the maximum settings for file attributes in webpub.conf,
as discussed in "Adjusting the maximum number
of attributes."
For example, a document could have these lines
of HTML code:
<META NAME="Writer" CONTENT="J. S. Smith">
<META NAME="Product" CONTENT="Communicator">
If this document was indexed with its META tags extracted,
you could search it for specific values in the writer or product fields.
For example, you could enter this query: Writer <contains> Smith
or Product <contains> Comm.
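To see which META-tagged attributes a document would expose, you can extract them with a short Python script. This sketch uses the standard html.parser module and is not the server's indexer. The names MetaExtractor, extract_meta, and contains are hypothetical, and the contains helper assumes a simple case-sensitive substring match, which may differ from how the search engine evaluates <contains>.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content is not None:
            self.attributes[name] = content

def extract_meta(html):
    parser = MetaExtractor()
    parser.feed(html)
    return parser.attributes

def contains(attributes, field, text):
    # Assumption: plain substring match on the attribute's text value.
    return text in attributes.get(field, "")

document = '''<META NAME="Writer" CONTENT="J. S. Smith">
<META NAME="Product" CONTENT="Communicator">'''

meta = extract_meta(document)
print(meta)
print(contains(meta, "Writer", "Smith"), contains(meta, "Product", "Comm"))
```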
Note
Any attribute values in META-tagged fields are text strings
only, which means that dates and numbers are sorted as text, not as dates
or numbers. Also, illegal HTML characters in a META-tagged attribute are
replaced with a hyphen. You can use the Web Publishing | Add Custom Property
form to redefine the text-formatted dates and numbers so that you can perform
searches based on actual dates and numbers for data in the Web Publishing
collection.
Creating a new collection
You can create a collection that indexes the content
of all or some of the files in a directory. You can define collections
that contain only one kind of file or you can create a collection of documents
in various formats that are automatically converted to HTML during indexing.
When you define a multiple format collection (with the auto-convert option),
the indexer first converts the documents into HTML and then indexes the
contents of the HTML documents. The converted HTML documents are put into
the html_doc directory in the server's search collections folder.
Your server can have a maximum of 12 collections. On a server that uses web
publishing, only 10 of these can be user-defined collections. If
you want to create a 13th collection, you must first remove one of your existing
collections (with the Agents & Search | Maintain Collection function).
Do not remove the web publishing collection if one exists for your server.
You can only have entries for a maximum of 16 million documents in your
collections. A document that is indexed in multiple collections counts
as multiple documents. It is best to create new collections of over 10,000
documents at low-traffic times, or the indexing operation may affect your
system's performance.
Note
You need to have at least 3MB of available disk space on your
system to create a collection. For information on how you can restrict
the size of the index files, see "Restricting
your index file size."
To create a new collection, follow these steps:
-
From the Server Manager, choose Agents & Search.
-
Click the New Collection link.
-
The Directory to Index field displays the currently defined document directory
and provides a drop-down list of all the additional document directories
defined for the server. See "Mapping URLs"
for more information about additional document directories. You can select
any of the items in the drop-down list as a starting point for finding
the directory you want to index.
If you want to index a different subdirectory,
click the View button to see a list of resources. You can index any directory
that is listed or you can view the subdirectories in a listed directory
and index one of those instead. Once you click
the index link for a directory, you return to the Create Collection form
and the directory name appears in the Directory to Index field.
-
You can index all HTML files in the chosen directory by leaving the
default *.html pattern in the "Documents matching" field or you
can define your own wildcard expression to restrict indexing to documents
that match that pattern.
For example, you could enter *.html to index only the content
in documents with the .html extension, or you could use either
of these patterns (complete with parentheses) to index all HTML documents:
(*.htm|*.html)
or
*(.htm|.html)
You can define multiple wildcards in an expression. See Chapter 3,
"Managing your server" for details of the syntax for wildcard patterns.
Note
You cannot index a file that includes a semi-colon (;) in its name.
You must rename such files before you can index them.
-
To also index the subdirectories within the specified directory, click
the "Include Subdirectories" checkbox. For Unix users, this option also
follows symbolic links.
-
In the Collection Name field, type a name for your collection. The collection
name is used for collection maintenance. This is the physical file name
for the collection, so follow the standard directory-naming conventions for your
operating system. You can use any characters up to a maximum of 128 characters.
Spaces are converted to underscores.
-
In the optional Collection Label field, type a user-defined name for your
collection. This is what users see when they use the text search interface.
Make your collection's label as descriptive and relevant as possible. You
can use any characters except single or double quotation marks, up to a
maximum of 128 characters.
Note
Using single or double quotation marks prevents agent services from
operating. If you know that you are not going to use agent services, you
can use these quotation marks, but it is good practice to avoid using them.
-
In the optional Description field, type a description for your collection
up to a maximum of 1024 characters. This is displayed in the collection
contents page.
-
Select the type of files the collection is to contain: ASCII, HTML, news,
email, PDF, or multiple document formats. The kind of file format you choose
indicates which default attributes are used in the collection and which,
if any, automatic HTML conversion of the content is done as part of indexing.
See "About collection attributes" and
Table 11.2 for information about the
attributes for each format.
If you choose HTML as the file type and also
try to index non-HTML files, the server creates the collection with the
HTML set of default attributes and does not attempt to convert any non-HTML
file it indexes. If you index HTML files into an ASCII collection, even
the HTML markup tags are indexed as part of the file's contents and when
you display the files, the contents are displayed as raw text. Regardless
of the file type chosen, the content of the file is always indexed.
-
Select whether or not to extract META-tagged attributes from HTML files
during indexing. If you extract these attributes, you can search on their
values. You can index on a maximum of 30 different user-defined META tags
in a document. You cannot use this option for multiple-format collections.
-
Select the collection's language from the drop-down list. The default is
English, labeled "English (ISO-8859-1)." For more information on character
sets, see Chapter 4, "Managing server content."
-
Click OK to create a new collection.
Note
Once you begin indexing a collection, you cannot stop the process until
either the indexing is complete or you reboot the system. Shutting down
your server does not kill the process.
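Because indexing cannot be interrupted once it starts, it can be worth checking a "Documents matching" pattern against a few file names beforehand. The following sketch translates the two operators used in the examples in the steps above (the asterisk and the parenthesized alternatives separated by a vertical bar) into a Python regular expression. It covers only that subset of the wildcard syntax, treats every other character literally, and is not the server's own matcher. The function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a small subset of the wildcard syntax into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters
        elif ch in "(|)":
            parts.append(ch)             # grouping and alternation pass through
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))

def matches(pattern, filename):
    """Return True if the whole file name matches the pattern."""
    return wildcard_to_regex(pattern).fullmatch(filename) is not None

for name in ["index.html", "page.htm", "planB.doc"]:
    print(name, matches("(*.htm|*.html)", name), matches("*(.htm|.html)", name))
```

Under this reading, both patterns select the same files, and a name such as planB.doc is left out of the collection.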
Configuring an existing collection
After you have initially created a collection, you
can modify some of the initial settings for the collection. This data resides
in the collection information file, dblist.ini, and when you reconfigure
a collection, the dblist.ini file is updated to reflect your changes.
See "Configuring manually" for more
information about the configuration files. You can revise the description,
change its label, define a different URL for its documents, and define
how to indicate highlighting in displayed documents, which pattern files
to use, and how to format dates.
Note
This form allows you to modify some of the settings for the
web publishing default collection, web_htm, because you are not
changing actual collection data. Avoid making unnecessary changes
to this collection's settings.
To configure a collection, follow these steps:
-
From the Server Manager, choose Agents & Search.
-
Click the Configure Collection link.
-
In the optional Description field, you can type a description for your
collection up to a maximum of 1024 characters.
-
In the optional Collection Label field, you can type a user-defined name
for your collection. This is what users see when they use the text search
interface. Make your collection's label as descriptive and relevant as
possible. You can use any characters except single or double quotation
marks, up to a maximum of 128 characters.
Using single or double quotation marks prevents
agent services from operating. If you know that you are not going to use
agent services, you can use these quotation marks, but it is good practice
to avoid using them.
-
In the URL for Documents field, you can type in the new URL mapping for
the collection's documents if that has changed. That is, if you originally
indexed the directory of files that corresponded to those defined by the
URL mapping /publisher/help, and you have changed that mapping
to the simpler /helpFiles, you would replace the URL of /publisher/help
with the /helpFiles in this field. See
"Mapping URLs" for more information about additional document directories.
-
In the Highlight Begin and Highlight End fields, you can type in the HTML
tagging you want the server to use when highlighting a search query word
or phrase in a displayed document. The default is to use bold, with the
<b> and </b> tags, but you can add to this or change
it. For example, you could add <blink><FONT COLOR = #FF0000>
and the corresponding </FONT></blink> to highlight with
blinking bold red text.
-
You can define different default pattern files for displaying the search
results: how the search result's header, footer, and list entry line are
formatted, respectively. Initially, the pattern files are in the server_root/plugins/search/ui/text
folder.
-
In the Result Pattern File field, you can enter the name of the pattern
file you want to use when displaying a single highlighted document from
the list of search results.
-
In the Date Format field, you can specify how you want input dates to be
interpreted when using this collection: MM/DD/YY, DD/MM/YY, or YY/MM/DD.
-
Click OK to change the collection configuration.
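The Date Format setting matters because the same input can mean different dates. The Python sketch below shows the effect using the three formats named above. The function name and the slash separator are assumptions for illustration, and the two-digit-year handling is Python's own rule (69 through 99 map to 1969 through 1999), which may not match the server's.

```python
from datetime import datetime

# The three input formats offered in the Date Format field,
# mapped to Python strptime patterns (the mapping is this example's own).
DATE_FORMATS = {
    "MM/DD/YY": "%m/%d/%y",
    "DD/MM/YY": "%d/%m/%y",
    "YY/MM/DD": "%y/%m/%d",
}


def interpret_date(text, date_format):
    """Parse a date string according to the chosen collection date format."""
    return datetime.strptime(text, DATE_FORMATS[date_format]).date()


# The same text is read as two different days depending on the setting.
print(interpret_date("06/07/96", "MM/DD/YY"))
print(interpret_date("06/07/96", "DD/MM/YY"))
```

Under MM/DD/YY the input is read as June 7, while under DD/MM/YY it is read as July 6, so a date comparison query can return different documents for the same typed value.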
Updating an existing collection
After you have initially created a collection, you
may want to add or remove files. If you are adding documents, the files'
contents are indexed (and converted if necessary) when their entries are
added to the collection. If you are removing documents, the entries for
the files are removed from the collection along with their metadata. This
function does not affect the original documents, only their entries in
the collection.
Note
If you selected the Extract Metatags option when you created
this collection, then the META-tagged HTML attributes are indexed whenever
you add new documents to this collection.
To update a collection, follow these steps:
-
From the Server Manager, choose Agents & Search.
-
Click the Update Collection link.
-
Select the collection you want to update from the drop-down list.
The scrollable list of documents in the center
of the form shows you what documents have index entries in the currently
selected collection. The list holds 100 records, and the Prev and Next
buttons get the previous (or next) set of 100 files for collections that
have more than 100 files in them.
-
In the Documents Matching field, you can type in a single filename or you
can use wildcards to specify the type of files you want added to or removed
from the collection. If you enter a wildcard such as *.html, only
files with this extension are affected. You can indicate files within a
subdirectory by typing in the pathname as it appears in the list of files.
For example, you could delete all the HTML files in the /frenchDocs
directory by typing in (no slash before the directory name): frenchDocs/*.html
Note: Be careful how you construct wildcard expressions.
For example, if you type in index.html, you can add or remove
the index file from the current collection. If instead you type in the
expression */index.html, you can add or remove all index.html
files in the collection.
-
Select whether to index and add all matching documents from the subdirectories
of the document directory that was originally defined for the collection.
That is, if the collection originally indexed the /publisher directory,
this option looks for documents matching the new pattern within all the
subdirectories within /publisher. This does not apply for removing
documents.
-
Click AddDocs to add the indicated files and subdirectories.
-
Click RemoveDocs to remove the indicated files.
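To see why wildcard expressions need care, you can try patterns against a list of paths before typing them into the Documents Matching field. The Python sketch below uses shell-style matching from the standard fnmatch module as a stand-in. The server's own matching rules are not described here and may differ, for example in how a wildcard treats directory separators; the path list and function name are made up for this example.

```python
from fnmatch import fnmatchcase


def select_documents(paths, pattern):
    """Return the paths that match a shell-style wildcard pattern."""
    return [path for path in paths if fnmatchcase(path, pattern)]


# Hypothetical collection entries, written without a leading slash
# as they would appear in the list of files.
paths = [
    "index.html",
    "frenchDocs/index.html",
    "frenchDocs/intro.html",
    "frenchDocs/logo.gif",
]

print(select_documents(paths, "index.html"))
print(select_documents(paths, "*/index.html"))
print(select_documents(paths, "frenchDocs/*.html"))
```

A single filename selects one entry, while a pattern that starts with a wildcard can select files in many directories at once.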
Note
Do not use your local file manager to remove collections, especially
not the web publishing collection. If by chance you do, when you try to
execute a search before restarting your server again, the search will fail
even if it doesn't use the web publishing collection. Once you restart
your server, a new web publishing collection will be automatically created
for you, so your search can execute.
Maintaining an existing collection
Periodically, you may want to maintain your collections.
With normal usage, these tasks may not be necessary, but if you do a great
deal of indexing and updating of collections, you may want to use some
of these functions occasionally. You can perform the following collection
management tasks:
-
Optimize collections--You can optimize a collection to improve performance
if you frequently add, delete, or update documents or directories in your
collections. An analogy is defragmenting your hard drive. Optimizing is
done automatically when you reindex or update a collection, so you should
not need to do additional optimizing. One situation when you might want
to optimize a collection is just before publishing it to another site or
before putting it onto a read-only CD-ROM.
-
Reindex--You can reindex a collection, which locates each file that already
has an entry in the collection and reindexes its attributes and contents,
extracting the META-tagged attributes if that option was selected when
the files were originally indexed into the collection. This does not return
to the original criteria for creating the collection, say *.html,
and add any new documents that fit the original criteria. This option also
removes collection entries when the source documents have been deleted
and can no longer be found.
-
Remove--You can remove a collection. This only removes the collection,
not the original source documents.
To perform any of the collection management tasks:
-
From the Server Manager, choose Agents & Search.
-
Click the Maintain Collection link.
-
Select the collection you want to manage from the scrollable list. Information
about the collection you selected is displayed.
-
To optimize a collection, click Optimize. To remove a collection, click
Remove. To reindex a collection, click Reindex.
Scheduling regular maintenance
You can schedule collection maintenance at regular
intervals. You can set up separate maintenance schedules for optimizing,
reindexing, and updating. With normal usage, these tasks may not be necessary,
but if you do a great deal of indexing and updating of collections, you
may want to use some of these functions occasionally. For example, some
very active web sites may require frequent reindexing if new documents
are added on a daily basis.
A common combination of tasks is to set up a pair of regularly scheduled
reindex and update operations to clean out deleted entries and to add entries
for new documents matching your collection criteria.
You can optimize a collection to improve
performance if you frequently add, delete, or update documents or directories
in your collections. An analogy is defragmenting your hard drive. Optimizing
is done automatically when you reindex or update a collection, so you should
not need to do additional optimizing. One situation when you might want
to optimize a collection is just before publishing it to another site or
before putting it onto a read-only CD-ROM.
You can reindex a collection, which locates each
file that has an entry in the collection and reindexes its attributes and
contents, extracting the META-tagged attributes if that option was selected
when the files were originally indexed into the collection. This does not
add entries for new documents. It also cleans up the collection by removing
entries for files that have been deleted.
You can update a collection by entering new indexing
criteria for the collection, say *.html, which adds any new documents
that match the criteria.
To optimize, reindex, or update your collection, follow these steps:
-
From the Server Manager, choose Agents & Search.
-
Click the Schedule Collection Maintenance link.
-
Choose a collection from the drop-down list. This lists all the collections
that you have created.
-
Choose an action from the drop-down list: Reindex, Optimize, or Update.
You can set up different schedules for different operations on the same
collection.
-
If you choose to update your collection, two extra fields are displayed
for entering the document matching criteria and for including documents
found in subdirectories that match your criteria.
-
In the Schedule Time field, type in the time of day when you want the scheduled
maintenance to take place. Use 24-hour (military) format (HH:MM). HH must be less
than 24 and MM must be less than 60. You must enter a time.
-
In the section labeled "Schedule Day(s) of the Week," check one or more
of the day checkboxes. You can select all days. You must select at least
one day.
-
Click OK to schedule the maintenance.
-
In the new form that is displayed, click the Cron Control link.
-
This displays the Cron Control form. You can also go to this form by choosing
the Global Settings function that's part of General Administration.
-
If NS-CRON is already on, click Restart to restart it. If NS-CRON is not
on, click Start to start it up. In either case, your regularly scheduled
maintenance is now in place.
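The rules for the Schedule Time and day checkboxes in the steps above can be summarized in a short check. This Python sketch is only an illustration of those rules, not the form's own validation: the function name, the day abbreviations, and the requirement of exactly two digits for the hour are assumptions.

```python
import re

# Day labels are this example's own; the form uses checkboxes.
DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")


def validate_schedule(time_text, days):
    """Check a maintenance schedule: HH:MM time plus at least one day."""
    match = re.fullmatch(r"(\d{2}):(\d{2})", time_text.strip())
    if match is None:
        raise ValueError("time must be entered as HH:MM")
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour >= 24:
        raise ValueError("HH must be less than 24")
    if minute >= 60:
        raise ValueError("MM must be less than 60")
    unknown = [day for day in days if day not in DAYS]
    if unknown:
        raise ValueError(f"unknown day(s): {unknown}")
    # Keep the selected days in calendar order.
    chosen = [day for day in DAYS if day in days]
    if not chosen:
        raise ValueError("select at least one day")
    return hour, minute, chosen


print(validate_schedule("02:30", ["Fri", "Mon"]))
```

A time such as 24:00 or 12:60, or a schedule with no days selected, is rejected by this check.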
Unscheduling collection maintenance
If you have scheduled regular reindexing or optimizing
of a collection, you can remove the scheduled maintenance when you no longer
want the collection to be maintained at regular intervals. To do this:
-
From the Server Manager, choose Agents & Search.
-
Click the Remove Scheduled Collection Maintenance link.
-
Choose a collection from the drop-down list for Choose Collection. This
lists all your collections for which you have set up regular maintenance.
-
Choose an action from the drop-down list: Reindex or Optimize.
-
In the lower part of the frame, you can see the time and days of the week
when the maintenance is currently scheduled to take place.
-
Click OK to remove the scheduled maintenance.
-
Click the Cron Control link that is displayed on the form.
-
This displays the Cron Control form. You can also go to this form by choosing
the Global Settings function that's part of General Administration.
-
If NS-CRON is already on, click Restart to restart it. If NS-CRON is not
on, click Start to start it up. In either case, your regularly scheduled
maintenance will no longer take place.
Performing a search: the basics
Users are primarily concerned with asking questions
of the data in the search collections and getting a list of documents in
return. When you install the Enterprise server, a default set of search
query and result forms is included. These give users a simple method
of accessing the search function.
There are four parts to text searching:
-
making a query--you enter your search criteria.
-
displaying search results--the server displays a list of the documents
that match your criteria.
-
viewing a document--you can view a specific highlighted document from the
search results list.
-
viewing the contents of a collection--you can look at the information that
is maintained for each of your collections.
Note
If the search function is turned off, these query forms are not available.
Search home page
The search home page, at http://yourServer/search-ui/examples,
provides individual links to each of the three search query interfaces
as well as an online QuickStart tutorial on customizing the interface.
The tutorial discusses the various pattern files and gives examples of
how they can be changed to produce different results.
A search query
The default installation of Netscape Enterprise Server
includes three search query pages: standard and advanced HTML queries and
a Java-based guided query.
On the standard search query, you select a collection
to search against and type in a word or phrase to search for using the
query language operators.
On the guided Java-based search interface, you
can use the many drop-down lists to easily construct a query. You can only
obtain this interface when Java is enabled for your browser.
On the advanced HTML page, you have the
additional options of selecting multiple collections to search through,
establishing a sort sequence for the results, and defining how many documents
are to be displayed on a page at a time (clicking the Prev and Next arrows
moves you through the pages of results).
Note
You can execute date and number comparison searches against HTML META
attribute values in the web publishing collection only if you have
redefined them as date or number properties through the Web Publishing |
Add Custom Property form.
To perform a standard search, follow these steps:
-
Type this URL in the location field in your web browser:
http://yourServer/search
-
In the search query page that appears, choose the collection you want to
search through from the drop-down list in the Search In field.
-
Enter the word or phrase for your search query in the For field. You can
create complex queries by combining operators. See
"Query operators: a reference" for details about the search operators.
-
Click the Search button to execute your query.
Guided search
You can choose to use the Java-based guided search
interface, which helps you construct the query. This is especially useful
if you want to build a query that has several parts, say searching for
a word in the documents' content as well as a specific attribute value.
Note
Make sure Java is enabled for your browser. To do this, use
the Languages option preferences menu command.
There are two ways to obtain the guided search page:
through the Search home page or through the standard search query page.
To access guided search through the Search home
page, follow these steps:
-
Type this URL in the location field in your web browser:
http://yourServer/search-ui/examples
-
Click the Guided Search link on the home page.
To access guided search through the standard search
query page, follow these steps:
-
Go to the standard search query page by typing this URL in the location
field in your web browser:
http://yourServer/search
-
Click Guided Search on the standard search page and the guided Java-based
query page is displayed.
-
Choose the collection you want to search through from the drop-down list
in the Search In field.
-
Use the For drop-down list to select the type of element you wish to search
for. In this example, choose Words.
-
In the blank text field, type in the word you want to search for. See
"Query operators: a reference" for details about the search operators.
-
Click Add Line to add the first part of the query. The word appears in
the large text display box at the bottom of the form.
-
To add to your query, choose another element from the drop-down list. In
this example, choose Attribute.
-
A new drop-down list appears on the right side of the form, listing all
attributes that are available for the chosen collection. Choose the attribute
you want to search against.
-
From the drop-down list above the text input field, choose a query operator
(Contains, Starts, Ends, Matches, Has a substring) or logical operator
(=, <, >, <=, >=) for your query.
-
In the blank text field, type in the attribute value you want to search
for.
-
Click Add Line to add another line for your query. You can click Undo Line
to remove the last line you added or Clear to remove the entire query.
-
Click the Search button to execute the search.
Advanced search
You can choose to use the advanced HTML search interface,
which helps you construct the query. This is especially useful if you want
to create a query that searches through more than one collection or that
produces results sorted by a specific attribute value.
There are two ways to obtain the advanced
HTML search page: through the Search home page or through the standard
search query page.
To access advanced HTML through the Search home
page, follow these steps:
-
Type this URL in the location field in your web browser:
http://yourServer/search-ui/examples
-
Click the Advanced HTML Search link on the home page.
To access advanced HTML search through the standard
search query page, follow these steps:
-
Go to the standard search query page by typing this URL in the location
field in your web browser:
http://yourServer/search
-
Disable Java for your browser. To do this, use the Languages option preferences
menu command.
-
Click Guided Search on the standard search page and the advanced HTML query
page is displayed.
-
In the For field, type in the word or phrase you want to search for. You
can create complex queries by combining operators. See
"Query operators: a reference" for details about the search operators.
-
You can type in one or more attributes to sort the results by. The default
is an ascending sort order, but you can indicate a descending sort order
with a minus sign (-). (See "Sorting the results"
for more information about sorting.)
-
Depending on how many fields are listed for each document in the search
results page or how many you want to see at a time, you can expand or limit
the number of matching documents you want the search to return at a time.
The Prev and Next buttons allow you access to additional pages of documents
if there are too many to fit on a page at once.
-
Use the drop-down list in the Search In field to choose the collection
you want to search through. You can select more than one collection by
holding down the Ctrl key as you click on another collection. All collections
in a query must be in the same language.
-
Click the Search button to execute your query.
The search results
There are two standard types of search results: a
list of all documents that match the search criteria and the text of a
single document that you selected from the list of matching documents.
Listing matched documents
In the default installation of the Netscape Enterprise
Server, when you execute a search from either the standard or advanced search
query page, you obtain a list of the documents that match your search
criteria. The list gives some standard information about each file, depending
on the collection's format. For example, the default results page for email
collections gives subject, to, from, and date for each entry, and for news
collections it gives subject, from, and date for each entry.
The kind of file format in the collection indicates
which default attributes are available for searching. See
"About collection attributes" and Table
11.2 for information about the attributes for each format.
For entries resulting from a search that checks
for comparative proximity of words to each other or for the exactness of
the match, the file's ranking can be provided by showing a score.
If there are more matching documents than can
fit on a page, click Next to see the next batch. You can always execute
a new search by entering new query data and clicking Search.
Sorting the results
By default, or if you don't enter anything in the
Sort By field on the advanced HTML query page, all documents matching the
search are output according to their relevance ranking (for queries that
consider this) or their position in the server file database (for other
queries).
If you enter an attribute name in the Sort
By field, the documents are displayed in an ascending sort sequence. You
can list the documents in a descending sort sequence by adding a minus
sign (-) prefix to the attribute, as in -keywords or -title.
You can do a multiple sort, by typing in more than one field, as in Author,-PubDate.
In a short query, sort order usually isn't critical,
but in queries that result in a great many matches, you may want to set
a sort value in order to obtain useful search results. Note, however, that using
a special sort sequence may affect the search's performance.
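The Sort By syntax described above (comma-separated attribute names, with a minus sign prefix for descending order) can be modeled in a few lines. The Python sketch below is an illustration of that syntax applied to made-up result records. It is not how the server sorts internally, and the function name and sample data are assumptions.

```python
def sort_results(documents, sort_by):
    """Sort result records by a Sort By string such as 'Author,-PubDate'."""
    keys = [key.strip() for key in sort_by.split(",") if key.strip()]
    ordered = list(documents)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last produces a correct multiple-key sort.
    for key in reversed(keys):
        descending = key.startswith("-")
        name = key.lstrip("-")
        ordered.sort(key=lambda doc: doc.get(name, ""), reverse=descending)
    return ordered


# Hypothetical search results; dates are written year-first so that
# sorting them as text gives chronological order.
results = [
    {"Author": "Lee", "PubDate": "1996-01-01"},
    {"Author": "Adams", "PubDate": "1995-05-05"},
    {"Author": "Adams", "PubDate": "1996-06-30"},
]

for doc in sort_results(results, "Author,-PubDate"):
    print(doc["Author"], doc["PubDate"])
```

With Author,-PubDate, the two Adams entries come first, newest first, followed by the Lee entry.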
Note
Attribute values in META-tagged fields are text strings, which
means that dates and numbers are sorted as text, not as dates or numbers.
To convert the value into a date or number, you can create a new property
in the Web Publishing|Add Custom Property form and check the box that marks
this property as a META-tagged attribute.
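The difference between sorting values as text and sorting them as numbers or dates is easy to see in a small experiment. The Python sketch below uses made-up attribute values and assumes an MM-DD-YY date layout; it only illustrates the ordering problem, not the server's sorting code.

```python
from datetime import datetime

# Hypothetical META attribute values, stored as text strings.
sizes = ["9", "10", "100", "25"]
dates = ["12-01-95", "06-30-96", "01-15-97"]

# Sorted as text: compared character by character.
sizes_as_text = sorted(sizes)
dates_as_text = sorted(dates)

# Sorted after converting to the intended type.
sizes_as_numbers = sorted(sizes, key=int)
dates_as_dates = sorted(dates, key=lambda s: datetime.strptime(s, "%m-%d-%y"))

print(sizes_as_text)     # ['10', '100', '25', '9']
print(sizes_as_numbers)  # ['9', '10', '25', '100']
print(dates_as_text)     # ['01-15-97', '06-30-96', '12-01-95']
print(dates_as_dates)    # ['12-01-95', '06-30-96', '01-15-97']
```

As text, "100" sorts before "25" and a 1997 date sorts before a 1995 date, which is why redefining the attribute as a number or date property matters.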
Displaying a highlighted document
In the default installation of Netscape Enterprise
Server, when you obtain a list of the documents that match your search
criteria, you can select a single document to view in your web browser.
Depending on how the pattern files are set up, the word you entered as
your original search query can be highlighted in the displayed document
with color, boldface text, or blinking.
To view a highlighted document, you click
on a link in the document's entry in the search results. The field you
use to access the highlighted document depends on how your search interface
has been designed, but in the default installation, you click the icon
shown next to the document's listing. When you click it, there is additional
code defined behind the icon's link to format the displayed document with
the search query highlighted.
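Conceptually, highlighting wraps each occurrence of the query word in the collection's Highlight Begin and Highlight End markup, which default to the bold tags. The Python sketch below shows that idea on plain text. It is an assumption-laden illustration, not the server's pattern-file mechanism: the function name is made up, and it matches simple substrings without regard to stemming or existing HTML markup.

```python
import re


def highlight(text, term, begin="<b>", end="</b>"):
    """Wrap each case-insensitive occurrence of term in begin/end markup."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # m.group(0) keeps the original capitalization of the matched text.
    return pattern.sub(lambda m: begin + m.group(0) + end, text)


print(highlight("Sea otters live in Monterey Bay.", "monterey"))
print(highlight("Sea otters live in Monterey Bay.", "otters", "<i>", "</i>"))
```

Changing the begin and end arguments corresponds to typing different tags in the Highlight Begin and Highlight End fields.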
In the default search results page, if you click the file's URL, you
open the file in your browser without any special highlighting.
In the case of documents that have been converted
into HTML, the URL points you to the original document. To get to the converted
HTML document, click the document's title.
Displaying collection contents
You can display the contents of your collection database
to see which attributes are set for each collection. The default installation
of Netscape Enterprise Server uses the HTML-description.pat file
to display information about each of your collections that have been defined
as displayable (NS-display-select = YES) in the dblist.ini
file. The collection contents typically include these items:
-
collection name, label, and description
-
collection format
-
number of attributes in the collection and a list of their names
-
number of documents in the collection
-
collection size and status
-
language and character set
-
input and output date formats
To display your collection database contents, type
this line in the web browser's URL location field:
http://yourServer/search?NS-search-page=c
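If you generate this link from a script, for example for an administration page, it can be built from the server name and the query parameter shown above. This Python sketch is a minimal example; the function name is made up, and yourServer is the same placeholder used in the URL above.

```python
from urllib.parse import urlencode


def collection_contents_url(server):
    """Build the URL that displays the collection database contents."""
    query = urlencode({"NS-search-page": "c"})
    return f"http://{server}/search?{query}"


print(collection_contents_url("yourServer"))
```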
Using the query operators
To perform an effective search, you need to know
how to use the query operators. You can only do Boolean searches, so all
the subsequent information is based on Boolean search rules.
Note
The query language is not case-sensitive. The examples use
uppercase for clarity only.
The search engine interprets the search query based
on a set of syntax rules. For example, by entering the word region,
the actual word region and all its stemmed variations (such as regions
and regional) are found. The search results are ranked for "importance,"
which means how close the matched word comes to the originally input search
criteria. In the example above, region would rank higher than any
of the stemmed variants.
Not all queries rank their results. Only those queries that can have varying
degrees of matching can be ranked. For example, <CONTAINS> queries
either do or do not contain the given string, but <NEAR> queries can be
ranked according to how close the words are to each other: words closer
together are listed at the top of the search results, while those that
are far apart are put at the bottom of the results.
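The idea of ranking by proximity can be pictured with a toy model that measures how many words apart two terms are and lists the closest matches first. The Python sketch below is only a model of that idea. The search engine's actual scoring is not described here, and the function names and sample sentences are assumptions.

```python
import re


def min_word_distance(text, first, second):
    """Return the smallest distance in words between two terms, or None."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    if not first_positions or not second_positions:
        return None  # at least one term is missing, so the document does not match
    return min(abs(i - j) for i in first_positions for j in second_positions)


def rank_by_proximity(documents, first, second):
    """List matching documents with the closest pair of terms first."""
    scored = [(min_word_distance(doc, first, second), doc) for doc in documents]
    matched = [(dist, doc) for dist, doc in scored if dist is not None]
    return [doc for dist, doc in sorted(matched, key=lambda pair: pair[0])]


documents = [
    "We purchase supplies and stock up weekly.",
    "The stock purchase closed today.",
    "No relevant words here.",
]

for doc in rank_by_proximity(documents, "stock", "purchase"):
    print(doc)
```

In this model the sentence with the two words side by side is listed before the one where they are three words apart, and the sentence containing neither word is left out.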
Default assumptions
The search query language has some implicit defaults
and assumptions that dictate how it interprets your input. In some cases,
you can circumvent the defaults, but here is how the search engine decides
what you want as the search results:
<STEM>--Search finds all documents
that contain any stemmed variant of the search word or phrase. The search
engine looks at the meaning of the word, not just its spelling. For example,
if you want to search on plan, the results would include documents
that contain planning and plans, but not those that contain
plane or planet.
<MANY>--Search considers how
often the search word or phrase appears in the found documents and ranks
the results for frequency (or relevancy).
<PHRASE>--Search considers words separated
by spaces to be part of a phrase. For example, Monterey otter is
interpreted as a phrase and both must be present and together to be found.
Such a search would not find documents containing sea otter or Monterey
Bay.
Note
In any case where it's not clear that two words are to be considered
as a phrase, you can use parentheses for clarity. For example,
<PHRASE> (rise "and" fall).
OR--Search considers each word or phrase
in the query separated by a comma to be optional, although at least one
must be present. In effect, this is an implicit OR operation. For example,
Monterey, otter is interpreted as find documents that contain either
Monterey or otter. Note that angle brackets are not required
for OR.
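The phrase and comma defaults can be pictured with a small model: words separated by spaces must appear together and in order, while comma-separated parts are alternatives. The Python sketch below deliberately ignores stemming and ranking, so it is a simplified illustration of these two defaults rather than the search engine's behavior. All function names are made up for the example.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_words, phrase_words):
    """True if phrase_words occur contiguously and in order in doc_words."""
    n = len(phrase_words)
    return any(
        doc_words[i:i + n] == phrase_words
        for i in range(len(doc_words) - n + 1)
    )


def matches(query, document):
    """Commas separate alternatives (implicit OR); spaces form a phrase."""
    doc_words = tokenize(document)
    alternatives = [tokenize(part) for part in query.split(",")]
    return any(
        bool(alt) and contains_phrase(doc_words, alt) for alt in alternatives
    )


print(matches("Monterey otter", "A sea otter near Monterey Bay"))
print(matches("Monterey, otter", "A sea otter near the pier"))
```

The first query is treated as one phrase and does not match, because the two words are not adjacent in that order. The second query matches because one of its comma-separated parts is present.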
Search rules
To create complex searches, you can combine query
operators, manipulate the query syntax, and include wildcard characters.
Angle brackets
With the exception of the AND, OR,
NOT, and the date and numeric comparison operators, you need to
enclose query operators in angle brackets, as in <CONTAINS>
and <WILDCARD>.
Combining operators
You can combine several query operators into a single
query to obtain precise results. For example, you can input the following
query to limit your search to those documents that have Bay and
Monterey but exclude those that also mention Aquarium:
Monterey AND Bay NOT <CONTAINS> Aquarium
You can achieve even greater precision by including
some implicit phrases, as in the following query that finds documents that
refer to the Monterey Bay Aquarium by its full name and also mention
otters but do not refer to shark:
Monterey Bay Aquarium AND otter AND NOT shark
Using query operators as search words
You can use any of the query operators as a search
word, but you must enclose the word in quotation marks. For example, you
could search for documents about the ebb and flow of the tides with
the following query:
<CONTAINS> ebb "and" flow
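If you build queries in a script, the quoting rule can be applied automatically. The Python sketch below quotes any search word that is also an operator name. The operator list contains only the operators named in this chapter and may not be complete, and the function name is an assumption for this example.

```python
# Operator names mentioned in this chapter (possibly incomplete).
OPERATOR_WORDS = {
    "AND", "OR", "NOT", "CONTAINS", "ENDS", "MATCHES", "NEAR",
    "PHRASE", "STARTS", "STEM", "SUBSTRING", "WILDCARD", "MANY",
}


def quote_operator_words(phrase):
    """Put quotation marks around words that would be read as operators."""
    quoted = []
    for word in phrase.split():
        # The query language is not case-sensitive, so compare in uppercase.
        if word.upper() in OPERATOR_WORDS:
            quoted.append(f'"{word}"')
        else:
            quoted.append(word)
    return " ".join(quoted)


print("<CONTAINS> " + quote_operator_words("ebb and flow"))
```

For the phrase ebb and flow, this produces the same query as the example above, with only the word and placed in quotation marks.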
Canceling stemming
You can cancel the implicit stemming by using quotation
marks around a word. For example, you can be exact by using a query such
as this:
"plan"
This search only results in documents that contain
the exact word plan. It ignores documents with plans or planning.
Modifying operators
You can use AND, OR, and NOT to modify other operators.
For example, you may want to exclude documents with titles that contain
the phrase theme park. A query such as this would solve this problem:
Title NOT <CONTAINS> theme park
Determining which operators to use
Use the following reference to help determine which
operators to use. Note that the query language is not case-sensitive, so
<starts> and <STARTS> are equivalent. This document uses uppercase
for clarity only.
Deciding which operator to use
Type of Search
|
Valid Operators
|
Examples
|
Finding documents by date or numeric value comparison.
|
is equal to (=),
greater than (>),
greater than or equal to (>=),
less than (<),
less than or equal to (<=)
|
DATE >= 06-30-96
Finds documents created on or after June 30, 1996.
|
Finding words or phrases in specific document
fields or in specific locations in the field.
|
<STARTS>,
<CONTAINS>,
<ENDS>,
is equal to (=)
|
Title <STARTS> Help
Finds documents with titles that start with Help.
|
Finding two or more words in a document.
|
AND,
<NEAR/1>
|
specifications AND review
Finds documents that contain both specifications
and review.
|
Query operators: a reference
The following table describes some commonly used
operators and provides examples of how to use each one. All are relevance
ranked except where explicitly noted.
Query language operators
Operator
|
Description
|
Examples
|
AND
|
Adds mandatory criteria to the search. Finds
documents that have all of the specified words.
|
Antarctica AND mountain climb
Finds only documents containing both Antarctica
and mountain climb plus all the stemmed variants, such as mountain
climbing.
|
<CONTAINS>
|
Finds documents containing the specified words
in a document field. The words must be in the exact same sequential and
contiguous order.
You can use wildcards. Only alphanumeric values.
Does not rank documents for relevance.
|
Title <CONTAINS> higher profit
Finds documents containing the phrase higher
profit in the title. Ignores documents with profits higher in
the title.
|
<ENDS>
|
Finds documents in which a document field ends
with a certain string of characters.
Does not rank documents for relevance.
|
Title <ENDS> draft
Finds documents with titles ending in draft.
|
equals (=)
|
Finds documents in which a document field matches
a specific date or numeric value.
|
Created = 6-30-96
Finds documents created on June 30, 1996.
|
greater than (>)
|
Finds documents in which a document field is
greater than a specific date or numeric value.
|
Created > 6-30-96
Finds documents created after June 30, 1996.
|
greater than or equal to (>=)
|
Finds documents in which a document field is
greater than or equal to a specific date or numeric value.
|
Created >= 6-30-96
Finds documents created on or after June 30, 1996.
|
less than (<)
|
Finds documents in which a document field is
less than a specific date or numeric value.
|
Created < 6-30-96
Finds documents created before June 30, 1996.
|
less than or equal to (<=)
|
Finds documents in which a document field is
less than or equal to a specific date or numeric value.
|
Created <= 6-30-96
Finds documents created on or before June 30,
1996.
|
<MATCHES>
|
Finds documents in which a string in a document
field matches the character string you specify.
Ignores documents that contain partial matches.
Does not rank documents for relevance.
|
<MATCHES> employee
Finds documents containing employee or
any of its stemmed variants such as employees.
|
<NEAR>
|
Finds documents that contain the specified words.
The closer the terms are to each other in the document, the higher the
document's score.
|
stock <NEAR> purchase
Finds any document containing both stock
and purchase, but gives a higher score to a document that has stock
purchase than to one that has purchase supplies and stock up.
|
<NEAR/N>
|
Finds documents in which two or more specified
words are within N number of words from each other. N can be an integer
up to 1000. Also ranks the documents for relevance based on the words'
proximity to each other.
|
stock <NEAR/1> purchase
Finds documents containing the phrases stock
purchase and purchase stock.
Ignores documents containing phrases like purchase
supplies and stock up because stock and purchase do not
appear next to each other.
When N is 2 or greater, finds documents that contain
the words within the range and gives a higher score for documents which
have the words closer together.
|
NOT
|
Finds documents that do not contain a specific
word or phrase.
Note: You can use NOT to modify
the OR or the AND operator.
|
surf AND NOT beach
Finds documents containing the word surf
but not the word beach.
|
OR
|
Adds optional criteria to the search. Finds any
document that contains at least one of the search values.
|
apples OR oranges
Finds documents containing either apples
or oranges.
|
<PHRASE>
|
Finds documents that contain the specified phrase. A
phrase is a grouping of two or more words that occur in a specific order.
|
<PHRASE> (rise "and" fall)
Finds documents that include the entire phrase
rise and fall. The and is in quotes to force the search to
interpret it as a literal, not as an operator.
|
<STARTS>
|
Finds documents in which a document field starts
with a certain string of characters.
Does not rank documents for relevance.
|
Title <STARTS> Corp
Finds documents with titles starting with Corp,
such as Corporate and Corporation.
|
<STEM>
(English only)
|
Finds documents that contain the specified word
and its variants.
|
<STEM> plan
Finds documents that contain plan, plans,
planned, planning, and other variants with the same meaning
stem. Ignores similarly spelled words such as planet and plane
that don't come from the same stem.
|
<SUBSTRING>
|
Finds documents in which part or all of a string
in a document field matches the character string you specify.
Similar to <MATCHES>, but can match
on a partial string.
Does not work with wildcards.
Does not rank documents for relevance.
|
<SUBSTRING> employ
Finds documents that can match on all or part
of employ, so it can succeed with ploy.
Note: This works with literals only. If
you input web*, the asterisk does not work as a wildcard, so the
search succeeds only with the exact "web*" string. |
<WILDCARD>
|
Finds documents that contain the wildcard characters
in the search string. You can use this to get words that have some similar
spellings but which would not be found by stemming the word.
Some characters, such as * and ?, automatically
indicate a wildcard-based search, so you don't have to include the word
<WILDCARD>.
|
<WILDCARD> plan*
Finds documents that contain plan, plane,
and planet as well as any word that begins with plan, such
as planned, plans, and planetopolis.
See the next section for more details and examples.
|
<WORD>
|
Finds documents that contain the specified word.
|
<WORD> theme
Finds documents that contain theme, thematic,
themes, and other words that stem from theme.
|
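The proximity rule behind <NEAR/N> can be sketched in a few lines of Python. This is only an illustration of the behavior described in the table, not the search engine's implementation: the function name near and the simple alphanumeric tokenizer are assumptions, and relevance scoring is left out.

```python
import re

def near(text: str, first: str, second: str, n: int) -> bool:
    """Return True if `first` and `second` occur within n words of each other."""
    # Assumed tokenizer: lowercase runs of letters and digits.
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    # Word order does not matter, only the distance between positions.
    return any(abs(i - j) <= n for i in first_pos for j in second_pos)

print(near("stock purchase plan", "stock", "purchase", 1))
print(near("purchase supplies and stock up", "stock", "purchase", 1))
```

With N set to 1 the two words must be adjacent, which is why the second call does not match; raising N to 3 would let it match.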
Using wildcards
You can use wildcards to obtain special results.
For example, you can find documents that contain words that have similar
spellings but are not stemmed variants. The word plan stems
into plans and planning but not plane or planet.
With wildcards, you can find all of these words.
Some characters, such as * and ?, automatically
indicate a wildcard-based search and do not require you to use the <WILDCARD> operator
as part of the expression.
Wildcard operators
Character
|
Description
|
*
|
Specifies 0 or more alphanumeric characters.
For example, air* finds documents that contain air, airline,
and airhead.
Cannot use this wildcard as the first character
in an expression.
This wildcard is ignored in a set of ([ ]) or
in an alternative pattern ({ }).
With this wildcard, the <WILDCARD>
operator is implicit.
|
?
|
Specifies a single alphanumeric character, although
you can use more than one ? to indicate multiple characters. For example,
?at finds documents that contain cat and hat, while ??at
finds documents that contain that and chat.
This wildcard is ignored in a set of ([ ]) or
in an alternative pattern ({ }).
With this wildcard, the <WILDCARD>
operator is implicit.
|
{}
|
An alternative pattern that specifies a series
of patterns separated by commas, any one of which can match. For example,
<WILDCARD> `Chat{s,ting,ty}`
finds documents that contain chats, chatting, and chatty.
You must enclose the entire string in back quotes
and you cannot have any embedded spaces.
|
[ ]
|
A set that specifies a series of characters that can be
used to find a match. For example,
<WILDCARD> `[chp]at`
finds documents that contain cat, hat, and pat.
You must enclose the entire string in back quotes
and you cannot have any embedded spaces.
|
^
|
Specifies one or more characters to exclude from
a set. For example, <WILDCARD> `C[^io]t` finds documents that
contain cat and cut, but not cot.
The caret (^) must be the first character after
the left bracket.
|
-
|
Specifies a range of characters in a set. For
example, <WILDCARD> `Ch[a-j]t` finds documents that contain
any four-letter word from chat to chjt. |
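One way to get a feel for these wildcard characters is to translate them into regular expressions. The following Python sketch does that under several assumptions: the helper names wildcard_to_regex and matches are hypothetical, matching is treated as case-insensitive because the examples above pair Chat and C[^io]t with lowercase words, and the search engine's real matching rules may differ.

```python
import re

ALNUM = "[A-Za-z0-9]"

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard expression into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":                      # zero or more alphanumeric characters
            out.append(ALNUM + "*")
        elif ch == "?":                    # exactly one alphanumeric character
            out.append(ALNUM)
        elif ch == "[":                    # a set; ^ and - carry over unchanged
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])
            i = end
        elif ch == "{":                    # alternatives separated by commas
            end = pattern.index("}", i)
            alts = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alts) + ")")
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE)

def matches(pattern: str, word: str) -> bool:
    """Return True if the whole word matches the wildcard expression."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

print(matches("air*", "airline"))
print(matches("C[^io]t", "cot"))
```

The back quotes that the search syntax requires around sets and alternative patterns are left out here, because they are quoting for the query parser rather than part of the pattern itself.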
Non-alphanumeric characters
You can only search for non-alphanumeric characters
if the style.lex file used to create the collection is set up
to recognize them. This file is in the HTML, news, and mail subdirectories
in the server_root/plugins/common/ directory.
Wildcards as literals
Sometimes you may want to search on characters that
are normally used as wildcards, such as * or ?. To use a wildcard as a literal,
you must precede it with a backslash. In the case of asterisks, you must
use two backslashes. For example, to search on a magazine with a title
of Zine***, you would type:
<WILDCARD>Zine\\*\\*\\*
Several characters have special meaning for the search
engine and require you to use back quotes to be interpreted as literals.
The special search characters are listed here:
-
comma ,
-
left and right parentheses ( )
-
double quotation mark "
-
backslash \
-
at sign @
-
left curly brace {
-
left bracket [
-
back quote ` (Note: You can only search on back quotes as literals
if the style.lex file has been set up to recognize it.)
For example, to search for the string "a{b", you
would type
<WILDCARD>`a{b`
For another example, if you wanted to search on the
string "c`t", which contains a back quote, you would type
<WILDCARD>`c``t`
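If you generate queries from a program, the two quoting rules above can be wrapped in small helpers. The Python sketch below is an assumption-laden illustration: the function names escape_wildcards and quote_special are hypothetical, and the section does not say how the two rules combine, so they are kept as separate steps.

```python
# Characters the section lists as special to the search engine.
SPECIAL = set(',()"\\@{[`')

def escape_wildcards(term: str) -> str:
    """Make * and ? literal: two backslashes before *, one before ?."""
    return term.replace("*", "\\\\*").replace("?", "\\?")

def quote_special(term: str) -> str:
    """Wrap a term in back quotes if it contains a special character."""
    if not any(ch in SPECIAL for ch in term):
        return term
    # A back quote inside the term is doubled, as in the c`t example.
    return "`" + term.replace("`", "``") + "`"

print("<WILDCARD>" + escape_wildcards("Zine***"))
print("<WILDCARD>" + quote_special("a{b"))
print("<WILDCARD>" + quote_special("c`t"))
```

The three print calls reproduce the three query strings shown in this section.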
Customizing the search interface
As server administrator, you can customize the search
interface to meet specific user requirements. All of the HTML-based forms
that the user sees are defined through a set of pattern files that set
up display formats for the search results page header and footer as well
as each search result record listed in response to a query. There is a
set of pattern variables that you can use to construct the forms used for
search input and output. Many of the variables are defined in the system
and user configuration files (userdefs.ini, webpub.conf,
and dblist.ini, which are discussed in "Configuring
manually.")
Note
The search home page, at http://yourServer/search-ui/examples,
also provides an introduction to the search interface as well as an online
QuickStart tutorial on customizing the interface. The tutorial discusses
the various pattern files and gives examples of how they can be changed
to produce different results.
HTML pattern files
A good place to begin customizing the interface is
by modifying the existing pattern files. After you see how they work and
you understand pattern variables, you can create your own pattern files
and change the configuration files and other pattern files to point to
them. In the default installation of Netscape Enterprise Server, the pattern
files are in this directory: server_root/plugins/search/ui/text.
(Make copies of your original pattern files so you can restore them afterwards.)
There are pattern files for different kinds
of collections: email, news, ASCII, PDF, and HTML as well as one for the
web publishing collection. (The web publishing pattern file is a special
case, using a great many collection-specific attributes as variables in
the dblist.ini file.) There are several general types of pattern
files, each of which has a particular use:
-
query.pat displays the standard and advanced query pages
-
tocstart.pat displays the header across the top of the search
results page
-
tocrec.pat displays each document listed on the search results
page
-
tocend.pat displays the footer across the bottom of the search
results page
-
record.pat displays a single highlighted document from the search
results page (See "Displaying a highlighted
document" for more information.)
-
descriptions.pat displays the collection contents
The pattern files contain HTML formatting instructions,
which define how elements look, and HTML search arguments and variables,
which define the text label or value that is displayed.
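There are three kinds of pattern variables (discussed
further in "Using pattern variables"): user-defined variables from the
userdefs.ini file, configuration file variables from the webpub.conf
and dblist.ini files, and search macros and variables generated by the
search function.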
To see how these work together, here are some lines
from the standard query pattern file, NS-query.pat:
<input type="hidden" name="NS-max-records" value="$$NS-max-records">
<td align=left colspan=2>$$logo</td>
<td align=right><h3>$$sitename</h3></td>
<td align=right><b>$$queryLabel</b></td>
<td align=left> <input name="NS-query" size=40 value="$$NS-display-query"></td>
Each line contains standard HTML tags and one or
more variables with the $$ or $$NS- prefix. Examining
each line more closely requires looking at the configuration files mentioned
in "Configuring manually."
-
NS-max-records: Defined in the webpub.conf file. Because
this field is hidden, users cannot change this value, which defines how
many matching documents to return at a time. In the advanced HTML query
pattern file, NS-advquery.pat, this is a user-modifiable input
field.
-
$$NS-max-records: The search generates a variable from this field
that can be used in subsequent searches to calculate how many result records
to display at a time. Because this field is not modifiable here, the value
is set to the one in the webpub.conf file. In the advanced query, this
value could vary for each query.
-
$$logo: Defined in the userdefs.ini file. This could
be any image or text the user wanted to display on the form.
-
$$sitename: Defined in the userdefs.ini file as the server's
host name that is provided by the $$NS-host search macro.
-
$$queryLabel: Defined in the userdefs.ini file as a text
label for the query input field. In this case, the label on the form is
the word "For:"
-
NS-query: Defined in this pattern file as the name of the input
field.
-
$$NS-display-query: Defined in the userdefs.ini file.
The search generates a variable from this field that can be used in subsequent
searches to determine which word or phrase to highlight when an entire
matching document is displayed.
Search function syntax
The search function uses standard URL syntax with
a series of name-value pairs for the search arguments. This is the basic
syntax:
http://yourServer/search?name=value[&name=value][&name=value]
As you use the HTML search query and results pages,
you can see search functions and arguments displayed in the URL field of
your browser. When entered directly into the URL field, these are sometimes
called decorated URLs. You can also embed them in your pattern files
with the HREF tag.
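A decorated URL can also be assembled by a script. The following Python sketch builds one from name-value pairs with the standard urllib.parse module. The helper name search_url is hypothetical, and the argument names NS-search-page, NS-collection, and NS-query are taken from elsewhere in this chapter; check them against your own pattern files before relying on them.

```python
from urllib.parse import urlencode

def search_url(server: str, args: list[tuple[str, str]]) -> str:
    """Join name-value pairs into a search function URL."""
    # urlencode percent-encodes reserved characters and turns blanks into +.
    return f"http://{server}/search?{urlencode(args)}"

url = search_url("yourServer", [
    ("NS-search-page", "results"),
    ("NS-collection", "web_htm"),
    ("NS-query", "stock purchase"),
])
print(url)
```

Passing the arguments as a list of pairs keeps them in a fixed order and lets you repeat a name, which is useful when a search covers more than one collection.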
You can create a complete search function
as an HREF element within a pattern file. The following example is from the
HTML-descriptions.pat file, which defines how collection information is
displayed. The lines produce a heading for each collection that consists of
a label ("Collection:") and a link to the actual collection
file through the collection's label (NS-collection-alias) that
was defined in the dblist.ini file.
<td colspan=6><font size=+2><b>$$collectionLabel</b>
<a href=$$NS-server-url/search?NS-collection=$$NS-collection>$$NS-collection-alias</a>
</font></td>
The HREF contains a complete search function by using
the following elements:
-
$$NS-server-url: A search macro that determines the user's server
URL.
-
/search: The search command itself.
-
?: The query string indicator. Everything after the ? is information
used by the search function.
-
NS-collection=$$NS-collection: This uses the search macro $$NS-collection
to define the collection's filename.
You can set up a search to use a variable conditionally
so that if there is no value associated with the variable, nothing is displayed.
The syntax is as follows:
variableName[conditionalized output]
For example, you could request that the document's
title be output if it exists. If there is no title for this document, not
even the label "Title:" is to be displayed. To do this, you would use code
like this:
$$Title[<P>Title: <B>$$Title</B>]
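To make the conditional behavior concrete, here is a minimal Python sketch of how such a pattern line could be expanded. It is not the server's implementation: the function name expand and the two regular expressions are assumptions, and nested brackets are not handled.

```python
import re

# $$name[...] : output the bracketed text only if the variable has a value.
COND = re.compile(r"\$\$([A-Za-z0-9_-]+)\[([^\]]*)\]")
# $$name : plain variable reference.
VAR = re.compile(r"\$\$([A-Za-z0-9_-]+)")

def expand(pattern: str, values: dict[str, str]) -> str:
    """Expand conditional blocks first, then substitute remaining variables."""
    def conditional(match: re.Match) -> str:
        return match.group(2) if values.get(match.group(1)) else ""
    text = COND.sub(conditional, pattern)
    return VAR.sub(lambda m: values.get(m.group(1), ""), text)

line = "$$Title[<P>Title: <B>$$Title</B>]"
print(expand(line, {"Title": "Annual Report"}))
print(repr(expand(line, {})))
```

When the document has a title, the label and the value are both output; when it has none, the whole bracketed block disappears, including the "Title:" label.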
URL encodings
When you construct HTML instructions, whether in
decorated URLs or within a pattern file, you need to follow the rules for
URL encoding. Any character that might be misunderstood as part of a URL
should be encoded with a code in the format of %nn, where nn
is a hexadecimal code. Blanks are converted to the + symbol (plus sign)
in queries or to %20 in output. Table 11.6
shows the most commonly used URL codes.
Common URL encodings
Character
|
Description
|
Code
|
|
Space
|
%20
|
;
|
Semicolon
|
%3B
|
/
|
Slash
|
%2F
|
?
|
Question mark
|
%3F
|
:
|
Colon
|
%3A
|
@
|
At sign
|
%40
|
=
|
Equal sign
|
%3D
|
&
|
Ampersand
|
%26
|
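If you are unsure of a code, most scripting languages can produce it for you. As a quick check, this Python sketch uses the standard urllib.parse functions quote and quote_plus to encode the characters from the table; the variable name encoded is only for illustration.

```python
from urllib.parse import quote, quote_plus

# Encode each character from the table; safe="" means nothing is exempt.
encoded = {ch: quote(ch, safe="") for ch in " ;/?:@=&"}
for ch, code in encoded.items():
    print(repr(ch), code)

# In a query string, blanks become plus signs instead of %20.
print(quote_plus("rise and fall"))
```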
Required search arguments
Although you can customize almost every aspect of
query and result pages, there are some arguments required for search functions
to display the different types of search pages. These arguments are required
whether the search function is in a decorated URL or embedded as an HREF
in a pattern file.
Search functions that display the search
query page require these arguments:
-
search query (the word, phrase, or attribute you want to search on)
-
collection (can specify more than once for multiple-collection searches)
Search functions that display the search results
page require these arguments:
-
NS-search-page=results (or r, in upper- or lowercase)
-
collection (can be specified more than once for multiple-collection searches)
-
search query
Search functions that display a highlighted document
require these arguments:
-
NS-search-page=document (or d, in upper- or
lowercase)
-
document path
-
collection (can be specified only once)
-
search query (necessary if you want to highlight the query data)
Search functions that display the collection contents
require only this argument:
-
NS-search-page=contents (or c, in upper- or lowercase)
Using pattern variables
By using pattern variables, you can customize the
search text interface and eliminate the need to update the actual HTML
pages as user requirements change. For example, if the interface has graphics
or text elements that change periodically, you can define a pattern variable
that points to a pathname where that graphic or text is maintained and
stored.
There are three categories of pattern variables:
-
variables defined in the userdefs.ini file, to which a $$ prefix
is added in decorated URLs and pattern files. For example,
uidir, logo, and title become $$uidir,
$$logo, and $$title.
-
variables defined in the configuration files webpub.conf and
dblist.ini, which have an NS- prefix where they
are defined in the configuration file and a $$NS- prefix
when they are used in decorated URLs and pattern files. For example, NS-max-records,
NS-doc-root, and NS-date-time become $$NS-max-records,
$$NS-doc-root, and $$NS-date-time.
-
search macros and variables generated by a pattern file, which always have
a $$NS- prefix. For example, $$NS-host, $$NS-get-next,
and $$NS-sort-by.
User-defined pattern variables
You can create any number of your own user-defined
pattern variables in the user definitions file, userdefs.ini,
or you can modify existing definitions. When one of these variables is
used in a pattern file, the $$ prefix is added to it. Variable names can
have up to 32 characters or digits, or combinations of both. Characters
can be letters A-Z in upper or lower case, hyphens (-), and underscores
(_). Names are case sensitive.
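The naming rules above are easy to check mechanically before you edit userdefs.ini. The following Python sketch expresses them as a regular expression; the function name is_valid_name is hypothetical, and the check reflects only the rules stated in this section.

```python
import re

# Letters, digits, hyphens, and underscores; 1 to 32 characters in total.
NAME = re.compile(r"[A-Za-z0-9_-]{1,32}")

def is_valid_name(name: str) -> bool:
    """Return True if the name satisfies the stated naming rules."""
    return NAME.fullmatch(name) is not None

print(is_valid_name("queryLabel"))
print(is_valid_name("query label"))
```

Because names are case sensitive, queryLabel and querylabel are both valid but refer to different variables.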
The default userdefs.ini file included
with Netscape Enterprise Server contains variables that are used to define
the search query page (labeled [query] in the file), the results
listing (labeled [toc]), the document display page (labeled
[record]), and the collection contents page (labeled [contents]).
Each line begins with a variable name and is followed by a definition for
that variable. Many are labels for screen elements, some are paths to other
files, and some have more complex contents. For example, the following
lines are from the query section of that file.
[query]
help=/help/srchhelp.html
title=ES3.0 Sample Search Interface
queryLabel=Search for:
collectionLabel=Collection:
booleanLabel=Boolean:
sortByLabel=&nbsp;Sort for:
copyright = Copyright &copy; 1997 Netscape Communications Corporation.
All Rights Reserved.
The file also includes references to search macros,
such as $$NS-server-url, and can also refer to other user-defined variables,
as in the following lines:
uidir = $$NS-server-url/search-ui
icondir = $$uidir/icons
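The way one definition builds on another can be sketched as a small resolver. This Python example is illustrative only: the function name resolve is hypothetical, the value given for the NS-server-url macro is a placeholder, and the server's own expansion order is not documented here.

```python
import re

VAR = re.compile(r"\$\$([A-Za-z0-9_-]+)")

def resolve(definitions: dict[str, str], macros: dict[str, str]) -> dict[str, str]:
    """Expand $$ references in user definitions, using macros as a base."""
    resolved = dict(macros)

    def lookup(name: str, seen: tuple = ()) -> str:
        if name in resolved:
            return resolved[name]
        if name not in definitions or name in seen:
            return ""                      # unknown or circular reference
        value = VAR.sub(lambda m: lookup(m.group(1), seen + (name,)),
                        definitions[name])
        resolved[name] = value
        return value

    for name in definitions:
        lookup(name)
    return resolved

values = resolve(
    {"uidir": "$$NS-server-url/search-ui", "icondir": "$$uidir/icons"},
    {"NS-server-url": "http://yourServer"},   # placeholder macro value
)
print(values["icondir"])
```

Here icondir expands through uidir, which in turn expands through the NS-server-url macro.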
Search macros are described further in the section
"Macros and generated pattern variables."
You can use any supported HTML character
entity in your variable definitions. You can use entity names that are
defined in the &name; format as well as those defined with the
three-digit code in the &#nnn; format. In the userdefs.ini
code sample, the &nbsp; entity inserts a nonbreaking space
and &copy; inserts a copyright symbol. Some of the more commonly
used entities are in Table 11.7.
Common HTML character entities
Numeric code
|
Entity name
|
Description
|
&#032;
|
-
|
Space
|
&#034;
|
&quot;
|
Quotation mark
|
&#036;
|
$
|
Dollar sign
|
&#058;
|
-
|
Colon
|
&#060;
|
&lt;
|
Less than
|
&#062;
|
&gt;
|
Greater than
|
&#153;
|
-
|
Trademark symbol
|
&#160;
|
&nbsp;
|
Nonbreaking space
|
&#169;
|
&copy;
|
Copyright symbol
|
&#174;
|
&reg;
|
Registered trademark |
Configuration file variables
Some variables are defined in the system configuration
and the collection configuration files. These use a prefix of NS-
in the configuration file to differentiate them from other markup tags
in an HTML page. To use these variables as arguments to the search function,
you add another prefix $$ to the variable, as in $$NS-date-time
and $$NS-max-records.
Variables that define defaults for all searches
on a server are defined in the system configuration file, webpub.conf.
For example, the default installation of Netscape Enterprise Server includes
the following variables in the webpub.conf file:
NS-max-records = 20
NS-query-pat = /text/NS-query.pat
NS-ms-tocstart = /text/HTML-tocstart.pat
NS-ms-tocend = /text/HTML-tocend.pat
NS-default-html-title = (Untitled)
NS-HTML-descriptions-pat = /text/HTML-descriptions.pat
NS-date-time = %b-%d-%y %H:%M
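The NS-date-time value in this sample looks like a C strftime format string. Assuming that is how the server interprets it, you can preview a format with any language that supports strftime. The Python sketch below does so; the variable names and the sample date are only for illustration.

```python
from datetime import datetime

# The format from the webpub.conf sample above.
fmt = "%b-%d-%y %H:%M"

# %b = abbreviated month name, %d = day, %y = two-digit year,
# %H = hour (24-hour clock), %M = minute.
stamp = datetime(1996, 6, 30, 14, 5).strftime(fmt)
print(stamp)
```

In an English locale this prints Jun-30-96 14:05. The month abbreviation produced by %b depends on the locale in effect.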
Although installations may vary depending on how
each server is configured, the most commonly found variables from the webpub.conf
file are listed in Table 11.8.
Commonly found variables defined in webpub.conf
Variable
|
Description
|
NS-default-html-title
|
The name given to HTML documents that do not
contain a user-defined title. Typically set to "(Untitled)."
|
NS-date-time
|
The date and time format to use when displaying
results.
|
NS-date-input-format
|
The format for inputting dates (the default is
MMDDYY).
|
NS-HTML-descriptions-pat
|
The pattern file to use when displaying the contents
of the collections.
|
NS-largest-set
|
The maximum number of records that can be handled
as matching the search criteria. The records are displayed in groups of
NS-max-records.
|
NS-max-records
|
The maximum size of the result set displayed
at one time. |
NS-ms-tocend
|
The pattern file to use for the footer at the
bottom of the search results page when searching multiple collections.
|
NS-ms-tocstart
|
The pattern file to use for the header at the
top of the search results page when searching multiple collections.
|
NS-query-pat
|
The query pattern file used when creating a query
page.
|
NS-search-type
|
The type of search to perform. Only Boolean is
permitted.
|
Collection-specific variables are defined in the
dblist.ini file. For example, the default installation of Netscape
Enterprise Server includes variables for the web publishing collection.
Among the variables defined there are:
NS-collection-alias = Web Publishing
NS-doc-root = C:/Netscape/SuiteSpot/docs
NS-url-base = /
NS-display-select = YES
The variables in your dblist.ini file may
differ according to the type of collections you are using. Table
11.9 contains some of the more commonly found collection-specific variables.
Commonly found variables in dblist.ini
Variable
|
Description
|
NS-collection-alias
|
The collection's label. Can be specified more
than once to search multiple collections.
|
NS-doc-root
|
The root directory for the documents in the collection.
|
NS-display-select
|
This indicates whether the collection is displayed
as part of the collection information listing, when NS-search-page=contents.
The default is YES.
|
NS-highlight-start
|
Begin highlighting at this point in the displayed
document. Typically this highlights the search query criteria. |
NS-highlight-end
|
End highlighting at this point in the displayed
document. |
NS-language
|
The language of the documents in the collection.
|
NS-record-pat
|
The pattern file to use when displaying a highlighted
document page. |
NS-tocend-pat
|
The footer pattern file associated with a collection
to be used when formatting the search results.
|
NS-tocrec-pat
|
The record pattern file associated with a collection
to be used when formatting the search results.
|
NS-tocstart-pat
|
The header pattern file associated with a collection
to be used when formatting the search results.
|
NS-url-base
|
The base URL used when constructing the link
used to locate the file.
|
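NS-doc-root and NS-url-base together relate a file on disk to the link shown in the search results. The exact rule is not spelled out here, so the following Python sketch rests on an assumption: that the link is the URL base followed by the document's path relative to the document root. The function name document_url and the sample file sales/q2.html are hypothetical.

```python
from pathlib import PurePosixPath

def document_url(doc_path: str, doc_root: str, url_base: str) -> str:
    """Assumed mapping: URL base + path of the document below the root."""
    relative = PurePosixPath(doc_path).relative_to(doc_root)
    return url_base.rstrip("/") + "/" + relative.as_posix()

link = document_url(
    "C:/Netscape/SuiteSpot/docs/sales/q2.html",   # hypothetical document
    "C:/Netscape/SuiteSpot/docs",                 # NS-doc-root from the sample
    "/",                                          # NS-url-base from the sample
)
print(link)
```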
Macros and generated pattern variables
There are some search macros that you can use in
your pattern files or decorated URLs, and the search function itself generates
some pattern variables that you can use in subsequent search requests to
define how the later output is to be displayed. These macros and variables
have a prefix of $$NS- to indicate their use.
For example, after doing an initial search query
that results in 24 documents on the results page, you can reuse the search-generated
$$NS-docs-matched and the $$NS-doc-number variables to
help define a document page displaying one of the documents in detail.
In this way, you can tell the user that this document is number 3 of 24
documents returned for the original search.
The search macros and the generated variables
that you can use in a subsequent pattern file or decorated URL are listed
in Table 11.10.
Macros and generated pattern variables
Variable
|
Description
|
$$NS-collection-list
|
An HTML multiple select list of all the collections
in dblist.ini where NS-display-select is set to YES. |
$$NS-collection-list-dropdown
|
An HTML drop-down list version of NS-collection-list.
|
$$NS-collections-searched
|
The number of collections searched for this request.
|
$$NS-display-query
|
The HTML-displayable version of the query that
is generated for a results page.
|
$$NS-doc-href
|
The HTML HREF tag for the document. This provides
a URL to the original source document. For email, this is in the form mailbox:/boxname?id-messageID
and for news, it is in the form news:messageID.
|
$$NS-doc-name
|
The document's name.
|
$$NS-doc-number
|
The sequence number of the document in the results
page list.
|
$$NS-doc-path
|
The absolute path to the document.
|
$$NS-doc-score
|
The ranked score of the document (ranges 0 to
100).
|
$$NS-doc-score-div10
|
The ranked score of the document (ranges 0 to
10).
|
$$NS-doc-score-div5
|
The ranked score of the document (ranges 0 to
5).
|
$$NS-doc-time
|
The creation time for a document in the results
list. To obtain this value, you must set NS-use-system-stat =
YES in the webpub.conf file. By default it is set to
NO, since system statistics are expensive. |
$$NS-doc-size
|
The size of the document rounded to the nearest
K. To obtain this value, you must set NS-use-system-stat = YES
in the webpub.conf file. By default it is set to NO, since system
statistics are expensive. |
$$NS-docs-found
|
The actual number of documents that the search
engine found for this request.
|
$$NS-docs-matched
|
The number of documents returned from the search
(up to NS-max-records) for this request.
|
$$NS-docs-searched
|
The number of documents searched through for
this request.
|
$$NS-get-highlighted-doc
|
This provides the URL for a highlighted document
in order to be able to display the document as HTML text with highlights.
|
$$NS-get-next
|
This variable gets the next set of search results
to be displayed. The set is equal to NS-max-records and is positioned
by using NS-search-offset.
|
$$NS-get-prev
|
This variable gets the previous set of search
results that has been displayed. The set is equal to NS-max-records
and is positioned by using NS-search-offset.
|
$$NS-host
|
The host name.
|
$$NS-insert-doc
|
A placeholder used in the NS-record-pat
pattern files for HTML to indicate where the source document is to be inserted. |
$$NS-rel-doc-name
|
The relative name of the document to display
when creating a document page.
|
$$NS-search-offset
|
The offset into the set of records returned as
search results. Used to determine which set of records is displayed when
you use NS-get-next and NS-get-prev.
|
$$NS-server-url |
The URL for the server.
|
$$NS-sort-by
|
The sort sequence for the items on the results
page. You can select one or more of the available attributes for the collection.
The default is an ascending sort. |
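The relationship between NS-max-records, NS-search-offset, NS-get-next, and NS-get-prev amounts to simple paging arithmetic. The Python sketch below illustrates it under stated assumptions: offsets are taken to be zero-based, the function name page_offsets is hypothetical, and the server's own calculation may differ.

```python
from typing import Optional, Tuple

def page_offsets(offset: int, max_records: int,
                 docs_found: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (previous offset, next offset); None means no such page."""
    has_next = offset + max_records < docs_found
    next_offset = offset + max_records if has_next else None
    prev_offset = max(offset - max_records, 0) if offset > 0 else None
    return prev_offset, next_offset

# 24 documents found, displayed 20 at a time.
print(page_offsets(0, 20, 24))
print(page_offsets(20, 20, 24))
```

With 24 documents and 20 records per page, the first page has a next set but no previous one, and the second page has the reverse.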
Copyright 1997 Netscape Communications Corporation.
All rights reserved.