Written by Michael Peterson
Master of Information Science Program
School of Library and Information Science
Indiana University at Bloomington
Term Paper for
L542 Introduction to Human-Computer Interaction
Instructor: Dr. Andrew Dillon
Information retrieval (IR) has traditionally been the domain of librarians and information professionals (Harter and Hert, 1997). In the 1950s, as indexes grew larger, subject heading indexes began returning too many documents for searchers to use. As a result, concept indexing (such as the use of controlled vocabularies and relational thesauri) was developed and searching by means of Boolean operators was implemented (Bates, 1998).
With the growth of the World Wide Web (WWW), millions of users now have access to searching systems that sift through millions of web documents (Zakon, 1998). Most search engines available on the web allow users to develop complex search queries through the use of Boolean modifiers, proximity modifiers (such as phrases or NEAR) and other sorts of delimiters, such as searching for URLs or for pages written in a specific language (Ginchereau et al., 1997; Tutorial: Guide to Effective Searching on the Internet).
However, search engine users typically have little formal training in how to construct search queries using Boolean operators and other search query modifiers. Even users with such training face a further obstacle: the precise notation varies so greatly between search engines and commercial search interfaces, such as LEXIS-NEXIS Academic Universe and Dialog, that proficiency with the syntax of one interface often does not carry over to others, and can even be detrimental if it leads searchers to enter syntax that another interface does not accept.
Information retrieval theorists describe search results in terms of precision, the percentage of retrieved documents that are relevant, and recall, the percentage of all possible relevant documents in the database that are actually recovered (Lancaster, 1991). As the number of documents in a database increases, both recall and precision can suffer. The two measures typically vary inversely with each other, with greater precision (i.e., retrieving fewer irrelevant items) coming at the expense of finding fewer of the relevant items that exist in the database as a whole (Lancaster, 1991).
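To make the two definitions concrete, the short Python sketch below computes precision and recall from a set of retrieved document identifiers and a set of relevant ones. The function name and the document IDs are hypothetical illustrations, not data from this study.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a single query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant in the whole database|
    """
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the database as a whole.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d7", "d8", "d9"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Narrowing the query so that only d1 to d3 are retrieved would raise precision to 1.00 while leaving recall at 0.50, which is the trade-off described above.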
Given the immense size of the WWW, high recall is effectively out of reach; good precision is the key to successful searches of web-based documents. Boolean modifiers are therefore an important tool for WWW users, because Boolean and proximity modifiers generally increase precision and decrease recall (with the exception of the Boolean OR, which has the opposite effect).
At present most search engine interfaces are command based, a form that Shneiderman (1998) calls "unstructured text". As such, they require some notation or syntax to specify relationships between words when users develop complex queries. Much work in the field of Human-Computer Interaction (HCI) has sought to replace complex command-based syntax with graphically based direct manipulation (e.g., the development philosophy behind the Xerox Star; see Johnson et al., 1989). On the other hand, command-line input provides a direct and efficient means of accomplishing tasks on the computer, once the syntax is known and remembered (Preece, 1994). What is a good compromise between these two alternatives?
The answer to that will depend on the tool being developed and may often involve allowing novice users to use a menu-based system or a drag-and-drop graphical interface. More advanced users should be allowed to use shortcuts or a command-line interface for entering their commands rapidly and directly, perhaps even bypassing much of the feedback and error checking that should be supplied to novice users (Shneiderman, 1998).
Most present-day search engines offer both a bare single text-box entry as their frontpage or 'simple' search engine interface and a more elaborate interface as the 'advanced' or 'professional' one. For example, the frontpage Lycos search engine (http://www.lycos.com/) offers a single text-entry box that allows only 20 characters to be visible. Based on the short length of the input box and the lack of any onscreen instructions (on that page), this interface is apparently intended to be used for one or two word queries and to discourage the use of any modifiers.
The advanced Lycos search interface (http://lycospro.lycos.com/), on the other hand, provides a text-entry box that allows 55 characters to be visible and has a pop-up menu that specifies the relationship among the entered words (such as OR via "Any of the words"). In addition, radio buttons let the user limit a search to particular types of content, fields (titles or URLs), languages and so on.
Given the immense size of the document collection formed by the World Wide Web and Internet newsgroups (Zakon, 1998), it is essential to reexamine the design of search engine interfaces from a usability standpoint to make them more effective for the typical user.
According to Shneiderman (1998), designers have five types of interaction styles from which to choose when designing a user interface:
- command language
- menu selection
- form fill-in
- natural language
- direct manipulation (graphical/visual)
Each of these has its respective advantages and disadvantages, and all should be considered when designing usable IR interfaces. As Shneiderman (1998) points out, users of a command-line interface actively initiate an operation, because they must recall the proper notation and enter their command directly. Menu-based interfaces place users in a more passive role, because they choose from among a set of circumscribed options. Form fill-in likewise presents passive options to a user, but is perhaps even more constraining than a menu-based interface, as each form input box accepts only one type of data and gives the user no alternatives.
As for the remaining two styles, some search engines, such as Altavista (http://www.altavista.com/), have implemented natural language interfaces (Tomaiuolo and Packer, 1998), but studies of user interest in this format remain to be done. Lastly, much research is currently underway on ways to implement effective graphical displays for IR (Shneiderman, 1998; Hendry and Harper, 1997; Eberts, 1994; A. Dillon, pers. comm.).
One of the great disadvantages of command-based input is that a novice or casual user is unaware of the command options available, either because he has never seen them or because he has forgotten them through infrequent use (Shneiderman, 1998). The situation is made worse when search engines require users to follow a hyperlink to a separate page for search help, since many of these help pages are difficult to navigate and poorly organized (Sherman, 1998).
HCI research has demonstrated that computer users typically do not read instructions except as a last resort, preferring to learn by experimenting with the interface (Farkus, 1998; Redish, 1998; Penrose and Seiford, 1988). If this holds for the use of Boolean and proximity modifiers in search engines, then user testing should be performed to determine which types of "advanced" search interfaces help novice users become aware of their options and help more experienced users remember how to formulate relationships between words.
Nielsen (1993) identifies an additional category of user between novice and expert: the casual user, who uses a system intermittently. For the casual user, system memorability is the most important design consideration. Intermittent users should be able to quickly discern how to perform tasks through a combination of past experience and onscreen cues.
In general, an increase in the learnability of an interface will concomitantly increase memorability, though Nielsen (1993) points out that they are separate concepts. However, for a single-task interface such as a search engine, easily learnable interfaces would also be highly memorable ones.
Therefore, it is my contention that search engine interface designers should aim to create instantly learnable interfaces that require minimal use of help screens, minimize errors and help users formulate complex search queries that increase the precision of their searches.
To this end, I conducted a questionnaire survey and a user-testing study to address the following four questions:
Is the use of WWW search engines casual or infrequent relative to other computer programs?
Do search engine users know how to write Boolean-modified searches in their preferred search engine from memory?
Which types of search engine interfaces are more intuitive or instantly learnable for constructing Boolean-modified searches: command-line, menu-aided or form fill-in?
Based on their exposure to these various types of search engine interfaces and on their own past experience, which of these types of search interfaces do users prefer?
Materials and Methods
The experimental design involved an initial questionnaire given to 25 subjects prior to any testing, followed by user-testing of four different types of search engine interfaces with 12 of the initial 25 subjects and, lastly, a post-test questionnaire that asked the test subjects to rank the test interfaces.
The experiments were done with two user groups: graduate students in the School of Library and Information Science (SLIS) and graduate students in the Biology Department at Indiana University at Bloomington.
As a prerequisite for entering the graduate program, SLIS students are required to take a course in computer-based skills (or to demonstrate equivalent preexisting skills), in which they are trained to use Boolean and proximity operators in computer-based searches. Other course work in the SLIS curriculum builds on this knowledge. Thus, I chose SLIS students as representative of experienced users of search engine interfaces in general and of Boolean-modified searches in particular.
All the Biology graduate students in this study had general computer skills and had used many kinds of common software programs, such as email programs, web browsers, word processors and spreadsheet programs. However, none of them had received formal training in how to use Boolean operators. Consequently, I chose the Biology students as more representative of general computer users untrained in formulating Boolean-modified searches.
An initial questionnaire was given to 12 SLIS students and 13 Biology students to test whether they were representative of these two types of user groups and to evaluate the first two questions listed in the Introduction, namely:
Is the use of WWW search engines casual or infrequent relative to other computer programs?
Do search engine users know how to write Boolean-modified searches in their preferred search engine from memory?
As the questionnaire was unique to this situation, no previously designed and tested questionnaire, such as the QUIS (Chin et al., 1988), was available (to my knowledge) for adaptation. To make the language of my self-designed questionnaire as unambiguous as possible and to provide response options useful for later scoring, the questionnaire was pilot tested by my HCI professor, Dr. Andrew Dillon, modified according to his suggestions, then pilot tested again on a fellow SLIS student and modified again based on his suggestions.
Most of the questionnaires were filled out in my presence, so there was little opportunity for subjects to "cheat" on the question that asked them to write from memory the syntax they would use to formulate a Boolean-modified search (Question 6 of Questionnaire 1).
Testing the Search Engine Interfaces
Four different WWW search engine interfaces were created for evaluation. The web-based forms were written in Perl 5.004 as a multi-part CGI script, using the Perl CGI.pm module (Stein, 1998).
All input from the forms was saved to a file so that each user's input could be checked for errors. User testing was done in the IU Bloomington School of Library and Information Science Usability Lab. Each subject's screen actions were recorded by feeding the computer's display output directly into a time-stamping VCR, so that the time required for input could be measured and particular points of user difficulty could be reexamined.
Interface A uses four text input boxes that are connected by a pop-up menu with the options AND, OR, NOT or NEXT TO. Interface A was not based upon any existing search engine interface known to me. (However, see the Discussion for comparison to a similar engine.) It will be referred to as the "connector-menu interface".
Interface B has four fill-in text boxes, which are consecutively labeled: "All the words (AND):", "Any of the words (OR):", "Should NOT Have:" and "PHRASE:". This interface is similar to the "advanced" engine of Excite (http://www.excite.com/search/options.html?a-opt-t), though that engine uses different wording to describe each form fill-in box. It will be referred to as the "form fill-in interface".
Interface C has only a single text box and deliberately mimics the typical search engine interface. It will be referred to as the "command line interface".
Interface D has a single text box in conjunction with a single pop-up menu that lets the user select one of four options: "All of the words", "Any of the words", "Exact phrase", and "Boolean phrase". This interface is based directly on the HotBot search engine interface (http://www.hotbot.com/) and uses the same wording for the menu choices (though HotBot offers more menu choices than my test interface). It will be referred to as the "single-menu interface".
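The test forms themselves were written in Perl with CGI.pm; the Python sketch below is not that code. It only illustrates, under stated assumptions, how a back end might translate the structured input of Interface B (four labeled boxes) and Interface D (one box plus a menu choice) into a single command-line query. The target syntax ("+" required, "-" excluded, bare word optional, quotes for a phrase) is an assumption modeled on the "+"/"-" notation this paper attributes to Altavista and Yahoo!, and the function names and menu keys are hypothetical.

```python
def form_to_query(all_words: str = "", any_words: str = "",
                  not_words: str = "", phrase: str = "") -> str:
    """Translate Interface B-style form fields into one command-line query.

    Assumed target syntax: +word = required, -word = excluded,
    bare word = optional (OR-like), "..." = exact phrase.
    """
    parts: list[str] = []
    parts += [f"+{w}" for w in all_words.split()]   # "All the words (AND)"
    parts += any_words.split()                      # "Any of the words (OR)"
    parts += [f"-{w}" for w in not_words.split()]   # "Should NOT Have"
    if phrase.split():                              # "PHRASE", whitespace normalized
        parts.append('"' + " ".join(phrase.split()) + '"')
    return " ".join(parts)


def menu_to_query(text: str, option: str) -> str:
    """Translate an Interface D-style entry (one text box plus one menu).

    `option` is a hypothetical internal key for the menu choice:
    "all", "any", "exact" or "boolean".
    """
    words = text.split()
    if option == "all":
        return " ".join(f"+{w}" for w in words)
    if option == "any":
        return " ".join(words)
    if option == "exact":
        return '"' + " ".join(words) + '"'
    if option == "boolean":
        return " ".join(words)  # left for a separate Boolean parser to interpret
    raise ValueError(f"unknown menu option: {option!r}")


print(form_to_query(all_words="ruby slippers", not_words="Dorothy"))
print(menu_to_query("blue suede shoes", "exact"))
```

The first call prints `+ruby +slippers -Dorothy` and the second prints `"blue suede shoes"`. In both interfaces the user never types the symbolic notation; the form or menu supplies it.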
Five SLIS students and seven Biology students (all of whom had completed the pre-test questionnaire) were each tested on all four search engines. Before beginning, the test subjects were asked whether they understood how to use a Windows-based PC and a two-button mouse. All subjects said that they did.
In addition, they were asked whether they were familiar with "Boolean-type modifiers such as AND, OR, NOT and PHRASE". Most subjects required only minimal reminding or instruction that "AND (for a two word search) implies that both words are required, OR implies that either or both can be included, NOT implies that a word should not be included and that PHRASE implies both AND and that the words need to be adjacent to each other".
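The four definitions given to subjects can be stated precisely. The Python sketch below, an illustration rather than anything used in the study, evaluates each modifier against a single document; the tokenizer, the function names and the toy documents are hypothetical. It also shows why PHRASE is stricter than AND: a document can contain all of the words without containing them adjacently and in order.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is ignored (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_and(doc: str, words: list[str]) -> bool:
    """AND: every word must occur somewhere in the document."""
    present = set(tokens(doc))
    return all(w.lower() in present for w in words)


def matches_or(doc: str, words: list[str]) -> bool:
    """OR: at least one of the words must occur."""
    present = set(tokens(doc))
    return any(w.lower() in present for w in words)


def matches_not(doc: str, words: list[str]) -> bool:
    """NOT: none of the words may occur."""
    return not matches_or(doc, words)


def matches_phrase(doc: str, phrase: str) -> bool:
    """PHRASE: like AND, but the words must also be adjacent and in order."""
    seq, target = tokens(doc), tokens(phrase)
    n = len(target)
    return any(seq[i:i + n] == target for i in range(len(seq) - n + 1))


# Hypothetical toy documents, invented for illustration.
DOCS = {
    1: "Ruby slippers on display at the museum",
    2: "Dorothy clicked the heels of her ruby slippers",
    3: "A pair of blue suede shoes",
    4: "Blue shoes made of suede",
}

ruby_not_dorothy = [d for d, t in DOCS.items()
                    if matches_and(t, ["ruby", "slippers"]) and matches_not(t, ["Dorothy"])]
all_three = [d for d, t in DOCS.items() if matches_and(t, ["blue", "suede", "shoes"])]
phrase = [d for d, t in DOCS.items() if matches_phrase(t, "blue suede shoes")]
print(ruby_not_dorothy, all_three, phrase)  # [1] [3, 4] [3]
```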
Two Biology students needed a few minutes of additional instruction in the meaning of these modifiers, which were explained as defining a relationship between two words; for one subject, Venn diagrams were used to demonstrate the difference between Boolean AND and Boolean OR.
Each subject went through four screens of each of the four interfaces, for a total of sixteen rounds of input. Subjects did not receive any output (i.e., retrieved search items); instead, each input screen was followed by another input screen until the last screen was reached.
All subjects were told that their input was being recorded by the computer to a separate file, that the computer screen was being recorded via videotape and that the whole process was being timed. Subjects were informed that they should not rush on account of being timed, but that they should approach each search engine like they would under normal circumstances ("you want to do it efficiently, but you also want to get it right").
Subjects were informed that the test was meant to ascertain which interface was the easiest or most intuitive to understand and use for Boolean-modified searches. Each subject was told that it was not s/he that was being tested, but the four interfaces relative to one another (as emphasized by Sweeney et al., 1993).
Each interface presented the subject with a direct instruction specifying which words to input and the relationship to be set up between them. For each interface, four queries were made, requiring the subject to implement 1) a Boolean AND relationship, 2) an OR relationship, 3) an AND with NOT (three-word) relationship and 4) a PHRASE. Each screen of each interface presented one of the following types of instruction:
Input a search entry to find all documents that contain the word computer and the word mouse.
Input a search entry to find all documents that contain the word open or the word closed.
Input a search entry to find all documents that contain the word ruby and the word slippers but not the word Dorothy.
Input a search entry to find all documents that contain the phrase blue suede shoes.
All interfaces presented the queries in the same order as above. However, each interface used a different set of words. To control for differences in the time needed to type the queries, all word sets in each group were of the same length and had the same number of lower-case and upper-case letters.
In addition, the tests were run over seven days, and on each day the word sets were shuffled with respect to each other and with respect to the interface on which they appeared, to control for any bias from one word set being easier to input than the others. The word sets used are shown here:
AND: computer mouse
AND with NOT: ruby slippers Dorothy; starship captain Kirk; baseball great Mantle; computer language Ada
PHRASE: string of pearls; poetry in motion; return to sender; blue suede shoes
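The length and case control described above can be checked mechanically. The Python sketch below uses the three-word and phrase sets listed above and confirms that every set in a group has the same total length and the same lower-case and upper-case letter counts. The seeded shuffle in `assign_for_day` is a hypothetical stand-in for the daily jumbling; the actual randomization procedure is not described, so treat it as an assumption.

```python
import random

# Word sets from the list above (three-word AND-with-NOT sets and phrase sets).
AND_NOT_SETS = ["ruby slippers Dorothy", "starship captain Kirk",
                "baseball great Mantle", "computer language Ada"]
PHRASE_SETS = ["string of pearls", "poetry in motion",
               "return to sender", "blue suede shoes"]
INTERFACES = ["A", "B", "C", "D"]


def signature(s: str) -> tuple[int, int, int]:
    """(length including spaces, lower-case letters, upper-case letters)."""
    return (len(s), sum(c.islower() for c in s), sum(c.isupper() for c in s))


def is_balanced(group: list[str]) -> bool:
    """True if every word set in the group has an identical signature."""
    return len({signature(s) for s in group}) == 1


def assign_for_day(group: list[str], day: int) -> dict[str, str]:
    """Hypothetical daily re-shuffle: map each interface to one word set.

    Seeding with the day number makes each day's assignment reproducible.
    """
    shuffled = list(group)
    random.Random(day).shuffle(shuffled)
    return dict(zip(INTERFACES, shuffled))


print(is_balanced(AND_NOT_SETS), is_balanced(PHRASE_SETS))  # True True
for day in range(1, 8):  # seven testing days
    print(day, assign_for_day(PHRASE_SETS, day))
```

Each three-word set has the signature (21, 18, 1) and each phrase set (16, 14, 0), so the check passes for both groups.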
The dependent variables of the test were 1) errors made in constructing the Boolean-modified search and 2) the time taken to fill out and submit each search interface screen. Errors were classified into two types, significant and minor; examples and descriptions of both types are given in the Results. No spelling or typographical errors were made; any such errors would have been ignored.
At the bottom of each interface screen the notice "*SCROLL DOWN FOR INSTRUCTIONS*" appeared. Each interface was supplied with instructions for use on the same page as the input form, but they were not visible with the text entry boxes in view. Instructions were made as similar as possible in wording and length (see each interface demo for each of the instructions provided: Interface A, Interface B, Interface C, Interface D).
Subjects were told that use of the instructions was optional; they were informed that they could look at them whenever they wanted if the method of query input was ambiguous or confusing, but that the instructions could be skipped if the query entry method of the interface appeared to be intuitive or obvious. The number of times each subject scrolled down to read the instructions (and on which queries of which interfaces) was recorded; in addition, the time required to read the instructions is also reflected in the total time required to formulate and submit the query.
Using the time-stamped videotaped recordings, the time from the appearance of each search engine on the screen to the time the subject pressed the Submit button was recorded. Frame number (30 frames/second) was used to measure times to half-second accuracy.
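The timing arithmetic is simple: elapsed time is (submit frame − start frame) / 30 seconds, reported to the nearest half-second. The Python sketch below writes this out; the function name and the integer frame-number inputs are assumptions for illustration. It rounds half up explicitly because Python's built-in `round` uses banker's rounding on exact ties.

```python
import math

FPS = 30  # frames per second, as stated in the text


def elapsed_seconds(start_frame: int, submit_frame: int, fps: int = FPS) -> float:
    """Time from a screen's appearance to Submit, rounded to the nearest 0.5 s."""
    if submit_frame < start_frame:
        raise ValueError("submit frame precedes start frame")
    seconds = (submit_frame - start_frame) / fps
    # Round half up to the nearest half-second.
    return math.floor(seconds * 2 + 0.5) / 2


print(elapsed_seconds(0, 45))     # 1.5  (45 frames = exactly 1.5 s)
print(elapsed_seconds(100, 200))  # 3.5  (100 frames = 3.33 s)
```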
Immediately following the user testing, each subject filled out a four-question follow-up questionnaire (see Questionnaire 2 for the full text) in which they ranked the search engine interfaces from best to worst on the basis of four subjective criteria:
- Personal preference
- The order in which the subject would recommend the interfaces to other casual or intermittent users of search engines
- The order in which the subject would recommend the interfaces to other frequent users of search engines
- Ease of learning
The question content for the follow-up questionnaire was discussed with my HCI professor, Dr. Andrew Dillon, but no pilot study was done on the questionnaire. No subjects reported any confusion about how to interpret any of the questions.
The initial questionnaire was used to determine how frequently subjects use a computer for any task, how frequently they use the three software programs they most often use, how frequently they use WWW search engines, how often they use Boolean-type modifiers when using WWW search engines and, lastly, whether or not they know from memory how to formulate a simple Boolean-modified search on their preferred search engine that would find documents with the word "piano" but not the word "Steinway." The intent of this questionnaire was threefold:
- to determine whether search engines in general, and Boolean-modified searches in particular, can be considered to be of infrequent or "casual" use, in Nielsen's (1993) sense of the word;
- to determine whether Boolean-modifier syntax is memorable, given the variations in syntax required by popular search engines;
- to determine whether there is a significant difference between the SLIS subjects and Biology student subjects in use of search engines, use of Boolean modifiers and knowledge of Boolean syntax for their preferred search engine.
Tables 1A and 1B show that the median value of search engine use by SLIS students is once a day, while for Biology students it is a few times per week. The overall distribution suggests that SLIS students use search engines more frequently than Biology students do. At the extremes, two Biology students reported using search engines very infrequently (less than a few times per month), while two SLIS students reported using them many times every day.
For the SLIS students, search engine use was on average slightly more frequent than use of the third most used (user-specified) software program, but not as frequent as use of the second most used program. In contrast, search engine use by Biology students was on average less frequent than use of any of their three most used software programs.
In addition, the estimated average use of Boolean operators in search-engine-mediated searches for SLIS students was around 50% of the time, while for Biology students it was somewhat less than that (Table 1C). One Biology student reported that s/he never uses Boolean modifiers, while one SLIS student reported using them in every search s/he performs. There appears to be a difference in the mean between the SLIS and the Biology students, but the distribution around the mean is approximately the same for each group.
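The rough averages above can be reproduced from the response counts in Table 1C if each answer category is assumed to stand for the midpoint of its stated range (100% for "Every time", 75% for 60%-90%, 50% for about half, 30% for 20%-40%, 10% for 5%-15%, 0% for "Never"). The midpoint assumption is mine, not the questionnaire's, so the Python sketch below gives approximate figures only.

```python
# Response counts read from Table 1C (SLIS n = 12, Biology n = 13).
COUNTS = {
    "SLIS":    {"every time": 1, "most": 2, "half": 4,
                "occasionally": 4, "rarely": 1, "never": 0},
    "Biology": {"every time": 0, "most": 2, "half": 1,
                "occasionally": 5, "rarely": 4, "never": 1},
}

# ASSUMPTION: each category stands for the midpoint of its range, in percent.
MIDPOINT = {"every time": 100, "most": 75, "half": 50,
            "occasionally": 30, "rarely": 10, "never": 0}


def estimated_mean_use(counts: dict[str, int]) -> float:
    """Weighted mean of category midpoints, in percent of searches."""
    n = sum(counts.values())
    return sum(MIDPOINT[cat] * k for cat, k in counts.items()) / n


for group, counts in COUNTS.items():
    print(f"{group}: ~{estimated_mean_use(counts):.0f}% of searches")
# SLIS: ~48% of searches
# Biology: ~30% of searches
```

Under this assumption the SLIS estimate (580/12, about 48%) matches "around 50%", and the Biology estimate (390/13 = 30%) is "somewhat less".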
While the terms "casual" or "intermittent" are ambiguous and certainly relative to each situation, it appears that many people could be called intermittent users of search engines and of Boolean modifiers; a smaller number could be considered to be frequent users and a few are rare or non-users. Thus, designing search engine interfaces for quick learnability and/or memorability appears to be warranted.
A more definitive measure of the need for memorability lies in the question of whether subjects could write from memory a Boolean-modified query for "piano but not Steinway" in the proper notation of their preferred search engine. Ten of the 13 Biology students said that they did not know or did not remember how to do it (Table 1C). Three Biology students attempted to write it out, but only one was able to do so correctly. The two who made errors used "not" or "and not" for search engines that require "+" and "-" notation (Altavista and Yahoo!).
In contrast, among the SLIS students, only one person claimed not to know or remember. Of the remaining 11 subjects, seven wrote a correct search query, but four wrote it incorrectly, again making the mistake of using "not" instead of "-" for Altavista or Yahoo!. 
Four conclusions can be drawn from these results:
- Nielsen's (1993) usability criterion of memorability for casual use applies to a large proportion of search engine users;
- Many users untrained in IR searching do not know how to construct simple Boolean-modified search queries on current WWW search engines;
- Users who have experience with writing Boolean-modified queries overwhelmingly adopt the use of "not" over "-" notation (and presumably of "and" over "+", but this was not tested directly);
- There were differences in pre-existing knowledge and experience with search engines between the Biology students and the SLIS students in this study.
Table 1A. SLIS Students: Computer/Software/Search Interface Use
Table 1B. Biology Students: Computer/Software/Search Interface Use
Estimated frequency of use of a computer (for any task) and of the three most often used (user-specified) programs are compared to the estimated frequency of use of WWW search engines of 12 SLIS students (Table 1A) and 13 Biology students (Table 1B). Tally marks ('1') represent each response.
Table 1C. Estimated Use of Boolean Modifiers and Test of Memorability of Boolean Syntax
Use of Boolean Modifiers                    SLIS Students   Biology Students
Every time                                  1               -
Most of the time (60%-90%)                  11              11
~Half the time                              1111            1
Occasionally (20%-40%)                      1111            11111
Rarely (5%-15%)                             1               1111
Never                                       -               1
Totals                                      12              13
Wrote correct Boolean search query (BSQ)    7               1
Made error in BSQ                           4               2
Didn't know/remember BSQ                    1               10
Estimated frequency of use of Boolean-type modifiers when using search engines. Tally marks ('1') represent each response.
Number of respondents who wrote a search query for "piano not Steinway" for their own choice of search engine correctly or incorrectly and number of respondents who didn't know or remember how to write the query. (BSQ = Boolean Search Query)
Testing the Search Engine Interfaces
Subject performance on each interface was measured in two ways: time required to write and submit each input query and errors made in the submitted query. Figure 1A shows the average time required to complete all four screens of each interface for all subjects combined, SLIS students as a group and Biology students as a group. As expected, the SLIS students completed the query submissions faster than the Biology students did.
In terms of time taken to complete each query, the Biology students performed best on the connector-menu interface (Interface A) and worst on the command-line entry interface (Interface C). The SLIS students, on the other hand, performed best on Interface C and worst on the single-menu interface (Interface D). The order of performance (from best to worst) for Biology students was A, B, D, C, while for SLIS students it was C, B, A, D. For all averages combined, the order of performance was A, B, C, D.
Fig. 1B shows the same data ordered by search interface instead of user group. This view shows that the performance on Interface A was nearly identical in both user groups, but performance differed more widely on the other interfaces.
A. The average time of completion of all four screens for each interface is shown organized by subjects: by all subjects, by SLIS subjects and by Biology subjects. The table below the bar graph shows the actual average time in seconds.
B. The average time of completion of all four screens for each interface is shown organized by interface.
An important factor in this analysis is errors made. Though the distinction is somewhat arbitrary, errors were categorized into significant and minor classes. Minor errors are defined as queries that would (almost certainly) be understood by any search engine query parser but are less efficient than the optimal solution shown in the instructions. Queries that differed significantly from the format given in the on-screen instructions were classed as significant errors. A well-written query parser could interpret some of these, but for the purposes of this study they were defined as significant.
Examples of significant errors:
Interface 'A': used AND to connect phrase words; used quotes for a phrase and put all words in the phrase in a single box
Interface 'B': none were made;
Interface 'C': AND and OR and NOT used instead of "+" and "-"; no quotation marks were used for the phrase;
Interface 'D': typed in Boolean operators ("and", etc.) with "All the words" menu option; used quotation marks for a literal phrase with the "Boolean phrase" menu option.
Examples of minor errors:
Interface 'A': none were made;
Interface 'B': used a Boolean operator (AND or OR or NOT) in the text boxes; used quotation marks in the PHRASE box;
Interface 'C': no spaces put between words separated by '+' or '-';
Interface 'D': used "Boolean phrase" menu option (with proper Boolean operators) instead of "All the words" or "Any of the words" for the AND query and OR query, respectively; used quotation marks for a phrase with the "exact phrase" menu option.
The number and distribution of significant and minor errors are shown in Tables 2A and 2B, respectively. Many significant errors were made in engines C and D, while performance on interfaces A and B was relatively error-free. The only type of query in which errors were made on interface A was with the phrase query. In these cases, subjects did not understand that the "NEXT TO" connector should be used. The only errors that occurred with engine B were additions of a Boolean operator (such as "and") to words in the form fill-in boxes.
All the significant errors committed on the engine C interface involved the use of Boolean word operators ("and", "or", "not") instead of "+" and "-". The majority of SLIS students and one Biology student made this error; in every case, the on-screen instructions were not consulted by the student making this error.
Every person who used the correct notation had consulted the instructions. This is an important finding. It suggests that symbolic notation is not intuitive to users and that search engines with command-line interfaces should, therefore, either avoid symbolic "+" and "-" notation or accept both Boolean word operators and symbolic notation.
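One way to follow the second recommendation is to normalize both notations into the same internal form before searching. The Python sketch below is hypothetical: the function name, the required/optional/excluded output structure and the left-to-right operator handling are assumptions, not the behavior of any real search engine. It accepts AND, OR, NOT and AND NOT case-insensitively alongside "+" and "-", and keeps quoted phrases intact, so that "piano and not Steinway" and "+piano -Steinway" produce the same query.

```python
import re

# A quoted phrase, or any run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def normalize(query: str) -> dict[str, list[str]]:
    """Split a query into required, optional and excluded terms.

    Accepts symbolic (+word, -word) and word-operator (AND, OR, NOT,
    AND NOT) notation, case-insensitively, read left to right.
    """
    result: dict[str, list[str]] = {"required": [], "optional": [], "excluded": []}
    pending = None  # operator waiting for its right-hand term
    last = None     # (bucket, term) of the previous term
    for tok in TOKEN.findall(query):
        op = tok.upper()
        if op in ("AND", "OR", "NOT"):
            pending = "NOT" if (op == "NOT" and pending == "AND") else op
            # The left-hand operand of AND/NOT must itself be required.
            if op in ("AND", "NOT") and last and last[0] == "optional":
                result["optional"].remove(last[1])
                result["required"].append(last[1])
                last = ("required", last[1])
            continue
        if tok.startswith("+") and len(tok) > 1:
            bucket, term = "required", tok[1:]
        elif tok.startswith("-") and len(tok) > 1:
            bucket, term = "excluded", tok[1:]
        elif pending == "AND":
            bucket, term = "required", tok
        elif pending == "NOT":
            bucket, term = "excluded", tok
        else:
            bucket, term = "optional", tok
        result[bucket].append(term)
        last = (bucket, term)
        pending = None
    return result


print(normalize("piano and not Steinway"))
print(normalize("+piano -Steinway"))
```

Both calls print `{'required': ['piano'], 'optional': [], 'excluded': ['Steinway']}`. One trade-off of this design is that the literal words "and", "or" and "not" become operators unless the user quotes them.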
Lastly, proper use of interface D was problematic for a few students, particularly two Biology students who used the "All the words" menu option for every query, but typed in Boolean word operators. Two other students used the "All the words" option for the query that required use of the "Boolean phrase" option (because a NOT modifier was necessary). The last significant error was use of the "Boolean phrase" option instead of "exact phrase" for the PHRASE query.
The fact that two out of seven Biology students had significant problems with Interface D indicates that a significant portion of novice or casual users may have difficulty with Interface D. The most common minor error occurred in Interface D (mostly among the SLIS students) when subjects used the "Boolean phrase" menu option instead of "All the words" or "Any of the words", but they formulated a correct Boolean query.
These errors are also instructive. The option "All the words" and "Any of the words" are not intuitive to some users and the wording of the option "Boolean phrase" may confuse some into thinking that it should be used instead of "exact phrase" for phrase queries. For some novice users, "Boolean phrase" was not an intuitive choice even though they wrote a query with Boolean modifiers.
The menu options in D were similar to the form fill-in options in B, but almost no errors were made with the latter interface. Three factors may account for this. First, the wording of engine B was more explicit. The first choice in B was "All of the words (AND):", whereas in D it was simply "All the words." The inclusion of the "(AND)" may have made its meaning clearer. Second, all of the options in B were visible, whereas in D they are hidden in a pop-up menu.
A follow-up study could be done with use of radio buttons in a D-type interface to test whether this was an important factor. Third, the addition of the "Boolean phrase" option in D may have made the other options less understandable due to their overlap in use. For example, some subjects may have thought that because a "Boolean" option was explicitly included, the other options were meant something different than they did in search engine B. A verbal protocol analysis would help to understand what users who have difficulty with this are thinking when they interact with it.
Another measure of user performance and of the intuitiveness of the interfaces is use of the online instructions. Table 3 shows that Interface A's instructions were accessed the least, while the help screens on the other interfaces were viewed two to five times as often as Interface A's. SLIS students tended to view the instructions much less frequently, which may account for the larger number of errors they made, especially on Interface C (see Table 2).
The data on help screen usage and errors clearly favor interfaces A and B as the most intuitive or easily understandable of the four examined in this study. It remained to find out which of the search engine interfaces subjects preferred after using them.
Table 2A. Number of "Significant" Errors Per Interface
Table 2B. Number of "Minor" Errors Per Interface
The number of errors each subject made on each engine interface is shown. The first five columns under SLIS Students and the first seven columns under Biology Students give the number of times each individual subject made an error on each engine interface. Totals by group are shown to the right of the individual tallies, and the combined totals of both groups are shown in the right-most column. The mean error rate per student (M) is shown in parentheses to the left of the totals. See text for discussion of what constituted a "significant error" vs. a "minor error".
Table 3. Number of Times the Help Screen was Accessed
The number of times each subject accessed the instructions is shown. The first five columns under SLIS Students and the first seven columns under Biology Students give the number of times each individual subject viewed the help screen of each engine interface. Totals by group are shown to the right of the individual tallies, and the combined totals of both groups are shown in the right-most column. The mean rate of access per student (M) is shown in parentheses to the left of the totals.
Subjects were asked to rank the interfaces from best to worst according to four criteria: 1) personal preference, 2) suggested use for other people who are casual or intermittent users, 3) suggested use for other people who are frequent users and 4) ease of learning. The mean rankings for the four questions, averaged separately over the SLIS students and the Biology students and over both groups jointly, are shown in Figure 2.
The subjects' rankings were inverted for presentation, such that in Figure 2 the higher the value, the more preferable the interface was according to the specific criterion. The highest possible ranking is 4 and the lowest is 1. On the question of personal preference, the 5 SLIS subjects gave mixed reviews, with Interfaces A and D receiving approximately equally strong mean ratings of 2.8 and 3, respectively. Interface C was their least preferred interface; its mean score of 1.8 was about 60% of the scores for Interfaces A and D.
The 7 Biology subjects gave very strong approval to Interface A (~3.3), strong support to D (~2.9) as well, and strong disapproval to Interface C (~1.4). Thus, the Biology students, who are in general less experienced with, and less formally trained in, Boolean-modified queries than the SLIS students, preferred both Interfaces A and D over the command line interface by roughly two to one. Whether the Biology students' preference for A over D is significant awaits a formal statistical analysis and, perhaps, a larger number of test subjects.
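The inversion used for Figure 2 presumably amounts to score = 5 - rank when four interfaces are ranked, so a first-place ranking becomes 4 and a last-place ranking becomes 1; averaging these scores per interface gives the plotted values. A minimal sketch, with hypothetical function names and the assumption that each subject's rankings are stored as a mapping from interface letter to rank:

```python
from statistics import fmean


def inverted_score(rank: int, n_interfaces: int = 4) -> int:
    """Map a rank (1 = best) to a score where higher is better (n = best)."""
    if not 1 <= rank <= n_interfaces:
        raise ValueError(f"rank must be between 1 and {n_interfaces}")
    return n_interfaces + 1 - rank


def mean_scores(rankings: list[dict[str, int]]) -> dict[str, float]:
    """Average the inverted scores for each interface over all subjects."""
    interfaces = rankings[0].keys()
    return {i: fmean(inverted_score(r[i]) for r in rankings) for i in interfaces}
```

The same function can be applied to each group separately or to both groups pooled, which corresponds to the separate and joint averages described above.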
Figure 2. The average rankings of the four subjective criteria in the post-test questionnaire, organized by subject group. The highest possible rating is 4 and the lowest is 1.
In Figures 3A-D, a scatter plot is shown of the personal preference ranking (from 1 to 4) of each interface vs. the product of the relative ranking of overall search engine use and frequency of use of Boolean notation, as estimated by each subject on the pre-test questionnaire. This product was calculated by assigning each response level on the pre-test questionnaire a value from 1 to 6 (frequent use to infrequent use) and then multiplying the two values (see the full text of Questionnaire 1). Thus, the product ranges from 1 (subject uses search engines many times every day and uses Boolean notation every time) to 36 (subject rarely uses search engines and never uses Boolean notation). As before, with the personal preference ranking of the interfaces, 4 was highest approval and 1 was lowest approval.
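The imputed values and their product can be written out directly. This sketch uses the value tables given in the Figure 3 caption; the dictionary and function names are hypothetical.

```python
# Imputed values from the Figure 3 caption (1 = most frequent, 6 = least).
SE_USE = {
    "Many times every day": 1,
    "Once a day": 2,
    "A few times a week": 3,
    "Once a week": 4,
    "A couple times a month": 5,
    "Rarely": 6,
}
BN_USE = {
    "Every time": 1,
    "Most of the time (~60-90%)": 2,
    "About half the time": 3,
    "Occasionally (~20-40%)": 4,
    "Rarely (~5-15%)": 5,
    "Never": 6,
}


def experience_product(se_answer: str, bn_answer: str) -> int:
    """Product of imputed SE-use and BN-use values: 1 (heaviest) to 36 (lightest)."""
    return SE_USE[se_answer] * BN_USE[bn_answer]
```

Because the score is a product, different answer combinations can share a value (2 × 3 and 3 × 2 both give 6), and some values between 1 and 36, such as 7, can never occur, so the scale orders subjects only coarsely by combined experience.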
These scatter plots are meant to test whether frequency of use of search engines and Boolean notation is correlated with interface preference. In most cases, any correlation that exists is weak, and a statistical analysis would be required to establish its strength. For Interface A, however, there does appear to be some correlation between infrequent prior use of search engines and Boolean notation and a stronger preference for the interface (Fig. 3A). Conversely, for Interface D, there may be a weak correlation between preference for it and frequent use of search engines and Boolean notation (Fig. 3D).
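One way to carry out the suggested analysis is Spearman's rank correlation, which suits ordinal data such as 1-4 rankings and the 1-36 product scale. The following is a minimal pure-Python sketch (function names hypothetical); it averages the ranks of tied values, which matters here because many subjects share the same rating. A library routine such as SciPy's `scipy.stats.spearmanr` would also report a p-value.

```python
from statistics import fmean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two samples' ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (sx * sy)
```

Since a higher product means less frequent use, a positive rho between the product and the rating of Interface A would indicate that less frequent users rated it more highly, the pattern suggested by Fig. 3A. With only twelve subjects, a significance test would be needed before drawing any conclusion.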
The other measures of the post-test questionnaire had results roughly similar to those seen for personal preference for both SLIS and Biology students (Fig. 2). SLIS subjects did not perceive much difference among the interfaces for those who might use search engines frequently vs. those who might use them infrequently, with the exception that Interface A seemed slightly more appropriate for frequent use rather than casual use and vice versa for Interface B (Fig. 2). All interfaces were ranked approximately equally learnable by SLIS students.
The Biology students ranked Interface A as the best interface on all four measures. The only strong difference among the measures was that Interface B was perceived to be more useful for casual users than for frequent users. Interfaces A and B were ranked as the most learnable.
Figure 3A-D. Scatter plots of the rating of each interface (as shown in Fig. 2) vs. the product of the imputed values of search engine (SE) use and Boolean notation (BN) use. The numerical values of SE use and BN use were taken from the initial questionnaire (Table 1) and assigned as follows:
Frequency of SE use | Imputed numerical value
Many times every day | 1
Once a day | 2
A few times a week | 3
Once a week | 4
A couple times a month | 5
Rarely | 6
Frequency of BN use | Imputed numerical value
Every time | 1
Most of the time (~60-90%) | 2
About half the time | 3
Occasionally (~20-40%) | 4
Rarely (~5-15%) | 5
Never | 6
Thus, the product of these two values ranges from 1 (use SEs many times every day and BN every time) to 36 (rarely use SEs and never use BN).
As before, the rating of the interfaces runs from 1 (= lowest approval rating) to 4 (= highest approval rating).
Nielsen (1993) defines the usability of computer systems in terms of five components: learnability, efficiency, memorability, errors (error rate) and satisfaction. When analyzed in terms of each of these attributes, Interface A emerges as the most usable of the four interfaces tested in this study.
The help screen of Interface A was consulted the fewest times of all the interfaces by both SLIS and Biology students (Table 3). Though the SLIS students ranked all four interfaces approximately equally easy (or difficult) to learn, the Biology students (who perhaps had more learning to do than the SLIS students) ranked Interface A the most learnable (Fig. 2).
Among the Biology subjects (generally novice users of Boolean-modified queries), overall time of completion of the four queries was shortest for Interface A (Fig. 1). For the more experienced SLIS students, Interface A took longer than B and C by 5 and 7.5 seconds, respectively. Whether these differences are statistically significant awaits formal analysis.
Use of Interface D was both slow and inefficient, as indicated by the high number (and nature) of minor errors (Table 2B). These errors typically involved choosing the less efficient "Boolean phrase" option, which requires the user to type the Boolean modifiers, instead of "All the words" or "Any of the words", which require only a list of words.
Nielsen's (1993) meaning of efficiency goes beyond simple time taken to finish a task. His definition states that for users who have learned the system a high level of productivity should be achievable. There is a potential problem here for Interface A. In terms of the number of mouse clicks required to enter the query "ruby and slippers not Dorothy", for example, Interface A requires five clicks, Interface B requires two, Interface C requires one, and Interface D requires two. Multiple uses of the mouse when entering words are much less efficient than using a command line interface (Olson and Olson, 1990).
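The cost of those extra clicks can be roughed out with the Keystroke-Level Model, the simplest member of the GOMS family discussed by Olson and Olson (1990). The sketch below uses commonly cited KLM operator times (H = 0.4 s to move a hand between keyboard and mouse, P = 1.1 s to point, 0.1 s per button press or release); these constants, and the assumption that every click is bracketed by a move to the mouse and back, are illustrative assumptions rather than measurements from this study. The sketch also ignores typing, so it does not credit Interface A for sparing users from typing the operators.

```python
# Assumed Keystroke-Level Model operator times in seconds (commonly cited
# textbook values, not measurements from this study).
HOME = 0.4   # H: move a hand between keyboard and mouse
POINT = 1.1  # P: point at a target with the mouse
CLICK = 0.2  # press and release a mouse button (0.1 s each)

# Clicks needed to enter "ruby and slippers not Dorothy", as counted in the text.
CLICKS = {"A": 5, "B": 2, "C": 1, "D": 2}


def mouse_overhead(clicks: int) -> float:
    """Estimated seconds of mouse work, assuming each click is preceded by a
    move to the mouse and followed by a move back to the keyboard."""
    return clicks * (HOME + POINT + CLICK + HOME)


if __name__ == "__main__":
    for name, n in CLICKS.items():
        print(f"Interface {name}: {n} click(s), about {mouse_overhead(n):.1f} s")
```

Under these assumptions the mouse overhead is about 2.1 s per click, or roughly 10.5 s for Interface A versus 2.1 s for Interface C on this one query.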
One of the Biology students who volunteered information as to why s/he had picked Interface A as his/her least favorite for personal use said that it required too many cursor jumps with the mouse or tab key and, therefore, did not flow well. My original hypothesis was that most users, especially experienced users, would not prefer Interface A for this very reason.
Many HCI researchers have cautioned about the difference between testing for learnability and for long-term usability, especially among skilled users (Nielsen, 1993; Stevens, 1983; A. Dillon, pers. comm.). An explicit interface, one that is "highly supportive" (Dillon, 1987) and has many on-screen cues, continuous error-checking, or (as with Interface A) a strongly predefined means of interaction, may become disruptive to users as they become more skillful and experienced with the interface.
Based upon initial exposure to the various types of interfaces tested here, users did not seem to draw a distinction between which are more appropriate for casual users and which would be better for frequent users. SLIS students ranked D as the best for both types and Biology students ranked A as the best for all measures. A longer term study may find that users have a different set of preferences after interacting with the interfaces for an extended period of time.
Memorability could be tested directly by having subjects return and retake the test after some period of time. This was not done in the present study. However, this attribute was tested indirectly for some of the interfaces, as all of the students had used typical search engines resembling C, and many had used the HotBot engine, which resembles D, and the Excite advanced search format, which resembles B. That A did nearly as well as B and C (in terms of overall time taken) with SLIS students, and better than the other interfaces with Biology students, suggests that it may be more "instantly learnable", such that memorability is automatically built in.
Search engine interfaces should be designed to be instantly learnable and instantly memorable. Shneiderman (1992) has remarked on the need for convenient and memorable interfaces:
Instead of feeling inadequate or foolish because they cannot remember a complex sequence of commands, [users] should complain to the designer who did not provide a more convenient mechanism, or should seek another product that does. (p. 35)
The interfaces that are easiest to learn may turn out to be the most memorable as well. A follow-up test with the same subjects using these interfaces would test this attribute of usability more explicitly.
In terms of "significant" errors made, Interface B was the best with zero errors, while Interface A had only 3, compared to 16 and 11 for C and D, respectively (Table 2A). However, the high number of errors committed on Interface C should be understood in the specific context of the rules set up for this test. Had the system been designed to allow natural word Boolean modifiers instead of the "+" and "-" syntax, the number of significant errors committed on Interface C would have dropped to one.
One study relevant to the present findings is that of Ledgard et al. (1980), who found that the syntax or command structure of a command-line interface has an impact on performance. These researchers tested two types of command structure: the first used symbolic notation, while the other used more natural keyword phrasing. For example, a command to replace all occurrences of "KO" with "OK" in the symbolic notation was defined as:
whereas the keyword notation required:
CHANGE ALL "KO" TO "OK"
The group that used the symbolic notation made two to three times as many errors and completed a smaller percentage of tasks than the group that used the keyword notation.
While the difference between "+" and "and" in search engines is considerably simpler than the complex notation in the Ledgard et al. study, it bears noting that people tend to think in natural language terms; symbolic notation is better left to interfaces that will be used frequently enough for it to become automatic to their typical users. In this study, even experienced users of search engines and those who had been trained in using Boolean operators tended to assume that natural language "and" and "not" keyword notation would be accepted.
Satisfaction was measured by asking subjects which interface they would prefer for personal use. Both the SLIS students and Biology students least preferred the command line interface (C). The Biology students most preferred Interface A (at a ratio of 2:1 over C). SLIS students ranked both A and D highly, with B a close third. At present it is unclear whether the differences in rankings by the SLIS students are statistically significant, but it is clear that they also preferred the more explicit interfaces over the command line interface.
The most usable interface
The connector-menu interface represented by A emerges as the most usable of the four interfaces tested. It was the most efficient to use both overall and for Biology students, and was relatively error-free. In addition, it was the most "instantly learnable" or intuitive, and probably, therefore, highly memorable as well. Lastly, it was the most preferred interface by Biology students and was highly regarded by SLIS students.
Interface B, while not a clear winner on any of these issues, also emerged as a highly usable interface under Nielsen's definition of usability.
Caveats to adoption of Interface A
While this study supports Interface A as the best search engine interface of the four tested, there are a number of problems with the connector-menu interface that need to be considered before implementing such an interface:
- One problem is that its text boxes are rather small; when long words are entered, or if the user enters a phrase in quotation marks into a single box, much of the input string will be hidden, which inhibits the searcher's visual checking for input errors and detracts from the interface's aesthetic appeal.
- A second potential problem is that because users can set their web browsers to various widths (depending on the size of the monitor and of the fonts), in many cases the separate input boxes will not fit on a single line. Decreasing the size of the browser window causes the right-hand text boxes to be positioned under the left-most text box(es). Under controlled laboratory conditions of constant browser width, I was able to avoid this problem in my user testing, but it may lead to confusion on the part of users and should be investigated further before a connector-menu interface is implemented.
- Only a set number of boxes is allowed (four, in the case of Interface A). This may confuse novice users if they have more words to input than there are boxes. On the other hand, having only two words to enter but four boxes available confused one subject in this study. During the test, s/he stopped and asked whether all four boxes should be filled in by entering the same two words in the latter boxes in a different order (which s/he did on the first screen, but not on subsequent screens). Not surprisingly, this subject ranked Interface A lowest on all four criteria in the post-test questionnaire.
However, these problems may be surmountable. The interface currently employed by the Livelink Pinstripe power search engine (a business news and finance oriented search engine available at http://pinstripe.opentext.com/search/power.html) uses a connector-menu interface with five longer text-entry boxes (up to 25 characters), each on its own line. Each box except the first has a connector menu to its left. The menu options are "and", "or", "but not", "near" and "followed by". This interface is very similar to Interface A.
Having the text entry boxes stacked upon one another, instead of linked together on the same line, avoids the problem of variable browser width and may relieve the confusion that some users might feel about whether all the boxes need to be filled in. This is a question ripe for further usability testing.
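The behavior argued for here, where unused boxes are simply ignored, is easy to state in code. The following sketch is a hypothetical model of a connector-menu form like Interface A: the function name, the AND/OR/NOT connector set, and the rule that a blank box is skipped together with its connector are assumptions for illustration, not a description of how Interface A or Pinstripe actually processed input.

```python
CONNECTORS = {"AND", "OR", "NOT"}  # assumed menu options


def assemble_query(first_term: str, rows: list[tuple[str, str]]) -> str:
    """Combine the first box and the (connector, term) rows into one query.

    Hypothetical model of a connector-menu form: every box after the first
    has a connector menu to its left. Blank boxes are skipped together with
    their connector, so users need not fill in every box.
    """
    parts = [first_term.strip()] if first_term.strip() else []
    for connector, term in rows:
        term = term.strip()
        if not term:
            continue  # unused box: ignore it and its connector
        op = connector.strip().upper()
        if op not in CONNECTORS:
            raise ValueError(f"unknown connector: {connector!r}")
        # Drop a joining connector when no term precedes it, but keep NOT.
        if parts or op == "NOT":
            parts.append(op)
        parts.append(term)
    return " ".join(parts)
```

For example, `assemble_query("ruby", [("and", "slippers"), ("not", "Dorothy"), ("and", "")])` returns `"ruby AND slippers NOT Dorothy"`; the empty fourth box changes nothing, which is the reassurance the confused subject described above was looking for.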
Consistency and Natural Language Queries
Shneiderman et al. (1997) emphasize that IR interfaces need to be both more explicit in their search options and more standardized among themselves. One caveat the authors note is that interface differences may stem from incompatible search modes; a probabilistic search mode cannot perform Boolean searches in the strict sense (Shneiderman et al., 1997). However, most unstructured text search engines allow the use of either true Boolean operators or symbolic "+" and "-" notation.
In situations where these sorts of modifiers are truly not allowed, that fact should be indicated on the front page of the IR interface. If alternative forms of modifiers are allowed (such as proximity or truncation expansion), then I suggest that a more explicit interface, together with visible instructions, would aid the user. Search engines that keep command line interfaces should allow, and properly parse, both symbolic notation and natural (Boolean) word syntax.
Another option that has received attention is to use natural language queries instead of Boolean-modified queries. The Altavista search engine has recently adopted this format. The technical feasibility of natural language searching has advanced considerably over the past two decades (Turtle, 1994; Lancaster, 1991).
Natural language queries often exploit syndetic relationships among words, whereas Boolean operators are generally used with simple free-text searching (Voorhees, 1998; Lancaster, 1991). As a result, natural language systems may scale to large databases even better than Boolean-modified queries (Turtle, 1994). The usability and acceptability of natural language systems vs. Boolean-aware systems still need to be compared.
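Accepting both notations does not require much machinery. The following sketch rewrites symbolic "+"/"-" prefixes and natural operator words into one word-operator form, so that "+ruby +slippers -Dorothy" and "ruby and slippers not Dorothy" reach the search back end as the same query. The function name, the AND/OR/NOT operator set, and the choice of AND as the default between adjacent bare terms are assumptions; engines of the period differed on that default, and this sketch does not handle parentheses, phrases, or searching for the word "not" itself.

```python
from typing import Optional

# Operator words accepted in natural (Boolean) word syntax.
OPERATORS = {"and": "AND", "or": "OR", "not": "NOT"}


def normalize_query(query: str, default_op: str = "AND") -> str:
    """Rewrite a query written with "+"/"-" prefixes or with the words
    and/or/not into a single upper-case word-operator form."""
    out: list[str] = []
    pending: Optional[str] = None  # operator word waiting for its next term
    for tok in query.split():
        word_op = OPERATORS.get(tok.lower())
        if word_op:
            pending = word_op
            continue
        sym_op = None
        if len(tok) > 1 and tok[0] in "+-":
            sym_op = "AND" if tok[0] == "+" else "NOT"
            tok = tok[1:]
        # A symbolic prefix wins; otherwise use the pending word or the default.
        op = sym_op or pending or default_op
        if out or op == "NOT":
            out.append(op)  # join to the previous term (or keep a leading NOT)
        out.append(tok)
        pending = None
    return " ".join(out)
```

Both `normalize_query("+ruby +slippers -Dorothy")` and `normalize_query("ruby and slippers not Dorothy")` return `"ruby AND slippers NOT Dorothy"`, so a subject who types either form would get the same result.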
Improving Search Engine Usability
To generate revenue, search engine companies typically rely on advertising dollars that are tied to measures of engine use (Reid, 1997). Thus, as search engines become more of a commodity in the information economy, their purveyors should be keenly aware of usability issues.
In discussing the stages of adoption of a technological tool, Norman (1998) has observed that the transition from early adopters to late adopters roughly coincides with the transition from a time when better technology drives the adoption of a device or tool to a time when greater usability drives sales and use. In the early marketing of search engines, better technology was emphasized as the selling point. For example, an advertisement for the Excite search engine that appears in the May 1996 issue of Wired Magazine states:
"[I]f you want to turn loose the real power of the Net, use Excite.(TM) It's the first concept-based navigation tool. Just type in the general area you're interested in and Excite actually has the intelligence to search through reams of information and bring you the good stuff. . . . So next time you have a choice between Yahoo! and Excite, just ask yourself if this year could be better than last year. And try something new." (p. 176).
Companies that develop and market search engines have generally focused on technical issues; for example, they have sought to improve the relevancy and recency of retrieved items and the number of items indexed (Xie et al., 1998). Search engine ratings have likewise tended to focus on these aspects. As the technical quality of search engines increases, differences among vendors will decrease and the user experience will come to dominate user preferences (Norman, 1998).
Xie et al. (1998) have recently published a 14-point list of customer expectations of search engines, which they gathered from an analysis of various search engine reviews and an informal internet survey. Three of the 14 items are potentially relevant to the present study: 1) that the layout is easy to understand on first impression; 2) that different search methods are available; and 3) that the keyword syntax used in searching is consistent.
In recent years, most search engines have made efforts to improve the user experience, by adding rankings of returned results, categorization of results into groups (e.g., Northern Light, http://www.northernlight.com/), adding natural language query options (e.g., Altavista) and providing more explicit advanced search interfaces.
However, most search engines provide a command line interface in their default or "simple" search mode. The front pages of Yahoo!, Altavista, Infoseek, Lycos and Excite all provide a single input box with no selectable items, such as radio buttons, check boxes or pop-up menus. The findings in this report suggest that search engine providers have gotten it backwards when it comes to the "simple search" command line interface and the more explicit "advanced search" interface.
A 1997 survey of search engine users found that 96% of respondents said that "ease of use" was a very important determinant of satisfaction with search engines ("Satisfied customers" at http://www.npd.com/corp/press/c_online4.htm). Shneiderman (1992) outlines the differences between users with different experience levels:
First time users need an overview to understand the range of services . . . plus buttons to select actions. Intermittent users need an orderly structure, familiar landmarks, reversibility, and safety during exploration. Frequent users demand shortcuts or macros to speed repeated tasks and extensive services to satisfy their varied needs. (qtd. in Shneiderman et al., 1997)
In the present study, less experienced users performed best on a connector-menu interface and preferred that interface. More experienced subjects performed best on the command line interface, but only as long as that interface allows natural language Boolean modifiers. At the same time, however, they generally preferred the more explicit interfaces and recommended them over the command line interface for both casual use and frequent use.
On the basis of these results, search engine companies should consider using a more explicit interface for their front-page "simple search". However, further testing should be done over a longer period of time to determine whether preferences among experienced users change.
In any case, connector-menu interfaces or form fill-in interfaces seem to be the most appropriate for casual, inexperienced users and were not rejected as disruptive by more experienced users in this study. While the number of test subjects in this study was small and the test results and conclusions are therefore of a tentative nature, the findings here are provocative enough to deserve attention and further study.
Bates, M.J. (1998). Indexing and access for digital libraries and the internet: human, database, and domain factors. JASIS 49(13): 1185-1205.
Chin, J.P., Diehl, V.A. and Norman, K.L. (1988). Development of a tool measuring user satisfaction of the human-computer interface. Proceedings of SIGCHI '88, (pp. 213-218), New York: ACM/SIGCHI. Available at: http://www.lap.umd.edu/LAPFolder/papers/cdn.html
Dillon, A. (1987). Knowledge acquisition and conceptual models: a cognitive analysis of the interface. In D. Diaper and R. Winder (Eds.), People and Computers III. pp. 371-379. Cambridge: Cambridge University Press.
Eberts, R. (1994). User Interface Design. Englewood Cliffs, NJ: Prentice-Hall.
Farkas, D.K. (1998). Layering as a safety net for minimalist design. In J.M. Carroll (Ed.), Minimalism beyond the Nurnberg Funnel (pp. 247-274). Cambridge, MA: MIT Press.
Ginchereau, W., Howell, F. and Mitchell, K. (1997). Too much information. InfoWorld (May 19): 72-82.
Harter, S. and Hert, C. (1997). Evaluation of information retrieval systems: approaches, issues and methods. Ann. Rev. Info. Science and Tech. (ARIST) 32: 3-94.
Hendry, D.G. and Harper, D.J. (1997). An informal information-seeking environment. J. Am. Society for Info. Science (JASIS) 48(11): 1036-1048.
Johnson, J., Roberts, T.L., Verplank, W., Smith, D.C., Irby, C.H., Beard, M. and Mackey, K. (1989). The Xerox Star: a retrospective. IEEE Computer 22: 11-29.
Lancaster, F.W. (1991). Indexing and abstracting in theory and practice. Champaign, IL: Univ. of Illinois Press.
Ledgard, H., Whiteside, J.A., Singer, A. and Seymour, W. (1980). The natural language of interactive systems. Communications of the ACM 23(10): 556-563.
Nielsen, J. (1993). Usability engineering. Cambridge, MA: Academic Press.
Norman, D. (1998). The invisible computer: why good products can fail, the personal computer is so complex, and information appliances are the solution. Cambridge, MA: The MIT Press.
Olson, J.R. and Olson, G.M. (1990). The growth of cognitive modeling in human-computer interaction since GOMS. Human-Computer Interaction 5: 221-265.
Penrose, J.M. and Seiford, L.M. (1988). Microcomputer users' preferences for software documentation: an analysis. J. of Technical Writing and Comm. 18: 355-366.
Preece, J. (1994). Human-Computer Interaction. Reading, MA: Addison-Wesley.
Redish, J. (1998). Minimalism in technical communications: some issues to consider. In J.M. Carroll (Ed.), Minimalism beyond the Nurnberg Funnel (pp. 219-245). Cambridge, MA: MIT Press.
Reid, R.H. (1997). Architects of the web: 1,000 days that built the future of business. New York: Wiley & Sons.
Satisfied customers could mean stiff competition for new search engines, says NPD report, NPD Online Research (NPD, New York, 1997). Available at: http://www.npd.com/corp/press/c_online4.htm. [Retrieved Dec. 1998]
Shackel, B. (1991). Usability—context, framework, definition, design and evaluation. In B. Shackel and S. Richardson (Eds.), Human Factors for Informatics Usability (pp. 21-38). Cambridge: Cambridge University Press.
Sherman, C. (1998). Search engine help: documentation and resources on the web. ONLINE (Nov/Dec): 51-56. Also available at: http://www.onlineinc.com/online.mag.
Shneiderman, B. (1998). Designing the user interface: strategies for effective human-computer interaction. 3rd Ed. Reading, MA: Addison-Wesley.
Shneiderman, B. (1992). Designing the user interface: strategies for effective human-computer interaction. 2nd Ed. Reading, MA: Addison-Wesley.
Shneiderman, B., Byrd, D. and Croft, W.B. (1997). Clarifying search: a user-interface framework for text searches. D-Lib Magazine, January 1997. Available online at: http://www.dlib.org/
Stein, L. (1998). Official guide to programming with CGI.pm: the standard for building web scripts. New York: Wiley.
Stevens, G.C. (1983). User Friendly Computer Systems?: a critical examination of the concept. Behaviour and Information Technology 2(4): 3-16.
Sweeney, M., Maguire, M. and Shackel, B. (1993). Evaluating user-computer interaction – a framework. Intl. J. of Man-Machine Studies 38: 689-711.
Tomaiuolo, N.G. and Packer, J. (1998). Maximizing relevant retrieval: keyword and natural language searching. ONLINE (Nov/Dec): 57-60.
Turtle, H. (1994). Natural language vs. Boolean query evaluation: a comparison of retrieval performance. SIGIR '94. Proceedings of the seventeenth annual international ACM-SIGIR conference on Research and development in information retrieval, pp. 212-220.
Tutorial: Guide to effective searching on the internet. Available at: http://thewebtools.com/searchgoodies/tutorial.htm. [Retrieved Nov. 1998.]
Voorhees, E.M. (1998). Using WordNet for text retrieval. In C. Fellbaum (Ed.), WordNet: an electronic lexical database (pp. 285-303). Cambridge, MA: MIT Press.
Xie, M., Wang, H. and Goh, T.H. (1998). Quality dimensions of internet search engines. Journal of Info. Science 24(5): 365-372.
Zakon, R.H. (1998). Hobbes' Internet Timeline v3.3. Available at: http://info.isoc.org/guest/zakon/Internet/History/HIT.html. [Retrieved: Nov. 1998]
Note 1: Some of the Biology subjects were post-doctoral fellowship researchers, but I use the term student for simplicity.
Note 2: One of the students who was counted as having done it correctly used "not" and said that s/he would use it on Dogpile and Altavista; as this is correct for Dogpile (though not for Altavista), the response was counted as "weakly correct".