Back to my homepage


Reverse autocomplete



This is a proposal for a small improvement on the autocomplete feature, commonly found in many applications. To my knowledge, it hasn't been described or implemented elsewhere. If you know otherwise, or have some comments or suggestions, drop me a line to the address below. [note]

1. Introduction
2. Problem
3. Solution
4. Implementation



1. Introduction

Autocomplete is a well-known user interface feature implemented in many different kinds of software:

  1. Web browsers complete the address typed in the address field with entries from the browsing history
  2. Word processors (Word, Openoffice.org) have an autocomplete option which suggests previously entered words to speed up typing
  3. Programming editors (Vim, Emacs, Eclipse) can autocomplete function names, member variables or other text depending on the current context
  4. Most input boxes on web pages (except those for passwords) have autocomplete
  5. In command line terminals or interpreted programming environments (Matlab, R, etc.) it is possible to autocomplete the typed command or the directory/file name (usually with TAB or ESC key)
  6. In Google Suggest, as soon as I start typing, a drop-down list appears with popular queries that match with what I typed so far
  7. When I enter the first few letters of an email address or name of the person, most email clients (both Desktop and web-based) offer to complete the rest of the address
  8. many other examples...


There are many variations in the implementation details of this feature:


2. Problem

While there are many different approaches in designing an autocomplete system, I believe most existing systems use either a simple set/list when there are only a small number of elements, or a variant of the prefix tree (trie), or Radix tree data structures for many elements. They dynamically fill up this structure with elements, which are then matched against the text that appears to the left of the cursor.

One drawback, in my opinion, of existing implementations (including all of those mentioned above) is that they don't look at the location of the cursor and the text to the right of the cursor. Of course, when the cursor is at the rightmost position, which is the case most of the time, there is nothing to look at. In some cases however, the user is going back to make some correction. In this case, autocomplete behaves in various ways in the implementations I've seen, but none of them feels really natural.

Let's see some examples of this behaviour:


3. Solution

The solution is to simply take whatever method there was to search for the text before the cursor, and use it to match the text after the cursor as well. If a prefix tree is used, it is convenient to store the text to the right in reversed form for easier search. Most of the time the text after the cursor will be empty, so there won't be much overhead. When the cursor moves back, simply match the text both before and after the cursor, and only do autocomplete if something matches with both sides.

Example:

"United | America"


4. Implementation


Traditional autocomplete:



Bidirectional autocomplete:



Bidirectional autocomplete, also appending at both ends [note]:

 
 
Enter names of American presidents (autocomplete is done on the whole name)...
Try for example "Theodore Roosevelt", then change "Theodore" to "Franklin".


This example implementation in javascript is based on AutoSuggest v1.0. I modified the code to perform the reverse search as well. Both the original and the modified version are under LGPL. For the cursor location code I modified this script. As this is just a user interface proof-of-concept, I don't recommend using the javascript code as such, as it is not more than a quick hack.

Text left and right of the cursor is shown for more clarity. Note that I didn't use any of the complex data structures mentioned, all we have is a simple array, as can be seen from the source.

Notes

1.

It was pointed out that the idea is not new, but still it is surprising that as of now, it does not exist in mainstream applications. Zsh performs the kind of validation of paths that was mentioned earlier. Emacs does allow for backward/forward matching. And of course, I did not claim that matching with wildcards was a new idea, obviously this is only a subset of regular expression matching. The idea is only to use the cursor location and the claim that if there is already a mechanism for autocompleting in the traditional way, it can be extended to look forward from the cursor, without any overhead. Matching in a more general context is discussed for example here.

2.

Many people have pointed out that in order to be useful, an autocomplete system has to be able to append both to the beginning and the end of the text, regardless of the cursor position. In this way "States|of" should also match "United States of America", the same holds for "t|f" for example. This feature is mostly independent of the original idea to take cursor position into account. One possible issue is that this way sometimes too many suggestions are offered, especially for short strings. An other issue is that if we allow to expand the string at both the beginning, the end and at cursor location, a simple structure such as prefix tree can not be used for matching, which could be a problem for large databases. Arbitrary matching based on regular expressions can be done of course, but it is computationally more expensive. Additionally, fuzzy matching can also be allowed, to correct misspellings, as Google Suggest does.

3.

I appreciate all the comments and suggestions on reddit and news.yc. In particular, "zem" and "yxhuvud" on reddit pointed me to the "gaddag" data structure aimed at solving a similar problem when filling in crossword puzzles or playing scrabble.

4.

EDIT: 2010

As of now, Google has a new autocomplete behavior on their main search page which is more advanced than before and includes pretty much all that is suggested in this write-up, and more. For example the "Italia| football team" example works as expected (it seems that not the cursor location itself is used, but perhaps the difference between subsequent queries). Since July 2008 the write-up has been viewed close to half a million times and it has been covered on prominent usability blogs. I have sent the link in a suggestion to the Google Suggest mailing list, but did not receive a response. It is perhaps not too far-fetched to imagine that someone at Google might have picked up the idea from this post. If that was indeed the case, I have nothing against it obviously, that's why I wrote it in the first place, although I would have been glad to receive a message about it for some geek credit :)


(c) 2008. László Kozma (LKozma@gmail.com)