

7 Web App Ideas


Here are some ideas for new web apps or features of existing services. As I don't have time to pursue them right now, I'm making them public, in the hope that someone will implement them or that one of them will be the missing piece in someone else's idea. I have to admit that I didn't do much "due diligence" to find out whether they exist already in one form or another, so if you know better, let me know in the comments below. Discussion also on news.yc.



1. Content-aware URL shortener

If a webpage is taken offline for one reason or another, the links pointing to it become broken. We can often use Google Cache, the Internet Archive and other similar services to look up an older version of the page that used to be there. Often, however, the content was simply moved elsewhere and is still available at a different address, in which case it would be useful to update the link to the new address automatically. URL shorteners could provide the service of finding a possible copy in the following way: when I shorten a URL, the service also indexes the linked page, storing the following information:

When someone clicks a shortened URL, one of the following happens:
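Whatever the exact rules turn out to be, here is a minimal sketch of the two halves: storing a fingerprint at shortening time, and falling back to a search when the original URL is dead. The search_web function stands in for whatever web search API would actually be used; it and the scoring details are assumptions for illustration only, not a worked-out design.

    # Sketch of the indexing step, run when a URL is shortened.
    # search_web() is hypothetical; everything else is the Python standard library.
    import hashlib
    import re
    import urllib.request

    def fingerprint(url):
        """Fetch the page and store enough information to recognize a moved copy later."""
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
        text = re.sub(r"<[^>]+>", " ", html)              # crude tag stripping
        words = re.findall(r"\w+", text.lower())
        title = re.search(r"<title>(.*?)</title>", html, re.S)
        return {
            "url": url,
            "title": title.group(1).strip() if title else "",
            "hash": hashlib.sha1(" ".join(words).encode()).hexdigest(),  # exact-copy check
            "rare_terms": sorted(set(words), key=words.count)[:20],      # crude "rare terms" list
        }

    def resolve(record):
        """On click: if the original URL is dead, look for a copy using the stored fingerprint."""
        try:
            urllib.request.urlopen(record["url"])
            return record["url"]
        except Exception:
            # search_web() is assumed: any search API returning candidate URLs for a query
            candidates = search_web(record["title"] + " " + " ".join(record["rare_terms"][:5]))
            return candidates[0] if candidates else None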

2. Site search that suggests frequent phrases

A site search engine that can be installed on the server (or offered as a hosted service) to index all the public documents of a website and provide the usual text box for searching the site. It would also offer autocomplete on search queries, as Google does. In Google's autocomplete (suggest) feature, however, it is frequent queries that are suggested (queries previously entered by you or by other users). For smaller sites the number of previous queries might be too small to make this useful, and sharing query lists among different sites doesn't help either if the sites are on different topics.

So instead of suggesting frequent queries, the idea is to suggest phrases that frequently appear in the documents themselves. This not only assists searching but gives the user an idea of what is on the site and lets him/her explore and browse the site in a novel (and possibly quite strange) way. The list of phrases is built while the documents are indexed, and it needs some thought to get right. The phrases should be representative of each document, so they would likely be combinations of words that are otherwise rare but appear more than once in the given document. This is something like the SIP (statistically improbable phrases) feature on Amazon.
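As a rough illustration, a phrase could be scored by how frequent it is in one document relative to how common it is across the whole site; the exact weighting below is just an assumption to make the idea concrete.

    # Sketch of picking "statistically improbable phrases" for one document:
    # phrases frequent in the document but rare across the whole site.
    from collections import Counter

    def ngrams(words, n):
        return [" ".join(words[i:i+n]) for i in range(len(words) - n + 1)]

    def improbable_phrases(doc_words, site_phrase_counts, total_docs, n=3, top=10):
        doc_counts = Counter(ngrams(doc_words, n))
        def score(phrase):
            # frequency in this document divided by (smoothed) frequency on the whole site
            return doc_counts[phrase] / (1 + site_phrase_counts.get(phrase, 0) / total_docs)
        return sorted(doc_counts, key=score, reverse=True)[:top]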

On the other hand, the phrases should be intuitively close to what users would actually search for. For extra usability, as the user starts typing the query, a drop-down would display possible continuations, together with the number of documents in which the given text appears, sorted by the number of matching documents or alphabetically. As an added bonus, this would train users not to ask natural language questions as queries, but to think of the search query as a rough approximation of the document to be found, which is really what it is.
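The lookup side could then be as simple as a prefix match over the collected phrases; the data layout here (a mapping from phrase to document count) is only an assumption for illustration.

    # Sketch of the autocomplete lookup: given the phrases collected at indexing time
    # (phrase -> number of documents containing it), suggest continuations of a prefix.
    def suggest(prefix, phrase_doc_counts, limit=8, alphabetical=False):
        matches = [(p, c) for p, c in phrase_doc_counts.items() if p.startswith(prefix)]
        if alphabetical:
            matches.sort(key=lambda pc: pc[0])
        else:
            matches.sort(key=lambda pc: pc[1], reverse=True)   # most matching documents first
        return matches[:limit]

    # Example:
    # suggest("stat", {"statistically improbable phrases": 3, "static site": 12})
    # -> [("static site", 12), ("statistically improbable phrases", 3)]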

3. Web search spell checker

Whenever I type in a language other than my native one (like I do now in English), I worry that something I write is incorrect. I don't just mean the misspelling of individual words, be it accidental like "teh" for "the" or something I learned wrong, like "aproximation" instead of "approximation". These kinds of errors are caught quite reliably by current spell-checkers, although in ambiguous cases an understanding of the context would often be needed to pick the correct choice.

What I worry more about are grammatical mistakes, and formulations that are grammatically correct but just sound like something a native speaker would never say. A special case of the latter is when I try to use common phrases, famous sayings/proverbs, or refer to some meme or (pop-)cultural reference: a group of words that is always used in the same way, like "How I learned to stop worrying and ...". I wouldn't really write that, but you get the idea.

For clear-cut grammar mistakes, such as "Mike does has a car", the rule-based grammar checkers of more complex editors often do the job, but they often fail as well. In any case, they cannot claim to contain all the correct forms of all phrases, memes, cultural references, etc. Even grammar "errors" are often just subtle ambiguities which could be correct in some context.

As an example, suppose I want to write "as they say, a picture is worth a thousand words", but I'm not sure whether the saying really goes "a picture..." or "an image..." and whether it is "thousand" or "million".

What I invariably end up doing is constantly switching between the editor and a browser with Google, checking all my alternatives or dubious formulations to see which is used more often by people who know the language better. In the first example, if I really weren't sure, I would type "does has a car" and "does have a car" (including the quotes) and see how many hits there are according to Google. Actually the method is a bit more subtle: I would almost subconsciously judge the source and the context as well. If the phrase appears in the New York Times, I would immediately accept it; if it appears in some random forum, surrounded by misspellings, I would still be doubtful.

In the second example I would perhaps search "is worth a" and see the correct version come up. As an added bonus, I'd get some background information from the Wikipedia article. From the other results I would get a confirmation from the surrounding text that the sentence is indeed used as I expected it to be.
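The core of this manual routine is just "compare the hit counts of the quoted alternatives". A minimal sketch, assuming a hypothetical hit_count(query) wrapper around whatever search API or local corpus is available:

    # hit_count() is hypothetical: it should return the number of results
    # for a (quoted) query from some large corpus or web search API.
    def pick_phrasing(alternatives):
        """Return the alternative with the most quoted-phrase hits, plus all counts."""
        counts = {alt: hit_count('"%s"' % alt) for alt in alternatives}
        return max(counts, key=counts.get), counts

    # Example:
    # pick_phrasing(["a picture is worth a thousand words",
    #                "an image is worth a million words"])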

Switching to Google to search for various groups of words has become a reflex for me, and it was nice to discover that others often use the exact same method. It actually makes sense: if the meaning of a word is indeed its use, as Wittgenstein claimed, what better way of establishing the correct meaning of some words than checking their use in a large corpus of text, like the whole internet?

A possible application of the idea would be a spell-checker built into the editor that consults Google in the background, giving suggestions about possible errors or weird phrases and speeding up the whole process. One would not have to manually switch to search mode and back, breaking the train of thought and ending up reading random articles on Wikipedia. Also, some of the thinking behind what to search for and how to interpret the results would be taken over by this smart spell checker. Some approaches:

In summary, the idea is just to improve existing spell checkers and create smarter ones by making use of the vast amount of text that is available in many languages on the internet, and our ability to search it quickly.

4. Share widget that tracks influence

This is an embeddable widget performing a similar task as DuckDuckGo's community page (link). Users who want to share the link get it with a unique parameter appended to it, so people who click the link automatically "vote" for the person who shared it. The leaderboard provides an incentive to be active in sharing the page. On DuckDuckGo this is mostly for users of Twitter, but it could become a feature of all trackback widgets. At a minimum, the list of pages or blogs linking to a page could be ordered not only by time but also by the amount of traffic they actually send to the page. This would result in a leaderboard of backlinks, which would give an incentive to share a link and write a description around it that encourages people to actually click it.
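A minimal sketch of the mechanics, with in-memory storage and made-up names purely for illustration: each sharer gets a unique token appended to the URL, and clicks carrying that token are credited to them.

    import uuid
    from collections import Counter
    from urllib.parse import urlencode

    share_tokens = {}          # token -> sharer id
    click_counts = Counter()   # sharer id -> clicks driven

    def make_share_link(page_url, sharer_id):
        token = uuid.uuid4().hex[:8]
        share_tokens[token] = sharer_id
        # assumes page_url has no query string yet; a real widget would merge parameters
        return page_url + "?" + urlencode({"ref": token})

    def record_click(ref_token):
        sharer = share_tokens.get(ref_token)
        if sharer is not None:
            click_counts[sharer] += 1

    def leaderboard(top=10):
        return click_counts.most_common(top)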

Although this idea needs further refining, it is quite obvious that the current "share" and "bookmark" buttons are broken. Everyone seems to add them just because everyone seems to add them, but there is not much incentive for visitors to actually use them and the analytics are poor. With this leaderboard-type widget, some of the analytics would be given back to the users. It could be useful for companies, for example, to see who their most active customers are and who drives the most traffic to their site, while at the same time there would be an interesting competition among customers for this title if the widget actually worked as intended. It might not end up exactly like this, but it is still interesting to watch what will replace the existing fad of social buttons, which are, at least in my experience, mostly ignored by visitors.

An unrelated idea would be to show the social bookmark widget only for the service from which the user came to the site. It is annoying for me to see all the "share on Facebook" widgets if I am not using Facebook, and unpaid advertising for Facebook is not necessarily what the site owners had in mind. So the button could be for one service only, or for the group of services that the visitor has used previously. This can be discovered with a browser cache trick that has been explained in many blog posts. However, this targeting might be annoying to some users, so it may be difficult to get exactly right.

5. Website traffic visualizer

Real-time web analytics services have appeared recently with lots of features and complex dashboard-type controls, but I am thinking more of an entertaining visualization (to be used as a screensaver or on a large display in a lobby) than of serious analytics. One approach would be to show the website as a directed graph with nodes and edges nicely laid out and labeled. For static sites, the pages would be the nodes and the links the edges. For dynamic web pages, user-defined events could be the nodes and the sequences that lead from one event to another the edges.

In both cases, the visualization would be just a nice graph on which bright impulses travel along the arrows from one node to the other, as visitors view one page after the other or perform specified tasks on the page. This would give a visual overview of what the users are doing on the site right now: where they spend more time, where they are stuck, where most of the activity is, which parts of the site are not explored, etc.
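The data behind such an animation is simple: a stream of pageview events turned into per-edge transition counts. A rough in-memory sketch (names and structure are illustrative, not a design):

    from collections import Counter

    last_page = {}             # visitor id -> last page seen
    edge_traffic = Counter()   # (from_page, to_page) -> number of transitions

    def record_pageview(visitor_id, page):
        prev = last_page.get(visitor_id)
        if prev is not None and prev != page:
            edge_traffic[(prev, page)] += 1   # an "impulse" travels along this edge
        last_page[visitor_id] = page

    def hottest_edges(top=10):
        """Edges to draw brightest in the live graph."""
        return edge_traffic.most_common(top)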

This idea is quite obvious, but I still don't know of an easy-to-use service or open-source package that achieves it.

6. Automated website optimizer

My knowledge of so-called A/B testing comes from the blogosphere only, I haven't seriously experimented with it, but my understanding is this: say you want to know which line works better on your website, "BUY XYZ, IT'S ONLY $10" or "BUY XYZ TODAY FOR ONLY $10", so you display one or the other randomly to your visitors, measure which version leads to a higher percentage of users actually buying, and then go with that version.
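In code, the plain version is nothing more than this (a sketch with in-memory counters and no statistics on top):

    import random

    stats = {"A": {"shown": 0, "bought": 0}, "B": {"shown": 0, "bought": 0}}

    def choose_variant():
        v = random.choice(["A", "B"])   # every visitor gets a coin flip
        stats[v]["shown"] += 1
        return v

    def record_purchase(variant):
        stats[variant]["bought"] += 1

    def conversion_rates():
        return {v: s["bought"] / s["shown"] if s["shown"] else 0.0 for v, s in stats.items()}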

However, there is a lot of information about users that is ignored here, such as the geographical region they are from, the kind of browser and system configuration they use, the way they found the site, etc. All these could be correlated with their various preferences, so they should be factored in when deciding which version to display to them.

Obviously, customization of websites has been done for ages: many sites display in a different language according to your location or browser settings, some adapt to your browser version or screen resolution, etc. But these variations are designed by humans. It would be fairly easy to design an engine that learns subtle relationships between the user parameters and the preferences. It could be, for example, that users from the US prefer "BUY XYZ, IT'S ONLY $10", whereas users from the UK prefer the other version. Or it could be that users with IE6 prefer one version over the other, which might be explained by users on an old browser being less familiar with technology. The idea, however, is that it's difficult to come up with all these explanations yourself, so the system would just take all the variants of the page that the designer has made, automatically generate hypotheses, and test them out. Looking at the results, the site owner could learn interesting things about the users and get better performance out of the website.

In effect, this would be a nice machine learning task: we have the inputs x(i), the user parameters, and the outputs y(i), encoding the preference for one page variant or the other. The preference would be computed from various other metrics (clickthrough rate, time spent on page, etc.), according to the goal of the website owner. The goal is then to learn a functional relationship from x to y. Additionally, this could be cast as an active learning problem: the system decides what to show to each visitor not only to maximize the goals based on our best current knowledge, but also in order to learn fast (try out new hypotheses). There is a classic trade-off here between exploitation (showing the variant thought to be the best) and exploration (trying to learn more). This active learning twist is important: it could reduce the necessary sample size, and without it we might need too many visitors to learn anything, so testing could not be used on lower-traffic websites.
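One simple way to realize the exploration/exploitation trade-off is an epsilon-greedy rule over per-context statistics. The context features and the 10% exploration rate below are arbitrary illustrations; a real system would likely use something more refined, such as Thompson sampling or a learned model over the features.

    import random
    from collections import defaultdict

    EPSILON = 0.1
    # context -> variant -> [times shown, times converted]
    stats = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})

    def choose_variant(context):
        """context is e.g. ("US", "IE6", "search"); epsilon-greedy choice per context."""
        if random.random() < EPSILON:
            return random.choice(["A", "B"])                 # explore
        s = stats[context]
        def rate(v):
            shown, converted = s[v]
            return converted / shown if shown else 0.0
        return max(["A", "B"], key=rate)                     # exploit the best-known variant

    def record_outcome(context, variant, converted):
        s = stats[context][variant]
        s[0] += 1
        s[1] += int(converted)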

Such techniques have most probably been applied by companies such as Google, Yahoo, Amazon, etc.; however, I see no reason why they cannot be made available in a custom web analytics and website optimization package, other than being a bit harder to explain to users. Or are they available already?

7. LaTeX resume builder

One can easily create quite professional-looking resumes using LaTeX; however, the difficulty of setting it up, finding a suitable template and editing the source is still too challenging for non-technical users. They will probably just stick with Word, but they will often still find the LaTeX-produced resumes better looking, and they would prefer them if the entry barrier were lower. It would be useful, then, to have a web app where you can choose from a couple of templates, with sensible defaults and easy controls to change settings, a few text boxes to fill with the actual information, and a nicely formatted, ready-to-print resume coming out as a PDF at the click of a button, with LaTeX running only on the server. This could be a smaller feature of a professional social network site, or of some service centered around job searching.
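The server side could be little more than substituting the submitted fields into a template and running pdflatex; the template placeholders and field names below are made up for illustration.

    import os
    import subprocess
    import tempfile
    from string import Template

    def build_resume(template_path, fields):
        """fields is e.g. {'name': 'Jane Doe', 'email': 'jane@example.com', ...};
        the (hypothetical) template uses $name, $email, ... placeholders."""
        with open(template_path) as f:
            tex = Template(f.read()).safe_substitute(fields)
        workdir = tempfile.mkdtemp()
        tex_file = os.path.join(workdir, "resume.tex")
        with open(tex_file, "w") as f:
            f.write(tex)
        subprocess.run(["pdflatex", "-interaction=nonstopmode",
                        "-output-directory", workdir, tex_file], check=True)
        return os.path.join(workdir, "resume.pdf")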







(c) 2010 László Kozma (Lkozma@gmail.com)