Distributed karma [more ideas]
an idea for fixing recommendation/reputation systems
This write-up refers to systems where users votes on material created/submitted by other members (comments, links), such as reddit or digg. Therefore it doesn't necessarily apply to movie/book recommendation systems, etc. What I refer to mostly is the online reputation of the users, rather than the recommendation methods themselves.
EDIT: as since the first writing I have learned more about recommendation systems and data mining in general, I am slightly embarrased by some of the claims I made in this text. I'll leave it up anyway for historical reasons and to remind me not to be so verbose in the future. You have been warned, continue at own risk :)
Vote-based commenting systems, forums, news aggregators have become widely popular, and are considered prominent examples of the web 2.0 phenomenon. The main assumption is that by collecting the opinions of a large number of people, one can somehow distill information that is meaningful for the individual. ("crowd wisdom")
The system works surprisingly well for a small community of people, who share similar interests. It is efficient in removing spam and obnoxious comments/submissions, and promoting valuable material.
When one tries to scale such a recommendation system, several problems arise:
- As the community grows, the quality of the average opinion declines. This doesn't necessarily imply that most people are stupid. As users see their opinions having smaller and smaller effect, they spend less effort in making educated decisions and taking part in quality discussion.
- As there are more and more users, the average user cannot remember a significant portion of the community, and the chance of finding material created by someone familiar becomes very small. There's a much smaller chance for influential people to emerge. Newcomers don't respect the established hierarchies, there aren't any expert voices. ("Eternal September")
- As the community becomes more diverse, the standard deviation from the average opinion becomes larger, and one can hardly identify with it anymore.
- It is a small minority of the whole community who votes, and this minority is not necessarily the most knowledgeable, etc. Even if everyone votes, expert opinions aren't given any weight, opposing opinions cancel each other out. We end up having the average review of anything on the internet '3 stars out of 5'.[1]
- Users can easily game the system, by creating multiple identities (sockpuppets), voting and commenting their own submissions, etc.
Towards a solution:
1. Karma.
A first idea would be to have a score of how reputable a user is (karma), then let the karma influence the weight each vote of the person carries. If the votes themselves generate karma for others, this would result in something similar to the Pagerank algorithm of Google. In that case, a web page can 'vote' another page by having a link to it, and the value of the vote depends roughly on how many votes(links) the page itself has. On a reddit-like system, the more votes a person has gathered with his own comments and submissions, the more influence his own votes could carry.The problems with this approach are the same as the problems of Pagerank. Users beat the system by accumulating large number of karma-points, by creating sockpuppets and voting their own submissions, by spending insane amounts of time on the forum, or by commenting in a way that agrees with the majority opinion, in order to attract votes. The bigger the community, and the lower the quality of discussion, the easier it is to beg for karma in such a fashion. In the case of reddit, although votes don't influence karma, agreeing with the common view on controversial political/religious/social issues, almost certainly results in huge number of votes...
Another problem with this approach is that people are different, and have different interests/opinions. Someone can be widely popular, it doesn't guarantee that I want to give weight to his opinion on a certain question, or have his submitted story on top of my reading list. We need a more personalized recommendation system that captures people's networks of trust and influence in a similar way to real-life relationships. Then each user would have a different view of the forum/news aggregator, with items ordered according to his (assumed) preference.
2. Recommendations.
Recommendations can be based on many things. A news aggregator could recommend stories that contain keywords similar to my interests. It could show stories that were liked by people who have voted similarly with me in the past. However, the safest bet is to recommend me a story/comment that was created/submitted by someone I trust. A news aggregator knows I trust someone if I voted on his stuff in the past. The more often I did, the more I trust him. This is how real-world recommendations work too.In my recommendation list, order submissions according to how many times I've voted up that user previously. The problem with this approach is that it requires a lot of voting to collect sufficient data to train a filter. If I don't spend all my time reading submissions, I will miss a new user who has great stories and since I never voted him, I don't even see his submissions.[2] The solution is to let trust propagate. The assumption here is that if I have up-voted someone's submissions/opinions, there's a good chance that I will also like stuff that he himself voted up.
3. Distributed karma.
We need a directed graph of all users, with links showing the amount of trust a user has towards content created by another user. To estimate this trust, we need to know how many times a user has voted for content created by the other. This will carry the largest influence. Furthermore, we need to know how much the other user is trusted by people I have voted for in the past. When calculating this trust, we need to consider the votes along paths from me to the other user, with smaller weights given to votes that are further form me.
Taking the level of trust into account when ordering recommended stories would solve most of the problems outlined before. Even if one user goes crazy up/down-voting certain articles, it doesn't affect my view, since I probably never trusted him. The difficult part is to find efficient data structures to implement a system like this.
Even though the graph is relatively sparse, it is probably computationally too expensive to calculate trust levels and update them in real-time for every vote. If we did so, for every vote, we would need to 'boost' the trust level the user has towards the other, and also make sure that this boost is propagated backwards to users who trust the one who just voted.
In a different method, we could use a pre-calculated table with all the trust levels between every possible pair of users, and update it periodically (at least daily) by a separate process. For 10,000 users, such a table would have 100 M entries, which would probably be impractical to store. Since most users either didn't vote much (they are new or inactive), or nobody voted them (spammers, trolls, new or inactive users, etc.) and since there is a small number of users who are trusted by a lot of people, it would be sufficient to cache the trust table entries for just a portion of the total user-base. For efficiency reasons, users who log in frequently should also have their trust-tables cached.
So what happens when I log in to view recommended stories. If my trust table is cached, every candidate story is weighted with the level of trust I have towards the user who submitted the story. If my table is not cached, or the user who submitted the story is not present in my table, the following search is performed:
1. Did I ever vote for that user?
2. Did anyone for whom I voted have the trust value cached for that user?
...
This way my trust towards that particular user is evaluated, and his story weighted accordingly. The nice thing is that it can be decided how much of the trust-table is cached or how much calculation can we tolerate. It's a direct trade-off between storage space and computation time. With smart caching, most of the fields needed should be pre-computed. We can also decide to what length it is practical to follow voting-chains. This also depends on the decay-function we use to evaluate trust. If there's absolutely no time to finish the computation, the user's absolute karma[3] can be used in some way as a short-cut. All cached values should have an expiry time of one day, after which a separate process needs to recalculate them, because subsequent voting might have influenced the levels of trust. With this method we still need to store all the votes each user has given.
Studying the structure of the graph would make it easy to identify influential users, groups of spammers, cliques with a certain interest, etc. It would be particularly interesting for a user to have access to his section of the trust-graph ("which users I trust the most", "who trusts me the most", etc.).
Notes:
[1] In a story told by Feynman, nobody is permitted to see the Emperor of China. To find out the length of the nose (of the Emperor) someone goes all over the country, asking people what they think the length is, in the end averaging the results.
[2] in fact, this fear of missing important material is probably the biggest obstacle for the adoption of recommendation systems. People prefer browsing through largely dull material, if they know it's the same for everyone else. This could account for the continuing popularity of TV and newspapers.
[3] an overall karma (total nr. of votes, or some other function) still needs to be calculated for overall ranking or other vanity-reasons