There are plenty of nifty search engines that don’t begin with “Goo” and end with “gle,” as Wired points out. But one site they forgot to include is MrTaggy, which was created by PARC’s Augmented Social Cognition Area.
Unlike other engines, this one doesn’t index the content of web pages. Instead, it uses PARC’s TagSearch algorithm, which aggregates and sorts the user-generated tags added to social bookmarking sites like Delicious. From there, users can give a thumbs up or down to each result. The goal: be part search engine, part recommendation engine by tapping the wisdom of the crowd.
BBG asked the ASCA researchers to connect the dots between PARC’s earlier forays into search and MrTaggy. Here’s what Ed Chi, Manager of ASCA, shared with us:
First, one of the most efficient ways of browsing and navigating toward a desired information space was illustrated by the pioneering research on Scatter/Gather, a collaborative project on large-scale document space navigation between amazing researchers such as Doug Cutting (of Lucene and Hadoop fame) and Jan Pedersen (chief scientist for search at AltaVista, Yahoo, and Microsoft).
The research, done in the early to mid-90s, showed how a textual clustering algorithm can be used to quickly divide up an information space (the scatter step) and then ask the user to specify which subspaces they’re interested in (the gather step). By iterating over this process, one can very quickly narrow down to just the subset of information items one is interested in. Think of it as playing 20 questions with the computer.
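The scatter and gather steps can be sketched in a few lines of Python. This is only a toy illustration, not the original Scatter/Gather implementation: the function names (`scatter`, `summarize`, `gather`), the bag-of-words vectors, the farthest-first seeding and the k-means-style loop are all assumptions made here for the example.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def scatter(docs, k, rounds=5):
    """Scatter step: split docs into at most k clusters (toy k-means)."""
    if not docs:
        return []
    vecs = [vectorize(d) for d in docs]
    # Assumption: farthest-first seeding, chosen only to keep the toy deterministic.
    seeds = [0]
    while len(seeds) < min(k, len(vecs)):
        candidates = (i for i in range(len(vecs)) if i not in seeds)
        seeds.append(
            min(candidates, key=lambda i: max(cosine(vecs[i], vecs[s]) for s in seeds))
        )
    centroids = [Counter(vecs[s]) for s in seeds]
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for i, v in enumerate(vecs):
            best = max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            groups[best].append(i)
        # Recompute each centroid as the summed term counts of its members.
        for c, members in enumerate(groups):
            if members:
                centroids[c] = sum((vecs[i] for i in members), Counter())
    return [[docs[i] for i in g] for g in groups if g]


def summarize(cluster, n=3):
    """Most frequent terms in a cluster, shown to the user as its label."""
    counts = sum((vectorize(d) for d in cluster), Counter())
    return [term for term, _ in counts.most_common(n)]


def gather(clusters, chosen):
    """Gather step: merge the clusters the user picked into a smaller collection."""
    return [doc for i in chosen for doc in clusters[i]]


if __name__ == "__main__":
    docs = ["cat kitten pet", "kitten cat purr", "car engine wheel", "engine car road"]
    clusters = scatter(docs, k=2)
    for i, cluster in enumerate(clusters):
        print(i, summarize(cluster, 2))
    # Pretend the user picked cluster 0, then scatter again on what is left.
    print(scatter(gather(clusters, [0]), k=2))
```

Each pass through scatter, summarize and gather is one of the "20 questions": the user sees a handful of cluster labels, picks the interesting ones, and the next scatter runs only on the gathered subset.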
Second, also around the mid-90s, our research group at PARC was developing an important theory of information access called Information Foraging, which showed that you can mathematically model the way people seek information using the same ecological equations used to model how animals forage for food. We noticed that we could use information foraging ideas to model how people used Scatter/Gather to browse for information. It was possible to predict how people use the information cues (which we called ‘information scent’) in each cluster to determine whether they were interested in the contents inside the cluster. Scatter/Gather can also be shown to be a very efficient way to communicate to the user the topic structure of a very large document collection. In other words, people learned the structure of the information space much more efficiently using Scatter/Gather interfaces.
I hope it is quite clear that MrTaggy’s relevance feedback mechanisms are very much inspired by Scatter/Gather. The related tags communicate the topic structure of what’s available in the collection. Through this process, we designed MrTaggy, hoping that it would be just as efficient as Scatter/Gather in communicating the topic structure of the space.
Third, our group had developed Information Scent algorithms and concepts to build real search and recommendation systems. These algorithms build upon earlier work on a human memory model called Spreading Activation.
The TagSearch algorithm uses similar concepts. It constructs a kind of Bayesian model of the topic space from tag co-occurrence patterns.
TagSearch owes its heart and soul to concepts from Spreading Activation, which helps us find documents that are related to certain tags, and vice versa.
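To give a rough feel for how spreading activation can link tags to documents and back, here is a minimal Python sketch. It is not PARC’s TagSearch algorithm, and it leaves out the Bayesian modeling entirely: the bookmark data, the function names (`build_graph`, `spread`), and the `steps` and `decay` parameters are all hypothetical choices made for the illustration.

```python
from collections import defaultdict


def build_graph(bookmarks):
    """Index (url, tag) bookmark pairs in both directions."""
    tag_to_docs = defaultdict(set)
    doc_to_tags = defaultdict(set)
    for url, tag in bookmarks:
        tag_to_docs[tag].add(url)
        doc_to_tags[url].add(tag)
    return tag_to_docs, doc_to_tags


def spread(query_tags, tag_to_docs, doc_to_tags, steps=2, decay=0.5):
    """Spread activation from query tags to documents and back to tags.

    Assumption: each node splits its activation evenly over its neighbours,
    and activation flowing back to tags is damped by `decay`.
    """
    tag_act = {t: 1.0 for t in query_tags}
    doc_act = {}
    for _ in range(steps):
        # Tags -> documents.
        new_doc = defaultdict(float)
        for tag, act in tag_act.items():
            docs = tag_to_docs.get(tag, ())
            for doc in docs:
                new_doc[doc] += act / len(docs)
        doc_act = dict(new_doc)
        # Documents -> tags, with the query tags kept switched on as sources.
        new_tag = defaultdict(float)
        for tag in query_tags:
            new_tag[tag] += 1.0
        for doc, act in doc_act.items():
            tags = doc_to_tags[doc]
            for tag in tags:
                new_tag[tag] += decay * act / len(tags)
        tag_act = dict(new_tag)
    related_tags = {t: a for t, a in tag_act.items() if t not in query_tags}
    return doc_act, related_tags


if __name__ == "__main__":
    # Hypothetical bookmarks, invented for this example.
    bookmarks = [
        ("a.example", "funny"), ("a.example", "video"),
        ("b.example", "funny"), ("b.example", "comics"),
        ("c.example", "video"), ("c.example", "recipes"),
        ("d.example", "tax"),
    ]
    tag_to_docs, doc_to_tags = build_graph(bookmarks)
    docs, tags = spread(["funny", "video"], tag_to_docs, doc_to_tags)
    print(sorted(docs, key=docs.get, reverse=True))
    print(sorted(tags, key=tags.get, reverse=True))
```

In this toy version, a page bookmarked with both query tags collects activation from both and ranks first, while tags that co-occur with the query tags on the same pages surface as related tags. A page that shares no tags with the query never receives any activation.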
So what’s it like to actually use MrTaggy?
I started a search with the suggested tags “funny” and “video.” Less than 30 seconds later, I discovered this Bruno-related gem from FunnyorDie that had, until now, somehow escaped my attention.
Good find, MrTaggy!