Scouring the web for someone with a common name could become easier with software that automatically distinguishes between individuals by analysing the details of search results.
The software tool, developed by researchers at the University of Tokyo in Japan, picks apart the results of a search engine query, identifying unique identities within these results. For example, it can tell the difference between Michael Jackson the pop singer and a travelling beer expert of the same name, who also appears on the first page of results produced by Google.
The program analyses the first 100 results returned by Google in response to a name search. It then examines common words in each summary to see if any results may relate to different people with the same name.
The program then groups the results into clusters, one for each person it believes it has found. For example, it creates one cluster of results for pop singer Michael Jackson, defined by key words such as "music" and "trial", and another for beer expert Michael Jackson, defined by the words "beer" and "travel". In testing, the software was between 70% and 95% accurate at telling apart people with the same name.
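The grouping step can be illustrated with a minimal Python sketch. It is not the researchers' algorithm, which the article does not specify. It assumes that result summaries sharing enough content words belong to the same person, and it uses a simple greedy pass with Jaccard word overlap. The stop-word list, the 0.2 threshold, the function names and the sample summaries are all hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "on", "to", "is", "at", "with", "by"}

def content_words(snippet, name):
    """Return the set of words in a snippet, minus stop words and the name itself."""
    name_tokens = set(name.lower().split())
    words = re.findall(r"[a-z]+", snippet.lower())
    return {w for w in words if w not in STOP_WORDS and w not in name_tokens}

def jaccard(a, b):
    """Word overlap between two sets: shared words divided by all words."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_snippets(snippets, name, threshold=0.2):
    """Greedily assign each snippet to the most similar existing cluster.

    A snippet starts a new cluster when no existing cluster reaches the
    (assumed) similarity threshold. Returns lists of snippet indices.
    """
    clusters = []  # each entry: {"words": set of words, "members": list of indices}
    for i, snippet in enumerate(snippets):
        words = content_words(snippet, name)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = jaccard(words, cluster["words"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["members"].append(i)
            best["words"] |= words
        else:
            clusters.append({"words": set(words), "members": [i]})
    return [c["members"] for c in clusters]

# Made-up summaries standing in for search results.
snippets = [
    "Michael Jackson music albums and pop songs",
    "Michael Jackson trial news and pop music",
    "Michael Jackson beer hunter travel guide",
    "Beer tasting and travel notes by Michael Jackson",
]
clusters = cluster_snippets(snippets, "Michael Jackson")
print(clusters)  # [[0, 1], [2, 3]]
```

With these four summaries, the first two end up in one cluster and the last two in another. On real search results a fixed threshold like this would need tuning, which may be one reason accuracy varies from name to name.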
The software also analyses full-page results to identify words or phrases that may be relevant to each unique person. It considers the frequency with which words and phrases appear and their grammatical relationship with the name in question.
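A rough sketch of the frequency side of this step, again in Python, might look like the following. It assumes the pages have already been grouped by person, and it scores each word by how often it appears in one person's pages minus how often it appears in everyone else's. The grammatical analysis described above is left out entirely, and the scoring rule, names and sample pages are illustrative assumptions rather than the published method.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "on", "to", "is", "at", "with", "by", "about"}

def tokenize(text):
    """Split text into lower-case words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def cluster_keywords(clusters, name, top_n=2):
    """Pick the words most distinctive of each cluster.

    clusters maps a label to the list of page texts in that cluster.
    A word's score is its count in this cluster minus its count elsewhere
    (an assumed scoring rule). Ties are broken alphabetically.
    """
    name_tokens = set(name.lower().split())
    counts = {
        label: Counter(w for page in pages for w in tokenize(page) if w not in name_tokens)
        for label, pages in clusters.items()
    }
    keywords = {}
    for label, counter in counts.items():
        scores = {}
        for word, freq in counter.items():
            elsewhere = sum(other[word] for other_label, other in counts.items()
                            if other_label != label)
            scores[word] = freq - elsewhere
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[label] = ranked[:top_n]
    return keywords

# Made-up page texts standing in for full-page results.
pages = {
    "person_1": ["Michael Jackson music career and the trial",
                 "Music by Michael Jackson and news of the trial"],
    "person_2": ["Michael Jackson writes about beer and travel",
                 "A beer and travel guide by Michael Jackson"],
}
keywords = cluster_keywords(pages, "Michael Jackson")
print(keywords)  # {'person_1': ['music', 'trial'], 'person_2': ['beer', 'travel']}
```

Appending one of these keywords to the original name would give the kind of refined query the system is said to suggest.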
"The keywords extracted by the algorithm can be used to suggest better queries to the user [to help them refine their search]," says Danushka Bollegala, who developed the system with colleagues Yutaka Matsuo and Mitsuru Ishizuka.
For example, if a person searches for "Jim Clark" using the system, it will tell them how many different Jim Clarks there are and which keywords to use if they want to search for a particular one.
But Bollegala believes the system could ultimately prove useful in other areas of computer science, such as language processing. This is because working out which of several possible entities an ambiguous name refers to can give a better understanding of the statement it appears in. "We are working to extend the method to disambiguate other types of named entities, such as products, organisations and geographical locations," says Bollegala.
Victoria Uren, at the UK's Open University, agrees that such techniques could have widespread benefits. "Solutions to this problem are badly needed for the semantic web to work," she says. The "semantic web" involves enabling computers to process the meaning of online documents.
Andrew MacFarlane, who researches information retrieval at City University, UK, agrees that the system could be especially useful for web searching, but doubts whether any commercial search engine would be interested in implementing such a tool. "Since they get their revenue from advertising they will focus on improvements that return more clicks on adverts," he says.
The work was presented at the 17th European Conference on Artificial Intelligence this summer in Riva del Garda, Italy.