KMi’s Jianhan Zhu has built the winning Enterprise Search engine, beating 22 rivals at the prestigious Fifteenth Annual Text REtrieval Conference (TREC 2006).
TREC, held in Gaithersburg, Maryland, USA, from 15th to 17th November 2006, is the premier scientific forum for the comparative evaluation of search engine technology. This year it featured 107 participating groups from 17 different countries.
According to TREC program manager Ellen M. Voorhees: "TREC 2006 is the latest in a series of workshops designed to foster research on technologies for information retrieval. The workshop series has four goals: (1) to encourage retrieval research based on large test collections; (2) to increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas; (3) to speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and (4) to increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems.
TREC 2006 contained seven areas of focus called ‘tracks’. Five of the tracks ran in previous TRECs and explored tasks in question answering, detecting spam in an email stream, enterprise search, search on (almost) terabyte-scale document sets, and information access within the genomics domain. The two new tracks explored blog search and providing support for legal discovery of electronic documents."
KMi, represented by research fellow Dr. Jianhan Zhu, participated in the Enterprise Track Expert Search Task. Unlike generic web search, which retrieves links to websites or documents, this search task is designed to return the names of individuals deemed by the competing search engines to be ‘experts’ on particular topics such as ‘java programming’ or ‘ontology engineering’ (or whatever sample queries are suggested to the teams).
The competing search engines have to make a plausible judgement based on the results of searching a large collection of sample documents and pages provided to all competing teams. A crawl of the World Wide Web Consortium (W3C) website was used as the text collection in the Enterprise Track, comprising over 5 gigabytes of data in over 330,000 documents. There are 55 topics designed by the participating groups for this task. The goal is to run a search query to find experts on each of these topics from among a set of 1,082 people relevant to the W3C. In total, 23 internationally renowned research groups submitted their results to the task. KMi was ranked top among all 23 groups across all major Information Retrieval measures.
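To make the shape of the expert search task concrete, here is a minimal sketch in Python. It is not KMi's system and not the TREC evaluation procedure: the function name `rank_experts`, the scoring rule (each document votes for the candidates it mentions, weighted by the fraction of query terms it contains) and the sample people and documents are all assumptions made purely for illustration.

```python
import re
from collections import defaultdict


def rank_experts(query, documents, candidates):
    """Rank candidate names for a query (illustrative toy scoring only)."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    if not query_terms:
        return []

    scores = defaultdict(float)
    for text in documents:
        lowered = text.lower()
        words = set(re.findall(r"[a-z0-9]+", lowered))
        # Document relevance: fraction of query terms found in the document.
        relevance = len(query_terms & words) / len(query_terms)
        if relevance == 0:
            continue
        # Every candidate named in a relevant document receives its score.
        for name in candidates:
            if name.lower() in lowered:
                scores[name] += relevance

    # Highest score first; ties are broken alphabetically by name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data, not taken from the W3C collection.
sample_documents = [
    "Alice Smith wrote the Java programming guidelines.",
    "Bob Jones and Alice Smith discussed Java tooling.",
    "Bob Jones chaired the ontology engineering group.",
]
sample_candidates = ["Alice Smith", "Bob Jones"]

java_ranking = rank_experts("java programming", sample_documents, sample_candidates)
ontology_ranking = rank_experts("ontology engineering", sample_documents, sample_candidates)
```

In this toy setup the query ‘java programming’ ranks the invented candidate Alice Smith first, because she is mentioned in both documents that match the query. Real entries to the task have to cope with name variants, email addresses and far noisier evidence than this sketch does.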
This is the first time that KMi has participated in TREC, and the result is very encouraging. Jianhan gave a talk describing KMi’s excellent results in the conference’s Enterprise Track session, accompanied by a poster presentation. Our work has attracted significant attention from top researchers in IR, and we will continue to build closer cooperation with leading research institutions in this arena.
Please note that it is important for these results to be interpreted in light of the specific TREC context, which is described in the publication links below.
- TREC: Text Retrieval Conference
- TREC Enterprise Track Wiki site
- W3C Corpus used for Enterprise Search task
- TREC official publications (2006 proceedings will appear soon)