RW’s Blog
DATA MINING, MACHINE LEARNING AND MORE

Going to WSDM 2010

January 8th, 2010 . by rw

Our paper "I Tag, You Tag: Translating Tags for Advanced User Models" has been accepted for the WSDM conference. In the paper we analyze the tagging behavior of users from two real-world folksonomies, namely Delicious and Bibsonomy, and find that (from the abstract):

… individual users develop highly personalized vocabularies of tags. While these meet individual needs and preferences, the considerable differences between personal tag vocabularies (personomies) impede services such as social search or customized tag recommendation. In this paper, we introduce a novel user-centric tag model that allows us to derive mappings between personal tag vocabularies and the corresponding folksonomies. Using these mappings, we can infer the meaning of user-assigned tags and can predict choices of tags a user may want to assign to new items. Furthermore, our translational approach helps in reducing common problems related to tag ambiguity, synonymous tags, or multilingualism.
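To give a rough feeling for what "translating" a personal tag could mean, here is a minimal sketch. It is not the model from the paper: it simply assumes that a user's tag can be approximated by the tags other users assigned to the same items. The function name `translate_tag` and the toy data are hypothetical and only serve as illustration.

```python
from collections import Counter

def translate_tag(assignments, user, tag):
    """Map one user's personal tag to the tags other users chose for the same items."""
    # Items this user labelled with the personal tag.
    items = {i for (u, i, t) in assignments if u == user and t == tag}
    # Count the tags that other users assigned to those items.
    counts = Counter(t for (u, i, t) in assignments if u != user and i in items)
    total = sum(counts.values())
    # Normalise counts into relative frequencies (empty dict if there is no evidence).
    return {t: c / total for t, c in counts.items()} if total else {}

# Hypothetical toy folksonomy: (user, item, tag) triples.
assignments = [
    ("alice", "url1", "toread"),
    ("alice", "url2", "toread"),
    ("bob", "url1", "python"),
    ("carol", "url1", "python"),
    ("bob", "url2", "programming"),
    ("carol", "url3", "cooking"),
]

mapping = translate_tag(assignments, "alice", "toread")
print(mapping)
```

In this toy example, alice's very personal tag "toread" is mapped mostly to "python" and to a lesser extent to "programming", while "cooking" never shows up because alice did not bookmark that item.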

If you want to read the paper before it appears on the official ACM repository, click here.

The conference will take place from 3 to 6 February in New York. The list of accepted papers looks very promising, and I am really looking forward to seeing all the presentations.



Del.icio.us dataset for download

June 25th, 2009 . by rw

I am happy to announce that my research group has finally decided to make our Delicious dataset publicly available. The dataset contains all public bookmarks of about 950,000 users retrieved from http://delicious.com between December 2007 and April 2008. The retrieval process resulted in about 132 million bookmarks or 420 million tag assignments. This is the same dataset we used in some of our recent publications. The full corpus is about 7 GB of compressed data. It's one of the largest folksonomy datasets used in research so far. See this post and this publication for details.
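For a quick sense of scale, the rounded figures above can be turned into per-user and per-bookmark averages. This is only back-of-the-envelope arithmetic on the approximate numbers from this post, not statistics computed on the corpus itself; the variable names are made up for the example.

```python
# Approximate figures quoted in this post.
users = 950_000
bookmarks = 132_000_000
tag_assignments = 420_000_000

# Simple averages derived from the rounded totals.
bookmarks_per_user = bookmarks / users
tags_per_bookmark = tag_assignments / bookmarks

print(f"{bookmarks_per_user:.0f} bookmarks per user")
print(f"{tags_per_bookmark:.1f} tag assignments per bookmark")
```

That works out to roughly 139 bookmarks per user and about 3.2 tag assignments per bookmark. Keep in mind that these averages include the spam posts mentioned in this post.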

Note that this is the original corpus, which still includes millions of spam posts. You should try to remove this spam before using the dataset. We have presented a spam detection method in the above publication.

You can get the dataset here.

See here for other data mining corpora I found worth bookmarking.



Best Paper Award at ESAIR 2009

February 13th, 2009 . by rw

Our paper "A hybrid approach to item recommendation in folksonomies" has received the Best Paper Award at the Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR 2009), co-located with this year's WSDM conference in Barcelona.

The paper presents an extended version of the PLSA algorithm that combines collaborative filtering and tagging patterns found in folksonomies into a unified recommendation model. Experiments were done on our Delicious corpus. We believe that folksonomies, and especially social bookmarking services such as Delicious, provide a perfect source for web resource recommendation.

If you want to get an idea of what item recommendation based on folksonomies can look like, see here.
