January 8th, 2010 . by rw
Our paper I Tag, You Tag: Translating Tags for Advanced User Models has been accepted for the WSDM conference. In the paper we analyze the tagging behavior of users from two real-world folksonomies, namely Delicious and Bibsonomy, and find that (from the abstract)
… individual users develop highly personalized vocabularies of tags. While these meet individual needs and preferences, the considerable differences between personal tag vocabularies (personomies) impede services such as social search or customized tag recommendation. In this paper, we introduce a novel user-centric tag model that allows us to derive mappings between personal tag vocabularies and the corresponding folksonomies. Using these mappings, we can infer the meaning of user-assigned tags and can predict choices of tags a user may want to assign to new items. Furthermore, our translational approach helps in reducing common problems related to tag ambiguity, synonymous tags, or multilingualism.
If you want to read the paper before it appears in the official ACM repository, click here.
The conference will take place from 3 to 6 February in New York. The list of accepted papers looks very promising, and I am really looking forward to seeing all the presentations.
Posted in delicious, folksonomy, publication
June 25th, 2009 . by rw
I am happy to announce that my research group has finally decided to make our delicious dataset publicly available. The dataset contains all public bookmarks of about 950,000 users retrieved from http://delicious.com between December 2007 and April 2008. The retrieval process resulted in about 132 million bookmarks, or 420 million tag assignments. This is the same dataset we used in some of our recent publications. The full corpus is about 7 GB of compressed data. It’s one of the largest folksonomy datasets used in research so far. See this post and this publication for details.
Note that this is the original corpus which still includes millions of spam posts. You should try to remove this spam before using the dataset. We have presented a spam detection method in the above publication.
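To get a feel for the scale, here is a quick back-of-the-envelope calculation based on the rounded figures quoted above. It is only a rough sketch: the inputs are approximations, the function name corpus_averages is made up for this illustration, and the averages describe the raw corpus including spam, so a cleaned corpus will look different.

```python
def corpus_averages(users, bookmarks, tag_assignments):
    """Return rough per-bookmark and per-user averages for a folksonomy corpus.

    The function and its parameter names are illustrative only; they are not
    part of the dataset or of any published tooling.
    """
    return {
        "tags_per_bookmark": tag_assignments / bookmarks,
        "bookmarks_per_user": bookmarks / users,
        "tag_assignments_per_user": tag_assignments / users,
    }


# Rounded figures from the announcement (raw corpus, spam not removed).
averages = corpus_averages(
    users=950_000,
    bookmarks=132_000_000,
    tag_assignments=420_000_000,
)

for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

With these inputs the sketch gives roughly 3.2 tag assignments per bookmark and roughly 139 bookmarks per user.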
You can get the dataset here.
See here for other data mining corpora I found worth bookmarking.
Posted in dataset, delicious, folksonomy, publication
February 13th, 2009 . by rw
Our paper A hybrid approach to item recommendation in folksonomies has received the Best Paper Award at the Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR 2009), co-located with this year's WSDM conference in Barcelona.
The paper presents an extended version of the PLSA algorithm that combines collaborative filtering and tagging patterns found in folksonomies into a unified recommendation model. Experiments were done on our delicious corpus. We believe that folksonomies, and especially social bookmarking services such as Delicious, provide a perfect source for web resource recommendation.
If you want to get an idea of what item recommendation based on folksonomies can look like, see here.
Posted in delicious, folksonomy, machine learning, publication
July 10th, 2008 . by rw
We recently analyzed the structure and dynamics of social bookmarking systems. For this purpose, we spent six months crawling about 142 million del.icio.us bookmarks from around 1 million users. I will give an overview of our findings here. Details can be found in this paper, which I am going to present at ECAI 2008.
The growth of del.icio.us
According to this blog, the del.icio.us bookmarking site went online in Sep. 2003 and, as our data indicates, has seen exponential growth since then. According to our dataset, there were over 7,305,559 newly added bookmarks and 47,429 newly appearing del.icio.us users in December 2007.
There is an interesting period in the first half of 2006 where del.icio.us didn’t see much growth at all, but plateaued at around 3.5 million new bookmarks a month. This pattern was also reported by other authors, but the reason remains unclear.

The monthly growth of del.icio.us between 2004 and 2008 by posted bookmarks, new users, new URLs and new tags.
Posted in folksonomy, machine learning