Del.icio.us dataset for download
June 25th, 2009, by rw
I am happy to announce that my research group has finally decided to make our Delicious dataset publicly available. The dataset contains all public bookmarks of about 950,000 users, retrieved from http://delicious.com between December 2007 and April 2008. The retrieval process yielded about 132 million bookmarks, which correspond to about 420 million tag assignments. This is the same dataset we used in some of our recent publications. The full corpus is about 7 GB of compressed data. It is one of the largest folksonomy datasets used in research so far. See this post and this publication for details.
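For a rough sense of scale, the rounded figures quoted above imply the following averages. This is only back-of-the-envelope arithmetic on the approximate numbers from this post, not a statistic computed from the corpus itself, and the variable names are purely illustrative.

```python
# Rounded figures quoted in the post (all approximate).
users = 950_000
bookmarks = 132_000_000
tag_assignments = 420_000_000

# Simple ratios of the rounded totals.
bookmarks_per_user = bookmarks / users
tags_per_bookmark = tag_assignments / bookmarks

print(f"bookmarks per user: {bookmarks_per_user:.0f}")  # 139
print(f"tags per bookmark:  {tags_per_bookmark:.1f}")   # 3.2
```

These are averages over the raw corpus as retrieved, so they say nothing about how bookmarks and tags are distributed across individual users.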
Note that this is the original corpus, which still includes millions of spam posts. You should try to remove this spam before using the dataset. We presented a spam detection method in the publication mentioned above.
You can get the dataset here.
See here for other data mining corpora I found worth bookmarking.