Article extraction through clustering
View the Project on GitHub ziyan/spider
Install this bookmarklet by dragging it onto your bookmarks bar
By Ziyan Zhou and Lei Sun for CS221
Spider is a web content extraction robot that masters its technique by learning from examples. Feed it multiple article pages from the same site and it will (hopefully) extract the content of an article accurately.
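To give a rough feel for the "learning from examples" idea, here is a minimal sketch. It is not Spider's actual algorithm: instead of real clustering, it simply groups text blocks by their tag path and treats any path whose text repeats across the example pages (navigation, footers) as boilerplate. The names `BlockCollector`, `extract_blocks`, `extract_article` and the `TEMPLATE` pages are all made up for this illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BlockCollector(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.blocks = []  # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.blocks.append(("/".join(self.stack), text))


def extract_blocks(html):
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


def extract_article(pages, target):
    """Return text blocks of `target` that do not sit under a boilerplate path.

    Assumption for this sketch: a path is boilerplate if any text under it
    appears on more than one of the example `pages`.
    """
    pages_with = defaultdict(set)  # (path, text) -> indices of pages containing it
    for index, page in enumerate(pages):
        for block in extract_blocks(page):
            pages_with[block].add(index)
    repeated_paths = {path for (path, _), seen in pages_with.items() if len(seen) > 1}
    return [text for path, text in extract_blocks(target) if path not in repeated_paths]


# Hypothetical site template: only the <article> part changes between pages.
TEMPLATE = (
    "<html><body>"
    "<div><a>Home</a><a>About</a></div>"
    "<article><h1>{title}</h1><p>{body}</p></article>"
    "<footer>Copyright Example</footer>"
    "</body></html>"
)
pages = [TEMPLATE.format(title=f"Story {n}", body=f"Body text {n}.") for n in (1, 2, 3)]
target = TEMPLATE.format(title="Story 4", body="Body text 4.")
print(extract_article(pages, target))  # → ['Story 4', 'Body text 4.']
```

This also shows why several pages from the same site are needed: with a single example there is nothing to compare against, so nothing can be recognized as repeated boilerplate.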
Some known limitations:
You can help us make Spider smarter and improve our algorithm by trying out the bookmarklet.
Thanks very much! :)