It is difficult to predict what data will have long-term value, so it is often safest to archive everything. With data storage costs approaching zero, I think that we can expect high-value data to last forever, barring a nuclear war or the collapse of society.

Curated data has higher value than data saved by archiving "everything." I think that the search engine Blekko is interesting and useful because of what it does not have: human-powered curation yields fewer results but very little spam. The Guardian's curated, structured data stores have much higher value than the original raw data (from government sources, etc.). I can imagine The Guardian's curated data becoming a permanent part of our history, much like the ancient stone tablets we see in museums.

I have long planned to provide curated news and technology data with semantic markup, either on my ancient knowledgebooks.com domain or on a new placeholder, kbsportal.com, but I seldom have free time because of my consulting business. Hint: I would like to have a few partners who are into statistical natural language processing, and who are general data geeks, to help me with this. I don't know if it would end up being a viable business or just a public service portal.