Strata Week: A .data TLD?

A proposal for a .data TLD, flavors of Hadoop, and a vote for pseudonymous commenters.

Here are some of the data stories that caught my attention this week.

Should there be a .data TLD?

ICANN is ready to open top-level domains (TLDs) to the highest bidder, and Wolfram Alpha's Stephen Wolfram argues that this makes it time for a .data TLD. In a blog post on the Wolfram site, he suggests that the new TLDs offer an opportunity to create a .data domain, a "parallel construct to the ordinary web, but oriented toward structured data intended for computational use. The notion is that alongside a website like wolfram.com, there'd be wolfram.data."

Wolfram continues:

If a human went to wolfram.data, there’d be a structured summary of what data the organization behind it wanted to expose. And if a computational system went there, it’d find just what it needs to ingest the data, and begin computing with it.

So how would a .data TLD change the way humans and computers interact with data? Or would it change anything? If you’ve got ideas of how .data could be put to use, please share them in the comments.


Cloudera addresses what Apache Hadoop 1.0 means to its customers

Last week, the Apache Software Foundation (ASF) announced that Hadoop had reached version 1.0. This week, Cloudera took to its blog to explain what that milestone means to its customers.

In part, the post explains how Hadoop development branched away from trunk, noting that this branching has caused some confusion among Cloudera customers:

More than a year after Apache Hadoop 0.20 branched, significant feature development continued on just that branch and not on trunk. Two major features were added to branches off 0.20.2. One feature was authentication, enabling strong security for core Hadoop. The other major feature was append, enabling users to run Apache HBase without risk of data loss. The security branch was later released as 0.20.203. These branches and their subsequent release have been the largest source of confusion for users because since that time, releases off of the 0.20 branches had features that releases off of trunk did not have and vice versa.

Cloudera tells its customers that its own distribution has offered the equivalent of Apache Hadoop 1.0 for "approximately a year now," and it compares the Apache Hadoop releases with its own offerings. The post offers interesting insight into how the ASF operates, and also into how companies that build services around ASF projects have to iterate and adapt.

Disqus says that pseudonymous commenters are best

Debates over blog comments have resurfaced recently, with a back-and-forth over whether they're good, bad, evil, or irrelevant. Adding some fuel to the fire (or at least some data to the discussion), Disqus has released its own research based on its commenting service.

According to the Disqus research, commenters using pseudonyms are actually "the most valuable contributors to communities," because their comments rank highest in both quantity and quality. Those findings run counter to the idea that people who comment online without using their real names degrade, rather than improve, the quality of conversations.

Disqus’ data indicates that pseudonymity might engender a more engaged and more engaging community. That notion stands in contrast to arguments that anonymity leads to more trollish and unruly behavior.

Got data news?

Feel free to email me.
