ENTRIES TAGGED "Big Data"
The need to root out old data goes well beyond creating disk space
A couple of weeks ago, Brian Krebs announced that Adobe had suffered a serious breach of customer data as well as source code for a number of its software products. Nicole Perlroth of The New York Times updated that to say that the breach appears to be much bigger than first thought and, indeed, Krebs agrees. Adobe itself announced it first, ahead of Krebs’s first report, in CSO Brad Arkin’s terse blog post, Illegal Access to Adobe Source Code.
By now, breaches are hardly news at all. All of us pros flat out say that it isn’t a matter of *if* you get hacked, but *when*. Adobe’s is of note solely because of the way that the news has dribbled out. First, the “illegal access” to source code, then the news of lost customer data to the tune of 2.9 million, then upping that to 38 million, but really actually (maybe?) 150 million. The larger number is expired accounts—or something.
Working with big data and open source software
I recently sat down with Mark Grover (@mark_grover), a Software Engineer at Cloudera, to talk about the Hadoop ecosystem. He is a committer on Apache Bigtop and a contributor to Apache Hadoop, Hive, Sqoop, and Flume. He also contributed to O’Reilly Media’s Programming Hive title.
Key highlights include:
Weekly Highlights and Insights: May 13-17
Google I/O: O’Reilly Editor Rachel Roumeliotis reports from the conference floor.
Big Data, Cool Kids: Fumbling toward the adolescence of big data tools.
Real-time World-wide Wikipedia Edits: Stephen LaPorte and Mahmoud Hashemi’s addictive visualization.
Future of Open Source: The quality, security, and community driving open source adoption.
With a modern search engine and smart planning, web sites can provide visitors with a better search experience than Google. Why turnout for the new "big data" track was lower than I expected, and other news from this week's conference about using Lucene big and small.