I wonder whether he discusses data that can no longer be obtained because it was created by old programs that are no longer used.
The author is a physicist? Really? I am a statistician and data analyst, and the author's utter lack of comprehension of basic statistical concepts (correlation, prediction, etc.) is stunning. And distressing. It's junk like this that just compounds the public's dismal scientific literacy.
Let's hope Big Data doesn't team up with the Big Lie. Or become it.
"Observational data can yield enormously predictive tools. I note just one iconic example: gravity. We know not only that Newton’s apple will fall but exactly how fast."
Yes, but gravity - for all earthly intents and purposes - is a universal constant. Anyone can use Newton's formula, and just about everyone does, for a plethora of purposes.
Now imagine that Newton lied - that he intentionally published an inaccurate formula, and he had the ability (if not the desire) to keep the truth secret from everyone but James II.
Asimov's Foundation series fascinated me as a young man. And Big Data may, indeed, be able to predict the future. But what good are such predictions when 1) the data is essentially a black box to the public; and 2) the owners of the data lie about it?
Maybe the best we can hope for is that hackers are more interested in the truth ... and that is certainly not a given.
"The authors are off-base, though, in their claim that it is somehow weird to know something will happen without having an explanatory theory. In much of science, it has long been the case that theory follows 'robust correlations.'"
I'm not sure Mr. Mills appreciates the difference between the sorts of predictions that big data can provide--with no possibility of theory--and the sorts of empirical relationships that eventually lead to robust theory. We come up with theories for things that have simple, linear relationships. The trends that are recognizable in big data sets have non-linear, wildly complex relationships that will require mathematical disciplines that haven't been invented yet--and may never be invented.
The true revolution that will come from big data is that our software is magically just going to know stuff, the same way that human beings can walk into a cocktail party and know that somebody had a loud altercation a few moments before. Big data software is essentially using the same strategies that brains use to recognize patterns and predict the near future. Nobody will know why the prediction is correct; it just is.
The consequences of this will change how science is done so profoundly that words like "observation" and "theory" are going to seem quaint. Humans seek scientific theories as a way of predicting stuff; when a machine can predict stuff faster, better, and cheaper, the value of theory decreases. Eventually, the new predictive tools replace the old ones. It's yet another disruptive technology, except in this case, the incumbent technology is the scientific method itself.
There's been a lot of talk in the last few years about the singularity, the moment where our machines evolve so quickly that the historical consequences are unknowable to ordinary humans. The rise of big data is about as good a place to peg a singularity as any.
We are witnessing the baby steps of the most powerful Police State in the world. After decades of fighting Totalitarian dictatorships, the United States has become what it once opposed.
The powerful corporations that collect Big Data are using it to protect the offices of Incumbents and the Incumbents will protect the wealth of the corporate giants who helped them stay in office.
How small, of all that human hearts endure,
That part which data heaps can cause or cure!
Yes - Big Government is the problem here, definitely. Let's replace Big Government (and ridiculous democracy) with Big Corporates whose sole purpose is to siphon profits to a small but stunningly powerful elite. Yes yes yes Big Data - do it to me now.
I think that Mr. Mills overlooks a crucial point with regard to big data in this review, namely, that the most likely beneficiaries of these approaches are very likely to be the companies doing the analytic work, and not the public at large. Industry players like Google and Facebook have already shown that they are capable of keeping the public satisfied with tokens like "free" accounts, which are already really just methods of sorting people's data in order to extract patterns for marketing purposes. So the overall pattern is already clear: keep the general public occupied with "free" tokens ("free" is in quotes because there are huge hidden costs for the account which you open, in the form of the raw data which you are providing for analysis), and then make (potentially huge amounts of) money from analyzing the resultant data from the public's use of those tokens. In short, there is no balanced exchange here; these companies want to create the illusion of getting something for nothing in order to get at what's truly valuable: the raw data.
My second point is more speculative, but I think it's still worth entertaining. Isn't it possible that this near-obsession with analysis of existing data will soon begin to stifle innovation and associated risk-taking? As companies focus their efforts on data sets and increasingly sophisticated methods of analyzing and interpreting these data sets, isn't there a risk that industries will become increasingly "introverted" instead of forward-looking, and that the main thrust of innovation will occur in exotic mathematical tools for data analysis? And doesn't this scenario sound vaguely like another situation involving exotic mathematical tools in 2008, the one that ultimately led to the subprime mortgage crisis?
I believe that the true way forward lies in techno-anarchism like that advocated by Jaron Lanier, where people begin to barter goods and services online. This is the way that true value can be determined, in a global marketplace without the market distortions imposed by companies that are basically in the business of exploiting the general public's data. It's time to stop feeding companies like Google and Facebook with data; peer to peer technologies can replace their current dominance.
As a friend pointed out, "Big Data" will be undone by "Big Hacking." Big Data is going to be the most befuddling instance yet of "garbage in, garbage out."
Regulating private use of this "Big Data" is all well and good. But clearly we have government bureaucracies running out of control.
Yesterday saw an act of war against Bolivia. Effectively, a blockade -- technically an act of war -- was laid against passage of the presidential plane of Bolivia over the air space of Spain and France.
We're threatening people because our illegal acts have been outed. It's crazy stuff. Really, regulating private spy operations seems a bit of humor.
But can this big data predict a brick falling on one's head?
Since the unbelievable spurt in the world economy of the last 20 years coincided precisely with the advent of the digital era, the current slump is attributable to a large extent to the fact that the benefits of this technology have now been absorbed at baseline level. So, with Ferguson et al, I had thought that we are in for a lengthy dry spell, economically. Megadata might be the next wave, but as is the case with most trend-seeking mechanisms, I cannot help feeling that in the long term it is a case of a dog chasing its tail...