Big Data and Emergence

Big data refers to the exponentially growing volumes of data created since the digital age took off in the early 2000s. Although early adopters are already turning their attention to newer topics such as fast data, big data remains a highly regarded concept.

What actually is big data? A naïve approach would be to set a size threshold, e.g. everything over 1 TB counts as big data. Such a dataset is certainly "big" in the sense that it will not easily fit into the RAM of a personal computer. However, this approach is quite limited and focuses solely on the amount of data.

Wikipedia takes a different view: "Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them." This is only a slight modification of the naïve definition: it merely lets the size limit move along with the current state of data processing capabilities.

Data as an Emergent Phenomenon

I take the view that big data should be interpreted as an emergent behavior that ordinary sensor data exhibits when it is gathered in large quantities. I am quite sure that I am not the first person to think of this, but I could not find any articles discussing this approach.

Sand Dunes

Wikipedia: "Emergence is a phenomenon whereby larger entities arise through interactions among smaller or simpler entities such that the larger entities exhibit properties the smaller/simpler entities do not exhibit."

Physical World Example of Emergence

Sand dunes are an example of emergent physical behavior. Consider a grain of sand and measure all its properties: mass, volume, porosity, shape, etc. Even with the most accurate and thorough measurements, it will prove difficult to predict the formation of sand dunes from the properties of sand alone. Sand dunes are an emergent behavior that arises from the delicate interaction of sand and wind. The patterns created in sand dunes remarkably resemble those that arise in other self-assembling systems.

Emergence in Big Data

Now consider a simple medical analysis instrument that measures inflammation levels in human blood. A single instrument can determine the health status of a single patient. If tens or hundreds of these devices are distributed across multiple practices, they can still only determine the health of individual patients and, at a larger scale, the general health of a population.

Syringe and blood

Emergent big data arises when these devices are spread all over the globe and connected to a database that can be used to analyse the aggregate data from millions of measurements. Combined with location data, this database could be used to trace the routes along which pathogens spread around the world. This is something that no single measurement, or even a set of traditional measurements, can achieve.
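To make the idea a little more concrete, here is a minimal Python sketch built on purely hypothetical assumptions: each reading is a record with a region label, a day number and a C-reactive protein (CRP) value standing in for "inflammation level"; the 10 mg/L cut-off, the 50 % share that flags a region-day, and all the readings themselves are made up for illustration. No single reading says anything about spread, but grouping the readings by region and day and noting when each region first crosses the threshold yields an onset order that hints at a direction of spread.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    region: str      # hypothetical location label attached to each device
    day: int         # days since the start of observation
    crp_mg_l: float  # inflammation marker (C-reactive protein), mg/L


ELEVATED_CRP = 10.0   # assumed cut-off for an "elevated" reading
OUTBREAK_SHARE = 0.5  # assumed share of elevated readings that flags a region-day


def elevated_share(measurements):
    """Return the share of elevated readings for every (region, day) pair."""
    counts = defaultdict(lambda: [0, 0])  # (region, day) -> [elevated, total]
    for m in measurements:
        c = counts[(m.region, m.day)]
        c[1] += 1
        if m.crp_mg_l >= ELEVATED_CRP:
            c[0] += 1
    return {key: elevated / total for key, (elevated, total) in counts.items()}


def onset_days(measurements):
    """Return the first day on which each region reaches OUTBREAK_SHARE."""
    onset = {}
    # Walk the region-days in chronological order so the first hit is the onset.
    for (region, day), share in sorted(
        elevated_share(measurements).items(), key=lambda kv: kv[0][1]
    ):
        if share >= OUTBREAK_SHARE and region not in onset:
            onset[region] = day
    return onset


def spread_order(measurements):
    """Order regions by onset day: a crude proxy for the direction of spread."""
    onset = onset_days(measurements)
    return sorted(onset, key=onset.get)


# Made-up readings: individually they only describe single patients.
readings = [
    Measurement("A", 1, 25.0), Measurement("A", 1, 30.0), Measurement("A", 1, 3.0),
    Measurement("B", 1, 2.0), Measurement("B", 3, 18.0), Measurement("B", 3, 22.0),
    Measurement("C", 1, 1.0), Measurement("C", 3, 4.0), Measurement("C", 5, 40.0),
]

print(onset_days(readings))    # {'A': 1, 'B': 3, 'C': 5}
print(spread_order(readings))  # ['A', 'B', 'C']
```

Ordering regions by onset day is only a rough stand-in for a real epidemiological analysis, but it illustrates the point: the spreading pattern is a property of the aggregate, not of any individual measurement.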

Map with pins

This is what big data emergence is: the aggregate exhibits properties that none of the individual measurements do.