Sunday, 2 February 2014

This is a fairly high-level concept (and one that I may flesh out later), but I think it's worth writing down, even if it's only a prelude to an idea.

The problem I'm considering is that whilst there's a lot of data being generated in the world, it's not in a standardised format, and it's not centralised.

I am wondering whether it would be possible to design an online data hosting service where users could submit any kind of data, provided it adheres to a specific format (possibly one of a small number of supported formats). For example, if each data point relates to a day (e.g. the closing price of a particular stock on that day), the day would be recorded in exactly the same way as in any other daily dataset, such as one recording the average temperature in Los Angeles by day.
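To make the "same day, same structure" idea a little more concrete, here is a minimal sketch in Python. It assumes the shared format for daily data is simply an ISO 8601 date string (YYYY-MM-DD) mapped to a number. The function name `normalise_daily` and the sample values are made up for illustration, not part of any real service.

```python
from datetime import datetime


def normalise_daily(records, date_format):
    """Convert (date_string, value) pairs into {ISO 8601 date: float}.

    `date_format` is the strptime pattern the submitter's dates use.
    The ISO 8601 key is an assumed choice of common format.
    """
    normalised = {}
    for raw_day, value in records:
        day = datetime.strptime(raw_day, date_format).date()
        normalised[day.isoformat()] = float(value)
    return normalised


# Two submissions with different date conventions (illustrative values only).
stock_close = normalise_daily(
    [("02/03/2014", "101.5"), ("02/04/2014", "102.25")], "%m/%d/%Y"
)
la_temperature = normalise_daily(
    [("03.02.2014", 18.0), ("04.02.2014", 19.5)], "%d.%m.%Y"
)
```

Once both datasets are stored this way, their days line up key for key, which is what would let the service compare or join them without knowing anything else about where they came from.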

A key requirement for this type of service would be strict rules on metadata, so that users of the data can understand the tools and methodologies that were used to collect it.
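As a rough illustration of what "strict requirements" could mean in practice, the service might refuse any submission whose metadata is incomplete. The field names below (`collection_method`, `instrument` and so on) are my own guesses at what would be required, not a worked-out standard.

```python
# Hypothetical list of mandatory metadata fields for a submission.
REQUIRED_METADATA = (
    "title",
    "source",
    "collection_method",
    "instrument",
    "unit",
    "frequency",
)


def missing_metadata(metadata):
    """Return the required fields that are absent or blank."""
    return [
        field
        for field in REQUIRED_METADATA
        if not str(metadata.get(field, "")).strip()
    ]


submission = {
    "title": "Average daily temperature, Los Angeles",
    "source": "Example weather station",
    "unit": "degrees Celsius",
    "frequency": "daily",
    "instrument": "   ",  # blank values count as missing
}

problems = missing_metadata(submission)
```

Here `problems` lists `collection_method` and `instrument`, so the submission would be rejected until the uploader describes how the data was collected.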

The advantage of a centralised repository would be a single location from which to search for datasets. It may even be possible to run a search like "find datasets that correlate with my dataset" rather than just a keyword search of the metadata.
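A correlation search like that could, in its simplest form, look something like the sketch below. It assumes every dataset is stored as a mapping from ISO date strings to numbers, and it uses the Pearson correlation over the days two datasets share. The names `rank_by_correlation` and `min_overlap` are hypothetical, and a real service would need to worry about spurious correlations and scale, which this ignores.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(query, repository, min_overlap=3):
    """Rank repository datasets by |correlation| with `query` on shared days."""
    results = []
    for name, series in repository.items():
        shared_days = sorted(query.keys() & series.keys())
        if len(shared_days) < min_overlap:
            continue  # too few days in common to say anything
        r = pearson(
            [query[day] for day in shared_days],
            [series[day] for day in shared_days],
        )
        results.append((name, r))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Toy data, purely illustrative.
my_dataset = {"2014-02-01": 1.0, "2014-02-02": 2.0, "2014-02-03": 3.0}
repository = {
    "doubles": {"2014-02-01": 2.0, "2014-02-02": 4.0, "2014-02-03": 6.0},
    "inverse": {"2014-02-01": 3.0, "2014-02-02": 2.0, "2014-02-03": 1.0},
    "noisy": {"2014-02-01": 1.0, "2014-02-02": 3.0, "2014-02-03": 2.0},
    "short": {"2014-02-01": 5.0},
}

ranking = rank_by_correlation(my_dataset, repository)
```

Ranking by absolute value means strongly negative relationships surface alongside strongly positive ones, which seems closer to what a person searching for "related" data would want. This only works at all because every dataset uses the same date keys.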
