Dat is an open source tool, funded by the Sloan Foundation in the US as part of its Science Tools research funding, that aims to enable collaboration workflows on top of datasets of any size. The high-level goal of the Dat project is to make it easier to work with large scientific datasets in an automated way, which saves time and makes research easier to reproduce. The core dat tool is a streaming dataset versioning and replication system built around the Unix philosophy: it is designed to encourage extreme modularity and to let many third-party applications be built on top of it.
In addition to the core tool, we are also developing tools for building and distributing streaming, cross-platform data pipelines based on Node.js and Docker.
This talk will introduce Dat, explain how we used Node to build it, and show examples of how to use Node and LevelDB to work with very large datasets.
Max Ogden is an open source software developer who works full time on the Dat project at the United States Open Data Institute. He previously worked at Code for America, a US-based not-for-profit dedicated to improving technology in cities. In his spare time, Max organizes the NodeSchool community, CSVConf and TacoConf, and likes to travel to countries with cat cafes.