Some kinds of data are better generated in batch: user session data, ad click-through rates, sales conversions, and recommendations, for example.
Unfortunately, generating these useful pieces of information means processing gigabytes, or maybe even terabytes, of log and database files.
Enter Hadoop MapReduce, a linearly scalable way to process vast amounts of data with relatively trivial code.
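To make "relatively trivial code" concrete, here's a sketch of word count, the "hello world" of MapReduce, in plain Ruby. The method names and sample input are mine, but the shape is the real thing: you write a map function and a reduce function, and the framework handles the parallelism, shuffling, and sorting in between.

```ruby
# map: one input record in, zero or more (key, value) pairs out.
# Hadoop runs this over input splits in parallel.
def map_record(line)
  line.downcase.scan(/\w+/).map { |word| [word, 1] }
end

# reduce: one key plus all of its values in, an aggregate out.
# Hadoop calls this once per key after the shuffle phase.
def reduce_pairs(word, counts)
  [word, counts.sum]
end

# Simulate the framework locally on a tiny "log file":
lines   = ["the quick brown fox", "the lazy dog"]
pairs   = lines.flat_map { |l| map_record(l) }
grouped = pairs.group_by(&:first)  # stand-in for the shuffle/sort phase
result  = grouped.map { |w, ps| reduce_pairs(w, ps.map(&:last)) }.to_h

result["the"]  # => 2
```

That's the whole programming model; everything else (distribution, fault tolerance, re-running failed tasks) is Hadoop's problem, not yours.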
Getting started with Hadoop can be a total time sink. To save you some of that, I'll do my best to lead you in the right direction by covering the following points:
1) MapReduce and Hadoop 101
2) Getting Data into Hadoop
3) Running a MapReduce job in Ruby (!)
4) An overview of Ruby MR helper frameworks
5) Using SQL for ad-hoc data queries via Hive
6) Integrating these jobs into your Ruby app (scheduling, starting jobs, and getting the data back out)
Plus the bonus:
7) How to use this setup to automate A/B testing and totally blow your mind.