How we solved our batch SQL processing problems with Hadoop
Do your SQL batch jobs take too long to complete? Does all that data stored in staging tables keep you up at night? Does dirty data cause the ETL process to barf with primary key violations? When product asks for another metric, does it take 5 different alter scripts, 10 changes to SQL stored procs, and 8 changes to existing SSIS packages? If this sounds like your organization, Hadoop can help with all of these. You don’t need internet-scale data to use it. After a brief intro to map/reduce, we will walk through a solution to a real-world problem: finding incoming links to web pages. First, we will solve it using the Java API, and then we will solve the same problem in Hive. You should walk away with a good understanding of Hadoop and how to start solving problems with it.
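To give a feel for the incoming-links problem before the talk, here is a minimal sketch of the map/reduce idea in plain Java. It does not use the Hadoop API and is not the code from the talk; the class name `InboundLinks`, its methods, and the sample page names are all hypothetical, chosen only for illustration. The map step emits a (target, source) pair for every outgoing link, and the reduce step groups those pairs by target.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: simulates map and reduce in memory, without Hadoop.
public class InboundLinks {

    // Map phase: for one page and its outgoing links, emit (target, source) pairs.
    static List<String[]> map(String source, List<String> outgoing) {
        List<String[]> pairs = new ArrayList<>();
        for (String target : outgoing) {
            pairs.add(new String[] {target, source});
        }
        return pairs;
    }

    // Shuffle and reduce phase: group the emitted pairs by target page,
    // collecting every source that links to it.
    static Map<String, List<String>> reduce(List<String[]> pairs) {
        Map<String, List<String>> inbound = new TreeMap<>();
        for (String[] pair : pairs) {
            inbound.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return inbound;
    }

    // Runs map over every page, then reduces all emitted pairs.
    static Map<String, List<String>> run(Map<String, List<String>> pages) {
        List<String[]> emitted = new ArrayList<>();
        for (Map.Entry<String, List<String>> page : pages.entrySet()) {
            emitted.addAll(map(page.getKey(), page.getValue()));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // Made-up sample data: page -> pages it links to.
        Map<String, List<String>> pages = new TreeMap<>();
        pages.put("a.html", List.of("b.html", "c.html"));
        pages.put("b.html", List.of("c.html"));
        // prints {b.html=[a.html], c.html=[a.html, b.html]}
        System.out.println(run(pages));
    }
}
```

In a real Hadoop job the framework would handle the grouping between the two steps and spread the work across machines; the sketch only shows the shape of the computation.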
Brock has been using the .NET Framework since its first release. In his spare time he has been building Pickaxe (https://github.com/bitsummation/pickaxe), an open source SQL-like language for scraping web pages. He currently works for the Advisory Board and blogs at brockreeve.com.
- Austin .NET UG 52 Recordings
- Hadoop 4 Recordings