Do your SQL batch jobs take too long to complete? Does all that data stored in staging tables keep you up at night? Does dirty data cause the ETL process to barf with primary key violations? When the product team asks for another metric, does it take 5 different ALTER scripts, 10 changes to SQL stored procs, and 8 changes to existing SSIS packages? If this sounds like your organization, Hadoop can help with all of these, and you don't need internet-scale data to use it. After a brief intro to map/reduce, we will walk through a solution to a real-world problem: finding the incoming links to web pages. We will first solve it using the Java API and then solve the same problem in Hive. You should walk away with a good understanding of Hadoop and how to start solving problems with it.
Brock has been using the .NET Framework since its first release. He spent his first 8 years at National Instruments coding .NET components in Windows Forms and ASP.NET. Currently, Brock is a technical manager at Max Systems, which makes car dealership web software. Splitting his time 50/50 between programming and management duties, Brock spends most days estimating product features, helping programmers, guiding technical implementation, fixing bugs, and coding features, all while figuring out how to better crunch data for lots of vehicles. He blogs at http://brockreeve.com/ and likes to ride his bike, fly model airplanes, golf, and code.