As part of our engineer-to-engineer series of technical talks, Cloudera co-founder Jeff “The Hammer” Hammerbacher visited Redfin’s San Francisco office last week to explain why Facebook began using Hadoop to analyze how people use Facebook. Hadoop stores data as a lot of big files spread across many computers, rather than squeezing it into the rows and columns of a database. In case you forgot, Jeff led the team that built Facebook’s data storage systems. Redfin engineer Gordon Brown has just posted a summary of Jeff’s talk. My favorite bits:
Software Developers as Business Analysts: Jeff emphasized the importance of letting developers access usage data, so they can figure out on their own what optimizations need to be made to the site. While the query language of traditional relational databases doesn’t do much beyond pulling the data out, Hadoop allows developers to build their own analytical tools, or extend those of others, using a more powerful all-around programming language like Java.
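For the curious, here is a rough idea of what “building your own analytical tool” can look like. This is only a sketch of the map-and-reduce style of analysis that Hadoop is known for, not anything shown in Jeff’s talk: it is plain Python rather than Java, it does not use Hadoop at all, and the log format, the sample data and the function names (`map_line`, `reduce_counts`, `page_views`) are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical usage log: one "<user_id> <url_path>" entry per line.
SAMPLE_LOG = [
    "u1 /home",
    "u2 /home",
    "u1 /search",
    "u3 /home",
    "u2 /listing/42",
]


def map_line(line):
    """Map step: turn one log line into a (path, 1) pair."""
    _user_id, path = line.split()
    return (path, 1)


def reduce_counts(pairs):
    """Reduce step: add up the counts emitted for each path."""
    totals = defaultdict(int)
    for path, count in pairs:
        totals[path] += count
    return dict(totals)


def page_views(lines):
    """Run the map step over every line, then reduce the results."""
    return reduce_counts(map_line(line) for line in lines)


if __name__ == "__main__":
    # Print each page with its view count, sorted by path.
    for path, views in sorted(page_views(SAMPLE_LOG).items()):
        print(path, views)
```

On a real cluster the map step would run in parallel across many machines, each working on its own chunk of the log files, and the framework would handle grouping the pairs before the reduce step. The point here is just that the analysis is ordinary code a developer can change.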
The Rise of the Machines: Jeff observed that the amount of data being stored by computers is exploding, largely because most of it is now captured automatically by machines. You can change one setting on a web server and increase the amount of data you capture about what people do on your website by a factor of 100 or even 1,000. When Jeff worked at Facebook, the company was generating a terabyte of data, roughly a trillion bytes, every day. And this was in 2007.
Microsoft Has Started to Do a Lot Right: While Jeff was hardly impressed by the packaged software bought by businesses less enterprising than Facebook — Gordon was too polite to include Jeff’s unexpurgated opinions — Jeff reserved special praise for all the analytical tools Microsoft offers with its database, SQL Server. As Jeff said, “It’s kind of scary that Microsoft has started to do a lot right within the last 5 years.” If you do end up working with a relational database rather than Hadoop, Microsoft’s a good choice.
To read the entire summary of Jeff’s talk, visit our devblog. Or check out other engineer-to-engineer talks, such as Twitter talking about its use of Scala. The next big talk comes from our own Sasha Aickin, who will be dishing it out next Thursday on the merits of HTML5 vs. proprietary mobile applications. Everyone’s invited! And many, many thanks to Jeff for a fantastic talk.