Dr Amr Awadallah is the Chief Technology Officer of Cloudera, a data management and analytics platform based on Apache Hadoop. Before co-founding Cloudera in 2008, Awadallah served as Vice President of Product Intelligence Engineering at Yahoo!, running one of the very first organizations to use Hadoop for data analysis and business intelligence. Awadallah joined Yahoo! after the company acquired his first startup, VivaSmart, in July 2000.
With the fourth industrial revolution upon us, where the lines between the physical, digital and biological spheres are blurred by the world of big data and the fusion of technologies, Cloudera finds itself among the band of companies leading this change. In this interview with Enterprise Innovation, the Cloudera co-founder shares his insights on the opportunities and challenges of the digital revolution and its implications for businesses today; how organizations can derive maximum value from their data while protecting it against risks; the potential pitfalls and mistakes companies make when using big data for business advantage; and what lies beyond big data analytics.
Take us through the beginning of Cloudera, your time with VivaSmart, and what it was like to set up these companies.
They were very different processes. When Yahoo! acquired VivaSmart in mid-2000 for $9 million, it was mainly an acqui-hire: there were only five of us in the company, and we were among the few experts in comparison shopping, which Yahoo! really needed for its shopping service. In retrospect, it was the right move, because when the Internet bubble burst in 2000, almost all our competitors shut down, and we were lucky to have joined Yahoo! when we did.
The lightbulb really went on for me at Yahoo!, where I spent a total of eight years: four working on the comparison shopping engine VivaSmart had built, and four more on business intelligence and data analytics. There I faced a number of scaling challenges, both in processing time and in storage cost; we were deleting data we wanted to keep. The system was also not advanced: it could only do SQL, and we wanted to do predictive modeling, pattern matching, clustering, and other techniques that are very hard to do in SQL. I was lucky that while I was at Yahoo!, Doug Cutting, who now also works at Cloudera, was working with the Yahoo! Search team to build the Hadoop technology for Search. I was complaining about all my problems, and he suggested I try Hadoop and see if it worked for me. And it did! Within six months, all of my back end had been switched to Hadoop, the processing time went down from nine hours to five minutes, the cost went down by almost 100x in some cases, and we gained the flexibility to go beyond SQL and do more advanced work.
You were one of the first guys working on Hadoop?
We were the only Hadoop big data platform for two years.
How did that business model evolve?
That comes from Mike Olson, my co-founder and one of the very first open source CEOs. He ran a company called Sleepycat, which made Berkeley DB, an open source embedded database. He was instrumental in charting Cloudera's course in terms of how to build a business model around open source.
We knew from day one that the benefits of open source are extremely rapid innovation and lots of word of mouth, but the downside is obviously that it's very easy for someone to copy your products, and in many cases customers take the software themselves and never become paying customers. Mike experienced that firsthand with his first startup, so when we were building out Cloudera, our strategy was always a hybrid open source business model. We'd keep the core platform and capabilities open, but build value around it that would make it easier to use, make it enterprise-ready, and improve its performance. That's how we differentiated ourselves from the competition.
Cloudera is now a $4 billion company with 1,500 employees. How is your workforce spread out?
Of the 1,500, a thousand are in the U.S. and the rest are worldwide. Those 500 are mostly in sales and marketing in different countries: Singapore and ASEAN, Japan, China, Australia, and Europe. In Budapest, we have our only R&D and engineering office outside the United States. That came out of a significant shortage of skills in the U.S.: the success of Silicon Valley companies like Google and Uber has made the competition to find and retain talent very cutthroat. About two years ago, we made a strategic decision to open an R&D office outside the U.S., and Budapest, Hungary was our choice.
Eastern Europe is obviously very attractive for many reasons: a highly educated, skilled workforce, and talent that costs perhaps half as much as in some other European nations.
One of the unique things about Budapest is that, compared to Germany, the U.K., France or the Netherlands, it's cheaper, and it's about a third of the cost of the U.S. But the reason we moved was actually not to save money; it was to find talent in the first place. H-1B [visas] are very tough to get these days, and as a startup, which we still are, we have to be very agile.
Why specifically Hungary over countries like Moldova, Romania, Macedonia, etc.?
It came down to a number of things. First, the country needed to be politically stable; otherwise, Ukraine would have been at the top of our list. Second, the talent we needed had to be available. We look for a special type of talent: not just computer science developers, but people who understand our systems, and this was the main reason we picked Budapest. We surveyed the market and found a number of companies there already doing that kind of work, and the local university was very advanced in teaching it. Finally, there wasn't already a big established presence from Google, Microsoft or other behemoths that we didn't want to start competing with right away.
How do you see Asia fitting into the whole R&D system for Cloudera?
Even though our size is relatively big, we're still a startup. Right now, it's not in our best interest to spread R&D across too many locations, because that slows down development. But as we grow as a company and start having more product lines, it would make sense to have R&D offices in other locations, and Asia will definitely be at the top of the list.
After having traveled around Asia, how do you see its maturity of adoption compared to the West?
I would say it's very similar to Europe: it's spotty, and at the same stage. By that I mean there are some companies that are on the cutting edge, way ahead of the curve, and there are some that are still playing catch-up and learning what to do. In Europe, telecom and banking tend to be ahead of the curve; what we're seeing in Asia is that telecom is ahead of the curve, but the banking industry here has not been as fast.
Would you say banking is generally more conservative here?
I wouldn't say conservative; slow-moving. That's different, because conservative means you take a very long time to decide. Here, they are actually making the decision; they just take a very long time to get things done.
Where do you see the role of the state and the role of regulation in promoting innovation within a jurisdiction?
I think one of the most fruitful areas to invest in is always talent. There's no question about that. We've seen some of the governments around here, Singapore and Malaysia included, be very active in helping train people. Some governments give subsidies to companies: for example, if a company wants to train somebody in data science skills, the government might pay 50% of the training cost. In Malaysia a couple of months ago, there was an event where they gave awards to the top universities and to the top students who graduated as data scientists, and I see that as a very useful, fruitful area.
You guys are at a point where you're not a startup, but not yet a massive enterprise either. Do you see your administrative and innovation processes continuing in this direction, or do you have very different ways of growing the company from here on? What are your plans for growth?
Every year we look at how we're scaling as a business, and we change the way we're doing things to adapt to that growth.
Sometimes the change is a simple process change; sometimes it's a change in people. For example, I was the VP of Engineering for the first four years. At some point, it became very clear that I couldn't continue my CTO role of meeting with customers and public speaking while also scaling the engineering team at the very fast rate it needed. We had to go out and hire a VP of Engineering, who now runs that team.
The same thing happened with our CEO, Mike Olson, my co-founder. He was the CEO for the first five years, and in the fifth year he was hitting his limits in terms of scaling. He had never scaled a company to this much revenue and this many people before. He could learn it, but when you're growing fast, you don't have the luxury of learning. So Mike effectively fired himself as CEO (he's still there as chairman and chief strategy officer), and we hired Tom Reilly, which was one of the best moves Mike ever made for the company. These are the kinds of things we watch out for as we continue to scale.
What excites you about the industry in the coming future? How do you see the future evolving?
We think there is a data revolution going on right now, and it is going to be as big as, if not bigger than, the industrial revolution. In the industrial revolution, we learned how to use machines to build things, and the companies and countries that figured out how to do that became the world's leaders; China is one example.
The exact same thing is going to happen with data. The countries and companies that figure out how to use data to automate decision-making wherever possible, across multiple disciplines, will be the ones that win. We have customers in farming who collect data from the fields, with drones taking pictures to see how the colors of the crops are changing, and they use that to optimize yield. There are hospitals in the U.S. now working on precision medicine initiatives: analyzing the DNA and making a drug tailored to exactly your condition, rather than the one-size-fits-all approach pharmaceutical companies take today. There will be more and more of this personalization and precision in many areas. These changes will transform the world so significantly that certain jobs, those that are not creative or do not involve dealing with people, will be replaced.
Original post:
Building a $4 billion company around open source software: The Cloudera story - Enterprise Innovation