RW: So let’s go into the evolution of the product. What were the technical decisions you had to make at the beginning to create an engine that could track hundreds of thousands of apps? You guys tackle a lot of big data problems.
RA: One of the big decisions was, knowing we were going to be collecting a lot of data: what do we host this on? Do we buy our own hardware, or do we run on the cloud? We ultimately decided, like many startups in that timeframe, to use Amazon Web Services, primarily because of its scalability and flexibility.
As we have grown, our needs have become more sophisticated, and so has Amazon Web Services. We are still primarily hosted on Amazon. That was an important early decision that has served us well.
Another, related decision was: what is our core database going to be? When we had hardly any data, we started off with just a MySQL database, and that worked great until one day, with a few hundred apps on board, it started to seize up. We realized we needed to do something else, because the way to scale MySQL is just to put on more and more boxes, and that started to get very expensive. So we looked at other database technologies and decided to make our primary data store a column-based database, which was a newer concept at the time but much more optimized for analytics workloads.
A column-based database stores data by column rather than by row, which lets you run analytics aggregations really quickly. We were able to do everything in real time, and we still do. We were able to collect and maintain granular data and do things that simply weren't possible a few years earlier.
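To make the column-store idea concrete, here is a toy sketch of the two layouts. This is purely illustrative (the field names are assumptions, not Localytics' schema): an aggregate such as an average only needs to read one column, so a column-oriented layout touches far less data than a row-oriented one.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"app_id": 1, "event": "launch", "duration_ms": 120},
    {"app_id": 2, "event": "swipe",  "duration_ms": 45},
    {"app_id": 1, "event": "launch", "duration_ms": 200},
]

# Column-oriented layout: each column is stored contiguously.
columns = {
    "app_id":      [1, 2, 1],
    "event":       ["launch", "swipe", "launch"],
    "duration_ms": [120, 45, 200],
}

# An analytics aggregate like "average event duration" scans only the
# duration_ms column; in a row store it would have to read every record.
avg_duration = sum(columns["duration_ms"]) / len(columns["duration_ms"])
print(avg_duration)
```

In a real column store the contiguous columns also compress well and vectorize nicely, which is where much of the speed-up comes from.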
Because of the technology available and the price of computing and the brains that [Localytics] has, we were able to collect and maintain granular user event level detail so that our customers could do any kind of ad hoc query and get answers to any kind of question they had. And they didn’t have to pre-define their questions well in advance.
RW: These questions are looking at how people are using these apps via the events [when somebody pushes a button or makes a swipe] within an app.
RA: These are unique questions that couldn’t otherwise be answered. Sometimes you don’t know what the questions are in advance. You’re suddenly able to ask any question of your users and get insightful data back.
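The point about not pre-defining questions can be sketched in a few lines. Assuming the raw events are retained as simple records (field names here are hypothetical), a question nobody anticipated at collection time can still be answered after the fact with a filter and an aggregate:

```python
# Event-level records retained as-is; schema is an illustrative assumption.
events = [
    {"user_id": "u1", "event": "button_tap", "screen": "checkout"},
    {"user_id": "u2", "event": "swipe",      "screen": "home"},
    {"user_id": "u1", "event": "button_tap", "screen": "home"},
    {"user_id": "u3", "event": "button_tap", "screen": "checkout"},
]

# Ad hoc question, posed long after the data was collected:
# "How many distinct users tapped a button on the checkout screen?"
users = {e["user_id"] for e in events
         if e["event"] == "button_tap" and e["screen"] == "checkout"}
print(len(users))
```

Because nothing was pre-aggregated away, any such slice of the data remains queryable.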
So that was the data store piece and the technical decisions around it. As we grew in scale (we now collect data on a billion unique devices), other things would break or become bottlenecks. The rate of data collection is enormous; we see thousands of data points a second. How do we scale that? We ended up using different technologies on that side, for example NoSQL databases, for their ability to quickly ingest data and hand it off to our primary data store. So those are some of the key technical decisions.
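The ingestion pattern described here can be sketched as a write-optimized buffer that absorbs the raw event stream and flushes batches into the primary analytics store. This is a minimal sketch of the general technique, not Localytics' implementation; the class name and batch size are assumptions.

```python
class IngestBuffer:
    """Stands in for a fast NoSQL write layer in front of the column store."""

    def __init__(self, primary_store, batch_size=1000):
        self.primary_store = primary_store  # e.g. the analytics column store
        self.batch_size = batch_size
        self.pending = []

    def write(self, event):
        # The hot path is a cheap append, so it keeps up even at
        # thousands of events per second.
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Bulk-loading a batch into the primary store is far cheaper
        # than row-at-a-time inserts.
        self.primary_store.extend(self.pending)
        self.pending = []


analytics_store = []
buf = IngestBuffer(analytics_store, batch_size=3)
for i in range(7):
    buf.write({"seq": i})
buf.flush()  # push the final partial batch
print(len(analytics_store))
```

The design choice is the usual one for high-rate pipelines: decouple the write path from the query path so a slow bulk load never stalls incoming events.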