A/B testing for product managers

Neil McCarthy is Director of Product Management at Yammer, where he has worked for the past three and a half years. Trained as an electrical engineer, he has spent the past 10 years in enterprise software, in roles straddling the business and technical sides.

At Yammer they decided early on to become a data-informed company and invested heavily in the infrastructure to support this, along with a team of data scientists. Today, no new feature is released without an A/B test.

Why A/B test your product?
I asked Neil what A/B testing can do that other methods for getting customer feedback, such as focus groups and surveys, can’t do.

“A/B testing helps product teams move faster by helping them build the right things and validate their assumptions along the way. A/B testing is a great way to test an idea you already have, but it’s not a great way to come up with new ideas. Gathering user feedback and thinking strategically about the future of the product and industry is a better way to come up with good ideas.”

At Yammer they also do qualitative and quantitative research post-project to figure out what people are actually doing. This plays a big part in figuring out what happened when a test fails.

One example of a test that turned out worse than the baseline came when they altered the sign-up flow. Conventional wisdom holds that the less friction in the sign-up flow, the better the customer retention. So Yammer hypothesized that by moving a few steps out of the sign-up flow and into the product itself, they could increase long-term retention. To their surprise, removing these steps had the opposite effect: the sign-up flow was helping users understand what Yammer is. They therefore discarded the change and left the sign-up flow as is.

An example of a success was a test of a module in the feed that suggested users to follow, based on whom their friends followed. Many users started following others, and this produced a lift in the core metric of days engaged.

How to test
Yammer is not Twitter or Facebook, which can run statistically significant tests on only 1% of their users. Instead, Yammer usually tests on 50% of its users, and even then a test takes a minimum of two weeks. The problem is that since you are testing hypotheses, some of which are proven incorrect, it can feel like the product is advancing more slowly. In actuality, you’re moving faster, because you eliminate a lot of waste and complexity by not shipping features that are unsuccessful.
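Why a test still takes weeks even at 50% of users comes down to sample size. As a rough illustration, here is the standard two-proportion power calculation in Python; the baseline rate and minimum detectable effect are invented for the example, using the conventional 5% significance level and 80% power (these numbers are not Yammer's):

```python
import math

def required_sample_size(p_base, mde, alpha_z=1.96, power_z=0.84):
    """Approximate per-arm sample size for a two-proportion test.

    p_base: baseline conversion rate.
    mde: minimum detectable effect (absolute lift we want to detect).
    alpha_z, power_z: normal quantiles for a 5% two-sided test at 80% power.
    """
    p_alt = p_base + mde
    # Sum of Bernoulli variances under baseline and alternative rates.
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((alpha_z + power_z) ** 2 * variance / mde ** 2)

# Example: detecting a 1-point lift on a 20% baseline metric
# requires roughly 25,000 users per arm -- small changes need
# large audiences, which is why tests run for weeks.
n_per_arm = required_sample_size(0.20, 0.01)
```

The smaller the effect you want to detect, the more users you need in each arm, which is why a site with a smaller user base must test on a large fraction of traffic for a longer time.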

“The core of A/B testing is to have a hypothesis. At Yammer, hypotheses are rigorously formulated as if/then statements. For example: ‘If we increase the priority of groups, then more users will get work done in Yammer.’ This is broken down into smaller hypotheses that can be tested more easily, like: ‘If we increase the prominence of the group join button, then more users will join groups and engage more frequently with Yammer.’”

How to avoid local maximum
A well-known problem with A/B testing, and any other incremental test method, is the local maximum: the point where small changes no longer significantly improve the product. At Yammer they have avoided local maximum problems by periodically taking big bets, where they work on really big features. Even for bigger features, they break the project down into small pieces so they can execute incrementally.

Getting started with A/B tests
I also asked Neil what he thought current best practice for A/B testing was. Here are four key ingredients of successful A/B testing for product managers.
1) Having the right hypotheses is necessary. If you don’t have well-informed hypotheses, A/B testing will not help you, no matter how technically perfect your setup is.
2) Log everything users do. This is not to support the A/B test itself, but to let you understand, post hoc, what happened. Why did the test go wrong? Why did users not react as expected?
3) Have a solid A/B testing framework in place. Without the technical framework, you won’t succeed.
4) Put statistical rigor into guidelines for conducting the A/B tests. You need to make sure you are considering statistical significance when looking at the results, so you only act on true positives.
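The logging advice in point 2 usually means structured event records that can be queried later. A minimal sketch in Python; the event names and fields are made up for the example and are not Yammer's actual schema:

```python
import json
import time

def log_event(user_id, event, properties=None):
    """Serialize one product event as a JSON line, ready to append
    to an event log for post-hoc analysis of a test."""
    record = {
        "ts": time.time(),          # when it happened
        "user_id": user_id,         # who did it
        "event": event,             # what they did
        "properties": properties or {},  # any extra context
    }
    return json.dumps(record)

# Example: record that a user joined a group during an experiment.
line = log_event("u1", "group_joined", {"group": "design"})
```

Appending one such line per action gives you the raw material to answer, after the fact, why users did or did not react as expected.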
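A core piece of the framework in point 3 is assigning each user to a variant deterministically, so a user sees the same experience on every visit. A minimal sketch assuming a hash-based bucketing scheme (the article does not describe Yammer's actual implementation):

```python
import hashlib

def assign_variant(user_id, experiment, traffic=0.5):
    """Deterministically bucket a user into control or treatment.

    Hashing user_id together with the experiment name keeps the
    assignment stable across sessions and independent between
    experiments. traffic is the fraction sent to treatment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "treatment" if bucket < traffic else "control"

# The same user always lands in the same bucket for a given experiment.
variant = assign_variant("u1", "group_join_prominence")
```

Salting the hash with the experiment name matters: without it, the same users would end up in treatment for every experiment, confounding results across tests.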
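The statistical rigor in point 4 typically means a significance test on the difference between the two groups' conversion rates before declaring a winner. A sketch of a standard two-proportion z-test, illustrative rather than Yammer's actual methodology:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (z statistic, p-value). A small p-value (e.g. < 0.05)
    suggests the observed difference is unlikely to be noise.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal-CDF p-value via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Example with invented numbers: 5% vs 6% conversion at 10,000 users per arm.
z, p = two_proportion_z(500, 10_000, 600, 10_000)
```

Only when the p-value clears a pre-agreed threshold should the team treat the lift as real; this is what keeps a product team from shipping on a false positive.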