Experimentation in product management

Traditionally, new products were developed from the founder’s idea: it was written down as requirements, and the engineers built it. Over the last few years this pattern has changed. Across the internet industry there has been a shift in mindset toward bringing the customer into what we are building. There is a growing awareness that, most of the time, we are wrong about what the customer wants. Therefore it is necessary to experiment to find out what customers actually want.

We talked to Teresa Torres about the role of experimentation in product management. The greater part of her career has been spent in pre-product/market-fit internet start-ups, so if anyone knows how to experiment your way to a successful product, it’s Teresa. Today she helps companies make better product decisions as a consultant and coach.

According to Torres, it is better to think about product development in terms of experiments rather than requirements. In his dual-track scrum article, Marty Cagan recommends running a discovery track alongside a delivery track: the discovery track uses heavy experimentation to identify the right thing to build, and the delivery track then builds it. Today, by contrast, there is a tendency to build any and every idea.

But real experiments require quite a bit of rigor and experience in experiment design.

“This is my primary focus as a coach. Many teams start to experiment but don’t have the experience to do it well. Most of us don’t have strong science or statistics backgrounds. What happens in practice is that instead of generating informed hypotheses and designing experiments to test those hypotheses, we are testing anything and everything. The problem with this approach is that we risk false positives. We are running tens and sometimes hundreds of experiments, many with way too many variations. This guarantees that we will see many false positives – changes that look good but really have no impact. As a result, we are making decisions on bad data. If we want to build better products, we need to understand how to run good experiments. The business side needs to be more scientific and the data science side needs to be more business oriented.”
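To make the multiple-comparisons problem concrete, here is a minimal simulation. It is our own sketch, not code from Torres: every arm has the same true conversion rate, so any “significant” result is pure noise, and the more variations an experiment has, the more often at least one of them looks like a winner.

```python
import random
from statistics import NormalDist

# A/A simulation: control and every variant share the SAME true conversion
# rate, so each "significant" result is by construction a false positive.
TRUE_RATE = 0.10      # true conversion rate for all arms
USERS_PER_ARM = 1000  # sample size per arm
ALPHA = 0.05          # significance threshold

def conversions(n, rate):
    """Number of converting users out of n."""
    return sum(random.random() < rate for _ in range(n))

def p_value(conv_a, conv_b, n):
    """Two-sided two-proportion z-test with equal sample sizes."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    z = (conv_b - conv_a) / (n * se)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def any_false_winner(num_variants):
    """True if at least one variant looks significant vs. control."""
    control = conversions(USERS_PER_ARM, TRUE_RATE)
    return any(
        p_value(control, conversions(USERS_PER_ARM, TRUE_RATE), USERS_PER_ARM) < ALPHA
        for _ in range(num_variants)
    )

random.seed(1)
RUNS = 400
for k in (1, 5, 10):
    hits = sum(any_false_winner(k) for _ in range(RUNS))
    print(f"{k:2d} variations -> false winner in {100 * hits / RUNS:.0f}% of experiments")
```

With a single variation the false-winner rate stays near the expected 5%, but with ten variations a large share of experiments produce a “winner” even though nothing changed – exactly the bad-data trap Torres describes.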

According to Torres, the ready availability of experimentation tools like Optimizely and Visual Website Optimizer opens up the possibility of experimenting, but without resources and expertise, decisions will be made on faulty data. Part of the problem is the widespread “fear of math”: most people shy away from concepts like statistical significance and power. But product managers need to start understanding these concepts, and there are many online resources today that teach the basics of statistics. Another problem is hypothesis design. If you have not properly designed your hypothesis before you start, you are not setting yourself up to get good data, and we need experimenters who can design experiments that actually test the hypotheses they are meant to test.
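As a starting point on significance and power, the standard sample-size calculation for comparing two conversion rates fits in a few lines. This is a generic textbook formula, not something from Torres, and the baseline and target rates below are made-up numbers:

```python
from statistics import NormalDist

def sample_size_per_arm(p_base, p_target, alpha=0.05, power=0.8):
    """Users needed per arm to detect a lift from p_base to p_target
    with a two-sided test at significance level alpha and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # 0.84 for power = 0.8
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ((z_alpha + z_power) ** 2 * variance) / (p_target - p_base) ** 2

# Detecting a lift from 10% to 11% conversion:
print(round(sample_size_per_arm(0.10, 0.11)))  # ~14,748 users per arm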

I asked Torres if there are any simple rules of thumb or best practices for product managers who want to get started.

“Don’t trust results that are not statistically significant. Surprisingly, many teams are not even testing for significance. Define your hypotheses and decide upfront what decisions you will make if the test passes, fails, or is flat. Otherwise you will just find yourself rationalizing after the fact why your change is still good regardless of what the data tells you. Run the same experiment multiple times; it helps reduce false positives. There is no absolute truth. The world is always changing, and something that didn’t work in the past may work in the future. Always keep the mindset that everything evolves.”
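One lightweight way to follow the “decide upfront” advice is to write down the hypothesis and the decision for each outcome before any data comes in. A minimal sketch, with hypothetical field names and decisions of our own invention:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentPlan:
    """Pre-registered plan, written before the experiment starts."""
    hypothesis: str
    metric: str
    if_pass: str  # decision when the variant is significantly better
    if_fail: str  # decision when the variant is significantly worse
    if_flat: str  # decision when the result is not significant

plan = ExperimentPlan(
    hypothesis="Simplifying the sign-up flow improves week-4 retention",
    metric="week-4 retention rate",
    if_pass="ship the change to all users",
    if_fail="revert and investigate with qualitative research",
    if_flat="revert; resist rationalizing the change after the fact",
)
print(plan)
```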

For more tips, see her article The 14 Most Common Hypothesis Testing Mistakes (And How to Avoid Them).

It is up to you whether you take Teresa Torres’ suggestion to start experimenting. In the meantime, visit her excellent blog Product Talk and sign up for her newsletter; it is always packed with interesting content about product management.

A/B testing for product managers

Neil McCarthy is Director of Product Management at Yammer, where he has worked for the past three and a half years. With an education in electrical engineering, he has spent the past 10 years in enterprise software, in roles straddling the business and technical sides.

At Yammer they decided early on to become a data-informed company and invested heavily in supporting infrastructure, along with a team of data scientists. Today, no new feature is released without an A/B test.

Why A/B test your product?
I asked Neil what A/B testing can do that other methods for getting customer feedback, such as focus groups and surveys, can’t do.

“A/B testing helps product teams move faster by helping them build the right things and validate their assumptions along the way. A/B testing is a great way to test an idea you already have, but it’s not a great way to come up with new ideas. Gathering user feedback and thinking strategically about the future of the product and industry is a better way to come up with good ideas.”

At Yammer they also do qualitative and quantitative research after a project to figure out what people are actually doing. This plays a big part in figuring out what happened when a test fails.

One example of a test that turned out worse than the baseline came when they altered the sign-up flow. Conventional wisdom holds that the more friction you take out of the sign-up flow, the better the customer’s retention. So Yammer hypothesized that by moving a few steps out of the sign-up flow and into the product, they could increase long-term retention. To their surprise, removing these steps had the opposite effect: the sign-up flow was helping users understand what Yammer is. They therefore discarded the change and left the sign-up flow as it was.

An example of a success came when they tested a feed module that suggested the user follow people their friends followed. It turned out that a lot of users started to follow others, which lifted the core metric of days engaged.

How to test
Yammer is not Twitter or Facebook, which can reach statistical significance testing on only 1% of their users. Instead, Yammer usually tests on 50% of its users, and even then a test takes a minimum of two weeks. The problem is that since you are testing hypotheses, some of which are proven incorrect, it feels like the product is advancing more slowly. In actuality, you’re moving faster, because you eliminate a lot of waste and complexity by not building out features that are unsuccessful.
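The two-week floor follows from simple arithmetic: required sample size divided by eligible traffic. Here is a sketch with entirely made-up traffic numbers (the interview gives no real figures):

```python
# Hypothetical figures for illustration only.
needed_per_arm = 14_748    # e.g. from the power calculation shown earlier
daily_active_users = 2_000
allocation = 0.50          # Yammer-style: half of all users enter the test

users_into_test_per_day = daily_active_users * allocation
days = 2 * needed_per_arm / users_into_test_per_day  # two arms to fill
print(f"Minimum test duration: {days:.0f} days")     # ~29 days here
```

Smaller products therefore trade allocation percentage against test duration, which is why Yammer runs at 50% where Twitter or Facebook can run at 1%.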

“The core of A/B testing is to have a hypothesis. At Yammer, hypotheses are rigorously formulated as if/then statements, for example: ‘If we increase the priority of groups, then more users will get work done in Yammer.’ This is then broken down into smaller hypotheses that can be tested more easily, such as: ‘If we increase the prominence of the group join button, then more users will join groups and engage more frequently with Yammer.’”

How to avoid the local maximum
A well-known problem with A/B testing, or any other incremental test method, is the local maximum: the point where small changes no longer significantly improve the product. Yammer has avoided local maximum problems by periodically taking big bets and working on really big features. Even then, they break the project down into small pieces so they can execute incrementally.

Getting started with A/B tests
I also asked Neil what he considered current best practice for A/B testing. Here are four key ingredients of successful A/B testing for product managers.
1) Have the right hypotheses. If your hypotheses are not well informed, A/B testing will not help you, no matter what degree of technical perfection you have.
2) Log everything users do. This is not to help the A/B test in itself, but to understand, post hoc, what happened. Why did the test go wrong? Why did users not react as expected?
3) Have a solid A/B testing framework in place. Without the technical framework, you won’t succeed; a minimal sketch of one building block follows this list.
4) Put statistical rigor into your guidelines for conducting A/B tests. Make sure you consider statistical significance when looking at results, so you only conclude on true positives.
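As a taste of what the framework in point 3 involves, here is a minimal sketch of deterministic variant assignment, one core building block. It is our own illustration, not Yammer’s implementation, and the experiment name and user id are hypothetical:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")):
    """Deterministically assign a user to a variant.

    Hashing user_id together with the experiment name means a user always
    sees the same variant within one experiment, while assignments stay
    independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform float in [0, 1]
    return variants[int(bucket * len(variants)) % len(variants)]

print(assign_variant("user-42", "group-join-button"))  # stable across calls
```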