DATA PROCESSING

First, we need to clean the data: fix manual input errors, remove duplicates, and make sure that users do not appear multiple times.
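A minimal cleaning sketch in pandas; the file name, dataframe name, and column names (`sessions.csv`, `df`, `group`, `date`) are assumptions for illustration, not taken from the actual dataset:

```python
import pandas as pd

# Hypothetical file and column names; adjust to the actual dataset.
df = pd.read_csv("sessions.csv")

# Fix common manual-input errors such as stray whitespace and inconsistent casing.
df['group'] = df['group'].str.strip().str.lower()

# Parse dates and drop exact duplicate rows.
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df = df.drop_duplicates()
```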

We then want to split the data into sessions logged before the test start date and sessions logged after it.
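The split itself is a one-liner; `TEST_START` below is a placeholder date, not the real launch date:

```python
# Placeholder test start date, not taken from the dataset.
TEST_START = pd.Timestamp("2019-01-01")

before_start = df[df['date'] < TEST_START]
after_start = df[df['date'] >= TEST_START]
```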

Since we want to measure user conversion and don't want to randomly sample the same users multiple times, we need to make the "user_id" column unique. There are two ways to do that:

  1. Calculate LTV for each user
  2. Focus on the first exposure of each user

We also need to create a new binary column, "converted", equal to 1 if the user's total amount spent is greater than 0 and 0 otherwise.
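Assuming a per-user dataframe called `users` with a `total_spent` column (both names hypothetical), the flag can be derived directly:

```python
# 1 if the user spent anything during the observation window, 0 otherwise.
users['converted'] = (users['total_spent'] > 0).astype(int)
```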

First method: calculate LTV for each user
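A rough sketch of this aggregation, assuming the cleaned post-launch data lives in `after_start` with `user_id`, `group`, `amount_spent`, and `matches_played` columns (all hypothetical names):

```python
# First method: collapse each user's sessions into lifetime totals (LTV).
ltv_df = (
    after_start.groupby('user_id', as_index=False)
               .agg(
                   group=('group', 'first'),              # A/B group the user was exposed to
                   total_spent=('amount_spent', 'sum'),   # lifetime value
                   matches_played=('matches_played', 'sum'),
               )
)

# Conversion flag as defined above.
ltv_df['converted'] = (ltv_df['total_spent'] > 0).astype(int)
```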

A/B TEST

First, we need to decide on a sample size, which depends on a few factors:

Power of the test (1 − β): the probability of detecting a statistical difference between the groups when a difference is actually present. This is set at 0.8 by convention.

Significance level (α): the threshold below which we reject the null hypothesis, set at the conventional 0.05.

Minimum detectable effect (MDE): the smallest difference in conversion rates we want to be able to detect. We assume it's 2%.
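One way to turn these parameters into a required sample size is a power calculation with statsmodels. The sketch below assumes, purely for illustration, a baseline conversion rate of about 6% and the 2% MDE from above; the actual baseline should come from the data:

```python
from math import ceil
import statsmodels.stats.api as sms

# Effect size for proportions (Cohen's h); 0.06 is an assumed baseline rate.
effect_size = sms.proportion_effectsize(0.06, 0.08)

required_n = sms.NormalIndPower().solve_power(
    effect_size,
    power=0.8,     # 1 - beta
    alpha=0.05,    # significance level
    ratio=1,       # equal group sizes
)
print(f"Required sample size per group: {ceil(required_n)}")
```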

In the table above, the two designs look very similar. The old design performed slightly better, with a 6.4% vs. 6.3% conversion rate.

Since we have a large sample, we can use the normal approximation and calculate the p-value with a z-test. The p-value of 0.725 is much higher than the α = 0.05 threshold, meaning that the new design did not perform significantly differently from the old one.
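A sketch of the test, assuming the sampled test data sits in a dataframe `ab_df` with a `group` column taking the values 'control' and 'treatment' and the `converted` flag from earlier (names are assumptions):

```python
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

control = ab_df.loc[ab_df['group'] == 'control', 'converted']
treatment = ab_df.loc[ab_df['group'] == 'treatment', 'converted']

successes = [control.sum(), treatment.sum()]
nobs = [control.count(), treatment.count()]

# Two-proportion z-test plus 95% confidence intervals for each group.
z_stat, p_value = proportions_ztest(successes, nobs=nobs)
(low_con, low_treat), (upp_con, upp_treat) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f"z statistic: {z_stat:.2f}, p-value: {p_value:.3f}")
print(f"control CI: [{low_con:.3f}, {upp_con:.3f}], treatment CI: [{low_treat:.3f}, {upp_treat:.3f}]")
```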

As we saw, there's no significant difference in conversion rates between the two groups. However, each user in the treatment group spends on average ~38% more than users in the control group, while playing a similar number of matches.
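A quick way to compare these secondary metrics, reusing the hypothetical column names from the earlier sketches:

```python
# Mean spend and matches played per group (column names follow the earlier sketches).
secondary = (
    ab_df.groupby('group')[['total_spent', 'matches_played']]
         .mean()
         .round(2)
)
print(secondary)
```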

Second method: first exposure of each user
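A sketch of this selection, assuming the same session-level dataframe `after_start` as in the earlier sketches:

```python
# Second method: keep only each user's first session (first exposure).
first_exposure = (
    after_start.sort_values('date')
               .drop_duplicates(subset='user_id', keep='first')
               .reset_index(drop=True)
)

# Conversion flag on the first-exposure frame.
first_exposure['converted'] = (first_exposure['amount_spent'] > 0).astype(int)
```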

A/B TEST

As with the first method, we need to decide on a sample size. We keep the same parameters: power of the test (1 − β) of 0.8, significance level α = 0.05, and a minimum detectable effect of 2%.

In the table above, the two designs again look very similar. This time the new design performed slightly better, with a 2.5% vs. 2.2% conversion rate.

Since we have a large sample, we can again use the normal approximation and calculate the p-value with a z-test. The p-value of 0.615 is much higher than the α = 0.05 threshold, meaning that the new design did not perform significantly differently from the old one, which is consistent with the result from the first method.

As we saw, there's no significant difference in conversion rates between the two groups. Furthermore, unlike with the previous method, there's not much difference in matches played or average amount spent per user. However, the treatment group earns and spends more hard currency than the control group, possibly because of the cherries offered as a reward on the first day in the new design.