Data to Action: AB test - Pricing Test


Load Data

In [36]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("ticks")
plt.style.use('fivethirtyeight')

#sns.set_style("white")
#plt.style.use('ggplot')
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
%config InlineBackend.figure_format = 'retina'


plt.rcParams["figure.figsize"] = (12,8)
plt.rcParams['axes.labelsize'] = 16
plt.rcParams['axes.titlesize'] = 20

pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 50)
In [3]:
df_test = pd.read_csv("data/Pricing_Test/test_results.csv")
print(df_test.shape)
df_test.head()
(316800, 8)
Out[3]:
user_id timestamp source device operative_system test price converted
0 604839 2015-05-08 03:38:34 ads_facebook mobile iOS 0 39 0
1 624057 2015-05-10 21:08:46 seo-google mobile android 0 39 0
2 317970 2015-04-04 15:01:23 ads-bing mobile android 0 39 0
3 685636 2015-05-07 07:26:01 direct_traffic mobile iOS 1 59 0
4 820854 2015-05-24 11:04:40 ads_facebook web mac 0 39 0
In [6]:
df_user = pd.read_csv("data/Pricing_Test/user_table.csv")
print(df_user.shape)
df_user.head()
(275616, 5)
Out[6]:
user_id city country lat long
0 510335 Peabody USA 42.53 -70.97
1 89568 Reno USA 39.54 -119.82
2 434134 Rialto USA 34.11 -117.39
3 289769 Carson City USA 39.15 -119.74
4 939586 Chicago USA 41.84 -87.68

Data Cleaning

In [7]:
# missing value
df_test.isnull().sum()
Out[7]:
user_id             0
timestamp           0
source              0
device              0
operative_system    0
test                0
price               0
converted           0
dtype: int64
In [8]:
df_user.isnull().sum()
Out[8]:
user_id    0
city       0
country    0
lat        0
long       0
dtype: int64
In [10]:
# merge two dataframes on user_id
df = pd.merge(df_test, df_user, on='user_id', how= 'left')
df.head()
Out[10]:
user_id timestamp source device operative_system test price converted city country lat long
0 604839 2015-05-08 03:38:34 ads_facebook mobile iOS 0 39 0 Buffalo USA 42.89 -78.86
1 624057 2015-05-10 21:08:46 seo-google mobile android 0 39 0 Lakeville USA 44.68 -93.24
2 317970 2015-04-04 15:01:23 ads-bing mobile android 0 39 0 Parma USA 41.38 -81.73
3 685636 2015-05-07 07:26:01 direct_traffic mobile iOS 1 59 0 Fayetteville USA 35.07 -78.90
4 820854 2015-05-24 11:04:40 ads_facebook web mac 0 39 0 Fishers USA 39.95 -86.02
In [11]:
# check missing value after merge
df.isnull().sum()
Out[11]:
user_id                 0
timestamp               0
source                  0
device                  0
operative_system        0
test                    0
price                   0
converted               0
city                41184
country             41184
lat                 41184
long                41184
dtype: int64
In [12]:
# those 41,184 missing values are exactly the row-count gap between the two tables
# (316,800 test rows - 275,616 user rows), i.e. users with no entry in the user table
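A quick way to confirm this is pandas' merge `indicator` flag, which labels rows whose `user_id` has no match in the user table. A minimal sketch with toy frames (the ids and city values below are hypothetical; only the pattern matters):

```python
import pandas as pd

# Toy stand-ins for df_test and df_user (hypothetical ids, for illustration):
df_test_toy = pd.DataFrame({'user_id': [1, 2, 3, 4]})
df_user_toy = pd.DataFrame({'user_id': [1, 2], 'city': ['Boston', 'Reno']})

# indicator=True adds a '_merge' column; 'left_only' marks test rows
# whose user_id never appears in the user table
merged = df_test_toy.merge(df_user_toy, on='user_id', how='left', indicator=True)
n_unmatched = (merged['_merge'] == 'left_only').sum()
print(n_unmatched)  # 2 test rows have no profile in the user table
```

On the real data, the `left_only` count should equal the 41,184 NaNs seen above.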
In [16]:
# check whether users in test always see the $59 price and users in control the $39 price
df.groupby(['test','price'])[['price']].count()
Out[16]:
price
test price
0 39 202517
59 210
1 39 155
59 113918

Issue:

  • A small number of users appear to be getting the wrong price based on their test/control assignment.

Fix:

  • Remove the users with the wrong price and move on with our analysis.
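From the counts above, the mismatch is tiny relative to the dataset, so dropping those rows is unlikely to bias the analysis:

```python
# Counts taken from the crosstab above
mismatched = 210 + 155        # control users shown $59 + test users shown $39
total_rows = 316800           # rows in df before filtering

share = mismatched / total_rows
print(f"{share:.2%} of rows are mismatched")  # 0.12%
```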
In [20]:
df=df[((df['test']==0) & (df['price']==39)) | ((df['test']==1) & (df['price']==59))]
df.head()
Out[20]:
user_id timestamp source device operative_system test price converted city country lat long
0 604839 2015-05-08 03:38:34 ads_facebook mobile iOS 0 39 0 Buffalo USA 42.89 -78.86
1 624057 2015-05-10 21:08:46 seo-google mobile android 0 39 0 Lakeville USA 44.68 -93.24
2 317970 2015-04-04 15:01:23 ads-bing mobile android 0 39 0 Parma USA 41.38 -81.73
3 685636 2015-05-07 07:26:01 direct_traffic mobile iOS 1 59 0 Fayetteville USA 35.07 -78.90
4 820854 2015-05-24 11:04:40 ads_facebook web mac 0 39 0 Fishers USA 39.95 -86.02
In [21]:
# check count again
df.groupby(['test','price'])[['price']].count()
Out[21]:
price
test price
0 39 202517
1 59 113918
In [22]:
# create a new feature: revenue
df['revenue'] = df['converted'] * df['price']
df.head()
Out[22]:
user_id timestamp source device operative_system test price converted city country lat long revenue
0 604839 2015-05-08 03:38:34 ads_facebook mobile iOS 0 39 0 Buffalo USA 42.89 -78.86 0
1 624057 2015-05-10 21:08:46 seo-google mobile android 0 39 0 Lakeville USA 44.68 -93.24 0
2 317970 2015-04-04 15:01:23 ads-bing mobile android 0 39 0 Parma USA 41.38 -81.73 0
3 685636 2015-05-07 07:26:01 direct_traffic mobile iOS 1 59 0 Fayetteville USA 35.07 -78.90 0
4 820854 2015-05-24 11:04:40 ads_facebook web mac 0 39 0 Fishers USA 39.95 -86.02 0
In [23]:
df['revenue'].describe()
Out[23]:
count    316435.000000
mean          0.827083
std           6.179009
min           0.000000
25%           0.000000
50%           0.000000
75%           0.000000
max          59.000000
Name: revenue, dtype: float64

Q: Should the company sell its software for 39 or 59?

In [25]:
# compare conversion rate and revenue
df.groupby('test')['converted'].mean()
Out[25]:
test
0    0.019900
1    0.015555
Name: converted, dtype: float64
In [26]:
df.groupby('test')['revenue'].mean()
Out[26]:
test
0    0.776083
1    0.917748
Name: revenue, dtype: float64

Although the conversion rate drops by about 22%, revenue per user goes up.
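The relative changes behind this statement can be computed directly from the group means above; note that revenue per user is just conversion rate times price:

```python
# Conversion rates and prices taken from the groupby results above
conv = {0: 0.019900, 1: 0.015555}   # 0 = control, 1 = test
price = {0: 39, 1: 59}

# revenue per user = conversion rate x price
rev = {g: conv[g] * price[g] for g in (0, 1)}

conv_change = conv[1] / conv[0] - 1
rev_change = rev[1] / rev[0] - 1
print(f"conversion: {conv_change:+.1%}, revenue per user: {rev_change:+.1%}")
# conversion: -21.8%, revenue per user: +18.3%
```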

In [29]:
# Perform a t-test

from scipy import stats
t_test = stats.ttest_ind(df[df['test'] == 1]['revenue'],
                         df[df['test'] == 0]['revenue'],
                         equal_var=False)

print('t statistic is: ',t_test.statistic)
print('p-value is: ',t_test.pvalue)
t statistic is:  5.715224666463108
p-value is:  1.0972577312420781e-08

Revenue per user is up and highly statistically significant.
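As a complement, a two-proportion z-test shows the conversion-rate drop itself is also highly significant. This is a sketch: the conversion counts below are reconstructed approximately from the group means and sizes above, not read from the raw data.

```python
from statsmodels.stats.proportion import proportions_ztest

# Approximate conversion counts, reconstructed from the group means above:
# test: 0.015555 * 113918 ~ 1772, control: 0.019900 * 202517 ~ 4030
count = [1772, 4030]
nobs = [113918, 202517]

z, p = proportions_ztest(count, nobs)
print(f"z = {z:.2f}, p-value = {p:.2e}")
```

A large negative z (test below control) with a tiny p-value: the price increase reduces conversion, it just doesn't reduce revenue.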

Let’s now check how different segments are reacting to the test. This can give us extremely useful information about user price sensitivity by segment, which is really just a fancy way of asking: how much do different users value our product?

In [39]:
# check revenue by source
sns.barplot(x='source', y='revenue', hue='test', data=df);
plt.xticks(rotation=45);
plt.title('Revenue by Source');

Insights:

  • Most segments agree with the overall finding: revenue rises as we increase the price.
  • Unfortunately, the three segments with the highest avg revenue at the new price happen to be the most expensive acquisition channels, i.e. friend referral and Google/FB ads.
In [40]:
# check revenue by device
sns.barplot(x='device', y='revenue', hue='test', data=df);
plt.xticks(rotation=45) ;
plt.title('Revenue by Device');

Insight:

  • It appears that mobile users are less price sensitive.
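To quantify the barplot rather than eyeball it, the same groupby can be tabulated with a per-device lift column. The revenue values below are hypothetical toy numbers, chosen only to illustrate the pattern of a larger mobile lift:

```python
import pandas as pd

# Toy frame standing in for df above (hypothetical revenue values):
toy = pd.DataFrame({
    'device':  ['mobile', 'mobile', 'web', 'web'],
    'test':    [0, 1, 0, 1],
    'revenue': [0.80, 1.00, 0.75, 0.80],
})

# Mean revenue per device/group, plus relative lift of test over control
table = toy.groupby(['device', 'test'])['revenue'].mean().unstack()
table['lift'] = table[1] / table[0] - 1
print(table)
```

On the real `df`, a larger `lift` for mobile than for web is the numerical version of "mobile users are less price sensitive."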
In [42]:
# check revenue by operating system
sns.barplot(x='operative_system', y='revenue', hue='test', data=df);
plt.xticks(rotation=45);

Insights:

  • Mac/iOS react better to the price increase. Most likely, they are acting as a proxy for different user characteristics.
    • -> It would probably be a good idea to figure out how to get more of those users and target them more via, for instance, ads or ad-hoc marketing campaigns.
  • Linux has 0% conversion rate for test! That’s clearly impossible and most likely a bug.
    • -> Try to remove that segment from the dataset and redo the statistical test
In [43]:
# check the ratio of Linux users
df[df['operative_system']=='linux'].shape[0] / df.shape[0]
Out[43]:
0.01305165357814401

Only about 1% of users are on Linux. Let's remove them from our dataset.

In [44]:
df = df.query("operative_system != 'linux'")
df.groupby('test')[['revenue']].mean()
Out[44]:
revenue
test
0 0.778002
1 0.933531
In [45]:
# perform t-test

t_test = stats.ttest_ind(df[df['test'] == 1]['revenue'],
                         df[df['test'] == 0]['revenue'],
                         equal_var=False)

print('t statistic is: ',t_test.statistic)
print('p-value is: ',t_test.pvalue)
t statistic is:  6.184240028400434
p-value is:  6.253458285742887e-10

The result improves slightly.

In [46]:
# check top 10 cities with most users
top_cities=df['city'].value_counts().reset_index().sort_values('city', ascending=False).head(10)
top_cities
Out[46]:
index city
0 New York 25401
1 Chicago 7071
2 Houston 6608
3 San Antonio 4554
4 Los Angeles 4089
5 San Jose 3312
6 Indianapolis 3177
7 Jacksonville 2865
8 Philadelphia 2488
9 Las Vegas 2375
In [47]:
#plot avg revenue for top 10 cities
sns.barplot(x='city', y='revenue', hue='test', data=df[df['city'].isin(top_cities['index'])]);
plt.xticks(rotation=45);
  • Results appear pretty different by city, but it is hard to draw any conclusions.
  • The city variable has so many levels that, by chance only, we are expected to find a lot of different results.
  • However, combining this with domain/product knowledge could lead to create some hypotheses that could be then A/B tested in a more statistically sound way.
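A back-of-the-envelope calculation shows why a many-level factor like city produces spurious "findings" (the city count below is hypothetical):

```python
n_cities = 900    # hypothetical number of distinct city levels in the data
alpha = 0.05      # per-comparison significance level

# Under the null of no city effect, this many cities would still look
# "significant" at alpha purely by chance
expected_false_positives = n_cities * alpha
print(expected_false_positives)  # 45.0
```

This is the multiple-comparisons problem: without a correction (or a pre-registered hypothesis per segment), dozens of cities will appear to react differently even if none actually do.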

Conclusions:

  • If the goal of the test was trying to increase revenue, then the test has won and we should go ahead with the change.
  • The negative side is that we are losing roughly 22% of our converting customers.
  • The positive side is that larger revenue per user typically means that the company can bid higher for ads or, more generally, invest more money in most marketing channels. So, in the long run, this should give the opportunity of increasing the number of customers and, therefore, total revenue.

Other insights:

  • Friend-referral is by far the best channel. Any UI/UX change that emphasizes it even more is likely to lead to revenue gains
  • After referrals, FB/Google ads have higher revenue per user than any other channel. This is kind of expected. However, SEO is not really that much behind. And SEO is free. Focusing on SEO and increasing its relative share of traffic would lead to much larger profits
  • Mobile has larger revenue per user than web. This is excellent news. Mobile share of traffic has been growing constantly over the last few years. Just by taking advantage of this trend, the company will make more money. If they could increase mobile traffic even more, they would make even more money. Bidding higher for mobile ads is probably a good idea
  • iOS is doing much better than Android. Most likely, this is just because iOS users are less price sensitive than Android users. However, it would still be interesting to investigate whether the actual product (i.e. UI/UX) differs between the two platforms
  • Revenue per user varies greatly across segments. This could lead to potentially understanding why and coming up with different products, some cheaper and some more expensive

[Bonus] After how many days you would have stopped the test?

Let’s say significance level and power are the standard 0.05 and 0.8. Let’s also assume we are interested in detecting a difference in conversion rate of about ~33%, i.e. ~1-39/59. That is, if the new conversion rate is about 66% of the old one we are good.

In [53]:
import statsmodels.stats.api as sms

conversion_rate_control = df_test.query('test==0').converted.mean()
p1_and_p2 = sms.proportion_effectsize(conversion_rate_control, conversion_rate_control*39/59)
sample_size = round(sms.NormalIndPower().solve_power(p1_and_p2, power=0.8, alpha=0.05))
print("The required sample size per group is ~", sample_size)
The required sample size per group is ~ 5548
In [55]:
# check how many users we have per week to figure out how long it will take to get enough users
#create a column that's week of the year
df_test['week_year'] = pd.to_datetime(df_test.timestamp.str.split(' ').str[0]).dt.strftime('%W')
#take avg count of users per week. We remove first and last week, just in case they don't include all 7 days
users_week = df_test['week_year'].value_counts().sort_index()
users_week = users_week.drop(users_week.index[[0,(users_week.shape[0]-1)]]).mean()
  
print("Avg number of users per week is:", round(users_week))
Avg number of users per week is: 24356

If we ran the test on 50% of the users, we would be done in less than a week.

However, it is a good practice to run the test for at least two weeks regardless of that, so that we can get weekly patterns.

In terms of percentage split, we can look at the minimum percentage of users that would need to be in the test to get the required sample size in two weeks.

In [58]:
print("The percentage of traffic that needs to be in test is", round(sample_size/(users_week*2)*100),'%')
The percentage of traffic that needs to be in test is 11 %