Data Science Challenge: Ads Analysis


In [141]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#sns.set_style("ticks")
sns.set_style("white")
#plt.style.use('fivethirtyeight')
plt.style.use('ggplot')
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
%config InlineBackend.figure_format = 'retina'


plt.rcParams["figure.figsize"] = (10,6)
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['axes.titlesize'] = 22

pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 50)

Load Data

In [142]:
df = pd.read_csv('data/ad_table.csv')
df.head()
Out[142]:
date shown clicked converted avg_cost_per_click total_revenue ad
0 2015-10-01 65877 2339 43 0.90 641.62 ad_group_1
1 2015-10-02 65100 2498 38 0.94 756.37 ad_group_1
2 2015-10-03 70658 2313 49 0.86 970.90 ad_group_1
3 2015-10-04 69809 2833 51 1.01 907.39 ad_group_1
4 2015-10-05 68186 2696 41 1.00 879.45 ad_group_1
In [143]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2115 entries, 0 to 2114
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   date                2115 non-null   object 
 1   shown               2115 non-null   int64  
 2   clicked             2115 non-null   int64  
 3   converted           2115 non-null   int64  
 4   avg_cost_per_click  2115 non-null   float64
 5   total_revenue       2115 non-null   float64
 6   ad                  2115 non-null   object 
dtypes: float64(2), int64(3), object(2)
memory usage: 115.8+ KB
In [144]:
df.describe()
Out[144]:
shown clicked converted avg_cost_per_click total_revenue
count 2115.000000 2115.000000 2115.000000 2115.000000 2115.000000
mean 68299.844444 3056.077069 126.453901 1.373749 1966.517589
std 48884.821409 3783.579969 233.420826 0.754331 3942.018757
min 0.000000 0.000000 0.000000 0.000000 -200.150000
25% 28030.500000 744.000000 18.000000 0.760000 235.470000
50% 54029.000000 1392.000000 41.000000 1.400000 553.300000
75% 97314.500000 3366.500000 103.000000 1.920000 1611.495000
max 192507.000000 20848.000000 1578.000000 4.190000 39623.710000

Observations:

  • Some problematic values, such as negative revenue.

Data Cleaning

In [145]:
# convert date to datetime type
df['date'] = pd.to_datetime(df['date'])
In [146]:
df[df['total_revenue']<0]
Out[146]:
date shown clicked converted avg_cost_per_click total_revenue ad
1186 2015-10-25 50459 1904 97 0.89 -200.15 ad_group_23
1404 2015-11-01 70145 584 15 0.36 -25.09 ad_group_27
1524 2015-11-15 22070 1400 20 1.67 -3.56 ad_group_29
2089 2015-10-28 80270 1151 41 0.96 -92.06 ad_group_40
In [147]:
# There are 4 events with negative revenue
# We decide to remove those values
df = df[df['total_revenue']>=0]
In [148]:
# check if the funnel makes sense
len(df.query('shown<clicked | clicked<converted'))
Out[148]:
0
In [149]:
# check the shown trend across time by different group of ads
import matplotlib.dates as mdates
sns.set_style("white")
g = sns.lineplot(x='date',y='shown',hue='ad',data = df,legend=False)
g.xaxis.set_major_locator(mdates.MonthLocator())
g.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
plt.title('Ad impressions');

Observation:

  • Some sudden drops to zero definitely look weird.
In [150]:
# let's remove the zero values from shown data
df = df[df['shown']>0]
g = sns.lineplot(x='date',y='shown',hue='ad',data = df,legend=False)
g.xaxis.set_major_locator(mdates.MonthLocator())
g.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
plt.title('Ad impressions');
In [151]:
# check the click trend
g = sns.lineplot(x='date',y='clicked',hue='ad',data = df,legend=False)
g.xaxis.set_major_locator(mdates.MonthLocator())
g.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
plt.title('Ad clicked');

Observation:

  • There are still a few weird drops to zero, even after removing the rows with zero impressions.
  • We decide to remove them, too.
In [152]:
df = df[df['clicked']>0]
g = sns.lineplot(x='date',y='clicked',hue='ad',data = df,legend=False)
g.xaxis.set_major_locator(mdates.MonthLocator())
g.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
plt.title('Ad clicked');
In [153]:
# check the conversion trend
g = sns.lineplot(x='date',y='converted',hue='ad',data = df,legend=False)
g.xaxis.set_major_locator(mdates.MonthLocator())
g.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
plt.title('Ad converted');
In [154]:
# check out the final data
df.describe()
Out[154]:
shown clicked converted avg_cost_per_click total_revenue
count 2091.000000 2091.000000 2091.000000 2091.000000 2091.000000
mean 68853.852702 3088.744142 127.822573 1.387661 1989.242257
std 48799.535713 3792.468317 234.394500 0.746035 3958.838731
min 7494.000000 187.000000 0.000000 0.080000 0.000000
25% 28446.000000 756.500000 18.000000 0.780000 244.765000
50% 54569.000000 1413.000000 42.000000 1.410000 561.070000
75% 100395.500000 3413.500000 106.000000 1.920000 1634.895000
max 192507.000000 20848.000000 1578.000000 4.190000 39623.710000

Q: If you had to identify the 5 best ad groups, which ones would they be? Which metric did you choose to identify the best ones? Why? Explain the pros of your metric as well as the possible cons. From a business perspective, what does choosing that metric imply you are focusing on?

There are obviously tons of ad-related metrics, and they are all plausible and useful in different situations, depending on the main business focus of a given company. A few examples are:

  1. Click-through rate (CTR), # of clicks / # of impressions, i.e. of all the people seeing the ad, how many click on it?

    • Pros: it is great for identifying user demand. Also, since it only takes into account what the user does after seeing the ad, it is easy to estimate and allows us to test and improve ad characteristics to maximize the number of users coming to the site.
    • Cons: it carries no information about what users do once they reach the site, nor about costs. A very high CTR is not that useful if none of those users converts. It can also easily be gamed via click-bait ads.
    • This metric is purely growth-based: the goal is to bring users to the site.
  2. Conversion rate (CR), # of conversions / # of impressions, i.e. of all the people seeing the ad, how many end up becoming customers?

    • Pros: fixes most of CTR's issues. After all, the goal of an ad is to get customers, and that is exactly what this metric measures.
    • Cons: it carries no information about costs. Also, conversion on the site depends heavily on the product: an ad could be great at capturing demand, but if the product is bad, no one converts. Finally, attributing a conversion to an ad click is messy; the conversion might happen after a long time, or after multiple visits through different channels, etc.
    • This metric combines both growth and product.
  3. Profits: revenue from conversions - ad costs

    • Pros: very concrete and business-driven. Am I making money via ads?
    • Cons: same attribution issues as CR: not only are there the issues described above, but here you would also need to predict user lifetime value to tell whether an ad is profitable. Moreover, it only considers users after they click on the ad; it carries no information about the top of the funnel, i.e. how many users see the ad and choose not to click.
    • This metric focuses on the business/product side, but it is hardly useful from a growth standpoint. It is typically used by very large companies whose main goal is profitability.
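As a concrete reference, all three metrics fall directly out of the raw columns. Below is a minimal sketch on a toy frame that mirrors this dataset's schema (the column names match `ad_table.csv`; the numbers are made up):

```python
import pandas as pd

# Toy rows mirroring the ad_table schema (values are made up)
toy = pd.DataFrame({
    'shown': [65877, 65100],
    'clicked': [2339, 2498],
    'converted': [43, 38],
    'avg_cost_per_click': [0.90, 0.94],
    'total_revenue': [641.62, 756.37],
})

toy['CTR'] = toy['clicked'] / toy['shown']       # clicks per impression
toy['CR'] = toy['converted'] / toy['shown']      # conversions per impression
toy['profit'] = toy['total_revenue'] - toy['clicked'] * toy['avg_cost_per_click']

print(toy[['CTR', 'CR', 'profit']].round(4))
```

On the real data the same three lines can be applied to `df` directly before any grouping.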

My choice of metrics in this project:

  • In this project, let's look at both profits and CTR. We will pick the top 5 ads by CTR, restricted to ads whose profits are above a certain threshold. In plain English, that means we want the ads with the highest growth potential after removing the ads that would make us go broke.
In [155]:
# check CTR & profits by ad groups
data_rank = df.groupby('ad').apply(
             lambda x: pd.Series({
                       'CTR': (x['clicked']/x['shown']).mean(),
                       'profits': (x['total_revenue']-x['clicked']*x['avg_cost_per_click']).mean(),
                       'count': x['shown'].mean()
  })
).reset_index()
data_rank.head()
Out[155]:
ad CTR profits count
0 ad_group_1 0.039003 -1901.937547 69345.566038
1 ad_group_10 0.014175 -1650.159615 117701.365385
2 ad_group_11 0.063575 -1346.640408 19448.979592
3 ad_group_12 0.061022 -1194.148431 29145.549020
4 ad_group_13 0.085635 -3573.196275 161519.176471
In [156]:
data_rank['profits'].describe()
Out[156]:
count       40.000000
mean     -3727.781833
std       7316.360934
min     -42659.834528
25%      -3723.092961
50%      -1270.394420
75%       -237.768783
max       1027.480000
Name: profits, dtype: float64
In [157]:
# check the distribution of profits
sns.histplot(data_rank['profits'], kde = True, bins = 20);
In [158]:
# barplot: count of ads in each group
sns.barplot(x = 'ad', y='count', data = data_rank)
plt.xticks(rotation=45, ha='right');
In [159]:
# plot CTR vs. profits
plt.figure(figsize=(10,8))
plt.scatter(data_rank['profits'], data_rank['CTR'])
plt.axvline(x=-5000)
plt.xlabel('profits')
plt.ylabel('CTR')
plt.show()
  • We can set -5K as the profitability threshold.
  • We remove all ads below that, and then find the ads with the highest CTR among the remaining ones.
  • Since we are only considering short-term revenue from the ad, we expect that number to be negative. An ad should be positive once user lifetime value is taken into account (as usual, lifetime -> 1 yr).
In [160]:
# Get the 5 best ad groups with highest CTR and above profitability of -5k
data_rank[data_rank['profits']>-5000].sort_values(by='CTR',ascending = False).head()
Out[160]:
ad CTR profits count
10 ad_group_19 0.094188 -3460.419038 19506.884615
20 ad_group_28 0.089191 -2644.280943 19886.339623
4 ad_group_13 0.085635 -3573.196275 161519.176471
27 ad_group_34 0.084257 -1189.270000 35371.622642
35 ad_group_5 0.063862 -3233.796538 52406.461538

Q: For each group, predict how many ads will be shown on Dec, 15

  • Use the time-series package Prophet (fbprophet) to do the prediction.
In [161]:
# check the shown for ad group1
data_ad1 = df[df['ad']=='ad_group_1'][["date", "shown"]]
#plot the time series trend
plt.figure(figsize=(10,6))
g=sns.lineplot(x="date", y="shown", data=data_ad1)
g.xaxis.set_major_locator(mdates.MonthLocator())
g.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
g.set_title("Ad group 1") 
plt.show()
In [162]:
#change column names and index
data_ad1['ds'] = data_ad1['date']
data_ad1.rename({'shown': 'y'}, axis=1, inplace=True)
data_ad1.set_index('date', inplace=True)
data_ad1.head()
Out[162]:
y ds
date
2015-10-01 65877 2015-10-01
2015-10-02 65100 2015-10-02
2015-10-03 70658 2015-10-03
2015-10-04 69809 2015-10-04
2015-10-05 68186 2015-10-05
In [163]:
# current max day
data_ad1['ds'].max()
Out[163]:
Timestamp('2015-11-22 00:00:00')
In [164]:
#Let's build time series using prophet. 
import fbprophet

#build the model
ts = fbprophet.Prophet()
ts.fit(data_ad1)
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[164]:
<fbprophet.forecaster.Prophet at 0x7fcba20258d0>
In [165]:
#make predictions until Dec 15. This is the number of days for which we need a prediction to get to Dec 15
days_predictions = (pd.to_datetime("2015-12-15")-data_ad1['ds'].max()).days
future_data = ts.make_future_dataframe(periods = days_predictions)
predictions = ts.predict(future_data)
  
#let's plot the prediction
ts.plot(predictions)
plt.show()
In [166]:
#plot the time series component
ts.plot_components(predictions)
plt.show()

Observations:

  • There is some weekly pattern.
  • Most importantly, the time series is pretty flat overall in October and starts going up in November.
  • Predictions for Dec 15 tend to be up, given that the model assumes the time series keeps following its most recent trend. Obviously, this would be much more useful with yearly data, so we could also capture yearly seasonality.
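The weekly pattern can be sanity-checked without Prophet by simply averaging impressions per weekday. A minimal sketch on synthetic data (standing in for the real `data_ad1` frame; a weekend dip is injected so the effect is visible):

```python
import numpy as np
import pandas as pd

# Synthetic daily impressions with an injected weekend dip (made-up numbers,
# standing in for the real data_ad1 frame)
rng = np.random.default_rng(0)
dates = pd.date_range('2015-10-01', '2015-11-22', freq='D')
base = 68000 + rng.normal(0, 1500, len(dates))
weekend_dip = np.where(dates.dayofweek >= 5, -4000, 0)
ts_df = pd.DataFrame({'ds': dates, 'y': base + weekend_dip})

# Average impressions per weekday: a crude view of the weekly component
weekly = ts_df.groupby(ts_df['ds'].dt.dayofweek)['y'].mean()
print(weekly.round(0))  # index 0=Mon ... 6=Sun
```

On the real series, the same groupby on `data_ad1` would show whether the weekly component Prophet found is visible in the raw averages.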
In [167]:
# fit the model to all groups and make predictions

#data set to store results
predictions_all_groups = pd.DataFrame({'ad_group':df['ad'].unique(),
                                       'date': pd.to_datetime("2015-12-15")})
prediction_dec_15 = []    
predictions_all_groups.head()
Out[167]:
ad_group date
0 ad_group_1 2015-12-15
1 ad_group_2 2015-12-15
2 ad_group_3 2015-12-15
3 ad_group_4 2015-12-15
4 ad_group_5 2015-12-15
In [168]:
# loop through all groups 
for level in predictions_all_groups.ad_group:
     
     #subset data, only keep 1 ad group and columns needed
     tmp = df.loc[df['ad']==level][["date", "shown"]]
     
     #change column names and index
     tmp['ds'] = tmp['date']
     tmp.rename({'shown': 'y'}, axis=1, inplace=True)
     tmp.set_index('date', inplace=True)
     
     #build the model
     ts_tmp = fbprophet.Prophet()
     ts_tmp.fit(tmp)
     
     #make predictions until Dec 15. This is the number of days for which we need a prediction to get to Dec 15
     days_predictions_tmp = (pd.to_datetime("2015-12-15")-tmp['ds'].max()).days
     future_data_tmp = ts_tmp.make_future_dataframe(periods = days_predictions_tmp)
     predictions_tmp = ts_tmp.predict(future_data_tmp)
     
     #store results for the last day, i.e. Dec 15 prediction
     prediction_dec_15.append(round(predictions_tmp['yhat'].iat[-1]))
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
(the same two INFO lines repeat for each of the 40 ad groups)
In [169]:
# prediction results
predictions_all_groups['predictions'] = prediction_dec_15
predictions_all_groups
Out[169]:
ad_group date predictions
0 ad_group_1 2015-12-15 77558.0
1 ad_group_2 2015-12-15 50305.0
2 ad_group_3 2015-12-15 152888.0
3 ad_group_4 2015-12-15 94679.0
4 ad_group_5 2015-12-15 54750.0
5 ad_group_6 2015-12-15 40587.0
6 ad_group_7 2015-12-15 56988.0
7 ad_group_8 2015-12-15 47171.0
8 ad_group_9 2015-12-15 124642.0
9 ad_group_10 2015-12-15 127043.0
10 ad_group_11 2015-12-15 19964.0
11 ad_group_12 2015-12-15 28000.0
12 ad_group_13 2015-12-15 162073.0
13 ad_group_14 2015-12-15 8715.0
14 ad_group_15 2015-12-15 15773.0
15 ad_group_16 2015-12-15 30494.0
16 ad_group_17 2015-12-15 134101.0
17 ad_group_18 2015-12-15 89749.0
18 ad_group_19 2015-12-15 19889.0
19 ad_group_20 2015-12-15 120417.0
20 ad_group_21 2015-12-15 28320.0
21 ad_group_22 2015-12-15 26866.0
22 ad_group_23 2015-12-15 48616.0
23 ad_group_24 2015-12-15 37394.0
24 ad_group_25 2015-12-15 174387.0
25 ad_group_26 2015-12-15 72261.0
26 ad_group_27 2015-12-15 65385.0
27 ad_group_28 2015-12-15 20984.0
28 ad_group_29 2015-12-15 21327.0
29 ad_group_30 2015-12-15 110710.0
30 ad_group_31 2015-12-15 123616.0
31 ad_group_32 2015-12-15 39847.0
32 ad_group_33 2015-12-15 16240.0
33 ad_group_34 2015-12-15 38683.0
34 ad_group_35 2015-12-15 60008.0
35 ad_group_36 2015-12-15 71430.0
36 ad_group_37 2015-12-15 71485.0
37 ad_group_38 2015-12-15 177108.0
38 ad_group_39 2015-12-15 26122.0
39 ad_group_40 2015-12-15 80638.0

Q: Cluster ads into 3 groups: the ones whose avg_cost_per_click is going up, the ones whose avg_cost_per_click is flat and the ones whose avg_cost_per_click is going down

In [170]:
# plot the overall avg_cost_per_click
g=sns.lineplot(x="date", y="avg_cost_per_click",hue="ad", data=df, legend=False)
g.xaxis.set_major_locator(mdates.MonthLocator())
g.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
plt.title("Avg Cost per Click") ;

It is not easy to detect trends from the plot above, though the overall trends seem to be flat.

Strategy:

  • Build a linear regression for each ad group.
  • avg_cost_per_click is the y value and x is the day as a number (0, 1, 2, etc.).
  • We then check the coefficient of the regression (using the Bonferroni rule for significance):
    • Positive and significant coefficient -> going up
    • Negative and significant coefficient -> going down
    • Non-significant coefficient -> flat
In [171]:
import statsmodels.api as sm

# store results
coefficient = []
p_value = []

# Loop through all ad groups
for level in df['ad'].unique():  
  #subset dataset and keep variables we care about
  data_reg = df.loc[df['ad']==level][["date", "avg_cost_per_click"]]
  #replace dates with numbers. 0 is the minimum date, 1 is minimum date + 1 day and so on
  data_reg['date'] = (data_reg['date'] - data_reg['date'].min()).dt.days
  #add intercept
  data_reg['intercept'] = 1
  
  #build the regression
  lr = sm.OLS(data_reg['avg_cost_per_click'],data_reg.drop('avg_cost_per_click', axis=1)).fit()
  #save coefficient and p-value
  coefficient.append(lr.params['date'])
  p_value.append(lr.pvalues['date'])
In [172]:
# put all together in a dataframe
lr_output = pd.DataFrame({
                  'ad_group': df['ad'].unique(),
                  'coefficients': coefficient,
                  'p_value': p_value})

# classify based on the rule described above. 
# Note:we use Bonferroni rule for significance
lr_output['cluster'] = np.where((lr_output['coefficients']>0) & (lr_output['p_value']<0.05/40), 'going_up',
                                np.where((lr_output['coefficients']<0) & (lr_output['p_value']<0.05/40), 'going_down',
                                         'flat'))
  
lr_output                                       
Out[172]:
ad_group coefficients p_value cluster
0 ad_group_1 0.000926 0.145429 flat
1 ad_group_2 -0.000006 0.991095 flat
2 ad_group_3 0.002445 0.255564 flat
3 ad_group_4 -0.000577 0.804445 flat
4 ad_group_5 -0.001018 0.440261 flat
5 ad_group_6 -0.001734 0.077129 flat
6 ad_group_7 -0.002569 0.120068 flat
7 ad_group_8 -0.002689 0.026231 flat
8 ad_group_9 0.000044 0.434626 flat
9 ad_group_10 -0.000161 0.861238 flat
10 ad_group_11 0.000490 0.655742 flat
11 ad_group_12 0.000861 0.494787 flat
12 ad_group_13 0.000198 0.874992 flat
13 ad_group_14 -0.000776 0.168030 flat
14 ad_group_15 0.000406 0.533287 flat
15 ad_group_16 -0.000595 0.280798 flat
16 ad_group_17 -0.000027 0.894071 flat
17 ad_group_18 0.000625 0.691426 flat
18 ad_group_19 -0.002980 0.139845 flat
19 ad_group_20 0.000685 0.467957 flat
20 ad_group_21 -0.001027 0.234991 flat
21 ad_group_22 -0.000244 0.695846 flat
22 ad_group_23 -0.000539 0.641059 flat
23 ad_group_24 -0.006371 0.030700 flat
24 ad_group_25 -0.000100 0.870318 flat
25 ad_group_26 0.000724 0.699794 flat
26 ad_group_27 -0.000295 0.156161 flat
27 ad_group_28 -0.001832 0.188441 flat
28 ad_group_29 -0.001375 0.314217 flat
29 ad_group_30 -0.000114 0.878117 flat
30 ad_group_31 0.000080 0.449891 flat
31 ad_group_32 0.004258 0.033788 flat
32 ad_group_33 -0.000967 0.319079 flat
33 ad_group_34 0.000128 0.932790 flat
34 ad_group_35 -0.000412 0.163060 flat
35 ad_group_36 -0.001072 0.056771 flat
36 ad_group_37 0.002610 0.076241 flat
37 ad_group_38 -0.000809 0.432336 flat
38 ad_group_39 0.001141 0.327894 flat
39 ad_group_40 0.001925 0.006205 flat

Insights:

  • Looks like everything is flat.
  • Without the Bonferroni correction (i.e. had we chosen a less aggressive p-value threshold), we would likely have found some significant ads.
  • Also, a linear regression captures the overall trend: old data points weigh just as much as the most recent ones. Had we focused only on the most recent data, we might well have found different results.
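The last point can be made concrete: fit the slope on the full window and again on only the most recent days; when the trend changes late, the two estimates disagree. A minimal sketch on synthetic cost-per-click data (a real run would substitute one ad group's series):

```python
import numpy as np

rng = np.random.default_rng(1)
days = np.arange(50)
# Flat for 40 days, then a sharp ramp in the last 10 (synthetic)
cpc = 1.0 + np.where(days >= 40, 0.05 * (days - 40), 0.0) + rng.normal(0, 0.01, 50)

slope_full = np.polyfit(days, cpc, 1)[0]                 # overall trend
slope_recent = np.polyfit(days[-10:], cpc[-10:], 1)[0]   # last 10 days only

print(f"full-window slope:  {slope_full:.4f}")
print(f"recent-only slope:  {slope_recent:.4f}")
```

The full-window fit dilutes the late ramp across all 50 days, while the recent-only fit recovers it, which is why a recency-restricted (or recency-weighted) regression could cluster some of these ads differently.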
In [ ]: