Analyzing Drop in User Engagement: A SQL Case Study

Using product analytics and hypothesis testing to analyze a decrease in the engagement of users on Yammer

Shweta Yadav
Towards Data Science

--

Source

It’s Monday morning after Thanksgiving… You are still drooling over that perfectly roasted turkey, creamy mashed potatoes, and heavenly pumpkin pie. You sluggishly turn on your laptop and see an angry email from the head of the Product team. In the email, he sounds infuriated about the sudden drop in user engagement and asks you to get to the bottom of this situation right away.

Immediately your game face is on, you fire up the engagement dashboards, and this figure pops up:

Image by Author

Side Note: The data I am using is not recent. It belongs to the time period between May 2014 to September 2014 and hypothetically I am doing this case study in September 2014. I’ve written the above paragraph to make this case study more engaging to read.

After taking a good look at this chart it is apparent that the decline started on July 28. The next course of action is to come up with a list of possible causes for the dip in engagement and test each one of them. But first, let’s acknowledge what is Yammer and what does it mean by user engagement?

Different types of engagement in Yammer data (Image by Author)

Yammer is a social network for communicating with coworkers. Individuals share documents, updates, and ideas by posting them in groups. The engagement in yammer is defined as having made some type of server call by interacting with the product.

There is a total of 17 types of engagements in the dataset provided (shown in the adjacent table).

The goal is to determine the cause of the dip in these events and recommending solutions for the problem.

Datasets

Table 1: Users

This table includes one row per user, with descriptive information about that user’s account.

Table 2: Events

This table includes one row per event, where an event is an action that a user has taken on Yammer.

Table 3: Email Events

This table contains events specific to the sending of emails. It is similar in structure to the events table above.

Table 4: Rollup Periods

The final table is a lookup table that is used to create rolling time periods.

Disclaimer: The data was generated for the purpose of this case study. It is similar in structure to Yammer’s actual data, but for privacy and security reasons it is not real. Here is the link to the datasets and case study on Mode.

Source

Digging In with Analysis

Hypothesis 1: A drop is a one-time event due to server / technical outage or data refresh failure.

SELECT DATE_TRUNC('day',created_at) AS day,
COUNT(*) AS all_users,
COUNT(CASE WHEN activated_at IS NOT NULL THEN user_id ELSE NULL END) AS activated_users
FROM tutorial.yammer_users
GROUP BY 1
ORDER BY 1
Number of users signing up daily (Image by Author)
  • The above figure demonstrates a normal growth rate — it continues to be high during the week, low on weekends
  • Also, the drop in engagement is not localized to a particular day. It is gradual over the span of a few months. Therefore the cause is not likely due to a one-time glitch.

Hypothesis 2: The decline is happening in an isolated region either due to holiday season or a new competitive product in that region.

In order to test this, I categorized the countries into the respective continents.

WITH geographic_regions AS(
SELECT user_id,
CASE
WHEN location IN ('Indonesia', 'Korea', 'Singapore', 'Israel', 'Malaysia', 'Hong Kong', 'Saudi Arabia', 'Philippines', 'Turkey', 'United Arab Emirates', 'Taiwan', 'Thailand', 'India', 'Iran', 'Japan', 'Iraq', 'Russia', 'Pakistan')
THEN 'Asia'
WHEN location IN ('Venezuela', 'Colombia', 'Argentina', 'Chile', 'Brazil')
THEN 'South_America'
WHEN location IN ('Sweden', 'Ireland', 'Portugal', 'Finland', 'France', 'Netherlands', 'Spain', 'Belgium', 'Italy', 'United Kingdom', 'Germany', 'Greece', 'Denmark', 'Switzerland', 'Norway', 'Austria', 'Poland')
THEN 'Europe'
WHEN location IN ('United States', 'Canada', 'Mexico')
THEN 'North_America'
WHEN location IN ('Nigeria', 'Egypt', 'South Africa')
THEN 'Africa'
WHEN location IN ('Australia')
THEN 'Oceania'
ELSE null end as geographic_region,
COUNT(*)
FROM tutorial.yammer_events
WHERE event_type = 'engagement'
GROUP BY 1, 2
)
SELECT cast(date_trunc('week', occurred_at) as date) AS start_week_date, geographic_region,
COUNT(DISTINCT events.user_id) AS num
FROM tutorial.yammer_events events
JOIN geographic_regions
ON events.user_id = geographic_regions.user_id
GROUP BY 1, 2
Image by Author
  • North America particularly the United States with the highest number of active users has the most significant 14% drop.
  • But the decline is not associated with any one particular region. North America, Europe, and Asia all are showing a big dip around July end.
  • My suggestion would be to check for local holidays and a new release of similar products in these three regions.

Hypothesis 3: There is a decline in engagement happening in competitive products as well.

  • The sudden spike in market competitor Slack users at the beginning of August contradicts this hypothesis.
  • It can be the reason for the drop in engagement of Yammer users but it is difficult to confirm with given data.
  • In this case, a detailed competitive analysis needs to be done after getting relevant data about Slack if possible.

Hypothesis 4: A broken feature in some devices is preventing people from using Yammer.

WITH device_type AS(
SELECT device,
CASE WHEN device in ('iphone 5','iphone 4s','iphone 5s','nexus 5','samsung galaxy s4','htc one','nokia lumia 635','samsung galaxy note','amazon fire phone') THEN 'Phone'
WHEN device in ('windows surface','kindle fire','ipad mini','samsumg galaxy tablet','ipad air','nexus 7','nexus 10') THEN 'Tablet'
WHEN device in ('dell inspiron notebook','macbook air','macbook pro','lenovo thinkpad','acer aspire notebook','asus chromebook','acer aspire desktop','mac mini','hp pavilion desktop','dell inspiron desktop') THEN 'Desktop'
END AS device_type
FROM tutorial.yammer_events),

device_count AS(
SELECT device,
COUNT(CASE WHEN date_trunc('month', occurred_at) BETWEEN '2014-07-01' AND '2014-07-31' THEN user_id ELSE NULL END) AS July_Sales,
COUNT(CASE WHEN date_trunc('month', occurred_at) BETWEEN '2014-08-01' AND '2014-08-31' THEN user_id ELSE NULL END) AS August_Sales
FROM tutorial.yammer_events
WHERE event_type='engagement'
AND event_name = 'login'
AND occurred_at BETWEEN '2014-07-01' AND '2014-08-31'
GROUP BY 1
)
SELECT device_count.device, device_type.device_type, August_Sales, July_Sales, (August_Sales - July_Sales) as diff
FROM device_count
JOIN device_type
ON device_count.device = device_type.device
GROUP BY 1, 2, 3, 4, 5
ORDER BY 5
Image by Author
  • It is possible that something in the operating system of a device is broken. For example: If a mobile app was unstable, crashed, or recently updated can affect engagement for only that device type.
  • Phone users are having a much more pronounced drop in engagement especially iPhones but it is difficult to pinpoint one particular brand.

Hypothesis 5: The tracking code is broken causing an interruption in sign-up flow or the product not providing the right value to users.

SELECT cast(date_trunc('week', occurred_at) as date) AS week,
COUNT(CASE WHEN event_name IN ('login') THEN user_id ELSE NULL END) AS login,
COUNT(CASE WHEN event_name IN ('home_page') THEN user_id ELSE NULL END) AS homepage,
COUNT(CASE WHEN event_name IN ('view_inbox','like_message','send_message') THEN user_id ELSE NULL END) AS message,
COUNT(CASE WHEN event_name IN ('search_autocomplete', 'serach_run', 'search_click_result_1','search_click_result_2','search_click_result_3','search_click_result_4','search_click_result_5','search_click_result_6','search_click_result_7','search_click_result_8','search_click_result_9','search_click_result_10') THEN user_id ELSE NULL END) AS search_click
FROM tutorial.yammer_events
GROUP BY 1
Image by Author
  • There is a significant decline in users loading the homepage and the number of interactions with message service. Login rate though is not showing a large dip.
  • It seems like there is a problem with keeping users engaged on Yammer and their growth over time.

Hypothesis 6: The decrease in engagement is due to a lack of retention from older users.

SELECT DATE_TRUNC('week',z.occurred_at) AS "week",
AVG(z.age_at_event) AS "Average age during week",
COUNT(DISTINCT CASE WHEN z.user_age > 70 THEN z.user_id ELSE NULL END) AS "10+ weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 70 AND z.user_age >= 63 THEN z.user_id ELSE NULL END) AS "9 weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 63 AND z.user_age >= 56 THEN z.user_id ELSE NULL END) AS "8 weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 56 AND z.user_age >= 49 THEN z.user_id ELSE NULL END) AS "7 weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 49 AND z.user_age >= 42 THEN z.user_id ELSE NULL END) AS "6 weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 42 AND z.user_age >= 35 THEN z.user_id ELSE NULL END) AS "5 weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 35 AND z.user_age >= 28 THEN z.user_id ELSE NULL END) AS "4 weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 28 AND z.user_age >= 21 THEN z.user_id ELSE NULL END) AS "3 weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 21 AND z.user_age >= 14 THEN z.user_id ELSE NULL END) AS "2 weeks",
COUNT(DISTINCT CASE WHEN z.user_age < 14 AND z.user_age >= 7 THEN z.user_id ELSE NULL END) AS "1 week",
COUNT(DISTINCT CASE WHEN z.user_age < 7 THEN z.user_id ELSE NULL END) AS "Less than a week"
FROM (
SELECT e.occurred_at,
u.user_id,
DATE_TRUNC('week',u.activated_at) AS activation_week,
EXTRACT('day' FROM e.occurred_at - u.activated_at) AS age_at_event,
EXTRACT('day' FROM '2014-09-01'::TIMESTAMP - u.activated_at) AS user_age
FROM tutorial.yammer_users u
JOIN tutorial.yammer_events e
ON e.user_id = u.user_id
AND e.event_type = 'engagement'
AND e.event_name = 'login'
AND e.occurred_at >= '2014-05-01'
AND e.occurred_at < '2014-09-01'
WHERE u.activated_at IS NOT NULL
) z

GROUP BY 1
ORDER BY 1
Image by Author
  • Cohort analysis is a tool to measure user engagement over time. It helps to know whether user engagement is actually getting better over time or lack of activity of the old users is being hidden by the impressive growth numbers of new users.
  • Dividing the users by when they signed up first on Yammer shows that those who signed up more than 10 weeks prior to May 1 have decreasing levels of engagement over time.
  • This confirms that a decrease in engagement is due to a lack of retention from older users.

Conclusion

In summary, my approach to finding the cause of lower user engagement on Yammer is as follows: I started by understanding the context so that I could discard issues related to single server/technical outages, regions, and specific platforms.

Then, I dug deeper to identify what could affect user behavior to stop them from engaging. Of the six possible causes, I think the most plausible ones are: lack of retention of older users resulting in less activity on the homepage and messaging over the platform and an external new competitive feature on Slack.

Recommendations

My suggestions for the improvement in customer engagement after the analysis are as follows:

  • We should establish a process for obtaining customer feedback and share that information with the rest of the organization for better planning of further steps.
  • A customer loyalty program can reward customers for their continued loyalty and ensure their retention.
  • Measuring customer’s response to calls, outreach emails to check for patterns in their behavior.
  • Detailed investigation of similar features on Slack that are experiencing a proportional increase in engagement which might be causing users to switch in order to fulfill their use-cases.

Thanks for Stopping By!

Connect with me on LinkedIn.

--

--

“If you really look closely, most overnight successes took a long time.” -Steve Jobs