Predicting Game Play Time on Steam

Jan 27, 2021

PART 1 — DATA

My final project for the General Assembly Data Science Immersive program was predicting the average playtime of games on Steam, and I want to share a little about my experience and progress on this project.

First of all, Steam is currently the largest online distributor of PC games, and it constantly accumulates user and game data. One of the metrics it collects is each user's playtime, both per game and overall. I chose this metric for my project because it is ultimately a measure of customer engagement. Using various regression models, I wanted to see whether I could predict a game's average playtime from its features.

I acquired the data using the Steam Web API and the SteamSpy API. The Steam Web API provides public access to player and game data, while SteamSpy is a third-party site that aggregates statistics on Steam games. While acquiring the data, I ran into some potential issues. The distribution of playtimes was very skewed, and on closer inspection the gaps in the data were most likely caused by the many users with private account settings: if a user chooses private settings, their data is restricted from external access.
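A minimal sketch of the acquisition step, assuming SteamSpy's public endpoint `https://steamspy.com/api.php` with `request=appdetails`, and the JSON fields it returned at the time of writing (`average_forever` as average playtime in minutes, `owners` as a range string such as `"20,000 .. 50,000"`); verify both against SteamSpy's own documentation. The helper names are hypothetical, not the author's code, and the Steam Web API side is left out. `playtime_skew` is one quick way to confirm a skewed playtime distribution: a mean far above the median points to a long right tail.

```python
import json
import statistics
import urllib.parse
import urllib.request

# Assumed public SteamSpy endpoint; check its documentation for rate limits.
STEAMSPY_URL = "https://steamspy.com/api.php"


def appdetails_url(appid: int) -> str:
    """Build the SteamSpy 'appdetails' request URL for one game."""
    query = urllib.parse.urlencode({"request": "appdetails", "appid": appid})
    return f"{STEAMSPY_URL}?{query}"


def fetch_app(appid: int) -> dict:
    """Download one game's statistics as a dict (needs network access)."""
    with urllib.request.urlopen(appdetails_url(appid), timeout=30) as resp:
        return json.load(resp)


def owners_lower_bound(owners: str) -> int:
    """Turn an owners range such as '20,000 .. 50,000' into its lower bound."""
    return int(owners.split("..")[0].strip().replace(",", ""))


def playtime_skew(records: list[dict]) -> tuple[float, float]:
    """Return (mean, median) of average playtime in minutes across games.

    Records without an 'average_forever' field are skipped.
    """
    playtimes = [r["average_forever"] for r in records if "average_forever" in r]
    return statistics.mean(playtimes), statistics.median(playtimes)
```

For example, four toy records with playtimes of 0, 30, 60 and 3,000 minutes give a mean of 772.5 against a median of 45.0, the kind of gap that signals heavy skew.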

After acquiring the data, I ended up with 30,000 rows and 50 columns altogether. After dropping rows with mostly null values and columns that were duplicated or irrelevant (such as media and images), I was left with 25,000 rows and 32 columns. To balance out the data, I filtered out games with few or no owners and kept only paid games. For my numerical features (required age, ratings, number of owners, price, and discount), I filled any remaining null values with the median and dropped any extreme outliers. The categorical features (Category, Developer, Genre) were one-hot encoded. Now that the data cleaning and initial EDA are complete, the next step is to apply linear regression to the features as a baseline model.
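A minimal pandas and scikit-learn sketch of those cleaning steps followed by the planned linear-regression baseline. The post does not give exact thresholds or column names, so the following are all hypothetical choices for illustration: the column names, the 20,000-owner cutoff, the "more than half the columns filled" rule for mostly-null rows, the 3 × IQR outlier fence on the target, and the 80/20 train/test split. `owners` is assumed to be numeric already (for instance, the lower bound of SteamSpy's range).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical column names; map the real Steam/SteamSpy fields onto these.
NUMERIC = ["required_age", "rating", "owners", "price", "discount"]
CATEGORICAL = ["category", "developer", "genre"]
TARGET = "average_playtime"


def clean_games(df: pd.DataFrame, min_owners: int = 20_000) -> pd.DataFrame:
    """Cleaning steps from the post; every threshold here is an assumption."""
    keep = NUMERIC + CATEGORICAL + [TARGET]
    # Keep only the modelling columns (drops media, images, duplicates).
    df = df[keep]
    # Drop mostly-null rows: require more than half the columns to be filled.
    df = df.dropna(thresh=len(keep) // 2 + 1)
    # Remove games with few owners and keep only paid games.
    df = df[(df["owners"] >= min_owners) & (df["price"] > 0)].copy()
    # Fill remaining numeric nulls with each column's median.
    df[NUMERIC] = df[NUMERIC].fillna(df[NUMERIC].median())
    # Drop extreme target outliers with a 3 * IQR fence (rows with no target go too).
    q1, q3 = df[TARGET].quantile([0.25, 0.75])
    df = df[df[TARGET] <= q3 + 3 * (q3 - q1)]
    # One-hot encode the categorical features (assumes one label per cell).
    return pd.get_dummies(df, columns=CATEGORICAL, dtype=int)


def fit_baseline(clean: pd.DataFrame, seed: int = 42) -> float:
    """Fit a plain linear regression and return R^2 on a held-out 20% split."""
    X = clean.drop(columns=[TARGET])
    y = clean[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return r2_score(y_test, model.predict(X_test))
```

Calling `fit_baseline(clean_games(raw_df))` on the assembled table (where `raw_df` is a hypothetical name for the merged API data) yields a single R² score for later models to beat. One design note: one-hot encoding `developer` can produce a very wide, sparse table, which is worth watching with a plain linear baseline.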

I'll go into more detail on the EDA and the models in Part 2 of my capstone project blog. Stay tuned!

Written by Dominika Jones
