New Year, New Me! But instead of a New Year’s resolution, I have a pre-New Year’s resolution: to get my weight back to a healthy range before 2020. Back in 2018, I did a 3-month internship at an extremely fast-paced startup that demanded progress every single day, and I did not realize how much stress-eating I was doing. Every day ended with me burnt out, with no energy left to get off my butt for a quick workout.
By the end of my internship, I had gained a hefty 6kg, which instantly pushed my BMI into the overweight range. 6kg in 3 months! I was absolutely shocked, and the changes in my appearance became apparent.
Fast forward to today, I am thankfully employed as a data analyst in a large corporation where the work-life balance has allowed me to spare some energy to hit the gym right after work.
My weight situation
I have always wanted to shed some fat and get in shape, because my BMI was borderline overweight at the start of this project. Online sources pointed out that keeping your exercise intensity at 50–70% is optimal for the percentage of calories burnt from fat.
I tried this out by hitting the gym 5 times a week for 2 weeks and sure enough I have already seen subtle changes in the shape of my body. My total weight displayed on the weighing scale did not change much but I was happy to see the growth in strength.
As I continued to hit the gym frequently, I became inspired to apply machine learning to predict my future weight loss based on the type of gym workout I do and the diet I follow. I would like to take a classic data science approach to predicting my weight change, using the following steps:
Data acquisition: Logging my nutrition and gym activity through mobile apps and keying in the data into a csv file to be imported into a Python script.
Analytics: Visualizing the individual relationships of my diet habits and gym routine with weight change, and deciding which features are suitable for regression models.
Statistics: Understanding how my diet habits and gym routine affect my weight change, and removing highly correlated features.
Modelling and evaluating: Choosing the best linear regression model based on R-squared statistics and lowest MAE.
Disclaimer: I am neither a nutritionist nor a sports science expert. I am writing this article out of interest in combining data science and exercise. My knowledge of sports nutrition is limited to what I read on the internet and a little bit from my diploma in Biomedical Sciences. So if any of the following information about nutrition seems incorrect to readers, I do apologize. I am merely interested in the data aspect.
This is my personal weight loss journey with my own set of data, and what worked for me will not necessarily work for you. Every individual’s body works differently and you should supply your own set of data if you are interested in embarking on this journey.
To put into perspective where I stood at the start of this project, here are some measurements:
Current Weight: 74.9kg
Current BMI: 24.96 (Yes I was 0.04 away from being considered overweight)
Target Weight: 68kg
Weight and other body metrics are measured every morning right after waking up. Calories and macro-nutrient breakdown are logged in the free version of the MyFitnessPal app, and workout data is captured with the Elite version of the Jefit app.
All weekday workouts occur right after work in the evening and weekend workouts typically occur in the afternoon.
Before deciding on the variables to be used for predictive analysis, let’s look at each individual variable and explore its relationship with weight change.
Weight change: I considered using fat % change as the variable of interest because, after all, burning fat and building muscle are the desired results of this project. However, many sources point out the inaccuracy of bathroom body composition scales, although it is nice to see certain metrics related to your body. I tried using XiaoMi’s Body Composition Scale, and even the product review itself admits its inaccuracy despite it being packed with an array of useful features.
I decided to use weight change as the dependent variable of interest instead because I would like to explore the relationship between my activities and the corresponding weight changes.
The weight_change variable is simply derived by subtracting weight_today from weight_tomorrow.
For consistency, all measurements are taken the next morning right after waking up.
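As a minimal sketch of that derivation (not the exact script used in this project), the next-day difference can be computed with pandas by shifting the weight column up by one row. The dates and weights below are made up for illustration, and the column layout is an assumption.

```python
import pandas as pd

# Hypothetical morning weigh-ins in kg; these values are invented for illustration.
log = pd.DataFrame({
    "date": pd.date_range("2019-10-01", periods=5, freq="D"),
    "weight_today": [74.9, 74.7, 74.8, 74.5, 74.4],
})

# Tomorrow's morning weight is simply the next row's weight.
log["weight_tomorrow"] = log["weight_today"].shift(-1)

# weight_change = weight_tomorrow - weight_today (negative means weight was lost).
log["weight_change"] = log["weight_tomorrow"] - log["weight_today"]

# The most recent day has no "tomorrow" yet, so it is dropped.
log = log.dropna(subset=["weight_change"]).reset_index(drop=True)
print(log)
```

Each row then pairs a day's food and workout with the weight change observed the following morning.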
Excess calories: I define excess calories as the calories consumed minus my basal metabolic rate (BMR). A person’s BMR is estimated with the following formula:
- Women: BMR = 655 + (9.6 * weight in kg) + (1.8 * height in cm) - (4.7 * age in years)
- Men: BMR = 66 + (13.7 * weight in kg) + (5 * height in cm) - (6.8 * age in years)
- Excess Calories = Calories consumed - BMR
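The formulas above translate directly into a few small functions. This is only a sketch: the function names are my own, and the height, age and calorie intake in the example are hypothetical placeholders, since only the starting weight of 74.9kg is given in this article.

```python
def bmr_men(weight_kg: float, height_cm: float, age_years: float) -> float:
    """BMR estimate for men, using the coefficients listed above."""
    return 66 + 13.7 * weight_kg + 5 * height_cm - 6.8 * age_years


def bmr_women(weight_kg: float, height_cm: float, age_years: float) -> float:
    """BMR estimate for women, using the coefficients listed above."""
    return 655 + 9.6 * weight_kg + 1.8 * height_cm - 4.7 * age_years


def excess_calories(calories_consumed: float, bmr: float) -> float:
    """Positive when more calories were eaten than the BMR, negative otherwise."""
    return calories_consumed - bmr


# Hypothetical example: 74.9 kg is the starting weight from this article,
# while 173 cm, 25 years and 2000 kcal are assumed values for illustration.
example_bmr = bmr_men(weight_kg=74.9, height_cm=173, age_years=25)
example_excess = excess_calories(calories_consumed=2000, bmr=example_bmr)
print(f"BMR: {example_bmr:.2f} kcal, excess: {example_excess:.2f} kcal")
```

Because the BMR depends on weight, it would need to be recalculated as the weight changes over the course of the project.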
Protein-carb ratio: Look up articles about low-carb diets on the internet, and the more you read, the more convinced you become that restricting carbohydrates makes the body burn stored fat for energy.
If you hit the gym frequently, you will understand how protein is essential in muscle repair and growth so you can recover and press forward in your subsequent workout. Other sources point out that replacing your carbs with protein is key to weight loss.
This made me interested in investigating its relationship with weight loss, since I intend to replace my carbohydrates with more protein such as meat, tofu and protein shakes.
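For reference, here is one way the protein-carb ratio could be computed from a day's logged macros. It is a sketch that assumes the ratio is grams of protein divided by grams of carbohydrate. Protein and carbohydrate both provide roughly 4 kcal per gram, so the ratio comes out the same whether it is taken in grams or in calories. The function name and the example numbers are hypothetical.

```python
def protein_carb_ratio(protein_g: float, carbs_g: float) -> float:
    """Grams of protein per gram of carbohydrate for one day."""
    if carbs_g <= 0:
        raise ValueError("carbs_g must be positive to compute a ratio")
    return protein_g / carbs_g


# Hypothetical day: 150 g of protein and 200 g of carbohydrate.
ratio = protein_carb_ratio(protein_g=150, carbs_g=200)
print(ratio)
```

A ratio of 1 or more means the day contained at least as much protein as carbohydrate.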
Fats: Contrary to the popular belief that eating fat will make you fat, substituting your carbs with good fats helps with burning fat overall. Unfortunately, the mobile app I am using does not accurately separate fats into saturated fats, polyunsaturated fats and trans fats. But naturally, I would think that continuous consumption of fat from a regular diet may lead to weight gain... right?
Exercise type: I followed Jefit’s “5-Day Muscle Mass Split” program and tailored it to my requirements and available gym equipment. This is a categorical variable limited to the following types of workouts:
- Chest and back
- Legs and abs
- Shoulders and back
- Chest and legs
Since the app does not separate how much time is spent on each muscle component (except arms day), I decided to group them into a single exercise type to maintain consistency instead of separating each muscle component with one-hot encoding.
Gym Weight Lifting Rate: This is the first app I have tried that shows how much weight has been lifted per session. Certain workouts like arms day will show a lower weight lifted, since the arms are not a major muscle group, whereas we expect to see higher weight lifted in workouts that include chest, back and legs. Knowing how much weight is lifted every day may at first appear to be an adequate measure of performance for a particular workout. However, I believe that the amount of time taken to rest also contributes to the intensity of the workout. Imagine resting 2 minutes versus 1 minute between sets: the difference is huge, and it is key to maintaining that 50–70% exercise intensity.
Exploratory Data Analysis
It’s time to dive into the data aspect of this project. Here is a TL;DR dashboard summary of my weight loss:
First of all, pardon the horrible lighting in the transformation pictures. When I took the first one, I wasn’t very serious about embarking on this journey and I wasn’t sure about the right angle for the picture. It was only when I started to see gradual results that I realized I should have picked a place with better lighting. To maintain consistency, I took the ‘After’ picture with similar settings.
To kick things off, here is a quick simple linear regression on my daily weight over time:
Before I proceed, do note that the regression plot covers the entire dataset, whereas the summary statistics exclude the last 14 days. I will explain this later.
Clearly, a simple linear regression using the number of days as the independent variable does not truly explain the reasons for my weight loss. However, it is interesting to know that I am losing an average of 0.068kg per day, which works out to 0.476kg per week. The R-squared of 98% reflects how consistent my habits were in bringing my weight down.
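A trend line like this can be fitted in a few lines. The sketch below uses scipy.stats.linregress on synthetic data that I generated around the reported slope of 0.068kg per day, so it illustrates the method and does not reproduce my actual measurements. The noise level and random seed are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the daily weigh-ins: 87 days starting at 74.9 kg,
# with an assumed downward trend and invented day-to-day noise.
rng = np.random.default_rng(0)
days = np.arange(87)
weight = 74.9 - 0.068 * days + rng.normal(0, 0.2, size=days.size)

# Days is the independent variable and weight is the dependent variable.
fit = stats.linregress(days, weight)

kg_per_day = fit.slope
kg_per_week = kg_per_day * 7      # e.g. 0.068 kg/day * 7 = 0.476 kg/week
r_squared = fit.rvalue ** 2

print(f"slope: {kg_per_day:.3f} kg/day ({kg_per_week:.3f} kg/week)")
print(f"R-squared: {r_squared:.3f}")
```

The slope gives the average daily change, and multiplying it by 7 gives the weekly figure quoted above.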
So how does nutrition affect weight change?
From skipping meals to gorging at all-you-can-eat buffets, you are what you eat. Everyone has a specific BMR that burns enough calories to maintain their bodily functions daily. It is natural to assume that your weight is directly linked to how much you consume in excess of your personal BMR.
There appears to be a fairly positive relationship between excess calories and weight change. To be fair I did expect a stronger correlation between the two variables. There are a couple of reasons for this observation:
Counting calories only gives an estimate of what you eat. MyFitnessPal’s food database does not necessarily provide an accurate breakdown of macronutrients, since it is shared globally by its users. Certain foods, especially Asian dishes, do not exist in the database, and I had to provide a rough estimate of the macronutrients by comparing them to similar dishes.
The physiology of my body plays a part as well. When I was sick, I was constantly dehydrated even after drinking lots of water. During my sick period, I constantly lost weight. But the moment I became well again, my weight spiked up. Having to work with limited data, these anomalies may affect my analysis.
We know that bodybuilders swear by high protein consumption while cutting back a little on carbs to achieve that chiseled look. Portioning your macro-nutrient ratio of carbs:protein:fat to 40:30:30 is a great way to start for beginners, and that is where I started. As I progressed through my journey, the ratio slowly shifted to 40:40:20. So ideally I would do my best to keep my protein-carb ratio at 1 or above for weight loss.
This partially explains why my weight loss is not as rapid as that of people on a keto diet. Most of my daily consumption still has a high carb count, even after I cut out staples at dinner in the second half of my journey! The negative correlation tells me that reducing carb intake and substituting it with protein does help with weight loss. But what about fat? Surely it has a positive correlation, right?
Surprising, isn’t it? There’s barely any correlation between weight change and fat consumption. It really boils down to the amount of carbs you take and the excess calories that are highly responsible for your weight change! Don’t just take my word for it, you can check out this experiment from Ariel Faigon.
The top foods responsible for weight loss include olive oil, tempura, eggs and bacon! Yes, delicious bacon! It is also no surprise that many staples are responsible for weight gain.
As you may have noticed, I have used Pearson’s correlation coefficient as the primary metric to assess the relationship between weight change and the nutrition variables. Because of this, I have to stress that correlation does not equal causation. Reducing your carbs and replacing them with protein does not always guarantee weight loss. There will always be other factors that affect weight change, which is why we are going to investigate the effects of my workouts on weight change.
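For readers who want to run the same kind of check on their own logs, here is a minimal sketch of computing Pearson’s coefficient with pandas. Every number in it is synthetic: I invented the relationship between the columns so that the signs loosely resemble the discussion above, which means the output says nothing about real nutrition. The column names are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days = 73

# Synthetic daily log; all values are invented for illustration only.
df = pd.DataFrame({
    "excess_calories": rng.normal(-200, 250, n_days),
    "protein_carb_ratio": rng.uniform(0.4, 1.4, n_days),
    "fat_g": rng.normal(60, 15, n_days),
})

# Made-up relationship plus noise; fat is deliberately left out of it.
df["weight_change"] = (
    0.0004 * df["excess_calories"]
    - 0.3 * (df["protein_carb_ratio"] - 0.9)
    + rng.normal(0, 0.05, n_days)
)

# DataFrame.corr with method="pearson" returns the pairwise Pearson's r matrix.
correlations = (
    df.corr(method="pearson")["weight_change"]
    .drop("weight_change")
    .sort_values()
)
print(correlations)
```

A coefficient close to 0 suggests little linear relationship, while values towards -1 or +1 suggest a stronger negative or positive one.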
Does the type of workout affect weight change?
If you have looked at the dashboard profile of my weight loss, you will notice that most of my weight loss came from legs/abs day and cumulative rest days accounted for positive weight change. However we cannot simply conclude that doing legs/abs will definitely result in the greatest weight loss. Let’s look at a boxplot for comparison:
At first glance, we can see that certain exercise types, including rest days, have highly skewed data where the median falls close to the upper or lower quartile. It is fair to assume that not all exercise types conform to the normal distribution. This is largely due to the fact that the amount of data captured is very limited. Recall that I only have 12 rest days out of a total of 87 days in this project. As more data is collected, each variable will most likely look closer to a normal distribution. A non-parametric statistical test for independent samples is required to validate the differences in distribution, namely the ‘Mann-Whitney U Test’. However, I recently discovered that Python’s statsmodels library conveniently includes a pairwise t-test. And even though it is not the right way to conduct this test, I went ahead with it because it still gave me some insight into the differences between the distributions. Here’s how it looks:
The results comparing certain workouts with rest days are inconclusive, but I am fairly confident that as more data is collected, the p-values will converge to lower values.
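The Mann-Whitney U test mentioned above is available in SciPy as scipy.stats.mannwhitneyu. The sketch below compares rest days against legs/abs days on synthetic weight changes. The 12 rest days match the count in this project, but the number of legs/abs days and all the values are invented for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)

# Synthetic next-morning weight changes in kg; invented values only.
rest_days = rng.normal(0.15, 0.30, size=12)       # 12 rest days, as in this project
legs_abs_days = rng.normal(-0.20, 0.30, size=18)  # assumed number of sessions

# Two-sided test: do the two groups come from the same distribution?
u_stat, p_value = mannwhitneyu(rest_days, legs_abs_days, alternative="two-sided")

print(f"U statistic: {u_stat:.1f}, p-value: {p_value:.4f}")
```

Unlike a t-test, this test does not assume the weight changes are normally distributed, which suits small and skewed groups like the ones in the boxplot.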
There are still a couple of things to explore, but I’ll stop here and proceed with the main objective of this project. I will share my data on my GitHub and you can have fun exploring it.
Predicting my Weight Change
By the end of my project, I had shifted my focus from weight loss to mass gain, so there is no point in predicting my future weight with the models I am about to introduce. Recall from earlier that I am holding out the last 14 days of my project as the test set. That is about a 16% test split (14 of 87 days) that I will use for my modelling. Let’s start with a very simple multiple linear regression to find out how strongly each independent feature affects weight change:
Looking at the R-squared statistic, it does not appear to be a great model for prediction. Of all the features in this list, protein-carb ratio appears to play the most important role in weight change, and it has a negative coefficient.
Speaking of feature importance, I will apply random forest regression and XGBoost, with hyperparameters tuned using GridSearchCV, to rank which features contribute most to weight change:
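To make the hold-out split and the multiple linear regression concrete, here is a rough sketch with scikit-learn. The data is synthetic and the feature names are assumptions. The key point is that the split is chronological (the last 14 of 87 days) and not a random shuffle.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n_days, n_test = 87, 14
features = ["excess_calories", "protein_carb_ratio", "fat_g"]

# Synthetic daily log in chronological order; invented values only.
df = pd.DataFrame({
    "excess_calories": rng.normal(-200, 250, n_days),
    "protein_carb_ratio": rng.uniform(0.4, 1.4, n_days),
    "fat_g": rng.normal(60, 15, n_days),
})
df["weight_change"] = (
    0.0004 * df["excess_calories"]
    - 0.3 * df["protein_carb_ratio"]
    + rng.normal(0, 0.15, n_days)
)

# Chronological split: train on the first 73 days, test on the last 14.
train, test = df.iloc[:-n_test], df.iloc[-n_test:]
test_share = n_test / n_days  # roughly 0.16

model = LinearRegression().fit(train[features], train["weight_change"])
coefficients = pd.Series(model.coef_, index=features)
test_r2 = r2_score(test["weight_change"], model.predict(test[features]))

print(coefficients)
print(f"test share: {test_share:.0%}, test R-squared: {test_r2:.3f}")
```

Note that the raw coefficients are on different scales (calories versus a ratio), so comparing their magnitudes directly only makes sense after standardizing the features.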
Hold on a minute… Why is the excess calories variable second in importance and categorical excess calories variable (yes/no) way below? On second thought, it appears that all one-hot encoded features are placed at the bottom half of the chart. Looking around the web, this seems like a viable explanation, taken from a well-explained article about random forest feature importances.
“Cons: biased approach, as it has a tendency to inflate the importance of continuous features or high-cardinality categorical variables”
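Below is a small sketch of how a random forest can be tuned with GridSearchCV and then asked for its impurity-based feature importances, which is the kind of ranking the quote above warns about. The parameter grid is deliberately tiny and is not the grid I actually searched. The data and column names are synthetic assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
n_days = 73

# Synthetic features: three continuous columns and one binary column.
X = pd.DataFrame({
    "excess_calories": rng.normal(-200, 250, n_days),
    "protein_carb_ratio": rng.uniform(0.4, 1.4, n_days),
    "fat_g": rng.normal(60, 15, n_days),
    "is_rest_day": rng.integers(0, 2, n_days),
})
y = (
    0.0004 * X["excess_calories"]
    - 0.3 * X["protein_carb_ratio"]
    + rng.normal(0, 0.15, n_days)
)

# A deliberately small, illustrative grid so the search runs quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 4]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

# Impurity-based importances of the best model; they sum to 1.
importances = pd.Series(
    search.best_estimator_.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(search.best_params_)
print(importances)
```

Binary columns offer only one possible split point, which is one reason they tend to rank lower than continuous columns in this kind of chart.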
Let’s see how XGBoost fares against random forest regression:
Now this makes so much more sense! We see a great mixture of categorical and continuous variables this time. As we compare this with the continuous scatterplots generated for nutrition, we can observe that protein-carb ratio plays a more important role than excess calories and fats.
However, I am still pretty skeptical about its predictive capabilities. In the previous algorithms, I deliberately fitted all the features I had selected for prediction. Looking back, there appear to be certain variables that may jeopardize performance, such as ‘chest and back’ in both cases.
So which redundant features can be removed to improve prediction? We answer this with yet another twist on random forest regression: recursive feature elimination (RFE).
Well well well… It looks like all the categorical/binary features have been eliminated! As much as it pains me to have to remove those features, robust prediction comes with a little bit of sacrifice.
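For the curious, recursive feature elimination is available in scikit-learn as sklearn.feature_selection.RFE, which repeatedly refits an estimator and drops the weakest feature until the requested number remains. The sketch below wraps it around a random forest on synthetic data. The choice of keeping 2 features and the column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(5)
n_days = 73

# Synthetic features mixing continuous and binary columns; invented values only.
X = pd.DataFrame({
    "excess_calories": rng.normal(-200, 250, n_days),
    "protein_carb_ratio": rng.uniform(0.4, 1.4, n_days),
    "fat_g": rng.normal(60, 15, n_days),
    "is_rest_day": rng.integers(0, 2, n_days),
    "is_legs_abs_day": rng.integers(0, 2, n_days),
})
y = (
    0.0004 * X["excess_calories"]
    - 0.3 * X["protein_carb_ratio"]
    + rng.normal(0, 0.15, n_days)
)

# Drop one feature per round until only 2 remain (an arbitrary choice here).
selector = RFE(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=2,
    step=1,
)
selector.fit(X, y)

kept = list(X.columns[selector.support_])
ranking = pd.Series(selector.ranking_, index=X.columns).sort_values()
print("kept:", kept)
print(ranking)
```

Features with a ranking of 1 are the ones that survive, and higher numbers were eliminated earlier.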
Prediction and Evaluation
To be honest, this is going to be a rather disappointing section. If you were hoping to see XGBoost perform the best among the 4, get ready, because here it comes:
XGB came out with the highest MAE, which makes it the worst predictor among the four. In fact, although I did not include the linear regression using days as the independent variable, you can pretty much guess that it would make a better prediction than any of the 4.
The RFE algorithm worked best because features that have very little to do with weight change were eliminated. XGB and random forest regression did not perform as well as I expected, because they have a plethora of hyperparameters to tune, more than my laptop can handle.
The lack of collected data might be another reason the algorithms did not work as well as they should. After all, it’s not even 90 days! The rather erratic pattern on the line chart of the RFE algorithm pretty much sums it up.
I could go on with an arsenal of regression algorithms and maybe even make use of deep learning to bring the predictions closer to the actual values. But I’ll stop here.
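As a reminder of how the comparison works, MAE is the average absolute gap between predicted and actual values, so a lower MAE means a better prediction. Here is a tiny sketch with made-up numbers. The model names and values are hypothetical and are not my actual results.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |actual - predicted|."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual weight changes (kg) over four test days.
actual = [-0.2, 0.1, -0.3, 0.0]

# Hypothetical predictions from two placeholder models.
predictions = {
    "model_a": [-0.1, 0.0, -0.1, -0.1],
    "model_b": [0.1, 0.3, 0.0, 0.2],
}

scores = {name: mae(actual, pred) for name, pred in predictions.items()}
best_model = min(scores, key=scores.get)  # lowest MAE wins

print(scores)
print("best:", best_model)
```

With only 14 test days, a single unusual day can shift the MAE noticeably, so small differences between models should be read with caution.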
Final Thoughts & Takeaways
A new me
Today my weight has reached a healthy BMI range. I am absolutely happy to have reached where I wanted to be in 3 months. I feel a lot less lethargic in my day-to-day routine thanks to my diet choices and workout habits. 2020 has definitely started on a positive note!
Although many websites share similar advice on nutrition and workouts for weight loss, employing data science for this personal project has provided valuable insights into the magnitude of each feature in relation to weight loss.
How did I do it and what can you do?
Everyone has a different source of motivation when it comes to sculpting their body the way they want it to be. In most cases, we look at our reflection and ask ourselves what can we do to achieve it.
And then we start forming excuses.
And then we stop there.
Maintaining consistency is the great hurdle that many of us fail to overcome on the way to the body we have always dreamed of.
For me, as weird as it sounds, data science was what drove me to the turning point. Collecting data and monitoring my activities were the main drivers of my successful weight loss.
Every day I watch my food intake, doing my best to eat below my BMR (including on cheat days), and I make sure I visit the gym for an average of 1 hour. These are possible because convenient apps have allowed me to track my progress.
I always tell people that if they are serious about gaining or losing weight, they have to develop a habit of consistency in their diet and exercise. When I tell them I work out for about one hour almost every day, “I don’t really have the time” is the number 1 excuse under their belt.
Let’s do a little bit of math.
1 hour out of 24 hours is 4.17% of your day dedicated to exercising. 4.17%. And even in the best-case scenario where you work out 6 days a week, that’s less than 4% of your life dedicated to working out. Now unless you are truly busy with work, where you go home and continue to ravage through your scripts and spreadsheets, or you have certain health conditions that prevent you from strenuous activities, there is little excuse not to dedicate a small part of your life to the transformation you desire.
Get out there and start making a change today!
I hope that this little project can inspire others to employ data science into their own personal data. And if not, I hope the readers will at least have understood the impact of their diet choices and exercise habits on their weight change as they embark on their own personal journey to their own body transformation.
Once again, you can access my work here.