How I Used A/B Testing to Hack My Children
“I wonder how we can get them to sleep more.” This simple thought from my wife, not even a question, became a challenge for me. My engineering mind saw it as a problem to be solved, and when a software developer sees a problem, he designs tests. Luckily, I knew the perfect system for testing ideas in a controlled, measurable way. And with twins, testing would be even easier. Welcome to parenting, A/B testing style.
This post originally appeared on the Daddy on the Run page.
A/B testing is used all over the Internet. You likely run into it dozens, if not hundreds, of times a day without realizing it. All the big tech companies use it as a tool to test ideas and measure their effectiveness. Google is known for testing 41 shades of blue for its search results: the designers allegedly couldn’t decide between two shades, so they tested 41 in total to see which one led more users to click on results. Facebook is constantly testing different feed experiences. Amazon frequently changes its buy buttons and cart layouts. You may notice this if you log in from a new computer, or see a friend using a version of a site that is slightly different from yours.
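Sites usually keep a user in the same variant across visits. One common way to do that, sketched here as an assumption rather than how Google, Facebook, or Amazon actually do it, is to hash a user or browser ID together with an experiment name into a fixed bucket. If the ID comes from a browser cookie, logging in from a new computer can land you in a different variant, which would explain the experience described above. The function name, the experiment name, and the 10,000-bucket granularity are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically place a user in 'control' or 'treatment' (hypothetical helper).

    Hashing experiment name + user ID means the same user always gets the same
    variant for one experiment, while separate experiments are bucketed independently.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    # Turn the first 8 bytes of the SHA-256 digest into an integer bucket 0..9999.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 10_000
    # treatment_share sets the split, e.g. 0.1 sends about 10% of users to treatment.
    return "treatment" if bucket < treatment_share * 10_000 else "control"


if __name__ == "__main__":
    variants = [assign_variant(f"user-{i}", "link-color-test") for i in range(10_000)]
    print(variants.count("treatment") / len(variants))  # close to 0.5
```

Changing `treatment_share` is also how an uneven split (say 90/10) would be expressed, which is one of the complications mentioned in the next paragraph.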
A/B testing compares one or more “treatments”, or experiments, against a “control”, the existing experience. A metric, usually based on a user action such as a click-through or “conversion”, is measured for each treatment relative to the control’s baseline. In the Google example, they could measure how likely users were to click at least one result under each shade. After running long enough to reach statistical significance, often a week or two, the experience with the highest score is selected as the winner and becomes the new control. This gets complicated quickly when multiple experiments run at the same time or when users are not split evenly between them. It takes real statistical knowledge, or one of the many powerful testing tools available. At Audible and Amazon, we are constantly testing these experiences. It is the best way to see how users actually behave, because what users say they will do and what they actually do often differ.
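To make “statistically significant” concrete, here is a minimal sketch of one standard check, a pooled two-proportion z-test on click-through rates. The click and user counts are made up for illustration, and real testing tools add refinements (sequential testing, corrections for many simultaneous experiments) that this sketch omits.

```python
from math import erfc, sqrt


def two_proportion_z_test(clicks_a: int, users_a: int, clicks_b: int, users_b: int):
    """Compare click-through rates of control (A) and treatment (B).

    Returns (absolute lift, z statistic, two-sided p-value) from a pooled
    two-proportion z-test, which assumes reasonably large samples.
    """
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    # Under the null hypothesis both groups share a single click-through rate.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return rate_b - rate_a, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 10% vs 11% click-through on 10,000 users each.
    lift, z, p = two_proportion_z_test(1_000, 10_000, 1_100, 10_000)
    print(f"lift={lift:.3f}, z={z:.2f}, p={p:.3f}")
```

With these counts the p-value comes out around 0.02, below the common 0.05 threshold, so B would be declared the winner and become the new control.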
I decided to use this method with the boys to see whether we could increase the most important metric in any home with 10-week-old babies, especially twins: sleep. I used one boy as the control and the other as the treatment (never mind that no part of our lives could currently be described as under control) and tested several sleep theories against the control.
Accurate measurement and data tracking are critical in any experiment. A success metric is often chosen simply because the data is available or easy to measure. You don’t want a metric that takes longer to measure than the interval between changes to the test or its inputs. Fortunately, measuring sleep is fairly easy: when they wake up at night, we write it down. We have been doing exactly this since the day they were born, a habit the hospital nurses instilled in us. We’ve already gone through a few notebooks, but it makes tracking easy. We even started entering the data into a spreadsheet to see the impact more clearly.
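Here is a small sketch of turning one night’s notebook entries into a single number. Using the longest uninterrupted stretch as the sleep metric is an assumption (the post doesn’t say exactly which figure we tracked), as are the function name and the `YYYY-MM-DD HH:MM` time format; the times below are illustrative, not copied from our notebooks.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M"


def longest_stretch(bedtime: str, wakeups: list[str]) -> timedelta:
    """Longest uninterrupted sleep between bedtime and the logged night wake-ups.

    Time spent feeding and resettling after each wake-up is ignored for simplicity.
    """
    times = sorted(datetime.strptime(t, TIME_FORMAT) for t in [bedtime, *wakeups])
    # Gaps between consecutive events: bedtime -> first wake-up, wake-up -> next wake-up, ...
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return max(gaps, default=timedelta(0))


if __name__ == "__main__":
    stretch = longest_stretch(
        "2016-03-01 19:30",
        ["2016-03-02 00:15", "2016-03-02 03:45", "2016-03-02 06:30"],
    )
    print(stretch)  # 4:45:00
```

One row per baby per night in a spreadsheet maps directly onto one call of this function.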
We first tested a bigger feed just before bed. Instead of the usual four ounces, we tried five, then six. The boys seem to go through cycles of good and bad nights, so to avoid biasing the result toward one child, we alternated which one got the treatment and which was the control. While one child ate more in the evening, the other stayed at four ounces. Result: no effect. Both babies seemed to sleep longer during this period anyway, and for almost the same amount of time. There was one night when a big feed coincided with a record 5.5 hours of sleep, but a single data point means little in a dataset like this. It was also hard to keep the test going, since anything over five ounces was likely to come back up a few minutes after eating.
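The alternating design can be sketched as a simple crossover: each night one twin gets the treatment and the other serves as the control, and the roles swap. Because both boys share the same nights, comparing them night by night cancels out whatever affected the whole household that evening. Swapping every single night, the names `twin_a`/`twin_b`, and the sample hours are assumptions for illustration; the post doesn’t say how often we switched.

```python
from statistics import mean, stdev


def crossover_schedule(nights: int) -> list[dict[str, str]]:
    """Alternate which twin gets the treatment each night (hypothetical names)."""
    plan = []
    for night in range(nights):
        treated, control = ("twin_a", "twin_b") if night % 2 == 0 else ("twin_b", "twin_a")
        plan.append({"treatment": treated, "control": control})
    return plan


def paired_difference(treatment_hours: list[float], control_hours: list[float]) -> tuple[float, float]:
    """Mean and standard deviation of nightly (treatment - control) sleep, in hours."""
    diffs = [t - c for t, c in zip(treatment_hours, control_hours)]
    return mean(diffs), stdev(diffs)


if __name__ == "__main__":
    for night, roles in enumerate(crossover_schedule(4), start=1):
        print(night, roles)
    # Illustrative hours: whichever twin was treated that night vs the control twin.
    avg, spread = paired_difference([5.0, 4.5, 5.0, 4.0], [4.5, 4.5, 4.5, 4.5])
    print(f"mean difference {avg:.3f} h, standard deviation {spread:.3f} h")
```

A mean difference that is small compared with its night-to-night spread, as in these made-up numbers, is what a “no effect” result looks like.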
Next up was a secret whispered about in the dark corners of parenting blogs and passed from parent to parent, at least in my office: gripe water. Okay, maybe it’s not a secret, but it took us a while to try it. Supposedly this blend of herbs and spices, unlike the KFC blend, soothes the stomach from reflux and gas, especially at night, leading to longer sleep. After a week of testing, we found it did help with reflux, especially spit-up, and while we didn’t track individual burps or farts, it seemed to reduce them as well. However, it did not affect sleep duration. On average we saw a slight increase of 20 to 30 minutes, but again, this could be a natural increase due to age.
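Those 20 to 30 minutes could be the treatment or could just be age. One way to separate the two, assuming one twin went without the remedy over the same period, is a difference-in-differences estimate: subtract the control baby’s change over the period from the treated baby’s change, so growth shared by both cancels out. The function name and numbers are hypothetical, chosen to show a raw 30-minute gain that disappears once the shared trend is removed.

```python
from statistics import mean


def diff_in_diff(treated_before: list[float], treated_after: list[float],
                 control_before: list[float], control_after: list[float]) -> float:
    """Treatment effect net of the change the control saw over the same period.

    If both babies sleep longer simply because they are getting older, that
    shared trend appears in both changes and cancels out.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


if __name__ == "__main__":
    # Hypothetical nightly sleep in hours, a few nights before and during the test.
    effect = diff_in_diff(
        treated_before=[4.0, 4.5, 4.0, 4.5],
        treated_after=[4.5, 5.0, 4.5, 5.0],
        control_before=[4.0, 4.0, 4.5, 4.5],
        control_after=[4.5, 4.5, 5.0, 5.0],
    )
    print(f"{effect * 60:.0f} minutes")  # 0 minutes
```

Here the treated baby gains 30 minutes, but so does the control, so the estimated effect of the treatment itself is zero.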
After gripe water became the new control, we tested an extra feeding at bedtime. The boys had started doing this on their own anyway and we had been trying to prevent it, but it seemed like a good testing opportunity, so we gave it a try. Many babies cluster feed before bedtime, taking a feed only a short time after the previous one. We gave this feed about 1.5 to 2 hours after the previous one, instead of the usual 3 hours, and offered 4 ounces versus the 4 to 5 they usually take during the day. Sometimes they refused to take more than three. Of all the experiments, this one seemed to work best: sleep increased by about an additional hour, although usually not until a few days after the experiment started, since it apparently takes time to change sleep patterns. This is a good lesson for A/B tests: there is sometimes an adjustment period of several days while people discover a new treatment and adapt to it. It is important to record results both during the adjustment period and after it. Apple has been known to neglect this adjustment period when launching some products, most notably Maps.
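Recording the adjustment period and the post-adjustment period separately can be as simple as splitting the nightly log at a cutoff and computing the treatment-minus-control difference for each part. The three-night adjustment window, the function name, and the hours below are illustrative assumptions, shaped like a treatment that only pays off after a few days.

```python
from statistics import mean


def lift_by_phase(treatment: list[float], control: list[float], adjustment_nights: int) -> dict[str, float]:
    """Average nightly (treatment - control) sleep, reported per phase.

    Nights before `adjustment_nights` count as the adjustment period; the rest
    count as post-adjustment. Both lists are aligned night by night.
    """
    phases = {
        "adjustment": slice(0, adjustment_nights),
        "post_adjustment": slice(adjustment_nights, None),
    }
    return {
        name: mean(treatment[window]) - mean(control[window])
        for name, window in phases.items()
    }


if __name__ == "__main__":
    treated = [4.5, 4.5, 4.5, 5.5, 5.5, 5.5, 5.5]
    baseline = [4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5]
    print(lift_by_phase(treated, baseline, adjustment_nights=3))
    # {'adjustment': 0.0, 'post_adjustment': 1.0}
```

Averaging all seven nights together would report about 0.57 hours, hiding both the slow start and the full one-hour effect.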
Finally, we tested keeping them awake longer during the day. Our hypothesis was that they would be more tired at night and sleep longer as a result. This may have been slightly true: we saw a small increase in sleep duration. But we hadn’t accounted for the stress and exhaustion of keeping them awake and unhappy. It also took significantly longer to calm them down and get them to sleep at night, because they were overtired and fussy. Testing lesson: don’t sacrifice other metrics for a small boost in one.
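In product testing this lesson is often formalized with “guardrail” metrics: a treatment ships only if the primary metric improves and none of the guardrails gets worse beyond an agreed tolerance. This sketch applies that rule; the function name, metric names, tolerances, and numbers are hypothetical, and each guardrail change is expressed so that a positive value means worse.

```python
def should_ship(primary_lift: float,
                guardrail_changes: dict[str, float],
                tolerances: dict[str, float]) -> bool:
    """Adopt a treatment only if the primary metric improves and every guardrail
    stays within its tolerance (positive change = worse; missing change = 0)."""
    if primary_lift <= 0:
        return False
    return all(guardrail_changes.get(name, 0.0) <= limit for name, limit in tolerances.items())


if __name__ == "__main__":
    # Hypothetical numbers for a "keep them awake longer" style test.
    decision = should_ship(
        primary_lift=0.25,  # extra hours of night sleep
        guardrail_changes={"minutes_to_settle": 30.0, "fussy_hours": 1.5},
        tolerances={"minutes_to_settle": 10.0, "fussy_hours": 0.5},
    )
    print(decision)  # False
```

The small sleep gain is rejected because settling time and fussiness both regress past their limits.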
Many of these tests were unsuccessful, largely because of the sample size. With a user base like Facebook’s, tests can run on small segments and still reach statistical significance very quickly. With twins, it is hard to tell what is a real result and what is personality or natural development. For more accurate testing, we may need a bigger sample. Triplets would come in handy here. Maybe someone else’s triplets; we are definitely not ready for that!
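A standard power calculation shows why two babies make for a tough experiment. This sketch uses the normal-approximation formula for comparing two means (given in the docstring). Treating nights as independent observations is a simplification, since nights from the same baby are correlated, and the 1-hour night-to-night spread and 30-minute target effect are assumed values, not measurements from our log.

```python
from math import ceil
from statistics import NormalDist


def nights_needed(sigma_hours: float, min_effect_hours: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate observations per group to detect a difference in mean sleep.

    Normal approximation for a two-sample comparison of means:
        n = 2 * (z_(1 - alpha/2) + z_power)**2 * sigma**2 / delta**2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    n = 2 * (z_alpha + z_power) ** 2 * sigma_hours ** 2 / min_effect_hours ** 2
    return ceil(n)


if __name__ == "__main__":
    print(nights_needed(sigma_hours=1.0, min_effect_hours=0.5))  # 63
```

Under those assumptions it asks for 63 nights per group, about two months each of treatment and control nights, and halving the detectable effect to 15 minutes roughly quadruples that.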
This also shows the importance of the test-measure-iterate process. While some methods didn’t show significant improvements on their own, taken together they can. When each treatment that outperforms the control becomes the new control, small improvements add up. As you keep trying new things quickly and moving on, it’s easy to come up with new ideas to try. You don’t need to move a mountain, just move small handfuls of earth for a long time. With this approach to parenting, the boys can keep improving steadily too. And with luck, so will our sanity, well-being, and life as parents.
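The “winner becomes the new control” loop can be sketched in a few lines. The treatment names, the sleep figures, and the minimum lift required for promotion are all hypothetical; the point is only that accepted gains stack, because each new treatment is measured against the current best rather than the original routine.

```python
def iterate_experiments(baseline: float, results: list[tuple[str, float]],
                        min_lift: float) -> tuple[str, float]:
    """Walk through experiments in order, promoting any treatment that beats the
    current control by more than `min_lift`; the winner becomes the new control.

    Each result is (treatment name, metric measured with that treatment added
    on top of the current control).
    """
    control_name, control_value = "original routine", baseline
    for name, value in results:
        if value - control_value > min_lift:
            print(f"{name}: {value:.2f} beats {control_name} ({control_value:.2f}), promoting")
            control_name, control_value = name, value
        else:
            print(f"{name}: {value:.2f} vs {control_value:.2f}, keeping {control_name}")
    return control_name, control_value


if __name__ == "__main__":
    hypothetical = [("treatment_1", 4.1), ("treatment_2", 4.5), ("treatment_3", 4.55), ("treatment_4", 5.0)]
    print(iterate_experiments(baseline=4.0, results=hypothetical, min_lift=0.25))
    # ('treatment_4', 5.0)
```

The threshold is a design choice: set it too low and noise gets promoted, too high and real but modest gains are thrown away.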