Tuesday, June 7, 2022

Spurious Correlation? Japanese Potato Production and Calbee Baseball Cards

 



One thing that sometimes gets mentioned in discussions about the relative scarcity of certain Calbee baseball card sets in Japan is that a poor potato harvest in a given year led Calbee to reduce sales of its baseball cards that year.  This seems logical since the two are inextricably linked - for years Calbee cards have only been sold with potato chips, so if there are fewer potatoes to make those chips it's not inconceivable that this would result in fewer baseball cards as well.

This has always intrigued me since it suggests the existence of a potential correlation between two variables - potato and baseball card production - that would be unique to Japan and to the hobby.  And it would add a neat piece of hobby trivia to all the other bits floating around out there.

So I decided to test statistically whether this correlation actually exists.

To do so I gathered data on both variables: potatoes and cards.  For data on annual potato production in Japan I consulted the Potato Pro website's Japan page which allows you to search official figures on potato production.  I hand collated the annual data from 1997 to 2013.

For data on Calbee baseball cards I could not locate official annual production figures from Calbee, so I had to use a proxy.  Yahoo Auctions listings for Calbee cards in its baseball card category are broken down year by year and may be a useful substitute for official figures, subject to certain limitations.  Yahoo Auctions is the biggest auction site in Japan, effectively the country's equivalent of eBay, and a large volume of baseball cards is bought and sold there.  At the time of writing there were 69,997 Calbee cards listed for sale, so it represents a relatively large pool of data.  The number of Calbee cards available on Yahoo Auctions in a given year category is not a perfect proxy for the number of cards produced in that year, but the numbers are large enough to suggest that differences in the number of cards originally produced would show up as differences in the number available on Yahoo Auctions today.  I thus hand collated the number of listings for each year from 1997 to 2013.

I set the year range from 1997 to 2013 for two reasons.  1997 was chosen as the starting point since it marked the beginning of the "modern" style of Calbee card.  The availability of cards on Yahoo Auctions is also likely to better reflect the number originally produced for cards from that year onward, since the surviving population is less likely to have been affected by events like moms throwing them away, as cards from the 1970s to early 90s were (in Japan the collecting hobby developed a couple of decades later than its American counterpart).  2013 was selected as the upper limit simply because from 2014 onwards Yahoo Auctions stopped breaking listings down by year (for some reason).  One other oddity worth noting is that, for reasons that are unclear, Yahoo Auctions lumped 2000 and 2001 into a single category, so I assigned half of those listings to each year in my data.  Another limitation is that while most of the listings are for single cards, some are for lots, something I haven't taken the time to weed out.
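The even split of the combined 2000 and 2001 category can be sketched in Python as follows.  This is only an illustration: the function name, the "2000-2001" label and every count below are made-up placeholders, not the actual Yahoo Auctions figures.

```python
def split_combined_category(counts, combined_label, years):
    """Return per-year counts, dividing one combined category evenly across its years."""
    # Copy every category except the combined one unchanged.
    per_year = {key: value for key, value in counts.items() if key != combined_label}
    # Assign an equal share of the combined count to each year it covers.
    share = counts[combined_label] / len(years)
    for year in years:
        per_year[year] = share
    return per_year


# Placeholder counts for illustration only (NOT the real listing numbers).
raw_counts = {1999: 9000, "2000-2001": 6000, 2002: 1500}
by_year = split_combined_category(raw_counts, "2000-2001", (2000, 2001))
print(by_year)
```

An even split is the simplest assumption available; if one of the two years actually had more series issued, a weighted split would be more defensible.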

What do the data tell us?  I tried to present them in the line chart at the top of this post.  The orange line shows the number of Calbee cards from a given year available on Yahoo Auctions, while the blue line shows domestic potato production that year (expressed in tens of thousands of tons).

On the card side you can see a huge spike in 1999, then a massive drop in 2002, followed by a recovery in 2003 and 2004.  Since then there has been a general decline, though with fluctuations from year to year.

On the potato side, production declined between 1997 (3,390,000 tons) and 2013 (2,400,000 tons), though there is a fair bit of year-on-year fluctuation in there too (less pronounced than for the cards, however).

When I run a simple linear regression with baseball card listings as the dependent variable and potato production as the independent variable, the results suggest there is no meaningful correlation between the two (R squared = 0.050602).  In other words, potato production explains only about 5% of the year-to-year variation in baseball card availability in this data.
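For anyone who wants to reproduce this kind of calculation outside a spreadsheet, here is a minimal Python sketch of a simple linear regression and its R squared.  The function name is hypothetical and the numbers are made-up placeholders rather than the real potato or listing data.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R squared equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder values for illustration only (NOT the real data).
potato_production = [300, 310, 290, 280, 270]   # e.g. tens of thousands of tons
card_listings = [4000, 5200, 3100, 4500, 3900]  # e.g. listings per year category
slope, intercept, r_squared = simple_linear_regression(potato_production, card_listings)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R squared={r_squared:.4f}")
```

R squared here is the share of the variation in the listings that the fitted line accounts for, which is why a value near 0.05 reads as "about 5%".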

After running that though I realized that there was one factor which I had to adjust for.  In the years 2007 and 2009-2013 Calbee distributed their cards two per bag.  In the other years, they only distributed one per bag.  This would suggest that the "two cards per bag" years were being over-represented in the Yahoo Auctions data since every two cards would represent one bag of potato chips in those years.  To correct for this, I divided the number of Yahoo Auctions listings for those years in half.
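The adjustment amounts to halving the listing counts for the two-cards-per-bag years.  A minimal sketch, where the set of years comes from the paragraph above but the function name and the counts are hypothetical placeholders:

```python
# Years in which Calbee packed two cards per bag, as stated in the post.
TWO_CARDS_PER_BAG_YEARS = {2007, 2009, 2010, 2011, 2012, 2013}


def bag_equivalent_listings(listings_by_year, two_card_years=TWO_CARDS_PER_BAG_YEARS):
    """Convert card listings into rough 'bags of chips' by halving two-card years."""
    return {
        year: (count / 2 if year in two_card_years else count)
        for year, count in listings_by_year.items()
    }


# Placeholder counts for illustration only (NOT the real listing numbers).
listings = {2006: 100, 2007: 100, 2008: 50, 2009: 80}
adjusted = bag_equivalent_listings(listings)
print(adjusted)
```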

Running the same regression using the adjusted data, the R-squared jumps to 0.328352.  That is still low, and with such a small sample (just 16 years of data) it should be treated with caution.  It suggests that a lot of other factors which this simple two-variable model isn't capturing matter more than potato production in determining how many Calbee cards get made.  But it's at least big enough that potato production might be a bit more than background noise and could have some effect on Calbee card availability.
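A rough way to gauge whether an R-squared of this size is more than noise in a sample this small is to convert it into a t-statistic for the slope and compare that against a t table with n - 2 degrees of freedom.  The sketch below does only that conversion.  It assumes independent observations and well-behaved residuals, which yearly data with shared trends may well violate, and the function name is hypothetical.

```python
import math


def t_statistic_from_r_squared(r_squared, n):
    """Magnitude of the slope's t-statistic in a one-predictor regression.

    Uses t = sqrt(R^2 * (n - 2) / (1 - R^2)), which has n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least three observations")
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return math.sqrt(r_squared * (n - 2) / (1 - r_squared))


# R-squared and sample size as quoted in the post.
t_value = t_statistic_from_r_squared(0.328352, 16)
print(f"t = {t_value:.3f} with {16 - 2} degrees of freedom")
```

For 14 degrees of freedom, the two-sided 5% cutoff in a standard t table is about 2.14, so the printed value can be compared against that, bearing in mind how shaky the assumptions are here.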

From a statistical point of view I haven't exactly put this question to a very rigorous analysis here, but I think it was kind of an interesting exercise.  In terms of its limitations, if you look at the data from individual years it's quite easy to see that potato production isn't always a major determinant of Calbee baseball card production in a given year.  2002 illustrates this point well - as you can see from the chart there was a huge drop in baseball card availability that year, but at the same time potato production reached the second highest level in the data set (tied with 1998).  This was an outlier though - 2002 was the year Japan co-hosted the World Cup and the greater interest in soccer that year reduced demand for baseball cards.  For other years it's worth noting that a lot of the correlation comes from the period between 2004 and 2013, in which both baseball card availability and potato production slowly declined in tandem.  It's entirely possible that this is pure coincidence and the two have nothing to do with each other.  Or not, who knows?  We need more data on other variables to try to untangle this mess.
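One common way to check whether a correlation comes from a shared long-run trend rather than from the two series actually moving together year to year is to correlate the annual changes (first differences) instead of the levels.  This is a different technique from the regression used above, and the sketch below uses two made-up series purely to show the idea; the function names are hypothetical.

```python
import math


def first_differences(values):
    """Year-over-year changes: each value minus the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Made-up series that both trend downward but wiggle in opposite directions.
series_a = [10, 8, 9, 6, 7, 4]
series_b = [20, 19, 16, 15, 12, 11]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(first_differences(series_a), first_differences(series_b))
# The levels correlate positively because of the shared decline,
# while the year-over-year changes correlate negatively.
print(f"levels: r = {r_levels:.3f}, changes: r = {r_changes:.3f}")
```

If the potato and card series behaved like this toy pair, the correlation in levels would say more about both declining over the same decade than about harvests driving card production.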

6 comments:

  1. Don't forget that Calbee has plenty of potato products which don't come with cards. In Singapore my mother-in-law likes their Jagabees, which sadly have no cards. (Only once have I found Calbee's baseball product there.)

    Replies
    1. Thanks that is a good point to mention - they have a ton of potato products in Japan too (I've seen them in Singapore when I've visited there too). In a bad harvest year they might just cut production of non-baseball products and shift the potatoes over to the baseball card ones.

  2. Interesting idea but I don't know if there's enough data from Calbee to establish a relationship. I'm not convinced that YJA listings for Calbee cards are an accurate representation of the number of Calbee cards produced in those years. Calbee issued only two series in 2001 and 2002 (after four series in 1999 and three in 2000) which might account for the drop in listings but all that means is they issued fewer unique cards - maybe they issued more copies of each card so the total production of cards was the same as before. Another factor would be non-baseball sets - did Calbee use some of their "potato capital" on soccer cards instead of baseball cards?

    The other thing I'd mention is that I think any correlation between potato and card production is probably a year apart - just based on what happened in 2016 and 2017. The Hokkaido potato crop in 2016 was disrupted by a series of typhoons which resulted in Calbee cutting back on a number of products in 2017 - including only issuing two Series of baseball cards instead of the usual three.

    Replies
    1. Yes 16 years of data isn't really enough to draw reliable conclusions from (I had to heavily hedge pretty much everything I said in the post for that and other reasons!).

      It's a good point that YJA listings aren't a very good indicator of card production. The classic response to that criticism is that they do have the advantage of being the only data that exists (on that point), so I had to make do.

      The switch from baseball to non-baseball I did indirectly mention (with respect to 2002 when Japan co-hosted the World Cup, leading to way less interest in baseball that year).

      That is a good point about it being more accurate to pair the year after rather than the same year. There are other factors too (like whether they can substitute imported potatoes in years when the domestic harvest is weak, etc). In all honesty I just don't know enough about potatoes to speak intelligently about that end of it.

      As I said at the end, I just don't have enough data to draw any solid conclusions about this question.

  3. Not gonna lie, you completely lost me, but it does look like a lot of work went into this, so kudos to you for that, as a lot of folks don't bring this kind of research to their posts.

    Replies
    1. Thanks. To be honest this only took about 10 minutes working with an Excel sheet, and anyone familiar with data analysis will tear what I wrote apart because I really didn't put the work in to try to find a (statistically) serious answer to the question!
