Dear Reader (singular, as my readership is most likely around one person), I have decided to make a leap into sports betting. In particular, I am going to bet on every single game of the group stage in the European Championship (Euro 2016), which is starting on Friday.
But, my loyal reader, if you know me – which I do suspect you do, you are probably either my mum or more likely my partner – you will know one thing. You will know that I know nothing – about football. Sure I know that the game is just over 90 minutes, there is one ball, a guy with a whistle and a set of coloured cards. There are two goals and each team starts with 11 players on the field.*
Now, since I want to bet on a game I know nothing about, and since I like my money more in my pocket than in the bookmakers’, I have a problem. I have no idea what position in the games I should take. Should I always bet on the team which comes earlier in the alphabet? Or should I bet on the team that I think has the nicer shirt? I don’t know what you think of these betting strategies, but to me they sound like a personal financial catastrophe.
It did not take me long to realise that there were some superior methods to the colour-of-the-shirt or the alphabetic-order positions. And given that I did not want to learn too much about a game that I only like when Iceland or FC Köln is playing, I decided I could do one of two things:
- I could read expert opinions or tune in for radio and TV discussions about the teams and how they compare. Using such qualitative information I could then make up my mind about every single game and bet on the outcome I think is most likely; or
- I could collect some data, design and populate a mathematical model in order to generate a set of probabilities that could be used in combination with a concept from economics to try maximising the return from my investment.**
So, I designed a mathematical model and equipped it with data
To be perfectly honest, the egg actually predates the chicken. I never really considered investing too much time into understanding the game. I knew that there was this index of relative qualities for football teams out there, which could be a key ingredient for a model, and so I was always going to try and build a model. The index is better known by its catchy title “Fifa/Coca-Cola (Men’s) World Ranking [index]”.
So, using only this index (over three years) in combination with the results of the games in the past two international tournaments I am going to try and beat the bookie – or at least not lose too much money to her.
Figure 1: Fifa/Coca-Cola index (Euro 2016 teams only)***
Now, my dear reader, you might be thinking to yourself “but the Index methodology is a bit of a joke”. I must admit, I do sort of agree with you. The index is not a particularly sophisticated thing in terms of methodology, nor is it particularly clever. But what do you expect from the combo of a sugary soft drink manufacturer and a group of old men mostly interested in enriching themselves? For me, unfortunately, this is the best quantitative information about the teams’ qualities that I could find (considering my time constraints).
In any case, since this index is actually based on games of the past, it will contain 10^6 times more information about the relative quality of the teams than my football memory. So, in conclusion, I am better off with the index than with nothing.
Also, the index is not on its own. I spice up my model with additional information about the “expected exhaustion level” of each team in each game.
What does the Eikonomics Euro 2016 model do and how does it work?
Well, this is where it all gets a bit complicated. And dear reader, if you cannot be bothered with reading loads of geeky stuff about my modelling and the trickiness of predicting football games, then I recommend you just go straight down to the section “getting to the point”. (Or, if you are eager to take your brain for a little jog first, jump to “My financial affairs post Euro 2016?”)
Now, if you are still reading, I would like to ensure that you will finish reading my post and, therefore, I will keep the methodology description of my overly-complicated-probably-not-worth-the-effort approach short. In a few bullet points, this is what I did:
- Firstly, I converted the scores of the index (in three different years) so that all teams received a score relative to the highest ranked team on the global index. The highest ranked team, Argentina, received a score of 100%. A team with half the score of Argentina received a score of 50% and a team with a score of 0 received 0%.
- Secondly, I collected data on the game outcomes from two international tournaments (World Cup 2014 and Copa América 2015) and calculated the goal difference of every single game in those tournaments;
- Thirdly, I wrote a program that gave me an estimate of the most likely goal difference of a match between any two teams based only on the Fifa/Coca-Cola index.^
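The first step above, rescaling the index, can be sketched in a few lines of Python. This is only an illustration of the idea and not my actual program: the function name and the index points below are made up, and the sketch assumes the top-ranked team on the global index is included in the input.

```python
def relative_scores(index_points):
    """Rescale raw index points so the top-ranked team scores 1.0 (i.e. 100%)."""
    top = max(index_points.values())
    return {team: points / top for team, points in index_points.items()}


# Made-up index points for three imaginary teams, purely for illustration.
example_points = {"Top": 1500, "Middle": 750, "Bottom": 0}
example_scores = relative_scores(example_points)
print(example_scores)
```

A team with half the points of the leader ends up at 0.5 (50%) and a team with no points at 0.0 (0%), as described in the first bullet.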
After all this effort, I came to the conclusion that the best (or perhaps the least worst, or the least boring) model I could come up with was the following predictive model:
- If the difference in the scores of the two teams is less than 25%, the model will predict a tie.
- If the difference in the scores of the two teams is 25% or more, the model will predict that the better team will win by one or more goals.
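As a rough sketch, the two rules above boil down to a single comparison. The function name is hypothetical, the scores are the relative index scores in whole percentage points (the example values are the ones quoted in footnote ****), and the exhaustion factor described further down is deliberately left out.

```python
def predict_result(team_a, score_a, team_b, score_b, threshold=25):
    """Return the predicted winner's name, or "tie".

    Scores are relative index scores in whole percentage points (0-100).
    A gap below the threshold is a tie; otherwise the higher score wins.
    """
    if abs(score_a - score_b) < threshold:
        return "tie"
    return team_a if score_a > score_b else team_b


# Relative scores as quoted in footnote ****.
print(predict_result("Italy", 63, "Switzerland", 64))
print(predict_result("Belgium", 89, "Albania", 41))
```

Using whole percentage points keeps the "exactly 25%" boundary free of floating-point surprises.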
But, there is more. My model also takes into account one other factor and it is fairly complicated to explain how this factor enters the model numerically. The numeric interpretation is also not very interesting, so instead I will explain the idea and how it impacts the final prediction:
- My idea was that for harder games – games played against a team with a high Fifa/Coca-Cola score – I would expect the worse team to play harder and put more effort in than the better team. This should lead to an increased risk of injury for the worse and harder working team, exhaust more of their energy and make them less fit in the next game. For the better team, on the other hand, the opposite should be true (easy game, use substitutes and rest your good players).
- Therefore, my model also takes into account the sequence of games. As an example of how this works, think of a game between two equally good teams. If one of those teams had played a few really hard games before whereas the other team would have had an easy ride, then the model would slightly favour the team with the easy ride (if this is unclear then I recommend the footnote: ****).
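For the curious, here is a minimal sketch of how the ranking gap and the "hard games so far" term could be combined into one predicted goal difference. The coefficients 2.6 and 0.3 are the ones used in the worked example in footnote ****; the function and argument names are my own invention for this sketch, the intercept is left out (as in the footnote), and each team's score is assumed to stay constant over the tournament.

```python
def predicted_goal_difference(score_i, score_j, past_opponents_i, past_opponents_j,
                              rank_coef=2.6, fatigue_coef=0.3):
    """Predicted goals of team i minus goals of team j.

    Scores are relative index scores as fractions (0.64 means 64%).
    past_opponents_* hold the scores of the teams each side has already played.
    """
    # Cumulative ranking gap in earlier games: negative means a hard schedule.
    history_i = sum(score_i - opponent for opponent in past_opponents_i)
    history_j = sum(score_j - opponent for opponent in past_opponents_j)
    return rank_coef * (score_i - score_j) + fatigue_coef * (history_i - history_j)


# Footnote **** example: Switzerland (64%) has just played Belgium (89%),
# Italy (63%) has just played Albania (41%).
gd = predicted_goal_difference(0.64, 0.63, [0.89], [0.41])
print(round(gd, 3))  # -0.115, i.e. roughly the -0.12 quoted in the footnote
```

A negative number favours the second team, so here the team with the easier earlier game (Italy) gets a slight edge.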
So, in summary, my model predicts the outcome of each game at the group stage based on the two teams’ relative rankings and how difficult the tournament has been for them historically (should have perhaps just said that. Sorry!).
Getting to the point
Hello reader, welcome back (assuming you are sensible and skipped all the modelling stuff above)! This is the section that you might be interested in: my actual predictions and how much money my model tells me I am going to win from this venture.
My model’s final predictions of the group stage
My model’s predictions of individual game outcomes can probably be best described by the 1997 Simpsons parody: “…fast-kickin’, low scorin’. And ties?” Yes. It is full of ties and wins with a one-goal margin. While that might sound unrealistic to some, I am not giving up on my model. After all, it is designed to tell me the probability of a win, a loss and a tie (and make me money), not to tell me accurately the final position.
Fine, you may say, but who goes through to the knock-out stage in your predictions? For your entertainment, I include my predictions of the final group rankings below. As you will see, the model is at its wit’s end with this task thanks to all the ties and similar goal differences, so a die has to help out in some cases to predict the runner-up (see *).
*Table footnote: For positions marked with a star, the model is undecided about which team to place where, so the final order of the star-marked teams is decided by random chance.
Finally, just to make it absolutely clear, the predictions only represent my model’s view and not mine. After all, I am certain that my fellow Icelanders will prove it wrong and push all other teams in their group down one place and place themselves where my model sees Portugal.
My financial affairs post Euro 2016?
Now, in order to get here I had to rely on the concept of expected value. I will not go into much detail, but if you are interested you can read footnote: ^^. In any case, this concept dictates what bets I think are “cheap” bets and worth buying. To give you an example of how this works, below is an explanation of how I came to place both a “Romania wins” and an “it’s a draw” bet on the first game between France and Romania. First, below are the probabilities from my model:
- 43% chance of a draw
- 27% chance that France will win
- 30% chance that Romania will win
But, the bets (on Betfair as of the date when I purchased them) rewarded as follows (for a one pound bet):
- Betting on France winning will pay me a total of £1.29 (profit of £0.29)
- Betting on a draw will pay a total of £5.00 (profit of £4.00)
- Betting on Romania will pay a total of £13.00 (profit of £12.00)
Since the expected value is the product of the probability and the payout less the cost of the bet, I find that my expected value from a draw ([43% x £5.00] -£1.00) is greater than zero (£1.15) and so is my expected value from Romania winning (£2.90), but my expected value from the bet on France winning is negative (-£0.65). Therefore, I would buy both the Romania to win and the draw as they have a positive expected value, i.e. I expect to make money.^^ Using this method, I have calculated that I expect to be around £20.00 richer at the end of the tournament by placing all my bets according to my model’s predictions.
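The arithmetic for the France v Romania game can be checked with a short Python sketch. The probabilities and payouts are the ones listed above; the function and variable names are just made up for this illustration.

```python
def expected_value(probability, total_payout, stake=1.0):
    """Expected profit of a bet: probability times total payout, less the stake."""
    return probability * total_payout - stake


# Model probabilities and total payouts for a one pound bet (France v Romania).
bets = {
    "France": (0.27, 1.29),
    "Draw": (0.43, 5.00),
    "Romania": (0.30, 13.00),
}

for outcome, (probability, payout) in bets.items():
    ev = expected_value(probability, payout)
    decision = "buy" if ev > 0 else "skip"
    print(f"{outcome}: expected value {ev:+.2f} pounds -> {decision}")
```

Only the draw and the Romania win come out with a positive expected value, which is why those are the two bets I bought.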
Figure 2: At the end of the tournament, I expect to be £20.00 richer
Now I need to warn you a tiny bit. Since the expected value is a theoretical concept and not the actual sum of the net returns of the bets that I make, there is a great degree of uncertainty about how, and indeed if, I will end up with £20.00 in profits. Also, even if we set aside the fact that my probability estimates are most likely wrong and assume that they are all correct, a simple simulation gives a range of different outcomes above and below £20.00 (see chart below).
Figure 3: A simple simulation of what my final profit could be
Just to clarify what the chart above shows. If I lived in 20 identical parallel universes all playing the same tournament this summer and I was to place my bets 20 times, how much I could expect to take home in profit at the end varies from universe to universe (see bars in chart above).
And as you can tell, in one scenario I am walking to the bank with £45, but in another I am only taking home around £5.00. But, on average across all 20 universes I am going home with an amount close to my expected value, £20.00.
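To give a flavour of how such a simulation works, here is a minimal Python sketch. It is not the simulation behind the chart: it only covers the two bets on the France v Romania game from the example above (the full list of bets is in the annex), it assumes my model's probabilities are correct, and the names and the random seed are arbitrary choices for this illustration.

```python
import random

# Model probabilities for France v Romania and the payouts of the two bets placed.
PROBABILITIES = {"France": 0.27, "Draw": 0.43, "Romania": 0.30}
PAYOUTS = {"Draw": 5.00, "Romania": 13.00}  # total payout per one pound bet
STAKE = 1.00


def simulate_profits(n_universes=20, seed=2016):
    """Play the game once in each parallel universe and return the profits."""
    rng = random.Random(seed)
    outcomes = list(PROBABILITIES)
    weights = [PROBABILITIES[outcome] for outcome in outcomes]
    total_staked = STAKE * len(PAYOUTS)
    profits = []
    for _ in range(n_universes):
        result = rng.choices(outcomes, weights=weights, k=1)[0]
        winnings = PAYOUTS.get(result, 0.0)  # nothing if France wins
        profits.append(winnings - total_staked)
    return profits


profits = simulate_profits()
print(profits)
print(sum(profits) / len(profits))  # average profit across the universes
```

Each universe ends with a loss of £2, a profit of £3 or a profit of £11, and with only 20 universes the average will bounce around the combined expected value of the two bets (£1.15 + £2.90 = £4.05) rather than hit it exactly.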
Ok, I (expect to) be rich. But what is next for the reader?
Yes, loyal reader, you are in luck. Over the tournament, I am going to share with you how my model is doing, how my funds are doing (see annex for the bets I have made) and what the model thinks is going to happen in the next games.
I hope you have enjoyed this (somewhat), and see you on Friday.
Below is the full data on the probabilities produced by my model, the pay-out from a one pound bet at Betfair and the bet I made on the Betfair website. In total I made 49 bets on 39 games at the cost of £49 (£1.00 per bet).
* The number of players on the field at kick-off I actually had to Google just to make absolutely sure that they were not 12.
** It would probably be fairer to claim cost minimisation in my case, but profit maximisation does indeed require cost minimisation at fixed outputs.
*** Note that I extracted the data a few days ago and the index has changed slightly since, but not so materially that it should impact my predictions.
**** If Italy and Switzerland (almost equally ranked, 63% and 64%) are playing, and Italy had just competed against Albania (lowest ranked team, 41%) a few days before and Switzerland had just finished playing Belgium (highest ranked, 89%), then the predicted goal difference would be: GD = (64% – 63%) * 2.6 + [ (64% – 89%) – (63% – 41%) ] * 0.3 = -0.12. In other words, the model would predict that Italy would win by 0.12 goals (or, in effect, draw).
^ The regression equation estimated could roughly be described as the following:
● The goal difference between teams i and j in today’s game (t) is a function of the ranking difference of the two teams and the difference in the two teams’ cumulative ranking differences in previous games. Yes, that is a mouthful, and roughly this could be written as the following equation (warning: high risk of some notational screw-ups as I am constrained by life and time).
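● $(G_{i,t} - G_{j,t}) = \alpha_0 + \alpha_1 (F_{i,t} - F_{j,t}) + \alpha_2 \sum_{s=1}^{t-1} (F_{ia,s} - F_{jk,s}) + \varepsilon_{ij}$
where $F_{ia,s}$ is the ranking difference between team $i$ and its opponent $a$ in an earlier game $s$ (likewise $F_{jk,s}$ for team $j$ and its opponent $k$), with $i \neq k$ and $j \neq a$.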
NOTE: If you are now going all “Cointegration, Endogeneity, Multicollinearity”, then mind you, I ran lots of regressions (including an IV one), and this turned out to be the most fun and exciting specification for the purpose.
^^ As an example: suppose I were to toss a coin (50/50 chance of either side coming up) 100 times and offered you 60c every time heads came up, while every time tails came up you would have to pay me 40c. Then your expected value from the deal would be EV = 100 * [50% * 60c – 50% * 40c] = €30 – €20 = €10. For me, on the other hand, the expected value of the deal would be a negative €10.