<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-6580893925904612963</atom:id><lastBuildDate>Tue, 13 Oct 2009 20:01:41 +0000</lastBuildDate><title>Brisink</title><description></description><link>http://brisink.blogspot.com/</link><managingEditor>noreply@blogger.com (Jerome)</managingEditor><generator>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6580893925904612963.post-5951174401046239975</guid><pubDate>Tue, 26 May 2009 18:14:00 +0000</pubDate><atom:updated>2009-06-11T02:26:16.356-07:00</atom:updated><title>Soccer Analytics, very special games</title><description>In an &lt;a href="http://brisink.blogspot.com/2009/05/blog-post.html"&gt;earlier blogpost&lt;/a&gt;, I spent some time constructing a model to find out if &lt;a href="http://brisink.blogspot.com/2009/05/blog-post.html"&gt;a soccer game has been decided yet&lt;/a&gt; based on some input variables like the elapsed time and goal difference. An interesting comment I got was about the Model Validity area. Some games are indeed not very common and the model might not be valid for those games. For example, one of the teams could score 2 goals in the first 15 minutes. We might consider those games outliers.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This blogpost will shed some light on those outliers.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div align="left"&gt;First, we used the following very conservative rule to decide whether there is enough data to build a valid model: ((N Rows &gt; 50) or (N Rows &gt; 10 &amp;amp; probability == 1)). This rule is applied to each combination of "Goal Difference" and "Time". 
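As a rough illustration, the rule could be written as follows. This is a minimal Python sketch, not the JMP setup used for this post: the function names, the tuple layout of the observations, and the reading of "probability" as the observed share of decided games in a cell are all assumptions.

```python
from collections import defaultdict

# Thresholds taken from the rule stated in the post.
MIN_ROWS = 50
MIN_ROWS_IF_CERTAIN = 10


def has_enough_data(n_rows, probability):
    """Apply the rule to one (Goal Difference, Time) cell."""
    return n_rows > MIN_ROWS or (n_rows > MIN_ROWS_IF_CERTAIN and probability == 1)


def enough_data_by_cell(observations):
    """Apply the rule to every cell.

    observations: iterable of (goal_difference, time, decided) tuples,
    where decided is 1 if the outcome did not change afterwards, else 0
    (hypothetical layout, assumed for this sketch).
    Returns a dict mapping (goal_difference, time) to True or False.
    """
    counts = defaultdict(int)
    decided_counts = defaultdict(int)
    for goal_difference, time, decided in observations:
        counts[(goal_difference, time)] += 1
        decided_counts[(goal_difference, time)] += decided
    # "probability" is assumed to be the observed share of decided games.
    return {
        cell: has_enough_data(n, decided_counts[cell] / n)
        for cell, n in counts.items()
    }
```

With this sketch, a cell holding 11 games that were all decided passes the rule, while a cell holding 20 games with mixed outcomes does not.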
&lt;/div&gt;&lt;div style="TEXT-ALIGN: center" align="center"&gt;&lt;img id="BLOGGER_PHOTO_ID_5340590833936551906" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 400px; HEIGHT: 272px; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_xl9DnCB9Cco/Sh2XEVt-z-I/AAAAAAAAAE4/LO5b4yrsvOY/s400/Outlier2+copie.png" border="0" /&gt;&lt;span style="font-size:85%;"&gt;Graph1a: Is there enough data for each combination of "Goal Difference" and "Time"? &lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;a href="http://4.bp.blogspot.com/_xl9DnCB9Cco/Sh2TG8utFSI/AAAAAAAAAEw/qrQIuMkGOz0/s1600-h/Outlier+copie.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5340586480721794338" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 400px; HEIGHT: 282px; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_xl9DnCB9Cco/Sh2TG8utFSI/AAAAAAAAAEw/qrQIuMkGOz0/s400/Outlier+copie.png" border="0" /&gt; &lt;/a&gt;&lt;p align="center"&gt;&lt;span style="font-size:85%;"&gt;Graph1b: Is there enough data for each combination of "Goal Difference" and "Time"? &lt;/span&gt;&lt;/p&gt;&lt;p align="left"&gt;Graph1a and Graph1b show whether we have enough data for each existing combination of "Goal Difference" and "Time". Unsurprisingly, we have little data on games with a high "Goal Difference", especially when the game has just started.&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;div align="left"&gt;From this rule, we can define the range of input variables for which the model is valid. &lt;/div&gt;&lt;div align="center"&gt;&lt;img id="BLOGGER_PHOTO_ID_5340595525690437234" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 400px; HEIGHT: 273px; TEXT-ALIGN: center" alt="" src="http://2.bp.blogspot.com/_xl9DnCB9Cco/Sh2bVb3RrnI/AAAAAAAAAFI/cEaAXTjrXPQ/s400/Frontier+copie.png" border="0" /&gt;&lt;span style="font-size:85%;"&gt;Graph 2: Model Validity area&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;Graph 2 shows how we can create the Model Validity area. 
For each value of Goal Difference, the blue point marks the minimum time at which we have enough data. The red lines delimit the Model Validity area: the area under the red lines represents the range of input variables for which the model is valid. The Model Validity area is delimited by linear constraints so that these constraints can be reused in the profiler in SAS JMP.&lt;br /&gt;&lt;br /&gt;This discussion is largely theoretical: we won't see many of those games in practice, so the model will be valid most of the time.&lt;br /&gt;Also, one could argue that in games outside the model validity frontier, one of the teams is clearly being dominated, so the probability that the game has already been decided is extremely high. For example, if one team scores 3 goals in the first 10 minutes, we can be pretty sure that this team is dominating the game and that the game has already been decided.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6580893925904612963-5951174401046239975?l=brisink.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://brisink.blogspot.com/2009/05/soccer-analytics-very-special-games.html</link><author>noreply@blogger.com (Jerome)</author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xl9DnCB9Cco/Sh2XEVt-z-I/AAAAAAAAAE4/LO5b4yrsvOY/s72-c/Outlier2+copie.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-6580893925904612963.post-5053619379418687550</guid><pubDate>Mon, 11 May 2009 15:58:00 +0000</pubDate><atom:updated>2009-05-26T05:58:23.712-07:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Analytics</category><category domain='http://www.blogger.com/atom/ns#'>Football</category><category 
domain='http://www.blogger.com/atom/ns#'>Soccer</category><category domain='http://www.blogger.com/atom/ns#'>English</category><title>Soccer Analytics</title><description>&lt;div style="TEXT-ALIGN: justify" align="justify"&gt;&lt;span style="font-size:130%;"&gt;The Question: "Has the game been decided yet?" (HTGBD)&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size:130%;"&gt;&lt;div style="TEXT-ALIGN: justify" align="left"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="TEXT-ALIGN: justify" align="left"&gt;This is the question that most people constantly ask themselves when they are watching a football game.&lt;br /&gt;This question can take different forms depending on the circumstances.&lt;br /&gt;If you're lucky enough to support the winning team, you might ask yourself:&lt;br /&gt;"How secure is the lead?"&lt;br /&gt;And for the less fortunate among us:&lt;br /&gt;"Is there still a chance for my team to win?" &lt;/div&gt;&lt;div style="TEXT-ALIGN: justify" align="justify"&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;The Answer: Analytics&lt;/span&gt;&lt;/div&gt;&lt;div style="TEXT-ALIGN: justify" align="justify"&gt;&lt;/div&gt;&lt;iframe style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0px; BORDER-TOP: medium none; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 12px auto; BORDER-LEFT: medium none; PADDING-TOP: 0px; BORDER-BOTTOM: medium none" border="0" src="http://blogs.sas.com/jmp/uploads/swf/soccer.htm" width="620" height="370"&gt;&lt;br /&gt;&lt;br /&gt;  &lt;p&gt;Your browser does not support iframes.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Graph1: Probability that the game has been decided as a function of the elapsed time and the goal difference.&lt;/span&gt;&lt;br /&gt;&lt;div style="TEXT-ALIGN: left"&gt;&lt;/div&gt;&lt;div style="TEXT-ALIGN: left"&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="TEXT-ALIGN: left"&gt;Graph1 shows the probability that the game has been decided as a function of 
the elapsed time and the goal difference. You can change the elapsed time and the goal difference on the graph by clicking on a different value.&lt;br /&gt;&lt;/div&gt;&lt;div style="TEXT-ALIGN: left"&gt;Some interpretation examples:&lt;/div&gt;&lt;ol&gt;&lt;li&gt;&lt;div style="TEXT-ALIGN: left"&gt;&lt;u&gt;If Time=45 and Goal Difference=0&lt;/u&gt; The game has been going on for 45 minutes and the goal difference is 0. There is a 23% probability that the outcome of the game won't change. Since the teams are level (0 goal difference), this means there is a 23% probability that the game will end in a tie.&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div style="TEXT-ALIGN: left"&gt;&lt;u&gt;If Time=45 and Goal Difference=1&lt;/u&gt; The game has been going on for 45 minutes and one of the teams is leading by 1 goal. There is a 60% probability that the outcome of the game won't change, which means that the leading team has a 60% probability of winning. &lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p style="TEXT-ALIGN: left"&gt;&lt;span style="font-size:130%;"&gt;More Details about the Answer&lt;/span&gt;&lt;/span&gt; &lt;/p&gt;&lt;p&gt;The model used above has been built using data from the UK Premier League from 2002 to 2006. The type of model used is a regression model.&lt;/p&gt;The following representations are useful for understanding the underlying data.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p align="center"&gt;&lt;a href="http://3.bp.blogspot.com/_xl9DnCB9Cco/SglG1ar6yBI/AAAAAAAAAEA/dP8c34zDZS4/s1600-h/Soccer2.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5334873117107603474" style="WIDTH: 400px; CURSOR: hand; HEIGHT: 321px" alt="" src="http://3.bp.blogspot.com/_xl9DnCB9Cco/SglG1ar6yBI/AAAAAAAAAEA/dP8c34zDZS4/s400/Soccer2.png" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p align="center"&gt;&lt;span style="font-size:85%;"&gt;Graph2: Has the game been decided Vs. 
Time&lt;/span&gt;&lt;/p&gt;&lt;p align="left"&gt;Graph2 shows the percentage of games that have been decided as a function of the elapsed time. I must say that I wasn't surprised by this graph, which basically shows that the share of decided games (HTGBD, Has The Game Been Decided) rises with the elapsed time.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p align="center"&gt;&lt;a href="http://4.bp.blogspot.com/_xl9DnCB9Cco/Sgl2rvwO8mI/AAAAAAAAAEQ/FaH6B1Z9R-0/s1600-h/Soccer1.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5334925727522288226" style="WIDTH: 400px; CURSOR: hand; HEIGHT: 291px" alt="" src="http://4.bp.blogspot.com/_xl9DnCB9Cco/Sgl2rvwO8mI/AAAAAAAAAEQ/FaH6B1Z9R-0/s400/Soccer1.png" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p align="center"&gt;&lt;span style="font-size:85%;"&gt;Graph3: Has the game been decided Vs. Time By Goal Difference&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;p align="left"&gt;Graph3 shows the percentage of games that have been decided as a function of the elapsed time, broken down by goal difference. According to this graph, the goal difference is an excellent predictor of HTGBD.&lt;/p&gt;&lt;p align="left"&gt;Additional reading: &lt;/p&gt;&lt;p align="left"&gt;Similar models are available for basketball. 
Check out &lt;a href="http://www.slate.com/id/2185975/"&gt;Bill James&lt;/a&gt; &amp;amp; &lt;a href="http://blogs.sas.com/jmp/index.php?/authors/16-Jeff-Perkinson"&gt;Jeff Perkinson&lt;/a&gt; if you want to learn more.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6580893925904612963-5053619379418687550?l=brisink.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://brisink.blogspot.com/2009/05/blog-post.html</link><author>noreply@blogger.com (Jerome)</author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xl9DnCB9Cco/SglG1ar6yBI/AAAAAAAAAEA/dP8c34zDZS4/s72-c/Soccer2.png' height='72' width='72'/><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item></channel></rss>