I recently discovered the show Numb3rs, which somehow ran 5 seasons on primetime TV without my noticing, and got caught up in the spirit of applying only-vaguely-related mathematical processes to an esoteric problem. My task was deciding which lottery numbers to play this week, so I tried a strategy I had learned for playing roulette, which poses a similar “guess which random number will appear” challenge. The strategy uses standard deviation which, as the PDF explained, involves keeping track of which numbers appear over a certain amount of time and betting on the numbers that have appeared too few times.
The real challenge here was getting a usable list of which numbers were drawn when. The Florida Lotto provides a handy-dandy HTML file with drawing results all the way back into the ’80s. However, the data is arranged awkwardly in a number of HTML tables, with three tables on each “page” and 49 drawings in each table.

My target data structure was a 2D array, with the first element being the most recent drawing and the last being the oldest drawing. Each element of the primary array should itself be an array with 7 elements: index 0 holds the drawing date, and indexes 1-6 hold the 6 numbers drawn on that date. However, the tables are arranged so that the next drawing in the file is 49 drawings away from the current one chronologically. To make my array, I’d need… AN ALGORITHM!
First off, I need to be able to extract only the information I need for each drawing and ignore the table and formatting data in the HTML file. I first considered using bulky string-search functions, but since my data is already consistently formatted, I decided on using two Regular Expressions, one for the date and one for the six numbers:
var datePattern:RegExp = new RegExp("[0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?", "g");
var numPattern:RegExp = new RegExp("\">[0-9][0-9]?<\/", "g");
My datePattern expression seeks out data with the format “nn/nn/nn”, where each n is a digit and each group may be one or two digits long. My numPattern looks for one or two digits sitting between the characters "> and </, that is, between the end of an opening tag (whose last attribute is quoted) and the start of a closing tag in the HTML file. Anchoring on those characters singles out drawing numbers as opposed to, say, font sizes. The “g” flag makes each RegExp object keep track of where in the HTML file it found its last match, so the next time exec() is called on that expression, it starts from where it left off instead of from the beginning of the file. So far, my function looks like so:
var Result:Object;
// e.target.data is the HTML file loaded in via URLLoader
Result = datePattern.exec(e.target.data);
date = Result[0];
Result = numPattern.exec(e.target.data);
// slice(2, -2) strips the leading "> and the trailing </ from the match
num1 = Number(Result[0].slice(2, -2));
Result = numPattern.exec(e.target.data);
num2 = Number(Result[0].slice(2, -2));
Result = numPattern.exec(e.target.data);
num3 = Number(Result[0].slice(2, -2));
Result = numPattern.exec(e.target.data);
num4 = Number(Result[0].slice(2, -2));
Result = numPattern.exec(e.target.data);
num5 = Number(Result[0].slice(2, -2));
Result = numPattern.exec(e.target.data);
num6 = Number(Result[0].slice(2, -2));
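The stateful exec() behaviour is easy to check on a tiny input. What follows is only a minimal sketch, written in TypeScript rather than the original ActionScript (both use the same ECMAScript-style RegExp with a “g” flag). The sampleHtml fragment, the Drawing type, and the extractDrawings name are made up for illustration; the real Florida Lotto markup is not reproduced here, and the loop simply repeats the date-then-six-numbers calls shown above.

```typescript
// One drawing: [date, n1, n2, n3, n4, n5, n6], as in the target data structure.
type Drawing = [string, ...number[]];

// Hypothetical HTML fragment holding two drawings (not the real Lotto markup).
const sampleHtml: string =
  '<tr><td class="d">07/04/09</td>' +
  '<td class="n">3</td><td class="n">11</td><td class="n">24</td>' +
  '<td class="n">30</td><td class="n">41</td><td class="n">52</td>' +
  '<td class="d">01/14/09</td>' +
  '<td class="n">5</td><td class="n">9</td><td class="n">17</td>' +
  '<td class="n">28</td><td class="n">33</td><td class="n">47</td></tr>';

function extractDrawings(html: string): Drawing[] {
  // Same patterns as above; the "g" flag makes exec() resume after the last match.
  const datePattern = /[0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?/g;
  const numPattern = /">[0-9][0-9]?<\//g;
  const drawings: Drawing[] = [];

  let dateMatch: RegExpExecArray | null;
  while ((dateMatch = datePattern.exec(html)) !== null) {
    const nums: number[] = [];
    for (let n = 0; n < 6; n++) {
      const numMatch = numPattern.exec(html);
      if (numMatch === null) {
        return drawings; // fewer than six numbers left: drop the incomplete drawing
      }
      // Strip the leading "> and the trailing </ to leave only the digits.
      nums.push(Number(numMatch[0].slice(2, -2)));
    }
    drawings.push([dateMatch[0], ...nums]);
  }
  return drawings; // in file order, not yet in chronological order
}

console.log(extractDrawings(sampleHtml));
```

Each pattern keeps its own position, so the date calls and the number calls advance through the file independently of one another.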
The first time I call the numPattern expression, it finds the very first lotto number in the HTML file, which happens to be the first number of the most recent drawing. Because the “g” flag is set, each subsequent call starts from the last number found and finds the next number. If I call it five more times, it returns the second, third, fourth, fifth, and sixth number of the first drawing. When I call it again, it then finds the first number of the second drawing, and so on. The problem now is putting the drawings in the right order, since the second drawing in the HTML file is actually the 50th drawing chronologically. After some experimentation and compensation, I arrived at the following algorithm:
position = (Math.floor(i/3)+((i%3)*49))+(Math.floor(i/147)*98);
dataArray[position] = new Array(date, num1, num2, num3, num4, num5, num6);
The first half of the position calculation handles intra-page positioning; that is, putting each of the 147 drawings on a given page in the right place. Math.floor(i/3) gives the row the drawing sits in, and modulo 3 determines which of the three columns on the page it comes from. The first column contains positions 0-48, the second column has 49-97, and the third 98-146, so each column adds a multiple of 49. The second half of the calculation came later, after I realized drawings from other pages were overwriting each other. To compensate, I divide the current index by 147 to figure out which “page” it is on and add 98 for each page: Math.floor(i/3) already advances by 49 per page, and the extra 98 brings that up to the full 147. Finally, with the correct chronological position calculated, I can add an array containing the extracted data to the primary array of drawings.
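To see the mapping at work, here is a small sketch of the same position formula, again in TypeScript rather than ActionScript. The constant and function names are invented for illustration, and it assumes (as the formula does) that every page is full: 3 columns of 49 drawings each.

```typescript
const COLUMNS_PER_PAGE = 3;
const DRAWINGS_PER_COLUMN = 49;
const DRAWINGS_PER_PAGE = COLUMNS_PER_PAGE * DRAWINGS_PER_COLUMN; // 147

// i is the order in which a drawing is found in the HTML file (0-based).
function chronologicalPosition(i: number): number {
  const row = Math.floor(i / COLUMNS_PER_PAGE);   // grows by 49 for every full page
  const column = i % COLUMNS_PER_PAGE;            // 0, 1 or 2
  const page = Math.floor(i / DRAWINGS_PER_PAGE);
  // row already contributes 49 per page, so 98 more per page makes 147.
  return row
    + column * DRAWINGS_PER_COLUMN
    + page * (DRAWINGS_PER_PAGE - DRAWINGS_PER_COLUMN);
}

// Checks that the first `count` file indexes land on `count` distinct
// positions in the range 0..count-1, i.e. nothing is overwritten or skipped.
function fillsEverySlot(count: number): boolean {
  const seen = new Set<number>();
  for (let i = 0; i < count; i++) {
    const position = chronologicalPosition(i);
    if (position < 0 || position >= count) {
      return false;
    }
    seen.add(position);
  }
  return seen.size === count;
}

console.log(chronologicalPosition(1));   // 49: the second drawing in the file
console.log(chronologicalPosition(147)); // 147: the first drawing on page two
console.log(fillsEverySlot(1764));       // true
```

Writing i as 147p + 3r + c (page p, row r, column c) turns the formula into 147p + 49c + r, which is why every slot is hit exactly once when the pages are full.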
Running this function in a loop in Flash took about 5 minutes to extract all 1,764 drawings available at the time I wrote the program (I had to extend the Script Timeout Limit past 15 seconds to allow this). Running the array through a tracing function let me output the data to an XML file, which I then used in a second program to calculate how long it’s been since each number was drawn. That function looks like this:
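// Index n holds the number of drawings since n last appeared (index 0 is unused).
var lottoNumbers:Array = new Array(54);
for (var n:int = 0; n < 54; n++) {
    lottoNumbers[n] = 0; // start at 0 so ++ never runs on undefined (which gives NaN)
}
// Walk from the oldest drawing (last index) to the most recent (index 0).
for (var i:int = xmlData.drawing.length() - 1; i > -1; i--) {
    for (var j:int = 53; j > 0; j--) {
        lottoNumbers[j]++; // every number ages by one drawing
    }
    for (var k:int = 5; k > -1; k--) {
        lottoNumbers[int(xmlData.drawing[i].num[k])] = 0; // drawn numbers reset to 0
    }
}
for (var index:String in lottoNumbers) {
    trace(index + " => " + lottoNumbers[index]);
}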
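The counting idea can be tried on a toy data set. This is a rough TypeScript sketch, not the original ActionScript: the function names are invented, the drawings are passed as plain arrays of numbers (most recent first) rather than read from XML, and the final step of picking the longest-absent numbers is an assumption about how the six picks were chosen from the traced output.

```typescript
// drawings[0] is the most recent drawing; each entry lists the numbers drawn.
function drawingsSinceLastSeen(drawings: number[][], highestNumber: number): number[] {
  const age: number[] = [];
  for (let n = 0; n <= highestNumber; n++) {
    age.push(0); // index 0 is unused, matching the 1-based lotto numbers
  }
  // Walk from the oldest drawing to the newest.
  for (let i = drawings.length - 1; i >= 0; i--) {
    for (let n = 1; n <= highestNumber; n++) {
      age[n]++; // every number ages by one drawing
    }
    for (const drawn of drawings[i]) {
      age[drawn] = 0; // numbers in this drawing were just seen
    }
  }
  return age; // a number that never appeared ends up equal to drawings.length
}

// Returns the `count` numbers with the largest age, ties broken by lower number.
function longestAbsent(age: number[], count: number): number[] {
  return age
    .map((value, n) => ({ n, value }))
    .slice(1) // drop the unused index 0
    .sort((a, b) => b.value - a.value || a.n - b.n)
    .slice(0, count)
    .map((entry) => entry.n);
}

// Toy example: numbers 1-6, two numbers per drawing, most recent first.
const toyDrawings: number[][] = [[1, 2], [3, 4], [1, 5]];
const toyAges = drawingsSinceLastSeen(toyDrawings, 6);
console.log(toyAges);                   // [0, 0, 0, 1, 1, 2, 3]
console.log(longestAbsent(toyAges, 2)); // [6, 5]
```

In the toy data, 6 never appears, so its count equals the total number of drawings, and 5 was last seen two drawings ago.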
Solving the problem of extracting, ordering, and running calculations on the lotto data was fun, but as it turns out, the six numbers that had gone the longest without being drawn (the six I picked) didn’t turn up at all in the drawing that week. Back to the drawing board, I suppose!