Microsoft Azure ML -- Big Data Modeling in Azure
- 08 July, 2014 11:02
Microsoft has jumped in with both feet with the release to Preview of a new Microsoft Azure-based tool that helps organizations do Machine Learning and predictive analysis all from a Web console.
I had an opportunity to work with Azure ML (codename "Passau") in an early adopter program, and we collaboratively built machine learning models WAY faster and easier than we've been able to do with traditional tools like SAS, SPSS, and R.
The tool allows for the import of data (or real-time HTTP access to live data), and then traditional statistical modeling, calculations, comparative analysis, forecasting, and the like. What made Azure ML a BETTER collaborative tool for us in our analysis work is the ability to share data, share models, and provide access to Web views of our data models (through permissions) WITHOUT ever having to move the project data between users.
Normally we would create a model and then either cut/paste the algorithms and send them to one another, or save the data models and send the ENTIRE data model to others. In many instances, we don't necessarily want to share or give up the data model information to others, and in other instances we would run into version control issues when data models were being edited and exchanged across a workgroup.
With Azure ML, the data, the models, the analytics, EVERYTHING remains up in Microsoft Azure with shared access to experiments and models. AND we were able to control access to the models: we set up Web views of the results of our models without EVER giving up access to the actual model analytics on the backend.
We were able to modify, revise, review, and change the models several times a day (and in some cases several times an hour), something that would have taken us days to do in the past.
So with this new tool, we set out to take what we were already doing the old-fashioned way and put it into a production mode for a very high visibility real world scenario.
Two months later, here's the real live scenario we're using Azure ML for:
Scenario: "Using Microsoft Azure ML to Predict the Outcome of the 2014 U.S. Congressional Elections"
Background: More campaign money will be spent on the United States Congressional Elections in 2014 than has EVER been spent in history, and the ability to assess and predict the outcomes of the elections is of SIGNIFICANT importance. Every single seat in the U.S. House of Representatives is up for election this Fall, and the Republican party wants to maintain its majority going into the 2016 Presidential elections. The Democratic party would very much want to retake the House and have a majority in both houses of Congress as it goes into the 2016 elections. If the Democrats control both houses of Congress, they have a good chance of seating a Democratic President in 2016. If the Republicans continue to control the House of Representatives and potentially gain seats in the House, they have a better chance of placing a Republican in the White House in 2016 and can undo a lot of what President Obama has put in place over the past 8 years.
Model: This model has gathered data from nationwide Congressional elections from 2008, 2010, and 2012 along with voter registration data from 2008-2014 and trends of each district's electorate in previous elections, and as each 2014 Primary Election is held, it rolls that real-time data into the assessment of candidates for Congress for the Fall 2014 elections. This model uses a Logistic Regression model and leverages predictive analysis of multiple data streams in determining the election outcome of each district.
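For readers who want to see what a logistic regression on this kind of data looks like outside of the Azure ML console, here is a minimal sketch in plain Python. This is NOT our actual experiment (that was assembled in the Azure ML Web console): the three feature names, the toy rows, and the function names are all made up for illustration, and the training loop is a basic gradient descent rather than whatever Azure ML runs internally.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on log loss."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of log loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability that 'Party A' wins the district described by x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# HYPOTHETICAL toy data, not real election results.
# Columns: [prior_margin_for_party_a, registration_edge_for_party_a,
#           party_a_is_incumbent]
TOY_ROWS = [
    [0.20, 0.10, 1.0],
    [0.15, 0.05, 1.0],
    [0.05, 0.08, 0.0],
    [-0.10, -0.05, 0.0],
    [-0.20, -0.12, 0.0],
    [-0.15, -0.10, 0.0],
]
TOY_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = Party A won, 0 = Party B won

if __name__ == "__main__":
    weights, bias = train_logistic(TOY_ROWS, TOY_LABELS)
    for row in TOY_ROWS:
        print(row, round(predict_proba(weights, bias, row), 3))
```

The output for each district is a probability between 0 and 1, which is what makes it possible to talk about districts that lean one way versus districts that are too close to call.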
Initial Report - California: We generated an initial report on California data that showed Districts 25 and 33 are Statistically Unpredictable. In District 33, for example, the Republican candidate received more votes in the June 2014 Primary; however, given that the Incumbent is a Democrat, the bias is toward an incumbent win. California District 33 can go either way, and thus special funding would be prudent for Campaign groups looking to sway the election in their direction.
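One simple way to think about a "Statistically Unpredictable" call is as a band around a 50/50 probability. The sketch below is only an illustration of that idea: the 0.05 band width, the function name, and the sample probabilities are assumptions made up for this example, not the thresholds or outputs of our model.

```python
def classify_district(prob_party_a, band=0.05):
    """Label a district from a predicted win probability for Party A.

    The band is a HYPOTHETICAL cutoff: any probability within `band`
    of 0.5 is treated as too close to call.
    """
    if not 0.0 <= prob_party_a <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if abs(prob_party_a - 0.5) <= band:
        return "Statistically Unpredictable"
    if prob_party_a > 0.5:
        return "Leans Party A"
    return "Leans Party B"

if __name__ == "__main__":
    # Made-up probabilities for made-up districts.
    sample = {"District X": 0.52, "District Y": 0.90, "District Z": 0.10}
    for name, prob in sample.items():
        print(name, prob, classify_district(prob))
```

Widening the band flags more districts as toss-ups, and narrowing it forces more calls, so the cutoff is a judgment call rather than a statistical constant.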
California District 25 has 2 Republicans vying for the Fall elections (California is one of only a handful of States where 2 members of the same party can run against each other in the Fall election, whereas in most states the Primary election selects 1 Republican and 1 Democrat to run against each other in the Fall). California District 25 has a unique challenge: the incumbent party has 2 candidates running against each other with VERY different political views and preferences (Tea Party Conservative vs Moderate Republican), so the House Leadership in Washington will need to determine whether a "win" by either of these individuals is preferred in the final outcome.
The model has so far successfully predicted the Primary elections for California, and as of the writing of this article, we have uploaded the data for the nationwide districts where Primary elections have already been held, which represent about 2/3 of the Congressional Seats.
And then ultimately the focus will be on the Fall General Elections and leveraging this information to do the best predictive job possible...
It has been a GREAT tool to work with in the early adopter program, and now that Azure ML is available to the general public as a Preview technology, we will leverage it for MANY of the other scenarios we work on, from healthcare, banking, and insurance to life sciences, retail, social media, and more.
Microsoft's website for Azure ML is at http://azure.microsoft.com/en-us/campaigns/machine-learning/