Power BI introduced a nice preview visual in its February 2019 release: the Key Influencers Visual, which uses Artificial Intelligence. Among other things, this visual can predict the buying behavior of customers. You only need Power BI Desktop, which is free, and a data set. Can we now say a definite goodbye to the Data Mining functionality in Microsoft SQL Server Analysis Services, given that it became “deprecated” in SQL Server 2017 and will not be available in follow-up versions? And will Azure Machine Learning Studio also become obsolete because of Power BI? Let’s compare the lot.
Key Influencers Visual in Power BI
The data set used in the examples below relates to a Targeted Mailing Campaign and is available in Microsoft's Basic Data Mining Tutorial: https://docs.microsoft.com/en-us/sql/tutorials/basic-data-mining-tutorial?view=sql-server-2014
To use the Key Influencers Visual, you must first enable it in the Preview Features section of the Power BI Desktop Options. Its icon will then be displayed in the Visuals pane. This step is needed from the February 2019 update of Power BI Desktop onwards, until the visual is incorporated as a standard visual.
The data set of the Targeted Mailing Campaign contains customer attributes and an indicator of whether a bike has been bought in the past (BikeBuyer). Put BikeBuyer in the “Analyze” field of the Key Influencers Visual and the customer attributes of your choice in “Explain by”.
By doing so, Power BI estimates the influence of the selected customer attributes on buying behavior. The result is shown directly in the visual. In this case, a “Bachelors” degree seems to have the most influence on purchasing a bike (“What influences BikeBuyer to be 1”).
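To get a feel for what "influence" means here, the idea can be sketched in a few lines of Python. This is not the algorithm Power BI uses internally. It is only a rough illustration that compares the buy rate of customers with a given attribute value against the buy rate of everyone else. The rows below are made-up toy data, and the helper names `buy_rate` and `influence` are hypothetical.

```python
# Hypothetical toy rows; the real tutorial data set has far more columns and records.
ROWS = [
    {"Education": "Bachelors", "BikeBuyer": 1},
    {"Education": "Bachelors", "BikeBuyer": 1},
    {"Education": "Bachelors", "BikeBuyer": 1},
    {"Education": "Bachelors", "BikeBuyer": 0},
    {"Education": "High School", "BikeBuyer": 1},
    {"Education": "High School", "BikeBuyer": 0},
    {"Education": "High School", "BikeBuyer": 0},
    {"Education": "High School", "BikeBuyer": 0},
]


def buy_rate(rows):
    """Fraction of rows in which BikeBuyer is 1 (0.0 for an empty list)."""
    return sum(r["BikeBuyer"] for r in rows) / len(rows) if rows else 0.0


def influence(rows, attribute):
    """For each value of `attribute`, return (rate inside, rate outside, ratio)."""
    results = {}
    for value in {r[attribute] for r in rows}:
        inside = [r for r in rows if r[attribute] == value]
        outside = [r for r in rows if r[attribute] != value]
        rate_in, rate_out = buy_rate(inside), buy_rate(outside)
        # Ratio > 1 means customers with this value buy more often than the rest.
        ratio = rate_in / rate_out if rate_out > 0 else float("inf")
        results[value] = (rate_in, rate_out, ratio)
    return results


if __name__ == "__main__":
    for value, (rate_in, rate_out, ratio) in influence(ROWS, "Education").items():
        print(f"{value}: {rate_in:.0%} vs {rate_out:.0%} elsewhere ({ratio:.2f}x)")
```

A ratio well above 1 for a value such as “Bachelors” is the kind of signal the visual surfaces as a key influencer, although the visual itself applies a proper statistical model rather than this simple comparison.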
To make a proper selection of targeted customers, it is important to evaluate more than one customer attribute. In the Key Influencers Visual you can view so-called segments, which are combinations of values across multiple attributes.
The percentage and population size are shown, and you can zoom in for more detail. So the visual shows not only the result, but also how Power BI arrived at it. You can also evaluate whether it is sensible to split a segment by adding other attributes.
SQL Server Analysis Services Data Mining
The Key Influencers Visual in Power BI does explain how it arrived at its results, but you can only tweak it in a very limited way. Data Mining in SQL Server Analysis Services is very different in that respect. In that tool, you can apply various Mining Models such as Decision Tree, Naive Bayes and Clustering. Many settings are available per model, and you can create subselections of data to branch models and results. Analysis Services Data Mining can also determine which model is most appropriate for the given case. For the Targeted Mailing Campaign data set, the so-called Lift Chart favors the Decision Tree model.
You can wander through the models and narrow or expand your view of a specific Decision Tree. Color intensity shows the varying influence, and the levels and histograms help you navigate through the Decision Tree.
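For readers unfamiliar with a Lift Chart, the underlying idea can be sketched as follows. This is a minimal illustration, not how Analysis Services computes its chart. Customers are ranked by predicted probability, and for each slice of the population we count the share of actual buyers captured so far. The scored pairs are made-up toy values and `lift_curve` is a hypothetical helper name.

```python
# Hypothetical (predicted probability, actual BikeBuyer) pairs from some model.
SCORED = [
    (0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
    (0.4, 0), (0.3, 1), (0.2, 0), (0.1, 0),
]


def lift_curve(scored, steps=4):
    """Return (population fraction, captured buyer fraction) points."""
    # Contact the most promising customers first.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_buyers = sum(actual for _, actual in ranked)
    if total_buyers == 0:
        raise ValueError("no buyers in the scored data")
    points = []
    for step in range(1, steps + 1):
        cutoff = len(ranked) * step // steps
        captured = sum(actual for _, actual in ranked[:cutoff])
        points.append((step / steps, captured / total_buyers))
    return points


if __name__ == "__main__":
    for population, captured in lift_curve(SCORED):
        print(f"top {population:.0%} of customers -> {captured:.0%} of buyers")
```

A model that guesses at random would capture buyers roughly in proportion to the population contacted, so the further a model's curve sits above that diagonal, the better it ranks the customers. Comparing such curves per model is what lets the chart favor one model over another.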
The results of each Data Mining run can be stored, for instance in a database, for further usage or reference. You can have models tested and trained to further enhance the accuracy. Speaking of accuracy, let’s compare the results of the Key Influencers Visual with those of the Data Mining run.
I have stored the results of a Data Mining Decision Tree run in a database and made a second query in Power BI to retrieve them. The non-buyer records are missing here, which is logical for use in a Targeted Marketing Campaign. Afterwards, the results are filtered on the prominent attributes found in the segments of the Key Influencers Visual: once for the best Buyer segment and once for the best Non-Buyer segment. No Data Mining records are shown any more when the “Non-Buyer filter” is applied, which makes sense. Conversely, with the “Buyer filter” applied, a reasonable number of records is still shown, with an average probability of 71%. This leads us to believe that the results of the Key Influencers Visual are not extremely different from those found in the Data Mining results.
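The comparison described above boils down to filtering the stored predictions on a segment's attribute values and averaging the predicted probability. A minimal sketch of that step follows, assuming the stored results are available as a list of records. The column names, the segment definitions and the probabilities are all hypothetical toy values, not the actual results from the run.

```python
# Hypothetical stored Decision Tree results (predicted buyers only).
PREDICTIONS = [
    {"Education": "Bachelors", "CarsOwned": 0, "Probability": 0.75},
    {"Education": "Bachelors", "CarsOwned": 0, "Probability": 0.5},
    {"Education": "Bachelors", "CarsOwned": 1, "Probability": 0.625},
    {"Education": "Graduate Degree", "CarsOwned": 0, "Probability": 0.875},
]


def average_probability(predictions, segment):
    """Return (average probability, record count) for rows matching the segment."""
    matching = [
        row for row in predictions
        if all(row.get(column) == value for column, value in segment.items())
    ]
    if not matching:
        return None, 0  # no stored prediction falls inside this segment
    total = sum(row["Probability"] for row in matching)
    return total / len(matching), len(matching)


if __name__ == "__main__":
    buyer_segment = {"Education": "Bachelors", "CarsOwned": 0}
    non_buyer_segment = {"Education": "Partial High School", "CarsOwned": 4}
    print(average_probability(PREDICTIONS, buyer_segment))
    print(average_probability(PREDICTIONS, non_buyer_segment))
```

An empty result for the non-buyer segment and a high average for the buyer segment is the pattern that suggests both tools point in the same direction.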
The Key Influencers Visual is not the only alternative that Microsoft has to offer. SQL Server 2017 added Machine Learning Services (Python & R), and this will also be incorporated step by step into Azure SQL Database. The standalone Microsoft R Server has been rebranded as Machine Learning Server due to the addition of Python capabilities. Next to the Key Influencers Visual, Power BI can also make use of Azure Cognitive Services and custom models made in Azure Machine Learning Studio. Azure Machine Learning on its own already took things to a new level because it introduced possibilities like image recognition and text sentiment analysis, next to common statistical operations. Azure Machine Learning Studio offers a decent interface, in which you can use components comparable to those found in Data Mining of Analysis Services.
The results of the Power BI Key Influencers Visual seem reasonable, so it performs nicely. But you can only tweak it in a limited way, which makes it unsuitable for thorough predictive work. An end user may feel comfortable using the Key Influencers Visual to gain a certain impression. However, “a dash of Artificial Intelligence” in an end-user tool does not replace applying statistical models. Even though it is fine to use the visual to gain a first impression and to help determine which attributes are sensible to use from a business standpoint, my advice is to always consult statisticians or Data Scientists to create proper predictions. Many tools are available to support that follow-up process.
Azure Cognitive Services contains a few models which can be used in Power BI, among other tools. Azure Machine Learning can do a lot more for you, but then Azure needs to fit into your cloud strategy, if you already have one. The results of Azure Machine Learning cannot easily be stored in a database, and for now a connection cannot be made to an Analysis Services cube. On the upside, neat integration into business processes appears to be achievable.
When opting for on-premises only, the alternative to Analysis Services Data Mining would be SQL Server Machine Learning Services or Machine Learning Server. These have many advantages for Python and R developers and bring ML to the data rather than data to ML. However, they lack a nice interface and are quite tech-heavy. My overall conclusion is that Data Mining as a concept is alive and kicking more than ever, also at Microsoft. Cloud-wise, the gap that Analysis Services Data Mining will leave is nearly filled, and for some functions even more than that. In an on-premises-only environment, however, some functionality will be dearly missed from SQL Server 2019 onwards. The question is how much this will hurt when everybody goes for cloud, in one way or another...