Right now, you are sitting on a virtual goldmine of transactional data that could transform your company into a leaner, quicker, and more profitable organisation. Extracting useful information from this data could help you adjust your marketing strategies or streamline your operations, making your company exponentially more competitive.
But if you're like many IT leaders, you are torn between the need to capture useful patterns in your operational data and the frustration of dealing with the statistical and mathematical intricacies required to yield any worthwhile results. This is especially true when analysing huge quantities of historical data, such as sales transactions or behaviour records of online customers. If this sounds familiar, I recommend PolyAnalyst 4.1, the latest data mining release from Megaputer Intelligence.
Although there are many comprehensive data mining solutions on the market, such as IBM Visual Warehouse and Oracle Express, PolyAnalyst focuses more effectively on data discovery than its competition and provides efficient algorithms. This self-contained data mining product proved easy to use, adaptable to multiple business contexts, and affordable, thereby earning a score of Very Good.
Probably one of the most impressive characteristics of PolyAnalyst is the sheer number of data mining tasks it can tackle. It provides 11 data exploration capabilities that should cover even the most demanding tasks, from the discovery of simple relations between two fields (how do discounts affect quantities ordered?) to the identification of more complex relationships (which of my products are most often purchased together?).
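To make the "purchased together" question concrete, here is a minimal sketch in Python of the simplest form of shopping-basket analysis: counting how often each pair of products appears in the same order. It is not PolyAnalyst's algorithm, which the product does not expose; the order data and the pair_counts helper are made up purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders: each set holds the products bought in one transaction.
orders = [
    {"printer", "paper", "toner"},
    {"paper", "toner"},
    {"printer", "cable"},
    {"paper", "toner", "cable"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort first so that each pair is always recorded in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

# The most frequently co-purchased pair and the number of orders it appears in.
top_pair, top_count = pair_counts(orders).most_common(1)[0]
print(top_pair, top_count)
```

A real analysis would also weigh each pair against how often the individual products sell on their own, but the raw counts already show where to look.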
PolyAnalyst works with Microsoft Windows 95/98/2000 and NT but unfortunately does not offer either Unix or Linux support. You can choose between a single-station and a client/server version of the product. Using PolyAnalyst is as simple as importing the data set that you want to analyse (typically a table from a relational database), and launching the data mining function you want to start with.
I installed the single-station version on a dual-processor machine with 256MB of RAM running Windows NT 4.0 without any problem and began exploring the product.
PolyAnalyst can access relational databases via ODBC, desktop database formats such as Microsoft Access and Excel spreadsheets, and text data. For IBM Visual Warehouse and Oracle Express users, the product offers direct connectivity to those OLAP (online analytical processing) data repositories, allowing convenient access to predefined cubes of data.
The product also offers a comprehensive set of data analysis engines that work from an easy-to-use point-and-click GUI. In the client/server version, the data scans are off-loaded to a server. The product creates HTML reports, extracts formulas from data patterns, and imports and joins data from multiple data structures such as spreadsheets, relational databases and text.
Extracting meaning from your data is easiest if you use a gradual approach, such as first taking an overall picture of your data set and then drilling down to specific details.
For example, if the objective of your data analysis is to optimise your inventory turnaround time, you can begin with a general picture of how goods flow in and out of your warehouses and identify items that are more expensive, overstocked, or have a poor rotation index. Moving from that view, you can then focus on those details that have significant financial relevance and conduct specific assessments on those only.
PolyAnalyst's Summary Statistics function is very helpful for an initial understanding of your data. It groups information according to variance of values for each field and calculates statistics, such as standard deviation, frequency, and mean. Conceptually, this is similar to drafting the layout of your home before initiating a remodeling project: it's a blueprint that will keep further investigation on target.
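For readers who want to see what such a summary boils down to, the following Python sketch computes the same kinds of figures: mean and standard deviation for numeric fields, value frequencies for text fields. It is only an illustration under assumed data. The inventory rows, field names, and the summarise helper are hypothetical and are not part of PolyAnalyst.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical inventory rows; field names and values are invented for this example.
inventory = [
    {"brand": "Compaq", "os": "Windows NT", "ram_mb": 128},
    {"brand": "Dell", "os": "Windows 98", "ram_mb": 64},
    {"brand": "Compaq", "os": "Windows NT", "ram_mb": 256},
    {"brand": "IBM", "os": "Windows 2000", "ram_mb": 128},
]

def summarise(rows):
    """Return per-field statistics: mean and std dev for numbers, frequencies otherwise."""
    summary = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Population standard deviation, since the rows are the whole data set.
            summary[field] = {"mean": mean(values), "std_dev": pstdev(values)}
        else:
            summary[field] = {"frequency": dict(Counter(values))}
    return summary

report = summarise(inventory)
for field, stats in report.items():
    print(field, stats)
```

Even on a toy table like this, the frequencies show at a glance which brands and operating systems dominate, which is the kind of overview the review describes.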
To test this functionality, I opened an Excel spreadsheet containing a computer inventory with a row for each machine and several columns containing data such as brand, model, operating system, and database software. A wizard guides you through the data import step by step and allows you to select the whole data set or a random sample. To reduce clutter you can remove unwanted columns or fields from your data.
Almost immediately I saw my computer inventory listed as a group of icons on the left pane of PolyAnalyst. The product had automatically detected the name of each column from the first row of the spreadsheet and set the correct data format. In addition, when importing spreadsheets, PolyAnalyst gave me the option to simultaneously fire up Excel to verify the file content. This can be a real timesaver when working with numerous files.
To start my overall data analysis, I chose Explore/Summary Statistics from the menu and received a report in HTML format showing a summary view of my inventory. As a bonus, the Summary Statistics report allowed me to create charts according to the value that I selected. For example, I could instantly create several pie charts grouping computers by manufacturer, operating system, or any other field.
The Summary Statistics report is very useful when analysing a new set of data, because it allows you to quickly explore the boundaries of your information. For example, you can summarise historic sales data by the combination of products ordered by each customer, the total amount of sales, or the dates of the purchase activities.
Although it offers useful overviews of data, the Summary Statistics function does not provide information that you can immediately apply to business. This is where the Find Rule engine steps in.
One of the objectives of data mining is to find a mathematical expression that can accurately describe facts and help you analyse data for use in current forecasts and initiatives. This, in the data mining language, is called "finding a rule".
PolyAnalyst shines in this arena, thanks to its unique Find Rule engine, which can express data mining results as a mathematical expression that you can use in your forecasts.
Unlike other data mining tools, PolyAnalyst does not require you to come up with possible formulas. Instead, it automatically formulates hypotheses based on data content and returns the proper equations and estimates of accuracy. Obviously, this will save data analysts time and reduce the risk of selecting an unsatisfactory expression.
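To illustrate what "a rule with an estimate of accuracy" can look like, here is a small Python sketch that fits a straight line to the discount question raised earlier and reports how well it fits. This is ordinary least squares on a single, hand-picked form of equation, which is exactly the manual step the review says PolyAnalyst spares you; it is not the Find Rule engine's method. The figures and the fit_line helper are invented for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x, returning (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: the share of the variation in y that the fitted line explains.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data: discount offered (percent) and quantity ordered.
discounts = [0, 5, 10, 15, 20]
quantities = [10, 14, 21, 24, 31]

slope, intercept, r_squared = fit_line(discounts, quantities)
print(f"quantity = {intercept:.2f} + {slope:.2f} * discount (R^2 = {r_squared:.3f})")
```

The resulting expression can be plugged straight into a forecast, and the R squared value plays the role of the accuracy estimate: the closer it is to 1, the more of the observed variation the rule accounts for.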
Besides its unfortunate Windows-centricity, the only other gripe I have with PolyAnalyst 4.1 is its inability to combine data from different databases into a single set of information for viewing. This can be inconvenient when analysing several groups of information at the same time.
Overall, however, PolyAnalyst's flexibility, ease of use, dynamite data discovery engines, and affordable price make it an appealing solution and more than earn it a score of Very Good. If you want to uncover hidden information that is just sitting in your archives waiting to make you money, I suggest you implement PolyAnalyst posthaste.
THE BOTTOM LINE
Business Case: The capability of this affordable and general-purpose data mining product to independently discover data relations can save significant time and cost. The product addresses multiple business considerations, such as linear regression, what-ifs, and shopping-basket analysis. Companies with limited data mining requirements can tailor the product to their specific needs without paying for unwanted functionality.

Technology Case: The product comes with a friendly and multifaceted GUI and embeds OLE DB for data mining and COM connectivity. The developer version offers the possibility to integrate the product functionality in custom applications. Unfortunately, PolyAnalyst only supports Windows platforms and has difficulty combining information from disparate sources.

Pros:
- Adaptable to multiple data analysis scenarios
- Competitively priced
- Modularity allows for selection of specific data engines
- Dynamite Find Rule engine expresses data as a business-critical mathematical expression

Cons:
- Runs on Windows platform only
- Limited data structure discovery

Platforms: Windows 95/98/2000 and NT.

Price: $US2300 to $14,900, depending on algorithms chosen; developer kit is $16,000 plus components. More information is available from the Web site: www.megaputer.com