Watch this short Richard Feynman video about Discovering the Rules of Chess. He used a chess analogy to explain what we are doing when we try to understand nature: “Imagine that the gods are playing some great game like chess. You don’t know the rules of the game, but you are allowed to look at the board at least from time to time from a little corner, perhaps. And from these observations, you try to figure out what the rules of the game are, what are the rules of pieces moving.”
After watching this video, I decided it would be interesting to revisit several past projects devoted to automatic rules discovery from positive and negative examples. I will describe a few projects in which I and/or some of my colleagues were involved.
My first experience with applying Machine Learning (ML) to Business Rules (BR) discovery dates back to 2006-2007, when I read the famous Weka book and briefly cooperated with Prof. Ryszard Michalski. I was impressed by the power of openly available ML algorithms: they were clearly described in the literature, and their implementations frequently took only one or two pages of code. I started thinking about the potential integration of ML algorithms with rule engines.
As a result, in 2007 we added the first version of “Rule Learner” to “OpenRules”, which was already a popular business rules management system. The same year, I cooperated with Ian Graham on rules discovery using ML technology, which we applied to the compression of large decision tables; we presented our results at the Business Rules Forum 2007.
In 2008, OpenRules won a US government contract called “Automating Business Rules Creation Using Machine Learning Models”, issued by the National Headquarters Office of Research, Internal Revenue Service (IRS). Over the next few years, OpenRules successfully applied “Rule Learner” to several similar contracts for different IRS divisions. To this day, this development remains one of the most successful applications of the integrated ML+BR approach. The official IRS report stated: “We wanted to automatically discover rules that were meaningful to our business analysts and at the same time could be run on Enterprise BRMS. We accomplished all of these objectives. The IRS domain experts reviewed the automatically learned rules and suggested them to be useful, comprehensible, and intuitive.” So, I will briefly describe what was done.
Discovering red-flag rules for catching fraudulent tax returns, IRS, 2008-2012
The IRS gave us a large volume of historical data: ~50K generalized US tax returns covering multi-year periods, each with 300+ attributes and already known positive or negative audit results. Naturally, we did not know which rules the IRS had actually applied to produce these results (that was a closely guarded secret); we had to figure out similar or better business rules ourselves. The objective of the project was to discover business rules capable of identifying suspicious tax returns by analyzing the provided historical data. The automatically discovered rules had to be understandable by IRS subject matter experts and executable by a selected rule engine.
We applied machine-learning algorithms to the rule discovery. We tried more than 20 different ML algorithms and selected two of them, RIPPER and C4.5, which produced the most compact, human-understandable, and modifiable business rules. Using the initial version of our “Rule Learner”, we incorporated the machine learning results into the OpenRules business rules management system by automatically transforming the discovered rules into executable OpenRules decision tables in Excel. IRS subject matter experts were able to understand, interpret, and even modify these business rules. As a side effect, this helped them identify gaps between the manually created rules and the rules learned automatically by the system, pointing to potential previously unnoticed non-compliance.
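To give a feel for this step, here is a minimal sketch (not the actual Rule Learner code) using the open-source Weka library, where JRip implements RIPPER and J48 implements C4.5; the file name and attribute layout below are hypothetical:

```java
import weka.classifiers.rules.JRip;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RuleDiscoverySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical training file with labeled historical tax returns;
        // the audit outcome is assumed to be the last attribute
        Instances data = new DataSource("tax_returns.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // RIPPER (JRip in Weka) learns an ordered list of IF-THEN rules
        JRip ripper = new JRip();
        ripper.buildClassifier(data);
        System.out.println(ripper);   // prints the learned rules in human-readable form

        // C4.5 (J48 in Weka) learns a decision tree that can be flattened into rules
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);
    }
}
```

In the real project, compact IF-THEN rules of this kind were then transformed by Rule Learner into executable decision tables automatically.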
The new tax return classification results were compared with results previously achieved by manual tax return audit selection. The comparison produced a very positive estimate: if the rules generated by our Rule Learner had been applied to the same data instead of the manually elicited rules, the overall savings could have been up to 3%, a potentially huge amount. The approach also demonstrated a generic yet fully automated way to integrate the learned rules into a Business Rules Management System.
Of course, being statistical, ML methods discover rules that can produce both accurate and inaccurate results. In our IRS projects, we did a lot to minimize the inaccuracy. We tuned the ML algorithms to avoid over-fitting and under-fitting. We added a special rules-based Trainer that allowed IRS subject matter experts to split training sets into different categories and filter out outliers. Still, the rules could make serious errors, either misidentifying or missing suspicious tax returns. In practice, however, the manually created rules usually made even more mistakes, as reflected in the historical data. The automatically generated rules were especially good at identifying tax returns with small or potentially negative audit results. Thus, it was a very beneficial business case for the ML+BR approach. More importantly, the implemented system proved that automatically discovered rules could improve the agency’s ability to keep up with a constantly changing environment.
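For a flavor of that tuning (again a Weka-based sketch with a hypothetical data file, not the project code), k-fold cross-validation gives an honest accuracy estimate and quickly exposes over-fitting:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RuleLearnerTuningSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical training file; the audit outcome is assumed to be the last attribute
        Instances data = new DataSource("tax_returns_training.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation: train on 9/10 of the data, test on the held-out 1/10, repeat
        JRip ripper = new JRip();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(ripper, data, 10, new Random(1));

        // A large gap between training accuracy and cross-validated accuracy signals over-fitting
        System.out.println(eval.toSummaryString("\nCross-validated results\n", false));
        System.out.println(eval.toClassDetailsString());
    }
}
```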
Self-Learning Decision Models, Sep-2020
I made this presentation at DecisionCAMP in September 2020. It described the latest version of the OpenRules “Rule Learner”, including its cloud-based graphical interface for discovering business rules from custom historical data sets. Without forcing business analysts to learn statistics or programming, it allows them to upload simple Excel tables with their training instances and a business glossary, select an ML method, and download the automatically generated business rules. The presentation focused on practical aspects of business rules generation for building ever-learning decision-making applications. The proper architecture is described here. Video, Slides
Below I will describe the related development experience of some of my colleagues.
Learning Executable Constraint Models from Positive and Negative Examples by Helmut Simonis, Dec-2021
Helmut’s presentation (watch the video) is devoted to how examples of positive and negative solutions to combinatorial problems can be used to learn the constraints that make up these problems. The learned constraints are either structural, coming from the perceived problem structure of the model, or based on input data that provides parameters and index sets. The learning process uses information about global constraints from the Global Constraint Catalog and defines a set of constraint patterns to discover conjunctions of similar constraints over common data structures.
Helmut starts with the discovery of Sudoku rules (constraints) from multiple good and bad solutions to various Sudoku puzzles. He then introduces the Constraint Acquisition Tool (CAT), which generates constraint models in MiniZinc that can be run by a variety of back-end solvers. The CAT also produces a human-readable description of the constraint model, which allows a user to understand the generated model. You may find more details in this video.
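To make the basic idea concrete, here is a toy illustration (not Helmut’s CAT, and hypothetical in every detail): candidate constraint patterns are kept only if they hold in every positive example and are violated by at least one negative example.

```java
import java.util.*;
import java.util.function.Predicate;

// Toy constraint acquisition on a 4x4 "mini-Sudoku" grid: keep the candidate
// constraints that hold in every positive example and reject some negative example.
public class ConstraintAcquisitionToy {
    public static void main(String[] args) {
        int[][] positive = {            // a valid grid (all rows and columns distinct)
            {1, 2, 3, 4},
            {3, 4, 1, 2},
            {2, 1, 4, 3},
            {4, 3, 2, 1}
        };
        int[][] negative = {            // an invalid grid (duplicate value in the first row)
            {1, 1, 3, 4},
            {3, 4, 1, 2},
            {2, 3, 4, 1},
            {4, 2, 1, 3}
        };

        // Candidate constraint patterns over the grid
        Map<String, Predicate<int[][]>> candidates = new LinkedHashMap<>();
        candidates.put("allDifferent(rows)", ConstraintAcquisitionToy::allRowsDifferent);
        candidates.put("allDifferent(columns)", g -> allRowsDifferent(transpose(g)));
        candidates.put("increasing(rows)", g -> Arrays.stream(g).allMatch(ConstraintAcquisitionToy::isIncreasing));
        candidates.put("values within 1..4", g -> Arrays.stream(g).flatMapToInt(Arrays::stream).allMatch(v -> v >= 1 && v <= 4));

        for (Map.Entry<String, Predicate<int[][]>> entry : candidates.entrySet()) {
            boolean holdsOnPositive = entry.getValue().test(positive);
            boolean rejectsNegative = !entry.getValue().test(negative);
            if (holdsOnPositive && rejectsNegative)
                System.out.println("learned: " + entry.getKey());
            else if (!holdsOnPositive)
                System.out.println("discarded (fails a positive example): " + entry.getKey());
            else
                System.out.println("undecided (never violated by the negatives): " + entry.getKey());
        }
    }

    static boolean allRowsDifferent(int[][] g) {
        for (int[] row : g)
            if (Arrays.stream(row).distinct().count() != row.length) return false;
        return true;
    }

    static boolean isIncreasing(int[] row) {
        for (int i = 1; i < row.length; i++) if (row[i] <= row[i - 1]) return false;
        return true;
    }

    static int[][] transpose(int[][] g) {
        int[][] t = new int[g[0].length][g.length];
        for (int i = 0; i < g.length; i++)
            for (int j = 0; j < g[i].length; j++) t[j][i] = g[i][j];
        return t;
    }
}
```

The “undecided” case shows why good negative examples matter: a constraint that is never violated cannot be distinguished from one that simply happens to hold by accident.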
Learning how to play the Nim Game by Bob Moore, Jan-2020
Dr. Bob Moore accepted our DMCommunity.org Challenge about the “Nim Game” and provided an interesting machine-learning solution. He didn’t want to use any off-the-shelf machine-learning tool, so he applied a self-learning approach: the program plays games against itself, with both players using the latest learned strategies, which are constantly improved. In this way, Bob’s approach is similar to AlphaGo, where the program becomes its own teacher. His solution is written in Python without any 3rd-party ML tools and leaves room for experiments and possible improvements. Link Video
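Bob’s own Python code is available through the links above. Just to illustrate the general self-play idea (a hypothetical Java sketch, not his solution): both players can share a single value table that is nudged toward each game’s outcome, so the strategy they play against each other keeps improving.

```java
import java.util.*;

// Minimal self-play sketch for normal-play Nim (the player who takes the last stone wins).
public class NimSelfPlaySketch {
    // V(state) = learned estimate of the win probability for the player to move
    static final Map<String, Double> V = new HashMap<>();
    static final Random RND = new Random(42);
    static final double EPSILON = 0.1;   // exploration rate
    static final double ALPHA = 0.2;     // learning rate

    public static void main(String[] args) {
        for (int game = 0; game < 200_000; game++) playOneGame(new int[]{3, 4, 5});
        // (1,2,3) is a theoretical loss for the player to move (pile XOR is 0), so V should drift low;
        // (1,2,2) is a theoretical win, so V should drift high.
        System.out.println("V(1,2,3) = " + V.getOrDefault(key(new int[]{1, 2, 3}), 0.5));
        System.out.println("V(1,2,2) = " + V.getOrDefault(key(new int[]{1, 2, 2}), 0.5));
    }

    static void playOneGame(int[] piles) {
        List<String> visited0 = new ArrayList<>(), visited1 = new ArrayList<>();
        int player = 0;
        while (Arrays.stream(piles).sum() > 0) {
            (player == 0 ? visited0 : visited1).add(key(piles));
            int[] move = chooseMove(piles);      // both players use the same, latest value table
            piles[move[0]] -= move[1];           // move = {pile index, stones to take}
            player = 1 - player;
        }
        int winner = 1 - player;                 // the player who took the last stone wins
        update(visited0, winner == 0 ? 1.0 : 0.0);
        update(visited1, winner == 1 ? 1.0 : 0.0);
    }

    static int[] chooseMove(int[] piles) {
        List<int[]> moves = new ArrayList<>();
        for (int i = 0; i < piles.length; i++)
            for (int take = 1; take <= piles[i]; take++) moves.add(new int[]{i, take});
        if (RND.nextDouble() < EPSILON) return moves.get(RND.nextInt(moves.size())); // explore
        int[] best = moves.get(0);
        double bestScore = -1.0;
        for (int[] m : moves) {                  // greedy: leave the opponent the worst position
            int[] next = piles.clone();
            next[m[0]] -= m[1];
            double score = Arrays.stream(next).sum() == 0
                    ? 1.0                        // taking the last stone is an immediate win
                    : 1.0 - V.getOrDefault(key(next), 0.5);
            if (score > bestScore) { bestScore = score; best = m; }
        }
        return best;
    }

    static void update(List<String> states, double outcome) {
        for (String s : states) {                // nudge each visited state toward the final result
            double v = V.getOrDefault(s, 0.5);
            V.put(s, v + ALPHA * (outcome - v));
        }
    }

    static String key(int[] piles) {             // Nim is symmetric in its piles, so sort them
        int[] sorted = piles.clone();
        Arrays.sort(sorted);
        return Arrays.toString(sorted);
    }
}
```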
Current Development
This month we published a new release, 10.5.0, of our Rule Learner, which makes it even easier for subject matter experts (with no programming or statistics background) to discover classification rules in their historical data. Incorporated into the modern OpenRules Decision Intelligence Platform, this tool utilizes well-known ML algorithms to solve real-world problems.
In today’s environment, we prefer not to use the overloaded term “AI”. Still, we are working on expanding the Rule Learner toward the automatic discovery and maintenance of operational decision models with a human in the loop. Stay tuned!
