When it comes to handling large collections of data in business rules, OpenRules customers frequently use Excel-based DecisionData tables and/or decision tables of the type “BigTable“. It works fine when Excel tables contain thousands or even tens of thousands rows. However, when an Excel file contains hundreds of thousands rows Excel itself becomes much slower to search and requires much more time and memory to be downloaded in OpenRules. In this case we recommend our customers to switch from the Excel to the CSV format.
It is really easy for customers to save their Excel tables as CSV (Comma delimited) format with the extension *.csv. For example, one of our customers had an Excel table with 250,000+ rows presented in Excel format as follows:
The size of the Excel file was more than 6 Mb and even the standard Excel search or sort took an essential time. When they included this table in their OpenRules project, they hit the memory limit. Even if you increase the available memory, the processing time for downloading and building the proper decision model becomes too expensive (not mentioning how inefficient any iteration through this huge collection will become). So, we offered this customer to save their large file such as “MisusedCodes.xlsx” in the CSV file “MisusedCodes.csv”. Here is the CSV fragment of this csv-file:
Then instead of the large Excel table (shown above) they need to keep in the file “MisusedCodes.xlsx” the following table with only two rows:
As you can see we added [MisuusedCode.csv] at the end of the first row – it tells OpenRules that the actual data for this DecisionData table is located in the file “MisusedCodes.csv”. Here the CSV file is assumed to be located in the same folder where the file “MisusedCodes.xlsx”. However, a user can use any valid URL path, e.g. [./csvfiles/MisusedCode.csv] would tell OpenRules that this CSV files is located in the sub-folder “csvfiles” of the folder with the file “MisusedCodes.xlsx”.
After these changes everything continues to work as it used with data in Excel with one huge (!) difference:
OpenRules started to handle large tables with data coming from CSV files almost in no time without any special memory requirements!
Similarly, you can use CSV files with OpenRules decision tables of the type “BigTable”. For example, the BigTable described here using Excel now may be presented using the corresponding CSV file:
Please note that these new CSV capabilities have been implemented as a part of the upcoming 8.4.2 release – you may request a preview version by writing to firstname.lastname@example.org. These capabilities have been implemented using the Apache Commons CSV and support all complex CSV formats described in the RFC 4180 standard.
Note. One may ask: “Why not to keep huge files in a database?” We usually ask the same question and even offer our customers to use a very flexible DB-interface to access a relational database directly from rules using RuleDB. However, some customers have serious reasons to justify their choice. The main reason is that after trying OpenRules, our customers like the simplicity of data organization in OpenRules Excel tables and don’t want to deal with databases at all.