Data-Driven Engine
Jedox In-Memory DB in versions 7.1 and 2018.1+ use Data-Driven Engine (DDE) architecture more extensively than previous versions. It has two built-in versions of DDE: one for CPU, and one for GPU, which supports slightly fewer scenarios. DDEs don't understand model metadata like dimensions or rules, and therefore they use a calculation planner to translate metadata into a plan.
Calculation planner
The calculation planner is a component that analyzes queries (such as Excel PALO.DATA / DATAC / DATAV formulas, DFILTER, Jedox View) and OLAP metadata (such as dimensions, cubes, and rules) and builds a calculation plan.
The calculation plan is a hierarchical graph of objects (nodes) that are later instantiated by the DDE to process source data stored in data cubes in a uniform way. Each node has a simple specific function but can transform a large amount of data effectively. For optimal performance, it is important that the amount of data flowing between these nodes is not extremely large (effect of data sparsity).
While building the plan, the calculation planner does some optimization to reduce DDE workload. For example, it replaces a node doing the addition of two constants and replaces it with another constant or replaces aggregation of one billion constant rule-calculated cells with an aggregation of a single constant using multiplication. It also analyzes conditions of IF expressions and, in the that case a constant condition is used, the calculation planner uses only true or false expressions as an IF result.
The Jedox In-Memory DB then decides whether the plan should be processed by the CPU or GPU engine based on estimated execution time for each of them. Processing of a single calculation plan can also be distributed between the CPU and GPU engine.
Below is an example of a calculation plan for two rules:
['EUR'] = ['LC'] * PALO.DATA("","XRATE","LC","EUR",!'Date')
['Revenue'] = ['Price'] * ['Units']
Performance factors
Processing of each plan depends mainly on two factors:
- Plan complexity (the number of plan nodes): based on the query (single cell or whole cube, cell row, or multidimensional slice), number of rules and consolidations involved in query evaluation, and level of sub-query fragmentation (caused by recursion, complex IF expressions, and PALO.DATA formulas using mapping).
- Amount of data flowing between plan nodes: based on values loaded into cubes (primary data) and the types of plan nodes being generated (secondary data)
Amount of processed data – primary and secondary
Primary data are those loaded into data cubes using Integrator cube load or entered manually, e.g. during the planning process. Secondary data are those produced by plan nodes. The amount of secondary data can be several orders of magnitude higher and usually depends on structure sizes, i.e., the number of dimensions and their sizes.
To prevent the creation of large plans, the calculation planner won't create plans for recursive/chained rules when the current plan's size reaches 64 nodes. These unexpanded nodes are processed ad hoc. This 64-node ceiling limits GPU usage and will be changed in future versions.
Value types
DDE implementation for CPU operates with these data types:
- Null
- Numeric
- String
- Errors
For GPU, only null and numeric nodes for primary and secondary data are supported.
Null
Usually cheap to process. Primarily, null represents an empty cell. Secondarily, it can be the product of
- a consolidated cell with no source data
- the result of multiplication/division when one of the operands is null
- rule keyword null
- a result of rule comparison operators that is not true; e.g. 1 > 2 (== null)
Numeric
Primary numeric data can come from base cells or from rule literals like 3.14159, -1 or 0.0. Secondary numeric data can be produced by math operations, consolidations, or functions like ['A']+['B'], LEN("Hello")
.
String
Primary data or string literals (e.g. Not a Value
, ""
).
Secondary string data comes as a function of results or rule parameter values: CONCATENATE("Hello","World"), !'Year'
.
String values are typically used in attribute cubes and rights cubes. String values produced by rules are not cached in the Jedox In-Memory DB. This can have a negative impact on performance when reports that use such values are repeatedly refreshed, or when Integrator jobs repeatedly extract such values.
Errors
Error values usually come from rule functions when they are called with invalid parameters. Errors typically occur when using invalid element names in recursive rules of PALO.DATA functions.
Updated June 5, 2023