Database reference guide
CBAT

The underlying technology for Engine is Column Based Analytical Technology (CBAT). This section looks at what is meant by CBAT and what the implications are of using this approach in data analysis.

Database Types

There are three broad categories of database technology in use today:
OLAP

The basic unit of an OLAP system can be considered to be a cube. OLAP systems are good at reporting aggregate values, provide simple navigation of data and are frequently used for management reporting. They are not intended for comparing or updating rows and have read-only capability.

RDBMS

The basic unit of an RDBMS can be considered to be a row. RDBMS systems are good at storing a wide variety of data and are optimized for inserting, retrieving and updating transactions. They are not designed for the complex processing of data.

CBAT

CBAT is an approach to database technology that focuses on analysis and selection rather than insert, update, delete or management reporting. The main use of CBAT is for comparing, ranking, scoring and grouping subsets of very large populations - for example, when performing segmentation, propensity modeling or profiling.

CBAT systems are not transactional systems. Although Engine supports updates and deletes of data, this is done as part of a dedicated load process, typically executed overnight.

Because data is stored as columns, it is a simple matter to extend the data set by adding new columns and metrics. This removes the need to predefine the questions that are to be asked of the data and gives greater flexibility than is found in an OLAP system.

Storage

The basic unit of a CBAT system is a column. Each column is stored in its own data files, allowing data to be efficiently loaded and analyzed on a column by column basis.

Column based storage offers immediate advantages when used in a static analysis environment. Whereas an RDBMS must access entire rows of data in order to answer a query - even attributes that are not part of the query - a column based system needs to retrieve and hold in memory only the columns required to answer the active query. The resulting reduction in disk activity and memory requirements provides a performance increase that can be measured in orders of magnitude.
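The difference between row based and column based access can be illustrated with a minimal Python sketch. It is not how Engine stores data internally; the table contents, the attribute names (`customer_id`, `region`, `spend`) and both helper functions are hypothetical, and plain in-memory lists stand in for the per-column data files.

```python
# Hypothetical data: the same three-attribute customer table in two layouts.

# Row layout: each record holds every attribute together.
rows = [
    {"customer_id": 1, "region": "North", "spend": 120.0},
    {"customer_id": 2, "region": "South", "spend": 75.5},
    {"customer_id": 3, "region": "North", "spend": 210.0},
]

# Column layout: one sequence per attribute (a stand-in for one data file per column).
columns = {
    "customer_id": [1, 2, 3],
    "region": ["North", "South", "North"],
    "spend": [120.0, 75.5, 210.0],
}


def total_spend_row_store(rows):
    """Sum 'spend' from the row layout, counting every value that is read."""
    values_touched = 0
    total = 0.0
    for row in rows:
        values_touched += len(row)  # the whole row is read, unused attributes included
        total += row["spend"]
    return total, values_touched


def total_spend_column_store(columns):
    """Sum 'spend' from the column layout; only that one column is read."""
    spend = columns["spend"]
    return sum(spend), len(spend)


row_total, row_touched = total_spend_row_store(rows)
col_total, col_touched = total_spend_column_store(columns)

print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts return the same total, but in this sketch the row layout reads every attribute of every record while the column layout reads only the single column the query needs. The gap widens as the number of attributes per row grows.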
It becomes feasible to analyze whole data sets rather than samples.

The column based storage mechanism does not alter the relational nature of the data. Data can still be queried and viewed as if it were stored in an RDBMS.

Engine optimizes the storage based on data types and cardinality (number of discrete values). It is important to note that this storage is internally defined and is not configured by the administrator, who need only select the correct data type.

Indexing

A powerful advantage of CBAT technology is that it allows columns to be individually indexed. Engine provides many levels of indexing for each column. The index used depends on the process being run and is always the one most suited to the task at hand. This automatic creation and use of additional indexes is referred to as "over-indexing" and is a key contributor to Engine's excellent performance when executing analytical tasks.

Features

The following are the key features of a CBAT system:

Fast Comparison on Large Volumes of Data

CBAT systems allow for rapid query and response times. This meets several needs of the data warehouse, most notably the need for "exploration" in any large scale decision support solution.

Nested (Recursive) Queries with Intermediate Storage

The ability to break queries down into small pieces and store the intermediate results allows the user to reuse elements of queries in other analyses. New columns can easily be added to the system, allowing data to be prepared for modeling and mining and the database to be scored.

Intersecting Set Operations

Support for set operations allows the user to work with data at the most granular level. This removes the need to summarize the data beforehand and makes CBAT systems ideal for use in segmentation, profiling and campaign management.

Time Based Data Derivation

Individual Key Performance Indicators (KPIs) can be derived from individual events - e.g. date of purchase, visit to web site, cancellation of service… - allowing metrics such as purchase cycle and depth of repeat to be calculated.

Fast Data Load, Indexing and Output

The ability to rapidly load and index data makes CBAT tools extremely useful in the data hygiene and rapid prototyping stages. The ability to efficiently output data, e.g. a list of customers, makes closed loop marketing and system integration easier to accomplish.

Visualization

Because queries and other forms of analysis can be executed rapidly, visualization techniques can be employed as another way to work with the data, improving analysis and system design.
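Several of the features listed above (stored intermediate selections, intersecting set operations and time based derivation) can be sketched together in a few lines of Python. This is an illustration only and does not use any Engine API. The customer ids, dates, the `web_visitors` selection and the `purchase_cycle_days` helper are all hypothetical, and "purchase cycle" is assumed here to mean the average number of days between consecutive purchases.

```python
from datetime import date

# Hypothetical event data: purchase dates keyed by customer id.
purchases = {
    101: [date(2023, 1, 10), date(2023, 2, 9), date(2023, 3, 11)],
    102: [date(2023, 1, 5)],
    103: [date(2023, 2, 1), date(2023, 4, 2)],
}

# Hypothetical stored selection: customers who visited the web site.
web_visitors = {101, 103, 104}


def purchase_cycle_days(dates):
    """Average gap in days between consecutive purchases, or None if fewer than two."""
    ordered = sorted(dates)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Time based derivation: a new derived "column" computed from individual events.
cycle = {cid: purchase_cycle_days(dates) for cid, dates in purchases.items()}

# Intermediate storage: a selection kept so that later analyses can reuse it.
repeat_buyers = {cid for cid, dates in purchases.items() if len(dates) >= 2}

# Intersecting set operation at the level of individual customers.
target = repeat_buyers & web_visitors

# Reuse of the stored selection together with the derived column.
fast_cyclers = {cid for cid in target if cycle[cid] <= 45}

print(cycle)
print(sorted(target))
print(sorted(fast_cyclers))
```

Because `repeat_buyers` is kept as its own set, it can be combined with any other selection without recomputing it, which is the kind of reuse the nested query feature describes.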
© Alterian. All Rights Reserved.