One of the challenges in the implementation of a payment solution is the long-term storage of the data. The primary information to be stored by your payment system is the information about transactions processed. Systems that experience large processing volumes accumulate a lot of transaction data over time.
The transactional data being stored is needed for different purposes. On one hand, people need to run various types of historical reports (historical analysis of longer periods of data). On the other hand, sometimes people need to go back to refund some previously processed transactions, or to review a transaction that was processed, and, for some reason, resulted in a chargeback or an ACH return (in such cases 90 days’ worth of data is required).
While today’s database systems are fairly sophisticated, there are some challenges with transaction data archiving, which have to be faced by database administrators.
What are the challenges of transaction data archiving?
Not all database systems face the same issues, but, collectively, as the data grows, there are several groups of problems that become apparent.
- Size. At the most basic level there is the elementary issue of the database size. As more data is accumulated, the physical size of this data on the hardware drive grows, so you can simply run out of space. Beside that, it takes much longer to backup large data volumes.
- Indexing. Databases maintain additional indices to perform quick searches on various criteria. As the table grows, the indices also grow, and the searches become very memory-hungry, because the entire index has to be loaded into memory.
- Performance. The general performance for insert operation tends to drop as the table grows (because of index updates). Consequently, on some database systems you can experience degradation in the performance of the insert statements in the database as it grows.
- Changes. In some systems, the time it takes to alter a table structure depends on the size of the table. Consequently, the time, that it takes to update the system to a newer version, when a specific large table needs to be updated, also increases.
Effective transaction data archiving techniques
There are various general techniques that exist in database management systems, but we are going to address the ones, which are specific for payment industry due to the nature of the business.
The classical approach, used for solving the abovementioned problems, which is already present in the database systems, is called data partitioning. Some criterion, usually, based on the time interval, is chosen, and the entire table is broken down into several partitions, where each partition only stores a certain portion of data, which represents a specific segment of time, such as a week or a month. When the search is executed, it only scans those partitions where the required data is stored, and not the whole table.
However, in high-volume systems with a very high transactional throughput, even data partitioning cannot guarantee sufficient level of performance. That is why in some payment systems another mechanism is implemented in addition to data partitioning.
Operational versus transaction data archiving tables
In the modern transaction processing, after transactions are settled, no changes are, generally, made to the majority of the transactions. Because of this, it is possible to segment transactional data into two tables. The tables have exactly the same structure. The first table (operational table or authorization table) only stores transactions that have not been settled yet, while the second one (transaction archiving table) stores all the data, i.e. both the transactions that have been settled, as well as pending ones, only waiting to be settled.
When a transaction comes in for authorization, it is immediately inserted into the authorization table. Once it is authorized and the request is consumed by the caller (transaction submitter’s POS system), it is replicated into the archiving table. At the end of the day, when settlement happens, the data is, once again, replicated, as needed, into the archiving table, and, subsequently, removed from the operational table.
Consequently, operational table contains only unsettled transactions at any point in time. It is structured to contain very few indexes, as most of the searches are done on the archiving table. All of the reporting, as well as the entire user interface, are built using the archiving table, while refund\void functionality is implemented by making the initial search in the operational table, and, subsequent search in the archiving table.
The additional advantage of the approach is that, if some maintenance work (restructuring etc) has to be done on the archiving table, the system can still be functioning, so that the authorizations will still come through. However, user interface or reports might remain unavailable for some time period, while maintenance works are done.
Depending on the transactional throughput you are dealing with, you have to make an informed choice of the most suitable effective transaction data archiving technique.