ID | IEP-62 |
Author | |
Sponsor | |
Created |
|
Status | COMPLETED |
Currently, for page replacement (page rotation between page-memory and disk) we use Random-LRU algorithm. It has a low maintenance cost and relatively simple implementation, but it has many disadvantages and affects performance very much when replacement is started. We even have warnings in the log when page replacement started and a special event for this. On some deployments, administrators even force to restart cluster nodes periodically to avoid page replacement.
Proposals to improve page replacement in Ignite:
Main idea: in some cases start background task to evict cold pages from page-memory (for example, pages, last touched more than 12 hours ago).
The task can be started:
Batch page replacement will be helpful in some workloads (when some data much colder than another). For example, when OLAP request over historical data raised pages to page-memory, and after such request, this data is not needed for a long time. Or when OLTP transactions mostly add new data and process recent data but rarely touch historical data. In these cases with the current approach, we will enter "page replacement mode" after some period of time and never leave it. Batch page replacement here can prevent the starting of Random-LRU page replacement, or if Random-LRU already started it can provide conditions to stop it.
Good page replacement algorithm should satisfy the requirements:
Our Random-LRU has low maintenance cost and sequential scan resistant, but to find the next page for replacement in the best case we scan 5 pages, in the worst case we can scan all data region segment. Also, due to random nature, it's not very effective in predicting the right page for replacement to minimize the page-fault rate. And it's much time required to totally evict old cold data.
Usually, database management systems and operating systems use modifications of LRU algorithms. These algorithms have higher maintenance costs (pages list should be modified on each page access), but often they are effective from a "page-fault rate" point of view and have O(1) complexity for a searching page to replace. Simple LRU is not sequential scan resistant, but modifications that utilize page access frequency are resistant to sequential scan. Some databases uses CLOCK as page replacement algorithm.
We can try one of the modifications of LRU as well (for example, "segmented LRU" seems suitable for Ignite) or CLOCK algorithm (or one of its modifications).
Random-LRU algorithm. Every time page is accessed, its timestamp gets updated. When a page fault occurs and it's required to replace some pages, the algorithm randomly chooses 5 pages from the page memory and evicts a page with the latest timestamp.
Segmented-LRU algorithm is a scan-resistant variation of the Least Recently Used (LRU) algorithm. Segmented-LRU pages list is divided into two segments, a probationary segment, and a protected segment. Pages in each segment are ordered from the least to the most recently accessed. New pages are added to the most recently accessed end (tail) of the probationary segment. Existing pages are removed from wherever they currently reside and added to the most recently accessed end of the protected segment. Pages in the protected segment have thus been accessed at least twice. The protected segment is finite, so migration of a page from the probationary segment to the protected segment may force the migration of the LRU page in the protected segment to the most recently used end of the probationary segment, giving this page another chance to be accessed before being replaced. Page to replace is polled from the least recently accessed end (head) of the probationary segment.
The clock algorithm keeps a circular list of pages in memory, with the "hand" pointing to the last examined page frame in the list. When a page fault occurs and no empty frames exist, then the hit flag of the page is inspected at the hand's location. If the hit flag is 0, the new page is put in place of the page the "hand" points to, and the hand is advanced one position. Otherwise, the hit flag is cleared, then the clock hand is incremented and the process is repeated until a page is replaced.
Ignite is a memory-centric product, so "low maintenance cost" is very critical. And there is a risk that page replacement algorithm can affect workloads, where page replacement is not used (enough RAM to store all data). Any page replacement solution should be carefully benchmarked with different workloads.
https://en.wikipedia.org/wiki/Page_replacement_algorithm
https://en.wikipedia.org/wiki/Cache_replacement_policies