...
StatusStatus
...
Page properties |
---|
...
|
Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-18-Code-Generation-for-improving-sorting-performance-tc16486.html
...
|
Pull-request : https://github.com/apache/flink/pull/3511
Released: TBD
|
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
These optimizations are studied in FLINK-4867 and the results are promising.
...
Sorter \ No. records | 10,000 | 100,000 | 1,000,000 |
Original NormalizedKeySorter | 4.70 ms | 61.53 ms | 609.00 ms |
SwapViaPutGetLong | 3.34 ms | 49.88 ms | 471.30 ms |
UseLittleEndian | 4.20 ms | 56.15 ms | 578.14 ms |
CompareUnrollLoop | 4.29 ms | 53.59 ms | 605.32 ms |
Table 1 : Comparison between optimization ideas from FLINK-4867 and original NormalizedKeySorter
Another potential improvement could be eliminating expensive mathematical operators such as division and modulo, for example, when computing memory segment index. One proposed solution that trying to improvement this problem is Greg Hogan ’s FLINK-3722. Another possible ones could be using bitwise operators or replacing divisors with a constant that we know at compile-time [1]. From our experiment, we have found that Greg's solution is the best one, so we will base our implementation on his pull-request.
Sorter \ No. records | 10,000 | 100,000 | 1,000,000 |
Original NormalizedKeySorter | 4.34 ms | 61.79 ms | 582.88 ms |
Flink-3722 NormalizedKeySorter | 3.00 ms | 35.90 ms | 386.95 ms |
DividedByConstant | 3.49 ms | 45.14 ms | 436.22 ms |
UsingBitwiseOperators | 3.33 ms | 42.22 ms | 419.98 ms |
Because these optimization techniques depend on knowledge of each execution context, for example, data-type of sorting key and endianness of machine, we would need to apply applicable optimizations on the fly. In other words, we will dynamically generate the most efficient sorter for each context on each worker.
External Libraries
In this proposal, we plan to use 2 external libraries that are the same as FLINK-3599 : Code Generation in Serializers is using, namely
- Janino: A minimal Java compiler that we will use to compile the generated code.
- FreeMarker : a Java template engine.
...