For now, Ignite has no built-in profiling tool for user operations and internal processes. Such a tool would collect performance statistics and create a human-readable report, helping users analyze workloads and tune configuration and applications.
Examples of similar tools in other products: AWR [1] [2] [3] (Oracle); pgbadger [4], pgmetrics [5], powa [6] (PostgreSQL).
We should provide a way to profile the cluster. Consider the following scenario:
The performance report will be a human-readable HTML page and should contain:
Additional investigation is required to gather the following statistics:
Ignite will log additional internal performance statistics to profiling files, in a format similar to WAL logging.
A single disk-writer thread and an off-heap memory buffer will be used to minimize the performance impact. The maximum file size and the buffer size can be configured on start.
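The single-writer scheme above can be sketched as follows. This is a simplified illustration, not Ignite's actual implementation: it uses an on-heap queue in place of the off-heap buffer, and the class and method names are hypothetical. The point it shows is that operation threads only enqueue records, while one dedicated thread performs all disk I/O.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Sketch: statistics records are queued and flushed to a file by a single writer thread. */
class StatisticsWriter implements AutoCloseable {
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final FileChannel ch;
    private final Thread writer;
    private volatile boolean stopped;

    StatisticsWriter(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        writer = new Thread(this::flushLoop, "stat-writer");
        writer.start();
    }

    /** Called from operation threads: a cheap enqueue, no disk I/O on the hot path. */
    void record(byte[] rec) {
        queue.offer(rec);
    }

    /** Runs on the dedicated writer thread; drains the queue to disk. */
    private void flushLoop() {
        try {
            while (!stopped || !queue.isEmpty()) {
                byte[] rec = queue.poll(100, TimeUnit.MILLISECONDS);

                if (rec != null)
                    ch.write(ByteBuffer.wrap(rec));
            }
        }
        catch (IOException | InterruptedException ignored) {
            // Sketch only: a real implementation would report the error.
        }
    }

    @Override public void close() throws Exception {
        stopped = true;
        writer.join(); // Wait for the remaining records to be flushed.
        ch.close();
    }
}
```

A real implementation would also rotate files once the configured maximum size is reached and drop records when the buffer is full, rather than growing without bound.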
A new extension module, performance-statistics-ext, will be introduced. It will contain the tool to build the report: build-report.sh(bat).
Aggregated statistics are stored in JSON format and then rendered in the report.
The report is based on the Bootstrap library and can be viewed in a browser offline.
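As an illustration only, the aggregated JSON could look like the fragment below; the actual field names and structure are not specified in this IEP, so everything here is a hypothetical example of per-operation counts and durations:

```json
{
  "cacheOperations": {
    "get": { "count": 1024, "totalDurationMs": 350 },
    "put": { "count": 512, "totalDurationMs": 410 }
  },
  "queries": [
    { "text": "SELECT ...", "executions": 42, "avgDurationMs": 12.5 }
  ]
}
```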
1) JMX:
PerformanceStatisticsMBean
void start() // Start collecting performance statistics in the cluster.
void stop() // Stop collecting performance statistics in the cluster.
boolean enabled() // True if collecting performance statistics is enabled.
2) Control.sh utility. Functionality is similar to JMX.
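The management interface above could be expressed in Java as the sketch below. The interface and method names follow the IEP; the local implementation is an illustrative assumption, since the real MBean would broadcast start/stop across the cluster rather than just flip a local flag.

```java
/** Management bean for performance statistics (method names from the IEP). */
interface PerformanceStatisticsMBean {
    /** Start collecting performance statistics in the cluster. */
    void start();

    /** Stop collecting performance statistics in the cluster. */
    void stop();

    /** @return True if collecting performance statistics is enabled. */
    boolean enabled();
}

/** Hypothetical local implementation; a real one would propagate the flag cluster-wide. */
class LocalPerformanceStatistics implements PerformanceStatisticsMBean {
    private volatile boolean enabled;

    @Override public void start() { enabled = true; }

    @Override public void stop() { enabled = false; }

    @Override public boolean enabled() { return enabled; }
}
```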
3) System properties:
Enabling profiling mode will cause some performance degradation.
- IGNITE-12666
3 Comments
Alexey Goncharuk
Nikita Amelchev I took a look at the sample reports and noticed that only the number of operations is put on the timeline. Do you think it may be possible to collect histograms for each n-second interval so that we can build a heatmap of operation durations?
Alexey Goncharuk
Nikita Amelchev Also, can you add the following details to the IEP?
- How is the profiling data collected? Is the data kept in memory (on heap, off heap?) or is it being dumped to disk, and if so, what is the format?
- Did you consider using JFR extensions for dumping the profiling events?
- What is the expected performance impact when this feature is enabled? Will it have configurable profiles like JFR does?
- Why do we need a separate command-line tool? Does the tool need to have a cluster online to work? Should we add this to control.sh?
- If the data is written in binary format, can I build a report later (similar to the perf tool)?
- What is the CLI for the tool (flags, input parameters, security permissions)?
Nikita Amelchev
Alexey Goncharuk Sorry for the delay,
> Do you think it may be possible to collect histograms for each n-seconds interval so that we can build a heatmap for operations durations?
Yes, it's possible. All required statistics are collected and can be properly aggregated and drawn in the report. I think I can add this view during review or shortly after.
> How the profiling data is collected? Is the data kept in memory (on heap, off heap?) or is it being dumped to disk, if yes, what is the format?
I have updated the IEP. Yes, it's collected in profiling files on disk. An off-heap buffer is used before flushing to the disk. The mechanism is similar to WAL logging.
> Did you consider using JFR extensions for dumping the profiling events?
I think this method may not capture the effect of small queries.
> What is expected performance impact when this feature is enabled? Will it have configurable profiles like JFR does?
Locally I measured less than 5% impact. I'll benchmark on a real cluster soon. The off-heap buffer size can be configured; the performance impact itself can't be.
> Why do we need a separate command-line tool? Does the tool need to have a cluster online to work? Should we add this to control.sh?
> If the data is written in binary format, can I build a report later (similar to the perf tool)?
The tool does not need a cluster. Profiling files can be parsed locally, outside of a cluster, without running a grid client node. This is why I use a separate script.
> What is the CLI for the tool (flags, input parameters, security permissions)?
I have updated the PR. See the Profiling management section.