Introduction

The Trafodion instance repository is a data warehouse for manageability data collected from a Trafodion instance. Each instance in a Trafodion cluster contains a repository schema ("_REPOS_") that hosts the repository tables. To retrieve data from the repository, DBAs or end users can query the repository tables directly using the Trafodion Command Interface (TrafCI) or any third-party tool like DbVisualizer or SQuirreL SQL Client. Performance metrics such as query information, statistics, and session information are stored in repository tables.

See the overall Process Architecture diagram, which shows data being published to repository tables and events being written to log files via log4ccp or log4j.

Repository Structure and Features

Components on the Trafodion platform generate performance data for database objects, events data such as error messages or state changes, and a variety of statistical data. A component transmits this data using publications, which are data structures that define fields to hold the transmitted, or published, data. The data from these publications are written to repository tables with corresponding columns. See the diagram below.

Trafodion Manageability Flow

Repository Schema and Table Definitions

All tables in the repository reside in the "_REPOS_" schema. The repository currently includes these tables:

Table NameTable Description
METRIC_QUERY_AGGR_TABLEThis table contains the statistics of short-running queries, which have been aggregated into one record and written to the repository at the end of each statistics aggregation interval. For descriptions of the fields in the table, see METRIC_QUERY_AGGR_TABLE (Release 1.1). If you are using Trafodion Release 1.0, see METRIC_QUERY_AGGR_TABLE (Release 1.0).
METRIC_QUERY_TABLEThis table stores query statistics information gathered during the compilation and execution of queries that run longer than the configured interval. For descriptions of the fields in the table, see METRIC_QUERY_TABLE (Release 1.1). If you are using Trafodion Release 1.0, see METRIC_QUERY_TABLE (Release 1.0).
METRIC_SESSION_TABLEThis table provides access to ODBC and JDBC session statistics, which are gathered when clients connect to a Trafodion instance. These statistics include the data that is collected at the start and end of each client session. For descriptions of the fields in the table, see METRIC_SESSION_TABLE (Release 1.1). If you are using Trafodion Release 1.0, see METRIC_SESSION_TABLE (Release 1.0).
METRIC_TEXT_TABLEThis table is reserved for future use. For descriptions of the fields in the table, see METRIC_TEXT_TABLE (Release 1.1). This table does not exist in Trafodion Release 1.0.

For additional repository field descriptions, see:

Getting Started With the Repository

Installation

The repository is automatically installed on the platform by the Trafodion installer.

Character Set Support

All repository character data is encoded using UTF8.

Enabling or Disabling Publication to the Repository

Publication of statistics is enabled by default. You can disable publication of statistics by setting the DCS server property, dcs.server.user.program.statistics.enabled, to 'false' in the dcs-site.xml file and then restarting the DCS servers. For more information, see the Trafodion Database Connectivity Services Reference Guide.

Configuring the Repository

Statistics published to the repository can be:

  • Session statistics, which are published when session ends.
  • Aggregation statistics, which are aggregated and published at a specified interval during the session. The default interval is 60 seconds.
  • Query statistics, which are published when the query starts and are updated when query ends. By default, query statistics are published only for queries that run longer than 60 seconds.

You can configure the type of statistics that are written to the repository by setting the property, dcs.server.user.program.statistics.type, in the dcs-site.xml file to one of the following statistics types and then restarting the DCS servers:

Statistics TypeDescription
'aggregated'(default)This setting causes session statistics and aggregation statistics to be published in the repository for all queries. Query statistics are published only when a query executes longer than the specified statistics limit, which is 60 seconds by default, but can be configured in the dcs-site.xml file.
'session'This setting causes only session statistics to be published in the repository. Aggregation and query statistics are not published.

For more information about setting this and other statistics properties, see the Trafodion Database Connectivity Services Reference Guide.

NOTE: Trafodion Release 1.0 supports the 'query' setting, which causes session statistics and query statistics to be published in the repository for all queries and which prevents aggregation statistics from being published. This setting is no longer supported in Trafodion Release 1.1 and subsequent releases.

Upgrading the Repository

Repository tables are re-created on every reinstallation of Trafodion or if you execute the 'initialize trafodion, upgrade' command.

Data Aging for the Repository

As part of system management, you should include maintenance of repository tables to "age out" old data. Because these kinds of operations are resource-intensive, integrating regular repository maintenance schedules into your enterprise-level operations will optimize the process.

NOTE: Aging out repository data is currently not supported but will be part of a future release.

Examples and Guidelines for Creating Repository Queries

See Examples and Guidelines for Creating Repository Queries for some simple queries that you can run against the available repository tables. That page also provides suggestions for writing repository queries.

Logging Events

Trafodion components use log4cpp and log4j to log error or event messages. By default, all information is logged to files. Log files contain:

  • Messages from SQL subcomponents and the master executor.
  • Information collected from the Transaction Managers (TMs).
  • General information about potential system situations; for example, SQL errors, configuration file errors from a component.
  • Information specifically addressing SQL errors; for example, the ID of the query producing the error, a description of the error, or the name of the catalog, schema, and table that produced the error.

Location of Configuration Files

Trafodion configuration files are located in the $MY_SQROOT/conf folder. Starting in Trafodion Release 1.1, there is a configuration file for each subcomponent of Trafodion that currently logs events.

Config File NameUsed by Component
log4cpp.trafodion.masterexe.configSQL master executor/compiler/ESPs
log4cpp.trafodion.ssmp.configSQL ssmp
log4cpp.trafodion.sscp.configSQL sscp
log4cpp.trafodion.lob.configSQL lobserver
log4cpp.trafodion.udr.configSQL udr
log4cpp.monitor.mon.configMonitor
log4cpp.monitor.mon.snmp.configMonitor Critical
log4cpp.monitor.pstartd.configProcess Starup Daemon
log4cpp.monitor.wdg.configWatchdog process messages
log4cpp.monitor.wdg.snmp.configWatchdog process critical
log4cpp.tm.configTransaction Manager Client
log4j.dtm.configTransactional Server
log4j.hdfs.configHadoop/HDFS/Hbase layer

NOTE: In Trafodion Release 1.0, all configuration files use the default log level of 'ERROR.'

Location of Log Files

Trafodion log files are located in the $MY_SQROOT/logs folder. A log file is generated for every master executor process. The format of the log file and the message is as follows:

master_exec_<node number>_<Process Id>.log:<Timestamp>, <Log Level>, <Component>, <Node Number>, <CPU>, <Process Id>, <Process Name>, <SQL Error code>, <Query Id>, <Message Text>

Searching the Log Files

The log file for each master executor process is placed in the node in which the process is executing. Thus, there will be a different set of master_exec_*.log files in every node of the cluster. To troubleshoot a problem that could have occurred anywhere on the cluster, you will need to search the log files on every node. To facilitate this, Trafodion has implemented a table-valued function that provides an SQL interface to the contents of these log files. You can invoke this function in an SQL query. It reads all the log files across the cluster and produces an output that can be treated as an SQL table. Each event is a separate row. The columns of this table depend on the options provided.

The SQL syntax to invoke this function is:

select * from udf(event_log_reader( [options] ));

The optional [options] argument is a character constant. It accepts the value:

f: add file name output columns (see below)

Returned columns:

 log_ts        timestamp(6),
 severity      char(10 bytes) character set utf8,
 component     char(24 bytes) character set utf8,
 node_number   integer,
 cpu           integer,
 pin           integer,
 process_name  char(12 bytes) character set utf8,
 sql_code      integer,
 query_id      varchar(200 bytes) character set utf8,
 message       varchar(4000 bytes) character set utf8

 If option "f" was specified, we have four more columns:

 log_file_node integer not null,
 log_file_name varchar(200 bytes) character set utf8 not null,
 log_file_line integer not null,
 parse_status  char(2 bytes) character set utf8 not null

 (log_file_node, log_file_name, log_file_line) form a unique key
 in the result table. parse_status indicates whether there were
 any errors reading the information:

 '  ' (two blanks): no errors
 'E'  (as first or second character): parse error
 'T'  (as first or second character): truncation or over/underflow
                                      occurred
 'C'  (as first or second character): character conversion error.

Sample Queries for Capturing Event Information

To see events reporting unique constraint violations that occurred during a specific time interval, in time sequence, use this query:

select log_ts, process_name, cast(message as char(100))
from udf(event_log_reader()) 
where sql_code = 8102 and log_ts between timestamp '2015-01-04 12:00:00' 
  and timestamp '2015-01-04 13:00:00'
order by log_ts; 

This query reports the number of execution errors (with sql_code between 8000 and 9000) that occurred during a specific day:

select sql_code, count(*) 
from udf(event_log_reader()) 
where log_ts between timestamp '2015-01-04 00:00:00' 
  and timestamp '2015-01-04 23:59:59' 
  and sql_code between 8000 and 9000
group by sql_code 
order by sql_code; 

This query reports the number of compiler processes started by each master executor:

select process_name, count(*) 
from udf(event_log_reader()) 
where message like '%A compiler process is launched%' and severity = 'INFO'
group by process_name ;

This query reports all the missing statistics warnings generated during a specified interval. This output can be used to issue UPDATE STATISTICS commands to improve plan quality for future runs of the same workload.

select message 
from udf(event_log_reader()) 
where log_ts between timestamp '2015-01-27 00:00:00' 
  and timestamp '2015-01-27 23:59:59' 
  and sql_code in (6007, 6008) ;
  • No labels