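Introduction

What is project Trafodion?

Trafodion is an open source project incubated at HP Labs. It aims to deliver an enterprise-class SQL-on-HBase solution that targets large transactional or operational workloads rather than analytic workloads.

When will Trafodion be available?

Trafodion is currently available for download at downloads.trafodion.org, where you can also find the installer and executable code. For installation instructions, see Installation.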

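What are Trafodion's key features?

  • Full-featured ANSI SQL language support
  • JDBC/ODBC connectivity for Linux and Windows clients
  • Distributed ACID transaction protection across multiple statements, tables, and rows
  • Transaction recovery to ensure database consistency
  • Optimizations for low-latency read and write transactions
  • Support for large data sets through a parallel-aware query optimizer
  • Compile-time and run-time optimizations to improve OLTP performance
  • Scalability through a distributed parallel-processing architecture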

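What are the key benefits of Trafodion?

Trafodion provides a full-featured, optimized transactional SQL-on-HBase relational database solution with robust transactional data protection. These capabilities compensate for Hadoop's weaknesses in handling transactional workloads.

With Trafodion, users gain the following benefits:

  • The ability to use existing in-house SQL expertise instead of complex MapReduce programming
  • Seamless support for existing transactional applications
  • The ability to develop next-generation, highly scalable, real-time transaction processing applications
  • Reduced data latency for analytic workloads
  • Data reuse across different application domains

They also inherit several benefits of the Hadoop ecosystem:

  • Reduced infrastructure costs
  • Massive scalability and granular elasticity
  • Improved data availability and disaster recovery protection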

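Why is it called Trafodion?

Trafodion is Welsh for “transactions”.

What are the primary use cases for Trafodion?

The primary use cases for Trafodion include existing HBase workloads that would benefit from SQL and transactional data protection. Another class of use cases involves new or rehosted operational applications that need to avoid scalability problems, complex application programming, or prohibitive licensing costs.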

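What is Trafodion's heritage?

Trafodion builds on more than twenty years of HP's database product development experience in OLTP. Drawing on this heritage, Trafodion provides distributed transaction management protection across multiple statements, tables, and rows.

What is meant by transactional or operational SQL?

Transactional or operational SQL describes workloads formerly known as OLTP (online transaction processing) workloads. The definition has since broadened to cover a wide range of enterprise transactional applications (such as ERP and CRM), including new kinds of transactions that arise from social media and mobile data interactions and from mixed structured and semi-structured data. For more on operational relational databases, see the article “Magic Quadrant for Operational Database Management Systems” on the Gartner website.

With commercial OLTP databases available, why use Trafodion?

Trafodion is an open source product and, like other products in the Hadoop ecosystem, it has two main advantages over traditional proprietary OLTP database systems: cost structure and data reuse.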

 

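Technology

Is Trafodion primarily an extension of HBase?

Not exactly. Trafodion runs on top of HBase and uses the HBase API to perform typical SQL functions, including insert, update, delete, and query operations. However, we plan to add database functionality that uses the coprocessor mechanism provided by HBase.

Besides accessing data through SQL database connections, what other benefits does Trafodion offer over existing distributed database technologies?

With Trafodion, customers can access HBase tables through the standard HBase API as well as through Trafodion SQL. This includes using the power of SQL to join HBase and Trafodion tables and run queries against them, while also providing transaction management.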

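Can Trafodion update tables?

Yes. Trafodion provides full-featured ANSI DML operations, including support for INSERT, UPDATE, DELETE, MERGE, and UPSERT statements.

Is Trafodion's transaction management an eventual-consistency model?

No. Trafodion supports all of the ACID properties. Distributed transaction management provides transactional consistency for multi-row updates, for updates across multiple tables, and for transactions that span multiple update statements.

Can Trafodion's transaction management be used with native HBase tables?

Yes. Trafodion supports transaction management on native HBase tables.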

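What hardware does Trafodion require? Is it hardware-vendor neutral?

Trafodion is designed to be installed on vendor-neutral commodity hardware. Because Trafodion runs on HBase and HDFS, we recommend the HP reference architecture for Hadoop or a similar server configuration.

Is Trafodion a YARN application on Hadoop?

Trafodion has not yet been tested on YARN, but nothing should prevent it from running as a regular application in a YARN container.

Does Trafodion support read consistency?

Yes. All reads in Trafodion are fully consistent.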

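Can Trafodion roll back transactions?

Yes. Trafodion provides full ACID protection across multiple SQL statements, tables, and rows. A transaction can be aborted with the ROLLBACK WORK statement.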
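
As a rough illustration of what rolling back a multi-statement, multi-table transaction means, the sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in. It does not connect to Trafodion; the table names, the `transfer` helper, and the simulated failure are all hypothetical, and SQLite spells the statement ROLLBACK rather than ROLLBACK WORK.

```python
import sqlite3


def new_db():
    """Create a throwaway in-memory database with two related tables."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN / COMMIT / ROLLBACK statements below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE audit_log (account_id INTEGER, note TEXT)")
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.execute("INSERT INTO accounts VALUES (2, 50)")
    return conn


def transfer(conn, src, dst, amount, fail=False):
    """Move `amount` between two rows and log it, all in one transaction."""
    conn.execute("BEGIN")
    try:
        # Three statements touching two tables and two rows.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("INSERT INTO audit_log VALUES (?, ?)", (src, f"sent {amount}"))
        if fail:
            raise RuntimeError("simulated failure before commit")
        conn.execute("COMMIT")
        return True
    except Exception:
        # Undo every statement issued since BEGIN.
        conn.execute("ROLLBACK")
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```

Calling `transfer(conn, 1, 2, 30, fail=True)` on a fresh database leaves both balances and the audit table untouched, while the same call without `fail` applies all three statements together.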

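Does Trafodion have a transaction log, and can it be shipped to a DR site?

Yes. Trafodion keeps a log that audits all transaction activity. Shipping the log to a remote DR site and replaying it there is not currently supported.

Does Trafodion support distributed transaction management across multiple data centers?

Trafodion instances do not support transaction management across different data centers.

How does Trafodion support database security management?

Currently, Trafodion provides security management based on HBase functionality. Items on the roadmap will add full GRANT/REVOKE capability at the table and schema level for users and roles, along with support for directory-based authentication.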

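Does Trafodion need to be installed on every node of the Hadoop cluster?

Trafodion must be installed on every node that runs an HBase RegionServer. Trafodion provides an installation script that simplifies the process. For more information, see the Trafodion Installation Guide.

What are Trafodion's software requirements?

Trafodion requires an x86 64-bit hardware platform running Red Hat Enterprise Linux or CentOS 6.x (64-bit). It supports the Cloudera and Hortonworks Hadoop distributions; for details, see Supported Hadoop Distributions. The software is intended to work across Linux and Hadoop versions, but other distributions have not yet been tested.

Are there recommended configurations, such as the number of nodes or the amount of memory?

We plan to define a reference architecture optimized for Trafodion performance. For now, we recommend the HP reference architecture for Hadoop or a similar configuration.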

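What is the benchmark performance for transactional workloads?

Benchmark results will be published on the wiki.

Where does Trafodion stand with respect to the CAP theorem? Is it CP (consistency and partition tolerance, like HBase) or CA (consistency and availability)?

Trafodion is built on HBase and HDFS. HBase is generally positioned as CA (consistent and available) under the CAP theorem. Unlike native HBase, Trafodion extends the definition of consistency to provide ACID protection for transactions across multiple SQL statements, tables, and rows.

How does Trafodion scale in terms of database size?

Because Trafodion is built on HBase/HDFS, a Trafodion database can in theory scale as far as HBase/HDFS does. Because of hardware limitations, however, Trafodion has so far been tested only on configurations of up to 10 nodes and 50 TB.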

What about high availability? Does Trafodion have a single point of failure?

HBase and HDFS are considered highly available, with many built-in HA features including NameNode redundancy, HDFS k-safety data replication, HBase replication, HBase snapshots, ZooKeeper’s highly reliable distributed coordination of Hadoop-hosted services, and so on. Additionally, the planned release of HBase 1.0 is advertised as having many additional HA features. Furthermore, many Hadoop distributions have added their own HA features at the HBase or HDFS layers.
Trafodion leverages HBase and HDFS capabilities to extend HA to the connectivity and SQL layers. For example, Trafodion registers its connection services using ZooKeeper to ensure persistent connectivity services. Trafodion is designed as an MPP service with replicated service points for HA and no single points of failure.

Is Trafodion best suited for reads or for writes?

Trafodion provides a bulk load capability that interfaces directly with HDFS for high-performance data loading. HBase uses a large in-memory cache to hold data until the HFile buffer fills, at which point the data is written to HDFS. Additionally, Trafodion provides many low-latency read optimizations on top of HBase to significantly improve both the random and parallel read performance of the database engine. These include:
  • Statistics-based plan generation
  • Degree of parallelism optimization
  • In-memory data-flow, scheduler-driven executor
  • Query plan caching
  • Key-based access with SQL “pushdown”
  • Multi-dimensional access method (MDAM)
  • Composite key support
  • Secondary index support
  • Table structure optimizations
  • Salting of row-keys for workload balancing
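
To make the last item concrete, here is a minimal sketch of the general idea behind row-key salting: a short prefix derived from a hash of the key spreads otherwise sequential keys across a fixed number of buckets. The MD5 hash, the `NN|key` layout, and the function names are assumptions chosen for illustration; the hash function and key encoding Trafodion actually uses are not described here.

```python
import hashlib
from collections import Counter


def salted_key(row_key: str, num_buckets: int) -> str:
    """Prefix row_key with a bucket number derived from a hash of the key."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    # Hypothetical layout: two-digit bucket, a separator, then the original key.
    return f"{bucket:02d}|{row_key}"


def bucket_histogram(row_keys, num_buckets: int) -> Counter:
    """Count how many keys land in each salt bucket."""
    return Counter(salted_key(k, num_buckets).split("|", 1)[0] for k in row_keys)


# Monotonically increasing keys would normally all hit the same end of the
# key range; with a salt prefix they are spread over the buckets instead.
sequential_keys = [f"order-{n:08d}" for n in range(1000)]
histogram = bucket_histogram(sequential_keys, num_buckets=4)
```

The trade-off is that a range scan over the original key order has to visit every bucket, which is why the bucket count is kept small and fixed in this sketch.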

 

Can Trafodion data be accessed through Hive? Can we publish a Trafodion table to HCatalog and query it using Hive?

While theoretically possible since Trafodion data is stored in HBase/HDFS, the data would not be easily interpreted due to the encoding mechanisms Trafodion employs for transaction performance and efficiency. Trafodion maintains its own catalog for metadata that provides the relational abstraction layer for accessing the physical storage layer via SQL commands. Trafodion objects are complex: they can include tables, indexes, views, and constraints, all of which are represented in Trafodion’s own catalog. HCatalog can currently represent only simple objects such as tables or column families and does not support other, more complex relational objects. Alternatively, Trafodion data could be loaded into Hive/HDFS and then accessed using Hive-specific operations or native MapReduce jobs.

Can Trafodion access Hive stored data?

Trafodion does provide the capability for SQL statements submitted to Trafodion to access, join, and aggregate a combination of Trafodion database objects, native HBase objects, and Hive tables (via the HCatalog).

Can we access Trafodion data with MapReduce?

Trafodion provides automatic access parallelism without the need for writing MapReduce. If the optimized plan calls for parallel execution, the Trafodion Master process divides the work among Executive Server Processes (ESPs) to perform the work in parallel on behalf of the Master process. The results are passed back to the Master for consolidation. In some situations where a highly complex plan is specified (for example, large n-way joins or aggregations), multiple layers of ESPs may be requested. See the diagram below.
[Figure: Multiple ESPs]
Additionally, Trafodion can serve as a source (read) or target (write/load) for user-written MapReduce jobs using Trafodion’s standard ODBC/JDBC interface.
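
The master/ESP division of labor described above can be pictured with a small, purely illustrative sketch: a "master" function splits the input into partitions, worker threads standing in for ESPs compute partial aggregates, and the master consolidates them. The function names and the use of Python threads are assumptions; real ESPs are separate server processes and the sketch models only a single ESP layer.

```python
from concurrent.futures import ThreadPoolExecutor


def esp_partial_aggregate(partition):
    """Stand-in for one ESP: return (sum, count) for its share of the rows."""
    return (sum(partition), len(partition))


def master_average(values, degree_of_parallelism):
    """Stand-in for the Master: fan out, then consolidate partial results."""
    # Round-robin partitioning; each partition goes to one worker.
    partitions = [values[i::degree_of_parallelism] for i in range(degree_of_parallelism)]
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        partials = list(pool.map(esp_partial_aggregate, partitions))
    # Consolidation step: combine the partial sums and counts.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

For example, `master_average(list(range(1, 101)), 4)` returns 50.5, the same answer a single-threaded average would give, because sums and counts combine exactly across partitions.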

Does Trafodion have role-based security authorization?

Yes, Trafodion supports ANSI GRANT/REVOKE on both an individual user and role basis. See Enabling Security Features in Trafodion. For details on the GRANT/REVOKE syntax, see the Trafodion SQL Reference Manual (pdf, 3.98 MB).

What is Trafodion’s security authorization granularity?

Trafodion supports authorization (that is, privileges) at the system and database-object level. System privileges are valid across SQL, such as the ability to create schemas and objects. Object privileges apply to tables, views, and other database objects. Views can be used to restrict access to the subset of table rows referenced within the view.

Does Trafodion have full text search capabilities?

Trafodion provides SQL search capabilities using search functions such as POSITION (equivalent to INDEX) and SUBSTRING. Once large object (LOB) support is made available, a user-written or third-party UDF could be plugged in to add search capabilities into documents and images.
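
For readers unfamiliar with these functions, the sketch below emulates the standard SQL behavior of POSITION and SUBSTRING in Python: positions are 1-based, and POSITION yields 0 when the substring is absent. The helper names are hypothetical, and only the simple case of a start position of 1 or greater is modeled.

```python
def sql_position(needle: str, haystack: str) -> int:
    """Emulate POSITION(needle IN haystack): 1-based index, 0 if not found."""
    # str.find returns -1 when the substring is absent, so +1 maps that to 0.
    return haystack.find(needle) + 1


def sql_substring(text: str, start: int, length: int) -> str:
    """Emulate SUBSTRING(text FROM start FOR length) for start >= 1."""
    if start < 1 or length < 0:
        raise ValueError("this sketch only handles start >= 1 and length >= 0")
    return text[start - 1:start - 1 + length]


def search(rows, needle):
    """Return the rows whose text contains needle, as a WHERE POSITION(...) > 0 filter would."""
    return [row for row in rows if sql_position(needle, row) > 0]
```

This is plain substring matching, which is why it is a search capability but not full text search: there is no tokenization, ranking, or index behind it.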

What’s the process like when we add a new node to the cluster? Does the system grow elastically?

The underlying HBase/HDFS infrastructure supports elastic scalability, allowing for incremental node expansion (that is, DataNode expansion). Trafodion would be able to access the HBase tables (or ‘regions’ of the HBase tables) that now span the new node(s). Trafodion services (that is, connection, compiler, master, ESP) can be made to leverage these additional nodes by restarting Trafodion with a new configuration.

Does Trafodion need dedicated hardware?

Trafodion can run on dedicated hardware, cloud or VM configurations. For best and consistent performance, dedicated hardware is recommended where feasible.

Given a big cluster and a transaction involving tens of RegionServers, how does Trafodion achieve good performance and quasi-linear scalability (given the inherent overhead of a two-phase commit)?

Trafodion employs a Distributed Transaction Manager (DTM) where a DTM runs on every datanode and each DTM works with its peers in other datanodes to coordinate the two-phase commit protocol. For efficiency purposes, the DTM is invoked only when necessary:
  • If the transaction involves only a single region, then two-phase commit coordination is not necessary.
  • The DTM is not involved with read-only transactions.
  • The DTM is not involved in loading data into empty tables or during index creation.
  • If the transaction impacts only a single table row, then standard HBase ACID protection is used instead of the DTM.
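
The rules above amount to a short decision procedure, sketched below. The `TxnProfile` fields, the `commit_path` function, and the returned labels are hypothetical names invented for this illustration; they are not part of Trafodion, and the order in which the checks are applied is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TxnProfile:
    """Hypothetical summary of what a transaction touches."""
    regions_touched: int
    rows_touched: int
    read_only: bool = False
    load_into_empty_table: bool = False
    index_creation: bool = False


def commit_path(txn: TxnProfile) -> str:
    """Pick the cheapest commit path that the listed rules allow."""
    # Read-only work, loads into empty tables, and index builds skip the DTM.
    if txn.read_only or txn.load_into_empty_table or txn.index_creation:
        return "no-dtm"
    # A single-row change relies on standard HBase ACID protection instead.
    if txn.rows_touched == 1:
        return "hbase-row-acid"
    # One region: no two-phase commit coordination is needed.
    if txn.regions_touched == 1:
        return "single-region"
    # Everything else needs the full distributed protocol.
    return "two-phase-commit"
```

Only the last branch pays the coordination cost of two-phase commit, which is the point of the rules: the common small transactions take one of the cheaper paths.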