Flume 1.5.2 User Guide - Apache Flume
type                –         The component type name, needs to be null.
batchSize           100

Example for agent named a1:

    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = null
    a1.sinks.k1.channel = c1

HBaseSinks

HBaseSink

This sink writes data to HBase. The HBase configuration is picked up from the first hbase-site.xml encountered in the classpath. A class implementing HbaseEventSerializer, which is specified by the configuration, is used to convert the events into HBase puts and/or increments. These puts and increments are then written to HBase. This sink provides the same consistency guarantees as HBase, which is currently row-wise atomicity. In the event of HBase failing to write certain events, the sink will replay all events in that transaction.

The HBaseSink supports writing data to secure HBase. To write to secure HBase, the user the agent is running as must have write permissions to the table the sink is configured to write to. The principal and keytab to use to authenticate against the KDC can be specified in the configuration. The hbase-site.xml in the Flume agent’s classpath must have authentication set to kerberos (for details on how to do this, please refer to the HBase documentation).

For convenience, two serializers are provided with Flume. The SimpleHbaseEventSerializer (org.apache.flume.sink.hbase.SimpleHbaseEventSerializer) writes the event body as-is to HBase, and optionally increments a column in HBase. This is primarily an example implementation. The RegexHbaseEventSerializer (org.apache.flume.sink.hbase.RegexHbaseEventSerializer) breaks the event body based on the given regex and writes each part into different columns.

The type is the FQCN: org.apache.flume.sink.hbase.HBaseSink.

Required properties are in bold.

Property Name       Default   Description
channel             –
type                –         The component type name, needs to be hbase
table               –         The name of the table in HBase to write to.
columnFamily        –         The column family in HBase to write to.
zookeeperQuorum     –         The quorum spec. This is the value for the property
                              hbase.zookeeper.quorum in hbase-site.xml
znodeParent         /hbase    The base path for the znode for the -ROOT- region.
                              Value of zookeeper.znode.parent in hbase-site.xml
batchSize           100       Number of events to be written per txn.
coalesceIncrements  false     Should the sink coalesce multiple increments to a cell
                              per batch. This might give better performance if there
                              are multiple increments to a limited number of cells.
serializer          org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
                              Default increment column = “iCol”, payload column = “pCol”.
serializer.*        –         Properties to be passed to the serializer.
kerberosPrincipal   –         Kerberos user principal for accessing secure HBase
kerberosKeytab      –         Kerberos keytab for accessing secure HBase

Example for agent named a1:

    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = hbase
    a1.sinks.k1.table = foo_table
    a1.sinks.k1.columnFamily = bar_cf
    a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
    a1.sinks.k1.channel = c1

AsyncHBaseSink

This sink writes data to HBase using an asynchronous model. A class implementing AsyncHbaseEventSerializer, which is specified by the configuration, is used to convert the events into HBase puts and/or increments. These puts and increments are then written to HBase. This sink uses the Asynchbase API to write to HBase. This sink provides the same consistency guarantees as HBase, which is currently row-wise atomicity. In the event of HBase failing to write certain events, the sink will replay all events in that transaction.

The type is the FQCN: org.apache.flume.sink.hbase.AsyncHBaseSink.

Required properties are in bold.

Property Name       Default   Description
channel             –
type                –         The component type name, needs to be asynchbase
table               –         The name of the table in HBase to write to.
zookeeperQuorum     –         The quorum spec. This is the value for the property
                              hbase.zookeeper.quorum in hbase-site.xml
znodeParent         /hbase    The base path for the znode for the -ROOT- region.
                              Value of