Flume 1.5.2 User Guide - Apache Flume

type                –         The component type name, needs to be null.
batchSize           100
Example for agent named a1:
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = null
a1.sinks.k1.channel = c1
HBaseSinks
HBaseSink
This sink writes data to HBase. The HBase configuration is picked up from the first hbase-site.xml encountered in the classpath. A class implementing
HbaseEventSerializer, which is specified by the configuration, is used to convert the events into HBase puts and/or increments. These puts and
increments are then written to HBase. This sink provides the same consistency guarantees as HBase, which is currently row-wise atomicity. If
HBase fails to write certain events, the sink replays all events in that transaction.
The HBaseSink supports writing data to secure HBase. To write to secure HBase, the user the agent is running as must have write permissions to the
table the sink is configured to write to. The principal and keytab used to authenticate against the KDC can be specified in the configuration. The
hbase-site.xml in the Flume agent’s classpath must have authentication set to kerberos (for details on how to do this, refer to the HBase
documentation).
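A secure setup might look like the following sketch. The kerberosPrincipal and kerberosKeytab property names come from the property table for this sink; the principal and the keytab path are hypothetical placeholders, not values from this guide.

```properties
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hbase
a1.sinks.k1.table = foo_table
a1.sinks.k1.columnFamily = bar_cf
a1.sinks.k1.channel = c1
# Hypothetical principal and keytab path; substitute the values issued by your KDC.
a1.sinks.k1.kerberosPrincipal = flume/agent-host.example.com@EXAMPLE.COM
a1.sinks.k1.kerberosKeytab = /etc/security/keytabs/flume.keytab
```

The keytab file must be readable by the user the agent runs as, and that principal still needs write permission on the target table.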
For convenience, two serializers are provided with Flume. The SimpleHbaseEventSerializer (org.apache.flume.sink.hbase.SimpleHbaseEventSerializer)
writes the event body as-is to HBase, and optionally increments a column in HBase. This is primarily an example implementation. The
RegexHbaseEventSerializer (org.apache.flume.sink.hbase.RegexHbaseEventSerializer) breaks the event body based on the given regex and writes each
part into different columns.
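As an illustration of the regex-based serializer, the sketch below splits a comma-separated event body into three columns. The serializer.regex and serializer.colNames keys are not listed in this section; they are assumed from the RegexHbaseEventSerializer implementation and should be checked against the Flume version in use. The column names and the sample body are hypothetical.

```properties
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# Assumed serializer keys, passed through the serializer.* prefix.
# One capture group per column, for an event body such as: alice,login,200
a1.sinks.k1.serializer.regex = ([^,]*),([^,]*),([^,]*)
# Hypothetical column names, matched to the capture groups in order.
a1.sinks.k1.serializer.colNames = user,action,status
```

The pattern deliberately avoids backslashes, since a backslash is an escape character in a Java properties file and would have to be doubled.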
The type is the FQCN: org.apache.flume.sink.hbase.HBaseSink.
Required properties are in bold.
Property Name       Default   Description
channel             –
type                –         The component type name, needs to be hbase
table               –         The name of the table in HBase to write to.
columnFamily        –         The column family in HBase to write to.
zookeeperQuorum     –         The quorum spec. This is the value for the property hbase.zookeeper.quorum in hbase-site.xml
znodeParent         /hbase    The base path for the znode for the -ROOT- region. Value of zookeeper.znode.parent in hbase-site.xml
batchSize           100       Number of events to be written per txn.
coalesceIncrements  false     Should the sink coalesce multiple increments to a cell per batch. This might give better performance if there are multiple increments to a limited number of cells.
serializer          org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
                              Default increment column = “iCol”, payload column = “pCol”.
serializer.*        –         Properties to be passed to the serializer.
kerberosPrincipal   –         Kerberos user principal for accessing secure HBase
kerberosKeytab      –         Kerberos keytab for accessing secure HBase
Example for agent named a1:
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hbase
a1.sinks.k1.table = foo_table
a1.sinks.k1.columnFamily = bar_cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.channel = c1
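The optional properties can be combined with the required ones, for instance as in the sketch below. The property names are those documented for this sink; the ZooKeeper host names and the batch size of 500 are hypothetical values chosen for illustration.

```properties
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hbase
a1.sinks.k1.table = foo_table
a1.sinks.k1.columnFamily = bar_cf
a1.sinks.k1.channel = c1
# Hypothetical ZooKeeper hosts; use the value of hbase.zookeeper.quorum from hbase-site.xml.
a1.sinks.k1.zookeeperQuorum = zk1.example.com,zk2.example.com,zk3.example.com
# Shown at its default; set it to the value of zookeeper.znode.parent if that differs.
a1.sinks.k1.znodeParent = /hbase
# Write more events per transaction than the default of 100.
a1.sinks.k1.batchSize = 500
# Merge repeated increments to the same cell within a batch.
a1.sinks.k1.coalesceIncrements = true
```

Coalescing only has an effect when the configured serializer actually emits increments. A larger batch size also means more events are replayed if a transaction fails.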
AsyncHBaseSink
This sink writes data to HBase using an asynchronous model. A class implementing AsyncHbaseEventSerializer, which is specified by the configuration,
is used to convert the events into HBase puts and/or increments. These puts and increments are then written to HBase. This sink uses the Asynchbase
API to write to HBase. This sink provides the same consistency guarantees as HBase, which is currently row-wise atomicity. If HBase fails to write
certain events, the sink replays all events in that transaction. The type is the FQCN: org.apache.flume.sink.hbase.AsyncHBaseSink.
Required properties are in bold.
Property Name       Default   Description
channel             –
type                –         The component type name, needs to be asynchbase
table               –         The name of the table in HBase to write to.
zookeeperQuorum     –         The quorum spec. This is the value for the property hbase.zookeeper.quorum in hbase-site.xml
znodeParent         /hbase    The base path for the znode for the -ROOT- region. Value of