Which source type should I use for Flume log collection?

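Here is how our company currently generates logs: at 00:00 a file named 00.log is created and one hour of logs is written to it; at 01:00 a file named 01.log is created and the next hour of logs goes there, and so on, for 24 files per day. We want to collect these logs in real time with Flume, but I don't know how to define the source type. Neither exec nor spooldir seems to fit the requirement. Which source type would be the best choice? Any help would be much appreciated!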

June 14, 2017 · 4 replies · 1094 views

Data

Replies (4)

urey

Building on this new feature in Flume 1.7.0:

[FLUME-2498] – Implement Taildir Source

I implemented a source that recursively watches the files in every subdirectory of the configured directories. For details, see:

GitHub – qwurey/flume-source-taildir-recursive: Flume1.7.0 TaildirSource support monitor sub-directories recursivly

Of course, if you don't need recursive watching, the stock Taildir Source in 1.7.0 is enough for your requirement.

The official description of Taildir Source:

This is the proposal of implementing a new tailing source.
This source watches the specified files, and tails them in nearly real-time once appends are detected to these files.
This source is reliable and will not miss data even when the tailing files rotate.
It periodically writes the last read position of each file in a position file using the JSON format.
If Flume is stopped or down for some reason, it can restart tailing from the position written on the existing position file.
It can add event headers to each tailing file group.
A attached patch includes a config documentation of this.
This source requires Unix-style file system and Java 1.7 or later.
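
For the hourly 00.log to 23.log layout in the question, a Taildir Source configuration might look roughly like the sketch below. The agent name `a1`, the component names, and both paths are assumptions for illustration, not taken from the original post; the memory channel and logger sink are only there so the fragment is complete enough to try out.

```properties
# Minimal sketch of a Flume 1.7.0+ agent using the Taildir Source.
# Agent/component names (a1, r1, c1, k1) and all paths are hypothetical.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
# JSON file where the last read position of each tailed file is recorded,
# so the agent can resume from there after a restart
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = f1
# The regular expression applies to the file name only, not the directory.
# Character classes are used instead of backslash escapes because backslashes
# are themselves escape characters in a properties file.
a1.sources.r1.filegroups.f1 = /data/logs/app/[0-9]{2}[.]log
# Optional: attach a header to every event read from this file group
a1.sources.r1.headers.f1.logType = hourly

a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

With a pattern like this, each new hourly file should be picked up as soon as it appears, without touching the configuration.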

蔡国庆
Flume supports many types of Source, including Avro, Thrift, Exec, JMS, Spooling Directory, Taildir, Kafka, NetCat, Sequence Generator, Syslog Sources, HTTP, Stress, Custom, and Scribe.

To have Flume read existing log files, you can use one of the following Sources:

Taildir Source: watches the specified files and tails them in near real time once new lines appended to a file are detected.

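Spooling Directory Source: watches the configured directory for newly added files and reads the data out of them. Two caveats: a file copied into the spool directory must not be opened and edited again, and the spool directory must not contain subdirectories.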

Exec Source: runs a Linux command that continuously outputs the latest data, for example `tail -F <filename>`.
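
For comparison, the Spooling Directory and Exec Sources could be configured roughly as in the sketch below. The agent name, component names, and paths are assumptions for illustration. Note that `tail -F` follows a single file name, so with hourly files it would not move on to 01.log by itself, and spooldir expects files that are no longer being written to; this is probably why neither felt like a fit to the original poster.

```properties
# Sketch only: agent/component names and paths are hypothetical.
a1.sources = r1 r2
a1.channels = c1

# Spooling Directory Source: ingests completed files dropped into spoolDir
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /data/logs/spool
# Add a header with the absolute path of the originating file
a1.sources.r1.fileHeader = true

# Exec Source: runs a command and turns each output line into an event
a1.sources.r2.type = exec
a1.sources.r2.channels = c1
a1.sources.r2.command = tail -F /data/logs/app/00.log

a1.channels.c1.type = memory
```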

Reference: Apache Flume日志收集系统简介 (An Introduction to the Apache Flume Log Collection System)

杨威
We have the same scenario, and on Windows at that. Did you ever get it solved?

HADOOP
Sorry to say, our company hardly uses Flume. I only know that it exists and have not looked into it in any depth.

Started by: bi qu