This is a proposal to implement a new tailing source.
This source watches the specified files and tails them in near real time once appends to those files are detected.
This source is reliable and will not miss data even when the tailed files are rotated.
It periodically writes the last read position of each file to a position file in JSON format.
If Flume is stopped or goes down for some reason, it can resume tailing from the position recorded in the existing position file.
It can add event headers to each group of tailed files.
The attached patch includes configuration documentation for this.
This source requires a Unix-style file system and Java 1.7 or later.
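To make the position-file idea concrete, here is a minimal Python sketch. It is not Flume's implementation: the function names and the JSON field names (`inode`, `pos`, `file`) are assumptions chosen for illustration. It keys the read offset by inode, which is why a renamed (rotated) file can still be resumed from the right place on a Unix-style file system.

```python
import json
import os


def load_positions(position_file):
    """Return {inode: entry} from the JSON position file, or {} if it is absent."""
    if not os.path.exists(position_file):
        return {}
    with open(position_file, encoding="utf-8") as fh:
        return {entry["inode"]: entry for entry in json.load(fh)}


def save_positions(position_file, positions):
    """Write all entries as a JSON list, replacing the old file atomically."""
    tmp = position_file + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(list(positions.values()), fh)
    os.replace(tmp, position_file)


def read_new_lines(path, positions):
    """Return complete lines appended to `path` since the recorded offset."""
    inode = os.stat(path).st_ino  # unchanged when the file is renamed
    entry = positions.setdefault(inode, {"inode": inode, "pos": 0, "file": path})
    entry["file"] = path  # remember the most recent name for this inode
    lines = []
    with open(path, "rb") as fh:
        fh.seek(entry["pos"])
        while True:
            start = fh.tell()
            line = fh.readline()
            if not line.endswith(b"\n"):
                fh.seek(start)  # partial or empty read: retry on the next call
                break
            lines.append(line[:-1].decode("utf-8"))
        entry["pos"] = fh.tell()
    return lines
```

A caller would invoke `read_new_lines` for each watched file in a loop and call `save_positions` periodically. The sketch does not handle truncated files or inode reuse, both of which a real source would have to consider.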
Replies (4)
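Building on the new feature in Flume 1.7.0:
I implemented a source that recursively monitors the files in every subdirectory under the configured directory. For details, see:
GitHub – qwurey/flume-source-taildir-recursive: Flume1.7.0 TaildirSource support monitor sub-directories recursivly
Of course, if you don't need recursive monitoring, the native Taildir Source in 1.7.0 is enough on its own.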
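For readers wondering what "recursive monitoring" amounts to, here is a rough Python sketch of the file-discovery step only. It is not taken from the linked repository; the function name and the idea of matching file names against a regular expression are assumptions for illustration.

```python
import os
import re


def find_files_recursive(root_dir, filename_pattern):
    """Return paths at any depth under root_dir whose file name matches the regex."""
    regex = re.compile(filename_pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if regex.fullmatch(name):  # the whole file name must match
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

A tailing loop would call this periodically to pick up files that appear in newly created subdirectories, then tail each returned path.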
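The official description of Taildir Source:
Flume supports many types of sources, including Avro, Thrift, Exec, JMS, Spooling Directory, Taildir, Kafka, NetCat, Sequence Generator, Syslog Sources, HTTP, Stress, Custom, and Scribe.
To have Flume read existing log files, you can use the following sources:
Taildir Source: watches the specified files and tails them in near real time once new lines appended to each file are detected.
Spooling Directory Source: monitors the configured directory for newly added files and reads the data out of them. Two things to note: a file copied into the spool directory must not be opened and edited again, and the spool directory must not contain subdirectories.
Exec Source: runs a Linux command that continuously outputs the latest data, for example tail -F followed by a file name.
Reference: Introduction to the Apache Flume log collection system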
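I have the same scenario, and on Windows at that. May I ask whether you managed to solve it?
Sorry to say, our company hardly uses Flume. I only know that it exists and haven't looked into it in any depth.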