We work with a lot of Java and Scala stacktraces, and the other options I've tried for supporting them in Heka don't work as well as I'd like. This is an implementation of a regex-based MultilineSplitter which works great for our stacktraces. You define one regex to use as the delimiter and a second regex that matches lines which should be joined together. The splitter first splits the buffer using the delimiter and then checks each section against the multiline regex. All contiguous sections that match the multiline regex are joined into a single record. Because of the multiline nature, it always keeps the delimiter at the end of each line.
This is going to be notably slower than the RegexSplitter for two reasons. First, it has to find many delimiter matches on the first pass rather than only the first one. (That's limited to 99 by default and is not currently configurable without a recompile.) Second, it runs a second regex on all of those matches. Given Go's performance-oriented regex implementation and reasonable logging levels, it appears to be tolerable. It can, in the worst case, re-split the first lines in a very large buffer repeatedly.
Here's an example configuration for the splitter:
type = "MultilineSplitter"
multiline = '(\] FATAL )|(\A\s*.+Exception: .)|(at \S+\(\S+\))|(\A\s+... \d+ more)|(\A\s*Caused by:.)|(\A\s*Grave:)'
delimiter = '\n'
Given a broken Kafka installation, this generates something like the following when encoded with the
"@message": "java.net.ConnectException: Connection refused\n\tat sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)\n\tat sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)\n\tat org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)\n\tat org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)\n",
Note that this output is from a splitter-enabled Docker input plugin that I will prepare a separate PR for.