All logs are not created equal Part 3

Author: 
Pramod Sridharamurthy
Thursday, October 19, 2017

Packaging

In the first two posts in this series, we looked at the variety of logs in both time series and non-time series formats. In this concluding post, I will focus on how these logs are typically packaged and delivered to an ingestion platform.

Logs can be packaged in several ways. Here are some examples:

Every event (stream)

Logs generated by a device can be streamed continuously to a target location as each event occurs. This approach is typically used for simple log formats (see the IoT and Syslog examples in my previous blog) and when real-time analytics is the primary use case.

Note: Non-time series data is not easy to package and transmit this way.


Log data sent as batch in a single file

Log data can be batched into a single file and sent periodically. This single file can contain both time series and non-time series data. This method is used when the data does not need to be sent in real time and the amount of data batched per file is not too large. It also assumes a simple device that logs to a single file. In a complex device, by contrast, different components log to different files. The size of these single-file logs varies from KBs to MBs.

Log data sent as a collection of files

Log data can be batched into multiple files (with each component of a complex device logging to its own files), and all these files can then be compressed into an archive and sent. Complex devices such as networking devices, storage devices and medical devices create such log packages. A package can contain anywhere from tens of files to tens of thousands of files. The size of such batched packages ranges from MBs to GBs.
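To make the "collection of files" packaging concrete, here is a minimal Python sketch that compresses a directory of per-component log files into one archive. The function name `build_log_bundle` and the directory layout are assumptions for illustration only; real devices use their own bundling tools and formats.

```python
import tarfile
from pathlib import Path


def build_log_bundle(log_dir, bundle_path):
    """Compress every file under log_dir into one .tar.gz bundle.

    Returns the member names in the order they were written.
    bundle_path should live outside log_dir so the archive does not
    try to include itself.
    """
    log_dir = Path(log_dir)
    members = []
    with tarfile.open(bundle_path, "w:gz") as archive:
        for path in sorted(log_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to log_dir so the bundle is relocatable.
                name = path.relative_to(log_dir).as_posix()
                archive.add(path, arcname=name)
                members.append(name)
    return members
```

Keeping the relative directory structure inside the archive lets the receiving side tell which component produced each file.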

Delivery mechanism

Logs that are packaged as described above can be transmitted in multiple ways.

Streaming (Real time)

Streaming over TCP or UDP is used as the delivery mechanism when every event needs to be delivered as it occurs. This method suits real-time analytics and simple log formats.
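As a rough illustration of per-event streaming, the following Python sketch sends each event as one UDP datagram. The JSON-line encoding, the function name `send_event` and the event fields are assumptions made for this example, not a format any particular device uses.

```python
import json
import socket


def send_event(sock, target, event):
    """Serialize one event as a JSON line and send it as a single UDP datagram.

    Returns the number of bytes in the encoded payload.
    """
    payload = (json.dumps(event) + "\n").encode("utf-8")
    sock.sendto(payload, target)
    return len(payload)
```

A device would typically create one socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and call `send_event` for every new event. UDP is fire-and-forget, so a TCP connection is the usual choice when events must not be lost.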

FTP/SFTP/SCP (Batched mode)

When sending batched data, either a single log file or a collection of files, FTP/SFTP/SCP is used. In this mechanism, the device transfers the log data to the remote target using one of these protocols. Secure options such as SFTP/SCP are typically preferred over FTP. This approach is typically used when the log size (either a single log or a collection of logs) runs from tens of MBs to GBs.

E-mail (Batched mode)

This is a very common way to send log data from devices in batched mode. The device is configured to send a single log or a collection of logs as an e-mail to the target location. A single log can be sent either as an attachment or as the body of the e-mail, whereas a collection of logs is always sent as an attachment.
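A minimal sketch of the attachment case, using Python's standard `email` package. The function name `build_log_email`, the addresses passed to it and the `application/gzip` content type are assumptions for illustration.

```python
from email.message import EmailMessage


def build_log_email(sender, recipient, subject, bundle_name, bundle_bytes):
    """Build an e-mail that carries a log bundle as a binary attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Short text body; the log data itself travels as the attachment.
    msg.set_content("Log bundle attached.")
    msg.add_attachment(
        bundle_bytes,
        maintype="application",
        subtype="gzip",
        filename=bundle_name,
    )
    return msg
```

The resulting message can be handed to `smtplib.SMTP.send_message` for delivery. For the single-log case, the log text could instead be passed to `set_content` so that it becomes the body.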

HTTPS POST

Batched log bundles can also be posted from the device using HTTPS. The advantage of this method is that delivery can be confirmed: the device receives a response for each POST and can retry on failure, whereas e-mail as a transport mechanism does not guarantee delivery.
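The following Python sketch shows one way a device could post a bundle and retry when no successful response comes back. The function name `post_bundle`, the retry count and the back-off policy are assumptions for illustration; the upload URL would be whatever endpoint the ingestion platform exposes.

```python
import time
import urllib.error
import urllib.request


def post_bundle(url, payload, retries=3, backoff_seconds=1.0):
    """POST payload (bytes) to url, retrying on failure.

    Returns the HTTP status code of the first successful response and
    raises the last error if every attempt fails.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    request = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    last_error = None
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                return response.status
        except OSError as error:
            # URLError and HTTPError are subclasses of OSError.
            last_error = error
            if attempt < retries - 1:
                # Wait a little longer before each new attempt.
                time.sleep(backoff_seconds * (attempt + 1))
    raise last_error
```

Because the sender sees the outcome of every attempt, it can keep the bundle on disk until the upload is acknowledged.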

Protocols like MQTT, AMQP etc

When you are dealing with simple, IoT-enabled devices or sensors, messaging protocols such as MQTT and AMQP are used to package and deliver the data over Wi-Fi, Bluetooth, etc.

Combination of the above

In some cases, the device might stream time-series data for real time analytics and send configuration and status information in batch mode.
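A combined setup like this can be sketched as a small router that forwards time-series events immediately and buffers everything else for a later batch upload. The class name `HybridShipper`, the `kind` field and the batch size are hypothetical choices made for this example.

```python
class HybridShipper:
    """Stream time-series events immediately; buffer everything else for a batch."""

    def __init__(self, stream_fn, batch_size=100):
        self.stream_fn = stream_fn    # callable that delivers one event right away
        self.batch_size = batch_size  # records per completed batch
        self.buffer = []              # records waiting for the next batch
        self.batches = []             # completed batches awaiting upload

    def handle(self, record):
        if record.get("kind") == "event":
            # Time-series event: deliver in real time.
            self.stream_fn(record)
        else:
            # Configuration or status data: hold for batch delivery.
            self.buffer.append(record)
            if len(self.buffer) >= self.batch_size:
                self.flush()

    def flush(self):
        """Close the current buffer into a batch, if it holds anything."""
        if self.buffer:
            self.batches.append(self.buffer)
            self.buffer = []
```

Here `stream_fn` stands in for any of the streaming mechanisms above, and the completed batches would be shipped by one of the batched mechanisms.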

So, now that you understand the various log formats, packaging mechanisms and delivery mechanisms, here is a real-life scenario for a manufacturer, say Hiwi Networks.

Real life scenario

Let’s take a vendor that manufactures networking devices. A typical network device manufacturer will have 4 – 5 product lines. Each product line might have 5 – 6 different product models, and over its many years of existence, there might be 30 – 50 different software versions in use by its customers in the field. While the collections of logs sent by the different product lines and models across versions might be similar, more often than not every combination of product line, model and software version will have some variation in its log formats compared to the others. Now, consider the case where the device streams critical events and sends other information in batched mode.

So, the challenge for the log analytics software is to handle and support:

  • Both time series and non-time series data
  • Streaming and batched mode data
  • Hundreds of complex formats in non-time series data (time series data is typically easy to handle)
  • All the variations for the 5 product lines x 5 product models x 50 software versions
  • Both real time and historical analytics use cases
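To put a number on the variations in the list above, this short Python sketch enumerates every product line, model and software version combination. The names are placeholders, and the result is an upper bound, since it assumes every version can run on every model.

```python
from itertools import product

# Counts taken from the scenario above; the names are illustrative placeholders.
product_lines = [f"line-{i}" for i in range(1, 6)]  # 5 product lines
models = [f"model-{i}" for i in range(1, 6)]        # 5 models per product line
versions = [f"v{i}" for i in range(1, 51)]          # 50 software versions

# Each tuple is one (product line, model, version) combination that may
# need its own log format handling.
combinations = list(product(product_lines, models, versions))
print(len(combinations))  # 5 * 5 * 50 = 1250
```

Even if only a fraction of these combinations differ in their log formats, the parsing layer still has to cope with hundreds of variants.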

At Glassbeam, this is the exciting challenge that we solve for our customers. Each of our customers’ use cases is as complex as the one described above. We really mean it when we say we ingest and parse logs from ‘Any device, Any format’!