In computer log management and intelligence, log analysis (or system and network log analysis) is an art and science seeking to make sense of computer-generated records (also called log or audit trail records). The process of creating such records is called data logging.
Typical reasons why people perform log analysis are:
- Compliance with security policies
- Compliance with audit or regulation
- System troubleshooting
- Forensics (during investigations or in response to a subpoena)
- Security incident response
- Understanding online user behavior
Logs are emitted by network devices, operating systems, applications and all manner of intelligent or programmable devices. A stream of messages in time sequence often comprises a log. Logs may be directed to files and stored on disk or directed as a network stream to a log collector.
Log messages must usually be interpreted concerning the internal state of its source (e.g., application) and announce security-relevant or operations-relevant events (e.g., a user login, or a systems error).
Logs are often created by software developers to aid in the debugging of the operation of an application or understanding how users are interacting with a system, such as a search engine. The syntax and semantics of data within log messages are usually application or vendor-specific. The terminology may also vary; for example, the authentication of a user to an application may be described as a log in, a logon, a user connection or an authentication event. Hence, log analysis must interpret messages within the context of an application, vendor, system or configuration to make useful comparisons to messages from different log sources.
Log message format or content may not always be fully documented. A task of the log analyst is to induce the system to emit the full range of messages to understand the complete domain from which the messages must be interpreted.
A log analyst may map varying terminology from different log sources into a uniform, normalized terminology so that reports and statistics can be derived from a heterogeneous environment. For example, log messages from Windows, Unix, network firewalls, and databases may be aggregated into a "normalized" report for the auditor. Different systems may signal different message priorities with a different vocabulary, such as "error" and "warning" vs. "err", "warn", and "critical".
Hence, log analysis practices exist on the continuum from text retrieval to reverse engineering of software.
Functions and technologies
Pattern recognition is a function of selecting incoming messages and compare with a pattern book to filter or handle different ways.
Normalization is the function of converting message parts to the same format (e.g. common date format or normalized IP address).
Classification and tagging is ordering messages into different classes or tagging them with different keywords for later usage (e.g. filtering or display).
Correlation analysis is a technology of collecting messages from different systems and finding all the messages belonging to one single event (e.g., messages generated by malicious activity on different systems: network devices, firewalls, servers, etc.). It is usually connected with alerting systems.
Artificial Ignorance is a type of machine learning that is a process of discarding log entries that are known to be uninteresting. Artificial ignorance is a method to detect anomalies in a working system. In log analysis, this means recognizing and ignoring the regular, common log messages that result from the normal operation of the system, and therefore are not too interesting. However, new messages that have not appeared in the logs before can signal important events, and should be therefore investigated.[1][2] In addition to anomalies, the algorithm will identify common events that did not occur. For example, a system update that runs every week, has failed to run.
Log Analysis is often compared to other analytics tools such as application performance management (APM) and error monitoring. While much of their functionality is clear overlap, the difference is rooted in process. APM has an emphasis on performance and is utilized most in production. Error monitoring is driven by developers versus operations, and integrates into code in exception handling blocks.
See also
References
- ↑ "artificial ignorance: how-to guide". www.ranum.com.
- ↑ "Log message classification with syslog-ng [LWN.net]". lwn.net.