The evolution of technologies used to detect malicious code
Written by renxue February 25, 2008 09:54
Although the title seems to reference the full spectrum of technologies used to detect malicious code, the article focuses on non-signature technologies.
At the beginning of the article, the author points out that any technology used to detect malicious code has two components: a technical component and an analytical component. The technical component is the sum of all functions and algorithms which provide the analytical component with data for analysis. The analytical component is a decision-making system which delivers a verdict on the data analysed.
The technical component. The technical component of a malware detection system collects data that will be used to analyze the situation.
As any malicious program is both a file with specific content and the sum of the effects the malicious program has on the operating system, there is a range of methods used to collect data in order to identify malicious code. These methods are listed in order of abstraction. The term abstraction is used to denote the point of view from which the program being run is viewed: as an original digital object (a collection of bytes), as a behaviour (more abstract than the collection of bytes) or as the sum of effects on the operating system (more abstract than the behaviour). Antivirus technology has, more or less, evolved along these lines: working with files, working with events via a file, working with a file via events, and working with the environment itself. Consequently, the list given in the article illustrates a natural chronology.
- The very first antivirus programs analyzed file code which was treated as byte sequences.
Using this method means that only the source byte code of a program is analyzed; program behaviour is not taken into account. Today, this method continues to be used in antivirus software - not as the sole detection method, but as a complement to other technologies.
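Byte-sequence analysis can be illustrated with a minimal sketch. The signature names and byte patterns below are invented for illustration and do not come from the article or from any real antivirus database; the function name `scan_bytes` is likewise hypothetical.

```python
# Hypothetical signature database: detection name -> byte sequence.
# Both entries are made up for this example.
SIGNATURES = {
    "Demo.A": bytes.fromhex("deadbeef"),
    "Demo.B": b"EVIL_MARKER",
}


def scan_bytes(data: bytes) -> list:
    """Return the names of all signatures whose byte sequence occurs in data.

    Only the raw content is inspected; nothing is executed, so the
    program's behaviour plays no part in the verdict.
    """
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


clean = b"MZ\x90\x00 ordinary program bytes"
infected = b"MZ\x90\x00" + bytes.fromhex("deadbeef") + b" payload"

print(scan_bytes(clean))     # []
print(scan_bytes(infected))  # ['Demo.A']
```

Because the check is a plain substring search, changing a single byte of the matched sequence is enough to evade it, which is one reason the method is used alongside other technologies rather than alone.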
- Emulating program code.
Emulation involves imitating the work of one system using another system without losing functionality and without distorting results. In relation to antivirus software, the emulator breaks down a program's byte code into commands, and then launches each command in a virtual copy of the computer environment. In other words, although an emulator works with a file, what it actually analyzes is events. Emulation makes it possible to observe a program's behaviour without putting the operating system and user data at risk.
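The idea of breaking byte code into commands and running each one in a virtual copy of the environment can be sketched with a toy emulator. The two-byte instruction set, the opcodes and the `VirtualMachine` class are all assumptions made for this example; a real emulator models an actual processor and operating system.

```python
from dataclasses import dataclass, field

# Hypothetical instruction set: every command is two bytes (opcode, operand).
OP_NOP, OP_WRITE_FILE, OP_DELETE_FILE, OP_HALT = 0x00, 0x01, 0x02, 0xFF


@dataclass
class VirtualMachine:
    """Virtual copy of the environment; the real system is never touched."""
    files: set = field(default_factory=lambda: {1, 2, 3})  # virtual file ids
    events: list = field(default_factory=list)             # observed behaviour


def emulate(code: bytes, max_steps: int = 1000) -> VirtualMachine:
    """Decode the byte code command by command and apply each to the VM."""
    vm = VirtualMachine()
    pc = 0
    for _ in range(max_steps):          # step limit guards against endless code
        if pc + 1 >= len(code):
            break
        opcode, operand = code[pc], code[pc + 1]
        pc += 2
        if opcode == OP_HALT:
            break
        if opcode == OP_WRITE_FILE:
            vm.files.add(operand)
            vm.events.append(("write", operand))
        elif opcode == OP_DELETE_FILE:
            vm.files.discard(operand)
            vm.events.append(("delete", operand))
        # OP_NOP and unknown opcodes produce no event in this sketch.
    return vm


program = bytes([0x01, 9, 0x02, 1, 0x02, 2, 0xFF, 0])
vm = emulate(program)
print(vm.events)         # [('write', 9), ('delete', 1), ('delete', 2)]
print(sorted(vm.files))  # [3, 9]
```

The input is a file, but the output is a list of events, which is the sense in which an emulator works with events via a file.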
- Virtualization: launching a program in a sandbox.
A sandbox is an environment which uses partial or total restrictions or emulation of the resources of the operating system to ensure that a program can be safely launched within it. In this case, virtualization makes it possible to work with a program that is running in a real environment, but the environment is strictly controlled. Using the metaphor of a child in the playground, the operating system represents the world, the malicious program is the child, and the constraints within which the child plays are the confines of the sandbox: a set of rules for interaction between the program and the operating system. Any point of contact between the program and its environment (such as the file system and system registry) can be virtualized. Whereas emulation provides an environment in which programs can be run, virtualization uses the operating system itself as the environment, with the sandbox controlling the interaction between the environment and the program.
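A sandbox as "a set of rules for interaction between the program and the operating system" can be sketched as a layer that sits between the program and a file store. Everything here is an assumption for illustration: the `Sandbox` class, the in-memory dictionary standing in for the file system, and the choice to redirect writes to an overlay.

```python
class Sandbox:
    """Mediates file access for one program.

    Reads fall through to the real store; writes are redirected to a
    private overlay, and writes to protected locations are refused.
    """

    def __init__(self, real_files: dict, protected_prefixes: tuple):
        self._real = real_files              # stand-in for the real file system
        self._overlay = {}                   # virtualized writes land here
        self._protected = protected_prefixes
        self.denied = []                     # refused write attempts

    def read(self, path: str):
        if path in self._overlay:
            return self._overlay[path]
        return self._real.get(path)

    def write(self, path: str, data: str) -> bool:
        if path.startswith(self._protected):  # startswith accepts a tuple
            self.denied.append(path)
            return False
        self._overlay[path] = data            # the real store stays untouched
        return True


real = {"/etc/hosts": "127.0.0.1 localhost"}
box = Sandbox(real, protected_prefixes=("/etc/", "/boot/"))

print(box.write("/tmp/note.txt", "hello"))  # True
print(box.write("/etc/hosts", "tampered"))  # False
print(box.read("/etc/hosts"))               # 127.0.0.1 localhost
print(box.denied)                           # ['/etc/hosts']
```

Unlike the emulator, nothing here interprets the program's instructions; the program is assumed to run natively and only its points of contact with the environment are controlled.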
- Monitoring system events.
Whereas an emulator or sandbox observes each program separately, monitoring technology observes all programs simultaneously by registering all operating system events created by running programs. This technology is currently the most rapidly evolving. However, it is not the most fail-safe technology, as the risk created when launching a program in a real environment significantly lowers the level of protection. Additionally, the monitoring technology can be deceived by the malicious program.
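Event monitoring can be sketched as a single observer that receives a mixed stream of events from every running process. The event names, the `EventMonitor` class and the rule "flag a process once it has produced all three events" are hypothetical choices for this example, not a description of any real product.

```python
from collections import defaultdict

# Hypothetical set of events that is treated as suspicious in combination.
SUSPICIOUS_EVENTS = frozenset(
    {"create_autorun_key", "write_system_dir", "open_network_socket"}
)


class EventMonitor:
    """Registers events from all processes and tracks them per process id."""

    def __init__(self):
        self._seen = defaultdict(set)

    def record(self, pid: int, event: str) -> bool:
        """Register one event; return True once pid has produced every
        event in SUSPICIOUS_EVENTS (in any order)."""
        self._seen[pid].add(event)
        return SUSPICIOUS_EVENTS <= self._seen[pid]


# Interleaved events from two processes, as a monitor would receive them.
stream = [
    (100, "open_file"),
    (200, "create_autorun_key"),
    (100, "open_network_socket"),
    (200, "write_system_dir"),
    (200, "open_network_socket"),
]

monitor = EventMonitor()
flagged = [pid for pid, event in stream if monitor.record(pid, event)]
print(flagged)  # [200]
```

By the time process 200 is flagged, its first two actions have already taken place on the real system, which illustrates why this approach is described as less fail-safe than emulation or sandboxing.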
- Searching for system anomalies.
This method makes use of the following features:
- an operating system, together with the programs running within that system, is an integrated system;
- the operating system has an intrinsic "system status";
- if malicious code is run in the environment, then the system will have an "unhealthy" status; this differs from a system with a "healthy" status, in which there is no malicious code.
In order to detect malicious code effectively using this method, a relatively complex analytical system (such as an expert system or neural network) is required. Due to this complexity, the technology is still underdeveloped. At the moment, implementations in this area generally compare the condition of the system with a known standard, but this approach is not effective.
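Comparing the condition of the system with a known standard can be sketched as a snapshot-and-diff routine. The dictionary standing in for system state, the item names and the functions `snapshot` and `diff_status` are assumptions made for this example.

```python
import hashlib


def snapshot(state: dict) -> dict:
    """Reduce each item of system state to a SHA-256 digest of its content."""
    return {name: hashlib.sha256(content).hexdigest()
            for name, content in state.items()}


def diff_status(baseline: dict, current: dict) -> dict:
    """List what was added, removed or changed relative to the baseline."""
    common = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(k for k in common if baseline[k] != current[k]),
    }


# Hypothetical "healthy" state recorded earlier, and the state seen now.
healthy = snapshot({
    "hosts": b"127.0.0.1 localhost",
    "run_key": b"updater.exe",
})
now = snapshot({
    "hosts": b"127.0.0.1 localhost\n6.6.6.6 bank.example",
    "run_key": b"updater.exe",
    "svc_x": b"dropper.exe",
})

print(diff_status(healthy, now))
# {'added': ['svc_x'], 'removed': [], 'changed': ['hosts']}
```

The diff reports every deviation without judging it: a legitimate software update would produce the same kind of output as an infection. Deciding which deviations make the status "unhealthy" is the part that calls for the more complex analytical system mentioned above.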
The analytical component. As for the analytical component, the sophistication of decision-making algorithms varies. Roughly speaking, they can be divided into three categories:
- Simple comparison.
In this category, a verdict is issued based on the comparison of a single object with an available sample.
- Complex comparison.
In this case, a verdict is issued based on the comparison of one or several objects with corresponding samples. The templates used for comparison may be flexible, and the comparison gives a probability-based result.
- Expert systems.
In this category, a verdict is issued after a sophisticated analysis of data. An expert system may include elements of artificial intelligence.
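The three categories can be contrasted in one short sketch. The function names, the feature labels, the integer rule weights and the threshold are all hypothetical; in particular, the weighted-rule scorer is only one simple way to approximate what an expert system does.

```python
def simple_comparison(obj: bytes, sample: bytes) -> bool:
    """Simple comparison: one object against one sample, yes or no."""
    return obj == sample


def complex_comparison(observed: set, template: set) -> float:
    """Complex comparison: fraction of a flexible template's features that
    were observed, giving a score between 0.0 and 1.0."""
    if not template:
        return 0.0
    return len(observed & template) / len(template)


def expert_verdict(facts: set, rules: list, threshold: int) -> str:
    """Expert-system style: each rule is (required_facts, weight); weights of
    all rules whose required facts are present are summed and the total is
    compared with a threshold."""
    score = sum(weight for required, weight in rules if required <= facts)
    return "malicious" if score >= threshold else "clean"


print(simple_comparison(b"\xde\xad", b"\xde\xad"))                 # True
print(complex_comparison({"a", "b", "x"}, {"a", "b", "c", "d"}))  # 0.5

RULES = [
    ({"writes_autorun"}, 4),
    ({"writes_autorun", "hides_process"}, 5),
    ({"signed_binary"}, -3),   # evidence that lowers suspicion
]
print(expert_verdict({"writes_autorun", "hides_process"}, RULES, 7))
# malicious
print(expert_verdict({"writes_autorun", "hides_process", "signed_binary"},
                     RULES, 7))
# clean
```

Moving down the list, the verdict depends on more data and more tunable parameters, which is where both the extra detection power and the extra room for false positives come from.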
The article then goes on to examine exactly which algorithms are used in which malware detection technologies. The technical component of a technology is responsible for features such as how resource-hungry a program is (and as a result, how quickly it works), security and protection. In general, the less abstract the form of protection, the more secure it will be, but the easier it will be to circumvent.
The analytical aspect of a technology is responsible for features such as proactivity (and the consequent impact on the necessity for frequent antivirus database updates), the false positive rate and the level of user involvement. This last denotes the extent to which a user needs to participate in defining protection policies: creating rules, exceptions and black and white lists. It also reflects the extent to which the user participates in the process of issuing verdicts by confirming or rejecting the suspicions of the analytical system. The more complex the analytical system, the more powerful the protection is. However, increased complexity means an increased number of false positives, which can be compensated for by greater user input.
The author concludes by offering recommendations on how to choose non-signature protection. She stresses that there is no universal or 'best' protection; each technology has its pluses and minuses. In choosing a product, the user should be guided by the results of independent tests and by reviews from users of established antivirus solutions.
