Decompiler

Program translating executable to source code


title: "Decompiler" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["compilers", "decompilers", "utility-software-types", "reverse-engineering"] description: "Program translating executable to source code" topic_path: "technology/computing" source: "https://en.wikipedia.org/wiki/Decompiler" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0

::summary Program translating executable to source code ::

Type analysis

A good machine code decompiler will perform type analysis. Here, the way registers or memory locations are used results in constraints on the possible type of the location. For example, an AND instruction implies that the operand is an integer; programs do not use such an operation on floating point values (except in special library code) or on pointers. An ADD instruction results in three constraints, since the operands may be both integer, or one integer and one pointer (with integer and pointer results respectively; the third constraint comes from the ordering of the two operands when the types are different).
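The constraints described above can be sketched as operations on sets of candidate types. This is a minimal illustration and not taken from any particular decompiler: the names T_INT, T_PTR, T_FLOAT, constrain_and and constrain_add are hypothetical, and a real type analysis would also track sizes, signedness and many more instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate types for a register or memory location, kept as a bit set.
   All names here are hypothetical and chosen for illustration only. */
enum { T_INT = 1, T_PTR = 2, T_FLOAT = 4, T_ANY = T_INT | T_PTR | T_FLOAT };

/* One allowed typing of "dst = a + b". */
struct add_rule { unsigned a, b, dst; };

static const struct add_rule ADD_RULES[] = {
    { T_INT, T_INT, T_INT },   /* integer + integer -> integer */
    { T_PTR, T_INT, T_PTR },   /* pointer + integer -> pointer */
    { T_INT, T_PTR, T_PTR },   /* integer + pointer -> pointer */
};

/* Bitwise and: both operands and the result must be integers. */
void constrain_and(unsigned *a, unsigned *b, unsigned *dst)
{
    assert(a && b && dst);
    *a &= T_INT;
    *b &= T_INT;
    *dst &= T_INT;
}

/* Add: keep only the types that appear in some rule that is still
   compatible with the current candidate sets of all three locations. */
void constrain_add(unsigned *a, unsigned *b, unsigned *dst)
{
    unsigned na = 0, nb = 0, nd = 0;
    assert(a && b && dst);
    for (size_t i = 0; i < sizeof ADD_RULES / sizeof ADD_RULES[0]; i++) {
        const struct add_rule *r = &ADD_RULES[i];
        if ((*a & r->a) && (*b & r->b) && (*dst & r->dst)) {
            na |= r->a;
            nb |= r->b;
            nd |= r->dst;
        }
    }
    *a = na;
    *b = nb;
    *dst = nd;
}
```

For instance, if another instruction has already shown that the first operand is used as an address, calling constrain_add with that operand narrowed to T_PTR leaves the second operand as T_INT and the result as T_PTR. With no prior knowledge, the call only rules out T_FLOAT for all three locations.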

Various high level expressions can be recognized which trigger recognition of structures or arrays. However, it is difficult to distinguish many of the possibilities, because of the freedom that machine code or even some high level languages such as C allow with casts and pointer arithmetic.

The example from the previous section could result in the following high level code:

::code[lang=c]
struct T1 {
    int v0004;
    int v0008;
    int v000C;
};
struct T1 *ebx;
ebx->v000C -= ebx->v0004 + ebx->v0008;
::
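The difficulty caused by casts and pointer arithmetic can be seen in a small sketch. The three hypothetical functions below read the same word in three different source-level ways. Assuming a 4-byte int and no padding between the fields, a compiler would typically produce the same load at offset 8 for each of them, so the machine code alone does not reveal whether the original was a structure field, an array element, or raw pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration: the same three words viewed
   either as a structure or as an array. */
struct T { int f0; int f4; int f8; };
union view { struct T s; int a[3]; };

/* Structure field access. */
int read_as_field(const union view *v)
{
    return v->s.f8;
}

/* Array element access; assumes the struct has no padding. */
int read_as_array(const union view *v)
{
    assert(offsetof(struct T, f8) == 2 * sizeof(int));
    return v->a[2];
}

/* Raw pointer arithmetic on a byte offset, followed by a cast. */
int read_as_raw_offset(const union view *v)
{
    return *(const int *)((const char *)v + 2 * sizeof(int));
}
```

All three functions return the same value for the same object, which is why a decompiler usually needs further evidence, such as how neighbouring offsets are used, before choosing one representation.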

Structuring

The penultimate decompilation phase involves structuring of the IR into higher level constructs such as while loops and if/then/else conditional statements. For example, the machine code

::code[lang=asm]
    xor eax, eax
l0002:
    or ebx, ebx
    jge l0003
    add eax,[ebx]
    mov ebx,[ebx+0x4]
    jmp l0002
l0003:
    mov [0x10040000],eax
::

could be translated into:

::code[lang=c]
eax = 0;
while (ebx < 0) {
    eax += ebx->v0000;
    ebx = ebx->v0004;
}
v10040000 = eax;
::

Unstructured code is more difficult to translate into structured code than already structured code. Solutions include replicating some code, or adding Boolean variables.
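The Boolean-variable approach can be illustrated with a loop that has two exits leading to different code. The sketch below is a hypothetical example, not the output of a real decompiler: find_goto mirrors the jump-based control flow, and find_flag expresses the same logic with a single-exit while loop and an added Boolean variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Control flow as it might appear after a direct translation of jumps:
   the loop has two exits that lead to different code. */
int find_goto(const int *a, int n, int key)
{
    int i = 0;
    assert(n >= 0);
loop:
    if (i >= n) goto not_found;
    if (a[i] == key) goto found;
    i++;
    goto loop;
found:
    return i;
not_found:
    return -1;
}

/* The same logic with one loop exit; the added Boolean variable records
   which of the two original exits was taken. */
int find_flag(const int *a, int n, int key)
{
    int i = 0;
    bool found = false;
    assert(n >= 0);
    while (i < n && !found) {
        if (a[i] == key)
            found = true;
        else
            i++;
    }
    return found ? i : -1;
}
```

A back end for a language with break or early return would not need the flag in a case this simple; the technique matters more when the target language or the shape of the control flow graph rules those constructs out.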

Code generation

The final phase is the generation of the high level code in the back end of the decompiler. Just as a compiler may have several back ends for generating machine code for different architectures, a decompiler may have several back ends for generating high level code in different high level languages.
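One way to picture several back ends sharing one IR is a table of emit functions per target language. The following sketch assumes a deliberately tiny IR with a single statement form; the names ir_assign, backend, BACKEND_C and BACKEND_PASCAL are hypothetical and do not correspond to any existing decompiler.

```c
#include <assert.h>
#include <stdio.h>

/* A deliberately tiny IR node: dst = lhs <op> rhs. */
struct ir_assign {
    const char *dst;
    const char *lhs;
    char op;            /* '+', '-', ... */
    const char *rhs;
};

/* A back end is a set of emit functions; one is enough for this sketch. */
struct backend {
    const char *name;
    int (*emit_assign)(char *buf, size_t len, const struct ir_assign *s);
};

/* C-style output: "dst = lhs op rhs;" */
static int emit_assign_c(char *buf, size_t len, const struct ir_assign *s)
{
    assert(buf && s);
    return snprintf(buf, len, "%s = %s %c %s;", s->dst, s->lhs, s->op, s->rhs);
}

/* Pascal-style output: "dst := lhs op rhs;" */
static int emit_assign_pascal(char *buf, size_t len, const struct ir_assign *s)
{
    assert(buf && s);
    return snprintf(buf, len, "%s := %s %c %s;", s->dst, s->lhs, s->op, s->rhs);
}

const struct backend BACKEND_C = { "C", emit_assign_c };
const struct backend BACKEND_PASCAL = { "Pascal", emit_assign_pascal };
```

The front end and the analysis phases would only ever see ir_assign; choosing BACKEND_C or BACKEND_PASCAL at the last step changes the surface syntax of the output without touching the earlier phases.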

Just before code generation, it may be desirable to allow interactive editing of the IR, perhaps using some form of graphical user interface. This would allow the user to enter comments and non-generic variable and function names. However, these are almost as easily entered in a post-decompilation edit. The user may want to change structural aspects, such as converting a while loop to a for loop. These are less readily modified with a simple text editor, although source code refactoring tools may assist with this process. The user may need to enter information that was not identified during the type analysis phase, e.g. modifying a memory expression to an array or structure expression. Finally, incorrect IR may need to be corrected, or changes made to make the output code more readable.

Other techniques

Decompilers using neural networks have been developed. Such a decompiler may be trained by machine learning to improve its accuracy over time.

Legality

The majority of computer programs are covered by copyright laws. Although the precise scope of what is covered by copyright differs from region to region, copyright law generally provides the author (the programmer(s) or employer) with a collection of exclusive rights to the program. These rights include the right to make copies, including copies made into the computer’s RAM (unless creating such a copy is essential for using the program). Since the decompilation process involves making multiple such copies, it is generally prohibited without the authorization of the copyright holder. However, because decompilation is often a necessary step in achieving software interoperability, copyright laws in both the United States and Europe permit decompilation to a limited extent.

In the United States, the copyright fair use defence has been successfully invoked in decompilation cases. For example, in Sega v. Accolade, the court held that Accolade could lawfully engage in decompilation in order to circumvent the software locking mechanism used by Sega's game consoles. Additionally, the Digital Millennium Copyright Act (Public Law 105–304) provides exemptions for both Security Testing and Evaluation in §1201(i), and Reverse Engineering in §1201(f).

In Europe, the 1991 Software Directive explicitly provides for a right to decompile in order to achieve interoperability. The result of a heated debate between, on the one side, software protectionists, and, on the other, academics as well as independent software developers, Article 6 permits decompilation only if a number of conditions are met:

  • First, a person or entity must have a license to use the program to be decompiled.
  • Second, decompilation must be necessary to achieve interoperability with the target program or other programs. Interoperability information should therefore not be readily available, such as through manuals or API documentation. This is an important limitation: the necessity must be proven by the decompiler, and its purpose is primarily to provide an incentive for developers to document and disclose their products' interoperability information.
  • Third, the decompilation process must, if possible, be confined to the parts of the target program relevant to interoperability. Since one of the purposes of decompilation is to gain an understanding of the program structure, this third limitation may be difficult to meet. Again, the burden of proof is on the decompiler.

In addition, Article 6 prescribes that the information obtained through decompilation may not be used for other purposes and that it may not be given to others.

Overall, the decompilation right provided by Article 6 codifies what is claimed to be common practice in the software industry. Few European lawsuits are known to have emerged from the decompilation right. This could be interpreted as meaning one of three things:

  1. the decompilation right is not used frequently and the decompilation right may therefore have been unnecessary,
  2. the decompilation right functions well and provides sufficient legal certainty not to give rise to legal disputes, or
  3. illegal decompilation goes largely undetected.

In a report of 2000 regarding implementation of the Software Directive by the European member states, the European Commission seemed to support the second interpretation.


::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::
