Comm

Shell command for comparing files


title: "Comm" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["free-file-comparison-tools", "standard-unix-programs", "unix-sus2008-utilities", "plan-9-commands", "inferno-(operating-system)-commands"] description: "Shell command for comparing files" topic_path: "technology/operating-systems" source: "https://en.wikipedia.org/wiki/Comm" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0

::summary Shell command for comparing files ::

::data[format=table title="Infobox software"]

| Field | Value |
|---|---|
| name | comm |
| screenshot | Comm-example.png |
| caption | Example usage of comm command |
| author | Lee E. McMahon |
| developer | AT&T Bell Laboratories, Richard Stallman, David MacKenzie |
| released | |
| programming language | C |
| operating system | Unix, Unix-like, Plan 9, Inferno |
| platform | Cross-platform |
| genre | Command |
| license | coreutils: GPLv3+; Plan 9: MIT License |
::

**comm** is a shell command for comparing two files for common and distinct lines. It reads the files as lines of text and outputs text as three columns. The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. Columns are typically separated with the tab character. If the input text contains lines beginning with the separator character, the output columns can become ambiguous.

For efficiency, standard implementations of comm expect both input files to be sorted lexically in the same collation order. The [sort](sort-unix) command can be used for this purpose. The algorithm makes use of the collating sequence of the current locale. If the lines in the files are not both collated in accordance with the current locale, the result is undefined.
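The sorting requirement can be illustrated with a minimal sketch. The file names and sample data are hypothetical, and `LC_ALL=C` is an assumption chosen here only so that sort and comm agree on one collating sequence; any locale works as long as both commands use the same one.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)

# Two unsorted inputs (hypothetical sample data).
printf '%s\n' eggplant apple banana > "$dir/foo"
printf '%s\n' zucchini banana apple banana > "$dir/bar"

# Sort both files under the same locale that comm will run in.
LC_ALL=C sort "$dir/foo" > "$dir/foo.sorted"
LC_ALL=C sort "$dir/bar" > "$dir/bar.sorted"

# Compare the sorted copies and keep the three-column output.
result=$(LC_ALL=C comm "$dir/foo.sorted" "$dir/bar.sorted")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$dir"
```

Running comm on the unsorted originals instead would give an undefined result, so the sort step is not optional.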

The command is specified in the POSIX standard. It has been widely available on Unix-like operating systems since the mid to late 1980s. Originally implemented by Lee E. McMahon, the command first appeared in Version 4 Unix (M. D. McIlroy, 1987, "A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986", CSTR 139, Bell Labs, https://www.cs.dartmouth.edu/~doug/reader.pdf). The version in GNU coreutils was written by Richard Stallman and David MacKenzie.

Example

::code[lang=console]
$ cat foo
apple
banana
eggplant
$ cat bar
apple
banana
banana
zucchini
$ comm foo bar
		apple
		banana
	banana
eggplant
	zucchini
::

This shows that both files have one banana, but only bar has a second banana.

In more detail, the output has the appearance that follows. The column a line belongs to is determined by its number of leading tab characters. \t represents a tab character and \n represents a newline (see Escape character, "Programming and data formats").

::data[format=table]

012345678901234
\t\tapple\n
\t\tbanana\n
\tbanana\n
eggplant\n
\tzucchini\n
::
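The leading-tab rule shown in the table can be sketched as a small shell function. The name `classify` and the labels it prints are hypothetical, chosen only for illustration. As noted earlier, the result is ambiguous if the input lines themselves begin with a tab.

```shell
# A literal tab character for use in patterns.
tab=$(printf '\t')

# classify LINE: report which comm column LINE belongs to,
# judged only by the number of leading tabs.
classify() {
  case $1 in
    "$tab$tab"*) echo "both" ;;         # two tabs: third column
    "$tab"*)     echo "second only" ;;  # one tab: second column
    *)           echo "first only" ;;   # no tab: first column
  esac
}

# Classify three sample output lines, one per column.
printf '\t\tapple\n\tbanana\neggplant\n' |
while IFS= read -r line; do
  classify "$line"
done
```

The two-tab pattern is tested first because a line with two leading tabs also matches the one-tab pattern.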

Limits

Up to a full line must be buffered from each input file during line comparison, before the next output line is written.

Some implementations read lines with a function that does not impose any line length limit, provided system memory suffices.

Other implementations read lines with the function [fgets](fgets)(). This function requires a fixed buffer. For these implementations, the buffer is often sized according to a POSIX-defined limit macro such as LINE_MAX.

Comparison to diff

Although also a file comparison command, [diff](diff) reports significantly different information than comm. In general, diff is more powerful than comm. The simpler comm is best suited for use in scripts.

The primary distinction between comm and diff is that comm discards information about the order of the lines prior to sorting.
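The loss of order information can be illustrated with a short sketch, assuming two hypothetical files that hold the same lines in a different order. `LC_ALL=C` is again an assumption made only to keep sort and comm on the same collating sequence.

```shell
dir=$(mktemp -d)

# Same lines, different order (hypothetical sample data).
printf '%s\n' banana apple > "$dir/a"
printf '%s\n' apple banana > "$dir/b"

# diff compares line by line in file order, so it reports a difference.
order_differs=no
diff "$dir/a" "$dir/b" > /dev/null || order_differs=yes

# comm needs sorted input; once sorted, the original order is gone
# and every line lands in the "common" column.
LC_ALL=C sort "$dir/a" > "$dir/a.sorted"
LC_ALL=C sort "$dir/b" > "$dir/b.sorted"
out=$(LC_ALL=C comm "$dir/a.sorted" "$dir/b.sorted")

echo "diff sees a difference: $order_differs"
printf '%s\n' "$out"

rm -r "$dir"
```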

A minor difference between comm and diff is that comm will not try to indicate that a line has changed between the two files; each line is shown in the "from file #1", "from file #2", or "in both" column. This can be useful if one wishes two lines to be considered different even if they only have subtle differences.

Unlike for diff, the exit code of comm does not indicate whether the files match. As is typical, 0 indicates success, and other positive values indicate an error.
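The following sketch contrasts the two exit codes on a pair of hypothetical, already sorted files. It also shows one possible way for a script to test for a match with comm: the `-3` option, which POSIX defines to suppress the column of common lines but which is not described above, leaves no output when the sorted files are identical.

```shell
dir=$(mktemp -d)

# Two sorted files that differ in one line (hypothetical sample data).
printf '%s\n' apple banana > "$dir/a"
printf '%s\n' apple cherry > "$dir/b"

# comm exits with 0 even though the files differ.
comm_status=0
LC_ALL=C comm "$dir/a" "$dir/b" > /dev/null || comm_status=$?

# diff exits with 1 when the files differ.
diff_status=0
diff "$dir/a" "$dir/b" > /dev/null || diff_status=$?

# To detect a match with comm, inspect its output rather than its status.
if [ -z "$(LC_ALL=C comm -3 "$dir/a" "$dir/b")" ]; then
  same=yes
else
  same=no
fi

echo "comm=$comm_status diff=$diff_status same=$same"

rm -r "$dir"
```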

References


  1. "Comm(1): Compare two sorted files line by line - Linux man page".

::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::
