Nearest centroid classifier

Classification model in machine learning


title: "Nearest centroid classifier"
type: doc
version: 1
created: 2026-02-28
author: "Wikipedia contributors"
status: active
scope: public
tags: ["classification-algorithms"]
description: "Classification model in machine learning"
topic_path: "technology/algorithms"
source: "https://en.wikipedia.org/wiki/Nearest_centroid_classifier"
license: "CC BY-SA 4.0"
wikipedia_page_id: 0
wikipedia_revision_id: 0

::summary Classification model in machine learning ::

::figure[src="https://upload.wikimedia.org/wikipedia/commons/e/ef/Rocchioclassgraph.jpg" caption="Rocchio Classification"] ::

In machine learning, a nearest centroid classifier (or nearest prototype classifier) is a classification model that assigns each observation the label of the class whose training-sample mean (centroid) is closest to that observation. When applied to text classification, with documents represented as word vectors of tf-idf weights, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback. [Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press. http://nlp.stanford.edu/IR-book/html/htmledition/rocchio-classification-1.html]

An extended version of the nearest centroid classifier, based on shrunken centroids, has found applications in the medical domain, specifically in the classification of tumors from gene expression data. [Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences. 99 (10): 6567–6572. doi:10.1073/pnas.082099299. PMID 12011421. PMC 124443. Bibcode 2002PNAS...99.6567T]

Algorithm

Training

Given labeled training samples \textstyle{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)} with class labels y_i \in \mathbf{Y}, compute the per-class centroids \textstyle\vec{\mu}_\ell = \frac{1}{|C_\ell|}\underset{i \in C_\ell}{\sum} \vec{x}_i where C_\ell is the set of indices of samples belonging to class \ell \in \mathbf{Y}.
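
The training step can be sketched in a few lines of plain Python. This is an illustrative sketch, not a reference implementation: the function name `fit_centroids` and the toy data are hypothetical, samples are assumed to be equal-length lists of numbers, and every class is assumed to have at least one sample.

```python
from collections import defaultdict


def fit_centroids(samples, labels):
    """Return {label: centroid}, where each centroid is the per-class mean vector.

    Hypothetical helper for illustration; assumes all samples have the same length.
    """
    sums = {}                   # running per-class coordinate sums
    counts = defaultdict(int)   # |C_l|: number of samples seen per class
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for j, value in enumerate(x):
            sums[y][j] += value
        counts[y] += 1
    # Divide each coordinate sum by the class size to obtain the mean.
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


# Toy data (made up for this example): two samples of class "a", one of class "b".
X = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
y = ["a", "a", "b"]
centroids = fit_centroids(X, y)
```

Training makes a single pass over the data and stores only one vector per class, so the fitted model has size proportional to the number of classes times the number of features.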

Prediction

The class assigned to an observation \vec{x} is \hat{y} = {\arg\min}_{\ell \in \mathbf{Y}} \|\vec{\mu}_\ell - \vec{x}\|.
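
The prediction rule can be sketched as follows. This sketch makes two assumptions: the text does not name the norm, so Euclidean distance is used here, and ties are broken by dictionary order. The function name `predict` and the example centroids are hypothetical.

```python
import math


def predict(centroids, x):
    """Return the label whose centroid is nearest to x.

    Assumption: distance is Euclidean (math.dist, Python 3.8+).
    On a tie, the first label in the dictionary's iteration order wins.
    """
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Example centroids (made up for this example), keyed by class label.
centroids = {"a": [2.0, 3.0], "b": [10.0, 10.0]}
label_near_a = predict(centroids, [1.0, 1.0])
label_near_b = predict(centroids, [9.0, 8.0])
```

Here `[1.0, 1.0]` lies closer to the centroid of class "a" and `[9.0, 8.0]` closer to that of class "b", so the two calls return "a" and "b" respectively. Each prediction compares the observation against one centroid per class, so its cost does not grow with the number of training samples.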

References

::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::
