Softplus

Smoothed ramp function
title: "Softplus" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["artificial-neural-networks", "computational-neuroscience", "entropy-and-information", "exponentials", "functions-and-mappings", "logistic-regression", "loss-functions"] description: "Smoothed ramp function" topic_path: "science/biology" source: "https://en.wikipedia.org/wiki/Softplus" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0
::summary Smoothed ramp function ::
::figure[src="https://upload.wikimedia.org/wikipedia/commons/5/5d/Softplus.svg" caption="Plot of the softplus function and the ramp function"] ::
In mathematics and machine learning, the softplus function is
: f(x) = \ln(1 + e^x).
It is a smooth approximation (in fact, an analytic function) to the ramp function, which is known as the rectifier or ReLU (rectified linear unit) in machine learning. For large negative x, \epsilon = e^x is a small positive number, so \ln(1 + e^x) = \ln(1 + \epsilon) \gtrapprox \ln 1 = 0, which is just above 0. For large positive x, \ln(1 + e^x) \gtrapprox \ln(e^x) = x, which is just above x.
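Evaluating \ln(1 + e^x) literally overflows for large positive x in double precision. The following minimal Python sketch (the function name `softplus` is our own, not from the source) uses the algebraically equivalent form \max(x, 0) + \ln(1 + e^{-|x|}) instead, which also makes both limiting behaviours above easy to observe:

```python
import math


def softplus(x: float) -> float:
    """Return ln(1 + e^x) without overflow.

    Uses ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)), so math.exp only
    ever receives a non-positive argument.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Large negative x: tiny but positive, i.e. just above 0.
print(softplus(-30.0))
# Large positive x: just above x.
print(softplus(30.0) - 30.0)
# math.log(1 + math.exp(1000.0)) raises OverflowError; this does not.
print(softplus(1000.0))  # 1000.0
```

`math.log1p` is used so that the small correction term keeps full precision when e^{-|x|} is tiny.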
The names softplus and SmoothReLU are used in machine learning. The name softplus appears in Dugas, Bengio, Bélisle, Nadeau and Garcia (2000), "Incorporating second-order functional knowledge for better option pricing", Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS'00), MIT Press, pp. 451–457 (http://papers.nips.cc/paper/1920-incorporating-second-order-functional-knowledge-for-better-option-pricing.pdf): "Since the sigmoid h has a positive first derivative, its primitive, which we call softplus, is convex."
Alternative forms
This function can be approximated as
: \ln\left(1 + e^x\right) \approx \begin{cases} \ln 2, & x = 0,\\[6pt] \frac{x}{1 - e^{-x/\ln 2}}, & x \neq 0. \end{cases}
By making the change of variables x = y\ln 2, this is equivalent to
: \log_2(1 + 2^y) \approx \begin{cases} 1, & y = 0,\\[6pt] \frac{y}{1 - e^{-y}}, & y \neq 0. \end{cases}
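A quick numerical check of the approximation and of its base-2 form can be sketched in Python as follows. The helper names are hypothetical; `math.expm1` is used because 1 - e^{-a} = -\operatorname{expm1}(-a) avoids cancellation when a is near 0. The block prints the largest absolute error on a grid over [-10, 10] and confirms that the change of variables maps one form onto the other.

```python
import math

LN2 = math.log(2.0)


def softplus(x: float) -> float:
    # Exact ln(1 + e^x), evaluated stably.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_approx(x: float) -> float:
    # x / (1 - e^(-x/ln 2)) for x != 0, with the limiting value ln 2 at x = 0.
    if x == 0.0:
        return LN2
    # -expm1(-a) equals 1 - e^(-a) but stays accurate for small a.
    return x / (-math.expm1(-x / LN2))


def log2_softplus_approx(y: float) -> float:
    # Base-2 form: y / (1 - e^(-y)) for y != 0, and 1 at y = 0.
    if y == 0.0:
        return 1.0
    return y / (-math.expm1(-y))


grid = [i / 100 for i in range(-1000, 1001)]  # x from -10 to 10
max_err = max(abs(softplus_approx(x) - softplus(x)) for x in grid)
print(f"max |approx - exact| on [-10, 10]: {max_err:.4f}")

# Under x = y ln 2, dividing the natural-log form by ln 2 gives the base-2 form.
y = 1.7
print(softplus_approx(y * LN2) / LN2, log2_softplus_approx(y))
```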
A sharpness parameter k > 0 may be included; larger k brings f closer to the ramp function:
: f(x) = \frac{\ln(1 + e^{kx})}{k}, \qquad f'(x) = \frac{e^{kx}}{1 + e^{kx}} = \frac{1}{1 + e^{-kx}}.
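The effect of k can be sketched numerically. In this minimal Python example (hypothetical function names), the gap between the sharpened softplus and the ramp \max(x, 0) equals \ln(1 + e^{-k|x|})/k, so it is largest at x = 0, where it is \ln 2 / k; the derivative is computed as the logistic function of kx.

```python
import math


def softplus_k(x: float, k: float = 1.0) -> float:
    """ln(1 + e^(kx)) / k, using the overflow-free form of softplus."""
    z = k * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / k


def softplus_k_grad(x: float, k: float = 1.0) -> float:
    """f'(x) = 1 / (1 + e^(-kx)), split on sign to avoid overflow."""
    z = k * x
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def relu(x: float) -> float:
    return max(x, 0.0)


grid = [i / 100 for i in range(-500, 501)]  # x from -5 to 5
for k in (1.0, 4.0, 16.0):
    gap = max(softplus_k(x, k) - relu(x) for x in grid)
    # The largest gap sits at x = 0 and equals ln(2) / k.
    print(f"k={k:>4}: max gap {gap:.5f}, ln(2)/k {math.log(2.0) / k:.5f}")
```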
Related functions
The derivative of softplus is the standard logistic function:
: f'(x) = \frac{e^{x}}{1 + e^{x}} = \frac{1}{1 + e^{-x}}.
The logistic (sigmoid) function is in turn a smooth approximation of the derivative of the rectifier, which is the Heaviside step function.
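Both statements can be checked with a central finite difference. The sketch below (hypothetical helper names; the step size h = 10^{-5} is an arbitrary choice) compares the numerical slope of softplus with the logistic function, then evaluates the logistic function far from 0, where it is close to the step values 0 and 1.

```python
import math


def softplus(x: float) -> float:
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic(x: float) -> float:
    # Split on sign so math.exp never receives a large positive argument.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def central_diff(f, x: float, h: float = 1e-5) -> float:
    # Symmetric difference quotient, accurate to O(h^2).
    return (f(x + h) - f(x - h)) / (2 * h)


for v in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(v, central_diff(softplus, v), logistic(v))

# Far from 0 the logistic function approaches the Heaviside step (0 left, 1 right).
print(logistic(-20.0), logistic(20.0))
```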
LogSumExp
Main article: LogSumExp
The multivariable generalization of single-variable softplus is the LogSumExp with the first argument set to zero:
: \operatorname{LSE_0}^+(x_1, \dots, x_n) := \operatorname{LSE}(0, x_1, \dots, x_n) = \ln(1 + e^{x_1} + \cdots + e^{x_n}).
The LogSumExp function is
: \operatorname{LSE}(x_1, \dots, x_n) = \ln(e^{x_1} + \cdots + e^{x_n}),
and its gradient is the softmax; the softmax with the first argument set to zero is the multivariable generalization of the logistic function. Both LogSumExp and softmax are used in machine learning.
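A minimal Python sketch of these definitions, using the usual max-shift to keep the exponentials from overflowing (function names are hypothetical). It checks two of the claims above: with a single argument, LSE_0^+ reduces to softplus, and the softmax of (0, x) has the logistic function of x as its second component.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """ln(sum_i e^(x_i)), shifted by max(xs) so no exponent is positive."""
    m = max(xs)
    return m + math.log(sum(math.exp(v - m) for v in xs))


def lse0_plus(xs: Sequence[float]) -> float:
    """LSE_0^+(x_1..x_n) = LSE(0, x_1..x_n) = ln(1 + e^(x_1) + ... + e^(x_n))."""
    return logsumexp([0.0, *xs])


def softmax(xs: Sequence[float]) -> List[float]:
    """Gradient of logsumexp: e^(x_i) / sum_j e^(x_j)."""
    lse = logsumexp(xs)
    return [math.exp(v - lse) for v in xs]


# With one variable, LSE_0^+ is ordinary softplus.
x = 1.3
print(lse0_plus([x]), math.log1p(math.exp(x)))

# Softmax of (0, x): the second component is the logistic function of x.
print(softmax([0.0, x])[1], 1.0 / (1.0 + math.exp(-x)))
```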
Convex conjugate
The convex conjugate (specifically, the Legendre transformation) of the softplus function is the negative binary entropy function (with base e), p \ln p + (1 - p)\ln(1 - p) for 0 < p < 1. This follows from the defining property of the Legendre transformation, that the derivatives of a function and of its conjugate are inverse functions: the derivative of softplus is the logistic function, whose inverse function is the logit, and the logit is the derivative of negative binary entropy.
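The duality can be checked numerically. The sketch below (hypothetical helper names) evaluates the conjugate f^*(p) = \sup_x (p x - \ln(1 + e^x)) in two ways, at the stationary point x = \operatorname{logit}(p) and by brute force over a grid of x, and compares both with p \ln p + (1 - p)\ln(1 - p). The grid range [-20, 20] and spacing 0.001 are arbitrary choices that are adequate for the values of p used here.

```python
import math


def softplus(x: float) -> float:
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logit(p: float) -> float:
    # Inverse of the logistic function, for 0 < p < 1.
    return math.log(p) - math.log1p(-p)


def neg_binary_entropy(p: float) -> float:
    # p ln p + (1 - p) ln(1 - p), in nats.
    return p * math.log(p) + (1.0 - p) * math.log1p(-p)


def conjugate_at_stationary_point(p: float) -> float:
    # d/dx [p x - softplus(x)] = p - logistic(x) vanishes at x = logit(p).
    x_star = logit(p)
    return p * x_star - softplus(x_star)


def conjugate_by_grid(p: float) -> float:
    # Brute-force supremum over x in [-20, 20] with spacing 0.001.
    return max(p * (i / 1000) - softplus(i / 1000) for i in range(-20000, 20001))


for p in (0.1, 0.25, 0.5, 0.9):
    print(p, conjugate_at_stationary_point(p), conjugate_by_grid(p), neg_binary_entropy(p))
```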
Softplus can be interpreted as the logistic loss (a positive quantity), so by duality, minimizing logistic loss corresponds to maximizing entropy. This justifies the principle of maximum entropy as loss minimization.
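For a binary label written as y \in \{-1, +1\} and a real-valued score z, the logistic loss is \ln(1 + e^{-yz}), which is softplus evaluated at -yz. The sketch below (hypothetical names; the \pm 1 label convention is an assumption of this example) checks that it agrees with the binary cross-entropy of the logistic probability \sigma(z) for the corresponding 0/1 label.

```python
import math


def softplus(x: float) -> float:
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_loss(score: float, label: int) -> float:
    """Logistic loss for a label in {-1, +1}: softplus(-label * score)."""
    return softplus(-label * score)


def binary_cross_entropy(score: float, label01: int) -> float:
    """-[t ln s + (1 - t) ln(1 - s)] with s = logistic(score) and t in {0, 1}."""
    s = 1.0 / (1.0 + math.exp(-score))
    return -(label01 * math.log(s) + (1 - label01) * math.log(1.0 - s))


for score in (-2.0, 0.0, 3.0):
    # Label +1 corresponds to t = 1, label -1 to t = 0.
    print(score, logistic_loss(score, +1), binary_cross_entropy(score, 1))
    print(score, logistic_loss(score, -1), binary_cross_entropy(score, 0))
```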
References
- Glorot, Xavier. (2011-06-14). "Deep Sparse Rectifier Neural Networks". JMLR Workshop and Conference Proceedings.
- (2017). "Smooth Rectifier Linear Unit (SmoothReLU) Forward Layer".
::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::