What precisely does it mean to borrow information?
I often hear people talk about information borrowing or information sharing in Bayesian hierarchical models. I can't seem to get a straight answer about what this actually means and whether it is unique to Bayesian hierarchical models. I sort of get the idea: some levels in your hierarchy share a common parameter. I have no idea how this translates to "information borrowing", though.
Is "information borrowing"/ "information sharing" a buzz word people like to throw out?
Is there an example with closed form posteriors that illustrates this sharing phenomenon?
Is this unique to a Bayesian analysis? Generally, when I see examples of "information borrowing" they are just mixed models. Maybe I learned these models in an old-fashioned way, but I don't see any sharing.
I am not interested in starting a philosophical debate about methods. I am just curious about the use of this term.
machine-learning bayesian multilevel-analysis terminology hierarchical-bayesian
asked Dec 13 '18 at 1:33 by EliK
For your question 2., you may find this link illuminating: tjmahr.com/plotting-partial-pooling-in-mixed-effects-models.
– Isabella Ghement
Dec 13 '18 at 4:07
I would love to see some mention of information theory in the answers here.
– shadowtalker
Dec 13 '18 at 19:07
5 Answers
This is a term that comes specifically from empirical Bayes (EB); in fact, the concept it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.
EB has many applications and applies to many statistical models, but the context is always that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference, the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for a particular case, you use both the data for that case and the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.
Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has to be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed" because the information is given back when we move on to make inference about the next case.
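Since the question asks for a closed-form illustration, here is a minimal sketch of the normal-normal version of this setup (the simulated numbers and the simple moment estimator of the prior variance are illustrative choices, not part of this answer): each case's mean is observed with known sampling variance, the prior is estimated from all the cases, and the closed-form posterior mean shrinks each case toward the ensemble average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: 20 "cases" (groups), each with a true mean theta_i drawn
# from N(mu, tau^2), and an observed group mean y_i ~ N(theta_i, s^2),
# where s^2 is the (known) sampling variance of a group mean.
mu_true, tau_true, s = 10.0, 2.0, 3.0
theta = rng.normal(mu_true, tau_true, size=20)
y = rng.normal(theta, s)

# Empirical Bayes step: estimate the prior N(mu, tau^2) from the whole ensemble.
mu_hat = y.mean()
tau2_hat = max(y.var(ddof=1) - s**2, 0.0)   # crude moment estimator, floored at 0

# Closed-form posterior mean for each case:
#   E[theta_i | y_i] = B * mu_hat + (1 - B) * y_i,  with B = s^2 / (s^2 + tau2_hat)
B = s**2 / (s**2 + tau2_hat)
theta_eb = B * mu_hat + (1 - B) * y

print("shrinkage weight B:", round(B, 2))
print("MSE of raw group means:", round(float(np.mean((y - theta) ** 2)), 2))
print("MSE of EB estimates:   ", round(float(np.mean((theta_eb - theta) ** 2)), 2))
```

Each case's estimate combines its own y_i with mu_hat and tau2_hat, which were estimated from every case; that ensemble contribution is what gets described as the borrowed information.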
The idea of EB and "information borrowing" is used heavily in statistical genomics, where each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).
References
Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf
Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
http://dx.doi.org/10.1214/16-AOAS920
I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context
– Cliff AB
Dec 13 '18 at 6:55
@CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB, because most authors don't make the distinction. If you think you can give an example of a true Bayes mixed model analysis, then I invite you to do so.
– Gordon Smyth
Dec 13 '18 at 7:12
@CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or WinBUGS), the use of the term "borrow information" would IMO be out of place. It would certainly disagree with what Tukey and Efron meant by "borrowing".
– Gordon Smyth
Dec 13 '18 at 7:33
The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.
– Cliff AB
Dec 13 '18 at 10:28
@CliffAB I agree that brms is a Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.
– Gordon Smyth
Dec 13 '18 at 10:54
Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.
The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing', though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962), An introduction to credibility theory, PCAS, 49, 194-221.
Here's Whitney, 1918
(The Theory of Experience Rating, PCAS, 4, 274-292):
Here is a risk, for instance, that is clearly to be classified as a machine shop. In the absence of other information it should therefore take the machine shop rate, namely, the average rate for all risks of this class. On the other hand the risk has had an experience of its own. If the risk is large, this may be a better guide to its hazard than the class-experience. In any event, whether the risk is large or small, both of these elements have their value as evidence, and both must be taken into account. The difficulty arises from the fact that in general the evidence is contradictory; the problem therefore is to find and apply a criterion which will give each its proper weight.
While the term borrowing is absent here, the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation.]
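As a minimal sketch of the credibility weighting Whitney is describing (the function name and the credibility constant k below are illustrative choices, not from the paper), the rate for one risk is a weighted average of its own experience and the class average, with the weight growing as the risk's own exposure grows:

```python
def credibility_rate(own_losses, own_exposure, class_rate, k=50.0):
    """Classic credibility estimate: Z * own rate + (1 - Z) * class rate.

    Z = n / (n + k) approaches 1 as the risk's own exposure n grows, so a
    large risk relies mostly on its own experience while a small risk
    'borrows' mostly from the class. k is an illustrative credibility constant.
    """
    own_rate = own_losses / own_exposure
    Z = own_exposure / (own_exposure + k)
    return Z * own_rate + (1 - Z) * class_rate

class_rate = 0.08  # hypothetical average rate for all machine shops
print(credibility_rate(2, 10, class_rate))      # small shop: pulled hard toward 0.08
print(credibility_rate(120, 1000, class_rate))  # large shop: stays near its own 0.12
```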
I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.
– EliK
Dec 13 '18 at 19:17
A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so
– Glen_b♦
Dec 14 '18 at 3:41
It wasn't clear to me whether or not the imprecise intuition had an actual definition.
– EliK
Dec 14 '18 at 3:43
The most commonly known model that "borrows information" is the mixed effects model. This can be analyzed in either the Frequentist or the Bayesian setting. The Frequentist method actually has an Empirical Bayes interpretation to it: there's a prior on the mixed effects, which is based on $\sigma_R^2$, the variance of the random effects. Rather than setting $\sigma_R^2$ based on prior information, we estimate it from our data.
On the other hand, from the Bayesian perspective, we are not putting a prior on the mixed effects; rather, they are a mid-level parameter. That is, we put a prior on $\sigma_R^2$, which then acts like a hyper-parameter for the random effects, but it is different from a traditional prior in that the distribution placed on the random effects is not based purely on prior information, but rather on a mix of prior information (i.e., the prior on $\sigma_R^2$) and the data.
I think it's pretty clear that "borrowing information" is not something purely Bayesian; there are non-Bayesian mixed effects models and these borrow information. However, based on my experience playing around with mixed effects models, I think the Bayesian approach to such models is a little more important than some people realize. In particular, in a mixed effects model, one should think of us as estimating $\sigma_R^2$ with, at best, the number of individual subjects we have. So if we have 10 subjects measured 100 times, we are still estimating $\sigma_R^2$ from only 10 subjects. Not only that, but we don't actually even observe the random effects directly; rather, we just have estimates of them that are derived from the data and $\sigma_R^2$ itself. So it can be easy to forget just how little information from the data we actually have to estimate $\sigma_R^2$. The less information in the data, the more important the prior information becomes. If you haven't done so yet, I suggest trying to simulate mixed effects models with only a few subjects (see the sketch below). You might be surprised just how unstable the estimates from Frequentist methods are, especially when you add just one or two outliers...and how often does one see real datasets without outliers? I believe this issue is covered in Bayesian Data Analysis by Gelman et al., but sadly I don't think it's publicly available, so no hyperlink.
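To make that simulation suggestion concrete, here is a minimal sketch (the subject counts, variances, and the simple ANOVA-style moment estimator are all illustrative choices, not from this answer): with only five subjects, the estimate of $\sigma_R^2$ is very noisy even when each subject is measured repeatedly, and it is sometimes truncated at zero.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_R2, sigma_e2 = 0.25, 4.0   # true between-subject and residual variances
n_subjects, n_obs = 5, 20        # few subjects, several observations each

estimates = []
for _ in range(2000):
    b = rng.normal(0.0, np.sqrt(sigma_R2), n_subjects)                 # random effects
    y = b[:, None] + rng.normal(0.0, np.sqrt(sigma_e2), (n_subjects, n_obs))
    # ANOVA-style moment estimator: Var(subject means) = sigma_R^2 + sigma_e^2 / n_obs
    within_var = y.var(axis=1, ddof=1).mean()
    est = y.mean(axis=1).var(ddof=1) - within_var / n_obs
    estimates.append(max(est, 0.0))                                    # floor at zero

estimates = np.array(estimates)
print("true sigma_R^2:", sigma_R2)
print("mean estimate:", round(float(estimates.mean()), 3))
print("5th/95th percentiles:", np.percentile(estimates, [5, 95]).round(3))
print("fraction estimated as exactly 0:", round(float((estimates == 0).mean()), 3))
```

With so few subjects the spread of estimates is large, which is exactly the situation where prior information on $\sigma_R^2$ does real work.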
Finally, multilevel modeling is not just mixed effects, although mixed effects models are the most common example. Any model in which parameters are influenced not just by priors and data, but also by other unknown parameters, can be called a multilevel model. Of course, this is a very flexible set of models, but such models can be written up from scratch and fit with a minimal amount of work using tools like Stan, NIMBLE, JAGS, etc. To this extent, I'm not sure I would say multilevel modeling is "hype"; basically, you can write up any model that can be represented as a Directed Acyclic Graph and fit it immediately (assuming it has a reasonable run time, that is). This gives a whole lot more power and potential creativity than traditional choices (i.e., regression model packages), yet does not require one to build an entire R package from scratch just to fit a new type of model.
Thank you for the answer. To clarify I was not suggesting multi-level modeling is "hype". I was asking if "information borrowing" has a precise meaning or if that particular term is just hype.
– EliK
Dec 13 '18 at 13:03
@EliK: I'm not sure it has a precise meaning; Gordon Smyth gives what some may consider a precise meaning, i.e., Empirical Bayes, but the way I see that term commonly used now doesn't seem to fit that meaning. Personally, I don't think it's just a hype term; it's exactly the motivation for using mixed effects models over fixed effects models, although this extends beyond just the standard regression model framework. I do think a lot of people say the more vague "multilevel modeling" instead of the more precise "mixed effects modeling" because it's more fashionable now, though.
– Cliff AB
Dec 13 '18 at 13:09
I would say the hype is in ML papers and blogs, where it is argued that you need Bayesian models to implement multilevel models. I would be interested in a worked example where one compares against a cross-validated regularised model (for prediction).
– seanv507
Dec 13 '18 at 13:39
For what it's worth, the only alternative to Bayesian is Maximum Likelihood, which is just Bayesian with a uniform prior. So that's not really wrong.
– shadowtalker
Dec 13 '18 at 19:05
@shadowtalker: if you consider MLE methods to be Bayesian, then the word Bayesian is basically meaningless in statistics. However, this is consistent with some of the mistakes I see in the ML literature.
– Cliff AB
Dec 13 '18 at 19:10
I am assuming, since you tagged machine learning, that you are interested in prediction rather than inference. (I believe I am aligned with @Glen_b's answer, but am just translating it to this context/vocabulary.)
I would claim in this case it is a buzzword.
A regularised linear model with a group variable will borrow information: the prediction at the individual level will be a combination of the group mean and the individual effect.

One way to think of L1/L2 regularisation is that it assigns a cost to each coefficient which must be paid for by a reduction in total error. Since a group variable affects more samples than an individual variable, there will be pressure to estimate a group effect, leaving a smaller deviation from the group effect for each individual variable.

For individuals with enough data, the individual effect will be 'strong'; for those with little data, the effect will be weak.
I think the easiest way to see this is by considering L1 regularisation and 3 individuals of the same group with the same effect. Unregularised, the problem has an infinite number of solutions, whereas regularisation gives a unique solution.

Assigning all the effect to the group coefficient has the lowest L1 norm, since we only need 1 value to cover 3 individuals. Conversely, assigning all the effect to the individual coefficients has the worst, namely 3 times the L1 norm of assigning the effect to the group coefficient.
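Here is a minimal sketch of that three-individuals example (using scikit-learn's Lasso as an illustrative choice; the design and penalty value are assumptions, not from the answer): the design contains a redundant group indicator plus one indicator per individual, so the unpenalised fit is not identified, and the L1 penalty resolves it by loading the shared effect onto the single group coefficient.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# 3 individuals in the same group, 30 observations each, all with true effect +2.
n_per = 30
individual = np.repeat([0, 1, 2], n_per)
y = 2.0 + rng.normal(0.0, 0.5, size=3 * n_per)

# Design: a group dummy (always 1 here) plus one dummy per individual.
# The group column equals the sum of the individual columns, so without a
# penalty the coefficients are not identified.
X = np.column_stack([
    np.ones(3 * n_per),               # group indicator
    (individual == 0).astype(float),  # individual 0
    (individual == 1).astype(float),  # individual 1
    (individual == 2).astype(float),  # individual 2
])

fit = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
print("group coefficient:      ", fit.coef_[0].round(2))
print("individual coefficients:", fit.coef_[1:].round(2))
# The L1 penalty puts essentially all of the shared effect on the group
# coefficient: one nonzero value instead of three equal individual ones.
```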
Note we can have as many hierarchies as we want, and interactions are affected similarly: regularisation will push effects to main variables rather than to rarer interactions.
The blog tjmahr.com/plotting-partial-pooling-in-mixed-effects-models, linked by @IsabellaGhement, gives a quote about borrowing strength:
"This effect is sometimes called shrinkage, because more extreme values shrinkage are pulled towards a more reasonable, more average value. In the lme4 book, Douglas Bates provides an alternative to shrinkage [name]"
The term “shrinkage” may have negative connotations. John Tukey preferred to refer to the process as the estimates for individual subjects “borrowing strength” from each other. This is a fundamental difference in the models underlying mixed-effects models versus strictly fixed effects models. In a mixed-effects model we assume that the levels of a grouping factor are a selection from a population and, as a result, can be expected to share characteristics to some degree. Consequently, the predictions from a mixed-effects model are attenuated relative to those from strictly fixed-effects models.
What is prediction if not a specific kind of inference?
– shadowtalker
Dec 13 '18 at 19:06
Another source I would like to recommend on this topic which I find particularly instructive is David Robinson's Introduction to Empirical Bayes.
His running example is that of whether a baseball player will manage to hit the next ball thrown at him. The key idea is that if a player has been around for years, one has a pretty clear picture of how capable he is and in particular, one can use his observed batting average as a pretty good estimate of the success probability in the next pitch.
Conversely, a player who has just started playing in a league hasn't revealed much of his actual talent yet. So it seems like a wise choice to adjust the estimate of his success probability towards some overall mean if he has been particularly successful or unsuccessful in his first few games, as that likely is, at least to some extent, due to good or bad luck.
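A minimal sketch of that batting-average idea in code (the simulated league and the simple moment-matching step, which ignores the extra binomial noise in the raw averages, are illustrative choices, not from Robinson's post): fit a beta prior to the whole league, then use the closed-form posterior mean, which pulls players with few at-bats strongly toward the league average.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated league: true talent p_i ~ Beta(a, b), hits ~ Binomial(at_bats, p_i).
a_true, b_true = 80, 220
at_bats = rng.integers(20, 600, size=300)
p = rng.beta(a_true, b_true, size=300)
hits = rng.binomial(at_bats, p)

# Empirical Bayes: fit a Beta(a, b) prior to the raw averages by moment matching
# (crude, since it ignores binomial sampling noise, but enough to show shrinkage).
avg = hits / at_bats
m, v = avg.mean(), avg.var()
ab_sum = m * (1 - m) / v - 1        # estimate of a + b
a_hat, b_hat = m * ab_sum, (1 - m) * ab_sum

# Closed-form posterior mean per player: (hits + a) / (at_bats + a + b).
eb_estimate = (hits + a_hat) / (at_bats + a_hat + b_hat)

i = at_bats.argmin()                # the player with the fewest at-bats
print("league mean average:", round(float(m), 3))
print("fewest at-bats:", int(at_bats[i]),
      "| raw average:", round(float(avg[i]), 3),
      "| EB estimate:", round(float(eb_estimate[i]), 3))
```

A player with few at-bats gets an estimate close to the league mean, while a veteran's estimate stays close to his own raw average; the amount of shrinkage is governed by at_bats relative to a_hat + b_hat.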
As a minor point, the term "borrowing" certainly does not seem to be used in the sense that something that has been borrowed would need to be returned at some point ;-).
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "65"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f381761%2fwhat-precisely-does-it-mean-to-borrow-information%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
This is a term that is specifically from empirical Bayes (EB), in fact the concept that it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason why you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.
EB has many applications and applies to many statistical models, but the context always is that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for particular case, you are use both the data for that case and also the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.
Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed", because the information is given back when we move on to make inference about the next case.
The idea of EB and "information borrowing" is used heavily in statistical genomics, when each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).
References
Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf
Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
http://dx.doi.org/10.1214/16-AOAS920
1
I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context
– Cliff AB
Dec 13 '18 at 6:55
1
@CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.
– Gordon Smyth
Dec 13 '18 at 7:12
1
@CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".
– Gordon Smyth
Dec 13 '18 at 7:33
The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.
– Cliff AB
Dec 13 '18 at 10:28
1
@CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.
– Gordon Smyth
Dec 13 '18 at 10:54
|
show 4 more comments
This is a term that is specifically from empirical Bayes (EB), in fact the concept that it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason why you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.
EB has many applications and applies to many statistical models, but the context always is that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for particular case, you are use both the data for that case and also the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.
Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed", because the information is given back when we move on to make inference about the next case.
The idea of EB and "information borrowing" is used heavily in statistical genomics, when each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).
References
Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf
Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
http://dx.doi.org/10.1214/16-AOAS920
1
I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context
– Cliff AB
Dec 13 '18 at 6:55
1
@CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.
– Gordon Smyth
Dec 13 '18 at 7:12
1
@CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".
– Gordon Smyth
Dec 13 '18 at 7:33
The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.
– Cliff AB
Dec 13 '18 at 10:28
1
@CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.
– Gordon Smyth
Dec 13 '18 at 10:54
|
show 4 more comments
This is a term that is specifically from empirical Bayes (EB), in fact the concept that it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason why you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.
EB has many applications and applies to many statistical models, but the context always is that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for particular case, you are use both the data for that case and also the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.
Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed", because the information is given back when we move on to make inference about the next case.
The idea of EB and "information borrowing" is used heavily in statistical genomics, when each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).
References
Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf
Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
http://dx.doi.org/10.1214/16-AOAS920
This is a term that is specifically from empirical Bayes (EB), in fact the concept that it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason why you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.
EB has many applications and applies to many statistical models, but the context always is that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for particular case, you are use both the data for that case and also the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.
Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed", because the information is given back when we move on to make inference about the next case.
The idea of EB and "information borrowing" is used heavily in statistical genomics, when each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).
References
Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf
Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
http://dx.doi.org/10.1214/16-AOAS920
edited Dec 13 '18 at 11:05
answered Dec 13 '18 at 6:22
Gordon SmythGordon Smyth
4,6531125
4,6531125
1
I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context
– Cliff AB
Dec 13 '18 at 6:55
1
@CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.
– Gordon Smyth
Dec 13 '18 at 7:12
1
@CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".
– Gordon Smyth
Dec 13 '18 at 7:33
The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.
– Cliff AB
Dec 13 '18 at 10:28
1
@CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.
– Gordon Smyth
Dec 13 '18 at 10:54
|
show 4 more comments
1
I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context
– Cliff AB
Dec 13 '18 at 6:55
1
@CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.
– Gordon Smyth
Dec 13 '18 at 7:12
1
@CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".
– Gordon Smyth
Dec 13 '18 at 7:33
The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.
– Cliff AB
Dec 13 '18 at 10:28
1
@CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.
– Gordon Smyth
Dec 13 '18 at 10:54
1
1
I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context
– Cliff AB
Dec 13 '18 at 6:55
I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context
– Cliff AB
Dec 13 '18 at 6:55
1
1
@CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.
– Gordon Smyth
Dec 13 '18 at 7:12
@CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.
– Gordon Smyth
Dec 13 '18 at 7:12
1
1
@CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".
– Gordon Smyth
Dec 13 '18 at 7:33
@CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".
– Gordon Smyth
Dec 13 '18 at 7:33
The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.
– Cliff AB
Dec 13 '18 at 10:28
The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.
– Cliff AB
Dec 13 '18 at 10:28
1
1
@CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.
– Gordon Smyth
Dec 13 '18 at 10:54
@CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.
– Gordon Smyth
Dec 13 '18 at 10:54
|
show 4 more comments
Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.
The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing' though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962) An introduction to credibility theory PCAS, 49, 194-221.
Here's Whitney, 1918
(The Theory of Experience Rating, PCAS, 4, 274-292):
Here
is
a
risk,
for
instance,
that
is
clearly
to
be
classified
as
a
machine
shop.
In
the
absence
of
other
information
it
should
therefore
fake
the
machine
shop
rate,
namely,
the
average
rate
for
all
risks
of
this
class.
On
the
other
hand
the
risk
has
had
an
experience
of
its
own.
If
the
risk
is
large,
this
may
be
a
better
guide
to
its
hazard
than
the
class-experience.
In
any
event,
whether
the
risk
is
large
or
small,
both
of
these
elements
have
their
value
as
evidence,
and
both
must
be
taken
into
account.
The
difficulty
arises
from
the
fact
that
in
general
the
evidence
is
contradictory;
the
problem
therefore
is
to
find
and
apply
a
criterion
which
will
give
each
its
proper
weight.
While the term borrowing is absent here the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation]
1
I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.
– EliK
Dec 13 '18 at 19:17
A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so
– Glen_b♦
Dec 14 '18 at 3:41
It wasn't clear to me whether or not the imprecise intuition had an actual definition.
– EliK
Dec 14 '18 at 3:43
add a comment |
Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.
The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing' though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962) An introduction to credibility theory PCAS, 49, 194-221.
Here's Whitney, 1918
(The Theory of Experience Rating, PCAS, 4, 274-292):
Here
is
a
risk,
for
instance,
that
is
clearly
to
be
classified
as
a
machine
shop.
In
the
absence
of
other
information
it
should
therefore
fake
the
machine
shop
rate,
namely,
the
average
rate
for
all
risks
of
this
class.
On
the
other
hand
the
risk
has
had
an
experience
of
its
own.
If
the
risk
is
large,
this
may
be
a
better
guide
to
its
hazard
than
the
class-experience.
In
any
event,
whether
the
risk
is
large
or
small,
both
of
these
elements
have
their
value
as
evidence,
and
both
must
be
taken
into
account.
The
difficulty
arises
from
the
fact
that
in
general
the
evidence
is
contradictory;
the
problem
therefore
is
to
find
and
apply
a
criterion
which
will
give
each
its
proper
weight.
While the term borrowing is absent here the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation]
1
I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.
– EliK
Dec 13 '18 at 19:17
A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so
– Glen_b♦
Dec 14 '18 at 3:41
It wasn't clear to me whether or not the imprecise intuition had an actual definition.
– EliK
Dec 14 '18 at 3:43
add a comment |
Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.
The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing' though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962) An introduction to credibility theory PCAS, 49, 194-221.
Here's Whitney, 1918
(The Theory of Experience Rating, PCAS, 4, 274-292):
Here
is
a
risk,
for
instance,
that
is
clearly
to
be
classified
as
a
machine
shop.
In
the
absence
of
other
information
it
should
therefore
fake
the
machine
shop
rate,
namely,
the
average
rate
for
all
risks
of
this
class.
On
the
other
hand
the
risk
has
had
an
experience
of
its
own.
If
the
risk
is
large,
this
may
be
a
better
guide
to
its
hazard
than
the
class-experience.
In
any
event,
whether
the
risk
is
large
or
small,
both
of
these
elements
have
their
value
as
evidence,
and
both
must
be
taken
into
account.
The
difficulty
arises
from
the
fact
that
in
general
the
evidence
is
contradictory;
the
problem
therefore
is
to
find
and
apply
a
criterion
which
will
give
each
its
proper
weight.
While the term borrowing is absent here the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation]
Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.
The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing' though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962) An introduction to credibility theory PCAS, 49, 194-221.
Here's Whitney, 1918
(The Theory of Experience Rating, PCAS, 4, 274-292):
Here
is
a
risk,
for
instance,
that
is
clearly
to
be
classified
as
a
machine
shop.
In
the
absence
of
other
information
it
should
therefore
fake
the
machine
shop
rate,
namely,
the
average
rate
for
all
risks
of
this
class.
On
the
other
hand
the
risk
has
had
an
experience
of
its
own.
If
the
risk
is
large,
this
may
be
a
better
guide
to
its
hazard
than
the
class-experience.
In
any
event,
whether
the
risk
is
large
or
small,
both
of
these
elements
have
their
value
as
evidence,
and
both
must
be
taken
into
account.
The
difficulty
arises
from
the
fact
that
in
general
the
evidence
is
contradictory;
the
problem
therefore
is
to
find
and
apply
a
criterion
which
will
give
each
its
proper
weight.
While the term borrowing is absent here the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation]
edited Dec 14 '18 at 4:12
answered Dec 13 '18 at 3:04
Glen_b♦Glen_b
209k22398739
209k22398739
1
I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.
– EliK
Dec 13 '18 at 19:17
A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so
– Glen_b♦
Dec 14 '18 at 3:41
It wasn't clear to me whether or not the imprecise intuition had an actual definition.
– EliK
Dec 14 '18 at 3:43
add a comment |
1
I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.
– EliK
Dec 13 '18 at 19:17
A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so
– Glen_b♦
Dec 14 '18 at 3:41
It wasn't clear to me whether or not the imprecise intuition had an actual definition.
– EliK
Dec 14 '18 at 3:43
1
1
I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.
– EliK
Dec 13 '18 at 19:17
I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.
– EliK
Dec 13 '18 at 19:17
A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so
– Glen_b♦
Dec 14 '18 at 3:41
A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so
– Glen_b♦
Dec 14 '18 at 3:41
It wasn't clear to me whether or not the imprecise intuition had an actual definition.
– EliK
Dec 14 '18 at 3:43
It wasn't clear to me whether or not the imprecise intuition had an actual definition.
– EliK
Dec 14 '18 at 3:43
add a comment |
The most commonly known model that "borrows information" is that of a mixed effects model. This can be analyzed in either the Frequentist or Bayesian setting. The Frequentist method actually has an Empirical Bayes interpretation to it; there's a prior on the mixed effects which, based on $sigma_R^2$, the variance of the random effects. Rather than setting based on prior information, we estimate it from our data.
On the other hand, from the Bayesian perspective, we are not putting a prior on the mixed effects, but rather they are a mid level parameter. That is, we put a prior on $sigma_R^2$, which then acts as like a hyper-parameter for the random effects, but it is different than a traditional prior in that the distribution placed on the random effects is not based purely on prior information, but rather a mix of prior information (i.e., prior on $sigma_R^2$) and the data.
I think it's pretty clear that "borrowing information" is not something purely Bayesian; there are non-Bayesian mixed effects models and these borrow information. However, based on my experience playing around with mixed effects models, I think Bayesian approach to such models is a little more important than some people realize. In particular, in a mixed effect model, one should think that we are estimating $sigma_R^2$ with, at best, the number of individual subjects we have. So if we have 10 subjects measured 100 times, we are still estimating $sigma_R^2$ from only 10 subjects. Not only that, but we don't actually even observe the random effects directly, but rather we just have estimates of them that are derived from the data and $sigma_R$ themselves. So it can be easy to forget just how little information based on the data we actually have to estimate $sigma_R^2$. The less information in the data, the more important the prior information becomes. If you haven't done so yet, I suggest trying to simulate mixed effects models with only a few subjects. You might be surprised just how unstable the estimates from Frequentist methods are, especially when you add just one or two outliers...and how often does one see real datasets without outliers? I believe this issue is covered in Bayesian Data Analysis by Gelman et al, but sadly I don't think its publicly available so no hyperlink.
Finally, multilevel modeling is not just mixed effects, although they are the most common. Any model in which parameters are influenced not just by priors and data, but also other unknown parameters can be called a multilevel model. Of course, this is a very flexible set of models, but can written up from scratch and fit with a minimal amount of work using tools like Stan, NIMBLE, JAGS, etc. To this extent, I'm not sure I would say multilevel modeling is "hype"; basically, you can write up any model that can be represented as a Directed Acyclic Graph and fit it immediately (assuming it has a reasonable run time, that is). This gives a whole lot more power and potential creativity than traditional choices (i.e., regression model packages) yet does not require one to build an entire R package from scratch just to fit a new type of model.
Thank you for the answer. To clarify I was not suggesting multi-level modeling is "hype". I was asking if "information borrowing" has a precise meaning or if that particular term is just hype.
– EliK
Dec 13 '18 at 13:03
@EliK: I'm not sure it has a precise meaning; Gordon Smyth gives what some may consider a precise meaning, i.e., Empirical Bayes, but the way I see that term commonly used now doesn't doesn't seem to fit that meaning. Personally, I don't think it's just a hype term; it's exactly the motivation for using mixed effects models over fixed effects models, although this extends beyond just the standard regression model framework. I do think a lot of people say the more vague "multilevel modeling" instead of the more precise "mixed effects modeling" because it's more fashionable now though.
– Cliff AB
Dec 13 '18 at 13:09
I would say the hype is in ML papers and blogs, where it is argued that you need Bayesian models to implement multilevel models. I would be interested in a worked example - where one compares against crossvalidated regularised model (for prediction)
– seanv507
Dec 13 '18 at 13:39
For what it's worth, the only alternative to Bayesian is Maximum Likelihood, which is just Bayesian with a uniform prior. So that's not really wrong.
– shadowtalker
Dec 13 '18 at 19:05
1
@shadowtalker: if you consider MLE methods to Bayesian, then the word Bayesian is basically meaningless in statistics. However, this is consistent with some of the mistakes I see in the ML literature.
– Cliff AB
Dec 13 '18 at 19:10
add a comment |
The most commonly known model that "borrows information" is that of a mixed effects model. This can be analyzed in either the Frequentist or Bayesian setting. The Frequentist method actually has an Empirical Bayes interpretation to it; there's a prior on the mixed effects which, based on $sigma_R^2$, the variance of the random effects. Rather than setting based on prior information, we estimate it from our data.
On the other hand, from the Bayesian perspective, we are not putting a prior on the mixed effects, but rather they are a mid level parameter. That is, we put a prior on $sigma_R^2$, which then acts as like a hyper-parameter for the random effects, but it is different than a traditional prior in that the distribution placed on the random effects is not based purely on prior information, but rather a mix of prior information (i.e., prior on $sigma_R^2$) and the data.
I think it's pretty clear that "borrowing information" is not something purely Bayesian; there are non-Bayesian mixed effects models and these borrow information. However, based on my experience playing around with mixed effects models, I think Bayesian approach to such models is a little more important than some people realize. In particular, in a mixed effect model, one should think that we are estimating $sigma_R^2$ with, at best, the number of individual subjects we have. So if we have 10 subjects measured 100 times, we are still estimating $sigma_R^2$ from only 10 subjects. Not only that, but we don't actually even observe the random effects directly, but rather we just have estimates of them that are derived from the data and $sigma_R$ themselves. So it can be easy to forget just how little information based on the data we actually have to estimate $sigma_R^2$. The less information in the data, the more important the prior information becomes. If you haven't done so yet, I suggest trying to simulate mixed effects models with only a few subjects. You might be surprised just how unstable the estimates from Frequentist methods are, especially when you add just one or two outliers...and how often does one see real datasets without outliers? I believe this issue is covered in Bayesian Data Analysis by Gelman et al, but sadly I don't think its publicly available so no hyperlink.
Finally, multilevel modeling is not just mixed effects, although they are the most common. Any model in which parameters are influenced not just by priors and data, but also other unknown parameters can be called a multilevel model. Of course, this is a very flexible set of models, but can written up from scratch and fit with a minimal amount of work using tools like Stan, NIMBLE, JAGS, etc. To this extent, I'm not sure I would say multilevel modeling is "hype"; basically, you can write up any model that can be represented as a Directed Acyclic Graph and fit it immediately (assuming it has a reasonable run time, that is). This gives a whole lot more power and potential creativity than traditional choices (i.e., regression model packages) yet does not require one to build an entire R package from scratch just to fit a new type of model.
Thank you for the answer. To clarify I was not suggesting multi-level modeling is "hype". I was asking if "information borrowing" has a precise meaning or if that particular term is just hype.
– EliK
Dec 13 '18 at 13:03
@EliK: I'm not sure it has a precise meaning; Gordon Smyth gives what some may consider a precise meaning, i.e., Empirical Bayes, but the way I see that term commonly used now doesn't doesn't seem to fit that meaning. Personally, I don't think it's just a hype term; it's exactly the motivation for using mixed effects models over fixed effects models, although this extends beyond just the standard regression model framework. I do think a lot of people say the more vague "multilevel modeling" instead of the more precise "mixed effects modeling" because it's more fashionable now though.
– Cliff AB
Dec 13 '18 at 13:09
I would say the hype is in ML papers and blogs, where it is argued that you need Bayesian models to implement multilevel models. I would be interested in a worked example - where one compares against crossvalidated regularised model (for prediction)
– seanv507
Dec 13 '18 at 13:39
For what it's worth, the only alternative to Bayesian is Maximum Likelihood, which is just Bayesian with a uniform prior. So that's not really wrong.
– shadowtalker
Dec 13 '18 at 19:05
1
@shadowtalker: if you consider MLE methods to Bayesian, then the word Bayesian is basically meaningless in statistics. However, this is consistent with some of the mistakes I see in the ML literature.
– Cliff AB
Dec 13 '18 at 19:10
add a comment |
The most commonly known model that "borrows information" is that of a mixed effects model. This can be analyzed in either the Frequentist or Bayesian setting. The Frequentist method actually has an Empirical Bayes interpretation to it; there's a prior on the mixed effects which, based on $sigma_R^2$, the variance of the random effects. Rather than setting based on prior information, we estimate it from our data.
On the other hand, from the Bayesian perspective, we are not putting a prior on the mixed effects, but rather they are a mid level parameter. That is, we put a prior on $sigma_R^2$, which then acts as like a hyper-parameter for the random effects, but it is different than a traditional prior in that the distribution placed on the random effects is not based purely on prior information, but rather a mix of prior information (i.e., prior on $sigma_R^2$) and the data.
I think it's pretty clear that "borrowing information" is not something purely Bayesian; there are non-Bayesian mixed effects models and these borrow information. However, based on my experience playing around with mixed effects models, I think Bayesian approach to such models is a little more important than some people realize. In particular, in a mixed effect model, one should think that we are estimating $sigma_R^2$ with, at best, the number of individual subjects we have. So if we have 10 subjects measured 100 times, we are still estimating $sigma_R^2$ from only 10 subjects. Not only that, but we don't actually even observe the random effects directly, but rather we just have estimates of them that are derived from the data and $sigma_R$ themselves. So it can be easy to forget just how little information based on the data we actually have to estimate $sigma_R^2$. The less information in the data, the more important the prior information becomes. If you haven't done so yet, I suggest trying to simulate mixed effects models with only a few subjects. You might be surprised just how unstable the estimates from Frequentist methods are, especially when you add just one or two outliers...and how often does one see real datasets without outliers? I believe this issue is covered in Bayesian Data Analysis by Gelman et al, but sadly I don't think its publicly available so no hyperlink.
Finally, multilevel modeling is not just mixed effects, although they are the most common. Any model in which parameters are influenced not just by priors and data, but also other unknown parameters can be called a multilevel model. Of course, this is a very flexible set of models, but can written up from scratch and fit with a minimal amount of work using tools like Stan, NIMBLE, JAGS, etc. To this extent, I'm not sure I would say multilevel modeling is "hype"; basically, you can write up any model that can be represented as a Directed Acyclic Graph and fit it immediately (assuming it has a reasonable run time, that is). This gives a whole lot more power and potential creativity than traditional choices (i.e., regression model packages) yet does not require one to build an entire R package from scratch just to fit a new type of model.
The most commonly known model that "borrows information" is the mixed effects model. It can be analyzed in either the Frequentist or the Bayesian setting. The Frequentist method actually has an Empirical Bayes interpretation to it: there's a prior on the random effects which is based on $\sigma_R^2$, the variance of the random effects. Rather than setting $\sigma_R^2$ based on prior information, we estimate it from our data.
On the other hand, from the Bayesian perspective, we are not putting a prior on the random effects directly; rather, they are a mid-level parameter. That is, we put a prior on $\sigma_R^2$, which then acts like a hyper-parameter for the random effects, but it is different from a traditional prior in that the distribution placed on the random effects is not based purely on prior information, but rather on a mix of prior information (i.e., the prior on $\sigma_R^2$) and the data.
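To make that "mix of prior information and the data" concrete, here is the standard closed-form calculation for a one-way random effects model (the notation below is mine, not from the original answer): write $y_{ij} = \mu + u_j + \epsilon_{ij}$ with $u_j \sim N(0, \sigma_R^2)$ and $\epsilon_{ij} \sim N(0, \sigma^2)$. If $\mu$, $\sigma^2$ and $\sigma_R^2$ were known, the conditional mean of subject $j$'s effect given its $n_j$ observations would be
$$E[u_j \mid y] = \frac{n_j/\sigma^2}{n_j/\sigma^2 + 1/\sigma_R^2}\,(\bar{y}_j - \mu),$$
so the predicted subject mean is a precision-weighted compromise between that subject's own average $\bar{y}_j$ and the overall mean $\mu$. The fewer observations a subject has, or the smaller $\sigma_R^2$ is, the harder its estimate is pulled toward $\mu$; that pull is exactly what gets described as borrowing information from the other subjects.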
I think it's pretty clear that "borrowing information" is not something purely Bayesian; there are non-Bayesian mixed effects models and these borrow information. However, based on my experience playing around with mixed effects models, I think the Bayesian approach to such models is a little more important than some people realize. In particular, in a mixed effects model, one should remember that we are estimating $\sigma_R^2$ with, at best, the number of individual subjects we have. So if we have 10 subjects measured 100 times each, we are still estimating $\sigma_R^2$ from only 10 subjects. Not only that, but we don't actually observe the random effects directly; we just have estimates of them that are derived from the data and $\sigma_R^2$ itself. So it can be easy to forget just how little information from the data we actually have to estimate $\sigma_R^2$. The less information in the data, the more important the prior information becomes. If you haven't done so yet, I suggest simulating mixed effects models with only a few subjects. You might be surprised just how unstable the estimates from Frequentist methods are, especially when you add just one or two outliers... and how often does one see real datasets without outliers? I believe this issue is covered in Bayesian Data Analysis by Gelman et al., but sadly I don't think it's publicly available, so no hyperlink.
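A minimal simulation sketch of that suggestion, assuming the lme4 package is available; the seed, sample sizes, and outlier contamination below are illustrative choices of mine, not from the answer:

```r
library(lme4)

set.seed(42)
n_subj <- 5                                    # deliberately few subjects
n_obs  <- 100                                  # many measurements per subject
subject <- factor(rep(1:n_subj, each = n_obs))
u <- rnorm(n_subj, sd = 1)                     # true random effects, sigma_R = 1
y <- rep(u, each = n_obs) + rnorm(n_subj * n_obs, sd = 2)
y[1:5] <- y[1:5] + 15                          # a handful of outliers in subject 1

d   <- data.frame(y = y, subject = subject)
fit <- lmer(y ~ 1 + (1 | subject), data = d)   # REML fit of a random-intercept model
VarCorr(fit)                                   # estimated sigma_R
```

Re-running this with different seeds (or dropping the outlier line) gives a feel for how much the estimated $\sigma_R$ can swing when it is effectively estimated from only five subjects.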
Finally, multilevel modeling is not just mixed effects models, although those are the most common. Any model in which parameters are influenced not just by priors and data, but also by other unknown parameters, can be called a multilevel model. Of course, this is a very flexible set of models, but one that can be written up from scratch and fit with a minimal amount of work using tools like Stan, NIMBLE, JAGS, etc. To this extent, I'm not sure I would say multilevel modeling is "hype"; basically, you can write up any model that can be represented as a Directed Acyclic Graph and fit it immediately (assuming it has a reasonable run time, that is). This gives a whole lot more power and potential creativity than traditional choices (e.g., regression model packages), yet does not require one to build an entire R package from scratch just to fit a new type of model.
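As a small illustration of the "minimal amount of work" point for the mixed effects case specifically, assuming the rstanarm package, the Bayesian counterpart of the lme4 fit sketched above is essentially a one-liner:

```r
library(rstanarm)
# Uses the simulated data frame `d` from the sketch above; priors are left at
# rstanarm's defaults, which is an illustrative choice of mine.
fit_bayes <- stan_lmer(y ~ 1 + (1 | subject), data = d)
print(fit_bayes)  # posterior summary, including the subject-level standard deviation
```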
answered Dec 13 '18 at 12:57 – Cliff AB
Thank you for the answer. To clarify I was not suggesting multi-level modeling is "hype". I was asking if "information borrowing" has a precise meaning or if that particular term is just hype.
– EliK
Dec 13 '18 at 13:03
@EliK: I'm not sure it has a precise meaning; Gordon Smyth gives what some may consider a precise meaning, i.e., Empirical Bayes, but the way I see that term commonly used now doesn't seem to fit that meaning. Personally, I don't think it's just a hype term; it's exactly the motivation for using mixed effects models over fixed effects models, although this extends beyond just the standard regression model framework. I do think a lot of people say the more vague "multilevel modeling" instead of the more precise "mixed effects modeling" because it's more fashionable now, though.
– Cliff AB
Dec 13 '18 at 13:09
I would say the hype is in ML papers and blogs, where it is argued that you need Bayesian models to implement multilevel models. I would be interested in a worked example where one compares against a cross-validated regularised model (for prediction).
– seanv507
Dec 13 '18 at 13:39
For what it's worth, the only alternative to Bayesian is Maximum Likelihood, which is just Bayesian with a uniform prior. So that's not really wrong.
– shadowtalker
Dec 13 '18 at 19:05
@shadowtalker: if you consider MLE methods to be Bayesian, then the word Bayesian is basically meaningless in statistics. However, this is consistent with some of the mistakes I see in the ML literature.
– Cliff AB
Dec 13 '18 at 19:10
I am assuming, since you tagged machine learning, that you are interested in prediction rather than inference. (I believe I am aligned with @Glen_b's answer, but just translating to this context/vocabulary.)
I would claim in this case it is a buzzword.
A regularised linear model with a group variable will borrow information: the prediction at the individual level will be a combination of the group mean and the individual effect.
One way to think of L1/L2 regularisation is that it assigns a cost to each coefficient per reduction in total error. Since a group variable affects more samples than an individual variable, there will be pressure to estimate a group effect, leaving a smaller deviation from the group effect to each individual variable.
For individuals with enough data, the individual effect will be 'strong'; for those with little data, the effect will be weak.
I think the easiest way to see this is by considering L1 regularisation and 3 individuals of the same group with the same effect. Unregularised, the problem has an infinite number of solutions, whereas regularisation gives a unique solution.
Assigning all of the effect to the group coefficient has the lowest L1 norm, since we only need 1 value to cover 3 individuals. Conversely, assigning all of the effect to the individual coefficients is the worst case, namely 3 times the L1 norm of assigning the effect to the group coefficient.
Note we can have as many hierarchies as we want, and interactions are affected similarly: regularisation will push effects to main variables rather than to rarer interactions. A small simulation below illustrates the idea.
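A sketch of that claim, assuming the glmnet package; the two-group design, noise level, and penalty value are illustrative assumptions of mine rather than anything from the answer:

```r
library(glmnet)

set.seed(1)
n_per <- 20                                        # observations per individual
id  <- factor(rep(1:6, each = n_per))              # 6 individuals
grp <- factor(rep(c("A", "B"), each = 3 * n_per))  # individuals 1-3 in A, 4-6 in B

# One dummy per group and one per individual, so the fit can place the effect
# either on the shared group columns or on the individual columns.
X <- cbind(model.matrix(~ grp - 1), model.matrix(~ id - 1))
y <- ifelse(grp == "B", 2, 0) + rnorm(length(id), sd = 0.5)  # effect shared by group B

fit <- glmnet(X, y, alpha = 1, intercept = FALSE)  # alpha = 1 -> lasso (L1 penalty)
coef(fit, s = 0.05)   # the effect lands on the group columns (grpB near 2),
                      # while the individual id columns stay near zero
```

Without the penalty there is no unique solution here, because each group dummy is exactly the sum of its individual dummies; the L1 penalty breaks that tie in favour of the single shared group coefficient.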
The blog tjmahr.com/plotting-partial-pooling-in-mixed-effects-models, linked by @IsabellaGhement, gives a quote about borrowing strength:
"This effect is sometimes called shrinkage, because more extreme values shrinkage are pulled towards a more reasonable, more average value. In the lme4 book, Douglas Bates provides an alternative to shrinkage [name]"
The term “shrinkage” may have negative connotations. John Tukey preferred to refer to the process as the estimates for individual subjects “borrowing strength” from each other. This is a fundamental difference in the models underlying mixed-effects models versus strictly fixed-effects models. In a mixed-effects model we assume that the levels of a grouping factor are a selection from a population and, as a result, can be expected to share characteristics to some degree. Consequently, the predictions from a mixed-effects model are attenuated relative to those from strictly fixed-effects models.
answered Dec 13 '18 at 9:33, edited Dec 13 '18 at 12:36 – seanv507
What is prediction if not a specific kind of inference?
– shadowtalker
Dec 13 '18 at 19:06
Another source I would like to recommend on this topic, which I find particularly instructive, is David Robinson's Introduction to Empirical Bayes.
His running example is whether a baseball player will manage to hit the next ball thrown at him. The key idea is that if a player has been around for years, one has a pretty clear picture of how capable he is, and in particular, one can use his observed batting average as a pretty good estimate of the success probability on the next pitch.
Conversely, a player who has just started playing in a league hasn't revealed much of his actual talent yet. So it seems like a wise choice to adjust the estimate of his success probability towards some overall mean if he has been particularly successful or unsuccessful in his first few games, as that likely is, at least to some extent, due to good or bad luck.
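A minimal numeric sketch of that adjustment, using the beta-binomial shrinkage idea; the prior parameters and the two hypothetical players below are made-up illustrations of mine, not figures from Robinson's book:

```r
# Made-up prior roughly centred on a .270 batting average: 80 "pseudo-hits"
# and 220 "pseudo-misses" shared by all players.
alpha0 <- 80
beta0  <- 220

players <- data.frame(
  name    = c("veteran", "rookie"),
  hits    = c(1500, 4),
  at_bats = c(5000, 10)
)

players$raw <- players$hits / players$at_bats      # observed batting average
players$eb  <- (alpha0 + players$hits) /
               (alpha0 + beta0 + players$at_bats)  # posterior mean under the shared prior
players
# The veteran's estimate barely moves (0.300 -> ~0.298), while the rookie's
# 0.400 is pulled most of the way back towards the shared mean (~0.271).
```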
As a minor point, the term "borrowing" certainly does not seem to be used in the sense that something that has been borrowed would need to be returned at some point ;-).
answered Dec 13 '18 at 12:54, edited Dec 13 '18 at 15:28 – Christoph Hanck