What precisely does it mean to borrow information?



























I often hear people talk about information borrowing or information sharing in Bayesian hierarchical models. I can't seem to get a straight answer about what this actually means and whether it is unique to Bayesian hierarchical models. I sort of get the idea: some levels in your hierarchy share a common parameter. I have no idea how this translates to "information borrowing" though.




  1. Is "information borrowing"/ "information sharing" a buzz word people like to throw out?


  2. Is there an example with closed form posteriors that illustrates this sharing phenomenon?


  3. Is this unique to a Bayesian analysis? Generally, when I see examples of "information borrowing" they are just mixed models. Maybe I learned this models in an old fashioned way, but I don't see any sharing.



I am not interested in starting a philosophical debate about methods. I am just curious about the use of this term.










machine-learning bayesian multilevel-analysis terminology hierarchical-bayesian

asked Dec 13 '18 at 1:33 – EliK


















  • For your question 2., you may find this link illuminating: tjmahr.com/plotting-partial-pooling-in-mixed-effects-models. – Isabella Ghement, Dec 13 '18 at 4:07

  • I would love to see some mention of information theory in the answers here. – shadowtalker, Dec 13 '18 at 19:07
















5 Answers
































This is a term that comes specifically from empirical Bayes (EB); in fact, the concept it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.



EB has many applications and applies to many statistical models, but the context is always that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference, the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for a particular case, you use both the data for that case and the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.
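A minimal sketch of that recipe, using simulated data and the normal-normal model, where the posterior mean has a closed form: estimate the prior mean and variance from all the cases, then shrink each case's estimate toward the ensemble.

    ## Normal-normal empirical Bayes on simulated data:
    ## y_i ~ N(theta_i, 1) with prior theta_i ~ N(mu, tau^2).
    set.seed(1)
    theta <- rnorm(50, mean = 2, sd = 1.5)    # true case-specific parameters
    y     <- rnorm(50, mean = theta, sd = 1)  # one noisy observation per case

    ## Estimate the prior from the whole ensemble (method of moments):
    mu_hat   <- mean(y)
    tau2_hat <- max(var(y) - 1, 0)            # var(y) = tau^2 + 1

    ## Closed-form posterior mean: shrink each y_i toward mu_hat.
    ## The weight on mu_hat is the "strength" borrowed from the other cases.
    B        <- 1 / (1 + tau2_hat)            # shrinkage factor
    theta_eb <- (1 - B) * y + B * mu_hat

    mean((theta_eb - theta)^2)                # beats mean((y - theta)^2) on average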



Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has to be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed" because the information is given back when we move on to make inference about the next case.



The idea of EB and "information borrowing" is used heavily in statistical genomics, where each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).



References



Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf



Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf



Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
http://dx.doi.org/10.1214/16-AOAS920






answered Dec 13 '18 at 6:22, edited Dec 13 '18 at 11:05 – Gordon Smyth





















  • I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context. – Cliff AB, Dec 13 '18 at 6:55

  • @CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think you can give an example of a true Bayes mixed model analysis, then I invite you to do so. – Gordon Smyth, Dec 13 '18 at 7:12

  • @CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or WinBUGS), use of the term "borrow information" would be IMO out of place. It would certainly disagree with what Tukey and Efron meant by "borrowing". – Gordon Smyth, Dec 13 '18 at 7:33

  • The package brms is a recent, extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields. – Cliff AB, Dec 13 '18 at 10:28

  • @CliffAB I agree that brms is a Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation. – Gordon Smyth, Dec 13 '18 at 10:54

































Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.
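A quick sketch of that contrast on simulated data (my own illustration, using the lme4 package mentioned elsewhere on this page): the unrelated-means estimates are the raw group means, while the mixed-model estimates are pulled toward the grand mean, most strongly for the smallest groups.

    ## Unrelated group means vs. a mixed-effects model, on simulated data.
    library(lme4)

    set.seed(2)
    n_per_group <- c(2, 3, 5, 20, 40)           # deliberately unequal group sizes
    g  <- factor(rep(seq_along(n_per_group), n_per_group))
    mu <- rnorm(length(n_per_group), 10, 2)     # true group means
    y  <- rnorm(length(g), mu[g], 4)

    ## Completely unrelated: each mean estimated from its own group only.
    no_pool <- tapply(y, g, mean)

    ## Somewhat related: a common distribution ties the group means together.
    fit <- lmer(y ~ 1 + (1 | g))
    partial_pool <- coef(fit)$g[, "(Intercept)"]

    ## The smallest groups are shrunk hardest toward the overall mean.
    cbind(n = n_per_group, no_pool, partial_pool)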



The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing', though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962) An introduction to credibility theory. PCAS, 49, 194-221.



Here's Whitney, 1918 (The Theory of Experience Rating, PCAS, 4, 274-292):

"Here is a risk, for instance, that is clearly to be classified as a machine shop. In the absence of other information it should therefore take the machine shop rate, namely, the average rate for all risks of this class. On the other hand the risk has had an experience of its own. If the risk is large, this may be a better guide to its hazard than the class-experience. In any event, whether the risk is large or small, both of these elements have their value as evidence, and both must be taken into account. The difficulty arises from the fact that in general the evidence is contradictory; the problem therefore is to find and apply a criterion which will give each its proper weight."




While the term borrowing is absent here the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation]






answered Dec 13 '18 at 3:04, edited Dec 14 '18 at 4:12 – Glen_b





















  • I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition. – EliK, Dec 13 '18 at 19:17

  • A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups, but one could very easily exclude plausible uses of the notion by doing so. – Glen_b, Dec 14 '18 at 3:41

  • It wasn't clear to me whether or not the imprecise intuition had an actual definition. – EliK, Dec 14 '18 at 3:43

































The most commonly known model that "borrows information" is a mixed effects model. This can be analyzed in either the Frequentist or the Bayesian setting. The Frequentist method actually has an Empirical Bayes interpretation: there's a prior on the random effects based on $\sigma_R^2$, the variance of the random effects. Rather than setting $\sigma_R^2$ based on prior information, we estimate it from our data.



On the other hand, from the Bayesian perspective, we are not putting a prior on the mixed effects; rather, they are a mid-level parameter. That is, we put a prior on $\sigma_R^2$, which then acts like a hyper-parameter for the random effects, but it is different from a traditional prior in that the distribution placed on the random effects is not based purely on prior information, but rather on a mix of prior information (i.e., the prior on $\sigma_R^2$) and the data.



I think it's pretty clear that "borrowing information" is not something purely Bayesian; there are non-Bayesian mixed effects models and these borrow information. However, based on my experience playing around with mixed effects models, I think the Bayesian approach to such models is a little more important than some people realize. In particular, in a mixed effects model, one should recognize that we are estimating $\sigma_R^2$ from, at best, the number of individual subjects we have. So if we have 10 subjects measured 100 times, we are still estimating $\sigma_R^2$ from only 10 subjects. Not only that, but we don't actually even observe the random effects directly; rather, we just have estimates of them that are derived from the data and $\sigma_R^2$ itself. So it can be easy to forget just how little information from the data we actually have to estimate $\sigma_R^2$. The less information in the data, the more important the prior information becomes. If you haven't done so yet, I suggest trying to simulate mixed effects models with only a few subjects (see the sketch below). You might be surprised just how unstable the estimates from Frequentist methods are, especially when you add just one or two outliers... and how often does one see real datasets without outliers? I believe this issue is covered in Bayesian Data Analysis by Gelman et al., but sadly I don't think it's publicly available, so no hyperlink.
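A sketch of that simulation exercise (my own, with hypothetical settings: 5 subjects, 10 observations each, a true between-subject SD of 0.5): the REML estimate of the between-subject SD varies widely across replicates and sometimes collapses to zero.

    ## Instability of the random-effect variance estimate with few subjects.
    library(lme4)

    sim_sd_hat <- function(n_subj = 5, n_obs = 10, tau = 0.5) {
      id <- factor(rep(1:n_subj, each = n_obs))
      b  <- rnorm(n_subj, 0, tau)              # true random effects
      y  <- rnorm(n_subj * n_obs, b[id], 1)
      fit <- lmer(y ~ 1 + (1 | id))
      vc  <- as.data.frame(VarCorr(fit))
      vc$sdcor[vc$grp == "id"]                 # estimated between-subject SD
    }

    set.seed(3)
    sd_hats <- replicate(100, sim_sd_hat())    # expect some singular-fit warnings
    summary(sd_hats)                           # wide spread around the true 0.5
    mean(sd_hats < 1e-4)                       # fraction collapsing to zero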



Finally, multilevel modeling is not just mixed effects, although mixed effects are the most common case. Any model in which parameters are influenced not just by priors and data, but also by other unknown parameters, can be called a multilevel model. Of course, this is a very flexible set of models, but one that can be written up from scratch and fit with a minimal amount of work using tools like Stan, NIMBLE, JAGS, etc. To this extent, I'm not sure I would say multilevel modeling is "hype"; basically, you can write up any model that can be represented as a Directed Acyclic Graph and fit it immediately (assuming it has a reasonable run time, that is). This gives a whole lot more power and potential creativity than traditional choices (i.e., regression model packages), yet does not require one to build an entire R package from scratch just to fit a new type of model.






























  • Thank you for the answer. To clarify, I was not suggesting multi-level modeling is "hype". I was asking if "information borrowing" has a precise meaning or if that particular term is just hype. – EliK, Dec 13 '18 at 13:03

  • @EliK: I'm not sure it has a precise meaning; Gordon Smyth gives what some may consider a precise meaning, i.e., Empirical Bayes, but the way I see that term commonly used now doesn't seem to fit that meaning. Personally, I don't think it's just a hype term; it's exactly the motivation for using mixed effects models over fixed effects models, although this extends beyond just the standard regression model framework. I do think a lot of people say the more vague "multilevel modeling" instead of the more precise "mixed effects modeling" because it's more fashionable now, though. – Cliff AB, Dec 13 '18 at 13:09

  • I would say the hype is in ML papers and blogs, where it is argued that you need Bayesian models to implement multilevel models. I would be interested in a worked example where one compares against a cross-validated regularised model (for prediction). – seanv507, Dec 13 '18 at 13:39

  • For what it's worth, the only alternative to Bayesian is Maximum Likelihood, which is just Bayesian with a uniform prior. So that's not really wrong. – shadowtalker, Dec 13 '18 at 19:05

  • @shadowtalker: if you consider MLE methods to be Bayesian, then the word Bayesian is basically meaningless in statistics. However, this is consistent with some of the mistakes I see in the ML literature. – Cliff AB, Dec 13 '18 at 19:10

































I am assuming, since you tagged machine learning, that you are interested in prediction rather than inference. (I believe I am aligned with @Glen_b's answer, just translating to this context/vocabulary.)

I would claim in this case it is a buzzword.
A regularised linear model with a group variable will borrow information: the prediction at the individual level will be a combination of the group mean and the individual effect.
One way to think of l1/l2 regularisation is that it assigns a coefficient cost per reduction in total error: since a group variable affects more samples than an individual variable, there will be pressure to estimate a group effect, leaving a smaller deviation from the group effect to each individual variable.

For individuals with enough data, the individual effect will be 'strong'; for those with little data, the effect will be weak.

I think the easiest way to see this is by considering l1 regularisation and 3 individuals of the same group with the same effect. Unregularised, the problem has an infinite number of solutions, whereas regularisation gives a unique solution.

Assigning all the effect to the group coefficient has the lowest l1 norm, since we only need 1 value to cover 3 individuals. Conversely, assigning all the effect to the individual coefficients has the worst, namely 3 times the l1 norm of assigning the effect to the group coefficient (the sketch below spells out the arithmetic).
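The arithmetic behind that tie-break, as a tiny sketch (a hypothetical effect size of 2): both parameterisations fit the three individuals identically, but their l1 costs differ by a factor of 3.

    ## Three individuals from one group, all with the same effect (+2, say).
    ## Any split c(group = a, individual = 2 - a) fits equally well;
    ## the l1 penalty breaks the tie in favour of the group coefficient.
    effect <- 2

    l1_group_only <- abs(effect) + 3 * abs(0)  # one group coefficient covers all three
    l1_indiv_only <- abs(0) + 3 * abs(effect)  # three separate individual coefficients

    c(group_only = l1_group_only, indiv_only = l1_indiv_only)  # 2 vs 6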



Note we can have as many hierarchies as we want, and interactions are affected similarly: regularisation will push effects to main variables, rather than to rarer interactions.

The blog post tjmahr.com/plotting-partial-pooling-in-mixed-effects-models, linked by @IsabellaGhement, gives a quote about borrowing strength:

"This effect is sometimes called shrinkage, because more extreme values are pulled towards a more reasonable, more average value. In the lme4 book, Douglas Bates provides an alternative to the name shrinkage."




The term “shrinkage” may have negative connotations. John Tukey preferred to refer to the process as the estimates for individual subjects “borrowing strength” from each other. This is a fundamental difference in the models underlying mixed-effects models versus strictly fixed effects models. In a mixed-effects model we assume that the levels of a grouping factor are a selection from a population and, as a result, can be expected to share characteristics to some degree. Consequently, the predictions from a mixed-effects model are attenuated relative to those from strictly fixed-effects models.

































  • What is prediction if not a specific kind of inference? – shadowtalker, Dec 13 '18 at 19:06

































Another source I would like to recommend on this topic, which I find particularly instructive, is David Robinson's Introduction to Empirical Bayes.



His running example is whether a baseball player will manage to hit the next ball thrown at him. The key idea is that if a player has been around for years, one has a pretty clear picture of how capable he is, and in particular one can use his observed batting average as a pretty good estimate of the success probability for the next pitch.



Conversely, a player who has just started playing in a league hasn't revealed much of his actual talent yet. So it seems like a wise choice to adjust the estimate of his success probability towards some overall mean if he has been particularly successful or unsuccessful in his first few games, as that likely is, at least to some extent, due to good or bad luck.
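A compact version of that adjustment, following the book's beta-binomial setup (the career records here are made up): fit a Beta prior to the ensemble of batting averages, then use each player's closed-form posterior mean, which barely moves the veteran but pulls the newcomer strongly toward the overall mean.

    ## Empirical-Bayes shrinkage of batting averages (beta-binomial model).
    hits    <- c(2,  310, 1500)                # hypothetical career records
    at_bats <- c(4, 1000, 5000)

    ## Fit a Beta(alpha, beta) prior to the ensemble by method of moments.
    avg <- hits / at_bats
    m <- mean(avg); v <- var(avg)
    alpha <- m * (m * (1 - m) / v - 1)
    beta  <- (1 - m) * (m * (1 - m) / v - 1)

    ## Closed-form posterior mean for each player.
    eb <- (hits + alpha) / (at_bats + alpha + beta)
    round(cbind(raw = avg, shrunk = eb), 3)
    ## The 2-for-4 newcomer is pulled hard toward the overall mean;
    ## the 5000-at-bat veteran barely moves.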



As a minor point, the term "borrowing" certainly does not seem to be used in the sense that something that has been borrowed would need to be returned at some point ;-).































    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "65"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f381761%2fwhat-precisely-does-it-mean-to-borrow-information%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    5 Answers
    5






    active

    oldest

    votes








    5 Answers
    5






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    10














    This is a term that is specifically from empirical Bayes (EB), in fact the concept that it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason why you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.



    EB has many applications and applies to many statistical models, but the context always is that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for particular case, you are use both the data for that case and also the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.



    Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed", because the information is given back when we move on to make inference about the next case.



    The idea of EB and "information borrowing" is used heavily in statistical genomics, when each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).



    References



    Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf



    Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
    http://www.statsci.org/smyth/pubs/ebayes.pdf



    Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
    http://dx.doi.org/10.1214/16-AOAS920






    share|cite|improve this answer





















    • 1





      I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context

      – Cliff AB
      Dec 13 '18 at 6:55






    • 1





      @CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.

      – Gordon Smyth
      Dec 13 '18 at 7:12








    • 1





      @CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".

      – Gordon Smyth
      Dec 13 '18 at 7:33













    • The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.

      – Cliff AB
      Dec 13 '18 at 10:28






    • 1





      @CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.

      – Gordon Smyth
      Dec 13 '18 at 10:54
















    10














    This is a term that is specifically from empirical Bayes (EB), in fact the concept that it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason why you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.



    EB has many applications and applies to many statistical models, but the context always is that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for particular case, you are use both the data for that case and also the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.



    Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed", because the information is given back when we move on to make inference about the next case.



    The idea of EB and "information borrowing" is used heavily in statistical genomics, when each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).



    References



    Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf



    Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
    http://www.statsci.org/smyth/pubs/ebayes.pdf



    Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
    http://dx.doi.org/10.1214/16-AOAS920






    share|cite|improve this answer





















    • 1





      I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context

      – Cliff AB
      Dec 13 '18 at 6:55






    • 1





      @CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.

      – Gordon Smyth
      Dec 13 '18 at 7:12








    • 1





      @CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".

      – Gordon Smyth
      Dec 13 '18 at 7:33













    • The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.

      – Cliff AB
      Dec 13 '18 at 10:28






    • 1





      @CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.

      – Gordon Smyth
      Dec 13 '18 at 10:54














    10












    10








    10







    This is a term that is specifically from empirical Bayes (EB), in fact the concept that it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason why you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.



    EB has many applications and applies to many statistical models, but the context always is that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for particular case, you are use both the data for that case and also the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.



    Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed", because the information is given back when we move on to make inference about the next case.



    The idea of EB and "information borrowing" is used heavily in statistical genomics, when each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).



    References



    Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf



    Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
    http://www.statsci.org/smyth/pubs/ebayes.pdf



    Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
    http://dx.doi.org/10.1214/16-AOAS920






    share|cite|improve this answer















    This is a term that is specifically from empirical Bayes (EB), in fact the concept that it refers to does not exist in true Bayesian inference. The original term was "borrowing strength", which was coined by John Tukey back in the 1960s and popularized further by Bradley Efron and Carl Morris in a series of statistical articles on Stein's paradox and parametric EB in the 1970s and 1980s. Many people now use "information borrowing" or "information sharing" as synonyms for the same concept. The reason why you may hear it in the context of mixed models is that the most common analyses for mixed models have an EB interpretation.



    EB has many applications and applies to many statistical models, but the context always is that you have a large number of (possibly independent) cases and you are trying to estimate a particular parameter (such as the mean or variance) in each case. In Bayesian inference, you make posterior inferences about the parameter based on both the observed data for each case and the prior distribution for that parameter. In EB inference the prior distribution for the parameter is estimated from the whole collection of data cases, after which inference proceeds as for Bayesian inference. Hence, when you estimate the parameter for particular case, you are use both the data for that case and also the estimated prior distribution, and the latter represents the "information" or "strength" that you borrow from the whole ensemble of cases when making inference about one particular case.



    Now you can see why EB has "borrowing" but true Bayes does not. In true Bayes, the prior distribution already exists and so doesn't need to be begged or borrowed. In EB, the prior distribution has be created from the observed data itself. When we make inference about a particular case, we use all the observed information from that case and a little bit of information from each of the other cases. We say it is only "borrowed", because the information is given back when we move on to make inference about the next case.



    The idea of EB and "information borrowing" is used heavily in statistical genomics, when each "case" is usually a gene or a genomic feature (Smyth, 2004; Phipson et al, 2016).



    References



    Efron, Bradley, and Carl Morris. Stein's paradox in statistics. Scientific American 236, no. 5 (1977): 119-127. http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf



    Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3.
    http://www.statsci.org/smyth/pubs/ebayes.pdf



    Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.
    http://dx.doi.org/10.1214/16-AOAS920







    share|cite|improve this answer














    share|cite|improve this answer



    share|cite|improve this answer








    edited Dec 13 '18 at 11:05

























    answered Dec 13 '18 at 6:22









    Gordon SmythGordon Smyth

    4,6531125




    4,6531125








    • 1





      I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context

      – Cliff AB
      Dec 13 '18 at 6:55






    • 1





      @CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.

      – Gordon Smyth
      Dec 13 '18 at 7:12








    • 1





      @CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".

      – Gordon Smyth
      Dec 13 '18 at 7:33













    • The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.

      – Cliff AB
      Dec 13 '18 at 10:28






    • 1





      @CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.

      – Gordon Smyth
      Dec 13 '18 at 10:54














    • 1





      I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context

      – Cliff AB
      Dec 13 '18 at 6:55






    • 1





      @CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.

      – Gordon Smyth
      Dec 13 '18 at 7:12








    • 1





      @CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".

      – Gordon Smyth
      Dec 13 '18 at 7:33













    • The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.

      – Cliff AB
      Dec 13 '18 at 10:28






    • 1





      @CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.

      – Gordon Smyth
      Dec 13 '18 at 10:54








    1




    1





    I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context

    – Cliff AB
    Dec 13 '18 at 6:55





    I don't think this interpretation is correct. For example, mixed effects models borrow information, yet can be analyzed in a traditional Bayesian context

    – Cliff AB
    Dec 13 '18 at 6:55




    1




    1





    @CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.

    – Gordon Smyth
    Dec 13 '18 at 7:12







    @CliffAB If you dig into mixed model analyses, you will find that the analysis is virtually always empirical Bayes rather than true Bayes. Most authors of course will say they are doing Bayes when it is actually EB because most authors don't make the distinction. If you think can you give an example of a true Bayes mixed model analysis, then I invite you to do so.

    – Gordon Smyth
    Dec 13 '18 at 7:12






    1




    1





    @CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".

    – Gordon Smyth
    Dec 13 '18 at 7:33







    @CliffAB In the minority of cases when a true Bayes analysis is used for mixed models (e.g., by MCMC or Winbugs) then use of the term "borrow information" would be IMO out of place. It would certainly disagree with the what Tukey and Efron meant by "borrowing".

    – Gordon Smyth
    Dec 13 '18 at 7:33















    The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.

    – Cliff AB
    Dec 13 '18 at 10:28





    The package brms is a recent extremely popular package for fitting Bayesian (not EB!) regression models, including mixed effects. It's all the rage these days in many applied fields.

    – Cliff AB
    Dec 13 '18 at 10:28




    1




    1





    @CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.

    – Gordon Smyth
    Dec 13 '18 at 10:54





    @CliffAB I agree that brms is Bayesian package, which is why the term "borrow information" doesn't appear in the brms documentation.

    – Gordon Smyth
    Dec 13 '18 at 10:54













    5














    Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.



    The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing' though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962) An introduction to credibility theory PCAS, 49, 194-221.



    Here's Whitney, 1918
    (The Theory of Experience Rating, PCAS, 4, 274-292):




    Here
    is
    a
    risk,
    for
    instance,
    that
    is
    clearly
    to
    be
    classified
    as
    a
    machine
    shop.
    In
    the
    absence
    of
    other
    information
    it
    should
    therefore
    fake
    the
    machine
    shop
    rate,
    namely,
    the
    average
    rate
    for
    all
    risks
    of
    this
    class.
    On
    the
    other
    hand
    the
    risk
    has
    had
    an
    experience
    of
    its
    own.
    If
    the
    risk
    is
    large,
    this
    may
    be
    a
    better
    guide
    to
    its
    hazard
    than
    the
    class-experience.
    In
    any
    event,
    whether
    the
    risk
    is
    large
    or
    small,
    both
    of
    these
    elements
    have
    their
    value
    as
    evidence,
    and
    both
    must
    be
    taken
    into
    account.
    The
    difficulty
    arises
    from
    the
    fact
    that
    in
    general
    the
    evidence
    is
    contradictory;
    the
    problem
    therefore
    is
    to
    find
    and
    apply
    a
    criterion
    which
    will
    give
    each
    its
    proper
    weight.




    While the term borrowing is absent here the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation]






    share|cite|improve this answer





















    • 1





      I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.

      – EliK
      Dec 13 '18 at 19:17











    • A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so

      – Glen_b
      Dec 14 '18 at 3:41













    • It wasn't clear to me whether or not the imprecise intuition had an actual definition.

      – EliK
      Dec 14 '18 at 3:43
















    5














    Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.



    The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing' though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962) An introduction to credibility theory PCAS, 49, 194-221.



    Here's Whitney, 1918
    (The Theory of Experience Rating, PCAS, 4, 274-292):




    Here
    is
    a
    risk,
    for
    instance,
    that
    is
    clearly
    to
    be
    classified
    as
    a
    machine
    shop.
    In
    the
    absence
    of
    other
    information
    it
    should
    therefore
    fake
    the
    machine
    shop
    rate,
    namely,
    the
    average
    rate
    for
    all
    risks
    of
    this
    class.
    On
    the
    other
    hand
    the
    risk
    has
    had
    an
    experience
    of
    its
    own.
    If
    the
    risk
    is
    large,
    this
    may
    be
    a
    better
    guide
    to
    its
    hazard
    than
    the
    class-experience.
    In
    any
    event,
    whether
    the
    risk
    is
    large
    or
    small,
    both
    of
    these
    elements
    have
    their
    value
    as
    evidence,
    and
    both
    must
    be
    taken
    into
    account.
    The
    difficulty
    arises
    from
    the
    fact
    that
    in
    general
    the
    evidence
    is
    contradictory;
    the
    problem
    therefore
    is
    to
    find
    and
    apply
    a
    criterion
    which
    will
    give
    each
    its
    proper
    weight.




    While the term borrowing is absent here the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation]






    share|cite|improve this answer





















    • 1





      I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.

      – EliK
      Dec 13 '18 at 19:17











    • A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups but one could very easily exclude plausible uses of the notion by doing so

      – Glen_b
      Dec 14 '18 at 3:41













    • It wasn't clear to me whether or not the imprecise intuition had an actual definition.

      – EliK
      Dec 14 '18 at 3:43














    5












    5








    5







    Consider a simple problem like estimating means of multiple groups. If your model treats them as completely unrelated then the only information you have about each mean is the information within that group. If your model treats their means as somewhat related (such as in some mixed-effects type model) then the estimates will be more precise because information from other groups informs (regularizes, shrinks toward a common mean) the estimate for a given group. That's an example of 'borrowing information'.



    The notion crops up in actuarial work related to credibility (not necessarily with that specific term of 'borrowing' though borrowing in that sense is explicit in the formulas); this goes back a long way, to at least a century ago, with clear precursors going back to the mid-nineteenth century. For example, see Longley-Cook, L.H. (1962) An introduction to credibility theory PCAS, 49, 194-221.



    Here's Whitney, 1918
    (The Theory of Experience Rating, PCAS, 4, 274-292):




    Here
    is
    a
    risk,
    for
    instance,
    that
    is
    clearly
    to
    be
    classified
    as
    a
    machine
    shop.
    In
    the
    absence
    of
    other
    information
    it
    should
    therefore
    fake
    the
    machine
    shop
    rate,
    namely,
    the
    average
    rate
    for
    all
    risks
    of
    this
    class.
    On
    the
    other
    hand
    the
    risk
    has
    had
    an
    experience
    of
    its
    own.
    If
    the
    risk
    is
    large,
    this
    may
    be
    a
    better
    guide
    to
    its
    hazard
    than
    the
    class-experience.
    In
    any
    event,
    whether
    the
    risk
    is
    large
    or
    small,
    both
    of
    these
    elements
    have
    their
    value
    as
    evidence,
    and
    both
    must
    be
    taken
    into
    account.
    The
    difficulty
    arises
    from
    the
    fact
    that
    in
    general
    the
    evidence
    is
    contradictory;
    the
    problem
    therefore
    is
    to
    find
    and
    apply
    a
    criterion
    which
    will
    give
    each
    its
    proper
    weight.




    While the term borrowing is absent here the notion of using the group-level information to inform us about this machine shop is clearly there. [The notions remain unchanged when "borrowing strength" and "borrowing information" start to be applied to this situation]






    answered Dec 13 '18 at 3:04, edited Dec 14 '18 at 4:12 – Glen_b
    • 1





      I appreciate the example, as it clearly explains what borrowing does, but I'm looking for a more precise definition.

      – EliK
      Dec 13 '18 at 19:17











    • A precise definition of an imprecise, intuitive term? I suppose one might be possible - one might perhaps define it in terms of reducing variance by relating parameters across groups, but one could very easily exclude plausible uses of the notion by doing so.

      – Glen_b
      Dec 14 '18 at 3:41













    • It wasn't clear to me whether or not the imprecise intuition had an actual definition.

      – EliK
      Dec 14 '18 at 3:43














    3














    The most commonly known model that "borrows information" is the mixed effects model. It can be analyzed in either the Frequentist or the Bayesian setting. The Frequentist method actually has an Empirical Bayes interpretation: there is a prior on the random effects, governed by $\sigma_R^2$, the variance of the random effects. Rather than setting it from prior information, we estimate it from our data.



    On the other hand, from the Bayesian perspective, we are not putting a prior directly on the random effects; rather, they are a mid-level parameter. That is, we put a prior on $\sigma_R^2$, which then acts like a hyper-parameter for the random effects. This differs from a traditional prior in that the distribution placed on the random effects is not based purely on prior information, but rather on a mix of prior information (i.e., the prior on $\sigma_R^2$) and the data.
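    As an illustration of that hierarchy, here is a minimal sketch using PyMC (a probabilistic-programming tool in the same family as the Stan/JAGS ones mentioned below); the data, the half-normal hyper-priors, and all names are invented for the example:

        import numpy as np
        import pymc as pm

        rng = np.random.default_rng(3)
        subj = np.repeat(np.arange(10), 20)                      # 10 subjects, 20 obs each
        y_obs = rng.normal(0.0, 1.0, 200) + rng.normal(0.0, 1.0, 10)[subj]

        with pm.Model():
            sigma_r = pm.HalfNormal("sigma_r", sigma=2.0)        # prior on sigma_R
            b = pm.Normal("b", mu=0.0, sigma=sigma_r, shape=10)  # random effects given sigma_R
            sigma_e = pm.HalfNormal("sigma_e", sigma=2.0)        # residual scale
            pm.Normal("y", mu=b[subj], sigma=sigma_e, observed=y_obs)
            idata = pm.sample(1000, tune=1000)                   # posterior mixes prior and data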



    I think it's pretty clear that "borrowing information" is not something purely Bayesian; there are non-Bayesian mixed effects models, and these borrow information. However, based on my experience playing around with mixed effects models, I think the Bayesian approach to such models is a little more important than some people realize. In particular, in a mixed effects model, one should recognize that we are estimating $\sigma_R^2$ from, at best, the number of individual subjects we have. So if we have 10 subjects measured 100 times, we are still estimating $\sigma_R^2$ from only 10 subjects. Not only that, but we don't actually observe the random effects directly; we just have estimates of them that are derived from the data and $\sigma_R^2$ itself. So it can be easy to forget just how little information from the data we actually have for estimating $\sigma_R^2$. The less information in the data, the more important the prior information becomes. If you haven't done so yet, I suggest simulating mixed effects models with only a few subjects. You might be surprised just how unstable the estimates from Frequentist methods are, especially when you add just one or two outliers... and how often does one see real datasets without outliers? I believe this issue is covered in Bayesian Data Analysis by Gelman et al., but sadly I don't think it's publicly available, so no hyperlink.
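    In that spirit, here is a minimal simulation sketch (pure NumPy, using the classical ANOVA moment estimator rather than REML, with parameter values invented for the example) showing how variable the estimate of $\sigma_R^2$ is with only 10 subjects, even at 100 observations each:

        import numpy as np

        rng = np.random.default_rng(2)

        def sigma_r2_hat(n_subj=10, n_obs=100, sigma_r=1.0, sigma_e=2.0):
            """One replication: simulate a random-intercept model and return the
            ANOVA (method-of-moments) estimate of the random-effect variance."""
            b = rng.normal(0.0, sigma_r, size=n_subj)         # random intercepts
            y = b[:, None] + rng.normal(0.0, sigma_e, (n_subj, n_obs))
            ms_between = n_obs * y.mean(axis=1).var(ddof=1)   # E = sigma_e^2 + n_obs * sigma_r^2
            ms_within = y.var(axis=1, ddof=1).mean()          # E = sigma_e^2
            return (ms_between - ms_within) / n_obs

        est = np.array([sigma_r2_hat() for _ in range(2000)])
        print("true sigma_r^2 = 1.00")
        print(f"2.5% / 97.5% of estimates: {np.quantile(est, 0.025):.2f} / {np.quantile(est, 0.975):.2f}")
        # 1000 total observations, but effectively only 10 draws of the random
        # effect, so the spread of the estimate stays wide.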



    Finally, multilevel modeling is not just mixed effects, although those are the most common case. Any model in which parameters are influenced not just by priors and data, but also by other unknown parameters, can be called a multilevel model. Of course, this is a very flexible set of models, but they can be written up from scratch and fit with a minimal amount of work using tools like Stan, NIMBLE, JAGS, etc. To this extent, I'm not sure I would say multilevel modeling is "hype"; basically, you can write up any model that can be represented as a Directed Acyclic Graph and fit it immediately (assuming it has a reasonable run time, that is). This gives a whole lot more power and potential creativity than traditional choices (i.e., regression model packages), yet does not require one to build an entire R package from scratch just to fit a new type of model.






    answered Dec 13 '18 at 12:57 – Cliff AB
    • Thank you for the answer. To clarify, I was not suggesting multi-level modeling is "hype". I was asking if "information borrowing" has a precise meaning or if that particular term is just hype.

      – EliK
      Dec 13 '18 at 13:03











    • @EliK: I'm not sure it has a precise meaning; Gordon Smyth gives what some may consider a precise meaning, i.e., Empirical Bayes, but the way I see the term commonly used now doesn't seem to fit that meaning. Personally, I don't think it's just a hype term; it's exactly the motivation for using mixed effects models over fixed effects models, although this extends beyond just the standard regression model framework. I do think a lot of people say the more vague "multilevel modeling" instead of the more precise "mixed effects modeling" because it's more fashionable now, though.

      – Cliff AB
      Dec 13 '18 at 13:09











    • I would say the hype is in ML papers and blogs, where it is argued that you need Bayesian models to implement multilevel models. I would be interested in a worked example where one compares against a cross-validated regularised model (for prediction).

      – seanv507
      Dec 13 '18 at 13:39











    • For what it's worth, the only alternative to Bayesian is Maximum Likelihood, which is just Bayesian with a uniform prior. So that's not really wrong.

      – shadowtalker
      Dec 13 '18 at 19:05






    • 1





      @shadowtalker: if you consider MLE methods to be Bayesian, then the word Bayesian is basically meaningless in statistics. However, this is consistent with some of the mistakes I see in the ML literature.

      – Cliff AB
      Dec 13 '18 at 19:10
















    2














    I am assuming, since you tagged machine learning, that you are interested in prediction rather than inference. (I believe I am aligned with @Glen_b's answer, but just translating to this context/vocabulary.)



    I would claim in this case it is a buzzword. A regularised linear model with a group variable will borrow information: the prediction at the individual level will be a combination of the group mean and the individual effect. One way to think of L1/L2 regularisation is that it assigns a coefficient cost per reduction in total error; since a group variable affects more samples than an individual variable, there is pressure to estimate a group effect, leaving only a smaller deviation from the group effect to each individual variable.



    For individual points with enough data, the individual effect will be 'strong'; for those with little data, the effect will be weak.



    I think the easiest way to see this is by considering L1 regularisation and 3 individuals of the same group with the same effect. Unregularised, the problem has an infinite number of solutions, whereas regularisation gives a unique solution.



    Assigning all the effect to the group coefficient has the lowest L1 norm, since we only need 1 value to cover 3 individuals. Conversely, assigning all the effect to the individual coefficients has the worst, namely 3 times the L1 norm of assigning the effect to the group coefficient.
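    Here is a minimal sketch of that argument using scikit-learn's Lasso on invented data (the design, the penalty level, and the group structure are all hypothetical):

        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(1)

        # 3 individuals from one group, all with the same true effect of +2 on y.
        n_per = 50
        ind = np.repeat(np.arange(3), n_per)
        X = np.zeros((3 * n_per, 4))
        X[:, 0] = 1.0                            # group dummy (covers all 3 individuals)
        X[np.arange(3 * n_per), 1 + ind] = 1.0   # one dummy per individual (redundant)
        y = 2.0 + rng.normal(0.0, 0.5, size=3 * n_per)

        # Unpenalised, the coefficients are unidentified (X is rank-deficient);
        # the L1 penalty selects the representation with the smallest L1 norm.
        fit = Lasso(alpha=0.05, fit_intercept=False).fit(X, y)
        print(fit.coef_)   # the effect loads on the group column; individual columns stay near 0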



    Note we can have as many hierarchies as we want, and interactions are affected similarly: regularisation will push effects onto main variables rather than onto rarer interactions.



    The blog tjmahr.com/plotting-partial-pooling-in-mixed-effects-models, linked by @IsabellaGhement, gives a quote for borrowing strength:



    "This effect is sometimes called shrinkage, because more extreme values shrinkage are pulled towards a more reasonable, more average value. In the lme4 book, Douglas Bates provides an alternative to shrinkage [name]"




    The term “shrinkage” may have negative connotations. John Tukey
    preferred to refer to the process as the estimates for individual
    subjects “borrowing strength” from each other. This is a fundamental
    difference in the models underlying mixed-effects models versus
    strictly fixed effects models. In a mixed-effects model we assume that
    the levels of a grouping factor are a selection from a population and,
    as a result, can be expected to share characteristics to some degree.
    Consequently, the predictions from a mixed-effects model are
    attenuated relative to those from strictly fixed-effects models.







    answered Dec 13 '18 at 9:33, edited Dec 13 '18 at 12:36 – seanv507
    • What is prediction if not a specific kind of inference?

      – shadowtalker
      Dec 13 '18 at 19:06
















    0














    Another source I would like to recommend on this topic, which I find particularly instructive, is David Robinson's Introduction to Empirical Bayes.



    His running example is whether a baseball player will manage to hit the next ball thrown at him. The key idea is that if a player has been around for years, one has a pretty clear picture of how capable he is; in particular, one can use his observed batting average as a pretty good estimate of the success probability on the next pitch.



    Conversely, a player who has just started playing in a league hasn't revealed much of his actual talent yet. So it seems like a wise choice to adjust the estimate of his success probability towards some overall mean if he has been particularly successful or unsuccessful in his first few games, as that is likely, at least to some extent, due to good or bad luck.
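    Here is a minimal sketch of that adjustment, following the beta-binomial shrinkage in Robinson's book; the player records are invented, and the Beta(81, 219) league-wide prior (of the kind he estimates empirically) is treated as given:

        # Hypothetical career records: (hits, at-bats).
        players = {"veteran": (1200, 4000), "rookie": (9, 12)}

        # League-wide prior Beta(alpha0, beta0); in empirical Bayes these would be
        # estimated from all players' averages rather than assumed.
        alpha0, beta0 = 81.0, 219.0    # prior mean 81/300 = 0.27

        for name, (hits, at_bats) in players.items():
            raw = hits / at_bats
            shrunk = (hits + alpha0) / (at_bats + alpha0 + beta0)  # posterior mean
            print(f"{name:8s} raw={raw:.3f}  shrunk={shrunk:.3f}")
        # The rookie's 0.750 is pulled hard toward the league mean; the veteran,
        # with thousands of at-bats, barely moves.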



    As a minor point, the term "borrowing" certainly does not seem to be used in the sense that something that has been borrowed would need to be returned at some point ;-).






    answered Dec 13 '18 at 12:54, edited Dec 13 '18 at 15:28 – Christoph Hanck