How to normalize data between 0 and 1?











2 votes











I have seen the min-max normalization formula in several answers (e.g. [1], [2], [3]), where data is normalized into the interval $\left[0,1\right]$.

However, is there a method to normalize data into the interval $\left(0,1\right)$, i.e. excluding 0 and 1?

EDIT:

My data is a sample from a uniform distribution within the range $\left[a,b\right]$. I would like to normalize it into the interval $\left(0,1\right)$ while remaining uniformly distributed.

















dataset normalization














asked Dec 4 at 15:30 by skoestlmeier, edited Dec 4 at 16:00








  • $$\frac{1}{1 + \exp(-x)} \in (0,1)$$ for any $x \in \mathbb{R}$. Do you have some other requirements that would exclude this?
    – Sycorax
    Dec 4 at 15:47












  • Thanks @Sycorax. To clarify, I just edited my question to point out that my data sample should be uniformly distributed.
    – skoestlmeier
    Dec 4 at 16:01
























3 Answers

















3 votes, accepted






Using the property that the CDF evaluated at the data, $F(X)$, is uniformly distributed on $[0,1]$ for a continuous random variable $X$, you can compute the empirical CDF for $x$. This is essentially the same as ranking the data and then rescaling by the number of elements $n$. To enforce the requirement that the scaled data exclude 0 and 1, you can deviate from the standard ECDF procedure and construct the scale so that the outputs are $\frac{1}{n+1}, \frac{2}{n+1}, \cdots, \frac{n}{n+1}$, which is likewise uniform.
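The rank-based rescaling described above can be sketched in a few lines of Python. This is only an illustration, not the answerer's own code: the function name `rank_scale` is hypothetical, and breaking ties by position is an assumption, since the answer does not say how ties should be handled.

```python
def rank_scale(values):
    """Map each value to rank / (n + 1), so every output lies strictly in (0, 1)."""
    n = len(values)
    # Indices of the observations sorted by value. Ties are broken by position,
    # which is an assumption made for this sketch.
    order = sorted(range(n), key=lambda i: values[i])
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = rank / (n + 1)
    return scaled


data = [3.2, 7.5, 1.1, 9.9]
print(rank_scale(data))  # [0.4, 0.6, 0.2, 0.8]
```

Note that this keeps only the ordering of the data: the outputs always sit on the evenly spaced grid $\frac{1}{n+1}, \ldots, \frac{n}{n+1}$, whatever the spacing of the original values was.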














answered Dec 4 at 16:07 by Sycorax, edited Dec 5 at 15:08












    • There's a whole class of symmetric versions of your scaling procedure: $u_\alpha(i) = \frac{i-\alpha}{n+1-2\alpha}$ (with $0 \leq \alpha \leq 1$), of which the above has $\alpha=0$. (There are also asymmetric ones, which have uses in some applications.)
      – Glen_b
      Dec 6 at 5:12












    • Does this have any particular name?
      – Sycorax
      Dec 6 at 13:42










    • Several, I think, but I can't recall any right now. It comes up in probability plotting. Blom 1958 "Statistical Estimates and Transformed Beta Variables" is the standard reference for this thing (and variations).
      – Glen_b
      Dec 7 at 8:49
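The symmetric family $u_\alpha(i) = \frac{i-\alpha}{n+1-2\alpha}$ from the comments above could be sketched as follows. The function name `plotting_positions` is hypothetical and the code is only a minimal illustration of the formula as written in the comment.

```python
def plotting_positions(n, alpha=0.0):
    """Return u_alpha(i) = (i - alpha) / (n + 1 - 2 * alpha) for i = 1..n."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    denom = n + 1 - 2 * alpha
    if denom == 0:
        raise ValueError("n = 1 with alpha = 1 gives a zero denominator")
    return [(i - alpha) / denom for i in range(1, n + 1)]


print(plotting_positions(4))        # alpha = 0: i / (n + 1)
print(plotting_positions(4, 0.5))   # alpha = 0.5: (i - 0.5) / n
print(plotting_positions(4, 1.0))   # alpha = 1: (i - 1) / (n - 1), hits 0 and 1
```

For the purpose of the question, any $\alpha < 1$ keeps all values strictly inside $(0,1)$, while $\alpha = 1$ reproduces the endpoints 0 and 1.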


































4 votes









A uniform distribution on $(a, b)$ is the same as a uniform distribution on $[a, b]$, since for any $X$ distributed uniformly on $[a, b]$, $P(X = a) = P(X = b) = 0$. So, just use the usual formulae for rescaling to $[0, 1]$. On the other hand, if your sample has a value equal to $a$ or $b$, then you can safely conclude that you don't actually have a continuous uniform distribution.
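As a rough illustration of this argument, here is a minimal Python sketch. It assumes that the bounds $a$ and $b$ of the distribution are known rather than estimated from the sample (using the sample minimum and maximum instead would always map the smallest and largest observations to exactly 0 and 1). The helper name `scale_known_bounds` and the example values of `a` and `b` are hypothetical.

```python
import random


def scale_known_bounds(x, a, b):
    """Linearly map x from [a, b] onto [0, 1] using the known bounds a < b."""
    if not a < b:
        raise ValueError("a must be smaller than b")
    return (x - a) / (b - a)


random.seed(0)        # fixed seed, only for reproducibility
a, b = 2.0, 5.0       # assumed (known) bounds for this example
sample = [random.uniform(a, b) for _ in range(10_000)]
scaled = [scale_known_bounds(x, a, b) for x in sample]

# Count how many scaled values land exactly on an endpoint.
endpoint_hits = sum(u == 0.0 or u == 1.0 for u in scaled)
print(min(scaled), max(scaled), endpoint_hits)
```

For a truly continuous uniform distribution an endpoint hit has probability zero. With floating-point numbers the probability is tiny but not exactly zero, which is the distinction discussed in the comments below.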














answered Dec 4 at 16:11 by Kodiologist, edited Dec 4 at 19:35












    • I don't agree with your latter statement. Following the same logic, you could exclude any data from ever being sampled from a uniform distribution.
      – dedObed
      Dec 4 at 19:47










    • @dedObed The argument works for any countable set of points, because any such set has Lebesgue measure zero, but not for uncountable sets.
      – Kodiologist
      Dec 4 at 20:27










    • I agree that a uniform distribution on (a, b) is the same as a uniform on [a, b]. The claim I challenge is "if your sample has a value equal to a or b [...] you don't actually have a continuous uniform distribution."
      – dedObed
      Dec 4 at 20:34










    • @dedObed I know. I'm saying that the argument works because $\{a, b\}$, the set of just the two values $a$ and $b$, is countable. It wouldn't if you used a non-null set, which is what would be required to "follow the same logic" to "exclude any data from ever being sampled from a uniform distribution".
      – Kodiologist
      Dec 4 at 20:36








    • @dedObed I guess the chief thing to keep in mind is that continuous distributions are the sort of ethereal mathematical entities you can't get in real life. Computers fake a continuous uniform distribution with a discrete distribution that covers a large number of floating-point values. It's close enough for many applied purposes, but, e.g., a random float will always be rational, whereas a random sample from a continuous uniform distribution will be almost surely irrational.
      – Kodiologist
      Dec 4 at 21:58
































1 vote









The formula $x' = \frac{x - \min{x}}{\max{x} - \min{x}}$ will normalize the values into $[0,1]$.



I am not sure why you want to exclude $0$ and $1$; in any case, one way would be to choose new minimum and maximum values for the transformed variable, e.g. $[0+\epsilon, 1-\epsilon]$. You can then transform the variable using
$$x' = \epsilon + (1-2\epsilon) \cdot \left(\frac{x - \min{x}}{\max{x} - \min{x}}\right)$$
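A minimal Python sketch of this shrunken min-max transform is given below. The function name `minmax_eps` and the default value of `eps` are assumptions made for illustration; the answer does not suggest a particular $\epsilon$.

```python
def minmax_eps(values, eps=1e-6):
    """Min-max scale values into [eps, 1 - eps], a sub-interval of (0, 1)."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie strictly between 0 and 0.5")
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    # x' = eps + (1 - 2 * eps) * (x - min) / (max - min)
    return [eps + (1 - 2 * eps) * (x - lo) / (hi - lo) for x in values]


scaled = minmax_eps([0.0, 5.0, 10.0], eps=0.1)
print(scaled)  # smallest value maps to about eps, largest to about 1 - eps
```

Because the map is linear, a uniformly distributed input stays uniformly distributed, only on $[\epsilon, 1-\epsilon]$ instead of $[0,1]$.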



Another way could be, as suggested by Sycorax in his comment, to use a logistic transform
$$ x' = \frac{1}{1 + \exp(-x)} $$
This ensures that $x' \in (0,1)$ for all $x \in \mathbb{R}$.
However, depending on the original distribution of $x$, $x'$ might span only a limited range of the interval $(0,1)$, so you might want to, e.g., standardize $x$ before applying the logistic transform.
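The standardize-then-logistic idea could look roughly like this in Python. The function names are hypothetical, and using the population standard deviation for the standardization is an assumption; the answer does not specify which variant to use.

```python
import math
import statistics


def logistic(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardized_logistic(values):
    """Standardize values to zero mean and unit variance, then apply the logistic map."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [logistic((x - mu) / sd) for x in values]


print(standardized_logistic([1000.0, 2000.0, 3000.0]))
```

Two caveats apply. The logistic map is nonlinear, so a uniformly distributed input generally does not remain uniform after the transform. Also, in floating-point arithmetic the result rounds to exactly 1.0 for sufficiently large positive inputs, so the open interval is only guaranteed mathematically.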
















answered Dec 4 at 16:01 by matteo





























