How to normalize data between 0 and 1?

I have seen the min-max normalization formula in several answers (e.g. [1], [2], [3]), where data is normalized into the interval $\left[0,1\right]$.

However, is there a method to normalize data into the interval $\left(0,1\right)$, i.e. excluding 0 and 1?

EDIT:

My data is a sample from a uniform distribution within the range $\left[a,b\right]$. I would like to normalize it into the interval $\left(0,1\right)$ while remaining uniformly distributed.

edited Dec 4 at 16:00

asked Dec 4 at 15:30

skoestlmeier


$$\frac{1}{1 + \exp(-x)} \in (0,1)$$ for any $x \in \mathbb{R}$. Do you have some other requirements that would exclude this?
– Sycorax
Dec 4 at 15:47

Thanks @Sycorax. To clarify, I just edited my question to point out that my data sample should be uniformly distributed.
– skoestlmeier
Dec 4 at 16:01

dataset normalization


3 Answers

accepted

Using the property that the CDF of a continuous random variable, evaluated at that variable, is uniformly distributed on $[0,1]$ (the probability integral transform), you can compute the empirical CDF for $x$. This is essentially the same as ranking the data and then rescaling by the number of elements $n$. To enforce the requirement that the scaled data exclude 0 and 1, you can deviate from the standard ECDF procedure and construct the scale so that the outputs are $\frac{1}{n+1}, \frac{2}{n+1}, \cdots, \frac{n}{n+1}$, which is likewise uniform.
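As a rough illustration of the rank-based scaling described above, here is a minimal Python sketch. The function name `rank_to_open_unit_interval` is made up for this example, and the handling of ties (broken by original position) is an assumption, since the answer does not discuss ties.

```python
def rank_to_open_unit_interval(values):
    """Map each value to rank / (n + 1), where rank runs from 1 to n."""
    n = len(values)
    # Indices sorted by value; sorted() is stable, so ties keep their
    # original order (an assumption, the answer does not cover ties).
    order = sorted(range(n), key=lambda i: values[i])
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = rank / (n + 1)
    return scaled


data = [3.2, 7.5, 1.1, 9.8]
print(rank_to_open_unit_interval(data))  # [0.4, 0.6, 0.2, 0.8]
```

The smallest output is $\frac{1}{n+1}$ and the largest is $\frac{n}{n+1}$, so 0 and 1 are never produced.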

edited Dec 5 at 15:08

answered Dec 4 at 16:07

Sycorax


There's a whole class of symmetric versions of your scaling procedure: $u_\alpha(i) = \frac{i-\alpha}{n+1-2\alpha}$ (with $0\leq\alpha\leq 1$), of which the above has $\alpha=0$. (There are also asymmetric ones, which have uses in some applications.)
– Glen_b♦
Dec 6 at 5:12

Does this have any particular name?
– Sycorax
Dec 6 at 13:42

Several, I think, but I can't recall any right now. It comes up in probability plotting. Blom (1958), "Statistical Estimates and Transformed Beta Variables", is the standard reference for this thing (and variations).
– Glen_b♦
Dec 7 at 8:49
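The symmetric family $u_\alpha(i) = \frac{i-\alpha}{n+1-2\alpha}$ from the comments above could be sketched as follows. The function name `plotting_positions` and the error handling are assumptions made for this example, not part of the original discussion.

```python
def plotting_positions(n, alpha=0.0):
    """Return (i - alpha) / (n + 1 - 2 * alpha) for i = 1, ..., n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    denom = n + 1 - 2 * alpha
    if denom == 0:
        # Only happens for n == 1 with alpha == 1.
        raise ValueError("n + 1 - 2 * alpha must be non-zero")
    return [(i - alpha) / denom for i in range(1, n + 1)]


print(plotting_positions(4, alpha=0.0))  # [0.2, 0.4, 0.6, 0.8]
print(plotting_positions(4, alpha=0.5))  # [0.125, 0.375, 0.625, 0.875]
```

For any $\alpha < 1$ the outputs stay strictly inside $(0,1)$; at $\alpha = 1$ the formula becomes $\frac{i-1}{n-1}$, which does reach 0 and 1.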


A uniform distribution on $(a, b)$ is the same as a uniform distribution on $[a, b]$, since for any $X$ distributed uniformly on $[a, b]$, $P(X = a) = P(X = b) = 0$. So, just use the formulae for translating to $[0, 1]$. On the other hand, if your sample has a value equal to $a$ or $b$, then you can safely conclude that you don't actually have a continuous uniform distribution.
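A minimal sketch of that translation, assuming the population bounds $a$ and $b$ are known (rather than estimated from the sample minimum and maximum). The function name `scale_with_known_bounds` is hypothetical.

```python
def scale_with_known_bounds(values, a, b):
    """Linearly map values from [a, b] to [0, 1] using known bounds a < b."""
    if not a < b:
        raise ValueError("a must be strictly less than b")
    return [(x - a) / (b - a) for x in values]


sample = [2.5, 4.0, 3.25]
print(scale_with_known_bounds(sample, a=2.0, b=6.0))  # [0.125, 0.5, 0.3125]
```

Because the map is linear, a uniform sample stays uniform; a result of exactly 0 or 1 appears only if a sample value equals $a$ or $b$.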

edited Dec 4 at 19:35

answered Dec 4 at 16:11

Kodiologist


I don't agree with your latter statement. Following the same logic, you could exclude any data from ever being sampled from a uniform distribution.
– dedObed
Dec 4 at 19:47

@dedObed The argument works for any countable set of points, because any such set has Lebesgue measure zero, but not for uncountable sets.
– Kodiologist
Dec 4 at 20:27

I agree that a uniform distribution on (a, b) is the same as a uniform on [a, b]. The claim I challenge is "if your sample has a value equal to a or b [...] you don't actually have a continuous uniform distribution."
– dedObed
Dec 4 at 20:34

@dedObed I know. I'm saying that the argument works because $\{a, b\}$, the set of just the two values $a$ and $b$, is countable. It wouldn't if you used a non-null set, which is what would be required to "follow the same logic" to "exclude any data from ever being sampled from a uniform distribution".
– Kodiologist
Dec 4 at 20:36

@dedObed I guess the chief thing to keep in mind is that continuous distributions are the sort of ethereal mathematical entities you can't get in real life. Computers fake a continuous uniform distribution with a discrete distribution that covers a large number of floating-point values. It's close enough for many applied purposes, but, e.g., a random float will always be rational, whereas a random sample from a continuous uniform distribution will be almost surely irrational.
– Kodiologist
Dec 4 at 21:58


The formula $x' = \frac{x - \min{x}}{\max{x} - \min{x}}$ will normalize the values into $[0,1]$.

I am not sure why you want to exclude $0$ and $1$; in any case, one way would be to choose new minimum and maximum values for the transformed variable, e.g. $[0+\epsilon, 1-\epsilon]$. You can then transform the variable using
$$x' = \epsilon + (1-2\epsilon) \cdot \left(\frac{x - \min{x}}{\max{x} - \min{x}}\right)$$
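A small Python sketch of this shrunken min-max transform. The function name `minmax_shrunk`, the default `eps`, and the input checks are assumptions made for illustration.

```python
def minmax_shrunk(values, eps=0.01):
    """Min-max scale values into [eps, 1 - eps], with 0 < eps < 0.5."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie strictly between 0 and 0.5")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("values must not all be equal")
    return [eps + (1 - 2 * eps) * (x - lo) / (hi - lo) for x in values]


print(minmax_shrunk([10.0, 15.0, 20.0], eps=0.25))  # [0.25, 0.5, 0.75]
```

The sample minimum maps to $\epsilon$ and the sample maximum to $1-\epsilon$, so the choice of $\epsilon$ decides how far the output stays from the endpoints.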

Another way could be, as suggested by Sycorax in his comment, to use a logistic transform
$$ x' = \frac{1}{1 + \exp(-x)} $$
This ensures that $x \in \mathbb{R} \implies x' \in (0,1)$.
However, depending on the original distribution of $x$, $x'$ might span only a limited range of the interval $(0,1)$, so you might want to try, e.g., standardizing $x$ before applying the logistic transform.
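One way to sketch "standardize, then apply the logistic transform" in Python is shown below. The function name `standardized_logistic` and the use of the sample standard deviation are assumptions. Note that this maps into $(0,1)$ but does not in general produce uniformly distributed output.

```python
import math
import statistics


def standardized_logistic(values):
    """Standardize values to z-scores, then apply 1 / (1 + exp(-z))."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, needs >= 2 values
    if sd == 0:
        raise ValueError("values must not all be equal")
    return [1.0 / (1.0 + math.exp(-(x - mean) / sd)) for x in values]


print(standardized_logistic([1.0, 2.0, 3.0]))
```

With the values `[1.0, 2.0, 3.0]` the z-scores are -1, 0 and 1, so the middle value maps to 0.5 and the outer two are symmetric around it.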

answered Dec 4 at 16:01

matteo

