How to construct a list of lengths efficiently
asked 16 hours ago by AccidentalFourierTransform, edited 5 hours ago by Alexey Popkov · 13 votes
Say I have a sorted list of integers
RandomInteger[{1, 100000}, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], {m, 10000}]
This is terribly inefficient, but for some reason I cannot come up with a better approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none. I am probably missing a keyword...
list-manipulation performance-tuning filtering
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts can already accomplish what you want? – Thies Heidecke, 4 hours ago
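For reference, here is a minimal sketch of the comment's EmpiricalDistribution suggestion (an illustration added here, not from the thread): the empirical CDF at m is the fraction of elements that are at most m, so multiplying by the list length recovers the desired counts.
s = Sort@RandomInteger[{1, 100000}, 10000];
dist = EmpiricalDistribution[s];
(* CDF[dist, m] is the fraction of elements <= m; scale to get counts *)
rCDF = Round[Length[s] CDF[dist, Range[10000]]];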
4 Answers
Accepted answer (14 votes) – Carl Woll
You can use the usual UnitStep + Total trick; Total[UnitStep[m - s]] counts the entries of s that are at most m, since UnitStep is 1 for nonnegative arguments and 0 otherwise:
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],{m,10000}];//AbsoluteTiming
r1 === r2
{0.435358, Null}
{41.4357, Null}
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:
mincounts[s_] := With[
{
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
},
With[{near = Nearest[unique->"Index", Range @ Length @ s][[All,1]]},
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[{1,100000},10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i, Length[Select[s, # <= 10000 &]]}]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
{0.432897, Null}
{0.122198, Null}
{0.025923, Null}
True
Ah, great answer, as usual. – AccidentalFourierTransform, 16 hours ago
Can you please check my answer? My laptop is very slow. – J42161217, 14 hours ago
Answer (9 votes) – J42161217
I think this is at least 3× faster than Carl Woll's answer. Can anybody compare my timing?
r3 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;;10000]]; // AbsoluteTiming
{0.157123, Null}
Using MapThread, the same code is much faster:
r6 = Flatten[
Join[{Table[0, s[[1]] - 1]},
MapThread[
Table, {Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]}]]][[;; 10000]]; // AbsoluteTiming
r6===r3
{0.008387, Null}
True
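Why this works (a gloss added here, not part of the original answer): since s is sorted, the count of elements less than or equal to m equals i for every m from s[[i]] up to s[[i+1]] - 1, so the result consists of constant runs whose lengths are given by Differences[s]. A tiny hand-checkable sketch of that idea:
(* toy sorted list: counts of elements <= m for m = 1..4 should be {0, 2, 2, 2} *)
tiny = {2, 2, 5};
Join[ConstantArray[0, tiny[[1]] - 1],
 Flatten@MapThread[ConstantArray, {Range[Length[tiny] - 1], Differences[tiny]}]]
(* {0, 2, 2, 2} *)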
These are the timings on my laptop: r1 = {0.444354, Null}, r2 = {39.456, Null}, r3 = {0.157123, Null}, True. – Okkes Dulgerci, 13 hours ago
Hey, thanks for checking. I will use your timing. Thanks for the confirmation. – J42161217, 13 hours ago
Answer (8 votes) – kglr
A combination of BinCounts and Accumulate is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, {1, 1 + 10000, 1}]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts, and J42161217's Differences-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
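As a quick sanity check (added here, not part of the original answer), a few entries of r4 can be compared directly against the definition:
(* r4[[m]] should equal the number of elements of s that are <= m *)
And @@ Table[r4[[m]] == Count[s, x_ /; x <= m], {m, {1, 17, 9999}}]
(* True *)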
Beat me to it – BinCounts is the way... +1 – ciao, 10 hours ago
Hey @ciao, you are back?!! – kglr, 10 hours ago
Sorry @Henrik; thanks for the edit. – kglr, 7 hours ago
Short and fast, no sorting needed. Excellent! +1 – Okkes Dulgerci, 2 hours ago
Answer (3 votes) – Henrik Schumacher
s = Sort[RandomInteger[{1, 100000}, 10000]];
Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense), and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly. This can be done with this function:
(* build a SparseArray in which repeated positions are combined with f
   (Total by default) by temporarily setting "TreatRepeatedEntries" *)
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
 If[(Head[rules] === Rule) && (rules[[1]] === {}),
  rules[[2]],
  With[{spopt = SystemOptions["SparseArrayOptions"]},
   Internal`WithLocalSettings[
    SetSystemOptions[
     "SparseArrayOptions" -> {"TreatRepeatedEntries" -> f}],
    SparseArray[rules, dims, background],
    SetSystemOptions[spopt]]
   ]
  ]
So, in total, r can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
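The same jumps-then-Accumulate idea can be sketched without the SparseArray machinery, at some cost in speed; this variant (an illustration added here, not from the answer) tallies the jump heights with Counts:
(* the jump height at position m is the multiplicity of m in s *)
jumps = Lookup[Counts[s], Range[10000], 0];
rCounts = Accumulate[jumps];
rCounts == r4
(* True *)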