2-sum problem on a range in Python

Here is the problem description (from an algorithm course: https://www.coursera.org/learn/algorithm-design-analysis):

The goal of this problem is to implement a variant of the 2-SUM
algorithm (covered in the Week 6 lecture on hash table applications).

The file contains 1 million integers, both positive and negative
(there might be some repetitions!). This is your array of integers,
with the $i$th row of the file specifying the $i$th entry of the array.

Your task is to compute the number of target values $t$ in the interval
[-10000,10000] (inclusive) such that there are distinct numbers $x,y$ in
the input file that satisfy $x+y=t$.

I wrote the following Python program. It runs already 25 minutes and just calculated 6200 of 20000 numbers.
This is too slow.

import sys



def has_two_sum(n,num_set,nums):

  res = any(((n-x) in num_set) and 2*x !=n for x in nums)

  return res



with open(sys.argv[1], 'r') as fp:

  nums = [int(line) for line in fp]

  num_set = set(nums)



two_sum = sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))

print(two_sum)

edited Jul 21 '16 at 6:28

asked Jul 20 '16 at 13:27

David Michael Gang

21517

You're asking for help/providing partial solutions to a Coursera algorithms course. I'm pretty sure this violates the Coursera TOS... at the very least, you should probably cite your source for the problem if you're going to the trouble of putting the description in a quote block.
– apnorton
Jul 21 '16 at 6:24

add a comment |

Here is the problem description (from an algorithm course: https://www.coursera.org/learn/algorithm-design-analysis):

The goal of this problem is to implement a variant of the 2-SUM
algorithm (covered in the Week 6 lecture on hash table applications).

The file contains 1 million integers, both positive and negative
(there might be some repetitions!). This is your array of integers,
with the $i$th row of the file specifying the $i$th entry of the array.

Your task is to compute the number of target values $t$ in the interval
[-10000,10000] (inclusive) such that there are distinct numbers $x,y$ in
the input file that satisfy $x+y=t$.

I wrote the following Python program. It runs already 25 minutes and just calculated 6200 of 20000 numbers.
This is too slow.

import sys



def has_two_sum(n,num_set,nums):

  res = any(((n-x) in num_set) and 2*x !=n for x in nums)

  return res



with open(sys.argv[1], 'r') as fp:

  nums = [int(line) for line in fp]

  num_set = set(nums)



two_sum = sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))

print(two_sum)

edited Jul 21 '16 at 6:28

asked Jul 20 '16 at 13:27

David Michael Gang

21517

You're asking for help/providing partial solutions to a Coursera algorithms course. I'm pretty sure this violates the Coursera TOS... at the very least, you should probably cite your source for the problem if you're going to the trouble of putting the description in a quote block.
– apnorton
Jul 21 '16 at 6:24

add a comment |

Here is the problem description (from an algorithm course: https://www.coursera.org/learn/algorithm-design-analysis):

The goal of this problem is to implement a variant of the 2-SUM
algorithm (covered in the Week 6 lecture on hash table applications).

The file contains 1 million integers, both positive and negative
(there might be some repetitions!). This is your array of integers,
with the $i$th row of the file specifying the $i$th entry of the array.

Your task is to compute the number of target values $t$ in the interval
[-10000,10000] (inclusive) such that there are distinct numbers $x,y$ in
the input file that satisfy $x+y=t$.

I wrote the following Python program. It runs already 25 minutes and just calculated 6200 of 20000 numbers.
This is too slow.

import sys



def has_two_sum(n,num_set,nums):

  res = any(((n-x) in num_set) and 2*x !=n for x in nums)

  return res



with open(sys.argv[1], 'r') as fp:

  nums = [int(line) for line in fp]

  num_set = set(nums)



two_sum = sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))

print(two_sum)

edited Jul 21 '16 at 6:28

asked Jul 20 '16 at 13:27

David Michael Gang

21517

Here is the problem description (from an algorithm course: https://www.coursera.org/learn/algorithm-design-analysis):

The goal of this problem is to implement a variant of the 2-SUM
algorithm (covered in the Week 6 lecture on hash table applications).

The file contains 1 million integers, both positive and negative
(there might be some repetitions!). This is your array of integers,
with the $i$th row of the file specifying the $i$th entry of the array.

Your task is to compute the number of target values $t$ in the interval
[-10000,10000] (inclusive) such that there are distinct numbers $x,y$ in
the input file that satisfy $x+y=t$.

I wrote the following Python program. It runs already 25 minutes and just calculated 6200 of 20000 numbers.
This is too slow.

import sys



def has_two_sum(n,num_set,nums):

  res = any(((n-x) in num_set) and 2*x !=n for x in nums)

  return res



with open(sys.argv[1], 'r') as fp:

  nums = [int(line) for line in fp]

  num_set = set(nums)



two_sum = sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))

print(two_sum)

python performance time-limit-exceeded

edited Jul 21 '16 at 6:28

asked Jul 20 '16 at 13:27

David Michael Gang

21517

edited Jul 21 '16 at 6:28

asked Jul 20 '16 at 13:27

David Michael Gang

21517

edited Jul 21 '16 at 6:28

asked Jul 20 '16 at 13:27

David Michael Gang

21517

asked Jul 20 '16 at 13:27

David Michael Gang

21517

asked Jul 20 '16 at 13:27

David Michael Gang

21517

You're asking for help/providing partial solutions to a Coursera algorithms course. I'm pretty sure this violates the Coursera TOS... at the very least, you should probably cite your source for the problem if you're going to the trouble of putting the description in a quote block.
– apnorton
Jul 21 '16 at 6:24

add a comment |

You're asking for help/providing partial solutions to a Coursera algorithms course. I'm pretty sure this violates the Coursera TOS... at the very least, you should probably cite your source for the problem if you're going to the trouble of putting the description in a quote block.
– apnorton
Jul 21 '16 at 6:24

You're asking for help/providing partial solutions to a Coursera algorithms course. I'm pretty sure this violates the Coursera TOS... at the very least, you should probably cite your source for the problem if you're going to the trouble of putting the description in a quote block.
– apnorton
Jul 21 '16 at 6:24

add a comment |

4 Answers
4

active

oldest

votes

Your code runs instantly on my system with 1 million randomly generated numbers. are you sure you are searching for n - x in num_set and not in nums?

Anyway, a couple of Python style pointers:

You can iterate over the set directly, no need to pass both the set and the list:

def has_two_sum(n, num_set):

    res = any(((n-x) in num_set) and 2*x != n for x in num_set)

    return res

Not much point in storing a value to immediuately return it:

def has_two_sum(n, num_set):

    return any(((n-x) in num_set) and 2*x != n for x in num_set)

Boolean values are converted to 0 or 1 when used in an integer setting, so your adding of values is typically written as:
```
two_sum = sum(has_two_sum(n, num_set) for n in range(-10000, 10001))
```

If you are going to have a function that contains a single liner, you better give it a great, descriptive name. I don't think that has_two_sum really fits the bill. Perhaps has_two_items_that_sum_to_n or has_two_that_sum_to_n? Alternatively, I think my preferred option here would be to get rid of the function altogether and let code speak by itself, writing the whole thing as two nested comprehensions:
```
two_sum = sum(any(n - x in num_set and 2*x != n for x in num_set)

              for n in range(-10000, 10001))
```
There is a school of thought that considers nested comprehensions to be confusing (e.g. the Google Python style guide doesn't allow them), but I think this case is simple enough, YMMV.

answered Jul 20 '16 at 15:02

Jaime

5,8421327

3

If you already got rid of the need to keep num as a list, a set comprehension should build it faster than the detour over the list.
– Graipher
Jul 20 '16 at 15:30

Hi, I get much worse timings. I forgot to say that my integers are in the size of 11 digits. I also use python 3. Maybe this makes a difference
– David Michael Gang
Jul 21 '16 at 6:37

1

Are you on a 32 bit system? If Python has to use its long, unlimited precision integers, then yes, things will get substantially slower. 11 digits is more than 2**31 - 1, but 2**63 - 1 is 18 digits, so on a 64 bit build of Python this should be a non issue. Python 2 or 3 I think should have no major effect on this, although my timings were on Python 2.
– Jaime
Jul 21 '16 at 10:01

add a comment |

To get a grasp of what to speedup I ran a profiler on your code.
To get the 1 million integers I made a function to generate them, with the following:

def generate_random_numbers(amount=1000000):

    import random

    n = [str(int(random.normalvariate(0, amount))) for _ in range(amount)]

    with open('data', 'w') as f:

        f.write('n'.join(n))

Yes I did put the sigma as one million too.
I then copied your code into a function, removed the file part, and ran the profiler over it.
Resulting in the following code:

import cProfile



def read_nums():

    with open('data', 'r') as fp:

        return [int(line) for line in fp]



def has_two_sum(n,num_set,nums):

    res = any(((n-x) in num_set) and 2*x !=n for x in nums)

    return res



def two_sum(nums):

    nums = [int(line) for line in nums]

    num_set = set(nums)

    return sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))



nums = read_nums()

cProfile.run('two_sum({!r})'.format(nums))

Which resulted in the following profile of your code:

         166366 function calls in 6.966 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.362    0.362    0.900    0.900 2sum.py:11(two_sum)

        1    0.395    0.395    0.395    0.395 2sum.py:12(<listcomp>)

    20002    0.015    0.000    0.139    0.000 2sum.py:14(<genexpr>)

    20001    0.025    0.000    0.124    0.000 2sum.py:7(has_two_sum)

   106356    0.052    0.000    0.052    0.000 2sum.py:8(<genexpr>)

        1    0.216    0.216    1.116    1.116 <string>:1(<module>)

    20001    0.047    0.000    0.095    0.000 {built-in method builtins.any}

        1    5.851    5.851    6.966    6.966 {built-in method builtins.exec}

        1    0.004    0.004    0.143    0.143 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Which is not 25 minuets for 6200 of the 20000.
Instead it's a lot less. It says it took 6.966, but cumtime of two_sums is 0.900.
I then used a timer on it, which says it runs in ~0.899s.
Using:

print(timeit('two_sum({!r})'.format(nums), 'from __main__ import two_sum', number=100)/100)

So performance wise, I'd say it's not Python's fault, and would be very hard to optimize.
Instead I'd change your code to be a 'three line' function, but format the comprehensions to be more readable.

I'd remove un-needed brackets

Use a set compression.

Remove odd whitespace. ((1 for ...)

def two_sum(nums):

    nums = {int(n) for n in nums}

    return sum(

        1

        for n in range(-10000, 10001)

        if any(n - x in nums and 2 * x != n for x in nums)

    )

answered Jul 20 '16 at 15:17

Peilonrayz

25.2k337106

Since you profiled the code, is there an advantage in using x + x != n over 2 * x != n?
– Mathias Ettinger
Jul 20 '16 at 15:34

@MathiasEttinger I ran the profile code on the original twice and it had a 10% difference (0.896 - 0.784) in speed, when I changed it to use + it went to 0.804. Honestly I don't know if it's any good...
– Peilonrayz
Jul 20 '16 at 15:45

Ok, nice to know. It was just out of curiosity since the use case was pretty close to the toy example in this doc.
– Mathias Ettinger
Jul 20 '16 at 15:56

add a comment |

You aren't handling duplicates correctly, by my interpretation of the problem. Based on the sample code in the course notes, I interpret "distinct" to mean that you may not use the same entry twice, but you may use the same number twice if it occurs more than once in the file. That is, if the input file contains

3

3

… then I would expect it to be able to form a target sum of 6. Your program reports that 6 cannot be formed.

answered Jul 20 '16 at 16:43

200_success

128k15150412

add a comment |

The answers to the question don't pay attention that the original array consists of a million of signed eleven digit integers (i.e. both positive and negative). That is the real reason for the poor performance of the Python code, even on a 64-bit architecture.

[EDIT] Integers in Python 3 have arbitrary precision by default (e.g. see this article for details of implementation). In short, they can be represented as

i = x*2^(30*0) + y*2^(30*1) + z*2^(30*2)...

For numbers > 2^31 (e.g. eleven digit numbers), they are manipulated as complex objects with >=2 parts, and therefore the performance of the code manipulating such numbers is much slower than on smaller numbers in a range -2^30 to 2^30 for which there is a "fast path" CPython implementation (they are treated as fixed 32-bit integers). The 2-SUM programming assignment in the Algo_Stanford course deliberately uses huge numbers in the input array to make brute force solutions unacceptably slow.

edited Dec 19 at 8:19

answered Dec 18 at 11:14

Sergei Zaitseff

New contributor

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f135392%2f2-sum-problem-on-a-range-in-python%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Your code runs instantly on my system with 1 million randomly generated numbers. are you sure you are searching for n - x in num_set and not in nums?

Anyway, a couple of Python style pointers:

You can iterate over the set directly, no need to pass both the set and the list:

def has_two_sum(n, num_set):

    res = any(((n-x) in num_set) and 2*x != n for x in num_set)

    return res

Not much point in storing a value to immediuately return it:

def has_two_sum(n, num_set):

    return any(((n-x) in num_set) and 2*x != n for x in num_set)

Boolean values are converted to 0 or 1 when used in an integer setting, so your adding of values is typically written as:
```
two_sum = sum(has_two_sum(n, num_set) for n in range(-10000, 10001))
```

If you are going to have a function that contains a single liner, you better give it a great, descriptive name. I don't think that has_two_sum really fits the bill. Perhaps has_two_items_that_sum_to_n or has_two_that_sum_to_n? Alternatively, I think my preferred option here would be to get rid of the function altogether and let code speak by itself, writing the whole thing as two nested comprehensions:
```
two_sum = sum(any(n - x in num_set and 2*x != n for x in num_set)

              for n in range(-10000, 10001))
```
There is a school of thought that considers nested comprehensions to be confusing (e.g. the Google Python style guide doesn't allow them), but I think this case is simple enough, YMMV.

answered Jul 20 '16 at 15:02

Jaime

5,8421327

3

If you already got rid of the need to keep num as a list, a set comprehension should build it faster than the detour over the list.
– Graipher
Jul 20 '16 at 15:30

Hi, I get much worse timings. I forgot to say that my integers are in the size of 11 digits. I also use python 3. Maybe this makes a difference
– David Michael Gang
Jul 21 '16 at 6:37

1

Are you on a 32 bit system? If Python has to use its long, unlimited precision integers, then yes, things will get substantially slower. 11 digits is more than 2**31 - 1, but 2**63 - 1 is 18 digits, so on a 64 bit build of Python this should be a non issue. Python 2 or 3 I think should have no major effect on this, although my timings were on Python 2.
– Jaime
Jul 21 '16 at 10:01

add a comment |

Your code runs instantly on my system with 1 million randomly generated numbers. are you sure you are searching for n - x in num_set and not in nums?

Anyway, a couple of Python style pointers:

You can iterate over the set directly, no need to pass both the set and the list:

def has_two_sum(n, num_set):

    res = any(((n-x) in num_set) and 2*x != n for x in num_set)

    return res

Not much point in storing a value to immediuately return it:

def has_two_sum(n, num_set):

    return any(((n-x) in num_set) and 2*x != n for x in num_set)

Boolean values are converted to 0 or 1 when used in an integer setting, so your adding of values is typically written as:
```
two_sum = sum(has_two_sum(n, num_set) for n in range(-10000, 10001))
```

If you are going to have a function that contains a single liner, you better give it a great, descriptive name. I don't think that has_two_sum really fits the bill. Perhaps has_two_items_that_sum_to_n or has_two_that_sum_to_n? Alternatively, I think my preferred option here would be to get rid of the function altogether and let code speak by itself, writing the whole thing as two nested comprehensions:
```
two_sum = sum(any(n - x in num_set and 2*x != n for x in num_set)

              for n in range(-10000, 10001))
```
There is a school of thought that considers nested comprehensions to be confusing (e.g. the Google Python style guide doesn't allow them), but I think this case is simple enough, YMMV.

answered Jul 20 '16 at 15:02

Jaime

5,8421327

3

If you already got rid of the need to keep num as a list, a set comprehension should build it faster than the detour over the list.
– Graipher
Jul 20 '16 at 15:30

Hi, I get much worse timings. I forgot to say that my integers are in the size of 11 digits. I also use python 3. Maybe this makes a difference
– David Michael Gang
Jul 21 '16 at 6:37

1

Are you on a 32 bit system? If Python has to use its long, unlimited precision integers, then yes, things will get substantially slower. 11 digits is more than 2**31 - 1, but 2**63 - 1 is 18 digits, so on a 64 bit build of Python this should be a non issue. Python 2 or 3 I think should have no major effect on this, although my timings were on Python 2.
– Jaime
Jul 21 '16 at 10:01

add a comment |

Your code runs instantly on my system with 1 million randomly generated numbers. are you sure you are searching for n - x in num_set and not in nums?

Anyway, a couple of Python style pointers:

You can iterate over the set directly, no need to pass both the set and the list:

def has_two_sum(n, num_set):

    res = any(((n-x) in num_set) and 2*x != n for x in num_set)

    return res

Not much point in storing a value to immediuately return it:

def has_two_sum(n, num_set):

    return any(((n-x) in num_set) and 2*x != n for x in num_set)

Boolean values are converted to 0 or 1 when used in an integer setting, so your adding of values is typically written as:
```
two_sum = sum(has_two_sum(n, num_set) for n in range(-10000, 10001))
```

If you are going to have a function that contains a single liner, you better give it a great, descriptive name. I don't think that has_two_sum really fits the bill. Perhaps has_two_items_that_sum_to_n or has_two_that_sum_to_n? Alternatively, I think my preferred option here would be to get rid of the function altogether and let code speak by itself, writing the whole thing as two nested comprehensions:
```
two_sum = sum(any(n - x in num_set and 2*x != n for x in num_set)

              for n in range(-10000, 10001))
```
There is a school of thought that considers nested comprehensions to be confusing (e.g. the Google Python style guide doesn't allow them), but I think this case is simple enough, YMMV.

answered Jul 20 '16 at 15:02

Jaime

5,8421327

Your code runs instantly on my system with 1 million randomly generated numbers. are you sure you are searching for n - x in num_set and not in nums?

Anyway, a couple of Python style pointers:

You can iterate over the set directly, no need to pass both the set and the list:

def has_two_sum(n, num_set):

    res = any(((n-x) in num_set) and 2*x != n for x in num_set)

    return res

Not much point in storing a value to immediuately return it:

def has_two_sum(n, num_set):

    return any(((n-x) in num_set) and 2*x != n for x in num_set)

Boolean values are converted to 0 or 1 when used in an integer setting, so your adding of values is typically written as:
```
two_sum = sum(has_two_sum(n, num_set) for n in range(-10000, 10001))
```

If you are going to have a function that contains a single liner, you better give it a great, descriptive name. I don't think that has_two_sum really fits the bill. Perhaps has_two_items_that_sum_to_n or has_two_that_sum_to_n? Alternatively, I think my preferred option here would be to get rid of the function altogether and let code speak by itself, writing the whole thing as two nested comprehensions:
```
two_sum = sum(any(n - x in num_set and 2*x != n for x in num_set)

              for n in range(-10000, 10001))
```
There is a school of thought that considers nested comprehensions to be confusing (e.g. the Google Python style guide doesn't allow them), but I think this case is simple enough, YMMV.

answered Jul 20 '16 at 15:02

Jaime

5,8421327

answered Jul 20 '16 at 15:02

Jaime

5,8421327

answered Jul 20 '16 at 15:02

Jaime

5,8421327

answered Jul 20 '16 at 15:02

Jaime

5,8421327

3

If you already got rid of the need to keep num as a list, a set comprehension should build it faster than the detour over the list.
– Graipher
Jul 20 '16 at 15:30

Hi, I get much worse timings. I forgot to say that my integers are in the size of 11 digits. I also use python 3. Maybe this makes a difference
– David Michael Gang
Jul 21 '16 at 6:37

1

Are you on a 32 bit system? If Python has to use its long, unlimited precision integers, then yes, things will get substantially slower. 11 digits is more than 2**31 - 1, but 2**63 - 1 is 18 digits, so on a 64 bit build of Python this should be a non issue. Python 2 or 3 I think should have no major effect on this, although my timings were on Python 2.
– Jaime
Jul 21 '16 at 10:01

add a comment |

3

If you already got rid of the need to keep num as a list, a set comprehension should build it faster than the detour over the list.
– Graipher
Jul 20 '16 at 15:30

Hi, I get much worse timings. I forgot to say that my integers are in the size of 11 digits. I also use python 3. Maybe this makes a difference
– David Michael Gang
Jul 21 '16 at 6:37

1

Are you on a 32 bit system? If Python has to use its long, unlimited precision integers, then yes, things will get substantially slower. 11 digits is more than 2**31 - 1, but 2**63 - 1 is 18 digits, so on a 64 bit build of Python this should be a non issue. Python 2 or 3 I think should have no major effect on this, although my timings were on Python 2.
– Jaime
Jul 21 '16 at 10:01

If you already got rid of the need to keep num as a list, a set comprehension should build it faster than the detour over the list.
– Graipher
Jul 20 '16 at 15:30

Hi, I get much worse timings. I forgot to say that my integers are in the size of 11 digits. I also use python 3. Maybe this makes a difference
– David Michael Gang
Jul 21 '16 at 6:37

Are you on a 32 bit system? If Python has to use its long, unlimited precision integers, then yes, things will get substantially slower. 11 digits is more than 2**31 - 1, but 2**63 - 1 is 18 digits, so on a 64 bit build of Python this should be a non issue. Python 2 or 3 I think should have no major effect on this, although my timings were on Python 2.
– Jaime
Jul 21 '16 at 10:01

add a comment |

To get a grasp of what to speedup I ran a profiler on your code.
To get the 1 million integers I made a function to generate them, with the following:

def generate_random_numbers(amount=1000000):

    import random

    n = [str(int(random.normalvariate(0, amount))) for _ in range(amount)]

    with open('data', 'w') as f:

        f.write('n'.join(n))

Yes I did put the sigma as one million too.
I then copied your code into a function, removed the file part, and ran the profiler over it.
Resulting in the following code:

import cProfile



def read_nums():

    with open('data', 'r') as fp:

        return [int(line) for line in fp]



def has_two_sum(n,num_set,nums):

    res = any(((n-x) in num_set) and 2*x !=n for x in nums)

    return res



def two_sum(nums):

    nums = [int(line) for line in nums]

    num_set = set(nums)

    return sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))



nums = read_nums()

cProfile.run('two_sum({!r})'.format(nums))

Which resulted in the following profile of your code:

         166366 function calls in 6.966 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.362    0.362    0.900    0.900 2sum.py:11(two_sum)

        1    0.395    0.395    0.395    0.395 2sum.py:12(<listcomp>)

    20002    0.015    0.000    0.139    0.000 2sum.py:14(<genexpr>)

    20001    0.025    0.000    0.124    0.000 2sum.py:7(has_two_sum)

   106356    0.052    0.000    0.052    0.000 2sum.py:8(<genexpr>)

        1    0.216    0.216    1.116    1.116 <string>:1(<module>)

    20001    0.047    0.000    0.095    0.000 {built-in method builtins.any}

        1    5.851    5.851    6.966    6.966 {built-in method builtins.exec}

        1    0.004    0.004    0.143    0.143 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Which is not 25 minuets for 6200 of the 20000.
Instead it's a lot less. It says it took 6.966, but cumtime of two_sums is 0.900.
I then used a timer on it, which says it runs in ~0.899s.
Using:

print(timeit('two_sum({!r})'.format(nums), 'from __main__ import two_sum', number=100)/100)

I'd remove un-needed brackets

Use a set compression.

Remove odd whitespace. ((1 for ...)

def two_sum(nums):

    nums = {int(n) for n in nums}

    return sum(

        1

        for n in range(-10000, 10001)

        if any(n - x in nums and 2 * x != n for x in nums)

    )

answered Jul 20 '16 at 15:17

Peilonrayz

25.2k337106

Since you profiled the code, is there an advantage in using x + x != n over 2 * x != n?
– Mathias Ettinger
Jul 20 '16 at 15:34

@MathiasEttinger I ran the profile code on the original twice and it had a 10% difference (0.896 - 0.784) in speed, when I changed it to use + it went to 0.804. Honestly I don't know if it's any good...
– Peilonrayz
Jul 20 '16 at 15:45

Ok, nice to know. It was just out of curiosity since the use case was pretty close to the toy example in this doc.
– Mathias Ettinger
Jul 20 '16 at 15:56

add a comment |

To get a grasp of what to speedup I ran a profiler on your code.
To get the 1 million integers I made a function to generate them, with the following:

def generate_random_numbers(amount=1000000):

    import random

    n = [str(int(random.normalvariate(0, amount))) for _ in range(amount)]

    with open('data', 'w') as f:

        f.write('n'.join(n))

Yes I did put the sigma as one million too.
I then copied your code into a function, removed the file part, and ran the profiler over it.
Resulting in the following code:

import cProfile



def read_nums():

    with open('data', 'r') as fp:

        return [int(line) for line in fp]



def has_two_sum(n,num_set,nums):

    res = any(((n-x) in num_set) and 2*x !=n for x in nums)

    return res



def two_sum(nums):

    nums = [int(line) for line in nums]

    num_set = set(nums)

    return sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))



nums = read_nums()

cProfile.run('two_sum({!r})'.format(nums))

Which resulted in the following profile of your code:

         166366 function calls in 6.966 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.362    0.362    0.900    0.900 2sum.py:11(two_sum)

        1    0.395    0.395    0.395    0.395 2sum.py:12(<listcomp>)

    20002    0.015    0.000    0.139    0.000 2sum.py:14(<genexpr>)

    20001    0.025    0.000    0.124    0.000 2sum.py:7(has_two_sum)

   106356    0.052    0.000    0.052    0.000 2sum.py:8(<genexpr>)

        1    0.216    0.216    1.116    1.116 <string>:1(<module>)

    20001    0.047    0.000    0.095    0.000 {built-in method builtins.any}

        1    5.851    5.851    6.966    6.966 {built-in method builtins.exec}

        1    0.004    0.004    0.143    0.143 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Which is not 25 minuets for 6200 of the 20000.
Instead it's a lot less. It says it took 6.966, but cumtime of two_sums is 0.900.
I then used a timer on it, which says it runs in ~0.899s.
Using:

print(timeit('two_sum({!r})'.format(nums), 'from __main__ import two_sum', number=100)/100)

I'd remove un-needed brackets

Use a set compression.

Remove odd whitespace. ((1 for ...)

def two_sum(nums):

    nums = {int(n) for n in nums}

    return sum(

        1

        for n in range(-10000, 10001)

        if any(n - x in nums and 2 * x != n for x in nums)

    )

answered Jul 20 '16 at 15:17

Peilonrayz

25.2k337106

Since you profiled the code, is there an advantage in using x + x != n over 2 * x != n?
– Mathias Ettinger
Jul 20 '16 at 15:34

@MathiasEttinger I ran the profile code on the original twice and it had a 10% difference (0.896 - 0.784) in speed, when I changed it to use + it went to 0.804. Honestly I don't know if it's any good...
– Peilonrayz
Jul 20 '16 at 15:45

Ok, nice to know. It was just out of curiosity since the use case was pretty close to the toy example in this doc.
– Mathias Ettinger
Jul 20 '16 at 15:56

add a comment |

To get a grasp of what to speedup I ran a profiler on your code.
To get the 1 million integers I made a function to generate them, with the following:

def generate_random_numbers(amount=1000000):

    import random

    n = [str(int(random.normalvariate(0, amount))) for _ in range(amount)]

    with open('data', 'w') as f:

        f.write('n'.join(n))

Yes I did put the sigma as one million too.
I then copied your code into a function, removed the file part, and ran the profiler over it.
Resulting in the following code:

import cProfile



def read_nums():

    with open('data', 'r') as fp:

        return [int(line) for line in fp]



def has_two_sum(n,num_set,nums):

    res = any(((n-x) in num_set) and 2*x !=n for x in nums)

    return res



def two_sum(nums):

    nums = [int(line) for line in nums]

    num_set = set(nums)

    return sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))



nums = read_nums()

cProfile.run('two_sum({!r})'.format(nums))

Which resulted in the following profile of your code:

         166366 function calls in 6.966 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.362    0.362    0.900    0.900 2sum.py:11(two_sum)

        1    0.395    0.395    0.395    0.395 2sum.py:12(<listcomp>)

    20002    0.015    0.000    0.139    0.000 2sum.py:14(<genexpr>)

    20001    0.025    0.000    0.124    0.000 2sum.py:7(has_two_sum)

   106356    0.052    0.000    0.052    0.000 2sum.py:8(<genexpr>)

        1    0.216    0.216    1.116    1.116 <string>:1(<module>)

    20001    0.047    0.000    0.095    0.000 {built-in method builtins.any}

        1    5.851    5.851    6.966    6.966 {built-in method builtins.exec}

        1    0.004    0.004    0.143    0.143 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Which is not 25 minuets for 6200 of the 20000.
Instead it's a lot less. It says it took 6.966, but cumtime of two_sums is 0.900.
I then used a timer on it, which says it runs in ~0.899s.
Using:

print(timeit('two_sum({!r})'.format(nums), 'from __main__ import two_sum', number=100)/100)

I'd remove un-needed brackets

Use a set compression.

Remove odd whitespace. ((1 for ...)

def two_sum(nums):

    nums = {int(n) for n in nums}

    return sum(

        1

        for n in range(-10000, 10001)

        if any(n - x in nums and 2 * x != n for x in nums)

    )

answered Jul 20 '16 at 15:17

Peilonrayz

25.2k337106

To get a grasp of what to speedup I ran a profiler on your code.
To get the 1 million integers I made a function to generate them, with the following:

def generate_random_numbers(amount=1000000):

    import random

    n = [str(int(random.normalvariate(0, amount))) for _ in range(amount)]

    with open('data', 'w') as f:

        f.write('n'.join(n))

Yes I did put the sigma as one million too.
I then copied your code into a function, removed the file part, and ran the profiler over it.
Resulting in the following code:

import cProfile



def read_nums():

    with open('data', 'r') as fp:

        return [int(line) for line in fp]



def has_two_sum(n,num_set,nums):

    res = any(((n-x) in num_set) and 2*x !=n for x in nums)

    return res



def two_sum(nums):

    nums = [int(line) for line in nums]

    num_set = set(nums)

    return sum(1  for n in range(-10000,10001) if has_two_sum(n,num_set,nums))



nums = read_nums()

cProfile.run('two_sum({!r})'.format(nums))

Which resulted in the following profile of your code:

         166366 function calls in 6.966 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.362    0.362    0.900    0.900 2sum.py:11(two_sum)

        1    0.395    0.395    0.395    0.395 2sum.py:12(<listcomp>)

    20002    0.015    0.000    0.139    0.000 2sum.py:14(<genexpr>)

    20001    0.025    0.000    0.124    0.000 2sum.py:7(has_two_sum)

   106356    0.052    0.000    0.052    0.000 2sum.py:8(<genexpr>)

        1    0.216    0.216    1.116    1.116 <string>:1(<module>)

    20001    0.047    0.000    0.095    0.000 {built-in method builtins.any}

        1    5.851    5.851    6.966    6.966 {built-in method builtins.exec}

        1    0.004    0.004    0.143    0.143 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Which is not 25 minuets for 6200 of the 20000.
Instead it's a lot less. It says it took 6.966, but cumtime of two_sums is 0.900.
I then used a timer on it, which says it runs in ~0.899s.
Using:

print(timeit('two_sum({!r})'.format(nums), 'from __main__ import two_sum', number=100)/100)

I'd remove un-needed brackets

Use a set compression.

Remove odd whitespace. ((1 for ...)

def two_sum(nums):

    nums = {int(n) for n in nums}

    return sum(

        1

        for n in range(-10000, 10001)

        if any(n - x in nums and 2 * x != n for x in nums)

    )

answered Jul 20 '16 at 15:17

Peilonrayz

25.2k337106

answered Jul 20 '16 at 15:17

Peilonrayz

25.2k337106

answered Jul 20 '16 at 15:17

Peilonrayz

25.2k337106

answered Jul 20 '16 at 15:17

Peilonrayz

25.2k337106

Since you profiled the code, is there an advantage in using x + x != n over 2 * x != n?
– Mathias Ettinger
Jul 20 '16 at 15:34

@MathiasEttinger I ran the profile code on the original twice and it had a 10% difference (0.896 - 0.784) in speed, when I changed it to use + it went to 0.804. Honestly I don't know if it's any good...
– Peilonrayz
Jul 20 '16 at 15:45

Ok, nice to know. It was just out of curiosity since the use case was pretty close to the toy example in this doc.
– Mathias Ettinger
Jul 20 '16 at 15:56

add a comment |

Since you profiled the code, is there an advantage in using x + x != n over 2 * x != n?
– Mathias Ettinger
Jul 20 '16 at 15:34

@MathiasEttinger I ran the profile code on the original twice and it had a 10% difference (0.896 - 0.784) in speed, when I changed it to use + it went to 0.804. Honestly I don't know if it's any good...
– Peilonrayz
Jul 20 '16 at 15:45

Ok, nice to know. It was just out of curiosity since the use case was pretty close to the toy example in this doc.
– Mathias Ettinger
Jul 20 '16 at 15:56

Since you profiled the code, is there an advantage in using x + x != n over 2 * x != n?
– Mathias Ettinger
Jul 20 '16 at 15:34

@MathiasEttinger I ran the profile code on the original twice and it had a 10% difference (0.896 - 0.784) in speed, when I changed it to use + it went to 0.804. Honestly I don't know if it's any good...
– Peilonrayz
Jul 20 '16 at 15:45

Ok, nice to know. It was just out of curiosity since the use case was pretty close to the toy example in this doc.
– Mathias Ettinger
Jul 20 '16 at 15:56

add a comment |

3

3

… then I would expect it to be able to form a target sum of 6. Your program reports that 6 cannot be formed.

answered Jul 20 '16 at 16:43

200_success

128k15150412

add a comment |

3

3

… then I would expect it to be able to form a target sum of 6. Your program reports that 6 cannot be formed.

answered Jul 20 '16 at 16:43

200_success

128k15150412

add a comment |

3

3

… then I would expect it to be able to form a target sum of 6. Your program reports that 6 cannot be formed.

answered Jul 20 '16 at 16:43

200_success

128k15150412

3

3

… then I would expect it to be able to form a target sum of 6. Your program reports that 6 cannot be formed.

answered Jul 20 '16 at 16:43

200_success

128k15150412

answered Jul 20 '16 at 16:43

200_success

128k15150412

answered Jul 20 '16 at 16:43

200_success

128k15150412

answered Jul 20 '16 at 16:43

200_success

128k15150412

add a comment |

[EDIT] Integers in Python 3 have arbitrary precision by default (e.g. see this article for details of implementation). In short, they can be represented as

i = x*2^(30*0) + y*2^(30*1) + z*2^(30*2)...

edited Dec 19 at 8:19

answered Dec 18 at 11:14

Sergei Zaitseff

New contributor

add a comment |

[EDIT] Integers in Python 3 have arbitrary precision by default (e.g. see this article for details of implementation). In short, they can be represented as

i = x*2^(30*0) + y*2^(30*1) + z*2^(30*2)...

edited Dec 19 at 8:19

answered Dec 18 at 11:14

Sergei Zaitseff

New contributor

add a comment |

[EDIT] Integers in Python 3 have arbitrary precision by default (e.g. see this article for details of implementation). In short, they can be represented as

i = x*2^(30*0) + y*2^(30*1) + z*2^(30*2)...

edited Dec 19 at 8:19

answered Dec 18 at 11:14

Sergei Zaitseff

New contributor

[EDIT] Integers in Python 3 have arbitrary precision by default (e.g. see this article for details of implementation). In short, they can be represented as

i = x*2^(30*0) + y*2^(30*1) + z*2^(30*2)...

edited Dec 19 at 8:19

answered Dec 18 at 11:14

Sergei Zaitseff

New contributor

edited Dec 19 at 8:19

answered Dec 18 at 11:14

Sergei Zaitseff

New contributor

answered Dec 18 at 11:14

Sergei Zaitseff

answered Dec 18 at 11:14

Sergei Zaitseff

New contributor

Sergei Zaitseff is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Gfrktyl