Data transformation involving dictionaries and lists
My initial problem is to transform a List
of Mapping
s whose values are List
s of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").
Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product
.
Input:
[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]
Output:
[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]
To make the matter more confusing, the keys of each dictionaries are Tuple
s of strings.
Here is a possible implementation, using a class
to isolate the whole mess.
@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""
labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})
if len(tmp) == 0:
return [{}]
else:
return tmp
Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?
python python-3.x
add a comment |
My initial problem is to transform a List
of Mapping
s whose values are List
s of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").
Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product
.
Input:
[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]
Output:
[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]
To make the matter more confusing, the keys of each dictionaries are Tuple
s of strings.
Here is a possible implementation, using a class
to isolate the whole mess.
@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""
labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})
if len(tmp) == 0:
return [{}]
else:
return tmp
Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?
python python-3.x
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Mathias Ettinger
yesterday
add a comment |
My initial problem is to transform a List
of Mapping
s whose values are List
s of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").
Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product
.
Input:
[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]
Output:
[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]
To make the matter more confusing, the keys of each dictionaries are Tuple
s of strings.
Here is a possible implementation, using a class
to isolate the whole mess.
@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""
labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})
if len(tmp) == 0:
return [{}]
else:
return tmp
Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?
python python-3.x
My initial problem is to transform a List
of Mapping
s whose values are List
s of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").
Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product
.
Input:
[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]
Output:
[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]
To make the matter more confusing, the keys of each dictionaries are Tuple
s of strings.
Here is a possible implementation, using a class
to isolate the whole mess.
@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""
labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})
if len(tmp) == 0:
return [{}]
else:
return tmp
Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?
python python-3.x
python python-3.x
edited yesterday
Mathias Ettinger
23.9k33182
23.9k33182
asked 2 days ago
Cedric H.Cedric H.
1515
1515
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Mathias Ettinger
yesterday
add a comment |
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Mathias Ettinger
yesterday
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Mathias Ettinger
yesterday
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Mathias Ettinger
yesterday
add a comment |
1 Answer
1
active
oldest
votes
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
Edit (2019-01-09 - 1530):
- The definitions of
_labels
andself.labels
in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.
New contributor
1
Nice answer. Two thoughts though: isn'tlabels
simplyflatten(mappings)
? And isn'tzip
already returningtuple
s? Why keep themap
call?
– Mathias Ettinger
yesterday
Also I like to defineflatten
in terms ofitertools.chain.from_iterable
. But it's mostly a matter of personal taste.
– Mathias Ettinger
yesterday
Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday
2
@CedricH. I usually defineflatten = itertools.chain.from_iterable
as I prefer to work with iterables, but as an alternative to theflatten
proposed by this answer, you could havedef flatten(iterable): return list(itertools.chain.from_iterable(iterable))
instead. As said, there might not be much of a difference appart for style.
– Mathias Ettinger
yesterday
@MathiasEttinger Got it. I also changed the definition oflabels
based on your first comment.
– Cedric H.
yesterday
|
show 3 more comments
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211121%2fdata-transformation-involving-dictionaries-and-lists%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
Edit (2019-01-09 - 1530):
- The definitions of
_labels
andself.labels
in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.
New contributor
1
Nice answer. Two thoughts though: isn'tlabels
simplyflatten(mappings)
? And isn'tzip
already returningtuple
s? Why keep themap
call?
– Mathias Ettinger
yesterday
Also I like to defineflatten
in terms ofitertools.chain.from_iterable
. But it's mostly a matter of personal taste.
– Mathias Ettinger
yesterday
Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday
2
@CedricH. I usually defineflatten = itertools.chain.from_iterable
as I prefer to work with iterables, but as an alternative to theflatten
proposed by this answer, you could havedef flatten(iterable): return list(itertools.chain.from_iterable(iterable))
instead. As said, there might not be much of a difference appart for style.
– Mathias Ettinger
yesterday
@MathiasEttinger Got it. I also changed the definition oflabels
based on your first comment.
– Cedric H.
yesterday
|
show 3 more comments
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
Edit (2019-01-09 - 1530):
- The definitions of
_labels
andself.labels
in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.
New contributor
1
Nice answer. Two thoughts though: isn'tlabels
simplyflatten(mappings)
? And isn'tzip
already returningtuple
s? Why keep themap
call?
– Mathias Ettinger
yesterday
Also I like to defineflatten
in terms ofitertools.chain.from_iterable
. But it's mostly a matter of personal taste.
– Mathias Ettinger
yesterday
Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday
2
@CedricH. I usually defineflatten = itertools.chain.from_iterable
as I prefer to work with iterables, but as an alternative to theflatten
proposed by this answer, you could havedef flatten(iterable): return list(itertools.chain.from_iterable(iterable))
instead. As said, there might not be much of a difference appart for style.
– Mathias Ettinger
yesterday
@MathiasEttinger Got it. I also changed the definition oflabels
based on your first comment.
– Cedric H.
yesterday
|
show 3 more comments
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
Edit (2019-01-09 - 1530):
- The definitions of
_labels
andself.labels
in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.
New contributor
TL;DR: Scroll to the end for the provided code with suggested improvements
Suggestions
1. Standalone cartesian_product
helper function outside scope of combinations
A preferable solution would be to simply use itertools.product
as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...
If you still don't want to use itertools.product
:
While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product
a nested function inside combinations
serves only to confuse the purpose of combinations
. It is better to define cartesian_product
above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product
’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations
, readers are already familiar with cartesian_product
, and their train of thought isn’t derailed by trying to understand a nested function.
2. flatten
helper function
This helper function should be used thusly:
for term in cartesian_product(*pools):
results.append(flatten(term))
This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten
call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.
3. Consolidate some code
If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally
results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])
should now be
results =
for term in itertools.product(*pools):
results.append(flatten(term))
This block can be expressed clearly and concisely with the following list comprehension:
results = [flatten(term) for term in itertools.product(*pools)]
Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results
, which is a list containing the flatten
ed outputs of product
.
4. Check for empty mappings
at top of combinations
Instead of the if/else at the end of combinations
that checks for len(tmp) == 0
, make the first two lines of combinations
the following:
if self.mappings == [{}]:
return [{}]
The else
clause at the bottom of combinations
is no longer necessary, and you can simply return tmp
. This is cleaner because it handles the case where mappings
is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp)
was evaluated at the end of combinations
. This also allows readers to make the assumption that mappings
is non-empty when they get into the actual work being done by combinations
, which is just one less thing to worry about. Alternatively, a more concise option would be to replace
if len(tmp) == 0:
return [{}]
else:
return tmp
with
return tmp or [{}]
This works because if mappings
is empty, the value of tmp
will be an empty list, which will evaluate to False, returning the value following the or
. This more concise version may come at the cost of decreased readability.
5. Define labels
and pools
using helper methods or non-init fields
Remove the first two code lines from combinations
, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations
and to clean up the code overall. These both have the added benefit of exposing labels
and pools
as additional attributes of the dataclass
that can be accessed outside of combinations
.
See the ParametricMapping1
class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2
class defined below it for an implementation using non-init fields.
TL;DR
In the end, if all suggestions are followed, the code should include the below declaration of flatten
, along with one of the two following blocks (ParametricMapping1
, or ParametricMapping2
):
def flatten(l):
return [item for sublist in l for item in sublist]
With either ...
@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)
def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]
pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]
Or ...
@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)
def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]
@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]
Edit (2019-01-09 - 1530):
- The definitions of
_labels
andself.labels
in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.
New contributor
edited yesterday
New contributor
answered 2 days ago
Hunter McGushionHunter McGushion
1065
1065
New contributor
New contributor
1
Nice answer. Two thoughts though: isn'tlabels
simplyflatten(mappings)
? And isn'tzip
already returningtuple
s? Why keep themap
call?
– Mathias Ettinger
yesterday
Also I like to defineflatten
in terms ofitertools.chain.from_iterable
. But it's mostly a matter of personal taste.
– Mathias Ettinger
yesterday
Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday
2
@CedricH. I usually defineflatten = itertools.chain.from_iterable
as I prefer to work with iterables, but as an alternative to theflatten
proposed by this answer, you could havedef flatten(iterable): return list(itertools.chain.from_iterable(iterable))
instead. As said, there might not be much of a difference appart for style.
– Mathias Ettinger
yesterday
@MathiasEttinger Got it. I also changed the definition oflabels
based on your first comment.
– Cedric H.
yesterday
|
show 3 more comments
1
Nice answer. Two thoughts though: isn'tlabels
simplyflatten(mappings)
? And isn'tzip
already returningtuple
s? Why keep themap
call?
– Mathias Ettinger
yesterday
Also I like to defineflatten
in terms ofitertools.chain.from_iterable
. But it's mostly a matter of personal taste.
– Mathias Ettinger
yesterday
Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday
2
@CedricH. I usually defineflatten = itertools.chain.from_iterable
as I prefer to work with iterables, but as an alternative to theflatten
proposed by this answer, you could havedef flatten(iterable): return list(itertools.chain.from_iterable(iterable))
instead. As said, there might not be much of a difference appart for style.
– Mathias Ettinger
yesterday
@MathiasEttinger Got it. I also changed the definition oflabels
based on your first comment.
– Cedric H.
yesterday
1
1
Nice answer. Two thoughts though: isn't
labels
simply flatten(mappings)
? And isn't zip
already returning tuple
s? Why keep the map
call?– Mathias Ettinger
yesterday
Nice answer. Two thoughts though: isn't
labels
simply flatten(mappings)
? And isn't zip
already returning tuple
s? Why keep the map
call?– Mathias Ettinger
yesterday
Also I like to define
flatten
in terms of itertools.chain.from_iterable
. But it's mostly a matter of personal taste.– Mathias Ettinger
yesterday
Also I like to define
flatten
in terms of itertools.chain.from_iterable
. But it's mostly a matter of personal taste.– Mathias Ettinger
yesterday
Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday
Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday
2
2
@CedricH. I usually define
flatten = itertools.chain.from_iterable
as I prefer to work with iterables, but as an alternative to the flatten
proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable))
instead. As said, there might not be much of a difference appart for style.– Mathias Ettinger
yesterday
@CedricH. I usually define
flatten = itertools.chain.from_iterable
as I prefer to work with iterables, but as an alternative to the flatten
proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable))
instead. As said, there might not be much of a difference appart for style.– Mathias Ettinger
yesterday
@MathiasEttinger Got it. I also changed the definition of
labels
based on your first comment.– Cedric H.
yesterday
@MathiasEttinger Got it. I also changed the definition of
labels
based on your first comment.– Cedric H.
yesterday
|
show 3 more comments
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211121%2fdata-transformation-involving-dictionaries-and-lists%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Mathias Ettinger
yesterday