Data transformation involving dictionaries and lists












4














My initial problem is to transform a List of Mappings whose values are Lists of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").



Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product.



Input:



[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]


Output:



[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]


To make the matter more confusing, the keys of each dictionaries are Tuples of strings.



Here is a possible implementation, using a class to isolate the whole mess.



@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""

mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""

labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result

results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])

tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})

if len(tmp) == 0:
return [{}]
else:
return tmp


Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?










share|improve this question
























  • Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Mathias Ettinger
    yesterday
















4














My initial problem is to transform a List of Mappings whose values are Lists of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").



Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product.



Input:



[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]


Output:



[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]


To make the matter more confusing, the keys of each dictionaries are Tuples of strings.



Here is a possible implementation, using a class to isolate the whole mess.



@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""

mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""

labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result

results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])

tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})

if len(tmp) == 0:
return [{}]
else:
return tmp


Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?










share|improve this question
























  • Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Mathias Ettinger
    yesterday














4












4








4







My initial problem is to transform a List of Mappings whose values are Lists of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").



Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product.



Input:



[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]


Output:



[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]


To make the matter more confusing, the keys of each dictionaries are Tuples of strings.



Here is a possible implementation, using a class to isolate the whole mess.



@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""

mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""

labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result

results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])

tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})

if len(tmp) == 0:
return [{}]
else:
return tmp


Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?










share|improve this question















My initial problem is to transform a List of Mappings whose values are Lists of values. The transformed list must contain the Cartesian product of all lists (those that are values in the dictionaries) that belong to separate dictionaries (in other words, the values of the lists present in the same directories are "coupled").



Basically, if you disregard the keys of the dictionaries, this would be solved simply with itertools.product.



Input:



[
{
('B3G', 'B1'): [1.0, 2.0],
('B1G', 'B1'): [11.0, 12.0]
},
{
('B2G', 'B1'): [1.5, 2.5, 3.5]
}
]


Output:



[
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 1.0, ('B1G', 'B1'): 11.0, ('B2G', 'B1'): 3.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 1.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 2.5},
{('B3G', 'B1'): 2.0, ('B1G', 'B1'): 12.0, ('B2G', 'B1'): 3.5}
]


To make the matter more confusing, the keys of each dictionaries are Tuples of strings.



Here is a possible implementation, using a class to isolate the whole mess.



@dataclass
class ParametricMapping:
"""Abstraction for multi-dimensional parametric mappings."""

mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
"""Cartesian product adapted to work with dictionaries, roughly similar to `itertools.product`."""

labels = [label for arg in self.mappings for label in tuple(arg.keys())]
pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

def cartesian_product(*args):
"""Cartesian product similar to `itertools.product`"""
result = []
for pool in args:
result = [x + [y] for x in result for y in pool]
return result

results =
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])

tmp =
for r in results:
tmp.append({k: v for k, v in zip(labels, r)})

if len(tmp) == 0:
return [{}]
else:
return tmp


Question: How can I improve this to make it cleaner (priority #1) and faster (#2)?







python python-3.x






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited yesterday









Mathias Ettinger

23.9k33182




23.9k33182










asked 2 days ago









Cedric H.Cedric H.

1515




1515












  • Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Mathias Ettinger
    yesterday


















  • Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Mathias Ettinger
    yesterday
















Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Mathias Ettinger
yesterday




Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Mathias Ettinger
yesterday










1 Answer
1






active

oldest

votes


















6














TL;DR: Scroll to the end for the provided code with suggested improvements



Suggestions



1. Standalone cartesian_product helper function outside scope of combinations



A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



If you still don't want to use itertools.product:



While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



2. flatten helper function



This helper function should be used thusly:



for term in cartesian_product(*pools):
results.append(flatten(term))


This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



3. Consolidate some code



If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



results = 
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])


should now be



results = 
for term in itertools.product(*pools):
results.append(flatten(term))


This block can be expressed clearly and concisely with the following list comprehension:



results = [flatten(term) for term in itertools.product(*pools)]


Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



4. Check for empty mappings at top of combinations



Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



if self.mappings == [{}]:
return [{}]


The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



if len(tmp) == 0:
return [{}]
else:
return tmp


with



return tmp or [{}]


This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



5. Define labels and pools using helper methods or non-init fields



Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



TL;DR



In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



def flatten(l):
return [item for sublist in l for item in sublist]


With either ...



@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)

def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]

pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]


Or ...



@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)

def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]


Edit (2019-01-09 - 1530):




  • The definitions of _labels and self.labels in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.






share|improve this answer










New contributor




Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.














  • 1




    Nice answer. Two thoughts though: isn't labels simply flatten(mappings)? And isn't zip already returning tuples? Why keep the map call?
    – Mathias Ettinger
    yesterday










  • Also I like to define flatten in terms of itertools.chain.from_iterable. But it's mostly a matter of personal taste.
    – Mathias Ettinger
    yesterday










  • Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
    – Cedric H.
    yesterday






  • 2




    @CedricH. I usually define flatten = itertools.chain.from_iterable as I prefer to work with iterables, but as an alternative to the flatten proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable)) instead. As said, there might not be much of a difference appart for style.
    – Mathias Ettinger
    yesterday












  • @MathiasEttinger Got it. I also changed the definition of labels based on your first comment.
    – Cedric H.
    yesterday











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211121%2fdata-transformation-involving-dictionaries-and-lists%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









6














TL;DR: Scroll to the end for the provided code with suggested improvements



Suggestions



1. Standalone cartesian_product helper function outside scope of combinations



A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



If you still don't want to use itertools.product:



While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



2. flatten helper function



This helper function should be used thusly:



for term in cartesian_product(*pools):
results.append(flatten(term))


This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



3. Consolidate some code



If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



results = 
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])


should now be



results = 
for term in itertools.product(*pools):
results.append(flatten(term))


This block can be expressed clearly and concisely with the following list comprehension:



results = [flatten(term) for term in itertools.product(*pools)]


Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



4. Check for empty mappings at top of combinations



Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



if self.mappings == [{}]:
return [{}]


The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



if len(tmp) == 0:
return [{}]
else:
return tmp


with



return tmp or [{}]


This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



5. Define labels and pools using helper methods or non-init fields



Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



TL;DR



In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



def flatten(l):
return [item for sublist in l for item in sublist]


With either ...



@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)

def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]

pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]


Or ...



@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)

def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]


Edit (2019-01-09 - 1530):




  • The definitions of _labels and self.labels in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.






share|improve this answer










New contributor




Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.














  • 1




    Nice answer. Two thoughts though: isn't labels simply flatten(mappings)? And isn't zip already returning tuples? Why keep the map call?
    – Mathias Ettinger
    yesterday










  • Also I like to define flatten in terms of itertools.chain.from_iterable. But it's mostly a matter of personal taste.
    – Mathias Ettinger
    yesterday










  • Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
    – Cedric H.
    yesterday






  • 2




    @CedricH. I usually define flatten = itertools.chain.from_iterable as I prefer to work with iterables, but as an alternative to the flatten proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable)) instead. As said, there might not be much of a difference appart for style.
    – Mathias Ettinger
    yesterday












  • @MathiasEttinger Got it. I also changed the definition of labels based on your first comment.
    – Cedric H.
    yesterday
















6














TL;DR: Scroll to the end for the provided code with suggested improvements



Suggestions



1. Standalone cartesian_product helper function outside scope of combinations



A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



If you still don't want to use itertools.product:



While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



2. flatten helper function



This helper function should be used thusly:



for term in cartesian_product(*pools):
results.append(flatten(term))


This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



3. Consolidate some code



If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



results = 
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])


should now be



results = 
for term in itertools.product(*pools):
results.append(flatten(term))


This block can be expressed clearly and concisely with the following list comprehension:



results = [flatten(term) for term in itertools.product(*pools)]


Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



4. Check for empty mappings at top of combinations



Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



if self.mappings == [{}]:
return [{}]


The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



if len(tmp) == 0:
return [{}]
else:
return tmp


with



return tmp or [{}]


This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



5. Define labels and pools using helper methods or non-init fields



Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



TL;DR



In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



def flatten(l):
return [item for sublist in l for item in sublist]


With either ...



@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)

def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]

pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]


Or ...



@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)

def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]


Edit (2019-01-09 - 1530):




  • The definitions of _labels and self.labels in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.






share|improve this answer










New contributor




Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.














  • 1




    Nice answer. Two thoughts though: isn't labels simply flatten(mappings)? And isn't zip already returning tuples? Why keep the map call?
    – Mathias Ettinger
    yesterday










  • Also I like to define flatten in terms of itertools.chain.from_iterable. But it's mostly a matter of personal taste.
    – Mathias Ettinger
    yesterday










  • Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
    – Cedric H.
    yesterday






  • 2




    @CedricH. I usually define flatten = itertools.chain.from_iterable as I prefer to work with iterables, but as an alternative to the flatten proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable)) instead. As said, there might not be much of a difference appart for style.
    – Mathias Ettinger
    yesterday












  • @MathiasEttinger Got it. I also changed the definition of labels based on your first comment.
    – Cedric H.
    yesterday














6












6








6






TL;DR: Scroll to the end for the provided code with suggested improvements



Suggestions



1. Standalone cartesian_product helper function outside scope of combinations



A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



If you still don't want to use itertools.product:



While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



2. flatten helper function



This helper function should be used thusly:



for term in cartesian_product(*pools):
results.append(flatten(term))


This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



3. Consolidate some code



If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



results = 
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])


should now be



results = 
for term in itertools.product(*pools):
results.append(flatten(term))


This block can be expressed clearly and concisely with the following list comprehension:



results = [flatten(term) for term in itertools.product(*pools)]


Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



4. Check for empty mappings at top of combinations



Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



if self.mappings == [{}]:
return [{}]


The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



if len(tmp) == 0:
return [{}]
else:
return tmp


with



return tmp or [{}]


This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



5. Define labels and pools using helper methods or non-init fields



Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



TL;DR



In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



def flatten(l):
return [item for sublist in l for item in sublist]


With either ...



@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)

def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]

pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]


Or ...



@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)

def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]


Edit (2019-01-09 - 1530):




  • The definitions of _labels and self.labels in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.






share|improve this answer










New contributor




Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









TL;DR: Scroll to the end for the provided code with suggested improvements



Suggestions



1. Standalone cartesian_product helper function outside scope of combinations



A preferable solution would be to simply use itertools.product as most Python-savvy readers will be familiar with it - and it is well-documented for those that aren’t. However, since you explicitly mention it...



If you still don't want to use itertools.product:



While it does improve clarity to construct a hierarchy of scopes that expose things only when necessary, in this case, I believe that making cartesian_product a nested function inside combinations serves only to confuse the purpose of combinations. It is better to define cartesian_product above, making the code below cleaner and easier to understand. Now someone reading your code will first see cartesian_product’s definition and understand its relatively simple purpose. Then, inside ParametricMapping.combinations, readers are already familiar with cartesian_product, and their train of thought isn’t derailed by trying to understand a nested function.



2. flatten helper function



This helper function should be used thusly:



for term in cartesian_product(*pools):
results.append(flatten(term))


This may seem silly as flattening is a simple operation, but in this example there are a few fairly tricky list/dict comprehensions nearby. Therefore, it may help to replace that piece with a simple flatten call to clean up some confusion and to emphasize that this is a straight-forward operation - A fact that could be lost on some readers of the code in its current state. My point here is that stacking lots of loops and comprehensions on top of each other (especially without documentation/comments) can quickly get messy and confusing.



3. Consolidate some code



If you took both of the above suggestions, a nice opportunity to consolidate some code has presented itself. After the above changes, the section that was originally



results = 
for term in cartesian_product(*pools):
results.append([pp for p in term for pp in p])


should now be



results = 
for term in itertools.product(*pools):
results.append(flatten(term))


This block can be expressed clearly and concisely with the following list comprehension:



results = [flatten(term) for term in itertools.product(*pools)]


Note that because of the separation of functionality into helper functions, the purpose of this list comprehension is abundantly clear. It creates results, which is a list containing the flattened outputs of product.



4. Check for empty mappings at top of combinations



Instead of the if/else at the end of combinations that checks for len(tmp) == 0, make the first two lines of combinations the following:



if self.mappings == [{}]:
return [{}]


The else clause at the bottom of combinations is no longer necessary, and you can simply return tmp. This is cleaner because it handles the case where mappings is empty immediately, which means those cases bypass all the code that was being executed before when len(tmp) was evaluated at the end of combinations. This also allows readers to make the assumption that mappings is non-empty when they get into the actual work being done by combinations, which is just one less thing to worry about. Alternatively, a more concise option would be to replace



if len(tmp) == 0:
return [{}]
else:
return tmp


with



return tmp or [{}]


This works because if mappings is empty, the value of tmp will be an empty list, which will evaluate to False, returning the value following the or. This more concise version may come at the cost of decreased readability.



5. Define labels and pools using helper methods or non-init fields



Remove the first two code lines from combinations, and move them to either helper methods, or additional fields that are processed post-initialization. I recommend doing this to further clarify the work actually being done by combinations and to clean up the code overall. These both have the added benefit of exposing labels and pools as additional attributes of the dataclass that can be accessed outside of combinations.



See the ParametricMapping1 class defined in the section below for an implementation using helper methods, and see the alternative ParametricMapping2 class defined below it for an implementation using non-init fields.



TL;DR



In the end, if all suggestions are followed, the code should include the below declaration of flatten, along with one of the two following blocks (ParametricMapping1, or ParametricMapping2):



def flatten(l):
return [item for sublist in l for item in sublist]


With either ...



@dataclass
class ParametricMapping1:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])

def _labels(self) -> List[Tuple[str]]:
return flatten(self.mappings)

def _pools(self) -> List[List[Sequence[float]]]:
return [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
if self.mappings == [{}]:
return [{}]

pool_values = [flatten(term) for term in itertools.product(*self._pools())]
return [dict(zip(self._labels(), v)) for v in pool_values]


Or ...



@dataclass
class ParametricMapping2:
mappings: List[Mapping[Tuple[str], Sequence[float]]] = field(default_factory=lambda: [{}])
labels: List[Tuple[str]] = field(init=False, repr=False)
pools: List[List[Sequence[float]]] = field(init=False, repr=False)

def __post_init__(self):
self.labels = flatten(self.mappings)
self.pools = [list(map(tuple, zip(*arg.values()))) for arg in self.mappings]

@property
def combinations(self) -> List[Mapping[Tuple[str], float]]:
pool_values = [flatten(term) for term in itertools.product(*self.pools)]
return [dict(zip(self.labels, v)) for v in pool_values] or [{}]


Edit (2019-01-09 - 1530):




  • The definitions of _labels and self.labels in the above two code blocks, respectively, have been simplified per @MathiasEttinger's excellent suggestion. See revision history for their original definitions.







share|improve this answer










New contributor




Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this answer



share|improve this answer








edited yesterday





















New contributor




Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









answered 2 days ago









Hunter McGushionHunter McGushion

1065




1065




New contributor




Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Hunter McGushion is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








  • 1




    Nice answer. Two thoughts though: isn't labels simply flatten(mappings)? And isn't zip already returning tuples? Why keep the map call?
    – Mathias Ettinger
    yesterday










  • Also I like to define flatten in terms of itertools.chain.from_iterable. But it's mostly a matter of personal taste.
    – Mathias Ettinger
    yesterday










  • Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
    – Cedric H.
    yesterday






  • 2




    @CedricH. I usually define flatten = itertools.chain.from_iterable as I prefer to work with iterables, but as an alternative to the flatten proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable)) instead. As said, there might not be much of a difference appart for style.
    – Mathias Ettinger
    yesterday












  • @MathiasEttinger Got it. I also changed the definition of labels based on your first comment.
    – Cedric H.
    yesterday














  • 1




    Nice answer. Two thoughts though: isn't labels simply flatten(mappings)? And isn't zip already returning tuples? Why keep the map call?
    – Mathias Ettinger
    yesterday










  • Also I like to define flatten in terms of itertools.chain.from_iterable. But it's mostly a matter of personal taste.
    – Mathias Ettinger
    yesterday










  • Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
    – Cedric H.
    yesterday






  • 2




    @CedricH. I usually define flatten = itertools.chain.from_iterable as I prefer to work with iterables, but as an alternative to the flatten proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable)) instead. As said, there might not be much of a difference appart for style.
    – Mathias Ettinger
    yesterday












  • @MathiasEttinger Got it. I also changed the definition of labels based on your first comment.
    – Cedric H.
    yesterday








1




1




Nice answer. Two thoughts though: isn't labels simply flatten(mappings)? And isn't zip already returning tuples? Why keep the map call?
– Mathias Ettinger
yesterday




Nice answer. Two thoughts though: isn't labels simply flatten(mappings)? And isn't zip already returning tuples? Why keep the map call?
– Mathias Ettinger
yesterday












Also I like to define flatten in terms of itertools.chain.from_iterable. But it's mostly a matter of personal taste.
– Mathias Ettinger
yesterday




Also I like to define flatten in terms of itertools.chain.from_iterable. But it's mostly a matter of personal taste.
– Mathias Ettinger
yesterday












Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday




Thanks a lot for the answer. I will investigate in details. @MathiasEttinger I didn't get your second point.
– Cedric H.
yesterday




2




2




@CedricH. I usually define flatten = itertools.chain.from_iterable as I prefer to work with iterables, but as an alternative to the flatten proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable)) instead. As said, there might not be much of a difference appart for style.
– Mathias Ettinger
yesterday






@CedricH. I usually define flatten = itertools.chain.from_iterable as I prefer to work with iterables, but as an alternative to the flatten proposed by this answer, you could have def flatten(iterable): return list(itertools.chain.from_iterable(iterable)) instead. As said, there might not be much of a difference appart for style.
– Mathias Ettinger
yesterday














@MathiasEttinger Got it. I also changed the definition of labels based on your first comment.
– Cedric H.
yesterday




@MathiasEttinger Got it. I also changed the definition of labels based on your first comment.
– Cedric H.
yesterday


















draft saved

draft discarded




















































Thanks for contributing an answer to Code Review Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f211121%2fdata-transformation-involving-dictionaries-and-lists%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Сан-Квентин

8-я гвардейская общевойсковая армия

Алькесар