Search for ways in OpenStreetMap extract

up vote
4
down vote

favorite

What does it do ?

Looks in big XML files (1-3 GB) for given parameters that I need in my project, appends them to lists and finally exports both of them to the CSV files.

My XML scheme

I need to get child named 'v', which is value of specific child 'k' nested in tag named 'tag' which is nested within parent 'way'.

enter image description here

What I tried

IBM's take on big xml files

Plus various scripts from StackExchange.

Performance

My function needs 50% less time to parse XML than above method, any other I tried or I was able to try, because with some I was unable to figure things out and given up on them.

My goal

To get some tips on my code, hopefully find way to speed it up.
I need to parse about 20 - 30 GB of XML files (as mentioned before single file is 1-3 GB), whole procedure is really time consuming - It takes about 12 hours up to 2 days of non stop parsing.

Variables description:

xml - path to xml file

list_agis - list of IDs, list which length is number between 5k - 1 kk

parent - way

child - tag

child_atribitute - k

child_value_1 - highway

child_value_2 - track

name_id, name_atribute, name_file, path_csv, part_name - variables needed to create csv file name

My code

def save_to_csv(

        list_1, list_2, name_1, 

        name_2, csv_name, catalogue, 

        part_name):

    """

    Saves to CSV, based on 2 lists.

    """

    raw_data = {name_1: list_1,

                name_2: list_2}

    df = pd.DataFrame(raw_data, columns=[name_1, name_2])

    df.to_csv(

        '{0}{1}_{2}.csv'.format(catalogue, part_name, csv_name), 

        index=False, header=True, encoding = 'CP1250')





def xml_parser(

        xml, lista_agis, parent, 

        atribiute_parent, child, child_atribiute,

        child_value_1, child_value_2, name_file,

        sciezka_csv, name_id, name_atribiute, 

        part_name):

    """

    Function to pick from xml files tag values.

    Firstly it creates tree of xml file and then

    goes each level town and when final condtion is fullfiled

    id and value from xml file is appended to list in the end of

    xml file list is saved to CSV.



    """

    rootElement = ET.parse(xml).getroot()

    list_id = 

    list_value = 

    for subelement in rootElement:

        if subelement.tag == parent:

            if subelement.get(atribiute_parent) in lista_agis:

                for sselement in subelement:

                    if sselement.tag == child:

                        if sselement.attrib[child_atribiute] == child_value_1:

                            list_id.append(

                                subelement.get(atribiute_parent))

                            list_value.append(

                                sselement.get(child_value_2))



    save_to_csv(

        list_id, list_value, name_id, 

        name_atribiute, name_file, 

        sciezka_csv, part_name)

edited Nov 21 at 15:34

Toby Speight

22.6k537109

asked Nov 21 at 14:59

JuniorPythonNewbie

504

1

Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
– Calak
Nov 21 at 15:06

Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
– Toby Speight
Nov 21 at 15:28

2

Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
– JuniorPythonNewbie
Nov 21 at 15:31

2

You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
– AJNeufeld
Nov 21 at 15:50

1

k and v are "attributes", not "children".
– Reinderien
Nov 21 at 16:55

|
show 1 more comment

up vote
4
down vote

favorite

What does it do ?

Looks in big XML files (1-3 GB) for given parameters that I need in my project, appends them to lists and finally exports both of them to the CSV files.

My XML scheme

I need to get child named 'v', which is value of specific child 'k' nested in tag named 'tag' which is nested within parent 'way'.

enter image description here

What I tried

IBM's take on big xml files

Plus various scripts from StackExchange.

Performance

My function needs 50% less time to parse XML than above method, any other I tried or I was able to try, because with some I was unable to figure things out and given up on them.

My goal

Variables description:

xml - path to xml file

list_agis - list of IDs, list which length is number between 5k - 1 kk

parent - way

child - tag

child_atribitute - k

child_value_1 - highway

child_value_2 - track

name_id, name_atribute, name_file, path_csv, part_name - variables needed to create csv file name

My code

def save_to_csv(

        list_1, list_2, name_1, 

        name_2, csv_name, catalogue, 

        part_name):

    """

    Saves to CSV, based on 2 lists.

    """

    raw_data = {name_1: list_1,

                name_2: list_2}

    df = pd.DataFrame(raw_data, columns=[name_1, name_2])

    df.to_csv(

        '{0}{1}_{2}.csv'.format(catalogue, part_name, csv_name), 

        index=False, header=True, encoding = 'CP1250')





def xml_parser(

        xml, lista_agis, parent, 

        atribiute_parent, child, child_atribiute,

        child_value_1, child_value_2, name_file,

        sciezka_csv, name_id, name_atribiute, 

        part_name):

    """

    Function to pick from xml files tag values.

    Firstly it creates tree of xml file and then

    goes each level town and when final condtion is fullfiled

    id and value from xml file is appended to list in the end of

    xml file list is saved to CSV.



    """

    rootElement = ET.parse(xml).getroot()

    list_id = 

    list_value = 

    for subelement in rootElement:

        if subelement.tag == parent:

            if subelement.get(atribiute_parent) in lista_agis:

                for sselement in subelement:

                    if sselement.tag == child:

                        if sselement.attrib[child_atribiute] == child_value_1:

                            list_id.append(

                                subelement.get(atribiute_parent))

                            list_value.append(

                                sselement.get(child_value_2))



    save_to_csv(

        list_id, list_value, name_id, 

        name_atribiute, name_file, 

        sciezka_csv, part_name)

edited Nov 21 at 15:34

Toby Speight

22.6k537109

asked Nov 21 at 14:59

JuniorPythonNewbie

504

1

Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
– Calak
Nov 21 at 15:06

Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
– Toby Speight
Nov 21 at 15:28

2

Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
– JuniorPythonNewbie
Nov 21 at 15:31

2

You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
– AJNeufeld
Nov 21 at 15:50

1

k and v are "attributes", not "children".
– Reinderien
Nov 21 at 16:55

|
show 1 more comment

up vote
4
down vote

favorite

What does it do ?

Looks in big XML files (1-3 GB) for given parameters that I need in my project, appends them to lists and finally exports both of them to the CSV files.

My XML scheme

I need to get child named 'v', which is value of specific child 'k' nested in tag named 'tag' which is nested within parent 'way'.

enter image description here

What I tried

IBM's take on big xml files

Plus various scripts from StackExchange.

Performance

My function needs 50% less time to parse XML than above method, any other I tried or I was able to try, because with some I was unable to figure things out and given up on them.

My goal

Variables description:

xml - path to xml file

list_agis - list of IDs, list which length is number between 5k - 1 kk

parent - way

child - tag

child_atribitute - k

child_value_1 - highway

child_value_2 - track

name_id, name_atribute, name_file, path_csv, part_name - variables needed to create csv file name

My code

def save_to_csv(

        list_1, list_2, name_1, 

        name_2, csv_name, catalogue, 

        part_name):

    """

    Saves to CSV, based on 2 lists.

    """

    raw_data = {name_1: list_1,

                name_2: list_2}

    df = pd.DataFrame(raw_data, columns=[name_1, name_2])

    df.to_csv(

        '{0}{1}_{2}.csv'.format(catalogue, part_name, csv_name), 

        index=False, header=True, encoding = 'CP1250')





def xml_parser(

        xml, lista_agis, parent, 

        atribiute_parent, child, child_atribiute,

        child_value_1, child_value_2, name_file,

        sciezka_csv, name_id, name_atribiute, 

        part_name):

    """

    Function to pick from xml files tag values.

    Firstly it creates tree of xml file and then

    goes each level town and when final condtion is fullfiled

    id and value from xml file is appended to list in the end of

    xml file list is saved to CSV.



    """

    rootElement = ET.parse(xml).getroot()

    list_id = 

    list_value = 

    for subelement in rootElement:

        if subelement.tag == parent:

            if subelement.get(atribiute_parent) in lista_agis:

                for sselement in subelement:

                    if sselement.tag == child:

                        if sselement.attrib[child_atribiute] == child_value_1:

                            list_id.append(

                                subelement.get(atribiute_parent))

                            list_value.append(

                                sselement.get(child_value_2))



    save_to_csv(

        list_id, list_value, name_id, 

        name_atribiute, name_file, 

        sciezka_csv, part_name)

edited Nov 21 at 15:34

Toby Speight

22.6k537109

asked Nov 21 at 14:59

JuniorPythonNewbie

504

What does it do ?

Looks in big XML files (1-3 GB) for given parameters that I need in my project, appends them to lists and finally exports both of them to the CSV files.

My XML scheme

I need to get child named 'v', which is value of specific child 'k' nested in tag named 'tag' which is nested within parent 'way'.

enter image description here

What I tried

IBM's take on big xml files

Plus various scripts from StackExchange.

Performance

My function needs 50% less time to parse XML than above method, any other I tried or I was able to try, because with some I was unable to figure things out and given up on them.

My goal

Variables description:

xml - path to xml file

list_agis - list of IDs, list which length is number between 5k - 1 kk

parent - way

child - tag

child_atribitute - k

child_value_1 - highway

child_value_2 - track

name_id, name_atribute, name_file, path_csv, part_name - variables needed to create csv file name

My code

def save_to_csv(

        list_1, list_2, name_1, 

        name_2, csv_name, catalogue, 

        part_name):

    """

    Saves to CSV, based on 2 lists.

    """

    raw_data = {name_1: list_1,

                name_2: list_2}

    df = pd.DataFrame(raw_data, columns=[name_1, name_2])

    df.to_csv(

        '{0}{1}_{2}.csv'.format(catalogue, part_name, csv_name), 

        index=False, header=True, encoding = 'CP1250')





def xml_parser(

        xml, lista_agis, parent, 

        atribiute_parent, child, child_atribiute,

        child_value_1, child_value_2, name_file,

        sciezka_csv, name_id, name_atribiute, 

        part_name):

    """

    Function to pick from xml files tag values.

    Firstly it creates tree of xml file and then

    goes each level town and when final condtion is fullfiled

    id and value from xml file is appended to list in the end of

    xml file list is saved to CSV.



    """

    rootElement = ET.parse(xml).getroot()

    list_id = 

    list_value = 

    for subelement in rootElement:

        if subelement.tag == parent:

            if subelement.get(atribiute_parent) in lista_agis:

                for sselement in subelement:

                    if sselement.tag == child:

                        if sselement.attrib[child_atribiute] == child_value_1:

                            list_id.append(

                                subelement.get(atribiute_parent))

                            list_value.append(

                                sselement.get(child_value_2))



    save_to_csv(

        list_id, list_value, name_id, 

        name_atribiute, name_file, 

        sciezka_csv, part_name)

python parsing xml

edited Nov 21 at 15:34

Toby Speight

22.6k537109

asked Nov 21 at 14:59

JuniorPythonNewbie

504

edited Nov 21 at 15:34

Toby Speight

22.6k537109

asked Nov 21 at 14:59

JuniorPythonNewbie

504

edited Nov 21 at 15:34

Toby Speight

22.6k537109

edited Nov 21 at 15:34

Toby Speight

22.6k537109

edited Nov 21 at 15:34

Toby Speight

22.6k537109

asked Nov 21 at 14:59

JuniorPythonNewbie

504

asked Nov 21 at 14:59

JuniorPythonNewbie

504

asked Nov 21 at 14:59

JuniorPythonNewbie

504

1

Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
– Calak
Nov 21 at 15:06

Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
– Toby Speight
Nov 21 at 15:28

2

Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
– JuniorPythonNewbie
Nov 21 at 15:31

2

You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
– AJNeufeld
Nov 21 at 15:50

1

k and v are "attributes", not "children".
– Reinderien
Nov 21 at 16:55

|
show 1 more comment

1

Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
– Calak
Nov 21 at 15:06

Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
– Toby Speight
Nov 21 at 15:28

2

Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
– JuniorPythonNewbie
Nov 21 at 15:31

2

You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
– AJNeufeld
Nov 21 at 15:50

1

k and v are "attributes", not "children".
– Reinderien
Nov 21 at 16:55

Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
– Calak
Nov 21 at 15:06

Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
– Toby Speight
Nov 21 at 15:28

Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
– JuniorPythonNewbie
Nov 21 at 15:31

You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
– AJNeufeld
Nov 21 at 15:50

k and v are "attributes", not "children".
– Reinderien
Nov 21 at 16:55

|
show 1 more comment

1 Answer
1

active

oldest

votes

up vote
0
down vote

Introduction

Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.

Picture of nodes

I attach picture from XML file so one can see what I am dealing with.

enter image description here

How it works

It is iterating over l_id (which is list of ids) using etree.parse and findall.

First inner loop gathers dictionary where given id is.

Second inner loop gathers dictionary where chosen value is.

Loop for dict_ids_all appends to new list only ids from dictionary.

Loop for dict_ids_all appends to new list only value from dictionary.

My code

tree = lxml.etree.parse(r'path to xml')

l_dict_ids_all = 

l_dict_values_all= 

l_only_id =

l_only_values = 

l_id = ['"35121262"', '"35121263"']

name = '"name"'



for id in l_id:

    for tag in tree.findall('//node[@id={0}]'.format(id)):

        l_dict_ids_all.append(tag.attrib)



    for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):

        l_dict_values_all.append(tag.attrib)





#printing is only for review purpose



print('Full id dict')        

print(l_dict_ids_all)



print('Full Value dict')

print(l_dict_values_all)



print('Only ID list')

for element in l_dict_ids_all:

    l_only_id.append(element['id'])

print(l_only_id)



print('Only Value list')

for element in l_dict_values_all:

    l_only_values.append(element['k'])

print(l_only_values)

Output

Full id dict

[{'id': '35121262', 'visible': 'true', 'version': '17',
'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
{'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
'41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]

Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]

Only ID list ['35121262', '35121263']

Only Value list ['name', 'name']

What I tried

I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:

l_dict_ids_all.append(tag.attrib[0]['id'])

Received an error :

TypeError Traceback (most recent call)

ipython-input-91-8b0a49bc5f35 in ()
7 for id in l_id:
8 for tag in tree.findall('//node[@id={0}]'.format(id)):
----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
10
src/lxml/etree.pyx in lxml.etree._Attrib.getitem()

src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNsTag()

src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()

src/lxml/apihelpers.pxi in lxml.etree._utf8()

TypeError: Argument must be bytes or unicode, got 'int'

My goal/problem

Code is working, but I want to make it better.
I need to get rid of 2 out 4 lists which I create at the begging.

answered Nov 23 at 11:56

JuniorPythonNewbie

504

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208158%2fsearch-for-ways-in-openstreetmap-extract%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
0
down vote

Introduction

Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.

Picture of nodes

I attach picture from XML file so one can see what I am dealing with.

enter image description here

How it works

It is iterating over l_id (which is list of ids) using etree.parse and findall.

First inner loop gathers dictionary where given id is.

Second inner loop gathers dictionary where chosen value is.

Loop for dict_ids_all appends to new list only ids from dictionary.

Loop for dict_ids_all appends to new list only value from dictionary.

My code

tree = lxml.etree.parse(r'path to xml')

l_dict_ids_all = 

l_dict_values_all= 

l_only_id =

l_only_values = 

l_id = ['"35121262"', '"35121263"']

name = '"name"'



for id in l_id:

    for tag in tree.findall('//node[@id={0}]'.format(id)):

        l_dict_ids_all.append(tag.attrib)



    for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):

        l_dict_values_all.append(tag.attrib)





#printing is only for review purpose



print('Full id dict')        

print(l_dict_ids_all)



print('Full Value dict')

print(l_dict_values_all)



print('Only ID list')

for element in l_dict_ids_all:

    l_only_id.append(element['id'])

print(l_only_id)



print('Only Value list')

for element in l_dict_values_all:

    l_only_values.append(element['k'])

print(l_only_values)

Output

Full id dict

[{'id': '35121262', 'visible': 'true', 'version': '17',
'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
{'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
'41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]

Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]

Only ID list ['35121262', '35121263']

Only Value list ['name', 'name']

What I tried

I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:

l_dict_ids_all.append(tag.attrib[0]['id'])

Received an error :

TypeError Traceback (most recent call)

ipython-input-91-8b0a49bc5f35 in ()
7 for id in l_id:
8 for tag in tree.findall('//node[@id={0}]'.format(id)):
----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
10
src/lxml/etree.pyx in lxml.etree._Attrib.getitem()

src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNsTag()

src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()

src/lxml/apihelpers.pxi in lxml.etree._utf8()

TypeError: Argument must be bytes or unicode, got 'int'

My goal/problem

Code is working, but I want to make it better.
I need to get rid of 2 out 4 lists which I create at the begging.

answered Nov 23 at 11:56

JuniorPythonNewbie

504

add a comment |

up vote
0
down vote

Introduction

Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.

Picture of nodes

I attach picture from XML file so one can see what I am dealing with.

enter image description here

How it works

It is iterating over l_id (which is list of ids) using etree.parse and findall.

First inner loop gathers dictionary where given id is.

Second inner loop gathers dictionary where chosen value is.

Loop for dict_ids_all appends to new list only ids from dictionary.

Loop for dict_ids_all appends to new list only value from dictionary.

My code

tree = lxml.etree.parse(r'path to xml')

l_dict_ids_all = 

l_dict_values_all= 

l_only_id =

l_only_values = 

l_id = ['"35121262"', '"35121263"']

name = '"name"'



for id in l_id:

    for tag in tree.findall('//node[@id={0}]'.format(id)):

        l_dict_ids_all.append(tag.attrib)



    for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):

        l_dict_values_all.append(tag.attrib)





#printing is only for review purpose



print('Full id dict')        

print(l_dict_ids_all)



print('Full Value dict')

print(l_dict_values_all)



print('Only ID list')

for element in l_dict_ids_all:

    l_only_id.append(element['id'])

print(l_only_id)



print('Only Value list')

for element in l_dict_values_all:

    l_only_values.append(element['k'])

print(l_only_values)

Output

Full id dict

[{'id': '35121262', 'visible': 'true', 'version': '17',
'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
{'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
'41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]

Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]

Only ID list ['35121262', '35121263']

Only Value list ['name', 'name']

What I tried

I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:

l_dict_ids_all.append(tag.attrib[0]['id'])

Received an error :

TypeError Traceback (most recent call)

ipython-input-91-8b0a49bc5f35 in ()
7 for id in l_id:
8 for tag in tree.findall('//node[@id={0}]'.format(id)):
----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
10
src/lxml/etree.pyx in lxml.etree._Attrib.getitem()

src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNsTag()

src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()

src/lxml/apihelpers.pxi in lxml.etree._utf8()

TypeError: Argument must be bytes or unicode, got 'int'

My goal/problem

Code is working, but I want to make it better.
I need to get rid of 2 out 4 lists which I create at the begging.

answered Nov 23 at 11:56

JuniorPythonNewbie

504

add a comment |

up vote
0
down vote

Introduction

Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.

Picture of nodes

I attach picture from XML file so one can see what I am dealing with.

enter image description here

How it works

It is iterating over l_id (which is list of ids) using etree.parse and findall.

First inner loop gathers dictionary where given id is.

Second inner loop gathers dictionary where chosen value is.

Loop for dict_ids_all appends to new list only ids from dictionary.

Loop for dict_ids_all appends to new list only value from dictionary.

My code

tree = lxml.etree.parse(r'path to xml')

l_dict_ids_all = 

l_dict_values_all= 

l_only_id =

l_only_values = 

l_id = ['"35121262"', '"35121263"']

name = '"name"'



for id in l_id:

    for tag in tree.findall('//node[@id={0}]'.format(id)):

        l_dict_ids_all.append(tag.attrib)



    for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):

        l_dict_values_all.append(tag.attrib)





#printing is only for review purpose



print('Full id dict')        

print(l_dict_ids_all)



print('Full Value dict')

print(l_dict_values_all)



print('Only ID list')

for element in l_dict_ids_all:

    l_only_id.append(element['id'])

print(l_only_id)



print('Only Value list')

for element in l_dict_values_all:

    l_only_values.append(element['k'])

print(l_only_values)

Output

Full id dict

[{'id': '35121262', 'visible': 'true', 'version': '17',
'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
{'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
'41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]

Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]

Only ID list ['35121262', '35121263']

Only Value list ['name', 'name']

What I tried

I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:

l_dict_ids_all.append(tag.attrib[0]['id'])

Received an error :

TypeError Traceback (most recent call)

ipython-input-91-8b0a49bc5f35 in ()
7 for id in l_id:
8 for tag in tree.findall('//node[@id={0}]'.format(id)):
----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
10
src/lxml/etree.pyx in lxml.etree._Attrib.getitem()

src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNsTag()

src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()

src/lxml/apihelpers.pxi in lxml.etree._utf8()

TypeError: Argument must be bytes or unicode, got 'int'

My goal/problem

Code is working, but I want to make it better.
I need to get rid of 2 out 4 lists which I create at the begging.

answered Nov 23 at 11:56

JuniorPythonNewbie

504

Introduction

Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.

Picture of nodes

I attach picture from XML file so one can see what I am dealing with.

enter image description here

How it works

It is iterating over l_id (which is list of ids) using etree.parse and findall.

First inner loop gathers dictionary where given id is.

Second inner loop gathers dictionary where chosen value is.

Loop for dict_ids_all appends to new list only ids from dictionary.

Loop for dict_ids_all appends to new list only value from dictionary.

My code

tree = lxml.etree.parse(r'path to xml')

l_dict_ids_all = 

l_dict_values_all= 

l_only_id =

l_only_values = 

l_id = ['"35121262"', '"35121263"']

name = '"name"'



for id in l_id:

    for tag in tree.findall('//node[@id={0}]'.format(id)):

        l_dict_ids_all.append(tag.attrib)



    for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):

        l_dict_values_all.append(tag.attrib)





#printing is only for review purpose



print('Full id dict')        

print(l_dict_ids_all)



print('Full Value dict')

print(l_dict_values_all)



print('Only ID list')

for element in l_dict_ids_all:

    l_only_id.append(element['id'])

print(l_only_id)



print('Only Value list')

for element in l_dict_values_all:

    l_only_values.append(element['k'])

print(l_only_values)

Output

Full id dict

[{'id': '35121262', 'visible': 'true', 'version': '17',
'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
{'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
'41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]

Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]

Only ID list ['35121262', '35121263']

Only Value list ['name', 'name']

What I tried

I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:

l_dict_ids_all.append(tag.attrib[0]['id'])

Received an error :

TypeError Traceback (most recent call)

ipython-input-91-8b0a49bc5f35 in ()
7 for id in l_id:
8 for tag in tree.findall('//node[@id={0}]'.format(id)):
----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
10
src/lxml/etree.pyx in lxml.etree._Attrib.getitem()

src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()

src/lxml/apihelpers.pxi in lxml.etree._getNsTag()

src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()

src/lxml/apihelpers.pxi in lxml.etree._utf8()

TypeError: Argument must be bytes or unicode, got 'int'

My goal/problem

Code is working, but I want to make it better.
I need to get rid of 2 out 4 lists which I create at the begging.

answered Nov 23 at 11:56

JuniorPythonNewbie

504

answered Nov 23 at 11:56

JuniorPythonNewbie

504

answered Nov 23 at 11:56

JuniorPythonNewbie

504

answered Nov 23 at 11:56

JuniorPythonNewbie

504

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Gfrktyl