Search for ways in OpenStreetMap extract











up vote
4
down vote

favorite












What does it do ?



Looks in big XML files (1-3 GB) for given parameters that I need in my project, appends them to lists and finally exports both of them to the CSV files.



My XML scheme



I need to get child named 'v', which is value of specific child 'k' nested in tag named 'tag' which is nested within parent 'way'.



enter image description here



What I tried



IBM's take on big xml files



Plus various scripts from StackExchange.



Performance



My function needs 50% less time to parse XML than above method, any other I tried or I was able to try, because with some I was unable to figure things out and given up on them.



My goal



To get some tips on my code, hopefully find way to speed it up.
I need to parse about 20 - 30 GB of XML files (as mentioned before single file is 1-3 GB), whole procedure is really time consuming - It takes about 12 hours up to 2 days of non stop parsing.



Variables description:





  • xml - path to xml file


  • list_agis - list of IDs, list which length is number between 5k - 1 kk


  • parent - way


  • child - tag


  • child_atribitute - k


  • child_value_1 - highway


  • child_value_2 - track


  • name_id, name_atribute, name_file, path_csv, part_name - variables needed to create csv file name


My code



def save_to_csv(
list_1, list_2, name_1,
name_2, csv_name, catalogue,
part_name):
"""
Saves to CSV, based on 2 lists.
"""
raw_data = {name_1: list_1,
name_2: list_2}
df = pd.DataFrame(raw_data, columns=[name_1, name_2])
df.to_csv(
'{0}{1}_{2}.csv'.format(catalogue, part_name, csv_name),
index=False, header=True, encoding = 'CP1250')


def xml_parser(
xml, lista_agis, parent,
atribiute_parent, child, child_atribiute,
child_value_1, child_value_2, name_file,
sciezka_csv, name_id, name_atribiute,
part_name):
"""
Function to pick from xml files tag values.
Firstly it creates tree of xml file and then
goes each level town and when final condtion is fullfiled
id and value from xml file is appended to list in the end of
xml file list is saved to CSV.

"""
rootElement = ET.parse(xml).getroot()
list_id =
list_value =
for subelement in rootElement:
if subelement.tag == parent:
if subelement.get(atribiute_parent) in lista_agis:
for sselement in subelement:
if sselement.tag == child:
if sselement.attrib[child_atribiute] == child_value_1:
list_id.append(
subelement.get(atribiute_parent))
list_value.append(
sselement.get(child_value_2))

save_to_csv(
list_id, list_value, name_id,
name_atribiute, name_file,
sciezka_csv, part_name)









share|improve this question




















  • 1




    Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
    – Calak
    Nov 21 at 15:06










  • Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
    – Toby Speight
    Nov 21 at 15:28






  • 2




    Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
    – JuniorPythonNewbie
    Nov 21 at 15:31








  • 2




    You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
    – AJNeufeld
    Nov 21 at 15:50








  • 1




    k and v are "attributes", not "children".
    – Reinderien
    Nov 21 at 16:55















up vote
4
down vote

favorite












What does it do ?



Looks in big XML files (1-3 GB) for given parameters that I need in my project, appends them to lists and finally exports both of them to the CSV files.



My XML scheme



I need to get child named 'v', which is value of specific child 'k' nested in tag named 'tag' which is nested within parent 'way'.



enter image description here



What I tried



IBM's take on big xml files



Plus various scripts from StackExchange.



Performance



My function needs 50% less time to parse XML than above method, any other I tried or I was able to try, because with some I was unable to figure things out and given up on them.



My goal



To get some tips on my code, hopefully find way to speed it up.
I need to parse about 20 - 30 GB of XML files (as mentioned before single file is 1-3 GB), whole procedure is really time consuming - It takes about 12 hours up to 2 days of non stop parsing.



Variables description:





  • xml - path to xml file


  • list_agis - list of IDs, list which length is number between 5k - 1 kk


  • parent - way


  • child - tag


  • child_atribitute - k


  • child_value_1 - highway


  • child_value_2 - track


  • name_id, name_atribute, name_file, path_csv, part_name - variables needed to create csv file name


My code



def save_to_csv(
list_1, list_2, name_1,
name_2, csv_name, catalogue,
part_name):
"""
Saves to CSV, based on 2 lists.
"""
raw_data = {name_1: list_1,
name_2: list_2}
df = pd.DataFrame(raw_data, columns=[name_1, name_2])
df.to_csv(
'{0}{1}_{2}.csv'.format(catalogue, part_name, csv_name),
index=False, header=True, encoding = 'CP1250')


def xml_parser(
xml, lista_agis, parent,
atribiute_parent, child, child_atribiute,
child_value_1, child_value_2, name_file,
sciezka_csv, name_id, name_atribiute,
part_name):
"""
Function to pick from xml files tag values.
Firstly it creates tree of xml file and then
goes each level town and when final condtion is fullfiled
id and value from xml file is appended to list in the end of
xml file list is saved to CSV.

"""
rootElement = ET.parse(xml).getroot()
list_id =
list_value =
for subelement in rootElement:
if subelement.tag == parent:
if subelement.get(atribiute_parent) in lista_agis:
for sselement in subelement:
if sselement.tag == child:
if sselement.attrib[child_atribiute] == child_value_1:
list_id.append(
subelement.get(atribiute_parent))
list_value.append(
sselement.get(child_value_2))

save_to_csv(
list_id, list_value, name_id,
name_atribiute, name_file,
sciezka_csv, part_name)









share|improve this question




















  • 1




    Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
    – Calak
    Nov 21 at 15:06










  • Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
    – Toby Speight
    Nov 21 at 15:28






  • 2




    Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
    – JuniorPythonNewbie
    Nov 21 at 15:31








  • 2




    You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
    – AJNeufeld
    Nov 21 at 15:50








  • 1




    k and v are "attributes", not "children".
    – Reinderien
    Nov 21 at 16:55













up vote
4
down vote

favorite









up vote
4
down vote

favorite











What does it do ?



Looks in big XML files (1-3 GB) for given parameters that I need in my project, appends them to lists and finally exports both of them to the CSV files.



My XML scheme



I need to get child named 'v', which is value of specific child 'k' nested in tag named 'tag' which is nested within parent 'way'.



enter image description here



What I tried



IBM's take on big xml files



Plus various scripts from StackExchange.



Performance



My function needs 50% less time to parse XML than above method, any other I tried or I was able to try, because with some I was unable to figure things out and given up on them.



My goal



To get some tips on my code, hopefully find way to speed it up.
I need to parse about 20 - 30 GB of XML files (as mentioned before single file is 1-3 GB), whole procedure is really time consuming - It takes about 12 hours up to 2 days of non stop parsing.



Variables description:





  • xml - path to xml file


  • list_agis - list of IDs, list which length is number between 5k - 1 kk


  • parent - way


  • child - tag


  • child_atribitute - k


  • child_value_1 - highway


  • child_value_2 - track


  • name_id, name_atribute, name_file, path_csv, part_name - variables needed to create csv file name


My code



def save_to_csv(
list_1, list_2, name_1,
name_2, csv_name, catalogue,
part_name):
"""
Saves to CSV, based on 2 lists.
"""
raw_data = {name_1: list_1,
name_2: list_2}
df = pd.DataFrame(raw_data, columns=[name_1, name_2])
df.to_csv(
'{0}{1}_{2}.csv'.format(catalogue, part_name, csv_name),
index=False, header=True, encoding = 'CP1250')


def xml_parser(
xml, lista_agis, parent,
atribiute_parent, child, child_atribiute,
child_value_1, child_value_2, name_file,
sciezka_csv, name_id, name_atribiute,
part_name):
"""
Function to pick from xml files tag values.
Firstly it creates tree of xml file and then
goes each level town and when final condtion is fullfiled
id and value from xml file is appended to list in the end of
xml file list is saved to CSV.

"""
rootElement = ET.parse(xml).getroot()
list_id =
list_value =
for subelement in rootElement:
if subelement.tag == parent:
if subelement.get(atribiute_parent) in lista_agis:
for sselement in subelement:
if sselement.tag == child:
if sselement.attrib[child_atribiute] == child_value_1:
list_id.append(
subelement.get(atribiute_parent))
list_value.append(
sselement.get(child_value_2))

save_to_csv(
list_id, list_value, name_id,
name_atribiute, name_file,
sciezka_csv, part_name)









share|improve this question















What does it do ?



Looks in big XML files (1-3 GB) for given parameters that I need in my project, appends them to lists and finally exports both of them to the CSV files.



My XML scheme



I need to get child named 'v', which is value of specific child 'k' nested in tag named 'tag' which is nested within parent 'way'.



enter image description here



What I tried



IBM's take on big xml files



Plus various scripts from StackExchange.



Performance



My function needs 50% less time to parse XML than above method, any other I tried or I was able to try, because with some I was unable to figure things out and given up on them.



My goal



To get some tips on my code, hopefully find way to speed it up.
I need to parse about 20 - 30 GB of XML files (as mentioned before single file is 1-3 GB), whole procedure is really time consuming - It takes about 12 hours up to 2 days of non stop parsing.



Variables description:





  • xml - path to xml file


  • list_agis - list of IDs, list which length is number between 5k - 1 kk


  • parent - way


  • child - tag


  • child_atribitute - k


  • child_value_1 - highway


  • child_value_2 - track


  • name_id, name_atribute, name_file, path_csv, part_name - variables needed to create csv file name


My code



def save_to_csv(
list_1, list_2, name_1,
name_2, csv_name, catalogue,
part_name):
"""
Saves to CSV, based on 2 lists.
"""
raw_data = {name_1: list_1,
name_2: list_2}
df = pd.DataFrame(raw_data, columns=[name_1, name_2])
df.to_csv(
'{0}{1}_{2}.csv'.format(catalogue, part_name, csv_name),
index=False, header=True, encoding = 'CP1250')


def xml_parser(
xml, lista_agis, parent,
atribiute_parent, child, child_atribiute,
child_value_1, child_value_2, name_file,
sciezka_csv, name_id, name_atribiute,
part_name):
"""
Function to pick from xml files tag values.
Firstly it creates tree of xml file and then
goes each level town and when final condtion is fullfiled
id and value from xml file is appended to list in the end of
xml file list is saved to CSV.

"""
rootElement = ET.parse(xml).getroot()
list_id =
list_value =
for subelement in rootElement:
if subelement.tag == parent:
if subelement.get(atribiute_parent) in lista_agis:
for sselement in subelement:
if sselement.tag == child:
if sselement.attrib[child_atribiute] == child_value_1:
list_id.append(
subelement.get(atribiute_parent))
list_value.append(
sselement.get(child_value_2))

save_to_csv(
list_id, list_value, name_id,
name_atribiute, name_file,
sciezka_csv, part_name)






python parsing xml






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 21 at 15:34









Toby Speight

22.6k537109




22.6k537109










asked Nov 21 at 14:59









JuniorPythonNewbie

504




504








  • 1




    Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
    – Calak
    Nov 21 at 15:06










  • Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
    – Toby Speight
    Nov 21 at 15:28






  • 2




    Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
    – JuniorPythonNewbie
    Nov 21 at 15:31








  • 2




    You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
    – AJNeufeld
    Nov 21 at 15:50








  • 1




    k and v are "attributes", not "children".
    – Reinderien
    Nov 21 at 16:55














  • 1




    Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
    – Calak
    Nov 21 at 15:06










  • Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
    – Toby Speight
    Nov 21 at 15:28






  • 2




    Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
    – JuniorPythonNewbie
    Nov 21 at 15:31








  • 2




    You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
    – AJNeufeld
    Nov 21 at 15:50








  • 1




    k and v are "attributes", not "children".
    – Reinderien
    Nov 21 at 16:55








1




1




Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
– Calak
Nov 21 at 15:06




Welcome on code review. When you ask to be reviewed from international people, try to give english code. You have more chance to get useful response if code have meaning for reviewers.
– Calak
Nov 21 at 15:06












Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
– Toby Speight
Nov 21 at 15:28




Is that OSM XML? Can you avoid the overhead of XML by using a PBF version of your OSM dump instead? (C++ certainly has good libraries for reading OSM PBF; I'd be very surprised if Python doesn't).
– Toby Speight
Nov 21 at 15:28




2




2




Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
– JuniorPythonNewbie
Nov 21 at 15:31






Yes that is OSM XML. Im gonna dive into that pbf files then. Thanks! edit: I think this is my solution. imposm.org/docs/imposm.parser/latest
– JuniorPythonNewbie
Nov 21 at 15:31






2




2




You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
– AJNeufeld
Nov 21 at 15:50






You probably want to use a stream parser, instead of loading the entire document into memory and then parsing it. See this answer on Stack Overflow.
– AJNeufeld
Nov 21 at 15:50






1




1




k and v are "attributes", not "children".
– Reinderien
Nov 21 at 16:55




k and v are "attributes", not "children".
– Reinderien
Nov 21 at 16:55










1 Answer
1






active

oldest

votes

















up vote
0
down vote













Introduction



Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.



Picture of nodes



I attach picture from XML file so one can see what I am dealing with.



enter image description here



How it works



It is iterating over l_id (which is list of ids) using etree.parse and findall.



First inner loop gathers dictionary where given id is.



Second inner loop gathers dictionary where chosen value is.



Loop for dict_ids_all appends to new list only ids from dictionary.



Loop for dict_ids_all appends to new list only value from dictionary.



My code



tree = lxml.etree.parse(r'path to xml')
l_dict_ids_all =
l_dict_values_all=
l_only_id =
l_only_values =
l_id = ['"35121262"', '"35121263"']
name = '"name"'

for id in l_id:
for tag in tree.findall('//node[@id={0}]'.format(id)):
l_dict_ids_all.append(tag.attrib)

for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):
l_dict_values_all.append(tag.attrib)


#printing is only for review purpose

print('Full id dict')
print(l_dict_ids_all)

print('Full Value dict')
print(l_dict_values_all)

print('Only ID list')
for element in l_dict_ids_all:
l_only_id.append(element['id'])
print(l_only_id)

print('Only Value list')
for element in l_dict_values_all:
l_only_values.append(element['k'])
print(l_only_values)


Output




Full id dict



[{'id': '35121262', 'visible': 'true', 'version': '17',
'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
{'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
'41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]



Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]



Only ID list ['35121262', '35121263']



Only Value list ['name', 'name']




What I tried



I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:



l_dict_ids_all.append(tag.attrib[0]['id'])


Received an error :




TypeError Traceback (most recent call)



ipython-input-91-8b0a49bc5f35 in ()
7 for id in l_id:
8 for tag in tree.findall('//node[@id={0}]'.format(id)):
----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
10
src/lxml/etree.pyx in lxml.etree._Attrib.getitem()



src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()



src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()



src/lxml/apihelpers.pxi in lxml.etree._getNsTag()



src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()



src/lxml/apihelpers.pxi in lxml.etree._utf8()



TypeError: Argument must be bytes or unicode, got 'int'




My goal/problem



Code is working, but I want to make it better.
I need to get rid of 2 out 4 lists which I create at the begging.






share|improve this answer





















    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208158%2fsearch-for-ways-in-openstreetmap-extract%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote













    Introduction



    Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.



    Picture of nodes



    I attach picture from XML file so one can see what I am dealing with.



    enter image description here



    How it works



    It is iterating over l_id (which is list of ids) using etree.parse and findall.



    First inner loop gathers dictionary where given id is.



    Second inner loop gathers dictionary where chosen value is.



    Loop for dict_ids_all appends to new list only ids from dictionary.



    Loop for dict_ids_all appends to new list only value from dictionary.



    My code



    tree = lxml.etree.parse(r'path to xml')
    l_dict_ids_all =
    l_dict_values_all=
    l_only_id =
    l_only_values =
    l_id = ['"35121262"', '"35121263"']
    name = '"name"'

    for id in l_id:
    for tag in tree.findall('//node[@id={0}]'.format(id)):
    l_dict_ids_all.append(tag.attrib)

    for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):
    l_dict_values_all.append(tag.attrib)


    #printing is only for review purpose

    print('Full id dict')
    print(l_dict_ids_all)

    print('Full Value dict')
    print(l_dict_values_all)

    print('Only ID list')
    for element in l_dict_ids_all:
    l_only_id.append(element['id'])
    print(l_only_id)

    print('Only Value list')
    for element in l_dict_values_all:
    l_only_values.append(element['k'])
    print(l_only_values)


    Output




    Full id dict



    [{'id': '35121262', 'visible': 'true', 'version': '17',
    'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
    'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
    {'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
    '41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
    'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]



    Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]



    Only ID list ['35121262', '35121263']



    Only Value list ['name', 'name']




    What I tried



    I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:



    l_dict_ids_all.append(tag.attrib[0]['id'])


    Received an error :




    TypeError Traceback (most recent call)



    ipython-input-91-8b0a49bc5f35 in ()
    7 for id in l_id:
    8 for tag in tree.findall('//node[@id={0}]'.format(id)):
    ----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
    10
    src/lxml/etree.pyx in lxml.etree._Attrib.getitem()



    src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()



    src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()



    src/lxml/apihelpers.pxi in lxml.etree._getNsTag()



    src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()



    src/lxml/apihelpers.pxi in lxml.etree._utf8()



    TypeError: Argument must be bytes or unicode, got 'int'




    My goal/problem



    Code is working, but I want to make it better.
    I need to get rid of 2 out 4 lists which I create at the begging.






    share|improve this answer

























      up vote
      0
      down vote













      Introduction



      Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.



      Picture of nodes



      I attach picture from XML file so one can see what I am dealing with.



      enter image description here



      How it works



      It is iterating over l_id (which is list of ids) using etree.parse and findall.



      First inner loop gathers dictionary where given id is.



      Second inner loop gathers dictionary where chosen value is.



      Loop for dict_ids_all appends to new list only ids from dictionary.



      Loop for dict_ids_all appends to new list only value from dictionary.



      My code



      tree = lxml.etree.parse(r'path to xml')
      l_dict_ids_all =
      l_dict_values_all=
      l_only_id =
      l_only_values =
      l_id = ['"35121262"', '"35121263"']
      name = '"name"'

      for id in l_id:
      for tag in tree.findall('//node[@id={0}]'.format(id)):
      l_dict_ids_all.append(tag.attrib)

      for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):
      l_dict_values_all.append(tag.attrib)


      #printing is only for review purpose

      print('Full id dict')
      print(l_dict_ids_all)

      print('Full Value dict')
      print(l_dict_values_all)

      print('Only ID list')
      for element in l_dict_ids_all:
      l_only_id.append(element['id'])
      print(l_only_id)

      print('Only Value list')
      for element in l_dict_values_all:
      l_only_values.append(element['k'])
      print(l_only_values)


      Output




      Full id dict



      [{'id': '35121262', 'visible': 'true', 'version': '17',
      'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
      'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
      {'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
      '41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
      'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]



      Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]



      Only ID list ['35121262', '35121263']



      Only Value list ['name', 'name']




      What I tried



      I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:



      l_dict_ids_all.append(tag.attrib[0]['id'])


      Received an error :




      TypeError Traceback (most recent call)



      ipython-input-91-8b0a49bc5f35 in ()
      7 for id in l_id:
      8 for tag in tree.findall('//node[@id={0}]'.format(id)):
      ----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
      10
      src/lxml/etree.pyx in lxml.etree._Attrib.getitem()



      src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()



      src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()



      src/lxml/apihelpers.pxi in lxml.etree._getNsTag()



      src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()



      src/lxml/apihelpers.pxi in lxml.etree._utf8()



      TypeError: Argument must be bytes or unicode, got 'int'




      My goal/problem



      Code is working, but I want to make it better.
      I need to get rid of 2 out 4 lists which I create at the begging.






      share|improve this answer























        up vote
        0
        down vote










        up vote
        0
        down vote









        Introduction



        Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.



        Picture of nodes



        I attach picture from XML file so one can see what I am dealing with.



        enter image description here



        How it works



        It is iterating over l_id (which is list of ids) using etree.parse and findall.



        First inner loop gathers dictionary where given id is.



        Second inner loop gathers dictionary where chosen value is.



        Loop for dict_ids_all appends to new list only ids from dictionary.



        Loop for dict_ids_all appends to new list only value from dictionary.



        My code



        tree = lxml.etree.parse(r'path to xml')
        l_dict_ids_all =
        l_dict_values_all=
        l_only_id =
        l_only_values =
        l_id = ['"35121262"', '"35121263"']
        name = '"name"'

        for id in l_id:
        for tag in tree.findall('//node[@id={0}]'.format(id)):
        l_dict_ids_all.append(tag.attrib)

        for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):
        l_dict_values_all.append(tag.attrib)


        #printing is only for review purpose

        print('Full id dict')
        print(l_dict_ids_all)

        print('Full Value dict')
        print(l_dict_values_all)

        print('Only ID list')
        for element in l_dict_ids_all:
        l_only_id.append(element['id'])
        print(l_only_id)

        print('Only Value list')
        for element in l_dict_values_all:
        l_only_values.append(element['k'])
        print(l_only_values)


        Output




        Full id dict



        [{'id': '35121262', 'visible': 'true', 'version': '17',
        'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
        'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
        {'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
        '41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
        'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]



        Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]



        Only ID list ['35121262', '35121263']



        Only Value list ['name', 'name']




        What I tried



        I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:



        l_dict_ids_all.append(tag.attrib[0]['id'])


        Received an error :




        TypeError Traceback (most recent call)



        ipython-input-91-8b0a49bc5f35 in ()
        7 for id in l_id:
        8 for tag in tree.findall('//node[@id={0}]'.format(id)):
        ----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
        10
        src/lxml/etree.pyx in lxml.etree._Attrib.getitem()



        src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()



        src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()



        src/lxml/apihelpers.pxi in lxml.etree._getNsTag()



        src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()



        src/lxml/apihelpers.pxi in lxml.etree._utf8()



        TypeError: Argument must be bytes or unicode, got 'int'




        My goal/problem



        Code is working, but I want to make it better.
        I need to get rid of 2 out 4 lists which I create at the begging.






        share|improve this answer












        Introduction



        Unfortunately imposm is just for Python 2, my project is in Python 3. I think lxml library looks promising. I wrote simple code to test it, right now it is based on just 2 nodes.



        Picture of nodes



        I attach picture from XML file so one can see what I am dealing with.



        enter image description here



        How it works



        It is iterating over l_id (which is list of ids) using etree.parse and findall.



        First inner loop gathers dictionary where given id is.



        Second inner loop gathers dictionary where chosen value is.



        Loop for dict_ids_all appends to new list only ids from dictionary.



        Loop for dict_ids_all appends to new list only value from dictionary.



        My code



        tree = lxml.etree.parse(r'path to xml')
        l_dict_ids_all =
        l_dict_values_all=
        l_only_id =
        l_only_values =
        l_id = ['"35121262"', '"35121263"']
        name = '"name"'

        for id in l_id:
        for tag in tree.findall('//node[@id={0}]'.format(id)):
        l_dict_ids_all.append(tag.attrib)

        for tag in tree.findall('//node[@id={0}]//tag[@k={1}]'.format(id,name)):
        l_dict_values_all.append(tag.attrib)


        #printing is only for review purpose

        print('Full id dict')
        print(l_dict_ids_all)

        print('Full Value dict')
        print(l_dict_values_all)

        print('Only ID list')
        for element in l_dict_ids_all:
        l_only_id.append(element['id'])
        print(l_only_id)

        print('Only Value list')
        for element in l_dict_values_all:
        l_only_values.append(element['k'])
        print(l_only_values)


        Output




        Full id dict



        [{'id': '35121262', 'visible': 'true', 'version': '17',
        'changeset': '41419943', 'timestamp': '2016-08-12T22:24:23Z', 'user':
        'kocio', 'uid': '52087', 'lat': '52.1560439', 'lon': '21.0346808'},
        {'id': '35121263', 'visible': 'true', 'version': '16', 'changeset':
        '41419943', 'timestamp': '2016-08-12T22:24:20Z', 'user': 'kocio',
        'uid': '52087', 'lat': '52.1492285', 'lon': '21.0461042'}]



        Full Value dict [{'k': 'name', 'v': 'Stokłosy'}, {'k': 'name', 'v': 'Imielin'}]



        Only ID list ['35121262', '35121263']



        Only Value list ['name', 'name']




        What I tried



        I am aware that creating list and using it to append items to new list is wrong, but whenever I tried something like this:



        l_dict_ids_all.append(tag.attrib[0]['id'])


        Received an error :




        TypeError Traceback (most recent call)



        ipython-input-91-8b0a49bc5f35 in ()
        7 for id in l_id:
        8 for tag in tree.findall('//node[@id={0}]'.format(id)):
        ----> 9 l_dict_ids_all.append(tag.attrib[0]['id'])
        10
        src/lxml/etree.pyx in lxml.etree._Attrib.getitem()



        src/lxml/apihelpers.pxi in lxml.etree._getAttributeValue()



        src/lxml/apihelpers.pxi in lxml.etree._getNodeAttributeValue()



        src/lxml/apihelpers.pxi in lxml.etree._getNsTag()



        src/lxml/apihelpers.pxi in lxml.etree.__getNsTag()



        src/lxml/apihelpers.pxi in lxml.etree._utf8()



        TypeError: Argument must be bytes or unicode, got 'int'




        My goal/problem



        Code is working, but I want to make it better.
        I need to get rid of 2 out 4 lists which I create at the begging.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 23 at 11:56









        JuniorPythonNewbie

        504




        504






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208158%2fsearch-for-ways-in-openstreetmap-extract%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Сан-Квентин

            8-я гвардейская общевойсковая армия

            Алькесар