Counting SQL GUIDs from a server log and printing the stats, improved

This is a continuation of my original question, incorporating the improvements suggested by @Reinderien along with some of my own.

The approach I've taken is kind of obvious, and I'm not using any parallel processing. I think there's scope for improvement here: I know of Rayon, a Rust crate that could run the steps I'm currently running sequentially in parallel instead. I'll explain below why I think this is possible.



"""
Find the number of 'exceptions' and 'added' event's in the exception log
with respect to the device ID.

author: clmno
date: 2018-12-23
updated: 2018-12-27
"""

from time import time
import re

def timer(fn):
""" Used to time a function's execution"""
def f(*args, **kwargs):
before = time()
rv = fn(*args, **kwargs)
after = time()
print("elapsed", after - before)
return rv
return f

#compile the regex globally
re_prefix = '.*?'
re_guid='([A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12})'
rg = re.compile(re_prefix+re_guid, re.IGNORECASE|re.DOTALL)

def find_sql_guid(txt):
""" From the passed in txt, find the SQL guid using re"""
m = rg.search(txt)
if m:
guid1 = m.group(1)
else:
print("ERROR: No SQL guid in line. Check the code")
exit(-1)
return guid1

@timer
def find_device_IDs(file_obj, element):
""" Find the element (type: str) within the file (file path is
provide as arg). Then find the SQL guid from the line at hand.
(Each line has a SQL guid)
Return a dict of {element: [<list of SQL guids>]}
"""
lines = set()
for line in file_obj:
if element in line:
#find the sql-guid from the line-str & append
lines.add(find_sql_guid(line))
file_obj.seek(0)
return lines

@timer
def find_num_occurences(file_obj, key, search_val, unique_values):
""" Find and append SQL guids that are in a line that contains a string
that's in search_val into 'exception' and 'added'
Return a dict of {'exception':set(<set of SQL guids>),
'added': set(<set of SQL guids>)}
"""
lines = {'exception':set(), 'added': set()}

for line in file_obj:
for value in unique_values:
if value in line:
if search_val[0] in line:
lines['exception'].add(value)
elif search_val[1] in line:
lines['added'].add(value)
file_obj.seek(0)
return lines

def print_stats(num_exceptions_dict):
for key in num_exceptions_dict.keys():
print("{} added ".format(key) +
str(len(list(num_exceptions_dict[key]["added"]))))
print("{} exceptions ".format(key) +
str(len(list(num_exceptions_dict[key]["exception"]))))

if __name__ == "__main__":
path = 'log/server.log'
search_list = ('3BAA5C42', '3BAA5B84', '3BAA5C57', '3BAA5B67')

with open(path) as file_obj:
#find every occurance of device ID and find their corresponding SQL
# guids (unique ID)
unique_ids_dict = {
element: find_device_IDs(file_obj, element)
for element in search_list
}

#Now for each unique ID find if string ["Exception occurred",
# "Packet record has been added"] is found in it's SQL guid list.
search_with_in_deviceID = ("Exception occurred",
"Packet record has been added")
#reset the file pointer
file_obj.seek(0)
num_exceptions_dict = {
elem: find_num_occurences(file_obj, elem, search_with_in_deviceID,
unique_ids_dict[elem])
for elem in search_list
}

print_stats(num_exceptions_dict)


and here's a small server log for you to experiment on
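The linked log itself isn't reproduced in this copy, but going by the searches in the code, matching lines might look roughly like the following. The layout and timestamps are invented for illustration; the GUID/device pairing is borrowed from the answer's output below:

2018-12-23 09:14:02 ... 3BAA5C42 ... 2f4a7f93-d7ed-4514-bef0-9bb0f025ecd3 ...
2018-12-23 09:14:03 ... 2f4a7f93-d7ed-4514-bef0-9bb0f025ecd3 ... Packet record has been added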



Improvements

  • More Pythonic, with some help from Reinderien.

  • Opening the file only once. I can't see a significant change in execution speed.

  • Using a better data-structure model. I was using dicts everywhere; sets made more sense.


My current approach is to:

  1. Find the device IDs (e.g. 3BAA5C42) and their corresponding SQL GUIDs.

  2. For each SQL GUID, find whether it resulted in an exception or an added event. Store them in a dict.

  3. Print the stats.


Parallelize

Steps one and two just go through the file searching for a particular string and perform a set of instructions once the string is found. Each of these searches (both within steps one and two, and steps one and two as a whole) is independent of the others, so running them in parallel makes sense. A sketch of this idea follows.
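For what it's worth, here is a minimal sketch of how step one could be spread across processes with the standard library's multiprocessing module. The chunking strategy and the names scan_chunk and parallel_find are my own, not part of the code above, and it assumes the whole log fits in memory:

from collections import defaultdict
from multiprocessing import Pool
import re

GUID_RE = re.compile(
    '([A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12})',
    re.IGNORECASE)
DEVICE_IDS = ('3BAA5C42', '3BAA5B84', '3BAA5C57', '3BAA5B67')

def scan_chunk(lines):
    """Scan one chunk of lines; return {device_id: set of SQL guids}."""
    found = defaultdict(set)
    for line in lines:
        for device_id in DEVICE_IDS:
            if device_id in line:
                m = GUID_RE.search(line)
                if m:
                    found[device_id].add(m.group(1))
    return found

def parallel_find(path, workers=4, chunk_size=100000):
    """Split the log into chunks of lines and scan them in parallel."""
    with open(path) as f:
        lines = f.readlines()
    chunks = [lines[i:i + chunk_size]
              for i in range(0, len(lines), chunk_size)]
    merged = defaultdict(set)
    with Pool(workers) as pool:
        # each worker scans its own chunk; set union merges the partials
        for partial in pool.map(scan_chunk, chunks):
            for device_id, guids in partial.items():
                merged[device_id] |= guids
    return merged

if __name__ == "__main__":
    print(parallel_find('log/server.log'))

Whether this actually helps depends on where the time goes: if the scan is I/O-bound, reducing the number of passes (as the answer below does) is likely to matter more than adding workers.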



How should I get going to improve this code?


  • And I've read of Donald Knuth's famous string-search algorithm, Knuth-Morris-Pratt. What I'm doing is a string search. Just wondering.
    – clmno
    Dec 27 at 6:36

  • Your dictionary comprehension to find the unique IDs does not work. After find_device_IDs has run for the first time, you have reached the end of the file, so all except one ID are empty sets. A file_obj.seek(0) at the end of the function would help for now.
    – Graipher
    Dec 27 at 10:03

  • The signature of that function should also be def find_device_IDs(file_obj, element):, otherwise it just accesses the global variable file_obj.
    – Graipher
    Dec 27 at 10:08

  • @Graipher Fixed that in the code and tried running again. Sadly, there isn't a significant improvement over my previous version (it's 45 s now; it used to be 47 s).
    – clmno
    Dec 27 at 10:22

  • Will write an answer to see if you can get away with a single pass over the file.
    – Graipher
    Dec 27 at 10:25


1 Answer
As evidenced by your timings, a major bottleneck of your code is having to read the file multiple times. At least it is now only opened once, but the actual content is still read eight times: once per unique ID when collecting the GUIDs (so four times in your example), and then once more per ID when looking for the exception/added events.
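To see that cost in isolation, here is a quick check of my own (not part of the original answer) that times bare passes over the file, assuming the same log/server.log path:

from time import time

def passes(path, repeats):
    """Read every line of the file `repeats` times, counting lines."""
    total = 0
    with open(path) as f:
        for _ in range(repeats):
            f.seek(0)
            total += sum(1 for _ in f)
    return total

for repeats in (1, 8):
    start = time()
    passes('log/server.log', repeats)
    print(repeats, "pass(es):", time() - start)

On a large file the eight-pass variant should take roughly eight times as long (somewhat less once the OS has cached the file), which is the overhead the rewrites below remove.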



First, let's reduce this to two passes: one for the IDs and one for the exception/added events:



from collections import defaultdict

@timer
def find_device_IDs(file_obj, search_list):
    """ Find each element of search_list (type: str) within the
    file object. Then find the SQL guid from the line at hand.
    (Each line has a SQL guid)
    Return a dict of {element: set(<SQL guids>)}
    """
    sql_guids = defaultdict(set)
    for line in file_obj:
        for element in search_list:
            if element in line:
                # find the sql-guid from the line-str & add it
                sql_guids[element].add(find_sql_guid(line))
    return sql_guids


The exception/added finding function is a bit more complicated. Here we first need to invert the dictionary:



device_ids = {sql_guid: device_id
              for device_id, values in unique_ids_dict.items()
              for sql_guid in values}
# {'0af229d1-283e-4575-a818-901617a762a7': '3BAA5C57',
#  '2f4a7f93-d7ed-4514-bef0-9bb0f025ecd3': '3BAA5C42',
#  '4e720c6e-1866-4c9b-b967-dfab049266fb': '3BAA5B67',
#  '85708e5d-768d-4a90-ab71-60a737de96e3': '3BAA5B67',
#  'e268b224-bfb7-40c7-8ae5-500eaecb292b': '3BAA5B84',
#  'e4ced298-530c-41cc-98a7-42a2e4fe5987': '3BAA5B67'}


Then we can use that:



@timer
def find_num_occurences(file_obj, sql_guids, search_vals):
    device_ids = {sql_guid: device_id
                  for device_id, values in sql_guids.items()
                  for sql_guid in values}
    data = defaultdict(lambda: defaultdict(set))

    for line in file_obj:
        for sql_guid, device_id in device_ids.items():
            if sql_guid in line:
                for key, search_val in search_vals.items():
                    if search_val in line:
                        data[device_id][key].add(sql_guid)
    return data


The usage is almost the same as your code:



with open(path) as file_obj:
    device_ids = ('3BAA5C42', '3BAA5B84', '3BAA5C57', '3BAA5B67')
    sql_guids = find_device_IDs(file_obj, device_ids)
    file_obj.seek(0)

    search_with_in_deviceID = {"exception": "Exception occurred",
                               "added": "Packet record has been added"}
    print(find_num_occurences(file_obj, sql_guids, search_with_in_deviceID))

# defaultdict(<function __main__.find_num_occurences.<locals>.<lambda>>,
#             {'3BAA5B67': defaultdict(set,
#                  {'added': {'4e720c6e-1866-4c9b-b967-dfab049266fb'},
#                   'exception': {'85708e5d-768d-4a90-ab71-60a737de96e3',
#                                 'e4ced298-530c-41cc-98a7-42a2e4fe5987'}}),
#              '3BAA5B84': defaultdict(set,
#                  {'added': {'e268b224-bfb7-40c7-8ae5-500eaecb292b'}}),
#              '3BAA5C42': defaultdict(set,
#                  {'added': {'2f4a7f93-d7ed-4514-bef0-9bb0f025ecd3'}}),
#              '3BAA5C57': defaultdict(set,
#                  {'added': {'0af229d1-283e-4575-a818-901617a762a7'}})})




You can actually get this down to a single pass by collecting the events for all SQL guids and only joining them at the end with the device IDs you are actually searching for:



def get_data(file_obj, device_ids, search_vals):
    sql_guid_to_device_id = {}
    data = defaultdict(set)

    for line in file_obj:
        # search for an sql_guid
        m = rg.search(line)
        if m:
            sql_guid = m.group(1)

            # Add to mapping
            for device_id in device_ids:
                if device_id in line:
                    sql_guid_to_device_id[sql_guid] = device_id

            # Add to exceptions/added
            for key, search_val in search_vals.items():
                if search_val in line:
                    data[sql_guid].add(key)
    return sql_guid_to_device_id, data

def merge(sql_guid_to_device_id, data):
    data2 = defaultdict(lambda: defaultdict(set))

    for sql_guid, values in data.items():
        if sql_guid in sql_guid_to_device_id:
            for key in values:
                data2[sql_guid_to_device_id[sql_guid]][key].add(sql_guid)
    return data2


With the following usage:



with open(path) as file_obj:
    device_ids = ('3BAA5C42', '3BAA5B84', '3BAA5C57', '3BAA5B67')
    search_with_in_deviceID = {"exception": "Exception occurred",
                               "added": "Packet record has been added"}
    sql_guid_to_device_id, data = get_data(file_obj, device_ids,
                                           search_with_in_deviceID)
    data2 = merge(sql_guid_to_device_id, data)

for device_id, values in data2.items():
    for key, sql_guids in values.items():
        print(f"{device_id} {key} {len(sql_guids)}")

# 3BAA5B67 exception 2
# 3BAA5B67 added 1
# 3BAA5C42 added 1
# 3BAA5B84 added 1
# 3BAA5C57 added 1


get_data, data and data2 still need better names...



Other than that, this should be faster because it reads the file only once. It does consume more memory, though, because it also saves exception/added events for SQL guids that you later don't need. If this trade-off is not worth it, go back to the first half of this answer.






– Graipher (answered Dec 27 at 11:40)


  • The one-pass solution was pretty awesome! It has brought it down to 6 seconds! TIL: the fewer the I/O operations, the faster your code. Coming from hardware/fw we always use I/O and rarely have enough RAM. ty
    – clmno
    Dec 27 at 12:18