Aggregate Pandas Columns on Geospatial Distance

I have a dataframe that has 3 columns, Latitude, Longitude and Median_Income. I need to get the average median income for all points within x km of the original point into a 4th column. I need to do this for each observation.

I wrote three functions and call them through apply, hoping that would be fast. However, the dataframe takes hours to process. I haven't seen an error yet, so it appears to be working.

I found the Haversine formula on this site. I use it to calculate the distance between two points from their latitude and longitude.

from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # Calculate the great circle distance between two points
    # on the earth (specified in decimal degrees)

    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
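
For comparison, here is a minimal sketch of the same formula written with NumPy instead of the math module, so that it accepts whole arrays as well as single values. The name haversine_np and the constant EARTH_RADIUS_KM are made up for this illustration and are not part of the code above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same radius as the scalar version (hypothetical constant name)

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; inputs are scalars or arrays in decimal degrees."""
    # np.radians works element-wise, so arrays and scalars can be mixed
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

With this, the distances from one point to every row could be computed in a single call, for example haversine_np(df['longitude'].to_numpy(), df['latitude'].to_numpy(), lon, lat), rather than one Python call per row.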

My hav_checker function computes the haversine distance between a given row and a reference point (lon, lat). Applied across the whole dataframe, it produces a column of distances from the current row to every other row.

def hav_checker(row, lon, lat):
    hav = haversine(row['longitude'], row['latitude'], lon, lat)
    return hav

My value_grabber function stores the distances computed by hav_checker in a 'hav' column and returns the mean of my target column (median_income) over the rows within the threshold.

For reference, I am using the California housing dataset to build this out.

def value_grabber(row, frame, threshold, target_col):
    frame = frame.copy()
    frame['hav'] = frame.apply(hav_checker, lon = row['longitude'], lat = row['latitude'], axis=1)
    mean_tar = frame.loc[frame.loc[:,'hav'] <= threshold, target_col].mean()
    return mean_tar

I want to add these three columns to my original dataframe for a feature engineering project within a larger class project.

df['MedianIncomeWithin3KM'] = df.apply(value_grabber, frame=df, threshold=3, target_col='median_income', axis=1)
df['MedianIncomeWithin1KM'] = df.apply(value_grabber, frame=df, threshold=1, target_col='median_income', axis=1)
df['MedianIncomeWithinHalfKM'] = df.apply(value_grabber, frame=df, threshold=.5, target_col='median_income', axis=1)

I have been able to do this successfully with looping, but it is extremely time-intensive and I need a faster solution.
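
One possible direction, sketched here under assumptions rather than offered as a tested solution: the nested apply calls evaluate the scalar haversine once per pair of rows in Python and copy the frame for every row, so moving the pairwise distances into NumPy broadcasting may help. The function name mean_within_radius and the chunk_size parameter are hypothetical. The sketch assumes the column names 'longitude' and 'latitude' used above and a target column with no missing values.

```python
import numpy as np
import pandas as pd

def mean_within_radius(df, radius_km, target_col, chunk_size=256):
    """Mean of target_col over all rows within radius_km of each row (the row itself included)."""
    lon = np.radians(df["longitude"].to_numpy(dtype=float))
    lat = np.radians(df["latitude"].to_numpy(dtype=float))
    values = df[target_col].to_numpy(dtype=float)  # assumed to contain no NaN
    out = np.empty(len(df))
    # Process rows in chunks so only a (chunk_size, n) distance block is in memory at once
    for start in range(0, len(df), chunk_size):
        stop = min(start + chunk_size, len(df))
        # Broadcasting: (chunk, 1) against (1, n) gives every pairwise difference
        dlon = lon[None, :] - lon[start:stop, None]
        dlat = lat[None, :] - lat[start:stop, None]
        a = (np.sin(dlat / 2) ** 2
             + np.cos(lat[start:stop, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
        dist = 2 * 6371.0 * np.arcsin(np.sqrt(a))  # km, same radius as haversine()
        within = dist <= radius_km
        # Every row is at distance 0 from itself, so the neighbour count is never zero
        out[start:stop] = (within * values).sum(axis=1) / within.sum(axis=1)
    return pd.Series(out, index=df.index)
```

Usage would mirror the three assignments above, for example df['MedianIncomeWithin3KM'] = mean_within_radius(df, 3, 'median_income'). This still computes every pairwise distance, only in vectorized form. A spatial index, such as scikit-learn's BallTree with the haversine metric, is another option that avoids comparing every pair.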

asked 1 hour ago

krewsayder

New contributor

python numpy geospatial
