How to organize a wedding with R: Google Places API, Google Drive, web scraping and automatic emails

Organizing a wedding is … challenging, but as R users we do have a major asset! One of the most challenging parts is finding a venue: there are a lot of them, but some are already booked for your date. So, to check whether a venue is still available, I’ll show you how I built a list of possible venues with the Google Places API, stored the list on Google Drive, scraped the venues’ websites for email addresses and contacted them directly from R.

Setup

# store passwords
library(config)
# data wrangling
library(plyr)
library(tidyverse)
library(purrr)
library(glue)
# google APIs
library(googleway)
library(googledrive)
# webscraping
library(rvest)
# send emails
library(mailR)
library(XML)
library(RCurl)
# html widgets
library(DT)
library(leaflet)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed, message = FALSE,warning = FALSE, error = FALSE)
set.seed(123)# Seed for random number generation
options(scipen=999)# Disable scientific number format
gmail_wedding <- config::get("gmail_wedding")
google_key <- config::get("google_cloud")

List of venues with the Google Places API

Because almost everything is possible in R thanks to our awesome community, I was thinking of getting a list of venues from Google. And thankfully there is a package to do that called googleway. The Google API has many different services related to geocoding, such as the Directions API, the Geolocation API or the Places API, which I’m using here to get venues from search keywords. To use it you just need to register your credit/debit card on Google Cloud to obtain an API key; no worries, if you use the service in a gentle way it won’t cost you anything. I found a very useful Stack Overflow answer on how to use the googleway package.
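
Before looping over many cities, here is a minimal sketch of a single query, assuming your API key is stored in google_key$key as in the setup above (the search string is just an example):

one_search <- googleway::google_places(
  search_string = "mariage Annecy france", # example search string
  key = google_key$key) # replace with your own Google Cloud API key
head(one_search$results$name) # names of the first venues returned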

List of targeted cities

I also don’t want to organize my wedding just anywhere in France; I’d like it to be in the Auvergne-Rhône-Alpes region, which is a lovely area. I wasn’t sure that the keyword “Auvergne-Rhône-Alpes” alone would find all the venues I wanted, so I decided to loop the search over a list of cities in this region, selected by their department numbers.

dept_target <- c(01,07,26,38,69,73,74) # target departments: Ain, Ardèche, Drôme, Isère, Rhône, Savoie, Haute-Savoie
#
list_city <- read.csv(
  file = url("https://sql.sh/ressources/sql-villes-france/villes_france.csv"),
  header = FALSE) %>%
  dplyr::select(dept = V2, city = V5, pop2010 = V15) %>%
  dplyr::mutate(city = as.character(city)) %>%
  dplyr::filter(dept %in% dept_target) %>% # filter by target departments
  dplyr::filter(pop2010 > 5000) %>% # filter by city population size
  magrittr::use_series(city)

Querying the Google Places API

Once the list of cities is obtained, I made a loop to query Google Places for each of them. A tricky part is getting the search results from the following pages: if a “next_page_token” is found, a while statement queries this new page. If no result is found, the loop moves on to the next city.

df_places_final <- NULL
for(city in list_city){
  
  #print(city)
  
  df_places <- googleway::google_places(
    search_string = paste("mariage", city, "france"), 
    key = google_key$key) # replace with your own Google API key
  
  if(length(df_places$results) == 0) next
  
  df_places_results <- df_places$results
  geometry <- df_places_results$geometry$location
  df_places_results <- df_places_results[,c("name","formatted_address","place_id","types")]
  df_places_results <- cbind(df_places_results,geometry)
  
  while (!is.null(df_places$next_page_token)) {
    df_places <- googleway::google_places(
      search_string = paste("mariage", city, "france"),
      page_token = df_places$next_page_token,
      key = google_key$key)
    
    df_places_next <- df_places$results
    
    if(length(df_places_next)>0){
      geometry <- df_places_next$geometry$location
      df_places_next <- df_places_next[,c("name","formatted_address","place_id","types")]
      df_places_next <- cbind(df_places_next,geometry)
      df_places_results <- rbind(df_places_results, df_places_next)
    }
    Sys.sleep(2) # pause so as not to overload the Google API
  }
  df_places_final <- rbind(df_places_final,df_places_results)
}
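
As a quick sanity check (a suggestion, not part of the original loop), you can inspect how many raw results were collected before filtering:

nrow(df_places_final) # total number of raw results across all cities
head(df_places_final[, c("name", "formatted_address")]) # a first look at the places found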

Obviously the search results contain not only wedding venues but also photographers, caterers and wedding shops. An easy solution is to filter by name for the French terms for castles, estates, manors and farms (“chateau”, “domaine”, “manoir”, “ferme”). Note that there are also some duplicated entries to be removed.

df_places_filtered <- df_places_final %>%
  dplyr::filter(grepl('castle|chateau|domaine|manoir|ferme', name,ignore.case = TRUE)) %>%
  dplyr::distinct(place_id, .keep_all = TRUE)

We can get an overview of the venues’ locations thanks to the leaflet package.

leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(lng=df_places_filtered$lng, lat=df_places_filtered$lat, popup=df_places_filtered$name)

Obtaining venues’ websites

googleway::google_places() is great for finding places with their addresses, GPS coordinates and types, but the first loop doesn’t provide their website URLs. To get them, we have to query the Google Places API again with the venues’ “place_id”, using googleway::google_place_details() applied through a small function with purrr.

# return the website URL for a given place_id, or "no_website" if none is listed
get_website <- function(place_id){
  #print(place_id)
  place_id <- as.character(place_id)
  dat <- googleway::google_place_details(place_id = place_id, key = google_key$key)
  res <- ifelse(is.null(dat$result$website),"no_website",dat$result$website)
  return(res)
}

website_list <- df_places_filtered$place_id %>%
  purrr::map(get_website) %>%
  unlist()
df_places_filtered$website <- website_list

For the next stages of the process I’m going to remove the venues without a website URL, but if, like me, you are organizing your wedding, I suggest contacting them manually. Most of the URLs are clean, but some contain errors which can be removed with gsub(), for example. I’m also creating a new variable called “website_contact”, pointing to the contact page, which will be used for web scraping as well.

df_places_filtered <- df_places_filtered %>%
  dplyr::filter(website != "no_website") %>%
  dplyr::mutate(website = gsub("\\,.*","",website)) %>%
  dplyr::mutate(website = gsub("com/fr","com",website)) %>%
  dplyr::mutate(website_contact = paste0(website,"contact"))
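
Just as an optional check, you can glance at the cleaned URLs before scraping them:

head(df_places_filtered[, c("website", "website_contact")]) # quick look at the URLs to be scraped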

Now that the list of venues is “clean”, we can start web scraping to obtain the venues’ emails.

Website scraping for emails

I already said that Google Places is great, but as far as I know it doesn’t provide venues’ emails. However, we won’t stop here: R provides excellent tools like the rvest package to get information from the web. Thankfully, venues’ websites make their emails very easy to find on their home page or on their contact page, so the idea is to scrape these pages with a very short function to see if we can find the venues’ emails. The function contains a tryCatch to check the URL before scraping for emails.
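
Before applying it to real websites, here is what the email-matching regular expression used inside the function below returns on a dummy string (the address is made up for illustration):

# hypothetical example: the address below is invented for illustration
test_text <- "Contact us at reservation@chateau-exemple.fr or by phone."
unlist(regmatches(test_text, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", test_text)))
## should return "reservation@chateau-exemple.fr"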

extract_email <- function(website){
  #print(website)
  # try to read the page; on failure print and return the string "url_error"
  page <- tryCatch(xml2::read_html(website), error=function(e) print("url_error"))
  if(is.character(page) && page == "url_error"){
    return(NA)
  } else {
    text_web <- rvest::html_text(page)
    email_text <- unlist(regmatches(text_web, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", text_web)))
    email_text <- gsub("\n","",email_text)
    email_text <- gsub(" ","",email_text)
    return(email_text[1]) # keep the first email found (NA if none)
  }
}
# web scraping home page
email_list <- df_places_filtered$website %>%
  purrr::map(extract_email) %>%
  unlist()
## [1] "url_error"
## [1] "url_error"
df_places_filtered$email <- email_list
# web scraping contact page
email_list <- df_places_filtered$website_contact %>%
  purrr::map(extract_email) %>%
  unlist()
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
## [1] "url_error"
df_places_filtered$email_contact <- email_list
# merge email and email_contact
df_places_filtered <- df_places_filtered %>%
  dplyr::mutate(email = ifelse(is.na(email),email_contact,email)) %>%
  dplyr::filter(!is.na(email)) %>%
  dplyr::select(-email_contact, -types)
df_places_filtered %>%
  dplyr::select(name, website) %>%
  DT::datatable(options = list(pageLength = 5))

Excellent, we obtained a list of venues to email!

Google Drive and automatic emails

A classic piece of advice for wedding planning is to set up a specific email address dedicated to this task. One advantage of Gmail is not only the very easy email setup but also the access to a Google Drive to store your documents and keep track of everything. Of course it’s possible to store it all locally, but I found Google Drive nice for sharing with your partner.

Once that’s done, Google Drive documents are easily accessed with the googledrive package (see https://googledrive.tidyverse.org/index.html for more information about googledrive).
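
The first googledrive call triggers an OAuth authentication in the browser; depending on the package version it can also be requested explicitly, for example with the dedicated wedding address stored in the config file:

# authenticate with the dedicated wedding Gmail account (opens a browser the first time)
googledrive::drive_auth(email = gmail_wedding$email)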

Upload the list of venues to Google Drive

The workflow of googledrive is quite specific: we must save the data frame locally first and then upload the file.

# first save the list of venues locally
write.csv(df_places_filtered, "list_venues.csv",row.names = FALSE)
# upload to google drive
drive_upload(media = "list_venues.csv", name = "list_venues",type = "spreadsheet")

Download the list of venues from Google Drive

Once this is done, we have to download the file locally and read it again.

# select file id from google drive
list_venues_id <- drive_find() %>%
  dplyr::filter(name == "list_venues") %>%
  magrittr::use_series(id)
# download list of venues locally
drive_download(as_id(list_venues_id), overwrite = TRUE, type = "csv")
# read local list of venues file
list_venues <- read.csv("list_venues.csv",row.names = NULL) %>%
  dplyr::mutate_if(is.factor,as.character)

Select emails to be sent

Now that the list of venues is stored in Google Drive, it’s time to send our emails with R. Because it is easier for me, I’ve set up another for loop (yes, it’s not elegant, but at least it’s reassuring). For each row of the data frame we extract the venue name and email and send the same message asking for availability on a given date.

An important thing to be able to send emails from R through Gmail is to set “Allow less secure apps” to Yes in your Gmail settings.
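
The body of the email below is a glue template: the {name} placeholder is filled in for each venue. A tiny illustration with a made-up venue name:

# how glue fills the {name} placeholder (the venue name here is hypothetical)
glue::glue("Dear owner/manager of '{name}',", name = "Chateau Exemple")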

email_to_send <- list_venues
#
# Email to send
email_text <- "<p>Dear owner/manager of '{name}', <br><br>We are contacting you because we would like to organise our wedding on <b>Sunday 9 June 2019</b> and your place would be amazing for it.<br><br>That's why we would like to know if your venue '{name}' is available on <b>Sunday 9 June 2019</b>?<br><br>Best regards,<br><br>YOUR NAMES</p>"
#
for(i in 1:nrow(email_to_send)){
  df <- email_to_send[i,]
  name <- as.character(df$name)
  ################################
  send.mail(from = gmail_wedding$email,
            to = as.character(df$email),
            subject = "Availability for a wedding on the 09/06/2019",
            body = glue::glue(email_text),
            smtp = list(host.name = "smtp.gmail.com", port = 465, 
                        user.name = gmail_wedding$email,            
                        passwd = gmail_wedding$passwd, ssl = TRUE),
            authenticate = TRUE,
            send = TRUE,
            html = TRUE)
}
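
Note that the same send.mail() call can be used to send a single test email to yourself first, in order to check the SMTP settings before running the loop on the full list (a suggestion, not part of the original workflow):

# suggested dry run: send one test email to yourself before emailing the venues
send.mail(from = gmail_wedding$email,
          to = gmail_wedding$email, # send to yourself as a test
          subject = "mailR test",
          body = "If you can read this, the SMTP settings work.",
          smtp = list(host.name = "smtp.gmail.com", port = 465,
                      user.name = gmail_wedding$email,
                      passwd = gmail_wedding$passwd, ssl = TRUE),
          authenticate = TRUE,
          send = TRUE)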

You can have a look at the sent emails in your mailbox to check that the process worked.

Then, the final stage is to update the list of venues with the contact date so as not to send an email to the same venue twice.

email_to_send <- email_to_send %>%
  dplyr::mutate(date_contact = as.character(as.Date(Sys.Date()))) %>%
  dplyr::mutate(type_contact = "automatic email")
# Checks in case of different batch of email sending
id <- match(list_venues$name, email_to_send$name, nomatch = 0L)
list_venues$date_contact[id != 0] <- email_to_send$date_contact[id]
list_venues$type_contact[id != 0] <- email_to_send$type_contact[id]
# Write data on local and Upload data from local to google drive
write.csv(list_venues, "list_venues.csv", row.names = FALSE)
drive_update(file = "list_venues", media = "list_venues.csv")

I hope these scripts will help you find the best place for your wedding. And good luck with the organisation!

Damien Dupré
Assistant Professor of Business Research Methods

My research interests lie in time-series analyses of psychological and physiological measures.