|Population and dwelling counts||Population, 2016|
||(Total Sex / Total)|Median after-tax income in 2015 among recipients ($)|
||(Total Sex / Total)|Number of individuals aged 15 years and over with a postsecondary certificate, diploma or degree|
||(Total Sex / Total)|Number of Home Owners|
Each measure was divided into five classes, referred to here as quintiles, in ArcGIS 10.5 using the natural breaks classification. We chose natural breaks after agreeing with project liaisons that this method would best highlight distinct differences in the data (V. Crooks & N. Schuurman, personal communication, March 15, 2018). This allowed visual comparisons at regional and provincial levels based on our chosen income, education, and housing ownership measures.
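To make the classification step concrete, the following is a minimal Python sketch of a one-dimensional natural breaks split (a Fisher-Jenks style optimization). It is not ArcGIS's implementation, and its class boundaries may differ from those ArcGIS reports. The function name `natural_breaks` and the sample values are hypothetical.

```python
from typing import List, Sequence


def natural_breaks(values: Sequence[float], n_classes: int = 5) -> List[float]:
    """Return the upper bound of each class for a 1-D natural breaks split.

    Dynamic programming over the sorted values: choose contiguous classes
    that minimize the total within-class sum of squared deviations.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviation of any slice in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(data):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total SSD when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # start[k][j]: index where the last of those k classes begins.
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk back through the table to recover each class's upper bound.
    upper_bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        upper_bounds.append(data[j - 1])
        j = start[k][j]
    return upper_bounds[::-1]


# Hypothetical values with five obvious clusters.
sample = [1, 2, 3, 10, 11, 12, 50, 51, 52, 100, 101, 102, 200, 201, 202]
breaks = natural_breaks(sample, n_classes=5)
```

Unlike true quintiles, which place an equal number of observations in each class, this kind of split puts boundaries where the gaps between values are largest, so class sizes can be uneven.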
Because our work is exploratory, we used exploratory spatial data analysis (ESDA). The newly created medical crowdfunding dataset offers an opportunity to formulate questions and to propose future research trajectories. ESDA supports the investigation of prior assumptions and guides the identification of spatial patterns (Bailey & Gatrell, 1995; Haining, Wise, & Ma, 1998). Graphic displays of spatially referenced data help generate informed hypotheses and select relevant statistical methods (Bailey & Gatrell, 1995). Cartographic and tabular representations can reveal potential attribute correlations within the available dataset, enabling more meaningful future analyses (Haining et al., 1998).
For the frequent-term text mining analysis, we extracted frequent terms from the campaign titles and descriptions separately. The following steps were applied to each: 1) create a corpus of the terms in each entry of the dataset, 2) remove stop-words, punctuation, whitespace, numbers, and common names from the text, 3) create a term-document matrix from the cleaned results, 4) remove sparse terms, and 5) extract the frequently occurring words. The results for the campaign titles and descriptions were then combined with the FSAs. Each row of the resulting matrix shows which frequent terms appeared in a campaign and how many times each appeared. Since multiple campaigns can exist for each FSA, results were aggregated to generate a dataset of frequent terms per FSA. The top 3 terms per FSA were extracted and appended to each FSA entry. These results were geocoded using the Google Maps API and then stored in GeoJSON format for display in the interactive web mapping tool. Users can view the most frequent term per FSA and, upon clicking the text, the top 3 terms for that FSA.
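The five steps and the per-FSA aggregation can be sketched in plain Python as follows. This is an illustrative analog, not the R "tm" workflow used in the project. The stop-word and common-name lists, the `min_doc_share` sparsity threshold, the function names, and the sample campaigns are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical, deliberately tiny lists; the project's actual stop-word and
# common-name lists are not given in the text.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to", "with"}
COMMON_NAMES = {"john", "mary"}


def clean_tokens(text: str) -> List[str]:
    """Lowercase the text, keep alphabetic runs only (drops punctuation,
    whitespace, and numbers), then remove stop words and common names."""
    words = re.findall(r"[a-z]+", text.lower())
    return [
        w for w in words
        if len(w) > 1 and w not in STOP_WORDS and w not in COMMON_NAMES
    ]


def frequent_terms_per_fsa(
    campaigns: Iterable[Tuple[str, str]],
    min_doc_share: float = 0.0,
    top_n: int = 3,
) -> Dict[str, List[str]]:
    """campaigns holds (fsa, text) pairs; returns the top terms per FSA."""
    # Steps 1-3: one term-count row per campaign (a sparse term-document matrix).
    docs = [(fsa, Counter(clean_tokens(text))) for fsa, text in campaigns]

    # Step 4: drop sparse terms, i.e. terms found in too few campaigns.
    doc_freq: Counter = Counter()
    for _, counts in docs:
        doc_freq.update(counts.keys())
    min_docs = min_doc_share * len(docs)
    kept = {term for term, df in doc_freq.items() if df >= min_docs}

    # Aggregate the remaining term counts across all campaigns in each FSA.
    per_fsa: Dict[str, Counter] = defaultdict(Counter)
    for fsa, counts in docs:
        for term, count in counts.items():
            if term in kept:
                per_fsa[fsa][term] += count

    # Step 5: most frequent terms per FSA; ties are broken alphabetically.
    return {
        fsa: [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for fsa, c in per_fsa.items()
    }


# Hypothetical campaigns tagged with illustrative FSA codes.
campaigns = [
    ("V5A", "Help John fight cancer: cancer treatment fund"),
    ("V5A", "Cancer surgery and treatment costs for Mary"),
    ("M4B", "Support for leukemia treatment, 2018"),
]
top_terms = frequent_terms_per_fsa(campaigns)
```

Running the same function once on titles and once on descriptions mirrors the separate treatment described above. Raising `min_doc_share` plays a role similar to removing sparse terms from a term-document matrix.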
Received the inaugural medical crowdfunding dataset containing Canadian cancer-related campaigns from GoFundMe.
Selected Forward Sortation Areas (FSAs), Aggregated Dissemination Areas (ADAs), and 2016 Canadian Census data (including income, housing ownership, and education).
Campaign records were cleaned of unknown characters.
Campaign records were geocoded with the Google Maps API and CSVs were converted to GeoJSON format for display in the interactive web mapping application.
Socioeconomic attributes selected previously were joined with ADAs.
Because FSA Census Profiles were not available from Statistics Canada during this project, we developed a tool that computed weighted socioeconomic data from Census data available at the ADA level. The percent overlap between each FSA and ADA was used to weight the ADA-level socioeconomic data when assigning it to the FSA level.
A frequent-term mining method from the "tm" package in R was used to extract frequent terms from campaign titles and descriptions.
Tables were created to display quintiles for campaign frequency, income, education, and housing. These also displayed the number of FSAs and campaigns belonging to each quintile for the respective attributes.
Word clouds and bar graph outputs were produced to explore frequent terms. Frequent terms were also counted for each FSA and displayed to create a "Geographic Wordle" with Mapbox.
Frequent terms, campaign markers, hospital locations, and socioeconomic layers were displayed in an interactive web mapping application to support exploratory spatial data analysis.
Geographic distributions of campaigns are assessed and evaluated in the context of the socioeconomic data layers and frequent text mining results.
New information about Canadian medical crowdfunding and future research trajectories are discovered.
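The overlap-weighting step in the workflow above (assigning ADA-level Census values to FSAs by percent overlap) could look roughly like the sketch below. The text does not give the exact weighting formula, so both functions are assumptions: an area-weighted mean for median-like values such as income, and an area-proportional split for counts such as population or home owners. All names, areas, and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each overlap record is (fsa, ada, area of the FSA/ADA intersection).
Overlap = Tuple[str, str, float]


def weighted_mean_by_fsa(
    overlaps: Iterable[Overlap], ada_values: Dict[str, float]
) -> Dict[str, float]:
    """Area-weighted mean of ADA values for each FSA (assumed approach
    for median-like attributes such as after-tax income)."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    total_area: Dict[str, float] = defaultdict(float)
    for fsa, ada, area in overlaps:
        if area <= 0 or ada not in ada_values:
            continue  # skip empty overlaps and ADAs without data
        weighted_sum[fsa] += ada_values[ada] * area
        total_area[fsa] += area
    return {fsa: weighted_sum[fsa] / total_area[fsa] for fsa in total_area}


def apportioned_count_by_fsa(
    overlaps: Iterable[Overlap],
    ada_counts: Dict[str, float],
    ada_areas: Dict[str, float],
) -> Dict[str, float]:
    """Give each FSA the share of an ADA's count equal to the share of the
    ADA's area that falls inside the FSA (assumed approach for counts)."""
    result: Dict[str, float] = defaultdict(float)
    for fsa, ada, area in overlaps:
        if ada in ada_counts and ada_areas.get(ada, 0) > 0:
            result[fsa] += ada_counts[ada] * area / ada_areas[ada]
    return dict(result)


# Hypothetical overlaps (areas in arbitrary units) and ADA attributes.
overlaps = [("V5A", "ADA1", 2.0), ("V5A", "ADA2", 6.0), ("V5B", "ADA2", 2.0)]
income = weighted_mean_by_fsa(overlaps, {"ADA1": 30000.0, "ADA2": 40000.0})
population = apportioned_count_by_fsa(
    overlaps, {"ADA1": 100, "ADA2": 800}, {"ADA1": 4.0, "ADA2": 8.0}
)
```

Both functions assume the attribute is spread evenly across each ADA, which is a simplification. The intersection areas themselves would come from a GIS overlay of the FSA and ADA boundary files.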