Data & Methods

Data & Main methods

Data

The datasets selected for this project include the following:

  • Crowdfunding datasets including cancer-related campaigns from GoFundMe (2016)
  • 2016 Census Profile for aggregate dissemination areas
  • Aggregate dissemination area (ADA) boundaries
  • Forward sortation area (FSA) boundaries

Variables Selected from the 2016 Census Profile

| Variable | Data Obtained |
|---|---|
| Population and dwelling counts | Population, 2016 |
| Income (Total Sex / Total) | Median after-tax income in 2015 among recipients ($) |
| Education (Total Sex / Total) | Number of individuals aged 15 years and over with a postsecondary certificate, diploma or degree |
| Housing (Total Sex / Total) | Number of Home Owners |

Note: Values were obtained for each aggregate dissemination area.

Methods

Weighted Socioeconomic Data Conversion and Creation of Socioeconomic Quintiles

Quintiles were created in ArcGIS 10.5 using the natural breaks classification. We chose natural breaks after agreeing with the project liaisons that this method would be best suited to highlighting distinct or statistically significant differences in the data (V. Crooks & N. Schuurman, personal communication, March 15, 2018). This allowed visual comparisons at regional and provincial levels based on our chosen income, education, and housing ownership measures.
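To illustrate the idea behind a natural breaks classification, the sketch below partitions sorted values into classes that minimize the within-class sum of squared deviations (a Fisher-Jenks style optimization). It is not the ArcGIS implementation, which we did not inspect. The function names, the choice of five classes (assumed because the text refers to quintiles), and the sample values are all assumptions made for illustration.

```python
from bisect import bisect_left


def natural_breaks(values, n_classes=5):
    """Return the upper bound of each class for an optimal 1-D partition.

    Illustrative dynamic-programming sketch; runs in O(k * n^2) time.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute any within-class squared deviation in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation when the first j values form k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk the recorded split points backwards to recover class upper bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


def classify(value, bounds):
    """Return the 1-based class of a value given the class upper bounds."""
    return bisect_left(bounds, value) + 1


# Invented sample values with visible gaps between groups.
sample_values = [1, 2, 3, 10, 11, 12, 20, 21, 22, 40, 41, 100, 101]
sample_bounds = natural_breaks(sample_values, n_classes=5)
```

Unlike equal-count quintiles, classes produced this way can hold different numbers of areas, because the breaks follow gaps in the data rather than rank positions.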

Exploratory Spatial Data Analysis

Because our work is exploratory, we adopted exploratory spatial data analysis (ESDA). The newly created medical crowdfunding dataset provides an avenue to formulate questions and to propose future research trajectories. ESDA facilitates the investigation of prior assumptions and guides the identification of spatial patterns (Bailey & Gatrell, 1995; Haining, Wise, & Ma, 1998). Graphic displays of spatially referenced data help generate informed hypotheses and select relevant statistical methods (Bailey & Gatrell, 1995). Cartographic and tabular representations can reveal potential correlations among attributes in the available dataset, which enables more meaningful future analyses (Haining et al., 1998).

Frequent Text Mining

Frequent terms were extracted from the campaign titles and descriptions separately. The following steps were applied to each: 1) create a corpus of the terms included in each entry of the dataset, 2) remove stop-words, punctuation, whitespace, numbers, and common names from the text, 3) create a term-document matrix from the cleaned results, 4) remove sparse terms, and 5) extract the frequently occurring words. The results for the campaign titles and descriptions were then combined with the FSAs. Each row in the resulting matrix shows which frequent terms appeared in a campaign and how many times each appeared. Since multiple campaigns can exist for each FSA, results were aggregated to generate a dataset of frequent terms per FSA. The top 3 terms per FSA were extracted and appended to each FSA entry. These results were geocoded using the Google Maps API and stored in GeoJSON format for display in the interactive web mapping tool. Users can view the most frequent term per FSA and, upon clicking the text, the top 3 terms for that FSA.
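The five steps and the per-FSA aggregation can be sketched in a few lines of Python. This is only an illustration of the workflow, not the R code used in the project. The stop-word and name lists, the `min_docs` threshold for sparse terms, the function names, and the sample campaigns are all invented for the example.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative lists; a real analysis would use much fuller ones.
STOP_WORDS = {"the", "a", "for", "and", "to", "of", "in", "with", "us"}
COMMON_NAMES = {"sarah", "john"}


def clean(text):
    """Steps 1-2: tokenize, dropping punctuation, whitespace, numbers,
    stop-words, and common names."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and t not in COMMON_NAMES]


def term_document_matrix(texts):
    """Step 3: one term-count mapping per document."""
    return [Counter(clean(text)) for text in texts]


def remove_sparse_terms(tdm, min_docs):
    """Step 4: keep only terms that occur in at least min_docs documents."""
    doc_freq = Counter(term for doc in tdm for term in doc)
    keep = {term for term, df in doc_freq.items() if df >= min_docs}
    return [Counter({t: c for t, c in doc.items() if t in keep}) for doc in tdm]


def top_terms_per_fsa(fsas, tdm, n=3):
    """Step 5 plus aggregation: sum counts per FSA and take the top n terms.
    Ties are broken alphabetically so the result is deterministic."""
    totals = defaultdict(Counter)
    for fsa, doc in zip(fsas, tdm):
        totals[fsa].update(doc)
    return {
        fsa: [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for fsa, c in totals.items()
    }


# Invented sample campaigns: (FSA, title).
campaigns = [
    ("V5K", "Help Sarah fight breast cancer"),
    ("V5K", "Support for cancer treatment costs"),
    ("V5K", "Cancer treatment fund for John"),
    ("M4B", "Help with chemo and travel costs"),
    ("M4B", "Chemo fund: help us in 2016!"),
]
fsas = [fsa for fsa, _ in campaigns]
tdm = remove_sparse_terms(term_document_matrix([t for _, t in campaigns]), min_docs=2)
top_terms = top_terms_per_fsa(fsas, tdm)
```

With these sample titles, `top_terms` holds three terms for each of the two FSAs, ordered from most to least frequent.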

Building a Geovisualization to Support ESDA and Frequent Text Visualization

An interactive 2D web mapping application was developed using the Mapbox GL JavaScript API to support ESDA tasks and frequent text visualization per FSA. Crowdfunding campaign markers are displayed on the map and can be explored by clicking them or through a searchable list that accompanies the map.

Project Process

Received Crowdfunding Dataset

Received the inaugural medical crowdfunding dataset containing Canadian cancer-related campaigns from GoFundMe.

Data Selection

Selected Forward Sortation Areas (FSAs), Aggregate Dissemination Areas (ADAs), and 2016 Canadian Census data (including income, housing ownership, and education).

Data Pre-Processing

Campaign records were cleaned of unknown characters.

Data Transformation

Campaign records were geocoded with the Google Maps API and CSVs were converted to GeoJSON format for display in the interactive web mapping application.
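As a rough illustration of the CSV-to-GeoJSON conversion, the sketch below turns already-geocoded rows into a GeoJSON FeatureCollection of points. It does not call the Google Maps API. The column names (`lon`, `lat`, `title`, `fsa`), the function name, and the sample coordinates are assumptions, not the project's actual schema.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lon_field="lon", lat_field="lat"):
    """Convert geocoded CSV rows into a GeoJSON FeatureCollection.

    GeoJSON positions are ordered [longitude, latitude].
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                # Every remaining column becomes a feature property.
                "properties": dict(row),
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Invented sample rows with hypothetical column names.
sample_csv = (
    "title,fsa,lon,lat\n"
    "Help with chemo costs,V5K,-123.1,49.25\n"
    "Cancer treatment fund,M4B,-79.3,43.7\n"
)
collection = csv_to_geojson(sample_csv)
geojson_text = json.dumps(collection, indent=2)
```

The resulting `geojson_text` can be written to a file and loaded as a GeoJSON source by a web mapping library.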

Data Transformation

Socioeconomic attributes selected previously were joined with ADAs.

Data Transformation

Because FSA-level Census Profiles were not available from Statistics Canada during this project, we developed a tool that computes weighted socioeconomic data from Census data available at the ADA level. The percent overlap between each FSA and ADA was used to assign the ADA-level socioeconomic data to FSAs.
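One plausible way to apply overlap weights is sketched below. It assumes the overlap areas between FSAs and ADAs have already been computed in a GIS. Counts such as population are apportioned by the share of each ADA's area that falls inside the FSA, while median income is approximated by an overlap-area-weighted average. The exact weighting rules of our tool are not reproduced here, and the field names, function name, and sample figures are invented for illustration.

```python
from collections import defaultdict


def weight_ada_to_fsa(overlaps, ada_values):
    """Estimate FSA-level values from ADA-level values.

    overlaps: iterable of (fsa, ada, overlap_area) tuples.
    ada_values: {ada: {"area": ..., "population": ..., "median_income": ...}}
    """
    population = defaultdict(float)
    income_sum = defaultdict(float)
    overlap_total = defaultdict(float)
    for fsa, ada, overlap_area in overlaps:
        record = ada_values[ada]
        # Counts: allocate the fraction of the ADA that lies inside the FSA.
        population[fsa] += record["population"] * overlap_area / record["area"]
        # Medians cannot be summed, so average them weighted by overlap area.
        income_sum[fsa] += record["median_income"] * overlap_area
        overlap_total[fsa] += overlap_area
    return {
        fsa: {
            "population": population[fsa],
            "median_income": income_sum[fsa] / overlap_total[fsa],
        }
        for fsa in population
    }


# Invented sample data: two ADAs split between two FSAs (areas in km^2).
ada_values = {
    "ADA1": {"area": 10.0, "population": 1000, "median_income": 30000},
    "ADA2": {"area": 20.0, "population": 4000, "median_income": 50000},
}
overlaps = [
    ("V5K", "ADA1", 5.0),
    ("V5K", "ADA2", 5.0),
    ("V6A", "ADA1", 5.0),
    ("V6A", "ADA2", 15.0),
]
fsa_estimates = weight_ada_to_fsa(overlaps, ada_values)
```

Area-based apportioning assumes that people are spread evenly within each ADA, which is a simplification, and an average of medians is only an approximation of the true FSA median.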

Pattern Extraction

A frequent text mining method available from the "tm" package in R was utilized to extract frequent terms from campaign titles and descriptions.

Socioeconomic Data Quintiles

Tables were created to display quintiles for campaign frequency, income, education, and housing. These also displayed the number of FSAs and campaigns belonging to each quintile for the respective attributes.
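A table of this kind can be tallied with a short routine such as the one below. It assumes each FSA has already been assigned a class (1 to 5) for a given attribute and that campaign counts per FSA are known. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def quintile_table(fsa_class, fsa_campaigns):
    """Count FSAs and campaigns in each class of one attribute."""
    table = defaultdict(lambda: {"fsas": 0, "campaigns": 0})
    for fsa, cls in fsa_class.items():
        table[cls]["fsas"] += 1
        # FSAs without any campaign contribute zero.
        table[cls]["campaigns"] += fsa_campaigns.get(fsa, 0)
    # Return plain dicts ordered by class number.
    return {cls: dict(table[cls]) for cls in sorted(table)}


# Invented sample inputs.
income_class = {"V5K": 1, "V6A": 1, "M4B": 3}
campaign_counts = {"V5K": 4, "V6A": 2}
income_table = quintile_table(income_class, campaign_counts)
```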

Visualizing Frequent Terms

Word clouds and bar graph outputs were produced to explore frequent terms. Frequent terms were also counted for each FSA and displayed to create a "Geographic Wordle" with Mapbox.

Building a Geovisualization to Support Exploration & Analyses

Frequent terms, campaign markers, hospital locations, and socioeconomic layers were displayed in an interactive web mapping application to support exploratory spatial data analysis.

Interpretation & Evaluation

Geographic distributions of campaigns are assessed and evaluated in the context of the socioeconomic data layers and frequent text mining results.

Knowledge Gain & Future Research Directions

New information about Canadian medical crowdfunding and future research trajectories are discovered.

