msteinideas

Data for Good

Data for Good is a way for people to contribute to the betterment of society using their expertise in data science (DataforGood). Datathons are one way to reach these goals. In a Datathon, teams work on a problem provided by a client (often an NGO), using data provided by the client and Data for Good. The time for the Datathon is limited (e.g. a Saturday) and at the end, each team presents their solution to the participants and the client.

My Activities
Preparation for Datathon - Data Cleaning:
• Analyzing data sets.
• Ensuring that formatting is correct and consistent.
• Dropping rows and columns that contain no information or only superfluous information.
• Gathering additional information as discussed by the team.
• Data cleaning is mostly done with Python (Pandas); additional information is gathered through online research and web scraping.
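
The cleaning steps above might look roughly like the following in Pandas. This is only a minimal sketch, not the code used for any Datathon: the sample data, the column names and the clean() helper are all invented for illustration.

```python
import io

import pandas as pd

# Invented sample export; the column names and values are assumptions.
RAW_CSV = """Name , City,Donation,Notes,Internal ID
 Alice ,toronto,25.00,,x1
Bob,Toronto ,,,x2
,,,,
Carol,VANCOUVER,40,,x3
"""


def clean(df: pd.DataFrame, superfluous: list[str]) -> pd.DataFrame:
    """Hypothetical helper: normalize formatting, then drop empty or unneeded data."""
    df = df.copy()
    # Consistent header formatting: trim, lower-case, spaces to underscores.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Consistent text formatting in the (assumed) text columns.
    for col in ["name", "city"]:
        df[col] = df[col].str.strip()
    df["city"] = df["city"].str.title()
    # Drop columns the team judged superfluous.
    df = df.drop(columns=superfluous)
    # Drop columns and rows that contain no information at all.
    df = df.dropna(axis="columns", how="all").dropna(axis="index", how="all")
    return df.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
cleaned = clean(raw, superfluous=["internal_id"])
print(cleaned)
```

In this sketch the empty "Notes" column and the all-empty row are removed, while the row with a missing donation is kept, since it still carries a name and a city.
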
Participation in Datathons:
• Review the available data and the client’s objectives with the team.
• Select data sets for further analysis.
• Analyze data and put a presentation together.
• Present final results to clients and the other teams.
• The data is usually provided as *.csv files.
• Python and/or Tableau are used for data selection and analysis.

Example

Identify opportunities for maximizing participation and fundraising for an annual walk, the main fundraiser for Kids Help Phone.

Large data sets for participants and donors were analyzed and transformed with Python. Further analysis and visualization were then done with Tableau. The presentation included:
• Geographic distribution of participants and donors.
• Identification of ‘virtual walkers’ (walkers who do not join an official walk, but participate in the campaign and do a walk on their own or with friends) as an important participant group.
• Identification of a data-collection problem – donations to walking teams are not tracked correctly, which makes it impossible to determine the amounts of money raised by walkers participating in these teams.
• Identification of repeat donors as an especially valuable donor group (donors contributing every year tend to increase their donations each year).
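
As an illustration of the last point, repeat donors and their year-over-year changes could be identified in Pandas along the following lines. This is a sketch under stated assumptions rather than the analysis that was presented: the data is invented, and the column names donor_id, year and amount are hypothetical.

```python
import pandas as pd

# Invented sample data; the real data sets are proprietary.
donations = pd.DataFrame({
    "donor_id": ["a", "a", "a", "b", "c", "c", "c"],
    "year":     [2016, 2017, 2018, 2017, 2016, 2017, 2018],
    "amount":   [20.0, 30.0, 45.0, 100.0, 50.0, 50.0, 40.0],
})

# One row per donor, one column per year; years without a donation become NaN.
per_year = donations.pivot_table(
    index="donor_id", columns="year", values="amount", aggfunc="sum"
)

# Repeat donors (in this sketch): donors who gave in every year covered by the data.
repeat = per_year.dropna(axis="index", how="any")

# Year-over-year change; the first year has no predecessor, so it is skipped.
yoy = repeat.diff(axis="columns").iloc[:, 1:]

# True for donors whose donation grew in every consecutive year.
increasing = yoy.gt(0).all(axis="columns")
print(increasing)
```

Here "repeat donor" is defined as giving in every year present in the data; a looser definition (for example, at least two years) would only change the filtering step.
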

Instead of one large program, several small programs were written in Python (using Jupyter notebooks). This was done for several reasons:
• The data sets were large and some operations were slow, so re-running a whole notebook (which is necessary for a variety of reasons) had to be kept to a minimum.
• Other team members could already start working on the prepared files while further analysis was done.
• The tight schedule did not leave any time for code optimization.
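
One way to get this behavior is to have each notebook write its result to a *.csv file and to skip the expensive step when that file already exists. The following is a minimal sketch of that pattern, not the project code; cached_step, expensive_transform and the file name are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def cached_step(out_path: Path, build) -> pd.DataFrame:
    """Load out_path if it exists; otherwise call build(), save the result, return it."""
    if out_path.exists():
        return pd.read_csv(out_path)
    df = build()
    df.to_csv(out_path, index=False)
    return df


calls = []  # records how often the expensive step actually runs


def expensive_transform() -> pd.DataFrame:
    # Stand-in for a slow transformation of a large data set.
    calls.append(1)
    return pd.DataFrame({"participant_id": [1, 2], "is_virtual": [False, True]})


workdir = Path(tempfile.mkdtemp())
out = workdir / "participants_prepared.csv"

first = cached_step(out, expensive_transform)   # runs the transformation and writes the file
second = cached_step(out, expensive_transform)  # only reads the prepared file
print(len(calls))
```

Once the prepared file exists, other team members can read it directly while later notebooks are still being developed.
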

Relationship between the Code (blue - each circle is one notebook) and Data (green - each circle is one *.csv file) for the project.

All data, as well as the solutions developed during the Datathons, are proprietary. Therefore, the code and a detailed presentation of results are not available.