Student Assignment – Audits Data Analysis
Overview and Rationale
Being able to ask appropriate questions of data is an important part of data analytics, and it is just as critical to be able to interpret the results of the analysis. This assignment is intended to familiarize you with the Audits data set and to get you thinking about the key business questions this data can answer.
This assignment is directly linked to the following learning outcomes from the course syllabus:
· Visualize data in a compelling way to enable data-driven storytelling.
· Conduct basic analytic tasks programmatically using the R language.*
* R is not required for this assignment, but you may use it if you feel confident in your R skills.
There are two Excel sheets with data – one concerning “Good Working Practice” audits (GWP_Audits_Data) and one concerning “Computerized System Quality Assurance” audits (CSQA_Audits_Data). Please see the accompanying Data Dictionary to understand the fields and values.
You may use any software to perform the analyses specified below. Collaboration is encouraged, but you must not submit identical assignments.
The assignment has three parts. The Appendix of this assignment provides an example of how the questions in Part I should be answered.
Please consult the Vertex Data Dictionary as you work through the Excel worksheets.
To understand the data, we first need to run descriptive statistics on the data set. For both the GxP Audits and the CSQA Audits sheets, examine the following variables:
· Audit Status
· In USA or OUS
· GxP Area
· Audit Type
· Audit Method
· Proposed Quarter
Start by providing the following for each variable:
1. A table that provides the frequency and percent of each value
2. A graphic representation of the count of each value
3. A graphic representation of the percent of each value
4. What business question do your descriptive analyses answer? Briefly discuss the findings, including any unusual values. Are these values “out of range”? If so, the data cleaning is not complete: delete the out-of-range values and run the analysis again. For any variable where this applies, present both the analysis with the out-of-range values and the analysis with those values deleted.
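If you choose to script the descriptives rather than build them by hand, a frequency/percent table takes only a few lines. The following is a minimal sketch in Python (standard library only), assuming one variable's column has already been read into a list of values. The function names `frequency_table` and `print_table` are hypothetical, and the “Audit Status” values in the example are invented; the real codes are defined in the Data Dictionary. Passing an `allowed` set drops out-of-range values, so the same function produces both the “before” and “after” tables.

```python
from collections import Counter


def frequency_table(values, allowed=None):
    """Return (value, count, percent) rows, most frequent first.

    If `allowed` is given, any value not in it is treated as out of range
    and dropped before counting, so percentages are recomputed on the
    cleaned data.
    """
    if allowed is not None:
        values = [v for v in values if v in allowed]
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() orders by count; ties keep first-seen order.
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    ]


def print_table(rows):
    """Print a simple frequency/percent table."""
    print(f"{'Value':<12}{'Count':>6}{'Percent':>9}")
    for value, count, pct in rows:
        print(f"{str(value):<12}{count:>6}{pct:>8.1f}%")


if __name__ == "__main__":
    # Hypothetical "Audit Status" values; the real codes are in the Data Dictionary.
    status = ["Completed", "Completed", "Scheduled", "Cancelled", "Completed", "Unknown"]
    print_table(frequency_table(status))
    # Re-run after dropping a value assumed here to be out of range ("Unknown").
    print_table(frequency_table(status, allowed={"Completed", "Scheduled", "Cancelled"}))
```

The (value, count, percent) rows can then be fed to whatever charting tool you prefer for the count and percent graphics.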
Please first present your findings for the 2017 GxP Audits data and then the findings for the 2017 CSQA Audits data.
Note: Appendix 1 is only an example and you must complete your own analysis.
For each worksheet, compute the number of days elapsed between:
1. “Date of Intake” and “Date Q Sent”. Name that variable “Days_Intake_QSent”.
2. “Date Q Sent” and “Date Q Received”. Name that variable “Days_QSent_QReceived”. Based on the names of these fields, what do you think this variable means? Does it apply to all audits? Why or why not?
3. “Date On Site Scheduled” and “Audit Start Date”. Name that variable “Days_OnSiteScheduled_AuditStartDate”. Does this variable apply to all audits? Why or why not?
4. “Audit Start Date” and “Audit End Date”. Name that variable “Days_StartDate_EndDate”.
5. “Audit End Date” and “Date Final Report Due”. Name that variable “Days_AuditEnd_FinalReportDue”.
6. “Date Final Report Due” and “Date of Completion”. Name that variable “Days_FinalReportDue_CompletionDate”.
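Each of these six variables is simply the later date minus the earlier date, in days. The sketch below shows one way to compute them in Python, assuming each audit has been read into a dictionary keyed by the column labels quoted above (the exact headers in the worksheets may differ), and that dates arrive either as `date`/`datetime` objects or as ISO strings such as "2017-03-06". The helpers `to_date`, `days_between`, and `add_interval_columns` are hypothetical names. A blank date yields `None` rather than a number, which matters for intervals that may not apply to every audit.

```python
from datetime import date, datetime

# Interval name -> (start column, end column), as listed in the assignment.
# The exact column headers in the worksheets are assumed to match these labels.
INTERVALS = {
    "Days_Intake_QSent": ("Date of Intake", "Date Q Sent"),
    "Days_QSent_QReceived": ("Date Q Sent", "Date Q Received"),
    "Days_OnSiteScheduled_AuditStartDate": ("Date On Site Scheduled", "Audit Start Date"),
    "Days_StartDate_EndDate": ("Audit Start Date", "Audit End Date"),
    "Days_AuditEnd_FinalReportDue": ("Audit End Date", "Date Final Report Due"),
    "Days_FinalReportDue_CompletionDate": ("Date Final Report Due", "Date of Completion"),
}


def to_date(value):
    """Normalize a cell to a date; blank cells become None."""
    if value is None or value == "":
        return None
    if isinstance(value, datetime):  # check before date: datetime subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    return date.fromisoformat(value)  # assumes ISO strings such as "2017-03-06"


def days_between(start, end):
    """Days from start to end; None if either date is missing."""
    start, end = to_date(start), to_date(end)
    if start is None or end is None:
        return None
    return (end - start).days


def add_interval_columns(record):
    """Return a copy of one audit record with the six interval columns added."""
    out = dict(record)
    for name, (start_col, end_col) in INTERVALS.items():
        out[name] = days_between(record.get(start_col), record.get(end_col))
    return out


if __name__ == "__main__":
    # Hypothetical audit with no on-site date recorded.
    audit = {
        "Date of Intake": "2017-01-03",
        "Date Q Sent": "2017-01-10",
        "Date Q Received": "2017-02-01",
        "Date On Site Scheduled": None,
        "Audit Start Date": "2017-03-06",
        "Audit End Date": "2017-03-08",
        "Date Final Report Due": "2017-04-07",
        "Date of Completion": "2017-04-05",
    }
    for name, days in add_interval_columns(audit).items():
        if name in INTERVALS:
            print(name, days)
```

For this invented record, “Days_FinalReportDue_CompletionDate” comes out negative (-2) because the audit was completed before its report due date; a negative interval is therefore not automatically an error, whereas `None` signals a missing date.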
Then compute the mean and median of each of the six variables you have created.
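Because some intervals may be blank for audits where they do not apply, it helps to report how many values each mean and median is based on. This is a minimal Python sketch using the standard `statistics` module, assuming one interval column has been collected into a list where missing values are `None`. The function name `summarize` is hypothetical, and the sample values are invented.

```python
from statistics import mean, median


def summarize(values):
    """Mean and median of an interval column, ignoring missing (None) values.

    Also returns n, since intervals such as Days_OnSiteScheduled_AuditStartDate
    may be blank for audits where they do not apply.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"n": 0, "mean": None, "median": None}
    return {"n": len(present), "mean": mean(present), "median": median(present)}


if __name__ == "__main__":
    # Hypothetical Days_StartDate_EndDate values for five audits (one blank).
    print(summarize([7, 12, None, 3, 30]))
```

The mean is pulled upward by a few very long intervals while the median is not, so a large gap between the two for a variable is worth a sentence in your discussion.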
Would you recommend merging the “2017 GxP Audits” and “2017 CSQA Audits” sheets? Why or why not?
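Whether to merge is a judgment call for you to argue. One practical input to that decision is whether the two worksheets record the same fields. The Python sketch below (hypothetical helpers `compare_columns` and `stack_sheets`) compares two lists of column headers and, if you do decide to combine the sheets, stacks the rows with an added “Source_Sheet” field (a hypothetical name) so each row still records which audit program it came from. The headers in the example are invented for illustration.

```python
def compare_columns(cols_first, cols_second):
    """Report which column headers two worksheets share and which are unique."""
    a, b = set(cols_first), set(cols_second)
    return {
        "shared": sorted(a & b),
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
    }


def stack_sheets(named_sheets):
    """Stack rows from several sheets, tagging each with its origin.

    `named_sheets` maps a sheet name to a list of row dicts. The added
    "Source_Sheet" field keeps the audit programs separable after merging.
    """
    stacked = []
    for sheet_name, rows in named_sheets.items():
        for row in rows:
            stacked.append({**row, "Source_Sheet": sheet_name})
    return stacked


if __name__ == "__main__":
    # Hypothetical headers; "Site Name" and "System Name" are invented for illustration.
    gxp_cols = ["Audit Status", "GxP Area", "Audit Type", "Site Name"]
    csqa_cols = ["Audit Status", "GxP Area", "Audit Type", "System Name"]
    print(compare_columns(gxp_cols, csqa_cols))
```

Matching headers alone do not settle the question: the same field name can carry different codes or meanings in each sheet, which is why the Data Dictionary should inform your answer.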