Chapter 3

 

Chapter 3 – Focus 2002 & GSHE data overview, selection and pre-treatment

 3.1 - Introduction

Prior to the application of the statistical analysis techniques described above to the Focus 2002 and GSHE data, it is necessary to deal with questions 6 and 7 (Table 1.1).  In addition to answering the above research questions, the purpose of this chapter is to:   

·        Justify and record the decision as to which AstraZeneca nations and sites are to be included in the scope of this project.

·        Explain the pre-treatment that has been applied to the Focus 2002 and GSHE data prior to the analysis detailed within Chapters 4 and 5.

·        Provide a better understanding of the underlying structure of the Focus 2002 response data.

3.2 – The AstraZeneca Focus 2002 survey and its ability to measure organisational culture

 

Section 2.3 outlined the work of Flin et al [60] that analysed the themes contained within eighteen publicly available industrial-based safety climate surveys.  They concluded that the range of culture dimensions surveyed could be distilled into five themes, namely, ‘Management/Supervision’, ‘Safety system’, ‘Risk’, ‘Work pressure’ and ‘Competence’.  A subjective comparison of the Focus 2002 survey questions against these ‘thematic factors’ was performed.  Because of the subjective nature of the above comparison, it is recognised that others performing the comparison may obtain different results.  The results are detailed in Table 3.1. 

 

Table 3.1 indicates that the AstraZeneca Focus 2002 survey contains attitudinal questions that address all of the Flin et al [60] thematic factors.   The comparison therefore suggests that the responses to the attitudinal questions contained within the Focus 2002 survey may be able to measure AstraZeneca’s organisational culture. 

 

The Focus 2002 survey addresses four themes not covered by Flin et al [60], namely My Job, Team, Our Company, and Pay and Benefits.  These additional attitudinal questions provide an opportunity to explore whether any of these factors influence, or are related to, SHE performance.  Exploring those cultural factors not directly associated with safety would also address the recommendations of Coyle et al [31].  The subjective comparison addresses research question 6 listed in Table 1.1.

 

The information contained within the AstraZeneca Focus 2002 survey will allow investigations into the psychological ‘safety climate’ element of Cooper’s [26] model of safety culture.  However, the ‘behavioural’ and ‘safety management system’ elements of his model are not addressed within the Focus 2002 survey.  Behavioural elements can be inferred from a number of sources currently available within AstraZeneca; these include:

 

·        The percentage of Focus survey questionnaires returned;

·        Financial under- or over-spends;

·        The findings of behavioural based safety audits;

·        Sickness rates;

·        Perceived degree of over- or under-reporting of accidents, incidents etc.

 

Information pertaining to ‘safety management system’ elements may also be inferred from the findings of SHE and other management system audits.

 

Although it is practicable to obtain ‘behavioural’ and ‘safety management’ information for AstraZeneca, it was decided to exclude them from the scope of Chapters 4 and 5.  The principal reason for exclusion is a desire to establish a base ‘safety climate’ model with the minimum number of confounding variables and factors.  

 

Rows: AstraZeneca Focus 2002 Factors.  Columns: Flin et al’s [60] Five Cultural Themes.  Cell entries are Focus 2002 question numbers.

Focus 2002 Factor | Management/Supervision | Safety System | Risk | Work Pressure | Competence | Focus 2002 Questions Not Covered By Flin et al Thematic Factors
--- | --- | --- | --- | --- | --- | ---
SHE | 41, 48 | 17 | 5, 17, 28, 36, 41, 48 | 17 | | None
Leadership | 9a, 9b, 20a, 20b, 20c, 20d, 22, 57, 58 | | | | | None
Communication And Feedback | 3, 11, 16, 32, 38a, 38b, 38c, 53a, 53b, 53c | | | 32 | | None
Diversity | 6, 18, 29, 37, 42 | | | | | None
Innovation | 10a, 10c, 10e, 10f | | | 10a, 10b | | 10d
Learning & Development | | | | | 14, 24, 26, 35, 39, 47 | None
My Job | 2, 13, 21, 44 | | | 44 | | 7, 31, 49, 52
Team | 1, 12 | | | | | 8a, 8b, 8c, 23, 33, 46, 50
Our Company | | | 45 | | | 10g, 19, 30a, 30b, 43, 51, 60
Pay & Benefits | 40, 59 | | | | | 25, 34, 55a, 55b
Work-Life Balance | 4, 15, 27a, 27b, 27c, 54 | | | | | None

 

Table 3.1 – Comparison of the Focus 2002 survey factors with the Flin et al [60] safety thematic factors

 

 

Note – A blank cell within Table 3.1 indicates no match between the Focus 2002 factor and the Flin et al [60] thematic factor.

3.3 – Definition of project scope and metrics

3.3.1 - Introduction  

The purpose of this section is to record the rationale behind the selection of the AstraZeneca sites and the GSHE lagging SHE performance indicators to be included within the scope of Chapters 4 and 5.   

3.3.2 - Selection of nations to be included within the project scope

 It was considered desirable to examine the data from more than one country to allow investigation into how organisational culture and its relationship to lagging SHE performance indicators varies between nations.  The scope of this research was restricted to include data only from the United Kingdom (UK), Sweden (SE) and United States (US) for the following reasons:

·        There is a large data set from each of the three countries and collectively they represent 57% of the Focus 2002 response data.

·        The SHE statistics reported by these countries are known to be robust, as demonstrated by local and corporate audit data (there is less confidence in the accuracy of data from some other territories).

·        There are distinct national cultures in the three countries.

·        Each territory supports the full range of AstraZeneca activities (for example, manufacturing, research and development, marketing) and is therefore representative of the Company as a whole.

 Appendix 7 details the number of respondents for each of the 10 UK, 11 SE and 6 US sites selected.  

 

3.3.3 - Selection of lagging SHE performance indicators 

 The desirable features of lagging SHE performance indicators are discussed in Section 2.5.  Appendix 3 details the reported 2002 SHE performance indicators for all of the sites and functions in the three countries for which data are available from the Focus 2002 survey.  The following observations and comments regarding the 2002 GSHE lagging SHE performance indicators are made.

AstraZeneca minor injuries:  The use of minor injury rates is not favoured because, as explained in Section 2.5, they are likely to be under-reported compared with significant injuries.  This view is reinforced by the fact that only 10 of the 27 sites reported minor injuries.  This is in contrast to significant injuries where 25 of the 27 sites reported at least one significant injury.    

Additionally, where data are available, the expected ratio of serious to minor injuries of 1:10 [10] is not seen.  This brings into question the robustness of the number of reported minor injuries.  There is more confidence in the reporting data for serious injuries because there is generally also a legal requirement to report.  The minor injury data are therefore not utilised in this project.   

Contractor minor and significant injuries:  The use of contractor injury information was rejected because only AstraZeneca personnel took part in the Focus 2002 survey.  Although it is foreseeable that the actions and omissions of contractors may affect AstraZeneca personnel, the relationship to AstraZeneca’s organisational culture cannot be assessed using the available data.  It is, however, interesting to note that the ratio between reported minor and significant injuries mirrors the AstraZeneca data.  This again brings into question the validity of the number of minor injuries reported. 

Non-Injury Information:  These data are rejected because, for any given GSHE reporting criterion, the majority of the sites reported a zero value during 2002.

AstraZeneca Significant Injuries:  Significant injury reports appear to be the most useful lagging SHE performance indicator because:

·        There is often an associated legal requirement to report these injuries.

·        Significant injuries normally involve medical treatment and are hence more visible.

·        All but two of the sites reported at least one significant injury.

·        The number of significant injuries reported shows a large degree of variation between sites.

 The significant injury data are therefore chosen as the preferred lagging SHE performance indicator to be included within the scope of Chapters 4 and 5.   

3.3.4 – Selection of Focus 2002 metrics

Section 2.8.2 provided an overview of the Focus 2002 data set.  A number of different metrics can describe the characteristics and distribution of the Focus 2002 data, including the mean, standard deviation, kurtosis and skewness.  A comprehensive list and explanation of distribution metrics can be found in Price [133] and the internet glossary website Risk Glossary.Com [142].

The categorical nature and limited range of possible responses to each Focus 2002 question need to be taken into consideration when choosing a metric to describe the data.  The Focus 2002 questions have a maximum of five and a minimum of two possible responses.  Distribution metrics such as standard deviation and kurtosis do not provide meaningful information for categorical data as the number of categories approaches two.  The usefulness of distribution metrics such as skewness and kurtosis to describe the Focus 2002 data with a maximum of five categories is questionable.

The choice of metric to describe the Focus 2002 data should ideally reflect the distribution of organisational culture and the hypothesised accident causation model within AstraZeneca.  Conceptually, two extreme cases are possible.  The first case is where the culture is uniform or homogeneous and all personnel contribute to, and are equally at risk from, a negative SHE outcome such as an injury.  The second case is where the organisational culture is heterogeneous and a minority of individuals cause the negative SHE outcomes.  The arithmetic mean Focus 2002 question response will be most applicable to the first case.  Distribution metrics that are not dependent upon a normal distribution, such as kurtosis, may be more appropriate for the second case.

The literature review in Chapter 2 identified that the mean response appears to be the only metric used in previous research attempting to correlate metrics of organisational culture with lagging SHE performance indicators [41, 58, 88, 154].
A subjective insight into the degree of AstraZeneca’s organisational cultural homogeneity may arguably be obtained by inspection of the distribution of Focus 2002 question responses.  Question response distributions with a low standard deviation may be indicative of a homogeneous culture.  Question responses that show more than one peak in the response distribution may be indicative of a heterogeneous organisational culture, with the separate peaks possibly indicating sub-cultures.  The presence of more than one response peak will change a number of distribution metrics.  Compared with a normally distributed response, a multi-peak response distribution will have a larger standard deviation and may have a lower kurtosis.  Information regarding the Focus 2002 response distributions is provided within Section 3.5.3.

The question ‘Are the majority of accidents caused by a minority of individuals, either being exposed to a higher level of risk or due to having an ‘undesirable’ culture?’ is potentially difficult to answer.  Information regarding the working environment and organisational culture would need to be obtained for those who are involved with undesirable SHE outcomes and compared with the same information obtained from those personnel who are not involved with accidents.  This comparison could then be used to identify to what extent culture and the degree of exposure to risk are related to undesirable SHE outcomes.  Because the identity of Focus 2002 participants was kept confidential, it is not possible to answer the above question as part of this project.
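
The effect of a second response peak on the standard deviation and kurtosis can be illustrated with a small sketch.  The response counts below are invented for illustration and are not Focus 2002 data; the function name `describe` is likewise an assumption.

```python
import math

def describe(counts):
    """Return (mean, population standard deviation, excess kurtosis) for a
    response distribution given as {response option: number of responses}."""
    n = sum(counts.values())
    mean = sum(x * c for x, c in counts.items()) / n
    m2 = sum(c * (x - mean) ** 2 for x, c in counts.items()) / n  # variance
    m4 = sum(c * (x - mean) ** 4 for x, c in counts.items()) / n  # fourth central moment
    return mean, math.sqrt(m2), m4 / m2 ** 2 - 3.0

# Hypothetical five-point response counts (illustrative only).
single_peak = {1: 5, 2: 20, 3: 50, 4: 20, 5: 5}
two_peaks = {1: 40, 2: 10, 3: 0, 4: 10, 5: 40}

mean_1, sd_1, kurt_1 = describe(single_peak)
mean_2, sd_2, kurt_2 = describe(two_peaks)
print(f"single peak: mean={mean_1:.2f} sd={sd_1:.2f} excess kurtosis={kurt_1:.2f}")
print(f"two peaks:   mean={mean_2:.2f} sd={sd_2:.2f} excess kurtosis={kurt_2:.2f}")
```

Both invented distributions share the same mean, so the mean alone cannot distinguish them; the two-peak case shows the larger standard deviation and the lower kurtosis.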

 Chapters 4 and 5 are concerned with two principal objectives, namely: 

·        Establishment of the relationship between the Focus 2002 data and SHE lagging performance indicators, and;

·        Identification of those organisational factors that discriminate nations and sites with differing SHE performance.

It would appear that the mean site response to the Focus 2002 questions has the potential to address these research objectives.  The use of the mean does have limitations in that it is unable to provide an insight into the relationship between cultural heterogeneity and lagging SHE performance indicators.   Based upon the above arguments, the mean and standard deviation have been chosen as the metrics to be used to represent the Focus 2002 data in the subsequent analysis.  The mean is chosen because it represents the average site attitude of the respondents.  The standard deviation has been chosen because it represents the degree of spread of responses to a particular question and therefore, may be related to the degree of organisational heterogeneity. 

3.4 - Data pre-treatment 

3.4.1 - Introduction 

As received by the author, the Focus 2002 and GSHE lagging SHE performance indicator data were not in a suitable format to use directly.  This section summarises the pre-treatment that was applied to the data sets prior to the work detailed in Chapters 4 and 5. 

3.4.2 - Focus 2002 data pre-treatment

 The Focus 2002 database was obtained as a data file from AstraZeneca GSHE Department.  The database contained the 41,779 individual responses from all of the survey participants.  The file consisted of 41,779 rows by 139 columns of data.  Each row of the file contained the responses from individual survey participants.  The columns of the file recorded the information contained within the Focus 2002 questionnaire.  All of the demographic data associated with each respondent were coded.  For example, columns 1 to 7 inclusive of the data file related to the ‘sequence number’.  The sequence number was a unique respondent identifier.  Columns 8 and 9 related to a Global location code etc.  The data file was imported into Microsoft Excel 2000 (version 9.0.6926 – SP3) and saved as a workbook file.  During the file import the ‘fixed width - text input’ tool within Excel was used to merge appropriate columns of data together.  For example, columns 1 to 7 were merged to form one column of seven digit data within the Excel worksheet.  The data within the resultant Excel workbook were write-protected and saved for use in the analysis detailed in Chapters 4 and 5.  The robustness of the transformation was checked by visually cross checking several rows of the data file with the corresponding rows in the resultant Excel worksheet.  The transformation was found to be successful with no observed errors or omissions. 
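
The fixed-width import described above can be sketched as follows.  Only the first two fields (columns 1 to 7 and columns 8 and 9) are described in the text; the field names, the sample record and the treatment of the remaining columns are assumptions made for illustration.

```python
# Field layout as (name, start, end) using zero-based, end-exclusive slices.
# Only these two fields are described in the text; names are hypothetical.
FIELDS = [
    ("sequence_number", 0, 7),  # file columns 1 to 7
    ("location_code", 7, 9),    # file columns 8 and 9
]

def parse_record(line):
    """Split one fixed-width row into named fields, keeping the rest as raw text."""
    record = {name: line[start:end] for name, start, end in FIELDS}
    record["remaining_columns"] = line[9:]  # layout not described in the text
    return record

sample_row = "0001234" + "07" + "3421"  # invented record, not real survey data
parsed = parse_record(sample_row)
print(parsed)
```

A programmatic import of this kind would also allow every row to be compared with the source file, rather than a visual check of several rows.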

 

3.4.3 - GSHE annual report data pre-treatment

Section 2.10.3 highlighted the importance of using accident rates rather than numbers of accidents when comparing the SHE performance of several sites.   The number of significant injuries occurring at each UK, SE and US site during 2002 was obtained directly from the GSHE 2002 annual site reports.  Equation 1 was adapted to reflect the number of hours AstraZeneca employees work per annum to give the significant-injury rate as per Dodsworth [46] .  The AstraZeneca injury frequency rate equation is given in Equation 5.

 

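\[ \text{Injury frequency rate} = \frac{\text{Number of significant injuries} \times 100{,}000}{\text{Number of employees} \times 1450} \qquad (5) \]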

 This calculation was used to enable benchmarking with sites other than those in the UK, SE and US.  The 100,000 figure within Equation 5 represents the approximate number of hours a person will work during a lifetime of employment.  The 1450 figure within Equation 5 represents the number of contractual hours a person within AstraZeneca works in a year.  The AstraZeneca injury frequency rate therefore represents the approximate number of injuries an employee will be exposed to during his working life.  As defined within Equation 5, the injury frequency rate does not have any units.  Since the injury frequency rate does not have any units, it is not a true rate.   Equation 5 was entered into Excel 2000 (version 9.0.6926 – SP3) and used to calculate the significant-injury frequency rates for all UK, SE and US sites.  The resultant significant-injury frequency rates are reproduced in Appendix 7.
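
As a worked illustration of the calculation, the sketch below assumes that the injury frequency rate is the number of significant injuries multiplied by 100,000 and divided by the hours worked, with hours worked taken as the number of employees multiplied by 1450.  This form is inferred from the description of the two constants; the function name and the example figures are hypothetical.

```python
LIFETIME_HOURS = 100_000  # approximate hours worked in a lifetime of employment
ANNUAL_HOURS = 1450       # contractual hours worked per person per year

def injury_frequency_rate(significant_injuries, employees):
    """Significant-injury frequency rate under the assumed form of the equation."""
    hours_worked = employees * ANNUAL_HOURS
    return significant_injuries * LIFETIME_HOURS / hours_worked

# Invented example: 3 significant injuries at a site of 1000 employees.
rate = injury_frequency_rate(significant_injuries=3, employees=1000)
print(round(rate, 3))
```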

 

3.4.4 - Calculation of the mean Focus 2002 responses

The arithmetic mean responses to each of the Focus 2002 questions were calculated for each UK, SE and US site.  The calculations were performed using the ‘AVERAGE’ tool within Excel 2000 (version 9.0.6926 – SP3).  The arithmetic mean responses for the UK, SE and US sites are reproduced in Appendices 8.1 to 8.3.   

3.4.5 – Calculation of the standard deviation of the Focus 2002 responses

 The standard deviation of the Focus 2002 question responses was calculated for each UK, SE and US site using the ‘STDEV’ tool within Excel 2000 (version 9.0.6926-SP3).  The resultant standard deviations are reproduced in Appendices 8.4 to 8.6.   
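
For readers without access to Excel, the site-level calculations in Sections 3.4.4 and 3.4.5 can be approximated with the sketch below.  It assumes that missing responses are simply skipped, as Excel’s ‘AVERAGE’ does for empty cells, and uses the sample (n-1) standard deviation, which is what Excel’s ‘STDEV’ computes.  The site labels and responses are invented.

```python
from collections import defaultdict
from statistics import mean, stdev

def site_statistics(rows):
    """Return {site: (mean, sample standard deviation)} for one question.

    rows is an iterable of (site, response) pairs; a response of None
    represents an unanswered question and is skipped.
    """
    by_site = defaultdict(list)
    for site, response in rows:
        if response is not None:
            by_site[site].append(response)
    return {site: (mean(values), stdev(values)) for site, values in by_site.items()}

# Invented responses to a single question (illustrative only).
rows = [("UK1", 4), ("UK1", 5), ("UK1", None), ("UK1", 3), ("SE1", 2), ("SE1", 4)]
stats = site_statistics(rows)
print(stats)
```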

3.5 - Understanding the structure of the Focus 2002 survey data

 3.5.1 - Introduction

Section 2.9.3.2 explained the PLS modelling strategy.  The first step in the PLS modelling strategy is to carry out a preliminary statistical analysis, which is performed for two reasons.  Firstly, it confirms that the data are suitable and sufficient for further analysis.  Secondly, it provides a priori knowledge regarding the underlying structure of the data that may in turn influence the selection and application of PLS, PLS-DA and SIMCA modelling parameters.   The purpose of this Section is to summarise the preliminary statistical analysis that was performed on the Focus 2002 question responses.  The following statistics were calculated:   

·        The percentage of missing responses within the Focus 2002 data.

·        The distribution of responses to the Focus 2002 questions for the UK, US and SE nations.

·        The (Pearson) correlations between the Focus 2002 mean responses for all UK, US and SE sites.

·        For each UK, US and SE site, the (Pearson) correlation coefficients between AstraZeneca significant-injury frequency rate and the Focus 2002 question response mean and standard deviations.

 

3.5.2 – Calculation of the incidence of missing responses within the Focus 2002 data

It is expected that survey respondents will not answer all of the questions asked.  Possible reasons for not answering a question include not understanding the question, not having sufficient time to complete the survey and the question not being applicable to the respondent.  If unknown to the researcher, or ignored, missing responses have the potential to significantly influence the results of any statistical analysis performed.  The incidence of missing question responses is important for two reasons.  Firstly, it can introduce statistical bias and secondly, it may indicate that a particular group or cluster of respondents has difficulty with, or cannot associate with, the question being asked.  It is therefore essential to be aware of the incidence of missing responses prior to statistical analysis.

The incidence of missing responses for the UK, SE and US site data sets was calculated for each Focus 2002 question using the ‘workset – statistics’ option within SIMCA P+.  The SIMCA P+ output is reproduced in Appendix 9.  The average incidence of missing responses was found to be less than 0.5% for the UK and US and less than 0.7% for SE.  Questions 8b, 8c and 9a had the highest percentages of missing responses, at 2.8%, 2.9% and 1.2% respectively.  All other questions had a percentage of missing responses of less than 1%.  Based upon these figures the author considers that all of the Focus 2002 questions are suitable for inclusion in the subsequent analysis.
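
The percentage of missing responses per question is a simple calculation, sketched below as a stand-in for the SIMCA P+ ‘workset – statistics’ output.  The question labels and responses are invented, and `None` is assumed to mark an unanswered question.

```python
def percent_missing(responses):
    """Percentage of entries that are missing (None) in a list of responses."""
    return 100.0 * sum(r is None for r in responses) / len(responses)

# Invented responses keyed by a hypothetical question label.
answers = {
    "QA": [3, 4, None, 5],
    "QB": [None, None, 2, 1],
}
missing = {question: percent_missing(values) for question, values in answers.items()}
print(missing)
```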

3.5.3 – Focus 2002 response distribution

Sections 2.9.2.2 and 2.9.3.2 explained that both PCA and PLS are best able to model data when the data being modelled are approximately normally distributed.  The purpose of this section is to describe the work that was performed to understand the distribution of Focus 2002 question responses.

The ‘DCOUNT’ function within Excel 2000 (version 9.0.6926-SP3) was used to count the number of times each Focus 2002 question response option was selected in the UK, SE and US nations.  The resultant response count information was converted into a percentage response to allow inter-nation comparisons to be made.  The percentage response information for each question was imported into Microsoft PowerPoint 2000 (version 9.0.6620-SP3) and grouped histograms were drawn.  The response distribution histograms for the UK, SE and US sites are reproduced in Appendix 10.

Examination of the response distribution figures exemplifies the difficulty of visually identifying trends and relationships within multivariate data sets.  With this caveat, the following subjective observations are made.  The US gave the highest level of response to 48 out of the 82 Focus 2002 questions.  This may indicate the presence of fewer sub-cultures compared to the other two nations or that US respondents may be more polarised in their views.

Examples of the Focus 2002 response distributions are provided in Figure 3.1.  The response distribution for question 5 is an example where the responses are not able to discriminate the SE, UK and US sites, whereas question 30b is able to discriminate at least one nation’s sites from the other two.  The inclusion of questions that discriminate national cultures may introduce unwanted variance into a model correlating responses with lagging SHE performance indicators on a global scale.  The response distributions for question 42 are approximately normally distributed.  The response distributions for question 47 are skewed.  The response distributions for question 56 indicate two distinct peaks, which may indicate the presence of sub-cultures.

Inspection of the distribution histograms indicates that PCA and PLS should be well able to model a significant proportion of the Focus 2002 question responses as they are approximately normally distributed.  PCA and PLS analysis of non-normally distributed question responses should be possible after distribution transformation (Section 2.9.3.2 explained the process of transformation of non-normally distributed data, prior to analysis).
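
The conversion of response counts into percentage responses can be sketched as below.  The function is a stand-in for the Excel ‘DCOUNT’ step rather than a reproduction of it; the five response options and the example responses are assumptions.

```python
from collections import Counter

def percentage_distribution(responses, options=(1, 2, 3, 4, 5)):
    """Percentage of answered responses falling on each response option."""
    counts = Counter(r for r in responses if r is not None)
    total = sum(counts[option] for option in options)
    return {option: 100.0 * counts[option] / total for option in options}

# Invented responses from one nation to one question (None = unanswered).
nation_responses = [1, 2, 2, 3, None]
distribution = percentage_distribution(nation_responses)
print(distribution)
```

Expressing each nation’s counts as percentages places the three nations on a common scale regardless of the number of respondents.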


[Figure 3.1 panels: United Kingdom, United States, Sweden]


Figure 3.1 – Example Focus 2002 survey response distributions

 

3.5.4 - Focus 2002 question response and significant-injury frequency correlations

The literature review detailed in Chapter 2 indicated that previous research has correlated the responses to single climate questions with lagging SHE performance indicators.  Calculation of a (Pearson) correlation matrix of Focus 2002 question responses and the AstraZeneca significant-injury frequency rate (SIFR) allows:

·        The rapid identification of those Focus 2002 question responses that correlate with SIFR.

·        Identification of question responses that are highly correlated with one another.

 A priori knowledge of those Focus 2002 question responses that correlate with SIFR prior to detailed analysis is useful.  A model based upon these questions will potentially be more able to predict SIFR performance compared with a model that has been built without a priori knowledge.  Focus 2002 question responses that are highly correlated with one another may indicate that those questions are measuring the same latent construct.  If problems are encountered in subsequent PLS, PCA or PLS-DA analysis, the correlation information can be used to simplify the data set by removal of one or more of the question responses that correlate with a question remaining within the PLS, PCA or PLS-DA model.  The following correlations were calculated for the UK, SE and US sites: 

·        Pearson correlation coefficients between the arithmetic mean Focus 2002 question responses and site SIFR.

·        Pearson correlation coefficients between the standard deviation of each Focus 2002 question response and site SIFR.

As will be discussed later, the correlation coefficients between the standard deviation of the Focus 2002 questions and site SIFR were calculated to investigate the relationship between the spread of question responses and SIFR.  Pearson correlation matrices were calculated using the ‘tools-data analysis - correlation function’ within Microsoft Excel 2000 (version 9.0.6926-SP3).  The resultant correlation coefficients are reproduced in Appendix 11.  In Section 3.2 it was subjectively asserted that Focus 2002 questions 17, 28, 36, 41, 45 were associated with either Flin et al’s [60] ‘safety system’ or ‘risk’ cultural factors.  These two factors can arguably be associated with ‘safety’.  Based upon the above arguments the following observations are made:

         ·        25 UK, 17 SE and 16 US Pearson correlation coefficients are significant. 

·        14 of the significant US Pearson correlation coefficients are negative. 

·        All of the UK Pearson correlation coefficients are positive.  

·        Focus 2002 questions 24 and 55a are (Pearson) correlated to SIFR in all three countries above the level of significance.  It is noted that the Pearson correlation coefficient for the UK and SE is positive, whereas in the US, the Pearson correlation coefficient is negative.

·        0 UK, 0 US and 2 SE Focus 2002 responses that significantly (Pearson) correlate with SIFR are related to ‘safety’. 

·        A number of Focus 2002 question responses that positively correlate with SIFR in the UK or SE are found to negatively correlate with SIFR in the US.

 

The following observations are made for the correlation matrices based upon the Focus 2002 question response standard deviations:

 ·        10 UK, 14 SE and 17 US Pearson correlation coefficients are significant.

·        Only Focus 2002 question number 22 is (Pearson) correlated to SIFR above the level of significance in the UK, SE and US.

 Visual examination of the UK, US and SE correlation matrices indicates that a high proportion of the question responses are strongly correlated with at least one other response.  A high correlation coefficient between two or more question responses may be indicative of the questions measuring the same underlying theoretical construct.  The existence of high inter-response correlations can be advantageous in reducing the complexity of multivariate models.  For instance, if several variables correlate with one another then there may be an opportunity to choose one to represent all of them.
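
The correlations discussed in this section can be sketched without Excel as follows.  The site means and SIFR values are invented, the question labels are hypothetical, and no significance test is shown.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented site mean responses (one value per site) and site SIFR values.
site_means = {
    "QA": [3.0, 3.5, 4.0, 4.5],
    "QB": [4.5, 4.0, 3.5, 3.0],
    "QC": [3.2, 3.1, 3.9, 3.3],
}
sifr = [0.2, 0.4, 0.6, 0.8]

# Correlation of each question's site means with SIFR.
with_sifr = {q: pearson(values, sifr) for q, values in site_means.items()}

# Correlation between each pair of questions, to spot redundant questions.
between_questions = {
    (a, b): pearson(site_means[a], site_means[b])
    for a, b in combinations(site_means, 2)
}
print(with_sifr)
print(between_questions)
```

In this invented example QA and QB are perfectly (negatively) correlated with one another, so either could represent both in a simplified model.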

 

3.5.5 – Calculation of the Focus 2002 site response rates

It is important to understand, for each site, the percentage of personnel taking part in the survey prior to subsequent analysis.  If, for any one site, the number of respondents is low, the corresponding arithmetic mean Focus 2002 responses will not be sufficiently representative of the entire site.  Based upon the subjective opinion of the author, it was decided that only sites with an average question response rate of above an arbitrarily set threshold of 30% would be selected for subsequent analysis.   

The percentage of AstraZeneca personnel responding to the Focus 2002 survey was calculated for each UK, SE and US site.  The results are given in Appendix 7.  

With the exception of site UK9, all UK sites were found to have response percentages of between 41% and 73%.  According to the information provided to the author, only 2% of site UK9 responded to the Focus 2002 survey.  It is not known if the low response rate is genuine or is a result of an error in the reporting database.  Regardless of whether the UK9 response rate is genuine, a value of 2% is insufficient to represent the site.  Site UK9 was therefore removed from the scope of the analysis detailed in Chapters 4 and 5.

With the exception of sites US3 and SE8, all US and SE sites were found to have a response rate of between 51% and 96%.  The numbers of responses for sites US3 and SE8 were found to be greater than the respective site AstraZeneca survey populations.  It is unlikely that the total site population is incorrect.  The most likely source of error is the Focus 2002 site coding, detailed in Section 2.8.2.  An error in the site coding information could allow respondents from other sites to be included within the site US3 and SE8 data.  For this reason, sites US3 and SE8 were removed from the scope of the analysis detailed in Chapters 4 and 5.
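
The site selection rule can be expressed as a short filter.  The sketch below assumes two exclusion conditions taken from the text: a response rate at or below the 30% threshold, and a response rate above 100% (more responses than the site population).  The site names and figures are invented.

```python
def response_rate(respondents, population):
    """Percentage of the site population that responded to the survey."""
    return 100.0 * respondents / population

def select_sites(sites, min_rate=30.0):
    """Keep sites whose response rate is above min_rate and not above 100%."""
    selected = {}
    for name, (respondents, population) in sites.items():
        rate = response_rate(respondents, population)
        if min_rate < rate <= 100.0:
            selected[name] = rate
    return selected

# Invented (respondents, population) pairs for three hypothetical sites.
sites = {"SiteA": (50, 100), "SiteB": (2, 100), "SiteC": (120, 100)}
selected = select_sites(sites)
print(selected)
```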

 

3.5.6 - Principal Component Analysis

 

3.5.6.1 - Introduction

Section 2.9.2.2 provided an overview of how principal component analysis (PCA) can be used to identify latent constructs and simplify the complex nature of multivariate and megavariate data sets.   This section describes the application of PCA to the Focus 2002 question response data.  PCA modelling was performed for the following reasons:  

·        To better understand the underlying structure of the responses to the Focus 2002 questions.

·        To obtain principal component information that will be entered into subsequent SIMCA models.

The data were organised into two groups.  The first group involved the PCA modelling of only those responses to Focus 2002 questions that correlated with AstraZeneca SIFRs for each nation.  The second group involved the PCA modelling of all of the Focus 2002 question responses.  The principal reason for splitting the data into two groups was to discover the underlying structure of those questions known to (Pearson) correlate with SIFR.

 

3.5.6.2 – Principal Component Analysis – method

Two PCA models were created for each national data set.  The X block data for the first model consisted of those responses to the Focus 2002 questions identified in Section 3.5.4 as being (Pearson) correlated with the AstraZeneca SIFR.  The X block data for the second model consisted of the responses to all of the Focus 2002 questions.  The mean Focus 2002 responses and AstraZeneca significant-injury frequency rates were entered into SIMCA P+ (version 10.0.4.0).  The mean responses to the Focus 2002 questions were used as the X block data for both models.  The six resultant models are summarised in Table 3.2.

 

Model Name | Nation | Focus 2002 Question Responses Included In The X block Data
--- | --- | ---
PCA-UK1 | UK | For all UK sites: Those Focus 2002 question responses that (Pearson) correlated with SIFR above the level of statistical significance (at 95% confidence).
PCA-UK2 | UK | All UK site mean responses to the Focus 2002 survey.
PCA-SE1 | SE | For all SE sites: Those Focus 2002 question responses that (Pearson) correlated with SIFR above the level of statistical significance (at 95% confidence).
PCA-SE2 | SE | All SE site mean responses to the Focus 2002 survey.
PCA-US1 | US | For all US sites: Those Focus 2002 question responses that (Pearson) correlated with SIFR above the level of statistical significance (at 95% confidence).
PCA-US2 | US | All US site mean responses to the Focus 2002 survey.

Table 3.2 – The Focus 2002 PCA model inputs

The default mean centering and scaling option was selected within SIMCA P+.  Models PCA-UK2, PCA-SE2 and PCA-US2 were optimised by a process of inspection of the question response R2 and Q2 values and successively removing those responses that had a negative Q2 or R2 value, or a Q2 value of less than 0.5, or a R2/Q2 ratio of less than 0.7.  Upon each successive removal of a question response, the model was re-run and the new R2, Q2 plots inspected.  The process was repeated until all of the Q2 values were above 0.5 and the R2/Q2 value was above 0.7.  It is important to stress that the goal of model optimisation was to achieve satisfactorily high R2 and Q2 values and, at the same time, avoid over-optimisation, i.e. removal of too many question responses as an aid to drive ever increasing R2 and Q2 values.  Models PCA-UK1, PCA-SE1 and PCA-US1 were not optimised as they are based upon a priori information regarding which question responses correlate with SIFR. 
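
The optimisation loop described above can be sketched as follows.  The `fit` argument is a hypothetical callable standing in for a SIMCA P+ model re-run; it is assumed to return the R2 and Q2 value of each retained question response.  Removing one response per iteration, starting with the lowest Q2, is also an assumption, and the thresholds are applied literally as stated in the text.

```python
def fails_criteria(r2, q2, min_q2=0.5, min_ratio=0.7):
    """True if a question response should be removed under the stated criteria."""
    if r2 < 0 or q2 < 0:
        return True
    if q2 < min_q2:
        return True
    return r2 / q2 < min_ratio  # ratio criterion as written in the text

def prune_questions(questions, fit):
    """Refit repeatedly, removing one failing question response at a time.

    fit(retained) must return {question: (R2, Q2)} for the retained questions.
    """
    retained = list(questions)
    while retained:
        scores = fit(retained)
        failing = [q for q in retained if fails_criteria(*scores[q])]
        if not failing:
            break
        worst = min(failing, key=lambda q: scores[q][1])  # lowest Q2 first
        retained.remove(worst)
    return retained

# Stand-in for a model re-run: fixed, invented scores per question.
FAKE_SCORES = {"QA": (0.9, 0.8), "QB": (0.6, 0.3), "QC": (0.7, -0.1), "QD": (0.8, 0.6)}

def fake_fit(retained):
    return {q: FAKE_SCORES[q] for q in retained}

kept = prune_questions(["QA", "QB", "QC", "QD"], fake_fit)
print(kept)
```

In a real model the R2 and Q2 values change after each removal, which is why the scores are re-read on every pass rather than computed once.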

 

3.5.6.3 – Principal Component Analysis – results

 The graphical outputs from the PCA modelling are reproduced in Appendix 12.  Table 3.3 summarises the results.  A description of the columns in Table 3.3 follows:   

Header column 1: PCA model name.

Header column 2: The number of principal components in final model.

Header column 3: The cumulative model R2 and Q2 values.

Header column 4: The values of R2 and Q2 for the first principal component.

Header column 5: The cumulative R2 and Q2 values for the first two principal components.

Header column 6: The cumulative R2 and Q2 values for the first three principal components.

Header column 7: The cumulative R2 and Q2 values for the first four principal components.

Header column 8: The question responses that are retained in the final model.
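For reference, the R2 and Q2 values in these columns can be read against their conventional definitions.  The formulation below is the standard one and is assumed, rather than confirmed, to correspond to the values reported by SIMCA P+; the symbols RSS, PRESS and SS_X are introduced here for illustration only.

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{SS}_X},
\qquad
Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}_X}
```

Here SS_X is the total sum of squares of the centred and scaled X block, RSS is the residual sum of squares after fitting the model, and PRESS is the predictive residual sum of squares obtained by cross-validation.  R2 therefore describes how well the model fits the data, and Q2 how well it predicts data left out during cross-validation.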

 

| PCA Model Name | No Of Comp | Model Cumulative R2 | Model Cumulative Q2 | 1st Principal Comp R2 | 1st Principal Comp Q2 | 2nd Principal Comp R2 | 2nd Principal Comp Q2 | 3rd Principal Comp R2 | 3rd Principal Comp Q2 | 4th Principal Comp R2 | 4th Principal Comp Q2 | Focus 2002 Question Responses Retained In Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PCA-UK1 | 1 | 0.731 | 0.556 | 0.731 | 0.556 | na | na | na | na | na | na | 7, 8a, 8b, 8c, 10f, 12, 16, 20a, 20c, 20d, 20e, 22, 23, 24, 25, 27b, 37, 38a, 38b, 40, 47, 53a, 53b, 53c, 55a. |
| PCA-UK2 | 4 | 0.956 | 0.628 | 0.5587 | 0.1163 | 0.782 | 0.295 | 0.8997 | 0.402 | 0.956 | 0.628 | 1, 3, 7, 8a, 8c, 9b, 13, 19, 20a, 20c, 20e, 20f, 22, 25, 26, 27a, 28, 29, 30a, 33, 34, 35, 37, 38a, 40, 42, 47, 50, 52, 53a, 53b, 55a, 57, 58, 59, 60. |
| PCA-SE1 | 1 | 0.837 | 0.764 | 0.837 | 0.764 | na | na | na | na | na | na | 5, 14, 17, 21, 23, 24, 25, 26, 28, 33, 34, 35, 44, 48, 55a, 55b, 58. |
| PCA-SE2 | 4 | 0.956 | 0.803 | 0.802 | 0.726 | 0.871 | 0.699 | 0.924 | 0.752 | 0.956 | 0.803 | 1, 3, 4, 5, 6, 7, 8a, 8b, 8c, 9a, 9b, 10a, 10c, 10d, 10e, 10f, 10g, 11, 12, 13, 14, 15, 16, 18, 19, 20a, 20b, 20c, 20d, 20e, 20f, 23, 24, 25, 26, 27a, 27b, 27c, 29, 30a, 30b, 31, 32, 33, 34, 35, 36, 37, 38a, 38b, 38c, 40, 42, 43, 46, 47, 50, 51, 52, 53a, 53b, 53c, 55a, 55b, 56, 57, 58, 59. |
| PCA-US1 | 1 | 0.911 | 0.859 | 0.911 | 0.859 | na | na | na | na | na | na | 3, 4, 11, 20a, 20e, 22, 24, 30b, 38a, 38b, 38c, 53a, 53b, 53c, 55a, 57. |
| PCA-US2 | 2 | 0.928 | 0.740 | 0.737 | 0.447 | 0.928 | 0.740 | na | na | na | na | 1, 2, 4, 5, 6, 7, 8a, 8b, 8c, 9a, 9b, 10c, 10d, 10g, 11, 12, 13, 14, 16, 18, 20a, 20b, 20c, 20d, 20e, 20f, 22, 24, 25, 27a, 27b, 27c, 29, 30a, 30b, 31, 33, 34, 35, 37, 38a, 38b, 38c, 39, 40, 42, 43, 46, 47, 52, 53a, 53b, 53c, 58. |

Table 3.3 – Summary of the PCA results


3.5.6.4 – Principal Component Analysis – conclusions

All of the PCA models built had cumulative R2 values in excess of 0.7 and Q2/R2 values in excess of 0.5.  Such results were not anticipated, given the inherently variable nature of human survey data.
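The two thresholds quoted above can be checked directly against the cumulative values in Table 3.3.  The short sketch below does this arithmetic; the variable and function names are illustrative, and the numbers are copied from the "Model Cumulative" columns of the table.

```python
# (cumulative R2, cumulative Q2) for each model, copied from Table 3.3.
table_3_3 = {
    "PCA-UK1": (0.731, 0.556),
    "PCA-UK2": (0.956, 0.628),
    "PCA-SE1": (0.837, 0.764),
    "PCA-SE2": (0.956, 0.803),
    "PCA-US1": (0.911, 0.859),
    "PCA-US2": (0.928, 0.740),
}

# Q2/R2 ratio for each model.
ratios = {name: q2 / r2 for name, (r2, q2) in table_3_3.items()}


def meets_thresholds(models, r2_min=0.7, ratio_min=0.5):
    """True if every model has R2 > r2_min and Q2/R2 > ratio_min."""
    return all(r2 > r2_min and q2 / r2 > ratio_min
               for r2, q2 in models.values())


all_pass = meets_thresholds(table_3_3)
lowest = min(ratios, key=ratios.get)   # model with the smallest Q2/R2

for name, ratio in ratios.items():
    print(f"{name}: R2 = {table_3_3[name][0]:.3f}, Q2/R2 = {ratio:.3f}")
```

On these figures every model passes both thresholds; the smallest Q2/R2 ratio belongs to PCA-UK2, at approximately 0.66.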

The number of principal components within the final models varies between 1 and 4.  Each of the principal components may be envisaged as a dimension of organisational culture.  Using the graphical PCA outputs reproduced in Appendix 12, it was relatively straightforward to identify and group those questions that load most highly on each principal component.  This analysis is detailed in Chapters 4 and 5.

As expected, the greatest amount of variation is accounted for by the first principal component in each model.  After optimisation, models PCA-UK2, PCA-SE2 and PCA-US2 retained 36, 68 and 54 questions respectively in the final model.  The large number of remaining questions suggests that the PCA models were not over-optimised.

As one would expect, not every question in the models built from those questions known to correlate with SIFR is retained in the final models built from all of the Focus 2002 questions.  For example, responses to Focus 2002 questions 10f, 16, 20d, 23, 24, 27b, 38b and 53c in model PCA-UK1 are not retained in model PCA-UK2.  This result is expected for two reasons.  Firstly, model PCA-UK1 was not optimised: those question responses in model PCA-UK1 that do not appear in model PCA-UK2 were evidently modelled less well than the responses retained within model PCA-UK2.  Secondly, model PCA-UK2 was built to best model the data; no a priori information regarding how the question responses relate to SIFR was entered into the model.

Models PCA-UK2, PCA-SE2 and PCA-US2 are all suitable for use in the national discrimination SIMCA analysis detailed in Chapter 5.  Inspection of the PCA score scatter plots in Appendix 12 indicates:

·  Sites UK2 and UK6 differ from the other UK sites.

·  Site SE1 differs from the other SE sites.

·  Site US4 differs from the other US sites.

Appendix 7 details the SIFR data for the UK, SE and US sites.  Cross-referencing the sites identified above with their SIFRs shows that sites UK2, SE1 and US4 have significantly higher SIFR values than the other sites in the same territory.  It can be concluded that the PCA models are measuring an organisational factor that is related to site SIFR performance.  One would expect this to be the case for models PCA-UK1, PCA-SE1 and PCA-US1, as these models are based on question responses known to (Pearson) correlate with SIFR.  The ability of models PCA-UK2, PCA-SE2 and PCA-US2 to discriminate between the better and poorer SIFR-performing sites without this a priori information is surprising.
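The comparison made earlier in this section between the questions in model PCA-UK1 and those retained in model PCA-UK2 can be reproduced mechanically from the question lists in Table 3.3.  The sketch below is illustrative only; the variable names are hypothetical and the question labels are copied from the table.

```python
# Question labels copied from the final column of Table 3.3.
pca_uk1 = ("7, 8a, 8b, 8c, 10f, 12, 16, 20a, 20c, 20d, 20e, 22, 23, 24, 25, "
           "27b, 37, 38a, 38b, 40, 47, 53a, 53b, 53c, 55a").split(", ")

pca_uk2 = ("1, 3, 7, 8a, 8c, 9b, 13, 19, 20a, 20c, 20e, 20f, 22, 25, 26, 27a, "
           "28, 29, 30a, 33, 34, 35, 37, 38a, 40, 42, 47, 50, 52, 53a, 53b, "
           "55a, 57, 58, 59, 60").split(", ")

retained_in_uk2 = set(pca_uk2)

# Questions present in PCA-UK1 but absent from PCA-UK2, in table order.
not_retained = [q for q in pca_uk1 if q not in retained_in_uk2]

# Questions common to both models, in table order.
shared = [q for q in pca_uk1 if q in retained_in_uk2]

print(len(pca_uk1), len(pca_uk2))
print(not_retained)
```

On the lists as printed in Table 3.3, this returns the eight questions cited as examples in the text together with questions 8b and 12, and confirms that model PCA-UK2 retains 36 questions.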

  3.6 – Summary and conclusions

Section 3.2 compared the Focus 2002 questions with organisational culture factors identified by previous culture surveys.  The comparison suggested that the Focus 2002 survey may be able to measure AstraZeneca’s organisational culture.

Section 3.3 explained the rationale behind restricting the scope of the further analysis to the UK, SE and US nations.  The 2002 GSHE annual accident reports were also examined with a view to identifying lagging SHE performance indicators that could be used in the analysis detailed within Chapters 4 and 5.  Examination of the data indicated that the number of reported significant injuries was the most appropriate metric to use.  The site mean and standard deviation were selected as the metrics to represent the Focus 2002 responses.

Section 3.4 explained how the Focus 2002 and GSHE significant-injury information was pre-treated and analysed.  The transformation of the raw Focus 2002 data into a data set ready for analysis and the method of converting the number of significant injuries into significant-injury frequency rates were described.  The mean and the standard deviation for each Focus 2002 question response were calculated for all UK, SE and US sites.

Section 3.5 explored the structure of the Focus 2002 data.  The incidence of missing responses was calculated and found to be satisfactorily low.  The percentage of personnel taking part in the Focus 2002 survey was calculated for each UK, SE and US site.  The percentage responses for sites UK9, SE8 and US3 were found to be anomalous, and these sites were therefore removed from the scope of the analysis detailed within Chapters 4 and 5.  The response distribution histograms for each of the Focus 2002 questions were plotted.  Inspection of the histograms indicated that the majority of the responses are positively or negatively skewed.  A number of the histograms showed two distinct peaks that may be indicative of the presence of sub-cultures.  Given the range of response distributions within the data, it is unlikely that a single distribution metric will best represent all of the question responses.

Pearson correlation coefficients were calculated between the Focus 2002 survey responses and the SIFRs for the UK, SE and US sites.  A significant number of question responses were found to be strongly correlated with SIFR for UK, SE and US sites.  The presence of a number of question responses that correlate with SIFR indicates that it is highly likely that a good predictive PLS model can be produced from them.  The joint UK, SE and US correlation matrix indicated weak (Pearson) correlations with SIFR at the national level.  Given this, the ability to build a single PLS model to predict SIFR performance in each of the three nations is questionable.

Based upon the results of the (Pearson) correlation matrices, three separate PLS models will be built for each group of UK, SE and US sites.  The first model will include only those mean question survey responses that are known to (Pearson) correlate with SIFRs.  The second model will include all of the mean question survey responses.  The third model will include all of the standard deviation question survey responses.  Given the a priori information, the first approach is expected to yield a robust, well-predicting PLS model.  Comparison of the predictive ability of the first two PLS models will give insight into how much SIFR predictive ability is contained in the survey responses with low (Pearson) correlations that would otherwise be ignored in typical OVAT analysis.  The third modelling approach will provide an insight into the usefulness of question response metrics other than the mean.

Section 3.5.6 summarised the PCA that was performed on the mean Focus 2002 question responses for the UK, SE and US sites.  The ability to produce PCA models for each group of UK, SE and US sites suggested that PLS modelling of the Focus 2002 question responses was likely to be possible.