Chapter 5

 


Chapter 5 – Classification Analysis of the AstraZeneca Focus 2002 responses

5.1 – Introduction

Sections 2.9.2.3 and 2.9.2.4 provided an overview of the PLS-DA and SIMCA methodologies, respectively. This chapter describes the methodology employed to address research questions 14 to 17 inclusive, listed in Table 1.1, and presents the resulting classification analyses.

5.2 – Classification of nations 

5.2.1 – Nation classification method 

5.2.1.1 – Nation classification - PLS-DA method

The mean responses of the UK, SE and US sites to the Focus 2002 questions were entered into SIMCA P+ (Version 10.0.4.0). Two separate three-class PLS-DA models, each with a priori class assignments, were built. In the Nation-PLS-DA1 model, the three a priori classes consisted of the UK, SE and US sites respectively. In the Nation-PLS-DA2 model, the same sites were allocated arbitrarily to three classes. Table 5.1 details the resulting class memberships for the two models.

Model Name | Class 1 | Class 2 | Class 3
Nation-PLS-DA1 | UK1, UK2, UK3, UK4, UK5, UK6, UK7, UK8, UK10 | SE1, SE2, SE3, SE4, SE5, SE6, SE7, SE9, SE10, SE11 | US1, US2, US4, US5, US6
Nation-PLS-DA2 | UK1, UK4, UK7, SE1, SE4, SE7, SE11, US4 | UK2, UK5, UK8, SE2, SE5, SE9, US1, US5 | UK3, UK6, UK10, SE3, SE6, SE10, US2, US6

Table 5.1 – Nation-PLS-DA class memberships

The purpose of Nation-PLS-DA2 was to test whether a valid PLS-DA model could be built from an arbitrarily selected set of a priori classes. The sites were numbered sequentially (e.g. UK1, UK2, UK3, etc.) and then assigned in numerical order to class 1, class 2, class 3, class 1, class 2, and so on. For both models, the X block consisted of the mean site responses to the Focus 2002 questions, and the default mean centering and scaling option within SIMCA P+ was selected. The models were refined iteratively: the variable importance plot (VIP) was inspected, responses with a VIP value of less than 0.8 were removed, and the model was re-run. This process was repeated until the score scatter plot showed good separation of the a priori classes and Q2(cum) was as high as could be achieved. YPredPS values and model membership probabilities at the 95% confidence level were calculated using the 'predictions – classification list' tool within SIMCA P+. The final models were validated using the response permutation validation function within SIMCA P+, with the number of random shuffles set to 20. This value was chosen because the number of possible permutations is restricted when there are only three classes.
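The arbitrary allocation rule used for Nation-PLS-DA2 can be expressed as a short round-robin sketch. It assumes that the sites were listed in the order UK, SE, US, each in ascending site number, which is the ordering implied by Table 5.1; the function name `round_robin_classes` is hypothetical and is not part of SIMCA P+.

```python
from itertools import cycle

def round_robin_classes(sites, n_classes=3):
    """Assign sites to classes 1..n_classes in turn: 1, 2, 3, 1, 2, 3, ..."""
    classes = {k: [] for k in range(1, n_classes + 1)}
    for site, k in zip(sites, cycle(range(1, n_classes + 1))):
        classes[k].append(site)
    return classes

# Assumed site order: UK sites, then SE sites, then US sites, by site number.
uk = ["UK1", "UK2", "UK3", "UK4", "UK5", "UK6", "UK7", "UK8", "UK10"]
se = ["SE1", "SE2", "SE3", "SE4", "SE5", "SE6", "SE7", "SE9", "SE10", "SE11"]
us = ["US1", "US2", "US4", "US5", "US6"]

for k, members in round_robin_classes(uk + se + us).items():
    print(f"Class {k}: {', '.join(members)}")
```

Under this assumed ordering, class 1 ends with US4, class 2 with US1 and US5, and class 3 with US2 and US6.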

 

5.2.1.2 – Nation classification - SIMCA method 

The PCA-UK2, PCA-SE2 and PCA-US2 models created in Section 3.5.6 were reloaded into SIMCA P+ (Version 10.0.4.0). The mean Focus 2002 responses of all UK, SE and US sites were entered into the prediction set of each of these three models, and the default mean centering and scaling option within SIMCA P+ was selected. The distance to model of every UK, SE and US site was calculated for models PCA-UK2, PCA-SE2 and PCA-US2 using the 'Predictions-X Block-Column Plot' option, and the membership probability of each site at the 95% confidence level was calculated using the 'Prediction-Prediction List' option. A Coomans plot of PCA-UK2 versus PCA-SE2 was created using the 'Coomans Plot' option within SIMCA P+.
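The distance-to-model test can be sketched as follows. This is a minimal illustration assuming the commonly documented SIMCA definitions: the residual standard deviation of an observation, s_i, uses K − A degrees of freedom (K variables, A components); the pooled training-set residual standard deviation, s0, uses (N − A − A0)(K − A) degrees of freedom, with A0 = 1 for mean-centred data; and the normalised distance is s_i/s0. The critical distance is taken as the square root of an F critical value supplied by the caller (for example from `scipy.stats.f.ppf(0.95, df1, df2)`). The exact correction factors applied inside SIMCA P+ are not reproduced, and all function names and numbers are hypothetical.

```python
import math

def pooled_residual_sd(train_residuals, n_components, a0=1):
    """s0: pooled residual SD of a training-set residual matrix (N rows x K columns)."""
    n = len(train_residuals)
    k = len(train_residuals[0])
    ss = sum(e * e for row in train_residuals for e in row)
    return math.sqrt(ss / ((n - n_components - a0) * (k - n_components)))

def dmodx(residual_row, s0, n_components):
    """Normalised distance to model (s_i / s0) of one predicted observation."""
    k = len(residual_row)
    s_i = math.sqrt(sum(e * e for e in residual_row) / (k - n_components))
    return s_i / s0

def within_critical_distance(distance, f_crit):
    """True if the observation lies inside the critical distance sqrt(F_crit)."""
    return distance <= math.sqrt(f_crit)

# Hypothetical residuals: 4 training sites x 3 variables, one-component model.
train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
s0 = pooled_residual_sd(train, n_components=1)
near = dmodx([0.5, 0.5, 0.5], s0, n_components=1)
far = dmodx([3.0, 3.0, 3.0], s0, n_components=1)
f_crit = 3.0  # placeholder for an F critical value at the 95% level
print(within_critical_distance(near, f_crit), within_critical_distance(far, f_crit))
```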

 

5.2.2 – Nation classification results 

5.2.2.1 – Nation classification - PLS-DA results 

The six-component Nation-PLS-DA1 model had the following values: R2X = 0.883, R2Y = 0.986 and Q2(cum) = 0.916. The first four components account for 96% of the variance, with Q2(cum) = 0.88. SIMCA P+ was unable to produce a model for the Nation-PLS-DA2 data. The score and loadings scatter plots for the first three principal components of model Nation-PLS-DA1 are reproduced in Figures 5.1 to 5.4 inclusive, and Figure 5.5 shows the resulting variable importance plot.
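The relationship between per-component Q2 and the Q2(cum) values quoted above can be illustrated with a small sketch. It assumes the usual SIMCA definitions: Q2 = 1 − PRESS/SS for each component, where SS is the residual sum of squares of Y before that component is added, and Q2(cum) = 1 − ∏(PRESS/SS) over the components. The PRESS and SS values are hypothetical and are not taken from the Nation-PLS-DA1 model.

```python
from math import prod

def q2_per_component(press, ss_prev):
    """Q2 of each component: 1 - PRESS_a / SS_(a-1)."""
    return [1.0 - p / s for p, s in zip(press, ss_prev)]

def q2_cum(press, ss_prev):
    """Cumulative Q2: 1 - product over components of PRESS_a / SS_(a-1)."""
    return 1.0 - prod(p / s for p, s in zip(press, ss_prev))

# Hypothetical two-component example.
press = [20.0, 5.0]      # prediction error sum of squares for each component
ss_prev = [100.0, 30.0]  # residual SS of Y before each component was added
print(q2_per_component(press, ss_prev))  # roughly [0.8, 0.833]
print(q2_cum(press, ss_prev))            # roughly 0.967
```

Because each PRESS/SS ratio is below 1 for a component that improves prediction, adding such a component raises Q2(cum).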

Figure 5.1 – Nation-PLS-DA1 – Score scatter plot of the first two principal components showing the discrimination of the UK, SE and US sites

Figure 5.2 – Nation-PLS-DA1 – Loadings scatter plot for the first two principal components showing those question responses that discriminate the UK, SE and US sites

Figure 5.3 – Nation-PLS-DA1 – Score scatter plot for principal components 2 and 3 showing the discrimination of the UK, SE and US sites

Figure 5.4 – Nation-PLS-DA1 – Loadings scatter plot for principal components 2 and 3 showing those question responses that discriminate the UK, SE and US sites

Figure 5.5 – Nation-PLS-DA1 – Variable importance plot

The model overview plot, response permutation validation plots, membership probabilities and YPredPS values for model Nation-PLS-DA1 are reproduced in Appendix 14. The Q2 intercepts on the ordinate of all three permutation validation plots for model Nation-PLS-DA1 are lower than –0.5. The predictive abilities of the permuted models are therefore substantially lower than that of the non-permuted model, and model Nation-PLS-DA1 is considered valid. Figure 5.1 shows that the final Nation-PLS-DA1 model is able to discriminate the SE sites from the UK and US sites. The model is also able to discriminate the UK and US sites, although the UK4 and US6 sites are not discriminated well by the first two components; they are, however, separated by the third principal component, as shown in Figure 5.3. The SE sites are clearly discriminated from the UK and US sites by the first principal component. SE is characterised by higher than average responses to the questions to the left of the origin of the loadings scatter plot in Figure 5.2, and lower than average responses to the questions to the right of it. Examining the variable importance plot in Figure 5.5 together with the loadings plot in Figure 5.2 indicates which questions are most able to discriminate the SE sites from the UK and US sites. SE is discriminated from the UK and US sites by greater than average responses to questions 6, 25, 34, 37, 45, 48, 49 and 60, and less than average responses to questions 7, 10d, 10g and 39. The US is not well discriminated from the UK by the first principal component. Figure 5.3 shows that the second and, to a lesser extent, the third principal components discriminate the US sites from the UK sites. The US is discriminated from the UK by higher than average responses to the questions in the upper half, and lower than average responses to the questions in the lower half, of the loadings scatter plot in Figure 5.2.
Conversely, the UK is discriminated from the US by higher than average responses to the questions in the lower half, and lower than average responses to the questions in the upper half, of the loadings scatter plot in Figure 5.2. The questions furthest from the origin of Figure 5.2 are the most influential in distinguishing the UK and US sites. All of the above results are consistent with a visual comparison against the distribution histograms of the Focus 2002 responses in Appendix 10. The good discrimination between the UK, SE and US sites is exemplified by the YPredPS values, which unambiguously and correctly allocate every site to its nation. As indicated by its inability to extract any principal components, SIMCA P+ was unable to model the information contained in Nation-PLS-DA2.
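The Q2 intercept used to judge the permutation validation discussed above can be sketched as an ordinary least-squares fit of Q2 against the correlation between the original and permuted Y vectors, with the unpermuted model included at a correlation of 1. This is an assumption about how the SIMCA P+ regression line is formed, and the correlation and Q2 values are hypothetical; a commonly cited rule of thumb treats a Q2 intercept below 0.05 as evidence against chance correlation.

```python
def permutation_intercept(correlations, q2_values):
    """Least-squares intercept of Q2 regressed on corr(original Y, permuted Y)."""
    n = len(correlations)
    mean_x = sum(correlations) / n
    mean_y = sum(q2_values) / n
    sxx = sum((x - mean_x) ** 2 for x in correlations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(correlations, q2_values))
    slope = sxy / sxx
    return mean_y - slope * mean_x

# Hypothetical points: the unpermuted model (correlation 1.0) plus two permutations.
corr = [1.0, 0.0, 0.5]
q2 = [0.9, -0.8, 0.05]
intercept = permutation_intercept(corr, q2)
print(round(intercept, 3))  # -0.8
```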

 

5.2.2.2 – Nation classification - SIMCA results 

The results of the SIMCA analysis are presented graphically in Figures 5.6 to 5.9 inclusive.  Class membership probabilities of each UK, SE and US site belonging to models PCA-SE2, PCA-UK2 and PCA-US2 are tabulated in Appendix 15.  Inspection of Figure 5.6 demonstrates that: 

·        All UK and US sites are above the critical distance to model. 

·        All SE sites are below the critical distance to model. 

The PCA-SE2 model is therefore able to discriminate SE sites from UK and US sites.

Figure 5.6 – PCA-SE2 – Distance to model plot showing all non-SE sites above the critical distance to model

 Figure 5.7 demonstrates that:

·        All SE and US sites are above the critical distance to model. 

·        All UK sites are below the critical distance to model.

The PCA-UK2 model is therefore able to discriminate UK sites from SE and US sites.

Figure 5.7 – PCA-UK2 – Distance to model plot showing all non-UK sites above the critical distance to model

Figure 5.8 demonstrates that:

·        All UK and SE sites are above the critical distance to model.

·        All US sites are below the critical distance to model.   

The PCA-US2 model is therefore able to discriminate US sites from UK and SE sites.

 

Figure 5.8 – PCA-US2 – Distance to model plot showing all non-US sites above the critical distance to model

 

The Coomans plot in Figure 5.9 presents the information contained within Figures 5.6 and 5.7 more compactly. Figure 5.9 shows that:

All UK sites are within the critical distance to model of PCA-UK2 and outside the critical distance to model of PCA-SE2.
All SE sites are within the critical distance to model of PCA-SE2 and outside the critical distance to model of PCA-UK2.
All US sites are outside the critical distance to model of both PCA-UK2 and PCA-SE2.

Figure 5.9 – Coomans plot of models PCA-UK2 versus PCA-SE2
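Reading a Coomans plot amounts to comparing each site's distance to two models with the two critical distances. The sketch below assigns a site to one of the four regions of the plot; model A and model B could stand for PCA-UK2 and PCA-SE2, but the distances, critical values and site labels are hypothetical.

```python
def coomans_region(dist_a, dist_b, crit_a, crit_b):
    """Return the Coomans plot region of one site."""
    inside_a = dist_a <= crit_a
    inside_b = dist_b <= crit_b
    if inside_a and inside_b:
        return "both models"
    if inside_a:
        return "model A only"
    if inside_b:
        return "model B only"
    return "neither model"

# Hypothetical distances (A = PCA-UK2, B = PCA-SE2); critical distance 1.5 for both.
sites = {"UK-like": (0.9, 4.2), "SE-like": (5.1, 1.1), "US-like": (3.8, 4.6)}
for name, (d_a, d_b) in sites.items():
    print(name, coomans_region(d_a, d_b, 1.5, 1.5))
```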

               

As tabulated in Appendix 15, the probabilities of sites from one nation belonging to the PCA model of another nation are all less than 1.3 × 10⁻², and are generally several orders of magnitude lower.

 

5.2.3 – Nation classification conclusions   

The PLS-DA modelling detailed in Section 5.2.2.1 and the SIMCA modelling detailed in Section 5.2.2.2 have both been shown to discriminate the UK, SE and US sites. SIMCA P+ was not able to model the randomly assigned a priori class information contained within the Nation-PLS-DA2 data. One can therefore conclude that the discrimination achieved by Nation-PLS-DA1 is not a chance result, nor merely an artefact of the multivariate nature of the Focus 2002 responses. This conclusion is reinforced by the response permutation validation plots, which indicated that the predictive ability of the permuted models was very much lower than that of the original models. The PLS-DA YPredPS values detailed in Appendix 14 cannot be compared numerically with the membership probabilities arising from the SIMCA modelling detailed in Appendix 15. A subjective comparison of the two techniques nevertheless indicates that PLS-DA classifies more consistently than SIMCA. For example, sites UK5, UK7, UK8, SE3, SE10 and US2 all have SIMCA probabilities of belonging to their own nation of 0.23 or less, whereas with PLS-DA all sites have YPredPS values (listed in Appendix 14.5) of 1 ± 0.1 for their own nation. That PLS-DA and SIMCA models can discriminate the AstraZeneca UK, SE and US sites is unsurprising, since one would expect the differences in response patterns to arise from a combination of national and organisational cultural differences. The work in this chapter was unable to determine whether national cultural differences dominate over organisational cultural differences or vice versa. It is hypothesised that national cultural differences dominate; however, further work in this area is required to support this hypothesis.
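The allocation of a site from its YPredPS values can be sketched as choosing the class with the largest predicted value. The tolerance of 0.1 mirrors the YPredPS range of 1 plus or minus 0.1 quoted above; treating an allocation as unambiguous only when the winning value lies within that tolerance of 1 and every other value lies within it of 0 is an assumption made for illustration, and the predicted values are hypothetical.

```python
def allocate(ypred, class_names, tol=0.1):
    """Return (predicted class, unambiguous?) from one site's YPredPS vector."""
    best = max(range(len(ypred)), key=lambda i: ypred[i])
    others_near_zero = all(abs(v) <= tol for i, v in enumerate(ypred) if i != best)
    unambiguous = abs(ypred[best] - 1.0) <= tol and others_near_zero
    return class_names[best], unambiguous

nations = ["UK", "SE", "US"]
print(allocate([0.97, 0.05, -0.02], nations))  # ('UK', True)
print(allocate([0.45, 0.40, 0.15], nations))   # ('UK', False)
```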
By superimposing the PLS-DA score and loadings scatter plots, it has been possible to identify the questions that discriminate the UK, SE and US sites. Table 5.2 details the Focus 2002 question responses that discriminate the SE sites from the UK and US sites, and Table 5.3 details those that discriminate the UK and US sites. Bearing in mind the subjective nature of assigning themes to organisational dimensions, the following observations are made with respect to the information in Tables 5.2 and 5.3:

SE personnel are far less satisfied than UK and US personnel with pay, rewards and management.
UK personnel are distinguished from their US colleagues in that they are more likely to feel undervalued and perhaps oppressed by management.  



Focus 2002 Question Number | Focus 2002 Question | SE Response Bias | US/UK Response Bias
34 | I am happy with the degree of choice and flexibility I have in shaping my pay and benefit package. | Tend to disagree. | Tend to agree.
45 | AstraZeneca is socially responsible in the community. | Tend to disagree. | Tend to agree.
6 | Management supports equal opportunity for all employees. | Tend to disagree. | Tend to agree.
49 | I am frequently worried about being made redundant. | Tend to disagree. | Tend to agree.
10a | In AstraZeneca: Our traditional ways of doing things can be challenged | Tend to disagree. | Tend to agree.
48 | AstraZeneca demonstrates commitment to the health and well-being of its employees. | Tend to disagree. | Tend to agree.
56 | Decision-making in AstraZeneca is: | Tend to 'too fast' through to 'no opinion'. | Tend to 'about right'.
59 | How good a job is AstraZeneca doing in linking pay to performance. | Tend to 'no opinion'. | Tend to 'very good'.
25 | Pay in AstraZeneca is as good as or better than the pay in other organisations in our industry. | Tend to disagree. | Tend to agree.
13 | AstraZeneca makes adequate use of recognition other than money to encourage good performance. | Tend to disagree. | Tend to agree.
39 | In AstraZeneca, there is adequate opportunity for employees to learn about internal vacancies. | Tend to agree. | Tend to disagree.
10g | In AstraZeneca: People have fun while doing their work. | Tend to agree. | Tend to disagree.
10d | In AstraZeneca: People dare to take the initiative. | Tend to agree. | Tend to disagree.

Table 5.2 – The Focus 2002 question responses that discriminate the SE sites from the UK and US sites

 


Focus 2002 Question Number | Focus 2002 Question | UK Bias | US Bias
60 | At the present time, are you seriously considering leaving AstraZeneca? | Tend to yes. | Tend to no/don't know.
46 | My team work well together. | Tend to agree. | Tend to disagree.
26 | I receive the training and development I need to help prepare me for other roles. | Tend to disagree. | Tend to agree.
41 | I believe AstraZeneca is an environmentally responsible company. | Tend to disagree. | Tend to agree.
10e | In AstraZeneca: New ideas can fail without penalty to the originating person | Tend to disagree. | Tend to agree.
37 | Management supports diversity in the workplace. | Tend to disagree. | Tend to agree.
7 | I think my job is considered important in AstraZeneca. | Tend to disagree. | Tend to agree.
10c | In AstraZeneca: People receive recognition for innovation | Tend to disagree. | Tend to agree.
43 | There is good co-operation across functions/companies in AstraZeneca. | Tend to disagree. | Tend to agree.

Table 5.3 – The Focus 2002 question responses that discriminate the UK and US sites

 

5.3 – Classification of sites based upon SIFR performance

5.3.1 – SIFR site classification methods

5.3.1.1 – SIFR PLS-DA method

The mean responses to the Focus 2002 survey questions for each of the UK, SE and US sites were entered into SIMCA P+ (Version 10.0.4.0). The default mean centering and scaling option was selected within SIMCA P+.  The following four separate PLS-DA models were created:  

SIFR-PLS-DA-UK1 model consisted of all UK sites classified into one of two a priori classes. 
SIFR-PLS-DA-UK2 model consisted of all UK sites classified into one of three a priori classes. 
SIFR-PLS-DA-SE1 model consisted of all SE sites classified into one of two a priori classes. 
SIFR-PLS-DA-US1 model consisted of all US sites classified into one of two a priori classes.
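PLS-DA represents class membership as a dummy Y block with one 0/1 column per class. The sketch below builds that block for the SIFR-PLS-DA-US1 classes listed in Table 5.4; the function name `dummy_y` is hypothetical, and SIMCA P+ is understood to create the equivalent dummy variables automatically when the PLS-DA option is chosen.

```python
def dummy_y(labels, classes=None):
    """Return (class order, dummy matrix) with one 0/1 column per class."""
    if classes is None:
        classes = sorted(set(labels))
    matrix = [[1 if label == c else 0 for c in classes] for label in labels]
    return classes, matrix

# SIFR-PLS-DA-US1 a priori classes (Table 5.4).
sites = ["US1", "US2", "US4", "US5", "US6"]
labels = [1, 2, 2, 1, 1]
classes, y = dummy_y(labels)
for site, row in zip(sites, y):
    print(site, row)
```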

The a priori class limits were chosen arbitrarily to represent reasonably distinct bands of SIFR performance. A three-class as well as a two-class model was created for the UK in order to test the discriminatory power of PLS-DA; the UK was chosen for this exercise because it possessed three reasonably distinct bands of SIFR performance. The a priori class ranges for each model are summarised in Table 5.4. The PLS-DA model option within SIMCA P+ was selected. The initial models were refined iteratively: the variable importance plot (VIP) was inspected, responses with a VIP of less than 0.8 were removed, and the model was re-run. This process was repeated until the score scatter plot showed good separation between the a priori classes and Q2(cum) was as high as could be achieved.
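The VIP-based pruning step described above can be sketched in Python. It assumes the standard VIP definition, VIP_j = sqrt(K · Σ_a SSY_a (w_ja/‖w_a‖)² / Σ_a SSY_a), where K is the number of X variables, w_a is the X-weight vector of component a and SSY_a is the Y sum of squares explained by that component. The weights, SSY values and variable names are hypothetical, and the re-fitting of the model in SIMCA P+ after each pruning step is not reproduced here.

```python
import math

def vip_scores(weights, ssy):
    """VIP of each X variable.

    weights: K rows (variables) x A columns (components) of PLS X-weights.
    ssy: Y sum of squares explained by each of the A components.
    """
    k = len(weights)
    n_comp = len(ssy)
    norms = [math.sqrt(sum(weights[j][a] ** 2 for j in range(k))) for a in range(n_comp)]
    total = sum(ssy)
    return [
        math.sqrt(k * sum(ssy[a] * (weights[j][a] / norms[a]) ** 2 for a in range(n_comp)) / total)
        for j in range(k)
    ]

def prune(names, vip, threshold=0.8):
    """Keep only the variables whose VIP is at or above the threshold."""
    return [name for name, v in zip(names, vip) if v >= threshold]

# Hypothetical three-variable, two-component example.
w = [[0.8, 0.0], [0.6, 0.6], [0.0, 0.8]]
ssy = [3.0, 1.0]
vip = vip_scores(w, ssy)
print([round(v, 3) for v in vip])            # [1.2, 1.039, 0.693]
print(prune(["var1", "var2", "var3"], vip))  # ['var1', 'var2']
```

Because the squared VIP values average to 1 by construction, a cut-off of 0.8 removes variables whose contribution is clearly below average.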

 



SIFR-PLS-DA Model Name | SIFR Class Range (yr⁻¹) | Class 1 | Class 2 | Class 3
SIFR-PLS-DA-UK1 | Class 1: <6; Class 2: >6 | UK1, UK3, UK4, UK5, UK7, UK8, UK10 | UK2, UK6 | Not applicable
SIFR-PLS-DA-UK2 | Class 1: <2; Class 2: 2 to 6; Class 3: >6 | UK3, UK4, UK7 | UK1, UK5, UK8, UK10 | UK2, UK6
SIFR-PLS-DA-SE1 | Class 1: <6; Class 2: >6 | SE2, SE3, SE4, SE5, SE6, SE7, SE10, SE11 | SE1, SE9 | Not applicable
SIFR-PLS-DA-US1 | Class 1: <7; Class 2: >7 | US1, US5, US6 | US2, US4 | Not applicable

Table 5.4 – SIFR-PLS-DA model classes

5.3.1.2 – SIFR SIMCA method 

All of the UK sites were classified into one of three classes, and the SE and US sites into one of two classes, based upon their site injury frequency rates. The classes, together with the corresponding SIMCA model names, are given in Table 5.5.


Nation | SIFR Class Range (yr⁻¹) | Class 1 [Model Name] | Class 2 [Model Name] | Class 3
UK | Class 1: <2; Class 2: 2 to 6; Class 3: >6 | UK3, UK4, UK7 [SIFR-SIMCA-UK1] | UK1, UK5, UK8, UK10 [SIFR-SIMCA-UK2] | UK2, UK6
SE | Class 1: <6; Class 2: >6 | SE2, SE3, SE4, SE5, SE6, SE7, SE10, SE11 [SIFR-SIMCA-SE1] | SE1, SE9 | Not applicable
US | Class 1: <7; Class 2: >7 | US1, US5, US6 [SIFR-SIMCA-US1] | US2, US4 | Not applicable

Table 5.5 – SIFR-SIMCA model classes
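The banding of sites by SIFR in Tables 5.4 and 5.5 can be sketched as a simple threshold lookup. The tables do not state how a rate falling exactly on a class limit was treated, so this sketch assumes such a value goes to the higher class; the SIFR values used below are hypothetical.

```python
from bisect import bisect_right

def sifr_class(sifr, limits):
    """Return class 1, 2, ... for a SIFR value given ascending class limits.

    A value equal to a limit is placed in the higher class (an assumption).
    """
    return bisect_right(limits, sifr) + 1

uk_limits = [2.0, 6.0]  # UK bands: <2, 2 to 6, >6 (per year)
us_limits = [7.0]       # US bands: <7, >7 (per year)
for rate in (1.5, 4.0, 8.2):
    print(rate, sifr_class(rate, uk_limits))
print(6.5, sifr_class(6.5, us_limits))
```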

The mean Focus 2002 responses of the UK, SE and US sites were entered into SIMCA P+ (Version 10.0.4.0), and the default mean centering and scaling option within SIMCA P+ was selected. The SIMCA PCA models were refined using the same methodology explained in Section 5.2.1.2. The SIMCA P+ 'Predictions-X Block-Column Plot' option was used to calculate the distance to model of:

All UK sites from SIFR-SIMCA-UK1 and SIFR-SIMCA-UK2;
All SE sites from SIFR-SIMCA-SE1;
All US sites from SIFR-SIMCA-US1.

 Class membership probabilities at the 95% confidence level were calculated for each site using the ‘Prediction-Prediction List’ option within SIMCA P+.  A Coomans plot of SIFR-SIMCA-UK1 versus SIFR-SIMCA-UK2 was created.   

5.3.2 – SIFR classification results

5.3.2.1 – SIFR PLS-DA results

All SIFR-PLS-DA models were successfully built.  The results of the SIFR-PLS-DA models are summarised in Table 5.6.    

Model Name | Number of Principal Components | R2X | R2Y | Q2(cum)
SIFR-PLS-DA-UK1 | 2 | 0.793 | 0.992 | 0.922
SIFR-PLS-DA-UK2 | 3 | 0.705 | 0.950 | 0.689
SIFR-PLS-DA-SE1 | 3 | 0.881 | 0.976 | 0.590
SIFR-PLS-DA-US1 | 1 | 0.636 | 0.770 | 0.627

Table 5.6 – SIFR-PLS-DA results summary

 

The score and loadings scatter plots for the SIFR-PLS-DA models are reproduced in Figures 5.10 to 5.20. The graphical model overview, response permutation validation plots, model membership probabilities and YPredPS values for the SIFR-PLS-DA models are reproduced in Appendix 16. The Q2 intercepts on the ordinate of all of the SIFR-PLS-DA permutation validation plots are below 0.05, so the SIFR-PLS-DA models can be assumed to have some validity. The score scatter plots for the two-class models SIFR-PLS-DA-UK1, SIFR-PLS-DA-SE1 and SIFR-PLS-DA-US1 (Figures 5.10, 5.16 and 5.19) indicate that PLS-DA is able to discriminate the poorer performing AstraZeneca sites in the UK, SE and US. Figure 5.13, for the three-class SIFR-PLS-DA-UK2 model, indicates that PLS-DA is able to discriminate sites with 'good', 'average' and 'poor' SIFR performance. Inspection of the variable importance plots for SIFR-PLS-DA-UK1, SIFR-PLS-DA-UK2, SIFR-PLS-DA-SE1 and SIFR-PLS-DA-US1 (Figures 5.12, 5.15, 5.18 and 5.21 respectively) indicates that:

The 28 question responses retained within model SIFR-PLS-DA-UK1 have similar variable importance.
The majority of the 71 question responses retained in model SIFR-PLS-DA-UK2 have similar variable importance; question responses 8b, 12, 39, 30b and 41 have marginally higher variable importance.
Of the 74 question responses retained in model SIFR-PLS-DA-SE1, question responses 15, 17, 21, 22, 23, 28, 41, 44 and 60 are considerably more influential at discriminating the SE sites than the other question responses.
Of the 39 question responses retained in model SIFR-PLS-DA-US1, the majority have similar variable importance; question responses 6, 19, 37 and 49 have relatively low variable importance.

Figure 5.10 – SIFR-PLS-DA-UK1 - Score scatter plot for the first two principal components showing the discrimination of the poorer SIFR performing UK sites (class 2) from the other UK sites (class 1)

Figure 5.11 – SIFR-PLS-DA-UK1 – Loadings scatter plot for the first two principal components showing those Focus survey question responses that discriminate the poorer SIFR performing UK sites (class 2) from the other UK sites (class 1)

 

 

Figure 5.12 - SIFR-PLS-DA-UK1 – Variable importance plot

 

Figure 5.13 – SIFR-PLS-DA-UK2 - Score scatter plot for the first two principal components showing the discrimination between poor (class 3), average (class 2) and better SIFR performing UK sites (class 1)

 

 

Figure 5.14 – SIFR-PLS-DA-UK2 – Loadings scatter plot for the first two principal components showing those questions that discriminate the poor (class 3), average (class 2) and better (class 1) SIFR performing UK sites

 

Figure 5.15 - SIFR-PLS-DA-UK2 – Variable importance plot

 

Figure 5.16 – SIFR-PLS-DA-SE1 - Score scatter plot for the first two principal components showing the discrimination of the poorer SIFR performing SE sites (class 2) from the other SE sites (class 1)

 

Figure 5.17 – SIFR-PLS-DA-SE1 – Loadings scatter plot for the first two principal components showing those questions that discriminate the poorer SIFR performing SE sites (class 2) from the other SE sites (class 1)

 

 

 

Figure 5.18 - SIFR-PLS-DA-SE1 – Variable importance plot

 

 

Figure 5.19 – SIFR-PLS-DA-US1 – Score scatter plot for the first principal component showing discrimination of the poorer SIFR performing US sites (class 2) from the other US sites (class 1)

 

 

 

Figure 5.20 – SIFR-PLS-DA-US1 – Loadings column plot for the first principal component showing those questions that discriminate the poorer SIFR performing US sites (class 2) from the other US sites (class 1)

 

Figure 5.21 - SIFR-PLS-DA-US1 - Variable importance plot

 

Section 2.9.2.2 explained how comparison of a model's score and loadings scatter plots allows identification of the questions that are influential in discriminating the a priori classes, in this case those that discriminate sites of differing SIFR performance. Inspection of Figures 5.10 and 5.11 for model SIFR-PLS-DA-UK1 shows that the poorer SIFR performing sites UK2 and UK6 are discriminated from the other UK sites by higher than average responses to the Focus 2002 questions to the left of the origin of the loadings scatter plot in Figure 5.11. Inspection of Figures 5.16 and 5.17 for model SIFR-PLS-DA-SE1 shows that the poorer SIFR performing sites SE1 and SE9 are discriminated from the other SE sites by higher than average responses to the questions in the lower left quadrant, and lower than average responses to the questions in the upper right quadrant, of the loadings scatter plot in Figure 5.17. Finally, inspection of Figures 5.19 and 5.20 for model SIFR-PLS-DA-US1 shows that the poorer SIFR performing sites US2 and US4 are discriminated from the other US sites by above average responses to the questions in the lower half, and below average responses to the questions in the upper half, of Figure 5.20.

5.3.2.2 – SIFR SIMCA results

 

PCA models SIFR-SIMCA-UK1, SIFR-SIMCA-UK2, SIFR-SIMCA-SE1 and SIFR-SIMCA-US1 were successfully created.  The results of the modelling are summarised in Table 5.7.  

 

Model Name | Sites | Number of Components | R2X | Q2(cum)
SIFR-SIMCA-UK1 | UK3, UK4, UK7 | 1 | 0.936 | 0.889
SIFR-SIMCA-UK2 | UK1, UK5, UK8, UK10 | 1 | 0.860 | 0.793
SIFR-SIMCA-SE1 | SE2, SE3, SE4, SE5, SE6, SE7, SE10, SE11 | 3 | 0.980 | 0.930
SIFR-SIMCA-US1 | US1, US5, US6 | 1 | 0.973 | 0.951

Table 5.7 – SIFR-SIMCA – Results summary

 

The distance to model information is provided in Figures 5.22 to 5.26.  The SIMCA P+ graphical outputs and membership probability tables for the above models are provided in Appendix 17.

 

The ability of the model SIFR-SIMCA-UK1 to discriminate UK3, UK4 and UK7 from the other sites can be determined by inspection of Figure 5.22.   Figure 5.22 shows that sites UK3, UK4 and UK7 (class 1) are within the critical distance to model of SIFR-SIMCA-UK1.  All other sites are found outside the critical distance to model.

Figure 5.22 SIFR-SIMCA-UK1 - Distance to model plot showing the best SIFR performing UK Sites (Class 1:  UK3, UK4, UK7) below the critical distance to model

 

The ability of model SIFR-SIMCA-UK2 to discriminate UK1, UK5, UK8 and UK10 (class 2) from the other UK sites can be assessed by inspection of Figure 5.23, which shows that UK1, UK5, UK8 and UK10 are within the critical distance to model of SIFR-SIMCA-UK2. All other sites lie above the critical distance to model.

Figure 5.23 SIFR-SIMCA-UK2 - Distance to model plot showing the average SIFR performing UK sites (Class 2: UK1, UK5, UK8, UK10) below the critical distance to model

Inspection of the Coomans plot in Figure 5.24 indicates that sites UK2 and UK6 are outside the critical distance to model of both SIFR-SIMCA-UK1 and SIFR-SIMCA-UK2. 

Figure 5.24 – SIFR-SIMCA-UK1 versus SIFR-SIMCA-UK2 Coomans plot showing the poorer SIFR performing UK sites (Class 3: UK2, UK6) outside the critical distance to model for both models

Figure 5.25 – SIFR-SIMCA-SE1 - Distance to model plot showing the poorer SIFR performing SE sites (Class 2: SE1, SE9) above the critical distance to model

 

The SIFR-SIMCA-SE1 distance to model plot in Figure 5.25 shows that sites SE1 and SE9 (class 2) are both above the critical distance to model.

Figure 5.26 – SIFR-SIMCA-US1 - Distance to model plot showing the poorer SIFR performing US sites (Class 2: US2, US4) above the critical distance to model

 

The SIFR-SIMCA-US1 distance to model plot in Figure 5.26 shows that sites US2 and US4 (class 2) are both above the critical distance to model. The SIFR-SIMCA model membership probabilities are detailed in Appendix 17. For SIFR-SIMCA-UK1, all sites other than UK3, UK4 and UK7 have membership probabilities of 0, whereas UK3, UK4 and UK7 all have membership probabilities greater than 0.23. Sites UK2 and UK6, which belong to neither SIFR-SIMCA-UK1 nor SIFR-SIMCA-UK2, have membership probabilities of 0. For SIFR-SIMCA-UK2, sites other than UK1, UK5, UK8 and UK10 have membership probabilities of less than 0.03, whereas UK1, UK5, UK8 and UK10 all have membership probabilities greater than 0.21. For SIFR-SIMCA-SE1, site SE1 has a membership probability of 0 and SE9 a membership probability of 0.02; all other SE sites have membership probabilities greater than 0.16. For SIFR-SIMCA-US1, site US4 has a membership probability of 0 and US2 a membership probability of 0.006, while the remaining US sites have membership probabilities greater than 0.24; the model is therefore able to discriminate sites US2 and US4 from the other US sites. On this evidence, models SIFR-SIMCA-UK1, SIFR-SIMCA-UK2, SIFR-SIMCA-SE1 and SIFR-SIMCA-US1 are able to discriminate the poorer SIFR performing sites from the better SIFR performing sites in the UK, SE and US respectively.
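The membership decisions above amount to comparing each probability with 0.05, the threshold corresponding to the 95% confidence level. The sketch assumes that a site counts as a member when its probability is at least 0.05. The US2 and US4 values are those quoted above, while 0.24 is only the lower bound quoted for the remaining US sites, used here under the placeholder label "US_other".

```python
def split_by_membership(probabilities, alpha=0.05):
    """Split sites into (members, non-members) at the (1 - alpha) confidence level."""
    members = sorted(s for s, p in probabilities.items() if p >= alpha)
    non_members = sorted(s for s, p in probabilities.items() if p < alpha)
    return members, non_members

us_probs = {"US2": 0.006, "US4": 0.0, "US_other": 0.24}  # "US_other" is a placeholder
print(split_by_membership(us_probs))  # (['US_other'], ['US2', 'US4'])
```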

5.3.3 – SIFR classification conclusions 

The YPredPS values for the two-class SIFR-PLS-DA-UK1 model are higher than those for the three-class SIFR-PLS-DA-UK2 model. This is in line with expectations: for the same data, one would intuitively expect the YPredPS values of a two-class model to be higher than those of a three-class model, because the resolution between classes decreases as the number of classes increases.

Inspection of the score scatter plots together with the corresponding loadings scatter plots of the SIFR-PLS-DA models shows which questions discriminate the sites on the basis of their SIFR performance. Tables 5.8, 5.9 and 5.10 detail the questions that discriminate the poorer SIFR performing sites in the UK, SE and US respectively. These tables include only those Focus 2002 question responses that load highly within the model, i.e. those furthest from the origin of the PLS-DA loadings scatter plots and with correspondingly high variable importance values.

Table 5.8 provides an insight into the organisational cultural factors that are related to SIFR performance within the UK.  All of the question response biases within Table 5.8 fall in line with expectations; for example, one can easily envisage that an accident-prone site may be characterised by the following attributes:

·        Managers who do not provide direction (Question 20a).

·        Managers who do not effectively communicate ideas (Question 20c).

·        Managers who do not provide constructive feedback for improvement (Question 20e).

·        Personnel who have unclear performance targets (Question 22).

It is noted that none of the Focus 2002 question responses that discriminate the poorer performing sites UK2 and UK6 from the other UK sites is directly related to safety. It is also noted, again in line with expectations, that the majority of the questions loading highly within SIFR-PLS-DA-UK1 have a Pearson correlation with SIFR performance that is significant at the 95% confidence level.
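The Pearson significance check mentioned above can be sketched as follows. The code computes r directly and converts a two-tailed critical t value into a critical r using r_crit = t_crit / sqrt(t_crit² + n − 2). It assumes n = 9 UK sites (UK1 to UK8 and UK10, as listed in Table 5.4) and uses t(0.975, 7) ≈ 2.3646 from standard tables; the response and SIFR data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_critical(t_crit, n):
    """Critical |r| for a two-tailed test with n - 2 degrees of freedom."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

r_crit = r_critical(2.3646, n=9)  # about 0.666 for nine sites
# Hypothetical mean question responses and SIFR values for nine sites.
responses = [3.1, 2.8, 3.5, 3.9, 2.6, 2.2, 3.7, 3.0, 3.3]
sifr = [4.0, 5.1, 1.2, 0.9, 7.8, 8.4, 1.5, 4.4, 3.2]
r = pearson_r(responses, sifr)
print(round(r, 3), abs(r) >= r_crit)
```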

Inspection of Table 5.9 provides an insight into the organisational cultural factors that are related to SIFR performance in SE.  With the exception of Focus 2002 question number 41, all of the questions fall in line with expectations.  For example, one would expect poorer performing sites to be characterised by:  

Employees who have unclear performance targets (question 22), are dissatisfied with their pay (question 55b), want to leave AstraZeneca employment (question 60) and are of the opinion that there are insufficient resources to do the job well (question 44).
An environment where safety rules are broken (question 17).

One would, however, not expect poorer performing SIFR sites to be characterised by workers believing that AstraZeneca is an environmentally responsible company (question 41).

Focus 2002 Question Number (Note 1) | Focus 2002 Question | Response Bias For Sites UK2 and UK6 | Response Bias For UK Sites Other Than UK2 and UK6
1 | In AstraZeneca teamwork is encouraged. | Tend to disagree. | Tend to agree.
(8a) | Communication in my team is: Open. | Tend to disagree. | Tend to agree.
(8c) | Communication in my team is: Direct. | Tend to disagree. | Tend to agree.
(20a) | My immediate manager communicates a clear direction for our team. | Tend to disagree. | Tend to agree.
(20c) | My immediate manager effectively communicates his/her ideas. | Tend to disagree. | Tend to agree.
(47) | My immediate manager encourages me to take responsibility for my own development. | Tend to disagree. | Tend to agree.
(20e) | My immediate manager gives constructive feedback for improvement. | Tend to disagree. | Tend to agree.
(22) | My performance targets are clear. | Tend to disagree. | Tend to agree.
(27b) | My immediate manager takes work/life balance into account when: Assigning work. | Tend to disagree. | Tend to agree.
(53b) | I have a clear understanding of the performance targets of: My function/company. | Tend to disagree. | Tend to agree.
(38a) | I am sufficiently informed about the performance of: My team. | Tend to disagree. | Tend to agree.

Note 1: Focus question numbers in parentheses indicate those questions that correlate with SIFR above the significance threshold at the 95% confidence level.

Table 5.8 – The Focus 2002 questions that discriminate UK sites of differing SIFR performance

                               


Focus 2002 Question Number (Note 1) | Focus 2002 Question | Response Bias For Sites SE1 and SE9 | Response Bias For SE Sites Other Than SE1 and SE9
(21) | I have sufficient authority to do my job well. | Tend to disagree. | Tend to agree.
22 | My performance targets are clear. | Tend to disagree. | Tend to agree.
(44) | I have the resources I need to do my job well. | Tend to disagree. | Tend to agree.
(28) | My work area is a safe place to work. | Tend to disagree. | Tend to agree.
(17) | Safety rules are carefully observed, even if it means work is slowed down. | Tend to disagree. | Tend to agree.
(25) | Pay in AstraZeneca is as good as or better than the pay in other organisations in our industry. | Tend to disagree. | Tend to agree.
(23) | The quality of work produced by my team is excellent. | Tend to disagree. | Tend to agree.
(55b) | How satisfied are you with your benefits package? | Tend toward very dissatisfied. | Tend toward very satisfied.
60 | At the present time, are you seriously considering leaving AstraZeneca? | Tend toward yes. | Tend toward no.
41 | I believe AstraZeneca is an environmentally responsible company. | Tend to agree. | Tend to disagree.

Note 1: Focus question numbers in parentheses indicate those questions that correlate with SIFR above the significance threshold at the 95% confidence level.

Table 5.9 – The Focus 2002 questions that discriminate SE sites of differing SIFR performance

 

Focus 2002 Question Number (Note 1) | Focus 2002 Question | Response Bias For Sites US2 and US4 | Response Bias For US Sites Other Than US2 and US4
5 | There are adequate security measures where I work. | Tend to disagree. | Tend to agree.
2 | I have a very clear idea of my job responsibilities. | Tend to agree. | Tend to disagree.
10f | In AstraZeneca: Ideas are put into action. | Tend to agree. | Tend to disagree.
26 | I receive the training and development I need to help prepare me for other roles. | Tend to agree. | Tend to disagree.
14 | I receive the training and development I need to do my current job. | Tend to agree. | Tend to disagree.
(55a) | How satisfied are you with: Your pay? | Tend to disagree. | Tend to agree.
21 | I have sufficient authority to do my job well. | Tend to agree. | Tend to disagree.
(22) | My performance targets are clear. | Tend to agree. | Tend to disagree.
44 | I have the resources I need to do my job well. | Tend to agree. | Tend to disagree.
34 | I am happy with the degree of choice and flexibility I have in shaping my pay and benefit package. | Tend to agree. | Tend to disagree.
40 | My job performance is evaluated fairly. | Tend to agree. | Tend to disagree.

Note 1: Focus question numbers in parentheses indicate those questions that correlate with SIFR above the significance threshold at the 95% confidence level.

Table 5.10 – The Focus 2002 questions that discriminate US sites of differing SIFR performance

Unlike the UK discriminatory-question responses, the SE discriminatory-question responses include questions that are directly related to safety, namely questions 17 and 28.

Table 5.10 shows that the responses to some of the US Focus 2002 questions that discriminate the poorer performing sites from the other US sites fall in line with expectations; for example, poorer SIFR performing US sites are characterised by staff who are dissatisfied with their pay (Question 55a) and who are of the opinion that there are inadequate security measures where they work (Question 5). One would not, however, expect the poorer performing sites to be characterised by staff who have clear performance targets (Question 22) and adequate resources to do their job well (Question 44), who are happy with the degree of choice and flexibility they have in shaping their pay and benefit packages (Question 34), and who are of the opinion that job performance is evaluated fairly (Question 40).

Section 3.5.4 (Pearson) correlated the Focus 2002 questions with SIFR rates for the UK, SE and US sites. The parentheses in column 1 of Tables 5.8, 5.9 and 5.10 indicate those questions that were found to be (Pearson) correlated with SIFR performance at the 95% confidence level. Tables 5.8, 5.9 and 5.10 therefore indicate, as one would expect, that PLS-DA can discriminate sites of differing SIFR performance based upon question responses that are not themselves (Pearson) correlated with SIFR above the level of significance.

Section 5.2.3 compared the relative efficiencies of the PLS-DA and SIMCA techniques in classifying sites of different nations, and concluded that PLS-DA was more consistent in its ability to classify UK, SE and US sites to their respective nations. Comparison of the SIFR PLS-DA YPredPS values detailed in Appendix 16 with the SIFR SIMCA membership probabilities detailed in Appendix 17 indicates that PLS-DA is also more consistent than SIMCA in its ability to discriminate sites of varying SIFR performance. As an example, sites UK1 and UK7 have SIMCA class membership probabilities (of the site belonging to either SIMCA-UK1 or SIMCA-UK2) of 0.21 and 0.23 respectively. The other UK sites have SIMCA membership probabilities ranging from 0.42 (UK3 being a member of SIMCA-UK1) to 1 (UK10 being a member of SIMCA-UK2). In comparison, the YPredPS values for a UK site belonging to its class within the three-class SIFR-PLS-DA-UK2 model are about 1 ± 0.16.
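The comparison above depends on how YPredPS values are read. A common reading, assumed here rather than taken from the SIMCA P+ documentation, is that an object is assigned to the class with the largest predicted Y value provided that value reaches 0.5, and that a well-classified object has a predicted value close to 1 for its own class. The Python sketch below applies that rule and reports the largest deviation from 1, the kind of spread quoted above as 1 ± 0.16. The object names, class names and values are hypothetical.

```python
def classify_ypred(ypred, threshold=0.5):
    """Assign each object to the class with the largest predicted Y value.

    Objects whose largest value falls below `threshold` are left unassigned (None).
    The 0.5 threshold is a common convention, assumed here.
    """
    assigned = {}
    for obj, values in ypred.items():
        best = max(values, key=values.get)
        assigned[obj] = best if values[best] >= threshold else None
    return assigned


def max_deviation_from_one(ypred, true_class):
    """Largest |YPred - 1| over each object's own a priori class (smaller = more consistent)."""
    return max(abs(ypred[obj][cls] - 1.0) for obj, cls in true_class.items())


# Hypothetical predicted Y values for three objects and three classes
ypred = {
    "site_a": {"c1": 1.10, "c2": -0.05, "c3": -0.05},
    "site_b": {"c1": 0.02, "c2": 0.88, "c3": 0.10},
    "site_c": {"c1": 0.30, "c2": 0.35, "c3": 0.35},
}
print(classify_ypred(ypred))  # {'site_a': 'c1', 'site_b': 'c2', 'site_c': None}
print(round(max_deviation_from_one(ypred, {"site_a": "c1", "site_b": "c2"}), 2))  # 0.12
```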

5.4 – Classification Analysis conclusions  

By applying PLS-DA and SIMCA techniques to the site mean responses to the Focus 2002 survey:

·        The work within Section 5.2 showed that it is possible to discriminate AstraZeneca UK, SE and US sites from one another.

·        The work within Section 5.3 showed that it is possible to discriminate poorer SIFR performing AstraZeneca UK, SE and US sites from those with better SIFR performance.

Simultaneous inspection of the PLS-DA score and loadings scatter plots allowed the identification of the questions that facilitated the above discrimination.  Examination of the PLS-DA variable importance plots identified those questions that most discriminated the class groups.  The above observations answer research questions 14, 15, 16, and 17 listed in Table 1.1.   

The ability to discriminate nations was hypothesised to be due to a combination of national and organisational cultural differences.  It was suggested that further work was required to determine whether the Focus 2002 responses were dominated by national or organisational culture.     

PLS-DA and SIMCA techniques have proven useful in answering the above research questions. Both techniques were shown to be able to discriminate sites of differing SIFR performance using Focus 2002 question responses that do not (Pearson) correlate with SIFR performance. Such question responses would typically be disregarded in a standard bivariate analysis.
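Why a multivariate method can exploit responses that a bivariate test would discard can be shown with a deliberately small toy example in Python. The two "questions" and class codes below are hypothetical and are not Focus 2002 data: question 1 carries the class difference plus nuisance variation that it shares with question 2, so neither question correlates strongly with class on its own, yet their difference separates the classes perfectly. PLS-DA and SIMCA do not literally form this difference; the sketch only illustrates that a weighted combination of variables can discriminate where each variable alone does not.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: two classes of two objects each, coded 0/1
cls = [0, 0, 1, 1]
nuisance = [0, 4, 0, 4]                          # variation unrelated to class
signal = [-1, -1, 1, 1]                          # the class difference
q1 = [s + n for s, n in zip(signal, nuisance)]   # question 1 = signal + nuisance
q2 = list(nuisance)                              # question 2 = nuisance only

r1 = pearson_r(q1, cls)                          # weak on its own (about 0.447)
r2 = pearson_r(q2, cls)                          # no correlation at all
combo = [a - b for a, b in zip(q1, q2)]          # a linear combination of both questions
r_combo = pearson_r(combo, cls)                  # perfect separation

print(round(r1, 3), round(r2, 3), round(r_combo, 3))
```

With four objects the two-tailed 95% critical |r| is about 0.95, so question 1 and question 2 would both be dismissed by a bivariate test even though together they discriminate the classes.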

Comparison of the nation and SIFR PLS-DA YPredPS values with the SIMCA model membership probabilities indicated that PLS-DA is more consistent in its ability to discriminate classes (within the AstraZeneca Focus and GSHE SIFR 2002 data). The discovery that, for the AstraZeneca 2002 data, PLS-DA is superior to the SIMCA methodology is unsurprising. In PLS-DA the components are constructed so as to maximise the discrimination between the a priori classes, and during PLS-DA model optimisation X block data that do not help discriminate the classes are removed from the model. In the SIMCA approach, a separate PCA model is built for each a priori class. In PCA model optimisation the principal components are constructed to best represent all of the X block data, and X block data that cannot be predicted well by the model are removed. One would therefore expect that PCA model optimisation may remove question responses that would be useful for discriminating nations. Given that the PLS-DA optimisation process is focused on class discrimination, one would expect its class-discriminating ability to be superior to that of the SIMCA approach. Given that PLS-DA appears to provide better class discrimination than SIMCA, and given its ability to identify graphically the variables that most discriminate objects, PLS-DA is likely to be the preferred technique when the number of a priori classes is 5 or fewer. If the number of a priori classes exceeds 5, SIMCA is likely to be preferred because its models are easier to interpret, as explained in Section 2.9.2.4.
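The argument in the paragraph above can be illustrated with a toy Python example on made-up data. A variance-maximising principal component, as used within each SIMCA class model, can be dominated by a group of correlated questions that carry no class information, whereas the first PLS weight vector (X^T y normalised) points at the question that tracks class. For brevity a single pooled PCA is used here, whereas SIMCA fits one PCA per class; the data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy X block: 4 objects x 3 questions, two a priori classes.
# Questions 0 and 1 move together but ignore class; question 2 tracks class.
X_raw = np.array([[-1.0, -1.0, -1.0],
                  [ 1.0,  1.0, -1.0],
                  [-1.0, -1.0,  1.0],
                  [ 1.0,  1.0,  1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])               # dummy class membership

# Mean-centre and scale each question to unit variance
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
yc = y - y.mean()

# PCA: first loading vector = first right singular vector (direction of maximum variance)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]                                      # about +/-0.707 on questions 0 and 1, 0 on question 2

# PLS: first weight vector = X^T y, normalised (direction of maximum covariance with class)
w1 = X.T @ yc
w1 /= np.linalg.norm(w1)                         # about [0, 0, 1]

print("PC1:", np.round(pc1, 3))
print("PLS w1:", np.round(w1, 3))
```

The first principal component ignores question 2 entirely, even though it is the only question that separates the classes, which is the behaviour the paragraph above attributes to variance-driven PCA optimisation.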

Although the purpose of this chapter was not to label an organisational construct that discriminates good SIFR performing sites from poorer performing ones, inspection of Tables 5.8, 5.9 and 5.10 provides an insight into the factors that are related to SIFR performance at the AstraZeneca UK, SE and US sites.  Based upon the above results, the following themes are noted:   

·        Poorer SIFR performing sites within the UK are characterised by individuals who perceive management to be poor communicators and directors.

·        Poorer SIFR performing sites within SE are characterised by individuals who perceive they work in an environment that is under-resourced and where individuals within it are not appropriately rewarded.   

Assigning a theme to characterise poorer SIFR performing US sites is perhaps inappropriate, because the majority of the Focus 2002 question responses that discriminate the poorer performing US sites are non-intuitive. This US discrepancy may be a result of the SIFR-PLS-DA-US1 model being based on only five site data points.