













Chapter 5 – Classification Analysis of the AstraZeneca Focus 2002 responses
5.1 – Introduction
Sections
2.9.2.3 and 2.9.2.4 provided an overview of PLS-DA and SIMCA methodologies
respectively. The purpose of this
chapter is to describe the methodology that was employed to address research
questions 14 to 17 inclusive listed in Table 1.1.
5.2 – Classification of nations
5.2.1 – Nation classification method
5.2.1.1 – Nation classification - PLS-DA method
The UK,
SE and US site mean responses to the Focus 2002 questions were entered into
SIMCA P+ (Version 10.0.4.0). Two
separate a priori three-class PLS-DA models were built.
The three a priori classes for the Nation-PLS-DA1 model consisted
of the UK, SE and US sites respectively. In the
Nation-PLS-DA2 model the same sites were allocated arbitrarily into three
classes. Table 5.1 details the
resultant class information for the two models.
| Model Name | Class 1 | Class 2 | Class 3 |
| Nation-PLS-DA1 | UK1, UK2, UK3, UK4, UK5, UK6, UK7, UK8, UK10 | SE1, SE2, SE3, SE4, SE5, SE6, SE7, SE9, SE10, SE11 | US1, US2, US4, US5, US6 |
| Nation-PLS-DA2 | UK1, UK4, UK7, SE1, SE4, SE7, SE11, US4 | UK2, UK5, UK8, SE2, SE5, SE9, US1, US5 | UK3, UK6, UK10, SE3, SE6, SE10, US2, US6 |
Table 5.1 – Nation-PLS-DA class memberships
The
purpose of Nation-PLS-DA2 was to see if it was possible to build a valid PLS-DA
model based upon an arbitrarily selected set of a priori classes.
Allocation of a site into one of the three classes was achieved by the
sequential numbering of the sites e.g., UK1, UK2, UK3 etc.
All sites were then assigned in numerical order into class 1, class 2,
class 3, class 1, class 2 etc. The
X block information for both models consisted of the mean site responses to the
Focus 2002 questions. The default
mean centering and scaling option was selected within SIMCA P+.
The resultant models were refined by an iterative process of inspecting
the variable importance plot (VIP), identifying and removing those responses
that had a VIP value of less than 0.8, and then re-running the model.
This process was repeated until the score scatter plot showed good
separation of the a priori classes, and Q2(cum) became
optimally high. YPredPS values and
model membership probabilities at the 95% confidence level were calculated using
the ‘predictions – classification list’ tool within SIMCA P+.
Final models were validated using the response permutation validation
function within SIMCA P+ with the number of random shuffles set to 20.
A value of 20 random shuffles was selected as there are a restricted
number of possible permutations with 3 classes.
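The sequential allocation used for Nation-PLS-DA2 can be illustrated with a short sketch. It is not part of the SIMCA P+ workflow; the function name `allocate_round_robin` is hypothetical, and the ordering of the sites (UK, then SE, then US) is an assumption taken from the order in which they are listed for Nation-PLS-DA1.

```python
def allocate_round_robin(site_ids, n_classes=3):
    """Assign sites, in the order given, to class 1, 2, ..., n_classes, 1, 2, ..."""
    classes = {k: [] for k in range(1, n_classes + 1)}
    for position, site in enumerate(site_ids):
        classes[position % n_classes + 1].append(site)
    return classes


# Site labels in the assumed order: UK sites, then SE sites, then US sites.
sites = (
    ["UK1", "UK2", "UK3", "UK4", "UK5", "UK6", "UK7", "UK8", "UK10"]
    + ["SE1", "SE2", "SE3", "SE4", "SE5", "SE6", "SE7", "SE9", "SE10", "SE11"]
    + ["US1", "US2", "US4", "US5", "US6"]
)
arbitrary_classes = allocate_round_robin(sites)
```

Because the allocation ignores nationality, each arbitrary class contains a mixture of UK, SE and US sites, which is what makes it a useful check on the nation-based model.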
5.2.1.2 – Nation classification - SIMCA method
The PCA-UK2,
PCA-SE2 and PCA-US2 models created in Section 3.5.6 were reloaded into SIMCA P+
(Version 10.0.4.0). All UK, SE and
US site mean Focus 2002 responses were entered into the prediction set for
models PCA-UK2, PCA-SE2 and PCA-US2. The
default mean centering and scaling option was selected within SIMCA P+.
The distance to model for all UK, SE and US sites was calculated for
models PCA-UK2, PCA-SE2 and PCA-US2 using the ‘Predictions-X Block-Column
Plot’ option within SIMCA P+. The
membership probability at the 95% confidence level was calculated for each site
using the ‘Prediction-Prediction List’ option within SIMCA P+.
A Coomans plot of PCA-UK2 versus PCA-SE2 was created by using the
‘Coomans Plot’ option within SIMCA P+.
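The idea behind a distance to model can be sketched as follows. This is a simplified illustration under stated assumptions, not the calculation performed by SIMCA P+: it fits a single principal component by power iteration, uses a plain root-mean-square residual as the distance, and does not reproduce the software's normalisation or its critical limit. All data values and function names are hypothetical.

```python
import math


def column_stats(rows):
    """Column means and sample standard deviations of the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return means, sds


def autoscale(row, means, sds):
    """Mean centre and scale one row to unit variance using training statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def first_loading(scaled_rows, n_iter=100):
    """First principal-component loading by power iteration on X'X."""
    n_vars = len(scaled_rows[0])
    loading = [1.0 / math.sqrt(n_vars)] * n_vars
    for _ in range(n_iter):
        scores = [sum(x * p for x, p in zip(row, loading)) for row in scaled_rows]
        direction = [
            sum(t * row[j] for t, row in zip(scores, scaled_rows))
            for j in range(n_vars)
        ]
        norm = math.sqrt(sum(d * d for d in direction))
        loading = [d / norm for d in direction]
    return loading


def residual_distance(scaled_row, loading):
    """Root-mean-square residual left after projecting a row onto the loading."""
    score = sum(x * p for x, p in zip(scaled_row, loading))
    residual = [x - score * p for x, p in zip(scaled_row, loading)]
    return math.sqrt(sum(r * r for r in residual) / len(residual))


# Hypothetical class: five sites whose three responses rise together.
training = [
    [1.0, 2.0, 3.1],
    [2.0, 4.1, 6.0],
    [3.0, 6.0, 9.2],
    [4.0, 7.9, 12.0],
    [5.0, 10.1, 15.1],
]
means, sds = column_stats(training)
scaled_training = [autoscale(row, means, sds) for row in training]
loading = first_loading(scaled_training)

class_distances = [residual_distance(row, loading) for row in scaled_training]
# Hypothetical site from another class: its responses do not follow the pattern.
outsider_distance = residual_distance(autoscale([3.0, 10.0, 3.0], means, sds), loading)
```

A site that follows the response pattern of the modelled class leaves a small residual, whereas a site with a different pattern leaves a large one; the prediction-set plots described above compare such residuals against a critical distance.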
5.2.2 – Nation classification results
5.2.2.1 – Nation classification - PLS-DA results
The six component
Nation-PLS-DA1 model had the following values: R2X = 0.883, R2Y
= 0.986 and Q2 (cum) = 0.916. The
first four components account for 96% of the variance and Q2 (cum) =
0.88. SIMCA P+ was unable to
produce a model for the Nation-PLS-DA2 data.
The score and weightings scatter plots for the first three principal
components for model Nation-PLS-DA1 are reproduced in Figures 5.1 to 5.4
inclusive. Figure 5.5 details the
resultant variable importance plot.
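As a reminder of what the fit and prediction statistics measure, the sketch below computes the fraction of variation explained for a single dummy class variable. It assumes the conventional definitions (R2Y from fitted values, Q2 from cross-validated predictions); the way SIMCA P+ accumulates these values over components is not reproduced, and the numbers are hypothetical.

```python
def explained_fraction(observed, predicted):
    """Return 1 - (residual sum of squares / total sum of squares about the mean)."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


# Hypothetical dummy variable: 1 = site belongs to the class, 0 = it does not.
membership = [1.0, 1.0, 0.0, 0.0]
fitted = [0.9, 1.1, 0.1, -0.1]            # predictions from the fitted model
cross_validated = [0.8, 1.2, 0.3, -0.2]   # predictions with each site left out

r2y = explained_fraction(membership, fitted)
q2 = explained_fraction(membership, cross_validated)
```

Q2 is normally lower than R2Y because each site is predicted by a model that was built without it.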

Figure
5.1– Nation-PLS-DA1 - Score
scatter plot of the first two principal components showing the discrimination of
the UK, SE and US sites

Figure
5.2 – Nation-PLS-DA1 - Loadings scatter plot for the first two principal
components showing those question responses that discriminate the UK, SE and US
sites

Figure
5.3 - Nation-PLS-DA1 – Score scatter plot for principal components 2 and 3
showing the discrimination of the UK, SE and US sites

Figure
5.4 - Nation-PLS-DA1- Loadings scatter plot for principal components 2 and 3
showing those question responses that discriminate the UK, SE and US sites

Figure
5.5 - Nation-PLS-DA1 – Variable importance plot
The model
overview plot, cross permutation validation plots, membership probabilities and
YPredPS values for model Nation-PLS-DA1 are reproduced in Appendix 14.
The Q2 ordinate intercepts of all three cross permutation
validation plots for model Nation-PLS-DA1 are lower than –0.5.
The predictive abilities of the permuted models are therefore
significantly less than the non-permuted model.
Model Nation-PLS-DA1 is therefore valid. Figure 5.1 shows
that the final Nation-PLS-DA1 model is able to discriminate SE from the UK and
US sites. The model is also able to
discriminate the UK and US sites, although the UK4 and US6 sites are not
discriminated well by the first two components. The UK4 and US6 sites are, however, separated by the 3rd
principal component as shown in Figure 5.3.
The SE sites are significantly discriminated from the UK and US sites by
the first principal component. SE
is discriminated by higher than average responses to the questions on the left,
and lower than average responses to those questions on the right of the loadings
scatter plot origin in Figure 5.2. Simultaneous
examination of the variable importance plot in Figure 5.5 together with the
loadings plot in Figure 5.2 indicates those questions that are most able to
discriminate the SE from UK and US sites. SE
is discriminated from the UK and US sites by the greater than average responses
to questions 6, 25, 34, 37, 45, 48, 49, 60, and less than average responses to
questions 7, 10d, 10g, 39. The US is not well discriminated from the UK by
the first principal component. Figure
5.3 shows that the second and, to a lesser extent, the third principal
components discriminate the US from the UK sites.
The US is discriminated from the UK by the higher than average responses
to questions in the upper half, and lower than average responses to the
questions in the lower half of the loadings scatter plot in Figure 5.2.
The UK is discriminated from the US by the higher than average responses
to questions in the lower half and lower than average responses to the questions
in the upper half of the loadings scatter plot in Figure 5.2. Those questions furthest from the origin of Figure 5.2 are
most influential in distinguishing the UK and US sites. All of the above results are verified by visual comparison
with the distribution histograms of the Focus 2002 responses in Appendix 10.
The good discrimination between UK, SE and US sites is exemplified by the
YPredPS values that unambiguously and correctly allocate sites into their
respective nations. As indicated by the inability to formulate principal
components, SIMCA P+ was unable to model the information contained in
Nation-PLS-DA2.
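The ordinate intercept referred to in the permutation validation can be sketched as follows. The sketch assumes that each permuted model is plotted as a point whose abscissa is the correlation between the permuted and original class variable and whose ordinate is the resulting Q2, and that an ordinary least-squares line is drawn through all points. How SIMCA P+ fits its line is not stated here, and the values are hypothetical.

```python
def ols_intercept(points):
    """Intercept of the least-squares line through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return mean_y - (sxy / sxx) * mean_x


# Hypothetical (correlation with original classes, Q2) pairs.
# The last pair stands for the unpermuted model (correlation of 1).
q2_points = [(0.0, -0.6), (0.2, -0.3), (0.4, 0.0), (1.0, 0.9)]
q2_intercept = ols_intercept(q2_points)
```

A strongly negative intercept indicates that models built on shuffled class labels predict far worse than the original model.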
5.2.2.2 – Nation classification - SIMCA results
The results of
the SIMCA analysis are presented graphically in Figures 5.6 to 5.9 inclusive.
Class membership probabilities of each UK, SE and US site belonging to
models PCA-SE2, PCA-UK2 and PCA-US2 are tabulated in Appendix 15.
Inspection of Figure 5.6 demonstrates that:
·
All UK and US sites are above the critical distance to model.
·
All SE sites are below the critical distance to model.
The PCA-SE2 model is therefore
able to discriminate SE sites from UK and US sites.

Figure
5.6 – PCA-SE2 – Distance to model plot showing all non-SE sites above
the
critical distance to model
Figure 5.7 demonstrates
that:
·
All SE and US sites are above the critical distance to model.
·
All UK sites are below the critical distance to model.
The PCA-UK2
model is therefore able to discriminate UK sites from SE and US sites.

Figure
5.7 – PCA-UK2 distance to model plot showing all non-UK sites above
the
critical distance to model
Figure 5.8
demonstrates that:
·
All UK and SE sites are above the critical distance to model.
·
All US sites are below the critical distance to model.
The PCA-US2
model is therefore able to discriminate US sites from UK and SE sites.

Figure
5.8 – PCA-US2 – Distance to model plot showing all non-US sites above
the
critical distance to model
The Coomans
plot in Figure 5.9 is a more efficient way of reporting the information
contained within Figures 5.6 and 5.7. Figure
5.9 shows:
·
All UK sites are within the critical distance to model of PCA-UK2 and outside the critical distance to model of PCA-SE2.
·
All SE sites are within the critical distance to model of PCA-SE2 and outside the critical distance to model of PCA-UK2.
All US sites are shown to
be outside the critical distance to model of both PCA-UK2 and PCA-SE2.
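The reading of a Coomans plot described above amounts to comparing each site's two distances with the two critical distances. A minimal sketch, in which the function name, the distances and the critical values are all hypothetical:

```python
def coomans_region(dist_to_a, dist_to_b, crit_a, crit_b):
    """Name the Coomans plot region in which a site falls."""
    within_a = dist_to_a <= crit_a
    within_b = dist_to_b <= crit_b
    if within_a and within_b:
        return "both models"
    if within_a:
        return "model A only"
    if within_b:
        return "model B only"
    return "neither model"


# Hypothetical distances to two class models, each with a critical distance of 1.5.
uk_like_site = coomans_region(0.9, 3.2, crit_a=1.5, crit_b=1.5)
se_like_site = coomans_region(2.8, 1.1, crit_a=1.5, crit_b=1.5)
us_like_site = coomans_region(2.5, 2.9, crit_a=1.5, crit_b=1.5)
```

With model A standing for PCA-UK2 and model B for PCA-SE2, the three example sites fall in the regions reported for the UK, SE and US sites respectively.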

Figure
5.9– Coomans plot of models PCA-UK2 versus PCA-SE2
The probabilities of each of the
UK, SE and US sites belonging to models PCA-UK2, PCA-SE2 and PCA-US2 are
tabulated in Appendix 15. Appendix
15 indicates that the membership probabilities of sites from one nation
belonging to the PCA model of another nation are less than 1.3 × 10⁻²
(and generally several orders of magnitude lower than this).
5.2.3 – Nation classification conclusions
The PLS-DA
modelling detailed in Section 5.2.2.1 and the SIMCA modelling detailed in
Section 5.2.2.2 have been shown able to discriminate UK, SE and US sites.
SIMCA P+ was not able to model the randomly assigned a priori class
information contained within the Nation-PLS-DA2 data.
One can therefore conclude that the discrimination within Nation-PLS-DA1
is not due to SIMCA P+’s ability to discriminate the UK, SE and US nations
purely by chance or as a result of the multivariate nature of the Focus 2002
responses. This fact is reinforced
by the results of the cross permutation validation plots that indicated that the
predictive ability of the permuted models was very much less than that of the
original models. The PLS-DA YPredPS values detailed in Appendix 14 cannot
be numerically compared with the membership probabilities arising from the SIMCA
modelling detailed in Appendix 15. Although
they cannot be numerically compared, subjective comparison of the PLS-DA and
SIMCA techniques indicates that PLS-DA is more consistent in its classification
than SIMCA. For example, sites UK5, UK7, UK8, SE3, SE10 and US2 all have
SIMCA membership probabilities of belonging to their nation of 0.23 or less.
This is in contrast to the PLS-DA technique in which all sites have
YPredPS values (listed in Appendix 14.5) of 1 +/- 0.1 (corresponding to a site
belonging to its nation). The ability of PLS-DA and SIMCA models to
discriminate the AstraZeneca UK, SE and US sites is unsurprising.
One would expect that the differences in the pattern of responses are due
to a combination of national and organisational cultural differences.
The work in this chapter was unable to determine whether national
cultural differences dominate over organisational cultural issues or vice versa.
It is hypothesised that national cultural differences dominate over
organisational culture; however, further work in this area is required to
support this hypothesis. By superimposition of the PLS-DA score and
loadings scatter plots, it has been shown possible to identify those questions
that discriminate the UK, SE and US sites.
Table 5.2 details those Focus 2002 question responses that discriminate
the SE sites from the US and UK sites. Table
5.3 details those Focus 2002 question responses that discriminate the UK and US
sites. Being mindful of the subjective nature of assigning themes to
organisational dimensions, the following observations are made with respect to
the information in Tables 5.2 and 5.3:
·
SE personnel are far less satisfied than UK and US personnel with respect to pay, rewards and satisfaction with management.
·
UK personnel are distinguished from their US colleagues in that they are more likely to feel undervalued and perhaps oppressed by management.
| Focus 2002 Question Number | Focus 2002 Question | SE Response Bias | US/UK Response Bias |
| 34 | I am happy with the degree of choice and flexibility I have in shaping my pay and benefit package. | Tend to disagree. | Tend to agree. |
| 45 | AstraZeneca is socially responsible in the community. | Tend to disagree. | Tend to agree. |
| 6 | Management supports equal opportunity for all employees. | Tend to disagree. | Tend to agree. |
| 49 | I am frequently worried about being made redundant. | Tend to disagree. | Tend to agree. |
| 10a | In AstraZeneca: Our traditional ways of doing things can be challenged | Tend to disagree. | Tend to agree. |
| 48 | AstraZeneca demonstrates commitment to the health and well-being of its employees. | Tend to disagree. | Tend to agree. |
| 56 | Decision-making in AstraZeneca is: | Tend to ‘too fast’ through to ‘no opinion’. | Tend to ‘about right’. |
| 59 | How good a job is AstraZeneca doing in linking pay to performance. | Tend to ‘no opinion’. | Tend to ‘very good’. |
| 25 | Pay in AstraZeneca is as good as or better than the pay in other organisations in our industry. | Tend to disagree. | Tend to agree. |
| 13 | AstraZeneca makes adequate use of recognition other than money to encourage good performance. | Tend to disagree. | Tend to agree. |
| 39 | In AstraZeneca, there is adequate opportunity for employees to learn about internal vacancies. | Tend to agree. | Tend to disagree. |
| 10g | In AstraZeneca: People have fun while doing their work. | Tend to agree. | Tend to disagree. |
| 10d | In AstraZeneca: People dare to take the initiative. | Tend to agree. | Tend to disagree. |
Table 5.2 – The Focus 2002 question responses that discriminate the SE sites from the UK and US sites
| Focus 2002 Question Number | Focus 2002 Question | UK Bias | US Bias |
| 60 | At the present time, are you seriously considering leaving AstraZeneca? | Tend to yes. | Tend to no/don’t know. |
| 46 | My team work well together. | Tend to agree. | Tend to disagree. |
| 26 | I receive the training and development I need to help prepare me for other roles. | Tend to disagree. | Tend to agree. |
| 41 | I believe AstraZeneca is an environmentally responsible company. | Tend to disagree. | Tend to agree. |
| 10e | In AstraZeneca: New ideas can fail without penalty to the originating person | Tend to disagree. | Tend to agree. |
| 37 | Management supports diversity in the workplace. | Tend to disagree. | Tend to agree. |
| 7 | I think my job is considered important in AstraZeneca. | Tend to disagree. | Tend to agree. |
| 10c | In AstraZeneca: People receive recognition for innovation | Tend to disagree. | Tend to agree. |
| 43 | There is good co-operation across functions/companies in AstraZeneca. | Tend to disagree. | Tend to agree. |
Table 5.3 – The Focus 2002 question responses that discriminate the UK and US sites
5.3 – Classification of sites based upon SIFR performance
5.3.1 – SIFR site classification methods
5.3.1.1 – SIFR PLS-DA method
The mean
responses to the Focus 2002 survey questions for each of the UK, SE and US sites
were entered into SIMCA P+ (Version 10.0.4.0). The default mean centering and
scaling option was selected within SIMCA P+. The following four separate PLS-DA models were created:
·
SIFR-PLS-DA-UK1 model consisted of all UK sites classified into one of two a priori classes.
·
SIFR-PLS-DA-UK2 model consisted of all UK sites classified into one of three a priori classes.
·
SIFR-PLS-DA-SE1 model consisted of all SE sites classified into one of two a priori classes.
·
SIFR-PLS-DA-US1 model consisted of all US sites classified into one of two a priori classes.
The
a priori class limits were assigned arbitrarily as representing
reasonably distinct bands of SIFR performance.
The reason for creating a three- as well as a two-class model for the UK
was to test the discriminatory power of PLS-DA. The UK was chosen for the exercise as it possessed three
reasonably distinct bands of SIFR performance.
Details of the a priori class range for each of the models are
summarised in Table 5.4. The PLS-DA model option within SIMCA P+ was
selected. The initial models were
refined by the iterative process of inspecting the variable importance plot
(VIP), identifying and removing those responses that had a VIP of less than 0.8
and re-running the model. This
process was repeated until the score scatter plot showed good separation between
the a priori classes and Q2 (cum) became optimally high.
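The variable importance filter used in the refinement loop can be sketched as follows. The sketch assumes the commonly quoted definition of VIP (component weights squared, weighted by the Y variation each component explains, and scaled so that the mean squared VIP equals 1); whether SIMCA P+ uses exactly this form is not confirmed here. The weights, question labels and function names are hypothetical.

```python
import math


def vip_scores(weights, ssy_explained):
    """VIP per variable from per-component weight vectors and explained Y sums of squares."""
    n_vars = len(weights[0])
    total = sum(ssy_explained)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w, ssy in zip(weights, ssy_explained):
            norm_sq = sum(v * v for v in w)
            acc += ssy * (w[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total))
    return scores


def retain(names, scores, threshold=0.8):
    """Keep responses whose VIP is not below the threshold (0.8 in this chapter)."""
    return [name for name, vip in zip(names, scores) if vip >= threshold]


# Hypothetical one-component model with four question responses.
questions = ["q_a", "q_b", "q_c", "q_d"]
vips = vip_scores(weights=[[0.8, 0.6, 0.0, 0.0]], ssy_explained=[1.0])
kept = retain(questions, vips)
```

In the procedure described above this filter is applied repeatedly: the model is rebuilt on the retained responses, the VIP values are recalculated, and the cycle stops when class separation and Q2 (cum) no longer improve.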
| SIFR-PLS-DA Model Name | SIFR Class Range (yr⁻¹) | Class 1 | Class 2 | Class 3 |
| SIFR-PLS-DA-UK1 | Class 1: <6; Class 2: >6 | UK1, UK3, UK4, UK5, UK7, UK8, UK10 | UK2, UK6 | Not applicable |
| SIFR-PLS-DA-UK2 | Class 1: <2; Class 2: 2 to 6; Class 3: >6 | UK3, UK4, UK7 | UK1, UK5, UK8, UK10 | UK2, UK6 |
| SIFR-PLS-DA-SE1 | Class 1: <6; Class 2: >6 | SE2, SE3, SE4, SE5, SE6, SE7, SE10, SE11 | SE1, SE9 | Not applicable |
| SIFR-PLS-DA-US1 | Class 1: <7; Class 2: >7 | US1, US5, US6 | US2, US4 | Not applicable |
Table 5.4 – SIFR-PLS-DA model classes
5.3.1.2 – SIFR SIMCA method
All of the UK
sites were classified into one of three classes and the SE and US sites
classified into one of two classes based upon the site injury frequency rates.
The three classes, together with SIMCA model names, are given in Table
5.5.
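The banding of sites by injury frequency rate can be expressed as a simple rule. The sketch below uses the UK limits of 2 and 6 per year; the treatment of a value exactly equal to a limit is an assumption, and the site names and SIFR values are hypothetical.

```python
def sifr_band(sifr, low=2.0, high=6.0):
    """Return class 1 (below low), class 3 (above high) or class 2 (otherwise)."""
    if sifr < low:
        return 1
    if sifr > high:
        return 3
    return 2


# Hypothetical SIFR values (per year) for three illustrative sites.
example_sifr = {"site_a": 0.8, "site_b": 3.5, "site_c": 9.1}
example_bands = {site: sifr_band(value) for site, value in example_sifr.items()}
```

The two-class SE and US allocations follow the same pattern with a single limit.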
| Nation | SIFR Class Range (yr⁻¹) | Class 1 [Model Name] | Class 2 [Model Name] | Class 3 |
| UK | Class 1: <2; Class 2: 2 to 6; Class 3: >6 | UK3, UK4, UK7 [SIFR-SIMCA-UK1] | UK1, UK5, UK8, UK10 [SIFR-SIMCA-UK2] | UK2, UK6 |
| SE | Class 1: <6; Class 2: >6 | SE2, SE3, SE4, SE5, SE6, SE7, SE10, SE11 [SIFR-SIMCA-SE1] | SE1, SE9 | Not applicable |
| US | Class 1: <7; Class 2: >7 | US1, US5, US6 [SIFR-SIMCA-US1] | US2, US4 | Not applicable |
Table 5.5 – SIFR-SIMCA model classes
The
UK, SE and US site mean Focus 2002 responses were entered into SIMCA
P+ (Version 10.0.4.0). The default
mean centering and scaling option was selected within SIMCA P+.
The SIMCA PCA models were refined using the same methodology explained
in Section 5.2.1.2. The
SIMCA P+ ‘Predictions-X Block-Column Plot’ option was used to calculate the
distance to model of:
·
All UK sites from SIFR-SIMCA-UK1 and SIFR-SIMCA-UK2;
·
All SE sites from SIFR-SIMCA-SE1;
·
All US sites from SIFR-SIMCA-US1.
Class
membership probabilities at the 95% confidence level were calculated for each
site using the ‘Prediction-Prediction List’ option within SIMCA P+.
A Coomans plot of SIFR-SIMCA-UK1 versus SIFR-SIMCA-UK2 was created.
5.3.2 – SIFR classification results
5.3.2.1 – SIFR-PLS-DA - results
All SIFR-PLS-DA
models were successfully built. The
results of the SIFR-PLS-DA models are summarised in Table 5.6.
| Model Name | Number of Principal Components | R2X | R2Y | Q2 (cum) |
| SIFR-PLS-DA-UK1 | 2 | 0.793 | 0.992 | 0.922 |
| SIFR-PLS-DA-UK2 | 3 | 0.705 | 0.950 | 0.689 |
| SIFR-PLS-DA-SE1 | 3 | 0.881 | 0.976 | 0.590 |
| SIFR-PLS-DA-US1 | 1 | 0.636 | 0.770 | 0.627 |
Table 5.6 – SIFR-PLS-DA results summary
The score and
loadings scatter plots for the SIFR-PLS-DA models are reproduced in Figures 5.10
to 5.20. The graphical model
overview, cross permutation validation plots, model membership probabilities and
YPredPS values for the SIFR-PLS-DA models are reproduced in Appendix 16.
The Q2 ordinate intercepts of all of the SIFR-PLS-DA cross
permutation validation plots are below 0.05.
The SIFR-PLS-DA models can therefore be assumed to have some validity.
Inspection of Figures 5.10, 5.16 and 5.19 of the score scatter plots for
the two-class models SIFR-PLS-DA-UK1, SIFR-PLS-DA-SE1 and SIFR-PLS-DA-US1
indicates that PLS-DA is able to discriminate the poorer performing AstraZeneca
sites in the UK, SE and US. Figure
5.13 of the three-class SIFR-PLS-DA-UK2 model indicates that PLS-DA is able to
discriminate sites with ‘good’, ‘average’ and ‘poor’
SIFR performance. Inspection
of the SIFR-PLS-DA-UK1, SIFR-PLS-DA-UK2, SIFR-PLS-DA-SE1 and SIFR-PLS-DA-US1
variable importance plots (Figures 5.12, 5.15, 5.18 and 5.21 respectively)
indicates that:
·
The 28 question responses retained within model SIFR-PLS-DA-UK1 have similar variable importance.
·
The majority of the 71 question responses retained in model SIFR-PLS-DA-UK2 have similar variable importance. Question responses 8b, 12, 39, 30b, and 41 have marginally higher variable importance.
·
Of the 74 retained questions in model SIFR-PLS-DA-SE1, question responses 15, 17, 21, 22, 23, 28, 41, 44, and 60 are significantly more influential at discriminating the SE sites than the other question responses.
·
Of the 39 question responses retained in model SIFR-PLS-DA-US1, the majority of the responses have similar variable importance. Question responses 6, 19, 37 and 49 have relatively low variable importance.

Figure
5.10 – SIFR-PLS-DA-UK1 - Score scatter plot for the first two principal
components showing the discrimination of the poorer SIFR performing UK sites
(class 2) from the other UK sites (class 1)

Figure
5.11 – SIFR-PLS-DA- UK1 - Loadings scatter plot for the first two principal
components showing those Focus survey question responses that discriminate the
poorer SIFR performing UK sites (class 2) from the other UK sites (class 1)
Figure
5.12 - SIFR-PLS-DA-UK1 – Variable importance plot

Figure
5.13 – SIFR-PLS-DA-UK2 - Score scatter plot for the first two principal
components showing the discrimination between poor (class 3), average (class 2)
and better SIFR performing UK sites (class 1)

Figure
5.14 – SIFR-PLS-DA-UK2 - Loadings scatter plot for the first two principal
components showing those questions that discriminate the poor (class 3), average
(class 2) and better (class 1) SIFR performing UK sites
Figure
5.15 - SIFR-PLS-DA-UK2 – Variable importance plot

Figure
5.16 – SIFR-PLS-DA-SE1 - Score scatter plot for the first two principal
components showing the discrimination of the poorer SIFR performing SE sites
(class 2) from the other SE sites (class 1)

Figure
5.17 – SIFR-PLS-DA-SE1 - Loadings scatter plot for the first two principal
components showing those questions that discriminate the poorer SIFR performing
SE sites (class 2) from the other SE sites (class 1)

Figure
5.18 - SIFR-PLS-DA-SE1 – Variable importance plot

Figure
5.19– SIFR-PLS-DA-US1 - Score scatter plot for the first principal component
showing discrimination of the poorer SIFR performing US sites (class 2) from the
other US sites (class 1)

Figure
5.20 – SIFR-PLS-DA-US1 – Loadings column plot for the first principal
component showing those questions that discriminate the poorer SIFR performing
US sites (class 2) from the other US sites (class 1)

Figure
5.21 - SIFR-PLS-DA-US1 - Variable importance plot
Section 2.9.2.2
explained how comparison of a model’s score and loadings scatter plots allows the
identification of those questions that are influential in discriminating the a
priori classes; in this case, those that discriminate sites of differing
SIFR performance. Inspection of Figures 5.10 and 5.11 for model
SIFR-PLS-DA-UK1 demonstrates that the poorer SIFR performing sites UK2 and UK6
are discriminated from the other UK sites by higher than average responses to
those Focus 2002 questions to the left of the origin of the loadings scatter
plot in Figure 5.11.
Inspection of Figures 5.16 and 5.17 for model SIFR-PLS-DA-SE1 indicates
that the poorer SIFR performing sites SE1 and SE9 are discriminated from other
SE sites by the following response biases:
higher than average Focus 2002 responses to those questions in the lower
left quadrant and less than average responses to those Focus 2002 questions in
the upper right quadrant of the loadings scatter plot of Figure 5.17.
Finally, inspection of Figures 5.19 and 5.20 for model SIFR-PLS-DA-US1 indicates
that the poorer SIFR performing sites US2 and US4 are discriminated from other
US sites by the following response biases:
above average responses to those Focus 2002 questions in the lower half
and below average responses to those Focus 2002 questions in the upper half of
Figure 5.20.
5.3.2.2 – SIFR SIMCA results
PCA models
SIFR-SIMCA-UK1, SIFR-SIMCA-UK2, SIFR-SIMCA-SE1 and SIFR-SIMCA-US1 were
successfully created. The results
of the modelling are summarised in Table 5.7.
| Model Name | Sites | Number of Components | R2X | Q2(cum) |
| SIFR-SIMCA-UK1 | UK3, UK4, UK7 | 1 | 0.936 | 0.889 |
| SIFR-SIMCA-UK2 | UK1, UK5, UK8, UK10 | 1 | 0.860 | 0.793 |
| SIFR-SIMCA-SE1 | SE2, SE3, SE4, SE5, SE6, SE7, SE10, SE11 | 3 | 0.980 | 0.930 |
| SIFR-SIMCA-US1 | US1, US5, US6 | 1 | 0.973 | 0.951 |
Table 5.7 – SIFR-SIMCA – Results summary
The
distance to model information is provided in Figures 5.22 to 5.26.
The SIMCA P+ graphical outputs and membership probability tables for the
above models are provided in Appendix 17.
The
ability of the model SIFR-SIMCA-UK1 to discriminate UK3,
UK4 and UK7 from the other sites can be determined by inspection of Figure 5.22.
Figure 5.22 shows that sites UK3, UK4 and UK7 (class 1) are within the
critical distance to model of SIFR-SIMCA-UK1.
All other sites are found outside the critical distance to model.
Figure 5.22 –
SIFR-SIMCA-UK1 - Distance to model plot
showing the best SIFR performing UK Sites (Class 1:
UK3, UK4, UK7) below the critical distance to model
The
ability of model SIFR-SIMCA-UK2 to discriminate UK1, UK5, UK8 and UK10 (class 2)
from the other UK sites can be determined by inspection of Figure 5.23.
Figure 5.23 shows that UK1, UK5, UK8 and UK10 are within the critical
distance to model of SIFR-SIMCA-UK2. All
other sites are found to be above the critical distance to model.
Figure 5.23 –
SIFR-SIMCA-UK2 - Distance to model plot
showing the average SIFR performing UK sites (Class 2: UK1, UK5, UK8, UK10)
below the critical distance to model
Inspection
of the Coomans plot in Figure 5.24 indicates that sites UK2 and UK6 are outside
the critical distance to model of both SIFR-SIMCA-UK1 and SIFR-SIMCA-UK2.

Figure
5.24 –SIFR SIMCA-UK1 Versus SIFR SIMCA-UK2 Coomans Plot showing the poorer
SIFR performing UK sites (Class 3: UK2, UK6) outside the critical distance to
model for both models

Figure
5.25 – SIFR-SIMCA-SE1 - Distance to model plot showing the poorer SIFR
performing SE sites (Class 2: SE1, SE9) above the critical distance to model
The distance to
model SIFR-SIMCA-SE1 plot in Figure 5.25 shows that sites SE1 and SE9 (class 2)
are both above the critical distance to model.

Figure
5.26 – SIFR-SIMCA-US1 - Distance to model plot showing the poorer SIFR
performing US sites (Class 2: US2, US4) above the critical distance to model
The distance to
model SIFR-SIMCA-US1 plot in Figure 5.26 shows that sites US2 and US4 (class 2)
are both above the critical distance to model.
The SIFR-SIMCA model membership probabilities are detailed in Appendix
17. Inspection of the SIFR-SIMCA-UK1 membership probabilities indicates
that sites other than UK3, UK4, and UK7 have SIFR-SIMCA-UK1 membership
probabilities of 0. Sites UK3, UK4,
and UK7 all have SIFR-SIMCA-UK1 membership probabilities of greater than 0.23.
Sites UK2 and UK6 which do not belong to either SIFR-SIMCA-UK1 or
SIFR-SIMCA-UK2 have membership probabilities of 0. Inspection of the
SIFR-SIMCA-UK2 membership probabilities indicates that sites other than UK1,
UK5, UK8 and UK10 have membership probabilities of less than 0.03. Sites UK1, UK5, UK8 and UK10 all have SIFR-SIMCA-UK2
membership probabilities of greater than 0.21. Inspection of the
SIFR-SIMCA-SE1 membership probabilities indicates that site SE1 has a membership
probability of 0. The membership
probability of SE9 is 0.02. All
other SE site membership probabilities are greater than 0.16.
Inspection of the SIFR-SIMCA-US1 membership probabilities indicates that
site US4 has a membership probability of 0.
The membership probability of US2 is 0.006.
The remaining US sites have membership probabilities of greater than
0.24. This shows that the model is
able to discriminate sites US4 and US2 from the other US sites.
Based upon the above information, models SIFR-SIMCA-UK1, SIFR-SIMCA-UK2,
SIFR-SIMCA-SE1 and SIFR-SIMCA-US1 have been shown to be able to discriminate
the poorer SIFR performing sites from the better SIFR performing sites in the
UK, SE and US respectively.
5.3.3 – SIFR classification conclusions
The YPredPS values for the
two-class SIFR-PLS-DA-UK1 model are higher than those for the three-class
SIFR-PLS-DA-UK2 model. These values
fall in line with expectations. Intuitively,
for the same data, one would expect the YPredPS values of a two-class model to
be higher than those of a three-class model.
Lower YPredPS values are expected due to decreased resolution between the
classes as the number of classes increases.
Inspection of
the score scatter plots together with the corresponding loadings scatter plots
of the SIFR-PLS-DA models shows those questions that discriminate the sites
based upon their SIFR performance. Tables
5.8, 5.9 and 5.10 detail those questions that discriminate the poorer SIFR
performing sites in the UK, SE and US respectively.
The Focus 2002 question responses within Tables 5.8, 5.9 and 5.10 include
only those questions that load highly within the model, i.e. those that are
furthest away from the origin of the PLS-DA loadings scatter plot and have
correspondingly high variable importance values.
Table 5.8
provides an insight into the organisational cultural factors that are related to
SIFR performance within the UK. All
of the question response biases within Table 5.8 fall in line with expectations;
for example, one can easily envisage that an accident-prone site may be
characterised by the following attributes:
·
Managers who do not provide direction (Question 20a).
·
Managers who do not effectively communicate ideas (Question 20c).
·
Managers who do not provide constructive feedback for improvement
(Question 20e).
·
Personnel who have unclear performance targets (Question 22).
It is noted
that none of the Focus 2002 question responses that discriminate the poorer
performing sites UK2 and UK6 from the other UK sites is directly related to
safety. It is also noted, again in
line with expectations, that the majority of questions loading highly within
SIFR-PLS-DA-UK1 show a Pearson correlation with SIFR performance that is
significant at the 95% confidence level.
Inspection of
Table 5.9 provides an insight into the organisational cultural factors that are
related to SIFR performance in SE. With
the exception of Focus 2002 question number 41, all of the questions fall in
line with expectations. For
example, one would expect poorer performing sites to be characterised by:
·
Employees who have unclear performance targets (question 22), who are dissatisfied with their pay (question 55b), who want to leave AstraZeneca employment (question 60) and who are of the opinion that there are insufficient resources to do the job well (question 44).
·
An environment where safety rules are broken (question 17).
One would,
however, not expect poorer performing SIFR sites to be characterised by workers
believing that AstraZeneca is an environmentally responsible company (question
41).
| Focus 2002 Question Number (Note 1) | Focus 2002 Question | Response Bias For Sites UK2 And UK6 | Response Bias For UK sites Other Than UK2 and UK6. |
| 1 | In AstraZeneca teamwork is encouraged. | Tend to disagree. | Tend to agree. |
| (8a) | Communication in my team is: Open | Tend to disagree. | Tend to agree. |
| (8c) | Communication in my team is: Direct | Tend to disagree. | Tend to agree. |
| (20a) | My immediate manager communicates a clear direction for our team. | Tend to disagree. | Tend to agree. |
| (20c) | My immediate manager effectively communicates his/her ideas. | Tend to disagree. | Tend to agree. |
| (47) | My immediate manager encourages me to take responsibility for my own development. | Tend to disagree. | Tend to agree. |
| (20e) | My immediate manager gives constructive feedback for improvement. | Tend to disagree. | Tend to agree. |
| (22) | My performance targets are clear. | Tend to disagree. | Tend to agree. |
| (27b) | My immediate manager takes work/life balance into account when: Assigning work. | Tend to disagree. | Tend to agree. |
| (53b) | I have a clear understanding of the performance targets of: My function/company. | Tend to disagree. | Tend to agree. |
| (38a) | I am sufficiently informed about the performance of: My team. | Tend to disagree. | Tend to agree. |
Note 1: Focus question numbers in parentheses indicate questions that (Pearson)
correlate with SIFR above the significance threshold at the 95% confidence level.
Table 5.8 – The Focus 2002 questions that discriminate UK sites of differing SIFR
performance
| Question Number (Note 1) | Focus 2002 Question | Response Bias For Sites SE1 And SE9 | Response Bias For SE sites Other Than SE1 and SE9 |
| (21) | I have sufficient authority to do my job well. | Tend to disagree. | Tend to agree. |
| 22 | My performance targets are clear. | Tend to disagree. | Tend to agree. |
| (44) | I have the resources I need to do my job well. | Tend to disagree. | Tend to agree. |
| (28) | My work area is a safe place to work. | Tend to disagree. | Tend to agree. |
| (17) | Safety rules are carefully observed, even if it means work is slowed down. | Tend to disagree. | Tend to agree. |
| (25) | Pay in AstraZeneca is as good as or better than the pay in other organisations in our industry. | Tend to disagree. | Tend to agree. |
| (23) | The quality of work produced by my team is excellent. | Tend to disagree. | Tend to agree. |
| (55b) | How satisfied are you with your benefits package? | Tend toward very dissatisfied. | Tend to Very satisfied. |
| 60 | At the present time, are you seriously considering leaving AstraZeneca? | Tend to Yes. | Tend to No. |
| 41 | I believe AstraZeneca is an environmentally responsible company. | Tend to agree. | Tend to disagree. |
Note 1: Focus question numbers in parentheses indicate questions that (Pearson)
correlate with SIFR above the significance threshold at the 95% confidence level.
Table 5.9 – The Focus 2002 questions that discriminate SE sites of differing SIFR
performance
| Question Number (Note 1) | Focus 2002 Question | Response Bias For Sites US2 and US4 | Response Bias For US sites Other Than US2 and US4 |
| 5 | There are adequate security measures where I work. | Tend to disagree. | Tend to agree. |
| 2 | I have a very clear idea of my job responsibilities. | Tend to agree. | Tend to disagree. |
| 10f | In AstraZeneca: Ideas are put into action. | Tend to agree. | Tend to disagree. |
| 26 | I receive the training and development I need to help prepare me for other roles. | Tend to agree. | Tend to disagree. |
| 14 | I receive the training and development I need to do my current job. | Tend to agree. | Tend to disagree. |
| (55a) | How satisfied are you with: Your pay? | Tend to disagree. | Tend to agree. |
| 21 | I have sufficient authority to do my job well. | Tend to agree. | Tend to disagree. |
| (22) | My performance targets are clear. | Tend to agree. | Tend to disagree. |
| 44 | I have the resources I need to do my job well. | Tend to agree. | Tend to disagree. |
| 34 | I am happy with the degree of choice and flexibility I have in shaping my pay and benefit package. | Tend to agree. | Tend to disagree. |
| 40 | My job performance is evaluated fairly. | Tend to agree. | Tend to disagree. |
Note 1: Focus question numbers in parentheses indicate questions that (Pearson)
correlate with SIFR above the significance threshold at the 95% confidence level.
Table 5.10 – The Focus 2002 questions that discriminate US sites of differing SIFR
performance
Unlike
the UK discriminatory-question responses, the US discriminatory-question
responses include a question that is directly related to safety, namely,
question 5. Table 5.10 shows
that the responses to some of the US Focus 2002 questions that discriminate
the poorer performing sites from the other US sites fall in line with
expectations; for example, poorer US SIFR performing sites are characterised by
staff who are dissatisfied with pay (Question 55a) and are of the opinion that
there are inadequate security measures where they work (Question 5).
One would, however, not expect the poorer performing sites to be
characterised by staff who have clear performance targets (Question 22), have
adequate resources to do their job well (Question 44), are happy with the degree
of choice and flexibility in shaping their pay and benefit package (Question 34)
and are of the opinion that job performance is evaluated fairly (Question 40).
Section 3.5.4 reported the (Pearson) correlation of the Focus 2002 questions
with SIFR rates for the UK, SE and US sites.
The parentheses in column 1 of Tables 5.8, 5.9 and 5.10 indicate those
questions that were found to be (Pearson) correlated with SIFR performance
at the 95% confidence level. Tables
5.8, 5.9 and 5.10 therefore indicate, as one would expect, that PLS-DA can
discriminate sites of differing SIFR performance based upon question responses
that are not themselves (Pearson) correlated to SIFR above the level of
significance. Section 5.2.3
compared the relative efficiencies of PLS-DA and SIMCA techniques in classifying
sites of different nations. Section
5.2.3 concluded that PLS-DA was more consistent in its ability to classify UK,
SE and US sites to their respective nations.
Comparison of the SIFR PLS-DA YPredPS values detailed in Appendix 16 with
the SIFR SIMCA membership probabilities detailed in Appendix 17 indicates that
PLS-DA is also more consistent than SIMCA with regard to its ability to
discriminate sites of varying SIFR performance.
As an example, sites UK1 and UK7 have SIMCA class membership
probabilities (of the site belonging to either SIMCA-UK1 or SIMCA-UK2) of 0.21
and 0.23 respectively. The other UK
sites have SIMCA membership probabilities ranging from 0.42 (UK3 being a member
of SIMCA-UK1) to 1 (UK10 being a member of SIMCA-UK2).
In comparison, the YPredPS values for a UK site belonging to its class
within the three-class SIFR-PLS-DA-UK2 model are about 1 ± 0.16.
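The way YPredPS values are read can be made concrete with a short sketch. In PLS-DA the a priori classes enter the model as a dummy-coded Y block, and a predicted Y value close to 1 in a class column marks the object as belonging to that class. The function names, the 0.5 cut-off and the predicted values below are illustrative assumptions; they are not SIMCA P+ output and are not taken from Appendix 16.

```python
def dummy_code(labels):
    """Build a one-hot (dummy) Y block with one column per a priori class."""
    classes = sorted(set(labels))
    y_block = [[1.0 if label == c else 0.0 for c in classes] for label in labels]
    return classes, y_block


def assign_class(predicted_row, classes, cutoff=0.5):
    """Pick the class with the largest predicted Y; None if it is below the cutoff.

    The 0.5 cutoff is an assumed convention, not a value taken from the thesis.
    """
    best = max(range(len(classes)), key=lambda j: predicted_row[j])
    return classes[best] if predicted_row[best] >= cutoff else None


# HYPOTHETICAL labels and predicted Y values, invented for illustration only.
labels = ["UK", "UK", "SE", "US"]
classes, y_block = dummy_code(labels)      # columns are ordered SE, UK, US
predicted = [
    [0.05, 0.98, -0.03],   # close to 1 in the UK column
    [0.10, 0.84, 0.06],
    [1.02, -0.04, 0.02],   # close to 1 in the SE column
    [0.40, 0.35, 0.25],    # no column reaches the cutoff
]
assigned = [assign_class(row, classes) for row in predicted]
print(assigned)
```

A tight spread of predicted values around 1 for every site in its own class, as reported for the PLS-DA models, means that a rule of this kind assigns all sites unambiguously.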
5.4 – Classification Analysis conclusions
By applying PLS-DA and SIMCA techniques to the site mean responses to the
Focus 2002 survey:
· The work within Section 5.2 showed that it is possible to discriminate
AstraZeneca UK, SE and US sites from one another.
· The work within Section 5.3 showed that it is possible to discriminate
poorer SIFR performing AstraZeneca UK, SE and US sites from those with better
SIFR performance.
Simultaneous
inspection of the PLS-DA score and loadings scatter plots allowed the
identification of the questions that facilitated the above discrimination.
Examination of the PLS-DA variable importance plots identified those
questions that most discriminated the class groups.
The above observations answer research questions 14, 15, 16, and 17
listed in Table 1.1.
The ability to
discriminate nations was hypothesised to be due to a combination of national and
organisational cultural differences. It
was suggested that further work was required to determine whether the Focus 2002
responses were dominated by national or organisational culture.
PLS-DA and
SIMCA techniques have proven useful in answering the above research questions.
Both techniques were shown to be able to discriminate sites of differing SIFR
performance using Focus 2002 question responses that do not (Pearson) correlate
with SIFR performance. These
question responses would typically be disregarded in standard bivariate
analysis.
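The bivariate screening referred to above can be sketched as a Pearson correlation followed by a two-tailed t-test at the 95% level. This is a minimal illustration and not the procedure of Section 3.5.4 itself: the function names are hypothetical, the critical values are standard two-tailed 5% Student's t values for 1 to 10 degrees of freedom, and the nine site values are invented.

```python
import math

# Two-tailed 5% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant_95(x, y):
    """Return (r, significant) from the t-test on r with n - 2 degrees of freedom."""
    n = len(x)
    r = pearson_r(x, y)
    if abs(r) >= 1.0:
        return r, True
    t = abs(r) * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t > T_CRIT_95[n - 2]


# HYPOTHETICAL data: invented site mean responses and SIFR values for nine
# sites, not the thesis data.
question_means = [3.9, 3.1, 3.8, 4.0, 3.7, 3.0, 3.9, 3.8, 4.1]
sifr = [0.4, 1.9, 0.6, 0.3, 0.7, 2.1, 0.5, 0.6, 0.2]

r, significant = is_significant_95(question_means, sifr)
print(f"r = {r:.2f}, significant at 95%: {significant}")
```

A question that fails this test on its own is the kind of response a bivariate analysis would discard, even though it may still contribute to a multivariate class model.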
Comparison of
the nation and SIFR PLS-DA YPredPS values and SIMCA model membership
probabilities indicated that PLS-DA is more consistent in its ability to
discriminate classes (within the AstraZeneca Focus and GSHE SIFR 2002 data).
The finding that, for the AstraZeneca 2002 data, PLS-DA is superior to the
SIMCA methodology is unsurprising. In
PLS-DA the principal components are constructed so as to maximise the
discrimination between the a priori classes.
During the process of PLS-DA model optimisation, X block data that do not
help discriminate classes are removed from the model.
In the SIMCA approach, separate a priori class PCA models are
built. In the process of PCA model
optimisation, the principal components are constructed to best represent all of
the X block data, and X block data that cannot be predicted well by the model
are removed. One would therefore expect that,
during the process of PCA model optimisation, question responses may be removed
that would be useful for discriminating nations.
Given that the PLS-DA model optimisation process is focused on class
discrimination, one would expect its class discriminating ability to be superior
to that of the SIMCA approach. Given that
the PLS-DA technique appears to provide better class discrimination than the
SIMCA technique, and given its ability to identify graphically the variables
that most discriminate objects, PLS-DA is likely to be the preferred technique
when the number of a priori classes is five or fewer. If the number of a priori
classes exceeds five, the preferred technique is likely to be SIMCA, due to
easier model interpretation,
as explained in Section 2.9.2.4.
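The SIMCA idea described above, one PCA model per a priori class with objects judged by how well each class model fits them, can be sketched as follows. This is a deliberately simplified illustration: it uses a single principal component per class, found by power iteration, and assigns an object to the class with the smallest residual distance, whereas SIMCA P+ derives membership probabilities from the residuals. The function names and the data are hypothetical.

```python
import math


def fit_class_model(rows, iterations=100):
    """One-component PCA model of a single class: (column means, unit direction)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    direction = [1.0 / math.sqrt(p)] * p          # arbitrary starting vector
    for _ in range(iterations):                   # power iteration on X'X
        scores = [sum(a * b for a, b in zip(row, direction)) for row in centred]
        w = [sum(scores[i] * centred[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        direction = [x / norm for x in w]
    return means, direction


def residual_distance(row, model):
    """Distance from an object to the class model after removing its score."""
    means, direction = model
    d = [x - m for x, m in zip(row, means)]
    score = sum(a * b for a, b in zip(d, direction))
    return math.sqrt(sum((a - score * b) ** 2 for a, b in zip(d, direction)))


# HYPOTHETICAL three-variable data for two classes, invented for illustration.
class_a = [[1, 2, 0.1], [2, 4, 0.0], [3, 6, 0.1], [4, 8, 0.0]]
class_b = [[1, 0.1, 2], [2, 0.0, 4], [3, 0.1, 6], [4, 0.0, 8]]
models = {"A": fit_class_model(class_a), "B": fit_class_model(class_b)}

new_object = [5, 10, 0.05]
distances = {name: residual_distance(new_object, m) for name, m in models.items()}
nearest = min(distances, key=distances.get)
print(nearest, distances)
```

Each class model here is fitted without reference to the other class, which is the point made above: nothing in the fitting step rewards directions that separate the classes, in contrast to PLS-DA.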
Although the
purpose of this chapter was not to label an organisational construct that
discriminates good SIFR performing sites from poorer performing ones, inspection
of Tables 5.8, 5.9 and 5.10 provides an insight into the factors that are
related to SIFR performance at the AstraZeneca UK, SE and US sites.
Based upon the above results, the following themes are noted:
· Poorer SIFR performing sites within the UK are characterised by
individuals who perceive management to be poor communicators and directors.
· Poorer SIFR performing sites within SE are characterised by
individuals who perceive they work in an environment that is under-resourced and
where individuals within it are not appropriately rewarded.
Labelling of a
theme to characterise poorer SIFR performing US sites is perhaps inappropriate
because the majority of the Focus 2002 question responses that correlate with
poor performance are non-intuitive. This
US discrepancy may be a result of the poorer SIFR-PLS-DA-US1 model being built
on only five site data points.
|