Manual p2
version 2.0.0.7
Bonne J.H. Zijlstra
Marijtje A.J. van Duijn
Contents
1 General Information
2 A short introduction to the p2 model
2.1 Network data
2.2 The p1 model
2.3 Extending the p1 model to include covariates: p2
2.4 Dyadic attributes (Zij)
2.5 Covariate effects
3 Getting started
4 Input data
5 Model selection
6 Model specifications
7 Output
8 Formulas for effects
9 Limitations
10 p2 files
10.1 the data file
10.2 the input file
10.3 the design file
11 Creating dyadic covariates for specific values of actor attributes
11.1 adjusting the input file
11.2 adjusting the design file
11.3 running p2
12 Appendix A: Input file for example
13 Appendix B: Design file for example
14 References
1
General Information
The p2 program performs calculations for the p2 model as proposed
by Van Duijn and Snijders (see e.g. Lazega & Van Duijn, 1997;
Van Duijn, 1995). The p2 model is a model for the analysis of
directed binary relationship data. It computes sender, receiver,
density, and reciprocity effects. Covariates can be included to explain these effects. The p2 program is incorporated in StOCNET
(Boer, P. et al., 2001), an open software system for the analysis of
social networks. StOCNET is freely distributed from the website
http://stat.gamma.rug.nl/stocnet.
Part I
User’s manual
2
A short introduction to the p2 model
The p2 model is designed for statistical analysis of social networks.
Social networks consist of actors and variables indicating their ties.
Often ties are simply recorded by asking actors with whom they have
ties. However, there are many possible measures for ties, and actors
do not necessarily have to be persons, but could also be firms, countries, etc. The p2 model analyzes complete networks. This means
that within the network every actor can possibly be tied to every other
actor. In practice, this is the case in closed settings, e.g.
villages, organizations or school classes. The p2 model allows
some observations from these complete networks to be missing.
2.1
Network data
The p2 model analyzes dichotomous network data, representing ties
that are either present (1) or absent (0). The data are collected in
a square adjacency matrix Y with elements yij indicating a relation
from actor i directed towards actor j (i indicates rows and j columns
in Y ). Below, an example of an adjacency matrix (Y ) indicating the
relations between actors a, b, c, and d is shown:
         a   b   c   d
    a    0   1   0   0
    b    1   0   1   0
    c    1   0   0   1
    d    0   1   0   0
From such an adjacency matrix, dyads can be derived (Wasserman
& Faust, 1994). A dyad consists of the two tie indicator variables for
the two directed ties between two actors, usually denoted by $D_{ij} = (y_{ij}, y_{ji}) = (y_1, y_2)$.
Three types of dyads are distinguished:
    reciprocated (or mutual) dyad → $(y_{a,b}, y_{b,a}) = (1, 1)$
    asymmetric dyad → $(y_{a,c}, y_{c,a}) = (0, 1)$
    null dyad → $(y_{a,d}, y_{d,a}) = (0, 0)$
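As an illustration of how dyads follow from an adjacency matrix, here is a minimal Python sketch. The matrix is the four-actor example as read from the dyads listed above (rows are senders, columns are receivers); the function names `dyad_type` and `classify_dyads` are hypothetical and not part of the p2 program.

```python
# Example adjacency matrix Y for actors a, b, c, d (rows send, columns receive).
ACTORS = ["a", "b", "c", "d"]
Y = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
]


def dyad_type(y_ij, y_ji):
    """Return the dyad type for the pair of tie indicators (y_ij, y_ji)."""
    if y_ij == 1 and y_ji == 1:
        return "mutual"
    if y_ij == 0 and y_ji == 0:
        return "null"
    return "asymmetric"


def classify_dyads(matrix, labels):
    """Map each unordered pair of actors (i < j) to its dyad and dyad type."""
    result = {}
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            dyad = (matrix[i][j], matrix[j][i])
            result[(labels[i], labels[j])] = (dyad, dyad_type(*dyad))
    return result


dyads = classify_dyads(Y, ACTORS)
for pair, (dyad, kind) in dyads.items():
    print(pair, dyad, kind)
```

A network with n actors has n(n-1)/2 dyads, so the four-actor example yields six.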
2.2
The p1 model
The p2 model can be seen as an extension of the p1 model introduced by Holland and Leinhardt (1981). The p1 model specifies the
probability for dyads in a network with n actors:
\[
P(Y_{ij} = y_1, Y_{ji} = y_2) = \frac{\exp\{y_1(\mu + \alpha_i + \beta_j) + y_2(\mu + \alpha_j + \beta_i) + y_1 y_2 \rho\}}{k_{ij}},
\]
for $y_1, y_2 = 0, 1$; $i, j = 1, \ldots, n$; $i \neq j$.
Here, ρ can be seen as a reciprocity parameter, since it contributes
only when ties in both directions are present. The
parameter µ can be recognized as an overall density parameter. The
parameter αi is a sender parameter: it enters through the tie indicator
variable yij, in which actor i is the sender. Likewise, αj is the sender
parameter for the opposite tie indicator variable yji. In a similar manner, the
parameters βj and βi are the receiver parameters for
the tie indicator variables yij and yji, respectively. The term
kij is a normalizing constant.
Note that in the p1 model the dyads are mutually independent.
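The p1 formula can be evaluated directly for a single dyad. The sketch below is illustrative only: the function name and the parameter values are hypothetical, chosen to show how the normalizing constant kij makes the four dyad outcomes sum to one.

```python
import math


def p1_dyad_probabilities(mu, rho, alpha_i, beta_i, alpha_j, beta_j):
    """Probabilities of the four outcomes (y1, y2) of the dyad (Y_ij, Y_ji)."""
    weights = {}
    for y1 in (0, 1):
        for y2 in (0, 1):
            exponent = (
                y1 * (mu + alpha_i + beta_j)
                + y2 * (mu + alpha_j + beta_i)
                + y1 * y2 * rho
            )
            weights[(y1, y2)] = math.exp(exponent)
    k_ij = sum(weights.values())  # normalizing constant
    return {outcome: w / k_ij for outcome, w in weights.items()}


# Hypothetical parameter values, for illustration only.
probs = p1_dyad_probabilities(
    mu=-1.0, rho=1.5, alpha_i=0.3, beta_i=-0.2, alpha_j=-0.1, beta_j=0.4
)
for outcome, p in sorted(probs.items()):
    print(outcome, round(p, 4))
```

In this sketch the log odds ratio of the four probabilities recovers ρ, which is why ρ is read as the reciprocity parameter.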
2.3
Extending the p1 model to include covariates: p2
The p2 model allows covariates as predictors for the sender, receiver,
density, and reciprocity effects. Within the p2 model the sender and
receiver effects are reformulated, using a regression model, as:
\[
\alpha = X_1 \gamma_1 + A,
\]
and
\[
\beta = X_2 \gamma_2 + B,
\]
where α and β are vectors containing the sender and receiver effects
and X1, X2 are matrices with covariates for the sender and receiver
effects with coefficients γ1 and γ2, respectively. A and B are random effects with (co)variances $\sigma_A^2$, $\sigma_B^2$, $\sigma_{AB}$ and $E(A) = E(B) = 0$.
Between actors the random effects are assumed independent.
These substitutions can be regarded as a bivariate regression model
for the pairs (αi , βi ).
Replacing the sender and receiver effects by a function of covariates
and random effects, reduces the number of parameters to be estimated. This allows the density and reciprocity parameters to vary
over the dyads. In the p1 model this was not allowed. The density
and reciprocity parameters are reformulated as:
\[
\mu_{ij} = \mu + Z_{1ij} \delta_1,
\]
and
\[
\rho_{ij} = \rho + Z_{2ij} \delta_2,
\]
where $Z_{1ij}$, $Z_{2ij}$ are matrices containing dyadic attributes for the
density and the reciprocity effects with δ1 and δ2 vectors containing coefficients for the density and reciprocity effect, respectively.
(Dyadic attributes have values for each pair of actors, $i, j = 1, \ldots, n$,
$i \neq j$.) µ and ρ are the constant parts of µij and ρij. Because ρij
represents reciprocity, ρij = ρji is assumed and therefore the dyadic
attributes that are used as covariates for the reciprocity parameter
are supposed to be equal as well (Z2ij = Z2ji ).
2.4
Dyadic attributes (Zij )
Covariates for the density and reciprocity parameters can vary over
dyads. Hence, they are called dyadic attributes. They can be represented by a matrix. For attributes that are covariates for the
reciprocity parameter, the matrix must be symmetrical.
Dyadic attributes can be collected for each combination of actors
(each dyad), like network data are collected. Dyadic attributes can
be derived from actor attributes as well. We often use differences
and absolute differences of actor attributes in dyads. Below, there
is an example with two male (coded ’1’) and two female (coded ’0’)
actors. Both the difference between the actors and the absolute difference derived from this dummy variable are illustrated. (Of course,
there are more possibilities for deriving dyadic attributes from actor
attributes.)
    actor   sex      dummy
    a       male     1
    b       male     1
    c       female   0
    d       female   0

    difference (row actor minus column actor)
          a    b    c    d
    a     0    0    1    1
    b     0    0    1    1
    c    -1   -1    0    0
    d    -1   -1    0    0

    absolute difference
          a    b    c    d
    a     0    0    1    1
    b     0    0    1    1
    c     1    1    0    0
    d     1    1    0    0
Note that when covariates for the density parameter are derived
from actor attributes, either the difference or the absolute difference can be applied. For covariates for the reciprocity parameter,
only the absolute difference can be used, since this derivation is symmetrical regarding both directions of the dyads.
A model with a certain parameter value for a sender covariate and the
same value with opposite sign for a receiver covariate is equivalent to
a model with that parameter value for the density difference covariate, if all these covariates are derived from the same actor covariate. Thus, including all three of these effects results in an unidentifiable
model. Estimates from such a model will be poor. Do not use them!
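The derivations described in this section can be sketched in a few lines of Python. This is not how the p2 program is implemented; the function names and the value of `gamma` are hypothetical. The sketch also checks the symmetry requirement for reciprocity covariates and the equivalence that causes the identifiability problem.

```python
def difference(values):
    """Dyadic attribute with element (i, j) = values[i] - values[j]."""
    return [[vi - vj for vj in values] for vi in values]


def absolute_difference(values):
    """Dyadic attribute with element (i, j) = |values[i] - values[j]|."""
    return [[abs(vi - vj) for vj in values] for vi in values]


def is_symmetric(matrix):
    """True when element (i, j) equals element (j, i) for all i, j."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


sex = [1, 1, 0, 0]  # actors a, b (male) and c, d (female)
diff = difference(sex)
abs_diff = absolute_difference(sex)
print(is_symmetric(abs_diff))  # usable for reciprocity
print(is_symmetric(diff))      # not usable for reciprocity

# Identifiability: a sender effect gamma and a receiver effect -gamma on the
# same actor covariate add up to a density difference effect gamma.
gamma = 0.7  # hypothetical parameter value
sender_plus_receiver = [[gamma * xi - gamma * xj for xj in sex] for xi in sex]
density_difference = [[gamma * d for d in row] for row in diff]
```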
2.5
Covariate effects
The p2 model gives parameter estimates and standard errors for
random effects (sender and receiver variance and their covariance)
and for overall density and reciprocity effects. For specific covariates,
the parameters and standard errors for their effects on the sender,
receiver, density, and reciprocity effects are provided. For an overall
test of the effect of a covariate, the p2 program provides the Wald
test statistic (see, e.g., Serfling, 1980, p. 157):
\[
W = \hat{\theta}' \hat{V}^{-1} \hat{\theta},
\]
with θ containing all involved parameters for the covariate and $\hat{V}$
the covariance matrix of these parameters. The Wald statistic tests
the hypothesis that θ = 0. W has an approximate χ² distribution
with the dimension of θ as the number of degrees of freedom.
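A minimal sketch of the Wald computation follows, using only the Python standard library. The helper names `solve` and `wald_statistic` are hypothetical. The single-parameter example uses the estimate and standard error reported for the covariate cowork in the example output of section 7; for one parameter, W reduces to the squared ratio of estimate to standard error.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (a[row][n] - tail) / a[row][row]
    return x


def wald_statistic(theta, cov):
    """Return (W, df) with W = theta' V^{-1} theta and df = len(theta)."""
    v_inv_theta = solve(cov, theta)
    w = sum(t * s for t, s in zip(theta, v_inv_theta))
    return w, len(theta)


# One parameter: estimate 2.0056 with standard error 0.1733.
w, df = wald_statistic([2.0056], [[0.1733 ** 2]])
print(round(w, 2), df)
```

The result is close to, but not identical with, the value printed by p2, because the inputs here are rounded to four decimals. The p-value would follow from the χ² distribution with `df` degrees of freedom, which this sketch leaves out.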
3
Getting started
To run the p2 program within StOCNET, specific actions are required. These are in short:
1. Select network data
2. If necessary, recode network data to be dichotomous (0/1)
3. Select p2 model and files required for analysis
4. Specify the model
5. If necessary, use the advanced model specification
6. Run p2
7. View results
In the next sections we will treat an example. The example will be
discussed in a text box, like this one.
We will treat an example of an analysis using p2 on network data
concerning ties between American lawyers. This is a subset of
the data treated in Lazega and Van Duijn (1997).
Ties represent lawyers seeking advice among 35 partners of a law
firm in two offices. Lawyers indicated to whom they go for advice.
The actor attributes used as covariates are ’seniority rank number’ (starting with ’1’ for
the highest rank and ending with ’35’ for the lowest rank) and
’office’, the office in which the lawyers work (coded by ’0’ and ’1’).
We also use a dyadic attribute ’cowork’ for which the lawyers were
asked with whom they have worked together.
4
Input data
The p2 model is a model for the analysis of binary network data.
This means that the dependent variable in the analysis needs to be a
binary coded network. This network data is expected to be a square
matrix with elements (i,j) representing a tie indicator variable for a
tie from actor ’i’ towards actor ’j’ (i indicates rows and j columns).
A tie has to be represented by "1", the absence of a tie by "0".
Elements on the diagonal of the network data, representing ties from
actors towards themselves, are not considered by the p2 model, but
we advise setting them to "0" for clarity.
If you do not have a binary coded network file, the network data can
easily be transformed to the required binary format within StOCNET. For this option, see the StOCNET manual, Boer et al. (2002).
Covariates can either be actor attributes or dyadic attributes. Note
that networks can be dyadic attributes as well. Separate files are
expected for actor attributes and networks. Dyadic attributes derived from actor attributes (e.g. difference and absolute difference,
see section 2.4) are generated by the p2 program and do not need
to be in a separate file. Covariates are not restricted to particular
values. Thus when network files are used as dyadic attribute, they
are not restricted to binary values.
Note that the actor attributes and the dyadic attributes are supposed to cover the same actors as the dependent network. Thus the
ordering of actors in all these files should be identical.
All data files should contain only data, and all values have to be
separated by tabs or spaces. If there is additional information in
the files (e.g. variable names in the first line), the program will not
work. Files are expected to be in ASCII format, with each actor on a separate line and the values for that actor on a single line.
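Before loading a file into StOCNET, the requirements above can be checked with a short script. The following is a sketch under stated assumptions, not part of p2 or StOCNET: the function names are hypothetical, and `missing_codes` stands for whatever missing value codes you use in your own data.

```python
def read_network(text):
    """Parse whitespace-separated network data into a list of integer rows.

    Raises ValueError if a non-numeric token (e.g. a variable name) is found.
    """
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append([int(token) for token in line.split()])
    return rows


def check_dependent_network(rows, missing_codes=()):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    n = len(rows)
    for i, row in enumerate(rows):
        if len(row) != n:
            problems.append(f"row {i + 1} has {len(row)} values, expected {n}")
        for j, value in enumerate(row):
            if i != j and value not in (0, 1) and value not in missing_codes:
                problems.append(f"element ({i + 1},{j + 1}) is {value}, not 0 or 1")
    return problems


example = "0 1 0\n1 0 1\n0\t0 0\n"  # tabs and spaces are both accepted
print(check_dependent_network(read_network(example)))
```

Diagonal elements are skipped in the check because p2 does not consider them.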
For each session, StOCNET asks the user to select files containing
network data and files containing actor attributes. Here, select all
the files that you want to use in different analyses. For specific analyses, StOCNET will ask the user which network is the dependent
variable and which files containing actor attributes have to be used.
5
Model selection
Under the button ’StOCNET model’ (step four in StOCNET), select the p2 model in the pull-down menu in the box ’model choice’.
After the p2 model has been selected, the available network files and
actor attribute files are displayed. From these files, select those that
contain information that you want to use in the analysis. This will
enable specification of covariates later on in the analysis.
model selection window
Select one of the available network files under ’Digraph’. This network will then be the dependent variable in the analysis. If present,
remaining networks can be selected as dyadic covariates. Further,
select the attribute files that contain the actor attributes to be used
as covariates in the analysis.
Pressing ’Model specification’ will allow you to specify your model.
Pressing ’Run!’ starts the p2 estimation process. If you have not
specified a model, the empty model will be estimated on the network that is the dependent variable. We advise estimating the
empty model first in each new session. This provides a baseline
model for models with covariate effects.
6
Model specifications
In ’model specifications’, specify which covariates to include in the
model. Covariates for the density, reciprocity, sender, and receiver
effects can be specified. As mentioned before, covariates for the
density and reciprocity effects are called dyadic attributes. In the
upper half of the ’model specifications’ screen the dyadic attributes
are displayed. These are dyadic attributes derived from actor attributes (differences and absolute differences over dyads) as well as
selected network files. In the lower half of the ’model specifications’
screen actor attributes are displayed. These are the possible covariates for the sender and receiver effects. Marking the checkboxes in
front of any of the covariates will include them in the model.
Note that including a sender and receiver effect for a covariate corresponds to including a density difference effect for this covariate.
Including all the above effects results in an unidentifiable model.
Estimates from such a model will be poor. Do not use them!
model specifications window
Pressing the button ’advanced’ will open the screen with ’advanced
P2 options’. Here, all selected covariates are displayed. Marking the
checkbox in front of these will fix the parameter of the covariate to
a certain value. The value to which the parameter is fixed can be
entered on the right of the covariate name. Novice users are advised
not to use this option.
Below these options, a choice for the convergence criterion can be
entered. Either the number of iterations or a measure for convergence can be entered. The measure for convergence is the maximum
difference of all parameter estimates with the estimates from the previous iteration cycle.
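The stopping rule described above can be sketched as follows. The function names are hypothetical, and the strict comparison with the criterion is an assumption; the manual does not state whether p2 stops when the difference is below or equal to the criterion. The default of 0.0001 is the value shown in the example output.

```python
def max_abs_change(previous, current):
    """Largest absolute difference between two successive sets of estimates."""
    return max(abs(c - p) for p, c in zip(previous, current))


def should_stop(previous, current, iteration, criterion=0.0001, max_iterations=None):
    """Stop after a fixed number of iterations, or when estimates have settled.

    If max_iterations is given, only the iteration count is used; otherwise
    the convergence measure is compared with the criterion.
    """
    if max_iterations is not None:
        return iteration >= max_iterations
    return max_abs_change(previous, current) < criterion


print(should_stop([1.0, 2.0], [1.00005, 2.0], iteration=3))
```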
advanced model specifications window
7
Output
The output of the p2 program is displayed automatically in StOCNET after the iteration process has finished. For a new session, the
output will be visible immediately. For an analysis in an existing
session, the output will be appended to the previous output from
this session. Then, in the output screen of StOCNET, scroll down
to find the output of the last analysis.
The p2 output is organized in several parts. First there is some basic
information; the version number of p2 , the name of the output file,
and the date and time:
P2 Version 2.0.0.6
example.out
December 17, 2002, 3:26:52 PM
General information on the specific analysis is provided afterwards.
First, the digraph (the network that is the dependent variable in
the analysis) is indicated. Second, the number of valid tie indicator variable observations is printed. Note that this depends on the
number of actors in the network and the number of missing values in
the data. Third, the iteration process is summarized. Other possible messages state the assigned number of iterations and the largest
difference of parameter estimates between the last two performed
iterations.
General Information:
Digraph: C:\program files\stocnet\ADVICE35.DAT
Number of valid tie indicator observations: 1190
Convergence criterion: 0.0001 reached after 8 iterations.
In this example the dependent network is advice.dat. From the number of valid tie
indicator observations it is clear that there are no missing values in this dataset. Since
the number of actors is 35, the total number of (directed) tie indicator observations
is 35 × 34 = 1190. Thus, all possible tie indicator observations are valid.
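The count of valid tie indicator observations can be reproduced with a small sketch. The function name and the `missing_codes` argument are hypothetical; the all-zero 35-actor matrix merely stands in for a network without missing values.

```python
def count_valid_tie_observations(matrix, missing_codes=()):
    """Count off-diagonal elements whose value is not a missing value code."""
    n = len(matrix)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and matrix[i][j] not in missing_codes
    )


complete = [[0] * 35 for _ in range(35)]
print(count_valid_tie_observations(complete))  # 35 * 34 = 1190
```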
The next part of the output displays the variances and covariance of the random
effects. These are $\sigma_A^2$, $\sigma_B^2$, and $\sigma_{AB}$, referred to in section 2 of this
manual:
Random effects:
    parameter                     estimate   standard error
    sender variance:                0.7332   0.1633
    receiver variance:              0.6920   0.1561
    sender receiver covariance:    -0.3543   0.1227
Here the amount of variation in sender and receiver activity is presented,
after controlling for the covariates in this analysis. Note that these effects covary
negatively: the more lawyers tend to seek advice, the less likely it is that advice is sought
from them.
Next, the output displays the fixed effects. First the overall fixed
effects are displayed. These are the overall density and reciprocity
effects as mentioned in section 2. For details on interpreting these
effects, see section 8.
Overall effects:
    parameter       estimate   standard error
    Density:         -1.3079   0.3884
    Reciprocity:      1.2648   0.2994
The negative value of the density parameter indicates that the probability of a relation
is smaller than 0.5 (see section 8) for covariate values equal to zero. The reciprocity
parameter is positive, but not very large, indicating that advice relations have a
tendency to be symmetrical, but not an extremely strong tendency.
Below are the values of the Wald statistic (see section 2) and the p-values under the approximating χ² distribution. The Wald statistic
combines the separate t-tests for each covariate.
Overall covariate effects:
Overall effects of covariates including diff and absdiff manipulations.
    Covariate    Wald test statistic   df   P
    seniority     25.2689              3    0.0000
    office        25.3851              2    0.0000
    cowork       133.8964              1    0.0000
The covariate seniority is used three times as covariate (for the sender, receiver, and density
effects). Above is the combined effect of all the instances it was used. Office was used
twice and cowork just once. The interpretation of these effects should be based
on the specific covariate effects that are shown below. All covariate effects are highly
significant. (This is, of course, not always the case.)
Below are the parameters and standard errors of covariates for the
sender, receiver, density, and reciprocity effects.
Sender covariates:
    Covariate    parameter estimate   standard error
    seniority     0.0528              0.0162
The seniority rank number is positively related to seeking advice. Thus, the higher
the seniority rank number (i.e. the lower the seniority!), the more lawyers tend to
seek advice. More senior lawyers seek less advice than less senior lawyers. Note that
the magnitude of the parameters is related to the range of values of the covariate,
just like unstandardized coefficients in regression analysis. Here the rank numbers
range from 1 to 35. At first sight the parameter may not seem very large. However,
taking into account the range of the covariate, the parameter is rather large.
Receiver covariates:
    Covariate    parameter estimate   standard error
    seniority    -0.0497              0.0160
The seniority rank number is negatively related to advice being sought. Thus, more
advice is sought from the more senior lawyers (lawyers with a low seniority rank
number).
Density covariates:
    Covariate             parameter estimate   standard error
    abs_diff_seniority    -0.0368              0.0096
    abs_diff_office       -0.9102              0.2240
    cowork                 2.0056              0.1733
The negative effect of the absolute difference in seniority rank number indicates that
the probability of an advice relation decreases as the difference in seniority increases.
The negative effect of the absolute difference of the office indicates that the probability
of an advice relation outside the office is smaller than the probability of an advice
relation within one’s own office. Cowork is a dyadic covariate where lawyers indicated
whether they work together with someone else. Working together with someone
increases the chance of seeking advice from that person.
Notice that less senior lawyers tend to seek advice more and that advice is sought more
from more senior lawyers. Thus advice appears to ’flow’ from more senior lawyers to
less senior lawyers. Considering this, a positive effect for the difference of seniority
rank would be expected (lawyers with a large rank number seek more advice from
lawyers with a low rank number). Leaving out seniority rank as sender and receiver
covariate will indeed display this effect. Recall from section 2 that, for the same
covariate, including a sender and receiver effect is equivalent to including a density
difference effect. This problem of unidentifiability is comparable to the collinearity
problem in regression analysis. (You are kindly invited to try including covariates
for the different effects to gain more insight into this phenomenon.) Section 9 of this
manual deals with this problem as well.
Reciprocity covariates:
    Covariate          parameter estimate   standard error
    abs_diff_office     0.3365              0.4763
Here, there is no increased probability for reciprocal relations as an effect of the
absolute difference of office. Thus here the degree to which advice is a symmetric
relation is not dependent on working in the same office.
8
Formulas for effects
In the output of the p2 program parameter estimates are given with
their standard errors. Dividing the former by the latter gives the
t-test statistic. This is the test statistic for the null hypothesis that
the parameter is zero. A commonly used rule of thumb is to accept that the parameter deviates from zero if the absolute value
of the parameter estimate divided by the standard error is two or
larger. More informative are the magnitude and sign of parameters.
Note that the magnitude of parameters for covariates depends on
the range of values of the covariates.
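The rule of thumb can be applied to the estimates reported in section 7. The sketch below is illustrative; the function names are hypothetical and the threshold of two is the rule of thumb mentioned above, not an exact critical value.

```python
def t_ratio(estimate, standard_error):
    """Parameter estimate divided by its standard error."""
    return estimate / standard_error


def deviates_from_zero(estimate, standard_error, threshold=2.0):
    """Rule of thumb: the absolute t-ratio is at least the threshold."""
    return abs(t_ratio(estimate, standard_error)) >= threshold


# (estimate, standard error) pairs taken from the example output in section 7.
effects = {
    "sender seniority": (0.0528, 0.0162),
    "receiver seniority": (-0.0497, 0.0160),
    "reciprocity abs_diff_office": (0.3365, 0.4763),
}
for name, (estimate, se) in effects.items():
    print(name, round(t_ratio(estimate, se), 2), deviates_from_zero(estimate, se))
```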
The density and reciprocity parameters have special formulas for
their effects. The density parameter µ can be seen as a log-odds.
The reciprocity parameter ρ can be viewed as the log of an odds ratio.
The definition of µij is the log of the odds:
\[
\frac{P(Y_{ij} = 1 \mid Y_{ji} = 0)}{P(Y_{ij} = 0 \mid Y_{ji} = 0)}, \qquad i, j = 1, \ldots, n;\ i \neq j.
\]
The definition of ρij is the log of the ratio:
\[
\frac{P(Y_{ij} = 1 \mid Y_{ji} = 1) / P(Y_{ij} = 0 \mid Y_{ji} = 1)}{P(Y_{ij} = 1 \mid Y_{ji} = 0) / P(Y_{ij} = 0 \mid Y_{ji} = 0)}
= \frac{P(Y_{ij} = 1, Y_{ji} = 1)\, P(Y_{ij} = 0, Y_{ji} = 0)}{P(Y_{ij} = 1, Y_{ji} = 0)\, P(Y_{ij} = 0, Y_{ji} = 1)}, \qquad i, j = 1, \ldots, n;\ i \le j.
\]
It represents the log of the increase in the odds that Yij = 1 given
that Yji = 1. The second expression for ρij shows that a higher
value of ρ not only indicates an increased probability of a mutual
tie (1,1), but also indicates an increased probability of a null dyad
(0,0). Thus ρ is a parameter for both symmetric types of dyads (null
and mutual).
As you can see, the density and reciprocity effects are intertwined.
Note that the above interpretations are valid when no covariates are
included in the model. When covariates are added to the model the
same interpretations hold, but only for actors with the same values
on the covariates.
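To make these interpretations concrete, the sketch below converts a density and a reciprocity parameter into probabilities. It is illustrative only: the function names are hypothetical, the values of µ and ρ are the overall estimates from the example output in section 7, and the calculation assumes covariate values and random effects equal to zero.

```python
import math


def logistic(x):
    """Inverse of the log-odds transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def conditional_tie_probabilities(mu, rho):
    """P(Y_ij = 1 | Y_ji = 0) and P(Y_ij = 1 | Y_ji = 1)."""
    return logistic(mu), logistic(mu + rho)


def dyad_probabilities(mu, rho):
    """Probabilities of the dyad outcomes (y_ij, y_ji)."""
    weights = {
        (0, 0): 1.0,
        (1, 0): math.exp(mu),
        (0, 1): math.exp(mu),
        (1, 1): math.exp(2 * mu + rho),
    }
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}


mu, rho = -1.3079, 1.2648
p_no_return, p_return = conditional_tie_probabilities(mu, rho)
dyad = dyad_probabilities(mu, rho)
print(round(p_no_return, 3), round(p_return, 3))
```

A negative µ gives a conditional tie probability below 0.5, and a positive ρ raises the probability of a tie when the reverse tie is present.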
9
Limitations
The p2 model has practical limitations for applying it and some more
fundamental limitations concerning the estimation procedure.
A practical limitation may occur when two or more parameters "estimate" the same information. This will result in unidentifiable estimates, possibly causing overflow (implausibly large estimates). The
same kind of problem is encountered with collinearity in regression
analysis.
Selecting parameters for the sender effect, receiver effect, and density difference effect for the same covariate will result in this problem. In this case it is commonly observed that the convergence
criterion gets stuck at a certain value.
The same problem may arise when a covariate carries very little information. This may be the case when most actors have the same
value. Then, again, the model attempts to estimate information that the
covariate does not contain.
Another practical limitation is the maximum number of actors in
the network. Up to version 2.0.0.4 the maximum number of actors
is 90. Note, however, that the number of observations of the tie
indicator variables grows almost quadratically with the number of
actors. So does the computing time. Therefore, when using large
networks, expect long computing times.
For version 2.0.0.5 and higher we estimate the maximum number
of actors to be 180. For 150 actors we know for sure that the program runs satisfactorily. However, expect long computing times for
large networks. In the future we hope to optimize the computing
procedure further and consequently shorten computing time.
A more fundamental problem lies in the estimation procedure applied by the p2 program. The p2 model is a generalized linear mixed
model (thus applying a non-linear link function). The p2 program
uses an IGLS (iterative generalized least squares) estimation procedure that applies a first-order Taylor
approximation of the non-linear link function (see Van Duijn, 1995;
for a similar approach, see e.g. Goldstein, 1991). Such a procedure
has been shown to sometimes underestimate variances in non-linear
mixed models (Rodriguez and Goldman, 1995). In the near future
we hope to offer alternative estimation procedures for the p2 model.
Part II
Working with the p2 executable
10
p2 files
The p2 program creates several files. All these files will carry the
session name with different extensions.
Actors are assigned numbers according to their order in the network
file. This ordering should correspond to the ordering of actors in the
files containing actor attributes and dyadic attributes. Information
from all the above files is combined in one single data file. This
file has the extension ’.dat’. Which files contain the required data
(dependent network, covariates, and covariate networks) along with
additional information is stored in the input file. This file has the
extension ’.in’. The model specification is stored in the (model-)
design file. This file has the extension ’.des’.
10.1
the data file
Within the p2 program the data file is created automatically. In
the data file each dyad is represented on two lines; one line for each
directed tie indicator variable in a dyad.
In the data file, each entry on a line carries specific information.
Entries need to be separated by spaces. The first two entries on each
line are the numbers of the two actors in the dyad (according to their
order). Of the two lines representing a dyad, the first
refers to the ’first’ tie indicator variable (Yij), indicating a tie from
the actor on the first entry towards the actor on the second entry. The second line refers to the ’second’, reversed tie indicator
variable (Yji). Below there is an overview of the contents of the
data file (with actor covariates c = 1, . . . , C and network covariates
d = 1, . . . , D).
Organization of the data file
    Entry            Contents
    1, 2             Actors, represented by numbers according to their
                     order in the network files and covariate files
    3                Dummy variable coding a first line representing a dyad
    4                Dummy variable coding a second line representing a dyad
    5                Dependent network value for the (first and second)
                     tie indicator variables
    6                Variable stating ’1’ for the first line denoting a dyad and the
                     dependent network value of the first tie indicator variable on
                     the second line denoting a dyad
    7                Value of the first covariate for the actor on the first
                     entry
    8                Value of the first covariate for the actor on the
                     second entry
    5+(2*c)          Value of the cth covariate for the actor on the first
                     entry
    6+(2*c)          Value of the cth covariate for the actor on the
                     second entry
    5+(2*C)+(2*c)    Value of the difference on the cth covariate. For
                     the first line denoting a dyad: (actor on first entry - actor on
                     second entry), for the second line: (actor on second entry - actor
                     on first entry).
    6+(2*C)+(2*c)    Value of the absolute difference on the cth covariate
                     between actors on the first and second entry.
    6+(4*C)+d        Covariate network value for the dth covariate network.
     1   2   1   0   1   1   1   2   0   0  -1   1   0   0   0
     1   2   0   1   1   1   1   2   0   0   1   1   0   0   0
     3   1   1   0   0   1   3   1   1   0   2   2   1   1   0
     3   1   0   1   0   0   3   1   1   0  -2   2  -1   1   0
     2   3   1   0   0   1   2   3   0   1  -1   1  -1   1   0
Part of the data file for the example presented in the previous sections. There are two actor covariates: ’seniority rank number’ (note
that actors are ordered according to this covariate), and ’office’.
There is one network covariate: ’cowork’.
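The layout of the data file can be mimicked with a short sketch that builds the two lines for one dyad. This is not the p2 source code: the function name is hypothetical, the three-actor data are made up to agree with the example lines for actors 1, 2 and 3, and taking the network covariate value in the reversed direction on the second line is an assumption.

```python
def dyad_lines(i, j, network, actor_covariates, network_covariates):
    """Return the two data-file lines for the dyad of actors i and j (1-based)."""
    y_ij = network[i - 1][j - 1]
    y_ji = network[j - 1][i - 1]
    lines = []
    for first in (True, False):
        sign = 1 if first else -1
        line = [i, j, int(first), int(not first)]   # entries 1-4
        line.append(y_ij if first else y_ji)        # entry 5
        line.append(1 if first else y_ij)           # entry 6
        for values in actor_covariates:             # covariate values per actor
            line += [values[i - 1], values[j - 1]]
        for values in actor_covariates:             # difference, absolute difference
            diff = values[i - 1] - values[j - 1]
            line += [sign * diff, abs(diff)]
        for z in network_covariates:                # covariate networks
            line.append(z[i - 1][j - 1] if first else z[j - 1][i - 1])
        lines.append(line)
    return lines


# Hypothetical three-actor data, consistent with the example lines.
network = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
seniority = [1, 2, 3]
office = [0, 0, 1]
cowork = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

for line in dyad_lines(1, 2, network, [seniority, office], [cowork]):
    print(" ".join(str(value) for value in line))
```

With C = 2 actor covariates and D = 1 covariate network, each line has 15 entries, and the covariate network value is the last one.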
Each time you run the p2 program, this data file will be produced.
Whenever the same dependent network file and covariate files (actor and dyadic attribute) are selected, the data file will be identical.
Perhaps this seems somewhat wasteful, but this operation takes just
a minimal amount of time. For every differently specified analysis
(with effects for different covariates), different parts of the data file
will be used.
10.2
the input file
For creating the data file, the p2 program needs to know in which files
to find the dependent network, the covariates, and the networks that
are covariates. For creating the output file, the p2 program needs
to know the names of the dependent network and all the covariates.
This information is stored in the input file. For an example, see
Appendix A.
Within StOCNET, the input file is created automatically from the
information entered through the StOCNET interface. The input file
consists of lines that are reserved for specific information.
Organization of the input file
    Line   Information
    1      number of actors (+ space)
    1      number of files containing actor covariates (+ space)
    1      number of networks that are covariates
    2      File containing the dependent network
    3      Missing value codes for the dependent network (separated
           by spaces) (optional)
    subsequently, for all files containing actor covariates:
    ∗ file containing actor covariates
    ∗ Number of covariates in the file
    subsequently, for all covariates within actor covariate files:
    ∗ Name of covariate
    ∗ Missing value for covariate (optional)
    subsequently, for all files containing network covariates:
    ∗ file containing network
    ∗ missing values for the network (optional)
    ∗ name for the network
10.3
the design file
The design file contains information on how the model is specified.
That is, which effects are specified for which covariates. For each
covariate there is one line in the design file. Entries on the lines,
separated by spaces, carry specific information. For an example, see
appendix B.
Organization of the design file
    Entry   Contents
    1       Dummy for including a parameter for the sender effect
    2       Dummy for including a parameter for the receiver effect
    3       Dummy for including a parameter for the density effect
            for the difference
    4       Dummy for including a parameter for the density effect
            for the absolute difference
    5       Dummy for including a parameter for the reciprocity
            effect for the absolute difference
    6       Dummy for fixing the parameter for the sender effect
    7       Dummy for fixing the parameter for the receiver effect
    8       Dummy for fixing the parameter for the density effect
            for the difference
    9       Dummy for fixing the parameter for the density effect
            for the absolute difference
    10      Dummy for fixing the parameter for the reciprocity
            effect for the absolute difference
    11      Value on which to fix the parameter for the sender effect
    12      Value on which to fix the parameter for the receiver effect
    13      Value on which to fix the parameter for the density effect
            (difference)
    14      Value on which to fix the parameter for the density effect
            (absolute difference)
    15      Value on which to fix the parameter for the reciprocity
            effect (absolute difference)
    subsequently, after lines for all covariates:
    ∗ assigned number of iterations
    ∗ convergence criterion
    ∗ dummy for using the assigned number of iterations (1)
      or the convergence criterion (0)
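A covariate line of the design file can be read into a more descriptive structure with a sketch like the one below. The function names, the dictionary keys and the example line are hypothetical; only the meaning of the 15 entries is taken from the overview above. The second function flags the unidentifiable combination discussed in sections 2.4 and 9.

```python
EFFECTS = [
    "sender",
    "receiver",
    "density_difference",
    "density_absolute_difference",
    "reciprocity_absolute_difference",
]


def parse_design_line(line):
    """Split one covariate line of a design file into its 15 entries."""
    entries = line.split()
    if len(entries) != 15:
        raise ValueError(f"expected 15 entries, found {len(entries)}")
    spec = {}
    for k, effect in enumerate(EFFECTS):
        spec[effect] = {
            "include": entries[k] == "1",           # entries 1-5
            "fixed": entries[5 + k] == "1",         # entries 6-10
            "fixed_value": float(entries[10 + k]),  # entries 11-15
        }
    return spec


def is_unidentifiable(spec):
    """True when sender, receiver and density difference are all included."""
    return all(
        spec[effect]["include"]
        for effect in ("sender", "receiver", "density_difference")
    )


# Hypothetical line: sender, receiver and density absolute difference effects.
spec = parse_design_line("1 1 0 1 0 0 0 0 0 0 0 0 0 0 0")
print(is_unidentifiable(spec))
```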
The executable of the p2 program is simply called p2.exe. It can be
found in the directory where you have installed StOCNET. Whenever an update is available, you are strongly advised to use it.
This ensures that you have the version with the most recent bug fixes.
Pressing ’run’ in StOCNET will execute the p2 program. However,
if you have specified the input file and the design file correctly, the
p2 executable will work outside StOCNET as well.
If the p2 program is used within StOCNET, references to files containing networks and covariates will be altered: the original filenames will be preceded by a tilde (~). This is because StOCNET has
the option of selecting specific actors (step 3; Selection). StOCNET therefore produces a new file with the name of the old file,
preceded by a tilde, and it is this file that is referred to within StOCNET.
11 Creating dyadic covariates for specific values of actor attributes
Sometimes the standard options of the p2 program will not be satisfactory. Suppose you find a negative effect for the absolute difference of sex on the density (of relations). Roughly, this means
that relations are less likely between actors of a different sex and
consequently more likely when actors share the same sex. Now, you
may wonder whether relationships between boys are more likely to the
same extent as relationships between girls. The standard analysis with p2
will not provide an answer to that question. What would provide
an answer to that question is a network covariate coding whether
dyads concern both boys or both girls (instead of just coding the
same sex). This is what we mean by creating dyadic covariates for
specific values of actor attributes. Note that this can be done for
the density effect and for the reciprocity effect.
With some extra effort, the p2 program can be used to create dyadic
covariates indicating equality for specific values of actor covariates.
To do this the input file and the design file need to be altered. These
altered files can be used by the p2 executable (outside StOCNET!)
to create network files containing such dyadic covariates. Of course
there are many other ways to produce a dyadic covariate indicating equality for a specific value of a covariate. How a dyadic
covariate was produced makes no difference to the p2 program.
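One such alternative is to compute the covariate yourself. The Python sketch below builds a matrix with a 1 where both actors have the chosen value and a 0 otherwise. The function names are hypothetical. Setting the diagonal to 0 is an assumption, and so is the layout of one space-separated row per actor; check both against the description of the data file in section 10.1 before using such a file with p2.

```python
def same_value_network(attribute, value):
    """Return an n x n 0/1 matrix: 1 if both actors have `value`."""
    n = len(attribute)
    return [
        [
            # Assumption: the diagonal (an actor paired with itself) is 0.
            1 if i != j and attribute[i] == value and attribute[j] == value else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


def format_network(matrix):
    """Render the matrix as text, one row per line (assumed file layout)."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in matrix)


# Made-up office codes for four actors; 1 is the value of interest.
office = [1, 2, 1, 1]
print(format_network(same_value_network(office, 1)))
```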
11.1 adjusting the input file
The first thing to do when you want to create a new network file
for a specific value of an actor attribute is entering a ’1’ on the first
empty line of the input file (that is, after the last non-empty line).
This is a dummy indicating you want to create this file.
On the next line you enter the name of the variable you want to
compute the new network file from.
On the next line enter the value for which you want to create the file.
For this specific value the new network file will contain a ’1’ only
if both actors have this specific value. Otherwise, the new network
file will contain a ’0’.
On the last line, enter the name of the new file. This should be a
full name, including an extension (assuming that is what you want).
After taking the previous actions, the bottom part of the original
input file should have something like the following added:
1
office
1
boston.net
After running the analysis, you will have a new network file (here:
boston.net). If you do not want to create the new network file again,
skip the above lines and add the network file to the other files in
your analysis. This can either be done in StOCNET (see StOCNET
manual or section 4 of this manual) or in the input file (see section
10.2).
11.2 adjusting the design file
The p2 program immediately recognizes the new network file as a
covariate. Thus you have to add a line with 15 entries (see section
10.3) at the bottom of the design part of the design file. If you want
to estimate parameters for the newly created dyadic covariate, enter
a ’1’ in the third entry for density and in the fifth entry for reciprocity.
All other entries have to be ’0’ unless you want to fix the parameter
to a certain value (see section 10.3).
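The design line for such a covariate can be written out as a short sketch. The Python function below is hypothetical and only assembles the 15 space-separated entries, placing a 1 in the third entry for density and in the fifth entry for reciprocity. It does not cover fixing a parameter to a value.

```python
def network_covariate_line(density=True, reciprocity=True):
    """Build a 15-entry design line for a dyadic (network) covariate."""
    entries = [0] * 15
    if density:
        entries[2] = 1  # third entry: include a density parameter
    if reciprocity:
        entries[4] = 1  # fifth entry: include a reciprocity parameter
    return " ".join(str(e) for e in entries)


line = network_covariate_line()
density_only = network_covariate_line(reciprocity=False)
```

With reciprocity=False the result is the same as the third line of the design file in appendix B.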
11.3 running p2
When creating a new network file for a specific value of an actor
attribute, the p2 program cannot be run from within StOCNET.
The easiest way to run it outside StOCNET is to create a shortcut to ’p2.exe’.
Some changes are needed in the properties of the shortcut: after
the path, enter a space and the title of your session. This should
look like:
"C:\Program Files\p2\p2.exe" example
The shortcut should start in the directory where the input file and
design file are stored. If this is not the case, this should also be
changed in the properties of your shortcut.
If ’p2.exe’, the input file, and the design file are stored in the same
directory, entering:
p2 example
and pressing Enter in your (Windows) DOS emulator will work as well.
12 Appendix A: Input file for example
35 1 1
C:\Program files\StOCNET\~ADVICE35.DAT
C:\Program files\StOCNET\~covp2.dat
2
seniority
office
C:\Program files\StOCNET\~COWORK35.DAT
cowork
Input file for the example first presented in section 3. There are
two actor covariates (from the same file): ’seniority rank number’
(note that actors are ordered according to this covariate), and ’office’. There is one network covariate: ’cowork’.
13 Appendix B: Design file for example
1 1 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
200
0.0001
0
Design file for the example first presented in section 3. There are
two actor covariates (from the same file): ’seniority rank number’
(note that actors are ordered according to this covariate), and ’office’. There is one network covariate: ’cowork’.
14 References
Boer, P., Huisman, M., Snijders, T.A.B., & Zeggelink, E.P.H.
(2001). StOCNET: an open software system for the advanced statistical analysis of social networks. Groningen: ProGAMMA / ICS.
Website: http://stat.gamma.rug.nl/stocnet.
Boer, P., Huisman, M., Snijders, T.A.B., & Zeggelink, E.P.H.
(2002). StOCNET user’s manual Version 1.3. Available from website: http://stat.gamma.rug.nl/stocnet.
Goldstein, H. (1991). Nonlinear multilevel models, with an application to discrete response data. Biometrika, 78, 45–51.
Lazega, E. & Van Duijn, M. (1997). Position in formal structure, personal characteristics and choices of advisers in a law firm: a logistic regression model for dyadic network data. Social Networks, 19, 375–397.
Rodriguez, G. & Goldman, N. (1995). An assessment of estimation procedures for multilevel models with binary responses. Journal of the Royal Statistical Society, A, 158, 73–89.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons.
Van Duijn, M.A.J. (1995). Estimation of a random effects model for directed graphs. In: Snijders, T.A.B. et al. (Eds.) SSS ’95. Symposium Statistische Software, nr. 7. Toeval zit overal: programmatuur voor random-coëfficiënt modellen, p. 113–131. Groningen: iec ProGAMMA.
Wasserman, S. & Faust, K. (1994). Social Network Analysis:
Methods and Applications. Cambridge: Cambridge University Press.