Understanding Path Analysis

A Brief Introduction


Path analysis is a form of multiple regression statistical analysis that is used to evaluate causal models by examining the relationships between a dependent variable and two or more independent variables. By using this method, one can estimate both the magnitude and significance of causal connections between variables.

Key Takeaways: Path Analysis

  • By conducting a path analysis, researchers can better understand the causal relationships between different variables.
  • To begin, researchers draw a diagram that serves as a visual representation of the relationship between variables.
  • Next, researchers use a statistical software program (such as SPSS or STATA) to compare their predictions to the actual relationship between the variables.

Path analysis is theoretically useful because, unlike other techniques, it forces us to specify relationships among all of the independent variables. This results in a model showing causal mechanisms through which independent variables produce both direct and indirect effects on a dependent variable.

Path analysis was developed by Sewall Wright, a geneticist, in 1918. Over time the method has been adopted in other physical sciences and social sciences, including sociology. Today one can conduct path analysis with statistical programs including SPSS and STATA, among others. The method is also known as causal modeling; it is closely related to the analysis of covariance structures and to latent variable models, which build on it.

Prerequisites for Conducting a Path Analysis

There are two main requirements for path analysis:

  • All causal relationships between variables must go in one direction only (you cannot have a pair of variables that cause each other)
  • The variables must have a clear time-ordering since one variable cannot be said to cause another unless it precedes it in time.
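The first requirement can be checked mechanically once the hypothesized arrows are written down. The sketch below is only an illustration: the function name `is_recursive` and the variable labels A to D are placeholders, not part of any statistical package. It treats each arrow as a (cause, effect) pair and uses a standard topological-sort test to see whether any feedback loop exists.

```python
from collections import defaultdict, deque


def is_recursive(paths):
    """Return True if the (cause, effect) pairs contain no feedback loop.

    Uses Kahn's algorithm: repeatedly remove variables that have no
    remaining incoming arrows. If every variable can be removed, the
    diagram has no cycle.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for cause, effect in paths:
        successors[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(nodes)


# Hypothetical diagrams: the second adds an arrow from D back to C.
one_way = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("A", "D")]
with_loop = one_way + [("D", "C")]

print(is_recursive(one_way))
print(is_recursive(with_loop))
```

A check like this only confirms that the arrows run one way; the time-ordering requirement still has to be argued from the study design.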

How to Use Path Analysis

Typically path analysis involves the construction of a path diagram in which the relationships between all variables and the causal direction between them are specifically laid out. When conducting a path analysis, one might first construct an input path diagram, which illustrates the hypothesized relationships. In a path diagram, researchers use arrows to show how different variables relate to each other. An arrow pointing from, say, Variable A to Variable B shows that Variable A is hypothesized to influence Variable B.

After the statistical analysis has been completed, a researcher would then construct an output path diagram, which illustrates the relationships as they actually exist, according to the analysis conducted. If the researcher’s hypothesis is correct, the input path diagram and output path diagram will show the same relationships between variables.

Examples of Path Analysis in Research

Let's consider an example in which path analysis might be useful. Say you hypothesize that age has a direct effect on job satisfaction, and you hypothesize that it has a positive effect, such that the older one is, the more satisfied one will be with their job. A good researcher will realize that there are certainly other independent variables that also influence our dependent variable of job satisfaction: for example, autonomy and income, among others.

Using path analysis, a researcher can create a diagram that charts the relationships between the variables. The diagram would show a link between age and autonomy (because typically the older one is, the greater degree of autonomy they will have), and between age and income (again, there tends to be a positive relationship between the two). Then, the diagram should also show the relationships between these two sets of variables and the dependent variable: job satisfaction.

After using a statistical program to evaluate these relationships, one can then redraw the diagram to indicate the magnitude and significance of the relationships. For example, the researcher might find that both autonomy and income are related to job satisfaction, that one of these two variables has a much stronger link to job satisfaction than the other, or that neither variable has a significant link to job satisfaction.

Strengths and Limitations of Path Analysis

While path analysis is useful for evaluating causal hypotheses, this method cannot determine the direction of causality. It clarifies correlation and indicates the strength of a causal hypothesis, but does not prove direction of causation. In order to fully understand the direction of causality, researchers can consider conducting experimental studies in which participants are randomly assigned to a treatment and control group.

Additional Resources

Students wishing to learn more about path analysis and how to conduct it can refer to the University of Exeter’s overview of Path Analysis and Quantitative Data Analysis for Social Scientists by Bryman and Cramer.

Updated by Nicki Lisa Cole, Ph.D.


Path Analysis

Path analysis, a precursor to and subset of structural equation modeling, is a method to discern and assess the effects of a set of variables acting on a specified outcome via multiple causal pathways. Developed nearly a century ago by Sewall Wright, a geneticist working at the US Department of Agriculture, its early applications involved quantifying the contribution of genes vs. environment on traits such as guinea pig coloration and assessing whether temperature, humidity, radiation, or wind velocity had the greatest effect on transpiration in plants. Path analysis was slow to catch on in the world of biology, but in the second half of the 20th century found an avid following among social scientists and economists. Social and life course epidemiologists subsequently adopted the method as an effective way to distinguish direct from indirect effects and to test the strength of hypothesized patterns of causal relationships.

Description

Path analysis is based on a closed system of nested relationships among variables that are represented statistically by a series of structured linear regression equations. As such, path analysis is bound by the same set of assumptions as linear regression, as well as some additional restrictions that describe the allowable pattern of relations among variables. Variables are either exogenous, meaning their variance is not dependent on any other variable in the model, or endogenous, meaning their variance is determined by other variables in the model. Exogenous variables may or may not be correlated with other exogenous variables.

The pattern of relationships among variables is described by a path diagram, a type of directed graph. Variables are linked by straight arrows that indicate the directions of the causal relationships between them. Straight arrows may only point in one direction, as it is assumed that a variable cannot be both a cause and an effect of another variable; i.e., the model is recursive and there are no feedback loops. Curved, double-headed arrows indicate correlation between exogenous variables. Similar to DAGs, in path diagrams, causal “juice” can flow through arrows pointing in the same direction or pointing away from each other, but is blocked when two arrowheads meet. In addition to the arrows between variables in the model, there are arrows pointing toward each endogenous variable from points outside the model, indicating variance contributed by error and any unmeasured variables.

In figure 15.1, below, taken from Pedhazur’s Multiple Regression in Behavioral Research, variables 1 and 2 are exogenous and correlated, while variables 3, 4, and 5 are endogenous. The structural equation that would describe the relationship between variables 1 and 3 is:

r13 = p31 + p32*r12

where r is the correlation coefficient from a standard correlation matrix containing all of the variables in the model and the path coefficient p is the standardized beta coefficient from the linear regression model in which 1 and 2 are the independent variables and 3 is the dependent variable. (A note on notation: the first number in the path (or standardized beta) coefficient subscript represents the dependent variable (the head of the arrow) and the second number represents the independent variable (the tail of the arrow) in a causal relationship.) In general, a structural equation indicates that the total “juice,” or correlation between two variables, is the sum of the “juice” that flows along each of the possible pathways that connect those two variables. In the example above, p31 is the portion of the correlation carried by the direct pathway from 1 to 3, while p32*r12 is the portion carried by the pathway that includes the segment between 1 and 2 and the segment between 2 and 3. The contribution of any particular pathway equals the product of the coefficients along the segments of that pathway.

Similarly, the structural equation that would describe the relationship between variables 2 and 3 is:

r23 = p32 + p31*r12

and the series of structural equations that describe the contributions of variables 1, 2, and 3 to variable 4 (the coefficients of which come from the linear regression equation in which variable 4 is regressed on variables 1, 2, and 3) are:

r14 = p41 + p31*p43 + p42*r12 + p43*p32*r12

r24 = p42 + p32*p43 + p41*r12 + p43*p31*r12

r34 = p43 + p31*p41 + p32*p42 + p41*r12*p32 + p42*r12*p31
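As an illustration of where these expansions come from, the equation for r14 can be derived in two steps. Assuming all variables are standardized and the residual of variable 4 is uncorrelated with variable 1, the regression of 4 on 1, 2, and 3 gives the first line; substituting the earlier expression for r13 gives the rest. The subscripted symbols are simply the typeset form of the r's and p's used above.

```latex
\begin{aligned}
r_{14} &= p_{41} + p_{42}\,r_{12} + p_{43}\,r_{13} \\
       &= p_{41} + p_{42}\,r_{12} + p_{43}\,(p_{31} + p_{32}\,r_{12}) \\
       &= p_{41} + p_{31}\,p_{43} + p_{42}\,r_{12} + p_{43}\,p_{32}\,r_{12}
\end{aligned}
```

The equations for r24 and r34 follow from the same substitution, using the expressions for r13 and r23.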

Notice that the number of structural equations (5) equals the number of parameters (p’s connecting variables) that need to be identified. This is called a just-identified model. The value of p3a is the square root of (1-r^2), using the unadjusted r-square value from the regression of 3 on variables 1 and 2, while the value of p4b is the square root of (1-r^2), using the unadjusted r-square value from the regression of 4 on variables 1, 2, and 3.
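To make the link between correlations and path coefficients concrete, the sketch below solves the two structural equations for r13 and r23 for p31 and p32, and then computes the residual path p3a. The correlation values are hypothetical, chosen only for illustration, and the function name is a placeholder. The R-squared used here is the standard identity for standardized regression (the sum of each coefficient times the matching correlation), which is an assumption added for this example rather than something stated in the text.

```python
import math


def solve_two_predictor_paths(r12, r13, r23):
    """Solve r13 = p31 + p32*r12 and r23 = p32 + p31*r12 for p31 and p32.

    Also returns the residual path p3a = sqrt(1 - R^2), where R^2 is the
    unadjusted R-squared from regressing variable 3 on variables 1 and 2.
    """
    denom = 1.0 - r12 ** 2
    p31 = (r13 - r12 * r23) / denom
    p32 = (r23 - r12 * r13) / denom
    r_squared = p31 * r13 + p32 * r23
    p3a = math.sqrt(1.0 - r_squared)
    return p31, p32, p3a


# Hypothetical correlations among variables 1, 2, and 3.
r12, r13, r23 = 0.30, 0.40, 0.20
p31, p32, p3a = solve_two_predictor_paths(r12, r13, r23)

print(f"p31 = {p31:.3f}, p32 = {p32:.3f}, p3a = {p3a:.3f}")
```

Plugging the solved coefficients back into the two structural equations reproduces the input correlations exactly, which is what just-identified means in practice.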

Once the path and correlation coefficients have been filled in, the utility of path analysis becomes clear. The correlation between any two variables can be partitioned, or “decomposed,” into specific types of effects: direct, indirect, spurious (due to a common cause), and unanalyzed (because the directionality is unknown, as the path contributing to this effect includes a curved arrow). For example, in the equation for r14, p41 represents the direct effect, p31*p43 represents the indirect effect, and the remaining p42*r12 + p43*p32*r12 is unanalyzed. In the equation for r34 above, p43 represents the direct effect, while the entire remaining p31*p41 + p32*p42 + p41*r12*p32 + p42*r12*p31 is spurious; note that although they contain a curved arrow, the latter two pathways represent a common cause scenario.
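The decomposition can be sketched numerically. The coefficient values below are hypothetical and the function names are placeholders; the point is only that the labeled pieces add back up to the full correlation.

```python
def decompose_r14(r12, p31, p32, p41, p42, p43):
    """Split r14 into direct, indirect, and unanalyzed components."""
    return {
        "direct": p41,
        "indirect": p31 * p43,
        "unanalyzed": p42 * r12 + p43 * p32 * r12,
    }


def decompose_r34(r12, p31, p32, p41, p42, p43):
    """Split r34 into its direct component and the spurious remainder."""
    return {
        "direct": p43,
        "spurious": p31 * p41 + p32 * p42 + p41 * r12 * p32 + p42 * r12 * p31,
    }


# Hypothetical values, for illustration only.
r12 = 0.30
p31, p32 = 0.40, 0.05
p41, p42, p43 = 0.10, 0.50, 0.40

r14_parts = decompose_r14(r12, p31, p32, p41, p42, p43)
r34_parts = decompose_r34(r12, p31, p32, p41, p42, p43)

for name, parts in (("r14", r14_parts), ("r34", r34_parts)):
    print(name, {k: round(v, 4) for k, v in parts.items()},
          "sum =", round(sum(parts.values()), 4))
```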

Path analysis is always theory-driven; the same data can describe many different causal patterns, so it is essential to have an a priori idea of the causal relationships among the variables under consideration. That being said, path analysis can be used to refine a causal hypothesis. If, for example, a path coefficient is very small and the standardized beta is not statistically significant, it may make sense to eliminate that pathway. The new, “trimmed” model, which has the same number of variables but fewer pathways, can then be tested against the just-identified model (which becomes the null hypothesis) using any of several goodness-of-fit options. Failure to reject the null hypothesis indicates that the trimmed model still fits the data. In sum, path analysis may be used to test a causal model using data, but should not be used to develop a model from data.

While a path model may fit the data, beware: this does not mean that the causal hypothesis depicted in the path diagram has been validated. Some believe that the phrase “correlation does not imply causation” originated with Sewall Wright. Whether or not this is true, it is worth remembering when performing path analysis. Although path diagrams are recursive, path models are based on correlations and cannot prove causation or even indicate the direction of a causal effect. Furthermore, those correlations are between variables in a given data set, so care must be taken before generalizing beyond the source population.

As mentioned above, path analysis is based on a number of assumptions:

  • Because path analysis involves the solution of multiple linear regression equations, the dependent variables for all equations must be approximately normally distributed and the relationships among the variables are assumed to be causal, linear and additive. Logistic regression equations, implying multiplicative relationships, cannot be substituted. Other curvilinear relations or interactions are also prohibited.
  • Residuals (a and b in the figure above) are not correlated with the variables that predict the outcome variables toward which they point. This means that a is not correlated with variables 1 and 2, and b is not correlated with variables 1, 2, and 3. This assumption implies that all relevant variables are included in the model, and any unmeasured variables are not correlated with the specified predictor variables.
  • Causation flows in one direction; there are no feedback loops.
  • The variables are measured without error.
  • Predictor variables may be continuous, ordinal categorical, or dichotomous, but there may be no dummy variables.
  • There is low multicollinearity among predictor variables in any of the linear regression equations.

In response to these limitations, structural equation modeling has evolved to allow for non-linear relations among variables, clustering, repeated measures, measurement error, feedback loops, and latent variables.

Textbooks & Chapters

Here’s a link to PDQ Statistics by Geoffrey R. Norman and David L. Streiner. Chapter 17 provides a readable introduction to path analysis and structural equation modeling: https://vcarrion.people.uic.edu/pdq_stats.pdf

Chapter 15 of Elazar J. Pedhazur’s Multiple Regression in Behavioral Research gives a thorough presentation—with all the regression calculations done by hand!: Pedhazur, Elazar J. Multiple Regression in Behavioral Research, 2nd ed. (Fort Worth, TX: Holt, Rinehart and Winston, Inc., 1982), p. 577-635.

A clear text which places path analysis in the context of causal inference: Shipley, Bill. Cause and Correlation in Biology. (Cambridge, UK: Cambridge University Press, 2000).

Methodological Articles

Here are Sewall Wright’s original articles:

https://naldc.nal.usda.gov/download/IND43966364/PDF

https://www.gwern.net/docs/statistics/1934-wright.pdf

This is a more contemporary, excellent description of path analysis:

https://www.sciencedirect.com/science/article/pii/B0123693985004837

This is a nice simple summary:

http://core.ecu.edu/psyc/wuenschk/MV/SEM/Path.pdf

Application Articles

Gamborg, M., Andersen, P.K., Baker, J.L., Budtz-Jorgensen, E., Jorgensen, T., Jensen, G., Sorensen, T.I.A. (2009) Life course path analysis of birth weight, childhood growth, and adult systolic blood pressure. American Journal of Epidemiology, 169(10):1167-1178.

Check out this amazing example of a path diagram! Wahlund, R. (1992). Tax changes and economic behavior: the case of tax evasion. Journal of Economic Psychology, 13:657-77.

Chemers, M. M., Hu, L.-T., and Garcia, B. F. (2001). Academic self-efficacy and first-year college student performance and adjustment. Journal of Educational Psychology, 93(1):55–64.

McLean, S.A., Paxton, S.J., Wertheim, E.H. (2013). Mediators of the relationship between media literacy and body dissatisfaction in early adolescent girls: implications for prevention. Body Image, March 5, e-pub ahead of print.

Leary, J.M., Lilly, C.L., Dino, G., Loprinzi, P.D., Cottrell, L. Parental influences in 7-9 year olds’ physical activity: a conceptual model. Preventive Medicine (2013), e-pub ahead of print.

Path analysis in SAS using PROC CALIS: https://stats.oarc.ucla.edu/sas/faq/how-can-i-do-path-analysis-in-sas/

Step-by-step description of how to do path analysis, including STATA and SPSS code: http://www3.nd.edu/~rwilliam/stats2/l62.pdf

In R, you can do path analysis using several different packages: lavaan, ggm, OpenMx, plspm, and sem. Here’s a whole e-book on path modeling in R using the plspm package: https://www.gastonsanchez.com/PLS_Path_Modeling_with_R.pdf

Short online SAS course from University of North Texas: http://www.unt.edu/rss/class/Jon/SAS_SC/SAS_Module8_Path.htm


Advanced Statistics using R


Path Analysis

Path analysis is a statistical method for investigating the direct and indirect relationships among a set of exogenous (independent, predictor, input) and endogenous (dependent, output) variables. Path analysis can be viewed as a generalization of regression and mediation analysis in which multiple inputs, mediators, and outputs can be used. The purpose of path analysis is to study relationships among a set of observed variables, e.g., to estimate and test direct and indirect effects in a system of regression equations and to estimate and test theories about the absence of relationships.

Path diagrams

Path analysis is often conducted based on path diagrams. A path diagram represents a model using shapes and paths. For example, the diagram below portrays the multiple regression model $Y=\beta_0 + \beta_X X + \beta_W W + \beta_Z Z + e$.

[Figure: path diagram of the multiple regression model]

In a path diagram, different shapes and paths have different meanings:

  • Squares or rectangular boxes: observed or manifest variables
  • Circles or ovals: errors, factors, latent variables
  • Single-headed arrows: linear relationship between two variables. Starts from an independent variable and ends on a dependent variable.
  • Double-headed arrows: variance of a variable or covariance between two variables
  • Triangle: a constant variable, usually a vector of ones

A simplified path diagram is often used in practice in which the intercept term is removed and the residual variances are directly put on the outcome variables. For example, for the regression example, the path diagram is shown below.

[Figure: simplified path diagram of the multiple regression model]

In R, path analysis can be conducted using the R package lavaan. We now show how to conduct path analysis using several examples.

Example 1. Mediation analysis -- Test the direct and indirect effects

The NLSY data include three variables – mother's education (ME), home environment (HE), and child's math score. Assume we want to test whether home environment is a mediator between mother’s education and child's math score. The path diagram for the mediation model is:

[Figure: path diagram of the mediation model with ME, HE, and math]

To estimate the paths in the model, we use the R package lavaan. To specify the mediation model, we follow the rules below. First, a model is put into a pair of quotation marks. Second, to specify the regression relationship, we use the symbol ~. The variable on the left is the outcome and the ones on the right are predictors or covariates. Third, parameter names can be used for paths in the model specification, such as a, b and cp. Fourth, we can define new parameters using the notation :=. On the left is the name of the new parameter and on the right is the formula that defines it, such as a*b, which defines the mediation effect, and a*b + cp, which defines the total effect.

To estimate the model, the sem() function from lavaan can be used. To view the results, the summary() function is used. For example, for the mediation example, the output is given below. From the output, we can see

  • An individual path can be tested. For example, the coefficient from ME to HE is 0.139, which is significant based on the z-test.
  • The residual variance parameters are also automatically estimated.
  • The mediation effect is estimated and tested using the defined parameter. For example, the mediation effect here is 0.065 with the standard error 0.028. It is significant based on a z-test (Sobel test). Note that the result is the same as the mediation analysis before.
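The z-test on the mediation effect can be reproduced from the two reported numbers alone. The sketch below is an illustration in Python rather than lavaan: it divides the estimate (0.065) by its standard error (0.028) and converts the result to a two-sided p-value under a standard normal reference distribution. The helper name `z_test` is a placeholder. The Sobel standard error itself cannot be recomputed here, because the standard errors of the individual paths are not given in the text.

```python
import math


def z_test(estimate, std_error):
    """Return the z statistic and its two-sided p-value (standard normal)."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Reported mediation effect and standard error from the example above.
z, p = z_test(0.065, 0.028)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```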

Example 2. Testing a theory of no direct effect

Assume we hypothesize that there is no direct effect from ME to math. To test the hypothesis, we can fit a model illustrated below.

[Figure: path diagram of the model without a direct effect from ME to math]

The input and output of the analysis are given below. To evaluate the hypothesis, we can check the model fit. The null hypothesis is “\(H_{0}\): The model fits the data well or the model is supported”. The alternative hypothesis is “\(H_{1}\): The model does not fit the data or the model is rejected”. The model with the direct effect fits the data perfectly. Therefore, if the current model also fits the data well, we fail to reject the null hypothesis. Otherwise, we reject it. The test of the model can be conducted based on a chi-squared test. From the output, the chi-square statistic is 14.676 with 1 degree of freedom. The p-value is close to 0 (well below 0.001). Therefore, the null hypothesis is rejected. This indicates that the model without the direct effect is not a good model.
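The p-value for this test can be checked by hand. With 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so its upper-tail probability has a closed form in terms of the complementary error function. The sketch below uses that identity; the function name is a placeholder, and the shortcut applies only to the 1-degree-of-freedom case.

```python
import math


def chi2_upper_tail_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    If X = Z**2 with Z standard normal, then
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Chi-square statistic reported in the example above.
p_value = chi2_upper_tail_df1(14.676)
print(f"p = {p_value:.6f}")
```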

Example 3: A more complex path model

Path analysis can be used to test more complex theories. In this example, we look at how age and education influence EPT using the ACTIVE data. Both age and education may influence EPT directly or through memory and reasoning ability. Therefore, we can fit a model shown below.

[Figure: path diagram of age and education influencing EPT directly and through memory and reasoning ability]

Suppose we want to test the total effect of age on EPT and its indirect effect. The direct effect is the path from age to ept1 directly, denoted by p1. One indirect path goes through hvltt1, that is p2*p7. The second indirect effect through ws1 is p3*p8. The third indirect effect through ls1 is p4*p9. The last indirect effect through lt1 is p5*p10. The total indirect effect is p2*p7+p3*p8+p4*p9+p5*p10. The total effect is the sum of the direct and indirect effects, p1+p2*p7+p3*p8+p4*p9+p5*p10.
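The arithmetic behind these definitions can be sketched as follows. The coefficient values are hypothetical placeholders, not estimates from the ACTIVE data; only the labels p1 to p10 and the mediator names come from the description above.

```python
# Hypothetical path coefficients, for illustration only.
paths = {
    "p1": -0.02,                                         # age -> ept1 (direct)
    "p2": -0.15, "p3": -0.08, "p4": -0.07, "p5": -0.06,  # age -> each mediator
    "p7": 0.19, "p8": 0.28, "p9": 0.26, "p10": 0.05,     # each mediator -> ept1
}

# Each mediator is paired with the two path labels that pass through it.
mediators = {
    "hvltt1": ("p2", "p7"),
    "ws1": ("p3", "p8"),
    "ls1": ("p4", "p9"),
    "lt1": ("p5", "p10"),
}

# A specific indirect effect is the product of the two paths through a mediator.
specific = {m: paths[a] * paths[b] for m, (a, b) in mediators.items()}
indirect = sum(specific.values())
total = paths["p1"] + indirect

for mediator, effect in specific.items():
    print(f"indirect via {mediator}: {effect:.4f}")
print(f"total indirect: {indirect:.4f}")
print(f"total effect:   {total:.4f}")
```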

The output from such a model is given below. From it, we can see that the indirect effect ind1=p2*p7 is significant. The total indirect effect (indirect) from age to EPT is also significant. Finally, the total effect (total) from age to EPT is significant.

To cite the book, use: Zhang, Z. & Wang, L. (2017-2022). Advanced statistics using R. Granger, IN: ISDSA Press. https://doi.org/10.35566/advstats. ISBN: 978-1-946728-01-2.


Institute for Digital Research and Education

Analyzing Data: Path Analysis

Path analysis is used to estimate a system of equations in which all of the variables are observed. Unlike models that include latent variables, path models assume perfect measurement of the observed variables; only the structural relationships between the observed variables are modeled. This type of model is often used when one or more variables is thought to mediate the relationship between two others (mediation models). Similar model setups can be used to estimate models where the errors (residuals) of two otherwise unrelated dependent variables are allowed to correlate (seemingly unrelated regression), as well as models where the relationship between variables is thought to vary across groups (multiple group models).

1.0 A Just Identified Model

The examples on this page use a dataset ( https://stats.idre.ucla.edu/wp-content/uploads/2016/02/path.dat ) that contains four variables: the respondent’s high school gpa (hs), college gpa (col), GRE score (gre), and graduate school gpa (grad). We begin with the model illustrated below, where GRE scores are predicted using high school and college gpa (hs and col respectively), and graduate school gpa (grad) is predicted using GRE, high school gpa and college gpa. This model is just identified, meaning that it has zero degrees of freedom.

In the model: command, the keyword on is used to indicate that the model regresses gre on hs and col, and grad on hs, col, and gre. The output: command with the stdyx; option was included to obtain standardized regression coefficients and R-squared values. (The stdyx; option produces coefficients standardized on both y and x, but other types of standardization are available and can be requested using the standardized; option.)

    Title: Path analysis -- just identified model
    Data: file is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/path.dat ;
    Variable: Names are hs gre col grad;
    Model:
      gre on hs col;
      grad on hs col gre;
    Output: stdyx;

Here is the output from Mplus.

Under MODEL RESULTS the path coefficients (slopes) for the regression of gre on hs and col are shown, followed by those for the regression of grad on hs, col, and gre. Along with the unstandardized coefficients (in the column labeled Estimate), the standard errors (S.E.), the coefficients divided by the standard errors, and the p-values are shown. From this we see that hs and col significantly predict gre, and that gre and hs (but not col) significantly predict grad. Additional parameters from the model are listed below the path coefficients. Note that the regression intercepts are listed under the heading Intercepts rather than with the path coefficients; this is different from some general purpose statistical packages, where all of the coefficients (intercepts and slopes) are listed together.

Because we requested standardized coefficients using the stdyx option of the output: command, the standardized results are also included in the output (after the unstandardized results). Under the heading STDYX Standardization all of the model parameters are listed, standardized so that a one unit change represents a standard deviation change in the original variable (just as in a standardized regression model). As part of the standardized output the r-squared values are presented under the heading R-SQUARE. Here the estimated r-squared value for each of the dependent variables in our model is given, along with standard errors and hypothesis tests.

1.1 Indirect and Total Effects

One of the appealing aspects of path models is the ability to assess indirect as well as total effects (i.e. relationships among variables). Note that the total effect is the combination of the direct effect and indirect effects. In this example we will request the estimated indirect effect of hs on grad (through gre). Below is the diagram corresponding to this model with the desired indirect effect shown in blue. We can obtain the estimate of the indirect effect by adding the model indirect: command to our input file, and specifying grad ind hs;. Here is the entire program; except for the model indirect: command (and the title), this input is identical to that of the previous model.

    Title: Path analysis -- with indirect effects.
    Data: file is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/path.dat ;
    Variable: Names are hs gre col grad;
    Model:
      gre on hs col;
      grad on hs col gre;
    Model indirect:
      grad ind hs;
    Output: stdyx;

The output for this model is shown below. Some of the output has been omitted, since it is the same as for the previous model except for the addition of sections showing the total, indirect and direct effects. The output is the same because we have estimated the same model; adding the indirect effects requests additional output from Mplus, but that does not change the model itself. The breakdown of the total, indirect, and direct effects appears below the MODEL RESULTS and STANDARDIZED MODEL RESULTS in a section labeled TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS. Because standardized coefficients were requested, the standardized total, indirect, and direct effects appear below the unstandardized effects.

    MODEL RESULTS

                                                  Two-Tailed
                          Estimate      S.E. Est./S.E.   P-Value
    GRE ON
      HS                     0.309     0.065     4.756     0.000
      COL                    0.400     0.071     5.625     0.000
    GRAD ON
      HS                     0.372     0.075     4.937     0.000
      COL                    0.123     0.084     1.465     0.143
      GRE                    0.369     0.078     4.754     0.000
    Intercepts
      GRE                   15.534     2.995     5.186     0.000
      GRAD                   6.971     3.506     1.989     0.047
    Residual Variances
      GRE                   49.694     4.969    10.000     0.000
      GRAD                  59.998     6.000    10.000     0.000

    <output omitted>

    QUALITY OF NUMERICAL RESULTS
      Condition Number for the Information Matrix    0.348E-04
      (ratio of smallest to largest eigenvalue)

    TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS

                                                  Two-Tailed
                          Estimate      S.E. Est./S.E.   P-Value
    Effects from HS to GRAD
      Total                  0.487     0.075     6.453     0.000
      Total indirect         0.114     0.034     3.362     0.001
      Specific indirect
        GRAD
        GRE
        HS                   0.114     0.034     3.362     0.001
      Direct
        GRAD
        HS                   0.372     0.075     4.937     0.000

    STANDARDIZED TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS
    STDYX Standardization

                                                  Two-Tailed
                          Estimate      S.E. Est./S.E.   P-Value
    Effects from HS to GRAD
      Total                  0.465     0.068     6.858     0.000
      Total indirect         0.109     0.032     3.455     0.001
      Specific indirect
        GRAD
        GRE
        HS                   0.109     0.032     3.455     0.001
      Direct
        GRAD
        HS                   0.356     0.070     5.073     0.000

Under Specific indirect, the effect labeled GRAD GRE HS (note that each variable appears on its own line and the final outcome is listed first) gives the estimated coefficient for the indirect effect of hs on grad through gre (the blue path above). The coefficient labeled Direct is the direct effect of hs on grad. We can say that part of the total effect of hs on grad is mediated by gre scores, but the significant direct path from hs to grad suggests only partial mediation.
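The indirect and total effects in this output can be reproduced from the unstandardized path coefficients. The sketch below multiplies the hs to gre and gre to grad slopes and adds the direct hs to grad slope. The standard error uses the first-order delta-method (Sobel) approximation; that formula is an assumption made for this illustration, since the text does not say how Mplus computes it, although the result agrees with the reported value to three decimals.

```python
import math

# Unstandardized estimates and standard errors from MODEL RESULTS above.
a, se_a = 0.309, 0.065   # GRE ON HS
b, se_b = 0.369, 0.078   # GRAD ON GRE
direct = 0.372           # GRAD ON HS

# Indirect effect of hs on grad through gre, and the total effect.
indirect = a * b
total = direct + indirect

# First-order delta-method (Sobel) standard error; an assumed formula.
se_indirect = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)

print(f"indirect = {indirect:.3f}")
print(f"total    = {total:.3f}")
print(f"S.E. of indirect (approx.) = {se_indirect:.3f}")
```

Small differences in the last decimal place are expected, because the inputs here are already rounded to three decimals.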

1.2 Specific Indirect Effects

The above example was overly simple since there was only one indirect effect. Often models will have multiple indirect effects. In this example we place a directional path (i.e. regression) from hs to col , creating a model with multiple possible indirect effects. The diagram below shows the model, with the three indirect paths we wish to examine highlighted with colored lines. There are several ways to request calculation of indirect effects. The first, shown in the previous example (i.e. grad ind hs; ) requests all indirect paths from hs to grad . We can also use ind to request a specific indirect path, for example, below we use grad ind col hs; , to specify that we want to estimate the indirect effect from hs to col to grad (i.e. the dashed orange path shown in the diagram above). Finally, we can use via to request all indirect effects that go through a third variable, for example below we use grad via gre hs; to request all indirect paths from hs to grad that involve gre , this includes hs to gre to grad (i.e. the solid blue path), and hs to col to gre to grad (i.e. the dotted pink path). The new directional path ( col on hs; ), as well as the specific indirect ( grad ind col hs; ) and via ( grad via gre hs; ) options of the model indirect are highlighted in the input shown below. Title: Multiple indirect paths Data: file is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/path.dat ; Variable: Names are hs gre col grad; Model: gre on col hs; grad on hs col gre; col on hs; Model indirect: grad ind col hs; grad via gre hs; The abridged output  is shown below. Note that the output for this model is similar in structure to the output from earlier models, except for the addition of the section showing the indirect effects.
The first set of indirect effects (labeled Effects from HS to GRAD) gives the indirect effect of hs on grad through col. Although we estimated a direct effect of hs on grad in the model, it is not shown in this portion of the output (it is shown above), because we requested only the specific indirect effect. The second set of indirect effects (labeled Effects from HS to GRAD via GRE) shows all possible indirect effects from hs to grad that include gre; in this case, there are two such effects. This portion of the output shows that hs has a significant indirect effect on grad overall (Sum of indirect), and it also lists the two specific indirect effects: the one through gre alone, and the one through col and gre. Note that this output does not include the total effect of hs on grad; to obtain it we would simply specify grad ind hs; as we did in the previous model.
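To see which paths each request covers, the model can be written as a small directed graph and its paths enumerated. The Python sketch below is only an illustration of the ind and via logic described above; the dictionary and function names are ours, and Mplus does not expose anything like them.

```python
# Directed edges implied by the Model statement:
#   gre on col hs;  grad on hs col gre;  col on hs;
EDGES = {
    "hs": ["col", "gre", "grad"],
    "col": ["gre", "grad"],
    "gre": ["grad"],
    "grad": [],
}

def all_paths(graph, start, end, prefix=None):
    """Return every directed path from start to end as a list of nodes."""
    prefix = (prefix or []) + [start]
    if start == end:
        return [prefix]
    paths = []
    for nxt in graph[start]:
        paths.extend(all_paths(graph, nxt, end, prefix))
    return paths

paths = all_paths(EDGES, "hs", "grad")

# Indirect paths pass through at least one intermediate variable.
indirect_paths = [p for p in paths if len(p) > 2]

# Analogue of "grad via gre hs;": indirect paths that involve gre.
via_gre = [p for p in indirect_paths if "gre" in p]

# Analogue of "grad ind col hs;": the single path hs -> col -> grad.
ind_col = [p for p in indirect_paths if p == ["hs", "col", "grad"]]

for p in indirect_paths:
    print(" -> ".join(p))
```

The enumeration finds three indirect paths, two of which involve gre, matching the three colored paths described in the text.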

2.0 An Over Identified Model

This is an example of an overidentified model, that is, a model with positive degrees of freedom (as opposed to the previous models, which can be described as saturated or just identified). Having positive degrees of freedom allows us to examine the fit of the model using the chi-squared test of model fit, along with fit indices such as CFI and RMSEA. In the illustration below, paths that are included in the model are represented by solid lines; paths that could be estimated, but are not, are represented by dotted lines. Note that hs now has no direct effect on either grad or gre; its only influence is via col. This corresponds to the hypothesis that high school GPA is associated with GRE scores and graduate school grades only through its relationship with college GPA. The input file for this model is shown below.

Title: Path analysis -- over identified model
Data: file is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/path.dat ;
Variable: Names are hs gre col grad;
Model:
  col on hs;
  gre on col;
  grad on col gre;
Output: stdyx;

Below is the output for this model.
INPUT READING TERMINATED NORMALLY

Path analysis -- over identified model

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         200

Number of dependent variables                                    3
Number of independent variables                                  1
Number of continuous latent variables                            0

Observed dependent variables

  Continuous
   GRE         COL         GRAD

Observed independent variables
   HS

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20

Input data file(s)
  https://stats.idre.ucla.edu/wp-content/uploads/2016/02/path.dat

Input data format  FREE

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                             44.429
          Degrees of Freedom                     2
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            362.474
          Degrees of Freedom                     6
          P-Value                           0.0000

CFI/TLI

          CFI                                0.881
          TLI                                0.643

Loglikelihood

          H0 Value                       -2811.629
          H1 Value                       -2789.415

Information Criteria

          Number of Free Parameters             10
          Akaike (AIC)                    5643.258
          Bayesian (BIC)                  5676.242
          Sample-Size Adjusted BIC        5644.561
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                          0.3266
          90 Percent C.I.                    0.247  0.412
          Probability RMSEA

SRMR (Standardized Root Mean Square Residual)

          Value                              0.086

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 COL      ON
    HS                 0.605      0.048     12.500      0.000

 GRE      ON
    COL                0.625      0.056     11.101      0.000

 GRAD     ON
    COL                0.317      0.079      4.014      0.000
    GRE                0.492      0.078      6.303      0.000

 Intercepts
    GRE               19.887      3.009      6.609      0.000
    COL               21.038      2.576      8.165      0.000
    GRAD               9.779      3.664      2.669      0.008

 Residual Variances
    GRE               55.313      5.531     10.000      0.000
    COL               49.025      4.903     10.000      0.000
    GRAD              67.311      6.731     10.000      0.000

STANDARDIZED MODEL RESULTS

STDYX Standardization

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 COL      ON
    HS                 0.662      0.040     16.684      0.000

 GRE      ON
    COL                0.617      0.044     14.112      0.000

 GRAD     ON
    COL                0.276      0.068      4.092      0.000
    GRE                0.434      0.065      6.671      0.000

 Intercepts
    GRE                2.103      0.397      5.298      0.000
    COL                2.251      0.363      6.210      0.000
    GRAD               0.913      0.375      2.436      0.015

 Residual Variances
    GRE                0.619      0.054     11.452      0.000
    COL                0.561      0.053     10.677      0.000
    GRAD               0.587      0.053     11.002      0.000

R-SQUARE

    Observed                                        Two-Tailed
    Variable        Estimate       S.E.  Est./S.E.    P-Value

    GRE                0.381      0.054      7.056      0.000
    COL                0.439      0.053      8.342      0.000
    GRAD               0.413      0.053      7.743      0.000

QUALITY OF NUMERICAL RESULTS

     Condition Number for the Information Matrix              0.104E-03
       (ratio of smallest to largest eigenvalue)

The chi-squared value compares the current model to a saturated model. Since our model is not saturated (i.e., it has positive degrees of freedom), the chi-squared value is no longer zero and may be used to evaluate model fit. Similarly, the CFI and TLI, which were equal to one in the just identified model, now take on informative values. Further down, the RMSEA and SRMR also take on informative values (in a just identified model, they are displayed as zero). Having positive degrees of freedom, and hence informative values of the fit indices, allows us to better evaluate how well our model fits the data. The specific coefficient estimates from this model are generally interpreted as they were in the just identified model.
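Several of the fit statistics above are simple functions of one another. The Python sketch below recomputes the model chi-square from the two log-likelihoods and then the CFI and TLI from the model and baseline chi-squares. It uses the commonly cited textbook definitions of these indices, which we assume match what Mplus reports here; the variable names are ours.

```python
# Values copied from the TESTS OF MODEL FIT section above.
chi2_model, df_model = 44.429, 2
chi2_base, df_base = 362.474, 6
loglik_h0, loglik_h1 = -2811.629, -2789.415

# Likelihood-ratio chi-square: twice the gap between the saturated (H1)
# and the fitted (H0) log-likelihoods.
lr_chi2 = 2 * (loglik_h1 - loglik_h0)

# Noncentrality estimates (chi-square minus df, floored at zero).
d_model = max(chi2_model - df_model, 0.0)
d_base = max(chi2_base - df_base, 0.0)

# Comparative Fit Index.
cfi = 1 - d_model / max(d_base, d_model)

# Tucker-Lewis Index, based on chi-square / df ratios.
ratio_base = chi2_base / df_base
ratio_model = chi2_model / df_model
tli = (ratio_base - ratio_model) / (ratio_base - 1)

print(f"LR chi-square: {lr_chi2:.3f}")
print(f"CFI: {cfi:.3f}")
print(f"TLI: {tli:.3f}")
```

The chi-square recovered from the printed log-likelihoods agrees with the reported 44.429 up to rounding of the inputs, and the CFI and TLI round to the reported values.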


  • © 2021 UC REGENTS

Introduction to Path Analysis in R

Thomas Bihansky

What is Path Analysis?

Path analysis is a form of multiple regression statistical analysis used to evaluate causal models by examining the relationships between a dependent variable and two or more independent variables. Using this method one can estimate both the magnitude and significance of causal connections between variables.

There are two main requirements for path analysis:

  • All causal relationships between variables must go in one direction only (you cannot have a pair of variables that cause each other)
  • The variables must have a clear time-ordering since one variable cannot be said to cause another unless it precedes it in time.

Path analysis is theoretically useful because, unlike other techniques, it forces us to specify relationships among all of the independent variables. This results in a model showing causal mechanisms through which independent variables produce both direct and indirect effects on a dependent variable.

How to use Path Analysis

Typically path analysis involves the construction of a path diagram in which the relationships between all variables and the causal direction between them are specifically laid out.

When conducting path analysis one should first construct an input path diagram, which illustrates the hypothesized relationships. After statistical analysis has been completed, an output path diagram can then be constructed, which illustrates the relationships as they actually exist, according to the analysis conducted.

While path analysis is useful for evaluating causal hypotheses, this method cannot determine the direction of causality. It clarifies correlation and indicates the strength of a causal hypothesis, but does not prove direction of causation.

R Packages used

NOTE: OpenMx is required to run semPlot. To install OpenMx, run install.packages("OpenMx") in your console.

Once OpenMx is installed, you can load the required packages, including lavaan (which provides cfa()) and semPlot (which provides semPaths()).

Conducting a Path Analysis in R

The four general steps to conducting a Path Analysis in R include:

  • Read in your data (as a correlation matrix or raw data)
  • Specify the model
  • Fit the model
  • View the results

Read in your data

For this tutorial, we will use the mtcars dataset to demonstrate how to conduct a path analysis. However, a covariance matrix can also be used if necessary.

First, we must identify the independent and dependent variables within our dataset.

In the R environment, a regression formula has the following form: y ~ x1 + x2 + x3 + x4

In this formula, the tilde sign (“~”) is the regression operator. On the left-hand side of the operator, we have the dependent variable (y), and on the right-hand side, we have the independent variables, each one separated by the “+” operator.

For this demonstration, we will use mpg as the dependent variable and cyl, disp, hp, gear, am, wt and carb as the independent variables. Furthermore, we will also assume that hp is a function of cyl, disp, and carb.

The cfa() function is a dedicated function for fitting confirmatory factor analysis models. The first argument is the user-specified model. The second argument is the dataset that contains the observed variables. Once the model has been fitted, the summary() function provides a nice summary of the fitted model.

As we can see from the above summary, wt is a significant indicator of mpg and both disp and carb are significant indicators of hp . However, hp itself is not significant with respect to mpg .

One of the best ways to understand an SEM model is to inspect the model visually using a path diagram. Thanks to the semPlot package, this is easy to do.

Building a Structural Equation Model (SEM)

The semPaths() function provides a quick and easy way to generate a visual representation of your model and automatically calculates key statistics that describe the relationships between the dependent variable and each independent variable. The SEM produced below is that of the mtcars model we created earlier in this tutorial.

https://rdrr.io/cran/semPlot/man/semPaths.html provides a good breakdown of many additional customization options.

Exercise 1: What other layouts can you find that might make the SEM easier to read? HINT: Google search “semPath layouts”.

The “tree” layout provides a good amount of space between the variables, making it easier to read. The diagram can be customized much further to the programmer’s desire, however that is beyond the scope of this tutorial.

Exercise 2: What do the arrows and values between each independent variable and the dependent variable represent?

The arrows and values between each independent variable and the dependent variable (or mediating variable) are path coefficients. Path coefficients are standardized versions of linear regression weights, which can be used to examine the possible causal linkage between variables in the structural equation modeling approach. The standardization involves multiplying the ordinary regression coefficient by the standard deviation of the corresponding explanatory variable and dividing by the standard deviation of the outcome variable; the resulting values can then be compared to assess the relative effects of the variables within the fitted regression model.
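As a concrete illustration of standardization, a fully standardized coefficient rescales the raw slope b by sd(x) / sd(y). The Python sketch below does this for a single-predictor regression on a small made-up dataset (the numbers and helper names are hypothetical, chosen only for the example). With one predictor, the standardized slope equals the Pearson correlation, which gives a convenient check.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy data, for illustration only.
x = [2.0, 3.0, 5.0, 7.0, 9.0]
y = [1.0, 4.0, 5.0, 8.0, 12.0]

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

b_raw = ols_slope(x, y)                  # change in y per unit of x
b_std = b_raw * stdev(x) / stdev(y)      # change in SDs of y per SD of x
r = pearson_r(x, y)

print(f"raw slope:          {b_raw:.4f}")
print(f"standardized slope: {b_std:.4f}")
print(f"Pearson r:          {r:.4f}")
```

In a model with several predictors the standardized coefficients no longer equal the simple correlations, but the rescaling step is the same.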

We can see from the path coefficients in our SEM that wt has a stronger estimated effect on mpg than any other variable.

Exercise 3: What other inferences can you draw about the relationship between variables from the above SEM?

Exercise 4: What do the arrows and values between the independent variables represent?

As we can see, the arrows and values between the independent variables on the SEM match those calculated through the use of a correlation plot.

Analysing Path Analysis with Multiple Regression

  • First Online: 07 October 2023


  • J. P. Verma
  • Priyam Verma

Part of the book series: Synthesis Lectures on Mathematics & Statistics ((SLMS))


We shall start the discussion on path analysis with multiple regression in this chapter. Let us start our discussion with a caselet. A group of researchers wanted to understand the factors that contribute to customer satisfaction in the hospitality industry.


Author information

Authors and affiliations.

Sri Sri Aniruddhadeva Sports University, Chabua, Dibrugarh, Assam, India

J. P. Verma

The Aix-Marseille School of Economics, Marseille, France

Priyam Verma


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Verma, J.P., Verma, P. (2024). Analysing Path Analysis with Multiple Regression. In: Understanding Structural Equation Modeling. Synthesis Lectures on Mathematics & Statistics. Springer, Cham. https://doi.org/10.1007/978-3-031-32673-8_5


DOI : https://doi.org/10.1007/978-3-031-32673-8_5

Published : 07 October 2023

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-32672-1

Online ISBN : 978-3-031-32673-8

eBook Packages : Synthesis Collection of Technology (R0)


afni.nimh.nih.gov

Structural Equation Modeling (SEM) or Path Analysis

Introduction

Path Analysis is a causal modeling approach to exploring the correlations within a defined network. The method is also known as Structural Equation Modeling (SEM), Covariance Structural Equation Modeling (CSEM), Analysis of Covariance Structures, or Covariance Structure Analysis. In FMRI data analysis it has been applied to the visual system, language production, motor attention, the memory system, etc. Historically it is an approach more often used as confirmatory (hypothesis testing) than exploratory (descriptive or model searching), more model-driven than data-driven, and more "causal" than correlative.

The hypothetical model in path analysis usually involves two kinds of variables: observable/manifest (endogenous or dependent) variables and latent (exogenous or non-observable) variables. Observable variables serve as indicators of the underlying constructs represented by the latent variables, and latent variables are usually theoretical constructs that cannot be observed directly. In FMRI the observable variables are the BOLD time series at the regions of interest, and the model usually does not involve any latent variables.

There are two goals of path analysis: (1) understanding patterns of correlations among the regions; (2) explaining as much of the regional variation as possible with the model specified. Different from statistical testing in other techniques, such as multiple regression and ANOVA, the focus in path analysis is usually on a decision about the whole model: reject, modify, or accept it?

A few noteworthy points regarding path analysis:

* Interpretation of path coefficients: First of all, they are not correlation coefficients. Suppose we have a network with a path connecting region A to region B. The meaning of the path coefficient theta (e.g., 0.81) is this: if region A increases by one standard deviation from its mean, region B would be expected to increase by 0.81 of its own standard deviations from its own mean, while holding all other relevant regional connections constant. With a path coefficient of -0.16, when region A increases by one standard deviation from its mean, region B would be expected to decrease by 0.16 of its own standard deviations from its own mean, while holding all other relevant regional connections constant.

* Path analysis requires a large sample size. As a rough guide, fewer than 100 is small and 100-200 is medium.

* An alternative model might account for the proposed model equally well if not better. 

* Keep in mind there is a difference between a statistical model of reality and reality itself (Kline, 2005).

* Statistical causal modeling (including SEM) does not prove causation.

* Specification error from omitting ROIs: if ROIs that co-vary with those in the model are left out, the estimated causal effects of the included ROIs may be inaccurate. The error could be either underestimation (more likely) or overestimation.

* One assumption of path analysis, that exogenous variables are measured without error, is hardly true in FMRI.

* Uncertain about the directionality of an effect? (1) Try alternative models with different directionalities. (2) Include reciprocal effects. The first option is preferred. However, some models may fit the data equally well, leaving no statistical basis for model selection among the alternatives. Including reciprocal effects makes the analysis non-recursive (and thus more difficult to analyze). (3) Include all possible paths? This is limited by the degrees of freedom, which have to be positive so that the model is identifiable.

1dSEM  in the most recent AFNI package is specifically written for path analysis in the context of FMRI field based on Bullmore et al. [1] and Stein and Meyer-Lindenberg [2]. A big difference here from the conventional path analysis is that the residual error variances are estimated prior to the analysis instead of being treated as unknown parameters. The main reason is that the sample size is relatively small in FMRI data.

For more discussion, see  G. Chen, et al., Vector autoregression, structural equation modeling, and their synthesis in neuroimaging data analysis, Comput. Biol. Med. (2011), doi:10.1016/j.compbiomed.2011.09.004

A script for preprocessing and running SEM

To make life easier, we created a  tcsh script  (updated Dec 11, 2008) that contains all the steps described below except ROI time series extraction. First make sure you have the most recent version of AFNI on your computer. If you have  n  subjects and  m  ROIs, you need one input file for each of those  n  subjects, in which you store the  m  ROI time series as m columns, plus a file specifying the connections. To run the script, do something like this:

tcsh -x SEMscript.csh subj1.1D subj2.1D ... subj15.1D thetas.1D

The last file in the command line, thetas.1D, stores an m x m matrix (m = number of ROIs) whose specification is described below. Also, at the end of the script, there are 3 command lines for running SEM in 3 different modes: model validation, tree model, and forest search. So you may want to make some modifications there.

Details of running SEM

The following is a suggested scheme for obtaining input data for 1dSEM. It is assumed that all subjects have gone through exactly the same experiment design, and that the time series have been extracted at the regions of interest for each subject from the input file for individual subject analysis (motion corrected and scaled properly). If some subjects don't share the same time series, you can stack all subjects' data and then calculate the covariance or correlation matrix, but then you may have to modify the following steps to adapt to the new situation.

You may consider using 3dSynthesize (and 3dcalc) to remove effects of no interest such as baseline, head motion, task effects of no interest, physiological fluctuations, etc.

Suppose there are 15 subjects and 5 ROIs with 300 TRs in each time series, totaling 15 x 5 = 75 time series: ts#_Subj*.1D

(1) Compute eigentimeseries for each ROI.

For each ROI, run singular value decomposition (SVD) on a T x n matrix (the time series of all subjects, one column per subject) using 1dsvd (n = number of subjects, T = number of TRs in the extracted time series):

   1dsvd ts#_Subj1.1D ts#_Subj2.1D ... ts#_Subjn.1D   (# is the region serial number)

The output on the screen looks like this (15 subjects and 300 TR's):

++ 1dsvd input vectors:
00..00:  ts_Subj1.1D
01..01:  ts_Subj2.1D
...
14..14:  ts_Subjn.1D
++ Data vectors [A]:
      --------- --------- ---------
00:   0.15000   0.00000   ... 0.00000
01:   0.27300   0.16000   ... 0.53000
...
299:  0.01000   0.22500   ... 0.00000
++ Left Vectors [U]:
            25.639       7.9888      ...  0.5319
      ------------ ------------ ------------
00:     -0.38698     0.075116  ...  -0.090827
01:     -0.09261     0.033281  ...  -0.077916
...
299:    -0.42006    -0.040904  ...   0.35085
++ Right Vectors [V]:
      25.63904   7.98879  ... 0.53188
      --------- --------- --------------
00:  -0.52292   0.64472  ... 0.55758
01:  -0.65014   0.12138  ... -0.75006
...
14:  -0.55126  -0.75472  ... 0.35568
++ Pseudo-inverse:
      --------- --------- --------- ----
00:   0.01331   0.00150   ...   0.02793
01:   0.02114   0.01112   ...  -0.02553
...
14:  -0.01018  -0.00799  ...   0.03334

The SVD has the form A = USV', where S is a diagonal matrix consisting of the singular values s1, s2, ..., s15 of A (the square roots of the eigenvalues of A'A). These singular values are shown in descending order above the columns of both the left and right matrices in the output.

Extract the first column from the left matrix U, which corresponds to the largest singular value s1, and save it as sv_nn.1D (nn represents the ROI index). This vector is also the corresponding eigenvector of AA'. Plot sv_nn.1D (1dplot), and verify that its pattern more or less matches the frequency of the experiment design.
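For readers who want to see what "the first left singular vector" means numerically, here is a minimal pure-Python sketch that finds the largest singular value and its left singular vector by power iteration. It is not how 1dsvd works internally (that is an assumption we do not make); the function name and the tiny test matrix are hypothetical.

```python
import math

def first_left_singular_vector(A, iters=500):
    """Power iteration on A'A. A is a list of T rows with n columns each.

    Returns (s1, u1): the largest singular value and the matching left
    singular vector (length T). The sign of u1 is arbitrary.
    """
    T, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n          # starting guess for the right vector
    for _ in range(iters):
        u = [sum(A[t][j] * v[j] for j in range(n)) for t in range(T)]   # u = A v
        w = [sum(A[t][j] * u[t] for t in range(T)) for j in range(n)]   # w = A' u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(A[t][j] * v[j] for j in range(n)) for t in range(T)]
    s1 = math.sqrt(sum(x * x for x in u))
    return s1, [x / s1 for x in u]

# Tiny example: singular values are 3 and 1.
A = [[3.0, 0.0],
     [0.0, 1.0]]
s1, u1 = first_left_singular_vector(A)
print(s1, u1)
```

Power iteration assumes the starting vector is not orthogonal to the dominant right singular vector and that s1 is strictly larger than s2; a library SVD routine is the safer choice for real data.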

(2) For each ROI re-sign left singular vectors sv*.1D

Obtain average time course of the ROI:

  3dMean -prefix ts_mean_nn Subj*_ts_nn.1D

  set dotp = `3ddot -dodot sv_nn.1D ts_mean_nn.1D`

  1deval -a sv_nn.1D -expr "step(-$dotp)*(-a)+ step($dotp)*a " >svc_nn.1D

The last command corrects the sign of sv.1D if the output (a single number, the dot product) from 3ddot is negative.
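The sign correction is easy to state outside of AFNI: a singular vector is only defined up to sign, so it is flipped whenever it points away from the ROI's mean time course. The Python sketch below mirrors that logic; the function names are ours, and unlike the 1deval expression above (where step() is zero at zero) it leaves the vector unchanged when the dot product is exactly zero.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def resign(sv, mean_ts):
    """Flip the singular vector if it is anti-aligned with the mean time course."""
    if dot(sv, mean_ts) < 0:
        return [-a for a in sv]
    return list(sv)

# Hypothetical values: the vector points opposite to the mean, so it is flipped.
flipped = resign([-0.3, -0.5, -0.8], [1.0, 2.0, 3.0])
kept = resign([0.3, 0.5, 0.8], [1.0, 2.0, 3.0])
print(flipped, kept)
```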

(3) Calculate covariance or correlation matrix

Correlation matrix is usually used because of arbitrary scaling issue in the BOLD signal.

Once steps 1 and 2 are done for all ROIs, estimate the inter-regional covariance matrix from the sign-corrected singular vectors obtained above

   1ddot -dem -cov -terse svc_1.1D svc_2.1D ... svc_5.1D   (5 ROIs in this example)

Alternatively, you can get the covariance matrix from the mean time courses by running

   1ddot -dem -cov -terse  ts_mean_1.1D  ts_mean_2.1D ...  ts_mean_5.1D 

If you prefer to use the correlation matrix, do the following

   1ddot -dem  -terse  svc_1.1D svc_2.1D ... svc_5.1D 

   1ddot -dem  -terse  ts_mean_1.1D  ts_mean_2.1D ...  ts_mean_5.1D  

(4) Calculate residual error variances for each ROI

$\psi = \sum_i s_i^2 - s_1^2$   (the sum of all squared singular values except the first, principal one)

where s1, s2, ..., s15 are singular values of A in step 1.
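A short Python sketch of this calculation follows, covering both the covariance-based formula above and the correlation-based variant given just below. The singular values used are hypothetical placeholders, not the ones from the 1dsvd output, and the function names are ours.

```python
def psi_covariance(singular_values):
    """Residual variance: sum of squared singular values minus the largest one."""
    squares = sorted((s * s for s in singular_values), reverse=True)
    return sum(squares) - squares[0]

def psi_correlation(singular_values):
    """Residual variance as a fraction: 1 - s1^2 / sum of all s_i^2."""
    squares = sorted((s * s for s in singular_values), reverse=True)
    return 1 - squares[0] / sum(squares)

# Hypothetical singular values for one ROI, largest first.
s = [3.0, 2.0, 1.0]
print(psi_covariance(s))   # variance not captured by the first component
print(psi_correlation(s))  # the same quantity as a proportion of the total
```

The correlation-based version is the covariance-based version divided by the total sum of squares, so it always lies between 0 and 1.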

Or if you obtained correlation matrix during the previous step, use

$\psi = 1 - s_1^2 / \sum_i s_i^2$

(5) Obtain effective degrees of freedom

Use 3ddot to get the first order autocorrelation coefficient   ar_i  for  i -th ROI

    3ddot -demean  svc#.1D'{0..298}'  svc#.1D'{1..299}'  (# is the serial region number)

or, if you used the mean time courses:

   3ddot -demean  ts#_mean.1D'{0..298}'  ts#_mean.1D'{1..299}'  (# is the serial region number)

The effective degrees of freedom is estimated as

(T/P)  Σ(1-ar_i)/(1+ar_i)   (you can use a calculator such as  ccalc )

where T = number of time points in each time series, P = number of ROI's
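The formula can be sketched in Python as follows. The lag-1 autocorrelation is computed as the correlation between a series and itself shifted by one time point, which is our reading of the 3ddot -demean commands above; the function names and the example coefficients are hypothetical.

```python
from math import sqrt

def lag1_autocorr(ts):
    """Correlation between ts[0..T-2] and ts[1..T-1] (first-order autocorrelation)."""
    x, y = ts[:-1], ts[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def effective_df(T, ar_coeffs):
    """(T / P) * sum over ROIs of (1 - ar_i) / (1 + ar_i), with P = number of ROIs."""
    P = len(ar_coeffs)
    return (T / P) * sum((1 - ar) / (1 + ar) for ar in ar_coeffs)

# Hypothetical example: 300 time points, 5 ROIs, each with ar = 0.5.
df_eff = effective_df(300, [0.5] * 5)
print(df_eff)
```

Positive autocorrelation shrinks the effective degrees of freedom below T, which is why the value passed to 1dSEM can be much smaller than the number of time points.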

(6) Use results from steps 3, 4, and 5 as input for 1dSEM.

There are two basic modes of analysis in 1dSEM:  model validation  and  model search . With model validation, you can test whether a theoretical network can stand against the path analysis. Suppose we have a model of 5 regions in the brain like this (focus on the path connections and ignore those path coefficients for the moment)

  First create a text file testthetas.1D specifying the connections

#     VEC   PFC   SMA   IFG   IPL
VEC    0     0     0     0     1
PFC    1     0     0     0     0
SMA    0     1     0     0     0
IFG    0     0     1     0     0
IPL    1     0     0     1     0

Save the correlation matrix from step 3 as file testcorr.1D

1     0     0     0     0
0.661 1     0     0     0
0.525 0.66  1     0     0
0.486 0.507 0.437 1     0
0.731 0.63  0.558 0.517 1

  and the residual error variances from step 4 as file testpsi.1D

0.825 0.868 0.87 0.881 0.851

Then run 1dSEM (the number 30 after -DF is the effective degrees of freedom from step 5)

   1dSEM -theta testthetas.1D -C testcorr.1D -psi testpsi.1D -DF 30 -limits -1 1

with the following output on the screen

++ Program 1dSEM: AFNI version=AFNI_2007_01_15_xxxx [32-bit]
++ Authored by: Daniel Glen, Gang Chen
Finding optimal theta values
++ Total number of iterations 82429
++ Cost is 0.429081
++ Chi Square = 12.4434
Connection coefficients matrix: 5 x 5
01:      0.0000      0.0000      0.0000      0.0000      0.8076
02:      0.5974      0.0000      0.0000      0.0000      0.0000
03:      0.0000      0.5961      0.0000      0.0000      0.0000
04:      0.0000      0.0000      0.3144      0.0000      0.0000
05:     -0.1589      0.0000      0.0000      0.5231      0.0000

The estimated path coefficients are shown in the above figure.

On the other hand, if we want to adopt the model search mode and look for a 'best model' that fits the data, save the following matrix as testthetas_ms.1D in place of testthetas.1D (check 1dSEM -help for definitions)

#     VEC   PFC   SMA   IFG   IPL
VEC    0     2     2     2     2
PFC    2     0     2     2     2
SMA    2     2     0     2     2
IFG    2     2     2     0     2
IPL    2     2     2     2     0

and run the following

1dSEM -model_search -theta testthetas_ms.1D -C testcorr.1D -psi testpsi.1D -nrand 10 -DF 30 -stop_cost 0.1 -grow_all -max_paths 3 -limits -1 1

Output is something like

++ Program 1dSEM: AFNI version=AFNI_2006_06_30_1332 [32-bit]
++ Authored by: Daniel Glen, Gang Chen
nmodels to try is 20
theta_init_mat matrix: 5 x 5
#             VEC         PFC         SMA         IFG         IPL
VEC         0.0000      2.0000      2.0000      2.0000      2.0000
PFC         2.0000      0.0000      2.0000      2.0000      2.0000
SMA         2.0000      2.0000      0.0000      2.0000      2.0000
IFG         2.0000      2.0000      2.0000      0.0000      2.0000
IPL         2.0000      2.0000      2.0000      2.0000      0.0000
Finding optimal theta values
Total number of iterations 147957
max i,j = 0, 4 with cost = 1.87853, ntheta = 1
Connection coefficients matrix: 5 x 5
#             VEC         PFC         SMA         IFG         IPL
VEC         0.0000      0.0000      0.0000      0.0000      0.7310
PFC         0.0000      0.0000      0.0000      0.0000      0.0000
SMA         0.0000      0.0000      0.0000      0.0000      0.0000
IFG         0.0000      0.0000      0.0000      0.0000      0.0000
IPL         0.0000      0.0000      0.0000      0.0000      0.0000
Finding optimal theta values
Total number of iterations 247056
max i,j = 1, 2 with cost = 1.37668, ntheta = 2
Connection coefficients matrix: 5 x 5
#             VEC         PFC         SMA         IFG         IPL
VEC         0.0000      0.0000      0.0000      0.0000      0.7310
PFC         0.0000      0.0000      0.6600      0.0000      0.0000
SMA         0.0000      0.0000      0.0000      0.0000      0.0000
IFG         0.0000      0.0000      0.0000      0.0000      0.0000
IPL         0.0000      0.0000      0.0000      0.0000      0.0000
Finding optimal theta values
Total number of iterations 310515
max i,j = 4, 2 with cost = 1.0108, ntheta = 3
Connection coefficients matrix: 5 x 5
#             VEC         PFC         SMA         IFG         IPL
VEC         0.0000      0.0000      0.0000      0.0000      0.7310
PFC         0.0000      0.0000      0.6600      0.0000      0.0000
SMA         0.0000      0.0000      0.0000      0.0000      0.0000
IFG         0.0000      0.0000      0.0000      0.0000      0.0000
IPL         0.0000      0.0000      0.5580      0.0000      0.0000
Finding optimal theta values
Total number of iterations 370922
max i,j = 3, 4 with cost = 0.707411, ntheta = 4
Connection coefficients matrix: 5 x 5
#             VEC         PFC         SMA         IFG         IPL
VEC         0.0000      0.0000      0.0000      0.0000      0.7310
PFC         0.0000      0.0000      0.6600      0.0000      0.0000
SMA         0.0000      0.0000      0.0000      0.0000      0.0000
IFG         0.0000      0.0000      0.0000      0.0000      0.5170
IPL         0.0000      0.0000      0.5580      0.0000      0.0000
Finding optimal theta values
Total number of iterations 438186
max i,j = 1, 0 with cost = 0.5501, ntheta = 5
Connection coefficients matrix: 5 x 5
#             VEC         PFC         SMA         IFG         IPL
VEC         0.0000      0.0000      0.0000      0.0000      0.7310
PFC         0.4342      0.0000      0.4321      0.0000      0.0000
SMA         0.0000      0.0000      0.0000      0.0000      0.0000
IFG         0.0000      0.0000      0.0000      0.0000      0.5170
IPL         0.0000      0.0000      0.5580      0.0000      0.0000
Finding optimal theta values
Total number of iterations 750521
max i,j = 2, 3 with cost = 0.491374, ntheta = 6
Connection coefficients matrix: 5 x 5
#             VEC         PFC         SMA         IFG         IPL
VEC         0.0000      0.0000      0.0000      0.0000      0.7310
PFC         0.4342      0.0000      0.4321      0.0000      0.0000
SMA         0.0000      0.0000      0.0000      0.2667      0.0000
IFG         0.0000      0.0000      0.0000      0.0000      0.4028
IPL         0.0000      0.0000      0.4618      0.0000      0.0000

*** Theoretically speaking, the path coefficients can take any value, but most of the time they fall within [-1, 1]. To save runtime, the default values for -limits are set to -1 and 1; if a result hits the boundary, widen the range with the -limits option and re-run the analysis.


Other packages

1.  sem package  in  R , a free software environment for statistical computing and graphics [Download basic package - either the appropriate binary for your platform or the source code; Set up path; Install sem by implementing the following at R prompt: install.packages("sem",dependencies=TRUE); Try: library(sem) If it doesn't complain, everything is in the right order.] ( manual )

2. A  Matlab package  by  Douglas Steele (Matlab required)

3.  Mplus  has a free  demo version  with limitation of up to 6 dependent variables, 2 independent variables and 2 between variables in two-level analysis - Microsoft Windows

4.  LISREL  by Scientific Software International Inc. (student version free) - Microsoft Windows

5.  AMOS  (Analysis of Moment Structures) by SPSS, Inc. (Free  student version  limited to 8 observed variables and 54 parameters) - Microsoft Windows

6.  CALIS  (Covariance Analysis and Linear Structural ) procedure of SAS/STAT 8 - Microsoft Windows

7.  EQS  - Microsoft Windows

8.  Mx Graph  (FREE) - Microsoft Windows

Acknowledgement

We sincerely thank Andreas Meyer-Lindenberg and Jason Stein for their generous help during the development of this program. 

References

[1] Bullmore, E. T., Horwitz, B., Honey, G. D., Brammer, M. J., Williams, S. C. R., Sharma, T. (2000). "How Good is Good Enough in Path Analysis of fMRI Data?" NeuroImage 11: 289-301.

[2] Stein, J. L., Wiedholz, L. M., Bassett, D. S., Weinberger, D. R., Zink, C. F., Mattay, V. S., Meyer-Lindenberg, A. (2007). "A Validated Network of Effective Amygdala Connectivity." NeuroImage 36: 736-745.

[3] SEM wiki

[4] http://www2.chass.ncsu.edu/garson/pa765/structur.htm


Path Analysis

Path Analysis extends the standard regression models by allowing the simultaneous modeling of multiple interrelated dependence relationships. It is a multivariate method used to examine multiple regression-like equations simultaneously and gets its name from the causal paths, or 'links', drawn between variables in a diagram.

What is Path Analysis?

Path Analysis allows for hypothesis testing about the network of relationships among variables in the model. Its use is common in social sciences, economics, epidemiology, and biology where it serves as an exploratory tool or a way to confirm a specific direction of causal effects.

Every path model comprises two types of variables: endogenous (those being influenced within the system) and exogenous (those that come from outside the model and exert their influence on it).


Int J Environ Res Public Health

Path Analysis to Assess Socio-Economic and Mitigation Measure Determinants for Daily Coronavirus Infections

Elie Yammine

1 Statistics and Computer Science Department, Faculty of Science, Lebanese University, Baabda 1003, Lebanon

Abbas Rammal

2 Data Science Department, Faculty of Information, Lebanese University, Baabda 1003, Lebanon

Associated Data

All data, models, and code generated or used during the study appear in the submitted article.

(1) Background: With the rapid global spread of the coronavirus disease 2019 (COVID-19) and the relatively high daily cases recorded in a short time compared to other types of seasonal flu, the world remains under continuous threat unless we identify the key factors that contribute to these unexpected records. This identification is important for developing effective criteria and plans to reduce the spread of the COVID-19 pandemic and can guide national authorities to tighten or reduce mitigation measures, in addition to spreading awareness of the important factors that contribute to the propagation of the disease. (2) Methods: The data represents the daily infections (210 days) in four different countries (China, Italy, Iran, and Lebanon) taken approximately in the same duration, between January and March 2020. Path analysis was implemented on the data to detect the significant factors that affect the daily COVID-19 infections. (3) Results: The path coefficients show that quarantine commitment (β = −0.823) and full lockdown measures (β = −0.775) have the largest direct effect on COVID-19 daily infections. The results also show that more experience (β = −0.35), density in society (β = −0.288), medical resources (β = 0.136), and economic resources (β = 0.142) have indirect effects on daily COVID-19 infections. (4) Conclusions: The COVID-19 daily infections directly decrease with complete lockdown measures, quarantine commitment, wearing masks, and social distancing. COVID-19 daily cases are indirectly associated with population density, special events, previous experience, technology used, economic resources, and medical resources.

1. Introduction

The coronavirus disease 2019 (COVID-19) has rapidly spread around the world since it first appeared in the city of Wuhan, China, towards the end of December 2019. The World Health Organization classified it as a pandemic in March 2020 [ 1 ]. Europe and the U.S. have recorded the highest number of infections and deaths. U.S. cases formed more than one fourth of total global infections by June 2020 [ 2 ]. Governmental and institutional reactions and measures varied across countries with respect to the time of introduction of social distancing measures and their degree of severity. Globally, these control measures have caused significant disruption to social and economic structures. However, it is unknown whether these policies have had an impact, and how long they should remain in place. It is thus essential to assess the effects of these control measures on the pandemic for the benefit of global health security.

Although there have been many efforts to analyze and predict the behavior of COVID-19 infections, the highly complex nature of the outbreak and the variation in its behavior from nation to nation make it a challenge to determine the factors that drive the increase in daily infections. This study aims to examine the socio-economic and mitigation measure determinants of daily infections. Path analysis is a form of multiple regression statistical analysis that is used to evaluate causal models by examining the relationships between a dependent variable and independent variables; with it, we can estimate both the magnitude and significance of causal connections between variables [ 3 ]. For this reason, we apply the path analysis technique in this study, which involves the analysis of hypothesized relationships among multiple variables [ 4 ]. This technique consists of a family of models that depicts the influence of a set of variables on one another [ 5 ].

Many studies have focused on the application of path analysis on the COVID-19 pandemic. V. Burkova conducted a path analysis to examine possible factors that may be associated with self-reported levels of anxiety during the first wave of the COVID-19 pandemic [ 6 ], while B. Wielgus performed a path analysis to examine the relationship between anxiety and general psychosomatic functioning during the COVID-19 pandemic, considering the influence of indirect factors such as psychological flexibility and mindfulness [ 7 ]. Annette Brose performed multilevel structural equation modeling to identify mechanisms underlying changes in well-being in times of threat in the COVID-19 pandemic, with a focus on appraisals of the pandemic and affective states, stress, and mindfulness in daily life [ 8 ]. On the other hand, L. Tamariz conducted structural equation modeling (SEM) on COVID-19 infections in South Florida and found that the infection is associated with economic disadvantage in a particular geographical area and not with racial/ethnic distribution [ 9 ]. Furthermore, M. Zareipour conducted a study based on a path analysis to find the determinants of COVID-19 prevention behavior in the elderly in Urmia, Iran, and found that effective interventions based on the health belief model and promoting knowledge, perceived susceptibility, severity, and perceived self-efficacy can prevent the elderly from contracting this disease [ 10 ]. Marvin G. Pizon generated a path analysis model of COVID-19 to establish the specific cause-and-effect between air pressure, air temperature, and relative humidity [ 11 ]. L. Salehi applied a path analysis to assess the relationship of fear and anxiety caused by COVID-19 with pregnancy and the mental health of pregnant women and found that it is necessary to pay more attention to the mental health of pregnant women during the pandemic [ 12 ].

Path analysis has also been used widely in the medical field. Hardenberg developed a path analysis model based on a system of linear equations for use in phylogenetic studies [ 13 ]. H. Nadrian used a path analysis to examine the possible direct and indirect effects of health belief model (HBM) constructs on self-care behaviors among heart failure patients [ 14 ]. Rebekah J. Walker studied the association between the social determinants of health and outcomes in individuals with type 2 diabetes; the results were consistent with a previous conceptual framework, which stated that there are direct and indirect links between socio-economic and psychosocial factors and glycemic control [ 15 ]. Path analysis and SEM remain among the most used techniques today despite the continuous rise of new and sophisticated methods in social and medical sciences.

Previous studies have shown that many factors can be associated with the daily cases of COVID-19. In Thailand, touristic and cultural activities were found to be significant factors contributing to the number of COVID-19 cases [ 16 ]. In Italy, a strict lockdown decreased the transmission rate to maintain societal immunity [ 17 ]. Population density was found to be positively related to deaths due to COVID-19 in low populated countries [ 18 ]. Income, social capital, and trust and beliefs have been shown to be significant factors related to daily COVID-19 cases [ 19 ].

The article is organized as follows. In Section 2 , we explain the variables used in the study, how data were collected, and the statistical methodology used. Section 3 presents the results and coefficients. Finally, we discuss and interpret our results in Section 4 and present our conclusions in Section 5 .

2. Materials and Methods

2.1. Conceptual Framework

2.1.1. Statement of Problem

As the world witnessed a continuous increase in the daily infections of COVID-19 with great fear of an uncontrolled spread of the disease, it became essential to determine the variables and factors affecting this increase and take immediate action to control the spread.

2.1.2. Importance of Variables Selected

Because COVID-19 transfers through surfaces and the air, implementing lockdown measures of different levels is important to reduce contact between people. Quarantine and lockdowns have always been effective ways to control communicable disease outbreaks. An example of this is the 2003 SARS outbreak, where the use of quarantine, border controls, contact tracing, and surveillance proved to be effective in containing the global threat in just over three months [ 20 ]. We also think that medical and economic resources are among the main factors that contribute to the daily COVID-19 cases: countries with enough resources should not face difficulties in controlling the spread, whereas a lack of resources has been a source of weakness in fighting such SARS-type diseases. Financing profoundly affects the performance of the health system in a specific country. Whether the health system can implement a given policy depends directly on the amount of available funding [ 21 ]. Experience in dealing with health outbreaks greatly influenced how countries responded to COVID-19, as in the case of Hong Kong, which faced the 1957 “Asian” and 1968 “Hong Kong” influenza pandemics, along with A(H7N9) in 2013. In addition, Taiwan experienced the SARS outbreak in 2003, whereas Liberia was profoundly affected by the Ebola epidemic in 2014, which led to thousands of deaths.

All these experiences made local governments realize the importance of establishing a tiered command structure to prepare for and respond to future outbreaks and consolidate all health protection functions. As a result, the public health systems and social measures in Hong Kong proved to be critical in controlling COVID-19. Liberia maintained a low level of spread of the COVID-19 while Taiwan recorded only around 600 positive cases by March 2021 [ 22 ]. Some studies showed that the population density is important in modeling the COVID-19 infections. A study in the U.S. revealed that population density is an effective predictor of cumulative infection cases at the country level [ 23 ]. Other studies [ 24 , 25 , 26 , 27 ] have shown that SARS-CoV-2 transmission is potentially more likely to occur among cities with higher population densities. The use of modern technology in healthcare systems has helped in many aspects. Artificial Intelligence (AI) is used to identify, track, and forecast outbreaks and help in diagnosing the virus. It is used in processing the healthcare claims. Drones and robots are used to deliver food and medical supplies and sterilize public places. AI is helping to develop drugs and COVID-19 vaccines through the use of supercomputers [ 28 ].

Not all factors influence the daily COVID-19 cases directly. We therefore assume that some variables have direct and indirect effects, either negative or positive. Performing a series of multiple regressions among the independent variables can help us identify the mediators that connect independent variables with the daily COVID-19 cases. Based on the multiple regressions, mediators are included in the path analysis so that independent variables have direct and indirect effects on our dependent variable.

2.2. Data Set Collection

To test the hypothesis and evaluate the outcomes of particular questions, we collected and measured data as follows. To predict the behavior of the spread of coronavirus, we chose four countries that adopted different methodologies to deal with the COVID-19 pandemic and achieved correspondingly different results: China (67 days), Lebanon (40 days), Italy (61 days), and Iran (42 days).

After studying the situation of the virus in these countries, we noticed several indicators that directly or indirectly affected the level of spread in each country. These are:

  • The governments’ reactions: this factor refers to the different responses and reactions from the governments of the four countries during the outbreak. These indices are used to explore whether government response affects the rate of infection and identify correlates of intense responses.
  • The medical resources: this factor refers to the health system policies such as the COVID-19 testing regime or emergency investments into healthcare (ICU beds, etc.) and the health services quality in these four countries. The sensitivity effects of this factor on the results are proposed to be investigated in this study.
  • The commitment of the people in each country to government guidelines. Theoretically, this factor must be directly related to the intensity of the spread of COVID-19 in these countries.
  • The special events: this factor takes into consideration the existence of simultaneous events that affected the spread of COVID-19: other disasters, economic problems, war, political problems or disturbance, official holidays, etc.
  • The economic level and governmental aids: this factor refers to the economic policies enacted during the pandemic, such as income support to citizens or the provision of foreign aid. Because of the direct relationship between this factor and the quarantine compliance of the people in each country, we included it in this study.
  • Previous experience: this factor captures whether each of the four governments had prior experience in critical disaster management, or lacked it.
  • The use of technology devoted to control the virus spread in these countries to help in health and hospitalization services, lockdown control, and restrictions of infected zones.
  • The population density: this is considered as the number of people per km² in the four countries, which can affect the spread of the virus.
  • The family number: this refers to the average number of family members in each of the selected countries.

These direct and indirect factors were used as parameters by our model to predict the future behavior of the spread of COVID-19. The data were combined into a series of novel indices that aggregated various measures of each factor. The parameters were then measured and detected depending on specified criteria and are presented in Table 1 .

Table 1. The direct and indirect factors that are hypothesized to affect the spread level of COVID-19 in the selected countries.

Thus, we collected real data (210 days) for the four countries from different official sources, covering the precise parameters and the daily infection records. The date range of the data for each of the four countries is shown below.

  • China: 9 January 2020–28 March 2020 (80 days)
  • Lebanon: 21 February 2020–31 March 2020 (40 days)
  • Italy: 31 January 2020–31 March 2020 (61 days)
  • Iran: 19 February 2020–31 March 2020 (42 days)

The dependent variable is the daily infection record, which is a cumulative count on a day-by-day scale. In other words, the record for the next day is the sum of the record for the current day and the new records obtained on that next day. Table 1 shows the technique for coding each of the factors by developing the measurement scales used to build the model. This is the first basic step in building and developing a model using the structural equation modeling method.
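The cumulative day-by-day record described above can be sketched in a few lines of Python. The daily counts below are hypothetical values chosen for illustration only; they are not taken from the study data.

```python
from itertools import accumulate

# Hypothetical daily new-infection counts (illustrative only, not study data).
new_cases = [3, 0, 5, 2, 7]

# Cumulative record: each day's record equals the previous day's record
# plus the new cases recorded on that day.
cumulative = list(accumulate(new_cases))

for day, (new, total) in enumerate(zip(new_cases, cumulative), start=1):
    print(f"day {day}: new = {new}, cumulative record = {total}")
```

Differencing consecutive cumulative records recovers the daily new counts, so either form carries the same information.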

In addition, the collected data contain no missing values. With factors free of missing values, the degree of possibility of COVID-19 infection can be predicted with the help of a machine learning algorithm. Such methods may yield better accuracy, unless a missing value is expected to have a very high variance.

2.3. Hypothesis

We hypothesized that lockdown, medical resources, economic resources, technology used, population density, previous experience, family number, procedure, and special events variables influenced the daily COVID-19 infections.

The purpose of studying the above hypothesis lies in determining the factors with the most influence on the development of the COVID-19 pandemic. This will enable us to act quickly and consciously in tightening or reducing the mitigation measures, thereby leading to a better understanding of the behavior of the virus.

2.4. Statistical Analysis

The data were analyzed by path analysis using the AMOS and SPSS statistical software to determine the direct and indirect effects. We used structural equation modeling (SEM), which is defined as a combination of factor analysis and regression. SEM is a powerful multivariate technique used increasingly in scientific investigations to test and evaluate multivariate causal relationships. SEM differs from other modeling approaches in that it tests the direct and indirect effects on pre-assumed causal relationships. Path analysis was developed to quantify the relationships among multiple variables [ 29 ]. It was the early name for SEM before there were latent variables, and it was very powerful in testing and developing the structural hypothesis with both indirect and direct causal effects. However, the two terms have recently been used synonymously. Path analysis can explain the causal relationships among variables. A common function of path analysis is mediation, which assumes that a variable can influence an outcome directly or indirectly through another variable. The interest in SEM is generally in constructs called latent variables. The relationship between the latent variables is represented by regression or path coefficients. The structural equation model implies a structure of the covariances between the observed variables and the latent variables [ 30 ].

Path analysis is a statistical technique that uses both bivariate and multiple linear regression techniques to test the causal relations among the variables specified in the model [ 31 ]. By using this method, we can estimate both the magnitude and significance of causal connections between variables. In this study, path coefficients were computed via a series of multiple regression analyses based on the hypothesized model. Path diagrams were constructed with a single-headed arrow representing the causal order between two variables, with the head pointing to the effect and the tail to the cause. A curved, double-headed arrow indicated a correlation between two variables. The method is also known as causal modeling, analysis of covariance structures, and latent variable modeling [ 32 ]. The sample size in this study was adequate based on the recommendation by Kline [ 6 ] that 10–20 times as many cases as parameters are sufficient for significance testing of model effects.
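To make the idea of computing path coefficients through a series of regressions concrete, here is a minimal Python sketch of a three-variable mediation model (x affects y directly and through a mediator m). The data are simulated, the variable names and generating coefficients are assumptions made for illustration, and the study itself used AMOS and SPSS rather than this code. With standardized variables, the one-predictor path equals the Pearson correlation, and the two-predictor paths follow from the standard closed-form expressions in terms of correlations.

```python
import random
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(ss_u * ss_v)

# Simulated data (hypothetical): x -> m -> y, plus a direct x -> y path.
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.6 * xi + random.gauss(0, 1) for xi in x]
y = [-0.5 * mi + 0.2 * xi + random.gauss(0, 1) for xi, mi in zip(x, m)]

r_xm, r_xy, r_my = pearson(x, m), pearson(x, y), pearson(m, y)

# Regression 1 (m on x): the standardized path x -> m is the correlation.
a = r_xm

# Regression 2 (y on x and m): standardized coefficients with two predictors.
denom = 1 - r_xm ** 2
direct = (r_xy - r_my * r_xm) / denom   # path x -> y (direct effect)
b = (r_my - r_xy * r_xm) / denom        # path m -> y

indirect = a * b                        # effect of x on y carried through m
total = direct + indirect               # total effect of x on y

print(f"direct = {direct:.3f}, indirect = {indirect:.3f}, total = {total:.3f}")
```

In this just-identified model the total standardized effect of x on y equals the correlation between x and y, which is a convenient check on the arithmetic.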

Path analysis comprises four stages: (1) model specification: the theoretical model is stated in terms of equations or a diagram; (2) model identification and parameter estimation: it is established that the theoretical model can be estimated with observed data, and the model’s parameters are statistically estimated from the data. Multiple regression is one such estimation method, but more complicated methods are most often used; (3) model fit: the estimated model parameters are used to predict the correlations or covariances between measured variables, and the predicted correlations or covariances are compared to the observed ones; (4) model respecification: the model is respecified by adding a significant parameter or removing a non-significant one, depending on its p-value and the change in the chi-square of the model. The final outcome of the path analysis is the identification of the effects of the independent variables on the dependent variable. The relationship between the variables is described in the form of structural equations. The structural equations are constructed by calculating the direct effects (DE), indirect effects (IE), and the total effect (TE) between the variables [ 33 ]. The values of these indices are determined based on the path coefficients. The stages of path analysis are depicted in Figure 1 .

Figure 1. The stages of spatial path analysis.

3. Results

3.1. Descriptive Analysis

The days were distributed as follows: 31.9% from China, 20% from Iran, 29% from Italy, and 19% from Lebanon. Only China had previous experience in dealing with a viral outbreak, whereas the remaining countries had none. On 39% of the days (the most frequent category), people were not completely committed to quarantine measures, and there were no special events on 49% of the days. A full lockdown was in place on 43.8% of the days. Moreover, on 47% of the days, medical resources were considered good, and on 50.5% of the days, economic resources were low. The technology used was considered low on 30% of the days and high on 30% of the days.

Table 2 displays summary statistics for the variables used. We can see from Table 2 that on most days medical resources (mean = 0.7774) were available, whereas there was not enough technology available to use for mitigation measures (mean = 0.4679).

Table 2. Summary statistics for the variables used in path analysis.

Figure 2 shows that China, which implemented a full lockdown, recorded as many COVID-19 cases as other countries that implemented partial lockdowns. Meanwhile, Iran recorded fewer cases than Italy with the same lockdown measures (Lockdown = 0.25, 0.5).

Figure 2. Grouped scatterplot of lockdown and daily COVID-19 cases by country.

Figure 3 shows that countries with prior experience in health crises were able to reduce the transmission of COVID-19, whereas countries with no previous experience recorded higher cases of the virus.

Figure 3. Grouped scatterplot of lockdown and daily COVID-19 cases by experience.

Table 3 displays the Pearson correlations among the independent variables and between each independent variable and the daily cases of COVID-19.

Table 3. Pearson correlation between the factors and the dependent variable.

** Correlation is significant at the 0.01 level (two-tailed). * Correlation is significant at the 0.05 level (two-tailed).

Table 3 shows that five variables (Technology Used, Procedure, Density, Medical Resources, and Economic Resources) are correlated with the daily cases at the 0.01 level of significance, while only one variable (Family number) is correlated with the daily cases at the 0.05 level. Because 6 out of 10 variables are significantly correlated with the daily cases of COVID-19, and since we are trying to assess and detect the factors that most contribute to the daily infections, path analysis is an appropriate methodology to use. Most variables are correlated with each other, which creates dependencies and associations among the independent variables; thus, mediators (variables carrying the indirect effects) have high correlations with the independent ones.

3.2. Evaluation of Path Analysis

First, we present the necessary indices that validate our path model. One of the most used fit indices worldwide is the Chi-square goodness of fit resulting from maximum likelihood estimation (MLE). In fact, the smaller $\chi^2_{GOF}$ is, the better the model fits. In our model, a minimum Chi-square of 83.1 was reached after 11 iterations. The probability level obtained was equal to 0, verifying the significance of the model. We consider two other goodness-of-fit indices: Akaike’s information criterion (AIC) and Schwarz Bayesian information criterion (BIC). These indices are not used to test the model in the sense of hypothesis testing, but for model selection. Given a data set, a researcher chooses either the AIC or BIC, and computes it for all models under consideration. Then, the model with the lowest index is selected. Note that both the AIC and BIC combine absolute fit with model parsimony [ 34 ]. The lowest AIC and BIC found are 157.1 and 280.943, respectively. The corrected Akaike’s information criterion (CAIC) is 317.943. The goodness-of-fit index (GFI), the proportion of variance accounted for by the estimated population covariance, is equal to 0.934. It is categorized as an absolute fit index (AFI), which examines the level of correspondence between the proposed model and the observed data.
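As a rough check on how the information criteria relate to the chi-square statistic, the sketch below applies the chi-square-based definitions commonly documented for SEM software: AIC = χ² + 2q, BIC = χ² + q ln N, and CAIC = χ² + q (ln N + 1), where q is the number of estimated parameters and N the number of observations. That the authors' software used exactly these definitions is an assumption; the inputs (χ² = 83.1, q = 37, N = 210) are the values reported in this study.

```python
from math import log

chi_square = 83.1   # minimum chi-square reported for the model
n_params = 37       # number of distinct parameters estimated
n_obs = 210         # number of daily observations

# Chi-square-based information criteria (assumed definitions, see lead-in).
aic = chi_square + 2 * n_params
bic = chi_square + n_params * log(n_obs)
caic = chi_square + n_params * (log(n_obs) + 1)

print(f"AIC  = {aic:.3f}")
print(f"BIC  = {bic:.3f}")
print(f"CAIC = {caic:.3f}")
```

Under these assumed definitions the three values agree with the reported 157.1, 280.943, and 317.943 to three decimals.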

The following indices, called incremental fit indices, permitted us to evaluate the contribution of the estimated model with respect to the reference model (null model). These indices suggested improvements in the fit of the model. The comparative fit index (CFI), comparing the fit of a target model to the fit of an independent or null model, was equal to 0.978 for our model. The Tucker–Lewis index (TLI), used to measure a relative reduction in misfit per degree of freedom [ 35 ], was equal to 0.944. The normed fit index (NFI) which reflects the proportion by which a researcher’s model improves fit compared to the null model (uncorrelated measured variables) [ 36 ] was equal to 0.97. The relative fit index (RFI) is equal to 0.93 and the incremental fit index (IFI) is equal to 0.978.

The closer the above indices are to 1, the better the model. In our study, all the incremental fit indices were greater than 0.9 (cut-off value) [ 37 ], which supports the adequacy of the model fit. The root mean square error of approximation (RMSEA), a supplementary statistic used to assess model fit, was equal to 0.132. We attribute this relatively high value to the small sample size of only 210 days.
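The RMSEA can be related to the chi-square statistic with a short calculation. The sketch below assumes the common single-group formula RMSEA = sqrt(max(χ² − df, 0) / (df (N − 1))); whether the software used by the authors applies exactly this variant is an assumption. The inputs are the values reported in this study (χ² = 83.1, df = 18, N = 210).

```python
from math import sqrt

def rmsea(chi_square, df, n_obs):
    """Point estimate of RMSEA for a single-group model (assumed formula)."""
    # The numerator is floored at zero so that models fitting better than
    # expected by chance get an RMSEA of 0 rather than an undefined value.
    return sqrt(max(chi_square - df, 0.0) / (df * (n_obs - 1)))

value = rmsea(chi_square=83.1, df=18, n_obs=210)
print(f"RMSEA = {value:.3f}")
```

Because N − 1 sits in the denominator, the same excess of χ² over df yields a larger RMSEA in a small sample than in a large one.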

In our study, the number of measured variables is $k = 10$, the number of distinct sample moments is $k(k+1)/2 = 55$, and the number of distinct parameters to be estimated is 37. The degrees of freedom (df) = number of distinct sample moments − number of estimated parameters = 55 − 37 = 18 > 0, so the model is over-identified. Thus, our hypothesis of whether the socio-economic and mitigation measure factors influenced daily COVID-19 infections could be tested via path analysis.
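The counting argument in this paragraph can be written as a small helper. This is only an illustrative sketch: the function name and the three status labels are choices made here, not terminology taken from the authors' software.

```python
def identification_status(n_observed, n_free_params):
    """Return (distinct sample moments, degrees of freedom, status label)."""
    # Distinct variances and covariances among k observed variables: k(k+1)/2.
    moments = n_observed * (n_observed + 1) // 2
    df = moments - n_free_params
    if df > 0:
        status = "over-identified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "under-identified"
    return moments, df, status

moments, df, status = identification_status(n_observed=10, n_free_params=37)
print(moments, df, status)
```

Only an over-identified model (df > 0) leaves room for a chi-square test of fit, which is why the positive df matters here.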

The dependent variable was the daily COVID-19 infections. The exogenous variables were family number, procedure, special events, density, and previous experience. The endogenous variables were lockdown, medical resources, economic resources, and technology used. Error terms were considered as unobserved exogenous variables connected to the endogenous variables. Multicollinearity problems were absent since all bivariate correlations presented in Table 4 were below 0.8 [ 33 ]. Path coefficients (parameter estimates) were calculated based on the hypothesized model and the results are presented in Table 6.
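The multicollinearity screen mentioned above (all bivariate correlations below 0.8) can be sketched as follows. The three variables and their values are hypothetical and were chosen so that one pair is almost collinear; they are not the study's data.

```python
from itertools import combinations
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(ss_u * ss_v)

# Hypothetical variables (illustrative only).
data = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2, -1, -2, -1, 2],          # uncorrelated with var_a
    "var_c": [2.1, 3.9, 6.2, 8.0, 9.8],   # roughly 2 * var_a
}

THRESHOLD = 0.8  # cut-off used in the text for flagging multicollinearity

flagged = [
    (name_1, name_2)
    for name_1, name_2 in combinations(data, 2)
    if abs(pearson(data[name_1], data[name_2])) >= THRESHOLD
]
print("pairs at or above the threshold:", flagged)
```

An empty list would correspond to the situation reported in the text, where no pair of variables reaches the 0.8 cut-off.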

Table 4. Correlation estimates between exogenous variables.

As we expected, all factors had a direct or an indirect impact on the daily COVID-19 infections, with varying strengths. However, there was no significant causal effect of medical resources or economic resources on daily infections.

Table 5 shows that it is estimated that the predictors of medical resources explain 97.2% of its variance. In other words, the error variance of medical resources is approximately 2.8% of the variance of medical resources itself. Also, it is estimated that the predictors of lockdown explain 90.3% of its variance. In other words, the error variance of lockdown is approximately 9.7% of the variance of lockdown itself. The same interpretation applies for economic resources and technology used variables.

Table 5. Squared multiple correlations.

Table 6 shows that nearly all causal effects are significant at the 95% confidence level. Although the causal relation between medical resources and lockdown falls slightly short of statistical significance, we still retain this relation in our model.

Table 6. Estimated parameters for all factors. *** p -Value < 0.001.

Table 7 shows that, due to the direct (unmediated) effect of procedure on daily COVID-19 cases, when procedure goes up by 1 standard deviation, daily COVID-19 cases go down by 0.823 standard deviations (95% CI = −1.175 to −0.541; p < 0.05). Due to the direct (unmediated) effect of lockdown on daily COVID-19 cases, when lockdown goes up by 1 standard deviation, daily COVID-19 cases go down by 0.775 standard deviations (95% CI = −1.051 to −0.497; p < 0.05). Due to the direct (unmediated) effect of technology used on daily COVID-19 cases, when technology used goes up by 1 standard deviation, daily COVID-19 cases go up by 0.17 standard deviations (95% CI = 0.015 to 0.287; p < 0.05).

Table 7. Path analysis on socio-economic and mitigation measure determinants of COVID-19 daily infections.

The indirect effects of medical resources, special events, and lockdown, with p -values equal to 0.11, 0.12, and 0.976, respectively, are not statistically significant. Moreover, the confidence intervals (CIs) for the non-significant indirect effects contain zero, which is consistent with the non-significance of these effects. Meanwhile, none of the other CIs contains zero, which supports the significance of their estimates. Due to the indirect (mediated) effect of experience on daily COVID-19 cases, when experience goes up by 1 standard deviation, daily COVID-19 cases go down by 0.288 standard deviations (95% CI = −0.475 to −0.122; p < 0.05). Due to the indirect (mediated) effect of procedure on daily COVID-19 cases, when procedure goes up by 1 standard deviation, daily COVID-19 cases go up by 0.355 standard deviations (95% CI = 0.19 to 0.602; p < 0.05). Due to the indirect (mediated) effect of density on daily COVID-19 cases, when density goes up by 1 standard deviation, daily COVID-19 cases go down by 0.35 standard deviations (95% CI = −0.493 to −0.231; p < 0.05). Due to the indirect (mediated) effect of family number on daily COVID-19 cases, when family number goes up by 1 standard deviation, daily COVID-19 cases go down by 0.097 standard deviations (95% CI = −0.155 to −0.023; p < 0.05). Due to the indirect (mediated) effect of economic resources on daily COVID-19 cases, when economic resources go up by 1 standard deviation, daily COVID-19 cases go up by 0.142 standard deviations (95% CI = 0.011 to 0.236; p < 0.05). Due to both direct (unmediated) and indirect (mediated) effects of procedure on daily COVID-19 cases, when procedure goes up by 1 standard deviation, daily COVID-19 cases go down by 0.468 standard deviations (95% CI = −0.649 to −0.314; p < 0.05). Due to both direct (unmediated) and indirect (mediated) effects of lockdown on daily COVID-19 cases, when lockdown goes up by 1 standard deviation, daily COVID-19 cases go down by 0.776 standard deviations (95% CI = −1.060 to −0.486; p < 0.05). From Table 7 and Figure 4 , we can see that only lockdown and procedure have both direct and indirect effects. All other variables have either a direct or an indirect effect only.

Path diagram of the default model with standardized parameter estimates.

3.3. Path Diagram Layers

The path diagram presented in Figure 4 can be divided into five layers. Each layer consists of exogenous and endogenous variables. The five layers are constructed as follows:

  • Layer 1 (L1) consists of family number, event, density, and experience as exogenous variables and medical resources as an endogenous variable.
  • Layer 2 (L2) consists of family number, procedure, density, and experience as exogenous variables and economic resources as an endogenous variable.
  • Layer 3 (L3) consists of event, procedure, density, and experience as exogenous variables and lockdown as an endogenous variable.
  • Layer 4 (L4) consists of lockdown and economic resources as exogenous variables and technology used as an endogenous variable.
  • Layer 5 (L5) consists of lockdown and technology used as exogenous variables and daily COVID-19 cases as an endogenous variable.

We divided the path diagram into five layers to better understand the indirect effects of all factors on daily COVID-19 infections.
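The layer structure can also be written down as a small directed graph, which makes it easy to confirm that every predictor precedes its outcome. The sketch below is only an illustration and not part of the original analysis: the dictionary `LAYERS` transcribes the five layers exactly as they are listed in the text, and the helper names `causal_order` and `purely_exogenous` are hypothetical. It relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each endogenous variable mapped to the predictors listed for its layer.
LAYERS = {
    "medical resources": ["family number", "event", "density", "experience"],
    "economic resources": ["family number", "procedure", "density", "experience"],
    "lockdown": ["event", "procedure", "density", "experience"],
    "technology used": ["lockdown", "economic resources"],
    "daily COVID-19 cases": ["lockdown", "technology used"],
}


def causal_order(layers):
    """Return the variables so that every predictor comes before its outcome.

    graphlib raises CycleError if the layers contain a feedback loop,
    i.e. if the model is not one-directional.
    """
    return list(TopologicalSorter(layers).static_order())


def purely_exogenous(layers):
    """Variables that appear only as predictors, never as an outcome."""
    predictors = {p for preds in layers.values() for p in preds}
    return predictors - set(layers)


order = causal_order(LAYERS)
exogenous = purely_exogenous(LAYERS)
print(order)
print(sorted(exogenous))
```

If a path were added from daily COVID-19 cases back to lockdown, `causal_order` would raise an error instead of returning an ordering, which is one quick way to catch a model that violates the one-direction requirement.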

For all endogenous variables, total effects are calculated as the sum of direct and indirect effects. Accordingly, each result in Table 8 is the sum of the corresponding results in Table 9 and Table 10.
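As a quick arithmetic check of this rule, the sketch below adds the direct and indirect standardized effects of procedure on daily COVID-19 cases quoted earlier from Table 7 and compares the sum with the reported total. The function name `total_effect` is hypothetical. The text quotes no point estimate for the indirect effect of lockdown (only its p-value), so the value computed here is merely the one implied by the reported total and direct effects.

```python
# Standardized effects on daily COVID-19 cases quoted in the text (Table 7).
direct = {"procedure": -0.823, "lockdown": -0.775}
indirect = {"procedure": 0.355}
total_reported = {"procedure": -0.468, "lockdown": -0.776}


def total_effect(direct_effect, indirect_effect):
    """Total effect = direct effect + indirect effect (rounded to 3 decimals)."""
    return round(direct_effect + indirect_effect, 3)


procedure_total = total_effect(direct["procedure"], indirect["procedure"])

# Implied, not reported: indirect = total - direct.
lockdown_indirect_implied = round(total_reported["lockdown"] - direct["lockdown"], 3)

print(procedure_total)            # matches the reported total for procedure
print(lockdown_indirect_implied)  # a very small value, in line with p = 0.976
```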

Standardized total effects of exogenous variables on endogenous variables.

Standardized direct effects of exogenous variables on endogenous variables.

Standardized indirect effects of exogenous variables on endogenous variables.

Medical resources, lockdown, economic resources, and technology used are treated as both endogenous and intermediate variables.

Table 8, Table 9 and Table 10 represent the total, direct, and indirect effects of exogenous variables on intermediate variables, respectively.

4. Discussion

Daily COVID-19 infections are associated with the social and economic situation in each country and with the level of each individual’s participation in society. Commitment to mitigation measures may have an impact as well. The framework for predicting daily COVID-19 cases is wide. A previous study showed that the number of diagnostic tests conducted positively affects the confirmed daily cases of COVID-19 [38]. Moreover, a path analysis was done on geographical determinants of COVID-19 daily infections in the U.S. [39]. Some studies tried to predict daily infections (dependent) using multiple linear regression on positive, deceased, and recovered cases (independent) [40]. Another study, conducted in Italy, used multiple linear regression models to show that the mobility of citizens affected the recorded daily cases [41]. Until now, no path analysis has been conducted to detect or assess the socio-economic and mitigation measure determinants of COVID-19 daily infections. Path analysis was used in this study to test a hypothesized model of daily COVID-19 cases in four different countries at different times, to guide practice and provide directions for future research. Path analysis is superior to ordinary regression analysis as it provides an explanation of both the causal relation and the relative importance of alternative paths of influence [30]. We found that only lockdown and procedure have both direct and indirect effects on the rate of daily COVID-19 infections.

4.1. Direct Effects

The path coefficients showed that lockdown, technology used, and procedure have direct effects on COVID-19 daily infections. The largest impacts are those of the procedure and lockdown variables. An increase of 1 standard deviation in procedure degree leads to a decrease of 0.823 standard deviations in COVID-19 daily infections, and an increase of 1 standard deviation in lockdown degree produces a decrease of 0.775 standard deviations in COVID-19 daily infections. This highlights the importance of the commitment of every individual to the mitigation measures set in place by the authorities. The results also support the wearing of masks and social distancing, which help reduce the spread of COVID-19, thus reducing the daily confirmed cases. In terms of technology, COVID-19 daily infections increase by 0.17 standard deviations with an increase of 1 standard deviation in the technology used. This model shows that mitigation measures directly reduce the spread of COVID-19. The lack of direct effects from medical resources, experience, and economic resources on COVID-19 daily cases is an unexpected, unique finding of this study. Future models could incorporate other factors into the current model, such as citizen movements in terms of foreign flights and local transportation.

4.2. Indirect Effects

From the results obtained in Table 7, Table 8, Table 9 and Table 10, all exogenous variables except technology used have indirect effects on daily COVID-19 infections through different patterns and routes. Moreover, all factors except procedure affect the daily cases indirectly through all intermediate variables. Procedure has an indirect effect on daily cases through lockdown, economic resources, and technology used, whereas all remaining exogenous variables include medical resources in their intermediate variables list.

The final model shows that experience, special events, family number, and density have a significantly negative indirect effect on COVID-19 daily cases through their effects on lockdown. Lockdowns appear to be most strongly affected by procedure among the other exogenous variables. Medical and economic resources have significantly positive indirect effects on COVID-19 daily cases through their effects on technology used and lockdown degree. Technology used appears to be most strongly affected by economic resources and experience.

5. Conclusions

This study aimed to identify and assess the different socio-economic and mitigation measure determinants of COVID-19 daily cases. The study helped us detect some important factors for building an effective international strategy in the war against the COVID-19 pandemic. The findings of the above analysis indicate that implementing full lockdowns and the commitment to wearing masks and social distancing are essential for reducing daily COVID-19 infection rates. All other factors used in the study still have significant effects with different strengths and proportions.

There are still some limitations to this study. The data used comes from the beginning of the pandemic, which may not reflect today’s reality. Moreover, the data are a combination of several countries grouped together, which in turn differ in terms of area, population, economic, technological, and cultural capabilities. The limitations mentioned above reduce the generalizability of the findings in this study.

Future studies can conduct path analysis on the determinants of the death rate caused by COVID-19 to help limit deaths and save more lives.

Acknowledgments

This study was carried out by Lebanese University—Faculty of information’s—Data Sciences Department in partnership with Lebanese University—Statistics and Informatics Department—Faculty of Science in Beirut.

Author Contributions

Conceptualization, E.Y. and A.R.; methodology, Path Analysis; software, IBM AMOS. Both authors have read and agreed to the published version of the manuscript.

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Data availability statement, conflicts of interest.

We wish to draw the attention of the Editor to the following facts which may be considered as potential conflicts of interest and to significant financial contributions to this work. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In doing so, we confirm that we have followed the regulations of our institutions concerning intellectual property. We further confirm that any aspect of the work covered in this manuscript that has involved either experimental animals or human patients has been conducted with the ethical approval of all relevant bodies and that such approvals are acknowledged within the manuscript. We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office). He is responsible for communicating with the other authors about progress, submissions of revisions and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the Corresponding Author, and which has been configured to accept email from [email protected]. The authors: Abbas Rammal, Elie Yammine.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Building path analysis model in SEM with SPSS Amos

Path analysis is a statistical method used for establishing a causal relationship between variables. It is used when there are multiple variables in a study. It is an important type of Structural Equation Modeling (SEM) analysis, commonly used by researchers for hypothesis testing. The previous article explained how to conduct path analysis in SPSS Amos. This article demonstrates through a case study how to interpret the findings from a path analysis model in your research.

The impact of job satisfaction on organisational commitment using the path analysis model

Employees are an integral part of any organization. If they are satisfied with their job, they will be more committed to their organisation (Ćulibrk et al., 2018). In this case study, the path analysis model of SEM is used to examine the influence of job satisfaction on organisational commitment. For this, first-hand data was obtained from 350 employees of a company using a close-ended questionnaire. It contained questions related to organisational commitment and job satisfaction.

Job satisfaction and organisational commitment were the main variables in the questionnaire. Each of them contained sub-variables which are as follows.

Sub-variables of job satisfaction and organisational commitment

SEM for impact assessment

The path analysis model linking organisational commitment and job satisfaction is shown below.

Path analysis model for job satisfaction

In the above model, JS1, JS2, JS3 and JS4 denote the sub-variables of job satisfaction. OC1, OC2, OC3 and OC4 denote the sub-variables of organisational commitment.

The first step towards the finalisation of the path analysis model is to establish the validity and reliability of the model.

Reliability and validity of path analysis model in SEM

Model fitness in an SEM model is established only after reliability and validity are proven. These are assessed through four criteria:

  • Convergent validity
  • Internal consistency
  • Composite reliability
  • Discriminant validity

The below table presents the results of the model for this present case study.

In the above table, AVE (average variance extracted) determines the convergent validity of the model. It should be at least 0.5 (Alarcón & Sánchez, 2015). Here, the AVE value is 0.76 for job satisfaction and 0.62 for organizational commitment. Both exceed 0.5; thus, convergent validity is present in the model.

Internal consistency, examined with the Cronbach alpha test, shows how closely the items of each construct are linked with one another. Here, the Cronbach alpha value is 0.85 for job satisfaction and 0.78 for organizational commitment. Both exceed 0.7 (Alarcón & Sánchez, 2015); thus, internal consistency is present in the model.

Composite reliability depicts each construct’s significance in the model. The CR value is 0.85 for job satisfaction and 0.78 for organizational commitment. As both values are more than 0.7, composite reliability exists in the model.

Lastly, discriminant validity indicates how distinct each construct is from the others. The table below shows that the correlation between organizational commitment and job satisfaction is 0.70. As this value is less than the square roots of the average variance extracted, i.e. 0.87 and 0.79, discriminant validity is present in the model.

Since all the conditions are being met, this path model is valid and reliable.
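The four checks can be reproduced with a few lines of code. The sketch below is illustrative only: the function names `check_construct` and `discriminant_validity` are hypothetical, the cut-offs (0.5 for AVE, 0.7 for Cronbach's alpha and composite reliability) are the ones quoted in the text, and the second Cronbach's alpha value (0.78) is assumed to belong to organizational commitment.

```python
import math

# Values quoted in the case study; the alpha of 0.78 is assumed to refer
# to organizational commitment.
constructs = {
    "job satisfaction": {"ave": 0.76, "alpha": 0.85, "cr": 0.85},
    "organizational commitment": {"ave": 0.62, "alpha": 0.78, "cr": 0.78},
}
inter_construct_correlation = 0.70


def check_construct(stats):
    """Apply the three per-construct cut-offs used in the text."""
    return {
        "convergent_validity": stats["ave"] >= 0.5,
        "internal_consistency": stats["alpha"] > 0.7,
        "composite_reliability": stats["cr"] > 0.7,
    }


def discriminant_validity(all_constructs, correlation):
    """True when the correlation is below sqrt(AVE) of every construct."""
    return all(correlation < math.sqrt(s["ave"]) for s in all_constructs.values())


results = {name: check_construct(s) for name, s in constructs.items()}
sqrt_ave = {name: round(math.sqrt(s["ave"]), 2) for name, s in constructs.items()}
is_discriminant = discriminant_validity(constructs, inter_construct_correlation)

print(results)
print(sqrt_ave)
print(is_discriminant)
```

Comparing a correlation with the square root of the AVE, as done here, is the comparison described in the text; other discriminant validity checks exist and are not covered by this sketch.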

Model fitness in SEM

The next step is to assess the model’s fitness . For this, values of different indices are examined. Results for the model are shown below.

In the above table, the ‘absolute fitness measure’ indices show that CMIN/DF is 5.119 > 5, GFI is 0.935 > 0.90, AGFI is 0.882 < 0.9, and RMSEA is 0.109 > 0.10. As 3 out of 4 indices (CMIN/DF, AGFI, and RMSEA) do not meet the fitness requirements, the model is not absolutely fit.

Among the ‘incremental fitness’ indices, NFI is 0.93 > 0.90, CFI is 0.942 > 0.9, TLI is 0.919 > 0.9, and IFI is 0.943 > 0.9. As all the incremental fitness measures have the required values, the model is incrementally fit.

For the ‘parsimonious fitness measure’, PGFI is 0.519 > 0.5, PCFI is 0.673 > 0.5, and PNFI is 0.664 > 0.5. Hence, the model is incrementally and parsimoniously fit but not absolutely fit.
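The same screening can be expressed as a small table of indices and cut-offs. The sketch below is a hypothetical helper, not AMOS output: the index values come from the text, while the direction of each cut-off (CMIN/DF and RMSEA should fall below their limits, all other indices above) is read from how the text judges each index.

```python
import operator

# (index name, reported value, comparison that means "acceptable", cut-off)
INITIAL_FIT = {
    "absolute": [
        ("CMIN/DF", 5.119, operator.lt, 5.0),
        ("GFI", 0.935, operator.gt, 0.90),
        ("AGFI", 0.882, operator.gt, 0.90),
        ("RMSEA", 0.109, operator.lt, 0.10),
    ],
    "incremental": [
        ("NFI", 0.93, operator.gt, 0.90),
        ("CFI", 0.942, operator.gt, 0.90),
        ("TLI", 0.919, operator.gt, 0.90),
        ("IFI", 0.943, operator.gt, 0.90),
    ],
    "parsimonious": [
        ("PGFI", 0.519, operator.gt, 0.5),
        ("PCFI", 0.673, operator.gt, 0.5),
        ("PNFI", 0.664, operator.gt, 0.5),
    ],
}


def failed_indices(group):
    """Names of the indices in one family that miss their cut-off."""
    return [name for name, value, acceptable, cutoff in group
            if not acceptable(value, cutoff)]


failures = {family: failed_indices(group) for family, group in INITIAL_FIT.items()}
print(failures)
```

Listing the failing indices per family makes it clear that only the absolute fit family is problematic for the initial model.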

To improve the fitness of the model, covariance links were added between variables based on the modification values computed in AMOS. The fit indices of the modified path model are shown below.

In the above table, a majority of the ‘absolute fit measure’ indices are within the required limits, and AGFI is close to the desired value, so the modified model is treated as absolutely fit. For the ‘incremental fit measure’, all indices are greater than 0.9. Thus, the model is incrementally fit. Lastly, for the ‘parsimonious’ indices, the majority of values are within the desired limits; thus, the model is parsimoniously fit.

Thus, this model can now be used for examining the impact and understanding the influence of job satisfaction on organisational commitment.

Impact determination using SEM

After establishing the model’s fitness, reliability and validity, the final step is to test the hypothesis about the relationship between job satisfaction and organisational commitment. The hypotheses are as follows:

H01: Job satisfaction does not have a significant influence on organizational commitment.

HA1: Job satisfaction has a significant influence on organizational commitment.

The results of the hypothesis are shown below.

Table 5 shows that the standard error (S.E.) is 0.07, which is low, indicating that the path coefficient is estimated with little sampling error. Further, the p-value is 0.00 < 0.05 and the critical ratio (C.R.) is 12.68 > 1.96 (the z-value at the 5% significance level). Thus, the null hypothesis H01 is rejected. Hence, this study finds a significant impact of job satisfaction on organizational commitment.
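The decision rule can be checked by hand. In AMOS output the C.R. (critical ratio) is the estimate divided by its standard error, and it is compared with a standard normal distribution. The sketch below, with the hypothetical helper `two_sided_p`, converts the reported critical ratio of 12.68 into a two-sided p-value using only the Python standard library. It assumes the normal approximation and is not a reproduction of the AMOS computation.

```python
import math


def two_sided_p(critical_ratio):
    """Two-sided p-value of a z-statistic under the standard normal."""
    return math.erfc(abs(critical_ratio) / math.sqrt(2))


cr = 12.68                 # critical ratio reported in the case study
p_value = two_sided_p(cr)  # far below 0.05, reported as 0.00 after rounding
reject_null = abs(cr) > 1.96 and p_value < 0.05

print(p_value)
print(reject_null)
```

A critical ratio of exactly 1.96 gives a p-value of about 0.05 with this function, which is why 1.96 serves as the cut-off at the 5% significance level.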

Why path analysis?

The path analysis model is the most widely used SEM model for examining direct and indirect relationships, especially in the management field. This method links multiple variables but does not accommodate complex relationships. For instance, it cannot create a model with two mediators or a mediator and moderator. Thus most researchers adopt a path analysis model to assess a single impact of either a direct or an indirect nature. The above case study had only two constructs: job satisfaction and organizational commitment. The aim was simply to examine the direct impact. There were no complex constructs. Therefore path analysis was most suitable. Alternative methods such as MANOVA would have failed to derive the true impact because they do not take latent constructs into consideration. SPSS Amos is one of the simplest software packages for conducting SEM.

  • Alarcón, D., & Sánchez, J. A. (2015). Assessing convergent and discriminant validity in the ADHD-R IV rating scale: User-written commands for Average Variance Extracted (AVE), Composite Reliability (CR), and Heterotrait-Monotrait ratio of correlations (HTMT). Spanish STATA Meeting, 1–39.
  • Ćulibrk, J., Delić, M., Mitrović, S., & Ćulibrk, D. (2018). Job Satisfaction, Organizational Commitment and Job Involvement: The Mediating Role of Job Involvement. Frontiers in Psychology, February.

PATH Quantitative Stock Analysis

May 14, 2024 — 02:08 pm EDT

Written by John Reese for Validea

Below is Validea's guru fundamental report for UIPATH INC (PATH). Of the 22 guru strategies we follow, PATH rates highest using our P/B Growth Investor model based on the published strategy of Partha Mohanram. This growth model looks for low book-to-market stocks that exhibit characteristics associated with sustained future growth.

UIPATH INC (PATH) is a large-cap growth stock in the Software & Programming industry. The rating using this strategy is 55% based on the firm’s underlying fundamentals and the stock’s valuation. A score of 80% or above typically indicates that the strategy has some interest in the stock and a score above 90% typically indicates strong interest.

The following table summarizes whether the stock meets each of this strategy's tests. Not all criteria in the below table receive equal weighting or are independent, but the table provides a brief overview of the strong and weak points of the security in the context of the strategy's criteria.

About Partha Mohanram : Sometimes the best investing strategies don't come from the world of investing. Sometimes research that changes the investing world can come from the halls of academia. Partha Mohanram is a great example of this. While academic research has shown that value investing works over time, it has found the opposite for growth investing. Mohanram turned that research on its head by developing a growth model that produced significant market outperformance. His research paper "Separating Winners from Losers among Low Book-to-Market Stocks using Financial Statement Analysis" looked at the criteria that can be used to separate growth stocks that continue their upward trajectory from those that don't. Mohanram is currently the John H. Watson Chair in Value Investing at the University of Toronto and was previously an Associate Professor at the Columbia Business School.

About Validea : Validea is an investment research service that follows the published strategies of investment legends. Validea offers both stock analysis and model portfolios based on gurus who have outperformed the market over the long-term, including Warren Buffett, Benjamin Graham, Peter Lynch and Martin Zweig.

The views and opinions expressed herein are the views and opinions of the author and do not necessarily reflect those of Nasdaq, Inc.

COMMENTS

  1. Path Analysis -- What it Is and How to Use It

    After the statistical analysis has been completed, a researcher would then construct an output path diagram, which illustrates the relationships as they actually exist, according to the analysis conducted. If the researcher's hypothesis is correct, the input path diagram and output path diagram will show the same relationships between variables.

  2. Finding Our Way: An Introduction to Path Analysis

    Research Methods in Psychiatry. Finding Our Way: An Introduction to Path Analysis. David L Streiner, PhD. Key Words: path analysis, structural equation modelling, multiple regression. One of the first things we learn in introductory statistics is that there are 2 types of variables: independent variables (IVs) and dependent variables (DVs).

  3. Path Analysis

    Path analysis is always theory-driven; the same data can describe many different causal patterns, so it is essential to have an a priori idea of the causal relationships among the variables under consideration. That being said, path analysis can be used to refine a causal hypothesis.

  4. Path Analysis -- Advanced Statistics using R

    Path analysis is a type of statistical method to investigate the direct and indirect relationship among a set of exogenous (independent, predictor, input) and endogenous (dependent, output) variables. ... we can check the model fit. The null hypothesis is "\(H_{0}\): The model fits the data well or the model is supported". The alternative ...

  5. Path analysis (statistics)

    In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses (MANOVA, ANOVA, ANCOVA).

  6. Analyzing Data: Path Analysis

    Path analysis is used to estimate a system of equations in which all of the variables are observed. Unlike models that include latent variables, path models assume perfect measurement of the observed variables; only the structural relationships between the observed variables are modeled. ... This corresponds to the hypothesis that high school ...

  7. PDF Path Analysis Introduction and Example

    Our first step is to solve for the parameter \(\beta_{13}\) from equation 3. We do this in order to get an equation that expresses \(\beta_{13}\) in terms of \(\beta_{12}\); we will need this to solve for \(\beta_{12}\). \(r_{13} = \beta_{12} r_{23} + \beta_{13}\), so \(\beta_{13} = r_{13} - \beta_{12} r_{23}\). Substituting this expression into equation 2 we obtain \(r_{12} - r_{13} r_{23} = \beta_{12}(1 - r_{23}^{2})\).

  8. Path Analysis: An Introduction and Analysis of a Decade of Research

    One technique, path analysis, is a variation of multiple-regression analysis and is useful for analyzing a number of issues involved in causal analysis. Path analysis, first developed in the 1920s, is a method for examining causal patterns among a set of variables. Researchers use path analysis. causal model.

  9. PDF Introduction to Path Analysis

    The "Causal Ordering" must be theoretically supported path analysis can't "sort out" alternative arrangements -- it can only decide what paths of a specific arrangement can be dropped. Mediating variables must come after what they are mediating. E.g. The Treatment is related to the criterion.

  10. Finding Our Way: An Introduction to Path Analysis

    Path analysis is an extension of multiple regression. It goes beyond regression in that it allows for the analysis of more complicated models. In particular, it can examine situations in which there are several final dependent variables and those in which there are "chains" of influence, in that variable A influences variable B, which in ...

  11. Path Analysis

    The path analysis is a powerful statistical technique to test a causal model which based on theoretical reasoning explains causal relationships between a set of independent variables and the dependent variable. Also, it tests the assumed relationships between the independent variables. Indeed, path analysis is perfectly based upon theoretical reasoning about causal relationships between variables.

  12. Introduction to Path Analysis in R

    After statistical analysis has been completed, an output path diagram can then be constructed, which illustrates the relationships as they actually exist, according to the analysis conducted. While path analysis is useful for evaluating causal hypotheses, this method cannot determine the direction of causality.

  13. Analysing Path Analysis with Multiple Regression

    4. Simple regression analysis for predicting Job satisfaction from Self-discipline. Step 1 will provide a baseline for the path analysis. Steps 2 & 3 will help generate all path coefficients to evaluate the simple mediation role played by confidence, while step 4 will help in evaluating the possibility of mediation.

  14. Multiple-to-multiple path analysis model

    2.4 Multiple-to-multiple path analysis central theorem. The second step is to conduct multiple-to-multiple path analysis. And the innovation is that the correlation between Y caused by the common cause X is considered and three other types of paths are generated. For convenience of observation, let p = 3, m = 3 as an example to make a multiple-to-multiple path analysis diagram as Fig 2.

  15. Principles of Path Analysis

    An input path diagram is one that is drawn beforehand to help plan the analysis and represents the causal connections that are predicted by our hypothesis. An output path diagram represents the results of a statistical analysis, and shows what was actually found. So we might have an input path diagram like this:

  16. Structural Equation Modeling (SEM) or Path Analysis

    Introduction. Path Analysis is a causal modeling approach to exploring the correlations within a defined network. The method is also known as Structural Equation Modeling (SEM), Covariance Structural Equation Modeling (CSEM), Analysis of Covariance Structures, or Covariance Structure Analysis. In FMRI data analysis it has been applied to visual ...

  17. Path Analysis

    Path Analysis is a statistical technique used in Structural Equation Modeling (SEM) where direct and indirect effects among the variables of a system are studied by specifying causal relations between them. ... Path Analysis allows for hypothesis testing about the network of relationships among variables in the model. Its use is common in ...

  18. What are the assumptions to path analysis and how to test them?

    There are some structural assumptions to path analysis that are not difficult ascertain. They are (a) no loops (b) no going forward and backward (c) a maximum of one curved arrow per path. I am aware that path analysis assumes multivariate normality if the dependent variable is continuous.

  19. Path Analysis to Assess Socio-Economic and Mitigation Measure

    To answer and judge the test hypothesis and evaluate the outcomes of particular questions, we used the process of collecting and measuring data. ... Path analysis is comprised of four stages: (1) model specification: statement of the theoretical model in terms of equations or a diagram; (2) model identification and parameter estimate: the ...

  20. (PDF) Path Analysis

    Path analysis allows researchers to study direct and indirect effects simultaneously with multiple independent and dependent variables (Valenzuela and Bachmann, 2017). When an independent variable ...

  21. Hypothesis Testing With Path Analysis

    To test the hypothesis using path analysis techniques, we need pairs of data from the research sample. Example 1: Research data on incentives (X 1 ) and work motivation (X 2 ) with employee performance (Y) are presented : X1 = 5, 6, 4, 7, 8, 11, 5, 14, 4, 7

  22. How to conduct path analysis?

    Path analysis takes effect in two ways; before and after running the regression. They are also known as 'input path diagram' and 'output path diagram' respectively. While the input path diagram represents the causal connections between variables while proposing a hypothesis, the output path diagram shows the actual outcome of the test.
