If you have interactions between continuous variables, use the "Just Another Variable" approach. Many researchers believe it is inappropriate to use imputed values of the dependent variable in the analysis model, especially if the variables used in the imputation model are the same as the variables used in the analysis model. Always include the dependent variable in your imputation model. However, it is possible to read this article independently, or to just read about the particular example that interests you (see the list of examples below). However, the regression results were uniformly good, even when the data were imputed using the original regression model, where the distribution of the imputed values didn't match the distribution of the observed values very well. Imputation step. Non-linear terms in your analysis model present a major challenge in creating an imputation model, because the non-linear relationship between variables can't be easily inverted. Right Answers: Regressing y on x1-x3, the coefficient on each should be 1. If you impute the two groups separately, then ordinary regression fits the data quite well. PMM can be a very effective tool for imputing non-normal data. In our experience it rarely makes a large difference in practice. Regress ln(x1) on x2 and y (as the obvious imputation model will do) and plot the residuals against the fitted values (rvfplot). If the model were specified correctly, we'd expect the points to be scattered randomly around zero regardless of their fitted value. Not this time. Again, the principal lesson is that misspecification in your imputation model can lead to bias in your analysis model. (the regress command in mi impute chained). Complete case analysis actually does better with this particular data set, but that's not true in general.
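To make the idea behind predictive mean matching (PMM) concrete, here is a minimal sketch in Python rather than Stata. It is an illustration under stated assumptions, not the algorithm any particular package implements: the function names `fit_line` and `pmm_impute` are hypothetical, there is a single predictor, and the step in which full implementations typically draw the regression coefficients from their posterior distribution is omitted. The point it shows is that every imputed value is borrowed from an observed case, so imputed values cannot fall outside the observed distribution.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(target, predictor, k=5, rng=None):
    """Fill None entries of `target` with values borrowed from observed donors.

    Simplification (an assumption of this sketch): the regression
    coefficients are used as estimated, not drawn from a posterior.
    """
    rng = rng or random.Random()
    observed = [i for i, value in enumerate(target) if value is not None]

    # Fit the imputation model on the complete cases only.
    intercept, slope = fit_line(
        [predictor[i] for i in observed],
        [target[i] for i in observed],
    )
    predicted = [intercept + slope * p for p in predictor]

    filled = list(target)
    for i, value in enumerate(target):
        if value is None:
            # The k observed cases whose predicted means are closest.
            donors = sorted(
                observed, key=lambda j: abs(predicted[j] - predicted[i])
            )[:k]
            # Borrow the observed value of one randomly chosen donor.
            filled[i] = target[rng.choice(donors)]
    return filled


# Usage: a skewed target (squares) with three values set to missing.
predictor = [float(i) for i in range(20)]
target = [p * p for p in predictor]
for missing_index in (3, 8, 15):
    target[missing_index] = None

completed = pmm_impute(target, predictor, k=3, rng=random.Random(1))
```

Because the linear model is only used to find similar cases, a poor fit to a skewed variable matters less than it would if the model's own predictions were used as imputed values.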
The result is biased estimates (in opposite directions) of the coefficients for both x and the interaction term. An alternative approach is to create a variable for the interaction term, gx, and impute it separately from x (White, Royston, and Wood's "Just Another Variable" approach). To illustrate the process, we'll use a fabricated data set. Right Answers: Regressing y on x1 and x2, both should have a coefficient of 1. It's an improvement but still biased. The include() option of mi impute chained is usually used to add variables to the imputation model of an individual variable, but it can also accept expressions. We use as an example a dataset with 50 patients with low back pain. Some of the simulation parameters (and occasionally the seed for the random number generator) were chosen in order to highlight the issue at hand, but none of them are atypical of real-world situations. Some of these examples follow the discussion in White, Royston, and Wood.
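The data preparation behind the "Just Another Variable" approach can be sketched as follows, in Python rather than Stata. The helper `add_interaction_column` and the row layout (a list of dictionaries with `None` marking missing values) are assumptions made for illustration. The sketch computes gx only where both g and x are observed and leaves it missing elsewhere, so that an imputation routine would treat gx as a variable in its own right instead of recomputing it from the imputed x.

```python
def add_interaction_column(rows, a="g", b="x", name="gx"):
    """Return copies of `rows` with an interaction column added.

    The product is computed only where both components are observed.
    Where either is missing, the interaction is left missing (None) so
    that it can be imputed like any other variable.
    """
    result = []
    for row in rows:
        new_row = dict(row)
        value_a, value_b = row[a], row[b]
        if value_a is None or value_b is None:
            new_row[name] = None
        else:
            new_row[name] = value_a * value_b
        result.append(new_row)
    return result


# Usage: g is a fully observed group indicator, x has a missing value.
data = [
    {"g": 0, "x": 1.5, "y": 2.0},
    {"g": 1, "x": None, "y": 4.1},
    {"g": 1, "x": 2.0, "y": 6.3},
]
prepared = add_interaction_column(data)
```

After imputation, an imputed gx will generally not equal g times the imputed x. That inconsistency is deliberate in this approach: gx is imputed as a separate variable, not derived from x.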