- Thread starter aeneasgladius
- Tags interaction p value r confidence intervals

My question is whether the p-value and confidence interval play a role in defining an interaction.

Generically, interaction means that the relationship (or effect) of X1 on Y depends on the value of X2. You can switch X1 and X2 in that sentence and it is still true. You might say something like "The association of antibody status with the odds of the outcome depends on how old the person is." Of course, you would only say this if you have evidence of an interaction.
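
To make the definition concrete, here is a minimal Python sketch, assuming a logistic model with an antibody-by-age product term and made-up coefficients (b0 to b3 are hypothetical, not estimates from anyone's data). Because the product term is present, the odds ratio for antibody status changes with age, and, read the other way round, the per-year odds ratio for age differs between antibody groups. Note that with uncentered age, b1 on its own is the antibody log odds ratio at age 0.

```python
import math

# Hypothetical coefficients (log-odds scale), NOT estimates from real data:
# logit P(outcome) = b0 + b1*antibody + b2*age + b3*antibody*age
b0, b1, b2, b3 = -3.0, 0.4, 0.03, 0.02

def antibody_odds_ratio(age):
    """Odds ratio, antibody-positive vs antibody-negative, at a given age."""
    # Subtracting the two log odds cancels b0 and b2*age, leaving b1 + b3*age.
    return math.exp(b1 + b3 * age)

def age_odds_ratio_per_year(antibody):
    """Odds ratio for one extra year of age within an antibody group (0 or 1)."""
    # The age slope is b2 for antibody-negative and b2 + b3 for antibody-positive.
    return math.exp(b2 + b3 * antibody)

for a in (30, 50, 70):
    print(f"age {a}: antibody OR = {antibody_odds_ratio(a):.2f}")
for ab in (0, 1):
    print(f"antibody {ab}: OR per year of age = {age_odds_ratio_per_year(ab):.3f}")
```

If b3 were 0, both functions would return the same value at every input, which is exactly what "no interaction on the odds-ratio scale" means.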

My question for your project: why is there a cutoff imposed on age, and why is 58 the correct age?

I would recommend fitting a model (exact logistic regression might be good here) for your outcome as a function of age (not cut into arbitrary groups) and antibody status, including an interaction between the two.

If your specific hypothesis is that the effect of age on the outcome is larger when antibody status is positive, then conduct an upper-tailed test on the interaction coefficient. Alternatively, compute a two-sided CI for the interaction coefficient, exponentiate it to the odds-ratio scale, and see whether the lower bound is greater than 1. The lower bound of a two-sided 90% CI matches an upper-tailed test at alpha = 0.05 (in general, a 100(1 - 2α)% interval matches level α), so a wider interval, say 97.5%, gives a more conservative check. This assumes you code antibody status as 1 if positive and 0 if negative.
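
A minimal sketch of how the one-sided test and the CI lower bound line up, assuming Wald (normal-approximation) inference and a hypothetical estimate and standard error for the interaction coefficient. Exact logistic regression would report exact conditional p-values and intervals instead, so this only illustrates the logic. The names b3_hat, se_b3 and or_lower_bound are illustrative.

```python
import math
from statistics import NormalDist

# Hypothetical Wald output for the antibody*age coefficient (log-odds scale).
b3_hat = 0.025  # estimated interaction coefficient (made up)
se_b3 = 0.012   # its standard error (made up)

std_normal = NormalDist()  # standard normal, mean 0 and sd 1

# Upper-tailed Wald test of H0: beta_3 = 0 against H1: beta_3 > 0.
z = b3_hat / se_b3
p_upper = 1 - std_normal.cdf(z)

def or_lower_bound(level):
    """Lower limit of a two-sided Wald CI at `level`, exponentiated to an odds ratio."""
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    return math.exp(b3_hat - z_crit * se_b3)

print(f"z = {z:.2f}, upper-tailed p = {p_upper:.4f}")
for level in (0.90, 0.95, 0.975):
    # Lower bound > 1 exactly when the upper-tailed test rejects at alpha = (1 - level) / 2.
    alpha = (1 - level) / 2
    print(f"{level:.1%} CI lower bound for the OR: {or_lower_bound(level):.4f} "
          f"(one-sided alpha {alpha:.4f})")
```

With these made-up numbers the 90% and 95% lower bounds sit above 1 while the 97.5% bound does not, which is the sense in which the wider interval is more conservative.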

Imagine that you run an experiment in which you carefully administer a substance at gradually increasing concentrations from 20 to 85. You would then introduce very large measurement errors if all values below 58 were coded as “low” and all values above 58 as “high”.

Is it really a correct biological description to say that all people aged 37, 47, and 57 are the same, but that at age 58 a mysterious jump happens, after which everybody aged 58, 68, and 78 is the same?

It would have been better to use actual age as a continuous “regression” variable in a logistic regression (as Ondan suggested), either as a simple linear term, with a quadratic model like b*age + b2*age*age, or as a generalized additive model (GAM) with local smoothing.
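
As a rough illustration of the quadratic option, assuming made-up coefficients c0, c1, c2 (hypothetical, not fitted), the sketch below evaluates a logistic model in centered age and age squared next to a two-level step model cut at 58. The quadratic curve lets predicted risk change gradually and bend, while the step model forces 37- and 57-year-olds to share one risk and then jumps between 57 and 58. A GAM would replace the quadratic with a smooth function estimated from the data; that is not shown here.

```python
import math

# Hypothetical coefficients, NOT fitted to any data:
# logit P(outcome) = c0 + c1*(age - 50) + c2*(age - 50)**2
c0, c1, c2 = -1.0, 0.04, -0.0005

def risk_quadratic(age):
    """Predicted risk from a logistic model that is quadratic in centered age."""
    t = age - 50  # centering reduces the collinearity between age and age**2
    return 1 / (1 + math.exp(-(c0 + c1 * t + c2 * t * t)))

def risk_step(age, low=0.25, high=0.40):
    """Hypothetical dichotomized model: one risk below 58, another from 58 on."""
    return high if age >= 58 else low

for age in (37, 47, 57, 58, 68, 78):
    print(f"age {age}: quadratic {risk_quadratic(age):.3f}, step {risk_step(age):.2f}")
```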

This is a real group of patients, and the estimated median age is real. According to Szklo and Nieto, this is a simple and feasible means of establishing whether effect heterogeneity exists. My aim is to understand what kind of interaction I am dealing with. Thank you.

Frank Harrell Jr. and Stephen Senn are two well-known and highly respected biostatisticians (statisticians by education and practice, so they are well versed in the theory and mathematical nuance behind these ideas, in addition to their application). Look up their summaries and work on the problem of "dichotomania", which is pervasive in, and harmful to, biomedical research. Alternatively, this video gives a good deal of explanation and visualization (you can probably watch it at 1.5x speed).

Summary: If a med student on rounds said "The patient's sodium is <135, I think we should...", no clinician would say "Great, it's <135, that's all I need to know." They will assess risk from the specific serum sodium level, and differently for a patient at 129 than for one at 114. This quick analogy should demonstrate the issue. The long form is below (or in the video).

By categorizing age (or another continuous variable), you are making many assumptions that aren't really true or reasonable and you're devaluing the work you're trying to do.

1) You're assuming that the outcome is relatively homogeneous within groups and relatively heterogeneous between groups (i.e., within each group, the Y values basically fall on a straight, flat line);

2) You're assuming that the continuous variable is not continuously related to the outcome (lines or curves, for example, are ruled out); instead, you assume that the relationship between Y and that variable has discrete jumps (think of a staircase), which is sometimes reasonable in finance but generally not in medicine;

3) The cut point (and any findings based on it) is unlikely to be replicated in other research;

4) You're assuming that the cut point is optimal for every patient, which isn't true;

5) Insurance companies use the literature to decide what to pay for, and how much, so arbitrary categorizations like this can lead to improper policies enacted by those relying on the research. A common example is relating hospital length of stay to many variables: the conclusions are spurious because the methodology is improper, and in the end patients are at risk of being harmed (a few people I know in public health have said this has come up in their careers);

6) There's a lot more...
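
Points 3 and 4 can be illustrated with a rough simulation, assuming a hypothetical smooth (threshold-free) relationship between age and a binary outcome. The sketch searches for the “optimal” cut point in several simulated samples by maximizing a two-proportion z statistic, one common data-driven way of choosing a cut. The chosen ages will typically move from sample to sample even though the underlying truth never changes and has no cut point at all. The helpers simulate and best_cut are hypothetical.

```python
import math
import random

def simulate(seed, n=400):
    """Hypothetical sample: binary outcome whose log odds rise linearly with age."""
    rng = random.Random(seed)
    ages, ys = [], []
    for _ in range(n):
        age = rng.uniform(20, 85)
        p = 1 / (1 + math.exp(-(-2.0 + 0.04 * (age - 20))))  # no threshold anywhere
        ages.append(age)
        ys.append(1 if rng.random() < p else 0)
    return ages, ys

def best_cut(ages, ys, candidates=range(30, 76)):
    """Data-driven 'optimal' cut: the age maximizing the two-proportion z statistic."""
    pooled = sum(ys) / len(ys)
    best, best_z = None, -1.0
    for c in candidates:
        below = [y for a, y in zip(ages, ys) if a < c]
        above = [y for a, y in zip(ages, ys) if a >= c]
        if not below or not above:
            continue
        se = math.sqrt(pooled * (1 - pooled) * (1 / len(below) + 1 / len(above)))
        if se == 0:
            continue
        z = abs(sum(above) / len(above) - sum(below) / len(below)) / se
        if z > best_z:
            best, best_z = c, z
    return best

# Same data-generating process, five independent samples.
cuts = [best_cut(*simulate(seed)) for seed in range(5)]
print("selected cut points:", cuts)
```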

I hope this clarifies what is meant by "not a real group" and why a different approach will be more favorable for realistic and repeatable conclusions.

Could I minimize that by stratifying into smaller age groups?

Isn’t Aeneas the founder of Rome in Vergilius’s writing, and isn’t a gladius the short Roman sword? You can use that gladius on dichotomania as Alexander did on the Gordian knot!

Do you have reasonable suspicion to believe there is an interaction? And if so, what would you imagine the relationship to be? It seems like you are looking for an interaction, be it additive or multiplicative, but why? Also, you seem to be trying to determine which type of interaction is present, but shouldn't you have a reason to think either one is plausible? Lastly, you seem to be familiar with the possibility that interactions can be additive and/or multiplicative, and you are dichotomizing to find your answer because that is what you think the literature is telling you to do. But as noted, doing this is arbitrary, and you risk losing the true signal.

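Yes, use the variable in its continuous form. You can put the continuous variable, the binary variable, and their product term in the model (X1, X2, X1*X2). Then plot the predicted outcome (log odds or probabilities) against the continuous variable for the two binary groups (so two lines) and see whether they are parallel or cross. On the log-odds scale, non-parallel lines reflect the multiplicative interaction captured by the product term; lines that cross (on either scale) mean the direction of the binary variable's effect reverses within the observed range. Look to the work of Tyler VanderWeele of Harvard to see how to examine additive interaction with this coding of the variables (a continuous-by-categorical interaction).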
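
A minimal sketch of what to look for in those two lines, assuming hypothetical coefficients for a model with antibody, centered age, and their product (none of these numbers come from real data). On the log-odds scale the vertical gap between the antibody-positive and antibody-negative lines is b1 + b3*(age - 50), so the lines are parallel only when b3 = 0, and they cross where the gap is zero; the crossing age is the same on the probability scale. For the additive scale, the sketch computes the relative excess risk due to interaction (RERI) for one chosen age contrast, a measure used in VanderWeele's work; the 40 vs 70 contrast and the function names are illustrative choices.

```python
import math

# Hypothetical coefficients (log-odds scale), with age centered at 50:
# logit P(outcome) = b0 + b1*antibody + b2*(age - 50) + b3*antibody*(age - 50)
b0, b1, b2, b3 = -1.5, -0.3, 0.03, 0.02

def log_odds(antibody, age):
    t = age - 50
    return b0 + b1 * antibody + b2 * t + b3 * antibody * t

def prob(antibody, age):
    return 1 / (1 + math.exp(-log_odds(antibody, age)))

# Multiplicative scale: the gap between the two log-odds lines is b1 + b3*(age - 50),
# so the lines are parallel only if b3 == 0; otherwise they cross where the gap is 0.
crossing_age = 50 - b1 / b3 if b3 != 0 else None

def reri(a0, a1):
    """RERI for antibody (1 vs 0) and age (a1 vs a0); reference = antibody 0 at age a0.
    Built from odds ratios, so it approximates the risk-ratio RERI only for rare outcomes."""
    ref = log_odds(0, a0)
    or10 = math.exp(log_odds(1, a0) - ref)
    or01 = math.exp(log_odds(0, a1) - ref)
    or11 = math.exp(log_odds(1, a1) - ref)
    return or11 - or10 - or01 + 1

for age in (30, 50, 70, 85):
    print(f"age {age}: P(antibody-) = {prob(0, age):.3f}, P(antibody+) = {prob(1, age):.3f}")
print("log-odds lines cross at age", crossing_age)
print(f"RERI, age 70 vs 40: {reri(40, 70):.3f}")
```

Note that RERI can be nonzero even when b3 = 0, which is why the additive and multiplicative questions have to be asked separately.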

P.S. It has been a while since I reviewed the Szklo & Nieto text, but more has been written recently on the differences between approaches to interaction and to effect modification.

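y is your dichotomous outcome; each model is for the log odds of y = 1, written logit P(y = 1)

Beta_0 is your intercept (the log odds in the base case, i.e., with all predictors equal to 0)

logit P(y = 1) = Beta_0 + Beta_1(antibody)

logit P(y = 1) = Beta_0 + Beta_1(antibody) + Beta_2(age)

logit P(y = 1) = Beta_0 + Beta_1(antibody) + Beta_2(age) + Beta_3(antibody*age)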
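
The three nested models can be fitted and compared with a short, dependency-free sketch. It assumes simulated data from a hypothetical true model, centers age at 50 years (so Beta_0 and Beta_1 refer to a 50-year-old rather than to age 0), and uses ordinary maximum-likelihood logistic regression by Newton-Raphson, not the exact logistic regression suggested earlier in the thread. The likelihood-ratio statistic compares the third model with the second, giving a test of Beta_3 on 1 degree of freedom.

```python
import math
import random
from statistics import NormalDist

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    return eta + math.log1p(math.exp(-eta)) if eta > 0 else math.log1p(math.exp(eta))

def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    Each row must start with 1.0 (intercept). Returns (coefficients, log-likelihood)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # X'WX, the information matrix
        for x, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * x[j] * x[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for x, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += yi * eta - log1pexp(eta)
    return beta, loglik

# Simulated data from a hypothetical true model (age centered at 50 years).
rng = random.Random(42)
antibody, age, y = [], [], []
for _ in range(1000):
    ab = 1 if rng.random() < 0.4 else 0
    a = rng.uniform(20, 85)
    eta = -1.0 + 0.5 * ab + 0.03 * (a - 50) + 0.02 * ab * (a - 50)
    antibody.append(ab)
    age.append(a)
    y.append(1 if rng.random() < 1 / (1 + math.exp(-eta)) else 0)

age_c = [a - 50 for a in age]
m1 = fit_logistic([[1.0, ab] for ab in antibody], y)
m2 = fit_logistic([[1.0, ab, t] for ab, t in zip(antibody, age_c)], y)
m3 = fit_logistic([[1.0, ab, t, ab * t] for ab, t in zip(antibody, age_c)], y)

# Likelihood-ratio test of Beta_3 (model 3 vs model 2); chi-square with 1 df.
lr = 2 * (m3[1] - m2[1])
p_lr = 2 * (1 - NormalDist().cdf(math.sqrt(max(lr, 0.0))))

for name, (beta, ll) in (("model 1", m1), ("model 2", m2), ("model 3", m3)):
    print(name, [round(b, 3) for b in beta], f"log-likelihood {ll:.2f}")
print(f"LR test of Beta_3: chi-square = {lr:.2f}, p = {p_lr:.4f}")
```

In practice you would fit these with an established package (for example statsmodels' Logit in Python or glm with family = binomial in R), which also reports standard errors and confidence intervals.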