Causal inference multiple imputation investigation of the impact of cannabinoids and other substances on ethnic differentials in US testicular cancer incidence

Background: Ethnic differences in testicular cancer rates (TCRs) are recognized internationally. Cannabis is a known risk factor for testicular cancer (TC) in multiple studies, with dose-response effects demonstrated; however, the interaction between ancestral and environmental mutagenic effects has not been characterized. We examined the effects of this presumed gene-environment interaction across US states.
Methods: State-based TCR was downloaded from the Surveillance Epidemiology and End Results (SEER) website via SEERStat. Drug use data for cigarettes, alcohol use disorder, analgesics, cannabis and cocaine were taken from the National Survey of Drug Use and Health, a nationally representative study conducted annually by the Substance Abuse and Mental Health Services Administration (SAMHSA) with a 74.1% response rate. Cannabinoid concentrations were derived from Drug Enforcement Agency publications. Median household income and ethnicity data (Caucasian-American, African-American, Hispanic-American, Asian-American, American-Indian-Alaska-Native-American, Native-Hawaiian-Pacific-Islander-American) were from the US Census Bureau. Data were processed in R using instrumental regression, causal inference and multiple imputation.
Results: From 1975 to 2017 TCR rose 41% in African-Americans and 78.1% in Caucasian-Americans; from 2003 to 2017 TCR rose 36.1% in Hispanic-Americans and 102.9% in Asian-Pacific-Islander-Americans. Ethnicity-based time scatterplots and boxplots for cannabis use and TCR closely mirrored each other. In inverse probability-weighted interactive robust regression including drugs, income and ethnicity, ethnic THC exposure was the most significant factor and was independently significant (β-estimate = 4.72 (2.04, 7.41), P = 0.0018). In a similar model THC and cannabigerol were also significant (both β-estimate = 13.87 (6.33, 21.41), P = 0.0017). In additive instrumental models the interaction of ethnic THC exposure with Asian-American, Hispanic-American, and Native-Hawaiian-Pacific-Islander-American ethnicities was significant (β-estimate = −0.63 (−0.74, −0.52), P = 3.6 × 10^−29; β-estimate = −0.25 (−0.32, −0.18), P = 4.2 × 10^−13; β-estimate = −0.19 (−0.25, −0.13), P = 3.4 × 10^−9). After multiple imputation, ethnic THC exposure became more significant (β-estimate = 0.68 (0.62, 0.74), P = 1.80 × 10^−92). Of 33 e-Values, 25 exceeded 1.25, ranging up to 1.07 × 10^5. Liberalization of cannabis laws was linked with higher TCRs in Caucasian-Americans (β-estimate = 0.09 (0.06, 0.12), P = 6.5 × 10^−10) and African-Americans (β-estimate = 0.22 (0.12, 0.32), P = 4.4 × 10^−5), and also when legal status was dichotomized to illegal vs. others (t = 6.195, P = 1.18 × 10^−9 and t = 4.50, P = 3.33 × 10^−5).
Conclusion: Cannabis is shown to be a TC risk factor for all ethnicities including Caucasian-American and African-American ancestries, albeit at different rates. For both ancestries cannabis legalization elevated TCR. Dose-response and causal relationships are demonstrated.

Keywords: Testicular cancer, Cannabis, Other drugs, Ethnicity, Pathways, Mechanisms, Causal inference

Background
Testicular cancer (TC) is the commonest cancer in men aged 15-44 years, and rates have increased two- to three-fold in many nations in recent decades [1]. Indeed, TC is the leading cause of individual 'years of life lost' of any adult cancer [1].
Both genetic and environmental factors are believed to be significant, with 25% of the risk ascribed to genetic factors [1,2]. This includes an eight- to ten-fold elevation in risk in brothers of cases, and a four- to six-fold elevation in their sons [1]. 2-5% of cases are bilateral [1,2]. Well-described differences in rates by ethnicity have also been reported from many regions including Europe, New Zealand and the USA [3][4][5], with a twenty-fold variation in TC rates (TCRs) in nationals of northern Europe (Denmark and Norway) compared to various African nations [1,5,6].
Data from the US SEER*Explorer website reveal that the all-age, all-stages, age-adjusted TCR in American males rose 83.45% from 3.4415 to 6.3136/100,000 in the years 1976-2017 [3]. The commonest age for TC is 30-34 years, which is represented in the official statistics by the 15-39 year age group, whose age-adjusted rate rose 92.16% from 6.29 to 12.09/100,000 from 1975 to 2017 across all ethnicities combined [3].
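The relative rises quoted in this section are simple percentage changes between a starting and a finishing rate. As a minimal illustrative sketch in Python (not the authors' R code; the function name `percent_change` is our own), the all-age figure can be reproduced from the two rates cited from SEER*Explorer:

```python
def percent_change(start: float, end: float) -> float:
    """Relative change from start to end, expressed as a percentage."""
    return (end - start) / start * 100.0

# All-age, age-adjusted TCR per 100,000 as quoted in the text
all_age_rise = percent_change(3.4415, 6.3136)
print(f"{all_age_rise:.2f}%")  # 83.45%
```

Rates reported to only two decimal places may give slightly different percentages from those computed on unrounded registry values.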
However, these US TCR trends for "all ethnicities combined" conceal vastly different TCRs amongst different ethnic groups. For example, in the US 15-39 year age-group the 1975 TCR was 7.01/100,000 for Caucasian-Americans and 3.09/100,000 for African-Americans. Moreover, the TCR for each ethnic group has not remained static over time.
Amongst the 15-39 year age-group the TCR amongst Caucasian-Americans rose 111.61% from 7.01 to 14.81/100,000 across the years 1975-2016. In contrast, amongst African-Americans in this age group the TCR fell 14.24% from 3.09 to 2.65/100,000. Accordingly, across this time period the mean TCR was 11.7 (± 0.3374 S.E.M.) in Caucasian-Americans and 2.82 (± 0.0737 S.E.M.)/100,000 in African-Americans. That is, the mean TCR amongst African-Americans was 24.05% of that in Caucasian-Americans, suggesting a strong gene-environment interaction. SEER*Explorer data show that the rate amongst Caucasian-Americans aged 15-39 years rose 3.7% annually from 1975 to 1987 and 1.0% annually from 1987 to 2017 [3]. No annual percent change was listed on this site for African-American ancestry.
Interestingly, the different genetic risk between African-Americans and Caucasian-Americans has been precisely defined as being due to a differing allelic frequency at an anomalous P53 response element on chromosome 9. This element drives cellular proliferation, rather than the suppression of growth-related activities in cells experiencing genotoxic stress that is more generally associated with P53 activation [7,8]. This effect is described further in the Discussion.
Curiously, international reports include documentation of very different case rates within the same ethnic group, further indicating a strong role for environmental components in TC, with current data suggesting the prospect of one or more gene-environment interactions [7,9]. For example, some ethnicities in neighbouring nations across Europe show significantly different rates of TC [1,5,6]. Similarly, geographic TC clusters amongst some ethnic populations have been reported in the northern Netherlands [10], while TCR in the Hispanic-American community has been reported as rising most rapidly in recent years [1]. Moreover, a man of Caucasian descent who moves from a low to a high incidence area assumes the risk of the high incidence area in the following generation [1]. This is consistent with data suggesting TC risk is 25% genetic [1,2] and 75% environmental [7,9].
It is generally agreed that TC likely results from in utero germ cell anomalies which then become activated by the hormonal surge of adolescence. Important classification and treatment insights have flowed from the increasing recent elucidation of the biology of testicular cancer and its close relationship to disordered primordial germ cell and spermatogonial development, often from antenatal germ cell neoplasia in situ (GCNIS) [1,11,12,13,14]. For example, on occasion immune checkpoint inhibition with PD-L1 inhibitors is recommended in selected patients [14].
That higher potency varieties of cannabis have been increasingly used in many countries over the past decades, coupled with published data that cannabis exposure is both genotoxic [15] and results in genomic and chromosomal damage [13,16], raises the possibility that cannabinoid exposure in utero may be a risk factor for TC.
However, notwithstanding the role of genetics and likely in utero cannabis exposure, all four studies to examine the association of TC and cannabis have noted that personal cannabis use is linked with rising testicular cancer rates [17][18][19][20]. Importantly, dose-response effects have been described [17][18][19], which were also confirmed in meta-analysis [6]. Two meta-analyses have been published on these data, which found pooled odds ratios for exposure of 2.59 (95% C.I. 1.60, 4.19) for chronic, current and frequent cannabis use [6] and 1.71 (1.12, 2.60) [21].
Thus, while TC may have a genetic component and is believed to arise as a result of in utero germ cell anomalies in which cannabis exposure may be a factor, all four studies linking testicular cancer with cannabis use describe personal use in the preceding twenty years. This suggests both that exposure through personal use may be an important environmental component and that the usual pathophysiological processes of testicular carcinogenesis are greatly accelerated by personal cannabinoid exposure. Multiple pathways and mechanisms for cannabis damage to the primordial germ cell are reviewed in the Discussion section.
This evident temporal compression of the natural history of testicular carcinogenesis by cannabis exposure may offer further pathogenetic insights into the accelerated development of this tumour type.
As cannabis use has been linked with TCR in all four studies to examine the association [17][18][19][20], and as several reviews of TC epidemiology have noted its likely significance [1,9,13,14,22], it seemed important to examine the association between cannabis and testicular cancer across both space and time, along with ethnicity and other drug and cannabis exposures. It was also important to determine if any potential relationship satisfied the quantitative criteria for causality in all ethnicities. As data on known risk factors such as cryptorchidism, inguinal herniae, industrial pollution and sedentary lifestyles across space and time was not available to the present investigators it was not possible to include these covariates in the present analysis. The USA formed a suitable setting for this epidemiological investigation as the required data on TCR and other drug and cannabis use by State and different ethnicities over time is readily publicly available.

Data
National data on age-adjusted TCR, age-specific TCR and ethnic TCR were downloaded from the SEER*Explorer website [3]. State-based data, both as overall age-adjusted TCR and by ethnicity, were downloaded using the SEERStat software from the National Program of Cancer Registries (NPCR) and Surveillance Epidemiology and End Results (SEER) Incidence File from the US Cancer Statistics Public Use Database, Submission 2001-2017 [23]. Drug use data for the years 2003-2017 were taken from the Restricted Use Data Analysis System (RDAS) of the Substance Abuse and Mental Health Data Archive (SAMHDA) of the National Survey of Drug Use and Health (NSDUH) from the Substance Abuse and Mental Health Services Administration (SAMHSA) [24]. The major drugs of interest were last-year cocaine use (Cocaine), last-year nonmedical use of pain relievers (Analgesics), last-month cigarette use (Cigarettes) and last-year abuse of or dependence on alcohol (Alcohol Use Disorder, AUD). Data were supplemented by median household income and state-based ethnicity data from the US Census Bureau, focussing on Caucasian-American, African-American, Hispanic-American, Asian-American, American Indian / Alaskan Native (AIAN)-American and Native Hawaiian / Pacific Islander (NHPI)-American ethnicities. These ethnicity data were paired as closely as possible with the ethnicity data from the SEER database, which sometimes used a slightly different categorization system. Cannabinoid concentration data were obtained from publications of the Drug Enforcement Agency listing the various concentrations found in Federal seizures [25][26][27]. Data relating to the legal status of cannabis in US states were derived from an internet search [28].

Derived data
Data on intensity of use of cannabis were taken from a variable called "mrjmdays" denoting the number of days in the past month on which cannabis had been used. The responses to this variable are categorized as 0, 1-2, 3-5, 6-19 and 20-30 days per month. For each ethnicity and for each year of the NSDUH, the percentage responding in each category was multiplied by the mean number of days in that category and summed to provide an ethnic cannabis use score for that year. This ethnic cannabis use intensity score was then multiplied by the percentage of last-month cannabis use in that state and by the concentration of THC in Federal seizures for that year to provide an estimate of ethnic-specific THC exposure. State-based cannabinoid concentrations were estimated as the product of the concentration of various cannabinoids in Federal seizures and the last-month cannabis use in each state.
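The derivation of the ethnic THC exposure index can be illustrated with a short sketch. This is written in Python for illustration only and is not the authors' R code. The category means are assumed here to be the midpoints of each "mrjmdays" band, and all input values in the example are hypothetical rather than NSDUH or seizure data.

```python
# Assumed mean days per month for each "mrjmdays" band (band midpoints;
# the text does not state which means were used)
CATEGORY_MEAN_DAYS = {"0": 0.0, "1-2": 1.5, "3-5": 4.0, "6-19": 12.5, "20-30": 25.0}

def use_intensity_score(category_pct: dict) -> float:
    """Sum over bands of (percentage responding) x (mean days in band)."""
    return sum(category_pct[band] * days for band, days in CATEGORY_MEAN_DAYS.items())

def ethnic_thc_exposure(category_pct: dict,
                        state_last_month_use_pct: float,
                        thc_concentration_pct: float) -> float:
    """Intensity score x state last-month use x THC concentration for that year."""
    return (use_intensity_score(category_pct)
            * state_last_month_use_pct
            * thc_concentration_pct)

# Hypothetical inputs for one ethnicity, one state and one year
pct_by_band = {"0": 90.0, "1-2": 4.0, "3-5": 2.0, "6-19": 2.0, "20-30": 2.0}
score = use_intensity_score(pct_by_band)               # 89.0
exposure = ethnic_thc_exposure(pct_by_band, 10.0, 12.0)  # 10680.0
```

The index is a relative measure: its absolute magnitude depends on the units chosen for each factor, so only comparisons across ethnicities, states and years are meaningful.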

Statistics
Data were processed using R version 4.0.2 and RStudio 1.3.1093 in October 2020. Data were read in and reconfigured using dplyr from the tidyverse suite [29]. Maps were drawn in ggplot2, sf and RColorBrewer [30][31][32]. Graphs were drawn in Microsoft Excel, ggplot2 and lattice [30,33]. Data were log transformed as indicated by the results of the Shapiro-Wilk test. All regression models were manually and serially reduced by elimination of the least significant term, as in the classical technique of model reduction. Mixed effects regression was conducted using the nlme package [34]. Two-step instrumental variable regression was conducted using the ivreg function from the AER package [35].
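The logic of two-step instrumental variable regression can be shown with a minimal sketch, given here in Python with one instrument and one regressor rather than the multi-term ivreg models used in the analysis. The toy data are constructed by hand (they are not study data) so that an unobserved confounder `u` biases ordinary least squares while the instrument `z` is uncorrelated with it.

```python
def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on instrument z. Stage 2: regress y on fitted x."""
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    return ols(x_hat, y)

# Toy data: true effect of x on y is 2; u confounds both x and y
z = [-1.0, -1.0, 1.0, 1.0]           # instrument, uncorrelated with u
u = [-1.0, 1.0, -1.0, 1.0]           # unobserved confounder
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]

_, beta_ols = ols(x, y)                          # biased by the confounder
_, beta_2sls = two_stage_least_squares(z, x, y)  # recovers the true effect
print(beta_ols, beta_2sls)
```

Note that standard errors taken directly from the second-stage fit are not valid; dedicated routines such as ivreg compute them correctly.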
Inverse probability weights were calculated using the ipw package [36] and were used to control for cannabis exposure across all the groups by other substance exposure. The weights were applied in mixed effects repeated measures regression, two-step instrumental variable regression and robust regression. Each of these forms of multivariable regression was used for a different purpose. Mixed effects and instrumental variable models provided model standard deviations from which e-Values could be calculated. Robust generalized linear regression from the survey package was used to perform robustified regression [37]. Mixed effects models with state as the random effect were used to account for the repeated measures nature of the data. Instrumental variable regression was used to test directly for cannabinoid effects underlying the effects of the primary covariates, as described in the Results section.
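For a continuous exposure, stabilized inverse probability weights are commonly formed as the ratio of the marginal density of the exposure to its density conditional on the covariates. The following Python sketch illustrates that general idea under simplifying assumptions (a single covariate, normal densities, hypothetical values); it is not the authors' code and does not reproduce the exact model specification passed to the ipw package.

```python
from statistics import NormalDist, mean, stdev

def fit_line(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def stabilized_weights(exposure, covariate):
    """w_i = f(exposure_i) / f(exposure_i | covariate_i), using normal densities."""
    # Numerator: marginal density of the exposure
    marginal = NormalDist(mean(exposure), stdev(exposure))
    # Denominator: density of the exposure around its covariate-predicted value
    intercept, slope = fit_line(covariate, exposure)
    fitted = [intercept + slope * c for c in covariate]
    n = len(exposure)
    resid_sd = (sum((e - f) ** 2 for e, f in zip(exposure, fitted)) / (n - 2)) ** 0.5
    return [marginal.pdf(e) / NormalDist(f, resid_sd).pdf(e)
            for e, f in zip(exposure, fitted)]

# Hypothetical state-level values: cannabis exposure and one other substance
cigarettes = [1.0, 2.0, 3.0, 4.0, 5.0]
cannabis = [2.0, 2.5, 4.0, 4.5, 6.0]
weights = stabilized_weights(cannabis, cigarettes)
```

Observations whose exposure is poorly predicted by the covariate receive larger weights, which is how weighting balances the other substance exposures across levels of cannabis exposure.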
SEER data is suppressed for cell counts less than 15 or very low rates. Hence a high rate of missing data was noted particularly for TCR by ethnicity. The problem was most marked for ethnic minorities. This issue was addressed by multiple imputation by chained equations using the mice package in R [38]. In view of the size of the missing data problem 256 imputations were used each with 20 iterations. The initial seed used was 59. The imputation method was by the classification tree ("cart") method which provided the best fit for the ethnically grouped data. Linear models were used to investigate this data. The results from models were pooled appropriately using Rubin's rules. Using these techniques the fraction of missing information obtained in simple linear models regressing TCR against ethnic THC exposure was reduced to 2.6%.
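Rubin's rules combine the estimates from the separate imputed datasets by averaging the point estimates and adding the within-imputation and between-imputation variances. A minimal sketch in Python follows, using three hypothetical imputations rather than the 256 used in the study; the function name `pool_rubin` is our own, and the simple variance ratio returned here omits the degrees-of-freedom adjustment that the mice package applies when reporting the fraction of missing information.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)               # pooled point estimate
    u_bar = mean(variances)               # within-imputation variance
    b = variance(estimates)               # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate
    lam = (1 + 1 / m) * b / total         # share of variance due to missingness
    return q_bar, total, lam

# Hypothetical coefficients and squared standard errors from three imputations
pooled, total_var, lam = pool_rubin([0.66, 0.68, 0.70], [0.0010, 0.0009, 0.0011])
print(pooled, total_var ** 0.5, lam)
```

A small value of the variance share indicates that the pooled estimate is little affected by which imputed values were drawn.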
e-Values were calculated using the e-Value algorithms of the EValue package [39].
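For an association expressed as a risk ratio (RR), the e-Value is given by the standard formula RR + sqrt(RR × (RR − 1)), with RR inverted first if it is below one. The Python sketch below shows only this final step; the conversion of regression coefficients to an approximate risk ratio scale, which the EValue package handles for linear models, is not shown, and the input value is illustrative rather than taken from the study.

```python
from math import sqrt

def e_value(rr: float) -> float:
    """E-value for a risk ratio: minimum strength of unmeasured confounding
    needed to fully explain away the observed association."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective associations are inverted first
    return rr + sqrt(rr * (rr - 1))

print(round(e_value(2.0), 3))  # 3.414
```

The same function applied to the lower confidence limit of the risk ratio gives the minimum e-Value reported alongside the point e-Value.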
All t-tests were two sided. P < 0.05 was considered significant.

Data sharing and availability
Data have been deposited in the Mendeley data repository along with the software programming code in R and may be found at https://doi.org/10.17632/yxy3dg2wt6.1.

Ethics
All methods were carried out in accordance with relevant guidelines and regulations. This study was approved by the Human Research Ethics Committee of the University of Western Australia on 7th January 2020 RA/4/20/7724.

Results
The outline plan for the results section is as follows. We first present the univariate ethnic and testicular cancer rate data upon which the analysis rests. We then examine various bivariate relationships including different ethnic time trends. Various forms of multivariable regression are used to determine the impact of multivariate adjustment on the bivariate relationships described. All multivariable models are inverse probability weighted and e-Values are freely used to allow formal causal inferences to be drawn both qualitatively and quantitatively. Multiple imputation is employed to complete ethnic TCR data where missing data are severely problematic. Finally, the effects of legal status on the TCR in various ethnicities are considered and investigated with the tools of causal inference. Each of these steps has several component steps, which are included along the way as is mandatory in a formal presentation of such analyses.

Univariate data
Figure 3 does the same for the African-American population. When the colour coding for Fig. 3 is reversed, as shown in Fig. 4, the map resembles Fig. 1 of the TCRs rather closely. This is consistent with the above described preponderance of cases in the Caucasian-American community. One presumes that in these maps ethnicity is acting as a surrogate marker for TC incidence. Figure 5 presents the log of the Asian-American ethnicity. Figure 6 presents the log of the Asian-Pacific Islander-American ethnicity. Hawaii is noted to be a particular hot spot. Figure 7 shows the log of the Hispanic-American population density, which appears to be concentrated along the southern border states.

Bivariate relationships and ethnic differentials
Figure 8A shows the observed age-adjusted TCRs 15-60 years for the Caucasian-American and African-American populations. Figure 8B shows the linear projections of these data. Table 1 sets out the observed and predicted values from which these graphs were drawn. Table 2 shows the relative long-term rises for all ages by ethnicity and the applicable periods for which they have been monitored. Only nine cancer registries contribute to long-term cancer data in the SEER database. More recent data for 2001-2017 are contributed by 21 registries. Table 2 is notable for marked ethnic disparities in the rate of rise of the TCR. For example, across the period cited the Non-Hispanic Caucasian-American TCR grew only 0.4%, compared to growth of 36.11% in the Hispanic-American TCR and of 103.87% in the Asian-Pacific Islander-American TCR, albeit from a lower starting point. This may imply differing exposures to some environmental intoxicant.
The long-term data appear at the bottom of this table. One notes that the growth of the TCR in the African-American community of 41.27% was only 52.82% of the growth in the Caucasian-American community, which was 78.13%. Figure 9 sets out these trends by ethnicity across all ages. Figure 10A again sets out the ethnic trends and the bar chart in Panel B shows the relative rises in each of the four groups. Figure 11 shows four scatterplots of the cannabis use intensity index by ethnicity (Panels A and C) and the TCRs by ethnicity (Panels B and D) as loess curves (A, B) and regression lines of best fit (C, D). One notes a very striking resemblance between the two sets of graphs. Figure 12 sets out these data as boxplots in panels A and B and time-dependent regression lines in panels C and D. One reads the boxplot graphs by noting that groups which do not have overlapping notches are statistically significantly different from each other. The broad parallels between the two sets of plots are again apparent. Figure 13 sets out the relationship of the TCR to the ethnic THC exposure index by ethnicity. Whilst each of these six plots looks similar, careful comparison shows that the scales on the horizontal axes are very different.

Fig. 4 Log (African-American Ethnicity), US States 2003-2017, reversed color scale
These data are therefore re-plotted with comparable axis scales in Fig. 14. Now two striking trends appear. Firstly, the much higher THC exposure in the Caucasian-American and AIAN-American groups is very apparent. Secondly, the slope of the regression line differs markedly between groups. Hence the regression line for the Asian-American group is very short in horizontal scope, but very steep. This graph is thought-provoking and has far-reaching implications. These differences clearly indicate major impacts of cannabis.
Inverse probability weights may be calculated for these data which control for the effect of cannabis exposure as a function of the other substance exposures. Table 3 presents the results of linear regressions of the TCR against ethnicity and the ethnic THC exposure in both wide and long datasets. In the latter, all the ethnicities are collated into one column and the data table becomes longer by a factor of the number of ethnicities. In each case ethnicity and the ethnic THC exposure are noted to be highly significant covariates of TCR. Table 4 presents the results of inverse probability weighted mixed effects regression, all from the long dataset. Many terms involving ethnicity and ethnic THC exposure are highly significant. In an additive model with the other four drugs ethnic THC exposure is significant (β-estimate = 0.05 (0.04, 0.06), P = 5.80 × 10^−31). In a four-way interactive model with other drugs ethnic cannabis exposure is significant (β-estimate = 0.64 (0.17, 1.10), P = 0.0072). Table 5 presents the results of an inverse probability weighted two-step instrumental variable regression model on the long dataset. When race alone is considered in the first model, with African-American race as the comparator, no significant changes are seen. However, when ethnic THC exposure is considered it is very highly significant.
When race and ethnic THC exposure are considered in an additive model, all parameters are significant and ethnic THC exposure is the most significant.

Multivariable adjustment
When race and ethnic THC exposure are considered in an interactive model, Caucasian-American and Hispanic-American races, both in interaction with ethnic THC exposure, are highly significant (β-estimate = 1. However, when the same regression is performed with THC, cannabigerol and cannabinol as instrumental variables, significance amongst both the terms and the model is lost and adjusted R-squared falls from 0.9430 to −0.0000108. Similarly, when all substances and income are considered along with ethnic THC exposure, two interactive terms including ethnic THC exposure are positive and highly significant. However, when THC, cannabigerol and cannabinol are included as instrumental variables there is again a marked collapse of significant findings: adjusted R-squared falls from 0.9962 (which is very high) to 0.0138 (quite low) and the model Wald coefficient falls from 20,000 to 20.04.
In the final model in this table, using a different interaction structure, the ethnic THC exposure is again independently highly statistically significant (β-estimate = 0.32 (0.20, 0.43), P = 4.7 × 10^−8). Table 6 presents the results of inverse probability weighted robust regressions in the long dataset. Ethnic THC exposure is clearly prominent and highly significant. As model complexity increases, so does the significance of terms including the cannabinoids.

Multiple imputation of missing ethnicity data
It would be useful to study these ethnic effects in further detail. However if one simply considers the Caucasian-American and African-American datasets it is noted that 1782 of 2700 datapoints are absent, or 66.0%. This is a severe limitation on further detailed analysis.
For this reason formal data imputation by multiple imputation by chained equations was performed using the R package mice [38]. Table 7 shows the impact of missing data by ethnicity. Mean and median data and missing data rates are indicated.
Following [40] 256 imputations were performed with 20 iterations each. Figure 16 shows the density plot of the multiply imputed data. One notes the obvious peaks in the lower areas corresponding to the smaller values of the ethnic minorities. Figure 17 is a strip plot illustrating how the imputed values nicely follow the values of the known observed data including their distribution pattern. Figure 18 demonstrates the manner in which the first 100 imputations nicely converge.
Multiple imputation in this manner allows the calculation of various regression equations from the data with pooling of the final models into a meaningful outcome by applying Rubin's rules for pooling of such chained models. Table 8 shows the result of various linear model formulae performed in this way on the imputed long dataset. Ethnic cannabis exposure is noted to continue to be highly significant. Quintile effects are also demonstrated.   [41].

Cannabis legal status
It was of interest to determine if cannabis legalization had an effect on the TCR by ethnic background. These data are presented in Fig. 19 as scatterplots for Caucasian-Americans (A, C) and African-Americans (B, D). Panels A and B present the TCR by legal status, and in panels C and D legal status is dichotomized into liberal regimes vs. illegal paradigms. These data are assessed quantitatively in Table 11A. The applicable e-Values for these data are presented at the foot of Table 9.

Main results
The present study assessed cannabis and other drug use as risk factors for testicular carcinogenesis and their potential to explain the well-described ethnic differentials, and changes in TCRs amongst ethnic populations across time. Data showed that exposure to THC and cannabigerol are risk factors for TC for all ethnicities investigated, and fulfil the criteria of causal relationships in all ethnicities studied. Data also confirmed the previously described four-fold elevation of TCR amongst Caucasian-Americans compared to African-Americans. We confirm that time-based scatterplots and boxplots of intensity of cannabis use tend to follow TCR, and the two are shown to be closely associated at multivariable regression by several different techniques. Different ethnicities demonstrate different sensitivities to the testicular oncogenic action of cannabinoids, and the pattern of TCR within each ethnicity is not necessarily constant, e.g. in Hawaii, where it is much higher than elsewhere.
Since the relationships persist after inverse probability weighting and are accompanied by high e-Values, the findings fulfil the quantitative criteria for causal relationships. These relationships were greatly strengthened when missing data were multiply imputed by chained equations. Legalization to make cannabis more available was also associated with higher TCRs. Because pro-cannabis legalization is associated with higher cannabis use and exposure [42], cannabis legalization may be said to exacerbate and contribute to higher TCRs.

Biological and mechanistic considerations
Description of biology of NSGCT
TC is believed to arise from genotoxic and epigenotoxic insults incurred during in utero life by the germ stem cells, which then become activated postnatally by the hormonal surge of puberty [1,11,12,13,14]. Rising rates may therefore imply a rising incidence of some in utero genotoxic or epigenotoxic insult which becomes apparent only later in life. The four cohort studies of the relationship between cannabis and testicular germ cell tumour (TGCT) all described adult or adolescent cannabis exposure [6,17,18,19,20,21]. This implies a very significant truncation of the usual time course of TGCT by excluding the period of in utero exposure. It is not explicit in our data whether the major aetiological exposure occurs in utero or in later life, or indeed if both may be implicated. In the case of NSGCT, which is oncogenically dedifferentiated backwards, this implies significant and relatively rapid genome-wide demethylation [14].

Ethnic differential
P53 is known as the "guardian of the genome" since it is widely connected across the genomic machinery to strongly shut down aberrant DNA replication in the presence of any form of genotoxic stress [43]. In this context it is worth describing the genomic elucidation of the above-mentioned ethnogenomic variability. Investigators intersected 62,567 genome-wide association study (GWAS) cancer-associated single nucleotide polymorphisms (SNPs) with 17,118 unique positive signals for P53 activation response elements (P53-REs) in four different cell lines using seven P53 activators [7]. The base sequences surrounding the 86 positive hits were compared to assess their fit with the two canonical decameric DNA recognition sequences of P53. At position rs4590952 in the Kit P53-RE on chromosome 9 the position weight matrix value dropped from 15.6 to 11.1 (median = 13.8) with guanine to adenine substitution.
Three nearby sites have been identified in three previous GWAS screens as being associated with TCR and as conferring a threefold elevation per allele in TC risk amongst Caucasians [44][45][46]. The Kit-Kit-ligand dimer is the key nearby receptor-ligand pair which acts as the master transcription factor for primordial germ cells, controls their specification and prevents differentiation, and is highly and uniquely expressed in seminomas rather than other TGCTs. Kit also plays a key controlling role in haemopoietic stem cells and melanoblasts [7]. This mutation has been identified as a risk factor for both seminomas and non-seminomatous germ cell tumours [7]. This site is unique as it is activated by P53 activation rather than suppressed as is more usual [43]. In a subsequent assay in testicular tumour cell lines the per allele activation of the P53-RE by P53 activation was 188-fold (range 93 to 373-fold) [7]. This allele was positively selected for in seminomas (21.7-fold) and also amongst Caucasian-Americans. It was thought that the allele was positively selected for in light skinned races as its effect in stimulating melanoblast activation and proliferation in the tanning response to UV radiation protected the skin from UV induced carcinogenesis [7]. This was thought to explain its relative prevalence amongst light skinned races.
Fig. 18 Imputation convergence. Note increasing convergence of data with increasing iterations of imputation algorithm
Other similar loci have also been described including rs995030 and research in this area is on-going at the present time [8].

Cannabinoid pathophysiology
High dose marijuana smoking has been shown to grossly disturb human sperm morphology with shrunken and bent sperm heads, bent tails, multiple tails, bilobed heads, tangled tails, multiple heads, pyknotic heads and polymorphonuclear pus cells all described [47].
In mice exposure to Δ9THC was shown to induce ring and chain chromosome formation with chains up to four chromosomes long due to end-to-end fusion formation [48].  These authors also showed that when mouse sperm were exposed to the cannabinoids Δ9THC, cannabinol and cannabidiol there was a dramatic increase in chromosomal translocations from about 1% at control levels, to 4.95-6.48% comparable to the positive control which was the cytotoxic drug mitomycin C at 6.73% [48].
Cannabinoids have been shown to have marked effects on sperm function including reduction of sperm concentration in seminal fluid, induction of DNA fragmentation, defective sperm maturation, disorders of DNA packing within sperm and protamine-histone replacement, DNA nicking in sperm by Tnp2, defective DNA repair, defects of nuclear size and incomplete DNA packing by failure of the histone-protamine transition [49][50][51].
Cannabinoids induce collapse of the inner mitochondrial membrane potential by several routes [50,51].
THC exposure has been shown to lead to marked demethylation of the genome of human sperm [52], a change which makes genes more susceptible to genomic damage and chromosomal rearrangements [13,16].
Cannabis exposure of both lymphocytes and oocytes has been shown to induce 20% cell death with a single division and marked chromosomal bridging and nuclear bleb formation in surviving cells [53]. Cannabinoid exposure has also long been known to be associated with micronucleus formation and comet tail formation, which are two of the major genotoxicity assays, implicating chromosomal mis-segregation and single- and double-stranded DNA breaks respectively [15].
Moreover low micromolar doses of cannabidiol and its propyl analogue cannabidivarin have been shown to cause micronucleus formation and prominent comet tails on formal testing, changes which are greatly exacerbated in an oxidizing environment [15].
Downs syndrome has been linked with cannabis exposure in Hawaii, Colorado, Canada and Australia [54][55][56][57], and cannabis exposure has also been linked with early termination of pregnancy for anomaly-corrected rates of Downs syndrome, trisomies 18 and 13, Turners syndrome and Deletion 22q11.2 in a space-time and odds ratio analysis in the USA [58].
Prenatal cannabis use has been linked with acute lymphoid leukaemia which is primarily a disease  [17][18][19][20] and present study, which is itself caused by whole genome duplication, isochromosome 12 formation, deletion and augmentation of many chromosomal arms, over 1200 micro-RNAs [13,14] and genome-wide DNA demethylation.
Formation of ring and chain concatenation of chromosomes in rodents was mentioned above [48].
This list implicates cannabinoids in major chromosomal toxicity by many mechanisms including chromosomal deletion, reduplication, megabase scale reduplications, longitudinal and transverse duplications and gene amplification and oncogenic cellular de-differentiation.
All of this demonstrates that cannabinoid-exposed cells are clearly genomically stressed.
In the context of genomic stress P53 is activated and the ethnogenomic differential mechanism described above becomes activated as a stimulus to tumour cell proliferation in light skinned races, and to an oncoprotective block to mitosis and meiosis in darker skinned races.
Given the aforementioned, it is possible that cannabis exposure causes in utero germ cell damage which is associated with TC, suggesting that cannabis use during pregnancy should be cautioned against. This is consistent with recommendations by both the American College of Obstetricians and Gynecologists (ACOG) and the American Academy of Pediatrics (AAP) [59][60][61][62][63]. Notwithstanding this advice, a significant number of American women are reported as using cannabis whilst pregnant, perhaps explaining part of the rise in TC across many communities. In Colorado 69% of cannabis dispensaries contacted by a group of researchers recommended cannabis use to pregnant women [64], while in California 24% of pregnant teenagers either self-admitted to cannabis use whilst pregnant or tested positive for it during their gestation [65]. Nationwide it was estimated in 2017 that 161,000 women used cannabis whilst pregnant [24].
It is however important to stress that while TC is generally believed to arise as a result of in utero germ cell anomalies which may be impacted on by maternal cannabis use, all published studies on the subject have identified an association between TC and personal cannabis use, suggesting that one or, more likely, several gene / epigenome-environment interactions are at play whereby postnatal and adult cannabinoid exposure contribute to underlying genetic risk as environmental causal and exacerbating factors. Since the epigenetic state of the primordial germ cells / gonocytes is a key determinant of the differentiation block experienced by all NSGCT tumour cells, it follows that part of the effect of postnatal cannabinoid exposure must be to de-differentiate susceptible cells into a more immature foetal-like and pro-oncogenic state.

Generalizability
We feel that study findings are generalizable for many reasons. The SEER cancer data are registry controlled and come from most USA states, and thus represent the great preponderance of the data from the population. The NSDUH survey has a good response rate at 74.1%. Moreover the effects we describe are often very strong. There is great internal consistency across results within this study, with similar results being found for all ethnicities studied, and also good external consistency with all the published literature in this field. Moreover, since data fulfil the criteria for causality, we would expect this causal relationship to hold widely across space and time.

Strengths and limitations
This study has a number of strengths and limitations. Its strengths include the use of a large population dataset and registry controlled data, and a variety of advanced statistical methods including inverse probability weighting, mixed effects models, robust regression, two-step instrumental variable regression, multiple imputation of missing data by chained equations and e-Values. Further, many levels of significance are very high, as are their corresponding e-Values. Since the relationships described amply fulfil the quantitative criteria for causality we feel that the relationships described herein are transportable to other situations and other times. The main limitation of the present work is the absence from this dataset of individual exposure data, which is a limitation shared with most epidemiological studies. Also, spatiotemporal data on known risk factors such as cryptorchidism, inguinal herniae, industrial pollution and sedentary lifestyles were not available to the present investigators. It would be a useful advance if future studies could be repeated with these factors included in the multivariable analysis.
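For readers unfamiliar with e-Values, the following is a minimal sketch of the standard calculation on the risk ratio scale (VanderWeele and Ding), written in Python for illustration only. It is not the code used in this study, which was conducted in R; the function names and the input values are hypothetical and are not results from this paper. The e-Value is the minimum strength of association that an unmeasured confounder would need to have with both exposure and outcome to fully explain away an observed risk ratio.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk ratio scale."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates (rr < 1) are inverted so the formula applies.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_for_ci(rr: float, lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null value of 1."""
    # If the interval already includes 1, no confounding is needed: return 1.
    if lower <= 1.0 <= upper:
        return 1.0
    limit = lower if rr > 1 else upper
    return e_value(limit)


# Hypothetical illustration only (not a value from this study).
print(round(e_value(2.0), 2))  # 3.41
```

In this sketch a risk ratio of 2.0 gives an e-Value of about 3.41, meaning an unmeasured confounder would need to be associated with both exposure and outcome by a risk ratio of at least 3.41 to fully account for the observed association.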

Conclusion
Data analyses indicate that exposure to THC and cannabigerol is a risk factor for TC for all ethnicities. We have confirmed the four-fold elevation of TCR amongst Caucasian-Americans compared to African-Americans, and data indicate that a likely gene-environment interaction is at play with cannabis the most likely environmental causal factor. In view of the high e-Values demonstrated we feel that the place of cannabis and cannabinoids is unlikely to be supplanted by other covariates with further research. All ethnicities are subject to an increase in testicular oncogenesis under a paradigm of increasing cannabinoid exposure, with some ethnicities demonstrating marked differences in their apparent sensitivities. Time-based plots and boxplots of cannabis use and TCR generally move in parallel. Effects of cannabis, ethnic THC exposure and cannabinoid exposure are statistically highly significant, confirmed with a variety of multivariable techniques, and are independently significant. These relationships are strengthened by multiple imputation of missing ethnicity data. Findings fulfil the criteria of causal relationships in all ethnicities studied. Cannabis legalization significantly elevates the TCR for both African-American and Caucasian-American patients. In short, we feel that these findings are robust, fulfil the criteria for causal relationships and add an important transgenerational dimension to the present cannabis debate which applies to the major ethnic groups identified within the USA for which data are available.