Monolingual and bilingual children's processing of coarticulation cues during spoken word recognition

Abstract
Bilingual children cope with a significant amount of phonetic variability when processing speech, and must learn to weigh phonetic cues differently depending on the cues’ respective roles in their two languages. For example, vowel nasalization is coarticulatory and contrastive in French, but coarticulatory-only in English. In this study, we extended an investigation of the processing of coarticulation in two- to three-year-old English monolingual children (Zamuner, Moore & Desmeules-Trudel, 2016) to a group of four- to six-year-old English monolingual children and age-matched English–French bilingual children. Using eye tracking, we found that older monolingual children and age-matched bilingual children showed more sensitivity to coarticulation cues than the younger children. Moreover, when comparing the older monolinguals and bilinguals, we found no statistical differences between the two groups. These results offer support for the specification of coarticulation in word representations, and indicate that, in some cases, bilingual children possess language processing skills similar to monolinguals.


Introduction
Processing spoken language is a complex task, in part due to the multi-dimensional and variable characteristics of speech. This ability continues to develop throughout childhood, even into adolescence (Rigler, Farris-Trimble, Greiner, Walker, Tomblin & McMurray, 2015). Language complexity is further intensified in multilingual contexts: learners who are exposed to more than one language also receive more variable speech input (Byers-Heinlein & Fennell, 2014), and the multiplicity of phonetic cues from different languages can create challenges for the learner. The way that bilingual learners cope with variability is not well documented, especially in the context of spoken word recognition. Coarticulation, a process where sounds in words influence each other (Fowler, 1980), is an ever-present source of phonetic variability.
word bone [bõʊn] (the tilde represents nasalization on the vowel), yielding a stimulus with a mismatching nasalized vowel [bõʊt]. Adults and children were able to perceive the mismatching vowel nasalization, as indicated by the fact that they looked towards the image of the bone when they heard boat presented with a nasalized vowel. However, while adults ended up fixating the target boat well above chance by the end of the mismatching ([bõʊt]) trials, young children were unable to resolve the ambiguity caused by the cross-splicing. Instead, children hovered around chance towards the end of the trial, as if they could not decide which of the two words they had heard. This pattern of results showed time-dependent sensitivity to coarticulation cues in toddlers, but also suggested that young listeners had difficulty resolving phonetic mismatches, which could be attributed to children's relative inefficiency in resolving lexical competition compared to adults (Huang & Snedeker, 2011;Rigler et al., 2015;Sekerina & Brooks, 2007;Swingley, Pinto & Fernald, 1999). As a tentative explanation based on Huang and Snedeker (2011) who found that five-year-old children showed continued interference from a competitor word, Zamuner et al. (2016) hypothesized that the smaller number of exemplars in memory may yield less robust word representations, and therefore result in lower activation of the target word when the auditory stimuli contained mismatches. Another complementary hypothesis was based on the less mature processing system in toddlers, yielding different competitor inhibition mechanisms (Huang & Snedeker, 2011) and thus difficulties in recognizing ambiguous stimuli. This is an open question, as little work has examined the development of the link between spoken word processing and competitor inhibition in children (e.g., with nine-year-olds and sixteen-year-olds, see Rigler et al., 2015). 
However, the general finding seems to be that children are slower than adults at activating targets and inhibiting competitors (Cross & Joanisse, 2018; Huang & Snedeker, 2011; Rigler et al., 2015). In our study, we extend the findings with toddlers from Zamuner et al. (2016) by comparing them with a group of older monolingual children (4;3 to 6;5 years old), to investigate whether older children are better able to resolve mismatching coarticulatory cues.
Second language perception and phonetic variability
In addition to competition and inhibition mechanisms, exposure to phonetic variability has been shown to significantly impact lexical processing and word learning in children (e.g., Rost & McMurray, 2009) as well as adult second language (L2) processing (Barcroft & Sommers, 2005). However, the picture is not as clear for bilingual children. In their review article, Byers-Heinlein and Fennell (2014) argue that exposure to more than one language often results in more phonetic variation in the input for young learners, which could in turn result in maintained sensitivity to more phonetic contrasts than monolinguals. For example, bilinguals can be exposed to two languages from the same person or within code-switched sentences (Byers-Heinlein, 2013), and speech sounds produced by bilinguals are often different from monolinguals (MacLeod & Stoel-Gammon, 2009). Bilingual learners are thus exposed to greater variability than monolinguals in general. Therefore, given that exposure to variability influences early lexical processing (Rost & McMurray, 2009), bilingual learners are expected to maintain sensitivity to phonetic distinctions in both of their languages (for a review of the process in adults see Flege, 2007), i.e., they ought to discriminate more contrasts than monolinguals (Burns, Yoshida, Hill & Werker, 2007; Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola & Nelson, 2008; Sundara, Polka & Genesee, 2006). However, monolingual children are expected to maintain sensitivity to distinctions that are contrastive in their native language but lose this ability for foreign contrasts, as has been repeatedly shown in the literature (Kuhl et al., 2008).
To date, very little work has been conducted on the interplay between phonetic details and word recognition in bilingual children, a process that depends on the ability to distinguish sounds. Some work has examined how monolinguals and bilinguals process a Catalan vowel contrast between /ε/ and /e/, which maps to a single vowel category in Spanish. Children's sensitivity to this contrast appears to depend partly on the stimuli used. In one study which included cognates (Ramon-Casas, Swingley, Sebastián-Gallés & Bosch, 2009), Catalan-Spanish bilingual children (aged 17 to 27 months) were insensitive to the /ε/-/e/ contrast. However, in a study using novel words (Ramon-Casas, Fennell & Bosch, 2017), bilinguals aged 21 and 22 months were able to perceive the /ε/-/e/ contrast. While the work by Ramon-Casas and colleagues illustrates the variation in phonemic perception between monolinguals and bilinguals, these studies do not examine how bilingual children cope with coarticulatory information within spoken words. It is important to make this distinction because phonetic cues differ across languages in how they are realized (Cohn, 1990). For example, as mentioned above, vowel nasalization is coarticulatory, non-contrastive and variable in English (Beddor, 2009); vowel nasalization is not necessary for recognizing words in English, e.g., the word scent can be realized with a vowel that is more or less nasalized and listeners will recognize the word anyway. However, languages like French have phonological nasalization on vowels (Cohn, 1990), which can be variably realized as well, where words differ based solely on vowel nasalization (e.g., pain [pε̃] 'bread' ∼ paix [pε] 'peace'), and in which phonological nasalization is expressed through longer nasalization duration on the vowel (Desmeules-Trudel & Brunelle, 2018).
French learners must remain phonologically sensitive to vowel nasalization duration to differentiate between words, while the same cue does not indicate meaning differences between words for English listeners (i.e., variable nasalization duration always corresponds to a coarticulatorily nasalized vowel in English) even though the cue can be used to speed up word recognition.
When it comes to bilingual children's perception of phonetic properties that are present in both of their languages (e.g., vowel nasalization for English-French bilinguals, albeit with a different phonological status in each language), little is known concerning the perception of sublexical (e.g., coarticulatory) information. However, we know that young monolingual children's word recognition patterns are significantly influenced by coarticulation (Paquette-Smith et al., 2016; Zamuner et al., 2016) and bilingual children can maintain sensitivity to phonetic properties in more than one language (Burns et al., 2007; Kuhl et al., 2008; Ramon-Casas et al., 2017; Sundara et al., 2006).
This leads us to the first formal goal of the current study, which is to investigate if bilingual children are more or less sensitive to (mismatching) coarticulatory cues than monolinguals. Specifically, we aim to determine if the presence of contrastive vowel nasalization in the L2 (French) has an influence on the perception of nasal coarticulation (i.e., non-contrastive) in the L1 (English). In the current study, we operationalize sensitivity to nasal coarticulation through potential disruptions in the word recognition patterns of items that contain phonetic mismatches (i.e., the presence of a nasalized vowel in an oral-consonant context, see below for specific methods). Based on previous research, we expect that bilingual children will display sensitivity to coarticulation, just like their monolingual peers. However, no strong predictions can be made as to whether bilinguals' sensitivity will be lesser, equal to, or greater than that of monolinguals. On the one hand, some studies have documented that monolinguals and bilinguals show similar processing abilities (e.g., Byers-Heinlein, Fennell & Werker, 2013; Legacy, Zesinger, Friend & Poulin-Dubois, 2018), which would yield similar patterns of sensitivity to coarticulation (i.e., equal disruption in word processing when mismatching cues are present) in monolinguals and bilinguals. On the other hand, some research has demonstrated a bilingual advantage in processing (for a review, see Bialystok, Craik & Luk, 2012), and yet other research suggests that bilinguals lag behind their monolingual peers (Pelham & Abrams, 2014). If these hypotheses are true, given the fact that both languages' systems influence each other (Brasileiro Reis Pereira, 2009; Fabiano & Goldstein, 2005; Paradis, 2001), one might predict that English-French bilinguals' sensitivity to vowel nasalization will be different from that of monolingual English listeners.
For example, since English-French bilingual children have to maintain a phonological contrast between non-nasalized and nasalized vowels in their (French) lexicon, one might expect their phonological system to treat coarticulatory vowel nasalization in English differently, perhaps with greater sensitivity to coarticulation. This prediction would be supported by the fact that bilingual listeners are exposed to more variability for this phonetic cue (Byers-Heinlein & Fennell, 2014), and that this kind of variability may motivate maintaining fine-grained perceptual abilities for vowel nasalization in English-French bilingual children.
The second goal of the current study is to examine if four- to six-year-old children are able to resolve coarticulatory mismatches within words, compared to the younger group from Zamuner et al. (2016). We are interested in this question because younger children (two- and three-year-olds) could not yet overcome the coarticulation mismatch in Zamuner et al.'s (2016) previous investigation. Since Huang and Snedeker (2011) found evidence that five-year-old children show sustained interference from competitors during word discrimination, we do not expect our group of four- to six-year-old participants to resolve the phonetic mismatch as efficiently as adults. However, we predict that they will be more adult-like in their resolution of competitor interference than Zamuner et al.'s (2016) toddlers, given their older age.

Participants
The group of younger monolinguals were 19 children, aged 2;1 to 3;10, who completed the study published in Zamuner et al. (2016), and whose data were reanalyzed below in order to provide a comparison with the older monolingual group.¹ The other children, aged between 4;0 and 6;11 years, completed the same experiment (N = 119). We focused on this age range to determine if children older than 3;0 years could resolve phonetic ambiguity created by mismatching phonetic cues. Children were tested in a sound-attenuated room on a university campus or museum-based lab. Twenty bilinguals (ten girls, ten boys; age range: 4;3 to 6;5 years; M = 5;4 years; SD = 8.1 months) and twenty age-matched monolinguals (seven girls, thirteen boys; age range: 4;3 to 6;5 years; M = 5;4 years; SD = 8.2 months) are included in the current analysis.² Note that the majority of the data collection was conducted in a museum-based lab, which has the objective of involving the community in developmental research through research participation and knowledge translation to the families. In this context, parents are welcome to walk into the reception area of the museum-based lab and are offered to participate in the research with their child. Consequently, given the inclusive mandate of the testing setting, we did not restrict our recruitment criteria to only monolinguals and bilinguals, but rather provided an opportunity to the children to participate in a research activity. However, as we were interested in researching sensitivity to coarticulation and coarticulatory-mismatch solving abilities in monolinguals and bilinguals, our strategy was to select two groups of age-matched monolinguals and bilinguals. Note that all children from the large testing sample that fitted the inclusion criteria below were included in the analyses.
The final group of children included in the analyses were either English monolingual or English-French bilinguals who had not been diagnosed with a speech or hearing delay as determined by parental questionnaire. Our criteria for determining bilingualism in children were established through parental report and are as follows: children had to be considered English-dominant (i.e., 50% or more exposure to English in the family, to ensure that the children would know all of the words used in the experiment), had to be exposed to L2 French ≥ 30% of the time for two consecutive years (across contexts such as home, daycare/school, with extended family), had to be exposed to French as an L2 from the first year of life, and had to be exposed to L2 French at least 30% of the time through overall development. No participants spoke an L2 other than French. Due to the absence of wide-spread norms for establishing bilingualism status in L2 research, we required bilingual participants to be exposed to L2 French for more than a quarter of their linguistic interactions, hence the 30% cut-off point. Bilingual children's average exposure to French across development was 45.1%, SD = 7.8%. Monolingual children had been exposed to French less than 30% overall and had been exposed to French for less than two consecutive years. Monolingual children's average exposure to French across development was 6.1%, SD = 6.3%. Note that in the Canadian education system, children can be exposed to French as an L2 relatively early, at four or five years old, which means that all children in our sample were likely exposed to French to some extent, although this was not continued exposure in the monolingual group. Other children who were tested did not fit our bilingualism criteria or were not age-matched (N = 26) or contributed only one trial in one of the experimental conditions (N = 37). Note that these latter 37 children fixated to target images in filler trials, but did not meet our criterion of contributing at least two trials in both the same-splice and cross-splice conditions. There were 16 participants excluded based on a failure to calibrate or other technical problems (N = 14), fussing (N = 1), or parental interference (N = 1).
¹ We conducted a post-hoc power analysis using G*Power 3.1.9.3 (Faul, Erdfelder, Lang & Buchner, 2007) to determine the achieved power of the Zamuner et al. (2016) study. We expected a large effect size (f² = 0.35) based on the observation of the data, and an α error probability (i.e., significance level) of 0.05. Zamuner et al. (2016) tested the influence of one predictor (CROSS-SPLICING); the number of predictors is one of the required arguments in G*Power. Since G*Power does not provide a way to perform power analyses for generalized additive mixed-effects models (GAMMs; see below), which is the statistical method that we are using in the current paper, the closest statistical test available in the software is Linear multiple regression: Fixed model, R² increase. We used this statistical test as an estimation of power for the current study. Note that the results of the power analysis are a general indication of achieved power, but we expect GAMMs to be robust against Type II errors even with a relatively small sample size. We found that the post-hoc power (1 − β error probability) was 0.681 given a sample size of N = 19. The results of the current power analysis should thus be interpreted with care, as the GAMM method is well suited to the eye tracking measures used here.
² We also conducted a post-hoc power analysis for the whole sample, which comprises N = 59 children (19 children from Zamuner et al. (2016), 20 older monolinguals, and 20 older bilinguals). We investigated the impact of two predictors (AGE and LANGUAGE BACKGROUND), and thus achieved a power of 0.983, expecting a large effect size. When expecting a medium effect size (f² = 0.15), we achieved a power of 0.739. We consider these values more than sufficient to pursue analyses using the current sample size.
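The parental-report criteria described above can be read as a simple decision rule. The following Python sketch is only an illustration of that rule, not the authors' procedure: the record type, its field names, and the reading of the "two consecutive years" criterion as the longest run of years with at least 30% French exposure are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class ExposureReport:
    """Parent-reported exposure summary (all field names are hypothetical)."""
    english_pct_family: float      # % exposure to English in the family
    french_pct_overall: float      # % exposure to French across development
    french_consecutive_years: float  # assumed: longest run of years with >= 30% French
    french_from_first_year: bool   # exposed to French from the first year of life


def classify(report: ExposureReport) -> str:
    """Return 'bilingual', 'monolingual', or 'excluded' under the stated criteria."""
    is_bilingual = (
        report.english_pct_family >= 50          # English-dominant
        and report.french_pct_overall >= 30      # >= 30% French overall
        and report.french_consecutive_years >= 2  # >= 30% for two consecutive years
        and report.french_from_first_year
    )
    if is_bilingual:
        return "bilingual"
    is_monolingual = (
        report.french_pct_overall < 30
        and report.french_consecutive_years < 2
    )
    return "monolingual" if is_monolingual else "excluded"
```

A child who meets neither set of criteria falls into the "excluded" group, which corresponds to the children who "did not fit our bilingualism criteria" in the text.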

Stimuli
Stimuli were six pairs of imageable English nouns (see Table 1). Each pair started with the same consonant and vowel, followed by either an oral consonant (e.g., boat [boʊt]) or a nasal consonant (e.g., bone [bõʊn]), with both final consonants sharing the same place of articulation. As in Zamuner et al. (2016), three additional experimental pairs (duck-dumptruck, leg-lemon, and egg-M) were excluded from the analyses because of multiple coarticulation cues: nasalization and place of articulation. There were nine filler pairs (boots-carrot, star-keys, monkey-camel, frog-fish, dog-elephant, turtle-sandwich, chicken-kangaroo, doll-clock, and flower-sun). The stimuli were recorded by a female native speaker of Canadian English and normalized for amplitude at 70 dB. A trained phonetician spliced the stimuli by keeping the initial and final consonants of an oral word token (e.g., [boʊt]_1) and replacing the original vowel with one from another token of the same word (e.g., [boʊt]_2) or a nasal token (e.g., [bõʊn]_N), considering zero crossings to avoid acoustic artifacts like clicks or noises in the final signal. This yielded two SPLICING conditions: one with matching phonetic cues (same-splice, e.g., [b_1 oʊ_2 t_1]), and one with mismatching phonetic cues (cross-splice, e.g., [b_1 õʊ_N t_1]). In Table 1, the two rightmost columns indicate the vowel onset timing within the target word.
The images used in the experiment were the same size, and animacy was also controlled for within pairs (e.g., adding eyes to the cloud image which was paired with the image of a clown) in order to minimize preference effects in children. Potential frequency effects could not be controlled for due to the limited number of familiar and picturable C(C)VC-C(C)VN English minimal pairs. We also could not include word frequency as a covariate in our analyses given the low number of items in our procedure, although this question could be the object of further investigations in the future.
Design and procedure
Children were tested by themselves or on their parent's lap. Eye gaze data was collected on an Eyelink1000 (campus-based lab) or an Eyelink1000 Plus (museum-based lab) eye tracker in monocular remote mode, measuring movements of the right eye. The experimenter proceeded through a three-point calibration before the familiarization phase. During the familiarization, children saw each test and filler image and heard the corresponding unspliced label. During the experimental phase, children saw a central fixation point to ensure that they looked at the center of the screen at the beginning of the trial, then two images appeared on the screen. Experimental and filler images always appeared in the same pairs. Images were displayed for 1500 ms, and then an audio clip with the phrase "Look at the [target]" played. The images remained on the screen for four seconds after the onset of the sound file; each trial lasted approximately six seconds. The entire experiment took approximately five minutes to complete, with a total of 18 trials. The splicing condition in which each item was presented was counterbalanced across participants (e.g., half of the children heard boat in the SAME-SPLICE condition, and half in the CROSS-SPLICE condition).
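The counterbalancing of splicing condition across participants can be sketched as two complementary presentation lists. This is a minimal illustration under assumptions: the item labels are placeholders (the actual pairs are listed in Table 1), and alternating participants between two lists is one possible assignment scheme, not necessarily the one used in the study.

```python
# Placeholder labels for the six analyzed oral-final target words (hypothetical).
ITEMS = ["item_1", "item_2", "item_3", "item_4", "item_5", "item_6"]


def make_counterbalanced_lists(items):
    """Build two lists in which every item appears in opposite splicing conditions."""
    list_a = {
        item: ("same-splice" if index % 2 == 0 else "cross-splice")
        for index, item in enumerate(items)
    }
    list_b = {
        item: ("cross-splice" if condition == "same-splice" else "same-splice")
        for item, condition in list_a.items()
    }
    return list_a, list_b


def conditions_for(participant_index, items=ITEMS):
    """Assumed scheme: even-numbered participants get list A, odd-numbered get list B."""
    list_a, list_b = make_counterbalanced_lists(items)
    return list_a if participant_index % 2 == 0 else list_b
```

With this scheme, any given item is heard in the same-splice condition by half of the participants and in the cross-splice condition by the other half, and each participant hears an equal number of items in each condition.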

Analysis procedure
Eye movement data (right eye only) was extracted using DataViewer 2.16 in 50-ms time bins. Proportions of fixations to the target images within the time bins were calculated as:
\[ \%\ \text{fixations} = \frac{\text{duration of fixations to target}}{\text{duration of fixations to competitor} + \text{duration of fixations to target}} \]
The data were statistically analyzed using generalized additive mixed models (GAMMs; Wood, 2017), which can account for nonlinear trends through time, as found in eye tracking data. GAMMs can also include (linear or nonlinear) random effects, and account for autocorrelation in the time-dependent data (i.e., one data point in time is necessarily correlated to the preceding data point, which can lead to overconfident model estimates; Baayen, van Rij, de Cat & Wood, 2018; Porretta, Kyröläinen, van Rij & Järvikivi, 2018). Furthermore, GAMMs do not assume normal distribution of the data, which makes them appropriate for eye fixation data. We will present two models in the current paper: first, a GAMM for the fixations to the target (e.g., the boat image) in the CROSS-SPLICE splicing condition only to assess sensitivity to English nasal coarticulation in monolinguals and bilinguals as well as competitor inhibition patterns, and a second GAMM on the fixations to the target image in the filler trials to assess group differences between mono- and bilinguals in unspliced words.
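The proportion-of-fixations measure described above (time spent on the target divided by time spent on target plus competitor, per 50-ms bin) can be sketched as follows. The input format, a list of (start_ms, end_ms, region) fixation records, and the function names are assumptions made for illustration; the actual extraction was done in DataViewer.

```python
BIN_MS = 50  # bin width stated in the text


def bin_durations(fixations, n_bins, bin_ms=BIN_MS):
    """Sum fixation time (ms) on target and competitor within each time bin.

    fixations: list of (start_ms, end_ms, region) tuples (hypothetical format);
    regions other than 'target' and 'competitor' are ignored.
    """
    bins = [{"target": 0, "competitor": 0} for _ in range(n_bins)]
    for start, end, region in fixations:
        if region not in ("target", "competitor"):
            continue
        for b in range(n_bins):
            bin_start, bin_end = b * bin_ms, (b + 1) * bin_ms
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                bins[b][region] += overlap
    return bins


def fixation_proportions(bins):
    """Target time / (target + competitor time); None when neither image was fixated."""
    proportions = []
    for durations in bins:
        total = durations["target"] + durations["competitor"]
        proportions.append(durations["target"] / total if total > 0 else None)
    return proportions
```

Because looks away from both images are excluded from the denominator, a value of 0.5 means equal time on target and competitor, regardless of how much of the bin was spent looking elsewhere.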
The dependent variable of the GAMMs was the empirical-logit-transformed fixations (Barr, 2008) to the target. Empirical logits are an approximation of the log odds of looking to one image (e.g., the target) compared to the other image (e.g., the competitor), calculated as:
\[ \text{elog} = \ln\left(\frac{y + 0.5}{N - y + 0.5}\right) \]
where y corresponds to the number of samples during which the target was fixated, and N corresponds to the total number of samples within the time bin (i.e., eye tracker sampling at 500 Hz, thus 25 samples per 50-ms time bin).
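As a hedged illustration of the empirical logit transformation, the sketch below assumes the usual form from Barr (2008), in which 0.5 is added to both the count of target samples and the count of non-target samples so that the log odds stay finite when a bin contains only target or only non-target samples. The function name and the default of 25 samples per bin (500 Hz over 50 ms, as stated in the text) are choices made here.

```python
import math

SAMPLES_PER_BIN = 25  # 500 Hz sampling rate x 50-ms bin, as stated in the text


def empirical_logit(y, n=SAMPLES_PER_BIN):
    """Empirical logit of y target samples out of n samples in a bin.

    Assumed form (Barr, 2008): ln((y + 0.5) / (n - y + 0.5)).
    """
    if not 0 <= y <= n:
        raise ValueError("y must lie between 0 and n")
    return math.log((y + 0.5) / (n - y + 0.5))
```

The transform is symmetric around zero: a bin with half of its samples on the target maps to 0, and bins with all or no samples on the target map to finite values of equal magnitude and opposite sign.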
The independent factors of interest for the GAMMs were the TIME window of analysis (between 300 ms and 2000 ms for experimental trials, and between 300 ms and 1000 ms for filler trials, see below) and language BACKGROUND (young monolingual, old monolingual or old bilingual). We chose a shorter window of analysis for the filler trials since these were not spliced, were unambiguous, and there was no expected effect of phonological competition (e.g., the item star was presented next to the item keys). The peak in average fixations to the target in the filler trials occurred at approximately 750 ms for all three groups.
For the GAMM on experimental trials, we modeled empirical-logit-transformed fixations in the CROSS-SPLICE condition in order to assess sensitivity to phonetic mismatch. The time window of interest was chosen for analysis of experimental trials between 300 ms after word onset to account for eye movement programming delay (Buckler & Fikkert, 2016;Zamuner et al., 2016) until 2000 ms after word onset, a time at which it is likely that children will continue to look at the images based on the prompt. Similarly to other GAMM analyses (Porretta, Tucker & Järvikivi, 2016;Porretta et al., 2018), random effects corresponded to a combination of participant and trial (i.e., EVENT), allowing each trial (for each participant) to have its own intercept in the model. An AR-1 autocorrelation value of 0.868 was empirically determined based on the data and included in the experimental GAMM formula, and a value of 0.706 for the filler items GAMM. Autocorrelation values correspond to the average correlation of a given data point with the preceding one in the time series. We present below a difference curve (fixations to the target by young monolinguals minus fixations by old monolinguals), generated from the GAMM, between younger monolinguals and older monolinguals to assess the differences in fixations to the target in the cross-splice condition, and thus examine if one or the other group was more sensitive to phonetic mismatches (i.e., sensitivity to coarticulation). We also present a difference curve between older bilinguals and monolinguals to assess the group differences concerning sensitivity to nasal coarticulation.
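The AR-1 value described above is, in essence, a lag-1 autocorrelation. The sketch below shows that quantity for a single time series in plain Python. It is an illustration only: in GAMM workflows the value is typically estimated from the residuals of a model fitted without the autocorrelation term, and whether the values reported here (0.868 and 0.706) were obtained exactly that way is an assumption.

```python
def lag1_autocorrelation(series):
    """Correlation of each data point with the preceding one (lag-1 autocorrelation)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        return 0.0  # constant series: no variance to correlate
    numerator = sum(
        (series[t] - mean) * (series[t - 1] - mean) for t in range(1, n)
    )
    return numerator / denominator
```

Values close to 1, as reported for both models, indicate that neighboring 50-ms bins carry largely redundant information, which is why ignoring the autocorrelation would make the model estimates look more certain than they are.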

Analysis of proportions of fixations, experimental trials
In this analysis, we were interested in the effect of LANGUAGE BACKGROUND (bilingual or monolingual) and participant AGE (young monolinguals and old monolinguals) on sensitivity to nasal coarticulation. Eye tracking results in Figure 1 show the general effects of SPLICING condition on fixations to the target image. Higher values on the y-axis suggest that children tended to fixate the target image more. In all groups, participants looked more to targets in the SAME-SPLICE condition (grey lines) compared to the CROSS-SPLICE condition (blue lines). Focusing on the SAME-SPLICE condition (grey lines), there does not seem to be a difference between monolinguals and bilinguals, as demonstrated by similar shapes and overlapping error bars throughout the analysis window. However, young monolinguals fixated to the target slightly less than older monolinguals between 500 ms and 1000 ms within the trial, although the error bars seem to overlap with older children. This suggests relatively similar processing abilities for all children for SAME-SPLICE words.
In the CROSS-SPLICE condition (blue lines in Figure 1), bilinguals (triangles) maintained similar proportions of fixations to the target as older monolinguals, but young monolinguals (empty squares) fixated more to the target (e.g., boat) than both the older groups. Note that in our procedure, proportions of fixations were calculated based on fixations to the images only, which then suggests that young monolinguals shifted their gaze between the target (e.g., boat) and (nasal) competitor (e.g., bone) more than the other groups. This suggests that the younger group was disrupted by the phonetic mismatch, but that they also fixated less to the competitor image, which suggests less sensitivity to coarticulatory nasalization. In other words, young monolinguals did not inhibit the target (e.g., boat) as much as the older groups when hearing phonetic cues that corresponded to the nasal competitor (e.g., bone), suggesting that they might not take coarticulation into account as much when processing words.
Figure 1. Overall fixation patterns to the target by SPLICING condition and participant LANGUAGE BACKGROUND. Higher proportions of fixations on the y-axis correspond to more fixations to the target (e.g., boat) and lower proportions of fixations on the y-axis correspond to more fixations to the competitor (e.g., bone).
This finding is also supported by the statistical analysis presented in the difference curves in Figure 2 (also see Table A1 in Appendix). For illustration purposes, this figure presents difference curves in fixations to the target (e.g., blue empty-squares curve minus blue circles curve in Figure 1, and blue triangle curve minus blue circles curve in Figure 1) between young monolinguals and older monolinguals (Figure 2A), as well as between older bilinguals and older monolinguals (Figure 2B) within the TIME window (x-axis) on separate panels. These curves were computed with the plot_diff function of the itsadug package (van Rij, Wieling, Baayen & van Rijn, 2017). This function plots difference curves in predicted (mean and confidence intervals) data by the model. Portions of the difference curves that are significantly above or below 0 represent a significant difference between the two groups for a given time interval, and are noted with red-shaded intervals below. Y-values below 0 represent more fixations to the competitor by young monolinguals or bilinguals than old monolinguals, and y-values above 0 represent more fixations to the target image.
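The rule for reading significance off a difference curve (the confidence band excludes zero) can be sketched in a few lines. This is not the itsadug implementation; the function name and the input format (parallel lists of time points, estimated differences, and confidence-interval half-widths) are assumptions made for illustration.

```python
def significant_windows(times, differences, ci_half_widths):
    """Return (start, end) time intervals where the confidence band excludes 0.

    times, differences, ci_half_widths: parallel lists (hypothetical format),
    one entry per time point of the predicted difference curve.
    Adjacent significant points are merged into one window even if the sign
    of the difference changes between them.
    """
    windows = []
    window_start = None
    previous_time = None
    for time, diff, half_width in zip(times, differences, ci_half_widths):
        significant = (diff - half_width > 0) or (diff + half_width < 0)
        if significant and window_start is None:
            window_start = time
        elif not significant and window_start is not None:
            windows.append((window_start, previous_time))
            window_start = None
        previous_time = time
    if window_start is not None:
        windows.append((window_start, previous_time))
    return windows
```

Applied to a predicted difference curve, the returned intervals correspond to the shaded portions of a difference plot, such as the window between 950 ms and 1385 ms reported for the comparison of younger and older monolinguals.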
We were thus able to establish that young monolinguals fixated significantly more to the target in cross-spliced trials between 950 ms and 1385 ms when compared to older monolinguals (deviance explained of 39.7%). This supports our observation that young monolinguals shift their gaze between the two images in the coarticulatory mismatch condition more than older monolinguals, suggesting that they might not be as sensitive to coarticulation as the latter group. Furthermore, towards the end of the trials, no differences in the raw data or statistical analysis emerged between younger and older monolinguals, suggesting that the older group did not resolve the phonetic mismatch better than the younger group.
The lack of apparent difference between older bilinguals and older monolinguals in the raw data (Figure 1) suggests that bilingual listeners were as sensitive to nasal coarticulation as monolingual children overall: both groups were sensitive to coarticulation (i.e., see dip in blue curve for both groups in the time window of analysis in Figure 1). This is also borne out by the statistical analysis of cross-splice items, where no difference emerged when computing the difference curve in fixations to the target between bilinguals and monolinguals (Figure 2B). Although we are aware that it is difficult to formulate strong conclusions from null results, the overwhelming similarity of the fixation curves between older monolinguals and bilinguals in the cross-splice condition (Figure 1) points in the direction of similar sensitivity to nasal coarticulation in both groups. It is possible that using a different type of measure, age group or coarticulation contrast, one may see differences in processing between monolinguals and bilinguals. However, for the contrast tested in this study, we observed no statistical difference between the groups of monolinguals and bilinguals.

Analysis of filler trials
This analysis compared all groups' fixations to targets on the filler trials (Figure 3) in order to assess the potential differences across groups when processing regular speech. In Figure 4, we show difference curves between young monolinguals and older bilinguals (A) as well as between older bilinguals and older monolinguals (B). Visualization of the GAMM results (see Table A2 in Appendix for the numeric output of the model; deviance explained of 56.2%) in Figure 4 shows that young monolinguals fixated significantly less to the target for the entire duration of the trials (Figure 4A). However, there was no significant difference between monolingual and bilingual children in filler trials (Figure 4B). This means that, on the one hand, younger monolinguals were less efficient than older monolinguals at fixating to the target, but that (older) monolingual and bilingual participants process English filler words similarly. While it would be informative to have independent measures of children's language skills using standardized tests, this analysis of filler trials suggests that there are no differences in processing abilities for non-cross-spliced items in age-matched monolinguals and bilinguals, in addition to less efficient general processing in younger monolingual children compared to older children.

Discussion
Bilingual spoken word recognition is made more complex because two languages exist in a listener's mind. While research has found that monolingual adults and toddlers are able to use English vowel nasalization during spoken word recognition (Beddor et al., 2013; Zamuner et al., 2016), little research has focused on the development of these abilities over time, and on how bilingual children process phonetic details during spoken word recognition. Thus, we investigated the development of English monolingual children's sensitivity to English coarticulatory vowel nasalization. We also examined how English-French bilingual children process English coarticulatory vowel nasalization.
First, we compared data from a group of younger monolingual children (aged 2 to 3 years) to a group of older monolingual children (aged 4 to 6 years). The statistical analysis showed that younger monolingual children tended to fixate more slowly and less to targets in filler trials, which can be explained by a less mature word processing system. In the cross-splice condition, listeners were presented with targets that contained a mismatched nasalized vowel. This led all groups to fixate more to the competitor (bone) and less to the target (boat) for a portion of the trial. However, fixations to the target in the cross-splice condition were significantly higher for younger monolinguals than older monolinguals (i.e., closer to 50% fixations than older monolinguals, since both groups had fixations well below 50% in this splicing condition). Thus, our data suggest that the older monolinguals were more sensitive to coarticulation (i.e., younger monolinguals were not capable of inhibiting the non-nasalized target as much as older monolinguals), and the activation of the target was more disrupted by the mismatch (since, at that point within the trial, the competitor is hypothesized to be activated). Consequently, one could argue that the older monolingual children were better at resolving the phonetic mismatch because they recovered from a larger disruption. However, considering the amount of looking to the target in the cross-splice condition, both the younger monolinguals and older bilinguals peak at 50% looking to the target. Thus, fixation data indicate that even older monolingual children cannot resolve the phonetic mismatch (fixations hover around chance in the cross-splice condition, 2000 ms after word onset), similar to the younger toddler participants.
This corresponds to the Huang and Snedeker (2011) explanation that children have sustained competitor interference, and it is not until children are older that they have more adult-like processing patterns (see also Rigler et al., 2015).
The second goal was to examine how bilinguals process coarticulation cues whose role varies across their languages, and to compare those results with a group of monolinguals. To do this, we examined English vowel nasalization in a group of English monolinguals and a group of English-French bilinguals. English and French are ideal languages to investigate this question, since French uses vowel nasalization as a phonological distinction (Cohn, 1990), while English contains vowel nasalization as merely a coarticulatory property (Beddor, 2009). We initially predicted that bilingual children would be sensitive to nasal coarticulation in English, but made no strong prediction on the potential differences between monolingual and bilingual children's processing of nasal coarticulation. Past research has argued that monolinguals and bilinguals process language similarly, in which case no difference is expected between our groups. However, others have suggested a bilingual advantage (Bialystok et al., 2012) or disadvantage (Pelham & Abrams, 2014), in which case we would have expected to find differences across the groups concerning sensitivity to nasal coarticulation. We found that monolingual and bilingual children displayed similar sensitivity to vowel nasalization (Legacy et al., 2018), as shown by similar patterns of fixations to the target images, across the time course, and in both the same-splice and cross-splice conditions. This is in line with some research showing that monolinguals and bilinguals process certain aspects of language similarly. Given that the children in our study were relatively balanced bilinguals and exposed to both English and French from a young age, perhaps it is not surprising that the bilinguals were not different from the monolinguals in processing the coarticulation cues.
Moreover, while French has a phonological contrast between oral and nasal vowels and English has pervasive nasal coarticulation between nasal consonants and oral vowels, this also means that English monolingual children are exposed to cues of nasalization in their linguistic environment. Perhaps this is enough for English monolinguals to develop a sensitivity to nasal coarticulation similar to that of English-French bilinguals, even though the way the cues are used in the different languages varies. It is possible that if one were to test a cue that occurs in only one of the two languages, there would be evidence of differences between bilinguals and monolinguals.
In summary, we found that children's sensitivity to coarticulation cues grows between the ages of two and six years; however, older children continue to have sustained competitor interference. We also found that bilingual children's sensitivity to coarticulatory information in their L1 patterns similarly to that of monolinguals, even when the coarticulatory cue is contrastive in their L2. Our results are in line with previous studies showing equal sensitivity to phonetic details for bilinguals and monolinguals (Liu & Kager, 2018).
There are a number of possible directions for future research. First, a parallel study in French with French monolingual and English-French bilingual children would establish whether there are any bidirectional effects in how bilingual children process phonological and coarticulatory vowel nasalization relative to monolinguals. Second, one could examine a group of younger English-French bilinguals to establish whether they are similar to the English monolinguals. We predict that a similar group of younger bilinguals would perform like the younger monolinguals. This is because our older bilinguals were relatively balanced in their exposure to English and French; recall that the bilingual children's average exposure to French across development was 45.1%. This leads to another avenue for future research, which would be to examine processing in bilinguals with different language experience, to see whether less proficient bilinguals would be more likely to draw on the phonemic inventory of their L1 to facilitate speech perception in their L2, and vice versa (Desmeules-Trudel, 2018). For example, one could test whether children who are more dominant in French show more sensitivity to English nasalization, as a cross-over transfer from French.
While the current findings raise a number of intriguing questions, our findings support the idea that phonological representations are rich and include phonetic details (Browman & Goldstein, 1986; Pierrehumbert, 2002), such as coarticulation. This research also highlights that, in some instances, bilinguals can show similar processing to their monolingual peers. The historically monolingual focus in language research belies the fact that a majority of the world's children are exposed to more than one language (Grosjean, 1982). It is therefore crucial to understand how speech perception, and linguistic skills more generally, develop in this, the majority population.