When can we say a child has acquired their L1?

This question (or something like it) was posted on Twitter sometime in May, and it led to a bit of back and forth, with not a lot of consensus on the right answer. I contributed a few tweets, but because I was travelling at the time it was hard to engage properly in the ongoing conversation. Moreover, Twitter isn’t a great medium, imo, for the kind of discussion the question deserves. I said I might write something about this when I was back home, and it’s taken me a bit of time to settle in and actually write up my thoughts, but here they are. Fair warning: I have a lot to get done in my last week or so of sabbatical, so I wrote this quickly and hence it isn’t very polished. It also isn’t as nuanced as it could be, but for me, for now, it’ll do. (A new attitude I’m trying out. For those who know me, not a natural way of being for me.)

 

It’s a deceptively simple question, and it’s not uncommon for textbooks to say that the average child has acquired most of the basics by 3 or 4 years of age. Not surprisingly, then, the OP got a few responses along the lines of “I’ve heard 3 or 4,” as well as ones that went on to say things like “but I think it’s more like 5 or 6.” Then there were a whole lot of other responses that were deemed (by someone else, not the OP) to be unhelpful. I assume that the unhelpful comment was about responses that basically said “never”, or, even worse, “the question doesn’t make sense”. I am in the camp of the latter response, and here’s a bit of an explanation as to why. I’ll break down a few of the reasons I think the question is troublesome.

I should be clear that I don’t mean to pick on the OP: they asked what to many people seems like a perfectly reasonable question, and any points they (or others) brought up in response to answers were likely quick responses in a medium that is not good for long, nuanced, reasonable exchanges. So I use them as ways to talk about why the question is problematic, not why the person is wrong. Because of this, while I use things that were said in my response, I won’t ID any of the Twitter handles.

 

1) What do you mean by ‘their language’? To answer a question about when something happens, you need to know what the something is. So what does ‘their language’ mean?

In an effort to be more specific (which is good) in response to the question of what they meant by language, the OP distinguished between ‘their idiolect’ and ‘their language’. But since any language is really just a label we give to a collection of idiolects, I am assuming they meant shared vs. non-shared things. But even that is not as unproblematic as one would like, especially given we are talking about things that have been learned. Whether one thinks that some things about language are innate or not, the things that are presumed, by some, to be innate are not the things the OP was wondering about. And there is emerging evidence that there is more variation in language knowledge in adults than we might have assumed, meaning that we would also have to define what we mean by ‘shared’ (i.e., the stuff that is part of their ‘language’). If it’s 100% agreement among adult speakers of the language, then there might be less left than we think (Dabrowska, 2018), as anyone who’s ever sat in on an undergrad (or grad, for that matter) syntax class can attest. I’ve never heard a native English speaker say that “ran the dog after cat the” is grammatical, but get much beyond basic word order and you inevitably have at least a few dissenters here and there. That’s not to say that there aren’t high levels of agreement on many things, but a high level of agreement is not total agreement. We could of course include things without perfect agreement, but that would require some specific pre-defined level of agreement (given a specific sample size). Moreover, we often appeal to idiolects to explain (within-dialect) variation. When you say ‘in my idiolect it works this way’, it means that the version of the language you have acquired works that way; that is, you are including the idiolectal in the language that has been acquired.
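To make the arbitrariness of such a criterion concrete, here is a toy sketch in Python. Everything in it is invented for illustration (the judgment data, the cut-off): the point is just that whatever threshold you pick decides what counts as ‘shared’, and there is no principled place to put it.

```python
# Toy sketch of a pre-defined agreement criterion: does a construction count
# as part of the shared 'language', given acceptability judgments from a
# sample of speakers? The judgments and the cut-offs below are invented.
def counts_as_shared(judgments, threshold=0.90):
    """True if the proportion of speakers accepting the form meets the cut-off."""
    return sum(judgments) / len(judgments) >= threshold

judgments = [1] * 87 + [0] * 13           # 87 of 100 speakers accept the form
print(counts_as_shared(judgments))        # False with a 0.90 cut-off
print(counts_as_shared(judgments, 0.85))  # True with 0.85 -- the choice is arbitrary
```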

This may seem like an absurd thing to bring up, but we need to define what we are asking about, and even that basic step is more fraught than it would at first appear. I mean, we all know what we mean when we say the child is learning English, or French, or ISL, right? Yes, of course we do, but saying the child is learning English is different than making a claim about when she has actually ‘acquired’ English. That entails a target that is reached, and my point is that it’s not totally clear what that target includes, so how can we decide that it has been reached? It’s kind of like asking someone how long it will take to drive to Toronto from somewhere. Do you mean to the outskirts of the region people refer to as Toronto, the actual boundary of the city proper, or to downtown? Those are three different places, and they take three different amounts of time to reach.

 

2) Even if we could distinguish between their ‘language’ and their idiolect, there are still a lot of ways we could define ‘language’, and this was pointed out. Setting aside for now what we mean by ‘acquired’ (more on that later), do you care about the phonological system, the syntax, the semantics, the pragmatics, sociolinguistic variation? All of it? Many people would likely leave out the lexicon, as we know that we continue to learn new words throughout our lifetimes, and it feels odd to say that someone who doesn’t know every word in a language (i.e., every word used by some speaker) hasn’t yet acquired the language. (But even then, would we want to say someone who only knows 500 words knows the language? I don’t know the answer to this question, but the point is that these sorts of questions are tricky.) But if we care about the learning of culturally shared things, then things like sociolinguistic variation would seem to be especially relevant, as no one, regardless of their theoretical orientation, would suggest that these things aren’t learned. Other things fit less clearly (e.g., affix ordering). Some people think everything is learned; others think some things aren’t. So your theory defines what you include in the set of things that are learned, and is therefore relevant for answering the question. Here I am not taking a stand on what is versus what isn’t learned; my views on that are not relevant to the point I am making. What is relevant is that what you include in “their language” is not theory neutral, if what you want is to include only things that are clearly learned.

 

3) Then there is the question of what ‘acquired’ means. There was some mention of an inflection point, where, e.g., learning stops being fast and starts being slow, in order to accommodate the known ongoing changes in adult language (e.g., changes in RC processing; Wells, Christiansen, Race, Acheson, & MacDonald, 2009), the idea being that the inflection point would be the time of interest. In principle, the idea of an inflection point is reasonable (at least to me), but since each form in the language likely has its own, deciding on how to create a super measure that would include them all (even within a specific domain of interest) would be arbitrary. (The extant data suggest that each individual aspect of language has its own trajectory, where by individual aspect I don’t mean things like ‘passives’ or ‘relative clauses’ or ‘tense’; I mean things at a more fine-grained level than that. For example, verb agreement appears to be acquired on a morpheme-by-morpheme basis, possibly even a verb+agreement-morpheme by verb+agreement-morpheme basis. So you’d need an inflection point for each individual ‘thing’.) I am not saying that we couldn’t create a measure of average inflection points (across aspects of the language), just that, however we do it, it will be arbitrary. Moreover, I would actually be surprised if every child showed the same ‘super-line’, that is, if an averaged inflection point looked the same for all children (it certainly doesn’t for word learning; e.g., see work on the naming insight/naming explosion). And for children in different cultures where there are different practices surrounding talk to children, timing may be affected (Shneidman & Goldin-Meadow, 2012). (It’s easy enough to have different ages at which the language is considered acquired for different cultural and linguistic groups, but that has the potential to bring with it a lot of unnecessary baggage.) Additionally, what are we measuring? Production? Comprehension? Generalization? Correct generalization into all possible contexts? All of these?
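To be fair, finding an inflection point for a single form is mechanically easy; the arbitrariness only bites when you try to combine forms. Here is a minimal sketch, with invented accuracy-by-age data for one hypothetical morpheme: fit a logistic curve and read off the age of steepest learning (the curve’s midpoint).

```python
# A minimal sketch (invented data) of finding the inflection point for ONE
# form: fit a logistic curve to proportion-correct-by-age data and read off
# the age at which learning is fastest (the logistic's midpoint).
import numpy as np
from scipy.optimize import curve_fit

def logistic(age, midpoint, slope):
    """Proportion correct as a function of age in months."""
    return 1.0 / (1.0 + np.exp(-slope * (age - midpoint)))

# Hypothetical accuracy for one agreement morpheme; every individual 'thing'
# would need its own data and its own fit.
ages = np.array([18, 24, 30, 36, 42, 48, 54, 60])
accuracy = np.array([0.10, 0.20, 0.45, 0.70, 0.85, 0.92, 0.95, 0.97])

(midpoint, slope), _ = curve_fit(logistic, ages, accuracy, p0=[36.0, 0.1])
print(f"steepest learning at ~{midpoint:.1f} months")
```

Repeating this for every form leaves you with a pile of per-form midpoints; any way of collapsing them into a single ‘age of acquisition’ is a choice, not a discovery.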

 

Basically, while it’s possible, in principle, to figure out exactly what you mean by acquired, for what, and how you are going to measure it, it’s not at all clear to me that it’s in any way interesting, or more importantly, meaningful, to do so, because the time point that comes out as a result will be an arbitrary one that is only meaningful within the specific definitions. For these and other reasons, then, I don’t think the question makes sense.

Now back to my sabbatical to do list in a vain attempt to get one or two things I had planned to do actually crossed off before it ends.

 

Dabrowska, E. (2018). Experience, aptitude and individual differences in native language ultimate attainment. Cognition, 178, 222–235.

Shneidman, L. A., & Goldin-Meadow, S. (2012). Language input and acquisition in a Mayan village: How important is directed speech? Developmental Science, 15(5), 659–673. doi:10.1111/j.1467-7687.2012.01168.x

Wells, J. B., Christiansen, M. H., Race, D. S., Acheson, D. J., & MacDonald, M. C. (2009). Experience and sentence processing: Statistical learning and relative clause comprehension. Cognitive Psychology, 58(2), 250–271. doi:10.1016/j.cogpsych.2008.08.002

New paper out online at the Journal of Child Language: “Learning speech-internal cues to pronoun interpretation from co-speech gesture: a training study”

We’ve got a new paper that just came out online in JCL. Here’s the abstract, and a link to a read-only version of the full paper.

“This study explores whether children can learn a structural processing bias relevant to pronoun interpretation from brief training. Over 3 days, 42 5-year olds were exposed to narratives exhibiting a first-mentioned tendency. Two characters were introduced, and the first-mentioned was later described engaging in a solo activity. In our primary condition of interest, the Gesture Training condition, the solo-activity sentence contained an ambiguous pronoun, but co-speech gesture clarified the referent. There were two comparison conditions. In the Gender Training condition the characters were different genders, thereby avoiding ambiguity. In the Name Training condition, the first-mentioned name was simply repeated. Ambiguous pronoun interpretation was tested pre- and post-training. Children in the Gesture condition were significantly more likely to interpret ambiguous pronouns as the first-mentioned character after training. Results from the comparison conditions were ambiguous: there was a small but non-significant effect of training, but also no significant differences between conditions.”

 

Methods sections and (avoidance of) self-plagiarism – some thoughts and a practical solution – Guest post by Dr. Matt Dye

The following is written by Dr. Matt Dye of the National Technical Institute for the Deaf in Rochester, NY. It’s a follow-up on my related post. You can find more information on him and his research here: http://www.deafxlab.com/

Ah, the Methods section! Perhaps the driest, yet most important, section in a research article. That section undergraduates always decide to skip, and of which reviewers always ask for more clarification.

I’d start by dialing back the sarcasm, and reiterating that this is perhaps the most important section in the article. Along with the Results section, it allows the educated reader to discern the quality of the science being reported. So, let’s not skimp. However, we have all felt the pain of trying to say the exact same thing using different words. Here we have the undergraduate refrain, “But the authors said it so eloquently, I couldn’t find a way to paraphrase it without making it worse!” However, we are, for the most part, not undergraduates submitting our work to academic journals for peer review. So, we cannot get out of it that way.

I like to think that after doing my best technical writing, and one (or two) rounds of responding to peer review, my Methods section is as tight as it can get. Scientific protocol rendered into perfect prose. But if I am honest with myself, then of course there is room for improvement. Herein follows a suggestion that could (a) improve the Methods section, (b) result in a rewritten Methods that hopefully avoids charges of text recycling, and (c) provide a valuable educational experience for our postdocs and students:

1.  Ask a trainee in your lab to replicate the setup of your study. From scratch. Using only your Methods section as a guide. Here lies a critical test of how well that section is written.
2.  Assess how well the trainee was able to do so. Could she accurately reproduce the same procedure, or did she have to request information not in the manuscript? Were there any differences between her setup and the one you expected?
3.  If there were errors, or required information was missing, ask the trainee to rewrite the Methods section to provide the necessary information. Assign authorship credit and acknowledge contribution to the new manuscript.
4.  If there were no errors, ask the trainee where she was uncertain or where she had to struggle to figure out what to do. Ask the trainee to rewrite the Methods section to make clearer the necessary information. Assign authorship credit and acknowledge contribution to the new manuscript.
5.  Repeat process for each manuscript using the same (or very similar) methods. As soon as you have reached the point of perfection (or massively diminished returns on time invested):
*   Cite the latest iteration in new submissions;
*   Make a preprint of the article with that version publicly available on your website (and make sure that it can be downloaded anonymously);
*   In your cover letter, let the Editor know about this process (or a better version, which I’m sure is possible).

We end up with better trained students and postdocs (who also get appropriate author credit for their CVs), improved replicability of methods, and less chance of desk rejection from hard-working and under-appreciated editors.

Matt Dye PhD FPsyS
RIT/NTID
http://www.deafxlab.com/

Methods sections and (avoidance of) self-plagiarism

A while back I posted the following on Twitter: “I hate writing methods sections for work that is the same as previous work: tweaking wording that works to avoid self-plagiarism is tedious.” My Twitter auto-posts to Facebook, so my friends there saw it too. Interestingly, on FB I mostly got commiseration from others who similarly dislike having to do this (but who mostly also seem to do it). On Twitter, however, the responses were mostly advice to ‘cite and copy’, and there were several people referring to COPE guidelines as justification.

The issues surrounding self-plagiarism, or text recycling as this practice is sometimes called, are too complex, I think, for a series of response tweets. So I decided to write a blog post about it, introducing the issue from my perspective. I also invited some friends and fellow researchers who commented on FB to contribute their perspectives. They will be added later and the post updated. (Here’s a link to one of them.)

From me:

I do behavioural work. Many of my studies are trying to understand aspects of language learning. How do we learn languages? What is easy or hard to learn? For whom? And why? Studying real people learning real languages can lead to hypotheses, but it is hard to definitively answer these questions using real-life learners learning real-life languages in real life. (Yes, I know that that is terribly repetitive. But it gets my point across.) So researchers in my field have to do something different to really get at the questions we are interested in. The paper in question discusses a study using a miniature artificial language methodology. Let me give you some background on this methodology. Hint: when I say methods I don’t mean (just) statistical methods or analyses, I mean the whole design of the study from start to finish. In my field, a lot of the ‘heavy lifting’ is done in the design, meaning the stimuli and test items. The nature of the data collection process is crucial and can be quite complex.

Although the general method has a pretty standard abbreviation (MAL, which I will use from here on in), there is nothing whatsoever that is standard about MAL methods. Each MAL is constructed to get at a specific question. Basically, the process of MAL development goes something like this: the researcher thinks about the specific variables they are interested in isolating in a language/learning situation/learner, and then designs a language or set of languages (each given to a different condition) that vary on that single variable. Michael Erard (he writes a lot of great stuff about language) did a piece on MALs a few years back that explains the process and intent behind it well. https://motherboard.vice.com/en_us/article/sillyspeak-the-art-of-making-a-fake-language

In any case, each language is unique, and the specifics of the language need to be described in enough detail that a reader can evaluate whether it actually gets at the question it was supposedly designed to get at. In my work, I use a variety of different kinds of MALs to get at different kinds of questions. Sometimes the ‘language’ is just sounds. These are used, for instance, when researchers are interested in the kinds of statistical computations learners can perform and whether those computations can help you discover the kinds of patterns that exist in real languages. This line of work got its start with Saffran, Aslin, & Newport (1996), and their basic method has been used in a great deal of follow-up work (including some out of my lab…). People are presented with a sample of MAL input for some (usually, but not always, prespecified) amount of time and then are later tested on what they know. Testing usually involves judging items that are or are not consistent with the patterns in the input language. It might seem that this specific MAL is well known enough that methodological details beyond question- or theory-driven adjustments can be dealt with by simply citing the SAN paper. But it turns out that some seemingly irrelevant methodological differences might be important to learning outcomes (plug for research by my student), meaning that at this point we shouldn’t simply leave out methodological details from these kinds of MAL studies.
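For readers unfamiliar with the paradigm, the core statistic in SAN-style studies is the transitional probability between adjacent syllables, TP(B|A) = count(AB) / count(A): TPs are high within ‘words’ and drop at word boundaries, which is what lets learners segment the stream. Here is a minimal sketch; the exposure stream and syllables below are invented for illustration, not the actual stimuli from any study.

```python
# Toy transitional-probability computation over a made-up syllable stream
# built from three invented 'words': bidaku, padoti, golabu.
from collections import Counter

stream = "bidakupadotigolabubidakugolabupadoti"        # hypothetical exposure
syllables = [stream[i:i + 2] for i in range(0, len(stream), 2)]

pair_counts = Counter(zip(syllables, syllables[1:]))   # counts of AB bigrams
first_counts = Counter(syllables[:-1])                 # counts of A

def transitional_probability(a, b):
    """P(next syllable is b | current syllable is a)."""
    return pair_counts[(a, b)] / first_counts[a]

print(transitional_probability("bi", "da"))  # word-internal: 1.0
print(transitional_probability("ku", "pa"))  # spans a word boundary: 0.5
```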

Most of my MAL work (I do other things too) investigates very different questions and uses much more complex artificial languages; the words mean something, they are presented in sentences alongside video clips, and participants are asked to produce novel sentences (i.e., sentences they didn’t get in their exposure). They are also sometimes asked to make judgments about novel sentences that are or are not consistent with the patterns in their input. The specifics of the language design are important, as are the specifics of the judgment-task test items that are inconsistent with the patterns in the input. That is, the ‘ungrammatical’ MAL sentences can tell us different things depending on why or how they are ungrammatical. The specifics of the design are very important in these studies: if the language or the test items are not designed properly, the study won’t test what it is supposed to test. Thus, a thorough description of the methods is very important for readers (and reviewers!) to be able to assess the results and the conclusions based on them in any MAL research, let alone replicate them.

The MALs used by SAN and related work are simple enough that it takes relatively little space to describe them well. However, the more complex languages I use in most of my work take a great deal more. Thus, the methods sections in these papers are long if they (the methods) are well described. I (and others) tend to use base languages that I tweak as necessary to ask related questions. That means that there are multiple papers using very similar methods. It might seem, then, that I could simply refer back to the earliest paper for the basics of the methods and just explain any differences or deviations from the original in the new paper. But then the reader, or reviewer, could not actually assess the later papers on the basis of what is actually in that paper. As a reviewer, I hate it when I cannot assess a paper on the basis of what is in the paper. Don’t make me go look somewhere else to figure out whether what you did makes sense. So I am left with essentially repeating a great deal of content from one paper to the next. (Before you accuse me of salami-slicing, I don’t do that. These are papers asking related but different questions about a particular phenomenon, where using very similar methods makes sense.) What to do?

Many of the tweets I received in response to my original tweet were telling me to go ahead and copy, being sure to cite the original, per COPE’s guidelines.

Let’s look at those guidelines (the journal I am planning to submit the paper in question to is a COPE member).

I downloaded a copy from the following website https://publicationethics.org/files/Web_A29298_COPE_Text_Recycling.pdf on June 13, 2017. I will inset any quotations from those guidelines to make clear which text is not mine in what follows.

These guidelines are intended to guide editors when dealing with cases of text recycling.

Text recycling, also known as self-plagiarism, occurs when sections of the same text appear (usually un-attributed) in more than one of an author’s own publications. The term ‘text recycling’ has been chosen to differentiate from ‘true’ plagiarism (i.e. when another author’s words or ideas have been used, usually without attribution).

A separate issue, not to be confused with text recycling, is redundant (duplicate) publication. Redundant (duplicate) publication generally denotes a larger problem of repeated publication of data or ideas, often with at least one author in common. This is outside the scope of these guidelines and is covered elsewhere.

Notice that it says “usually un-attributed”, suggesting that simply citing the appropriate original source does not necessarily make it not text recycling. Moving on…

How can editors deal with text recycling?

Editors should consider each case of text recycling on an individual basis as the ‘significance’ of the overlap, and therefore the most appropriate course of action, will depend on a number of factors.

Significance isn’t defined, and the factors that are discussed don’t really make significance any clearer (to me). Shortly thereafter it says this:

In general terms, editors should consider how much text is recycled. The reuse of a few sentences is clearly different to the verbatim reuse of several paragraphs of text, although large amounts of text recycled in the methods might be more acceptable than a similar amount recycled in the discussion.

In my work, it is more than a few sentences, and even ‘several paragraphs’ is pushing it. Clearly, reuse in methods sections is seen as being different, but even there, editors are being counseled to attend to the amount of repeated text. But what exactly counts as ‘large amounts’ that ‘might be more acceptable’ – and notice that it doesn’t say ‘acceptable’, it says ‘more acceptable’. More acceptable can still be unacceptable. So far, clear as mud. The guidelines highlight the editors’ discretion, which means that they can be applied differently by different editors. And that can result in serious consequences for authors.

Text recycling may be discovered in a submitted manuscript by editors or reviewers, or by the use of plagiarism detection software (e.g. CrossCheck). If overlap is considered minor, action may not be necessary or the authors may be asked to re-write overlapping sections and cite their previous article(s) if they have not done so.

More significant overlap may result in rejection of the manuscript. Where the overlap includes data, editors should handle cases according to the COPE flowchart for dealing with suspected redundant publication in a submitted manuscript. Editors should ensure that they clearly communicate the reason for rejection to the authors.

This says authors may be asked to rewrite and cite (if they haven’t already); again, just having cited yourself is not enough, it shouldn’t be the same text (i.e., it should have been rewritten).

 

And from the guidelines published on the web by the journal’s publisher (Taylor & Francis); copied text is again inset and is from the following website: http://authorservices.taylorandfrancis.com/ethics-for-authors/ (text retrieved June 13, 2017):

Case 2: Plagiarism

“When somebody presents the work of others (data, words or theories) as if they were his/her own and without proper acknowledgment.” Committee of Publications Ethics (COPE)

When citing others’ (or your own) previous work, please ensure you have:

  • Clearly marked quoted verbatim text from another source with quotation marks.

According to this, it might be fine if I just enclosed the pages (yes, pages) in question inside quotation marks. But pages and pages of quotations (even from my own work) seem excessive.

Shortly after that section is the following one (same website, same date of retrieval, copied text is again inset to make clear it is copied and not mine):

Make sure you avoid self-plagiarism

Self-plagiarism is the redundant reuse of your own work, usually without proper citation. It creates repetition in the academic literature and can skew meta-analyses if the same sets of data are published multiple times as “new” data. If you’re discussing your own previous work, make sure you cite it.

Taylor & Francis uses CrossCheck to screen for unoriginal material. Authors submitting to a Taylor & Francis journal should be aware that their paper may be submitted to CrossCheck at any point during the peer-review or production process.

Any allegations of plagiarism or self-plagiarism made to a journal will be investigated by the editor of the journal and Taylor & Francis. If the allegations appear to be founded, all named authors of the paper will be contacted and an explanation of the overlapping material will be requested. Journal Editorial Board members may be contacted to assist in further evaluation of the paper and allegations. If the explanation is not satisfactory, the submission will be rejected, and no future submissions may be accepted (at our discretion).

Note that the first sentence says ‘usually without proper citation’, not ‘without proper citation’. That means that even including a citation does not by itself clear you of self-plagiarism. It also does not distinguish methods sections from other sections of the paper. (As a language researcher I tend to notice these wording choices, as well as words that are missing. Unless I’m editing my own work, in which case I am quite likely to miss missing words, make bad wording choices, etc.)
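As an aside, for those curious what screening for ‘unoriginal material’ amounts to mechanically: tools in this space generally look for strings of words shared between documents. Here is a crude sketch of that idea in Python. This is emphatically not CrossCheck’s actual algorithm; the function names and the choice of word 5-grams are my own invention, just to illustrate the kind of measure an ‘overlap percentage’ could be based on.

```python
# Crude illustration of overlap screening: the share of a new text's word
# 5-grams that also occur in an earlier text. Not any real tool's algorithm.
def ngrams(text, n=5):
    """Set of word n-grams in a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_percentage(new_text, old_text, n=5):
    """Percent of new_text's n-grams that already appear in old_text."""
    new = ngrams(new_text, n)
    return 100 * len(new & ngrams(old_text, n)) / len(new) if new else 0.0

# Hypothetical usage: compare a new methods section against a published one.
# print(overlap_percentage(new_methods, old_methods))
```

A long, necessarily similar methods section will score high on any measure like this no matter how carefully it is paraphrased, which is part of why the ‘cut-off percentage’ left to editorial discretion makes me nervous.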

 

I found a paper in Biochemia Medica discussing this issue with a bit more clarity. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900061/) The paper attempts to lay out potential editorial policies regarding different kinds of self-plagiarism.

I will highlight a few sections from the paper. (Šupak-Smolčić, V., & Bilić-Zulle, L. (2013). How do we handle self-plagiarism in submitted manuscripts? Biochemia Medica, 23(2), 150–153. http://doi.org/10.11613/BM.2013.019)

In most cases of augmented manuscripts, the major overlap is seen within the methods section. As such, editors and readers can be misled to consider it as technical (self) plagiarism, which is usually not sanctioned with the same strictness as plagiarism of other parts of the paper. Nevertheless, if a submitted manuscript shows substantial overlap in the methods section with the author’s previous work, then the editor can consider this manuscript for publication only under the following circumstances:

  • the author refers to his previous work,
  • methods cannot be written in any other form without altering comprehensibility,

Although this section was about papers that reuse data, there is a relevant (imo) bit of text here: ‘methods cannot be written in any other form without altering comprehensibility’. This suggests that if they can be rewritten, they should be.

Later it seems to suggest that some overlap in methods might be OK, again at the discretion of the editor. But given the earlier passage just discussed, presumably, overlap is only deemed tolerable if unavoidable. In my paper, it is avoidable (as in, I can write it a different way, it’s just a hassle that is only being undertaken to avoid editorial hassles).

Based on the editorial policy of Biochemia Medica, upon detection of self-plagiarism, a submitted manuscript can be considered for publication only if it contains relevant new data and will contribute to overall scientific knowledge. Additional conditions have to be met:

When text similarity is observed with an author’s previous publication, and the original publication is cited, the submitted manuscript has to be revised, with the questionable parts corrected. Overlaps within the methods section can be tolerated, but the cut-off percentage is for the editor to decide. Similarities in the introduction section can be approached differently from the treatment of overlaps in the discussion and conclusion sections.

 

In case you think that this is silly and no one will ever face any consequences for text recycling: http://www.ithenticate.com/plagiarism-detection-blog/bid/94140/The-Challenge-of-Repeating-Methods-While-Avoiding-Plagiarism#.WUAFon0bjeQ (or search the replies to my tweet to find the person whose paper got (desk?) rejected for this).

I’m not trying to pick on COPE or Taylor & Francis; I’m trying to lay out why it might not be as easy as the ‘just copy and cite’ advice I was getting. My suspicion is that that advice came from people working in very different fields, with little appreciation for the nature of methods in other areas (and so why it might not be so easy for other researchers). We can have a discussion about whether these guidelines are reasonable; in fact, I think it would be good to do so. But I don’t see a way to come up with a one-size-fits-all approach to this, precisely because of the differences in methods. For now, I think I’ll stick with reworking my methods sections as best I can while still including all of the relevant details, because I think that methods are important for evaluation, making people go elsewhere to read them is bad, and I don’t want to get dinged by checkers for too much overlapping text. And I think that this is probably true for most people in my field. Other fields are likely quite different in terms of how much specificity is really required. Moreover, I want people to know what I actually did! Too often people think you did something you didn’t do, and then make claims about your work that are incorrect. If I provide details in the papers, they have less of an excuse for that. (Same goes for me and other people’s work – I often go back to a paper thinking they did something but find out I was wrong. If the details aren’t there, it’s harder to do.)

 

(There is also the question of who actually ‘owns’ the words an author might wish to reuse. Aside from the copyright issues with many journal publications – often authors do not retain it, the journal does – if the original paper was co-authored, the words in question don’t really just ‘belong’ to a single author, and so are they really theirs to do with as they wish? I don’t know the answer to this, but it’s interesting to think about.)

 

New paper out: Introducing the Infant Bookreading Database (IBDb)

I am very excited to be able to announce that a new paper, “Introducing the Infant Bookreading Database (IBDb)”, has just come out as an online first view paper in the Journal of Child Language. The paper, co-authored with Lisa Matthewson, describes a database of children’s books that came out of the survey we posted a link to a few years back, and which we’re making available to other researchers. (Click here to get to it.) Note, the paper is Open Access.

Here’s the abstract of the paper: Studies on the relationship between bookreading and language development typically lack data about which books are actually read to children. This paper reports on an Internet survey designed to address this data gap. The resulting dataset (the Infant Bookreading Database or IBDb) includes responses from 1,107 caregivers of children aged 0-36 months who answered questions about the English-language books they most commonly read to their children. The inclusion of demographic information enables analysis of subsets of data based on age, sex, or caregivers’ education level. A comparison between our dataset and those used in previous analyses reveals that there is relatively little overlap between booklists gathered from proxies such as bestseller lists and the books caregivers reported reading to children in our survey. The IBDb is available for download for use by researchers at <http://linguistics.ubc.ca/ubcibdb/>.

Alexis Black and Masaki Noguchi are at BU (without me)

Alexis Black is presenting a study from her dissertation work at BUCLD on Saturday November 5th. The talk is entitled “The impact of phonological knowledge on statistical learning”.

Abstract: Current theories suggest that statistical learning is fundamental to language acquisition; much about the mechanisms underlying this capacity, however, remain unknown. Across 5 experiments we exposed 120 adult participants to an artificial language composed of either native or non-native phonemes for 2-8 minutes. We hypothesized that making the sounds more difficult to perceive and encode would alter the trajectory of the statistical learning process. Participants exposed to non-native sounds failed to distinguish words from part-words until familiarized to 4 times as much stimuli as required for native-language sounds. Learners were sensitive, however, to the difference between familiar and completely novel 3-syllable combinations after only 2 minutes of exposure. After 4 minutes of exposure, this strengthened to include a novel syllable combination at either the beginning or end of the word. These results have implications for thinking about infant learners who are in the process of acquiring their native sound inventory.

And Masaki Noguchi has a poster, also on Saturday. “Learning of talker-specific phonemic contrasts by adults”

 

The Language and Learning Lab is at #EvoLang11!

Oksana Tkachman will be in New Orleans at EvoLang 2016 presenting our (Tkachman & Hudson Kam) poster “Arbitrariness of Iconicity: The Sources (and Forces) of (Dis)similarities in Iconic Representations”.

It’s a report of our initial findings on a really cool new project Oksana is running. Here’s a brief description of what the study is about:

“Our study investigates factors that might lead to favoring some features of referents over others in iconic representations. We investigate this by having hearing, sign-naïve adult participants invent gestured names for easily recognizable objects. The items participants were asked to create signs for differed along a number of dimensions that we hypothesize might influence the nature of the iconic representation, as shown in Figure 1. For instance, some of the items were man-made while others were part of the natural world, as it has been claimed that man-made objects are represented with handling (grasping) handshapes (Padden et al., 2013). We also investigated the effect of movement and size, for both man-made and natural categories. We anticipated that these categories would have impact on the choice of representational features; for example, the size and shape of natural objects would be encoded in the gestures, and the man-made objects would be represented by the prototypical interaction of humans with those objects.”

If you want to know what we found, go see Oksana present the poster! (Or just email either of us for a copy. Oksana: tox.cs84@gmail.com, or Carla.HudsonKam@ubc.ca)

 

“The impact of conditioning variables on the acquisition of variation in adult and child learners” just out in Language

I’m happy to be able to say that “The impact of conditioning variables on the acquisition of variation in adult and child learners” is now out in the recent issue of Language. (Note, it’s not open access.)

Abstract: “Natural human languages often contain variation (sociolinguistic or Labovian variation) that is passed from one generation of speakers to the next, but studies of acquisition have largely ignored this, instead focusing on aspects of language that are more deterministic. Theories of acquisition, however, must be able to account for both. This article examines variation from the perspective of the statistical learning framework and explores features of variation that contribute to learnability. In particular, it explores whether conditioning variables (i.e. where the pattern of variation is slightly different in different contexts) lead to better learning of variation as compared to when there are no conditioning variables, despite the former being conceptually more difficult. Data from two experiments show that adult learners are fairly good at learning patterns of both conditioned and unconditioned variation, the latter result replicating earlier studies. Five-to-seven-year-old children, in contrast, had different learning outcomes for conditioned versus unconditioned variation, with fewer children regularizing or imposing deterministic patterns on the conditioned variation. However, the children who did not impose deterministic patterns did not necessarily acquire the variation patterns the adults did.”

 

New (old) paper by Morrison & Hudson Kam (2009): Phonological form influences memory for form-meaning mappings in adult second-language learners.

In a previous post I committed to doing blog posts on failed conditions. One reason behind this was to be part of the solution to the hidden data problem that exists in psychology, and by extension, cognitive science. But not all hidden data are failures to find a relationship; some hidden data come from studies that have never been written up, and may never be due to time constraints. (That is a whole other problem, and one I’m not going to tackle here.) In an effort to get some of those data out too, I’ve decided to post papers that for one reason or another are unlikely ever to be published.

This is the first such post. It’s a paper that resulted in part from an undergraduate honors thesis by Jessica Morrison at UC Berkeley. The paper was written up in late 2008/early 2009. It was submitted somewhere (I can’t remember where). It was rejected, but we got comments that could have been useful for reworking the paper a bit before sending it off somewhere else. Jessica left academia, and despite plans to send the paper elsewhere, it was not my highest priority. So the paper languished, and now the citations are out of date. I have faced the fact that I will always have other things on my plate that are higher priority than this paper. But I think it’s a neat little project, and someone out there might find it interesting or useful. So here it is. I haven’t done anything to the manuscript other than add a statement about copyright and put the tables and figures into the text. So it’s not the prettiest document, but it is now available, which is something it wasn’t when it was just sitting on my hard drive. And hey, it’s also open access! Just click on the highlighted names after the title to get a copy of the paper.

Title: Phonological form influences memory for form-meaning mappings in adult second-language learners. Morrison & Hudson Kam (2009)

Abstract: This study asks whether phonological form affects adult second language learners’ ability to learn the meanings of novel words. Specifically, we ask whether hard-to-pronounce words, defined as having phones/phone combinations not present in the learner’s native language, are more difficult to learn meanings for, and further, if learnability differences are due to interference from production problems or more general representational difficulties. We exposed participants to easy- and hard-to-pronounce novel word–novel object pairings and tested their memory for the pairings. Participants who had either repeated words aloud, performed subvocal repetition, or heard another learner’s attempts to repeat the words during exposure performed worse on hard-to-pronounce words when tested immediately after exposure. When tested the following day, all participants, regardless of exposure condition, showed the effect. In a follow-up experiment, participants who engaged in an articulatory suppression task during learning did not have more difficulty with hard-to-pronounce words, suggesting that differences cannot simply be due to interference. Rather, we suggest that more difficult phonological forms lead to weaker representations which are then more difficult to link up with meanings in memory.