Email writing tasks and the use of lexical bundles

– Zhi Li, December 4, 2018

Effective email writing plays an important role in our professional and personal lives. To represent such real-life English writing activities, email writing is included as a writing task in the Canadian English Language Proficiency Index Program General test (CELPIP-General). A number of factors, including writers’ English language proficiency, influence the writing style and the quality of emails. However, email writing may also be described as “institutionalized”: the situational factors (such as the nature of the email message and the relationship between the writer and the recipient(s)) are clearly defined. Therefore, frequent lexical combinations can be expected in this type of writing.

In a recent study of email writing published in the TESL Canada Journal, Alex Volkov and I took a corpus-based approach to investigating the relationship between test takers’ English proficiency levels and their use of lexical bundles, or multi-word sequences, when completing CELPIP-General email writing tasks. We analyzed a total of 7,500 email texts extracted from a database of CELPIP-General test performances. The texts were sampled to represent three English language proficiency levels (CELPIP Levels 4, 7, and 10), which correspond to the three proficiency stages of the Canadian Language Benchmarks (CLB Stages I, II, and III). The collection of electronic texts formed a corpus of 1,357,911 words.

As mentioned above, the term “lexical bundles” refers to recurring continuous multi-word sequences. Lexical bundles can range from as few as two words to more than six words, but 4-word lexical bundles are the most commonly studied. Lexical bundles can be identified based on two distributional features: the frequency of occurrence of these units and their dispersion in a corpus. Since they are frequency-based lexical units, lexical bundles are not necessarily complete in structure and often represent fragments (e.g., “thank you for your”). Unlike idioms and other fixed expressions (e.g., “kick the bucket”), lexical bundles usually do not carry idiomatic meaning. Nevertheless, previous research has established that lexical bundles are important units or “building blocks” of language, linking adjacent elements and fulfilling certain discourse functions in a text. Three major functions are typically associated with the use of lexical bundles, namely, stance expressions, referential expressions, and discourse organizers. For each function, multiple sub-functions can be distinguished for finer analysis.

When analysing the occurrence of multi-word sequences in corpora, researchers typically use two measures: frequency of occurrence (how often the bundle appears in the corpus) and dispersion of occurrence (the number of different texts in which the bundle appears). In our study, these measures were essentially the same: emails are usually short, so a multi-word sequence is unlikely to be used more than once in the same text, and its frequency of occurrence is likely to equal the number of texts in which it appears. Therefore, we applied only the frequency criterion for identifying lexical bundles. We also needed to set a minimum frequency of occurrence and selected 40 occurrences per million words, as it guaranteed a wide coverage of lexical bundles in the corpus data. Using AntConc, a corpus tool developed by Laurence Anthony (http://www.laurenceanthony.net/software/antconc/), we extracted lexical bundles of different sizes, from 2-word sequences (2-grams) to 6-word sequences (6-grams). We examined lexical bundle types as well as lexical bundle tokens. A lexical bundle type refers to each unique lexical bundle. A lexical bundle token refers to every occurrence of a lexical bundle. For instance, the lexical bundle “with regards to” is one lexical bundle type. If it is used ten times in the corpus, we can report ten lexical bundle tokens.
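The extraction procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the AntConc workflow used in the study: the function names (`tokenize`, `ngrams`, `raw_threshold`, `extract_bundles`), the simple tokenization rule, and the toy emails are all assumptions made for the example. The sketch also ignores sentence boundaries, so an n-gram may span two sentences.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def raw_threshold(per_million, corpus_size):
    """Convert a per-million-words cut-off into a raw count for a corpus."""
    return per_million * corpus_size / 1_000_000


def extract_bundles(texts, n_min=2, n_max=6, min_per_million=40.0):
    """Return {bundle: (frequency, dispersion)} for n-grams at or above the cut-off."""
    tokenized = [tokenize(t) for t in texts]
    corpus_size = sum(len(toks) for toks in tokenized)
    frequency = Counter()   # every occurrence of an n-gram (tokens)
    dispersion = Counter()  # number of texts that contain the n-gram
    for toks in tokenized:
        for n in range(n_min, n_max + 1):
            grams = ngrams(toks, n)
            frequency.update(grams)
            dispersion.update(set(grams))
    cutoff = raw_threshold(min_per_million, corpus_size)
    return {g: (c, dispersion[g]) for g, c in frequency.items() if c >= cutoff}


# Toy emails invented for illustration; they are not from the CELPIP corpus.
emails = [
    "Thank you for your help. I am writing to ask about the schedule.",
    "I am writing to complain. Thank you for your time.",
    "Thank you for your reply.",
]
# The toy corpus is tiny, so the cut-off is raised to keep only repeated 4-grams.
bundles = extract_bundles(emails, n_min=4, n_max=4, min_per_million=70_000)
types = len(bundles)                                # unique bundles
tokens = sum(freq for freq, _ in bundles.values())  # all occurrences
print(bundles, types, tokens)
```

In the toy data each retained bundle appears at most once per email, so its frequency and dispersion coincide, which mirrors the reasoning for using the frequency criterion alone. As a rough guide, `raw_threshold(40, 1_357_911)` gives about 54 occurrences; that figure assumes the cut-off is applied to the corpus as a whole rather than to each sub-corpus.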

We found that 81 to 98 lexical bundle types, ranging from two to six words in length, were used by the three groups of test takers. Some example bundles were “best regards”, “my name is”, “I am writing to”, “to whom it may concern”, and “hope this email finds you well”. Closer inspection of the data revealed that, in general, test takers at higher proficiency levels used more lexical bundles in terms of both bundle types and bundle tokens. While the lexical bundles used at different proficiency levels overlapped to some extent, each group also used bundles that did not occur in the other groups. For example, CELPIP Level 10 test takers used some lexical bundles that show high levels of politeness, such as “appreciate if you could” and “please feel free to”. By contrast, some lexical bundles used by the CELPIP Level 4 test takers appeared to be more straightforward, as shown in “I just want to”. The top 40 lexical bundles used by each group of test takers are listed in the table below. The lexical bundles shared across all three proficiency levels are in boldface and bundles that are unique to a particular level are in italics.

Top 40 Lexical Bundles from the Sub-corpora of Three Proficiency Levels

CELPIP 4               | CELPIP 7                    | CELPIP 10
I would like to        | I would like to             | I would like to
my name is             | my name is                  | I am writing to
dear sir madam         | kind regards                | be able to
good day               | I am writing this           | kind regards
dear sir I am          | be able to                  | to whom it may concern
thank you very much    | best regards                | my name is
as soon as possible    | I am writing to             | I look forward to
thank you so much      | dear sir madam I            | best regards
I am writing to        | to whom it may concern      | thank you for your
I am going to          | as soon as possible         | please let me know
best regards           | thank you for your          | to hearing from you
I am writing this      | to hear from you            | I am writing this
how are you            | thank you very much         | as soon as possible
I don’t have           | I am looking forward        | if you have any
to whom it may concern | to inform you that          | dear sir or madam
be able to             | to hearing from you         | thank you in advance
I hope you will        | to let you know             | and I have been
I do not know          | I look forward to           | to bring to your
and I want to          | dear sir I am               | I would like you
have a good day        | please let me know          | to inform you that
I just want to         | I hope you will             | would it be possible
thank you for your     | I am writing you            | to hear from you
to let you know        | dear sir or madam           | to let you know
I hope you understand  | I would like you            | I would really appreciate
first of all I         | first of all I              | please feel free to
because I have a       | at the same time            | appreciate if you could
because I want to      | thank you so much           | I have noticed that
dear sir or madam      | thank you in advance        | do not hesitate to
I hope you can         | I hope you can              | it would be a
have a lot of          | I would really appreciate   | and would like to
I am writing you       | for your kind consideration | I am unable to
is very important to   | I am planning to            | at your earliest convenience
so that I can          | to inform you about         | the end of the
have a nice day        | if you have any             | in regards to the
a lot of people        | the reason why I            | I would love to
I hope that you        | I am writing in             | thank you very much
I am looking forward   | so that I can               | I hope you will
and I hope you         | we would like to            | I would also like
don’t have any         | to bring to your            | to inform you of
if you don’t           | I just want to              | so that I can
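
The comparison of shared and level-specific bundles can be illustrated with simple set operations. The sketch below is an assumption-laden illustration rather than the procedure used in the study: the function name `compare_levels` is hypothetical, and the input is only a five-item excerpt per level taken from the table above, so "shared" and "unique" here are relative to that excerpt.

```python
def compare_levels(bundles_by_level):
    """Return (shared, unique): bundles found at every level, and for each
    level the bundles found at no other level."""
    sets = {level: set(items) for level, items in bundles_by_level.items()}
    shared = set.intersection(*sets.values())
    unique = {}
    for level, own in sets.items():
        others = set().union(*(s for lv, s in sets.items() if lv != level))
        unique[level] = own - others
    return shared, unique


# A five-item excerpt per level, copied from the table above (not the full lists).
excerpt = {
    "CELPIP 4": ["I would like to", "my name is", "good day",
                 "I just want to", "how are you"],
    "CELPIP 7": ["I would like to", "my name is", "kind regards",
                 "I just want to", "at the same time"],
    "CELPIP 10": ["I would like to", "my name is", "kind regards",
                  "please feel free to", "appreciate if you could"],
}

shared, unique = compare_levels(excerpt)
print("shared:", sorted(shared))
for level, items in unique.items():
    print(level, "only:", sorted(items))
```

Bundles such as “I just want to” and “kind regards” fall into neither group in this excerpt, because each appears at two of the three levels.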

Alex and I also examined the distribution of lexical bundles by discourse function (i.e., stance expression, referential expression, and discourse organizer). We also included a new category of discourse function to account for the sub-functions that are specific to email writing, such as opening, closing, and politeness. We found that the distributions of the four functions were similar across the three proficiency groups. A closer look at the sub-functions, however, revealed noticeable differences in three of the four functions, the exception being discourse organizers. For example, while the relative proportion of the lexical bundles fulfilling stance functions was similar across the proficiency levels, the emails written by high-performing test takers showed a different profile of the four sub-functions of stance (i.e., desire/intention, ability, obligation/directive, and certainty) than those written by low-performing test takers. These differences at the sub-function level are by and large in line with our expectations of the writing style and quality of emails produced by test takers of different proficiency levels. That is, high-performing test takers exhibited superior pragmatic competence and better genre awareness of email writing.

This study reveals that test takers’ English writing proficiency influences their selection and use of lexical bundles. It also exemplifies the usefulness of lexical bundles in English email writing. Learners and test takers might find that a list of these bundles will broaden their repertoire of expressions for email writing.

If you would like to read more about this research, the study has been published as:

Li, Z., & Volkov, A. (2017). “To whom it may concern”: A study on the use of lexical bundles in email writing tasks in an English proficiency test. TESL Canada Journal, 34(3), 54–75. https://doi.org/10.18806/tesl.v34i3.1273

 

Zhi Li is an assistant professor in the Department of Linguistics and Religious Studies at the University of Saskatchewan (UoS), Canada. Before joining UoS, Zhi worked as a Language Assessment Specialist at Paragon Testing Enterprises, Canada and a sessional instructor in the Department of Adult Learning at the University of the Fraser Valley, Canada. Zhi holds a doctoral degree in applied linguistics and technology from Iowa State University, USA. His research interests include language assessment, technology-supported language teaching and learning, corpus linguistics, and computational linguistics.