APPLICATIONS AND PROSPECTS OF
ARTIFICIAL INTELLIGENCE IN LINGUISTIC
RESEARCH
Shaohua Jiang
School of Humanities, Fujian University of Technology, Fuzhou, Fujian, 350118,
China
Krirk University, Bangkok, 10220, Thailand
sophia_FP@126.com
Zheng Chen*
Concord University College, Fujian Normal University, Fuzhou, Fujian, 350000,
China
Received: 2 January 2024 | Accepted: 22 January 2024 | Published: 19 February 2024
Suggested citation:
Jiang, S. and Chen, Z. (2024). Applications and Prospects of Artificial
Intelligence in Linguistic Research. 3C Tecnología. Glosas de innovación
aplicadas a la pyme 13(1), 57-76.
https://doi.org/10.17993/3ctecno.2024.v13n1e45.57-76
3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254-4143
Ed.45 | Iss.13 | N.1 April - June 2024
ABSTRACT
In modern linguistic research, artificial intelligence has become a leading approach, providing linguists with powerful tools and new prospects. In this paper, LSTM networks are used to extract character features, build joint vector representations, and construct text generation models that generate natural language text. An LSTM is also used in designing a speech recognition network in which a generator and a discriminator process the input speech signal to improve recognition accuracy. By iteratively optimizing the training objectives, the translation system translates text from one language to another more accurately, facilitating cross-cultural communication. With the application of artificial intelligence, the F1 value improved by 3.9% compared with the previous value; the cumulative variance contribution rate of the five factors exceeds 60%, and all loadings reach 0.4 or above. Artificial intelligence will promote the development of linguistics, improve research efficiency and accuracy, and drive innovation in language technology.
KEYWORDS
Artificial intelligence; LSTM; joint vector; speech recognition; F1 value
INDEX
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. LITERATURE REVIEW
3. APPLICATION OF LSTM IN LINGUISTICS
3.1. Application of LSTM in text analysis
3.1.1. Extracting character features
3.1.2. Joint vector representation
3.1.3. LSTM cell structure
3.2. Role of LSTM in processing speech signals
3.2.1. Speech Recognition Network Design
3.2.2. Input Speech Processing
3.2.3. Generator
3.2.4. Discriminators
3.2.5. Training objectives
3.3. Creating text generation models using LSTMs
4. PROSPECTIVE ANALYSIS OF ARTIFICIAL INTELLIGENCE IN LINGUISTIC RESEARCH
4.1. Quality of translation in different languages
4.2. Identification accuracy
4.3. Validation of creative writing skills
5. CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
ABOUT THE AUTHOR
1. INTRODUCTION
Perhaps because of its wide range of application scenarios, or because of investment from established technology companies such as Google, Baidu, and Tech Data, speech recognition is often the first artificial intelligence technology that comes to mind [1]. Speech recognition is indeed used in many specific scenarios in language learning [2]. However, without an accurate understanding of its role and value, it would be misguided to expect speech recognition alone to solve the challenges of intelligent language learning [3]. The key to speech recognition is recognition itself: however high the recognition rate and accuracy, the ultimate goal is to recognize what the learner has said and display it as text. This function and process, however, is not strictly pedagogical [4]. In other words, the output of recognition is simply text, and it does not address the truly important question of how to improve the quality of what is being said. Language learners need far more from such tools than users of conventional translation tools do [5]. Thus, even if 100% recognition accuracy could be achieved, it would at best enable fast and accurate translation or presentation of text; it would not provide learners with methods and suggestions for learning and improvement [6].
In this paper, an LSTM network is used to extract character-level features from text data in order to capture important information and patterns in the text. LSTMs are also used to create joint vector representations, and the structure and functionality of LSTM units are described. An LSTM network is used to design a speech recognition system that recognizes and understands the speech content of a speech signal. A generator and a discriminator are used in speech signal processing to improve recognition accuracy, and an LSTM network is used to achieve the training objectives, improving the performance and effectiveness of speech signal processing. The proposed generative model is intended for tasks such as natural language generation. The main innovation of this paper is the use of LSTM networks to create a text generation model, which is potentially valuable for natural language generation tasks. The model can generate natural language text such as articles, comments, or conversations, and it is expected to have a wide range of applications in automated writing and chatbots.
2. LITERATURE REVIEW
Rasulova, Z emphasizes the importance of studying the processes and mechanisms of translation, citing the methodological and psychological view that translation skills and their formation hold an important place in translation theory and practice. The study argues that research on translation should focus not only on its outcome but also on the translator's skills and strategies and on how these skills are formed [7]. Braithwaite, B suggests that there is rapidly growing scholarly interest in sign languages of the
Global South, especially those emerging in small sign language communities. Neutral theoretical constructs about these communities and their sign languages may be too abstract and may lead researchers to exoticize and objectify them while ignoring the actual needs and concerns of community members [8]. Bafoevna, N. D et al. point out that theological linguistics emerged partly because religion occupies an important place in social consciousness and is an integral part of every culture. Consequently, if the religious factor is ignored, the study of language is incomplete and in some cases may even become unfeasible [9]. Mizumoto, A et al. point out that in corpus linguistics, research synthesis and meta-analysis (RS/MA) have been applied only rarely and in very few subfields. Given the wide range of issues that corpus linguistics covers, meta-analysis is considered to have great potential as a method for systematically synthesizing research results in the field [10]. Su, H et al. proposed a local grammar approach to the study of non-synchronous discourse acts in academic texts, aiming to open a new avenue for the study of non-synchronous academic discourse. The local grammar approach captures the realization patterns of discourse acts at both the lexico-grammatical and the discourse-semantic levels, which helps to show how the realization of a particular discourse act varies across time and contexts [11]. Awad Al-Dawoody et al. compiled a corpus of 60 randomly selected research articles and analyzed them qualitatively and quantitatively according to Hyland's classification of metadiscourse markers, using AntConc 3.2.4. They found differences between Egyptian and Saudi researchers in the use of different metadiscourse markers [12]. Chen, L et al. applied binary logistic regression to a corpus and found that recently published articles were more likely than earlier ones to express surprise triggered by prior knowledge. These results can be explained both by the heuristic nature of surprise and by the pressure on academics to promote their research directions strategically [13]. Umarova, N. R discusses conceptual terminology, the most active and controversial terminology in modern linguistics, focusing on the importance of concepts and their linguistic expression in the way a language perceives the world and expresses its national and cultural characteristics. The cognitive approach is one method of recognizing and explaining language-related phenomena through language. Cognitive linguistics is a discipline that studies human cognitive activity; its main aim is to determine the involvement and share of the language system in the process of perceiving the world [14]. Hamzah, M. H et al. conducted a systematic review of the linguistic literature on the aboriginal languages of Malaysia, focusing on the three main aboriginal groups of Peninsular Malaysia. The review covered linguistic subfields such as phonology, morphology, sociolinguistics, syntax, semantics, vocabulary, and grammar. Further linguistic research is clearly necessary to protect and preserve these languages [15].
3. APPLICATION OF LSTM IN LINGUISTICS
Artificial intelligence, and LSTMs in particular, plays a crucial role in understanding and
processing natural language. LSTMs are a special type of recurrent neural network
especially suited to processing and predicting sequential data. In linguistics, this means they can efficiently process sequences of words and capture the structure of sentences and even entire texts. Language contains complex long-term dependencies; for example, the subject of a sentence may determine the verb form at the end of the sentence. LSTMs are important because they capture such long-term dependencies better than traditional RNNs [16], which is crucial for understanding the meaning of text and for language generation and translation. Another advantage of LSTMs is their ability to store and process large amounts of historical information. Since different languages have different grammatical structures and conventions of expression, this flexibility makes the LSTM a powerful tool for understanding and processing multiple languages.
3.1. APPLICATION OF LSTM IN TEXT ANALYSIS
3.1.1. EXTRACTING CHARACTER FEATURES
In natural language processing, CNNs are often used to extract text features, and some researchers have found that character-level features extracted by CNNs represent the morphological features of words well [17]. Figure 1 shows the network structure used to extract character features in the proposed model; in the example, suyimen is the Latin-script Uyghur word for "I like". In this paper, the character vector dimension is set to 30, and the vectors are randomly initialized. The maximum length of each word is 50 characters: longer words are truncated to their first 50 characters, and shorter words are padded to 50. The character feature representation vector of each word is then extracted by a convolutional layer followed by a max-pooling layer. The convolutional layer uses 30 convolution kernels, each of length 3.
Figure 1 Character feature extraction
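A minimal NumPy sketch of the character feature extractor described above follows. It assumes integer character indices with 0 reserved for padding, Gaussian random initialization, and a "valid" (unpadded) convolution; the text specifies none of these, nor an activation function, so none is applied between convolution and max pooling. The helper names `build_vocab`, `encode`, and `char_cnn` are hypothetical.

```python
import numpy as np

MAX_LEN = 50     # maximum characters per word (from the text)
CHAR_DIM = 30    # character embedding dimension (from the text)
N_KERNELS = 30   # number of convolution kernels (our reading of the text)
KERNEL_LEN = 3   # convolution kernel length (from the text)
PAD = 0          # index reserved for padding (assumption)


def build_vocab(words):
    """Map each character to an index >= 1; index 0 is reserved for padding."""
    chars = sorted({c for w in words for c in w})
    return {c: i + 1 for i, c in enumerate(chars)}


def encode(word, vocab):
    """Keep the first MAX_LEN characters and right-pad shorter words with PAD."""
    ids = [vocab[c] for c in word[:MAX_LEN]]
    return np.array(ids + [PAD] * (MAX_LEN - len(ids)))


def char_cnn(ids, emb, kernels, bias):
    """Embedding lookup -> 1-D 'valid' convolution -> max pooling over positions."""
    x = emb[ids]                                                       # (MAX_LEN, CHAR_DIM)
    n_pos = MAX_LEN - KERNEL_LEN + 1
    windows = np.stack([x[p:p + KERNEL_LEN] for p in range(n_pos)])    # (n_pos, KERNEL_LEN, CHAR_DIM)
    conv = np.einsum("pkd,fkd->pf", windows, kernels) + bias           # (n_pos, N_KERNELS)
    return conv.max(axis=0)                                            # (N_KERNELS,)


rng = np.random.default_rng(0)
vocab = build_vocab(["suyimen"])
emb = rng.normal(0.0, 0.1, (len(vocab) + 1, CHAR_DIM))  # random initialization
emb[PAD] = 0.0                                          # padding positions embed to zeros
kernels = rng.normal(0.0, 0.1, (N_KERNELS, KERNEL_LEN, CHAR_DIM))
bias = np.zeros(N_KERNELS)

v_char = char_cnn(encode("suyimen", vocab), emb, kernels, bias)
print(v_char.shape)  # (30,)
```

Max pooling over positions makes the output length independent of the word length, so every word yields a fixed 30-dimensional character feature vector that can be concatenated with the word vector.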
3.1.2. JOINT VECTOR REPRESENTATION
Let $V_{word}$ denote the word vector, $V_{char}$ the character feature vector, and $V_{f_i}$ the $i$-th linguistic feature vector. The overall input vector is their concatenation, $V = [V_{word} : V_{char} : V_{f_1} : \cdots : V_{f_{10}}]$. The joint feature representation is shown in Fig. 2.
Figure 2 Joint feature representation
3.1.3. LSTM CELL STRUCTURE
Figure 3 shows the basic structure of an LSTM cell, which controls the flow of input and output information through three special gate structures [18].
Figure 3 LSTM cell structure
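To illustrate how the three gates of Figure 3 interact, the following NumPy sketch computes one time step of the cell, written to mirror the gate equations (1)-(5) of this section. It assumes that the peephole weights $W_{ci}$, $W_{cf}$, $W_{co}$ are diagonal (stored as vectors, as is common for peephole LSTMs) and that all parameters are randomly initialized. In the model, $x_t$ would be the joint vector $V$, but the toy sequence here is random. The names `lstm_step` and `init_params` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, the sigma in Eqs. (1), (2) and (4)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of the peephole LSTM cell following Eqs. (1)-(5)."""
    i = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["w_ci"] * c_prev + p["b_i"])  # (1)
    f = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["w_cf"] * c_prev + p["b_f"])  # (2)
    c = f * c_prev + i * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])     # (3)
    # (4) as written in the text peeks at c_{t-1}; many peephole variants use c_t instead.
    o = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["w_co"] * c_prev + p["b_o"])
    h = o * np.tanh(c)                                                                  # (5)
    return h, c


def init_params(n_in, n_hid, rng):
    """Random weights; peephole weights w_c* are vectors, i.e. diagonal matrices (assumption)."""
    p = {}
    for g in "ifco":
        p[f"W_x{g}"] = rng.normal(0.0, 0.1, (n_hid, n_in))
        p[f"W_h{g}"] = rng.normal(0.0, 0.1, (n_hid, n_hid))
        p[f"b_{g}"] = np.zeros(n_hid)
    for g in "ifo":
        p[f"w_c{g}"] = rng.normal(0.0, 0.1, n_hid)
    return p


rng = np.random.default_rng(0)
n_in, n_hid = 8, 16                       # toy sizes (assumption)
p = init_params(n_in, n_hid, rng)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):    # a toy sequence of 5 input vectors
    h, c = lstm_step(x_t, h, c, p)
```

The loop carries the pair $(h_t, c_t)$ from one step to the next; the additive update of the memory cell in Eq. (3) is what lets information persist over long spans.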
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \tag{1}$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \tag{2}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{3}$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o) \tag{4}$$
$$h_t = o_t \odot \tanh(c_t) \tag{5}$$
where $\sigma$ is the sigmoid activation function, $i$ is the input gate, $f$ is the forget gate, $c$ is the memory cell, $o$ is the output gate, $h$ is the hidden state, $\tanh$ denotes the hyperbolic tangent activation function, $\odot$ denotes element-wise multiplication, and $W$ denotes a weight matrix; for example, $W_{xi}$ is the weight matrix between the input $x$ and the input gate, and $W_{hi}$ is the weight matrix from the hidden layer to the input gate. $b$ is the bias vector.
3.2. ROLE OF LSTM IN PROCESSING SPEECH SIGNALS
3.2.1. SPEECH RECOGNITION NETWORK DESIGN
Assuming that speech and noise are independent of each other, the speech signal and the noise signal are superimposed to form a mixed speech signal $Z_t$. The mixed signal is then transformed by a short-time Fourier transform into a two-dimensional time-frequency signal $Y \in \mathbb{R}^{M \times N}$, from which the spectral coefficients $\hat{Y}_j \in \mathbb{R}^{M \times N}$ of the speech are derived, where $M$ denotes the number of time frames of the speech and $N$ the number of frequency bins. The spectrum of the speech signal is obtained as
$$\hat{Y}_j = Y \odot M_j \tag{6}$$
where $\odot$ denotes the element-wise (Hadamard) product and $M_j \in \mathbb{R}^{M \times N}$ is called the time-frequency mask. The mask values characterize the relationships between the different sources in a mixed signal, such as the target and interfering speakers in speech separation. The time-frequency mask $M_j$ is estimated by Wiener filtering of the power amplitude spectrum:
$$M_j = \frac{|\hat{Y}_j|^{\alpha}}{\sum_{j} |\hat{Y}_j|^{\alpha}} \tag{7}$$
where $|\cdot|$ denotes the element-wise absolute value of the matrix and $\alpha$ is an exponent chosen according to the assumed probability distribution of the speech; it is set to 0.5 in this paper.
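Eq. (7) defines each mask in terms of the source spectra themselves, so in practice it is evaluated on estimates of those spectra. The sketch below assumes such magnitude estimates are available as an array of shape (J, M, N) and uses random toy data; the function names are hypothetical, and the small constant that guards against division by zero is our addition, not part of the text.

```python
import numpy as np


def wiener_masks(source_mags, alpha=0.5):
    """Eq. (7): M_j = |Y_j|^alpha / sum_j |Y_j|^alpha, element-wise over (M, N).

    source_mags has shape (J, M, N): one magnitude estimate per source j.
    """
    powered = np.abs(source_mags) ** alpha
    denom = powered.sum(axis=0, keepdims=True)
    return powered / np.maximum(denom, 1e-12)  # avoid 0/0 in silent bins


def apply_mask(mixture, mask):
    """Eq. (6): element-wise (Hadamard) product of the mixture spectrum and a mask."""
    return mixture * mask


rng = np.random.default_rng(0)
M, N = 4, 6                       # toy numbers of time frames and frequency bins
speech = rng.random((M, N))       # stand-in for an estimated speech magnitude
noise = rng.random((M, N))        # stand-in for an estimated noise magnitude
mixture = speech + noise          # toy mixture magnitude (additive, for illustration)

masks = wiener_masks(np.stack([speech, noise]))
speech_est = apply_mask(mixture, masks[0])
```

Because the masks of all sources sum to one in every time-frequency bin, the masked estimates of all sources add back up to the mixture; α = 0.5 follows the value chosen in the text.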
The generative adversarial network consists of two parts: the generator (G) and the discriminator (D). This paper proposes a learnable time-frequency mask generator that combines a recursive derivation algorithm with a neural network structure and adds a sparse coding layer to generate the time-frequency mask $M_j$.
Specifically, the generator consists of a multilayer recurrent neural network (RNN) followed by a sparse coding layer: the RNN feeds the sparse coding layer, whose output is the corresponding time-frequency mask $M$. The method requires no subsequent processing such as signal filtering, and the number of neural network layers does not need to be defined manually.
The generative adversarial network is shown in Fig. 4. The generator uses one bidirectional recurrent layer as the encoder ($RNN_{enc}$), one recurrent layer as the decoder ($RNN_{dec}$), and one feed-forward layer as the sparse coding layer. The output of the sparse coding layer is the time-frequency mask $M_j$, which is multiplied element-wise with the mixed signal to obtain the target speech signal. The discriminator consists of a one-layer feed-forward encoder ($FFN_{enc}$) and a one-layer feed-forward decoder ($FFN_{dec}$), and its output is a value in the interval [0, 1]. The generator and the discriminator are optimized iteratively to obtain the optimal time-frequency mask $M_j$, which is used to estimate the amplitude spectrum of the target speech signal. This estimate is combined with the phase spectrum of the mixed signal, and the time-domain signal is reconstructed with the inverse short-time Fourier transform.
Figure 4 Generative adversarial network structure
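The final reconstruction step described above (masked amplitude spectrum combined with the phase of the mixed signal, followed by the inverse STFT) can be sketched in NumPy as follows. The 1024-sample frame, the Hamming window, and the 6 ms frame shift (about 256 samples at 44.1 kHz) come from Section 3.2.2. The one-sided FFT (513 bins) and the weighted overlap-add inverse are our assumptions, since the text does not say how the inverse transform is implemented. With an all-ones mask, the round trip should return the input signal.

```python
import numpy as np

N_FFT = 1024                 # frame length in samples (Section 3.2.2)
HOP = 256                    # frame shift, about 6 ms at 44.1 kHz (Section 3.2.2)
WINDOW = np.hamming(N_FFT)   # Hamming window, as in the text


def stft(x):
    """Hamming-windowed frames -> one-sided FFT per frame, shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[k * HOP:k * HOP + N_FFT] * WINDOW for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Weighted overlap-add inverse of stft() (our choice of inverse)."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out = np.zeros(length)
    norm = np.zeros(length)
    for k, frame in enumerate(frames):
        out[k * HOP:k * HOP + N_FFT] += frame
        norm[k * HOP:k * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)   # samples not covered by any frame stay 0


def reconstruct(mixture, mask):
    """Masked mixture amplitude + mixture phase -> time-domain estimate of the target."""
    spec = stft(mixture)
    amplitude = np.abs(spec) * mask       # estimated target amplitude spectrum
    phase = np.angle(spec)                # phase spectrum of the mixed signal
    return istft(amplitude * np.exp(1j * phase), len(mixture))


rng = np.random.default_rng(0)
x = rng.standard_normal(N_FFT + 20 * HOP)            # toy "mixed signal" of 6144 samples
n_frames, n_bins = stft(x).shape
y = reconstruct(x, np.ones((n_frames, n_bins)))      # identity mask: y should equal x
```

Dividing by the summed squared window makes the overlap-add exact for any frame shift at which every sample is covered by at least one frame, which is why the identity mask reproduces the input.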
3.2.2. INPUT SPEECH PROCESSING
Let $Z_t$ be the time-domain speech signal, sampled at 44.1 kHz and mixed at a signal-to-noise ratio of 0 dB. $Z_t$ is converted into a two-dimensional time-frequency signal $Y \in \mathbb{R}^{M \times N}$ by the short-time Fourier transform (STFT), which divides the signal into overlapping windowed frames. The window function is a Hamming window with a frame length of 23 ms and a frame shift of 6 ms; that is, each frame contains N = 1024 samples, and neighboring time frames are offset by 256 samples (an overlap of 768 samples). After conversion, the time-frequency signal Y is partitioned into B sub-bands, each spanning a time period of T frames, with batch size B = M/T. The remaining frames are zero-padded so that the time dimension of the last sub-band extends to T. To maintain correlation at the junctions between speech segments, each sub-band overlaps the preceding one by L × 2 frames. The amplitude spectrum of each sub-band b of Y, $Y_{in} \in \mathbb{R}^{T \times N}$, is used as input to the generator. However, because the high-frequency portion of speech carries little energy and human hearing is relatively insensitive to it, frequencies above F are ignored during training, and $|Y_{filter}|$, with $Y_{filter} \in \mathbb{R}^{T \times F}$, is used as the input. This reduces the number of training parameters while preserving the most important information in the speech.
3.2.3. GENERATOR
After the input speech has been processed into $|Y_{filter}|$, it is fed to the encoder $RNN_{enc}$, a bidirectional RNN (Bi-GRU). The output $h_t$ of each time frame is updated as the time frame $t$ advances, and a residual connection is added:
$$h_{enc}^{t} = h_t + y_{filter}^{t} \tag{8}$$
where $Y_{filter} = [y_{filter}^{T}, \ldots, y_{filter}^{t}, \ldots, y_{filter}^{1}]$ with $y_{filter}^{t} \in \mathbb{R}^{F}$, so that $h_{enc}^{t}$ is the output $h_t$ with the amplitude spectral vector $y_{filter}^{t}$ superimposed at each time $t$. The residual connection speeds up training.
The outputs $h_{enc}^{t}$, $t \in [1, T]$, of all time frames in the merged time period T are collected as $H_{enc} \in \mathbb{R}^{T \times (2 \times F)}$. The overlapping L × 2 frames are then removed to obtain $\tilde{H}_{enc} \in \mathbb{R}^{T' \times (2 \times F)}$, where $T' = T - (L \times 2)$:
$$\tilde{H}_{enc} = [h_{enc}^{1+L}, h_{enc}^{2+L}, \ldots, h_{enc}^{T-L}] \tag{9}$$
where L denotes the overlap period of the sub-bands, and the sub-band outputs are merged according to the above equation to obtain $\tilde{H}_{enc}$.
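The bookkeeping of Section 3.2.2 and Eq. (9) can be sketched as follows: the magnitude spectrogram is cropped to the lowest F bins and cut into segments of T frames that overlap by L × 2 frames, and after processing, L frames are trimmed from each end of every segment before the segments are concatenated. Two assumptions are ours: L zero frames are prepended so that no real frame is lost to trimming, and the number of segments is ⌈M/(T − 2L)⌉ rather than the M/T stated in the text, which does not account for the overlap. The encoder itself is omitted, so here merging simply undoes segmentation. The function names and toy sizes are hypothetical.

```python
import numpy as np


def segment(mag, T, L, F):
    """Crop to the lowest F bins and cut into T-frame segments overlapping by 2*L frames.

    Returns an array of shape (B, T, F); B plays the role of the batch size.
    """
    step = T - 2 * L                      # new frames contributed by each segment
    x = mag[:, :F]                        # ignore frequencies above bin F
    M = x.shape[0]
    B = int(np.ceil(M / step))
    padded = np.zeros((B * step + 2 * L, F))
    padded[L:L + M] = x                   # L leading zero frames, zero padding at the tail
    return np.stack([padded[k * step:k * step + T] for k in range(B)])


def trim_and_merge(segments, L, M):
    """Eq. (9): keep frames 1+L .. T-L of each segment, concatenate, cut to M frames."""
    T = segments.shape[1]
    core = segments[:, L:T - L]           # (B, T - 2L, ...)
    return core.reshape(-1, *segments.shape[2:])[:M]


rng = np.random.default_rng(0)
M, N, T, L, F = 100, 513, 32, 4, 256      # toy sizes, not taken from the paper
mag = np.abs(rng.standard_normal((M, N)))

segs = segment(mag, T, L, F)
merged = trim_and_merge(segs, L, M)
```

Trimming exactly L frames from both ends of segments that overlap by 2L frames makes the retained cores tile the time axis without gaps or repeats, which is the reason the overlap is chosen as L × 2.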
The recursive derivation algorithm introduced here repeatedly generates the temporary variable $H_{dec}^{j}$ through the decoder $RNN_{dec}$ until the convergence criterion is met. The criterion is that the mean-square error $L_{MSE}$ between successive estimates of $H_{dec}^{j}$ falls below the threshold $\tau_{term}$. Let the maximum number of iterations be iter, and let $func_{dec}^{j}$ denote the training function of the decoder RNN.
After decoding converges, $\tilde{H}_{dec}^{j}$ is passed to the sparse coding layer, which generates the time-frequency mask $M_j$. The weight parameters of the sparse coding layer are shared across each time period T:

$$M_j = \mathrm{ReLU}\left(\tilde{H}_{dec}^{j} W_{mask} + b_{mask}\right) \tag{10}$$

The rectified linear unit (ReLU) function is defined as follows:

$$\mathrm{ReLU}(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{11}$$

ReLU is a piecewise linear function that sets all negative values to 0 and leaves positive values unchanged. This one-sided suppression gives the neurons sparse activation. The sparsification improves interference suppression while restoring the frequency dimension to N, the frequency dimension of the target speech signal. $W_{mask} \in \mathbb{R}^{(2\times F)\times N}$ is the weight coefficient matrix of the feed-forward neural network, and $b_{mask} \in \mathbb{R}^{N}$ is the corresponding bias.
The amplitude spectrum of the target speech signal, $\hat{Y}_{filter}^{j} \in \mathbb{R}^{\tilde{T}\times N}$, is obtained from the encoder and decoder defined above as follows:

$$\hat{Y}_{filter}^{j} = \tilde{Y}_{filter} \odot M_j \tag{12}$$

where $\tilde{Y}_{filter} = [y_{in}^{1+L}, \ldots, y_{in}^{T-L}]$ is the real input to the generator and $\odot$ denotes element-wise multiplication.
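The convergence loop and the mask computation in Eqs. (10) to (12) can be sketched as follows. This is illustrative only. The decoder itself is replaced by a hypothetical `step` function, the default threshold and iteration cap are arbitrary, and Eq. (12) is read as an element-wise product. All of these are assumptions rather than details confirmed by the text.

```python
import numpy as np


def iterate_until_converged(step, h0, tau_term=1e-4, max_iter=50):
    """Repeat h <- step(h) until the mean-square error between successive
    estimates falls below tau_term, or until max_iter passes are done.
    `step` is a hypothetical stand-in for one pass of the decoder RNN_dec."""
    h = h0
    for i in range(1, max_iter + 1):
        h_next = step(h)
        if np.mean((h_next - h) ** 2) < tau_term:  # L_MSE < tau_term
            return h_next, i
        h = h_next
    return h, max_iter


def relu(x):
    """Eq. (11): keep positive values, set the rest to 0."""
    return np.maximum(x, 0.0)


def sparse_mask(h_dec, w_mask, b_mask):
    """Eq. (10): M_j = ReLU(H_dec W_mask + b_mask).

    h_dec:  (T_tilde, 2F) converged decoder output
    w_mask: (2F, N) weights shared across time periods
    b_mask: (N,) bias
    returns a non-negative (T_tilde, N) time-frequency mask
    """
    return relu(h_dec @ w_mask + b_mask)


def apply_mask(y_tilde, mask):
    """Eq. (12), read as an element-wise product of input spectrum and mask."""
    if y_tilde.shape != mask.shape:
        raise ValueError("spectrum and mask must have the same shape")
    return y_tilde * mask
```

Because the mask is non-negative and applied element by element, it can only rescale or silence time-frequency bins of the input. It cannot introduce energy at bins where the input is zero.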
3.2.4. DISCRIMINATORS
The time-frequency mask generated by the generator contains perturbations from the noise signal. The discriminator acts as a noise-reduction stage by distinguishing real from generated speech signals, so that the generated signal $\hat{Y}_{filter}^{j}$ progressively approximates the target speech signal [19-20]. The discriminator consists of the feed-forward encoder and decoder networks $FFN_{enc}$ and $FFN_{dec}$. Its inputs are of two types. The first is the speech signal $\hat{Y}_{filter}^{j}$ generated by the generator together with the mixed signal $Y_{in}$. The second is the real speech signal $Y_j$ together with the mixed signal $Y_{in}$. Each pair is merged into $Y_{concat}^{j}$. $FFN_{enc}$ and $FFN_{dec}$ share their weight parameters across the time period T. The output of the discriminator is:

$$\mathrm{Real/Fake} = \mathrm{ReLU}\left(\mathrm{ReLU}\left(Y_{concat}^{j} W_{enc} + b_{enc}\right) W_{dec} + b_{dec}\right) \tag{13}$$

where $W_{enc} \in \mathbb{R}^{2N\times(N/2)}$ and $W_{dec} \in \mathbb{R}^{(N/2)\times 1}$ are the weight coefficient matrices of $FFN_{enc}$ and $FFN_{dec}$, respectively, and $b_{enc} \in \mathbb{R}^{N/2}$ and $b_{dec} \in \mathbb{R}^{1}$ are the corresponding biases.
3.2.5. TRAINING OBJECTIVES
Based on the inputs of the generator and of the discriminator, the objective function is adjusted to:

$$\min_{G}\max_{D} V_{CGAN}(G,D) = \mathbb{E}\left[\log D(Y_j, Y_{in})\right] + \mathbb{E}\left[\log\left(1 - D\left(G(Y_{in}), Y_{in}\right)\right)\right] \tag{14}$$

where $Y_j$ is the real speech signal, $Y_{in}$ is the input mixed signal, and $G(Y_{in})$ is the generated speech signal. The discriminator receives the real speech signal $Y_j$ and the corresponding signal produced by the generator. It also receives the mixed signal $Y_{in}$, obtained by short-time Fourier transform, which constrains the direction in which the generator develops. As a result, the GAN does two things: it brings the generated speech signal close to the probability distribution of the target speech signal, and it learns the spectral structure of the audio signals in this environment.
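A compact NumPy sketch of the forward pass in Eq. (13) and of the value function in Eq. (14) follows. It assumes that $Y_{concat}^{j}$ is formed by concatenating the candidate spectrum and $Y_{in}$ along the frequency axis, which matches the $2N$ rows of $W_{enc}$. The outer ReLU in Eq. (13) does not bound the output to (0, 1), so the sketch clips discriminator outputs before taking logarithms. That clipping is an assumption made only to keep the logarithms defined. Function names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def discriminator(y_hat, y_in, w_enc, b_enc, w_dec, b_dec):
    """Eq. (13) forward pass.

    y_hat: (T, N) candidate spectrum (real Y_j or generated)
    y_in:  (T, N) mixed-signal spectrum used as the condition
    Both are concatenated along the frequency axis into Y_concat (T, 2N).
    w_enc: (2N, N/2), b_enc: (N/2,), w_dec: (N/2, 1), b_dec: (1,)
    """
    y_concat = np.concatenate([y_hat, y_in], axis=1)
    hidden = relu(y_concat @ w_enc + b_enc)  # FFN_enc
    return relu(hidden @ w_dec + b_dec)       # FFN_dec -> (T, 1) score


def v_cgan(d_real, d_fake, eps=1e-7):
    """Eq. (14): E[log D(Y_j, Y_in)] + E[log(1 - D(G(Y_in), Y_in))].

    ASSUMPTION: d_real and d_fake should be probabilities in (0, 1);
    values are clipped into that range before taking logarithms.
    """
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))
```

During training, the discriminator is updated to increase this value and the generator is updated to decrease it. This is the min-max structure of Eq. (14).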
3.3. CREATING TEXT GENERATION MODELS USING LSTMS
A traditional machine translation model relates only the learned representation of the last source word to the current word to be predicted. An attention mechanism instead relates the learned representation of every source-language word to the current word to be predicted. Compared with traditional machine translation, adding the attention mechanism improves the model significantly. There are two common LSTM classification models. One uses the LSTM output at the last time step as the higher-level representation, and the other averages the LSTM outputs over all time steps. Both representations have shortcomings. The first discards the earlier outputs, and averaging does not reflect the different importance of the output at each time step. To solve this problem, this paper introduces the attention mechanism and improves the LSTM model accordingly. The LSTM-Attention model is shown in Figure 5.
The input sequence $x_0, x_1, x_2, \ldots, x_t$ in the figure is the vector representation of each word of a segmented text. Each input is passed into the LSTM unit to obtain the corresponding hidden-layer output $h_0, h_1, h_2, \ldots, h_t$. Attention is introduced at the hidden layer, and the attention probability $\alpha_i$, $i \in [0, t]$, assigned to each input is calculated. The idea is to compute the matching score between each hidden-layer output and the whole-text representation vector, expressed as a share of the total score at that moment:

$$\alpha_i = \frac{\exp\left(\mathrm{score}(\bar{h}, h_i)\right)}{\sum_{j} \exp\left(\mathrm{score}(\bar{h}, h_j)\right)} \tag{15}$$

where $h_i$ is the output state of the hidden layer at the i-th moment. $\bar{h}$ can be regarded as a text representation vector one level above the word. As noted above, both fixed text representations have defects, so $\bar{h}$ is randomly initialized as a parameter and updated gradually during training. $\mathrm{score}(\bar{h}, h_i)$ is the score of the i-th hidden-layer output $h_i$ with respect to the text representation vector $\bar{h}$. The larger the score, the more attention the corresponding input word receives in the text at this moment:

$$\mathrm{score}(\bar{h}, h_i) = w^{T} \tanh\left(W\bar{h} + U h_i + b\right) \tag{16}$$

where $w$, $W$ and $U$ are weight parameters, $b$ is the bias, and tanh is the nonlinear activation function. Once the attention probability at each moment is known, the feature vector $v$ containing the text information is calculated as follows:

$$v = \sum_{i=0}^{t} \alpha_i h_i \tag{17}$$

Finally, the softmax function gives the predicted category $y$:

$$y = \mathrm{softmax}(W_v v + b_v) \tag{18}$$

The model is trained by gradient descent. The parameters are updated step by step using the gradient of the loss function until the model converges. To make the objective function converge more smoothly and to improve efficiency, only a small batch of samples is used in each training step. The model uses the cross-entropy loss function:

$$H_{y'}(y) = -\sum_{i} y'_{i} \log y_{i} \tag{19}$$

where $y'_i$ is the actual category label and $y_i$ is the predicted category label computed with the softmax function.
Figure 5 LSTM-Attention model
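The attention pooling, prediction and loss in Eqs. (15) to (19) can be sketched as plain NumPy functions for a single text. This is a minimal sketch under stated assumptions. The hidden states are taken as given (no LSTM is run), the attention size k and the function names are illustrative, and the mini-batch gradient-descent training described above is not shown.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def attention_pool(H, h_bar, w, W, U, b):
    """Eqs. (15)-(17).

    H:     (t+1, d) LSTM hidden states h_0 .. h_t
    h_bar: (d,)     trainable text-level vector (randomly initialised)
    W, U:  (k, d)   projection matrices; w: (k,); b: (k,)
    returns (alpha, v): attention weights and the pooled feature vector
    """
    scores = np.tanh(h_bar @ W.T + H @ U.T + b) @ w  # eq. (16), one per h_i
    alpha = softmax(scores)                          # eq. (15)
    v = alpha @ H                                    # eq. (17): sum_i alpha_i h_i
    return alpha, v


def predict(v, W_v, b_v):
    """Eq. (18): class probabilities y = softmax(W_v v + b_v)."""
    return softmax(W_v @ v + b_v)


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Eq. (19): H_{y'}(y) = -sum_i y'_i log y_i for a one-hot y_true."""
    return -np.sum(y_true * np.log(y_pred + eps))
```

If every score is equal, the weights $\alpha_i$ are uniform and $v$ reduces to the average of the hidden states. That is exactly the averaging representation the attention mechanism is meant to improve on.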
4. PROSPECTIVE ANALYSIS OF ARTIFICIAL
INTELLIGENCE IN LINGUISTIC RESEARCH
4.1. QUALITY OF TRANSLATION IN DIFFERENT LANGUAGES
To verify the diagnostic ability of LSTM for language translation systems, the LSTM-based artificial intelligence system was applied to translations in different languages. The goal was to examine how well the diagnostic system reveals the translation quality, strengths, weaknesses and characteristics of a translation system. The translation systems in the experiment comprised three statistical translation systems and one rule-based translation system. The diagnostic results include several scores:
- a score for each linguistic category at the lexical and phrase levels;
- a lexical category group score covering all lexical categories;
- a phrase category group score covering all phrase-level categories;
- a system-level score;
- a system-level score computed with BLEU.

The test corpus is small, so there were too few sentence-level checkpoints for them to be reliable, and they were not considered here. In Table 1, the first column gives the name of the diagnostic category or group of categories. The second and third columns are the diagnostic scores of System A and System B, respectively. The fourth column is the paired t-statistic for the scores of the two systems, obtained by repeating the experiment on random subsets of the test set (134). In this experiment, a paired t-statistic greater than 2.17 indicates that the difference between the two scores is significant at the 95% level. The fifth column is the standard deviation of the diagnostic scores of Systems A and B. The sixth column is the 95% confidence interval for the diagnostic scores of System A and
B, given only to two decimal places for reasons of space. As the BLEU scores show, System B is 0.005 points higher than System A.
The translation system diagnostic results are shown in Table 1. The two systems hardly differ on the lexical category group (0.48 each). On the individual lexical-level categories, each system leads in some categories, and neither has a clear advantage. On the phrase category group, however, the advantage of System B (the LSTM system) is more pronounced (0.49 versus 0.47). On the individual phrase-level categories, System B scores equal to or higher than System A throughout, especially on the discontinuous distant phrase category. This result shows the advantage of System B in handling complex phrases and long-distance relations, an advantage that comes from recurrent neural network-based processing. The paired t-statistics also show that the differences between the two systems are significant for all diagnostic scores. This comparison shows that the diagnostic system captures fine-grained differences and commonalities between two systems whose overall performance is very similar.
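The paired t-statistic reported in the fourth column of Table 1 can be computed as shown below. This is a sketch under stated assumptions. The text does not give the number of random subsets or the exact scoring procedure, the per-subset scores in the usage check are made-up numbers, and the 2.17 cut-off is simply the value quoted above. SciPy's `scipy.stats.ttest_rel` computes the same statistic if SciPy is available.

```python
import numpy as np


def paired_t(scores_a, scores_b):
    """Paired t statistic for two systems scored on the same random subsets.

    t = mean(d) / (sd(d) / sqrt(n)), where d = b - a and sd uses n - 1.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    d = b - a
    n = d.size
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = d.std(ddof=1)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return d.mean() / (sd / np.sqrt(n))


def is_significant(t, threshold=2.17):
    """Compare |t| with the critical value quoted in the text."""
    return abs(t) > threshold
```

Applied to the fourth column of Table 1, whose smallest value is 2.68, is_significant flags every row. This is consistent with the statement that all differences are significant.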
Table 1 Diagnostic results of translation system
Category                        System A  System B  t      Variance (A/B)  95% confidence interval (A/B)
Lexical level
  Ambiguous word                0.59      0.59      2.88   0.00/0.00       0.58-0.61/0.58-0.61
  Neologism                     0.18      0.19      5.56   0.03/0.03
  Idiom                         0.19      0.23      13.38  0.04/0.04
  Noun                          0.59      0.59      2.68   0.00/0.00
  Verb                          0.51      0.51      9.41   0.00/0.00
  Adjective                     0.58      0.55      17.43  0.01/0.02
  Pronoun                       0.75      0.73      13.49  0.02/0.02
  Adverb                        0.53      0.54      7.11   0.01/0.01
  Preposition                   0.65      0.64      6.21   0.01/0.01
  Quantifier                    0.58      0.57      4.68   0.02/0.02
  Reduplicated word             0.33      0.39      9.86   0.10/0.08
  Match                         0.66      0.65      8.07   0.01/0.01
Phrase level
  Subject-predicate collocation 0.51      0.51      7.36   0.01/0.01
  Predicate-object collocation  0.41      0.41      15.52  0.01/0.01
  Interobject collocation       0.44      0.51      9.51   0.01/0.01
  Quantifier collocation        0.51      0.51      3.56   0.01/0.01
  Azimuth collocation           0.52      0.53      2.83   0.03/0.04
Category group
  Vocabulary                    0.48      0.48      8.03   0.01/0.01
  Phrase                        0.47      0.49      13.97  0.01/0.01
System level
  Linguistic category score     0.42      0.43      16.51  0.00/0.00
  BLEU score                    0.35      0.36      7.91   0.00/0.00

4.2. IDENTIFICATION ACCURACY
In this paper, we use the BIO annotation scheme. Named entities fall into three categories: person names, organization names and place names. To determine whether these linguistic features are useful for Uyghur named entity recognition, the features Pos1-Pos4 are added to the LSTM model, both one at a time and all four together. This tests whether adding them helps the overall named entity recognition task. The results for these affix features are shown in Table 2. In terms of F1, adding all four features together (Pos1-Pos4) improves on the Just_token baseline, by 0.5 as reported in the table.
Table 2 Affix features (%)
Feature      P     R     F1
Just_token   75.8  74.7  75.3
Pos1         76.7  74.7  75.6
Pos2         76.4  75.0  75.7
Pos3         74.4  75.8  75.4
Pos4         75.6  73.0  74.3
Pos1-Pos4    76.2  75.5  75.9 (+0.5)

Table 2 shows that linguistic features can improve named entity recognition accuracy. All the linguistic features were therefore combined and compared with the Pos1-Pos4 and Suffix1-Suffix4 feature sets. The comparison of linguistic features is shown in Table 3. The final F1 value improves by 3.9%, which indicates that for morphologically complex languages, adding linguistic features can improve named entity recognition accuracy.
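To make the P, R and F1 columns of Tables 2 and 3 concrete, the sketch below computes entity-level precision, recall and F1 from BIO tag sequences, and F1 from given P and R. It assumes exact-span matching and uses hypothetical labels such as PER, ORG and LOC. The text does not state how its scores were computed, so this shows one common convention rather than the authors' evaluation script.

```python
def bio_spans(tags):
    """Return the set of (type, start, end) entity spans in a BIO tag list.

    end is exclusive. An I- tag that does not continue an entity of the same
    type is treated as the start of a new entity (a lenient convention; the
    text does not say which convention was used).
    """
    spans = set()
    ent_type, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ent_type
        if ent_type is not None and not continues:
            spans.add((ent_type, start, i))
            ent_type, start = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            ent_type, start = label, i
    return spans


def f1_from_pr(p, r):
    """F1 as the harmonic mean of precision and recall (same units as inputs)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def entity_prf1(gold_tags, pred_tags):
    """Exact-span precision, recall and F1 over one tagged sequence."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f1_from_pr(p, r)
```

For example, f1_from_pr(75.8, 74.7) gives about 75.25, close to the 75.3 reported for Just_token. Small differences are expected because P and R are rounded. Recomputed the same way from the rounded values in Table 3, f1_from_pr(77.5, 81.1) gives about 79.26 for All_feature.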
Table 3 Affix features (%)
Feature          P     R
Just_token       75.8  74.7
Pos1-Pos4        76.2  75.5
Suffix1-Suffix4  78.6  75.0
All_feature      77.5  81.1

4.3. VALIDATION OF CREATIVE WRITING SKILLS
To facilitate statistical analysis, the suitability of the questionnaire data (N = 727) for exploratory factor analysis was first tested with the KMO measure and Bartlett's test of sphericity. The result, KMO = 0.98 (> 0.9), indicates a good level of suitability. LSTM was used to extract the common factors from the questionnaire data. The final factor loading matrix was obtained by orthogonal rotation using the maximum-variance (varimax) method. Table 4 shows the total variance explained for writing strategies. Five factors were extracted for writing strategy, and the eigenvalue of each factor exceeds the acceptance threshold of 1. The cumulative variance contribution of the five factors is 66.5%, above the desirable level of 60%. The communality of every item except R40 is greater than 0.5, and all factor loadings are 0.4 or higher. This indicates that the five factors extracted by the AI are valid and explain writing strategy ability well.
Table 4 Total variance explained for writing strategies
        Initial eigenvalues                  Sums of squared factor loadings
Factor  Total  % of variance  Cumulative %   Total  % of variance  Cumulative %
1       25.8   52.7           52.7           8.1    16.6           16.6
2       2.1    4.4            57.1           8.1    16.5           33.1
3       1.9    4.1            61.2           6.5    13.4           46.5
4       1.4    2.9            64.1           5.9    12.2           58.7
5       1.1    2.4            66.5           3.7    7.7            66.5

5. CONCLUSION
In this paper, LSTM was used as the main tool to explore several aspects of linguistics, including text analysis, speech signal processing and text generation. The suitability test (KMO = 0.98) indicated that the data were at a good level and suitable for factor analysis. Five common factors were extracted from the questionnaire data using the LSTM method. These factors had eigenvalues greater than 1 and a cumulative variance contribution above the desirable level of 60%, indicating that they explain writing strategy ability well. In addition, the communality of each item (except R40) is greater than 0.5, and the factor loadings are all above 0.4, which further supports the validity of the five factors extracted by AI. The article also uses the BIO annotation scheme for named entity recognition, which classifies named entities into three categories: person names, organization names and place names. Adding the affix features to the LSTM model improved the F1 value, indicating that these features help the Uyghur named entity recognition task. This provides strong support for applying artificial intelligence in linguistic research.
ACKNOWLEDGMENTS
1. This research was supported by the research project Exploration on the Reform of College English Grammar Teaching by Educational Informationization (No. JZ180077).
2. This research was supported by the research project Corpus-assisted English Grammar Teaching Innovation (No. 2018CG02644).
3. This research was supported by the research project An Innovative Model of Blended English Teaching by SPOC (No. FJJKCGZ18-793).
REFERENCES
(1) Yang, L., Fan, Z., & Zhou, J. (2022). Borderless Fusion Financial Management Innovation Based on Speech Recognition Technology. Scientific Programming.
(2) Dokuz, Y., & Tufekci, Z. (2020). Mini-batch sample selection strategies for deep learning based speech recognition. Applied Acoustics, 171.
(3) Ho, N. H., Yang, H. J., Kim, S. H., & Lee, G. (2020). Multimodal approach of speech emotion recognition using multi-level multi-head fusion attention-based recurrent neural network. IEEE Access, 8, 61672-61686.
(4) Tsunemoto, A., Trofimovich, P., & Kennedy, S. (2023). Pre-service teachers’ beliefs about second language pronunciation teaching, their experience, and speech assessments. Language Teaching Research, 7(1), 115-136.
(5) Hyland Bruno, J., Jarvis, E. D., Liberman, M., & Tchernichovski, O. (2021). Birdsong learning and culture: Analogies with human spoken language. Annual Review of Linguistics, 7, 449-472.
(6) Bernardo, M. L. P. (2022). Localizing theory in a Spanish-language translation program. Teaching Literature in Translation: Pedagogical Contexts and Reading Practices, 262.
(7) Rasulova, Z. (2022). Translation Concepts in the Context of Modern Linguistic Research. International Bulletin of Applied Science and Technology, 2(11), 161-165.
(8) Braithwaite, B. (2020). Ideologies of linguistic research on small sign languages in the global South: A Caribbean perspective. Language & Communication, 74, 182-194.
(9) Bafoevna, N. D., & Ikromdjonovna, K. N. (2023). The Main Directions of Theolinguistic Research in Modern Linguistics. Journal of Survey in Fisheries Sciences, 10(2S), 2127-2136.
(10) Mizumoto, A., Plonsky, L., & Egbert, J. (2021). Meta-analyzing corpus linguistic research. In A practical handbook of corpus linguistics (pp. 663-688). Cham: Springer International Publishing.
(11) Su, H., Zhang, Y., & Lu, X. (2021). Applying local grammars to the diachronic investigation of discourse acts in academic writing: The case of exemplification in Linguistics research articles. English for Specific Purposes, 63, 120-133.
(12) Awad Al-Dawoody Abdulaal, M. (2020). A cross-linguistic analysis of formulaic language and meta-discourse in linguistics research articles by natives and Arabs: Modeling Saudis and Egyptians. Arab World English Journal (AWEJ), 11.
(13) Chen, L., & Hu, G. (2020). Surprise markers in applied linguistics research articles: A diachronic perspective. Lingua, 248, 102992.
(14) Umarova, N. R. (2021). A linguistic approach to conceptual research. Asian Journal of Multidimensional Research, 10(4), 62-66.
(15) Hamzah, M. H., Halim, H. A., Bakri, M. H. U. A. B., & Pillai, S. (2022). Linguistic Research on the Orang Asli Languages in Peninsular Malaysia. Journal of Language and Linguistic Studies, 18, 1270-1288.
(16) Oh, Y. R., Park, K., Jeon, H. B., & Park, J. G. (2020). Automatic proficiency assessment of Korean speech read aloud by non-natives using bidirectional LSTM-based speech recognition. ETRI Journal, 42(5), 761-772.
(17) Hou, W., Wang, J., Tan, X., Qin, T., & Shinozaki, T. (2021). Cross-domain speech recognition with unsupervised character-level distribution matching. arXiv preprint arXiv:2104.07491.
(18) Santoso, J., Setiawan, E. I., Purwanto, C. N., Yuniarno, E. M., Hariadi, M., & Purnomo, M. H. (2021). Named entity recognition for extracting concept in ontology building on Indonesian language using end-to-end bidirectional long short term memory. Expert Systems with Applications, 176, 114856.
(19) Peng, L., Fang, S., Fan, Y., Wang, M., & Ma, Z. (2023). A Method of Noise Reduction for Radio Communication Signal Based on RaGAN. Sensors, 23(1), 475.
(20) Budinsky, R., Ozmeral, E. J., & Eddins, D. (2023). The impact of hearing aid user's own voice on device signal processing. The Journal of the Acoustical Society of America.
ABOUT THE AUTHORS
Shaohua Jiang is a lecturer in the School of Humanities, Fujian University of Technology. His research focuses on English Language Education and Translation Education under the heading of Smart Education and Artificial Intelligence.
Zheng Chen is an Associate Professor in the Department of Foreign Languages, Concord University College, Fujian Normal University. Her research focuses on English Language Education and American Literature Studies under the heading of Artificial Intelligence.