Special Issue, May 2019
DOI: http://dx.doi.org/10.17993/3ctecno.2019.specialissue2.50-67
1. INTRODUCTION
The digital age has produced an immense amount of data in the form of news
articles, social media posts, and web content (LaValle, Lesser, Shockley, Hopkins &
Kruschwitz, 2014; Gharehchopogh & Khalifelu, 2011). Every day, a large amount
of data is published on news websites, micro-blogging websites, and other
information repositories (Lei, Rao, Li, Quan & Wenyin, 2014). The published
news articles reveal the events happening around the world (Lei et al., 2014).
The key challenge with textual data (e.g., news articles) is extracting
meaningful information. Interpreting a large collection of such data manually
is difficult (Lee, Park, Kim & No, 2013). Moreover, because the information is
embedded in unstructured text, processing it is inherently difficult: it
requires natural language processing. Therefore, in the current era of
information flow, media analysts and other researchers need an easily
understandable, high-level summary of information. For instance, a media
analyst may need to search for news on a certain topic, events happening at a
certain geo-location, and/or news events along a timeline. Answering these and
similar queries requires an efficient method.
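As a minimal sketch of the three query types named above (by topic, by geo-location, and along a timeline), the Python snippet below assumes that news articles have already been converted into structured records. The `NewsArticle` class, the `query` function, and the sample records are hypothetical illustrations, not the method proposed in this study.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass
class NewsArticle:
    # Hypothetical record: assumes topics, location and publication date have
    # already been extracted from the raw article text.
    title: str
    topics: Set[str]
    location: str
    published: date


def query(
    articles: List[NewsArticle],
    topic: Optional[str] = None,
    location: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[NewsArticle]:
    """Return the articles that satisfy every criterion given (None means 'any')."""
    matches = []
    for article in articles:
        if topic is not None and topic not in article.topics:
            continue  # topic query
        if location is not None and article.location != location:
            continue  # geo-location query
        if start is not None and article.published < start:
            continue  # timeline query: lower bound (inclusive)
        if end is not None and article.published > end:
            continue  # timeline query: upper bound (inclusive)
        matches.append(article)
    # Order the matches chronologically so they read as a timeline.
    return sorted(matches, key=lambda a: a.published)


if __name__ == "__main__":
    # Toy, made-up records used only to exercise the function.
    sample = [
        NewsArticle("Floods hit coastal towns", {"weather"}, "City A", date(2019, 3, 2)),
        NewsArticle("Election results announced", {"politics"}, "City B", date(2019, 1, 15)),
        NewsArticle("Heatwave warning issued", {"weather"}, "City B", date(2019, 5, 20)),
    ]
    for hit in query(sample, topic="weather", start=date(2019, 1, 1)):
        print(hit.published, hit.title)
```

Running the sample prints the two weather stories in date order. The linear scan keeps the sketch short; over a large archive, an index on each field (for example, a mapping from topic to articles) would be closer to the efficient method the text calls for.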
Text analytics enables knowledge discovery and the targeted retrieval of
information from such massive amounts of data. The extracted knowledge can
support better decision-making strategies and effective resource management.
However, extracting meaningful knowledge from large volumes of natural-language
data remains an open challenge that requires sophisticated methods and
algorithms. To this end, this study extracts concepts from a large number of
news stories and articles. A concept is a meaningful sequence of words that
represents objects, events, activities, entities (real or imaginary), topics, or
ideas of interest to users (Parameswaran, Garcia-Molina & Rajaraman, 2010;
Szwed, 2015), and concept extraction is the task of identifying such sequences
in text. Concept extraction is an effective way to obtain useful and meaningful
concepts from text documents. The extracted concepts may later be tagged as
essential concepts and represented in an efficient