Md. Alauddin
Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
Ting Choo Yee
Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
Ian Tan Kim Teck
Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
Received: 02/08/2019 Accepted: 24/09/2019 Published: 06/11/2019
Suggested citation:
Alauddin, M., Choo Yee, T. & Kim Teck, I. T. (2019). Airline digital click stream
event processing for enriching the airline business. 3C Tecnología. Glosas de innovación
aplicadas a la pyme. Special Issue, November 2019, 287-305. doi: http://dx.doi.
3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254–4143
The new era of the digital world, with the rapid expansion of social networks and mobile
applications, has created a wider scope for the airline industry to promote its business
in new ways. Because of the many social media and other digital platforms, airlines need
to emphasize target marketing and customer profiling. To support target marketing,
a new web technology was created to collect each raw event from the airlines' web data
and mobile app data, tracking the way users search for flights. In the proposed
method, BigQuery is used to process the huge volume of online customer data. The
aim of the proposed method is to understand airline e-commerce online visitors effectively
by analysing the event data stream collected from various digital properties. The
raw digital data obtained contains a large amount of semi-structured information and
must be cleansed before it is analysed. Therefore, the first stage of the proposed system
extracts the data from various digital sources in real time, then selects the data
appropriate for analysis, and finally extracts the key insights to improve the airline
business. From the extracted variables and search patterns, predictive models such as
flight search forecasts, seat sales forecasts and digital channel attribution models can
be developed.
Keywords: Click stream processing, BigQuery, Digital data processing, Digital marketing,
Data Cleansing and Enrichment.
Special Issue, November 2019
In recent years, the prime focus of most Asian airlines has been digital transformation
(O'Connell & Williams, 2005). The prime objectives of digital transformation are
to understand online customer acquisition, digital channel attribution, online
customer segmentation and customers' search trends. These are among the most important
techniques for taking the right business action at the right time to increase revenue.
Because most airlines have their own online and mobile e-commerce platforms,
it is possible to track and record visitors' activities on the webpage: from which
webpage they entered, when and what they searched, where they dropped off, what
they purchased, how frequently they book, and so on (Klein & Loebbecke, 2000). These
visitor data can be used for customer analytics such as online customer profiles and
sales funnels, which show at which point visitors drop off and whether they are price sensitive.
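The sales-funnel idea above can be illustrated with a minimal Python sketch that counts distinct visitors reaching each funnel step in the click stream and computes the drop-off between consecutive steps. The step names and events are hypothetical, and the sketch does not enforce step order per visitor; it is not the authors' implementation.

```python
from collections import defaultdict


def visitors_per_step(events):
    """Count distinct visitors who reached each funnel step.

    events: iterable of (visitor_id, step_name) pairs from the click stream.
    """
    seen = defaultdict(set)
    for visitor_id, step in events:
        seen[step].add(visitor_id)
    return {step: len(ids) for step, ids in seen.items()}


def drop_off_rates(counts, steps):
    """Return (from_step, to_step, drop_off_rate) for consecutive funnel steps."""
    rates = []
    for current, nxt in zip(steps, steps[1:]):
        entered = counts.get(current, 0)
        continued = counts.get(nxt, 0)
        rate = 1.0 - continued / entered if entered else 0.0
        rates.append((current, nxt, rate))
    return rates


# Hypothetical funnel and events, for illustration only.
steps = ["search", "select_flight", "payment"]
events = [("v1", "search"), ("v1", "select_flight"), ("v1", "payment"),
          ("v2", "search"), ("v2", "select_flight"),
          ("v3", "search")]
for src, dst, rate in drop_off_rates(visitors_per_step(events), steps):
    print(f"{src} -> {dst}: {rate:.0%} dropped off")
# search -> select_flight: 33% dropped off
# select_flight -> payment: 50% dropped off
```

Counting distinct visitors (rather than raw hits) keeps repeated page reloads from inflating a step.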
However, tracking and processing visitors' raw events from website log data is
complicated because of the large volume of hit-level data (one of the major Asian
airlines has about 15 million online visitors per month, which generates roughly
3-5 billion events of unstructured or semi-structured web tracking data) (Ananthi,
2014). In this paper, the online digital click stream dataset is obtained from the system
of one of the major Asian airlines, which serves 50 destinations. Each route is tracked for
one-way and return flights over periods of 30 to 120 days. This paper mainly focuses on
real-time digital data collection and pre-processing of the dataset for flight sales
prediction. The overall objective of the proposed work is to select key variables
from the extracted digital click stream data in order to improve the airline business.
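To put the quoted volume in perspective, the figures above imply roughly the following event rates per visitor per month and per day (the per-day figure assumes a 30-day month):

$$\frac{3\times 10^{9}\ \text{events}}{1.5\times 10^{7}\ \text{visitors}} = 200, \qquad \frac{5\times 10^{9}\ \text{events}}{1.5\times 10^{7}\ \text{visitors}} \approx 333\ \text{events per visitor per month}$$

$$\frac{3\times 10^{9}}{30} = 1\times 10^{8}, \qquad \frac{5\times 10^{9}}{30} \approx 1.67\times 10^{8}\ \text{events per day}$$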
The growth of the Internet around the world has made airlines change the way they
attract passengers (Singh & Jain, 2014). This digital era also lets people buy
tickets from anywhere in the globe at any time while comparing different airlines. As a
result, it has become very difficult to predict ticket prices, and attracting passengers
has become harder because of the influence of many factors (Gillen & Lall, 2004). However,
data science offers a way forward in such scenarios: it can be used to study the patterns
and predict the behavior of sales outcomes. For example, data science can identify the
correlation between the seat prices of particular airlines and air traffic delays. According
to a survey by Forbes (2008), every minute of flight delay affects ticket prices by
about $1.5. Low-cost airlines offer ticket prices that exclude baggage, food and beverages,
which makes air travel affordable for ordinary people (Groves & Gini, 2013). Hofer, Windle
and Dresner (2008) explained in more detail how low-cost airlines differ from other
airlines. Lazarev (2013) described in detail how fares vary across different time periods,
and designed a model to predict optimum prices for low-cost airlines that generates
almost 90% of the profit margin. In general, customers believe that booking flights
earlier results in lower fares.
Various studies of the airline business indicate that a user's decision to buy tickets
online in advance depends mainly on their own observations and on the risk they perceive
(Etzioni, Tuchinda, Knoblock & Yates, 2003). A user who purchases tickets online
should have a sense of control over the task they are performing over the Internet.
This helps to reduce the feeling of risk or fear associated with the possibility of
making a mistake when booking a flight online (that is, psychological risk), or of
not receiving the ticket or the flight not even existing (performance risk) (Brons,
Pels, Nijkamp & Rietveld, 2002). Several research papers have described promotions
such as ticket price discounts, gift vouchers, airline points and upgrades, which indirectly
help to attract customers (Barrett, 2004; Gillen & Lall, 2004). The majority of these
studies conclude that the incentives employed have a positive effect on airline ticket
purchase and repeat purchase, and highlight that the effectiveness of a programme
depends to a large extent on the particular incentive offered (Aviasales, n.d.). The
literature on airline choice has made it clear that both the benefits
provided by frequent flyer programmes and air fares significantly affect users' choices
(Groves & Gini, 2013). Users who travel for business perceive frequent flyer
programmes as more useful than other users do. These authors even assert that business
travellers are willing to pay more in exchange for reduced access time, travelling with
top-ranked airlines, and travelling in a better class (O'Connell & Williams, 2005;
Sabre, 2015).
In recent years, most people in the world have entered the digital era, which has
greatly increased e-commerce transactions compared with offline ones. The power
of the digital world also allows people to reach the world from anywhere at any time
through social media, travel blogs or meta search engines. With these resources,
travellers can visit different travel websites and travel blogs to compare prices
before booking their flight tickets. This opens many opportunities for
airlines to track travellers' search patterns and predict passengers' behaviour
using predictive models. It is also possible to find which online channel is
most effective for which airline routes and geo-locations when predicting the cost per
acquisition, which in turn saves a large amount of advertising cost. Further, successful
tracking of all the digital data enables airlines to build sales funnels for digital
products, calculate customer lifetime value and develop other predictive models for
digital marketing.
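Finding the most effective channel per route comes down to comparing cost per acquisition (CPA = advertising spend / bookings). A minimal Python sketch follows, assuming spend and booking counts can be attributed to a (channel, route) pair; the channel names, route codes and figures are hypothetical.

```python
from collections import defaultdict


def cost_per_acquisition(records):
    """CPA = advertising spend / bookings, per (channel, route).

    records: iterable of (channel, route, ad_spend, bookings).
    Pairs with zero bookings are omitted because CPA is undefined there.
    """
    spend = defaultdict(float)
    bookings = defaultdict(int)
    for channel, route, ad_spend, n_bookings in records:
        spend[(channel, route)] += ad_spend
        bookings[(channel, route)] += n_bookings
    return {key: spend[key] / bookings[key] for key in spend if bookings[key] > 0}


def cheapest_channel_per_route(cpa):
    """For each route, pick the channel with the lowest CPA."""
    best = {}
    for (channel, route), value in cpa.items():
        if route not in best or value < best[route][1]:
            best[route] = (channel, value)
    return best


# Hypothetical spend and booking figures, for illustration only.
records = [
    ("paid_search", "KUL-SIN", 500.0, 10),
    ("paid_search", "KUL-SIN", 300.0, 6),
    ("social", "KUL-SIN", 400.0, 4),
    ("meta_search", "KUL-BKK", 250.0, 5),
]
cpa = cost_per_acquisition(records)
print(cheapest_channel_per_route(cpa))
# {'KUL-SIN': ('paid_search', 50.0), 'KUL-BKK': ('meta_search', 50.0)}
```

Spend and bookings are summed before dividing, so several campaign rows for the same channel and route are combined into one CPA rather than averaged.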
To collect the online digital data and analyse its patterns, five types of variables are
considered for better prediction of seat sales, which are:
Flight Search.
The transactional and operational data were extracted from various channels such as web,
mobile and tablets in 2016. Collecting digital data in real time is a
complicated process, but with the evolution of JavaScript tagging frameworks it is
possible to track each web page and its components based on the visitor's status on the
internet. Passenger activities are tracked, such as which pages they visit, how much time
they spend on each webpage and how many clicks and scrolls they make on each page.
E-commerce information such as add-to-cart events, product information and
transaction details is also tracked. As the flight sales digital web data are very big and
complex, the data are collected, cleansed and processed using cloud technology. The
implementation of digital analytics helps marketing teams to monitor the load factor
(%) of future flights and how travellers choose the origin hub, the destination hub and any
connecting hubs when flying through (transit). Figure 1 shows the detailed block
diagram of airline data collection from various sources and its predictive model.
Figure 1. Airline digital data processing architecture.
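The load factor mentioned above is the share of available seats that are sold. A minimal Python sketch for future flights follows, assuming per-flight booking counts and seat capacities are available from the processed data; the function names and figures are hypothetical. Industry network figures are usually distance-weighted (revenue passenger kilometres divided by available seat kilometres); this sketch uses plain seat counts.

```python
def load_factor_pct(seats_booked: int, seat_capacity: int) -> float:
    """Load factor (%) of one flight: seats booked / seats available x 100."""
    if seat_capacity <= 0:
        raise ValueError("seat_capacity must be positive")
    return 100.0 * seats_booked / seat_capacity


def aggregate_load_factor_pct(flights) -> float:
    """Capacity-weighted load factor (%) over several flights.

    flights: iterable of (seats_booked, seat_capacity) pairs.
    """
    flights = list(flights)
    total_booked = sum(booked for booked, _ in flights)
    total_capacity = sum(capacity for _, capacity in flights)
    return load_factor_pct(total_booked, total_capacity)


print(load_factor_pct(144, 180))                           # 80.0
print(aggregate_load_factor_pct([(144, 180), (90, 180)]))  # 65.0
```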
Airline travel visitors search for flights from different devices such as desktops, mobile
devices and tablets. Collecting data from these different devices is therefore somewhat
complicated, so each digital property must be considered carefully. To collect the digital
(raw) data from various sources, a well-known tracking framework (JavaScript code that
is a modification of the Google tracking framework) is used. After the data are collected
and tracked, the user activity is sent to the server for reporting and further
analysis. The system uses different technologies to create data hits according to the
type of digital property. Hence, new custom code was implemented for tracking the
activity of web and mobile app users. The proposed custom code also identifies
new and returning users, which provides more information for setting the seat
price dynamically. Finally, the custom code captures business-specific
information such as Flight Search Origin, Flight Search Destination and
Departure Date. The web server is also tracked to receive HTTP requests, which
give details of the airline customers' search patterns. From the web server log,
customer details (such as computer information, location, hostname, browser
type and browsing language) are extracted.
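A minimal sketch of how such custom code might tag a flight-search hit with a new/returning flag and the business-specific fields follows. It is written in Python for illustration (the tracking framework itself is JavaScript) and assumes a first-party cookie named `client_id` marks returning visitors; the cookie name, the `/collect` endpoint and all parameter names (`cid`, `vt`, `fs_origin`, and so on) are hypothetical, not the authors' implementation.

```python
import time
import uuid
from urllib.parse import urlencode


def classify_visitor(cookies: dict) -> tuple:
    """Return (client_id, visitor_type) from a hypothetical 'client_id' cookie."""
    client_id = cookies.get("client_id")
    if client_id:
        return client_id, "returning"
    # No cookie yet: mint a new random ID; the caller would then set the cookie.
    return str(uuid.uuid4()), "new"


def build_flight_search_hit(cookies: dict, origin: str, destination: str,
                            departure_date: str) -> str:
    """Build the query string of a custom flight-search hit (hypothetical schema)."""
    client_id, visitor_type = classify_visitor(cookies)
    params = {
        "cid": client_id,                # visitor identifier
        "vt": visitor_type,              # 'new' or 'returning'
        "fs_origin": origin,             # Flight Search Origin
        "fs_destination": destination,   # Flight Search Destination
        "fs_departure": departure_date,  # Departure Date (ISO format assumed)
        "ts": int(time.time()),          # client timestamp in seconds
    }
    return "/collect?" + urlencode(params)
```

For example, `build_flight_search_hit({"client_id": "abc123"}, "KUL", "SIN", "2016-05-01")` yields a hit with `cid=abc123&vt=returning&fs_origin=KUL&...`, while an empty cookie jar yields `vt=new` with a freshly generated ID.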
In the proposed research, BigQuery is used to process the high volume of customers'
digital data. BigQuery is a RESTful web service that enables interactive analysis
of massively large datasets, working in conjunction with Google Cloud Storage. It is a
fully managed, serverless service that may be used to complement MapReduce.
BigQuery is used to process the raw data further: each digital property is exported
as raw tables, which are available in BigQuery as multiple daily tables, and BigQuery's
SQL syntax is used to process them. Figure 2 shows the airline flight search data
processing flow, and Figure 3 shows the airline online traffic and search data processing
flow from all airline digital properties in a daily aggregation. Once tracking is in place
to capture the web and mobile digital properties and the listed attributes, the captured
data are exported to BigQuery on a periodic basis.
Figure 2. Block diagram of the airline flight search data processing flow.
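The daily aggregation step can be sketched as a BigQuery standard SQL query over date-sharded tables, using a wildcard table name and the `_TABLE_SUFFIX` pseudo-column to select a date range. In this Python sketch, the table prefix `events_`, the project and dataset names and the columns (`event_name`, `search_origin`, `search_destination`, `client_id`) are hypothetical, since the authors' schema is not given. Running the query needs the `google-cloud-bigquery` package and credentials; the SQL builder itself has no dependencies.

```python
import datetime as dt


def build_daily_search_sql(project: str, dataset: str,
                           start: dt.date, end: dt.date) -> str:
    """Standard SQL aggregating flight-search hits per day, route-level.

    Assumes date-sharded raw tables named events_YYYYMMDD (hypothetical).
    project/dataset are interpolated directly, so they must come from trusted config.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return f"""
SELECT
  _TABLE_SUFFIX AS event_date,
  search_origin,
  search_destination,
  COUNT(*) AS searches,
  COUNT(DISTINCT client_id) AS visitors
FROM `{project}.{dataset}.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'
  AND event_name = 'flight_search'
GROUP BY event_date, search_origin, search_destination
ORDER BY event_date, searches DESC
""".strip()


def run_query(sql: str):
    """Execute the query; requires google-cloud-bigquery and valid credentials."""
    from google.cloud import bigquery  # imported lazily so the builder runs without it
    client = bigquery.Client()
    return list(client.query(sql).result())
```

Filtering on `_TABLE_SUFFIX` limits the scan to the daily tables in the requested range, which keeps query cost proportional to the period analysed rather than the whole history.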
In general, the open source tracking code retrieves web page data as follows:
1. A browser requests a web page that contains the tracking code.
2. A JavaScript array is created and tracking commands are pushed onto the array.
3. A <script> element is created and enabled for asynchronous loading (loading
in the background).
4. The ga.js tracking code is fetched, with the appropriate protocol automatically
detected. Once the code is fetched and loaded, the commands on the array
are executed and the array is transformed into a tracking object. Subsequent
tracking calls are made directly to the server.
5. The script element is loaded into the DOM.
6. After the tracking code collects data, the GIF request is sent to the analytics
database for logging and post-processing.
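The asynchronous pattern in steps 2 to 4 (commands queued in an array before the library loads, then replayed and executed directly afterwards) can be sketched in Python as below. This is an analogue of the JavaScript behaviour, not the ga.js code; the class and method names (`PendingQueue`, `Tracker`, `track_page_view`, `load_library`) are hypothetical.

```python
class PendingQueue(list):
    """Stand-in for the JavaScript array created before the tracking library
    has loaded: push() only stores commands."""
    push = list.append


class Tracker:
    """Hypothetical tracking object; records the hits it would send."""

    def __init__(self):
        self.sent = []

    def track_page_view(self, path):
        self.sent.append(("page", path))

    def track_event(self, category, action):
        self.sent.append(("event", category, action))


class LiveQueue:
    """Replacement for the array once the library is loaded: push() now
    executes each command immediately against the tracker."""

    def __init__(self, tracker):
        self.tracker = tracker

    def push(self, command):
        method_name, *args = command
        getattr(self.tracker, method_name)(*args)


def load_library(pending):
    """Simulate the library finishing its asynchronous load: replay the queued
    commands in order, then return a queue that executes calls directly."""
    live = LiveQueue(Tracker())
    for command in pending:
        live.push(command)
    return live


queue = PendingQueue()
queue.push(("track_page_view", "/flights/search"))      # queued before load
queue = load_library(queue)                              # library has loaded
queue.push(("track_event", "flight_search", "submit"))  # executed directly
print(queue.tracker.sent)
# [('page', '/flights/search'), ('event', 'flight_search', 'submit')]
```

Because both queues expose the same `push` interface, page code never needs to know whether the library has finished loading.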
A GIF request can be classified into a few types, which are listed in Table 1. In each
case, the GIF request is identified by its type in the utmt parameter. The type of
request also determines which data are sent to the Analytics servers. For example,
transaction and item data are only sent to the Analytics servers when a purchase is
made. User, page and system information is only sent when an event is recorded or
when a page loads, and the user-defined value is only sent when the _setVar method
is called.
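On the collection side, each logged GIF request can be classified by reading its `utmt` query parameter. A minimal Python sketch follows; the mapping of `utmt` values (`event`, `tran`, `item`, `var`) to the request types of Table 1, and the rule that page requests carry no `utmt` value, are assumptions, and the example URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Assumed mapping from utmt values to the request types of Table 1.
UTMT_TO_TYPE = {"event": "Event", "tran": "Transaction", "item": "Item", "var": "Var"}


def classify_gif_request(url: str) -> str:
    """Return the request type of a GIF hit from its utmt query parameter."""
    params = parse_qs(urlparse(url).query)
    utmt = params.get("utmt", [""])[0]
    if not utmt:
        return "Page"  # assumed: page requests carry no utmt value
    return UTMT_TO_TYPE.get(utmt, "Unknown")


print(classify_gif_request("https://example.com/__utm.gif?utmwv=5&utmp=%2Fsearch"))  # Page
print(classify_gif_request("https://example.com/__utm.gif?utmt=tran&utmtid=1001"))   # Transaction
```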
Table 1. GIF request types.
Request Type | Description | Class
Page | A web page on your server is requested | Interaction
Event | An event is triggered through Event Tracking that you set up on your site |
Transaction | A purchase transaction occurred on your site | Interaction
Each item in a transaction is recorded with a GIF
A custom user segment is set and triggered by a