EXPEDIA GROUP TECHNOLOGY — DATA

Enhancing User Experience Through Personalisation

How Hotels.com personalises the ranking of properties on the search results page

What device? Single or family travel? Hotel, resort or apartment? What star rating?

Introduction

Returning users are key to any company, especially online travel companies such as Hotels.com, where enhancing these users' experience is vital to sustaining long-term loyalty. This enhancement can be achieved in various ways; in this blog the focus is solely on personalisation. Here, we refer to personalisation as an optimal ranking of the Search Results Page (SRP) for a user based on their previous interactions on Hotels.com. To achieve this we developed a recommender system that supplements our existing SRP Learning To Rank Model (LTRM).

The objective of recommender systems can be broadly characterised as modelling user preferences. A large body of previous research focused on user-item techniques (Bayesian personalised ranking and collaborative filtering); more recently, however, Recurrent Neural Networks (RNNs) have been shown to be successful at capturing the nuances of users' interactions (Netflix™ [1], Alibaba™ [2] etc.). RNNs do not rely on co-occurrence frequencies but rather model sequential user data to predict immediate as well as distant user interactions.

Challenges

Vanilla RNNs are able to summarise the user state by taking into account the sequence of items the user interacted with in the past, without taking any contextual information into consideration. However, contextual information has been shown to improve ranking performance [1-3]. For example, contextual information such as the following is beneficial in our case:

  • user A prefers to book hotels for work trips and resorts for family travel,
  • user B stays in the same hotel when travelling to a particular destination,
  • user C prefers to book hotels with a 4-star rating or above, etc.

Multiple contexts define a user's travel, and utilising all of these contexts constructively helps to better predict the user's next interaction, thereby paving the way to deliver a personalised experience. In the current research literature, such as [1-3], contextualised RNNs have been developed that incorporate one to a few contexts effectively. However, these models were unable to reproduce similar success in our case (detailed reasons are stated below). To tackle this, we developed a Multi Contextualised Sequence Aware Model (MCSAM) as our recommender system. MCSAM takes multiple pieces of contextual information into account and incorporates them into the model's intermediate layers, which ultimately modifies the behaviour of the RNN within it.

The main challenges we faced when developing MCSAM were identifying:

  • impactful contexts, and
  • nonlinear functions that explicitly combine item and context representations,

such that the MCSAM-supplemented LTRM achieved a significant uplift in ranking performance over LTRM alone. In our case, items are properties (the two terms are used interchangeably throughout this blog), while the contexts that proved impactful are discussed below.

Contexts

The contextual information we used can be grouped into events and augments. Events are contexts that must be triggered by the user, each of which appends to that user's sequential data; events are nullable. Augments, on the other hand, are supplementary contexts attached to events, and are non-nullable.

Figure 1: Snapshot of Hotels.com SRP highlighting some of the contexts.

Some examples of events are clicking on or booking a property, filtering by price, and re-sorting by star rating. Some examples of augments are the number of adults and children, the time at which an event occurred, and the length of stay. We utilised 14 different contexts, all of them represented as one-hot encoded vectors apart from time (which is numerical). A minimal sketch of this representation follows.
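The sketch below illustrates how a single event and its augments could be encoded; the context names, vocabulary sizes and values are illustrative assumptions, not the production schema:

```python
import numpy as np

def one_hot(index: int, size: int) -> np.ndarray:
    """Return a one-hot encoded vector of the given size."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec

# Illustrative event: a click on a property, plus a few of its augments.
event_type = one_hot(0, 4)       # e.g. {click, book, price filter, re-sort}
num_adults = one_hot(2, 8)       # e.g. 2 adults, capped at 8 (assumption)
num_children = one_hot(0, 6)     # e.g. 0 children, capped at 6 (assumption)
length_of_stay = one_hot(3, 30)  # e.g. 3 nights, capped at 30 (assumption)
time_of_event = np.array([0.73], dtype=np.float32)  # numerical, normalised

# One step of the user's sequential data: one-hot contexts plus numeric time.
step = np.concatenate([event_type, num_adults, num_children,
                       length_of_stay, time_of_event])
```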

Model Architecture

MCSAM was developed to incorporate multiple contextual representations with property representation. Figure 2 below illustrates the architecture of MCSAM.

Figure 2: Pairwise architecture of the Multi Contextualised Sequence Aware Model (MCSAM). Yp and Yn represent positive and negative property samples respectively. Light and dark blue boxes in the embedding layer represent property and context embeddings respectively (all context embeddings are pictorially grouped as one for clarity). The annotations x and - represent dot product and subtraction.

Input and Embedding Layer

The input layer consists of inputs at a search level per user, i.e. positive and negative property samples, historical properties and contexts (both sequential, of length H), and current contexts. Current contexts are obtained from the current search (some augments are known in real time, such as the number of adults, the device used, etc.). For each search, we define the positive sample as the booked property and the negative samples as the properties ranked above the highest-ranked clicked property. All the inputs were passed through embedding layers except for time, as it is numerical (not distinguished in Figure 2 for clarity).
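As a rough illustration, the sketch below (PyTorch, with made-up vocabulary and embedding sizes, not our production configuration) shows how the sequential property and context IDs could be embedded while time stays numerical:

```python
import torch
import torch.nn as nn

H = 50                  # history length (assumption)
N_PROPERTIES = 500_000  # property vocabulary size (illustrative)
N_CONTEXT = 20          # levels of one categorical context (illustrative)

# The property vocabulary is far larger than any context's, hence the
# larger property embedding (see the Interaction Layer section).
property_emb = nn.Embedding(N_PROPERTIES, 64)
context_emb = nn.Embedding(N_CONTEXT, 12)

hist_properties = torch.randint(0, N_PROPERTIES, (1, H))  # (batch, H)
hist_context = torch.randint(0, N_CONTEXT, (1, H))        # (batch, H)
hist_time = torch.rand(1, H, 1)  # numerical time, no embedding

p = property_emb(hist_properties)  # (batch, H, 64)
c = context_emb(hist_context)      # (batch, H, 12)
```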

Interaction Layer

The embedded matrices of the sequential inputs are passed through the interaction layer, where property and context representations are combined. The common practice in the current research literature when combining item and context representations is concatenation. As mentioned above, we had a contrasting outcome when we applied it in our case. The conclusions we drew were:

  • the number of contexts we consider is higher than in the literature (to the best of our knowledge), and
  • the number of levels in our property representation differs greatly from that of all our contexts (approximately 100,000-fold), which resulted in an approximately 5-fold difference between the embedding size of the property and that of our contexts.

Due to these two reasons, we observed that with concatenation alone the model struggled and failed to learn the nonlinear patterns between the property and context representations. This prompted us to develop the interaction layer shown in Figure 3 below.

Figure 3: Illustration of how multiple contexts representations were combined with item representations.

The interaction layer consists of the following steps (a minimal sketch follows the list):

  1. Property-context concatenation: each embedded context representation is concatenated with the embedded property representation.
  2. Row-wise multiplication with time: each concatenated matrix is multiplied by time. Multiplication binds time more tightly into the concatenated matrices, which allowed similarities to be captured across the sequence [1]. In other words, this was done to capture temporal behaviour; we acknowledge that there are other approaches to attain this (such as the work shown in [2]).
  3. Convolution: a 2D convolution was applied to each matrix resulting from step 2 in order to extract local nonlinear patterns within it. The convolution maps each sequential step with a filter depth of D, which means: i) the resulting matrix has length H, and ii) at each sequential step the user's last D interactions are used to extract nonlinear patterns.
  4. Concatenation: all the output matrices from the convolutions are concatenated before being passed through the recurrent layer.
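The sketch below (PyTorch) illustrates one branch of the interaction layer for a single context. All sizes are assumptions, and the causal left-padding is our reading of "the resulting matrix has length H" combined with "the last D interactions":

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, H = 1, 50        # batch size, history length (assumptions)
E_P, E_C = 64, 12   # property / context embedding sizes (illustrative)
D, FILTERS = 5, 32  # conv window over past interactions, filter depth

p = torch.rand(B, H, E_P)  # embedded property sequence
c = torch.rand(B, H, E_C)  # one embedded context sequence
t = torch.rand(B, H, 1)    # numerical time at each sequence step

# Step 1: property-context concatenation.
x = torch.cat([p, c], dim=-1)  # (B, H, E_P + E_C)

# Step 2: row-wise multiplication with time, binding time into the matrix.
x = x * t                      # time broadcasts over the feature dimension

# Step 3: 2D convolution over the sequence. Causal left-padding keeps the
# output length at H, so each step only sees the last D interactions.
conv = nn.Conv2d(1, FILTERS, kernel_size=(D, E_P + E_C))
x = x.unsqueeze(1)              # (B, 1, H, E_P + E_C)
x = F.pad(x, (0, 0, D - 1, 0))  # pad the sequence dimension on the left
branch = conv(x).squeeze(-1).transpose(1, 2)  # (B, H, FILTERS)

# Step 4 (across contexts): concatenate one such `branch` per context
# before the recurrent layer, e.g. torch.cat(branches, dim=-1).
```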

Recurrent and Attention Layer

The final concatenated matrix from the interaction layer is passed through a standard bidirectional Gated Recurrent Unit (GRU), after which global attention is applied.
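A minimal sketch of these two layers is shown below (PyTorch); the sizes are assumptions, and the "global attention" here is a simple learned soft attention that pools the GRU outputs into a single user-state vector:

```python
import torch
import torch.nn as nn

B, H, F_IN, HIDDEN = 1, 50, 128, 64  # illustrative sizes

gru = nn.GRU(F_IN, HIDDEN, batch_first=True, bidirectional=True)
attn_score = nn.Linear(2 * HIDDEN, 1)  # scores each sequence step

x = torch.rand(B, H, F_IN)  # concatenated output of the interaction layer
outputs, _ = gru(x)         # (B, H, 2 * HIDDEN)

# Global attention: softmax over the sequence, then a weighted sum that
# pools the GRU outputs into a single user-state vector.
weights = torch.softmax(attn_score(outputs), dim=1)  # (B, H, 1)
user_state = (weights * outputs).sum(dim=1)          # (B, 2 * HIDDEN)
```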

Final Layers

The final layers of MCSAM were built to resemble the final layers of LTRM as closely as possible, because in production the latest output of the attention layer for each interacted user, taken from a trained MCSAM, is used as a feature within LTRM. Nevertheless, MCSAM itself was trained offline in a pairwise manner, as shown in Figure 2.
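Figure 2's dot products and subtraction suggest a pairwise training signal of roughly the following shape; the logistic (BPR-style) loss below is our assumption about the exact form, sketched purely for illustration:

```python
import torch

def pairwise_loss(user_state: torch.Tensor,
                  pos_emb: torch.Tensor,
                  neg_emb: torch.Tensor) -> torch.Tensor:
    """user_state: (B, E); pos_emb / neg_emb: (B, E) for Yp / Yn."""
    pos_score = (user_state * pos_emb).sum(dim=-1)  # dot product with Yp
    neg_score = (user_state * neg_emb).sum(dim=-1)  # dot product with Yn
    # Subtraction of the two scores drives the pairwise loss.
    return -torch.log(torch.sigmoid(pos_score - neg_score)).mean()

loss = pairwise_loss(torch.rand(4, 64), torch.rand(4, 64), torch.rand(4, 64))
```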

On a general note, the final layers of MCSAM could be modified to suit one's requirements. Additionally, MCSAM could be used as a standalone model to rank items (properties). We opted not to do so in our case because LTRM utilises several other features that are also crucial for ranking performance but are not adaptable within MCSAM.

Deployment

As mentioned above, in production the latest output of the attention layer for each interacted user, taken from a trained MCSAM, is uploaded in a batch fashion. It is then used within LTRM as a feature to rank properties in real time. This is how MCSAM supplements LTRM in our case.
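In pseudocode terms, the batch job looks roughly like the sketch below; `mcsam_user_state` and `feature_store` are hypothetical stand-ins for the trained model and the production feature store, not real systems:

```python
from typing import Dict, List

def mcsam_user_state(history: List[int]) -> List[float]:
    """Hypothetical stand-in: run MCSAM over a user's interaction history
    and return the latest output of its attention layer."""
    return [0.0] * 128  # illustrative user-state vector

def refresh_user_features(
        user_histories: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Batch job: recompute the MCSAM feature for every interacted user."""
    return {uid: mcsam_user_state(hist)
            for uid, hist in user_histories.items()}

# LTRM then reads the uploaded vectors as one of its ranking features
# when scoring properties in real time.
feature_store = refresh_user_features({"user-123": [42, 7, 901]})
```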

Model Evaluation

Our model evaluation consisted of comparisons between the MCSAM-supplemented LTRM and LTRM alone. We achieved uplifts in both our offline evaluation (NDCG) and our online evaluations (CVR, GP, and the cumulative probability of booking a property in the top 10 and top 20 results). We also observed a reduction in filter interactions online.
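For reference, the sketch below shows the offline metric we quote, NDCG@k with binary relevance (1 for the booked property); it illustrates the metric only, not our evaluation pipeline:

```python
import math
from typing import List

def dcg(relevances: List[int]) -> float:
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances: List[int], k: int) -> float:
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: the booked property sits at rank 3 of a five-item SRP.
print(ndcg_at_k([0, 0, 1, 0, 0], k=5))  # 0.5
```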

We discovered a discrepancy between the projected and measured online evaluations. One of the reasons for this was that MCSAM was trained offline purely on converted data. Additionally, when the online evaluations were broken down by user type (members vs non-members, users with only a click history vs a booking history, etc.), ranking performance varied. Having a single model perform uniformly across different user types is challenging, and our next effort will be to bridge this gap.

Use Cases

There are various ways to demonstrate the ranking performance of the MCSAM-supplemented LTRM. The two simple use cases below are constructed to do so.

Use Case 1

User A has previously clicked on and booked several 3-3.5 star properties on Hotels.com. User A is intending to visit London, and Figure 4 depicts what User A would see on the Hotels.com SRP.

Figure 4: Properties with a star rating of 3-3.5 appear twice as high up on the SRP when personalised (highlighted in blue shades).

Use Case 2

User B has previously clicked on and booked apartments on Hotels.com when visiting Singapore and Malaysia. User B is intending to visit New York, and Figure 5 depicts what User B would see on the Hotels.com SRP.

Figure 5: Apartments appear three times as high up on the SRP when personalised (highlighted in blue shades).

Conclusion

In this blog, we discussed the end-to-end process we undertook to personalise the ranking of properties on the Hotels.com SRP for returning users. We noted that contextual RNNs are powerful models for capturing sequential user behaviour. In order to develop a contextual RNN for our use case, we needed to identify impactful contexts and a novel way to combine the property representation with the context representations. This led to the development of MCSAM, whose attention layer's output is used as a feature within our existing LTRM. Our success is evidenced by positive offline and online ranking performance. We acknowledge that ranking performance varies across different user types, and our next objective is to bridge this gap. We believe this could be achieved by harvesting contextual information relating to these user types.

I would like to give a special mention to @christian.sommeregger for letting me work on this project and assisting me.

References

[1] Baltrunas, Linas. RecSys presentation (2019). https://drive.google.com/file/d/1A1N7GWJd9NtLlOWCt4evqAbVcUcvk_f0/view

[2] Zhu, Yu, et al. “A brand-level ranking system with the customized attention-GRU model.” arXiv preprint arXiv:1805.08958 (2018).

[3] Smirnova, Elena, and Flavian Vasile. “Contextual sequence modeling for recommendation with recurrent neural networks.” Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (2017).

Learn more about technology at Expedia Group
