A roadmap of an AI use case from POC to industrialization in the corporate real estate industry
The corporate real estate industry has the specificity that most of its business activities hinge on a single lifetime event: the relocation of a company. This major event encompasses numerous business transactions across various industries: land ownership, brokerage, investment, facility management, energy procurement, and company restaurants. These transactions have generally been built on long-standing relationships between players and on a sharp sense of business, rather than on advanced analytics. Interestingly, the real estate industry was largely bypassed by the rise of AI technologies over the last decade, perhaps because it is a long-established relationship business, or because it offered no obvious applications for AI. Nowadays, however, market data consolidation and increased competition are driving real estate players to seek new sources of revenue and to leverage AI to this end. These initiatives are labelled under the term PropTech.
One of the simplest and most effective PropTech use cases is predicting corporate relocations in order to anticipate the subsequent business transactions. If you are a corporate real estate player, most of your revenue depends on this key one-time event. Taking advantage of it means approaching your prospect ahead of the event, within the right timeframe: not too early in their deliberations about a potential move, but not so late that they have few options left.
In this context, learning of a future move sufficiently far in advance offers a major competitive advantage. This AI application is very straightforward both to explain and to assess, which makes it widely successful among stakeholders.
In our journey to implement and deploy this use case for one of the largest corporate brokers in the market, we started with a pilot on a limited scope. This pilot was initially designed to serve the purpose of a POC, namely to validate the relevance of the use case. But it also served as a setup phase preceding a potential industrialization of the application: once the webscraping scripts, the data transformation pipeline, and the machine learning model were built, they could easily be generalized to a wider scope. The industrialization process nevertheless provided us with some insightful takeaways that made this project an excellent end-to-end learning example of implementing a data science product. Firstly, it involves all the key stages of the data value chain. Secondly, it proved to be a successful industrialization by design. Thirdly, it brought up some of the most topical issues in productionizing machine learning.
Our solution draws on the full variety of skills in the so-called data value chain. First, the data acquisition process accounts for a major part of the product's added value. The external data we collect through webscraping (the automated, large-scale extraction of online data) includes corporate news, local real estate market data and, most importantly, private financial statements, which provide an accurate picture of a company's state.
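To make the webscraping step concrete, here is a minimal sketch of extracting corporate news headlines from a scraped page, using only the standard library. The HTML structure (`h2` tags with a `headline` class) and the sample content are hypothetical illustrations, not the actual sites or markup the solution processes.

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collects the text of <h2 class="headline"> elements."""
    def __init__(self):
        super().__init__()
        self._in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h2" and ("class", "headline") in attrs:
            self._in_headline = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline and data.strip():
            self.headlines.append(data.strip())

# In production the page body would come from an HTTP fetch;
# a hardcoded snippet keeps this sketch self-contained.
page = """
<html><body>
  <h2 class="headline">Acme Corp signs lease for new headquarters</h2>
  <h2 class="headline">Globex reports record quarterly revenue</h2>
</body></html>
"""
parser = HeadlineParser()
parser.feed(page)
print(parser.headlines)
```

In practice, each source site needs its own parsing rules, and the extracted items are tagged with the company identifier before entering the pipeline.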
Once collected, the external data is mapped to internal sources, then analyzed and transformed to ensure it meets the quality level required for any downstream business purpose. Modeling is performed with a state-of-the-art machine learning algorithm to achieve a high level of prediction accuracy, and standard tuning and validation processes are used for model approval. Finally, data visualization and deployment are the last project stages, granting end users access to the predictions through either a user-friendly interface or a dedicated API.
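The mapping and quality-control step described above can be sketched as follows. External records are joined to internal accounts on a company identifier, and rows failing basic quality rules are quarantined rather than passed to the model. All identifiers, field names, and thresholds here are invented for illustration.

```python
# Hypothetical internal reference data, keyed by company identifier.
internal = {
    "FR001": {"name": "Acme Corp", "segment": "logistics"},
    "FR002": {"name": "Globex", "segment": "finance"},
}

# Hypothetical scraped external records.
external = [
    {"company_id": "FR001", "revenue_keur": 12_500, "headcount": 85},
    {"company_id": "FR002", "revenue_keur": None, "headcount": 40},   # missing value
    {"company_id": "FR999", "revenue_keur": 3_000, "headcount": 12},  # unknown company
]

def merge_and_validate(internal, external):
    """Join external rows to internal accounts; quarantine bad rows with a reason."""
    clean, rejected = [], []
    for row in external:
        cid = row["company_id"]
        if cid not in internal:
            rejected.append((row, "unmatched identifier"))
        elif row["revenue_keur"] is None:
            rejected.append((row, "missing revenue"))
        else:
            clean.append({**internal[cid], **row})
    return clean, rejected

clean, rejected = merge_and_validate(internal, external)
print(len(clean), len(rejected))  # 1 clean row, 2 quarantined
```

Keeping the rejection reasons alongside the quarantined rows makes the quality checks auditable, which matters once the pipeline runs unattended at scale.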
With the benefit of hindsight, it is apparent that this project followed the well-known and advisable adage "think big, start small, scale fast". Indeed, the operational constraints inherent in industrializing a data product, such as long-term model performance, product maintainability, and user acceptance, were considered and addressed before any development work. The pilot, launched on a limited geographical and sectoral scope, then offered a quick and solid validation of the use case's relevance and feasibility; its performance was backtested against monitored business results. Finally, deploying early onto Heka, our development platform, allowed for rapid scaling afterwards.
In the course of industrializing our use case, we encountered some of the most challenging issues in productionizing machine learning. First, automation is key: every data product feature, from data upload to model fine-tuning and prediction access, must be available through a dedicated API. Therefore, when deploying on Heka, we started by building the essential APIs for business operations. Second, black-box models have a bad press in business applications, which makes AI interpretability one of the most topical research fields in machine learning. At our clients' request, we quickly integrated into our solution an interpretation module that provides the top factors, along with their respective impacts, explaining the resulting score. Third, real-life models quickly become obsolete. Major crises such as the Covid-19 pandemic reshape economic relationships and corporate behavior. We mitigated the pandemic's impact on our model's performance by shocking its financial inputs and by temporarily narrowing the application scope to the companies for which performance remained sufficient. Specifically, when reviewing the consequences of past crises, we found that struggling businesses were more likely to maintain their moving plans after a downturn, which ensured our model remained relevant for these companies following the Covid-19 crisis.
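The interpretation module can be illustrated with a deliberately simplified sketch: for a linear scoring model, each feature's impact can be read as its coefficient times the deviation of its value from the population mean, ranked by absolute magnitude. The feature names, weights, and means below are invented for illustration, and production systems typically rely on richer attribution methods (for example Shapley values) rather than this linear shortcut.

```python
# Hypothetical linear model parameters (not the product's actual model).
WEIGHTS = {"revenue_growth": -1.2, "lease_age_years": 0.6, "headcount_growth": 0.9}
MEANS = {"revenue_growth": 0.03, "lease_age_years": 5.0, "headcount_growth": 0.02}

def top_factors(features, k=2):
    """Return the k features with the largest absolute impact on the score."""
    impacts = {
        name: WEIGHTS[name] * (features[name] - MEANS[name])
        for name in WEIGHTS
    }
    return sorted(impacts.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]

# A company with an old lease and shrinking revenue.
company = {"revenue_growth": -0.10, "lease_age_years": 8.0, "headcount_growth": 0.05}
for name, impact in top_factors(company):
    print(f"{name}: {impact:+.2f}")
```

Exposing a ranked list of factors with signed impacts is what lets account managers sanity-check a score before acting on it.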
The solution automates the collection of financial data, market data, corporate news, and other sources to provide a probability score for a move within 12 to 24 months.
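As a final sketch, the prediction a dedicated API might serve could look like the payload below. The field names, the toy logistic rule, and its weights are assumptions made for illustration; they are not the product's actual contract or model.

```python
import json
import math

def move_score(features):
    """Toy logistic score combining two illustrative signals."""
    z = (0.8 * features["lease_expiry_within_24m"]
         + 0.5 * features["headcount_growth"]
         - 0.2)
    return 1.0 / (1.0 + math.exp(-z))

def prediction_payload(company_id, features):
    """Assemble the JSON-serializable response a scoring endpoint could return."""
    return {
        "company_id": company_id,
        "horizon_months": [12, 24],
        "move_probability": round(move_score(features), 3),
    }

payload = prediction_payload(
    "FR001", {"lease_expiry_within_24m": 1, "headcount_growth": 0.4}
)
print(json.dumps(payload))
```

Returning the horizon alongside the probability keeps the prediction self-describing for downstream consumers such as the user interface.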