Data Integration in Commercial Real Estate
Although the real estate industry is looking to technology more aggressively for efficiency and competitive advantage, many firms have struggled to identify and develop effective solutions. There are many reasons for this, including real estate professionals’ lack of experience with data and technology, but one of the main challenges has been the availability of the data needed for advanced analysis. Without a large amount of clean data, even basic uses of technology, such as automating everyday tasks, become difficult, let alone the application of advanced technologies such as artificial intelligence and machine learning.
This article begins with a discussion of the current state of data in most real estate firms and the challenges of moving from a legacy approach to data to a more modern one. It then provides some steps firms can take to begin creating a more data-driven operational structure.
My data is mine, your data is yours
Most firms today are designed around a siloed organization model, meaning different groups and divisions specialize in very specific tasks. The people within each of those groups are extremely knowledgeable about the activities needed to successfully perform their jobs, but often know little about what happens in other divisions at anything other than a superficial level. In addition, this siloed model has led to each group saving and organizing data in a way that does not allow other groups to access this data.
As an example, imagine a large bank with a variety of real estate services, such as lending, asset management, appraisal, CMBS/RMBS primary and secondary securitization, research, and others. You would think it would be valuable for the lending group to have access to all the data in the appraisal and asset management groups to monitor the performance of various real estate markets or data from the securitization and research groups as to the state of capital markets. All of this data could be combined and analyzed using advanced technologies to provide a tremendous amount of insight that would be very valuable to the lending group in deciding which loans to make and how to price those loans. But rarely, if ever, is this the case.
Imagine five people cooking a meal and each one has a few ingredients, but they’re not allowed to share those ingredients. I wouldn’t be too excited about that meal. If they were allowed to all come together and combine their ingredients, however, I would think the meal would be much better. Most firms are keeping a few individual ingredients isolated among several groups and not allowing those ingredients to come together for a more robust meal.
To be fair, sometimes in large institutions there are legal reasons why data isn’t shared across different groups, but many times there are no legal restrictions and it’s simply a matter of many real estate firms not having the infrastructure needed for central data collection and availability. It’s also usually not a matter of being selfish with data. Most often it’s because real estate firms are still operating under a model that was used 10 or 20 years ago before data and technology became such an important part of business strategy.
We have a ton of data, we just can’t access it
As mentioned in the section above, the aggregation of data into a form that’s clean and easily accessible would allow organizations to achieve much more insight much faster than they were ever capable of before (by “clean” data we mean data where errors such as missing values, formatting errors, etc. have been corrected).
And while firms are beginning to realize the value in data availability, they’re having a tremendously difficult time accomplishing it. The siloed model is one reason for this difficulty, but another is the nature of how real estate work gets done. If you think about most people’s relationship with real estate, it involves hundreds or thousands of spreadsheets that were each built uniquely for a different property. Or it involves the review of PDFs of sale or lease contracts that were each drafted by a different attorney for an individual transaction. Or maybe even information buried in the thousands of emails that have been traded back and forth between internal employees and external contacts.
It’s true that most firms have a ton of data, but it’s also true that most of them can’t access it. Spreadsheets and contracts were never built on top of a digital foundation designed to capture data. Contrast that with Amazon, which collects data on every single product you look at, what pages you visit, how long you stay on each page, where your cursor goes, how much you buy, when you buy, what you buy, etc. The entire operation is built to track, collect, and centrally store every interaction that you and the millions of other people using Amazon have with the site. This immense amount of data allows Amazon to make product recommendations to you. Netflix does the same with movie and show recommendations. Facebook and Google do the same, using data to target you for advertisements. These firms have been collecting this data for 10-20 years. Financial firms, marketing firms, the medical industry, supply and logistics firms, and many others have adopted this same approach to collecting and using data, to immense benefit.
The difference between these technology firms and real estate firms is that real estate has not collected and centrally stored the immense amount of data that the industry interacts with on a daily basis. Instead, all of this data sits in individual Excel spreadsheets, PDFs, emails, etc. And it’s impossible to go back and accurately scrape all of these documents for the data that they contain.
It’s impossible to accurately collect this data retroactively because of the uniqueness of each spreadsheet, document, and email. It is POSSIBLE to collect the data, but IMPOSSIBLE to collect it accurately. Analysts who build spreadsheets often use different formats, titles, formulas, etc. Attorneys who draft lease and sale contracts use different formats for each contract. And each property and transaction is unique in its own right, multiplying the possible locations and formats of each piece of data. Thus, we can get data out of these documents, but at an accuracy rate that is unacceptable for inclusion in data analysis systems.
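To make the formatting problem concrete, here is a minimal sketch that tries to map the column headers of two differently built rent rolls onto one shared set of field names. Everything in it is hypothetical and invented for illustration: the alias table, the canonical field names, and the sample headers. Any header the table doesn't recognize is flagged for review rather than guessed.

```python
# Hypothetical alias table: each spelling an analyst might use for a field,
# mapped to one canonical name. A real portfolio would need far more entries.
HEADER_ALIASES = {
    "tenant": "tenant_name",
    "tenant name": "tenant_name",
    "lessee": "tenant_name",
    "sf": "square_feet",
    "sq ft": "square_feet",
    "rentable area": "square_feet",
    "monthly rent": "monthly_rent",
    "rent/mo": "monthly_rent",
}


def normalize_headers(headers):
    """Split raw headers into (mapped, unmapped).

    mapped   -- dict of raw header -> canonical field name
    unmapped -- list of raw headers that need a person to review them
    """
    mapped, unmapped = {}, []
    for raw in headers:
        key = raw.strip().lower()
        if key in HEADER_ALIASES:
            mapped[raw] = HEADER_ALIASES[key]
        else:
            unmapped.append(raw)
    return mapped, unmapped


# Two invented spreadsheets that describe the same kind of data.
sheet_a = ["Tenant Name", "SF", "Monthly Rent"]
sheet_b = ["Lessee", "Rentable Area", "Base Rent (Yr 1)"]

mapped_a, unmapped_a = normalize_headers(sheet_a)
mapped_b, unmapped_b = normalize_headers(sheet_b)

print(mapped_a)
print(unmapped_b)
```

Each unmapped header is a judgment call someone has to make by hand. Multiply that by thousands of one-off spreadsheets and it becomes clear why retroactive extraction struggles to reach an acceptable accuracy rate.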
So, um, we can’t use data, right?
Not yet. The real estate industry has simply not developed enough data capabilities to implement really advanced technology. It can happen, but it’s going to take time and a significant effort on the part of individual firms and the industry as a whole. Companies like Amazon, Google, and Netflix, as well as IBM with Watson and Renaissance Technologies (if you don’t know Renaissance, look them up: its Medallion fund is widely regarded as the best-performing hedge fund of all time, with reported average annual returns of about 66% before fees over roughly 30 years), all spent the better part of a decade developing foundational data collection and analysis capabilities. These firms have dominated their industries and displaced existing firms that took a traditional approach to business operations and strategy. Other firms that didn’t recognize the value and didn’t provide the resources needed to achieve this level of insight often withered away.
What do we do now?
Entire books can be written (and have been written) about digital transformation, and each of the steps below could be its own book. We’ll only go into high-level descriptions here with a very abbreviated task list. Firms should dig deep into each of the following steps and develop a plan for data integration that’s specific to their operation, needs, and resources.
The first step firms need to take is figuring out methods to collect and store all the information that passes through the firm. This includes all the market research, financial statements, rent rolls, comparables, sale and lease terms from legal contracts, underwriting assumptions and results, etc. And this needs to happen across every single activity that the firm performs. This step also includes cleaning the data (ensuring accuracy and completeness) to make sure it’s ready to use for deeper analysis later.
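As a rough illustration of what "cleaning" can mean in practice, the sketch below standardizes a single rent roll record and reports which required fields are missing or unreadable. The field names, the accepted date layouts, and the sample records are assumptions made up for this example, not a standard of any kind.

```python
from datetime import datetime

# Hypothetical list of fields every rent roll record must have.
REQUIRED_FIELDS = ("tenant_name", "square_feet", "monthly_rent", "lease_start")


def parse_number(text):
    """Turn strings like '$12,500.00' or '4,200' into a float; None if blank or unreadable."""
    cleaned = str(text).replace("$", "").replace(",", "").strip()
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_date(text):
    """Accept two common date layouts and return an ISO date string, or None."""
    for layout in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(str(text).strip(), layout).date().isoformat()
        except ValueError:
            continue
    return None


def clean_record(raw):
    """Return (cleaned_record, problems); problems lists missing or unreadable fields."""
    cleaned = {
        "tenant_name": (raw.get("tenant_name") or "").strip() or None,
        "square_feet": parse_number(raw.get("square_feet", "")),
        "monthly_rent": parse_number(raw.get("monthly_rent", "")),
        "lease_start": parse_date(raw.get("lease_start", "")),
    }
    problems = [field for field in REQUIRED_FIELDS if cleaned[field] is None]
    return cleaned, problems


# Invented sample records: one usable, one with gaps.
good = {"tenant_name": " Acme Corp ", "square_feet": "4,200",
        "monthly_rent": "$12,500.00", "lease_start": "03/01/2021"}
bad = {"tenant_name": "Bolt LLC", "square_feet": "",
       "monthly_rent": "TBD", "lease_start": "2020-07-15"}

good_clean, good_problems = clean_record(good)
bad_clean, bad_problems = clean_record(bad)
print(good_clean, good_problems)
print(bad_clean, bad_problems)
```

The design choice worth noting is that unreadable values become explicit gaps with a named problem rather than being silently dropped or guessed, so someone can go back to the source document and fix them.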
The next step is to perform a thorough analysis of what’s needed to get to an ideal state of technology implementation. This means identifying what data you have and comparing that to the data you need to answer the questions that would be most valuable to your firm. Unless your firm has in-house data science and technical expertise, this step will most likely require outside help from people experienced in data science. Because most real estate professionals don’t have a background in data science and technology, they often don’t even know what they don’t know. They don’t know the techniques and approaches of data science. They don’t know the business opportunities that can be achieved by implementing machine learning or artificial intelligence methodologies. If you don’t know the tools that are available to you, it’s often impossible to even recognize what the problems or opportunities are in the first place.
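At its simplest, this comparison is a matter of listing the fields you already capture, listing the fields each business question requires, and looking at the difference. The sketch below does exactly that. The sources, field names, and questions are hypothetical placeholders; a real inventory would come out of the analysis described above.

```python
# Hypothetical inventory: the fields the firm currently captures, by source.
DATA_ON_HAND = {
    "rent_rolls": {"tenant_name", "square_feet", "monthly_rent", "lease_start"},
    "appraisals": {"property_id", "appraised_value", "cap_rate"},
}

# Hypothetical business questions and the fields each one would require.
QUESTIONS = {
    "Which leases expire in the next 12 months?": {"tenant_name", "lease_start", "lease_end"},
    "How do our cap rates compare by submarket?": {"property_id", "cap_rate", "submarket"},
    "What is rent per square foot by tenant?": {"tenant_name", "square_feet", "monthly_rent"},
}


def find_gaps(data_on_hand, questions):
    """Return {question: sorted missing fields} for questions that cannot be answered yet."""
    # Pool every field from every source into one set of available fields.
    available = set().union(*data_on_hand.values())
    gaps = {}
    for question, needed in questions.items():
        missing = needed - available
        if missing:
            gaps[question] = sorted(missing)
    return gaps


gaps = find_gaps(DATA_ON_HAND, QUESTIONS)
for question, missing in gaps.items():
    print(f"{question} -> missing: {', '.join(missing)}")
```

Questions that don't show up in the result can already be answered with the data on hand; the rest come with a concrete list of what still has to be collected or acquired.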
The third step is to identify ways to bridge the gap between the data you have and the data you need. This could mean developing capabilities internally, such as hiring data scientists to collect the necessary data, or looking to third-party data providers. In another article we address how firms should be thinking about third-party data and what questions need to be asked of the providers.
The final step is to ensure that all data that can legally be shared is available to all functions in the firm. This is also where a large push for analysis using advanced technologies needs to occur. With the right kind of data, firms can begin to achieve tremendous insight through machine learning and artificial intelligence. This insight is often what sets one company apart from its competitors.
Is this really important?
To highlight the importance of proper data integration, we’ll end this article with an excerpt from an email that Jeff Bezos (Amazon’s founder and then CEO) sent to his team in 2002, long before Amazon became the behemoth it is today. The email mandates that all developers of Amazon’s internal systems make their data available to everyone else in the organization. This is likely one of the major reasons Amazon has grown so rapidly.
- All teams will henceforth expose their data and functionality through service interfaces.
- Teams must communicate with each other through these interfaces.
- There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
- It doesn’t matter what technology is used. HTTP, Corba, Pubsub, custom protocols – doesn’t matter.
- All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
- Anyone who doesn’t do this will be fired.
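For readers who haven't worked with service interfaces, here is a small sketch of the idea applied to the bank example from earlier in this article: the appraisal group publishes one interface, and the lending group uses only that interface. This is our own illustration, not Amazon's code. All class, function, and field names are hypothetical, and it runs inside a single program for simplicity, whereas the mandate above calls for interface calls over the network.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Appraisal:
    property_id: str
    appraised_value: float


class AppraisalService(Protocol):
    """The published contract: the only thing other teams may depend on."""

    def latest_appraisal(self, property_id: str) -> Optional[Appraisal]: ...


class InMemoryAppraisalService:
    """The appraisal group's implementation; its storage is private and free to change."""

    def __init__(self) -> None:
        # Invented sample data standing in for the group's real data store.
        self._store = {"P-100": Appraisal("P-100", 12_500_000.0)}

    def latest_appraisal(self, property_id: str) -> Optional[Appraisal]:
        return self._store.get(property_id)


def loan_to_value(loan_amount: float, property_id: str,
                  appraisals: AppraisalService) -> Optional[float]:
    """Lending-side code: it asks the interface for data and never reads the store itself."""
    appraisal = appraisals.latest_appraisal(property_id)
    if appraisal is None:
        return None
    return loan_amount / appraisal.appraised_value


service = InMemoryAppraisalService()
print(loan_to_value(7_500_000.0, "P-100", service))
```

Because the lending code depends only on the interface, the appraisal group could move its data from spreadsheets to a database without breaking anyone else, which is the practical payoff of the rule against reading another team's data store directly.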
About the Author
Josh Panknin is a Visiting Assistant Professor of Real Estate at New York University’s Schack Institute of Real Estate and an adjunct professor in the school of engineering at Columbia University. Prior to academics, Josh was Head of Credit Modeling and Analytics at Deutsche Bank’s secondary CMBS trading desk where he helped develop and implement automated models for valuing CMBS loans and bonds. He also spent time at the Ackman-Ziff Real Estate Group and in various other roles in research, acquisitions, and redevelopment. Josh has a master’s degree in finance from San Diego State University and a master’s degree in real estate finance from New York University’s Schack Institute of Real Estate.