By Josh Panknin | December 4, 2020

Data Quality in Real Estate Valuation


As technology begins to pervade the real estate industry, companies and professionals are being offered many new and exciting capabilities that will help them do their jobs more effectively. As with any new tool, however, it’s important to understand how that tool works and what its limitations might be. One of the holy grails of real estate is the accurate automated valuation model. If you can accurately automate the valuation of every property in the United States, the ability to recognize arbitrage opportunities would be seemingly endless. And that sounds amazing. Perhaps a little too amazing to be completely accurate. In this article we’ll take a look at some of the data challenges associated with developing fully automated valuation models for the real estate industry. While a full exploration of advanced technologies such as artificial intelligence and machine learning is beyond the scope of this article, it is crucial to understand that all of these technologies require good-quality data in order to be effective. Below we’ll explore four qualities of real estate data that pose challenges to fully automated valuation.

Data Challenges

Compared to other industries and activities, the amount of data available for real estate analysis is small. Exacerbating this shortage of data, and described in more detail below, are four main issues:

  1. Geographic relevance of data
  2. Temporal relevance of data
  3. Qualitative data
  4. Absence of private data

Geographic relevance of data

Real estate has a spatial component to it. If I own an office building in New York City, I don’t pull comparables for analysis from Miami or Denver. I must look within the same market, and often the same submarket, for like properties. But the tighter the constraint we place on the geographic area we review, the fewer comparable properties generally remain for analysis. In many cases the number of comparable properties is in the single digits.

Temporal relevance of data

Adding to the reduction of comparable properties because of geography is the concept of temporal relevance. This means that if I’m trying to sell a property today and want to pull comparables for analysis to determine a likely value, I can’t look at sales that occurred ten years ago, or five years ago, or sometimes one year ago depending on how the market has changed over the past year. I must limit my analysis to sales or transactions (leases, refinancings, etc.) that occurred in the recent past and under similar market conditions (theoretically you could make adjustments based on the changes that have occurred in the market since a previous sale, but the more assumptions you make, the less accurate the analysis generally becomes).
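To make the combined effect of the geographic and temporal constraints concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Sale` record, the `filter_comps` helper, the 365-day window, and the six made-up sales are hypothetical and do not come from any real data set. The point is only to show how quickly a pool of sales shrinks as each constraint is applied.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sale:
    """A hypothetical sale record; real comps carry many more fields."""
    address: str
    market: str
    submarket: str
    sale_date: date


def filter_comps(sales, market, submarket, as_of, max_age_days):
    """Keep sales in the same market and submarket that closed
    no more than max_age_days before the as_of date."""
    return [
        s for s in sales
        if s.market == market
        and s.submarket == submarket
        and 0 <= (as_of - s.sale_date).days <= max_age_days
    ]


# Made-up records for illustration only.
SALES = [
    Sale("A", "New York City", "Midtown", date(2020, 9, 15)),
    Sale("B", "New York City", "Midtown", date(2019, 3, 1)),
    Sale("C", "New York City", "Midtown", date(2011, 6, 30)),
    Sale("D", "New York City", "Downtown", date(2020, 10, 2)),
    Sale("E", "Miami", "Brickell", date(2020, 8, 20)),
    Sale("F", "Denver", "LoDo", date(2020, 11, 1)),
]

as_of = date(2020, 12, 4)

# Apply each constraint in turn and watch the pool shrink.
same_market = [s for s in SALES if s.market == "New York City"]
same_submarket = [s for s in same_market if s.submarket == "Midtown"]
recent = filter_comps(SALES, "New York City", "Midtown", as_of, max_age_days=365)

print(len(SALES), len(same_market), len(same_submarket), len(recent))
```

With these made-up records, six sales become four in the same market, three in the same submarket, and only one that also closed within the past year. The one-year cutoff is itself a judgment call; in a fast-moving market an analyst might reasonably choose a shorter window.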

Qualitative data

Qualitative data here refers to information that is subjective, such as the value of a view, a location, or some other feature with no quantitative value. This is in contrast to quantitative data such as the square footage leased, the rent paid, the number of beds/baths in an apartment unit, etc. With quantitative data, it’s relatively easy to plug a number into a specific field in a valuation model and move on to the next piece of information. But with qualitative data, this is much more difficult. While it’s easy for a human to differentiate between a great view and a bad view, it’s almost impossible for a computer to do the same thing. And even though a human can differentiate between a good view and a bad view, two humans will often place two different monetary amounts on the value of those views. And this goes for all qualitative real estate features.

Absence of private data

A thorough analysis of any commercial property involves the review of not only the physical characteristics of a property, but also the detailed financial aspects such as cash flow and rent rolls. The problem, however, is that this information is not publicly available for most properties. In my experience, about 50% of the data relevant for the analysis of a property can be found online or through other publicly available sources. The other 50% is typically held privately by the owner of a property and is not available to include in a valuation analysis. Yes, assumptions could be made, but as anyone who has ever valued a commercial property knows, slight changes in income or expenses can have a relatively significant impact on the estimated value of a property.
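As a rough illustration of that sensitivity, the sketch below uses direct capitalization (value equals net operating income divided by a capitalization rate), which is one common valuation shortcut and not necessarily the method any particular analyst would use. The function name `direct_cap_value` and all of the dollar figures and the 5% cap rate are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def direct_cap_value(revenue, expenses, cap_rate):
    """Direct capitalization: value = (revenue - expenses) / cap_rate."""
    if cap_rate <= 0:
        raise ValueError("cap_rate must be positive")
    return (revenue - expenses) / cap_rate


# Hypothetical inputs, not taken from any real property.
base = direct_cap_value(revenue=1_000_000, expenses=400_000, cap_rate=0.05)

# Same property, but the assumed expenses are 5% higher (420,000 vs. 400,000).
high_exp = direct_cap_value(revenue=1_000_000, expenses=420_000, cap_rate=0.05)

# Relative change in estimated value caused by the expense assumption alone.
change = (high_exp - base) / base
print(f"{base:,.0f} -> {high_exp:,.0f} ({change:.1%})")
```

Under these assumptions, guessing expenses just $20,000 too low overstates the value by $400,000, because every dollar of income is divided by the cap rate. When the real income and expense figures are held privately, errors of this size are easy to make and hard to detect.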

Why Limited Data Matters

Regardless of whether the intent is to use sophisticated technology to enhance the valuation process or the intent is to perform a traditional manual valuation analysis of a property, the absence of data is a severe challenge. Real estate is not a simple asset. When considering the number of features, or characteristics, relevant to the valuation of a property, one could easily reach into the hundreds. This includes physical characteristics, all revenue and expense line items, lease analysis for each tenant occupying the property including space, rent per square foot, rent escalations, reimbursements, early termination clauses, and a slew of other factors. It also includes macro- and micro-market analysis. All of these nuances mean that it becomes very difficult to compare any two buildings apples to apples, or feature to feature, because of the amount of potential variance in each feature. This difficulty would hold no matter how many properties you have access to. In reality, however, because of the geographic, temporal, qualitative, and privacy issues surrounding real estate data, the number of properties available for comparable analysis is extremely small.


The valuation of real estate can be a complex and difficult task to undertake even if you have every piece of relevant data at your fingertips. Yet the challenges we’ve discussed above increase this complexity significantly. And whether you’re using the most sophisticated piece of artificial intelligence software available or you have a new analyst building models in Excel, it’s easy to put a number on a valuation. What’s much more difficult is putting an accurate number on that valuation. For now it seems the most logical approach is to supplement the expertise of humans by handing computers the parts of the valuation process that they can perform well. This might mean using technology for the automation of data collection, basic analysis and mathematics, or mapping, while leaving the more subjective assumptions to experienced humans with intimate knowledge of the market and the ability to make reasoned assumptions where very little data exists. Humans can make mistakes as well, but it’s often easier to understand why a decision was made when there’s someone you can ask.

About the Author

Josh Panknin is a Visiting Assistant Professor of Real Estate at New York University’s Schack Institute of Real Estate and an adjunct professor in the school of engineering at Columbia University. Prior to academics, Josh was Head of Credit Modeling and Analytics at Deutsche Bank’s secondary CMBS trading desk where he helped develop and implement automated models for valuing CMBS loans and bonds. He also spent time at the Ackman-Ziff Real Estate Group and in various other roles in research, acquisitions, and redevelopment. Josh has a master’s degree in finance from San Diego State University and a master’s degree in real estate finance from New York University’s Schack Institute of Real Estate.
