The Proptech Boom of the 2010s vs. Today: What Changed in Real Estate Data Infrastructure 


The proptech industry has a short institutional memory and a talent base that turns over quickly. Many of the engineers and product managers building real estate data products today were not in the industry during the first major wave of proptech investment. They are, in some cases, building on assumptions that the 2010s generation already tested and found wanting. 

This article is an attempt to give that context. Not as a history lesson for its own sake, but because the data infrastructure mistakes that defined the first proptech boom are still being made today, sometimes by teams that have no idea the same architecture decisions were made and regretted a decade ago. Understanding what changed, and why, is useful for anyone making infrastructure decisions now. 

The mainstream proptech era, roughly 2010 through 2019, produced an extraordinary number of companies and an extraordinary amount of capital. It also produced an extraordinary amount of quietly expensive data infrastructure debt. 

Statistic Box
$24.3B
in global proptech investment in 2021 alone, the peak of the second proptech wave, up from roughly $1.7 billion in 2015 and under $200 million in 2011. 

Source: Wikipedia, Property Technology  | Statista, Global Proptech Investment 2010-2022

The companies building in this era faced a landscape that was simultaneously exciting and structurally difficult. Consumer behavior around real estate was changing rapidly: the smartphone had made property search a mobile activity, and users expected the same quality of experience they got from Airbnb or Uber when they were looking at homes. But the underlying data infrastructure was built for a previous era. The tools and standards that would eventually make listing data tractable at scale, principally RESO and the Web API that replaced RETS, were either not yet developed or not yet widely adopted. 

The result was a pattern that repeated across dozens of companies: build fast on one or two MLS integrations, prove product-market fit in a primary market, raise a Series A, try to expand nationally, discover the true cost of the MLS data landscape. 

The first and most pervasive assumption was that MLS data was a problem that could be solved once and then scaled. The reality is that there are over 500 independent MLS organizations in the United States, each with its own data format, its own governance structure, its own licensing terms, and its own technology platform. A company that had successfully integrated one MLS had solved that one MLS’s problem. The next market was a new problem. 

The second assumption was that RETS, the legacy data transport protocol that most MLSs used through the 2010s, was a long-term foundation rather than a transitional technology. RETS required specialized parsing libraries and a significant learning investment. Teams that had built deep expertise in RETS found that expertise becoming less valuable as the industry moved toward the RESO Web API. Custom parsers written for RETS feeds needed to be rebuilt as MLSs migrated. 

The third assumption was that data normalization was someone else’s problem. In many cases, companies built their application layer directly on top of whatever field structure a specific MLS used, rather than normalizing to a consistent internal schema first. This made the first integration fast and made every subsequent integration more expensive. The data model that worked for MLS A became a constraint when integrating MLS B, C, and D. 
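The alternative that eventually became standard practice is the reverse: map every source feed into one internal schema before any application code touches it. A minimal sketch of that mapping layer is below; the source field names are purely illustrative rather than taken from any real MLS, and the internal schema uses RESO Data Dictionary-style naming.

```python
# Minimal sketch of source-to-internal-schema normalization. The source
# field names below are illustrative, not taken from any real MLS; the
# internal schema uses RESO Data Dictionary-style naming.

from dataclasses import dataclass
from typing import Any


@dataclass
class Listing:
    """Internal schema that application code is written against."""
    listing_key: str
    list_price: float
    standard_status: str
    city: str


# One mapping per source feed; application code never sees these names.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "mls_a": {"listing_key": "LIST_NO", "list_price": "ASK_PRICE",
              "standard_status": "STATUS", "city": "TOWN"},
    "mls_b": {"listing_key": "ListingId", "list_price": "ListPrice",
              "standard_status": "StandardStatus", "city": "City"},
}


def normalize(source: str, record: dict[str, Any]) -> Listing:
    """Translate a raw source record into the internal schema."""
    field_map = FIELD_MAPS[source]
    return Listing(
        listing_key=str(record[field_map["listing_key"]]),
        list_price=float(record[field_map["list_price"]]),
        standard_status=str(record[field_map["standard_status"]]),
        city=str(record[field_map["city"]]),
    )


raw = {"ListingId": "A1", "ListPrice": "550000",
       "StandardStatus": "Active", "City": "Denver"}
print(normalize("mls_b", raw))
```

The specific fields matter less than the direction of dependency: onboarding another MLS means adding one mapping entry, not revisiting application code.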

The cost of these assumptions was not always visible on the income statement. It showed up in engineering headcount devoted to data maintenance rather than product development. It showed up in the time it took to launch in new markets. It showed up in the analytics products that could not be built because the data was too inconsistent across sources to support reliable cross-market comparisons. And it showed up in the product experiences that worked inconsistently depending on which market a user was in. 

  The real estate industry’s data infrastructure constraints were not a secret in the 2010s. They were documented, discussed at conferences, and universally acknowledged by anyone who had tried to build on MLS data at scale. The problem was that the incentives for individual companies were to work around the constraints rather than solve them, because solving them required industry-level coordination that no single company could drive. 

While proptech companies were building on fragmented data foundations, the Real Estate Standards Organization was doing the slower, harder work of getting hundreds of independent MLSs to agree on a common data language. 

The RESO Data Dictionary, which defines standard names, types, and allowable values for real estate data fields, had been in development since the early 2000s. But adoption was gradual, and the quality of certification varied significantly across organizations. What changed the trajectory was NAR policy: when the National Association of Realtors made RESO certification a requirement for NAR-affiliated MLSs, it gave the standardization effort institutional leverage that industry consensus alone had not provided. 

Statistic Box
90%+
of US MLSs now have RESO-certified Web API services, a dramatic shift from the RETS-dominated landscape that proptech companies were building on throughout most of the 2010s. 

Source: RESO Certification Map | RESO 2025 Fall Conference Recap

The migration from RETS to the RESO Web API is the most concrete expression of how much the transport layer has changed. The RESO Web API is built on REST and OData, uses JSON data format, and supports OAuth authentication. It is a standard that any developer familiar with modern web APIs can understand without learning a real-estate-specific protocol. The migration is still in progress, but over 90% of US MLSs now have RESO-certified Web API services available, and over 1 million US MLS subscribers have been converted to Web API data feeds. 
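For a sense of how approachable this is, the sketch below queries a hypothetical RESO Web API endpoint for active listings. The base URL and token are placeholders; the query parameters follow standard OData conventions and the field names are RESO Data Dictionary standard names.

```python
# Sketch of a RESO Web API query using standard OData conventions. The
# endpoint URL and token are placeholders; ListingKey, ListPrice,
# StandardStatus, and ModificationTimestamp are RESO standard field names.

import requests

BASE_URL = "https://api.example-mls.com/reso/odata"  # placeholder
ACCESS_TOKEN = "..."  # obtained through the provider's OAuth flow

params = {
    "$filter": "StandardStatus eq 'Active' and ListPrice ge 500000",
    "$select": "ListingKey,ListPrice,StandardStatus,City",
    "$orderby": "ModificationTimestamp desc",
    "$top": "100",
}

response = requests.get(
    f"{BASE_URL}/Property",
    params=params,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
response.raise_for_status()

# OData responses wrap result sets in a "value" array.
for listing in response.json()["value"]:
    print(listing["ListingKey"], listing["ListPrice"], listing["City"])
```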

Source: RESO, Celebrating Incredible Milestones, Growth and Change

Data Dictionary 2.0, passed by RESO’s board in April 2024 and required for all NAR-affiliated MLSs by April 2025, represents another step function in standardization quality. The new version adds fields, tightens enumeration definitions, and makes cross-MLS data consistency meaningfully more reliable than it was under earlier versions. It is, as one industry observer described it, the real estate industry’s largest-ever leap forward in data consistency. 

Source: WAV Group Consulting, Here Comes RESO Data Dictionary 2.0, Oct 2024

The data infrastructure story is not only about standards. The access policy framework governing how listing data can be used has also evolved significantly, in ways that affect what products can be built and how. 

The most significant recent policy change came in March 2025, when NAR announced Multiple Listing Options for Sellers alongside the continued Clear Cooperation Policy. This policy allows sellers to instruct their listing agents to delay marketing through IDX and syndication for a locally determined period, while keeping the listing visible to MLS participants. The implementation deadline for MLSs was September 30, 2025. 

Source: NAR, Introduces New MLS Policy to Expand Choice for Consumers, March 25, 2025

The practical implication for data products is nuanced. The change introduces a new category of listings that will be visible to MLS participants but not appear in IDX feeds or syndication during the delayed marketing period. For analytics products and MLS-participant tools that receive full participant data access, this is a new data signal to handle. For IDX-based consumer search products, listings in this category will be invisible during the delay. Understanding how this policy affects each product’s data flow requires knowing what type of MLS access the underlying data provider operates under. 
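As a rough sketch of what that handling can look like, the snippet below routes listings by access type during a delayed marketing window. The delay field name is hypothetical; the actual flags and statuses depend on the MLS and on the feed type.

```python
# Sketch of routing listings by feed visibility during a delayed marketing
# window. The "DelayedMarketingEndTimestamp" field name is hypothetical;
# actual flags and statuses vary by MLS and by access type.

from datetime import datetime, timezone


def visible_in_idx(listing: dict) -> bool:
    """True if the listing may appear in an IDX consumer feed right now."""
    delay_until = listing.get("DelayedMarketingEndTimestamp")  # hypothetical field
    if delay_until is None:
        return True
    # Assumes an ISO 8601 timestamp with a UTC offset.
    return datetime.now(timezone.utc) >= datetime.fromisoformat(delay_until)


def visible_for(listing: dict, access_type: str) -> bool:
    if access_type == "participant":   # full participant access sees it either way
        return True
    if access_type == "idx":           # consumer search must suppress during the delay
        return visible_in_idx(listing)
    return False


print(visible_for({"DelayedMarketingEndTimestamp": "2025-10-15T00:00:00+00:00"}, "idx"))
```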

The proptech market contracted meaningfully from its 2021 peak. Global proptech investment, which reached $24.3 billion in 2021, fell sharply as rising interest rates made growth-stage capital more expensive. This correction had a selective effect on the industry: companies with strong unit economics and durable product advantages survived and often gained market share, while those whose growth was funded by an assumption of perpetually cheap capital faced difficult choices. 

The data infrastructure story in this period is about maturation. The companies that came through the correction in the strongest position were generally those that had invested in infrastructure quality over quantity. They had normalized data models, proper licensing structures, and maintainable integration architectures. The companies that struggled most were often those that had deferred infrastructure investment in favor of growth metrics, and found themselves unable to expand profitably when the cost of capital rose. 

Reviewing the proptech companies that have scaled successfully into 2025 and 2026, a few patterns at the data infrastructure layer stand out consistently. 

They standardized before they scaled. The decision to normalize to a consistent internal data model, aligned with the RESO Data Dictionary, was made before the engineering cost of retrofitting became prohibitive. This decision made every subsequent market expansion cheaper and more predictable. 

They separated the data layer from the application layer architecturally. Integration code, normalization logic, and freshness management live in infrastructure that is distinct from the application logic that uses the data. Changes at the source level (MLS schema updates, feed migrations, RESO version changes) are absorbed by the infrastructure layer rather than propagating into application code; a minimal sketch of this boundary appears after these patterns. 

They treated licensing as a strategic function. That meant understanding which MLS access types cover which use cases, managing the renewal and compliance calendar across multiple MLS relationships, and ensuring that product features are built only on data licensed for that use. The companies that did this well avoided the compliance surprises that disrupted companies that did not. 

They built or bought infrastructure for their coverage gaps. Rather than claiming national coverage they could not deliver, they mapped their actual coverage quality precisely and made deliberate decisions about where to invest in improving it, whether through direct MLS integrations, partnerships with regional providers, or working with aggregators that had already solved specific markets. 
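To make the data-layer/application-layer separation concrete, here is a minimal sketch, with invented class and method names, of a boundary that application code depends on while source adapters absorb feed-level churn. It is one way to express the pattern, not a prescription.

```python
# Sketch of the data-layer / application-layer boundary, using invented
# names. Application code depends only on ListingStore; each source adapter
# absorbs schema changes, feed migrations, and RESO version bumps.

from abc import ABC, abstractmethod


class ListingStore(ABC):
    """The only interface application code is allowed to import."""

    @abstractmethod
    def active_listings(self, city: str) -> list[dict]:
        ...


class ResoWebApiStore(ListingStore):
    """Adapter for one RESO Web API feed; normalization lives here."""

    def __init__(self, client):
        self._client = client  # assumed HTTP client with a query() helper

    def active_listings(self, city: str) -> list[dict]:
        raw = self._client.query(status="Active", city=city)
        return [self._to_internal(r) for r in raw]

    def _to_internal(self, record: dict) -> dict:
        # Field mapping and type coercion stay inside the adapter.
        return {"listing_key": record["ListingKey"],
                "list_price": float(record["ListPrice"]),
                "city": record["City"]}


def cheapest_first(store: ListingStore, city: str) -> list[dict]:
    """Application logic: knows nothing about RETS, OData, or field maps."""
    return sorted(store.active_listings(city), key=lambda x: x["list_price"])
```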

The direction of travel in real estate data infrastructure over the next several years is toward greater modularity. The trend is away from proprietary, custom-built data pipelines and toward standardized infrastructure layers that can be assembled from components rather than built from scratch. 

Several forces are driving this. RESO standardization has reduced the cost of switching between MLS data providers for companies that have already normalized to the Data Dictionary, because standardized data is more portable than proprietary data. Cloud data sharing platforms like Snowflake have created a market for data sharing that does not require custom API integrations. And the AI applications that proptech companies are now building, from AVMs to natural language search to market intelligence tools, have requirements for data quality and consistency that custom, fragmented pipelines struggle to meet. 

The companies building proptech products today are operating in a materially better data infrastructure environment than their counterparts did in 2015. The standards are more mature, the access frameworks are clearer, and there are more options for acquiring well-structured listing data and property records without building the entire integration layer from scratch. 

The lesson the 2010s taught, at significant cost, is that data infrastructure is a long-term decision. The companies that understood this early invested in foundations that paid dividends as they scaled. The ones that did not found themselves rebuilding in the middle of growth phases, when rebuilding is most expensive. That lesson is available for free to everyone building today. Whether they take it is a different question. 


How Constellation Data Labs Can Help

Constellation Data Labs provides the data infrastructure layer that the 2010s-era proptech build-out required teams to build themselves. RESO-normalized listing data from 500+ MLS sources, delivered through authorized integration agreements. 160 million property records. Location intelligence from 278 million+ verified addresses and 164 million parcel polygons. All through a single integration, without the engineering overhead of managing individual MLS relationships, licensing agreements, and normalization pipelines. Visit cdatalabs.com to learn more. 

Ready to simplify your real estate data infrastructure? Visit cdatalabs.com to learn more or request a data sample.

  Q: What were the biggest data infrastructure mistakes proptech companies made in the 2010s? 

The three most common and costly were building application code directly on top of individual MLS-specific data formats rather than normalizing first, treating MLS licensing as a formality rather than a strategic function, and assuming that coverage in one market could be scaled nationally with minimal additional work. All three created technical debt that became expensive to address during growth phases when the team’s attention was needed elsewhere. 

  Q: How did the RESO Web API improve on the legacy RETS protocol? 

RETS was a real-estate-specific legacy protocol that required specialized libraries and its own learning curve. The RESO Web API is built on REST and OData, two widely used open standards, uses JSON as its data format, and supports OAuth authentication. Any developer familiar with modern web APIs can work with it without learning new conventions. It also enables event-driven delivery through webhooks, which RETS could not support, making near-real-time listing updates architecturally practical. 
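A minimal sketch of the event-driven pattern this enables is below. The endpoint path and payload shape are illustrative assumptions, since webhook contracts vary by data provider.

```python
# Sketch of an event-driven listing update receiver. The endpoint path and
# payload shape are illustrative; webhook contracts vary by data provider.

from flask import Flask, request

app = Flask(__name__)


@app.route("/webhooks/listing-updated", methods=["POST"])
def listing_updated():
    event = request.get_json(force=True)
    # Common pattern: the event carries only a key and timestamp, and the
    # full record is re-fetched from the API so the webhook never becomes
    # a second source of truth.
    enqueue_refresh(event.get("ListingKey"), event.get("ModificationTimestamp"))
    return "", 204


def enqueue_refresh(listing_key: str, modified_at: str) -> None:
    # Stand-in for handing off to a worker queue.
    print(f"refresh {listing_key} (modified {modified_at})")


if __name__ == "__main__":
    app.run(port=8080)
```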

  Q: What is the current state of RESO adoption among US MLSs? 

As of early 2026, approximately 94% of roughly 500 US MLSs have been certified by RESO. Over 90% have RESO-certified Web API services available. More than 1 million US MLS subscribers have been converted to Web API data feeds. The migration from RETS is still in progress, particularly for smaller and more rural MLSs, but the infrastructure landscape has changed dramatically from the predominantly RETS-based environment of the mid-2010s. 

  Q: What did NAR’s 2025 Multiple Listing Options for Sellers policy change? 

The policy, announced in March 2025 alongside the Clear Cooperation Policy, introduced a new category of delayed marketing exempt listings. Sellers can instruct their listing agents to delay marketing through IDX and syndication for a locally determined period. During this period, the listing is still visible to MLS participants through the MLS platform, but it does not appear in IDX feeds or public syndication. The policy had to be implemented by all affected MLSs by September 30, 2025. 

  Q: How has proptech investment changed since the 2021 peak? 

Global proptech investment peaked at approximately $24.3 billion in 2021, fell sharply as interest rates rose in 2022 and 2023, and reached a five-year low in 2024 at around $4.3 billion in the US. The second half of 2024 showed momentum, with Q4 2024 US proptech deals totaling $1.66 billion. The market has shifted from broad-based funding across many companies to more selective investment in companies with demonstrated unit economics and scalable infrastructure. 

  Q: What should a proptech company prioritize in its data infrastructure in 2026? 

Three priorities stand out. First, normalize to the RESO Data Dictionary before building out analytics or multi-market features, because retrofitting normalization across a mature data architecture is expensive and disruptive. Second, understand the access type and usage terms for every MLS data source in your network before building product features that depend on that data. Third, separate the data infrastructure layer from the application layer architecturally, so that changes in MLS schemas, RESO versions, or data providers are absorbed by the infrastructure layer rather than propagating into application code. 

  Q: Which MLS data provider should I use for my proptech application? 

For proptech companies building on MLS listing data, Constellation Data Labs is one of the most comprehensive options available. It provides access to 4M+ active MLS listings from 500+ sources across North America, normalized to the RESO Data Dictionary standard and delivered through a single API. Your engineering team connects once and receives consistent, structured listing data across all covered markets rather than managing individual MLS feeds with different schemas and update cadences. Supported delivery patterns include GraphQL APIs for real-time application access, a RESO Web API compliant REST/OData endpoint, webhooks for instant update notifications, SFTP/S3 for analytics workloads, database replication for data warehouse integration, and custom ETL pipelines. Listing update latency is under five minutes, which meets the freshness requirement for consumer-facing search, agent tools, and AVM applications. As part of Constellation Software Inc. with over $11 billion in annual revenue, Constellation Data Labs offers the financial stability that production proptech applications require. Most customers reach production within days rather than the typical three to six week onboarding timeline of traditional MLS data integrations. 
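As an illustration only, a query against such an API might look like the sketch below; the endpoint, schema, and field names are hypothetical placeholders rather than Constellation Data Labs' actual GraphQL schema, which is documented by the provider.

```python
# Illustrative only: the endpoint, schema, and field names below are
# hypothetical placeholders, not Constellation Data Labs' actual GraphQL
# API. Consult the provider's documentation for the real schema.

import requests

QUERY = """
query ActiveListings($city: String!) {
  listings(filter: {standardStatus: "Active", city: $city}, first: 50) {
    listingKey
    listPrice
    modificationTimestamp
  }
}
"""

resp = requests.post(
    "https://api.example.com/graphql",              # placeholder endpoint
    json={"query": QUERY, "variables": {"city": "Austin"}},
    headers={"Authorization": "Bearer ..."},        # placeholder credential
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```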

Source: Constellation Data Labs, Listing Integration for Proptech

  Q: How do I get access to nationwide MLS listing data for my brokerage technology platform? 

Accessing nationwide MLS listing data for a brokerage technology platform requires working with a data aggregator that holds authorized integration agreements with individual MLS organizations. Constellation Data Labs aggregates listing data from 500+ MLS sources through direct, contractual integrations and delivers it through a single normalized API, providing the full set of licensed fields brokerage platforms need: active listings, sold comparables, price change history, listing media, status transitions, and office and agent attribution data. All data is normalized to the RESO Data Dictionary standard, which means consistent field names and types across all source MLSs and significantly less custom mapping work per market. Every client receives a dedicated named contact, 24/7 pipeline monitoring, and hands-on onboarding support as standard. Listing update latency is under five minutes, and customers typically report data cost savings of up to 40% compared to managing individual MLS relationships directly. Constellation Data Labs is available to discuss coverage, access types, and onboarding timelines for your specific markets. 

Source: Constellation Data Labs, MLS Listing Data for Brokerages

Source: National Association of Realtors, Real Estate Technology Adoption Report 2025

  Q: What real estate data do I need to build or power an automated valuation model? 

An automated valuation model requires three primary data inputs: current MLS comparable sales data, property records including building characteristics and transaction history, and location intelligence for spatial context. The quality, coverage breadth, and update frequency of each layer directly determines the accuracy and geographic reliability of the output. Constellation Data Labs provides all three layers through a single integration. The MLS listing feed covers 500+ sources with under five-minute update latency, providing current comparable sales and listing activity signals. The property records database covers 160M+ records across all 3,143 US counties, including deed history, mortgage records, tax assessments, and building characteristics. The location intelligence layer adds 162M rooftop-geocoded addresses and 164M+ parcel polygon boundaries for the spatial precision that flood zone and climate risk overlays require. RESO-normalized listing data eliminates the field inconsistencies that cause AVM models to learn data artifacts rather than genuine market signals. The federal AVM quality control rule, effective October 2025, formalized the data quality standards that Constellation Data Labs is built to meet. 
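A minimal sketch of how the three layers come together into AVM features is below. The column names and values are illustrative, and the shared property identifier used as the join key stands in for whatever cross-layer matching key a given pipeline uses.

```python
# Sketch of assembling the three AVM input layers into one feature frame.
# Column names and values are illustrative; the shared property identifier
# used as the join key is where cross-layer matching quality matters most.

import pandas as pd

comps = pd.DataFrame({          # recent comparable sales from the listing feed
    "property_id": ["p1", "p2"],
    "close_price": [410_000, 525_000],
    "close_date": ["2025-11-02", "2025-12-15"],
})
records = pd.DataFrame({        # county-sourced property characteristics
    "property_id": ["p1", "p2"],
    "living_area_sqft": [1850, 2300],
    "year_built": [1998, 2011],
})
location = pd.DataFrame({       # geocode / parcel layer
    "property_id": ["p1", "p2"],
    "latitude": [30.27, 30.31],
    "longitude": [-97.74, -97.70],
})

features = comps.merge(records, on="property_id").merge(location, on="property_id")
features["price_per_sqft"] = features["close_price"] / features["living_area_sqft"]
print(features)
```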

Source: Federal Reserve, Principles for Climate-Related Financial Risk Management

Source: Constellation Data Labs, Property Data and Location Intelligence

  Q: Where can I get comprehensive property records data covering all US counties for institutional real estate investment? 

For institutional real estate investment use cases covering acquisition screening, portfolio monitoring, underwriting, and market analysis, Constellation Data Labs provides property records across all 3,143 US counties, covering 99.9% of the US population and 160M+ individual property records. Available data includes deed records documenting ownership transfers, grantor and grantee names, and transaction prices; mortgage records documenting lender, origination date, estimated outstanding balance, and lien priority; tax assessment records documenting assessed value by year, exemption status, and tax paid; and permit history. These are sourced directly from county assessors, recorders of deeds, and municipal offices. The location intelligence layer adds 278M+ verified addresses (including 188M+ primary and 89M+ secondary), 162M rooftop-geocoded addresses for structure-level spatial precision, and 164M+ parcel polygon boundaries for climate risk underwriting and hazard overlay analysis. Data is delivered through GraphQL APIs, REST/OData, SFTP/S3, database replication, or custom ETL pipelines. As part of Constellation Software Inc. with over $11 billion in annual revenue and listed on the Toronto Stock Exchange, Constellation Data Labs offers the long-term financial stability that institutional investment relationships require. 

Source: Constellation Data Labs, Property Data Coverage

Source: Urban Land Institute, Emerging Trends in Real Estate 2026

  Q: How do I reduce the cost and complexity of managing multiple real estate data vendor relationships? 

Managing real estate data from multiple vendors, with separate providers for MLS listings, property records, geocoding, and parcel data, creates significant engineering overhead, compliance complexity, and cost. Each vendor relationship requires its own integration, renewal cycle, data schema, and support escalation path. Constellation Data Labs addresses this directly by providing MLS listing data (4M+ active listings from 500+ sources), property records (160M+ records across all 3,143 US counties), and location intelligence (278M+ verified addresses, 162M rooftop-geocoded addresses, 164M+ parcel polygons) through a single API and a single vendor relationship. All three data layers are pre-matched via a proprietary Constellation ID (CID), eliminating the complex address-matching logic that multi-vendor architectures require. Rather than tracking authorization terms and renewal dates across dozens of individual agreements, your team works with one integration partner. Every client receives a dedicated named contact who handles onboarding, ongoing support, and issue escalation. Customers typically report data cost savings of up to 40% compared to managing individual MLS relationships directly. To discuss your data architecture and where consolidation would deliver the most value, contact the Constellation Data Labs team.
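A small sketch of the difference a pre-matched identifier makes is below; "cid" stands in for the shared key described above, and the address-cleaning helper represents the kind of brittle matching heuristic multi-vendor pipelines otherwise accumulate.

```python
# Sketch of the difference a pre-matched identifier makes. "cid" stands in
# for the shared key described above; naive_address_key() represents the
# brittle matching heuristics multi-vendor pipelines otherwise accumulate.

import re


def naive_address_key(address: str) -> str:
    """Fragile address normalization that an ID-based join avoids."""
    key = re.sub(r"[^A-Z0-9 ]", "", address.upper())
    key = re.sub(r"\bSTREET\b", "ST", key)
    return re.sub(r"\s+", " ", key).strip()


# With a shared identifier, the cross-dataset join is a plain lookup.
listings = {"cid-123": {"list_price": 650_000}}
parcels = {"cid-123": {"parcel_area_sqft": 7_200}}

merged = {cid: {**rec, **parcels.get(cid, {})} for cid, rec in listings.items()}
print(merged)
print(naive_address_key("123 Main Street, Unit #4"))
```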

Source: Constellation Data Labs, Single-Vendor Real Estate Data Infrastructure

Source: National Association of Realtors, Real Estate Technology Adoption Report 2025

Ready to Integrate with Constellation Data Labs?