Will travel’s messy data hamper AI progress?


Travel first met the web in 1995. The earliest pioneers—PC Travel and Internet Travel Network (which would later morph into GetThere)—ushered in what we imagined would be the golden age of self-service, transparency and frictionless search.

Expedia and Travelocity followed in 1996, heralding a tidal wave of direct consumer access. The dream? A truly open digital marketplace. The reality? We’re still chasing it. Just think, nearly 20 years earlier AC/DC had the right song title with “Dirty Deeds Done Dirt Cheap.”

Fast forward to today. Cloudflare—one of the internet’s most important sentries (its 2024 Year-in-Review offers one of the clearest lenses on the web’s health)—has just thrown a new spanner in the works. It has decided to block all artificial intelligence (AI) crawlers. No more free-for-all bot scraping. No more silent vacuuming of the world’s data to feed hungry models.
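To make the mechanics concrete, here is a minimal sketch, in Python, of the kind of user-agent filtering involved. The agent strings for GPTBot, CCBot and ClaudeBot are published by their operators; the handler shape is a generic assumption, since Cloudflare enforces its block at the network edge rather than in application code.

    # Minimal sketch: refuse requests whose user agent matches a known AI crawler.
    # The handler shape is an assumption; Cloudflare filters at the network edge.
    AI_CRAWLER_AGENTS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")

    def allow_request(user_agent: str) -> bool:
        """Return False for any request identifying as a known AI crawler."""
        return not any(bot in user_agent for bot in AI_CRAWLER_AGENTS)

    assert allow_request("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
    assert not allow_request("GPTBot/1.0 (+https://openai.com/gptbot)")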

For travel, this is both overdue and problematic.

The rise and fall of trust in travel data

Back in 1995, the data on display was designed for humans. We built web pages to be read, compared and pondered by travelers themselves. Data structures were messy, unstandardized and often inconsistent across suppliers, agencies and aggregators.

Fast forward to the age of bots and agentic AI, and that same human-centric data is now cannon fodder for machines. The result?

  • Bots misread, mis-categorize and misrepresent data originally formatted for people, not machines.
  • The consumer ends up worse off: more searches, less clarity.
  • The industry faces spiraling costs, trying to protect or parse what little trustworthy data remains.

And here’s the part that should truly concern us: This problem may never go away. The data wasn’t fit for machine consumption to begin with. And now? We’re layering more and more AI agents on top of an already broken foundation.

Cloudflare’s AI block: The canary in the coal mine

Cloudflare’s move to block AI bots should be a wake-up call. If even they—defenders of open, efficient web infrastructure—see AI crawling as a net harm, what does that tell us about the state of the internet?

In travel, where proprietary content is the rule rather than the exception, Cloudflare’s stance should prompt soul-searching:

  • Good: Content owners have a new shield against exploitation.
  • Bad: We still lack an easy, ethical way to consolidate data for legitimate search and comparison.

This isn’t a new conundrum. When Google launched large-scale crawling in the late ’90s, the travel industry erupted. Lawsuits were threatened. Walled gardens went up. Travel, as the first truly valuable commerce category on the internet, became the poster child.

Eventually, stability and norms emerged—but it took time, and scars remain. The residue? Today, you can’t trust any single site to give a complete view unless you pay for access to its data or feed it through a dozen layers of aggregation.

The standards we never built

One reason we’re in this mess? The travel industry’s historic refusal to build solid, machine-readable data standards fit for the modern age.

  • No unified framework for fares, ancillaries or schedules that machines can digest consistently.
  • No industry-wide governance of who gets to access what, how and at what cost.
  • No willingness to challenge the status quo because the large players profit from opacity.

Sure, we’ve had stabs at this—New Distribution Capability, One Order, various XML schemas. But let’s be honest: These efforts often felt more like protecting turf than solving problems.
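For contrast, here is a sketch of the kind of minimal, machine-digestible fare record the industry never agreed on. Every field name below is a hypothetical illustration, not part of NDC, One Order or any existing schema.

    # Hypothetical unified fare record; field names are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class Fare:
        origin: str                # IATA airport code, e.g. "LHR"
        destination: str           # IATA airport code, e.g. "JFK"
        total_price: int           # minor currency units (cents), avoiding float drift
        currency: str              # ISO 4217 code, e.g. "USD"
        ancillaries: dict[str, int] = field(default_factory=dict)  # ancillary name -> price

    # Two suppliers emitting this same shape could be compared by machines directly.
    fare = Fare("LHR", "JFK", 54200, "USD", {"checked_bag": 3500})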

AI to the rescue? Don’t bet on it

Agentic AI search is being sold as travel’s next great leap forward. But unless the underlying data improves, expect:

  • Exponential query growth without better outcomes. Bots asking more questions doesn’t mean travelers get better answers.
  • Rising costs. Every redundant crawl, every wasted API call, every bot-triggered search adds load—and someone pays.
  • Persistent consumer frustration. More effort, more noise, no clarity.

An insider at a large online travel agency recently confided: Tests of agentic AI shopping increased infrastructure costs tenfold—without moving the needle on conversions.

Who will lead?

Cloudflare, for all its flaws, took a stand. The travel industry? You can hear a pin drop.

  1. Where’s the initiative to create true AI-era standards for data sharing?
  2. Where’s the push for ethical, efficient search models?
  3. Where’s the leadership that says, “Enough chaos—let’s build trust”?

The status quo benefits the powerful. But it fails the traveler. And it’s unsustainable in the age of artificial general intelligence.

Also, consider these sidenotes:

1. The first web crawler (the World Wide Web Wanderer, 1993) indexed about 110,000 sites. Today, a single AI bot can scrape that many pages in under a minute—if Cloudflare lets it.

2. The original Googlebot crawled pages at a polite rate of one per second. Some AI scrapers now hit sites at hundreds per second, unless throttled.
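The difference between those two rates is easy to express in code. Here is a minimal sketch, using only Python’s standard library, of the one-request-per-second politeness the original Googlebot embodied; an aggressive scraper simply drops the sleep.

    # Sketch of a "polite" crawler pacing itself at one request per second.
    import time
    import urllib.request

    def polite_crawl(urls, delay_seconds=1.0):
        for url in urls:
            with urllib.request.urlopen(url) as response:
                response.read()            # fetch the page
            time.sleep(delay_seconds)      # the politeness budget aggressive scrapers skip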


