How Can You Trust Data in the AI Era?

Episode • May 16, 2024 • 27m

In this episode, Amir Bormand sits down with Harrison Tang, CEO and Co-founder of Spokeo, to explore a problem most people in AI, data, and digital identity overlook: entity resolution. Harrison unpacks how billions of fragmented data records are connected, how we determine what's true in a world of generated content, and why trust and privacy are becoming the new battlegrounds in tech.

They discuss the philosophical foundations of identity, the technical challenges of resolving entities at scale, and how GenAI complicates truth detection. If you're building in data, trust, or anything AI-related—this is required listening.

🧠 Key Takeaways:

Entity resolution is foundational to how we understand digital identity, but it’s far from solved, especially as GenAI-generated noise keeps growing.

Spokeo resolves 600M+ entities from 19B+ records, using distributed computing and multiple “criteria of truth” (consensus, authority, coherence, etc.).

Generative AI can create content but can’t verify it. It’s great for generating mock and test data, but not for discerning what is true.

The real challenge? Detecting fake content. Harrison breaks down the four pillars of trust: provenance, detection, governance, and education.

Privacy ≠ Security. Identity and access management (IAM) sits above entity resolution and is crucial for enforcing control over data.
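The episode doesn’t show Spokeo’s pipeline in code, so what follows is only a minimal Python sketch, under stated assumptions, of two ideas from the takeaways above. The first is blocking: grouping records by a cheap key so that only records in the same group are compared, instead of the n(n−1)/2 pairwise comparisons a naive approach needs. The second is the consensus criterion: picking the attribute value most sources agree on. The record layout, the last-name-plus-birth-year blocking key, the sample records, and every function name are hypothetical illustrations, not Spokeo’s method.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical record layout: (source, full_name, birth_year, city).
Record = tuple[str, str, int, str]
BlockKey = tuple[str, int]


def naive_pair_count(n: int) -> int:
    """Comparisons needed if every record is compared with every other: n choose 2."""
    return n * (n - 1) // 2


def blocking_key(rec: Record) -> BlockKey:
    """Assumed blocking rule (illustrative only): lowercase last name plus birth year."""
    _, full_name, birth_year, _ = rec
    return (full_name.split()[-1].lower(), birth_year)


def build_blocks(records: list[Record]) -> dict[BlockKey, list[Record]]:
    """Group records by blocking key; only records in the same block get compared."""
    blocks: dict[BlockKey, list[Record]] = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    return dict(blocks)


def candidate_pairs(blocks: dict[BlockKey, list[Record]]) -> list[tuple[Record, Record]]:
    """Pairwise comparisons within each block; blocks are independent, so they could run in parallel."""
    pairs: list[tuple[Record, Record]] = []
    for block in blocks.values():
        pairs.extend(combinations(block, 2))
    return pairs


def consensus(values: list[str]) -> tuple[str, float]:
    """Consensus criterion: the value most sources agree on, plus the share of sources agreeing."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


# Hypothetical sample data from four sources.
records: list[Record] = [
    ("source_a", "Jane Doe", 1980, "Pasadena"),
    ("source_b", "Jane Doe", 1980, "Pasadena"),
    ("source_c", "Jane A. Doe", 1980, "Los Angeles"),
    ("source_d", "John Smith", 1975, "Seattle"),
]

blocks = build_blocks(records)
pairs = candidate_pairs(blocks)
print(len(pairs), "candidate pairs vs", naive_pair_count(len(records)), "naive")

# Assume a pairwise matcher (not shown) linked all three Doe records to one entity;
# consensus then resolves the conflicting city values.
doe_cities = [rec[3] for rec in blocks[("doe", 1980)]]
print(consensus(doe_cities))
print(f"{naive_pair_count(19_000_000_000):.2e} naive comparisons at 19B records")
```

With these sample records, blocking yields 3 candidate pairs instead of the naive 6, and the consensus city for the Doe block is Pasadena, with 2 of 3 sources agreeing. At 19 billion records the naive count is roughly 1.8 × 10^20 comparisons, which is why large systems typically block first and spread the per-block work across machines. A real blocking key would also need to tolerate typos and name variants, which this exact-match key does not, and the other criteria (authority, consistency, coherence, correspondence) would need their own scoring.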

⏱️ Timestamped Highlights:

00:55 – What Spokeo does and the scale of its data

02:10 – What is entity resolution? Why it matters

04:10 – The challenge of 19B record comparisons

06:00 – Garbage in, garbage out: why data quality starts at ingestion

07:10 – The five criteria of truth: consensus, authority, consistency, coherence, correspondence

10:40 – Where GenAI helps (and fails) in entity resolution

13:00 – Can AI discern truth like a human? Harrison’s take on AGI skepticism

16:20 – The rise of fake data and the opportunity for Spokeo

18:15 – AI provenance, invisible watermarks, and content authenticity

21:00 – The four pillars of trust in the AI age

23:00 – How privacy impacts data workflows and IAM

25:30 – Why entity resolution sits at the foundation of identity systems

💬 Quote of the Episode:

“The problem of who we are has existed since the beginning of the human race. And in the digital world, that question is more important than ever.” — Harrison Tang

🔗 Resources Mentioned:

W3C Credentials Community Group – where Spokeo contributes on decentralized identity standards

Adobe Content Authenticity Initiative – cited as an effort to help identify AI-generated content through content provenance

Zero-shot prompting – prompting GenAI to generate realistic data from instructions alone, without supplying worked examples

🎯 Career Tips (from the episode):

While there wasn’t a dedicated segment on careers, Harrison did hint at a big opportunity area:

If you're in data or security, AI-generated fake content is a growing risk—and a career edge for anyone working on provenance, detection, and digital trust systems.
