The end of the industry filter

A few years ago, a client asked us last month for a list of healthcare companies.

We pulled it. It came back with 800 results. In the same CSV there were:

a chiropractor’s office
a health-tech startup
a manufacturer of MRI cooling parts
and a marketing agency focused on healthcare

Technically all healthcare. But not a list that could be used in any way.

The filter did exactly what we asked. We asked for healthcare. Everything in there was, in some loose sense, healthcare. The problem was that the unit of the question, healthcare, never meant what we needed it to mean.

I’ve been thinking about this for a few weeks now and the more I look at it, the more I think the entire shape of how we build target lists has been pointed at the wrong layer.

A taxonomy from 2003

The industry filter we all use, the dropdown in Apollo and ZoomInfo and Sales Navigator, is descended from a taxonomy that wasn’t built for sales. SIC codes and NAICS codes were built for government statistics. The contact databases came along in the 2000s, scraped public filings, layered on self-reported company descriptions, and started selling the taxonomy as if it told you who to sell to.

It never did. It told you what bucket the company would land in if a regulator had to count it. Which, for a sales motion, is the wrong question.

What actually happens when you search for “healthcare” - a database scans company descriptions and self-reported industry fields, and pattern-matches the word. Anything that touches the keyword gets pulled in. The result is search slop. You can spend a day deduping it and still not have a coherent list, because the underlying signal, this company describes itself as healthcare, was never the signal you wanted.

What you actually wanted was something like this company sells to hospital procurement teams. Or this company builds software that touches PHI. Or this company has a clinical advisory board. Those signals don’t live in a description field. They live in the company’s website, its job postings, its integrations, the people on its team, the case studies on its homepage.

For most of the last decade we couldn’t filter on those signals because the cost of computing them across millions of companies was too high. So we filtered on what the database had, which was the description. And we lost the rest.

What changed

A few things changed at once:

The first is that scraping the public web at scale became widely accessible. You can pull every page of every company’s website, every job posting, every press mention, in the time it takes to make coffee. The data was always there. The unit cost of looking at it dropped by two orders of magnitude.

The second is that you can now read what you pulled. Not parse it. A model can look at a company’s homepage and tell you whether they sell to procurement or to clinicians, whether they offer an API, whether they have a Slack integration, whether they have a podcast, whether their case studies are about hospitals or insurance companies. The kinds of judgments a human analyst would make in thirty seconds are now things you can run across a hundred thousand companies overnight.

The third is that the new data layer comes through APIs. You don’t open a database, click a filter, export a CSV, upload it somewhere else. You write a query, you get rows back, and the rows can be piped directly into whatever your campaign system is. The list and the campaign stop being separate steps.

None of these three is new on its own. The shift is that they finally line up.

The shape of a new query

Once those three things land together, the questions you can ask change by a lot.

You can ask: products with an API and a Slack integration but no Chrome extension, that have raised from Y Combinator or a16z’s speedrun. You can ask: wealth management firms whose VP of Financial Planning joined in the last 60 days, ranking high on Google for “financial advisors,” that don’t have a podcast. You can ask: manufacturing companies under NAICS 336 selling to European companies with 1,000+ employees.

Each of those is a specific worldview about who you sell to. Each one would have been impossible to build in 2020 without a research team and three weeks. Each one is now one API call.

The detail that matters most is that the filters operate on website-level signals, not description-level signals. They’re looking at what a company actually does, not at what someone wrote about it for a directory.

That’s a small-sounding distinction. It changes everything downstream.

The list stops being a list

For years, a target list was an artifact. You built it. You exported it. You uploaded it to a sequencer. You worked it. When it went stale, and it always went stale, you built another one. The whole motion treated the list as a frozen photograph of a market that had already started moving by the time you finished taking the picture.

When the list is one API call away, the list isn’t an artifact anymore. It’s a query. The query is what you maintain. The campaign sits on top of the query, and when the underlying world changes, when someone new joins as VP of Financial Planning, when a company ships an API, when a hiring pattern shifts, the list moves with it.

This is a quieter shift than “AI writing”, but I think it’s the bigger one. The bottleneck on outbound was never copy. The bottleneck was that the list was always a frozen photograph of something that had already changed.

If the list is a query, the campaign is a system that watches the world. The reps work the leads the system surfaces. The system gets better over time, not the copy.

What we’re still figuring out

I don’t want to oversell any of this. There are real limits.

The cost of running website-level enrichment across millions of companies is still not zero. You have to be careful about where you spend it. You can’t enrich the entire universe. You enrich the slice where the query already has signal. That means the old industry filter is still useful as a coarse first cut. You just don’t end the search there anymore.

There’s also a judgment problem. When the filter gets that specific, it’s easy to over-narrow. You can write a query so precise that it returns four companies, all of which are real, none of which are worth contacting. The art is in writing a query specific enough to mean something but broad enough to surface a list a team can actually work.

And the agents-running-the-queries part is still early. The lookalike filter taking a 5,000-character description works. But the muscle of writing a good description, knowing what to put in it and what to leave out, is the new skill we’re all about to need. The companies that get good at that will outcompete companies that just have a better database, because the database stopped being the differentiator a while ago.

I’ve been thinking about what the rep job looks like when the list is a query. I don’t have a clean answer yet. The shape of it, though, is that the highest-leverage thing a salesperson does is no longer running a sequence. It’s writing the right query.

The list, finally, gets to be the campaign.