Public infrastructure for public good.
How Learning Tapestry spent a decade building the infrastructure that maps 1.85 million credentials across a $2.34 trillion ecosystem, then brought AI to transform how that data is collected.
The Problem: A Landscape No One Could See
There are 1.85 million credentials in the United States. Degrees, certificates, badges, licenses, apprenticeships, and micro-credentials are issued by 134,491 providers, forming a $2.34 trillion annual ecosystem that touches every career, every industry, every community.
Yet for most of recent history, there was no common organization or codification that would let people understand the ecosystem.
A high school counselor in Tennessee couldn’t compare a nursing certificate from a community college to an online badge from a healthcare platform. An employer in North Carolina couldn’t verify whether a job applicant’s credential met industry standards. A veteran transitioning from military service couldn’t map their training to civilian equivalents. The information existed, scattered across institutional websites, catalogs, and databases, but there was no common language to describe it and no shared registry to search it.
In 2016, Credential Engine was founded with a mission to change that: to create credential transparency by building a public registry and a shared vocabulary, the Credential Transparency Description Language (CTDL), that would make the entire credential landscape visible, searchable, and comparable for the first time.
But they needed someone to build the technology.
1.85 million credentials and 134,000 providers. Until someone built the infrastructure to map them, no one could see the whole picture.
The credential landscape wasn’t hidden. It was invisible, scattered across 134,000 websites with no shared language.
The Solution
Learning Tapestry’s involvement with Credential Engine goes back to the very beginning, when the Credential Transparency Initiative became Credential Engine. Steve Midgley was brought in to architect the core technology: the Credential Registry. As founder of Learning Tapestry (LT) and former Deputy Director of Educational Technology at the U.S. Department of Education, he had the policy insight and technical background the project demanded.
The Credential Registry is the API engine underneath Credential Engine’s public registry. It’s the infrastructure that stores, validates, and serves the linked open metadata describing every credential in the system, expressed in CTDL and serialized as JSON-LD, a W3C format for linked data that builds on the Resource Description Framework (RDF). Think of it as the database engine, the API layer, and the validation system all in one: the foundation that everything else in the Credential Engine ecosystem builds upon.
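To make the idea of a CTDL JSON-LD record concrete, here is a minimal TypeScript sketch of one credential record and a few structural checks a registry might run before accepting it. It assumes CTDL’s `ceterms:` property prefix, a language-map form for `ceterms:name`, the `@context` URL shown, and a CTID format of `ce-` followed by a lowercase UUID. The type list is a small illustrative subset, and `validateCredential` is a hypothetical helper, not the Credential Registry’s actual validation logic.

```typescript
// Illustrative shape of a CTDL credential record in JSON-LD.
// Property names follow CTDL's "ceterms:" prefix; which fields are required
// here is an assumption for this sketch, not the Registry's real schema.
interface CredentialRecord {
  "@context": string;
  "@type": string;
  "ceterms:ctid": string;
  "ceterms:name": { "en-US": string };
  "ceterms:description"?: { "en-US": string };
}

// Hypothetical subset of credential classes accepted by this sketch.
const CREDENTIAL_TYPES = new Set([
  "ceterms:Certificate",
  "ceterms:AssociateDegree",
  "ceterms:BachelorDegree",
  "ceterms:DigitalBadge",
  "ceterms:License",
]);

// Assumed CTID format: "ce-" followed by a lowercase UUID.
const CTID_PATTERN = /^ce-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

// Return a list of validation errors; an empty list means the record passed.
function validateCredential(record: CredentialRecord): string[] {
  const errors: string[] = [];
  if (!CREDENTIAL_TYPES.has(record["@type"])) {
    errors.push(`unsupported @type: ${record["@type"]}`);
  }
  if (!CTID_PATTERN.test(record["ceterms:ctid"])) {
    errors.push("ceterms:ctid must be 'ce-' followed by a lowercase UUID");
  }
  if (!record["ceterms:name"]?.["en-US"]?.trim()) {
    errors.push("ceterms:name must have a non-empty en-US value");
  }
  return errors;
}

const example: CredentialRecord = {
  "@context": "https://credreg.net/ctdl/schema/context/json", // assumed context URL
  "@type": "ceterms:Certificate",
  "ceterms:ctid": "ce-3f2a9c1e-7b4d-4e8a-9c6f-1a2b3c4d5e6f",
  "ceterms:name": { "en-US": "Certified Nursing Assistant Certificate" },
};

console.log(validateCredential(example)); // []
```

Returning a list of errors rather than throwing on the first one lets a publisher see every problem with a record in a single pass.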
Learning Tapestry designed the architecture and built the implementation. The team chose Ruby on Rails for the backend, PostgreSQL for the database, and Elasticsearch for search, all organized around a carefully designed data model.
The registry went live in December 2017 and has been open source from day one.
Along the way, the work extended into unexpected domains. For example, Learning Tapestry supported the Navy’s Ready Relevant Learning initiative through Credential Engine, helping map military training credentials to civilian equivalents for more than 75% of Navy enlisted ratings. The same infrastructure that helps a community college student in Tennessee find the right certificate also helps a Navy veteran find their next career.
The Innovation: Teaching Machines to Read Catalogs
By 2024, the Credential Registry had become essential infrastructure. States were publishing competency frameworks. Universities were registering degree programs. Workforce boards were mapping career pathways. But a fundamental bottleneck remained: getting credential data into the registry.
The problem was simple to describe and brutally hard to solve. There are 134,000 credential providers in the United States. Most of them publish credential information on their websites (course catalogs, program pages, and certificate descriptions) in formats designed for human readers, not machines. Extracting structured data from those pages and transforming it into CTDL format had been a manual, painstaking process. At that pace, mapping the full credential landscape would take decades.
Learning Tapestry’s answer was xTRA, the eXtensible Extract and Transformation Assistant, an open-source, AI-powered tool that automates the entire extraction pipeline. It uses large language models not just to read web pages, but to understand them: detecting content patterns, identifying credential data, and extracting structured fields that map directly to the CTDL schema.
The system works in three phases, combining LLMs, a headless browser crawler, and twelve specialized AI modules. First, an LLM analyzes a target website and creates a “recipe,” automatically detecting how the site is structured, where the catalog pages live, how pagination works, and what types of content each page contains. No per-site manual configuration is needed: the AI figures it out.
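The source does not publish xTRA’s recipe format, so the following is a hypothetical TypeScript sketch of what a recipe might contain and how a pipeline could treat the LLM’s answer as untrusted input. The field names (`rootUrl`, `pageType`, `linkSelector`, `pagination`) and the helpers `parseRecipe` and `listingUrls` are assumptions for illustration only.

```typescript
// Hypothetical recipe format: field names are illustrative, not xTRA's actual schema.
const PAGE_TYPES = ["catalog_index", "program_list", "course_detail", "credential_detail"] as const;
type PageType = (typeof PAGE_TYPES)[number];

type Pagination =
  | { kind: "none" }
  | { kind: "query_param"; param: string; lastPage: number }
  | { kind: "next_link"; selector: string };

interface Recipe {
  rootUrl: string;
  pageType: PageType;
  linkSelector?: string; // CSS selector for links to child pages, if any
  pagination: Pagination;
}

// LLM output is untrusted text: parse it and check every field before use.
function parseRecipe(raw: string): Recipe {
  const data = JSON.parse(raw);
  if (typeof data.rootUrl !== "string" || !/^https?:\/\//.test(data.rootUrl)) {
    throw new Error("rootUrl must be an http(s) URL");
  }
  if (!PAGE_TYPES.includes(data.pageType)) {
    throw new Error(`unknown pageType: ${String(data.pageType)}`);
  }
  if (data.linkSelector !== undefined && typeof data.linkSelector !== "string") {
    throw new Error("linkSelector must be a string");
  }
  const p = data.pagination ?? { kind: "none" };
  const paginationOk =
    p.kind === "none" ||
    (p.kind === "query_param" && typeof p.param === "string" && Number.isInteger(p.lastPage) && p.lastPage >= 1) ||
    (p.kind === "next_link" && typeof p.selector === "string");
  if (!paginationOk) throw new Error(`invalid pagination: ${JSON.stringify(p)}`);
  return { rootUrl: data.rootUrl, pageType: data.pageType, linkSelector: data.linkSelector, pagination: p };
}

// Expand query-parameter pagination into the listing-page URLs to visit.
// Other pagination kinds start from the root URL; "next_link" pages would be
// discovered by the crawler following the next-page selector.
function listingUrls(recipe: Recipe): string[] {
  const p = recipe.pagination;
  if (p.kind !== "query_param") return [recipe.rootUrl];
  const sep = recipe.rootUrl.includes("?") ? "&" : "?";
  return Array.from({ length: p.lastPage }, (_, i) => `${recipe.rootUrl}${sep}${p.param}=${i + 1}`);
}

const recipe = parseRecipe(
  '{"rootUrl":"https://example.edu/catalog","pageType":"catalog_index",' +
    '"linkSelector":"a.program-link","pagination":{"kind":"query_param","param":"page","lastPage":3}}'
);
console.log(listingUrls(recipe)); // catalog?page=1 through catalog?page=3
```

Validating the recipe before crawling means a malformed or hallucinated LLM answer fails fast instead of sending a crawler to the wrong pages.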
Second, a headless browser crawler executes the recipe, discovering pages and queuing them for processing. Third, specialized LLM modules extract structured entity data (courses, credentials, competencies, and learning programs) and generate bulk upload files ready to publish to the registry.
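As a rough illustration of the second phase, this TypeScript sketch walks a simplified recipe level by level and queues each discovered page with its expected type. `LinkExtractor` is a hypothetical stand-in for the headless browser (a real crawler would render each page before querying links), and `CrawlStep`, `crawl`, and the toy site map are assumptions, not xTRA code.

```typescript
// Stand-in for a headless browser: returns absolute URLs of links matching a
// CSS selector on the rendered page. Hypothetical interface for this sketch.
type LinkExtractor = (url: string, selector: string) => string[];

// One level of a simplified, hypothetical recipe.
interface CrawlStep {
  pageType: string;
  linkSelector?: string; // omitted on leaf pages
}

interface QueuedPage {
  url: string;
  pageType: string;
}

// Breadth-first crawl, one recipe level at a time, skipping URLs already seen
// so pages linked from several listings are queued only once.
function crawl(startUrl: string, steps: CrawlStep[], extractLinks: LinkExtractor, maxPages = 10_000): QueuedPage[] {
  const queue: QueuedPage[] = [];
  const seen = new Set<string>();
  let frontier = [startUrl];
  for (const step of steps) {
    const next: string[] = [];
    for (const url of frontier) {
      if (seen.has(url) || queue.length >= maxPages) continue;
      seen.add(url);
      queue.push({ url, pageType: step.pageType });
      if (step.linkSelector) next.push(...extractLinks(url, step.linkSelector));
    }
    frontier = next;
  }
  return queue;
}

// Toy site map standing in for real pages; nur101 is linked from two programs.
const site: Record<string, string[]> = {
  "https://example.edu/catalog": ["https://example.edu/programs/nursing", "https://example.edu/programs/welding"],
  "https://example.edu/programs/nursing": ["https://example.edu/courses/nur101", "https://example.edu/courses/nur102"],
  "https://example.edu/programs/welding": ["https://example.edu/courses/wld101", "https://example.edu/courses/nur101"],
};
const fakeExtract: LinkExtractor = (url) => site[url] ?? [];

const pages = crawl(
  "https://example.edu/catalog",
  [
    { pageType: "catalog_index", linkSelector: "a.program" },
    { pageType: "program_detail", linkSelector: "a.course" },
    { pageType: "course_detail" },
  ],
  fakeExtract
);
console.log(pages.length); // 6
```

Tagging each queued page with the type the recipe expects lets the third phase route it to the right extraction module.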
Twelve specialized AI modules handle everything from detecting catalog structures to verifying extracted data. When a page contains multiple credentials, the system splits the content first, because LLMs lose accuracy on longer documents. The result is a human-in-the-loop pipeline in which AI does the extraction and transformation and people review and publish, delivering a 10x productivity gain.
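The splitting step can be sketched in TypeScript as follows. This assumes the page has already been converted to Markdown-like text in which each credential begins with a `## ` heading; the function name `splitForExtraction` and the 4,000-character default are hypothetical choices, not xTRA’s actual rules.

```typescript
// Split page text into one chunk per "## " heading (assumed to mark the start
// of each credential), then break any oversized chunk at paragraph boundaries.
// A single paragraph longer than maxChars is kept whole rather than cut mid-text.
function splitForExtraction(pageText: string, maxChars = 4000): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  const flush = () => {
    const text = current.join("\n").trim();
    if (text) sections.push(text);
    current = [];
  };
  for (const line of pageText.split("\n")) {
    if (line.startsWith("## ")) flush(); // a new credential starts here
    current.push(line);
  }
  flush();

  const chunks: string[] = [];
  for (const section of sections) {
    if (section.length <= maxChars) {
      chunks.push(section);
      continue;
    }
    // Greedily pack paragraphs into chunks no longer than maxChars.
    let buf = "";
    for (const para of section.split(/\n\s*\n/)) {
      if (buf && buf.length + 2 + para.length > maxChars) {
        chunks.push(buf);
        buf = "";
      }
      buf = buf ? `${buf}\n\n${para}` : para;
    }
    if (buf) chunks.push(buf);
  }
  return chunks;
}

const page = "Intro text\n## Nursing Certificate\nLearn patient care.\n## Welding Certificate\nLearn MIG welding.";
console.log(splitForExtraction(page).length); // 3
```

Splitting on credential boundaries first keeps each LLM call focused on a single entity, and the paragraph-level fallback bounds input length for unusually long sections.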
Twelve specialized LLM modules detect catalog structures, classify pages, identify pagination, extract entity data, and verify results. The system supports GPT-5, GPT-4o, and other specialized models, and adapts to arbitrary website structures without manual configuration.
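One plausible kind of verification, sketched here as an assumption rather than a description of xTRA’s verifier, is a grounding check: every extracted value should actually appear in the source page, and anything that does not is flagged for a human reviewer. The `ExtractedCourse` shape and `verifyAgainstSource` helper below are hypothetical.

```typescript
// Hypothetical shape of an extracted course record.
interface ExtractedCourse {
  name: string;
  code?: string;
  creditHours?: number;
}

// Lowercase and collapse whitespace so layout differences don't cause false alarms.
const normalize = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();

// Return a list of problems: extracted values that cannot be found in the
// source text. An empty list means every checked field is grounded in the page.
function verifyAgainstSource(record: ExtractedCourse, sourceText: string): string[] {
  const source = normalize(sourceText);
  const problems: string[] = [];
  if (!source.includes(normalize(record.name))) {
    problems.push(`name not found in source: "${record.name}"`);
  }
  if (record.code !== undefined && !source.includes(normalize(record.code))) {
    problems.push(`code not found in source: "${record.code}"`);
  }
  if (record.creditHours !== undefined) {
    // Escape the decimal point so 3.5 matches "3.5" only, then require word boundaries.
    const hours = String(record.creditHours).replace(".", "\\.");
    if (!new RegExp(`\\b${hours}\\b`).test(source)) {
      problems.push(`credit hours ${record.creditHours} not found in source`);
    }
  }
  return problems;
}

const pageText = "NUR 101  Fundamentals of Nursing\n3 credit hours. Introduces patient care.";
console.log(verifyAgainstSource({ name: "Fundamentals of Nursing", code: "NUR 101", creditHours: 3 }, pageText)); // []
console.log(verifyAgainstSource({ name: "Advanced Nursing", creditHours: 4 }, pageText).length); // 2
```

A check like this cannot prove an extraction is correct, but it cheaply catches values the model invented, which is where human review time is best spent.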
The system never accesses personal information, only publicly available institutional data, so privacy is preserved. In an ecosystem handling 1.85 million credentials, working exclusively with public institutional data rather than personal information is a deliberate data governance choice, shaped by LT’s experience navigating privacy regulations across education technology.
The entire stack, from the Credential Registry (Ruby/Rails) to xTRA (TypeScript/React/Node), is Apache 2.0 licensed, copyright Learning Tapestry, Inc.
The Impact
The Credential Engine engagement is a partnership that has spanned three technical eras.
The first era was foundation: building the Credential Registry and the core infrastructure that would hold the credential data.
The second era was scale: as states and institutions began publishing to the registry, the system needed to grow. Strategic consulting agreements, annual maintenance, and infrastructure upgrades became the steady work of keeping critical public infrastructure reliable and evolving. The registry expanded to support competency frameworks, career pathways, quality assurance data, and transfer values. The team grew as new engineers joined the LT developers maintaining and extending the system.
The third era is intelligence: xTRA and AI-powered extraction, data deduplication, and, in 2026, a full registry re-architecture. The same partnership that started with building the foundation is now reimagining it.
Credential Engine’s 2025 Counting Credentials report found 1,850,034 unique credentials in the United States across seven categories, offered by 134,491 providers in an ecosystem worth $2.34 trillion annually. The number of digital badges offered has tripled since 2022, to more than 1 million, and the total number of badges issued has quadrupled to 320 million.
The registry that Learning Tapestry built and maintains is the infrastructure mapping this entire landscape. Tennessee, for example, has published 124+ competency frameworks. North Carolina has registered 2,754+ credentials and is expanding to regional partners. In 2024, the Gates Foundation awarded a two-year grant to expand linked open data in the registry. JPMorgan Chase, Microsoft, Google.org, the National Science Foundation, and the Lumina Foundation all fund the work that runs on LT’s code.
And xTRA is changing the pace. What once required manual extraction from thousands of institutional websites, each with its own structure, its own format, and its own idiosyncrasies, now happens through an AI pipeline that adapts to any site automatically. The 10x productivity claim isn’t marketing. It’s the measured difference between manual credential extraction and what twelve specialized LLM modules can do when they work together.
Learning Tapestry has been pivotal to our success, helping with their AI expertise and strategy to build [a] next-generation content collection service.
Jeanne Kitchens, Chief Technology Services Officer, Credential Engine
From laying the foundation of America’s credential registry to teaching AI to read a hundred thousand course catalogs, this is what happens when a technology team commits, evolves, and keeps solving problems.
References
- Credential Engine, “Counting Credentials 2025,” December 2025.
- Credential Engine, “About Us,” credentialengine.org.
- Credential Engine, “Credential Registry,” credentialengine.org.
- Credential Engine, “Credential Transparency Description Language (CTDL),” credentialengine.org.
- CredentialEngine/CredentialRegistry, GitHub repository, Apache 2.0 license, copyright Learning Tapestry, Inc.
- Credential Engine, “CTDL xTRA Tool,” credentialengine.org.
- CredentialEngine/ctdl-xtra, GitHub repository, Apache 2.0 license, copyright Learning Tapestry, Inc.
- U.S. Navy, “Ready Relevant Learning,” Naval Education and Training Command.
- Inside Higher Ed, “Over 1 Million Digital Badges Now Offered in the U.S.,” December 2025.
- Credential Engine, “New Funding Opportunities Focus on Equity, Jobs, Skills and Pathways,” July 2024.
- Credential Engine, “Lumina Foundation Renews Commitment to Credential Transparency,” July 2020.
1.85 million credentials needed thoughtfully designed and reliable infrastructure. How can we help you organize your data?
We build data infrastructure and AI-powered tools for organizations working at national scale.