Collecting data is good and important, but to be useful, it needs to be understandable.
For Lightcast to make sense of the global labor market and help our clients accomplish their missions, the billions of data points we collect need to be parsed and classified. In day-to-day conversation, those terms sound very similar, even interchangeable—but when we’re talking about labor market data, the differences matter.
What’s the Difference Between Parsing and Classification?
The short answer is that parsing can tell you what something says and classification tells you what it means.
In other words, parsing translates a piece of information into data the computer can use, essentially re-stating the information it receives. Classification goes beyond that: it takes the parsed output and connects it to categories that give it context.
Here’s what that means in practice.
Parsing and Classifying a Resume
Imagine this is the “Education” section of a candidate’s resume:
Iowa State University
Bachelor of Science in Technical Communication
Awards, Accomplishments, and Activities:
Dean’s List
Rodeo Club
Minor: Spanish
If we were to parse the resume, the program would sort through the text and look for the candidate’s degrees, then return a list of what it found, exactly as the candidate wrote it. The output would look something like this:
"educationDegrees": [
  {
    "name": "Bachelor of Science",
    "codes": [
      "Bachelor of Science"
    ],
    "specializations": [
      {
        "type": "major",
        "name": "Technical Communication"
      },
      {
        "type": "minor",
        "name": "Spanish"
      }
    ]
  }
]
Since this section of code only wants to identify degrees, it skipped over the other accomplishments and activities and gave us just the minor and major. Through parsing, these two pieces of data are now easily readable and ready to use in any program or situation where it would be helpful to know a candidate’s degree background.
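To make the parsing step concrete, here is a minimal Python sketch of how a program might pull degrees out of that Education section. It is an illustration only, not Lightcast's parser: the two regular expressions and the function name parse_education are assumptions made for this example, and a real parser would handle far more variation in how people write their degrees.

```python
import re

RESUME_EDUCATION = """\
Iowa State University
Bachelor of Science in Technical Communication
Awards, Accomplishments, and Activities:
Dean's List
Rodeo Club
Minor: Spanish
"""

# Hypothetical patterns for this sketch only.
DEGREE_LINE = re.compile(r"^(Bachelor|Master|Doctor) of (\w+) in (.+)$")
MINOR_LINE = re.compile(r"^Minor:\s*(.+)$")


def parse_education(text):
    """Return degrees exactly as written, skipping lines that are not degrees."""
    degrees = []
    for line in text.splitlines():
        line = line.strip()
        degree_match = DEGREE_LINE.match(line)
        if degree_match:
            level, field, major = degree_match.groups()
            name = f"{level} of {field}"
            degrees.append({
                "name": name,
                "codes": [name],
                "specializations": [{"type": "major", "name": major}],
            })
            continue
        minor_match = MINOR_LINE.match(line)
        if minor_match and degrees:
            # Attach the minor to the most recently found degree.
            degrees[-1]["specializations"].append(
                {"type": "minor", "name": minor_match.group(1)}
            )
    return {"educationDegrees": degrees}


parsed = parse_education(RESUME_EDUCATION)
```

Notice that the sketch never interprets the words it finds. "Technical Communication" comes back as "Technical Communication," and the Dean's List and Rodeo Club lines are simply ignored because they match neither pattern.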
That’s a useful tool—an employer might use this to automatically fill out a set of forms that will all ask for someone’s major and minor, or maybe to compare a few candidates’ degrees at a glance. Or a university might use this to extract skills, degrees, and job titles from the online profiles of their alumni.
How is classification different? It goes a step further.
“Technical Communication” is a degree offered at Iowa State, but this type of technical writing program is called by different names at different schools (and so are Spanish degrees). In order to compare this resume to the billions of others in the world, it needs to be standardized. That’s what classification does.
The National Center for Education Statistics uses CIP (Classification of Instructional Programs) codes to create a common understanding of degree programs. For Technical Communication, the code would be 23.1303, corresponding to “Technical and Business Writing.” For the Spanish minor, the relevant code is 16.0905, for “Spanish Language and Literature.”
So if we were to run the exact same resume through a classification program, the names of the degree programs would be standardized:
"educationDegrees": [
  {
    "name": "Bachelor of Science",
    "codes": [
      "Bachelor's degree"
    ],
    "specializations": [
      {
        "type": "major",
        "name": "Technical and Business Writing",
        "cipCode": "23.1303"
      },
      {
        "type": "minor",
        "name": "Spanish Language and Literature",
        "cipCode": "16.0905"
      }
    ]
  }
]
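One simple way to picture the classification step is as a lookup that enriches parsed output. The Python sketch below assumes a tiny, hand-written alias table (CIP_ALIASES and DEGREE_LEVELS are hypothetical names invented for this example, and the "technical writing" alias is an illustrative guess at an alternate program name). A real classifier would be far more sophisticated than exact matching, so treat this as a sketch of the idea rather than how any production system works.

```python
import copy

# Hypothetical alias table: program names as written (lowercased) mapped to
# the CIP code and standardized title used in this article.
CIP_ALIASES = {
    "technical communication": ("23.1303", "Technical and Business Writing"),
    "technical writing": ("23.1303", "Technical and Business Writing"),
    "spanish": ("16.0905", "Spanish Language and Literature"),
}

# Hypothetical mapping from degree names as written to a standard level.
DEGREE_LEVELS = {
    "bachelor of science": "Bachelor's degree",
    "bachelor of arts": "Bachelor's degree",
}

parsed = {
    "educationDegrees": [
        {
            "name": "Bachelor of Science",
            "codes": ["Bachelor of Science"],
            "specializations": [
                {"type": "major", "name": "Technical Communication"},
                {"type": "minor", "name": "Spanish"},
            ],
        }
    ]
}


def classify_education(parsed_output):
    """Return a copy of the parsed output with standardized names and CIP codes."""
    classified = copy.deepcopy(parsed_output)
    for degree in classified["educationDegrees"]:
        level = DEGREE_LEVELS.get(degree["name"].lower())
        if level:
            degree["codes"] = [level]
        for spec in degree["specializations"]:
            match = CIP_ALIASES.get(spec["name"].strip().lower())
            if match:  # leave unrecognized names untouched
                spec["cipCode"], spec["name"] = match
    return classified


classified = classify_education(parsed)
```

The function works on a copy, so the parsed output stays available exactly as the candidate wrote it while the classified version carries the standardized names and codes.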
These standardized categories open up far more possibilities for understanding this data. In a large dataset, parsed output alone would be very difficult to wade through: there are too many variations and nuanced differences, and each would need to be processed individually.
Classification simplifies that process and makes large datasets usable again because it creates consistency. If you can trust that one term will mean the same thing across different datasets, you can create connections and drive insight that would be impossible otherwise.
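A small Python sketch shows why that consistency matters when counting. The list of majors and the alias table below are made up for illustration (only the two CIP codes come from this article), and the exact-match lookup is a stand-in for whatever a real classifier does.

```python
from collections import Counter

# Hypothetical majors as written on five different resumes.
majors_as_written = [
    "Technical Communication",
    "Technical Writing",
    "technical communication",
    "Spanish",
    "Spanish Language",
]

# Hypothetical alias table mapping lowercased names to CIP codes.
CIP_ALIASES = {
    "technical communication": "23.1303",
    "technical writing": "23.1303",
    "spanish": "16.0905",
    "spanish language": "16.0905",
}

# Counting the parsed text treats every spelling as its own category.
raw_counts = Counter(majors_as_written)

# Counting by CIP code groups equivalent programs together.
classified_counts = Counter(
    CIP_ALIASES.get(name.lower(), "unclassified") for name in majors_as_written
)
```

Here raw_counts ends up with five separate categories of one resume each, while classified_counts has just two: three resumes under 23.1303 and two under 16.0905.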
Parsing and Classifying Jobs and Skills
Just like majors, jobs and skills need to be standardized in order to be understood. Individuals can come up with a list of skills on their own, and parsing can return that specific list. That might be useful for one person, for example to list those skills on a resume.
But when a company's HR department needs to understand which skills are present among its workforce, when a college or university needs to map the skills they teach to the skills employers value, or when workforce developers need to know what skills local workers will need in the future, the entire organization needs to be working from the same common language, and it needs to make sense in a broader labor market context. That’s where classification comes in.
So if someone wrote down on a resume or online profile that their skills include "data analysis," "economics," "sales," and "python," then parsing would be able to display those terms in a machine-readable format, generating output like competencyName: "data analysis". Classification then applies an additional layer of context and insight to those terms, connecting them into a system designed for processing and understanding the data, such as Lightcast Open Skills, where data analysis has the unique identifier "KS120GV6C72JMSZKMTD7."
If we were to classify the parsed text of competencyName: "data analysis", the name of the skill might not even change. The complete output might be something like:
{
  "competencyName": "Data Analysis",
  "competencyIds": {
    "value": "KS120GV6C72JMSZKMTD7"
  }
}
But even with the same wording, the classified output would be more useful than the parsed output on its own, because the classification process enriches the data by showing its context within the rest of the taxonomy—allowing it to speak the common language of skills.
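As a rough illustration, the Python sketch below attaches an identifier to each parsed skill. The SKILL_IDS table and the classify_skill function are hypothetical: only the "Data Analysis" identifier comes from this article, and the other three skills are deliberately left out of the table rather than given invented IDs. Setting competencyIds to None for an unmatched skill is also a choice made for this sketch, not a documented output format.

```python
# Hypothetical excerpt of a skills taxonomy. Only the "Data Analysis"
# identifier comes from the article; a real lookup would cover far more.
SKILL_IDS = {
    "data analysis": ("Data Analysis", "KS120GV6C72JMSZKMTD7"),
}


def classify_skill(parsed_skill):
    """Attach a standardized name and ID to a parsed skill, if one is known."""
    raw = parsed_skill["competencyName"]
    entry = SKILL_IDS.get(raw.strip().lower())
    if entry is None:
        # Unknown to this tiny table: keep the text as written, with no ID.
        return {"competencyName": raw, "competencyIds": None}
    name, skill_id = entry
    return {"competencyName": name, "competencyIds": {"value": skill_id}}


parsed_skills = [
    {"competencyName": skill}
    for skill in ["data analysis", "economics", "sales", "python"]
]
classified_skills = [classify_skill(skill) for skill in parsed_skills]
```

The identifier is what lets other systems connect this skill to the rest of the taxonomy, even though the wording barely changes.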
Job titles are especially prone to unclear wording, which makes classification particularly useful. When parsed output shows only the titles themselves, you lack the insight that a standardized taxonomy would provide. Classification fills the gap: instead of just showing the words “Senior Solutions Architect,” it could provide an O*NET code (15-1252.00) or a Lightcast Occupation Taxonomy identifier (23171543).
Putting Classification To Work
Classifying data gives it the context and meaning to be useful in big-data analysis—and it’s fundamental to the work Lightcast does as the leader in global labor analytics. Parsing lays the foundation for this insight by formatting information in such a way that it can be classified. It’s a two-step process: a parsed output is useful for whatever computer application you want to use it for, and at Lightcast, our use case is classifying that data in order to understand the labor market.
So how can you use classification to create understanding for your business, institution, or community? Our tools are ready to help.