As hands-on "knowledge professionals", knowledge engineers typically dive straight into solving practical, real-world problems and delivering real value in a variety of different ways. While they may specialize in individual knowledge domains, knowledge engineers are capable of performing a wide range of data-related tasks. As a result, the term "knowledge engineering" should be described as an umbrella term which can be broadly defined as follows:
This non-exhaustive definition highlights the breadth and variety of knowledge engineering tasks, each of which can be approached from many angles using a number of knowledge engineering techniques described below.
In practice, knowledge engineering relies on a number of tools, practices, and methodologies that can collectively be referred to as the "knowledge engineering toolkit". These diverse tools and approaches can be roughly classified into the following four categories:
vocabularies, terminologies, taxonomies, ontologies,
knowledge graphs and graph data repositories,
content repositories and knowledge bases,
data catalogs and metadata repositories.
In a way, these tools act as crystallization points for the different "tracks" of knowledge engineering, which are examined in more detail below.
Broadly speaking, ontologies, vocabularies, taxonomies, and terminologies define and categorize various resources, concepts, terms, and other types of information assets. In addition to defining class hierarchies, they may store custom relationships between those classes (concepts) and formally define various logical rules.
For example, taxonomies can be used to organize products in an online or physical store, directly or indirectly shaping all related business processes and software tools, while controlled vocabularies formalize the precise medical or legal terminology that facilitates structured data interchange.
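As a minimal sketch of what this looks like under the hood (using Python and the rdflib library, with every name invented for the example), a small product taxonomy and a custom relationship between its classes can be expressed directly as RDF/RDFS triples:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Hypothetical namespace for an illustrative retail ontology.
EX = Namespace("http://example.org/shop#")

g = Graph()
g.bind("ex", EX)

# A small class hierarchy: Laptop is a kind of Electronics, which is a Product.
g.add((EX.Product, RDF.type, RDFS.Class))
g.add((EX.Electronics, RDFS.subClassOf, EX.Product))
g.add((EX.Laptop, RDFS.subClassOf, EX.Electronics))

# A custom relationship between classes, with an explicit domain and range.
g.add((EX.compatibleWith, RDF.type, RDF.Property))
g.add((EX.compatibleWith, RDFS.domain, EX.Product))
g.add((EX.compatibleWith, RDFS.range, EX.Product))
g.add((EX.compatibleWith, RDFS.label, Literal("is compatible with")))
```

More expressive constraints and logical rules would typically be layered on top using OWL or SHACL.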
Vocabularies, ontologies, and terminologies can themselves be formally defined, serialized, and stored in a number of popular data formats. To formally define business ontologies, for example, knowledge engineers use software tools such as Protégé.
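To sketch what serialization looks like in code (again with rdflib and invented names; exact return types vary between rdflib versions), the same graph can be written out in several standard RDF formats:

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/shop#")

g = Graph()
g.bind("ex", EX)
g.add((EX.Product, RDF.type, RDFS.Class))
g.add((EX.Laptop, RDFS.subClassOf, EX.Product))

# One ontology, several standard serializations.
print(g.serialize(format="turtle"))   # Turtle
print(g.serialize(format="xml"))      # RDF/XML
print(g.serialize(format="json-ld"))  # JSON-LD (bundled with rdflib 6+)
```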
By unifying, integrating, and serving knowledge at scale, knowledge graph technologies both support fact-checked, data-driven decision making and contribute to the rise of "know-it-all" intelligent agents. Defined as collections of data that capture certain areas of knowledge and can be represented as graphs, knowledge graphs typically adhere to formal ontologies, which define the kinds of entities and relationships that can exist in the graph and formally describe their properties.
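A minimal sketch of this schema/instance split, with every name invented for illustration: the ontology declares which entity types and relationships may exist, and the graph's entities are then typed against it.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Hypothetical namespaces: one for the ontology (schema), one for instance data.
EX = Namespace("http://example.org/schema#")
DATA = Namespace("http://example.org/data#")

g = Graph()

# Schema level: the ontology defines the kinds of entities and relationships.
g.add((EX.Person, RDF.type, RDFS.Class))
g.add((EX.Company, RDF.type, RDFS.Class))
g.add((EX.worksFor, RDF.type, RDF.Property))
g.add((EX.worksFor, RDFS.domain, EX.Person))
g.add((EX.worksFor, RDFS.range, EX.Company))

# Instance level: entities in the knowledge graph conform to the ontology.
g.add((DATA.alice, RDF.type, EX.Person))
g.add((DATA.acme, RDF.type, EX.Company))
g.add((DATA.alice, EX.worksFor, DATA.acme))
g.add((DATA.alice, RDFS.label, Literal("Alice")))
```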
Knowledge engineering professionals who develop and maintain large-scale knowledge graph systems are typically charged with creating semi- or fully automated ways of ingesting data into the graph, performing near-real-time updates, and closing its knowledge gaps. Commonly, the data needs to be extracted from a variety of unstructured sources, such as web pages, rich text documents, scanned pages, and media resources, and transformed into graph data before it can be pushed to the knowledge graph repository. Even when the data provided by data and content partners already comes in some structured format, it may still need to undergo schema and ontology matching, entity linking, and record deduplication. From this perspective, the job of a knowledge engineer largely overlaps with that of a software engineer, data scientist, data engineer, or machine learning/AI engineer.
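A heavily simplified sketch of one such ingestion step, using spaCy for named-entity recognition and rdflib on the graph side (the kg: namespace and the naive IRI-minting scheme are invented for the example; a production pipeline would add entity linking and record deduplication):

```python
import spacy
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Hypothetical namespace for the target knowledge graph.
KG = Namespace("http://example.org/kg#")

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def text_to_triples(text: str) -> Graph:
    """Extract named entities from unstructured text and emit typed graph nodes."""
    g = Graph()
    g.bind("kg", KG)
    for ent in nlp(text).ents:
        # Mint a crude IRI from the surface form; a real pipeline would link
        # each mention to an existing graph entity instead.
        node = KG[ent.text.replace(" ", "_")]
        g.add((node, RDF.type, KG[ent.label_]))  # e.g. kg:PERSON, kg:ORG
        g.add((node, RDFS.label, Literal(ent.text)))
    return g

triples = text_to_triples("Tim Berners-Lee founded the World Wide Web Consortium in 1994.")
print(triples.serialize(format="turtle"))
```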
In an enterprise setting, knowledge engineers concerned with content management commonly pursue the goal of enabling employees to easily search for and retrieve the information they need, such as internal documents, database fields, and CRM data, from both the company's internal and third-party data repositories. In some cases, a knowledge engineer may be involved in developing custom multilingual solutions that combine natural language understanding, machine reasoning, and semantic representation to allow end users to ask complex questions against enterprise content repositories.
Enterprise data catalogs make data assets accessible to everyone in the organization who needs them to make business decisions. The catalogs themselves are commonly implemented as searchable datasets that store metadata about the indexed data assets. By connecting and indexing all of the organization's data assets, such metadata repositories facilitate efficient dataset management and knowledge discovery.
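Stripped to its essence, and with everything below invented for illustration, a data catalog is a searchable collection of metadata records describing the underlying assets:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """Illustrative metadata record for a single data asset."""
    name: str
    owner: str
    description: str
    tags: list[str] = field(default_factory=list)

# A toy catalog; real implementations back this with a searchable index.
catalog = [
    DatasetMetadata("crm_contacts", "sales-ops", "Customer contact records", ["crm", "pii"]),
    DatasetMetadata("web_clickstream", "analytics", "Raw site clickstream events", ["web", "events"]),
]

def search(query: str) -> list[DatasetMetadata]:
    """Naive keyword search over names, descriptions, and tags."""
    q = query.lower()
    return [m for m in catalog
            if q in m.name.lower()
            or q in m.description.lower()
            or any(q in t for t in m.tags)]

print([m.name for m in search("crm")])  # -> ['crm_contacts']
```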
Knowledge engineers come from backgrounds ranging from software engineering to pure mathematics, statistics, and the humanities. While there is no well-defined career path for knowledge engineers, there are some core competencies inherent to the knowledge engineering practice, including:
experience using and developing formal ontologies and vocabularies with tools such as Protégé,
knowledge of the Semantic Web stack and standards (e.g. RDF, RDFS, OWL, SHACL, SKOS),
understanding of querying methods and tools (e.g. SPARQL, Cypher, Gremlin; see the sketch after this list),
experience with graph databases and RDF frameworks (e.g. Neo4j, Apache Jena, Amazon Neptune),
understanding of data wrangling, text mining, structured data extraction, and topic modeling techniques, as well as the programming languages and libraries used in data science (e.g. scikit-learn, NumPy, pandas, spaCy).
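To make the querying item above concrete, here is a minimal SPARQL SELECT query run with rdflib against a few invented triples:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/schema#")
DATA = Namespace("http://example.org/data#")

g = Graph()
g.add((DATA.alice, RDF.type, EX.Person))
g.add((DATA.alice, EX.worksFor, DATA.acme))
g.add((DATA.alice, RDFS.label, Literal("Alice")))

# Find the labels of all persons in the graph.
query = """
    PREFIX ex: <http://example.org/schema#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        ?person a ex:Person ;
                rdfs:label ?label .
    }
"""
for row in g.query(query):
    print(row.label)  # -> Alice
```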
Overall, knowledge engineering encompasses multiple areas of expertise that all deal with "computable knowledge". As a result, the responsibilities of an individual knowledge engineer can vary from implementing ETL pipelines to designing business processes related to knowledge sharing within an organization. To tackle all the challenges involved in the knowledge engineering process, a knowledge engineer needs to be equipped with a wide variety of data wrangling, software engineering, and project ownership skills.