The Knowledge Graph Cookbook - Book Review
1 December 2021
Dr. Moria Levy
"The Knowledge Graph Cookbook: Recipes that Work" is a book by Andreas Blumauer, written in collaboration with Helmut Nagy and published in 2020. The book delves into a field that is gradually gaining momentum at the intersection of artificial intelligence, the semantic web, knowledge management, and the graph-based representation of data and information.
At its core, the book revolves around the transformative power of knowledge graphs in converting data and information into actionable knowledge. Knowledge graphs are viewed not as a replacement for a knowledge management system but as an integral component of one.
The book encompasses a wide range of topics and provides insightful descriptions of various sectors where knowledge graphs can bring significant benefits. Additionally, it features an extensive appendix that includes interviews with influential opinion leaders and representatives from organizations that have successfully implemented knowledge graphs.
The book's content is comprehensive, catering to different needs, including the emerging field of explainable AI, which has captured the interest of many professionals today. For those seeking innovative approaches beyond traditional knowledge management, this book offers valuable insights into a new and intriguing field.
What are Knowledge Graphs?
Knowledge graphs are graph structures consisting of nodes and the relationships between them. They are constructed from data and serve as a foundation for creating knowledge. Because they can be visualized, knowledge graphs enable the extraction of new insights, making them powerful tools for understanding complex information. Often referred to as semantic networks, they depict connections between the terms that describe real-world entities: each term is attached to a node representing a specific entity, and the connections between nodes supply context that aids comprehension. Knowledge graphs can take different forms, such as individual graphs, which capture the characteristics of individuals, and conceptual graphs, which capture aspects of groups or broader concepts.
[Figure: example of an individual graph]
[Figure: example of a conceptual graph]
In 1976, John Sowa published his first research paper on conceptual graphs. In 1982, two mathematicians in the Netherlands invented the earliest versions of knowledge graphs. Today, the term "knowledge graphs" typically refers to software products that use business rules and artificial intelligence to generate graphs automatically from structured data. These tools also create business rules that align with the information represented in the graphs.
Components of the graph:
URI (Uniform Resource Identifier): A URI is the address that identifies each object within the graph. Facts about an object are managed as triples, each consisting of the item's address (the subject), a relationship name (the predicate), and either another item's address or a plain value such as a name (the object). For example: [Address] is associated with "Viennese schnitzel" or [Address] is part of [Other Address]. Together, these triples create a network of connections that forms the complete knowledge graph.
RDF triple store: This is the database that holds the computerized representation of the graph, with its triples expressed in RDF (Resource Description Framework).
Ontologies and Taxonomies: These are essential for maintaining order within an extensive graph and providing business relevance. They classify the contents and may include thesauri, which provide synonyms and connections between words.
Concepts: Concepts represent entities or business objects within the graph. They are interconnected schematically, and each concept has at least one name, with a preferred name always specified. For example, "beef" can be a concept within the knowledge graph.
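As a minimal sketch of how such triples combine into a graph, the following Python snippet keeps a handful of subject-predicate-object statements in a set and looks them up by pattern. The `http://example.org/` namespace, the predicate names, the "bovine meat" synonym, and the `match` helper are all hypothetical illustrations; they are not taken from the book or from any RDF library.

```python
# Hypothetical namespace; a real graph would use the organization's own URIs.
EX = "http://example.org/"

# Each triple is (subject URI, predicate URI, object URI or plain value).
triples = {
    (EX + "WienerSchnitzel", EX + "prefLabel", "Viennese schnitzel"),
    (EX + "WienerSchnitzel", EX + "isPartOf", EX + "AustrianCuisine"),
    (EX + "Beef", EX + "prefLabel", "beef"),
    (EX + "Beef", EX + "altLabel", "bovine meat"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Collect the preferred name of every concept that has one.
preferred_names = {s: o for s, p, o in match(predicate=EX + "prefLabel")}
```

Calling `match(subject=EX + "WienerSchnitzel")` returns both statements about that concept, which is the sense in which individual triples add up to a connected graph.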
Uses of Knowledge Graphs:
Knowledge graphs have numerous applications and benefits, including:
Enhancing understanding: Knowledge graphs provide insights into data and the relationships between components, improving comprehension and aiding in tasks like search and analysis.
Semantic search improvement: Knowledge graphs enhance semantic search accuracy and relevance, leading to better search results.
Personalized user experience: By leveraging the relationships within the knowledge graph, user experiences can be tailored and personalized to individual preferences.
Concise information presentation: Knowledge graphs enable the concise presentation of information, exemplified by the Google Knowledge Graph.
Analysis capabilities: Knowledge graphs support complex analyses, including deep text analysis, enabling the discovery of new knowledge and insights in areas like drug discovery and fraud detection.
Data quality improvement: Knowledge graphs help identify exceptions and anomalies, enhancing data quality by detecting deviations from expected behavior.
Automation initiation: Knowledge graphs can initiate process automation based on data, integrating with technologies like Robotic Process Automation (RPA).
Enhanced data governance: Knowledge graphs contribute to better data governance practices, enabling organizations to manage data more effectively and ensure compliance.
Support for machine learning: Knowledge graphs play a role in machine learning processes, supporting tasks like model training and data preparation.
Explainable AI: Knowledge graphs provide interpretability and transparency in machine learning systems, making AI outputs more understandable and trustworthy.
Decision support systems: Knowledge graphs enhance decision-making processes by improving understanding and explanatory capabilities.
Internet of Things (IoT) applications: Knowledge graphs give meaning to the vast amount of data collected from IoT sensors, creating a "Graph of Things" and enabling the development of Digital Twin models, instrumental in smart city management and other IoT applications.
Establishing a common semantic context: Knowledge graphs enable the creation of a shared semantic context within an organization's catalog, utilizing linked tags to establish connections and meaning.
Shared views: Knowledge graphs provide a comprehensive 360-degree view, facilitating cross-language data understanding, cross-system integration, and cross-organizational collaboration.
Application in the Organization:
The organization can benefit from collaborations with various partners who play key roles in leveraging knowledge graphs. The following potential partners are crucial for maximizing the value of knowledge graphs within the organization:
Director of IT/Computing: This stakeholder is essential for advancing the organization's AI strategy, managing governance, and addressing business needs using software tools.
Data Manager/Analytics Leader: They utilize knowledge graphs to extract valuable insights from organizational data, contributing to data-driven decision-making and generating meaningful analytics.
Artificial Intelligence Architect: This partner provides comprehensive solutions and implements the organization's AI application landscape, ensuring seamless integration with knowledge graphs.
Data/Information Architect: Responsible for harmonizing technologies and architectural frameworks, they enable effective data management and integration within the knowledge graph.
Data Engineer: They utilize knowledge graphs to understand their data better, facilitating efficient data handling processes.
Machine Learning Engineer: Leveraging models learned from knowledge graphs, they drive machine learning initiatives, integrating graph-based insights into AI models.
Knowledge Engineer/Metadata Expert: These partners play a crucial role in creating data-driven taxonomies and ontologies using knowledge graphs, organizing and structuring information effectively.
Content Expert: Domain specialists in specific content areas utilize knowledge graphs to comprehend and represent data models in their respective fields.
Data Scientist/Data Analyst: They leverage knowledge graphs to understand, analyze, and forecast insights from data, supporting evidence-based decision-making processes.
Business User: Business users receive tailored responses to their specific needs facilitated by knowledge graphs, benefiting from the capabilities without requiring deep knowledge of underlying tools and technologies.
Collaborating with these partners across different roles within the organization ensures the successful implementation and utilization of knowledge graphs, leading to enhanced business outcomes and data-driven practices.
Here are the recommended steps for introducing knowledge graphs into the organization:
- Assess the organization's technical and organizational maturity to ensure readiness for knowledge graph implementation.
- Select the specific domain or area where knowledge graphs will be applied.
- Define a series of experiments and pilots with limited scope to test the feasibility and potential benefits.
- Gather experience and insights from the experiments and pilots.
- Formulate an action strategy with well-defined goals and objectives for knowledge graph implementation.
- Implement an iterative deployment approach based on successful case studies and lessons learned.
- Manage change effectively throughout the integration process, addressing any challenges or resistance.
- Continuously evaluate and measure the success of the knowledge graph implementation.
- Explore and consider various integration options at an organization-wide level.
Application of Knowledge Graphs at Different Levels:
Semantic AI application: Utilizing knowledge graphs to enhance AI capabilities and semantic understanding.
Conceptual and linguistic models of ontology and taxonomy: Creating structured models to classify and categorize information.
Knowledge graphs based on various data types: Incorporating numerical data, documents, and other data types into the knowledge graph.
Content layer and underlying data: Integrating the knowledge graph with the underlying data sources and content management systems.
Stages in Semantic Modeling of Knowledge:
Differentiation of various types of entities: Identifying and distinguishing different types of entities in the knowledge graph.
Naming each entity type: Providing specific names or labels for each entity type.
Creation of facts and relationships between entities: Establishing connections and relationships between entities within the knowledge graph.
Classifying items into specific categories: Grouping items and entities into particular categories or classes.
Establishment of general facts and relationships across categories: Defining broader facts and relationships that span different categories.
Synonym creation across different languages: Creating synonyms and equivalents for entities in other languages.
Contextualization and separation of entities: Understanding the context and separating entities based on specific criteria or contexts.
Fusion of entities with similar characteristics: Combining entities with similar attributes to simplify the knowledge graph.
Mapping entities with identical characteristics across different graphs: Linking entities that share the same characteristics across different knowledge graphs.
Creation of new connections and inferences based on existing entities: Generating new links and drawing inferences based on the existing entities in the knowledge graph.
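Several of the stages above can be sketched in plain Python: differentiating entity types, naming entities, adding synonyms in several languages, and fusing two records that describe the same entity. The `Entity` class, the `fuse` helper, and the `ex:` identifiers are hypothetical and serve only as an illustration under simplified assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    uri: str           # hypothetical identifier for the entity
    entity_type: str   # stage: differentiate the types of entities
    labels: dict = field(default_factory=dict)  # language code -> set of names

    def add_label(self, lang, name):
        """Stages: name the entity and add synonyms per language."""
        self.labels.setdefault(lang, set()).add(name)


def fuse(a, b):
    """Stage: fuse two entities assumed to describe the same real-world thing."""
    if a.entity_type != b.entity_type:
        raise ValueError("this sketch only fuses entities of the same type")
    merged = Entity(a.uri, a.entity_type)  # keep the first URI as the identifier
    for source in (a, b):
        for lang, names in source.labels.items():
            for name in names:
                merged.add_label(lang, name)
    return merged


vienna_a = Entity("ex:Vienna", "City")
vienna_a.add_label("en", "Vienna")

vienna_b = Entity("ex:Wien", "City")
vienna_b.add_label("de", "Wien")

vienna = fuse(vienna_a, vienna_b)
```

The type check in `fuse` reflects the contextualization stage in a simplified way: two entities that share a name but differ in type, such as a person and a company, are kept apart rather than merged.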
The Life Cycle of Working with Knowledge Graphs:
User: Extracting, analyzing, visualizing, and interacting with the knowledge graph. This may also involve model training and refinement.
Machine: Ingesting, retrieving, transferring, enriching, and linking elements to create and maintain the knowledge graph.
Expert: Managing inventory, extracting insights, creating ontology and taxonomy structures, data cleansing, and linking entities and concepts across different knowledge graphs.
By following these recommended steps and understanding the different levels and stages of knowledge graph implementation, the organization can successfully adopt and utilize knowledge graphs to unlock the full potential of its data and information resources.
Several methodologies can facilitate the application of knowledge graphs across their components:
Card Sorting: This methodology identifies topics, assigns names, and categorizes information. It can be implemented using physical cards or dedicated software tools.
Taxonomy Management: These methodologies focus on ensuring data governance and process modeling within the organization, providing a structured framework for organizing and classifying information.
Ontological Management: The book explains that ontological management methods vary depending on the project, and no single supporting tool exists. Instead, it involves a collection of best practices, such as agile development, focus, and validation, to ensure the ontology is effectively designed and maintained.
RDFization: This process involves transforming structured data into RDF representation, the foundation of knowledge graphs. Different approaches, such as centralized or distributed methods, can be used for implementation, each offering advantages.
Text Mining: This methodology focuses on extracting structured information from unstructured data and representing it in RDF format. It includes tasks like entity extraction, content classification, and fact extraction to convert textual data into a structured layout compatible with knowledge graphs.
Entity Linking and Data Fusion: This stage involves integrating local products and various sources of information into the knowledge graph, linking entities across different datasets, and ensuring a comprehensive representation of knowledge.
Knowledge Graph Querying: Using SPARQL, the standard query language for RDF data, organizations can query the knowledge graph to retrieve specific information or insights. Users can do this directly or through APIs that enable programmatic access to the graph.
Constraint-Based Data Validation: This methodology focuses on identifying anomalous data by applying the business rules represented in the knowledge graph to the data that has been collected. It helps identify areas where the data does not meet the defined rules and constraints.
Inference in Graphs: By activating inference engines, organizations can derive new relationships or data within the knowledge graph, enabling automated reasoning and discovery of implicit knowledge.
Quality Measurement: This involves conducting qualitative assessments of various components of the knowledge graph, such as data coding, naming conventions, ontology and taxonomy planning, correctness, coverage, and performance. It ensures that the knowledge graph meets the required quality standards.
By leveraging these supporting methodologies, organizations can enhance the implementation and utilization of knowledge graphs in their operations, enabling effective knowledge management and deriving valuable insights from their data.
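Four of these methodologies (RDFization, querying, constraint-based validation, and inference) can be illustrated with a small sketch. It is written in plain Python under simplifying assumptions: tuples stand in for an RDF triple store, a wildcard pattern match stands in for a SPARQL query, a hand-written rule stands in for a constraint language, and a transitive closure stands in for an inference engine. The sample rows, the `ex:` names, and all function names are hypothetical.

```python
# Hypothetical structured source rows to be turned into triples.
rows = [
    {"id": "schnitzel", "name": "Viennese schnitzel", "part_of": "austrian_cuisine"},
    {"id": "austrian_cuisine", "name": "Austrian cuisine", "part_of": "european_cuisine"},
    {"id": "european_cuisine", "name": None, "part_of": None},
]


def rdfize(rows):
    """RDFization: map each non-empty column of a row to one triple."""
    triples = set()
    for row in rows:
        subject = "ex:" + row["id"]
        if row["name"]:
            triples.add((subject, "ex:name", row["name"]))
        if row["part_of"]:
            triples.add((subject, "ex:partOf", "ex:" + row["part_of"]))
    return triples


def query(triples, s=None, p=None, o=None):
    """Querying: pattern match where None is a wildcard (stand-in for SPARQL)."""
    return sorted(
        t for t in triples
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    )


def missing_names(triples):
    """Validation: report resources that break the rule 'every resource has a name'."""
    resources = {t[0] for t in triples} | {t[2] for t in triples if t[1] == "ex:partOf"}
    named = {t[0] for t in triples if t[1] == "ex:name"}
    return sorted(resources - named)


def infer_transitive(triples, predicate="ex:partOf"):
    """Inference: add implied triples until the predicate is transitively closed."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(inferred):
            for c, p2, d in list(inferred):
                if p1 == p2 == predicate and b == c and (a, predicate, d) not in inferred:
                    inferred.add((a, predicate, d))
                    changed = True
    return inferred


graph = rdfize(rows)
violations = missing_names(graph)
closed = infer_transitive(graph)
```

Here `violations` lists the one resource without a name, and `closed` contains the derived fact that the schnitzel is part of European cuisine, which was never stated in the source rows. In practice, an RDF triple store with SPARQL support and a dedicated validation and reasoning layer would replace these hand-written functions.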
Recommendations for Implementing Knowledge Graphs
To ensure the proper handling and utilization of knowledge graphs, the following recommendations are suggested:
Start Small and Scale: Begin with a small implementation and gradually expand. Employ agile methodologies to iterate and improve the knowledge graph over time.
Understand Structured and Unstructured Data: Gain a comprehensive understanding of structured and unstructured data sources that will be incorporated into the knowledge graph.
Consider Taxonomy Acquisition: Explore the option of acquiring ready-made taxonomies or creating them automatically from the data or with the assistance of domain experts. Evaluate the pros and cons to determine the most suitable approach for each content area.
Utilize Clear Metadata: Use clear and descriptive metadata suitable for human understanding and machine interpretation. Ensure that the metadata is formal, rich, and reusable.
Provide Context within the Knowledge Graph: Ensure that the knowledge graph provides context that distinguishes between entities with similar names, such as between a person and a company.
Design an Integrated Data Fabric: Move away from discrete repositories, data warehouses, or data lakes and instead adopt a data fabric approach. This fabric integrates data, provides semantic links, and caters to different user needs, including access to unstructured big data and structured, filtered information for data scientists and researchers.
Use Established Standards and Methods: Leverage familiar standards and methods for organizing knowledge to avoid reinventing the wheel and promote interoperability.
Focus on Business Objects: Direct the focus on the business objects discussed within the knowledge graph rather than solely on the graph itself. View the knowledge graph as a means of supporting visualization and enabling the entire knowledge management life cycle.
Measure Business Success: Assess the success of the knowledge graph implementation by evaluating tangible benefits such as time saved in locating information, improved integration of databases, and the new possibilities that arise from connecting data in novel ways.
In summary, knowledge graphs are a dynamic and successful field that transforms tacit knowledge into explicit, structured, and shared information. As knowledge managers, embracing and implementing knowledge graphs can unlock new possibilities and enhance knowledge management practices.