Rules of the Game
Paco Nathan
KGC Editor
Let’s take a tour through the rules of the game in knowledge graphs.

Uli Sattler of University of Manchester presented a lightning talk, “My vision for the Semantic Web” (see slides), in the “Vision” track at the 2020 International Semantic Web Conference (ISWC). To paraphrase, we’ve had 30+ years of description logics, which have taken off within domains that have knowledge-heavy applications. These systems – now embodied as Web Ontology Language (OWL), etc. – leverage inference based on rules. This provides means for representing domain expertise, such as rules for identifying compliance violations.

By comparison, the field of machine learning (ML) has exploded – with some use cases at enormous scale, on which virtually all of the tech giants rely. ML relies on training models from data – lots of data. Sometimes that data is biased, or stale, or completely out of sync with what’s important to the domain experts. Dr. Sattler hopes to see more combination of symbolic and neural approaches. If you have access to ISWC content, check out this recommended talk.

One widespread insight since about 2009 is that data science teams must leverage data and ML models plus domain expertise to be effective. Given current trends in knowledge graph embedding, graph neural networks, etc., some of that combination is occurring within the context of knowledge graphs. Quite frankly, more integration and combination of rule-based approaches could be happening.

In practice, having domain knowledge expressed as rules is what you need to evaluate and validate ML models. For example, that’s the notion of behavioral testing – well known in software engineering, although relatively recent in, say, natural language processing (e.g., with CheckList). This becomes especially important for large models with millions, billions, and now trillions of parameters, where more traditional means of “white box” software testing based on requirement lists aren’t quite feasible. The matter is compounded by the increasing risk of security issues related to ML models in deployment. Ben Lorica has been tracking these issues throughout 2020 with industry expert guests on The Data Exchange podcast. In the recent 2021 Trends Report: Data, Machine Learning, and AI by Ben Lorica, Mikio Braun, and Jenn Webb of Gradient Flow, the authors noted the gap between software engineering and machine learning practices, which MLOps begins to resolve. As the report describes, one large factor in that gap is the lack of testing, which they predict AI work will focus on during 2021.

Dan McCreary at Optum recently explored the topic in the article “Rules for Knowledge Graphs Rules” on Medium. Notably: are rules code or data? One can make a case for either, and neither is wrong. However, a key takeaway is:
Rules can be represented as graphs, and enterprise rules engines
can be enterprise knowledge graphs. Connecting rules is a big deal.

Dan goes on to enumerate a super helpful classification of the kinds of rules that we encounter in graphs, including:
  • constraint rules, such as Shapes Constraint Language (SHACL)
  • computed values
  • transformations, such as schema mapping
  • rules about workflows (business process)
  • inference – discovering new relations in a graph, such as Resource Description Framework Schema (RDFS), OWL, etc.
In addition, Dan explores the distinctions between deterministic rules (as in semantic technologies) and probabilistic rules (as in neural networks, statistical relational learning, Bayesian networks, etc.). Overall, this is an excellent article – highly recommended.

Speaking of SHACL, Veronika Heimsbakk at Capgemini led a popular hands-on tutorial about SHACL at Knowledge Connexions in December, along with a related presentation. A GitHub repo with slides, demo code and data is at shacl-master. A new iteration is currently in the works, and we’re delighted to have Veronika leading a SHACL tutorial at KGC 2021 in May, along with a related presentation about SHACL use in natural language processing (NLP) workflows.

To step back a moment, SHACL is the World Wide Web Consortium (W3C) recommendation for a shapes constraint language used to validate Resource Description Framework (RDF) graphs against a set of conditions. It lives within the deterministic branch of rules, based on Dan McCreary’s categories.

Why shapes? If you’re familiar with the concept of a data object (e.g., as used in relational databases), the analog for that in graphs is a shape. SHACL provides a way of describing shape constraints as rules that are readily understood by stakeholders. SHACL has a wide range of use cases, including graph validation, data quality checks, code generation, data integration, compliance audits, and so on. One interesting thing about SHACL is that when constraint rules get applied to an RDF graph, the exceptions can come back as a (smaller) RDF graph. This kind of functional transformation implies lots of interesting potential uses within workflows.

We could dive much deeper into the notion of shapes within a graph, how that ties into topology and other forms of analysis and mathematical transformations applied to graphs. We might even dive into the topic of shape prediction and where reinforcement learning can be applied to KGs – although that discussion is best to continue at KGC office hours or in an upcoming KG tutorial.

Probabilistic rules were the other branch that Dan McCreary described. For example, see libraries such as pgmpy, which includes Bayesian networks, Markov models, causal inference, etc. One category is called statistical relational learning (SRL) – essentially, knowledge graphs with uncertainty represented. In particular, one form of SRL is probabilistic soft logic (PSL) from Lise Getoor and her team at UCSC. PSL allows simple rules to describe expected relations in a graph, plus a means for estimating “truth” values (uncertainty) for either individual relations or an entire graph. PSL also has good properties in terms of computational cost. (Here’s a brief tutorial about using PSL, based on the kglab library.)
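For a flavor of what PSL rules look like: they read almost like weighted first-order logic. A sketch with invented predicates – the leading number is the rule’s weight (confidence), and `^2` squares the penalty for violating it:

```
10: WorksAt(X, Org) & WorksAt(Y, Org) & (X != Y) -> Colleagues(X, Y) ^2
1:  !Colleagues(X, Y) ^2
```

The second rule is a weak negative prior: absent other evidence, most pairs of people are assumed not to be colleagues. Inference then finds the “truth” values that best satisfy all the weighted rules at once.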

Where do we use probabilistic rules? For example, suppose you’ve just parsed a large number of documents with an NLP package such as spaCy, and now you want to add thousands of annotations from those documents into a knowledge graph. How do you check for data quality issues – i.e., what if some of those annotations are incorrect? It would be very expensive to have people go through each annotation manually to test its effects on the graph, so some form of machine learning and automation would help. PSL provides a good means of identifying the relations within a graph that appear less likely. In other words, out of a thousand new annotations added, which are the weirdest outliers that an expert should review manually? This approach can be extended into human-in-the-loop applications, identifying where to focus the human expert’s attention.
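The arithmetic behind that outlier scoring is simple enough to sketch in plain Python. This is a toy stand-in for PSL, not the PSL library itself, and the relations and truth values are invented. PSL uses Łukasiewicz logic, where a rule’s “distance to satisfaction” grows as its body becomes more true than its head:

```python
# Toy PSL-style scoring (illustration only, not the PSL library).
# Lukasiewicz AND: truth(a & b) = max(0, a + b - 1)
# A rule body -> head is violated by: max(0, truth(body) - truth(head))

def luk_and(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)

def violation(body: float, head: float) -> float:
    return max(0.0, body - head)

# Truth values for relations extracted by a (hypothetical) NLP pipeline
works_at = {("alice", "acme"): 0.9, ("bob", "acme"): 0.8}
colleagues = {("alice", "bob"): 0.1}  # suspiciously low truth value

# Rule: WorksAt(x, org) & WorksAt(y, org) -> Colleagues(x, y)
body = luk_and(works_at[("alice", "acme")], works_at[("bob", "acme")])
score = violation(body, colleagues[("alice", "bob")])

print(round(score, 2))  # 0.6 -- a large violation flags this triple for review
```

Annotations whose rules score the highest violations are the “weirdest outliers” to route to a human reviewer.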

Overall, a number of rule-based tools are available for work with knowledge graphs. Circling back to the earlier point by Uli Sattler, how can we combine these kinds of rule-based approaches within data science workflows? As an example, one could use SHACL constraint rules to identify well-known errors within a KG (deterministic), along with PSL rules to identify the likely anomalies (probabilistic). OWL inference and Simple Knowledge Organization System (SKOS) transitivities are other tools to add into that mix. These approaches are complementary in the sense that some are much more robust when working with uncertainty, while others lead to more formal, analytic solutions. See the figure below (from kglab docs), which shows these relative trade-offs among different rule-based approaches.

Overall, this range of rule-based techniques bodes well for the future of KG use cases and practices, and for the richness of KG workflows used in AI applications over the long term.

Meanwhile, check out KGC 2021 for talks by Dan McCreary and Veronika Heimsbakk, as well as a kglab tutorial – plenty about where rules meet KGs – and much, much more.
KGC 2021 Volunteer Opportunities

We still have a few more positions available for the May 2021 Knowledge Graph Conference. We’re looking for people to fill these roles:
  • Metadata Lead (approx. 7 hours per week)
  • Attendee Lead (approx. 5 hours per week)
  • Event Assistance (shifts during May 3-6 event)
Apply Here

Stay up to date with the KGC Events Calendar

Mark Your Calendars for
KGC 2021 on May 3-6!

Stay tuned for ticket sales, which will go live in March!
Have a question or knowledge to share about Knowledge Graphs? Check out our Q&A Board and see the latest hot topics in the KG Community.

Veronika Heimsbakk, whose work was mentioned in our lead article, posted some great sources for those who'd like to delve into the area of SHACL documentation.
Read It Here

Let us know what you're working on in Slack.
Copyright © 2021 Knowledge Graphs Conference, All rights reserved.
