It’s been 8 years since Google declared “things, not strings” and unveiled its famous Knowledge Graph project. Now there are many large knowledge graph use cases in industry, managed by private firms for commercial purposes.
Private KGs are run by Google, Amazon, Microsoft, eBay, Facebook, Twitter, Uber, Lyft, and others, and some of these include billions of triples. Other industry use cases for KGs span a range of business verticals, including finance, pharma, and manufacturing. Overall, in terms of engineering the world’s collected knowledge, these are commonly cited as emerging components of AI.
On the more public side of this endeavor, several notable research projects, government programs, and community efforts provide analogues to the private, commercial KGs, albeit without quite the same richness or funding.
This month’s newsletter explores Private vs. Public KGs, then considers the Underlay project at MIT. To quote: “Powerful collections of machine-readable knowledge are growing in importance each year, but most are privately owned.”
What if a decentralized approach – dare we say, almost blockchain-ish in spirit – could provide verifiable components as the underpinnings for a large, public knowledge engineering effort? Some would argue that it’s important to have this kind of work underway, work that is not funded by advertising.
Others would point out that the late 20th century approach to crowdsourced knowledge – i.e., Wikipedia – has been fraught with bias and quality problems, and is perhaps not the best process for decentralizing shared infrastructure.
What do you think? Weigh in over on the Slack.
Google's Knowledge Graph
The most famous of the bunch: we use this KG every time we search with Google. The Google Knowledge Graph is a knowledge base that Google uses to organize related information and present a dynamic infobox next to the search results.
Diffbot
Using machine learning, Diffbot crawls and structures 98% of the public web, transforming the internet into accessible, structured data. Users can search for or extract anything on the web to mine data, discover relationships, and pull data directly into daily tools and workflows.
Wolfram Alpha
You can use Wolfram Alpha’s breakthrough algorithms, knowledge base and AI technology to compute expert-level answers on a variety of high-level topics: mathematics, science & technology, society & culture, and everyday life.
Scopus
Scopus uniquely combines a comprehensive, curated abstract and citation database with enriched data and linked scholarly content. You can use it to quickly find research, identify experts, and access reliable data, metrics, and analytical tools for research strategy decisions.
Bing Knowledge and Actions Graph API
Bing has over a billion entities (people, places, and things) in its KG, with over 21 billion associated facts, 18 billion links to key actions, and over 5 billion relationships between entities. Leveraging this asset, developers can meet their users’ information needs and help users perform searches in context, instead of leaving the app to search elsewhere.
Amazon Product Graph
The goal of the Amazon product graph is to structure all of the world’s information as it relates to everything available on their website. The graph uses product and non-product concepts to describe items and to form links between different entities. This allows customers to use greater variation in search terms when looking for items.
NIH MeSH
Medical Subject Headings (MeSH) RDF is a linked data representation of the MeSH biomedical vocabulary produced by the National Library of Medicine. It includes an RDF triple-store, a SPARQL endpoint and query editor, and a RESTful interface for retrieving MeSH data.
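As a sketch of how the SPARQL endpoint’s RESTful interface can be called from a program: the snippet below builds a query URL against NLM’s published endpoint. The example query and descriptor label are illustrative; consult NLM’s MeSH RDF documentation for the exact parameters and vocabulary terms.

```python
from urllib.parse import urlencode

# NLM's published SPARQL endpoint for MeSH RDF.
MESH_SPARQL = "https://id.nlm.nih.gov/mesh/sparql"

# Illustrative query: look up the descriptor whose English label is "Asthma".
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT ?descriptor
WHERE {
  ?descriptor a meshv:Descriptor ;
              rdfs:label "Asthma"@en .
}
"""

# Encode the query as a GET request URL; the endpoint can return JSON.
url = MESH_SPARQL + "?" + urlencode({"query": query, "format": "JSON"})
# Fetching with urllib.request.urlopen(url) would return the result set;
# the network call is omitted here to keep the sketch offline.
```

The same pattern works against any standards-compliant SPARQL endpoint, which is part of the appeal of publishing vocabularies as linked data.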
Wikidata
Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of Wikipedia (and other Wikimedia projects).
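To illustrate the “read by machines” part: each Wikidata item has a stable, machine-readable JSON representation. This sketch builds that URL for the conventional example item Q42 (Douglas Adams); the actual network call is left commented out so the sketch stays offline.

```python
from urllib.parse import quote

def entity_data_url(item_id: str) -> str:
    """Stable JSON representation of a Wikidata item via Special:EntityData."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{quote(item_id)}.json"

url = entity_data_url("Q42")
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     entity = json.load(resp)["entities"]["Q42"]
#     print(entity["labels"]["en"]["value"])
```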
EU Vocabularies
The EU uses controlled vocabularies to easily meet the translation requirements of its multilingual communication. Controlled vocabularies also allow for the easy harmonization of concepts to improve technical, business, institutional and inter-institutional communication. The related knowledge management and metadata are geared to a machine-readable environment, meant to improve dissemination/discovery, repurposing/reuse, and collection/merging of data across open and globally connected digital environments.
MIT Underlay
The Underlay is a global, distributed graph of public knowledge, designed to replicate the richness of private knowledge graphs in a public, decentralized manner. Powerful collections of machine-readable knowledge are growing in importance each year, but most are privately owned (e.g., Scopus, Google’s Knowledge Graph, Wolfram Alpha). The Underlay aims to secure such a collection as a public resource. It also gives chains of provenance a central place in its data model, to help tease out bias or error that can appear at different layers of assumption, synthesis, and evaluation. Initial hosts will include universities and individuals, such that no single group controls the content.
DBpedia
DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG), available to everyone on the Web. One can navigate this Web of facts with standard Web browsers or automated crawlers, or pose complex queries with SQL-like query languages (e.g., SPARQL).
- Ben Lorica interviewed Mayank Kejriwal about building knowledge graphs as a foundational component of AI applications, and also about knowledge graph embedding.
- Ben posted threads on Twitter and LinkedIn calling out contrasts between what Mayank described and what had been described in earlier episodes with Denise Gosnell (Chief Data Officer at DataStax), as well as in conversations with TigerGraph and Neo4j.
- Denise asked to have a follow-up discussion – we’re really looking forward to that!
- TigerGraph replied about issues regarding scale and use cases.
- It seems that a dialog within our community between graph database advocates and knowledge graph embedding experts is shaping up.
- Another KGC community member, Bob DuCharme (author of Learning SPARQL), added: “I had waited years to hear Ben mention ‘RDF’ on his podcast.”
- Nov 30 - Dec 2: Knowledge Connexions 2020
- October 23: Office Hours with Paco Nathan on KG/ML
- October 29: Knowledge Espresso with Jon Herke with students from Futurist Academy
- November 5: Knowledge Espresso with Michael Bronstein
- November 12: Knowledge Espresso with Aaron Bradley on The Emergence of the Content Graph
- And more: Keep up to date with the KGC Events Calendar.
"knowledge graph embedding"
An approach for mapping the entities and relations within a knowledge graph into low-dimensional vector spaces, to simplify manipulation of a KG while preserving its inherent structure. Often abbreviated as KGE. Think of factoring a large knowledge graph into smaller components, while preserving context among its entities and relations. Embeddings can be used to train machine learning models that generalize from a KG, for example to derive new knowledge from known facts (link prediction), disambiguate overlapping definitions (entity resolution), build question answering systems, etc. In general, uses of embeddings point toward improving AI reasoning systems and providing Explainable AI.
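As a concrete illustration of the idea, here is a minimal TransE-style sketch in Python. The 2-D embeddings are hand-picked toy values so the geometry works out; a real KGE model learns these vectors from training triples, typically in much higher dimensions.

```python
import math

# Toy embeddings: entities and relations live in the same 2-D vector space.
# These values are hand-crafted for illustration, not learned.
entities = {
    "Paris":   (1.0, 0.0),
    "France":  (1.0, 1.0),
    "Berlin":  (0.0, 0.0),
    "Germany": (0.0, 1.0),
}
relations = {"capital_of": (0.0, 1.0)}  # a translation shared by both capital pairs

def score(head: str, rel: str, tail: str) -> float:
    """TransE-style distance: how far head + rel lands from tail (lower = more plausible)."""
    h, r, t = entities[head], relations[rel], entities[tail]
    return math.hypot(h[0] + r[0] - t[0], h[1] + r[1] - t[1])

# Link prediction: which tail best completes (Paris, capital_of, ?)
best = min(entities, key=lambda t: score("Paris", "capital_of", t))
print(best)  # -> France
```

Because “capital_of” is the same translation vector for both Paris→France and Berlin→Germany, the model generalizes across those pairs – which is the essence of how embeddings support link prediction over a much larger KG.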