Area 3 provides definitions for the glossary. This enables the definition of meanings and the relationships between different types of terminology. Most definitions are created manually; however, they may also be created in a different tool and replicated automatically into other metadata repositories. There can be multiple glossaries in the metadata repositories. Each glossary owns a set of glossary terms and (optionally) a category hierarchy. Glossary terms can be linked into none, one or many categories, from any glossary. Similarly, terms from different glossaries may have relationships linking them together.
Figure 1 shows the packages for the glossary.
Figure 1: Packages for area 3 - the glossary
- 0210: Top-level Glossary object definition.
- 0220: Glossary Categories and their relationships.
- 0230: Glossary Terms and their relationships to Glossary and Category.
- 0240: Semantic classifications for glossary terms
- 0250: Semantic relationships between Glossary Terms
- 0260: Defining contexts in the glossary
- 0270: Semantic assignments for metadata entities
- 0280: Structuring relationships between Glossary Terms
Each package is defined in its own model file, <package-name>.json, which will sit in the addons/model directory of the Atlas build tree if Atlas becomes the open metadata platform.
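If the model files follow the usual Apache Atlas typedef JSON layout (top-level `enumDefs`, `structDefs`, `classificationDefs`, `entityDefs` and `relationshipDefs` arrays, each entry carrying a `name`), a short script can list what a package file declares. This is a sketch under that assumption; the function names are hypothetical, and the layout should be checked against the actual model files.

```python
import json
from pathlib import Path

# Top-level sections assumed to appear in an Atlas typedef (AtlasTypesDef) JSON file.
SECTIONS = ("enumDefs", "structDefs", "classificationDefs",
            "entityDefs", "relationshipDefs")


def summarize_model(text: str) -> dict[str, list[str]]:
    """Return the type names declared in each section of a model file."""
    doc = json.loads(text)
    # Sections that are absent from the file are reported as empty lists.
    return {section: [typedef["name"] for typedef in doc.get(section, [])]
            for section in SECTIONS}


def summarize_model_file(path: Path) -> dict[str, list[str]]:
    """Read one <package-name>.json model file and summarize it."""
    return summarize_model(path.read_text(encoding="utf-8"))
```

For example, calling `summarize_model_file(Path("addons/model/<package-name>.json"))` on a glossary package would show which entity, classification and relationship types that package contributes.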
A metadata repository may contain many glossaries, particularly when it is part of a bigger enterprise cohort of repositories. Each glossary may come from a specific team or external organization. Or it may be focused on a particular topic or set of use cases. Figure 2 shows how a glossary is defined.
Figure 2: The glossary object provides the anchor point for the glossary content
The anchor for each glossary is the Glossary object.
The classifications associated with the glossary object are used to document the type of vocabulary it contains and its purpose:
- Taxonomy - A Taxonomy is a glossary that has a formal structure. Typically the terms have been organized into a category hierarchy that reflects their meaning or use. There may also be term relationships that form part of the hierarchy. Taxonomies are often used to organize documents and other media in content repositories.
- Canonical Vocabulary - this glossary provides the standard vocabulary definitions for an organization. Typically terms from other glossaries are linked to terms from the canonical glossary.
These classifications are independent of one another, so a Glossary object may have none, one or both of these classifications attached. In addition, a glossary may be linked to an external glossary:
- ExternallySourcedGlossary - The content from this glossary comes from an external source. It may be, for example, an industry-specific glossary, or a glossary from a standards body, an open data site or a commercial organization.
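To illustrate how these classifications combine, the following sketch models a Glossary object in memory. The Python class names and the `CanonicalVocabulary` spelling are assumptions for this example, not the Atlas type definitions, and the external glossary is represented simply by its name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class GlossaryClassification(Enum):
    """Classifications that may be attached to a Glossary object."""
    TAXONOMY = "Taxonomy"
    CANONICAL_VOCABULARY = "CanonicalVocabulary"


@dataclass
class Glossary:
    name: str
    # The classifications are independent: none, one or both may be attached.
    classifications: set[GlossaryClassification] = field(default_factory=set)
    # Name of the external glossary linked by ExternallySourcedGlossary, if any.
    externally_sourced_from: Optional[str] = None

    def is_taxonomy(self) -> bool:
        return GlossaryClassification.TAXONOMY in self.classifications

    def is_canonical(self) -> bool:
        return GlossaryClassification.CANONICAL_VOCABULARY in self.classifications


# A glossary classified as both a taxonomy and the canonical vocabulary.
enterprise = Glossary("Enterprise", {GlossaryClassification.TAXONOMY,
                                     GlossaryClassification.CANONICAL_VOCABULARY})
```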
The vocabulary for the glossary is organized into a hierarchy of categories. These categories effectively provide a folder structure for the glossary. Figure 3 shows the definition for a glossary category.
Figure 3: The glossary category and its hierarchy
- GlossaryCategory represents a category in a glossary.
- CategoryAnchor links each category to exactly one Glossary object, which is its home glossary. If the Glossary object is deleted, then so are all of the categories linked to it.
- CategoryHierarchyLink is a relationship used to organize categories into a hierarchy, for example, to create a structure for a taxonomy. A category may have zero or one super-category; however, this super-category may be in a different glossary.
- SubjectArea is a classification for a category that indicates that the category represents a subject area.
- LibraryCategoryReference provides reference information for how this category corresponds to a category in an external glossary.
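The anchoring and hierarchy rules can be sketched with a small in-memory store. Because each category has a single `parent` field, it can have at most one super-category, which may belong to another glossary. The class names are hypothetical, and the text above does not say what happens to a hierarchy link whose super-category is deleted; this sketch assumes the link is simply dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare categories by identity
class GlossaryCategory:
    name: str
    glossary: str                                # CategoryAnchor: the single home glossary
    parent: Optional["GlossaryCategory"] = None  # CategoryHierarchyLink: zero or one super-category


class CategoryStore:
    def __init__(self) -> None:
        self.categories: list[GlossaryCategory] = []

    def add(self, name: str, glossary: str,
            parent: Optional[GlossaryCategory] = None) -> GlossaryCategory:
        category = GlossaryCategory(name, glossary, parent)
        self.categories.append(category)
        return category

    @staticmethod
    def path(category: GlossaryCategory) -> list[str]:
        """Names from the top of the hierarchy down to this category."""
        names = []
        node: Optional[GlossaryCategory] = category
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def delete_glossary(self, glossary: str) -> None:
        """Delete every category anchored to the glossary (CategoryAnchor rule)."""
        self.categories = [c for c in self.categories if c.glossary != glossary]
        # Assumption: surviving categories lose their link to a deleted super-category.
        for c in self.categories:
            if c.parent is not None and c.parent.glossary == glossary:
                c.parent = None
```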
The vocabulary for the glossary is documented using terms. Each term represents a concept or short phrase in the vocabulary. Just like a category, a term is owned by a glossary but can be linked into a category from any glossary. Figure 4 shows the glossary term.
Figure 4: Terms
- GlossaryTerm represents a term in a glossary.
- TermAnchor links each term to exactly one Glossary object, which is its home glossary. If the Glossary object is deleted, then so are all of the terms linked to it.
- TermCategorization is a relationship used to organize terms into categories. A term may be linked with many categories and a category may have many terms linked to it. This relationship may connect terms and categories both in the same glossary and in different glossaries.
- LibraryTermReference provides reference information for how this term corresponds to a term in an external glossary.
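Because TermCategorization is many-to-many and may cross glossary boundaries, it is convenient to index it in both directions. In this sketch, terms and categories are identified by hypothetical (glossary name, element name) pairs, and the deletion method applies the TermAnchor and CategoryAnchor rules by dropping every link to an element of a deleted glossary.

```python
from collections import defaultdict

# Hypothetical identifiers: (glossary name, term or category name).
TermId = tuple[str, str]
CategoryId = tuple[str, str]


class TermCategorizations:
    """Many-to-many TermCategorization links, allowed across glossaries."""

    def __init__(self) -> None:
        self._by_term: defaultdict[TermId, set[CategoryId]] = defaultdict(set)
        self._by_category: defaultdict[CategoryId, set[TermId]] = defaultdict(set)

    def link(self, term: TermId, category: CategoryId) -> None:
        self._by_term[term].add(category)
        self._by_category[category].add(term)

    def categories_of(self, term: TermId) -> set[CategoryId]:
        return set(self._by_term.get(term, ()))

    def terms_in(self, category: CategoryId) -> set[TermId]:
        return set(self._by_category.get(category, ()))

    def delete_glossary(self, glossary: str) -> None:
        """Drop links whose term or category was anchored in the deleted glossary."""
        for term in [t for t in self._by_term if t[0] == glossary]:
            for category in self._by_term.pop(term):
                self._by_category[category].discard(term)
        for category in [c for c in self._by_category if c[0] == glossary]:
            for term in self._by_category.pop(category):
                self._by_term[term].discard(category)
```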
The dictionary model adds some basic term classifications that show how particular terms are used. Figure 5 shows the dictionary model.
Figure 5: The dictionary model
- ActivityDescription is a classification used to indicate that the term describes a verb, or an activity. Most term definitions are nouns; they describe concepts or things. However, it is useful to be able to define the meanings of particular activities in the glossary. The ActivityDescription classification highlights when a term describes such an activity, and identifies which of the following types of activity it is:
- OPERATION - describes a function or API call
- ACTION - describes a governance action that results from evaluating governance rules.
- TASK - describes a task performed by a person.
- PROCESS - describes a process, which is a series of steps that are performed in a defined order.
- PROJECT - describes a type of project.
- OTHER - describes some other type of activity
- AbstractConcept is a classification used to indicate that the term describes an abstract concept.
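A minimal sketch of the dictionary classifications, assuming they are held as optional fields on an in-memory term. The field and class names are illustrative rather than the names used in the model file; the enumeration simply mirrors the activity types listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActivityType(Enum):
    """Kinds of activity an ActivityDescription term can describe."""
    OPERATION = "function or API call"
    ACTION = "governance action resulting from evaluating governance rules"
    TASK = "task performed by a person"
    PROCESS = "series of steps performed in a defined order"
    PROJECT = "type of project"
    OTHER = "some other type of activity"


@dataclass
class GlossaryTerm:
    name: str
    # Set when the ActivityDescription classification is attached.
    activity_type: Optional[ActivityType] = None
    # True when the AbstractConcept classification is attached.
    abstract_concept: bool = False

    def describes_activity(self) -> bool:
        return self.activity_type is not None
```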
The Related Terms model contains relationships used to show how the meanings of different terms are related to one another.
Figure 6: The related term model
The TermRelationshipStatus defines how reliable the relationship between two glossary terms is:
- DRAFT means the relationship is under development.
- ACTIVE means the relationship is validated and in use.
- DEPRECATED means the relationship is being phased out.
- OBSOLETE means that the relationship should not be used anymore.
- OTHER means that the status is not one of the statuses listed above. The description field can be used to add more details.
The related term relationships are as follows:
- RelatedTerm is a relationship used to say that the linked glossary term may also be of interest. It is like a "see also" link in a dictionary. The description field can be used to explain why the linked term is of interest.
- Synonym is a relationship between glossary terms that have the same, or a very similar meaning.
- Antonym is a relationship between glossary terms that have the opposite (or near opposite) meaning.
- PreferredTerm is a relationship that indicates that the preferredTerm should be used in place of the preferredToTerm.
- ReplacementTerm is a relationship that indicates that the replacementTerm must be used instead of the replacedByTerm. This is a stronger version of the PreferredTerm.
- Translation is a relationship that indicates that the related terms have the same meaning but are written in different languages. Hence one is a translation of the other. The language of each term is defined in the Glossary object that anchors the term.
- IsA is a relationship that defines that the "isA" term is a more generic term than the "isOf" term. For example, this relationship would be used to say that "Cat" IsA "Animal".
- ValidValue is a relationship that shows the validValue term represents one of the valid values that could be assigned to a data item that has the meaning described in the "validValueFor" term.
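The status and relationship definitions can be combined into a small resolver. This sketch assumes that only ACTIVE relationships are followed, and the field names `end1` and `end2` are hypothetical labels for the two relationship ends named above. `term_to_use` follows ReplacementTerm links before PreferredTerm links because a replacement is the stronger statement; `generalizations` walks IsA links and, as a choice of this sketch, treats them as transitive.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class TermRelationshipStatus(Enum):
    DRAFT = auto()
    ACTIVE = auto()
    DEPRECATED = auto()
    OBSOLETE = auto()
    OTHER = auto()


@dataclass(frozen=True)
class TermRelationship:
    kind: str   # "PreferredTerm", "ReplacementTerm", "IsA", ...
    end1: str   # preferredToTerm, replacedByTerm, or the specific ("isOf") term
    end2: str   # preferredTerm, replacementTerm, or the generic ("isA") term
    status: TermRelationshipStatus = TermRelationshipStatus.ACTIVE


def _follow(term: str, kind: str, rels: list[TermRelationship]) -> Optional[str]:
    """Return end2 of the first ACTIVE relationship of this kind starting at term."""
    for r in rels:
        if r.kind == kind and r.end1 == term and r.status is TermRelationshipStatus.ACTIVE:
            return r.end2
    return None


def term_to_use(term: str, rels: list[TermRelationship]) -> str:
    """Follow ReplacementTerm links (must) before PreferredTerm links (should)."""
    seen = {term}
    while True:
        nxt = _follow(term, "ReplacementTerm", rels) or _follow(term, "PreferredTerm", rels)
        if nxt is None or nxt in seen:  # end of the chain, or a cycle
            return term
        seen.add(nxt)
        term = nxt


def generalizations(term: str, rels: list[TermRelationship]) -> set[str]:
    """All more generic terms reachable through ACTIVE IsA relationships."""
    found: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for r in rels:
            if (r.kind == "IsA" and r.end1 == current
                    and r.status is TermRelationshipStatus.ACTIVE
                    and r.end2 not in found):
                found.add(r.end2)
                stack.append(r.end2)
    return found
```

With "Cat" IsA "Mammal" and "Mammal" IsA "Animal", `generalizations("Cat", rels)` returns both "Mammal" and "Animal".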
The Context model defines a classification for a glossary term that indicates it defines a context, and a relationship called UsedInContext to link terms that are relevant in that context.
Figure 7: The context model
The ContextDefinition classification indicates that the term describes a context.
Glossary Terms that are relevant in that context are linked to the context definition term using the UsedInContext relationship.
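A sketch of the context model, assuming that a UsedInContext link should only point at a term carrying the ContextDefinition classification. That check is an assumption of this example rather than a rule stated above, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare terms by identity
class GlossaryTerm:
    name: str
    context_definition: bool = False  # is the ContextDefinition classification attached?
    used_in: list["GlossaryTerm"] = field(default_factory=list)  # UsedInContext targets

    def use_in(self, context: "GlossaryTerm") -> None:
        """Create a UsedInContext link from this term to a context-definition term."""
        if not context.context_definition:
            raise ValueError(f"{context.name!r} is not classified as a ContextDefinition")
        self.used_in.append(context)


def terms_in_context(context: GlossaryTerm, terms: list[GlossaryTerm]) -> list[str]:
    """Names of the terms linked to the context by UsedInContext."""
    return [t.name for t in terms if context in t.used_in]
```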
Figure 8: The Semantic Assignment model
SemanticAssignment is a relationship used to assign a term to a referenceable object. This means that the term describes the meaning of the referenceable object. The semantic assignment needs to be a controlled relationship when glossary definitions are used to provide classifications for the data assets and hence define how the data is to be governed.
The relationship is created by the user (person or engine) identified by the createdBy attribute. The confidence attribute in the relationship stores the level of confidence (0-100%) in the correctness of the relationship; it is typically used by discovery engines. The steward is the person responsible for assessing the relationship and deciding whether it should be approved. The TermAssignmentStatus defines how much the semantic assignment should be trusted:
- DISCOVERED - this semantic assignment was added by a discovery engine.
- PROPOSED - this semantic assignment was proposed by a person; they may be a subject matter expert, or a consumer of the Referenceable asset.
- IMPORTED - the relationship has been imported from outside of the open metadata cluster.
- VALIDATED - this relationship has been reviewed and is highly trusted.
- DEPRECATED - this relationship is being phased out. There may be another semantic relationship to the Referenceable that will ultimately replace this relationship.
- OBSOLETE - this relationship is no longer in use.
- OTHER - the status of the relationship does not match any of the other term status values. The description field can be used to add details about the relationship.
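The attributes of a semantic assignment can be collected into one record. This is a sketch only: the Python field names are snake_case stand-ins for the attributes described above, and the review queue encodes a hypothetical stewardship policy (DISCOVERED, PROPOSED and IMPORTED assignments await a steward's decision) rather than behavior defined by the model.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto
from typing import Optional


class TermAssignmentStatus(Enum):
    DISCOVERED = auto()
    PROPOSED = auto()
    IMPORTED = auto()
    VALIDATED = auto()
    DEPRECATED = auto()
    OBSOLETE = auto()
    OTHER = auto()


@dataclass(frozen=True)
class SemanticAssignment:
    term: str                     # glossary term that describes the meaning
    referenceable: str            # the Referenceable object it is assigned to
    status: TermAssignmentStatus
    created_by: str               # person or engine that created the relationship
    confidence: int = 100         # 0-100, typically set by discovery engines
    steward: Optional[str] = None

    def __post_init__(self) -> None:
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


# Hypothetical policy: assignments in these states still need a steward's decision.
AWAITING_REVIEW = {TermAssignmentStatus.DISCOVERED,
                   TermAssignmentStatus.PROPOSED,
                   TermAssignmentStatus.IMPORTED}


def review_queue(assignments: list[SemanticAssignment],
                 steward: str) -> list[SemanticAssignment]:
    """Assignments awaiting this steward's review, most confident first."""
    pending = [a for a in assignments
               if a.steward == steward and a.status in AWAITING_REVIEW]
    return sorted(pending, key=lambda a: a.confidence, reverse=True)


def approve(assignment: SemanticAssignment) -> SemanticAssignment:
    """Return a copy of the assignment marked as VALIDATED."""
    return replace(assignment, status=TermAssignmentStatus.VALIDATED)
```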
The spine object model adds the relationships that enable a glossary to contain the definitions of spine objects, which can be used to control access to data and to guide the design of new data stores and APIs. Figure 9 shows the relationships and classifications used to describe spine objects.
Figure 9: Spine Object Model
- SpineObject - is a classification to say the term represents a type of object.
- SpineAttribute - is a classification to say the term represents a type of attribute that is common for a spine object.
- ObjectIdentifier - is a classification saying that a term is typically an identifier attribute for a spine object.
Note that a term may be a spine object and/or a spine attribute and/or an object identifier at the same time.
- TermHASARelationship - is a term relationship between a term representing a SpineObject and a term representing a SpineAttribute.
- TermISATYPEOFRelationship - is a term relationship between two SpineObjects saying that one is the subtype (specialization) of the other.
- TermTYPEDBYRelationship - is a term relationship between a SpineAttribute and a SpineObject to say that the SpineAttribute is implemented using a type represented by the SpineObject.
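One way these relationships might be used when designing a data store is to compute the full attribute list of a spine object, including attributes inherited through TermISATYPEOFRelationship. This sketch represents terms by name only and assumes each spine object has at most one supertype, which is a simplification the model does not state. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpineModel:
    # TermHASARelationship: SpineObject term -> its SpineAttribute terms
    has_a: dict[str, set[str]] = field(default_factory=dict)
    # TermISATYPEOFRelationship: subtype SpineObject -> supertype SpineObject
    is_a_type_of: dict[str, str] = field(default_factory=dict)
    # TermTYPEDBYRelationship: SpineAttribute -> SpineObject that provides its type
    typed_by: dict[str, str] = field(default_factory=dict)

    def attributes(self, spine_object: str) -> set[str]:
        """Attributes of a spine object, including those inherited from supertypes."""
        attrs: set[str] = set()
        seen: set[str] = set()
        current: Optional[str] = spine_object
        while current is not None and current not in seen:  # guard against cycles
            seen.add(current)
            attrs |= self.has_a.get(current, set())
            current = self.is_a_type_of.get(current)
        return attrs

    def type_of(self, spine_attribute: str) -> Optional[str]:
        """The SpineObject that implements this attribute's type, if recorded."""
        return self.typed_by.get(spine_attribute)
```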
A glossary project is used to develop new elements for the glossary.
Figure 10: Classifying that a project relates to new glossary content