An Entity Data Model for Relational Data Part I: Defining the Entity Data Model
Microsoft’s Entity Data Model allows you to define an application-oriented view of your data consistent with how you reason about that data. Part I of this article describes the Entity Data Model and how it enables you to represent real-world concepts in a way that makes relationships between related pieces of data more explicit and easier to query, navigate, and consume than through the traditional relational database model. Part II of the article discusses how Microsoft’s ADO.NET Entity Framework provides a flexible mapping of an application-oriented conceptual schema in terms of the Entity Data Model to existing relational database schemas. Shyam Pather’s article, “Programming Against the ADO.NET Entity Framework,” completes the picture by describing the actual programming model and API exposed by the framework.
The world around us is infinitely complex. Describing that world in a way we can reason about requires us to break it into simpler components. We can see a pattern emerge in how we describe the physical world when we think about how we communicate concepts to our children: there are “things” (“Tessa”, “Kara”, a blanket) and relationships between things (“The blanket belongs to Kara.”).
In his groundbreaking 1976 paper introducing the entity-relationship model, Dr. Peter Chen defines a way of modeling real-world concepts by breaking complex information into things (entities) and associations between things (relationships).
Contrast this with the relational model employed by relational databases today. The relational model, first described by Edgar Codd in 1969, emphasizes relations, or tables of data, not relationships. The relational model is built around data normalization, which simplifies data storage and maintenance by minimizing the duplication of information in order to enforce data consistency. For example, to describe the fact that the blanket belongs to Kara, you might define three tables: Person, Object, and Ownership.
Dr. Chen compares the relational and entity-relationship models, saying that the entity-relationship model adopts a “…more natural view that the real world consists of entities and relationships” where the relational model “…can achieve a high degree of data independence, but it may lose some important semantic information about the real world.” For example, the three tables ("Person", "Object", and "Ownership") are not enough to understand the model. You must also know that the columns within the "Ownership" table contain the primary key fields of the object being owned and of the owner; the semantic meaning of those columns is not captured by the model itself.
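The point about lost semantics can be made concrete with a small sketch. The following uses Python's built-in sqlite3 module; the column names (PersonId, ObjectId, Name, Description) and the owner_of helper are assumptions made for illustration and do not come from any real schema.

```python
import sqlite3

# Hypothetical table and column names, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Object (ObjectId INTEGER PRIMARY KEY, Description TEXT);
    -- Nothing in the schema says these columns mean "owner" and "thing owned".
    CREATE TABLE Ownership (
        PersonId INTEGER REFERENCES Person(PersonId),
        ObjectId INTEGER REFERENCES Object(ObjectId)
    );
    INSERT INTO Person VALUES (1, 'Kara'), (2, 'Tessa');
    INSERT INTO Object VALUES (10, 'blanket');
    INSERT INTO Ownership VALUES (1, 10);
""")


def owner_of(description):
    """Return the names of the people who own the described object."""
    rows = conn.execute(
        """SELECT p.Name
           FROM Person p
           JOIN Ownership o ON o.PersonId = p.PersonId
           JOIN Object ob ON ob.ObjectId = o.ObjectId
           WHERE ob.Description = ?""",
        (description,),
    ).fetchall()
    return [name for (name,) in rows]
```

The knowledge that Ownership links an owner to an owned object lives only in the two-join query written by the developer, not in the table definitions.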
Microsoft’s new ADO.NET Entity Framework is an implementation of Dr. Chen’s entity-relationship model that maps relational database schemas to an Entity Data Model. This article describes key aspects of the Entity Data Model along with features of the ADO.NET Entity Framework, which allow mapping of that Entity Data Model to a relational store.
Components of the Entity Data Model
Following the entity-relationship model, the Entity Data Model is composed of entities and relationships.
What Is an Entity?
An entity is an instance of an EntityType (for example, a person or a blanket). The EntityType describes the Properties that define the structure of the entity (for example, name, birthdate, hair color and eye color). In order to be an entity, there must be a set of Key properties that uniquely identify the instance from other instances of the same EntityType within an EntitySet (for example, social security number).
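As a rough sketch of these ideas (in plain Python, not the Entity Framework API), an EntityType can be pictured as a class whose fields are its Properties, and an EntitySet as a collection that refuses two instances with the same key. The Person and EntitySet classes and the sample values below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical EntityType; each field plays the role of a Property."""
    ssn: str          # Key property: identifies the instance within its set
    name: str
    birthdate: str
    hair_color: str
    eye_color: str


class EntitySet:
    """Holds entity instances and enforces key uniqueness within the set."""

    def __init__(self, key):
        self._key = key      # function that extracts the key from an entity
        self._items = {}

    def add(self, entity):
        k = self._key(entity)
        if k in self._items:
            raise ValueError(f"duplicate key: {k!r}")
        self._items[k] = entity

    def get(self, k):
        return self._items[k]

    def __len__(self):
        return len(self._items)


people = EntitySet(key=lambda p: p.ssn)
people.add(Person("123-45-6789", "Kara", "2001-05-04", "brown", "green"))
```

Adding a second Person with the same ssn raises ValueError, which mirrors the requirement that Key properties distinguish one instance from all others in the same EntitySet.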
EntityTypes may extend other EntityTypes through inheritance. For example, a Salesperson EntityType may extend an Employee EntityType by adding properties for Region, Quota, and Commission, while an Engineer may extend the same Employee EntityType by adding properties indicating the ProductTeam she is a member of. The Entity Data Model does not support multiple inheritance (a Salesperson cannot also be an Engineer, within the same model).
Inheritance typically implies substitutability (either a Salesperson or an Engineer can be supplied anywhere an Employee is requested) as well as polymorphism (a request for Employees can return both Salespersons and Engineers).
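Ordinary class inheritance gives a reasonable approximation of both properties. The sketch below is plain Python rather than an EDM schema, and the field names, sample values, and the names_of and of_type helpers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    employee_id: int
    name: str


@dataclass
class Salesperson(Employee):
    region: str
    quota: float
    commission: float


@dataclass
class Engineer(Employee):
    product_team: str


employees = [
    Salesperson(1, "Ana", "West", 100000.0, 0.05),
    Engineer(2, "Bo", "Storage"),
]


def names_of(staff):
    """Substitutability: any Employee subtype is accepted here."""
    return [e.name for e in staff]


def of_type(staff, entity_type):
    """Polymorphism: asking for Employee returns every derived instance too."""
    return [e for e in staff if isinstance(e, entity_type)]
```

Calling of_type(employees, Employee) returns both instances, while of_type(employees, Engineer) returns only the Engineer.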
Related properties may be grouped together as a single composite property. For example, StreetAddress, City, Region, and ZipCode may be grouped together into a single "Address" property. The structure of that composite property is defined through a ComplexType that can be used by multiple EntityTypes, as well as other ComplexTypes, within a schema (for example, Employees may have a "HomeAddress" property, while Orders may have a "ShipTo" address). ComplexTypes differ from EntityTypes in that they do not have independent identifiers; an instance of a complex type is addressed through an instance of an entity plus the name of the property on that instance that holds the complex value.
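A minimal sketch of the distinction, again in plain Python with invented names: Address below has no key of its own, so the only way to reach an address is through an entity and one of its properties. The resolve helper is hypothetical and exists only to make that addressing rule visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """ComplexType: structure only, no key of its own."""
    street_address: str
    city: str
    region: str
    zip_code: str


@dataclass
class Employee:
    employee_id: int      # key
    name: str
    home_address: Address


@dataclass
class Order:
    order_id: int         # key
    ship_to: Address


def resolve(entity, property_name):
    """Address a complex value by (entity instance, property name)."""
    return getattr(entity, property_name)


addr = Address("1 Main St", "Redmond", "WA", "98052")
emp = Employee(7, "Kara", addr)
order = Order(42, addr)
```

Two Address values with the same contents compare equal; nothing distinguishes them apart from the entity and property they hang from.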
Relationships define interesting associations between entities (for example, "Ownership"). Relationships are described by an AssociationType, which defines the types of entities that make up the association (for example, “ManagerEmployee” is made up of two “Employee” EntityTypes), their Roles (“Manager” and “Employee”) and Cardinality (each Employee has at most one Manager, while a Manager can have one or more Employees). Relationships may be one to one (for example, a marriage), one to many (for example, Manager to Employees), or many to many (for example, students to classes).
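The pieces of an AssociationType (the participating EntityTypes, their Roles, and the Cardinality of each end) can be sketched as a small data structure. This is an illustrative model in Python, not the schema syntax the Entity Data Model uses; the End class, the multiplicity strings, and the kind method are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class End:
    role: str
    entity_type: str
    multiplicity: str   # assumed notation: "0..1", "1", or "*"


@dataclass(frozen=True)
class AssociationType:
    name: str
    ends: tuple         # exactly two End values for a binary relationship

    def kind(self):
        """Classify the association by how many ends allow many instances."""
        many = [end.multiplicity == "*" for end in self.ends]
        if all(many):
            return "many to many"
        if any(many):
            return "one to many"
        return "one to one"


manager_employee = AssociationType(
    "ManagerEmployee",
    (End("Manager", "Employee", "0..1"), End("Employee", "Employee", "*")),
)
marriage = AssociationType(
    "Marriage",
    (End("Spouse1", "Person", "0..1"), End("Spouse2", "Person", "0..1")),
)
enrollment = AssociationType(
    "Enrollment",
    (End("Student", "Student", "*"), End("Class", "Class", "*")),
)
```

Note that both ends of ManagerEmployee name the same EntityType; only the Roles tell the manager apart from the employee.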
Compositional relationships are a special type of relationship in which one entity within the relationship contains the related entity (or entities). For example, OrderLines may be contained within an Order. In a compositional relationship, the contained entity (“OrderLine”) must be related to exactly one containing entity (“Order”). Thus:
- A contained entity cannot be associated with more than one containing entity through the compositional relationship. It is possible for it to be associated with other entities through non-compositional relationships (an OrderLine cannot be associated with more than one Order, but may be associated with products, suppliers, etc.).
- Any instance of a contained entity must be related to an instance of a containing entity (an OrderLine must have an Order).
- Deleting the containing entity (Order) deletes all contained entities (OrderLines).
- Additionally, an entity can be the contained entity in at most one compositional relationship.
While the initial version of the Microsoft Entity Data Model does not directly support compositional relationships, most of the characteristics of a compositional relationship (other than the restriction that an entity can be the contained entity in at most one compositional relationship) can be modeled through identifying relationships. In an identifying relationship, the key field(s) of the containing entity make up part of the key for the contained entity, and referential integrity is used to ensure the contained entity has a non-null containing entity, and is deleted if the containing entity is deleted.
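At the storage level, an identifying relationship of this kind can be sketched with a composite key and a cascading foreign key. The example uses Python's sqlite3 module; the table layout, column names, and helper functions are assumptions for illustration, and the article does not prescribe any particular store schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY);
    CREATE TABLE OrderLine (
        -- The containing entity's key is part of the contained entity's key.
        OrderId INTEGER NOT NULL
            REFERENCES "Order"(OrderId) ON DELETE CASCADE,
        LineNumber INTEGER NOT NULL,
        Product TEXT,
        PRIMARY KEY (OrderId, LineNumber)
    );
    INSERT INTO "Order" VALUES (1);
    INSERT INTO OrderLine VALUES (1, 1, 'blanket'), (1, 2, 'pillow');
""")


def line_count():
    return conn.execute("SELECT COUNT(*) FROM OrderLine").fetchone()[0]


def add_orphan_line():
    """Try to insert an OrderLine whose Order does not exist."""
    conn.execute("INSERT INTO OrderLine VALUES (99, 1, 'orphan')")


def delete_order(order_id):
    conn.execute('DELETE FROM "Order" WHERE OrderId = ?', (order_id,))
```

Here add_orphan_line fails with sqlite3.IntegrityError, and delete_order(1) removes both order lines along with the order. As the text notes, nothing in this arrangement stops OrderLine from also being the contained end of a second identifying relationship.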
Relationships with Payloads
Relationships with payloads (often called association entities) add information to a relationship. For example, an "Employment" relationship between a Company and a Person may include HireDate, Salary, and Level. Although the initial version of the Entity Data Model does not directly support relationships with payloads, the same information can be represented by defining an intermediate EntityType that carries the additional information and has a relationship to each of the other two EntityTypes, with every instance of the intermediate type related to exactly one instance of each (for example, an “Employment” EntityType with HireDate, Salary, and Level, and relationships to both Company and Person).
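The workaround can be sketched as follows, with the intermediate type holding the payload and a reference to each end. The class layout, sample data, and the staff_of helper are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    company_id: int
    name: str


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str


@dataclass(frozen=True)
class Employment:
    """Intermediate EntityType that carries the relationship's payload."""
    company: Company      # relationship to Company
    person: Person        # relationship to Person
    hire_date: str
    salary: float
    level: int


contoso = Company(1, "Contoso")
kara = Person(1, "Kara")
tessa = Person(2, "Tessa")
employments = [
    Employment(contoso, kara, "2006-03-01", 90000.0, 3),
    Employment(contoso, tessa, "2007-01-15", 95000.0, 4),
]


def staff_of(company):
    """Navigate Company -> Employment -> Person, reading payload on the way."""
    return [(e.person.name, e.level) for e in employments if e.company == company]
```

Getting from a Company to its people now takes two hops through Employment, which is the extra indirection the intermediate type introduces.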
Similarly, although the first version of the Entity Data Model only supports binary relationships (relationships with exactly two ends), n-ary relationships (relationships that may have more than two ends) can be represented by defining an intermediate EntityType with more than two binary relationships. (For example, an EntityType “Game” with relationships to home team, visiting team, referee, and the location. In this case, you may want to add other properties, such as start time and duration of the game, and final score.)
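The Game example might be sketched like this, with one binary relationship per field plus the extra properties mentioned above. All class names, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    name: str


@dataclass(frozen=True)
class Referee:
    name: str


@dataclass(frozen=True)
class Location:
    name: str


@dataclass(frozen=True)
class Game:
    """Intermediate EntityType standing in for a four-ended relationship."""
    home_team: Team          # binary relationship 1
    visiting_team: Team      # binary relationship 2
    referee: Referee         # binary relationship 3
    location: Location       # binary relationship 4
    start_time: str
    duration_minutes: int
    final_score: str


def games_involving(games, team):
    """A team can take part through either of two relationships."""
    return [g for g in games if team in (g.home_team, g.visiting_team)]


hawks, owls = Team("Hawks"), Team("Owls")
games = [
    Game(hawks, owls, Referee("Lee"), Location("North Field"),
         "2007-06-01T18:00", 90, "2-1"),
]
```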
While defining an intermediate EntityType captures the content of relationships with payloads and n-ary relationships, doing so loses some of the semantic meaning of the model (“Employment” isn’t really an EntityType; it exists only to describe the relationship between two entities). Microsoft is looking to add both association entities and n-ary relationships, as well as compositional relationships, to future versions of the Entity Data Model to more fully represent the semantic meaning.
So far I've described how entity and relationship types are defined. Applications interact with entities and relationships through an instance of an EDM schema, defined by named sets of entity and relationship instances.
Instances of entities live within a named EntitySet. A single instance of an entity can belong to only one EntitySet. An EntitySet is the equivalent of a relational table.
Just as entities live within a named EntitySet, relationship instances live within a RelationshipSet. RelationshipSets hold the relationship instances of a particular type between entity instances within two specific EntitySets. RelationshipSets are loosely analogous to join tables in relational schemas.
EntitySets and RelationshipSets are defined within an EntityContainer. An EntityContainer can have multiple EntitySets of the same EntityType.
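How these three pieces fit together can be sketched in a few lines of Python. The EntityContainer class below is an illustrative stand-in, not the framework's API; the set names and the dictionary layout are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    employee_id: int
    name: str


class EntityContainer:
    """Hypothetical container holding named EntitySets and RelationshipSets."""

    def __init__(self):
        self.entity_sets = {}
        self.relationship_sets = {}

    def add_entity_set(self, name, entity_type):
        self.entity_sets[name] = {"type": entity_type, "rows": {}}

    def add_relationship_set(self, name, from_set, to_set):
        # A RelationshipSet is tied to two specific EntitySets.
        self.relationship_sets[name] = {"ends": (from_set, to_set), "links": []}

    def insert(self, set_name, key, entity):
        entity_set = self.entity_sets[set_name]
        if not isinstance(entity, entity_set["type"]):
            raise TypeError(f"{set_name} does not hold {type(entity).__name__}")
        entity_set["rows"][key] = entity

    def relate(self, rel_name, from_key, to_key):
        rel = self.relationship_sets[rel_name]
        from_set, to_set = rel["ends"]
        if from_key not in self.entity_sets[from_set]["rows"]:
            raise KeyError(from_key)
        if to_key not in self.entity_sets[to_set]["rows"]:
            raise KeyError(to_key)
        rel["links"].append((from_key, to_key))


container = EntityContainer()
# Two EntitySets of the same EntityType within one container.
container.add_entity_set("CurrentEmployees", Employee)
container.add_entity_set("FormerEmployees", Employee)
container.add_relationship_set(
    "CurrentManagerEmployee", "CurrentEmployees", "CurrentEmployees")
container.insert("CurrentEmployees", 1, Employee(1, "Kara"))
container.insert("CurrentEmployees", 2, Employee(2, "Tessa"))
container.relate("CurrentManagerEmployee", 1, 2)
```

Because the relationship set is bound to CurrentEmployees on both ends, an attempt to relate a key that exists only in FormerEmployees is rejected.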
Microsoft’s Entity Data Model defines an entity-relationship model for dealing with data. By modeling data in terms of instances and relationships, services such as querying, reporting, synchronizing, and programmability against an object model can be defined in terms of that entity model. Part II of this article describes how the Entity Data Model is used by the Microsoft ADO.NET Entity Framework to define an application-oriented schema that can be flexibly mapped to a variety of relational schema representations.
By: Michael Pizzo
Michael Pizzo has worked for over 17 years in the design and delivery of data access solutions and APIs at Microsoft. Michael first got involved in data access as a Program Manager for Microsoft Excel in 1987, integrating Microsoft’s flagship spreadsheet product with relational data. This led to his involvement in the design and delivery of ODBC, along with the ODBC-based Microsoft Query Tool shipped with Microsoft Office. During the design of ODBC, Michael was active in the standards organizations, sitting as Chair for the SQL Access Group, working with X/Open on the CAE specification for “Data Management: SQL-Call Level Interface (CLI)”, serving as Microsoft’s representative to the ANSI X3H2 Database Committee, and as an elected ANSI representative to the ISO committee meetings that defined and adopted Part 3 of the ANSI/ISO SQL specification for a Call-Level Interface (SQL/CLI). Following ODBC, Michael was a key designer and driver of Microsoft’s OLE DB API for componentized data access within a COM environment, and later owned the design and delivery of ADO.NET version 1.0. He is currently a Principal Architect in the Data Programmability Team at Microsoft, contributing to the architecture and design of the next version of ADO.NET and a core building block for Microsoft’s exciting new data platform: the ADO.NET Entity Framework.
ADO.NET Entity Framework lets developers build applications that access data by programming against a conceptual application model instead of programming directly against a relational storage schema.