Entity Modelling

www.entitymodelling.org - entity modelling introduced from first principles - relational database design theory and practice - dependent type theory


Foundations of Data

Studies of the foundations of a subject should reveal something of its essential characteristics, its possibilities and its bounds; though study of the foundations of data modelling seems absent from the literature, such studies ought to reveal to us more directly than logic or philosophy something of the nature of our relationship to the world.

Identity of Indiscernibles

What is data, and what do we mean by data structure, by data model and by database? As so often, it is easier to respond to questions like these with examples than with real answers: flight lists, bank records, a star catalogue, a price list, a list of waypoints along a route, the runners in a race, the times of the tides. All these things can be presented as data, typically containing names, both natural and coded, numerics as both ordinals and cardinals, measurements, dates and times, distances. Though we wouldn't usually put the matter so, asking ‘What is data?’ is like asking ‘What is language?’, for data is a form of language, and asking ‘How does data relate to the real world?’ is a way of asking ‘How does language relate to the real world?’

Consider that in tabular presentations of data, as in general discourse, we often label things or, more particularly, number them arbitrarily. We attribute properties to individuals that do not in themselves have these properties; we do this of necessity when representing and communicating relationships between entities that are otherwise indiscernible by their properties. Thus, in a story of three, we can speak of the first, the second and the third, which is an arbitrary attribution, whereas in a story of an Englishman, a Scotsman and a Welshman we need no such arbitrary attribution for the telling. In the example tables used to present molecular structure in figure 7, each entry in a bonds table makes double reference to entries in the atoms table, identifying them by their ordinal position in that table; this is an arbitrary attribution of a number to each atom within the molecular structure. The arbitrary attribution is necessary because the entities within an individual molecular structure do not, in and by themselves, satisfy the logical principle of the identity of indiscernibles. In the study of logic, to accept this principle is to accept that no two distinct individuals, objects or things can be exactly the same in all of their properties. The logician Max Black suggested as a counterexample to the principle a completely symmetric universe populated only by two distinct spheres; in such a universe the principle does not hold, since the two spheres are indiscernible but not identical. To put it another way, there is no definite description that can be applied to one sphere that does not apply to the other.

(a) Water

atoms
2.5369 -0.1550 O
3.0739 0.1550 H
2.0000 0.1550 H
bonds
1 2 1
1 3 1
 

(b) Ethylene

atoms
2.0000 0.0000 C
3.0000 0.0000 C
1.6900 0.5369 H
1.6900 -0.5369 H
3.3100 -0.5369 H
3.3100 0.5369 H
bonds
1 2 2
1 3 1
1 4 1
2 5 1
2 6 1
 

(c) Benzene

atoms
2.8660 1.0000 C
2.0000 0.5000 C
3.7320 0.5000 C
2.0000 -0.5000 C
3.7320 -0.5000 C
2.8660 -1.0000 C
2.8660 1.6200 H
1.4631 0.8100 H
4.2690 0.8100 H
1.4631 -0.8100 H
4.2690 -0.8100 H
2.8660 -1.6200 H
bonds
1 2 2
1 3 1
1 7 1
2 4 1
2 8 1
3 5 2
3 9 1
4 6 2
4 10 1
5 6 1
5 11 1
6 12 1
Figure 7
Representations of molecular structure, courtesy of the PubChem open chemistry database. Each atoms table gives x, y coordinates for the pictorial representation, whereas each entry in the corresponding bonds table uses the numerical position of rows within the atoms table to refer to the atoms bonded. The final column of the bonds table indicates whether a bond is single (1) or double (2).
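
As a minimal sketch of how a bonds table refers to atoms by position, the water tables of figure 7 can be held as Python lists and each bond resolved through its 1-based row numbers. The names `WATER_ATOMS`, `WATER_BONDS` and `resolve_bonds` are assumptions introduced here for illustration; they are not part of PubChem.

```python
# Water from figure 7: each atom row is (x, y, element symbol).
WATER_ATOMS = [
    (2.5369, -0.1550, "O"),
    (3.0739, 0.1550, "H"),
    (2.0000, 0.1550, "H"),
]
# Each bond row is (position of first atom, position of second atom, bond type).
WATER_BONDS = [
    (1, 2, 1),
    (1, 3, 1),
]


def resolve_bonds(atoms, bonds):
    """Replace 1-based row positions with the element symbols they point at."""
    resolved = []
    for first, second, bond_type in bonds:
        # Row positions count from 1, Python list indexes from 0.
        resolved.append((atoms[first - 1][2], atoms[second - 1][2], bond_type))
    return resolved


print(resolve_bonds(WATER_ATOMS, WATER_BONDS))
# [('O', 'H', 1), ('O', 'H', 1)]
```

Once the positions are resolved, both bonds read the same: an oxygen singly bonded to a hydrogen. Only the row numbers tell the two hydrogens apart.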

What properly constitutes a data model?

If we are to understand from first principles, rather than by blindly following current practice, what should properly constitute a data model, then we need to decide whether the principle of the identity of indiscernibles should be included among those principles. The argument for the principle runs along the lines that a data model is a theory of various sorts of things that are in some sense real to us and which for convenience we call ‘real world entities’; databases hold data according to this model and represent these real world entities and their relationships. Clearly, goes the argument, there must be an unambiguous correspondence between the real world entities and their representations within the database instance: of necessity, to the properties of a database entity there must correspond a unique real world entity with matching properties. For this to be achieved, no two distinct database entities, nor any two distinct real world entities, may have exactly the same properties, which is to say that the principle of the identity of indiscernibles must hold true both of entities within database instances and of real world entities. The counter argument is essentially Max Black's, given earlier and illustrated by the molecular structure example: the real world that we wish to represent just might contain indiscernibles that are not identical. The answer to this dilemma in relational data modelling is (i) to enforce the identity of indiscernibles in the database model, (ii) to accept that the principle may not hold of real world entities and thus (iii) to require the introduction of arbitrary distinguishing properties that have no basis in the real world but are simply artefacts introduced for descriptive purposes.
In the case of descriptions of molecular structure it is common practice, as instanced in the PubChem database, for the ‘arbitrary distinguishing property’ to take the form of an ordinal: one is assigned to each atom within the structure, though in nature there is no such ordinal nor any other such distinguishing feature.
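
The need for an arbitrary ordinal can be sketched in a few lines. Assume, for illustration only, that an atom's sole intrinsic property is its element symbol (the coordinates in figure 7 being pictorial). A set of atom descriptions then cannot hold two hydrogen atoms until each description is paired with an ordinal. The variable names are hypothetical.

```python
# Element symbols of the atoms in a water molecule, with no other properties.
symbols = ["O", "H", "H"]

# A set holds each distinct description once, so the two hydrogens collapse.
without_ordinal = set(symbols)

# Pairing each symbol with an arbitrary ordinal makes every description distinct.
with_ordinal = {(number, symbol) for number, symbol in enumerate(symbols, start=1)}

print(len(without_ordinal))  # 2
print(len(with_ordinal))     # 3
```

The ordinal carries no chemical meaning; swapping the numbers of the two hydrogens would describe the same molecule.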

First Principles of Data

In early computer systems data was said to be stored in records within files, and this terminology kept contact with the paper systems whose use preceded computerisation. Subsequently, driven by E.F. Codd, the predominant terminology shifted to data being held in rows within tables; in the accompanying theory, rows are described, unhelpfully it seems to me, as tuples. At the risk of confusing matters further, but with foundations at heart, I will use the term message in place of record, row or tuple.

In building information systems, the fundamental principles of data need to include those regarding the identification of subject entities described above, to which we can add these:

  • there are subject entities and they are of a fixed number of types,
  • data consists of messages, each message describes a subject entity,
  • messages have a structure and messages describing the same type of subject entity have the same structure.

If a database is a set of messages able to communicate entities then what, from first principles, is the message structure? The most general statement that we can make is that a message comprises a set of attributes of the subject entity, and that the message structure is an agreement upon the set of attributes and the message representation corresponding to each type of subject entity.

Earlier we said that asking ‘What is data?’ is a bit like asking ‘What is language?’. So, in looking for the first principles of data modelling, we might look to linguistics for help. In a book by Jonathan Culler summarising the work of Ferdinand de Saussure, the man sometimes said to be the father of linguistics, we find Alfred North Whitehead quoted:

... every entity is to be understood in terms of the way it is interwoven with the rest of the universe
and then by way of illustration and in Culler's words:
an electron is ... a node in a system of relations, which, like a phoneme, does not exist independently of these relations
In the domain of data messages, and with this linguistic mindset, it is no surprise that many of the attributes communicated of a subject entity in a message are referential: wholly or in part, they communicate relationships of the message's subject entity to other entities.

For the data shown in figure 7 we can describe individual rows in message structures like this:

atom => atomNo,
        x,
        y,
        elementSymbol
bond => bondNo,
        atom1No,
        atom2No,
        bondtype
In these descriptions atom and bond are types of entity, and therefore types of message, and atomNo, x, y, elementSymbol, bondNo, atom1No, atom2No and bondtype are attributes. Of these, atomNo and bondNo are referential attributes that identify the subject entity of their respective messages, while atom1No and atom2No are referential attributes that identify the atoms a bond links, i.e. the atoms to which it is related.
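
One way to make these message structures concrete, offered only as a sketch, is a pair of Python dataclasses whose fields are the attributes named above. The atomNo and bondNo values are written out explicitly here, whereas figure 7 leaves them implicit in row position; the helper `dangling_references` is a hypothetical name introduced to check that every referential attribute of a bond points at an atom message that exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    atomNo: int          # referential: identifies the subject atom
    x: float
    y: float
    elementSymbol: str


@dataclass(frozen=True)
class Bond:
    bondNo: int          # referential: identifies the subject bond
    atom1No: int         # referential: one atom the bond links
    atom2No: int         # referential: the other atom the bond links
    bondtype: int        # 1 = single, 2 = double


def dangling_references(atoms, bonds):
    """Return the bonds whose atom1No or atom2No matches no atom message."""
    known = {atom.atomNo for atom in atoms}
    return [b for b in bonds if b.atom1No not in known or b.atom2No not in known]


# Water from figure 7, with the ordinals made explicit.
atoms = [
    Atom(1, 2.5369, -0.1550, "O"),
    Atom(2, 3.0739, 0.1550, "H"),
    Atom(3, 2.0000, 0.1550, "H"),
]
bonds = [Bond(1, 1, 2, 1), Bond(2, 1, 3, 1)]

print(dangling_references(atoms, bonds))  # []
```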

Entity Models as Data Models

When entity models are used to specify data models, they define the types of subject entities and, for each such type, the attributes used to communicate entities of that type. They also define which attributes are referential and the relationships they represent, including which referential attributes identify the subject entity of each message.
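
What such a specification might record can be sketched as plain data. The dictionary layout below, with its keys `attributes`, `identifying` and `references`, is an assumption made for illustration and not a notation defined by entity modelling; the function `conforms` is likewise hypothetical.

```python
# A hypothetical entity model: for each entity type, its attributes, which of
# them identify the subject entity, and which refer to entities of another type.
ENTITY_MODEL = {
    "atom": {
        "attributes": ["atomNo", "x", "y", "elementSymbol"],
        "identifying": ["atomNo"],
        "references": {},
    },
    "bond": {
        "attributes": ["bondNo", "atom1No", "atom2No", "bondtype"],
        "identifying": ["bondNo"],
        "references": {"atom1No": "atom", "atom2No": "atom"},
    },
}


def conforms(entity_type, message):
    """True if the message carries exactly the attributes declared for its type."""
    return set(message) == set(ENTITY_MODEL[entity_type]["attributes"])


message = {"bondNo": 1, "atom1No": 1, "atom2No": 2, "bondtype": 1}
print(conforms("bond", message))  # True
print(conforms("atom", message))  # False
```

Keeping the model as data separate from the messages reflects the principle that messages describing the same type of subject entity share one agreed structure.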

