
Thursday 19 March 2009

The Semantic Web relating to Genealogy: Thoughts Re-jiggled

‘Associating a URI with a resource means that anyone can link to it, refer to it, or retrieve a representation of it’, (Shadbolt et al 2006).
‘Much of the motivation for the Semantic Web comes from the value locked in relational databases. To release this value, database objects must be exported to the Web as first-class objects and therefore must be mapped into a system of URIs’, (Shadbolt et al 2006).
I therefore envisage that any genealogical object (such as a person, family, source, repository, place, note or media item) must exist as an individual XML file on the Web, which can then be linked to as required.

As far as possible, these files should be normalised. Just as the database normalisation rule says that every field in a table should depend on ‘the key, the whole key and nothing but the key’, an object should only include tags that relate to it directly.
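
A minimal sketch of one normalised object file, assuming Python's standard `xml.etree.ElementTree`. The base URI `http://example.org/tree/`, the function name `object_to_xml` and the `uri`/`ref` attributes are hypothetical choices for illustration, not part of GEDCOM or GenPACK. It shows a person file that keeps only that person's own facts and points at other objects (here a family) by URI instead of copying their data in.

```python
import xml.etree.ElementTree as ET

# Hypothetical base URI under which each object file would be published.
BASE_URI = "http://example.org/tree/"

def object_to_xml(obj_type, obj_id, own_tags, references):
    """Serialise one genealogical object as its own XML document.

    own_tags:   facts that belong to this object alone (e.g. NAME, SEX).
    references: (tag, target_id) links to other objects, stored only as URIs.
    """
    root = ET.Element(obj_type, {"id": obj_id, "uri": BASE_URI + obj_id + ".xml"})
    for tag, value in own_tags.items():
        ET.SubElement(root, tag).text = value
    for tag, target_id in references:
        # Only a pointer is kept; the target's data lives in its own file.
        ET.SubElement(root, tag, {"ref": BASE_URI + target_id + ".xml"})
    return ET.tostring(root, encoding="unicode")

person = object_to_xml("INDI", "I1", {"NAME": "John /Smith/", "SEX": "M"}, [("FAMS", "F1")])
print(person)
```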

Folksonomies
‘[Folksonomies] represent a structure that emerges organically when individuals manage their own information requirements. Folksonomies arise when a large number of people are interested in particular information and are encouraged to describe it – or tag it’, (Shadbolt et al 2006).
‘Rather than a centralized form of classification, users can assign keywords to documents or other information sources’, (Shadbolt et al 2006).
This ties in with my vision of future online genealogy, in which objects could be linked by tagging them and describing the relationship between them.

If we have two XML files, perhaps both representing people in our family tree, a user could tag one as being related to the other in some way, for example as a cousin. Behind the scenes, the tagging application would create an RDF file stating that XML file A, representing person A, is the cousin of XML file B, representing person B.
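
A sketch of the kind of RDF file the tagging application might write behind the scenes, built as RDF/XML with Python's standard library. The `rdf:` namespace is the real RDF one; the `gen:` namespace `http://example.org/genealogy#`, its `cousinOf` property and the file URIs are hypothetical placeholders for whatever genealogy ontology is eventually agreed.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GEN = "http://example.org/genealogy#"  # hypothetical vocabulary namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("gen", GEN)

def relation_rdf(subject_uri, predicate, object_uri):
    """Build an RDF/XML document stating: subject --predicate--> object."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject_uri})
    # The relationship points at the other person's XML file by URI.
    ET.SubElement(desc, f"{{{GEN}}}{predicate}", {f"{{{RDF}}}resource": object_uri})
    return ET.tostring(root, encoding="unicode")

doc = relation_rdf("http://example.org/tree/personA.xml", "cousinOf",
                   "http://example.org/tree/personB.xml")
print(doc)
```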

Alternatively, pictures of a person could be tagged (in a similar way to Facebook, Flickr etc.). Again, an RDF file would be created, describing the tagged picture as a picture of my ancestor, who is represented by an XML file elsewhere.
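
For the picture case, FOAF already has a suitable term: `foaf:depicts` relates a `foaf:Image` to the thing it shows. A minimal sketch emitting the two statements as N-Triples; the function name and the photo and person URIs are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def picture_tag_triples(image_uri, person_uri):
    """Return N-Triples saying the image is a foaf:Image that depicts the person."""
    triples = [
        (image_uri, RDF_TYPE, FOAF + "Image"),
        (image_uri, FOAF + "depicts", person_uri),
    ]
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

print(picture_tag_triples("http://example.org/photos/wedding-1921.jpg",
                          "http://example.org/tree/I1.xml"))
```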

Similarly, where a paragraph in an online text mentions a relative of some kind, a link could be made between the two in the same way.
‘But folksonomies serve very different purposes from ontologies. Ontologies are attempts to more carefully define parts of the data world and to allow mappings and interactions between data held in different formats. Ontologies refer by virtue of URIs; tags use words’, (Shadbolt et al 2006).
I don’t see this as an ‘either/or’ situation; I think we need both. An ontology is needed to define a standard set of basic genealogical link types (e.g. parent, spouse, sibling) and to ensure compatibility between systems. A folksonomy has the particular advantage that it can cover gaps in the ontology.

In any application, users should be given the choice of creating a standard type of link, as defined in the ontology, or, if the ontology lacks a suitable link type, defining one themselves (folksonomy).
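
A sketch of that choice, assuming a tiny hypothetical ontology of three link types (the URIs under `http://example.org/genealogy#` are placeholders). A label found in the ontology becomes a standard predicate; anything else is kept as a free-text folksonomy tag rather than being rejected.

```python
# Hypothetical ontology: standard link types mapped to placeholder URIs.
ONTOLOGY = {
    "parent": "http://example.org/genealogy#parentOf",
    "spouse": "http://example.org/genealogy#spouseOf",
    "sibling": "http://example.org/genealogy#siblingOf",
}

def make_link(source_uri, label, target_uri):
    """Create a link record, preferring a standard ontology term.

    If the user's label is not in the ontology, it is kept as a free-text
    folksonomy tag so the relationship is still recorded.
    """
    key = label.strip().lower()
    if key in ONTOLOGY:
        return {"source": source_uri, "predicate": ONTOLOGY[key], "target": target_uri}
    return {"source": source_uri, "tag": label.strip(), "target": target_uri}

print(make_link("http://example.org/tree/I1.xml", "Spouse", "http://example.org/tree/I2.xml"))
print(make_link("http://example.org/tree/I1.xml", "godparent", "http://example.org/tree/I3.xml"))
```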

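Ontologies become particularly important when we think on a global scale, because an ontology can be language independent (each term is identified by a URI, and its human-readable labels can be supplied in any number of languages), whereas a folksonomy, built from free-text words, is very hard to translate.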
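
To illustrate why URIs make an ontology language independent: the link is stored once as a URI, and only its display label varies by language, much as RDF allows language-tagged `rdfs:label` values. The term URI, the label table and `display_label` are hypothetical.

```python
# Hypothetical ontology term and its labels in several languages,
# comparable to language-tagged rdfs:label values in RDF.
SPOUSE = "http://example.org/genealogy#spouseOf"
LABELS = {
    SPOUSE: {"en": "spouse", "fr": "conjoint", "de": "Ehepartner"},
}

def display_label(term_uri, lang, fallback="en"):
    """Show a stored link in the reader's language.

    The link itself is stored once, as a URI; only the label changes.
    Missing languages fall back to English, then to the bare URI.
    """
    labels = LABELS.get(term_uri, {})
    return labels.get(lang, labels.get(fallback, term_uri))

for lang in ("en", "fr", "de"):
    print(lang, display_label(SPOUSE, lang))
```

A folksonomy tag such as "godparent" has no URI on which to hang a table like this, which is why it is much harder to translate.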

The Friend Of A Friend (FOAF) project provides an ontology that could, and should, be used in an online genealogy system.

Whether using a prescribed ontology, or a folksonomy, creating tags must be simple in order to encourage participation.

The Principle of Least Power - Keep It Simple Stupid
‘When Berners-Lee developed the Web, he took the salient ideas of hypertext and SGML syntax and removed complexities such as backward hyperlinks. At the time, many criticized their absence from HTML because, without them, pages can simply vanish and links can break. But the need to control both the linking and linked pages is a burden to authoring, sharing, and copying’, (McCool 2006).
In my research (Thomas 2009), the idea of splitting one GEDCOM file into multiple XML files, one for each object, has raised similar concerns. In my opinion, however, the benefits outlined in this blog outweigh this problem.
‘Early forms of HTML paid no regard to SGML document-type definitions (DTDs). Berners-Lee simply ignored these difficult to create and understand declarations of how markup tags are used’, (McCool 2006).
In a similar way, it does not really matter how people define their data in XML, as long as there are ontologies with which their XML tags can be associated.
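
A sketch of that association, assuming two hypothetical applications that store a person's name under different XML tags (`NAME` and `fullName`). Each is mapped to the real FOAF property `foaf:name`, so both files yield the same statement; the mapping table and `to_triples` are illustrative only.

```python
import xml.etree.ElementTree as ET

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical per-source mappings: local XML tag -> shared ontology property.
TAG_MAP = {
    "gedcom-xml": {"NAME": FOAF_NAME},
    "other-app": {"fullName": FOAF_NAME},
}

def to_triples(xml_text, source_format, subject_uri):
    """Turn a flat XML record into (subject, property, value) triples."""
    root = ET.fromstring(xml_text)
    mapping = TAG_MAP[source_format]
    triples = []
    for child in root:
        if child.tag in mapping:  # tags without a mapping are skipped
            triples.append((subject_uri, mapping[child.tag], (child.text or "").strip()))
    return triples

a = to_triples("<INDI><NAME>John Smith</NAME><SEX>M</SEX></INDI>",
               "gedcom-xml", "http://example.org/a")
b = to_triples("<person><fullName>John Smith</fullName></person>",
               "other-app", "http://example.org/b")
print(a)
print(b)
```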
‘[Folksonomies have] no notion of synonyms or disambiguation... For a Web community with simple, easy-to-use authoring tools that support synonyms, disambiguation, and categories, we can look to Wikipedia... Wikipedia calls synonyms redirect pages, and disambiguation is explicitly handled via special pages’, (McCool 2006).
Wikipedia differs from my vision in that there is no XML-based data behind its presentation layer. Where synonyms and disambiguation are not covered by the ontology, presentation-layer (HTML) pages could be used in the same way as Wikipedia's redirect and disambiguation pages to address the issues described in the quotation above.
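
A minimal sketch of the Wikipedia-style approach applied to tags, with hypothetical redirect and disambiguation tables: synonyms are redirected to one canonical tag, and an ambiguous tag returns its candidate meanings so the user can choose, as a disambiguation page would.

```python
# Hypothetical redirect table: synonym tag -> canonical tag (like Wikipedia redirects).
REDIRECTS = {"wife": "spouse", "husband": "spouse", "partner": "spouse"}

# Hypothetical disambiguation table: ambiguous tag -> candidate meanings.
DISAMBIGUATION = {
    "cousin": ["first cousin", "second cousin", "cousin once removed"],
}

def resolve_tag(tag):
    """Return ('tag', canonical_tag) or ('ambiguous', candidate_list)."""
    t = tag.strip().lower()
    t = REDIRECTS.get(t, t)  # follow a redirect if one exists
    if t in DISAMBIGUATION:
        return ("ambiguous", DISAMBIGUATION[t])
    return ("tag", t)

print(resolve_tag("Wife"))
print(resolve_tag("cousin"))
```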

Conclusion

It is clear that the GEDCOM format, which holds a whole family tree in a single file rather than giving each object its own URI, could never on its own facilitate the Semantic Web.
I have created a project called GenPACK, which breaks GEDCOM files down into XML files that retain the same tags as GEDCOM (some call this an intermediate format).
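
To give a flavour of the approach, here is a minimal sketch of breaking a GEDCOM file into one XML document per top-level record while keeping GEDCOM tag names as element names. It is an illustration under simplifying assumptions (no CONC/CONT joining, no character-set handling, and records without a cross-reference, such as HEAD and TRLR, are skipped), not GenPACK's actual code; `gedcom_to_xml_records` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def gedcom_to_xml_records(gedcom_text):
    """Split GEDCOM text into one XML string per top-level record with an xref."""
    records = {}
    stack = []  # stack[i] is the open element at GEDCOM level i (None = skipped)
    for raw in gedcom_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        if len(parts) > 1 and parts[1].startswith("@"):
            # e.g. "0 @I1@ INDI": cross-reference id, then tag and optional value
            xref = parts[1].strip("@")
            tag, _, value = (parts[2] if len(parts) > 2 else "").partition(" ")
        else:
            # e.g. "1 NAME John /Smith/": tag, then optional value
            xref = None
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            elem = ET.Element(tag, {"id": xref}) if xref else None
            if elem is not None:
                if value:
                    elem.text = value
                records[xref] = elem
            stack = [elem]
            continue
        del stack[level:]  # close any deeper elements
        parent = stack[-1] if len(stack) == level else None
        elem = None
        if parent is not None:
            elem = ET.SubElement(parent, tag)  # GEDCOM tag kept as element name
            if value:
                elem.text = value
        if len(stack) == level:
            stack.append(elem)
    return {xref: ET.tostring(e, encoding="unicode") for xref, e in records.items()}

sample = """0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1900
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
0 TRLR"""

for xref, xml_text in gedcom_to_xml_records(sample).items():
    print(xref, xml_text)
```

Each resulting string could then be written to its own file (for example `I1.xml`) and published at its own URI.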

Although GenPACK is currently being written to link these XML files by importing them into a database, this research makes it clear that the next step is to use RDF files to express the relations between them. In doing so, I will need to analyse FOAF to see how it can be applied to this situation.
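
A sketch of that next step, assuming family files in the GEDCOM-tagged XML form (`HUSB`, `WIFE` and `CHIL` holding cross-references). FOAF's core terms such as `foaf:knows` do not express parent or spouse links, so the `spouseOf`/`parentOf` properties, both base URIs and `family_to_ntriples` are hypothetical placeholders pending that analysis. The output is N-Triples.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/tree/"      # hypothetical location of the object files
GEN = "http://example.org/genealogy#"  # hypothetical relationship vocabulary

def family_to_ntriples(fam_xml):
    """Derive spouse and parent statements from one family XML file."""
    fam = ET.fromstring(fam_xml)

    def uri(ref):
        # "@I1@" -> <http://example.org/tree/I1.xml>
        return f"<{BASE}{ref.strip('@')}.xml>"

    parents = [uri(e.text) for e in fam if e.tag in ("HUSB", "WIFE") and e.text]
    children = [uri(e.text) for e in fam.findall("CHIL") if e.text]
    triples = []
    if len(parents) == 2:
        triples.append(f"{parents[0]} <{GEN}spouseOf> {parents[1]} .")
    for p in parents:
        for c in children:
            triples.append(f"{p} <{GEN}parentOf> {c} .")
    return triples

fam = '<FAM id="F1"><HUSB>@I1@</HUSB><WIFE>@I2@</WIFE><CHIL>@I3@</CHIL></FAM>'
print("\n".join(family_to_ntriples(fam)))
```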

The GenPACK project can be found at: http://sourceforge.net/projects/genpack/

  1. McCool, R., 2006. Rethinking the Semantic Web, Part 2. IEEE Internet Computing, 10(1), 96-95.
  2. Shadbolt, N., Hall, W. & Berners-Lee, T., 2006. The Semantic Web Revisited. IEEE Intelligent Systems, 21(3), 96-101.
  3. Thomas, N., 2009. It’s All Relative: But is GEDCOM still a member of the family? Pre-Print Conference Paper. Available from: http://genpack.wiki.sourceforge.net/space/showimage/ItsAllRelativeD01InternetVersion.doc/ [Accessed 19 March 2009].
