Nathan's Blog: 2009

Tuesday, 21 April 2009

Peter Funch Manhattan Street Corners

http://www.kottke.org/09/04/composite-nyc-street-scenes

Peter Funch manipulates photos taken on street corners so that there is a common theme in the passers-by:

http://www.v1gallery.com/artist/show/3

Are you paranoid yet?

Prompted by a post on Schneier's blog, I was redirected to:

http://sierracharlie.wordpress.com/2009/04/10/terror/

This may not be of interest, but the alternative version generator is pure comedy gold. It can be found at:

http://jamesholden.net/billboard/

Here are some examples I created:

Wednesday, 25 March 2009

Web Dev Research Presentation

Presentation about Web services developed for my degree's web programming module.
Includes TCP/IP, FTP and HTTP due to assignment requirements:

Web Dev Research


Monday, 23 March 2009

Collaboration: A Vision for Content Management

Reading an interview with Ross Mayfield, CEO and founder of Socialtext, found in ‘Wikinomics’ by Tapscott and Williams...

‘One of a growing number of start-ups that have emerged to supply social computing technologies (especially wikis) to enterprises.’

‘”For a long time,” says Mayfield, “personal productivity tools and applications – the kind that Microsoft makes – have been centred on a single user who generates documents. You also have highly structured enterprise systems designed and implemented from the top down – in many ways as an instrument of control – with rigid work flow, business rules, and ontologies that users must fit themselves into. The problem is that users don’t like using those kinds of tools, and what they end up doing is trying to circumvent them. That’s why ninety percent of collaboration exists in emails.”’

‘Mayfield argues that traditional organizations have reached a point where e-mail itself is breaking. “You could argue that ten or twenty percent of e-mail is productive”.’

‘Mayfield thinks the solution is collaboration tools that adapt to the habits of workplace teams and social networks rather than the other way around.’

So, in a similar way to the genealogy software I envisage, each employee is represented by an XML or Web page object. FOAF or similar networks can be created and used as required to provide information about how the employees relate to each other, without prescribing a hierarchy of any kind.

A modern Content Management System (CMS) should provide the facility for each employee to create blogs, wiki articles etc., in the same way that they expect to be able to use Social Networking software when at home. The difference is that these activities are only posted to an internal intranet (unless the employee needs to link to some public facility like Wikipedia).

Concerning Mayfield’s comment about spam... I think it may be better to employ a Twitter-style stream – admittedly 80% may be irrelevant, but it would enable workers to know what each other is doing. I would also build in an ‘ignore hashtag’ function though, as I don’t necessarily want to know about every conversation – or even ‘block hashtag’. (I need to find this function within Twitter!)
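
A minimal sketch of the ‘ignore hashtag’ idea, in Python. The message format (plain strings containing #tags) and the function name are assumptions made for illustration; this is not a feature of any real Twitter API.

```python
import re

def visible_messages(stream, ignored_tags):
    """Drop any message that carries at least one ignored hashtag."""
    ignored = {tag.lower().lstrip("#") for tag in ignored_tags}
    kept = []
    for message in stream:
        # Collect the hashtags used in this message, case-insensitively.
        tags = {t.lower() for t in re.findall(r"#(\w+)", message)}
        if not tags & ignored:
            kept.append(message)
    return kept

# Hypothetical intranet status stream.
stream = [
    "Deploying the intranet wiki #release",
    "Cake in the kitchen #social",
    "Reviewing the GEDCOM import",
]
filtered = visible_messages(stream, ["#social"])
```

Untagged messages always pass through, so the filter only hides conversations a worker has explicitly opted out of.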

And, oh yes, of course I think that we can apply the same system to genealogy.

Tapscott, D. & Williams, A., 2008. Wikinomics, Atlantic Books.

Thursday, 19 March 2009

The Semantic Web relating to Genealogy: Thoughts Re-jiggled

‘Associating a URI with a resource means that anyone can link to it, refer to it, or retrieve a representation of it’, (Shadbolt et al 2006).

‘Much of the motivation for the Semantic Web comes from the value locked in relational databases. To release this value, database objects must be exported to the Web as first-class objects and therefore must be mapped into a system of URIs’, (Shadbolt et al 2006).

So, I envisage, any genealogical object, (such as a person, family, source, repository, place, note or media), must exist as an individual XML file on the Web that can then be linked to, as desired or required.

As far as possible, these files should be normalised. In the same way as the database rule that a table should relate to ‘the key, the whole key and nothing but the key’, objects should only include tags that relate to them.
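
As a rough sketch of the ‘one object, one file’ idea, the Python below builds a separate small XML document for each person. The element names (person, name, birth) and the identifiers are assumptions invented for illustration; they are not taken from GEDCOM or from GenPACK.

```python
import xml.etree.ElementTree as ET

def person_to_xml(person_id, name, birth_year):
    """Build a self-contained XML document describing one person only."""
    root = ET.Element("person", attrib={"id": person_id})
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "birth").text = str(birth_year)
    return ET.tostring(root, encoding="unicode")

# Hypothetical people. Relationships are deliberately NOT stored in these
# documents, in keeping with the normalisation rule described above.
people = [("I1", "Ann Example", 1901), ("I2", "Bob Example", 1903)]
documents = {pid: person_to_xml(pid, name, year) for pid, name, year in people}
```

Each entry in documents could then be saved at its own URI and linked to from elsewhere.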

Folksonomies

‘[Folksonomies] represent a structure that emerges organically when individuals manage their own information requirements. Folksonomies arise when a large number of people are interested in particular information and are encouraged to describe it – or tag it’, (Shadbolt et al 2006).

‘Rather than a centralized form of classification, users can assign keywords to documents or other information sources’, (Shadbolt et al 2006).

This links with my vision of future online genealogy: objects could be linked by tagging and specifying a description of that relationship.

If we have two XML files, perhaps both representing people in our family tree, a user could tag one object from another as being related in some way, perhaps as a cousin, for example. Behind the scenes, the tagging application creates an RDF file which states that XML file A, representing person A, is the cousin of XML file B, representing person B.

Alternatively, pictures of a person could be tagged (in a similar way to Facebook, Flickr etc). Again, an RDF file would be created, describing the tagged picture as a picture of my ancestor, who is represented by an XML file elsewhere.

Consider the situation where a paragraph in an online text mentions a relative of some kind; again, a link could be made between the two in the same way as described above.
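
The tagging step could be sketched as follows, writing each link as one RDF statement in N-Triples syntax. All the example.org URIs, including the cousinOf property, are hypothetical; foaf:depicts is a real FOAF property, but using it for genealogy photos is my own assumption.

```python
def triple(subject_uri, predicate_uri, object_uri):
    """Format one RDF statement (subject, predicate, object) as N-Triples."""
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."

# Hypothetical locations of the per-person XML files and a tagged photo.
person_a = "http://example.org/people/A.xml"
person_b = "http://example.org/people/B.xml"
photo = "http://example.org/media/photo1.jpg"

# Hypothetical relationship vocabulary, plus one real FOAF property.
cousin_of = "http://example.org/vocab#cousinOf"
depicts = "http://xmlns.com/foaf/0.1/depicts"

statements = [
    triple(person_a, cousin_of, person_b),  # person A is the cousin of person B
    triple(photo, depicts, person_a),       # the photo is a picture of person A
]
```

The statements live apart from the XML files they describe, so neither person file has to change when a new link is added.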

‘But folksonomies serve very different purposes from ontologies. Ontologies are attempts to more carefully define parts of the data world and to allow mappings and interactions between data held in different formats. Ontologies refer by virtue of URIs; tags use words’, (Shadbolt et al 2006).

I don’t see this as an ‘either/or’ situation; I think we need both. An ontology is needed to set a standard for the basic types of genealogical link (e.g. parent, spouse, sibling) and to ensure system compatibility. A folksonomy system has a particular advantage in that it can cover inadequacies of the ontology.

In any application, users should be given the choice of creating a standard type of link, as defined in the ontology, or, if the ontology is missing a link type, defining it themselves (folksonomy).
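
A small sketch of that choice in Python. The set of standard link types and the dictionary layout are assumptions for illustration only, not a defined genealogy ontology.

```python
# Assumed ontology terms; a real ontology would identify these by URI.
STANDARD_LINKS = {"parent", "spouse", "sibling"}

def make_link(source, target, label):
    """Create a link, noting whether its type comes from the ontology
    or is a free-form, user-defined (folksonomy) tag."""
    label = label.strip().lower()
    kind = "ontology" if label in STANDARD_LINKS else "folksonomy"
    return {"source": source, "target": target, "label": label, "kind": kind}

standard = make_link("A.xml", "B.xml", "Spouse")
custom = make_link("A.xml", "C.xml", "godparent")
```

Recording the kind alongside the label would let an application treat ontology links as machine-comparable while still keeping user-defined tags.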

Ontologies become particularly important when we think on a global scale, in that ontologies can be language independent, whereas a folksonomy is very hard to translate.

The Friend Of A Friend (FOAF) project exists as an ontology that could, and should, be used in an online genealogy system.

Whether using a prescribed ontology, or a folksonomy, creating tags must be simple in order to encourage participation.

The Principle of Least Power - Keep It Simple Stupid

‘When Berners-Lee developed the Web, he took the salient ideas of hypertext and SGML syntax and removed complexities such as backward hyperlinks. At the time, many criticized their absence from HTML because, without them, pages can simply vanish and links can break. But the need to control both the linking and linked pages is a burden to authoring, sharing, and copying’, (McCool 2006).

In my research, (Thomas 2009), the idea of splitting one GEDCOM file into multiple XML files, one XML file for each object, has raised similar concerns. In my opinion, though, the benefits outlined in this blog outweigh this problem.

‘Early forms of HTML paid no regard to SGML document-type definitions (DTDs). Berners-Lee simply ignored these difficult to create and understand declarations of how markup tags are used’, (McCool 2006).

In a similar way, it does not really matter how people define their data in XML, as long as there are ontologies in order that we can associate XML tags.

‘[Folksonomies have] no notion of synonyms or disambiguation... For a Web community with simple, easy-to-use authoring tools that support synonyms, disambiguation, and categories, we can look to Wikipedia... Wikipedia calls synonyms redirect pages, and disambiguation is explicitly handled via special pages’, (McCool 2006).

Wikipedia is different from my vision in that there is not any XML-based data behind the presentation layer. Where not implied by the ontology, presentation layer pages (HTML) could be used in the same way as Wikipedia to handle the issues described in the above paragraph.

Conclusion

It is clear that the GEDCOM format would never be able to facilitate the Semantic Web.
I have created a project, called GenPACK, which breaks GEDCOM files down into XML files, retaining the same tags as GEDCOM. (Some call this an intermediate format).
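
To illustrate what ‘retaining the same tags’ might look like, here is a minimal Python sketch that turns the lines of one GEDCOM record into nested XML. It is not GenPACK’s actual code; the function name, the id attribute and the sample record are my own assumptions, and it ignores details such as CONT/CONC continuation lines.

```python
import xml.etree.ElementTree as ET

def gedcom_record_to_xml(lines):
    """Convert the lines of ONE GEDCOM record into a nested XML element,
    reusing each GEDCOM tag as the XML element name."""
    stack = []  # stack[n] is the currently open element at level n
    for line in lines:
        level_text, _, rest = line.strip().partition(" ")
        level = int(level_text)
        xref = None
        if rest.startswith("@"):            # e.g. "0 @I1@ INDI"
            xref_text, _, rest = rest.partition(" ")
            xref = xref_text.strip("@")
        tag, _, value = rest.partition(" ")  # e.g. "NAME", "John /Smith/"
        element = ET.Element(tag)
        if xref:
            element.set("id", xref)
        if value:
            element.text = value
        del stack[level:]                    # close elements at this level or deeper
        if stack:
            stack[-1].append(element)
        stack.append(element)
    return stack[0]

# Hypothetical record for one individual.
record = ["0 @I1@ INDI", "1 NAME John /Smith/", "1 BIRT", "2 DATE 1 JAN 1900"]
xml_text = ET.tostring(gedcom_record_to_xml(record), encoding="unicode")
```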

Although GenPACK is currently being written to provide linking by importing those XML files into a database, from this research it is clear that the next step is to use RDF files to imply relations between these files. In this process I will need to analyse FOAF to see how it can be applied to this situation.

The GenPACK project can be found at: http://sourceforge.net/projects/genpack/

McCool, R., 2006. Rethinking the Semantic Web, Part 2. IEEE Internet Computing, 10(1), 93-96.
Shadbolt, N., Hall, W. & Berners-Lee, T., 2006. The Semantic Web Revisited. Intelligent Systems, IEEE, 21(3), 96-101.
Thomas, N., 2009. It’s All Relative: But is GEDCOM still a member of the family? Pre-Print Conference Paper. Available from: http://genpack.wiki.sourceforge.net/space/showimage/ItsAllRelativeD01InternetVersion.doc/ [Accessed 19 March 2009].

Hypertext, Hypermedia and the Semantic Web: What is all this Semantics stuff anyway?

What is the ‘Semantic Web’?

‘The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will usher in significant new functionality as machines become much better able to process and “understand” the data that they merely display at present’, (Berners-Lee et al 2001).

What are the design principles of the Semantic Web?

‘The essential property of the World Wide Web is its universality. The power of a hypertext link is that “anything can link to anything”... Like the Internet, the Semantic Web will be as decentralized as possible... Decentralization requires compromises: the Web had to throw away the ideal of total consistency of all its interconnections, ushering in the infamous message “Error 404: Not Found” but allowing unchecked exponential growth’, (Berners-Lee et al 2001).

We can trace this property back to the original Web’s design principles, particularly the Web’s ability to record random associations between objects.

‘The Web was designed to be a universal space of information, so when you make a bookmark or a hypertext link, you should be able to make that link to absolutely any piece of information that can be accessed using networks. The universality is essential to the Web: it loses its power if there are certain types of things to which you can’t link’, (Berners-Lee 1998).

‘The second part of the dream was... The computer re-enters the scene visibly as a software agent, doing anything it can to help us deal with the bulk of data, to take over the tedium of anything that can be reduced to a rational process, and to manage the scale of our human systems’, (Berners-Lee 1998).

How will the Semantic Web work?

‘For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning’, (Berners-Lee et al 2001).

‘Traditional knowledge-representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as “parent” or “vehicle”. But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable’, (Berners-Lee et al 2001).

‘For example, a genealogy system, acting on a database of family trees, might include the rule “a wife of an uncle is an aunt”. Even if the data could be transferred from one system to another, the rules, existing in a completely different form, usually could not’, (Berners-Lee et al 2001).

The point that Sir Tim Berners-Lee is making here is that data is usually stored on a range of different systems, and the ‘semantic’ rules that define objects are found in a variety of different formats, dependent on the information storage system used to store the data. It is then impossible to associate semantic rule sets when moving data from one information system to another.
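
The quoted rule can be written down as data-independent logic. The Python below is only a sketch: the predicate names (uncle_of, wife_of, aunt_of) and the people are invented for illustration.

```python
def infer_aunts(facts):
    """Apply the rule: if U is an uncle of X and W is the wife of U,
    then W is an aunt of X. Facts are (subject, predicate, object) tuples."""
    uncles = [(s, o) for s, p, o in facts if p == "uncle_of"]
    wives = [(s, o) for s, p, o in facts if p == "wife_of"]
    return {
        (wife, "aunt_of", child)
        for uncle, child in uncles
        for wife, husband in wives
        if husband == uncle
    }

# Hypothetical family facts.
facts = {
    ("Tom", "uncle_of", "Sam"),
    ("Mary", "wife_of", "Tom"),
    ("Jane", "wife_of", "Fred"),
}
inferred = infer_aunts(facts)
```

Here the rule is locked inside one program, which is exactly the problem the quotation describes: the facts could be exported, but another system would not know about infer_aunts.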

‘Moreover, these systems usually carefully limit the questions that can be asked so that the computer can answer reliably or answer at all. The problem is reminiscent of [Kurt] Gödel’s [incompleteness] theorem from mathematics: any system that is complex enough to be useful also encompasses unanswerable questions... Semantic Web researchers, in contrast, accept that paradoxes and unanswerable questions are a price that must be paid to achieve versatility. We make the language for the rules as expressive as needed to allow the Web to reason as widely as desired’, (Berners-Lee et al 2001).

‘Early in the Web’s development, detractors pointed out that it could never be a well-organized library; without a central database and tree structure, one would never be sure of finding everything. They were right’, (Berners-Lee et al 2001).

‘The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web’, (Berners-Lee et al 2001).

The earlier example about a genealogy system is particularly close to my heart: GEDCOM is certainly a ‘traditional knowledge representation system’. New genealogy formats are being created, with their own ways of defining data. We must therefore find a way of associating data...

eXtensible Markup Language (XML)

XML is a Markup Language, meaning that tags are inserted into the document to signify that the data between them is related to the tag. It was developed by the World Wide Web Consortium’s (W3C) XML Working Group in 1996, (W3C 2006).

‘XML’s power comes from the fact that it can be used regardless of the platform, language, or data store of the system using it to expose datasets’, (Evjen et al 2007).

‘XML is considered ideal for data representation purposes because it enables developers to structure XML documents as they see fit. For this reason, it is also a bit chaotic. Sending self-structured XML documents between dissimilar systems doesn’t make a lot of sense – it requires custom building of both the exposure and consumption models for each communication pair’, (Evjen et al 2007).

So really, everyone can create their own definition of how to represent data using XML. Again, the genealogy developers are doing this, so how will we associate their data?

Resource Description Framework (RDF)

In 1997, the W3C defined the first Resource Description Framework specification. It became a W3C recommendation in 1999. But what exactly does it do?

‘Meaning is expressed by RDF, which encodes it in sets of triples, each triple being rather like the subject, verb and object of an elementary sentence. These triples can be written using XML tags. In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of”, “is the author of”) with certain values (another person, another Web page). This structure turns out to be a natural way to describe the vast majority of the data processed by machines’, (Berners-Lee et al 2001).

‘Subject and object are each identified by a Universal Resource Identifier (URI), just as used in a link on a Web page. (URLs, Uniform Resource Locators, are the most common type of URI). The verbs are also identified by URIs, which enables anyone to define a new concept, a new verb, just by defining a URI for it somewhere on the Web’, (Berners-Lee et al 2001).

So, as a genealogist, one URI, (web page), can represent one person; another URI represents another person and I can link them together using an RDF file at an intermediate location, (URI), which defines their relationship.
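
A sketch of how a program might hold and query such triples, in Python. The Triple type, the match helper and every example.org URI are assumptions for illustration; a real system would more likely use an RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the 'verb'
    obj: str        # URI of the value

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the given parts; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

ANN = "http://example.org/people/ann"
BOB = "http://example.org/people/bob"
SISTER_OF = "http://example.org/vocab#sisterOf"
AUTHOR_OF = "http://example.org/vocab#authorOf"

triples = [
    Triple(ANN, SISTER_OF, BOB),
    Triple(ANN, AUTHOR_OF, "http://example.org/pages/family-history"),
]
siblings = match(triples, predicate=SISTER_OF)
```

Because the verb is itself a URI, anyone could publish a new relationship type without changing the Triple structure.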

‘Two databases may use different identifiers for what is in fact the same concept... A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing. Ideally, the program must have a way to discover such common meanings for whatever databases it encounters’, (Berners-Lee et al 2001).

Yes, as I said earlier, I may want to link with another genealogy held in a different system elsewhere, which uses its own XML and RDF structures.

‘A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies... [In terms of the Semantic Web] an ontology is a document or file that formally defines the relations among terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules’, (Berners-Lee et al 2001).
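
At its simplest, the ‘common meaning’ part of an ontology can be pictured as a lookup from each system’s own term to a shared concept URI. The Python below is a sketch under that assumption; the system names, the ‘born’ term and the example.org URIs are hypothetical (BIRT and DEAT are genuine GEDCOM tags).

```python
# Hypothetical mapping from (system, local term) to a shared concept URI.
SHARED_CONCEPTS = {
    ("gedcom", "BIRT"): "http://example.org/ontology#birth",
    ("other_system", "born"): "http://example.org/ontology#birth",
    ("gedcom", "DEAT"): "http://example.org/ontology#death",
}

def same_concept(term_a, term_b):
    """True when both terms are known and map to the same shared concept."""
    uri_a = SHARED_CONCEPTS.get(term_a)
    uri_b = SHARED_CONCEPTS.get(term_b)
    return uri_a is not None and uri_a == uri_b

birth_matches = same_concept(("gedcom", "BIRT"), ("other_system", "born"))
```

Unknown terms never match anything, so a missing mapping fails safely rather than merging unrelated data.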

‘The taxonomy defines classes of objects and relations among them... Classes, subclasses and relations among entities are a very powerful tool for Web use. We can express a large number of relations among entities by assigning properties to classes and allowing subclasses to inherit such properties’, (Berners-Lee et al 2001).

‘Inference rules in ontologies supply further power... A program could then readily deduce, for instance, that a Cornell University address, being in Ithaca, must be in New York State, which is in the U.S., and therefore should be formatted to U.S. standards. The computer doesn’t truly “understand” any of this information, but it can now manipulate the terms much more effectively in ways that are useful and meaningful to the human user’, (Berners-Lee et al 2001).
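
The address example amounts to following a ‘located in’ relation until it runs out. A minimal Python sketch, assuming a simple dictionary of place facts (the data structure and function name are my own, not part of any ontology language):

```python
# Each place maps to the place that directly contains it.
LOCATED_IN = {
    "Cornell University": "Ithaca",
    "Ithaca": "New York State",
    "New York State": "U.S.",
}

def containing_places(place):
    """Follow the 'located in' facts upwards and list every containing place."""
    chain = []
    while place in LOCATED_IN:
        place = LOCATED_IN[place]
        chain.append(place)
    return chain

chain = containing_places("Cornell University")
needs_us_format = "U.S." in chain
```

The program never ‘understands’ geography; it only chains facts, which is the point the quotation makes.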

‘Ontologies can enhance the functioning of the Web in many ways. They can be used in a simple fashion to improve the accuracy of Web searches and the search program can look for only those pages that refer to a precise concept instead of all the ones using ambiguous keywords. More advanced applications will use ontologies to relate the information on a page to the associated knowledge structures and inference rules’, (Berners-Lee et al 2001).

Ontologies can be defined using the Web Ontology Language (OWL). Isn’t that neat?

‘Another vital feature will be digital signatures, which are encrypted blocks of data that computers and agents can use to verify that the attached information has been provided by a specific trusted source’, (Berners-Lee et al 2001).

Digital signatures can be used to sign the objects (XML files) or links (RDF) to ensure their validity.
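
Python’s standard library has no public-key signing, so the sketch below uses a keyed hash (HMAC) as a stand-in. An HMAC is not a true digital signature, because signer and verifier share the same secret key, but it shows the sign-then-verify flow for an XML object. The key and the XML content are invented for illustration.

```python
import hashlib
import hmac

def sign(document: bytes, key: bytes) -> str:
    """Compute a keyed SHA-256 hash of the document (stand-in for a signature)."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()

def verify(document: bytes, key: bytes, signature: str) -> bool:
    """Check that the document still matches the signature it was issued with."""
    return hmac.compare_digest(sign(document, key), signature)

key = b"demo-key-not-for-real-use"
person_xml = b'<person id="I1"><name>Ann Example</name></person>'
signature = sign(person_xml, key)

unchanged_ok = verify(person_xml, key, signature)
tampered_ok = verify(person_xml.replace(b"Ann", b"Amy"), key, signature)
```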

Digital signatures only enable us to assert that an object is linked with a person, or that they are who they say they are, but, combined with checking modules in applications, they could make the following situations possible:

‘Proxy caches... will be able to check that they are really acting in accordance with the publisher’s wishes when it comes to re-distributing material [e.g. distribution controls selected dependent on the publisher’s certificate]. A browser will be able to get an assurance, before imparting personal information in a Web form, on how that information will be used [a digitally signed Web service]. People will be able to endorse Web pages that they perceive to be of value [a digitally signed hyperlink]. Search engines will be able to take such endorsements into account and give results that are perceived to be of much higher quality’, (Berners-Lee 1998).

‘When we have this, we will be able to ask the computer not just for information, but why we should believe it. Imagine an ‘Oh, yeah?’ button on your browser’, (Berners-Lee 1998).

Berners-Lee, T., 1998. Realising the Full Potential of the Web. Available at: http://www.w3.org/1998/02/Potential.html [Accessed February 25, 2009].
Berners-Lee, T., Hendler, J. & Lassila, O., 2001. The Semantic Web: Scientific American. Scientific American Magazine. Available from: http://www.sciam.com/article.cfm?id=the-semantic-web [Accessed February 27, 2009].
Evjen, B., Sharkey, K., Thangarathinam, T., Kay, M., Vernet, A. & Ferguson, S., 2007. Professional XML. John Wiley & Sons.
W3C, 2006. Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation 16 August 2006. World Wide Web Consortium. Available from: http://www.w3.org/TR/2006/REC-xml-20060816/ [Accessed 09 January 2009].

Hypertext, Hypermedia and the Semantic Web: The Web Itself

Sir Tim Berners-Lee described computing in 1980 as a world of ‘incompatible networks, disk formats, data formats and character encoding schemes’. This was particularly frustrating ‘given that... to a greater extent computers were being used directly for most information handling, and so almost anything one might want to know was almost certainly recorded magnetically somewhere’, (Berners-Lee 1996).

The ‘Design Criteria’ of the World Wide Web, described in Sir Tim Berners-Lee’s 1996 paper, make very interesting reading:

‘An information system must be able to record random associations between any arbitrary objects, unlike most database systems’.
‘To make a link from one system to another should be an incremental effort, not requiring un-scalable operations such as the merging of databases’.
‘Any attempt to constrain users as a whole to the use of particular languages or operating systems was always doomed to failure’.
‘Information must be available on all platforms, including future ones’.
‘Any attempt to constrain the mental model users have of data into a given pattern was always doomed to failure’.
‘Entering or correcting [information] must be trivial for the person directly knowledgeable’.

The Web is formed around three common standards: the Address Space, Hyper-Text Transfer Protocol (HTTP) and Hyper-Text Mark-up Language (HTML), all originally designed by Sir Tim Berners-Lee.

The Web was designed around a principle of minimal constraint, in order that it could be incrementally improved by future developers. Additionally, the Web’s standards needed to be modular and support information-hiding, so that anybody designing anything on top of those standards did not have to know how the standards actually worked, (Berners-Lee 1996).

‘A test of this ability was to replace them with older specifications, and demonstrate the ability to intermix those with the new. Thus, the old FTP protocol could be intermixed with the new HTTP protocol in the address space, and conventional text documents could be intermixed with the new hypertext documents’, (Berners-Lee 1996).

Also, as a further example, we can look at HTTP’s ability to carry images (JPG, PNG, VRML) or even Java code.

‘Typically, hypertext systems were built around a database of links. This did not scale... However, it did guarantee that links would be consistent and links to documents would be removed when documents were removed. The removal of this feature was the principle compromise made in the [World Wide Web] architecture... allowing references to be made without consultation with the destination, allowed the scalability which the later growth of the web exploited’, (Berners-Lee 1996).

File Transfer Protocol (FTP) existed when the web was first developed, but was ‘not optimal for the web, in that it was too slow and not sufficiently rich in features’, (Berners-Lee 1996). So the Hyper-Text Transfer Protocol (HTTP) was created.

Universal Resource Identifiers (URIs) are the primary element of Web architecture. ‘Any new space of any kind which has some kind of identifying, naming or addressing syntax can be mapped into a printable syntax and given a prefix’, (Berners-Lee 1996). ‘URIs are generally treated as opaque strings: client software is not allowed to look inside them and to draw conclusions about the object referenced’, (Berners-Lee 1996). ‘HTTP URIs are resolved... by splitting them into two halves. The first half is applied to the Domain Name Service to discover a suitable server, and the second half is an opaque string which is handed to that server’, (Berners-Lee 1996).
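
The ‘two halves’ can be seen using Python’s standard urllib.parse module. This is only an illustration of the split described in the quotation; the function name is my own.

```python
from urllib.parse import urlsplit

def split_http_uri(uri):
    """Split an HTTP URI into the half used to find a server (via DNS)
    and the opaque half that is handed to that server unchanged."""
    parts = urlsplit(uri)
    server_half = parts.netloc
    opaque_half = parts.path + ("?" + parts.query if parts.query else "")
    return server_half, opaque_half

server, opaque = split_http_uri("http://www.w3.org/People/Berners-Lee/1996/ppf.html")
```

The client only needs to interpret the first half; what the second half means is entirely up to the server.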

Hyper-Text Markup Language (HTML) was defined as the data format to be transmitted over HTTP. HTML was based around SGML in order to encourage its adoption by those already using SGML.

The initial prototype browser was written in NeXTStep in late 1990. It allowed HTML to be edited as well as browsed. The limited use of NeXT limited its visibility, so in 1991 a read-only ‘line mode’ browser was written. This enabled the early web to be viewed on a range of systems. As more people became involved, full browsers were written.

In 1993, rumours threatened that the Web’s competition ‘Gopher’ was to become a licensed product. As a result, a mass of people and organisations transferred their hypermedia systems to be WWW systems instead.

The World Wide Web Consortium (W3C) was formed in 1994. The rest is history...

Berners-Lee, T., 1996. The World Wide Web: Past, Present and Future. Available at: http://www.w3.org/People/Berners-Lee/1996/ppf.html [Accessed February 25, 2009].

Wednesday, 18 March 2009

Chromon

Repeat the musical pattern

Thursday, 26 February 2009

Development of the Web: Video Links

Links only I'm afraid...

Tim Berners-Lee and others at the Web's 10 Year Anniversary conference:

World Wide Web: Ten Year Anniversary, 2004. Available at: http://forum.wgbh.org/node/1714 [Accessed February 25, 2009].

Tim Berners-Lee Lecture at Southampton University:

The World Wide Web: Looking Back, looking forward, Available at: http://www.ecs.soton.ac.uk/podcasts/video.php?id=75 [Accessed February 26, 2009].

Wednesday, 25 February 2009

Hypertext, Hypermedia and the Semantic Web: Hypertext to Hypermedia and Beyond...

It is frequently questioned how hypermedia relates to hypertext.

Nelson defined hypertext in his 1965 paper ‘Complex information processing: a file structure for the complex, the changing and the indeterminate’. In that paper Nelson states that the prefix ‘hyper’ is an ‘extension and generality’ used to signify non-linear media. There is no pre-determined path through non-linear media, unlike traditional linear media (e.g. a book). He also defines ‘hyperfilm’. ‘“Hyperfilm” and “hypertext” are characterised as “new media”... the larger category in which at least the hyperfilm is included is “hypermedia”’, (Wardrip-Fruin 2004).

Wardrip-Fruin points out that Nelson concluded in 1970’s “No More Teachers’ Dirty Looks”, (Reprinted in ‘The New Media Reader’), that ‘hypertext began as a term for forms of hypermedia (human-authored media that “branch or perform on request”) that operate textually’, (Wardrip-Fruin 2004).

In 1992, Nelson wrote in ‘Literary Machines’:

‘By now the word “hypertext” has become generally accepted for branching and responding text, but the corresponding word “hypermedia”, meaning complexes of branching and responding graphics, movies and sound –as well as text – is much less used. Instead they use the strange term “interactive multimedia” – four syllables longer, and not expressing the idea that it extends hypertext’, (Nelson 1992).

We can therefore conclude, without doubt, that ‘hypertext’ is a sub-set of the larger category ‘hypermedia’.

Lowe and Hall (1998) define a hypermedia application as:

‘An application which uses associative relationships among information contained within multiple media data for the purpose of facilitating access to, and manipulation of, the information encapsulated by the data’.

Lowe and Hall describe websites as hypermedia applications and the World Wide Web as a hypermedia system.

Hypertext and hypermedia systems would have struggled without the innovations of Douglas Engelbart.

Doug Engelbart had been working on office automation at the Stanford Research Institute since the late 1950s. In 1962 he wrote a paper entitled ‘Augmenting Human Intellect: A Conceptual Framework’, (Engelbart 1962). This led to the formation of the Augmentation Research Centre (ARC). The ARC primarily worked on the development of the NLS, (‘oN-Line System’), which was the first system to use hyperlinks. The NLS was finished and demoed in 1968, in what has been termed ‘The Mother of All Demos’, (Levy 2000). Engelbart also invented the mouse, which was patented in 1970, (Wikipedia 2009).

In 1967 Andries Van Dam and Ted Nelson began working on the Hypertext Editing System at Brown University, (Dam 1988). It enabled the use of Instances:

‘Instances are references, so that if you changed, for example, a piece of legal boilerplate that was referenced in multiple places, the change would show up in all the places that referenced it’, (Dam 1988).
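
The idea of an instance as a reference, rather than a copy, can be sketched in a few lines of Python. The class and the sample documents are invented for illustration and say nothing about how the Hypertext Editing System was actually built.

```python
class Instance:
    """A shared passage; documents hold a reference to it, not a copy."""
    def __init__(self, text):
        self.text = text

def render(document):
    """Join a document's parts, resolving any shared instances to their text."""
    return " ".join(
        part.text if isinstance(part, Instance) else part for part in document
    )

boilerplate = Instance("Standard terms apply.")
contract_a = ["Contract A.", boilerplate]
contract_b = ["Contract B.", boilerplate]

# One change to the shared passage shows up everywhere it is referenced.
boilerplate.text = "Revised terms apply."
rendered = [render(contract_a), render(contract_b)]
```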

In 1968 they created FRESS (File Retrieval and Editing System):

‘The most popular feature... FRESS was the first system to have an undo. We saved every edit in a shadow version of the data structure, and that allowed us to do both an autosave and an undo. I think the most important feature in a system built today has to be an indefinite undo and redo’, (Dam 1988).

This is still something we should strive for today.
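
An indefinite undo and redo can be sketched with two stacks of saved versions. This Python is a simplified illustration only: it keeps whole copies of the text rather than FRESS’s shadow data structure, and the class name is my own.

```python
class History:
    """Keep every version of a text so edits can be undone and redone."""
    def __init__(self, initial=""):
        self._undo = []   # earlier versions, oldest first
        self._redo = []   # versions that were undone and can be restored
        self.current = initial

    def edit(self, new_text):
        self._undo.append(self.current)
        self._redo.clear()          # a fresh edit discards the redo branch
        self.current = new_text

    def undo(self):
        if self._undo:
            self._redo.append(self.current)
            self.current = self._undo.pop()

    def redo(self):
        if self._redo:
            self._undo.append(self.current)
            self.current = self._redo.pop()

doc = History("draft")
doc.edit("draft one")
doc.edit("draft two")
doc.undo()
after_undo = doc.current
doc.redo()
after_redo = doc.current
```

Because nothing is ever thrown away until a new edit follows an undo, the depth of undo is limited only by storage.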

After NLS and HES came Xerox PARC’s NoteCards and Carnegie Mellon’s ZOG (which used a ‘card’ model to display data).

The first hypermedia system (media, not just text) was the Aspen Movie Map, created in 1977.

In the early 1980s, Shneiderman’s TIES and Brown University’s Intermedia were popular.

In 1980 Tim Berners-Lee began work on ENQUIRE, which would eventually develop into the Web itself. Other hypermedia systems that were popular up until the advent of the web included Guide and HyperCard.

Dam, A.V., 1988. Hypertext '87: keynote address. Commun. ACM, 31(7), 887-895. Available from: http://doi.acm.org/10.1145/48511.48519 [Accessed 25 February 2009].
Engelbart, D. C., 1962. Augmenting Human Intellect: A Conceptual Framework. Available from: http://www.dougengelbart.org/pubs/augment-3906.html [Accessed 25 February 2009].
Levy, S., 2000. Insanely Great: The Life and Times of Macintosh, the Computer That Changed Everything. Reissue, Penguin Books.
Lowe, D. & Hall, W., 1998. Hypermedia and the Web: An Engineering Approach, John Wiley & Sons.
Nelson, T. H., 1965. Complex information processing: a file structure for the complex, the changing and the indeterminate. In: Proceedings of the 1965 20th national conference, August 24-26, 1965, Cleveland, Ohio, United States. New York, NY, USA: Association for Computing Machinery, 84-100. Available from: http://doi.acm.org/10.1145/800197.806036 [Accessed 08 January 2009].
Nelson, T. H., 1992. Literary Machines. CA, USA: Mindful Press.
Wardrip-Fruin, N., 2004. What Hypertext Is. In: Proceedings of the fifteenth ACM conference on Hypertext and hypermedia, August 09 - 13, 2004, Santa Cruz, CA, USA. New York, NY, USA: Association for Computing Machinery, 126-127. Available from: http://doi.acm.org/10.1145/1012807.1012844 [Accessed 09 February 2009].
Wikipedia, 2009. Douglas Engelbart. Wikipedia. Available at: http://en.wikipedia.org/wiki/Engelbart [Accessed February 25, 2009].
Wikipedia, 2009. Hypertext. Wikipedia. Available from: http://en.wikipedia.org/wiki/Hypertext [Accessed 09 February 2009].
Wikipedia, 2009. NLS (computer system). Wikipedia. Available from: http://en.wikipedia.org/wiki/NLS_(computer_system) [Accessed 25 February 2009].

Thursday, 19 February 2009

History of the Internet and a Communications Primer

The following videos are of interest in relation to the history and development of the Internet and communications...

History of the Internet:

YouTube - History of the Internet. Available at: http://www.youtube.com/watch?v=9hIQjrMHTv4 [Accessed February 19, 2009].

A Communications Primer:

Eames, C. & Eames, R., 1953. Internet Archive: Details: Communications Primer, A. Available at: http://www.archive.org/details/communications_primer [Accessed February 19, 2009].

Saturday, 7 February 2009

Hypertext, Hypermedia and the Semantic Web: Nelson’s ELF and the definition of Hypertext

Twenty years after Vannevar Bush described his Memex, Ted Nelson presented a conference paper entitled ‘A File Structure for the Complex, the Changing and the Indeterminate’ at the ACM’s 20th National Conference, a paper he had been working on since 1960.

Nelson’s paper tried to envisage ‘a computer system for personal information retrieval and documentation’. Nelson knew about and had referenced Bush’s Memex, but understood that, although the required hardware was becoming available, the software required was not being developed.

Nelson wanted to design his system around the way a writer works:

‘The task of writing is one of rearrangement and reprocessing, and the real outline develops slowly. The original crude or fragmentary texts created at the outset generally undergo many revision processes before they are finished. Intellectually they are pondered, copied, overwritten with revision markings, rearranged and copied again. This cycle may be repeated many times. The whole grows by trial and error in the processes of arrangement, comparison and retrenchment. By examining and mentally noting many different versions, some whole but most fragmentary, the intertwining and organizing of the final written work gradually takes place’, (Nelson 1965).

The preliminary specifications of the system were as follows:

‘It would provide an up-to-date index of its own contents (supplanting the “code book” suggested by Bush)’, (Nelson 1965).

I think Nelson misunderstands Bush here. I read the Memex’s ‘code book’ or ‘code space’ as equivalent to the HTML behind a web page: a space that is programmable by the site designer but otherwise not seen by the user, who, as Bush described, only has to activate the link to bring up the linked item. Here Nelson instead describes a traditional-style index that updates itself.

‘It would accept large and growing bodies of text and commentary, listed in such complex forms as the user might stipulate’, (Nelson 1965).

‘It would file under an unlimited number of categories’, (Nelson 1965).

This requirement seems to be satisfied by modern-day meta tags.

‘Besides the file entries themselves, it would hold commentaries and explanations connected with them’, (Nelson 1965).

This is a requirement that Tim Berners-Lee wanted to establish with the World Wide Web, and it is the idea behind the W3C’s Amaya project. In the Web 2.0 world, though, we seem to be accomplishing this through comment boxes on the same web page as the item in question.

Further to these primary points, Nelson specified that it should be possible to preserve a draft of work while its successor was created. ‘Consequently the system must be able to hold different versions of the same sets of materials’, (Nelson 1965).

Nelson revisits the same point later in his paper, explaining that:

‘The user must be permitted, given a list of what he has done recently, to undo it. It follows that “destroy” instructions must fail safe; if given accidentally, they are to be recoverable. For safety’s sake, it should take several steps to throw a thing away completely. An important option would permit the user to retrace chronologically everything he does on the system’, (Nelson 1965).

If we consider that the item in question is a web page, then perhaps we could have a meta tag in the mark-up that identifies different web pages that are associated with it, e.g. different versions of the same page. There is no limit to the number of meta tags we can place in a page, or to what they contain, so we could also use these tags to identify pages in entirely different locations that relate to the same set.
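The meta-tag idea can be sketched in a few lines of Python. This is only an illustration: the meta name `related-version` and the example.org addresses are made up for the sketch and are not part of any standard.

```python
from html.parser import HTMLParser


class RelatedVersionParser(HTMLParser):
    """Collect URLs from <meta name="related-version" content="..."> tags.

    The meta name "related-version" is hypothetical, invented for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.related = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "related-version" and attributes.get("content"):
            self.related.append(attributes["content"])


def related_versions(html_text):
    """Return the associated pages declared in a page's meta tags, in order."""
    parser = RelatedVersionParser()
    parser.feed(html_text)
    return parser.related


page = """
<html><head>
<meta name="related-version" content="http://example.org/essay-draft-1.html">
<meta name="related-version" content="http://example.org/essay-draft-2.html">
<meta name="keywords" content="hypertext">
</head><body>Draft 3</body></html>
"""

print(related_versions(page))
```

A browser or crawler that understood such a tag could then offer the earlier drafts alongside the current page.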

Another possibility is that the top level file is actually a package. Within this package are contained different versions of the web page. This would allow us to roll-back and forth through changes as required. This option would require web browsers to be adapted in order to extract the relevant page from the package before viewing.
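As a rough sketch of the package idea, assuming the package is simply a zip archive and that versions are stored under names such as `v0001.html` (both are assumptions made here, not an existing format), Python's standard library is enough to roll back and forth:

```python
import io
import zipfile


def build_package(versions):
    """Pack an ordered list of page versions into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for number, html_text in enumerate(versions, start=1):
            # The naming scheme (v0001.html, v0002.html, ...) is an assumption.
            package.writestr(f"v{number:04d}.html", html_text)
    return buffer.getvalue()


def read_version(package_bytes, number=None):
    """Return one version from the package; the latest when number is None."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = sorted(package.namelist())
        name = names[-1] if number is None else f"v{number:04d}.html"
        return package.read(name).decode("utf-8")


package_bytes = build_package(["<p>First draft</p>", "<p>Second draft</p>"])
print(read_version(package_bytes))      # latest version
print(read_version(package_bytes, 1))   # roll back to the first version
```

An adapted browser would do the equivalent of `read_version` before rendering the page.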

Storage of multiple versions of a document allows us another development, mooted by Nelson:

‘Systems of paper have grave limitations for either organizing or presenting ideas. A book is never perfectly suited to the reader...
However... a new, readable medium... will let the reader find his level, suit his taste and find the parts that take on special meaning for him, as instruction or entertainment’, (Nelson 1965).

This specification resulted in what Nelson called an ‘evolutionary file structure: a file structure that can be shaped into various forms, changed from one arrangement to another in accordance with the user’s changing need’, (Nelson 1965). Nelson proposed the ‘Evolutionary List File, or ELF’ as a system used to implement an evolutionary file structure.

‘The ELF has three elements: entries, lists and links...
An entry is a discrete unit of information designated by the user...
A list is an ordered set of entries...
A link is a connector, designated by the user, between two particular entries which are in different lists’, (Nelson 1965).

We can consider that the ELF equates to the World Wide Web, an entry to a Web Page, a list to a Website and, of course, a link is a link.

Nelson theorised features of entries, lists and links:

Entries (or modern-day web pages):

‘An entry in one list may be linked to only one entry in another list’...
‘Entries may be combined or divided... Entries may be put in any list, and the same entry may be put in different lists. The user may direct that entries of one list be automatically copied onto another list, without affecting the original list’...
‘It would be possible to allow sub-entries and super-entries to behave and link up like normal entries, even though they contained or were contained in other entries’, (Nelson 1965).

Lists (or modern-day websites):

‘A change in the sequence of either list, or additions to either list, will not change the links that stand between them’...
‘[The user] may at will make new copies of lists. [The user] may rearrange the sequence of a list, or copy the list and change the sequence of that copy. Lists may be combined; lists may be cut into sublists’, (Nelson 1965).

Links:

‘Changes in the link structure will occur only if the user specifically changes the links, or if he destroys entries which are linked to others’...
‘Any number of legal links may be created, although the upper limit of links between any two lists is determined by the 1-for-1 rule. When an entry or a list is copied into a list, links will remain between parent and daughter entries. Moreover, after a list-copying operation, the daughter list will have the same links to all other lists as does the parent list’, (Nelson 1965).
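To make the three elements concrete, here is a toy model of an ELF in Python. It is my own sketch built from the quoted rules, not Nelson's design: the class and method names are invented, and only the 1-for-1 rule and the independence of links from list order are modelled.

```python
class ELF:
    """Toy Evolutionary List File holding entries, lists and links."""

    def __init__(self):
        self.entries = {}   # entry id -> text
        self.lists = {}     # list name -> ordered entry ids
        self.links = set()  # each link is a frozenset of two (list name, entry id) ends

    def add_entry(self, list_name, entry_id, text):
        """Store an entry and append it to the named list."""
        self.entries[entry_id] = text
        self.lists.setdefault(list_name, []).append(entry_id)

    def _already_linked(self, end, other_list):
        """True if `end` already has a link into `other_list`."""
        for link in self.links:
            if end in link:
                (other_end,) = link - {end}
                if other_end[0] == other_list:
                    return True
        return False

    def link(self, list_a, entry_a, list_b, entry_b):
        """Join two entries in different lists, enforcing the 1-for-1 rule."""
        if list_a == list_b:
            raise ValueError("a link must join entries in different lists")
        end_a, end_b = (list_a, entry_a), (list_b, entry_b)
        if self._already_linked(end_a, list_b) or self._already_linked(end_b, list_a):
            raise ValueError("1-for-1 rule: entry already linked into that list")
        self.links.add(frozenset((end_a, end_b)))

    def reorder(self, list_name, new_order):
        """Change a list's sequence; links refer to ids, so they are untouched."""
        if sorted(new_order) != sorted(self.lists[list_name]):
            raise ValueError("new order must contain the same entries")
        self.lists[list_name] = list(new_order)


elf = ELF()
elf.add_entry("notes", "n1", "Memex trails")
elf.add_entry("notes", "n2", "ELF lists")
elf.add_entry("drafts", "d1", "Chapter one")
elf.link("notes", "n1", "drafts", "d1")
elf.reorder("notes", ["n2", "n1"])
print(len(elf.links))
```

Because links are stored against entry identifiers rather than positions, rearranging a list leaves them alone, which is the behaviour Nelson asks for.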

Nelson highlighted a problem that arises when an entry is disposed of: ‘what other lists is an entry on?’ Programmed bots now roam the web carrying out simple tasks. Surely a bot could be made to check the links within a website continually, confirm that they are not ‘dead links’ and, if they are, remove the underlying link and highlight the issue to the administrator.
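A minimal sketch of such a bot in Python might look like this. The function names are my own, the sketch only reports dead links rather than removing them, and the liveness check is passed in as a function so that the logic can be tried without touching the network.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_dead_links(html_text, is_alive):
    """Return the links in html_text for which is_alive(url) is false."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [href for href in collector.hrefs if not is_alive(href)]


def url_is_alive(url, timeout=10):
    """One possible liveness check: a HEAD request that succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError):
        return False


# Offline demonstration with a pretend set of live pages.
page = '<a href="http://example.org/live">ok</a> <a href="http://example.org/gone">old</a>'
live_pages = {"http://example.org/live"}
print(find_dead_links(page, lambda url: url in live_pages))
```

Run against a real site, `find_dead_links(page, url_is_alive)` would give the administrator a list of links to repair or remove.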

Uses of the ELF? Even though we now know what the Web can be used for, the following section makes interesting reading:

‘the ELF may be used as a glorified card file... [This permits] assignment of one entry to different lists. It permits sub-sets and sub-sequences for any use to be held apart and examined without disturbing the lists from which they have been drawn, by copying them onto other, new lists. The ELF permits the filing of historical trails or associative (Bush) trails... and the mixture of trail with categorical filing’, (Nelson 1965).

There is a hint of a so-far unexplored feature in the above paragraph. He later expands:

‘the ELF is capable of storing many texts in parallel, if they are equivalent or linked in some way. For example, instruction manuals for different models of the same machine may be kept in the file as linked lists and referred to when machines are to be compared, used or fixed. This is of special use to repairmen, project managers and technical writers’, (Nelson 1965).

The point I am picking up on is the ability to compare data. Is it not possible for us to develop applications that take two different pages, even their code, and compare them so we can see where the differences are? This could be a particularly useful feature if they are versions of the same page, or perhaps if they follow a standard layout used in a specific industry. From comparing layouts we could even suggest sections that have been missed or otherwise aid the author.
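For the simple case of two versions of the same page, a comparison of this kind can be sketched with Python's standard `difflib` module. The function name and the two sample pages are my own, and this compares the mark-up line by line rather than the rendered layout.

```python
import difflib


def compare_pages(old_html, new_html):
    """Return a unified diff of two pages' mark-up as a list of lines."""
    return list(
        difflib.unified_diff(
            old_html.splitlines(),
            new_html.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


old_page = "<h1>Title</h1>\n<p>First draft.</p>"
new_page = "<h1>Title</h1>\n<p>Second draft.</p>"

for line in compare_pages(old_page, new_page):
    print(line)
```

Lines starting with `-` appear only in the old page and lines starting with `+` only in the new one; two identical pages produce an empty diff.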

With this specification, Nelson had to create a new definition:

‘Let me introduce the word “hypertext” to mean a body of written or pictorial material interconnected in such a complex way that it could not conveniently be presented or represented on paper’...
‘The sense of “hyper-” used here connotes extension and generality; cf. “hyperspace.” The criterion for this prefix is the inability of these objects to be comprised sensibly into linear media, like the text string, or even media of somewhat higher complexity’, (Nelson 1965).

Lastly, Nelson provides a warning to future computer scientists:

‘Last week’s categories, perhaps last night’s field, may be gone today...
Categories are chimerical (or temporal) and our categorisation systems must evolve as they do. Information systems must have built-in the capacity to accept the new categorisation systems as they evolve from, or outside, the framework of the old. Not just the new material, but the capacity for new arrangements and indefinite re-arrangements of the old, must be possible’, (Nelson 1965).

Nelson, T. H., 1965. Complex information processing: a file structure for the complex, the changing and the indeterminate. In: Proceedings of the 1965 20th national conference, August 24-26, 1965, Cleveland, Ohio, United States. New York, NY, USA: Association for Computing Machinery, 84-100. Available from: http://doi.acm.org/10.1145/800197.806036 [Accessed 08 January 2009].

Thursday, 5 February 2009

Hypertext, Hypermedia and the Semantic Web: Bush and his Memex

According to ‘The Good Study Guide’, (Northedge 2007), ‘learning through study doesn’t create a detailed replica of knowledge in your head; rather, it develops the way you think’. Northedge points out that learning can be evidenced by a student’s ability to summarise a subject in their own words.

The book also talks of the academic disciplines of scholarship and debate. Scholarship is the academic requirement to read all that has gone before and debate is the ability to understand and analyse both sides of an argument before reaching a conclusion.

In this blog I will be discussing the Semantic Web, so it makes sense to have early postings that explain what it is all about. I should also be able to use my musings as part of my IT research and project report, so it is not wasted time.

The idea of a Semantic system is as old as the earliest computers.

It all starts with a man called Vannevar Bush. He received a doctorate in engineering from MIT and Harvard (jointly) in 1917. During World War I he worked on developing improved techniques for detecting submarines. He rejoined MIT after the war in 1919. In 1922 he founded a company with a colleague to market a device called the ‘S-tube’, which was used to improve the efficiency of radios. This company later became Raytheon, a defence contractor. No doubt the S-tube was a result of his research undertaken during World War I.

In 1927 he started to construct a ‘Differential Analyser’. Differential Analysers are an early type of computer. Mechanical and analogue, they used wheel-and-disc mechanisms to solve differential equations. They were later rendered obsolete by binary programmable (Zuse’s Z3) and electronic (Flowers’ Colossus) computers.

‘During World War I, Bush had seen the lack of co-operation between civilian scientists and the military and in 1939 proposed a general directive agency.’ Bush presented a paper describing the proposed National Defence Research Committee (NDRC) to President Roosevelt on the 12th June 1940, which Roosevelt duly approved. Bush was chairman of the committee and continued to be involved with it throughout World War II, later becoming Director of the Office of Scientific Research and Development, (Wikipedia, 2009).

In July 1945 the Atlantic Monthly published Bush’s article entitled ‘As We May Think’, (it is described as having been written in 1936 but set aside when war loomed – Wikipedia, 2009). The article was to act as ‘an incentive for scientists when the fighting has ceased’ and was based on the premise that ‘knowledge evolves and endures throughout the life of a race rather than that of an individual’, (Bush, 1945).

Bush realised that science depends on an ever increasing body of research and ideas. Scientists had to collaborate and devise new methods during the war ‘in the demand of a common cause’. Of course this meant a surge in the body of research available and Bush believed ‘there is increased evidence that we are being bogged down today as specialisation extends’.

We, as computer scientists, tend to think that this is a modern problem that goes with the advent of the web and social networking. But here is Bush saying this back in 1945 – he would definitely fit in today.

At that time, the dawn of computing, the methods for transmitting and reviewing research were inadequate, and Bush realised this. He used the following example to illustrate the point:

‘Mendel’s concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us’, (Bush, 1945).

Bush discusses how the new technologies of Television and Microfilm could be used in the future. He then goes on to describe what we know today as Voice Dictation Software! His discussion of ‘machines for repetitive thought’ in the next part is crucial.

Punched card and Keyboard machines capable of arithmetic already existed:

‘Keyboard machines for accounting and the like, manually controlled for the insertion of data, and usually automatically controlled as far as the sequence of operations is concerned;
and Punched-Card machines in which separate operations are usually delegated to a series of machines, and the cards then transferred bodily from one to another.
Both forms are very useful; but as far as complex computations are concerned, both are still in embryo’, (Bush, 1945).

He talked about the need to develop machines capable of using established logic processes and understood:

‘The selection of the data and the process to be employed and the manipulation thereafter is repetitive in nature and hence a fit matter to be relegated to the machine’, (Bush, 1945).

The standard programming constructs of sequence, selection and repetition are central to computers today. So we can ultimately say that in 1945 Bush could envisage computers as we know them today.

He understood that the solution to the problem goes ‘deeper than a lag in the adoption of mechanisms’ and that the ‘ineptitude in getting at the record is largely caused by the artificiality of systems of indexing’.

Bush also realised that the human mind operates by association, a very important point that we still strive to satisfy today. He noticed that this meant that selection of records must therefore be by association, rather than indexing.

‘Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory’, (Bush, 1945).

He details a desk style machine, with screen, keyboard, levers and buttons. Microfilm is projected onto the screen. The machine can photograph books placed on a platter into the next available slot on the microfilm. Books can be accessed by a traditional index and both books and index can be rapidly searched in the same way that microfilm is searched today. The user is also able to add marginal notes and comments to what he is reading.

‘All this is conventional... It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of the memex. The process of tying two items together is the important thing’, (Bush, 1945).

‘When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces and a pointer is set to indicate one of these on each item. The user taps a single key, and the items are permanently joined. In each code space appears the code word. Out of view, but also in the code space, is inserted a set of dots for photocell viewing; and on each item these dots by their positions designate the index number of the other item’, (Bush, 1945).

‘Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space’, (Bush, 1945).

Of course, what Bush is in effect describing here is a modern-day web page hyperlink.
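The joining mechanism Bush describes can be modelled in a few lines of Python. This is a loose sketch of my own: the class and method names are invented, a dictionary stands in for the code spaces and their photocell dots, and the trail name and items are just examples.

```python
class Memex:
    """Toy model of Bush's associative indexing."""

    def __init__(self):
        self.items = {}        # index number -> content
        self.code_spaces = {}  # index number -> {code word: index of joined item}

    def store(self, index, content):
        """File an item under its index number with empty code spaces."""
        self.items[index] = content
        self.code_spaces[index] = {}

    def join(self, code_word, first, second):
        """Tie two items together: each records the index number of the other."""
        self.code_spaces[first][code_word] = second
        self.code_spaces[second][code_word] = first

    def recall(self, index, code_word):
        """With one item in view, bring up the item joined to it."""
        return self.items[self.code_spaces[index][code_word]]


memex = Memex()
memex.store(1, "Encyclopaedia article on the Turkish bow")
memex.store(2, "History text on the Crusades")
memex.join("bow", 1, 2)
print(memex.recall(1, "bow"))
```

Unlike a web hyperlink, the join recorded here works in both directions, which is how Bush describes it.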

Bush, V., 1945. As We May Think. US: The Atlantic Monthly Group. Available from: http://www.theatlantic.com/doc/194507/bush [Accessed 10 January 2009].
Northedge, A., 2007. The Good Study Guide. 2nd ed. Milton Keynes: The Open University.
Wikipedia, 2009. Vannevar Bush. Wikipedia. Available from: http://en.wikipedia.org/wiki/Vannevar_Bush [Accessed 05 February 2009].

Wednesday, 4 February 2009

Who needs to actually know SQL?

Wow! Forget learning SQL!
I've found using a 'Dataset' automates the whole process of creating methods to manipulate a database, and provides a nice Data Access Layer which is abstracted from the Presentation Layer.

This should simplify the development of my four-table Doctors' Appointment System (for the Web Development module) and take days off my 88-table project.

Tutorial available from:
http://www.asp.net/learn/data-access/tutorial-01-cs.aspx - C#
http://www.asp.net/learn/data-access/tutorial-01-vb.aspx - VB
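The tutorials use ASP.NET typed DataSets. Purely as an analogy, and not the ASP.NET API, the same separation can be sketched in Python with the standard `sqlite3` module. The table, column and method names below are hypothetical; the point is that the presentation code calls methods and never sees the SQL.

```python
import sqlite3


class AppointmentsDAL:
    """Minimal Data Access Layer: all SQL lives behind these methods."""

    def __init__(self, connection):
        self.connection = connection
        # Hypothetical one-table schema, created on first use.
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS appointments ("
            "id INTEGER PRIMARY KEY, doctor TEXT, patient TEXT, slot TEXT)"
        )

    def add_appointment(self, doctor, patient, slot):
        """Insert one appointment and return its new row id."""
        cursor = self.connection.execute(
            "INSERT INTO appointments (doctor, patient, slot) VALUES (?, ?, ?)",
            (doctor, patient, slot),
        )
        self.connection.commit()
        return cursor.lastrowid

    def appointments_for(self, doctor):
        """Return (patient, slot) tuples for one doctor, earliest first."""
        rows = self.connection.execute(
            "SELECT patient, slot FROM appointments WHERE doctor = ? ORDER BY slot",
            (doctor,),
        )
        return rows.fetchall()


# Presentation-layer code only ever touches the methods.
dal = AppointmentsDAL(sqlite3.connect(":memory:"))
dal.add_appointment("Dr Smith", "A. Patient", "2009-02-04 09:00")
print(dal.appointments_for("Dr Smith"))
```

A typed DataSet generates this kind of method layer for you, which is where the time saving comes from.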

OK Computer and the Trousers of Time

I am currently studying a one-year top-up degree at Bournemouth University, entitled BSc(Hons) Computing and Internet Technology. A question I sometimes find myself asking is 'how did I end up here?'

This probably all started with my first computer, bought back in 1991: the Amiga 500. What a fine machine it was! It had a very good games collection (LucasArts adventures have always been my favourite) and there were many magazines for the format at the time, which I enjoyed reading. I also had my first go at programming in about 1993 using AMOS (a BASIC derivative). The Amiga continued to be used up to 1998, as I can remember using the 'Deluxe Music Construction Set' for my Music GCSE.

We joined the PC bandwagon in 1994 with a Compaq Presario CDS 520, although I still remember my Dad bringing home an Amstrad PC in 1993 destined for his work. Still, the Presario brought with it some exciting developments, notably the PC CD-ROM format. The Encarta encyclopedia provided hours of browsing and I had some good games too (I remember having to go and get a big box full of them!). The PC also brought with it Windows 3.1, although Compaq provided their own interface called Tabworks, which was slightly better. Windows was never as good as the Amiga OS; in fact Microsoft have arguably only just caught up with Windows 7.

Very quickly there was a need to upgrade the RAM on the Presario, so my Dad bought an extra 4MB at around £100 or so. I later upgraded to Windows 98, although eventually the machine was unable to play any games, even when upgraded with a P133MHz Overdrive Processor.

Anyway, the Trousers of Time... Up until 1998 I had wanted to do Marine Engineering or Naval Architecture, even though I was never very good at art. Fate dictated that there would be no Engineering GNVQ that year at the college, as only 4 others wanted to do it and probably some of us failed the aptitude test. So I ended up doing an IT GNVQ.

In the GNVQ ('98-'00) we learnt Pascal and later Java. At this time I also taught myself HTML and Dynamic HTML using JavaScript.

At the beginning of 2001 I started a career with the States of Guernsey, and I was very enthusiastic at this stage. My first project was to code an events diary website using ASP 3.0 (JScript). This was eventually completed in 2002, but never used. (Unfortunately I have now lost all the events diary source code after a catastrophic disc failure last year.) I began reading the Inquirer and then the Register, and still do so regularly today.

My placement finished in 2003 and I was relocated, becoming more involved in Project Management. My interest and enthusiasm for computing waned at this point.

Meanwhile, I have always been interested in my family history. My maternal side has been researched back to the 10th century by my uncle (although the notes need tidying up). I started to research my paternal side in 2007, getting back as far as the mid 18th century.

Since 2004 I had been studying for the BCS Professional Examinations and achieved my Diploma in 2007, enabling me to join my course last year.

I am now working on my project codenamed, as it were, 'GenPACK'. This can be found at Sourceforge here. I am writing a research paper about my project in relation to the GEDCOM format that is available here.

The enthusiasm for computing seems to have returned with the advent of the Web 2.0 revolution. I have thus recently decided to create this blog, partly because there does not seem to be room for all these thoughts in my head!

Tuesday, 3 February 2009

Transformer Helicopters

The Register recently reported in their article 'DARPA seeks Transformer helicopters' that the US Defense Advanced Research Projects Agency (DARPA) has launched a project to research and presumably develop a rotor that is reconfigurable in flight, in order to achieve performance and efficiency benefits. This programme has been designated the Mission Adaptive Rotor (MAR) project.
Perhaps the most obvious way of achieving this could be that the rotor is extended outwards in a way similar to a telescope, with an outer rotor containing an inner rotor. This could presumably be controlled by pneumatic pressure.

I also considered that perhaps they could research the use of a wingtip device at the end of the rotor. The A380 makes use of wingtip devices to enable it to have shorter wings (so that it can manoeuvre in airports) that have the same characteristics as a longer wing. Without the long wing characteristics it would not be able to fly.

Google Analytics