Social Semantic Web and Semantic Web Services

The World Wide Web (WWW) is a vast repository made up of interconnected "islands" of data (sites), which the author (aptly) compares to a massive and poorly-organized digital library. While it was conceived to be an open library of documents, its character has changed in both regards: not all data is open, and not all content is stored in "documents" in the traditional sense.

The term "Web 2.0" has been coined to reflect a shift in perspective, from a phenomenon based on information to one based on the people and organizations that contribute information by means of the medium. While the term tends to be applied only to sites where the content is contributed by users who do not have ownership or control of the sites, data has always been contributed by "users" of the Internet. Perhaps the only significant difference is in the number of users who are now contributing content in ever-increasing volume.

The concept of the "Semantic Web" (SW) pertains to the effort to derive meaning from the gangly mass of information the Web comprises, which can be done from the context of a given subject (where information exists in multiple sources), the context of a document or site (a document relates to sites, both the one it is housed within and others that link to it), or, more recently, the context of users (the data originating from a single source across multiple sites).

A factor that merits consideration is the multiplicity of sources of information, which has frustrated the desires of some to have "ownership" of a topic, and others to have a single authoritative source of information. The goal of the SW theory is not to sort out these domains of authority, but to acknowledge that multiple domains exist and consider them from the perspectives of the publishers and consumers of data, both as individuals and in groups.


The author provides a basic description of the Web for those who are unfamiliar with its origins and growth. (EN: skipping this - as it's general detail.) Of particular interest is the facility to use links and, more recently, mash-ups, to enable a Web site to link to or utilize resources stored on other sites, which has met with some consternation from those who wish to lock-down or control the data they contribute to the WWW.

The confluence of many sources of data and the chaos of links among sites has made the WWW a gangly mess: users must rely on search engines and other technologies to locate resources, and much information is hidden and un-findable in the clutter. Much of this is blamed on search engines' lack of comprehensiveness, or the futility of search algorithms, but they represent the best efforts to make the Web more usable.

While various attempts at personalization have been attempted, these have been even worse failures, and the attempt to use artificial intelligence to sort the clutter has not been fruitful. Some of the core problems are listed:

(EN: Another source I read put it a bit more succinctly: information is treated as "data" - the significance is not considered, and it is treated as "strings" of characters with loose associations with one another that have nothing to do with the meaning they convey.)


The "Social Web" refers to a recent shift in the internet geared to empower a greater number of users to become active participants who contribute or manage information rather than merely being consumers of it. Some of the more popular phenomena in this regard are:

(EN: It bears repeating that none of this represents "new" capabilities - but tools have been provided that simplify the task. It is "simple" in the sense that the learning curve has been decreased, but also "simple" in that these tools provide only basic capabilities that restrict the information that can be communicated.)

The author discusses Wikis as a method for users to generate original content and participate in a community editing process that enables individuals to collaborate and share information, and cites the popularity of Wikipedia, as compared to Encyclopedia Britannica, as evidence of the success of the method.

(EN: This is flawed in every regard. Most "wiki" sites contain little original content, but merely republish information from other sources. Individuals seldom collaborate, as most wikis are dominated either by a clique or a few individuals who put considerable effort into driving others away from "their" topics. And while the popularity of Wikipedia is remarkable, comparing a free site to one that requires a paid subscription merely proves that "free" is a compelling price. That's not to say all wikis are bad, but that they are not as open, friendly, or popular as the author contends.)

The del.icio.us site is used as an example of an online bookmarking site, which provides a free service to individuals (having a single source to store their links rather than relying on a desktop browser) in exchange for the ability to aggregate data provided by all users, such that the aggregated data becomes a Web directory that is managed by the individual users.

(EN: This site has waned in popularity since the book was written. I don't know if that's primarily because the use of the site tends to be trendy and frivolous, or because it was taken over by Yahoo, which has done much in recent years to alienate users. Whatever the case, the site hasn't lived up to its potential.)

(EN: The author continues along this vein, and I'm skipping it. The previous two examples should suffice to demonstrate that he's blowing smoke, or has been blown full of it, over the new media fad. And I have to admit that calling it a "fad" is probably too dismissive - it's definitely a significant development, but its results have fallen far short of the hype.)


The concept of the "Semantic Web" is an attempt to consider the structure of information on the Internet, not in terms of the way it is arranged by publishers, but in terms of the way that a user might perceive the information as being relevant to the topics and their meaning. As the concept arose in the context of artificial intelligence, it necessarily considers "meaning" in the context of machine-usable descriptions that support navigation and exploration by software agents rather than human beings. (The ultimate aim of this would be for the software agents to present the information to human users, but the application of the semantic web precedes this interaction.)

From a technology perspective, the semantic web architecture "begins" with an examination of the resources that exist on the internet, then translates them into a universal syntax (XML), and then applies various means to parse and categorize content. The resource description framework (RDF) is a key component that applies a common schema of properties to information - in essence, that subjects, predicates (verbs), and objects are the "atoms" of meaning and the interconnected subject-predicate-object triple is a "molecule" of meaning.
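The triple structure can be sketched in code. This is a minimal illustration only - a real system would use an RDF library and proper URIs, and all of the names and data below are hypothetical:

```python
# A minimal sketch of RDF-style statements as (subject, predicate, object)
# tuples. In actual RDF, subjects and predicates are URIs, while objects
# may be URIs or literal values. Everything below is a made-up example.

triples = [
    ("http://example.org/France", "http://example.org/vocab/capital", "http://example.org/Paris"),
    ("http://example.org/France", "http://example.org/vocab/population", "68000000"),
    ("http://example.org/Paris",  "http://example.org/vocab/label", "Paris"),
]

def objects_of(subject, predicate, graph):
    """Return all objects asserted for a given subject-predicate pair."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]

capitals = objects_of("http://example.org/France", "http://example.org/vocab/capital", triples)
print(capitals)  # ['http://example.org/Paris']
```

Each tuple is one "atom" of meaning; the graph of interconnected tuples is where higher-level meaning is meant to emerge.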

The approach is not without its weaknesses. Primarily, it accepts information at face value, such that if a Web page claims to have been created by a house pet (which a human reader would recognize as a false claim), the analysis accepts this as fact. Also, when there are multiple conflicting sources of information (an article that considers contradictory statements), the RDF schema produces contradictory results, as it lacks the ability to arbitrate conflicting statements.

All things considered, the semantic web still requires much refinement to be usable for deriving the high-level meaning of information, though it can be useful in identifying information on the sentence level for human evaluation.


As the name implies, the "social semantic Web" is a cross-pollination of semantic technologies and social networking, as a method of applying human interpretation to the analytical tasks of categorizing information. The trade-off between the two appears to be a categorization scheme that is mathematically clean, but inaccurate in its assessment of meaning, versus the ability of users to more accurately assess meaning, but on a more flexible and untidy schema.

The author uses the term "low end" to describe approaches such as "tagging" and "folksonomy" (where taxonomy is determined by those who lack knowledge and sophistication) - the implication being that, just as Web 2.0 has flooded the network with user-generated content of dubious accuracy and value, so will the application of social semantics bloat and corrupt the categorization schema for data.

The author provides a few examples of using semantics to improve article content (Wikis), search engines, and blogs. In each instance, it requires the author to embed metadata in Web pages that helps machines to parse the information - to indicate, for example, that a string of digits is meant to represent the population of a specific country, so that machines seeking to parse the content can understand it better.
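The idea of embedded metadata can be sketched as follows. The snippet assumes RDFa-style "property" attributes on page elements; the vocabulary prefix and values are invented for illustration:

```python
# Sketch: extracting embedded metadata from a page, assuming RDFa-style
# "property" attributes. The "ex:" vocabulary and the figures are
# hypothetical examples.
from html.parser import HTMLParser

html_doc = ('<p>Population of <span property="ex:country">France</span>: '
            '<span property="ex:population">68000000</span></p>')

class MetadataExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self._prop = None    # property of the tag we are currently inside
        self.metadata = {}   # property -> text content

    def handle_starttag(self, tag, attrs):
        self._prop = dict(attrs).get("property")

    def handle_data(self, data):
        if self._prop:
            self.metadata[self._prop] = data

    def handle_endtag(self, tag):
        self._prop = None

parser = MetadataExtractor()
parser.feed(html_doc)
print(parser.metadata)  # {'ex:country': 'France', 'ex:population': '68000000'}
```

Without the markup, "68000000" is just a string of digits; with it, a machine can tell that the digits are a population figure attached to a particular country.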

A project to semantically interlink online communities (SIOC) seeks to coordinate the information that is available in online community sites. Each community site has a rich store of information, but there is little attempt to coordinate the information of one community with similar or complementary information in others. Community sites are structurally similar (the discussion boards and chat rooms on one site are, except for the content, largely identical to those on another), and it should therefore be relatively straightforward to integrate them. The author does not comment on present progress, merely the theory, and suggests that it is a "major step in achieving the integration of social content."
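The integration rests on mapping each site's structure into a shared vocabulary, in the spirit of SIOC's common terms for posts, forums, and authors. A rough sketch, where the two site formats and all field names are invented for illustration:

```python
# Sketch: normalizing posts from two hypothetical community sites into a
# common schema so they can be queried uniformly. The site formats and
# field names here are made up; SIOC defines the real shared vocabulary.

site_a_post = {"author": "alice", "body": "Hello", "posted": "2009-01-01"}
site_b_msg  = {"user": {"handle": "bob"}, "text": "Hi", "date": "2009-01-02"}

def from_site_a(post):
    """Map site A's post format onto the shared schema."""
    return {"creator": post["author"], "content": post["body"], "created": post["posted"]}

def from_site_b(msg):
    """Map site B's message format onto the same shared schema."""
    return {"creator": msg["user"]["handle"], "content": msg["text"], "created": msg["date"]}

# Once mapped, content from different communities is interchangeable.
combined = [from_site_a(site_a_post), from_site_b(site_b_msg)]
creators = [p["creator"] for p in combined]
print(creators)  # ['alice', 'bob']
```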

The FOAF project (originally meaning "friend of a friend") sought to explore connections between individuals, but it was discovered that an individual person may have a wide array of "names" online - multiple e-mail accounts, accounts and profiles on multiple sites, etc. - so preceding any attempt to explore the connections between people was the need to identify a person, as an individual, in their various incarnations.
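One way FOAF approaches this identification is by using a hashed e-mail address (foaf:mbox_sha1sum) as a stable identifier that links accounts without exposing the address itself. A rough sketch, with hypothetical accounts and addresses:

```python
# Sketch: linking one person's scattered accounts by a shared mailbox,
# echoing FOAF's hashed e-mail identifier (foaf:mbox_sha1sum). The
# accounts and addresses below are hypothetical.
import hashlib
from collections import defaultdict

accounts = [
    {"site": "blog.example",   "nick": "jdoe",   "mbox": "mailto:jd@example.org"},
    {"site": "forum.example",  "nick": "john_d", "mbox": "mailto:jd@example.org"},
    {"site": "photos.example", "nick": "johnny", "mbox": "mailto:other@example.org"},
]

def mbox_sha1(mbox):
    """Hash the mailbox URI so accounts can be matched without exposing it."""
    return hashlib.sha1(mbox.encode("utf-8")).hexdigest()

people = defaultdict(list)
for acct in accounts:
    people[mbox_sha1(acct["mbox"])].append(acct["nick"])

# Accounts that share a mailbox hash are treated as the same person.
print(sorted(people.values(), key=len, reverse=True))  # [['jdoe', 'john_d'], ['johnny']]
```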

Semantic Web Services

The author discusses Web services as the preferred method for developing functionality across the Internet. To skip the details: web services are lightweight applications that communicate over standard Web protocols (HTTP, FTP, SMTP, etc.) and manage data stored in XML format.
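The exchange can be sketched from the consumer's side: the service returns an XML payload over HTTP, which the calling application parses. The document structure and element names below are hypothetical; a real exchange would follow SOAP or a REST convention:

```python
# Sketch: parsing the kind of XML payload a web service might return
# over HTTP. The element names and figures here are invented examples.
import xml.etree.ElementTree as ET

response_body = """
<countryInfo>
    <name>France</name>
    <population>68000000</population>
</countryInfo>
"""

root = ET.fromstring(response_body)
name = root.findtext("name")
population = int(root.findtext("population"))
print(name, population)  # France 68000000
```

Because the payload is structured XML rather than presentational HTML, the consuming application can extract values directly, which is what makes services attractive for machine-to-machine communication.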

A service is intended to be accessed by other applications, to which it provides data in response to an inquiry. The author expresses that this has "great potential" for the Semantic Web as services support machine-to-machine communication and have generated interest in standardization of formats and interoperability of systems.

A handful of specific technologies and standards relevant to Semantic Web services are listed (EN: I'm skipping the details - it's just a laundry list and there's not much original consideration shown).


The author asserts that the growth of the social aspects of the Web will be an ongoing trend, and that as the amount of information continues to increase, so will the need to sort through the vast amounts of information on the Internet. Semantic Web technology is a potential answer to this growing problem, and shows significant promise.

He concedes that the "killer application" of semantic web technology has not been discovered yet. There are a few contenders and ongoing research into various aspects, but no single application has emerged. The work being done currently deals with infrastructure and foundational services upon which future applications can be built, and more work may remain before the technology has sufficiently evolved.