
    Designing Rich Internet Applications For Search Engine Accessibility

    Introduction

    Rich Internet Applications create new opportunities. The most fundamental of these is the ability to create Single Page Interfaces (SPIs). An SPI is an interface that consists of a single HTML page. Additional information that is required, when the user clicks on a link or when some other event occurs, is not supplied by means of a traditional full page reload, but is instead retrieved via an XML message. The original page remains intact; its contents or state is simply updated with the contents of the XML message. JavaScript is used to facilitate this whole process. Although it is not mandatory to create an SPI when using Backbase's software, an SPI provides a more intuitive user interface and a smoother user experience. There are a few questions that need to be answered, however, when you make use of this new paradigm. One of the main questions is that of search engine accessibility and deep linking.

    The web sites that have been created up until now consist almost entirely of Multi Page Interfaces (MPIs). These web sites and applications consist of multiple unique pages, which may or may not have been dynamically generated. Since each page, and for dynamic pages every page state, has a unique URI, it is very easy to link to any page or state within the site. Navigation between pages is done by the user clicking on links or submitting forms, both of which contain the location and state information for the new page. It is these unique URIs that make deep linking possible. Deep linking does not just link to a particular web site, but links directly to a specific page within the site. It is this MPI paradigm which informs the robots used by search engines such as Google or Yahoo to index the information in web sites. Search bots are software agents that crawl through web sites; they start at the index page and, after categorizing all of the information on the page, they follow the links on this page to other pages on the site. In this way they crawl through the entire web site, visiting any page that has been linked to using a link tag of the type:

        <a href="...">Next Page</a>

    However, in an SPI, the linked page structure that the search bot is expecting has been extended with BXML commands, which indicate the use of include files, load commands and form submissions that only partially update the page, instead of causing a full reload as is the case with normal forms. Since search bots aren't proper web browsers, they don't understand or execute any JavaScript. This means that a Backbase SPI needs to be specifically designed to work with these search bots.

    This article puts forward a set of guidelines, which you can use to design your SPI for maximal search engine accessibility, and shows you techniques to allow for deep linking into your SPI.
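    The partial-update mechanism described above can be illustrated with a generic sketch. This uses plain XMLHttpRequest rather than Backbase's own machinery, and the names panel.xml and content are hypothetical:

        // Generic illustration of an SPI partial update: fetch a
        // fragment and merge it into the page without a full reload.
        function loadPanel() {
            var xhr = new XMLHttpRequest();
            xhr.open('GET', 'panel.xml', true);   // hypothetical URL
            xhr.onreadystatechange = function () {
                if (xhr.readyState === 4 && xhr.status === 200) {
                    // Update only the targeted region of the page
                    document.getElementById('content').innerHTML =
                        xhr.responseText;
                }
            };
            xhr.send(null);
        }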

    Making SPIs Search Engine Accessible

    Several approaches are available for making your web site accessible to search engines; these approaches differ in the level of indexing that is obtainable and in how this is achieved. For certain sites, it is not necessarily a requirement that every part of the site can be indexed by search engines. For example, a site which provides a web-based e-mail service does not require every single piece of information on the site to be indexed by a search bot. Other sites, however, do require that every piece of information can easily be found and indexed by search engines. A web site with information about the courses provided by a university is such a case. Backbase has identified the following strategies for getting an SPI indexed by search engines:

    Lightweight Indexing: no structural changes are made to your site; existing tags such as meta, title and h1 are leveraged.

    Extra Link Strategy: extra links are placed on the site, which search bots can follow and thereby index the whole site.

    Secondary Site Strategy: a secondary site is created, which is fully accessible to the search engine.

    For each of these strategies the following questions will be answered:

    To what extent is the content of the page indexed?

    Can links on the page be followed (e.g. link (a) elements or s:include elements)?

    When a link is followed by the search bot, what is the status of the URL that is being indexed? Can this URL be displayed by browsers, or will some type of redirection be required?

    Lightweight Indexing

    This strategy should be used if only certain key information needs to be indexed by search engines. In this case it is recommended that you take the following steps when designing your SPI (a combined sketch follows this list):

    Use a title element in the document head, preferably containing one or more keywords that specifically relate to the contents of the site. For example:

        <title>BXML WebMail Sign In</title>

    Use a keywords meta element with a content attribute containing some appropriate keywords. For example:

        <meta name="keywords" content="..." />

    Use a description meta element with a content attribute which contains a relevant description of the web page. The value of this element is often printed as part of a search result by Google. For example:

        <meta name="description" content="..." />

    Place key content within the main HTML structure and not in an include file or some other dynamically loaded content. If possible, place this important content within an h1, h2 or h3 element, since search bots deem these to contain more important information. Remember that these tags can be styled in any way you want using CSS.
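    Taken together, and with placeholder keyword and description values, the relevant parts of the SPI index page might look like the following sketch, with the key content sitting directly in the main page rather than in an include file:

        <head>
            <title>BXML WebMail Sign In</title>
            <meta name="keywords" content="..." />
            <meta name="description" content="..." />
        </head>
        <body>
            <h1>Key content that should be indexed</h1>
            ...
        </body>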

    It should be noted that these points can also be put to good use in the design of your SPI in conjunction with the extra link strategy or the secondary site strategy.

    In summary, by using this lightweight indexing strategy, only the content supplied by the title and meta elements, plus those elements that are directly located on the index page, is indexed. No links of type s:include are followed; therefore there is no requirement to deal with redirection. This is not a very full indexing scheme, but it is extremely simple to apply to your site.

    The Extra Link Strategy

    There are two main approaches to making a site fully indexable by search engines: the extra link strategy and the secondary site strategy. The extra link strategy is the easier of the two to implement and it can make the site entirely indexable by search engines, but it does not create a secondary site in normal HTML and is therefore not accessible to older browsers that are incompatible with BXML. The essence of this strategy is to create an extra link on the main SPI index page for each include file whose contents you wish to be indexed. Some experimentation has revealed that the extra links must be of the type:

        <a href="include1.html">include 1</a>



    The following points must be followed if you want Google to index these pages:

    The link must be made by an a element and the include file must be indicated by the href attribute.

    The include file must have the .html or .htm file extension. This is a bit of a workaround, since in reality include files aren't proper HTML files but are instead XML files. However, if you use a div element or a similar HTML element as the root tag, then all modern browsers will be able to read the file as if it were HTML and Google will index it. As far as the BPC (Backbase Presentation Client) is concerned, it merely stipulates that an include file should be well-formed XML and isn't interested in which file-type extension it uses. (A sketch of such an include file follows this list.)

    NB: The include files should not have an XML declaration or a document type definition, otherwise Internet Explorer will be unable to accept .html or .htm files as include files.

    The link tag must have some text content. Without this, Google will simply ignore it.

    No attempt should be made at using HTML to hide these links, since Google frowns on this and may not index such pages. You can, however, use BXML to remove or hide these links by way of a construct event handler attached to links such as:

        <a href="...">Left Panel</a>
        <a href="...">Right Panel</a>
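    A minimal sketch of such an include file (the file name and contents here are illustrative): it uses a div as its root element, has no XML declaration or document type definition, and is well-formed XML saved with a .html extension, e.g. include1.html.

        <div>
            <h2>Section heading</h2>
            <p>Content that should be indexed by the search engine.</p>
        </div>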

    It is not necessary to detect the user agent of the search bots (see the appendix at the end of this article for full details of this process), since they will simply follow the extra links that are provided for them. However, it is necessary to do some detection when these include files are being served up. This is tricky, since these include files can be requested in two different ways. When a user is directed to one of these pages through a search engine, they need to be redirected to the main index page. On the other hand, when the BPC requests these pages as include files, no redirection should occur. Because both search bots and the BPC ignore meta refresh tags, it is possible to solve this problem with such a tag. The meta refresh tag must be included directly inside the body of the include file. Even though these tags are normally placed inside the head element, they will still be executed anywhere in the body by all BXML-compatible browsers. Below is an example of such a meta refresh tag:
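    The following sketch assumes the SPI index page is named index.html; the URL should point at your own SPI index page:

        <meta http-equiv="refresh" content="0; url=index.html" />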

    Once the browser has been redirected to the SPI index page, this page must parse out the referrer and trigger an event handler, which will update the state of the SPI accordingly. This process of detecting deep linking and updating the page state is explained in much more detail in the appendix at the end of this document.

    In summary, the extra link strategy makes the whole site fully indexable. By adding extra link elements, search bots are able to index all pages of the site. However, since the URLs of the pages that get indexed point to include files, which aren't fully BXML-capable pages, it is necessary to redirect normal browsers back to the SPI version of the site and then update the state of this SPI accordingly.

    The Secondary Site Strategy

    The secondary site strategy is the most complete of all of the indexing strategies. It is also the most labor-intensive. The secondary site should be made out of plain HTML and contain a linked, multi-paged structure. Though this may seem laborious, having a secondary site to fall back upon makes your site available to people using older browsers that aren't supported by Backbase, to browsers on mobile devices and to users with disabilities. This gives you a chance to make your site accessible to all users, not just search engines.

    This strategy has three important components:

    1. Generating the secondary site's pages.

    2. User-agent detection of both the search bots and BXML-compatible browsers.

    3. Redirection of browsers and the detection of this redirection, which allows the state of the SPI to be updated to reflect the deep linking.

    Generating the Search Engine Accessible Pages

    The search engine accessible pages can be generated in several ways. It is possible to manually generate the secondary fall-back site. It is also possible to automate this process using XSLT.

    Manual Site Generation. This is a simple, low-tech solution, but it is also labor-intensive, since you have to build two versions of your web site. There is also a danger that, when you update your site with new information, you will forget to update the secondary pages. This will cause the two versions of the site to be out of sync with each other and the information found on search engines to be out of date.


    XSLT-Driven Generation. An alternative strategy, which is especially effective if you use a content management system (CMS), is to store all of the information, or at least the copy for your site, as plain XML. This can be in a format defined by yourself or by your CMS. This XML must then be transformed into BXML using an XSLT. A second, much simpler, XSLT is used to transform the XML into the secondary, search-engine accessible site. Although this approach requires a little more effort when you initially develop the site, once both XSLTs are ready, new content can easily be added to the XML data source and both versions will be generated automatically.
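    A minimal sketch of the second, simpler XSLT, assuming a hypothetical source format in which each page of copy is stored as a page element with title and content children:

        <?xml version="1.0" encoding="UTF-8"?>
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="html"/>
          <!-- Run once per source page to produce one plain HTML page
               for the secondary, search-engine accessible site -->
          <xsl:template match="/page">
            <html>
              <head>
                <title><xsl:value-of select="title"/></title>
              </head>
              <body>
                <h1><xsl:value-of select="title"/></h1>
                <xsl:copy-of select="content/node()"/>
              </body>
            </html>
          </xsl:template>
        </xsl:stylesheet>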

    User-Agent Detection

    A vital component of this two-site strategy is browser detection. Techniques that can be used for user-agent detection are discussed in the appendix at the end of the article. Once the user agent has been detected, it is necessary to make sure that BXML-compatible browsers get sent to the BXML site and that search bots and non-BXML-compatible browsers get sent to the accessible site.

    Deep Linking and Browser Redirection

    This section looks at an issue that arises from having a secondary, multi-paged version of your site that is indexed by search engines. The solution to this problem also immediately offers a solution to the issue of deep linking in an SPI. The issue boils down to the fact that a site with multiple pages is being used to represent a site that consists of only a single page. Let's take an example to illustrate this problem: a simple SPI whose main index page consists of a tabbed interface containing three different tabs. The contents of each of these tabs are stored in a separate include file and loaded into the SPI as and when they are required. Therefore, to make this site indexable by a search engine, an MPI version of this site would presumably have been made with one index page (e.g. index.html) and three separate HTML pages representing the include files for each of the tabs (e.g. tab1.html, tab2.html and tab3.html). Now, if a user's search term closely matched something indexed on the third tab, the search engine would point the user to tab3.html. However, in reality, you do not want your user to be directed to tab3.html. Instead, you want them to be sent to the index.html page of the SPI version of your site and, when this page is opened, the third tab, which correlates to tab3.html, should be selected.

    The full solution to this problem consists of two parts. Firstly, BXML-compatible browsers need to be redirected to the SPI version of the site. Secondly, the SPI version needs to detect that it has been redirected from one of these deep-linked pages and then update the state of the page accordingly, so that the information relevant to this link is shown.

    Browser Redirection. When one of the MPI pages intended for the search engine is requested, the user agent must be detected again. However, in this case, when a BXML-compatible browser is detected, it is redirected and not the search bot. The browser is sent to the index page of the SPI version of the site.

    Detecting Deep Linking. The BXML version of index.html needs to ascertain from which page it was referred. This must be done as soon as the page is loaded, so that the transition appears seamless to the user. Full details of how to detect deep linking and how to update the page state can be found in the appendix at the end of this article.

    In summary, the secondary site strategy makes the whole site fully indexable. Since the search bot is directed to a normal HTML site, all links can be followed by the search bot. However, since the URLs of the pages which get indexed when the links are followed point to non-BXML pages, it is necessary to redirect normal browsers back to the SPI version of the site and then update the state of this SPI accordingly.

    Ethics

    Google especially, and presumably other search engines as well, deeply frowns upon any attempt to unfairly manipulate search results. Any site that is caught willfully trying to manipulate Google will be banned from Google's index. Redirection to another site, with different content, based on the user agent is technically called cloaking and is frowned upon. Therefore, you should make sure that the information conveyed by any secondary web site, which has been set up with the intention of making your site indexable by Google and other search engines, is exactly the same as the information contained in your BXML site.


    Appendix

    User-Agent Detection

    A vital component of both the secondary site strategy and the extra link strategy is browser detection. The technical term for a web browser, a search robot or any other piece of software that approaches a web site is a user agent. When a user agent requests a particular page, it supplies details of itself by way of one of the HTTP headers that are sent along with the request. The Firefox browser, for instance, sends the following request header:

        User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1;
        en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

    It is therefore relatively straightforward to write a script which determines what the user agent is and then redirects the user agent to the appropriate version of the site. The most straightforward technique is not to try to find the search bots or other incompatible browsers, since this group is relatively large and hard to qualify. It is easier to determine whether the user agent is a BXML-compatible browser and then assume that, if the user agent isn't one of these, it is either a search bot or an incompatible browser. The following browsers are BXML compatible:

    Internet Explorer .0 and newer

    Mozilla 1.5 and newer

    Firefox 1.0 and newer

    Netscape . and newer

    User-agent detection can be done on the server using a PHP, ASP or JSP script. There are standard libraries which help take care of this. Alternatively, if you cannot or do not wish to use server-side scripts to determine the user agent, it is possible to do this in JavaScript. If you take this approach, you should be aware of the fact that search bots cannot be expected to execute any JavaScript. Therefore, if you are using the secondary site strategy in conjunction with JavaScript-based detection, the default page provided by the initial page request must be the non-BXML site, which is intended for the search engine bot. When you ascertain that the user agent is a BXML-compatible browser, JavaScript should redirect the browser to the BXML version of your site. The following code fragment shows a simple JavaScript function, which tests whether a BXML-compatible, Mozilla-based browser is in use and then redirects the browser based on this.

    function testUA(){
        var bCompatible = false;
        var sUA = window.navigator.userAgent;
        //Test if the User-Agent string contains
        //the string Gecko
        var iIOGecko = sUA.indexOf('Gecko');
        if (iIOGecko >= 0){
            //extract the string directly after rv:
            //and check its value
            var iIOrv = sUA.indexOf('rv:');
            var sRv = sUA.substr(iIOrv + 3, 3);
            if (parseFloat(sRv) >= 1.5)
                bCompatible = true;
        }
        //now if compatible redirect
        if (bCompatible)
            window.location.href = 'bxmlIndex.html';
    }

    This function is relatively straightforward, but certain parts may need explaining. Firstly, both Netscape and Firefox browsers use the same Gecko core as Mozilla does. They also have similar User-Agent strings. Therefore, the function above first searches for a Gecko sub-string, which all of their User-Agent strings will contain. Once this sub-string has been found, the function searches for the rv: sub-string. This is short for revision and is followed by the version number of the Gecko engine. If this number is 1.5 or higher, then the Gecko engine is BXML compatible. Therefore, this relatively simple function is able to test for all compatible Netscape, Firefox and Mozilla browsers.
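    In practice the function has to be invoked when the default, non-BXML page is loaded; one minimal way of doing this (a sketch, not the only option) is to call it from a script block on that page:

        <script type="text/javascript">
            testUA();   //redirect BXML-compatible browsers
        </script>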

    Obviously, it is also necessary to test for compatible versions of Internet Explorer too. This can be done in a similar way, but there is one added complication. All compatible versions of Internet Explorer have a User-Agent string that contains the sub-string MSIE, which is directly followed by the version number. Below is an example of such a header from an Internet Explorer browser:

        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
        Windows NT 5.1; SV1; .NET CLR 1.1.4322)

    However, unfortunately, Opera browsers have a very similar User-Agent string:

        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
        Windows NT 5.1; en) Opera 8.00

    Therefore, you must first test that the User-Agent string doesn't contain the Opera sub-string and, once this has been ascertained, simply parse out the version number which follows the MSIE sub-string.
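    A sketch of such a test, following the same pattern as the testUA function above (the minimum MSIE version checked here is an assumption, since the exact Internet Explorer version requirement is not given):

    function testUAIE(){
        var bCompatible = false;
        var sUA = window.navigator.userAgent;
        //Opera mimics the MSIE token, so rule it out first
        if (sUA.indexOf('Opera') < 0){
            var iIOMSIE = sUA.indexOf('MSIE');
            if (iIOMSIE >= 0){
                //extract the version number directly after "MSIE "
                var sVersion = sUA.substr(iIOMSIE + 5, 3);
                if (parseFloat(sVersion) >= 6.0)  //assumed minimum version
                    bCompatible = true;
            }
        }
        //now if compatible redirect
        if (bCompatible)
            window.location.href = 'bxmlIndex.html';
    }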


    Detecting Deep Linking and Updating the Page's State

    This section looks at how redirection based on deep linking can be detected, and then at how the state of a page can be updated using this information. Deep linking can be detected on the server by reading the Referer HTTP request header using a server-side script. Once the referrer has been read, an appropriate construct event handler must be created, which updates the initial state. Alternatively, if you do not have access to server-side scripting, you can use a JavaScript function to do this. The js action is a special BXML action, which is used to call JavaScript functions. The following behavior takes care of calling this function when the page is loaded:

    ... Other event handlers go here ...
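    (Outside of a BXML behavior, and without any guarantee that the BPC has finished constructing the page, the closest plain-JavaScript way of calling the function on page load would simply be window.onload = updateState;.)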

    The updateState function, which this action calls, then needs to parse out the referrer. Once this value has been found, the JavaScript function triggers an appropriate BXML event, thereby passing control back to the BPC. This is done by calling the execute method of the bpc object with a BXML string. A simple version of such a function looks like this:

    function updateState(){
        //first parse out the value of the referrer
        var sReferrer = document.referrer;
        //do a quick test to make sure that the referrer
        //is from the same host
        if (sReferrer.indexOf(window.location.hostname) >= 0){
            var iLastSlash = sReferrer.lastIndexOf('/');
            var sValue = sReferrer.substr(iLastSlash + 1);
            //trigger an event with the same name as the referring
            //page (sValue, e.g. "tab3.html") by building a BXML
            //command string; the exact BXML markup is omitted here
            var sExecute = '...';
            bpc.execute(sExecute);
        }
    }

    You should note that this is a very simplistic implementation of such a referrer-parsing function. For a more complicated web site structure, it is important that it is totally unambiguous which page the referrer was, otherwise mistakes can be made. For such cases, more complicated JavaScript will be required to verify this.

    Now, finally, let's look at an example of the type of event handler that could be triggered by such an updateState function:

    ... Other event handlers go here ...

    This behavior contains an event handler for the custom event tab3.html, which is triggered by the JavaScript function when redirection has occurred from the tab3.html page. All it does is perform a select action on a target with an id of tab3. If this corresponds to the appropriate tab, then simply by selecting this tab, the tab should be loaded and become visible.