Let's call the whole thing off

A short essay on translation quality standards, the new ISO 17100 standard, translation quality assessment, sampling, and translation data quality for statistical machine translation.


Tout ce qui a besoin d'être dit l'a déjà été. Mais puisque personne n'écoutait, tout doit être dit à nouveau.

Everything that needs to be said has already been said. But since no one was listening, everything must be said again.

André Gide, Le traité du Narcisse, 1891*

Although considered a mature and widespread concept, quality is a relative and largely subjective notion. There is no unique conventional set of metrics for translation quality measurement and, as in many other fields of application, translation quality broadly corresponds to the fulfillment of a set of specifications encompassing the buyer’s requirements.

Quality, utility and pricing

Utility is defined as the ability of something to satisfy needs or wants. In this sense, it is quite similar to quality, defined as ‘fitness for purpose’. Both refer to a customer’s satisfaction with a good or a service.

In business, quality has a pragmatic meaning as the non-inferiority or superiority of something, but it is always an intuitive, conditional, and subjective attribute and may be interpreted differently by different people.

In economics, utility is a representation of preferences over some set of goods and services. As is the case for quality, utility cannot be measured directly: Nobel laureate Paul Samuelson called the choices that outline it ‘revealed preferences’.

In economics, the marginal utility of a good or service is the gain from an increase, or the loss from a decrease, in the consumption of that good or service. In other words, the first unit of consumption of a good or service yields more utility than the second and subsequent units, with a continuing reduction for greater amounts. A good or service should then be consumed at the quantity at which its marginal utility equals the cost of producing one more unit (the marginal cost).
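In compact textbook form (a sketch added for illustration, assuming a differentiable utility function U(q) and cost function C(q), neither of which appears in the original essay):

```latex
% Diminishing marginal utility and the consumption optimum (illustrative sketch)
\[
  MU(q) = \frac{dU(q)}{dq}, \qquad \frac{dMU(q)}{dq} < 0
  \quad \text{(each additional unit adds less utility)}
\]
\[
  \text{consume up to } q^{*} \text{ such that } MU(q^{*}) = MC(q^{*}) = \frac{dC(q)}{dq}\Big|_{q = q^{*}}
\]
```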

Due to information asymmetry, translation is supplied without qualitative differentiation across markets.

This makes it a typical commodity.

* Many thanks to Kirti Vashee for the quote.

Research claims that the demand for translation has been increasing, although at a slower pace in the last few years. As the rate of commodity acquisition increases, marginal utility decreases, and if commodity consumption continues to rise, marginal utility may at some point fall to zero, reaching maximum total utility. A further increase in the consumption of units of the commodity causes marginal utility to become negative; this signifies dissatisfaction.

Price is determined by both marginal utility and marginal cost, and this dynamic clearly explains not only why the marginal cost of water is far lower than that of diamonds, but also why quality is an expected feature of a good or service, not one linked to its selling price.

Is quality, then, a way to differentiation? Is ‘purity’ the right way to differentiation? Is diversity still richness? Is differentiation really important? Or do we have to wear camouflage to stay alive?

Most often, the translation itself is the only material the buyer has available for scrutiny. Therefore, particularly in translation, there is no such thing as absolute quality, with different jobs meeting different requirements and different quality criteria.

To be reliable, translation quality assessment must be indisputable and repeatable; effective metrics must be available that are objective (measurable), unbiased, and able to provide enough resolution (detail) to assess the factors that need improvement.

Since there is no common protocol or tool for automated translation quality assessment, guidelines enable a human team to perform this task while keeping the error margin as low as possible. However, so far, two different people following the same protocol could hardly achieve the same result (or at least a comparable one).

In fact, the detailed and strict error-based evaluation models used so far have proved costly, ineffectual, and erratic, as they hardly consider content type, end-user requirements, and usability; in a word, fitness for purpose. These models have been developed, rolled out, and implemented by linguists for linguists. They focus on linguistic features instead of cost-effectiveness and functionality, with time and cost growing linearly with volume.

Technology and Incomes

Google has made machine translation a General Purpose Technology (GPT), thus helping spread the concept of translation as a utility. It cannot, however, be accused of having contributed to the commodification of translation, as the two concepts are distinct and unconnected. If anything, Google Translate has helped raise awareness of the importance of translation for the circulation of information and knowledge, even if only indirectly.

The Moses engine has done more harm to translation because it is free of charge and apparently easy and convenient to implement, while, like any other complex technology, no matter how seemingly simple, it requires specific skills, know-how, understanding, and patience. Improvisation does not pay; neither does gambling, especially if the ultimate goal is to lower costs and increase profits. Just as players in the translation industry would like prospects to see translation as an investment, professional users should see machine translation as a complex technology and refrain from proclaiming themselves experts just for being able to install and run a DIY piece of software. This applies to any software product.

In the last decade, the acceleration in technology has shocked not only the industry, but virtually everyone. Skills and institutions in the translation industry have not been able to keep pace with the rapid changes in technology. In the translation industry too, skill-biased technological change (SBTC) increases the incomes of highly skilled workers and reduces the incomes and employment of low-skilled workers.

As Erik Brynjolfsson and Andrew McAfee argue in The Second Machine Age, in the last decade the fall in demand has been greatest for those in the middle of the skill distribution. Highly qualified workers have done well, but workers with lower qualifications have been less affected than those with medium qualifications, reflecting a polarization of labor demand and an interesting fact about automation. Physical activities requiring the coordination of physical and sensory perception have proved more resilient to automation than basic data processing, in line with Moravec's Paradox, which claims that high-level reasoning requires very little computation, while low-level sensorimotor skills require enormous computational resources.

In this respect, a recent Economist article helps clarify the point. Lower-qualified jobs are, and will most probably remain, low paid; this makes the replacement of highly qualified jobs with machines convenient, especially in the long run, despite Keynes’s opinion (“This long run is a misleading guide to current affairs. In the long run we are all dead.”)

As Brynjolfsson and McAfee suggest, the whole translation industry should pursue a strategy of innovating and reshaping organizations, structures, processes, and business models to leverage developing technologies and human skills. This would be easier to achieve than the disruptive technological innovations that, as its history proves, the industry is incapable of producing and can only undergo and endure from outsiders (see also Moore’s Law and Commoditization (of Translation too)).

On the other hand, the more technologies are present in an industry, the harsher the competition. The spread between the highest and lowest performers increases, as does the profit margin spread between the companies at the top and at the bottom of the scale.

Going back for a moment to information asymmetry, it is worth recalling a study by Robert Jensen of the John F. Kennedy School of Government at Harvard University on the ‘digital provide’ in the fisheries sector in Kerala. In Professor Jensen’s words, “when information is limited or costly, agents are unable to engage in optimal arbitrage. Excess price dispersion across markets can arise, and goods may not be allocated efficiently.”

Information technologies and mobile phones in Kerala allowed fishermen to access information on prices and market demand in real time and to use this information to make decisions. This resulted in a significant reduction in price dispersion and improved market performance, after an initial drop in prices and subsequent stabilization, with an eventual increase in profits.

A New Standard

After a lustrum, a seemingly endless gestation, especially for our fast-paced times, ISO/DIS 17100 (Translation Services — Requirements for translation services) has eventually reached quasi-final draft status (voting terminated on November 20, 2013). This draft has been submitted to the ISO member bodies and to the CEN member bodies for a parallel enquiry, which is about to end as well. Waiting for imprimatur, this 20-page draft is available for purchase at CHF 66.00 (€54.18 or US$74.50).

Very ambitiously, in its introduction, ISO/DIS 17100 claims to specify “requirements for all aspects of the translation process directly affecting the quality and delivery of translation services.” ‘All’ is a very challenging word, especially when it comes to a typical human task like translation; more realistically, in its scope section, the standard only “provides requirements for the core processes, resources and other aspects necessary for the delivery of a quality translation service that meets applicable specifications.”

It is not a good start for a supposedly state-of-the-art standard made by presumably renowned experts.

In fact, ISO/DIS 17100 is a reworking of EN 15038 that partly accommodates ASTM F2575-06 and winks at the Chinese GB/T 19363.1-2003. ISO/TS 11669 is crucial to the ISO/DIS 17100 framework.

ISO/TS 11669 is a technical specification. The shelf life of an ISO technical specification is six years: within this timeframe it is either converted into a full standard or withdrawn.

ISO/TS 11669 provides a framework for developing structured specifications for translation projects, but it does not cover legally binding contracts between the parties involved in a translation project. It addresses quality assurance and provides the basis for qualitative assessment, but it does not provide procedures for the quantitative measurement of the quality of a translation product.

ISO/TS 11669 describes a decision-making system for how translation projects should be carried out. Those decisions — or project specifications — would then become a resource for both the requester and the translation service provider (TSP) throughout all phases of a translation project. These specifications can be attached to a legally binding contract to define the work to be done. In the absence of a contract, they can be attached to a purchase order or any other document supporting the request.

Requesters and TSPs should determine project specifications together. The project specifications can be used to guide assessments made by either the TSP or the end user. The use of the same specifications by all parties helps avoid assessments based on personal opinions of how the source content should be translated.


ISO/TS 11669 introduces translation parameters, intended as key factors, activities, elements and attributes of a given project used for creating project specifications. However, the long list of translation parameters is a surreptitious way to impose vague and blurry translation quality assessment criteria, which are traditionally subjective.

In addition, since quality is defined as the degree to which the translation product conforms to the project specifications, and no guidance is given for qualitative assessment, register should not be a parameter, as its compliance with requirements is highly subjective.

Like EN 15038:2006, ISO/DIS 17100 specifies requirements that a provider of translation services must meet in terms of staff and equipment, project management and processes. Like EN 15038:2006, ISO/DIS 17100 shows the typical conservatism of the translation industry. Although the EN 15038:2006 draft was finalized a good two years before its release, like EN 15038:2006, ISO/DIS 17100 still reflects the typical old business model of the whole translation industry. Like EN 15038:2006, ISO/DIS 17100 contains no commitment to metrics, and no hint of how translations are to achieve a certain quality level. However, in one of the informative annexes, ISO/DIS 17100 does make a timid commitment to service level agreements (SLAs) that could outline such a framework.

Translators’ competences are still a weak point in ISO/DIS 17100. The TSP is required to “have a documented process in place to ensure that the people selected to perform translation projects have the required competences and qualifications,” but no means is envisaged to ensure that this actually happens. A basic requirement for translator qualification is “a recognized graduate qualification in translation” or substantial full-time professional experience in translating. These are the same basic errors as in EN 15038:2006, reflecting a rosy view that is now far removed from reality, as unfailingly proved by the inadequacy of the newbies churned out by the old-fashioned translation schools crowding the old and the new world. Not surprisingly, these schools are under the thumb of the same advocates of EN 15038:2006 and ISO/DIS 17100.

On the other hand, ISO/DIS 17100 takes translation vendor and project management into consideration, with a view to ensuring that “the people selected to perform translation projects have the required competences and qualifications.” According to the standard, “translation project management competence can be acquired in the course of formal or informal training, e.g. as part of a relevant higher educational course or by means of on-the-job training or by industry experience.” This is somewhat dismissive of the importance admittedly acknowledged to translation project management, and yet it is definitely much more than the attention devoted to translation vendor management, which is in fact crucial: the standard does not envisage any requirement in this respect.

Here comes the biggest flaw in ISO/DIS 17100, in section 5.3, Translation process. With the typical dirigiste trait of translation scholars, the abundance of detail is not accompanied by any specification of requirements as to who should monitor the several tasks in the process and how, ending in an utter manifestation of the wishful thinking that permeates the industry.

A blatant example is given in section 5.3.3, Revision. Beyond the impractical revival of the typical academic approach based on contrastive analysis, no indication is given about the basis on which to “correct any errors found in the translation output or recommend the corrective measures to be implemented”, leaving any decision entirely to the reviser’s discretion and thus leaving ample room for the introduction of further errors.

ISO/DIS 17100 still contains all the flaws and limitations of EN 15038:2006 and incorporates some from ASTM F2575-06, although both left much room for improvement, and the four years each had already been in force, plus as many years of drafting, were enough time to do better.

Annexes A and G are perfect examples in this respect. They seem quite a divertissement in themselves, with the translation workflow outlined in the former still offering a monolithic serial model far from agility, and a ‘DOK’ in the latter, given with no definition or elaboration, being something that would most probably disturb the sleep of many uninformed readers. For annexes meant to be informative, both surely miss their goal.

Annex B offers a list of elements to be included in an agreement as project specifications, possibly in “the form of statements of work such as a service level agreement (SLA),” but it gives no definition of statements of work (SoW) or SLAs.

Standards are all about allowing stakeholders to overcome information asymmetries and make informed decisions; to this end, they must be simple, functional, and end-user oriented.

ISO/DIS 17100 is another missed opportunity to gain respect and consideration for the translation industry.

Measurability and Metrics

The quality process standard par excellence, ISO 9001:2008, is based on the assumption that regulating and systematizing tasks in repeatable processes, with strong audit trails, will eventually lead to controlled production processes and to products/services delivered with repeatable quality (attributes).

Over the years, the concept of continuous improvement has spread and was eventually incorporated into this standard. While leading industries developed complementary sets of techniques and tools for process improvement, the translation industry pursued its own standards, which respected its peculiarity and the special nature of its services.

The manufacturing industry applied the concept of kaizen and conceived Total Quality Management (TQM), Six Sigma (6σ) and CMMI to improve the quality of process outputs by identifying and removing the causes of defects (errors) and minimizing variability in manufacturing and business processes.

The table below gives a measure of process performance corresponding to Six Sigma levels, roughly expressed in defects per million opportunities (DPMO) and percentage yield.

Sigma level   DPMO      Percentage yield
1             690,000   31%
2             310,000   69%
3             67,000    93.3%
4             6,200     99.38%
5             230       99.977%
6             3.4       99.99966%

This means that, in a 10,000-word project, the seemingly minute difference between 99.38% and 99.99% means 62 errors compared to 1; roughly 2 errors every three pages compared to only 1 in the whole project.
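As a quick check of the arithmetic above (a minimal sketch; the 10,000-word figure and the yields come from the text, while the words-per-page value is an assumption):

```python
# Expected errors implied by a given process yield, for the 10,000-word example above.
WORDS = 10_000
WORDS_PER_PAGE = 250  # assumption, not from the essay

for sigma_level, yield_pct in [(4, 99.38), (5, 99.977), (6, 99.99966)]:
    errors = WORDS * (1 - yield_pct / 100)
    pages = WORDS / WORDS_PER_PAGE
    print(f"{sigma_level} sigma ({yield_pct}% yield): "
          f"~{errors:.1f} errors in {WORDS} words ({errors / pages:.2f} per page)")
```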

In the language industry, quality is a most debated subject. The most commonly asked question about quality is: how can quality be measured? To measure something, you must know what it is, and then you must develop metrics that measure it.

Defining metrics is the hardest part for people who have always thought of quality in their deliverables as a questionable subject.

The best way to assess quality remains measuring the number and magnitude of defects, and when defects cannot be physically removed, their features and scope must be specified.

The first step, then, is to establish a model or definition of quality and translate it into a set of metrics that measure each of its elements. Measuring things just because they can be measured is not useful: if something is not relevant to the quality model established, developing metrics to measure it is not a good use of time.

Striving for a single, all-encompassing metric is not only troublesome, it can be useless, as a single metric would not reveal all the problems. Creating multiple metrics that assess the various aspects of what is to be measured can help recompose the overall framework: knowing which parts of a process work well and which do not makes it possible to take measures to correct the problems.

A comprehensive set of metrics must measure quality from several perspectives and at several points during the production process, regardless of the quality model. At a minimum, metrics should tell something about:

The quality of the finished product or the lack of it;

The quality of the process, i.e. how reliably it produces quality products;

The likelihood of achieving quality in a deliverable.

The quality of the finished product corresponds to general customer satisfaction ratings, while the lack of quality is revealed by defects such as technical errors; the quality of the process comes from repeatability, and typical predictors of quality are in-process indicators such as editing.
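A minimal sketch of how such a three-view set of metrics could be recorded per project (the field names and example values are illustrative assumptions, not taken from the essay):

```python
from dataclasses import dataclass

@dataclass
class QualityMetrics:
    """Three complementary views on quality, as listed above."""
    # Product view: customer satisfaction and defects found in the deliverable
    customer_satisfaction: float    # e.g. average rating on a 1-5 scale
    defects_per_1000_words: float
    # Process view: how repeatable and reliable the process is
    on_time_delivery_rate: float    # share of lots delivered on schedule
    rejected_lot_rate: float        # share of lots failing inspection
    # Predictive view: in-process indicators such as editing effort
    edit_distance_ratio: float      # share of delivered text changed during revision

metrics = QualityMetrics(4.2, 0.8, 0.97, 0.05, 0.12)
print(metrics)
```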

Sampling

In this perspective, the distinction between quality assurance, quality assessment, and quality inspection and control is important.

Quality assurance is a planned and systematic pattern of all the actions necessary to provide adequate confidence that an item or product conforms to established technical requirements. Quality assurance covers all activities, in accordance with two basic rules, “fit for purpose” and “do it right the first time”. Quality control and quality assessment contribute to quality assurance.

Quality assurance is the full set of procedures applied before, during and after the production process, by all members of an organization, to ensure that the quality objectives important to clients are being met.

Quality assessment is intended to establish whether contract conditions have been met. Whereas quality control is product-oriented and customer-oriented, quality assessment is business-oriented. Unlike quality control, which always occurs before the final product is delivered to the client, quality assessment may take place after delivery. Assessment is not part of the production process. It consists in identifying — but not correcting — problems in one or more randomly selected samples of the production output to determine the degree to which it meets the agreed standards.

In the translation industry, quality control is done with specific software tools, whether standalone or integrated into translation environments. These tools usually detect mechanical errors, spelling errors, omissions, inconsistencies, and oversights, especially when reference material is provided.
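The kind of mechanical checks such tools run can be pictured with a small sketch like the one below (a simplified illustration of the idea, not a description of any particular tool; the checks chosen are assumptions):

```python
def qa_check(source: str, target: str) -> list[str]:
    """Return a list of mechanical issues found in a source/target segment pair."""
    issues = []
    if not target.strip():
        issues.append("empty target")
    elif target.strip() == source.strip():
        issues.append("possibly untranslated segment")
    if "  " in target:
        issues.append("double space")
    if source.strip().endswith((".", "!", "?")) != target.strip().endswith((".", "!", "?")):
        issues.append("trailing punctuation mismatch")
    return issues

print(qa_check("Deliver the files by May 20.", "Consegnare i file  entro il 20 maggio"))
# -> ['double space', 'trailing punctuation mismatch']
```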

Nevertheless, since there is no ‘perfect’ translation, the intended purpose of a translation and its suitability remain the only judgment criteria which, for the sake of objectivity, must be accompanied by assessment metrics. The combination of process and output quality assessment of translation work will eventually tell simply whether it is acceptable or defective.

Therefore, translation quality assessment (TQA) criteria are to be agreed upon with the client, made subject to requirements and formalized in a separate document.

So far, TQA has been performed on the basis of a strict correspondence between source and target texts and on intensive error detection and analysis. While this could be the best approach from a theoretical — and maybe pedagogical — point of view, it is uneconomic. It requires a considerable investment in human resources and time, and it reduces translation to a matter of trust.

On the other hand, who will go over 100,000 words of translation to check for terminology changes after the translation has been delivered? And while terminology issues can be approached in a systematic way, style is a matter of personal preference; the same goes for correctness and meaning, as opposed to completeness. Any translation can be automatically and fully checked for completeness against the source text, for freedom from mechanical flaws or errors, and even for grammar, intended as conformance to an approved or conventional standard. In any case, any job done by a professional translator is taken for granted to be free from such defects.

Today, any large translation project follows the same standards and rules as a production process in any ordinary business. In this perspective, defects should be reproducible under the same conditions, so that they can be corrected and then removed.

A first step towards improving the quality of process outputs consists in preventing the emergence of defects by minimizing variability in processes. To this end, a detailed statement of work and an accurate style guide can be helpful — although time-consuming — in most situations, possibly together with examples of dos and don’ts. This approach could eventually lead to setting up defect tracking and assessment procedures.

Here comes inspection.

Just like any other object, to be measurable a translation, especially a large one, should be apportioned into definite allotments, homogeneous in size and scope, so as to allow a reasonable estimate of the number and significance of defects and to set a limit for both.

Such apportionment is called sampling. Sampling becomes necessary for any translation project exceeding a typical freelancer’s single-day capacity, which makes 100% inspection unsustainable.

Sampling allows for the inspection of meaningful, representative batches, and for accepting or rejecting them through the determination of a maximum number of defects, based on simple pass/fail criteria.

Acceptance sampling is the middle-of-the-road approach between no inspection and full inspection. Its main purpose is to decide whether a lot is acceptable, not to estimate its quality. To determine acceptability, criteria for inspection by attributes must be specified in advance.

Once the criteria for inspection are specified, acceptability thresholds must be set. The ISO 2859 series of standards can be used here as a reference.

For acceptance sampling to be effective, a lot acceptance sampling plan (LASP) must be implemented, indicating the conditions for acceptance or rejection of the lot being inspected. These parameters are usually the number of defectives tolerated in a sample and should vary in quantity and severity in direct relation to the importance of the characteristics inspected.
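A minimal sketch of how such a plan could be applied to a translation lot (the lot size, sample size and acceptance number are illustrative assumptions; in practice they would come from an ISO 2859 table and the agreed AQL):

```python
import random

def inspect_lot(segments, sample_size, acceptance_number, has_defect):
    """Single sampling plan by attributes: accept the lot if the number of
    defective segments in a random sample does not exceed the acceptance number."""
    sample = random.sample(segments, sample_size)
    defects_found = sum(1 for segment in sample if has_defect(segment))
    return defects_found <= acceptance_number, defects_found

# Illustrative run: a 2,000-segment lot, a 125-segment sample, acceptance number 5
lot = [f"segment {i}" for i in range(2000)]
accepted, found = inspect_lot(lot, 125, 5, has_defect=lambda s: random.random() < 0.02)
print("accepted" if accepted else "rejected", f"({found} defective segments in the sample)")
```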

Average Outgoing Quality (AOQ) procedures are the best suited for translation projects, since sampling is non-destructive, rejected lots are fully inspected and the defectives found in them are replaced with good units. In this case, all rejected lots are made perfect and the only defects left are those in the lots that were accepted. AOQ expresses the average nonconforming fraction that is shipped to clients:

AOQ(p) = PA · p · (N − n) / [N − p·n − (1 − PA) · p · (N − n)]

where PA is the probability of accepting the lot, N is the lot size, n is the sample size, p is the nonconforming fraction, and (N − n)·PA is the expected number of pieces shipped without inspection. The numerator is the number of bad pieces that are shipped, and the denominator is the total number of pieces shipped.

Corrections are made to make rejected lots perfect and allow for identifying and removing the causes of defects, thus preventing their recurrence by improving processes and, in turn, the quality of outputs.
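A rough numerical sketch of the formula above, using a binomial approximation for the probability of acceptance (the plan parameters are illustrative assumptions):

```python
from math import comb

def prob_accept(p, n, c):
    """Binomial probability of finding at most c defectives in a sample of size n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

def aoq(p, N, n, c):
    """Average outgoing quality for a rectifying single sampling plan."""
    pa = prob_accept(p, n, c)
    shipped_bad = pa * p * (N - n)
    shipped_total = N - p * n - (1 - pa) * p * (N - n)
    return shipped_bad / shipped_total

N, n, c = 2000, 125, 5  # assumed lot size, sample size, acceptance number
for p in (0.01, 0.03, 0.05, 0.10):
    print(f"incoming nonconforming fraction {p:.0%} -> AOQ {aoq(p, N, n, c):.2%}")
```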

To make assessment criteria, methods and tools unambiguous, Acceptance Quality Levels (AQLs) can be used, allowing for tolerances and deviations (errors). AQLs should be agreed upon in an SLA and should specify the maximum percentage of non-conforming items that can be considered a satisfactory process mean. Different AQLs may be designated for different types of defects.

An implication of acceptance sampling is that a lot exceeding a given percentage of deviations from the AQL is unsatisfactory and must be rejected. At the same time, a high defect level (Lot Tolerance Percent Defective, LTPD) must be designated that would be unacceptable to the consumer.

AQLs imply that a level of non-quality exists in a product where defects remain which, despite being ‘acceptable’, may still spoil a batch. This level represents a compromise between quality, volume and the negotiated price.

To set AQLs, a simple defect prediction technique can be implemented by separating the defects found in a translation sample into two groups. Depending on the number of defects found in either of the two groups — but not in both — the number of defects that have not been found in the sample can then be estimated. This number approximates the number of defects in the entire project.
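Read this way, the technique resembles the classic capture-recapture estimator; the sketch below assumes the two groups are the defects found by two independent reviews of the same sample (an interpretation, not a procedure spelled out in the essay):

```python
def estimate_total_defects(found_by_a: int, found_by_b: int, found_by_both: int) -> float:
    """Capture-recapture (Lincoln-Petersen) estimate of the total number of defects,
    based on the defects found by two independent reviews of the same sample."""
    if found_by_both == 0:
        raise ValueError("at least one defect must be found by both reviews")
    return found_by_a * found_by_b / found_by_both

# Illustrative figures: review A finds 12 defects, review B finds 9, 6 are found by both
total = estimate_total_defects(12, 9, 6)
undetected = total - (12 + 9 - 6)
print(f"estimated total defects: {total:.0f}, of which still undetected: {undetected:.0f}")
```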

Drawing Samples

A sample is a subset of a production output used to estimate characteristics of the whole output. The sample drawing process consists of:

Defining the production output;

Specifying a sampling frame, a set of items to measure;

Specifying a sampling method for selecting items from the frame;

Determining the sample size;

Implementing the sampling plan;

Sampling and data collecting;

Data that can be selected.

In most cases, it is inconvenient and uneconomic to sentence a batch of material from production (acceptance sampling by lots) by identifying and measuring every single item in the production output and including each of them in the sample.

Given the variety of and variance in projects, the need to use different providers to match (large) volumes with (tight) deadlines, and the consequent unpredictable nature of translation, simple random sampling (SRS) is the most advisable method to minimize bias and simplify the analysis of results.

In SRS, the variance between individual results is a good indicator of the variance in the sample, which helps estimate the accuracy of results, even though the randomness of the selection may result in a sample that does not reflect the makeup of the overall output.

Assuming the source content for a translation project is homogeneous per se, the size of samples can be determined according to the type of deliverables and the target AOQ.
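A minimal sketch of drawing a simple random sample of segments for inspection (the sampling fraction and minimum size are illustrative assumptions, not prescriptions from the essay):

```python
import random

def draw_srs(segments, fraction=0.05, minimum=30, seed=None):
    """Draw a simple random sample of segments for inspection."""
    size = min(len(segments), max(minimum, round(len(segments) * fraction)))
    return random.Random(seed).sample(segments, size)

lot = [f"segment {i}" for i in range(2400)]
sample = draw_srs(lot, seed=42)
print(f"{len(sample)} of {len(lot)} segments selected for inspection")
```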

Purity and Quality

In recent years, statistical machine translation (SMT) has become interesting particularly for LSPs, mostly thanks to the availability of the free Moses engine.

However, contrary to expectations, building the corpus needed for a system to run effectively and satisfactorily can be costly. In fact, for quite some time now, a distinction has been made between generic SMT and customized SMT, where the latter leverages domain resources for phraseology, terminology, and style. In this respect, a further distinction has been made between clean data and quality data; in reality, the latter includes the former. The following comparison should help clarify the concept.

Clean data: a small number of trusted quality sources; domain relevance (restricted); no fewer than 1,000 segments; encoding consistency; no empty segments; no mechanical errors (diacritics, punctuation, capitalization, spelling).

Quality data: actual data; standard-length sentences; terminological consistency; consistent writing style; no mistakes or errors (syntax, grammar, spelling); correct translations (exact words, morphology, no loans).

Cleaning data for training purposes can be performed automatically or semi-automatically with the aid of software tools. These tools can be used to run a series of checks on parallel data, e.g. to flag empty segments, broken markup or mismatched numbers, and even to check for consistent translations and correspondence with approved terminology.
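A minimal sketch of such cleaning checks on parallel segment pairs (the length-ratio threshold and the specific checks are illustrative assumptions):

```python
import re

def keep_pair(source: str, target: str, max_ratio: float = 3.0) -> bool:
    """Return True if a source/target segment pair passes basic cleaning checks."""
    if not source.strip() or not target.strip():
        return False                      # empty segment
    ratio = len(source) / len(target)
    if ratio > max_ratio or ratio < 1 / max_ratio:
        return False                      # implausible length ratio
    if sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target)):
        return False                      # numbers do not match
    if source.count("<") != target.count("<"):
        return False                      # markup likely broken
    return True

pairs = [("Press the <b>OK</b> button.", "Premere il pulsante <b>OK</b>."),
         ("Chapter 3", ""),
         ("Pay within 30 days.", "Pagare entro 60 giorni.")]
clean = [pair for pair in pairs if keep_pair(*pair)]
print(f"kept {len(clean)} of {len(pairs)} segment pairs")
```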

Refining data for quality, i.e. to match the intended purpose and target audience with the preferred writing style and terminology, is a human task requiring a thorough understanding of the data.