Machine Trans EN 5


Machine Translation - A challenge or a chance for human translators?


Chapter 5: Problems in translation study

Muhammad Saqib Mehran, Hunan Normal University, China

Abstract

Key words

Introduction

In this chapter we look at some of the particular problems that the task of translation poses for the builder of machine translation (MT) systems, that is, some of the reasons why translation is difficult. We consider (i) problems of ambiguity, (ii) problems arising from structural and lexical differences between languages, and (iii) multi-word units such as idioms and collocations. Typical problems of ambiguity are discussed in Section 6.2, lexical and structural mismatches in Section 6.3, and multi-word units in Section 6.4.

Of course, these kinds of problem are not the only reasons why MT is difficult. Other problems include the sheer size of the undertaking, as indicated by the number of rules and dictionary entries that a realistic system needs, and the fact that there are many constructions whose grammar is poorly understood, in the sense that it is not clear how they should be represented or what rules should be used to describe them. This is the case even for English, which has been extensively studied and for which there are detailed descriptions, both traditional ‘descriptive’ and theoretically sophisticated, some of which were written with computational usability in mind. It is an even worse problem for other languages.

Moreover, even where there is a reasonable description of a phenomenon or construction, producing a description that is precise enough to be used by an automatic system is a non-trivial problem.

Ambiguity

In the best of all possible worlds (as far as most Natural Language Processing is concerned, anyway) every word would have one and only one meaning. But, as we all know, this is not the case. When a word has more than one meaning, it is said to be lexically ambiguous. When a phrase or sentence can have more than one structure, it is said to be structurally ambiguous.

Ambiguity is a pervasive phenomenon in human languages. It is very hard to find words that are not at least two ways ambiguous, and sentences which are (out of context) several ways ambiguous are the rule, not the exception. This is problematic not only because some of the alternatives are unintended (i.e. represent wrong interpretations), but also because ambiguities ‘multiply’. In the worst case, a sentence containing two words, each of which is two ways ambiguous, can be four ways ambiguous.
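
The way ambiguities ‘multiply’ can be made concrete with a small sketch. The sense inventory below is a hypothetical toy example (the names SENSES, count_readings and enumerate_readings are not from any real system); it only illustrates that the worst-case number of readings is the product of the number of senses of each word.

```python
from itertools import product
from math import prod

# Hypothetical sense inventory: each ambiguous word maps to its readings.
SENSES = {
    "use": ["verb", "noun"],
    "button": ["fastener", "control knob"],
}


def count_readings(words, senses):
    """Worst-case number of readings: the product of per-word sense counts.

    Words missing from the inventory are treated as unambiguous.
    """
    return prod(len(senses.get(word, [word])) for word in words)


def enumerate_readings(words, senses):
    """List every combination of word senses (the worst case)."""
    return list(product(*(senses.get(word, [word]) for word in words)))


if __name__ == "__main__":
    sentence = ["use", "the", "button"]
    print(count_readings(sentence, SENSES))
    for reading in enumerate_readings(sentence, SENSES):
        print(reading)
```

In practice many of these combinations are ruled out by grammar or meaning, which is why the product is only an upper bound.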

Imagine that we are trying to translate the following two sentences into French:

(1) a. Do not use abrasive cleaners on the printer casing.

b. The use of abrasive cleaners on the printer casing is not recommended.

In the first sentence use is a verb and in the second a noun, that is, we have a case of lexical ambiguity. An English-French dictionary will say that the verb can be translated by (inter alia) se servir de and employer, while the noun is translated as emploi or utilisation. One way a reader or an automated parser can determine whether the noun or the verb is being used in a sentence is by working out whether it is grammatically possible to have a noun or a verb in the place where it occurs. In English, for example, there is no grammatical sequence of words which consists of the + V + PP, so of the two possible parts of speech to which use can belong, only the noun is possible in the second sentence (1b).
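
The idea of using grammatical context to choose between the noun and verb readings of use might be sketched as follows. This is a minimal illustration, not a parser: the toy lexicon, the single forbidden tag sequence (a determiner directly followed by a verb) and the function names are all assumptions made for the example.

```python
# Hypothetical toy lexicon: word -> possible parts of speech.
LEXICON = {
    "the": {"DET"},
    "not": {"ADV"},
    "use": {"N", "V"},
}

# Assumed constraint: a determiner cannot be directly followed by a verb.
FORBIDDEN = {("DET", "V")}

# French candidates for each reading of "use".
TRANSLATIONS = {
    ("use", "N"): ["emploi", "utilisation"],
    ("use", "V"): ["se servir de", "employer"],
}


def possible_tags(previous_word, word):
    """Keep the tags of `word` that are allowed after some tag of `previous_word`."""
    previous_tags = LEXICON[previous_word]
    return {
        tag
        for tag in LEXICON[word]
        if any((prev, tag) not in FORBIDDEN for prev in previous_tags)
    }


def candidate_translations(previous_word, word):
    """Collect the translations of every reading that survives the filter."""
    tags = sorted(possible_tags(previous_word, word))
    return [t for tag in tags for t in TRANSLATIONS[(word, tag)]]


if __name__ == "__main__":
    print(possible_tags("the", "use"))           # context of (1b)
    print(candidate_translations("the", "use"))
    print(possible_tags("not", "use"))           # context of (1a)
```

Note that this toy constraint only removes the verb reading after the; it does not by itself decide the reading in (1a), where a fuller grammar would be needed.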

As we showed in Chapter 4, we can provide translation engines with information about grammar in the form of grammar rules. This is useful because it allows them to filter out some wrong analyses. However, grammatical information alone cannot resolve every ambiguous word, because a word can have several meanings within a single part of speech. Take the word button: as a noun, it can refer both to the familiar small round object used to fasten clothes and to a knob on a device.

In order for the machine to choose the correct interpretation, we must give it information about meaning. In fact, giving a computer knowledge of syntax without also telling it something about meaning can be dangerous, since applying a grammar to a sentence can produce a number of different parses depending on how the rules are applied, and we may end up with a large number of alternative parses for a single sentence. Syntactic ambiguity may coincide with genuine ambiguity of meaning, but very often it does not, and it is the cases where it does not that we want to eliminate by applying information about meaning.

We can illustrate this with a few examples. First, let us show how grammar rules, differently applied, can produce more than one syntactic analysis for a sentence. One way this can arise is where a word is assigned to more than one category in the grammar. For example, assume that the word cleaning is both an adjective and a verb in our grammar. This will allow us to assign two different analyses to the following sentence.

(2) Cleaning fluids can be dangerous.

One of these analyses has cleaning as a verb, and the other has it as an adjective. In the former (less plausible) case the sense is ‘to clean a fluid can be dangerous’, i.e. it is about an activity being dangerous. In the latter case the sense is that fluids used for cleaning can be dangerous. Choosing between these alternative syntactic analyses requires information about meaning. It is worth noting, in passing, that this ambiguity disappears when can is replaced by a verb which shows number agreement by having different forms for third person singular and plural. For example, the following are not ambiguous in this way: (3a) has only the sense that the action is dangerous, (3b) has only the sense that the fluids are dangerous.

(3) a. Cleaning fluids is dangerous.

b. Cleaning fluids are dangerous.
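
The filtering effect of number agreement in (2) and (3) can be sketched as below. The two analyses and the agreement table are hand-written assumptions for this one example; a real system would derive them from a grammar rather than list them.

```python
# Two hypothetical analyses of "cleaning fluids" and the number of the
# subject under each analysis.
ANALYSES = {
    "activity (cleaning = verb)": "singular",   # the act of cleaning fluids
    "fluids (cleaning = adjective)": "plural",  # head noun "fluids" is plural
}

# Number required by the finite verb; None means no agreement marking.
VERB_NUMBER = {"is": "singular", "are": "plural", "can": None}


def surviving_analyses(verb):
    """Return the analyses whose subject number agrees with the verb."""
    required = VERB_NUMBER[verb]
    return [
        name
        for name, number in ANALYSES.items()
        if required is None or number == required
    ]


if __name__ == "__main__":
    for verb in ("can", "is", "are"):
        print(verb, "->", surviving_analyses(verb))
```

With can both analyses survive, so the choice has to be made on other grounds, such as information about meaning.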

We have seen that syntactic analysis is useful in ruling out some wrong analyses, and this is another such case, since, by checking for agreement of subject and verb, it is possible to find the correct interpretations. A system which ignored such syntactic information would have to consider all these examples ambiguous, and would have to find some other way of working out which sense was intended, running the risk of making the wrong choice. For a system with proper syntactic analysis, this problem would arise only in the case of verbs like can which do not show number agreement.

Another source of syntactic ambiguity is where whole phrases, typically prepositional phrases, can attach to more than one position in a sentence. For example, in the following example, the prepositional phrase with a Postscript interface can attach either to the NP the word processor package, meaning ‘the word processor which is fitted or supplied with a Postscript interface’, or to the verb connect, in which case the sense is that the Postscript interface is to be used to make the connection.

(4) Connect the printer to a word processor package with a Postscript interface.

Notice, however, that this example is not really ambiguous at all: knowledge of what a Postscript interface is (in particular, the fact that it is a piece of software, not a piece of hardware that could be used for making a physical connection between a printer and an office computer) serves to disambiguate. Similar problems arise with (5), which could mean that the printer and the word processor both need Postscript interfaces, or that only the word processor needs them.

(5) You will require a printer and a word processor with Postscript interfaces.

This kind of real-world knowledge is also an essential component in disambiguating the pronoun it in examples such as (6).
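
Returning to the attachment ambiguity in (4), the role of real-world knowledge might be sketched as follows. The KIND table, the rule that an instrument of connect must be hardware, and the contrasting cable example are all assumptions introduced for illustration; they are not taken from an actual MT system.

```python
# Hypothetical world knowledge: what kind of thing a noun phrase denotes.
KIND = {
    "Postscript interface": "software",
    "cable": "hardware",  # invented contrast case, not from the text
}


def attachments(verb, noun_phrase, pp_object):
    """Return the plausible attachment sites for the phrase 'with a <pp_object>'."""
    # Attachment to the NP: the package is fitted with the object.
    sites = [("NP", f"{noun_phrase} fitted with a {pp_object}")]
    # Attachment to the verb: assumed to need a physical instrument.
    if KIND.get(pp_object) == "hardware":
        sites.append(("VP", f"{verb} by means of a {pp_object}"))
    return sites


if __name__ == "__main__":
    print(attachments("connect", "word processor package", "Postscript interface"))
    print(attachments("connect", "word processor package", "cable"))
```

For the Postscript interface only the NP attachment survives, while the invented cable case remains two ways ambiguous.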

(6) Put the paper in the printer. Then switch it on.

In order to work out that it is the printer that is to be switched on, rather than the paper, one needs to use the knowledge of the world that printers (and not paper) are the sort of thing one is likely to switch on. There are other cases where real-world knowledge, though necessary, does not seem to be sufficient. The following, in which people are re-assembling a printer, seems to be such an example:
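
For the simpler case in (6), the use of world knowledge to resolve it might be sketched like this. The SWITCHABLE set and the resolve_it function are hypothetical; the sketch only shows a candidate list being filtered by what the verb plausibly applies to.

```python
# Hypothetical world knowledge: things one is likely to switch on.
SWITCHABLE = {"printer", "computer"}


def resolve_it(candidates, verb):
    """Return the candidate antecedents of 'it' compatible with the verb."""
    if verb == "switch on":
        return [noun for noun in candidates if noun in SWITCHABLE]
    # No constraint known for this verb: the pronoun stays ambiguous.
    return list(candidates)


if __name__ == "__main__":
    print(resolve_it(["paper", "printer"], "switch on"))
    print(resolve_it(["paper", "printer"], "move"))
```

When more than one candidate survives the filter, as with the second call, this kind of knowledge is not sufficient on its own.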

Lexical and Structural Mismatches

At the start of the previous section we said that, in the best of all possible worlds for NLP, every word would have exactly one sense. While this is true for most NLP, it is an exaggeration as regards MT. It would be a better world, but not the best of all possible worlds, because we would still be faced with difficult translation problems. Some of these problems are to do with lexical differences between languages: differences in the ways in which languages seem to classify the world, what concepts they choose to express by single words, and which they choose not to lexicalize. We will look at some of these directly. Other problems arise because different languages use different structures for the same purpose, and the same structure for different purposes. In either case, the result is that we have to complicate the translation process. In this section we will look at some representative examples.

Examples like the ones in (7) below are familiar to translators, but the examples of colours (7c), and the Japanese examples in (7d) are particularly striking. The latter are striking because they show that languages may differ not only with respect to the fineness or ‘granularity’ of the distinctions they make, but also with respect to the basis for the distinction: English chooses different verbs for the action/event of putting on, and the action/state of wearing. Japanese does not make this distinction, but differentiates according to the object that is worn. In the case of English to Japanese, a fairly simple test on the semantics of the NPs that accompany a verb may be sufficient to decide on the right translation.

Some of the colour examples are similar, but more generally, investigation of colour vocabulary indicates that languages actually carve up the spectrum in rather different ways, and that deciding on the best translation may require knowledge that goes well beyond what is in the text, and may even be undecidable. In this sense, the translation of colour terminology begins to resemble the translation of terms for cultural artifacts (e.g. words like English cottage, Russian dacha, French château, etc. for which no adequate translation exists, and for which the human translator must decide between straight borrowing and providing an explanation). In this area, translation is a genuinely creative act, which is well beyond the capacity of current computers.
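
The ‘simple test on the semantics of the NPs’ mentioned above for English to Japanese might look roughly like the sketch below, which uses the verbs listed in (7d). The SEMANTIC_CLASS table, the choice of kiru as the fallback, and the function name are assumptions made for illustration.

```python
# Japanese verbs from (7d), keyed by the class of the object that is worn.
WEAR_VERBS = {
    "shoes": "haku",
    "glasses": "kakeru",
    "hats": "kaburu",
    "gloves": "hameru",
    "coat": "haoru",
    "scarves": "shimeru",
}

# Assumed fallback when no more specific verb applies.
DEFAULT_WEAR_VERB = "kiru"

# Hypothetical semantic classes for a few English object nouns.
SEMANTIC_CLASS = {
    "boots": "shoes",
    "sunglasses": "glasses",
    "cap": "hats",
    "mittens": "gloves",
    "shirt": "other",
}


def translate_wear(object_noun):
    """Choose a Japanese verb for 'wear/put on' from the class of its object."""
    semantic_class = SEMANTIC_CLASS.get(object_noun)
    return WEAR_VERBS.get(semantic_class, DEFAULT_WEAR_VERB)


if __name__ == "__main__":
    for noun in ("boots", "cap", "shirt"):
        print(noun, "->", translate_wear(noun))
```

The table lookup is trivial; the hard part in a real system is assigning object nouns to semantic classes reliably.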

(7) a. know (V)          savoir (a fact)
                         connaître (a thing)

    b. leg (N)           patte (of an animal)
                         jambe (of a human)
                         pied (of a table)

    c. brown (A)         brun
                         châtain (of hair)
                         marron (of shoes/leather)

    d. wear/put on (V)   kiru
                         haku (shoes)
                         kakeru (glasses)
                         kaburu (hats)
                         hameru (gloves, etc. i.e. on hands)
                         haoru (coat)
                         shimeru (scarves, etc. i.e. round the neck)

Calling cases such as those above lexical mismatches is not controversial. However, when one turns to cases of structural mismatch, classification is not so easy. This is because one may often think that the reason one language uses one construction where another uses a different one lies in the stock of lexical items the two languages have. Thus, the distinction is to some extent a matter of taste and convenience.

A particularly obvious example of this involves problems arising from what are sometimes called lexical holes, that is, cases where one language has to use a phrase to express what another language expresses in a single word. Examples of this include the ‘hole’ that exists in English with respect to French ignorer (‘to not know’, ‘to be ignorant of’), and se suicider (‘to suicide’, i.e. ‘to commit suicide’, ‘to kill oneself’). The problems raised by such lexical holes have a certain similarity to those raised by idioms: in both cases, one has phrases translating as single words. We will therefore postpone discussion of these until Section 6.4.

One kind of structural mismatch occurs where two languages use the same construction for different purposes, or use different constructions for what appears to be the same purpose. Cases where the same structure is used for different purposes include the use of passive constructions in English and Japanese. In the example below, the Japanese particle wa, which we have glossed as ‘TOP’, marks the ‘topic’ of the sentence: intuitively, what the sentence is about.

(8) a. Satoo-san wa shyushoo ni erabaremashita.

Satoo-hon TOP Prime Minister in was-elected.

b. Mr. Satoh was elected Prime Minister.

Example (8) shows that Japanese has a passive-like construction, i.e. a construction where the PATIENT, which is normally realized as an OBJECT, is realized as SUBJECT. It differs from the English passive in that in Japanese this construction tends to have an extra adversive nuance, which may make (8a) rather odd, since it suggests an interpretation where Mr Satoh did not want to be elected, or where election is somehow bad for him. This is not suggested by the English translation, of course. The translation problem from Japanese to English is one of those that looks unsolvable for MT, though one might try to convey the intended sense by adding an adverb such as unfortunately. The translation problem from English to Japanese is, on the other hand, within the scope of MT, since one must simply choose another form. This is possible, since Japanese allows SUBJECTs to be omitted freely, so one can say the equivalent of elected Mr Satoh, and thus avoid having to mention an AGENT. However, in general, the result of this is that one cannot have simple rules like those described in Chapter 4 for passives. In fact, unless one uses a very abstract structure indeed, the rules will be rather complicated. We can see different constructions used for the same effect in cases like the following:

(9) a. He is called Sam.

       b. Er heißt Sam.
          ‘He is-named Sam’

       c. Il s’appelle Sam.
          ‘He calls himself Sam’

(10) a. Sam has just seen Kim.

       b. Sam vient de voir Kim.
          ‘Sam comes of see Kim’ 

(11) a. Sam likes to swim.

        b. Sam zwemt graag. 
            ‘Sam swims likingly’ 

The first example shows how English, German and French choose different methods for expressing ‘naming’. The other two examples show one language using an adverbial ADJUNCT (just, or Dutch graag ‘likingly’ or ‘with pleasure’), where another uses a verbal construction. This is actually one of the most discussed problems in current MT, and it is worth examining why it is problematic. Consider the kind of representations a system might build for (10a) and (10b). Even if these representations are quite abstract (e.g. the information about tense and aspect conveyed by the auxiliary verb have has been expressed in a feature), they are still rather different. In particular, notice that while the main verb of (10a) is see, the main verb of (10b) is venir-de. Now notice what is involved in writing rules which relate these structures (we will look at the direction English to French).

1. The adverb just must be translated as the verb venir-de (perhaps this is not the best way to think about it; the point is that the French structure must contain venir-de, and just must not be translated in any other way).

2. Sam, the SUBJECT of see, must become the SUBJECT of venir-de.

3. Some information about tense, etc. must be taken from the S node of which see is the HEAD, and put on the S node of which venir-de is the HEAD. This is a complication, because normally one would expect such information to go on the node of which the translation of see, voir, is the HEAD.

4. Other parts of the English sentence should go into the corresponding parts of the sentence HEADed by voir. This is simple enough here, because in both cases Kim is an OBJECT, but it is not always the case that OBJECTs translate as OBJECTs, of course.

5. The link between the SUBJECT of venir-de and the SUBJECT of voir must be established, but this can perhaps be left to French synthesis.
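
The five steps above can be sketched as a single transfer function over a toy representation. Everything about the representation is an assumption made for illustration: the dictionary layout, the feature names (HEAD, SUBJ, OBJ, ADJUNCT, TENSE, COMP), the tense value and the three-word bilingual lexicon are hypothetical and do not reproduce any particular MT formalism.

```python
# Hypothetical bilingual lexicon for the words in (10).
LEXICON = {"see": "voir", "Sam": "Sam", "Kim": "Kim"}


def transfer_have_just(english):
    """Map an English clause modified by 'just' to a French venir-de structure."""
    if english.get("ADJUNCT") != "just":
        raise ValueError("this rule only applies to clauses modified by 'just'")

    subject = LEXICON[english["SUBJ"]]

    # Step 4: the remaining parts go into the clause headed by voir.
    embedded = {
        "HEAD": LEXICON[english["HEAD"]],
        "SUBJ": subject,  # step 5: same SUBJECT as venir-de (done here for simplicity)
        "OBJ": LEXICON[english["OBJ"]],
    }

    return {
        "HEAD": "venir-de",           # step 1: 'just' becomes the main verb
        "SUBJ": subject,              # step 2: SUBJECT of see -> SUBJECT of venir-de
        "TENSE": english["TENSE"],    # step 3: tense moves to the venir-de node
        "COMP": embedded,
    }


if __name__ == "__main__":
    source = {
        "HEAD": "see",
        "SUBJ": "Sam",
        "OBJ": "Kim",
        "ADJUNCT": "just",
        "TENSE": "present",  # assumed feature value
    }
    print(transfer_have_just(source))
```

Even in this toy form the rule has to know about the whole clause at once, which is what makes such mismatches awkward to state as simple word-for-word or node-for-node mappings.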

(12) a. She ran into the room.

        b. Elle entra dans la salle en courant.
           ‘She entered into the room in/while running’

The syntactic structures of these examples are very different, and it is hard to see how one can naturally reduce them to similar structures without using very abstract representations indeed. A slightly different sort of structural mismatch occurs where two languages have ‘the same’ construction (more precisely, similar constructions, with equivalent interpretations), but where different restrictions on the constructions mean that it is not always possible to translate in the most obvious way. The following is a relatively simple example of this.

(13) a. These are the letters which I have already replied to.

b. *Ce sont les lettres lesquelles j’ai déjà répondu à.

c. These are the letters to which I have already replied.

d. Ce sont les lettres auxquelles j’ai déjà répondu.

What this shows is that English and French differ in that English allows prepositions to be ‘stranded’ (i.e. to appear without their objects, as in (13a)). French normally requires the preposition and its object to appear together, as in (13d); of course, English allows this too. This will make translating (13a) into French difficult for many kinds of system (in particular, for systems that try to manage without fairly abstract syntactic representations). However, the general solution is fairly clear: what one wants is to build a structure in which (13a) is represented in the same way as (13c), since this will eliminate the translation problem. The most obvious representation would probably be something along the lines of (14a), or perhaps (14b).
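
The effect of representing (13a) in the same way as (13c) can be imitated, very crudely, at the level of word strings. The sketch below is an assumption-laden simplification: it works on flat token lists rather than syntactic structures, and the word lists PREPOSITIONS and RELATIVE_PRONOUNS and the function name are hypothetical.

```python
# Hypothetical closed-class word lists for the toy example.
PREPOSITIONS = {"to", "with", "about", "for"}
RELATIVE_PRONOUNS = {"which", "whom"}


def unstrand_preposition(tokens):
    """Move a sentence-final stranded preposition in front of the relative pronoun."""
    tokens = list(tokens)
    if len(tokens) >= 2 and tokens[-1] in PREPOSITIONS:
        for index, token in enumerate(tokens[:-1]):
            if token in RELATIVE_PRONOUNS:
                preposition = tokens.pop()          # remove stranded preposition
                tokens.insert(index, preposition)   # put it before the pronoun
                return tokens
    return tokens  # nothing to normalise


if __name__ == "__main__":
    stranded = "these are the letters which I have already replied to".split()
    print(" ".join(unstrand_preposition(stranded)))
```

A string-level rewrite like this breaks down quickly on real text, which is the reason for preferring a more abstract structural representation.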

(14) a. These are the letters [I have already replied [to which]]

b. These are the letters [I have already replied [to the letters]]

While by no means a complete solution to the treatment of relative clause constructions, such an approach probably overcomes this particular translation problem. There are other cases which pose worse problems, however.

In general, relative clause constructions in English consist of a head noun (letters in the previous example), a relative pronoun (such as which), and a sentence with a ‘gap’ in it. The relative pronoun (and hence the head noun) is understood as if it filled the gap; this is the idea behind the representations in (14). In English, there are restrictions on where the ‘gap’ can occur. In particular, it cannot occur inside an indirect question, or a ‘reason’ ADJUNCT. Thus, (15b) and (15d) are both ungrammatical. However, these restrictions are not exactly paralleled in other languages. For example, Italian allows the former, as in (15c), and Japanese the latter, as in (15a). These sorts of problem are beyond the scope of current MT systems; in fact, they are difficult even for human translators.

(15) a. Sinda node minna ga kanasinda hito wa yumei desita.
        ‘died hence everyone SUBJ distressed-was man TOP famous was’

b. *The man who everyone was distressed because (he) died was famous.

c. L’uomo che mi domando chi abbia visto fu arrestato.

d. *The man that I wonder who (he) has seen was arrested.

Multiword units: Idioms and Collocations

Roughly speaking, idioms are expressions whose meaning cannot be completely understood from the meanings of the component parts. For example, while it is possible to work out the meaning of (16a) on the basis of knowledge of English grammar and the meaning of words, this would not be sufficient to work out that (16b) can mean something like ‘If Sam dies, her children will be rich’. This is because kick the bucket is an idiom.

(16) a. If Sam mends the bucket, her children will be rich.

b. If Sam kicks the bucket, her children will be rich.

The problem with idioms, in an MT context, is that it is not usually possible to translate them using the normal rules. There are exceptions: for example, take the bull by the horns (meaning ‘face and tackle a difficulty without shirking’) can be translated literally into French as prendre le taureau par les cornes, which has the same meaning. But, for the most part, using the normal rules to translate idioms will result in nonsense. Instead, one has to treat idioms as single units in translation. In many cases, a natural translation for an idiom will be a single word; for example, the French word mourir (‘die’) is a possible translation for kick the bucket.
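
Treating idioms as single units can be sketched as a dictionary lookup over sequences of word lemmas, carried out before any word-by-word translation. The idiom table, the tiny lemma table and the function name below are hypothetical, and the sketch always prefers the idiomatic reading, whereas a sentence like (16b) can in principle also be read literally.

```python
# Hypothetical idiom dictionary: lemma sequence -> French translation.
IDIOMS = {
    ("kick", "the", "bucket"): "mourir",
    ("take", "the", "bull", "by", "the", "horns"): "prendre le taureau par les cornes",
}

# Hypothetical lemma table for the inflected forms used in the example.
LEMMAS = {"kicks": "kick", "kicked": "kick", "takes": "take", "took": "take"}


def find_idioms(tokens):
    """Return (start, end, translation) for each idiom, trying longer idioms first."""
    lemmas = [LEMMAS.get(token.lower(), token.lower()) for token in tokens]
    by_length = sorted(IDIOMS, key=len, reverse=True)
    matches = []
    index = 0
    while index < len(lemmas):
        for idiom in by_length:
            if tuple(lemmas[index:index + len(idiom)]) == idiom:
                matches.append((index, index + len(idiom), IDIOMS[idiom]))
                index += len(idiom) - 1  # skip the rest of the matched idiom
                break
        index += 1
    return matches


if __name__ == "__main__":
    print(find_idioms("If Sam kicks the bucket".split()))
    print(find_idioms("If Sam mends the bucket".split()))
```

Matching on lemmas rather than surface forms lets the same entry cover kicks, kicked and so on, but idioms that allow intervening words or passivisation would need a structural rather than a string-based match.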

Conclusion

This chapter looked at some problems which face the builder of MT systems. We characterized them as problems of ambiguity (lexical and syntactic) and problems of lexical and structural mismatches. We saw how different kinds of linguistic and non-linguistic knowledge are necessary to resolve problems of ambiguity, and in the next chapter we look in more detail at how to represent this knowledge. In this chapter we also discussed instances of lexical and structural mismatches and the problem of non-compositionality (as exemplified by idioms and collocations), and looked at some strategies for dealing with them in MT systems.

Further reading

The problem of ambiguity is pervasive in NLP, and is discussed extensively in the introductions to the subject such as those mentioned in the Further Reading section of Chapter 3. Examples of lexical and structural mismatches are discussed in (Hutchins and Somers, 1992, Chapter 6). Problems of the venir-de/have just sort are discussed extensively in the MT literature. A detailed discussion of the problem can be found in Arnold et al. (1988), and in Sadler (1993). On light verbs or support verbs, see Danlos and Samvelian (1992); Danlos (1992).

Treatments of idioms in MT are given in Arnold and Sadler (1989), and Schenk (1986). On collocations, see for example Allerton (1984), Benson et al. (1986a), Benson et al. (1986b) and Hanks and Church (1989). The notion of lexical functions is due to Mel’čuk, see for example Mel’čuk and Polguère (1987); Mel’čuk and Zholkovsky (1988). A classic discussion of translation problems is Vinay and Darbelnet (1977). This is concerned with translation problems as faced by humans, rather than machines, but it points out several of the problems mentioned here.

The discussion in this chapter touches on two issues of general linguistic and philosophical interest: to what extent human languages really do carve the world up differently, and whether there are some sentences in some languages which cannot be translated into other languages. As regards the first question, it seems as though there are some limits. For example, though languages carve the colour spectrum up rather differently, so there can be rather large differences between colour words in terms of their extensions, there seems to be a high level of agreement about ‘best instances’. That is, though the extension of English red, and Japanese akai is different, nevertheless, the colour which is regarded as the best instance of red by English speakers is the colour which is regarded as the best instance of akai by Japanese speakers. The seminal work on this topic is Berlin and Kay (1969), and see the title essay of Pullum (1991). The second question is sometimes referred to as the question of effability, see Katz (1978); Keenan (1978) for relevant discussion.

References

Berlin and Kay (1969)

Katz (1978)

Keenan (1978)