Use of Paraphrasing to Improve Matching and Retrieval in Translation Memory
Rohit Gupta, University of Wolverhampton
Supervisors: Dr Constantin Orasan, University of Wolverhampton
Prof Josef van Genabith, Saarland University and DFKI
Prof Ruslan Mitkov, University of Wolverhampton
Outline
• Objective
• Translation Memory
• Incorporating Paraphrasing
• Human Evaluation
• Conclusion
Objective
• Improve matching and retrieval in Translation Memory (TM) with the help of advanced language technology, by:
  ◦ using paraphrases
  ◦ using semantic information
Limitations of current TMs
• Surface-form comparison only
• No or very limited linguistic information
• Paraphrased segments are either not retrieved or ranked incorrectly among the retrieved segments
Limitations of current TMs
• Fuzzy scores can be misleading:
  ◦ Input_1: the period laid down in article 4(3)
  ◦ Input_2: the responsible person defined in article 4(3)
  ◦ TM: the duration set forth in article 4(3)
• Both inputs get a 57% fuzzy score (word-based edit distance) against the TM segment, although only Input_1 means the same as the TM segment.
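Assuming the common definition of the fuzzy score as one minus the word-level edit distance divided by the length of the longer segment (the slides do not state the normalisation), the 57% follows from three substitutions in a seven-word segment: period/laid/down become duration/set/forth for Input_1, and responsible/person/defined become duration/set/forth for Input_2.

```latex
\mathrm{FS}(s, t) = 1 - \frac{\mathrm{ED}(s, t)}{\max(|s|, |t|)}
                  = 1 - \frac{3}{7} \approx 0.571 = 57\%
```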
Paraphrasing in TM Matching and Retrieval
Paraphrases
• PPDB: The Paraphrase Database (Ganitkevitch et al., 2013)
• Phrasal and lexical paraphrases
• L size (2 million)
Concept behind paraphrases
[Figure from Ganitkevitch et al., 2013]
Trivial Approach
• Generate an additional segment for each combination of the available paraphrases
Complexity of Trivial Approach
A 25-word segment, W1 ... W25, split into five chunks of five words:
W1 W2 W3 W4 W5 | W6 W7 W8 W9 W10 | W11 W12 W13 W14 W15 | W16 W17 W18 W19 W20 | W21 W22 W23 W24 W25
Paraphrases per chunk: 5 | 5 | 5 | 5 | 5
(5+1)^5 - 1 = 7775 additional segments
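The count follows because each chunk either keeps its original words or takes one of its paraphrases, and the choices multiply. A minimal sketch, assuming non-overlapping chunks as in the illustration (the function name is illustrative):

```python
from math import prod


def extra_segments(paraphrases_per_chunk):
    """Number of additional segments the trivial approach must generate.

    A chunk with p paraphrases offers p + 1 choices (original wording or one
    paraphrase); subtracting 1 removes the unchanged original segment.
    """
    return prod(p + 1 for p in paraphrases_per_chunk) - 1


print(extra_segments([5, 5, 5, 5, 5]))  # 7775
```

With ten such chunks the count is already 6^10 - 1 = 60,466,175, which is why generating every variant does not scale.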
Our Approach
1. Dynamic programming and greedy approximation
2. Classification of paraphrases
3. Handling different types of paraphrases in different ways
4. Filtering
Classification of Paraphrases: 4 Types
i. One-word paraphrases
  • "period" => "duration"
ii. Multiple words, differing in one word
  • "in the period" => "during the period"
iii. Differing in multiple words, with the same number of words
  • "laid down in article" => "set forth in article"
iv. Differing in multiple words, with a different number of words
  • "a reasonable period of time to" => "a reasonable period to"
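The four types can be told apart from word counts and a position-by-position comparison. A minimal sketch of that rule (the function is illustrative, not the authors' code); it assumes that any pair whose word counts differ is type (iv):

```python
def paraphrase_type(source: str, target: str) -> str:
    """Classify a paraphrase pair into types 'i'-'iv' from the slide.

    Assumptions: words are compared position by position, and any pair whose
    word counts differ is treated as type (iv).
    """
    s, t = source.split(), target.split()
    if len(s) == 1 and len(t) == 1:
        return "i"                          # one-word paraphrase
    if len(s) != len(t):
        return "iv"                         # different number of words
    differing = sum(a != b for a, b in zip(s, t))
    if differing == 0:
        raise ValueError("identical phrases are not a paraphrase pair")
    return "ii" if differing == 1 else "iii"


examples = [("period", "duration"), ("in the period", "during the period"),
            ("laid down in article", "set forth in article"),
            ("a reasonable period of time to", "a reasonable period to")]
print([paraphrase_type(s, t) for s, t in examples])  # ['i', 'ii', 'iii', 'iv']
```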
Example
The period laid down in article 4(3) of decision 468 …
The {period | duration | time}
  laid down in article
  referred to in article
  provided for by article
4(3) of decision 468 …
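Example
The period laid down in article 4(3) of decision 468 …
The {period | duration | time}
  {laid down | referred to} in
  provided for by
article 4(3) of decision 468 …
(Repeated words are merged: "in" is shared by "laid down" and "referred to", and a single "article" follows all three paths. The lattice is laid out along the source length.)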
General Edit-distance Implementation
Insertion cost = deletion cost = substitution cost = 1
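A minimal word-level implementation of this unit-cost edit distance, with a fuzzy score on top. Normalising by the longer segment is an assumption, since the slides do not give the formula; with it, the sketch reproduces the 57% from the fuzzy-score example.

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))                # distances from the empty prefix of a
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete wa
                           cur[j - 1] + 1,            # insert wb
                           prev[j - 1] + (wa != wb))) # substitute (cost 0 if equal)
        prev = cur
    return prev[-1]


def fuzzy_score(a, b):
    """Similarity in [0, 1]; dividing by the longer length is an assumption."""
    return 1 - word_edit_distance(a, b) / max(len(a), len(b), 1)


tm = "the duration set forth in article 4(3)".split()
for inp in ("the period laid down in article 4(3)",
            "the responsible person defined in article 4(3)"):
    print(round(100 * fuzzy_score(inp.split(), tm)))  # 57 for both
```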
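Edit-distance Calculation
Columns: TM segment "the period laid down in" expanded with paraphrases. Column 2 accepts period, duration or time; 3₁ 4₁ = "referred to"; 3₂ 4₂ 5₂ = "provided for by"; the last column is the shared "in", reached from either "down" (4) or "to" (4₁). Rows: input "the period referred to in".

              0   1    2       3     4     5    3₁        4₁   3₂        4₂   5₂   5
              #   the  period  laid  down  in   referred  to   provided  for  by   in
0 #           0   1    2       3     4     5    3         4    3         4    5    5
1 the         1   0    1       2     3     4    2         3    2         3    4    4
2 period      2   1    0       1     2     3    1         2    1         2    3    3
3 referred    3   2    1       1     2     3    0         1    1         2    3    2
4 to          4   3    2       2     2     3    1         0    2         2    3    1
5 in          5   4    3       3     3     2    2         1    3         3    3    0

Without paraphrases the distance is 2 (column 5, "laid down in"); through the "referred to in" path it is 0 (last column).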
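The table can be reproduced by running the usual edit-distance recurrence over a lattice instead of a single sequence: a node's column takes the minimum over all of its predecessor columns, and a node holding several one-word alternatives (period/duration/time) costs nothing if any of them matches. This is a minimal sketch under those assumptions; `Node`, `node`, `lattice_edit_distance` and `build_example` are illustrative names. The slides' O(mn log p) bound suggests the alternatives are looked up in a sorted structure, whereas this sketch uses a set.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One lattice position: the words accepted there and its predecessor nodes."""
    tokens: frozenset
    preds: list = field(default_factory=list)  # empty list marks the start node '#'


def node(*tokens, preds=()):
    return Node(frozenset(tokens), list(preds))


def lattice_edit_distance(nodes, ends, inp):
    """Edit distance between the word list `inp` and the best path through the lattice.

    `nodes` must be topologically ordered (predecessors first); `ends` lists the
    nodes a complete path may finish on. Returns (distance, per-node columns).
    """
    n = len(inp)
    cols = []
    for nd in nodes:
        if not nd.preds:                       # '#': only input words can be skipped
            cols.append(list(range(n + 1)))
            continue
        prev = [cols[p] for p in nd.preds]
        col = [min(c[0] for c in prev) + 1]    # row 0: this TM word is skipped
        for i in range(1, n + 1):
            sub = 0 if inp[i - 1] in nd.tokens else 1
            col.append(min(
                min(c[i] for c in prev) + 1,        # skip this TM word
                min(c[i - 1] for c in prev) + sub,  # match or substitution
                col[i - 1] + 1,                     # skip this input word
            ))
        cols.append(col)
    return min(cols[e][n] for e in ends), cols


def build_example():
    """Lattice from the slides: 'the {period|duration|time}' followed by
    'laid down in', 'referred to in' or 'provided for by'."""
    nodes = [
        node("#"),                                      # 0  start
        node("the", preds=[0]),                         # 1
        node("period", "duration", "time", preds=[1]),  # 2  type (i) alternatives
        node("laid", preds=[2]),                        # 3
        node("down", preds=[3]),                        # 4
        node("referred", preds=[2]),                    # 5  column 3₁
        node("to", preds=[5]),                          # 6  column 4₁
        node("in", preds=[4, 6]),                       # 7  shared 'in'
        node("provided", preds=[2]),                    # 8  column 3₂
        node("for", preds=[8]),                         # 9  column 4₂
        node("by", preds=[9]),                          # 10 column 5₂
    ]
    return nodes, [7, 10]


nodes, ends = build_example()
dist, cols = lattice_edit_distance(nodes, ends, "the period referred to in".split())
print(dist, cols[7])  # 0 [5, 4, 3, 2, 1, 0]
```

Each entry of `cols` matches one column of the table; a plain linear lattice for "the period laid down in" gives 2 for the same input, the value in column 5.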
Computational Complexity
• Only type (i) and type (ii) paraphrases: O(mn log p), where p is the number of type (i) and (ii) paraphrases
• All paraphrases: O(lmn(log p + q)), where q is the number of type (iii) and (iv) paraphrases and l is the length of a paraphrase
(m, n: lengths of the two segments being compared)
Filtering
1. Filter out segments based on length (39%)
2. Filter out candidates based on baseline edit-distance similarity (39%)
3. Keep the top 100 segments
4. Select for paraphrasing only the segments whose similarity is within a certain range of the most similar segment (35%)
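The slides list the four filtering steps with percentages (39%, 39%, 35%) but do not say what those figures measure. The sketch below is one possible reading: each step's cut-off is a parameter (`max_len_diff`, `min_sim`, `sim_window` and `top_k` are hypothetical names), similarity is one minus length-normalised word edit distance, and the values in the usage example are placeholders, not the authors' settings.

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]


def similarity(a, b):
    return 1 - word_edit_distance(a, b) / max(len(a), len(b), 1)


def select_for_paraphrasing(inp, tm, max_len_diff, min_sim, sim_window, top_k=100):
    """Return TM segments (token lists) to re-score with paraphrases.
    All cut-offs are hypothetical parameters, not values from the slides."""
    # 1. Length filter: relative length difference to the input.
    cands = [s for s in tm if abs(len(s) - len(inp)) <= max_len_diff * len(inp)]
    # 2. Baseline edit-distance similarity filter.
    scored = [(similarity(inp, s), s) for s in cands]
    scored = [(sim, s) for sim, s in scored if sim >= min_sim]
    # 3. Keep the top_k most similar segments.
    scored.sort(key=lambda x: x[0], reverse=True)
    scored = scored[:top_k]
    if not scored:
        return []
    # 4. Keep only segments within sim_window of the best baseline similarity.
    best = scored[0][0]
    return [s for sim, s in scored if best - sim <= sim_window]


inp = "the period laid down in article 4(3)".split()
tm = [s.split() for s in [
    "the duration set forth in article 4(3)",
    "the period laid down in article 5",
    "the responsible person",
    "this is a completely unrelated and much longer sentence about something else entirely",
]]
selected = select_for_paraphrasing(inp, tm, max_len_diff=0.5, min_sim=0.4, sim_window=0.3)
print([" ".join(s) for s in selected])
# ['the period laid down in article 5', 'the duration set forth in article 4(3)']
```

Paraphrasing is the expensive step, so every filter runs on the cheap baseline score first and only the survivors reach the lattice matcher.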
Experiments
• Corpus used: Europarl v7.0
• English-German pairs
More results on DGT-TM (English-French) in:
Rohit Gupta and Constantin Orasan. 2014. Incorporating Paraphrasing in Translation Memory Matching and Retrieval. In Proceedings of EAMT-2014, Dubrovnik, Croatia.
Corpus statistics: Europarl
                                  TM            Test
Segments                          1,565,194     9,981
Source words                      37,824,634    240,916
Target words                      36,267,909    230,620
Average source length (words)     24.16         24.13
Average target length (words)     23.17         23.10
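Results: Europarl dataset
TH (%)              100     95      90      85      80      75      70
Edit Retrieved      117     127     163     215     257     337     440
+Para Retrieved     16      16      22      33      49      79      102
% Improve           13.68   12.6    13.5    15.35   19.07   23.44   23.18
Rank Change (RC)    9       19      16      25      36      65      97
METEOR-Edit-RC      45.48   46.48   45.59   39.24   37.32   34.02   31.10
METEOR-Para-RC      68.08   67.03   61.09   50.07   44.16   38.35   33.19
BLEU-Edit-RC        31.88   32.37   27.70   21.71   19.32   14.98   12.25
BLEU-Para-RC        52.00   47.92   43.90   31.76   25.24   19.75   15.28
TH: fuzzy-match threshold. Edit Retrieved: segments retrieved with edit distance. +Para Retrieved: additional segments retrieved with paraphrasing. % Improve = +Para Retrieved / Edit Retrieved x 100. RC: number of input segments whose top-ranked match changes when paraphrasing is used. METEOR/BLEU-Edit-RC and -Para-RC: scores on the RC segments with edit-distance and paraphrase-based retrieval.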
Results: Europarl dataset (by fuzzy-match band)
TH (%)              100     [85, 100)   [70, 85)
Edit Retrieved      117     98          225
+Para Retrieved     16      30          98
% Improve           13.67   30.61       43.55
Rank Change (RC)    9       14          55
METEOR-Edit-RC      45.48   34.37       25.76
METEOR-Para-RC      68.08   40.00       25.82
BLEU-Edit-RC        31.88   13.18       6.85
BLEU-Para-RC        52.00   17.10       8.37
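% Improve is +Para Retrieved divided by Edit Retrieved. The snippet reproduces the cumulative row and derives the edit-distance counts inside each band from the cumulative ones (215 - 117 = 98 and 440 - 215 = 225); 30/98 and 98/225 give the band figures 30.61% and 43.55%. The band figures appear to be truncated rather than rounded, which would explain 13.67 versus 13.68 for TH 100.

```python
thresholds = [100, 95, 90, 85, 80, 75, 70]
edit = [117, 127, 163, 215, 257, 337, 440]   # cumulative: retrieved at >= TH
para = [16, 16, 22, 33, 49, 79, 102]         # additional segments with paraphrasing


def improve(extra, base):
    """% Improve: extra paraphrase-based retrievals relative to edit-distance retrievals."""
    return 100 * extra / base


cumulative = [round(improve(p, e), 2) for e, p in zip(edit, para)]
print(cumulative)  # [13.68, 12.6, 13.5, 15.35, 19.07, 23.44, 23.18]

# Edit-distance retrievals inside each band = difference of cumulative counts.
at = dict(zip(thresholds, edit))
band_edit = {"100": at[100], "[85,100)": at[85] - at[100], "[70,85)": at[70] - at[85]}
band_para = {"100": 16, "[85,100)": 30, "[70,85)": 98}
band_improve = {b: improve(band_para[b], band_edit[b]) for b in band_edit}
print(band_edit)  # {'100': 117, '[85,100)': 98, '[70,85)': 225}
```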
Human Evaluation
Dataset: Human Evaluation
TH 100 [85, 100) [70, 85) Total
Set1 2 6 6 14
Set2 5 4 7 16
Total 7 10 13 30
Evaluations
• Post-editing time
• Keystrokes
• Subjective evaluation with two options:
  ◦ A is better
  ◦ B is better
• Subjective evaluation with three options (one more added):
  ◦ Both are equal
Experimental Settings: Post-editing Time and Keystrokes
• Each file contains segments of both types (edit distance, ED, and paraphrasing, PP)
• Each file is post-edited by 5 translation students
  ◦ German: native
  ◦ English: C1
[Screenshots: "Editing…" screen; "Resting or Start" screen]
Results: Keystrokes
[Chart: number of keystrokes for Edit-Distance and Paraphrasing, stacked by Set1 and Set2. Edit-Distance: 532.6 + 570.6 = 1103.2; Paraphrasing: 356.2 + 468.59 = 824.79]
25.23% fewer keystrokes with paraphrasing
Results: Post-Editing Time
[Chart: post-editing time (seconds) for Edit-Distance and Paraphrasing, stacked by Set1 and Set2. Edit-Distance: 520.02 + 657.75 = 1177.77; Paraphrasing: 466.44 + 603.17 = 1069.61]
9.18% time saved with paraphrasing
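The chart labels are consistent with the headline figures when the two edit-distance bars (one per set) and the two paraphrasing bars are summed. A short check (the helper name is illustrative); the slide's 25.23% corresponds to 25.2366... truncated to two decimals.

```python
def saving(edit_total, para_total):
    """Relative reduction (%) of paraphrasing compared with edit distance."""
    return 100 * (edit_total - para_total) / edit_total


keystrokes_saved = saving(532.6 + 570.6, 356.2 + 468.59)
time_saved = saving(520.02 + 657.75, 466.44 + 603.17)
print(f"{keystrokes_saved:.2f} {time_saved:.2f}")  # 25.24 9.18
```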
Results: Subjective Evaluation (Two Options, 17 Translators)
[Chart: number of replies for "Edit-Distance is better" and "Paraphrasing is better", stacked by Set1 and Set2; bar labels 66, 172, 110 and 162 (510 replies in total)]
Results: Subjective Evaluation (Three Options, Seven Translators)
[Chart: number of replies for "Edit-Distance is better", "Paraphrasing is better" and "Both are equal", stacked by Set1 and Set2; bar labels 12, 46, 40, 26, 53 and 33 (210 replies in total)]
H-TER and H-METEOR
                 Set1                            Set2
                 Edit Distance   Paraphrasing    Edit Distance   Paraphrasing
HMETEOR5         59.82           81.44           69.81           80.60
HTER5            39.72           17.63           27.81           18.71
HMETEOR10        59.82           81.44           69.81           80.61
HTER10           36.93           18.46           27.26           18.40
(HMETEOR: higher is better; HTER: lower is better)
Segment-wise analysis
• Statistical significance testing per segment: Welch's t-test (one-tailed, p < 0.05)
• Paraphrasing (keystrokes/post-editing time): twelve segments are significantly better
  ◦ For ten of these, all other evaluations also favour paraphrasing
• Edit distance (keystrokes/post-editing time): three segments are significantly better
  ◦ Not all other evaluations favour them
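The slides name the test but not the software. A minimal, standard-library-only sketch of the per-segment comparison, assuming each segment has one measurement per post-editor in each condition; the p-value integrates the Student t density numerically (with SciPy, `scipy.stats.ttest_ind(pp, ed, equal_var=False, alternative="less")` is the usual route). The function names and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b) and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): 0.5 plus a Simpson's-rule integral of the density from 0 to t (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def paraphrasing_significantly_better(pp, ed, alpha=0.05):
    """One-tailed Welch test of H1: mean(pp) < mean(ed), e.g. fewer keystrokes or seconds."""
    t, df = welch_t(pp, ed)
    p = t_cdf(t, df)
    return p < alpha, t, df, p


# Hypothetical post-editing times (seconds) of five post-editors for one segment.
pp_times = [30.1, 28.4, 35.0, 31.2, 29.9]
ed_times = [41.0, 38.5, 44.2, 39.9, 46.3]
significant, t, df, p = paraphrasing_significantly_better(pp_times, ed_times)
print(significant, round(t, 2), round(df, 1), p)
```

Welch's variant is used because the two conditions need not have equal variances; running it once per segment gives the per-segment counts reported above.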
Conclusion
• Presented an approach to include paraphrasing in TM matching and retrieval
• Presented human evaluations
• In future, we will use deep learning for TM matching and retrieval
Related Publications
• Rohit Gupta and Constantin Orasan. 2014. Incorporating Paraphrasing in Translation Memory Matching and Retrieval. In Proceedings of EAMT-2014, Dubrovnik, Croatia.
• Rohit Gupta, Constantin Orasan, Marcos Zampieri, Mihaela Vela and Josef van Genabith. 2015. Can Translation Memories afford not to use paraphrasing? In Proceedings of EAMT-2015, Antalya, Turkey.
• Rohit Gupta, Hanna Bechara, Ismail El Maarouf, and Constantin Orasan. 2014a. UoW: NLP techniques developed at the University of Wolverhampton for Semantic Similarity and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014), COLING-2014, Dublin, Ireland.
• Rohit Gupta, Hanna Bechara, and Constantin Orasan. 2014b. Intelligent Translation Memory Matching and Retrieval Metric Exploiting Linguistic Technology. In Proceedings of the 36th Conference on Translating and the Computer, London, UK.
References
• Jane Bradbury and Ismaïl El Maarouf. 2013. An empirical classification of verbs based on Semantic Types: the case of the 'poison' verbs. In Proceedings of the Joint Symposium on Semantic Processing: Textual Inference and Structures in Corpora, pages 70–74.
• Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of NAACL-HLT, pages 758–764, Atlanta, Georgia. Association for Computational Linguistics.
• Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014a. SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014).
• Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, and Roberto Zamparelli. 2014b. A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of LREC 2014.
• Ralf Steinberger, Andreas Eisele, Szymon Klocek, Spyridon Pilos, and Patrick Schluter. 2012. DGT-TM: A freely available Translation Memory in 22 languages. In Proceedings of LREC 2012, pages 454–459.
Thank you!