Extracting Parallel Phrases from English-Punjabi Corpora

Author: Manpreet Singh Lehal
Publisher: LAP Lambert Academic Publishing
ISBN: 9786208225414

Pages: 204
Publication Date: 25 October 2024
Format: Paperback
Availability: Available To Order

We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.

Our Price: $224.40

Overview

This study presents a novel approach for extracting parallel data from a comparable English-Punjabi corpus, addressing the scarcity of parallel corpora for this language pair. Unlike previous research, the approach focuses on creating high-precision parallel data with minimal resources. The data is drawn from diverse sources, including Wikipedia articles, TDIL's noisy parallel sentences, and Gyan Nidhi reports. The methodology consists of three phases: extracting and aligning documents, translating the Punjabi texts into English using OpenNMT-py, and calculating content similarity with three measures: Euclidean distance, cosine similarity, and Jaccard similarity. The measures are first run individually, and their results are then integrated to improve accuracy. By combining the scores of all three measures, the system achieves a precision of 93% and an accuracy of 86%. This integrated approach significantly enhances parallel data extraction for English-Punjabi corpora and holds potential for improving Statistical Machine Translation (SMT) models.
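
The similarity phase described above can be illustrated with a minimal Python sketch. It is not the book's implementation: the whitespace tokenizer, the term-count vectors, the threshold values, and the two-out-of-three voting rule in the hypothetical `is_parallel` function are all assumptions made for illustration, since the overview does not say how the three scores are combined. The Punjabi sentence is assumed to have been translated into English already (the OpenNMT-py step is not shown).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_vectors(a_tokens, b_tokens):
    """Build term-count vectors for both sentences over their shared vocabulary."""
    vocab = sorted(set(a_tokens) | set(b_tokens))
    count_a, count_b = Counter(a_tokens), Counter(b_tokens)
    return [count_a[w] for w in vocab], [count_b[w] for w in vocab]


def euclidean_distance(u, v):
    """Straight-line distance between two vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def jaccard_similarity(a_tokens, b_tokens):
    """Size of the token-set intersection divided by the size of the union."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_parallel(english, translated, cos_min=0.7, jac_min=0.5, euc_max=2.0):
    """Hypothetical combination rule: accept the pair if at least two of the
    three measures pass their (assumed, untuned) thresholds."""
    a, b = tokenize(english), tokenize(translated)
    u, v = term_vectors(a, b)
    votes = [
        cosine_similarity(u, v) >= cos_min,
        jaccard_similarity(a, b) >= jac_min,
        euclidean_distance(u, v) <= euc_max,
    ]
    return sum(votes) >= 2
```

Under these assumptions, a call such as `is_parallel("the cat sat on the mat", "the cat sat on the mat")` accepts the pair, while two sentences with no words in common are rejected. In practice the thresholds would need to be tuned on labelled sentence pairs.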

Full Product Details

Author: Manpreet Singh Lehal
Publisher: LAP Lambert Academic Publishing
Imprint: LAP Lambert Academic Publishing
Dimensions: Width: 15.20 cm, Height: 1.20 cm, Length: 22.90 cm
Weight: 0.304 kg
ISBN: 9786208225414

ISBN 10: 6208225418
Pages: 204
Publication Date: 25 October 2024
Audience: General/trade, General
Format: Paperback
Publisher's Status: Active

Countries Available

All regions