Human evaluation usage (for project managers)

To initiate a human evaluation, the project manager uses the New Human Evaluation Job button in the Human Evaluation section of MT Companion.

Human Evaluation Job list

For all Job Types, the project manager must:

  • be assigned the Owner or Maintainer role in a Workspace.

  • know which evaluation job type will be used (see Job Types for more information).

  • provide a meaningful job name. This name is for internal use only and is not exposed to the evaluator(s).

  • know the language pairs to be evaluated.

  • know which scoring system will be used for the evaluation; that scoring system must be available in MT Companion.

  • have data prepared in the format required by the human evaluation job type – one document per language pair – with each file named according to the HEVAL Input File Naming Convention.

  • know how many evaluators are required for the job.

  • know which resources (evaluators) should be assigned to the job for each language.

  • know the project end date.

The following sub-chapters describe how to create a job of each type and explain the required fields as well as the input file format:

HEVAL Language pairs

Each HEVAL job must define at least one language pair on which the evaluation is performed. Once the job has been created, a language pair can be added with the add button next to the ‘Language Pairs’ title:

New Language Pair Button

When clicked, a modal dialog appears in which the source and target languages can be selected:

New Language Pair Dialog Box


HEVAL Job assignment workflow

Once a job has been created, the languages added, and files for evaluation uploaded, the project manager must invite evaluators to perform the evaluations. The invitation workflow works the same for all job types.

Job Assignment
  1. Select Evaluators.

    • Use the dropdown to select an existing user. If the selected user is an internal (RWS) user who has not yet logged on to MT Companion to perform work in the current workspace, they will receive an email notifying them that they have been added to the workspace in MT Companion.

    • If the evaluator is not an existing user, invite and assign a new user. New users will receive an invitation to register a new account in MT Companion and are added to the MT Companion user database only once they register.

Note

The list of users contains the following user types:

  • RWS users already registered in MT Companion

  • RWS users not yet registered in MT Companion (records from the RWS Azure tenant)

  • Associate users (registered in MT Companion)

  2. Each invited Evaluator will receive an email from MT Companion inviting them to perform a human evaluation job. The email contains summary information about the job and offers the invitee the option to Accept or Reject the assignment. Until the invitee responds, the job Status will remain PENDING (the possible status transitions are sketched after this list).

    • If the user rejects the assignment, the job Status will be updated to REJECTED, and the project manager can delete the invitee and add a new Evaluator, as needed.

    • If the user accepts the assignment, the job Status will be updated to ACCEPTED, and the evaluator can proceed.

  3. At any point during an active project, any user in the Workspace with the Maintainer or Owner role can:

    • Cancel the job. When the job is canceled, the evaluator can be removed from the list and any completed work can be downloaded as a report.

    • Preview the evaluation job. The preview shows the job as it will be seen by the evaluator, but the evaluation features remain inactive.

    • View an evaluation. Individual evaluation results can be viewed in MT Companion for both completed and partially complete jobs.

    • Assign a new evaluator. Regardless of the minimum number of evaluators defined during setup, new evaluators can be added as needed.

    • Add a new language pair.
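
The assignment statuses above (PENDING, ACCEPTED, REJECTED) form a small state machine. The following is an illustrative sketch of those transitions, not MT Companion's actual implementation; all names besides the status values are hypothetical.

```python
# Illustrative model of the assignment statuses described above;
# not MT Companion's actual implementation.
from enum import Enum

class AssignmentStatus(Enum):
    PENDING = "PENDING"    # invitation sent, no response yet
    ACCEPTED = "ACCEPTED"  # evaluator accepted and may proceed
    REJECTED = "REJECTED"  # evaluator rejected; a replacement can be assigned

def respond(status: AssignmentStatus, accepted: bool) -> AssignmentStatus:
    """Apply an invitee's Accept/Reject response to a pending assignment."""
    if status is not AssignmentStatus.PENDING:
        raise ValueError("Only a PENDING assignment can be answered.")
    return AssignmentStatus.ACCEPTED if accepted else AssignmentStatus.REJECTED

print(respond(AssignmentStatus.PENDING, accepted=True))  # AssignmentStatus.ACCEPTED
```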


HEVAL output reports

Human evaluation job results can be downloaded for individual evaluators or for a whole language pair at any time by selecting the download icon.

The report is based on templates defined under Workspace Human Evaluation Report templates. In addition to the default empty template, users may create and upload their own template(s) for more efficient results processing.

If multiple templates are available, the following dialog is displayed, allowing the user to select their preferred report template:

Report Template Selection Dialog

The results file name follows the convention:

{{Job name}}_{{srcLangCode}}_{{targetLangCode}}_report.xlsx

If the job is incomplete, “_partial” is appended to the results file name.
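
As a quick sketch of this convention, the expected report file name can be constructed as follows; the job name and language codes are hypothetical examples, and the sketch assumes the “_partial” suffix precedes the .xlsx extension.

```python
# Build the expected report file name per the convention above.
# Assumes "_partial" is inserted before the extension; job name and
# language codes are hypothetical examples.
def report_filename(job_name: str, src: str, tgt: str, partial: bool = False) -> str:
    suffix = "_partial" if partial else ""
    return f"{job_name}_{src}_{tgt}_report{suffix}.xlsx"

print(report_filename("MyJob", "en", "de"))                # MyJob_en_de_report.xlsx
print(report_filename("MyJob", "en", "de", partial=True))  # MyJob_en_de_report_partial.xlsx
```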

Note

Job Feedback

Each report consists of one or more tabs containing output data and a Job Feedback tab, which contains the general feedback provided by the linguist after the job is done.


QA output report details

Quality Assessment report details

COLUMN NAME | DESCRIPTION
Doc. Pair # | Segment identification.
Word Count | Number of words.
Character Count | Number of characters.
Line Count | Number of lines in the segment.
Prescored | Not used in current implementation.
FSA_Scale Used | Scoring system ID.
FirstStepSourceVisible | ‘NO’ if the ‘Hide Source Panel’ option was selected for the job; otherwise ‘YES’.
FSA Path-{{Linguist_Name}} | Not used in current implementation.
Automatic Score-{{Linguist_Name}} | Selected score value.
Adjusted Score-{{Linguist_Name}} | Selected score value.
Bad Source-{{Linguist_Name}} | Whether the Bad Source checkbox was selected.
Not Translated-{{Linguist_Name}} | Whether the Not Translated checkbox was selected.
Wrong Language-{{Linguist_Name}} | Whether the Wrong Language checkbox was selected.
FSA Time-{{Linguist_Name}} | Not used in current implementation.
Judgement duration-{{Linguist_Name}} | Not used in current implementation.
Validation duration-{{Linguist_Name}} | Not used in current implementation.
Total Time Taken-{{Linguist_Name}} | Not used in current implementation.
Feedback-{{Linguist_Name}} | Segment-level feedback filled in by the linguist.
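
Since the score columns carry the linguist's name as a suffix, a downstream script can discover them by prefix. A minimal sketch, assuming the report is read with pandas and that the score values are numeric; the file name is a hypothetical example following the naming convention above.

```python
# Minimal sketch: aggregate Adjusted Score columns from a QA report.
import pandas as pd

df = pd.read_excel("MyJob_en_de_report.xlsx", sheet_name=0)

# Score columns are suffixed with the linguist's name,
# e.g. "Adjusted Score-Jane Doe" (see the column table above).
score_cols = [c for c in df.columns if c.startswith("Adjusted Score-")]

for col in score_cols:
    linguist = col.split("-", 1)[1]
    print(f"{linguist}: mean adjusted score = {df[col].mean():.2f}")
```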


PA output report details

The Productivity Assessment report contains two tabs: segment-level information on the first (Import1) and action-level information on the second (Import2).

Productivity Assessment report details (Import1 tab, Segments)

COLUMN NAME | DESCRIPTION
serial | Segment position in the data set (generated incrementally; same as id).
id | Segment ID in the data set (generated incrementally).
Source | Source text.
MT | Machine-translated text (from the original data set).
PE | Post-edited (or human-translated) text.
Name | Linguist name.
EngineID | Not used in current implementation.
MTOK | Set to 1 when ‘Translation is fine’ is selected during evaluation (applies only to the post-edited part of the data set).
WordCount | Number of source words.
Distance | Levenshtein distance from the MT text to the post-edited text (see the sketch after this table).
Sec | Total time spent on the segment (in seconds).
TypingTime | Typing time (in seconds).
EntryCount | Always 0 in the current implementation.
Switched | Always 0 in the current implementation.
ContinueLater | Set to 1 when the linguist exited the evaluation screen with the ‘Continue later’ button.
Reconstructor | Each segment is reconstructed from the actions performed (see the Import2 tab); 1 if reconstruction succeeded, 0 if it failed.
Created | Timestamp of the evaluation (time the segment was completed).
TimeTotal | Total time spent on editing (in milliseconds).
TimeInit | Initialization time (in milliseconds): time between the segment being opened and the first edit.
TimeEdit | Edit time (in milliseconds): time spent making text edits, where the pause between two changes is at most 1 second.
TimeFinal | Finalization time (in milliseconds): time between the last change and moving to the next segment (Save and Next clicked).
TimePauseShort | Short-pause time (in milliseconds): sum of pauses of 2-6 seconds between two changes.
TimePauseMedium | Medium-pause time (in milliseconds): sum of pauses of 7-60 seconds between two changes.
TimePauseLong | Long-pause time (in milliseconds): sum of pauses longer than 60 seconds between two changes.
TimeOffFocus | Off-focus time (in milliseconds): time spent outside the editing box.
Feedback | Segment-level feedback filled in by the linguist.
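
Two of the columns above describe derived values: Distance is a Levenshtein edit distance from the MT text to the post-edited text, and the Time* columns bucket the gaps between consecutive edits by length. The following is a minimal illustrative sketch of both calculations, not MT Companion's actual code; the documentation does not specify how gaps between 1 and 2 seconds are classified, so they are treated as short pauses here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance as reported in the Distance column (MT -> PE)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def bucket_pauses(gaps_ms: list[int]) -> dict[str, int]:
    """Bucket inter-edit gaps per the Time* columns: <= 1 s counts as
    editing; 2-6 s short; 7-60 s medium; > 60 s long pauses."""
    buckets = {"TimeEdit": 0, "TimePauseShort": 0,
               "TimePauseMedium": 0, "TimePauseLong": 0}
    for gap in gaps_ms:
        if gap <= 1_000:
            buckets["TimeEdit"] += gap
        elif gap <= 6_000:          # 1-2 s gaps also land here (unspecified)
            buckets["TimePauseShort"] += gap
        elif gap <= 60_000:
            buckets["TimePauseMedium"] += gap
        else:
            buckets["TimePauseLong"] += gap
    return buckets

print(levenshtein("the MT output", "the post-edited output"))
print(bucket_pauses([300, 800, 4_000, 15_000, 90_000]))
```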

Productivity Assessment report details (Import2 tab, Actions)

COLUMN NAME | DESCRIPTION
id | Action ID (generated incrementally).
segment | Segment ID in the data set.
NewValue | New text value of the change.
OldValue | Old text value of the change (replaced by the new text value).
Action | Code of the action.
ActionLength | Time (in milliseconds) spent on the action.
Name | Linguist name.
ActionName | Name of the action.
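
Because every action row carries its segment ID, the two tabs can be joined for per-segment action statistics. A minimal pandas sketch, assuming the tabs are named Import1 and Import2 as above; the file name is a hypothetical example.

```python
# Sketch: join the two PA tabs to get per-segment action statistics.
import pandas as pd

segments = pd.read_excel("MyJob_en_de_report.xlsx", sheet_name="Import1")
actions = pd.read_excel("MyJob_en_de_report.xlsx", sheet_name="Import2")

# Count actions and sum their duration for each segment.
per_segment = actions.groupby("segment").agg(
    action_count=("id", "count"),
    action_ms=("ActionLength", "sum"),
)

# Align action statistics with the segment-level rows via the segment ID.
merged = segments.merge(per_segment, left_on="id", right_index=True, how="left")
print(merged[["id", "WordCount", "Distance", "action_count", "action_ms"]].head())
```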


CA output report details

Comparative Assessment report details

COLUMN NAME | DESCRIPTION
Sentence # | Segment identification (generated incrementally).
Source Sentence | Source text.
Translation System 1 | Translation from system 1.
Translation System 2 | Translation from system 2.
System 1 Engine | Name of system 1.
System 2 Engine | Name of system 2.
Scale used | Scoring system ID.
System 1 Score-{{Linguist_Name}} | Score for Translation 1.
System 2 Score-{{Linguist_Name}} | Score for Translation 2.
Which is Better?-{{Linguist_Name}} | Which system is better (1, 2, or 0 if both are of the same quality).
Delta (S1 - S2)-{{Linguist_Name}} | Difference between the scores (for numeric scoring values).
Judgement Duration (seconds)-{{Linguist_Name}} | Not used in current implementation.
Total Duration (seconds)-{{Linguist_Name}} | Not used in current implementation.
Bad Source-{{Linguist_Name}} | Whether the Bad Source checkbox was selected.
Not Translated 1-{{Linguist_Name}} | Whether the Not Translated checkbox was selected for Translation 1.
Not Translated 2-{{Linguist_Name}} | Whether the Not Translated checkbox was selected for Translation 2.
Wrong Language 1-{{Linguist_Name}} | Whether the Wrong Language checkbox was selected for Translation 1.
Wrong Language 2-{{Linguist_Name}} | Whether the Wrong Language checkbox was selected for Translation 2.
Source Feedback-{{Linguist_Name}} | Feedback on the source filled in by the linguist.
Translation 1 Feedback-{{Linguist_Name}} | Feedback on Translation 1 filled in by the linguist.
Translation 2 Feedback-{{Linguist_Name}} | Feedback on Translation 2 filled in by the linguist.
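
As with the QA report, the per-linguist columns make summaries straightforward. A minimal sketch tallying preferences and the mean score delta for one linguist, assuming numeric scores; the file name and linguist name are hypothetical examples.

```python
# Sketch: summarize a CA report for one linguist.
import pandas as pd

df = pd.read_excel("MyJob_en_de_report.xlsx", sheet_name=0)
linguist = "Jane Doe"  # hypothetical evaluator name

better = df[f"Which is Better?-{linguist}"]
print("System 1 preferred:", (better == 1).sum())
print("System 2 preferred:", (better == 2).sum())
print("Ties:", (better == 0).sum())

# Mean score difference; positive values favor System 1.
print("Mean delta (S1 - S2):", df[f"Delta (S1 - S2)-{linguist}"].mean())
```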