Human evaluation usage (for project managers)
To initiate a human evaluation in MT Companion, the project manager can use the New Human Evaluation Job button in the Human Evaluation section.
For all Job Types, the project manager must:
be assigned Owner or Maintainer role in a Workspace.
know which evaluation job type will be used. See more information about Job Types.
provide a meaningful job name. This name is for internal use only and is not exposed to the evaluator(s).
know the language pairs to be evaluated.
know which scoring system will be used for the evaluation; that scoring system must be available in MT Companion.
have data prepared in a specific format based on the human evaluation job type – one document per language pair. Also, the file must be named according to the HEVAL Input File Naming Convention.
know how many evaluators are required for the job.
know which resources (evaluators) should be assigned to the job for each language.
know the project end date.
The following sub-chapters describe how to create a job of each particular type and explain the required fields as well as the input file format:
HEVAL Language pairs
Each HEVAL job must define at least one language pair on which the evaluation is performed. Once the job has been created, a language pair can be added with the button next to the ‘Language Pairs’ title:
When the button is clicked, a modal dialog appears in which you can select the source and target languages:
HEVAL Job assignment workflow
Once a job has been created, the languages added, and files for evaluation uploaded, the project manager must invite evaluators to perform the evaluations. The invitation workflow works the same for all job types.
Select Evaluators.
Use the dropdown to select an existing user. If the selected user is an internal (RWS) user who has not yet logged on to MT Companion to perform work in the current workspace, they will receive an email notifying them that they have been added to the workspace in MT Companion. If the evaluator is not an existing user, you can:
Invite and assign a new user. New users will receive an invitation to register a new account in MT Companion. The new user will be added to the MT Companion user database only once they register their new account.
Note
The list of users contains the following user types:
Each invited Evaluator will receive an email from MT Companion inviting them to perform a human evaluation job. The email will contain summary information about the job and will offer the invitee the possibility to Accept or Reject the assignment. Until the invitee responds, the job Status will remain PENDING.
If the user rejects the assignment, the job Status will be updated with REJECTED, and the project manager can delete the invitee and add a new Evaluator, as needed.
If the user accepts the assignment, the job Status will be updated with ACCEPTED, and the evaluator can proceed.
At any point during an active project, any user in the Workspace with the role of Maintainer or Owner can:
Cancel the job. When the job is canceled, the evaluator can be removed from the list and whatever work has been completed can be downloaded in a report.
Preview the evaluation job. The preview shows the job as it will be seen by the evaluator, but the evaluation features remain inactive.
View an evaluation. Individual evaluation results can be viewed in MT Companion for both completed and partially complete jobs.
Assign a new evaluator. Regardless of the number of minimum evaluators defined during set up, new evaluators can be added as needed.
Add a new language pair.
HEVAL output reports
Human evaluation job results can be downloaded for individual evaluators or for the whole language pair at any time by selecting the download icon.
The report is based on templates defined under Workspace Human Evaluation Report templates. In addition to the default empty template, users may create and upload their own template(s) for more efficient results processing.
If multiple templates are available, a dialog is displayed that allows the user to select their preferred report template.
The results file name follows the convention:
{{Job name}}_{{srcLangCode}}_{{targetLangCode}}_report.xlsx
If the job is incomplete, “_partial” is appended to the results file name.
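As a rough illustration of the convention above, report file names can be built and parsed programmatically. The helper below is hypothetical (not part of MT Companion), and the exact placement of the “_partial” suffix — here assumed to go before the .xlsx extension — is my assumption:

```python
import re

# Hypothetical helper: builds a report file name following the documented
# convention. Placing "_partial" before the ".xlsx" extension is an
# assumption, not confirmed by the documentation.
def report_filename(job_name, src, tgt, partial=False):
    name = f"{job_name}_{src}_{tgt}_report"
    if partial:
        name += "_partial"
    return name + ".xlsx"

# Hypothetical counterpart: extracts the parts back out of a file name.
# The job name is matched greedily, so it may itself contain underscores.
REPORT_RE = re.compile(
    r"^(?P<job>.+)_(?P<src>[^_]+)_(?P<tgt>[^_]+)_report(?P<partial>_partial)?\.xlsx$"
)

def parse_report_filename(filename):
    m = REPORT_RE.match(filename)
    if m is None:
        return None
    return (m.group("job"), m.group("src"), m.group("tgt"),
            m.group("partial") is not None)
```

For example, `report_filename("MyJob", "en", "de", partial=True)` yields `MyJob_en_de_report_partial.xlsx`, which `parse_report_filename` splits back into its parts.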
Note
Job Feedback
Each report consists of one or more tabs containing the output data, plus a Job Feedback tab, where you can find the general feedback provided by the linguist after the job is done.
QA output report details
| COLUMN NAME | DESCRIPTION |
| --- | --- |
| Doc. Pair # | Segment identification. |
| Word Count | Number of words. |
| Character Count | Number of characters. |
| Line Count | Number of lines in the segment. |
| Prescored | Not used in current implementation. |
| FSA_Scale Used | Scoring system ID. |
| FirstStepSourceVisible | ‘NO’ if the ‘Hide Source Panel’ option was selected for the job; otherwise ‘YES’. |
| FSA Path-{{Linguist_Name}} | Not used in current implementation. |
| Automatic Score-{{Linguist_Name}} | Selected score value. |
| Adjusted Score-{{Linguist_Name}} | Selected score value. |
| Bad Source-{{Linguist_Name}} | Whether the Bad Source checkbox was selected. |
| Not Translated-{{Linguist_Name}} | Whether the Not Translated checkbox was selected. |
| Wrong Language-{{Linguist_Name}} | Whether the Wrong Language checkbox was selected. |
| FSA Time-{{Linguist_Name}} | Not used in current implementation. |
| Judgement duration-{{Linguist_Name}} | Not used in current implementation. |
| Validation duration-{{Linguist_Name}} | Not used in current implementation. |
| Total Time Taken-{{Linguist_Name}} | Not used in current implementation. |
| Feedback-{{Linguist_Name}} | Segment-level feedback filled in by the linguist. |
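Because per-linguist columns share a fixed prefix followed by the linguist’s name, results for several evaluators can be collected generically. The snippet below is an illustrative sketch only — the column-name pattern comes from the table above, but the helper itself is hypothetical:

```python
# Hypothetical helper: given the header row of a QA report, list the
# linguists for whom a column with the given prefix is present
# (e.g. "Adjusted Score-Alice" -> "Alice").
def linguists_in_report(columns, prefix="Adjusted Score-"):
    return [c[len(prefix):] for c in columns if c.startswith(prefix)]

header = ["Doc. Pair #", "Word Count",
          "Adjusted Score-Alice", "Feedback-Alice",
          "Adjusted Score-Bob", "Feedback-Bob"]
```

With this header, `linguists_in_report(header)` returns `["Alice", "Bob"]`, which can then be used to look up each linguist’s score and feedback columns.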
PA output report details
The Productivity Assessment report contains two tabs: segment-level information on the first and action-level information on the second.
| COLUMN NAME | DESCRIPTION |
| --- | --- |
| serial | Segment position in the data set (generated incrementally, same as id). |
| id | Segment ID in the data set (generated incrementally). |
| Source | Source text. |
| MT | Machine-translated text (from the original data set). |
| PE | Post-edited (or human-translated) text. |
| Name | Linguist name. |
| EngineID | Not used in current implementation. |
| MTOK | Set to 1 when Translation is fine is selected during evaluation (applies only to the post-edited part of the data set). |
| WordCount | Number of source words. |
| Distance | Calculated Levenshtein distance (from the MT text to the post-edited text). |
| Sec | Total time spent on the segment (in seconds). |
| TypingTime | Typing time (in seconds). |
| EntryCount | Always 0. |
| Switched | Always 0. |
| ContinueLater | Set to 1 when the linguist exited the evaluation screen with the ‘Continue later’ button. |
| Reconstructor | Each segment is reconstructed from the recorded actions (see the Import2 tab). Set to 1 if the reconstruction succeeded, 0 if it failed. |
| Created | Timestamp of the evaluation (time the segment was completed). |
| TimeTotal | Total time spent on editing (in milliseconds). |
| TimeInit | Initialization time (in milliseconds): time between the segment being opened and the first edit. |
| TimeEdit | Edit time (in milliseconds): time spent making text edits, where the pause between two changes is at most 1 second. |
| TimeFinal | Finalization time (in milliseconds): time between the last change and moving to the next segment (Save and Next button clicked). |
| TimePauseShort | Short-pause time (in milliseconds): sum of pauses of 2–6 seconds between two changes. |
| TimePauseMedium | Medium-pause time (in milliseconds): sum of pauses of 7–60 seconds between two changes. |
| TimePauseLong | Long-pause time (in milliseconds): sum of pauses longer than 60 seconds between two changes. |
| TimeOffFocus | Off-focus time (in milliseconds): time spent outside the editing box. |
| Feedback | Segment-level feedback filled in by the linguist. |
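The timing columns above partition the gaps between successive changes by length. The sketch below illustrates that bucketing; the thresholds come from the table, but how gaps between 1 and 2 seconds are classified is not specified in the documentation, so the boundary handling here is an assumption:

```python
# Illustrative sketch only: split inter-change gaps (in seconds) into the
# pause buckets described above and report each bucket in milliseconds.
# Thresholds per the table: <=1 s counts as edit time, 2-6 s short pause,
# 7-60 s medium pause, >60 s long pause. Gaps between 1 and 2 s are lumped
# into the short bucket here; the real product may classify them differently.
def bucket_gaps(gaps_seconds):
    buckets = {"TimeEdit": 0.0, "TimePauseShort": 0.0,
               "TimePauseMedium": 0.0, "TimePauseLong": 0.0}
    for gap in gaps_seconds:
        if gap <= 1:
            buckets["TimeEdit"] += gap
        elif gap <= 6:
            buckets["TimePauseShort"] += gap
        elif gap <= 60:
            buckets["TimePauseMedium"] += gap
        else:
            buckets["TimePauseLong"] += gap
    # Convert seconds to milliseconds, matching the report's units.
    return {name: int(total * 1000) for name, total in buckets.items()}
```

For example, gaps of 0.5 s, 3 s, 10 s, and 120 s land in the edit, short, medium, and long buckets respectively.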
| COLUMN NAME | DESCRIPTION |
| --- | --- |
| id | Action ID (generated incrementally). |
| segment | Segment ID in the data set. |
| NewValue | New text value of the change. |
| OldValue | Old text value that was replaced by the new text value. |
| Action | Code of the action. |
| ActionLength | Time (in milliseconds) spent on the action. |
| Name | Linguist name. |
| ActionName | Name of the action. |
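The Distance column in the segment tab is described as a Levenshtein edit distance from the MT text to the post-edited text. For reference, here is a minimal textbook implementation of that distance — a standard sketch, not the product’s actual code:

```python
# Standard dynamic-programming Levenshtein distance: the minimum number of
# single-character insertions, deletions, and substitutions needed to turn
# string a into string b. Uses one rolling row of the DP matrix.
def levenshtein(a, b):
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                    # deletion from a
                current[j - 1] + 1,                 # insertion into a
                previous[j - 1] + (ch_a != ch_b),   # substitution (or match)
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3, and an untouched segment (MT equal to PE) yields a distance of 0.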
CA output report details
| COLUMN NAME | DESCRIPTION |
| --- | --- |
| Sentence # | Segment identification (generated incrementally). |
| Source Sentence | Source text. |
| Translation System 1 | Translation from system 1. |
| Translation System 2 | Translation from system 2. |
| System 1 Engine | Name of system 1. |
| System 2 Engine | Name of system 2. |
| Scale used | Scoring system ID. |
| System 1 Score-{{Linguist_Name}} | Score for Translation 1. |
| System 2 Score-{{Linguist_Name}} | Score for Translation 2. |
| Which is Better?-{{Linguist_Name}} | Which system is better (1, 2, or 0 if both are of the same quality). |
| Delta (S1 - S2)-{{Linguist_Name}} | Difference between the two scores (for numeric scoring values). |
| Judgement Duration (seconds)-{{Linguist_Name}} | Not used in current implementation. |
| Total Duration (seconds)-{{Linguist_Name}} | Not used in current implementation. |
| Bad Source-{{Linguist_Name}} | Whether the Bad Source checkbox was selected. |
| Not Translated 1-{{Linguist_Name}} | Whether the Not Translated checkbox was selected for Translation 1. |
| Not Translated 2-{{Linguist_Name}} | Whether the Not Translated checkbox was selected for Translation 2. |
| Wrong Language 1-{{Linguist_Name}} | Whether the Wrong Language checkbox was selected for Translation 1. |
| Wrong Language 2-{{Linguist_Name}} | Whether the Wrong Language checkbox was selected for Translation 2. |
| Source Feedback-{{Linguist_Name}} | Feedback on the source, filled in by the linguist. |
| Translation 1 Feedback-{{Linguist_Name}} | Feedback on Translation 1, filled in by the linguist. |
| Translation 2 Feedback-{{Linguist_Name}} | Feedback on Translation 2, filled in by the linguist. |