Human evaluation usage (for project managers)
To initiate a human evaluation in MT Companion, the project manager can use the New Human Evaluation Job button in the Human Evaluation section.
For all Job Types, the project manager must:
be assigned the Owner or Maintainer role in a Workspace.
know which evaluation job type will be used. See more information about Job Types.
provide a meaningful job name. This name is for internal use only and is not exposed to the evaluator(s).
know the language pairs to be evaluated.
know which scoring system will be used for the evaluation; that scoring system must be available in MT Companion.
have data prepared in a specific format based on the human evaluation job type (one document per language pair), with each file named according to the HEVAL Input File Naming Convention.
know how many evaluators are required for the job.
know which resources (evaluators) should be assigned to the job for each language.
know the project end date.
The following sub-chapters describe how to create a job of each particular type and explain the required fields as well as the input file format:
HEVAL Language pairs
Each HEVAL job must have at least one language pair defined, on which the evaluation is performed. Once the job has been created, a language pair can be added with the button next to the 'Language Pairs' title:
When clicked, a modal dialog appears that lets you select the source and target language:
HEVAL Job assignment workflow
Once a job has been created, the languages added, and files for evaluation uploaded, the project manager must invite evaluators to perform the evaluations. The invitation workflow works the same for all job types.
Select Evaluators.
Use the dropdown to select an existing user. If the selected user is an internal (RWS) user who has not yet logged on to MT Companion to perform work in the current workspace, they will receive an email notifying them that they have been added to the workspace in MT Companion. If the evaluator is not an existing user, you can:
Invite and assign a new user. New users will receive an invitation to register a new account in MT Companion. The new user will be added to the MT Companion user database only once they register their new account.
Note
The list of users contains the following user types:
Each invited Evaluator will receive an email from MT Companion inviting them to perform a human evaluation job. The email will contain summary information about the job and will offer the invitee the possibility to Accept or Reject the assignment. Until the invitee responds, the job Status will remain PENDING.
If the user rejects the assignment, the job Status will be updated to REJECTED, and the project manager can delete the invitee and add a new Evaluator, as needed.
If the user accepts the assignment, the job Status will be updated to ACCEPTED, and the evaluator can proceed.
At any point during an active project, any user in the Workspace with the role of Maintainer or Owner can:
Cancel the job. When the job is canceled, the evaluator can be removed from the list and whatever work has been completed can be downloaded in a report.
Preview the evaluation job. The preview shows the job as it will be seen by the evaluator, but the evaluation features remain inactive.
View an evaluation. Individual evaluation results can be viewed in MT Companion for both completed and partially complete jobs.
Assign a new evaluator. Regardless of the minimum number of evaluators defined during setup, new evaluators can be added as needed.
Add a new language pair.
HEVAL Job review
PMs can analyze the results of the evaluated content (using the downloaded Excel reports) and decide whether there are issues that must be fixed by the linguist. In that case, the PM can share the issues with the linguist and switch the status to In-Review using the 'Unlock for review' button. The downloaded Excel reports contain deep links to each evaluated segment, so if needed, the PM can send the linguist an extract of the Excel report including the URLs of the segments that need to be fixed.
When the linguist is done with the review, they mark the task as complete by clicking the 'Set as reviewed' button. Once clicked, the task's status changes to Reviewed.
Note
There may be several rounds of review; the PM can change the status directly from Reviewed back to In-Review using the same button.
HEVAL Status overview
Here is a list of the task statuses and their descriptions that you may see during the human evaluation process. A task is an evaluation job for a defined language pair, assigned to a linguist.
STATUS | DESCRIPTION
---|---
NEW (PENDING) | The task has been created (the job exists with the appropriate language pair and a linguist has been assigned to it). The linguist receives a notification e-mail from which the task can be either accepted or rejected.
REJECTED | A rejected task is no longer available to the linguist; the PM can still see it on the job list page and remove it.
ACCEPTED | Once the task is accepted, the evaluation can be performed only by the assigned linguist. Linguists cannot update segments that have already been evaluated, while PMs can update them at any time (unless the task is in the final Validated status).
CANCELLED | The PM can cancel a task at any time. In that case, the task disappears for the assigned linguist and can no longer be evaluated.
COMPLETED | When all segments in the task have been evaluated, its status automatically changes to Completed. Based on an analysis of the evaluated content (see HEVAL Job review), the PM can switch the task to In-Review using the 'Unlock for review' button.
IN-REVIEW | The In-Review status enables updating of all segments in the task. Updates can be made only by the assigned linguist, following the instructions sent by the PM (see HEVAL Job review).
REVIEWED | The Reviewed status informs the PM that the task's review is done and another round of content analysis can be performed. The PM decides (as with the Completed status) whether the task can be set to Validated or whether another round of review is necessary.
VALIDATED | The final status, in which further (mainly automated) processing of the evaluation data is enabled. The Validated status disables the ability to review the task, but the task can be un-validated (with the same button) at any time.
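For readers who prefer it in code form, the transitions described above can be approximated as a simple transition map. This is an illustrative sketch inferred from the status descriptions, not part of MT Companion itself; the exact set of allowed transitions (for example, which status a task returns to when it is un-validated) is an assumption.

```python
# Sketch of the HEVAL task status flow, inferred from the status table
# above; MT Companion does not expose such an API.
ALLOWED_TRANSITIONS = {
    "NEW (PENDING)": {"ACCEPTED", "REJECTED", "CANCELLED"},
    "ACCEPTED": {"COMPLETED", "CANCELLED"},
    "COMPLETED": {"IN-REVIEW", "VALIDATED", "CANCELLED"},
    "IN-REVIEW": {"REVIEWED", "CANCELLED"},
    "REVIEWED": {"IN-REVIEW", "VALIDATED", "CANCELLED"},
    # Un-validating re-enables review; the status the task returns to
    # is an assumption here.
    "VALIDATED": {"REVIEWED"},
    "REJECTED": set(),   # terminal; the PM can remove the invitee
    "CANCELLED": set(),  # terminal
}

def can_transition(current: str, new: str) -> bool:
    """Check a proposed status change against the map above."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```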
The following chart outlines the status flow during the human evaluation process:
HEVAL output reports
Human evaluation job results can be downloaded for individual evaluators or for the whole language pair at any time by selecting the download icon.
The report is based on templates defined under Workspace Human Evaluation Report templates. In addition to the default empty template, users may create and upload their own template(s) for more efficient results processing.
If multiple templates are available, the following dialog is displayed, allowing the user to select their preferred report template.
The results file name follows the convention:
{{Job name}}_{{srcLangCode}}_{{targetLangCode}}_report.xlsx
If the job is incomplete, the results file name will be appended with “_partial”.
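If you handle many downloaded reports in scripts, the convention can be mirrored in code. The sketch below is illustrative only (MT Companion generates these names itself), and it assumes the "_partial" suffix is inserted before the file extension:

```python
def report_filename(job_name: str, src_lang: str, tgt_lang: str,
                    partial: bool = False) -> str:
    """Build a results file name following the convention above.

    Parameter names are hypothetical; the '_partial' placement before
    the extension is an assumption.
    """
    suffix = "_partial" if partial else ""
    return f"{job_name}_{src_lang}_{tgt_lang}_report{suffix}.xlsx"

# report_filename("MyJob", "en", "de")               -> "MyJob_en_de_report.xlsx"
# report_filename("MyJob", "en", "de", partial=True) -> "MyJob_en_de_report_partial.xlsx"
```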
Note
Job Feedback
Each report consists of one or more tabs containing the output data, plus a Job Feedback tab, where you can find the general feedback provided by the linguist after the job is done.
QA output report details
COLUMN NAME | DESCRIPTION
---|---
Doc. Pair # | Segment identification.
Segment ID | Segment ID per the original database.
Source | Source segment.
MT Output | MT output for the source.
System Engine | Name of the MT system.
Domain | Domain or client name, as applicable.
Additional information | Optional additional info, e.g. a content flow.
Metadata | Propagated metadata from the input CSV (optional).
Word Count | Number of words.
Character Count | Number of characters.
Line Count | Number of lines in the segment.
Prescored | Not used in the current implementation.
FSA_Scale Used | Scoring system ID.
FirstStepSourceVisible | 'NO' if the 'Hide Source Panel' option was selected for the given job; otherwise 'YES'.
FSA Path-{{Linguist_Name}} | Not used in the current implementation.
Automatic Score-{{Linguist_Name}} | Selected score value.
Adjusted Score-{{Linguist_Name}} | Selected score value.
Bad Source-{{Linguist_Name}} | The Bad Source checkbox was selected.
Offensive Source-{{Linguist_Name}} | The Offensive Source checkbox was selected.
Specialist Source-{{Linguist_Name}} | The Specialist Source checkbox was selected.
Not Translated-{{Linguist_Name}} | The Not Translated checkbox was selected.
Wrong Language-{{Linguist_Name}} | The Wrong Language checkbox was selected.
FSA Time-{{Linguist_Name}} | Not used in the current implementation.
Judgement duration-{{Linguist_Name}} | Not used in the current implementation.
Validation duration-{{Linguist_Name}} | Not used in the current implementation.
Total Time Taken-{{Linguist_Name}} | Not used in the current implementation.
Feedback-{{Linguist_Name}} | Segment-level feedback filled in by the linguist.
Update Count-{{Linguist_Name}} | Number of edits on the given segment.
Last Updated-{{Linguist_Name}} | Date/time of the last edit.
Last Updated By-{{Linguist_Name}} | Name of the user who made the last edit.
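As a minimal illustration of downstream processing, a downloaded QA report can be loaded with a library such as pandas. The file name and linguist name below are hypothetical examples; the column names follow the table above, with {{Linguist_Name}} filled in:

```python
import pandas as pd

# Minimal sketch, assuming the QA report columns listed above.
# The exact encoding of cell values may differ.
df = pd.read_excel("MyJob_en_de_report.xlsx")

linguist = "Jane Doe"  # hypothetical evaluator name
score_col = f"Adjusted Score-{linguist}"

# Average adjusted score per MT engine across all evaluated segments.
print(df.groupby("System Engine")[score_col].mean())
```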
PA output report details
The Productivity Assessment report contains two tabs: the first with segment-level information and the second with action-level information.
COLUMN NAME | DESCRIPTION
---|---
serial | Segment position in the data set (generated incrementally, same as id).
id | Segment ID in the data set (generated incrementally).
Segment ID | Segment ID per the original database.
Source | Source text.
MT | Machine-translated text (from the original data set).
PE | Post-edited (or human-translated) text.
Domain | Domain or client name, as applicable.
Additional information | Optional additional info, e.g. a content flow.
Metadata | Propagated metadata from the input CSV (optional).
System Engine | Name of the system used to translate this segment.
Name | Linguist name.
MTOK | Set to 1 when 'Translation is fine' is selected during evaluation (applies only to the post-edited part of the data set).
WordCount | Number of source words.
Distance | Levenshtein distance calculated from the MT text to the post-edited text (see the sketch after this table).
Sec | Total time spent on the segment (in seconds).
TypingTime | Typing time (in seconds).
EntryCount | 0.
Switched | 0.
ContinueLater | Set to 1 when the linguist exited the evaluation screen with the 'Continue later' button.
Reconstructor | Each segment is reconstructed based on the actions performed (see the Import2 tab). Set to 0 when the reconstruction failed, 1 when it succeeded.
Created | Timestamp of the evaluation (segment completion time).
TimeTotal | Total time spent on editing (in milliseconds).
TimeInit | Initialization time (in milliseconds); time between the segment being opened and the first edit.
TimeEdit | Editing time (in milliseconds); time spent making text edits, where the pause between two changes is at most 1 second (see the sketch after this table).
TimeFinal | Finalization time (in milliseconds); time between the last change and moving to the next segment (the Save and Next button is clicked).
TimePauseShort | Short-pause time (in milliseconds); sum of pauses where the gap between two changes is 2-6 seconds.
TimePauseMedium | Medium-pause time (in milliseconds); sum of pauses where the gap between two changes is 7-60 seconds.
TimePauseLong | Long-pause time (in milliseconds); sum of pauses where the gap between two changes is longer than 60 seconds.
TimeOffFocus | Off-focus time (in milliseconds); time spent outside the editing box.
Feedback | Segment-level feedback filled in by the linguist.
Update Count | Number of edits on the given segment.
Last Updated | Date/time of the last edit.
Last Updated By | Name of the user who made the last edit.
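For reference, the sketch below illustrates the logic behind the Distance and Time* columns: a standard Levenshtein edit-distance computation and a pause-bucketing helper. Both are assumptions about the tool's internals (e.g. MT Companion's tokenization and its handling of gaps between 1 and 2 seconds are not documented here), not its actual implementation:

```python
def levenshtein(mt: str, pe: str) -> int:
    """Character-level edit distance from mt to pe (standard DP).

    MT Companion's exact normalization is not documented, so this
    character-level version is an assumption.
    """
    prev = list(range(len(pe) + 1))
    for i, a in enumerate(mt, start=1):
        curr = [i]
        for j, b in enumerate(pe, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (a != b),  # substitution
            ))
        prev = curr
    return prev[-1]

# e.g. levenshtein("kitten", "sitting") == 3

def classify_pauses(gaps_ms: list[int]) -> dict[str, int]:
    """Bucket inter-change gaps per the Time* column definitions above.

    Gaps of at most 1 s count as editing time; 2-6 s as short pauses,
    7-60 s as medium pauses, longer as long pauses. The boundary
    treatment of 1-2 s gaps is an assumption.
    """
    buckets = {"TimeEdit": 0, "TimePauseShort": 0,
               "TimePauseMedium": 0, "TimePauseLong": 0}
    for gap in gaps_ms:
        if gap <= 1_000:
            buckets["TimeEdit"] += gap
        elif gap <= 6_000:
            buckets["TimePauseShort"] += gap
        elif gap <= 60_000:
            buckets["TimePauseMedium"] += gap
        else:
            buckets["TimePauseLong"] += gap
    return buckets
```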
The second tab (Import2) contains one row per editing action:

COLUMN NAME | DESCRIPTION
---|---
id | Action ID (generated incrementally).
segment | Segment ID in the data set.
NewValue | New text value of the change.
OldValue | Old text value of the change (the text replaced by the new value).
Action | Code of the action.
ActionLength | Time (in milliseconds) spent on the action.
Name | Linguist name.
ActionName | Name of the action.
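In principle, the Reconstructor flag in the first tab corresponds to replaying these actions over the MT text. The sketch below is a naive guess at such a replay; the real Action codes and the way OldValue/NewValue anchor into the text are not documented here, so treat it purely as an illustration:

```python
def replay_actions(mt_text: str, actions: list[dict]) -> str:
    """Naively replay Import2 actions on the MT text (illustrative only).

    Each action dict is assumed to carry 'OldValue' and 'NewValue' keys
    as in the table above; real action codes may behave differently.
    """
    text = mt_text
    for action in actions:
        old = action.get("OldValue", "")
        new = action.get("NewValue", "")
        if old and old in text:
            text = text.replace(old, new, 1)
        else:
            text += new  # e.g. a pure insertion with no old value
    return text
```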
CA output report details
COLUMN NAME | DESCRIPTION
---|---
Sentence # | Segment identification (generated incrementally).
Segment ID | Segment ID per the original database.
Source Sentence | Source text.
Translation System 1 | Translation from System 1.
Translation System 2 | Translation from System 2.
System 1 Engine | Name of System 1.
System 2 Engine | Name of System 2.
Domain | Domain or client name, as applicable.
Additional information | Optional additional info, e.g. a content flow.
Metadata | Propagated metadata from the input CSV (optional).
Scale used | Scoring system ID.
System 1 Score-{{Linguist_Name}} | Score for Translation 1.
System 2 Score-{{Linguist_Name}} | Score for Translation 2.
Which is Better?-{{Linguist_Name}} | Which system is better (values: 1, 2, or 0 if both are of the same quality).
Delta (S1 - S2)-{{Linguist_Name}} | Difference between the scores (for numeric scoring values).
Judgement Duration (seconds)-{{Linguist_Name}} | Not used in the current implementation.
Total Duration (seconds)-{{Linguist_Name}} | Not used in the current implementation.
Bad Source-{{Linguist_Name}} | The Bad Source checkbox was selected.
Offensive Source-{{Linguist_Name}} | The Offensive Source checkbox was selected.
Specialist Source-{{Linguist_Name}} | The Specialist Source checkbox was selected.
Not Translated 1-{{Linguist_Name}} | The Not Translated checkbox was selected for Translation 1.
Not Translated 2-{{Linguist_Name}} | The Not Translated checkbox was selected for Translation 2.
Wrong Language 1-{{Linguist_Name}} | The Wrong Language checkbox was selected for Translation 1.
Wrong Language 2-{{Linguist_Name}} | The Wrong Language checkbox was selected for Translation 2.
Source Feedback-{{Linguist_Name}} | Feedback on the source, filled in by the linguist.
Translation 1 Feedback-{{Linguist_Name}} | Feedback on Translation 1, filled in by the linguist.
Translation 2 Feedback-{{Linguist_Name}} | Feedback on Translation 2, filled in by the linguist.
Update Count-{{Linguist_Name}} | Number of edits on the given segment.
Last Updated-{{Linguist_Name}} | Date/time of the last edit.
Last Updated By-{{Linguist_Name}} | Name of the user who made the last edit.