natural language processing technology to support ...pronunciation explanation of difficult...



  • Broadcast Technology No.58, Autumn 2014 ● © NHK STRL 21

    NHK’s Web service “NEWSWEB EASY” (http://www.nhk.or.jp/news/easy/) provides news stories written in simplified Japanese for children and for non-native speakers of Japanese living in Japan. Each item is a rewrite of a news script originally written in ordinary Japanese, with various hints attached to aid readers’ comprehension. STRL supports the production of the service through its research on natural language processing technologies. We recently constructed a new production support system (Figure).

    NEWSWEB EASY articles are written through a collaboration between an experienced news writer, who edits the original news script into a simpler composition, and a Japanese language instructor (specializing in teaching Japanese to non-Japanese speakers), who paraphrases unfamiliar expressions and complicated sentence structures to suit the target readers. To help these rewriters replace difficult words with more appropriate ones, our system color-codes every word in the manuscript being written to indicate its vocabulary type (e.g., biographical/geographical names) and difficulty level. The system then automatically tags each word in the finished rewrite with various information. After errors in the automatic tagging are corrected manually, the simplified Japanese news items are published with hints generated from the tags, such as pronunciations of Kanji words and plain explanations of difficult words.
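
    As a rough illustration of how per-word tags could drive both the color-coding and the published pronunciation hints, the sketch below stores each word as a small record and renders it as HTML with ruby markup. The `TaggedWord` record, the `COLORS` table, and the `render` function are hypothetical names introduced here; the article does not describe the system's actual data format or colors.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional

# Hypothetical color table: the article says words are color-coded by
# vocabulary type and difficulty, but it does not give the real colors.
COLORS = {
    "place name": "green",
    "person name": "blue",
    "semi-difficult word": "orange",
    "basic word": "black",
}


@dataclass
class TaggedWord:
    surface: str                    # the word as written in the article
    word_type: str                  # vocabulary type / difficulty tag
    reading: Optional[str] = None   # pronunciation hint for Kanji words


def render(words):
    """Render tagged words as HTML with ruby readings and type colors."""
    parts = []
    for w in words:
        text = escape(w.surface)
        if w.reading:
            # <ruby>/<rt> is the standard HTML markup for reading aids.
            text = f"<ruby>{text}<rt>{escape(w.reading)}</rt></ruby>"
        color = COLORS.get(w.word_type, "black")
        parts.append(f'<span style="color:{color}">{text}</span>')
    return "".join(parts)


# Tags taken from the example phrase shown in the figure.
sample = [
    TaggedWord("日本", "place name", "にっぽん"),
    TaggedWord("は", "basic word"),
    TaggedWord("大久保", "person name", "おおくぼ"),
    TaggedWord("嘉人", "person name", "よしと"),
    TaggedWord("選手", "semi-difficult word", "せんしゅ"),
    TaggedWord("など", "basic word"),
]
html_out = render(sample)
```

    Plain explanations of difficult words could be attached to the same record in a similar way, for example as an extra optional field.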

    The automatic tagging technology is based on machine learning, which acquires knowledge from the manually corrected tags produced in daily production. Our new technology tags words automatically with approximately 95% accuracy, so only a small amount of manual correction is needed. Because the system learns from the manually corrected tags incrementally (stream learning), its knowledge and accuracy increase day by day.
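
    The idea of learning incrementally from corrected tags can be sketched with a simple online learner. The article does not say which learning algorithm the system uses, so the multiclass perceptron below is a stand-in chosen for brevity, and the class name, feature set, and label list are all assumptions made for illustration.

```python
from collections import defaultdict


class OnlineTagger:
    """Multiclass perceptron updated one corrected example at a time."""

    def __init__(self, labels):
        self.labels = list(labels)
        # weights[label][feature] -> float, grown as new features appear
        self.weights = {lab: defaultdict(float) for lab in self.labels}

    @staticmethod
    def features(word, prev_word):
        # Hypothetical features; a production system would use richer context.
        return [f"w={word}", f"prev={prev_word}", f"last={word[-1:]}", "bias"]

    def predict(self, word, prev_word=""):
        feats = self.features(word, prev_word)
        scores = {
            lab: sum(self.weights[lab].get(f, 0.0) for f in feats)
            for lab in self.labels
        }
        # Ties are broken by the fixed label order.
        return max(self.labels, key=lambda lab: scores[lab])

    def learn(self, word, prev_word, corrected_label):
        """Update weights only when the guess differs from the editor's tag."""
        guess = self.predict(word, prev_word)
        if guess != corrected_label:
            for f in self.features(word, prev_word):
                self.weights[corrected_label][f] += 1.0
                self.weights[guess][f] -= 1.0
        return guess


tagger = OnlineTagger(["basic word", "place name", "person name"])
# A tiny stream of (word, previous word, manually corrected tag).
stream = [
    ("日本", "", "place name"),
    ("は", "日本", "basic word"),
    ("大久保", "は", "person name"),
]
for _ in range(3):  # replay the stream as if it arrived over several days
    for word, prev, gold in stream:
        tagger.learn(word, prev, gold)
```

    Because each correction updates the weights immediately, the model never needs to be retrained from scratch, which is the property the stream-learning description relies on.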

    Our system also has a number of other functions, such as a search over past rewrites, and it has proven useful in daily production. NEWSWEB EASY currently publishes about five news items daily. We are conducting various studies, including ones on other ways to assist rewriters, with the aim of making it easier for the service to provide more simplified Japanese news items.

    Natural Language Processing Technology to Support Simplified Japanese News Service “NEWSWEB EASY”

    Tadashi Kumano, Human Interface Research Division

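    Figure: NEWSWEB EASY production support using automatic tagging technology

    Panels in the figure: Original news script; Rewrite work (checks the manuscript currently being edited; vocabulary type/difficulty color-coding; difficulty level confirmation screen; color-coding of geographical/biographical names); Tag work (automatic tagging using word tagging knowledge; tag editor in which errors are corrected manually; incremental learning from corrected tags); NEWSWEB EASY online page (pronunciation; explanation of difficult vocabulary).

    Tagging example from the figure, 日本は大久保嘉人選手など ("Japan ... players such as Yoshito Okubo"), with word type/difficulty and pronunciation:

      日本 (にっぽん): place name
      は: basic word
      大久保 (おおくぼ): person name
      嘉人 (よしと): person name
      選手 (せんしゅ): semi-difficult word
      など: basic word

    Explanation excerpts shown in the figure: 「にほん」ともいい… ("also read 'nihon' ..."), 競技に出るために… ("in order to take part in a competition ..."), おもな例をあげて… ("giving the main examples ...").

    Sample article text shown in the figure (translated from the simplified Japanese): "World Cup: Japan draws 0-0 with Greece. On the 20th, Japan played Greece in the soccer World Cup. Japan lost to Côte d'Ivoire in its match on the 15th. If it also loses to Greece, it cannot go on to the knockout stage (= the matches played by the top 16 teams to decide the champion). Determined to win, Japan fielded players such as Yoshito Okubo from the start of the match."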

  • Broadcast Technology No.58, Autumn 2014 ● © NHK STRL 22

    Bidirectional FPU for High-speed Program Materials File Transmission

    Broadcasting stations are rapidly switching to file-based systems, which manage video and audio footage on non-tape media such as hard disks. While video editing and playout systems have started incorporating file-based systems, field pick-up units (FPUs), which use microwaves to transmit video and audio signals from news-gathering and relay locations to broadcasting stations, are not designed to transfer file data. Video files therefore have to be decoded into video signals before transmission, which makes it hard to reduce the transmission time below the actual duration of the footage.

    In addition to the uplink used to send video to a broadcasting station, a high-speed, lossless file-based material transmission system requires a downlink for returning control data, such as transmission rate control and retransmission requests, to the news-gathering or relay location. To meet this requirement, STRL has developed a novel FPU that is capable of bidirectional communication.
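
    To make the role of the return path concrete, the following sketch simulates a lossless file transfer in which the receiver reports missing blocks and the sender re-sends only those. It is a generic retransmission loop under assumed conditions, not the protocol used by the FPU: the function name `transfer`, the block size, the loss rate, and the random loss model are all hypothetical.

```python
import random


def transfer(blocks, loss_rate=0.2, seed=0, max_rounds=50):
    """Send blocks over a lossy uplink, re-sending whatever is reported missing."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    received = {}
    pending = list(range(len(blocks)))  # indices still to be delivered
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        for i in pending:               # uplink: file data
            if rng.random() >= loss_rate:
                received[i] = blocks[i]
        # Downlink: the retransmission request lists the missing indices.
        pending = [i for i in range(len(blocks)) if i not in received]
    if pending:
        raise RuntimeError("transfer did not complete")
    data = b"".join(received[i] for i in range(len(blocks)))
    return data, rounds


# Split a dummy "video file" into 64-byte blocks and send it.
payload = bytes(range(256)) * 4
blocks = [payload[i:i + 64] for i in range(0, len(payload), 64)]
result, rounds_used = transfer(blocks)
```

    Without the downlink the sender would have no way to learn which blocks were lost, so it could not guarantee a bit-exact copy of the file.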

    This bidirectional FPU uses a time division duplex (TDD) scheme, in which the uplink and downlink use the same frequency channel but at different times (Figure). The system can set the uplink/downlink data rate ratio by controlling the uplink/downlink time ratio. Since the data rate of control data is much lower than that of file data, most of the time can be assigned to the uplink. Combined with efficient dual-polarized 2×2 MIMO*1 transmission technology and 256-QAM*2 multi-level modulation, this enables a video file to be transmitted in less than half of the video’s actual duration.
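
    The arithmetic behind the time-ratio control can be sketched as follows. The channel rate, time split, and video bit rate used here are made-up illustrative numbers, not figures from the article, and the model ignores guard times, error-correction coding, and protocol overhead. The function names are likewise hypothetical.

```python
import math


def raw_bits_per_symbol(qam_order, streams):
    """Uncoded bits carried per symbol period across all spatial streams."""
    return int(math.log2(qam_order)) * streams


def tdd_rates(link_rate_mbps, uplink_fraction):
    """Split one channel's data rate between uplink and downlink by time share."""
    if not 0.0 < uplink_fraction < 1.0:
        raise ValueError("uplink_fraction must be between 0 and 1")
    up = link_rate_mbps * uplink_fraction
    down = link_rate_mbps * (1.0 - uplink_fraction)
    return up, down


def transfer_time_ratio(video_bitrate_mbps, uplink_rate_mbps):
    """Transfer time divided by playing time; below 1 is faster than real time."""
    return video_bitrate_mbps / uplink_rate_mbps


# 256-QAM carries 8 bits per symbol; two MIMO streams double that.
bits = raw_bits_per_symbol(256, 2)

# Assumed values for illustration only: a 200 Mbps channel with 90% of
# the time given to the uplink, carrying an 80 Mbps video file.
up_rate, down_rate = tdd_rates(200.0, 0.9)
ratio = transfer_time_ratio(80.0, up_rate)
```

    With these assumed numbers the uplink gets 180 Mbps and the file transfers in under half of its playing time, while the remaining 20 Mbps is still ample for control data on the downlink.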

    The bidirectional FPU’s IP interface multiplexes signals, which makes it feasible to transmit real-time video signals in addition to video files. Our prototype device allows return video signals from the studio to be transmitted to the news-gathering or relay location and recorded video materials to be uploaded in the background.

    In the future, we will develop a means of selecting appropriate transmission data rates for individual propagation paths, with the goal of developing a more reliable bidirectional FPU system.

    *1 Dual-polarized 2×2 MIMO: a transmission technology based on spatial multiplexing using multiple transmission/reception antennas over horizontally and vertically polarized waves.

    *2 QAM: quadrature amplitude modulation

    Fumiki Uzawa, Advanced Transmission Systems Research Division

    Figure: Bidirectional FPU overview

    Labels in the figure: a news-gathering/outside broadcasting site (camera video, relay staff) and a broadcasting station (studio, video material server, video file reception, file-based editing), each with an IP interface, a bidirectional FPU baseband transceiver, and a bidirectional FPU RF front-end. The two sites are linked by polarization multiplexing and by bidirectional communication using the TDD scheme, in which uplink and downlink alternate over time. Signals shown: camera video transmission, video file transmission, and studio video return.