‘Shōsetsuka ni Narō’ Novel Website Scraped for A.I. Developer



Over 700,000 works sourced from self-publishing website


Naro FanDB, an unofficial fan X (formerly Twitter) account for the Japanese novel website Shōsetsuka ni Narō, posted on April 27 that a public dataset from the RyokoAI project scraped about 711,700 works from the website. Such datasets can be used to train A.I. models.

Netizens questioned the ethics of using works from Shōsetsuka ni Narō, which can be viewed publicly without registering an account.

Shōsetsuka ni Narō’s terms of use forbid users from engaging in any acts that may infringe on the copyrights, trademarks, or other intellectual property rights of the site’s Hina Project maintainers or other users. Users can read works on the website without agreeing to the terms of use, but intellectual property laws can still apply regardless of those terms.

The dataset’s licensing disclaimer asserts that all material other than that created by Ronsor Labs or the Ryoko AI Production Committee themselves “is distributed under fair use principles.” However, the current Copyright Act of Japan and similar laws in other countries do not include the doctrine of fair use as codified in the United States.

The 65-gigabyte dataset is split into 21 segments, and ANN confirmed that at least five of the segments contain text from Shōsetsuka ni Narō.

Stories on Shōsetsuka ni Narō are often picked up by publishers for print and digital publication, with some titles being adapted into manga and anime, such as Tsutomu Satou’s The Irregular at Magic High School (Mahōka Kōkō no Rettōsei) light novel series.

RyokoAI describes itself as “committed to producing open-source AI solutions and releasing open-source models, datasets, and more.”

Sources: Naro Fan DB’s Twitter account, Hugging Face via Comic Book Resources





