For projects intended for public release rather than personal research, copyright and licensing of data must be handled carefully.
Academic venues also review whether data copyright and licensing were respected; see, for example, the ACL 2021 ethics review questions: https://2021.aclweb.org/ethics/Ethics-review-questions/
Level of Copyright Protection
A work is protected by copyright only when it is recognized as creative.
- Court rulings are not recognized as creative works, so they are not protected by copyright; case-law search services can use ruling data freely.
- Comments may or may not be protected, depending on how creative they are. Everyday, idiomatic phrases are not protected.
Copyright Usage Procedure
When data is creative enough for copyright to arise, you must negotiate with the rights holder on terms such as:
- An exclusive or non-exclusive license to the economic rights
- A full or partial transfer of the economic rights
Licenses
Negotiating with each rights holder is cumbersome. A license instead states the permitted terms of use up front, and you may use the work as long as you follow those terms. Examples include Creative Commons licenses (CCL) and the Korea Open Government License (KOGL). ref: http://cckorea.org/xe/ccl
- Namu Wiki: CC BY-NC-SA. Non-commercial use only; the data source must be cited, and derivatives must be shared under the same license.
- KorQuAD: CC BY-ND. Attribution required; no derivative works.
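The license codes above are combinations of four standard Creative Commons conditions (BY, NC, ND, SA), so a dataset pipeline can flag obvious conflicts before the data is used. The following is a minimal sketch under stated assumptions: `IntendedUse`, `parse_cc`, and `check_use` are hypothetical names invented for this illustration, and the check covers only the NC and ND conditions. It does not replace reading the actual license text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Standard Creative Commons condition codes and what they require.
CC_CONDITIONS = {
    "BY": "attribution required",
    "NC": "non-commercial use only",
    "ND": "no derivative works",
    "SA": "share alike (derivatives under the same license)",
}


@dataclass(frozen=True)
class IntendedUse:
    """Hypothetical description of how a project plans to use a dataset."""
    commercial: bool
    makes_derivatives: bool


def parse_cc(license_name: str) -> list[str]:
    """Split a name such as 'CC BY-NC-SA' into ['BY', 'NC', 'SA']."""
    parts = license_name.upper().split()
    if len(parts) < 2 or parts[0] != "CC":
        raise ValueError(f"not a CC license name: {license_name!r}")
    codes = parts[1].split("-")
    unknown = [code for code in codes if code not in CC_CONDITIONS]
    if unknown:
        raise ValueError(f"unknown CC condition(s): {unknown}")
    return codes


def check_use(license_name: str, use: IntendedUse) -> list[str]:
    """Return the obvious conflicts; an empty list means none were found."""
    codes = parse_cc(license_name)
    problems = []
    if "NC" in codes and use.commercial:
        problems.append("commercial use is not permitted (NC)")
    if "ND" in codes and use.makes_derivatives:
        problems.append("derivative works are not permitted (ND)")
    return problems


if __name__ == "__main__":
    product = IntendedUse(commercial=True, makes_derivatives=False)
    for name in ("CC BY-NC-SA", "CC BY-ND"):
        print(name, parse_cc(name), check_use(name, product))
```

Attribution (BY) and share-alike (SA) are obligations rather than prohibitions, so the sketch only lists them through `parse_cc` and leaves it to the project to fulfil them.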
News Data
- News outlets: the copyrights of most news outlets are managed in trust by the Korea Press Foundation.
  - Negotiate with the rights holder (the Korea Press Foundation or the news outlet). For Chosun, JoongAng, and Dong-A, contact the outlet directly; for others, contact the Korea Press Foundation.
  - Very rarely, a news outlet applies a CCL to its articles (e.g., Wikitree).
- Even if you purchased news data for 0 KRW, you must still follow the terms of use attached to the purchase.
- According to the Korea Copyright Commission, newspaper headlines are not protected by copyright.
Fair Use
A copyrighted work may be used without the rights holder’s permission for purposes such as education, judicial proceedings, and exhibition.
https://www.copyright.or.kr/education/educlass/learning/what-the-copyright/definition/index06.do
Gray Area Between Copyright Law and AI
- How should data generated by GPT-3 be handled under copyright law?
- Who owns the copyright of a news article generated by a news summarization model?