collection of text that is supposed to represent the language. A wide range of domains, topics, and materials is considered, and sentences are randomly selected to avoid bias.
• Example: BCCWJ (Balanced Corpus of Contemporary Written Japanese)
  – Released in 2011. 100M words.
  – Publication subcorpus: 35M words
    • books, magazines, and newspapers published during 2001-2005
  – Library subcorpus: 30M words
    • books cataloged at more than 13 public libraries in the Tokyo area, and published after 1985
  – Special purpose subcorpus: 35M words
    • government white papers, textbooks, laws, Internet text (Yahoo! Q&A), Diet minutes, best-selling books, etc.
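The idea of drawing sentences at random from each domain until a per-domain word quota is filled can be sketched as below. This is only a toy illustration, not the procedure actually used to build BCCWJ: the function name `sample_balanced`, the domain labels, the generated sentences, and the quotas (scaled down to 35/30/35 words) are all assumptions, and words are counted by whitespace splitting, which would not work for unsegmented Japanese text.

```python
import random


def sample_balanced(pools, quotas, seed=0):
    """Randomly draw sentences from each domain until its word quota is met.

    pools  : dict mapping a domain name to a list of sentences (hypothetical data)
    quotas : dict mapping a domain name to a target number of words
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    corpus = {}
    for domain, sentences in pools.items():
        shuffled = sentences[:]  # copy so the caller's list is not mutated
        rng.shuffle(shuffled)    # random order avoids a bias toward early items
        picked, words = [], 0
        for sentence in shuffled:
            if words >= quotas[domain]:
                break
            picked.append(sentence)
            words += len(sentence.split())  # toy word count (whitespace tokens)
        corpus[domain] = picked
    return corpus


# Toy pools: 100 three-word sentences per domain (made-up data).
pools = {
    "publication": [f"pub sentence {i}" for i in range(100)],
    "library": [f"lib sentence {i}" for i in range(100)],
    "special": [f"spec sentence {i}" for i in range(100)],
}
# Quotas mirror the 35M / 30M / 35M split, scaled down to single words.
quotas = {"publication": 35, "library": 30, "special": 35}

corpus = sample_balanced(pools, quotas)
for domain, picked in corpus.items():
    print(domain, sum(len(s.split()) for s in picked))
```

Fixing the quota per domain, rather than sampling from one pooled collection, keeps a large source such as web text from crowding out smaller ones.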