Chomsky Normal Form (thing) by rp

Chomsky Normal Form (CNF) is a restriction on the form of context-free grammars: the right hand side of every rule must consist of either one terminal or two nonterminals.

It is indeed a normal form: algorithms exist to convert any context-free grammar into an equivalent grammar in CNF. But it is not possible to speak of "the Chomsky Normal Form of a context-free grammar": different normalization algorithms produce different results.

Also, many different grammars can produce the same normal form. (For these reasons, it is incorrect to say that a CNF grammar "describes" a context-free grammar, and I've /msged PeterPan asking to correct this.)

There is one caveat: grammars in CNF cannot generate the empty string, so they can only represent arbitrary grammars up to the empty string.

An obvious normalization algorithm:

recursively expand all empty-generating rules
for each terminal, introduce a nonterminal to generate it, and replace all terminals occurring as part of existing right hand sides with those nonterminals
recursively replace all rules A -> BX, where A, B are nonterminals and X is a string of 2 or more nonterminals, with the rules A -> BC and C -> X, where C is a new nonterminal

For example: in grammar


  S -> a
  S -> aAS
  A -> a
  A -> b
  A -> SS
  A ->

we eliminate the empty rules (all one of them) to get


  S -> a
  S -> aAS
  S -> aS
  A -> a
  A -> b
  A -> SS

then eliminate non-isolated terminals (all one of them) to get


  S -> a
  S -> CAS
  S -> CS
  A -> a
  A -> b
  A -> SS
  C -> a

and finally "chain" the long right hand sides (all one of them) to get a grammar in CNF:


  S -> a
  S -> CD
  S -> CS
  A -> a
  A -> b
  A -> SS
  C -> a
  D -> AS

context-free grammar	Noam Chomsky	Colorless green ideas sleep furiously	ambiguous grammar
English: a language with (almost) no negative verbs	Perineum	BNF	Greibach Normal Form
Renouncing U.S. citizenship	Rain in 7 flavors	Chris Sharma	RP
formal language theory	Chomsky hierarchy	Hilbert Problems	What am I?
Everything Quest: Node the Ages	normal form	closed captioning	CNF
conjunctive normal form	HTML symbol reference	To, Too, Two	Iran-Contra Scandal