How much knowledge of the semantics of the data do we put in the scanner?
In our first examples, our scanner incorporated very little knowledge of the data: the data consisted only of fields and slashes.
Then we added knowledge about Titles.
Now we have added the knowledge that the first field is a Title, the second field is an Author, etc. We have put a lot of knowledge of the semantics of the data into our scanner.
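
To make the contrast concrete, here is a minimal sketch in Python of the two extremes. It is an illustration only, not the scanner built in the earlier examples: the function names, the token names, and the sample record are hypothetical, and the field order covers only Title and Author because those are the only fields named above.

```python
import re

# One pattern serves both scanners: a run of non-slash characters, or a slash.
TOKEN_RE = re.compile(r"[^/]+|/")


def scan_generic(record):
    """Scanner with almost no semantic knowledge: fields and slashes only."""
    return [("SLASH" if text == "/" else "FIELD", text)
            for text in TOKEN_RE.findall(record)]


# Assumed field order (hypothetical); later positions fall back to FIELD.
FIELD_NAMES = ("TITLE", "AUTHOR")


def scan_positional(record):
    """Scanner that also knows what each field position means."""
    tokens = []
    position = 0  # index of the field currently being read
    for text in TOKEN_RE.findall(record):
        if text == "/":
            tokens.append(("SLASH", text))
            position += 1  # a slash ends the current field
        elif position < len(FIELD_NAMES):
            tokens.append((FIELD_NAMES[position], text))
        else:
            tokens.append(("FIELD", text))
    return tokens


record = "Moby Dick/Herman Melville"  # hypothetical sample record
print(scan_generic(record))
print(scan_positional(record))
```

The first scanner would work unchanged on any slash-separated data. The second returns more informative tokens, but it has to change whenever the order or meaning of the fields changes, which is the cost of putting semantics in the scanner.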