Lexer

class Lexer(ignorePattern: Regex = Regex("""\s+"""), singleLineComments: Regex? = null, multilineComments: Pair<Regex, Regex>? = null, identifiers: Regex = Regex("""[a-zA-Z_]\w*"""), hardKeywords: Set<String> = emptySet(), operators: Set<String> = emptySet(), separators: Set<String> = emptySet(), literals: Literals = Literals())

The lexer is responsible for converting the given string into a stream of Tokens. It takes several settings that configure how it behaves. It performs lexical analysis line by line and returns the next unconsumed token. A newline character always separates tokens.
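The line-by-line behavior described above can be pictured with a small standalone sketch. It is not the library's implementation: `sketchTokens` is a hypothetical helper that uses only the Kotlin standard library, returns plain strings instead of Tokens, and collects every token eagerly rather than handing back the next unconsumed one. It assumes the default `ignorePattern` and `identifiers` regexes from the signature and treats any other character as a one-character token.

```kotlin
// Illustrative sketch only: NOT the library's implementation.
// sketchTokens is a hypothetical helper; tokens are plain strings here.
fun sketchTokens(
    source: String,
    ignorePattern: Regex = Regex("""\s+"""),
    identifiers: Regex = Regex("""[a-zA-Z_]\w*""")
): List<String> {
    val tokens = mutableListOf<String>()
    // Work line by line, so a newline always separates tokens.
    for (line in source.lines()) {
        var index = 0
        while (index < line.length) {
            // Skip text that matches the ignore pattern at the current position.
            val skipped = ignorePattern.find(line, index)
            if (skipped != null && skipped.range.first == index && skipped.value.isNotEmpty()) {
                index = skipped.range.last + 1
                continue
            }
            // Emit an identifier if one starts exactly here.
            val name = identifiers.find(line, index)
            if (name != null && name.range.first == index) {
                tokens += name.value
                index = name.range.last + 1
            } else {
                // Assumption: anything else becomes a one-character token.
                tokens += line[index].toString()
                index++
            }
        }
    }
    return tokens
}

fun main() {
    println(sketchTokens("val x\ny = x")) // prints [val, x, y, =, x]
}
```

A real lexer would also consult the keyword, operator, separator, and literal settings before falling back to single characters; the sketch leaves those out to keep the scanning loop visible.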

Author

Nishant Aanjaney Jalan

Since

0.1.0

Parameters

ignorePattern

Characters that match this regex are skipped. (Default: "\s+")

singleLineComments

The regex that defines how a single-line comment starts. Once identified, the lexer will skip the remaining line. (Default: null)

multilineComments

A pair of regexes, the starting pattern and the ending pattern for a multiline comment block. (Default: null)

identifiers

A regex that defines the rules for naming an identifier. (Default: "[a-zA-Z_]\w*")

hardKeywords

A set of strings that are considered hard keywords. Hard keywords are characters and symbols that give a particular meaning to a program. They may not be used as identifiers. (Default: [])

operators

A set of strings that are considered operators. Operators are characters and symbols that may perform arithmetic or logical operations. (Default: [])

separators

A set of strings that are considered separators. Separators are characters and symbols that act as delimiters between other meaningful elements. (Default: [])

literals

The configuration of literals. Literals denote constant values such as numbers, strings, and characters. (Default: see Literals)
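To make the roles of `hardKeywords`, `operators`, `separators`, and `identifiers` concrete, here is a minimal standalone sketch of how a lexeme might be classified against such settings. `Kind` and `classify` are hypothetical names introduced only for illustration; they are not part of the library, and the order of the checks is an assumption based on the statement that hard keywords may not be used as identifiers.

```kotlin
// Hypothetical names throughout (Kind, classify): an illustration,
// NOT the library's API or implementation.
enum class Kind { KEYWORD, OPERATOR, SEPARATOR, IDENTIFIER, UNKNOWN }

fun classify(
    lexeme: String,
    hardKeywords: Set<String> = emptySet(),
    operators: Set<String> = emptySet(),
    separators: Set<String> = emptySet(),
    identifiers: Regex = Regex("""[a-zA-Z_]\w*""")
): Kind = when {
    // Checked first, because a hard keyword may not be used as an identifier.
    lexeme in hardKeywords -> Kind.KEYWORD
    lexeme in operators -> Kind.OPERATOR
    lexeme in separators -> Kind.SEPARATOR
    // The whole lexeme must match the identifier regex.
    identifiers.matches(lexeme) -> Kind.IDENTIFIER
    else -> Kind.UNKNOWN
}

fun main() {
    val keywords = setOf("if", "while")
    val ops = setOf("+", "==")
    val seps = setOf(";", ",")
    for (lexeme in listOf("while", "counter", "==", ";", "42")) {
        println("$lexeme -> ${classify(lexeme, keywords, ops, seps)}")
    }
}
```

With these example sets, `while` is reported as a keyword even though it also matches the identifier regex, `counter` as an identifier, and `42` as unknown, since literals are outside the scope of this sketch.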

Constructors

constructor(ignorePattern: Regex = Regex("""\s+"""), singleLineComments: Regex? = null, multilineComments: Pair<Regex, Regex>? = null, identifiers: Regex = Regex("""[a-zA-Z_]\w*"""), hardKeywords: Set<String> = emptySet(), operators: Set<String> = emptySet(), separators: Set<String> = emptySet(), literals: Literals = Literals())

Creates a lexer with the provided properties.