antlr3.py

00001 ##
00002 #  @package antlr3
00003 # @brief ANTLR3 runtime package
00004 # 
00005 # This module contains all support classes, which are needed to use recognizers
00006 # generated by ANTLR3.
00007 # 
00008 # @mainpage
00009 # 
00010 # \note Please be warned that the line numbers in the API documentation do not
00011 # match the real locations in the source code of the package. This is an
00012 # unintended artifact of doxygen, which I could only convince to use the
00013 # correct module names by concatenating all files from the package into a single
00014 # module file...
00015 # 
00016 # Here is a brief overview of the most commonly used classes provided by
00017 # this runtime:
00018 # 
00019 # @section recognizers Recognizers
00020 # 
00021 # These recognizers are base classes for the code generated by ANTLR3.
00022 # 
00023 # - BaseRecognizer: Base class with common recognizer functionality.
00024 # - Lexer: Base class for lexers.
00025 # - Parser: Base class for parsers.
00026 # - tree.TreeParser: Base class for %tree parsers.
00027 # 
00028 # @section streams Streams
00029 # 
00030 # Each recognizer pulls its input from one of the stream classes below. Streams
00031 # handle buffering, look-ahead and seeking.
00032 # 
00033 # A character stream is usually the first element in the pipeline of a typical
00034 # ANTLR3 application. It is used as the input for a Lexer.
00035 # 
00036 # - ANTLRStringStream: Reads from a string object. The input should be a unicode
00037 #   object, or ANTLR3 will have trouble decoding non-ascii data.
00038 # - ANTLRFileStream: Opens a file and reads its contents, with optional character
00039 #   decoding.
00040 # - ANTLRInputStream: Reads the data from a file-like object, with optional
00041 #   character decoding.
00042 # 
00043 # A Parser needs a TokenStream as input (which in turn is usually fed by a
00044 # Lexer):
00045 # 
00046 # - CommonTokenStream: A basic and most commonly used TokenStream
00047 #   implementation.
00048 # - TokenRewriteStream: A modification of CommonTokenStream that allows the
00049 #   stream to be altered (by the Parser). See the 'tweak' example for a use case.
00050 # 
00051 # And tree.TreeParser finally fetches its input from a tree.TreeNodeStream:
00052 # 
00053 # - tree.CommonTreeNodeStream: A basic and most commonly used tree.TreeNodeStream
00054 #   implementation.
00055 #   
00056 # 
00057 # @section tokenstrees Tokens and Trees
00058 # 
00059 # A Lexer emits Token objects which are usually buffered by a TokenStream. A
00060 # Parser can build a Tree, if the output=AST option has been set in the grammar.
00061 # 
00062 # The runtime provides these Token implementations:
00063 # 
00064 # - CommonToken: A basic and most commonly used Token implementation.
00065 # - ClassicToken: A Token object as used in ANTLR 2.x, used for %tree
00066 #   construction.
00067 # 
00068 # Tree objects are wrappers for Token objects.
00069 # 
00070 # - tree.CommonTree: A basic and most commonly used Tree implementation.
00071 # 
00072 # A tree.TreeAdaptor is used by the parser to create tree.Tree objects for the
00073 # input Token objects.
00074 # 
00075 # - tree.CommonTreeAdaptor: A basic and most commonly used tree.TreeAdaptor
00076 #   implementation.
00077 # 
00078 # 
00079 # @section exceptions Exceptions
00080 # 
00081 # RecognitionExceptions are raised when a recognizer encounters incorrect
00082 # or unexpected input.
00083 # 
00084 # - RecognitionException
00085 #   - MismatchedRangeException
00086 #   - MismatchedSetException
00087 #     - MismatchedNotSetException
00088 #     .
00089 #   - MismatchedTokenException
00090 #   - MismatchedTreeNodeException
00091 #   - NoViableAltException
00092 #   - EarlyExitException
00093 #   - FailedPredicateException
00094 #   .
00095 # .
00096 # 
00097 # A tree.RewriteCardinalityException is raised when the parser hits a
00098 # cardinality mismatch during AST construction. Although this is basically a
00099 # bug in your grammar, it can only be detected at runtime.
00100 # 
00101 # - tree.RewriteCardinalityException
00102 #   - tree.RewriteEarlyExitException
00103 #   - tree.RewriteEmptyStreamException
00104 #   .
00105 # .
00106 # 
00107 # 
00108 
00109 # tree.RewriteRuleElementStream
00110 # tree.RewriteRuleSubtreeStream
00111 # tree.RewriteRuleTokenStream
00112 # CharStream
00113 # DFA
00114 # TokenSource
00115 
00116 # [The "BSD licence"]
00117 # Copyright (c) 2005-2008 Terence Parr
00118 # All rights reserved.
00119 #
00120 # Redistribution and use in source and binary forms, with or without
00121 # modification, are permitted provided that the following conditions
00122 # are met:
00123 # 1. Redistributions of source code must retain the above copyright
00124 #    notice, this list of conditions and the following disclaimer.
00125 # 2. Redistributions in binary form must reproduce the above copyright
00126 #    notice, this list of conditions and the following disclaimer in the
00127 #    documentation and/or other materials provided with the distribution.
00128 # 3. The name of the author may not be used to endorse or promote products
00129 #    derived from this software without specific prior written permission.
00130 #
00131 # THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
00132 # IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
00133 # OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
00134 # IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
00135 # INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
00136 # NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
00137 # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
00138 # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
00139 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
00140 # THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
00141 
00142 __version__ = '3.1.1'
00143 
00144 def version_str_to_tuple(version_str):
00145     import re
00146     import sys
00147 
00148     if version_str == 'HEAD':
00149         return (sys.maxint, sys.maxint, sys.maxint, sys.maxint)
00150 
00151     m = re.match(r'(\d+)\.(\d+)(\.(\d+))?(b(\d+))?', version_str)
00152     if m is None:
00153         raise ValueError("Bad version string %r" % version_str)
00154 
00155     major = int(m.group(1))
00156     minor = int(m.group(2))
00157     patch = int(m.group(4) or 0)
00158     beta = int(m.group(6) or sys.maxint)
00159 
00160     return (major, minor, patch, beta)
00161 
00162 
00163 runtime_version_str = __version__
00164 runtime_version = version_str_to_tuple(runtime_version_str)
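# A minimal sketch of how the version tuples produced above can be compared.
# It restates version_str_to_tuple for Python 3, which is an assumption of this
# example only: sys.maxsize stands in for sys.maxint, which the listing uses.

```python
import re
import sys


def version_str_to_tuple(version_str):
    """Python 3 restatement of the helper above (sys.maxsize, not sys.maxint)."""
    if version_str == 'HEAD':
        return (sys.maxsize,) * 4

    m = re.match(r'(\d+)\.(\d+)(\.(\d+))?(b(\d+))?', version_str)
    if m is None:
        raise ValueError("Bad version string %r" % version_str)

    # A missing patch level counts as 0. A missing beta number counts as
    # "larger than any beta", so a final release sorts after its betas.
    return (int(m.group(1)), int(m.group(2)),
            int(m.group(4) or 0), int(m.group(6) or sys.maxsize))


release = version_str_to_tuple('3.1.1')
beta = version_str_to_tuple('3.1b2')
head = version_str_to_tuple('HEAD')

# Tuples compare element by element, so version checks are plain comparisons.
print(beta < version_str_to_tuple('3.1') < release < head)
```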
00165 
00166 
00167 from constants import *
00168 from dfa import *
00169 from exceptions import *
00170 from recognizers import *
00171 from streams import *
00172 from tokens import *
00173 """ANTLR3 exception hierarchy"""
00174 
00175 
00176 from antlr3.constants import INVALID_TOKEN_TYPE
00177 
00178 
00179 ##
00180 # @brief Raised to signal failed backtrack attempt
00181 class BacktrackingFailed(Exception):
00182 
00183     pass
00184 
00185 
00186 ##
00187 # @brief The root of the ANTLR exception hierarchy.
00188 # 
00189 #     To avoid English-only error messages and to generally make things
00190 #     as flexible as possible, these exceptions are not created with strings,
00191 #     but rather the information necessary to generate an error.  Then
00192 #     the various reporting methods in Parser and Lexer can be overridden
00193 #     to generate a localized error message.  For example, MismatchedToken
00194 #     exceptions are built with the expected token type.
00195 #     So, don't expect getMessage() to return anything.
00196 # 
00197 #     Note that as of Java 1.4, you can access the stack trace, which means
00198 #     that you can compute the complete trace of rules from the start symbol.
00199 #     This gives you considerable context information with which to generate
00200 #     useful error messages.
00201 # 
00202 #     ANTLR generates code that throws exceptions upon recognition error and
00203 #     also generates code to catch these exceptions in each rule.  If you
00204 #     want to quit upon first error, you can turn off the automatic error
00205 #     handling mechanism using the rulecatch action, but you still need to
00206 #     override the methods mismatch and recoverFromMismatchSet.
00207 #     
00208 #     In general, the recognition exceptions can track where in a grammar a
00209 #     problem occurred and/or what was the expected input.  While the parser
00210 #     knows its state (such as the current input symbol and line info), that
00211 #     state can change before the exception is reported, so the current token
00212 #     index is computed and stored at exception time.  From this info, you can
00213 #     perhaps print an entire line of input, not just a single token.
00214 #     Better to just say the recognizer had a problem and then let the parser
00215 #     figure out a fancy report.
00216 #     
00217 #     
00218 class RecognitionException(Exception):
00219 
00220     def __init__(self, input=None):
00221         Exception.__init__(self)
00222 
00223         # What input stream did the error occur in?
00224         self.input = None
00225 
00226         # What is the index of the token/char we were looking at when the
00227         # error occurred?
00228         self.index = None
00229 
00230         # The current Token when an error occurred.  Since not all streams
00231         # can retrieve the ith Token, we have to track the Token object.
00232         # For parsers.  Even when it's a tree parser, token might be set.
00233         self.token = None
00234 
00235         # If this is a tree parser exception, node is set to the node with
00236         # the problem.
00237         self.node = None
00238 
00239         # The current char when an error occurred. For lexers.
00240         self.c = None
00241 
00242         # Track the line at which the error occurred in case this is
00243         # generated from a lexer.  We need to track this since the
00244         # unexpected char doesn't carry the line info.
00245         self.line = None
00246 
00247         self.charPositionInLine = None
00248 
00249         # If you are parsing a tree node stream, you will encounter some
00250         # imaginary nodes w/o line/col info.  We now search backwards looking
00251         # for most recent token with line/col info, but notify getErrorHeader()
00252         # that info is approximate.
00253         self.approximateLineInfo = False
00254 
00255         
00256         if input is not None:
00257             self.input = input
00258             self.index = input.index()
00259 
00260             # late import to avoid cyclic dependencies
00261             from antlr3.streams import TokenStream, CharStream
00262             from antlr3.tree import TreeNodeStream
00263 
00264             if isinstance(self.input, TokenStream):
00265                 self.token = self.input.LT(1)
00266                 self.line = self.token.line
00267                 self.charPositionInLine = self.token.charPositionInLine
00268 
00269             if isinstance(self.input, TreeNodeStream):
00270                 self.extractInformationFromTreeNodeStream(self.input)
00271 
00272             else:
00273                 if isinstance(self.input, CharStream):
00274                     self.c = self.input.LT(1)
00275                     self.line = self.input.line
00276                     self.charPositionInLine = self.input.charPositionInLine
00277 
00278                 else:
00279                     self.c = self.input.LA(1)
00280 
00281     def extractInformationFromTreeNodeStream(self, nodes):
00282         from antlr3.tree import Tree, CommonTree
00283         from antlr3.tokens import CommonToken
00284         
00285         self.node = nodes.LT(1)
00286         adaptor = nodes.adaptor
00287         payload = adaptor.getToken(self.node)
00288         if payload is not None:
00289             self.token = payload
00290             if payload.line <= 0:
00291                 # imaginary node; no line/pos info; scan backwards
00292                 i = -1
00293                 priorNode = nodes.LT(i)
00294                 while priorNode is not None:
00295                     priorPayload = adaptor.getToken(priorNode)
00296                     if priorPayload is not None and priorPayload.line > 0:
00297                         # we found the most recent real line / pos info
00298                         self.line = priorPayload.line
00299                         self.charPositionInLine = priorPayload.charPositionInLine
00300                         self.approximateLineInfo = True
00301                         break
00302                     
00303                     i -= 1
00304                     priorNode = nodes.LT(i)
00305                     
00306             else: # node created from real token
00307                 self.line = payload.line
00308                 self.charPositionInLine = payload.charPositionInLine
00309                 
00310         elif isinstance(self.node, Tree):
00311             self.line = self.node.line
00312             self.charPositionInLine = self.node.charPositionInLine
00313             if isinstance(self.node, CommonTree):
00314                 self.token = self.node.token
00315 
00316         else:
00317             type = adaptor.getType(self.node)
00318             text = adaptor.getText(self.node)
00319             self.token = CommonToken(type=type, text=text)
00320 
00321      
00322     ##
00323     # Return the token type or char of the unexpected input element
00324     def getUnexpectedType(self):
00325 
00326         from antlr3.streams import TokenStream
00327         from antlr3.tree import TreeNodeStream
00328 
00329         if isinstance(self.input, TokenStream):
00330             return self.token.type
00331 
00332         elif isinstance(self.input, TreeNodeStream):
00333             adaptor = self.input.treeAdaptor
00334             return adaptor.getType(self.node)
00335 
00336         else:
00337             return self.c
00338 
00339     unexpectedType = property(getUnexpectedType)
00340     
00341 
00342 ##
00343 # @brief A mismatched char or Token or tree node.
00344 class MismatchedTokenException(RecognitionException):
00345     
00346     def __init__(self, expecting, input):
00347         RecognitionException.__init__(self, input)
00348         self.expecting = expecting
00349         
00350 
00351     def __str__(self):
00352         #return "MismatchedTokenException("+self.expecting+")"
00353         return "MismatchedTokenException(%r!=%r)" % (
00354             self.getUnexpectedType(), self.expecting
00355             )
00356     __repr__ = __str__
00357 
00358 
00359 ##
00360 # An extra token while parsing a TokenStream
00361 class UnwantedTokenException(MismatchedTokenException):
00362 
00363     def getUnexpectedToken(self):
00364         return self.token
00365 
00366 
00367     def __str__(self):
00368         exp = ", expected %s" % self.expecting
00369         if self.expecting == INVALID_TOKEN_TYPE:
00370             exp = ""
00371 
00372         if self.token is None:
00373             return "UnwantedTokenException(found=%s%s)" % (None, exp)
00374 
00375         return "UnwantedTokenException(found=%s%s)" % (self.token.text, exp)
00376     __repr__ = __str__
00377 
00378 
00379 ##
00380 # 
00381 #     We were expecting a token but it's not found.  The current token
00382 #     is actually what we wanted next.
00383 #     
00384 class MissingTokenException(MismatchedTokenException):
00385 
00386     def __init__(self, expecting, input, inserted):
00387         MismatchedTokenException.__init__(self, expecting, input)
00388 
00389         self.inserted = inserted
00390 
00391 
00392     def getMissingType(self):
00393         return self.expecting
00394 
00395 
00396     def __str__(self):
00397         if self.inserted is not None and self.token is not None:
00398             return "MissingTokenException(inserted %r at %r)" % (
00399                 self.inserted, self.token.text)
00400 
00401         if self.token is not None:
00402             return "MissingTokenException(at %r)" % self.token.text
00403 
00404         return "MissingTokenException"
00405     __repr__ = __str__
00406 
00407 
00408 ##
00409 # @brief The next token does not match a range of expected types.
00410 class MismatchedRangeException(RecognitionException):
00411 
00412     def __init__(self, a, b, input):
00413         RecognitionException.__init__(self, input)
00414 
00415         self.a = a
00416         self.b = b
00417         
00418 
00419     def __str__(self):
00420         return "MismatchedRangeException(%r not in [%r..%r])" % (
00421             self.getUnexpectedType(), self.a, self.b
00422             )
00423     __repr__ = __str__
00424     
00425 
00426 ##
00427 # @brief The next token does not match a set of expected types.
00428 class MismatchedSetException(RecognitionException):
00429 
00430     def __init__(self, expecting, input):
00431         RecognitionException.__init__(self, input)
00432 
00433         self.expecting = expecting
00434         
00435 
00436     def __str__(self):
00437         return "MismatchedSetException(%r not in %r)" % (
00438             self.getUnexpectedType(), self.expecting
00439             )
00440     __repr__ = __str__
00441 
00442 
00443 ##
00444 # @brief Used for remote debugger deserialization
00445 class MismatchedNotSetException(MismatchedSetException):
00446     
00447     def __str__(self):
00448         return "MismatchedNotSetException(%r!=%r)" % (
00449             self.getUnexpectedType(), self.expecting
00450             )
00451     __repr__ = __str__
00452 
00453 
00454 ##
00455 # @brief Unable to decide which alternative to choose.
00456 class NoViableAltException(RecognitionException):
00457 
00458     def __init__(
00459         self, grammarDecisionDescription, decisionNumber, stateNumber, input
00460         ):
00461         RecognitionException.__init__(self, input)
00462 
00463         self.grammarDecisionDescription = grammarDecisionDescription
00464         self.decisionNumber = decisionNumber
00465         self.stateNumber = stateNumber
00466 
00467 
00468     def __str__(self):
00469         return "NoViableAltException(%r!=[%r])" % (
00470             self.unexpectedType, self.grammarDecisionDescription
00471             )
00472     __repr__ = __str__
00473     
00474 
00475 ##
00476 # @brief The recognizer did not match anything for a (..)+ loop.
00477 class EarlyExitException(RecognitionException):
00478 
00479     def __init__(self, decisionNumber, input):
00480         RecognitionException.__init__(self, input)
00481 
00482         self.decisionNumber = decisionNumber
00483 
00484 
00485 ##
00486 # @brief A semantic predicate failed during validation.
00487 # 
00488 #     Validation of predicates
00489 #     occurs when normally parsing the alternative just like matching a token.
00490 #     Disambiguating predicate evaluation occurs when we hoist a predicate into
00491 #     a prediction decision.
00492 #     
00493 class FailedPredicateException(RecognitionException):
00494 
00495     def __init__(self, input, ruleName, predicateText):
00496         RecognitionException.__init__(self, input)
00497         
00498         self.ruleName = ruleName
00499         self.predicateText = predicateText
00500 
00501 
00502     def __str__(self):
00503         return "FailedPredicateException("+self.ruleName+",{"+self.predicateText+"}?)"
00504     __repr__ = __str__
00505     
00506 
00507 ##
00508 # @brief The next tree node does not match the expected type.
00509 class MismatchedTreeNodeException(RecognitionException):
00510 
00511     def __init__(self, expecting, input):
00512         RecognitionException.__init__(self, input)
00513         
00514         self.expecting = expecting
00515 
00516     def __str__(self):
00517         return "MismatchedTreeNodeException(%r!=%r)" % (
00518             self.getUnexpectedType(), self.expecting
00519             )
00520     __repr__ = __str__
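# A minimal sketch of the idea behind this hierarchy: the exception carries
# data, and the message is built only when the error is reported. The classes,
# the describe() helper and the token name table are hypothetical stand-ins
# for illustration; they are not part of the antlr3 runtime.

```python
class RecognitionError(Exception):
    """Hypothetical stand-in for RecognitionException: data only, no message."""

    def __init__(self, line, column):
        Exception.__init__(self)
        self.line = line
        self.charPositionInLine = column


class MismatchedTokenError(RecognitionError):
    """Hypothetical stand-in for MismatchedTokenException."""

    def __init__(self, expecting, found, line, column):
        RecognitionError.__init__(self, line, column)
        self.expecting = expecting
        self.found = found


def describe(error, token_names):
    """Build the message at reporting time; swap this function to localize."""
    header = "line %d:%d" % (error.line, error.charPositionInLine)
    if isinstance(error, MismatchedTokenError):
        return "%s mismatched input %s, expecting %s" % (
            header, token_names[error.found], token_names[error.expecting])
    return "%s recognition error" % header


names = {4: "ID", 5: "INT"}
try:
    raise MismatchedTokenError(expecting=4, found=5, line=3, column=7)
except RecognitionError as e:
    message = describe(e, names)

print(message)
```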
00521 """ANTLR3 runtime package"""
00522 
00523 
00524 EOF = -1
00525 
00526 ## All tokens go to the parser (unless skip() is called in that rule)
00527 # on a particular "channel".  The parser tunes to a particular channel
00528 # so that whitespace etc... can go to the parser on a "hidden" channel.
00529 DEFAULT_CHANNEL = 0
00530 
00531 ## Anything on a channel other than DEFAULT_CHANNEL is not parsed
00532 # by the parser.
00533 HIDDEN_CHANNEL = 99
00534 
00535 # Predefined token types
00536 EOR_TOKEN_TYPE = 1
00537 
00538 ##
00539 # imaginary tree navigation type; traverse "get child" link
00540 DOWN = 2
00541 ##
00542 # imaginary tree navigation type; finish with a child list
00543 UP = 3
00544 
00545 MIN_TOKEN_TYPE = UP+1
00546         
00547 INVALID_TOKEN_TYPE = 0
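# A minimal sketch of what the channel constants above are for: the parser
# only looks at tokens on one channel, while tokens on other channels (such as
# whitespace) stay in the stream. The (text, channel) pairs and the
# on_channel() helper are hypothetical and only serve as illustration.

```python
DEFAULT_CHANNEL = 0
HIDDEN_CHANNEL = 99

# Hypothetical (text, channel) pairs, as a lexer might emit them for "a = 1".
tokens = [
    ("a", DEFAULT_CHANNEL),
    (" ", HIDDEN_CHANNEL),
    ("=", DEFAULT_CHANNEL),
    (" ", HIDDEN_CHANNEL),
    ("1", DEFAULT_CHANNEL),
]


def on_channel(tokens, channel=DEFAULT_CHANNEL):
    """Yield only the text of tokens that were emitted on the given channel."""
    for text, token_channel in tokens:
        if token_channel == channel:
            yield text


# What a parser tuned to the default channel would see.
visible = list(on_channel(tokens))

# Hidden tokens are still buffered, so the original text can be rebuilt.
full_text = "".join(text for text, _ in tokens)

print(visible)
print(full_text)
```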
00548 
00551 """ANTLR3 runtime package"""
00552 
00553 
00554 from antlr3.constants import EOF, DEFAULT_CHANNEL, INVALID_TOKEN_TYPE
00555 
00556 ############################################################################
00557 #
00558 # basic token interface
00559 #
00560 ############################################################################
00561 
00562 ##
00563 # @brief Abstract token baseclass.
00564 class Token(object):
00565 
00566     ##
00567     # @brief Get the text of the token.
00568     # 
00569     #         Using setter/getter methods is deprecated. Use o.text instead.
00570     #         
00571     def getText(self):
00572         raise NotImplementedError
00573     
00574     ##
00575     # @brief Set the text of the token.
00576     # 
00577     #         Using setter/getter methods is deprecated. Use o.text instead.
00578     #         
00579     def setText(self, text):
00580         raise NotImplementedError
00581 
00582 
00583     ##
00584     # @brief Get the type of the token.
00585     # 
00586     #         Using setter/getter methods is deprecated. Use o.type instead.
00587     def getType(self):
00588 
00589         raise NotImplementedError
00590     
00591     ##
00592     # @brief Set the type of the token.
00593     # 
00594     #         Using setter/getter methods is deprecated. Use o.type instead.
00595     def setType(self, ttype):
00596 
00597         raise NotImplementedError
00598     
00599     
00600     ##
00601     # @brief Get the line number on which this token was matched
00602     # 
00603     #         Lines are numbered 1..n
00604     #         
00605     #         Using setter/getter methods is deprecated. Use o.line instead.
00606     def getLine(self):
00607 
00608         raise NotImplementedError
00609     
00610     ##
00611     # @brief Set the line number on which this token was matched
00612     # 
00613     #         Using setter/getter methods is deprecated. Use o.line instead.
00614     def setLine(self, line):
00615 
00616         raise NotImplementedError
00617     
00618     
00619     ##
00620     # @brief Get the column of the token's first character.
00621     #         
00622     #         Columns are numbered 0..n-1
00623     #         
00624     #         Using setter/getter methods is deprecated. Use o.charPositionInLine instead.
00625     def getCharPositionInLine(self):
00626 
00627         raise NotImplementedError
00628     
00629     ##
00630     # @brief Set the column of the token's first character.
00631     # 
00632     #         Using setter/getter methods is deprecated. Use o.charPositionInLine instead.
00633     def setCharPositionInLine(self, pos):
00634 
00635         raise NotImplementedError
00636     
00637 
00638     ##
00639     # @brief Get the channel of the token
00640     # 
00641     #         Using setter/getter methods is deprecated. Use o.channel instead.
00642     def getChannel(self):
00643 
00644         raise NotImplementedError
00645     
00646     ##
00647     # @brief Set the channel of the token
00648     # 
00649     #         Using setter/getter methods is deprecated. Use o.channel instead.
00650     def setChannel(self, channel):
00651 
00652         raise NotImplementedError
00653     
00654 
00655     ##
00656     # @brief Get the index in the input stream.
00657     # 
00658     #         An index from 0..n-1 of the token object in the input stream.
00659     #         This must be valid in order to use the ANTLRWorks debugger.
00660     #         
00661     #         Using setter/getter methods is deprecated. Use o.index instead.
00662     def getTokenIndex(self):
00663 
00664         raise NotImplementedError
00665     
00666     ##
00667     # @brief Set the index in the input stream.
00668     # 
00669     #         Using setter/getter methods is deprecated. Use o.index instead.
00670     def setTokenIndex(self, index):
00671 
00672         raise NotImplementedError
00673 
00674 
00675     ##
00676     # @brief From what character stream was this token created.
00677     # 
00678     #         You don't have to implement but it's nice to know where a Token
00679     #         comes from if you have include files etc... on the input.
00680     def getInputStream(self):
00681 
00682         raise NotImplementedError
00683 
00684     ##
00685     # @brief Set the character stream from which this token was created.
00686     # 
00687     #         You don't have to implement but it's nice to know where a Token
00688     #         comes from if you have include files etc... on the input.
00689     def setInputStream(self, input):
00690 
00691         raise NotImplementedError
00692 
00693 
00694 ############################################################################
00695 #
00696 # token implementations
00697 #
00698 # Token
00699 # +- CommonToken
00700 # \- ClassicToken
00701 #
00702 ############################################################################
00703 
00704 ##
00705 # @brief Basic token implementation.
00706 # 
00707 #     This implementation does not copy the text from the input stream upon
00708 #     creation, but keeps start/stop pointers into the stream to avoid
00709 #     unnecessary copy operations.
00710 # 
00711 #     
00712 class CommonToken(Token):
00713     
00714     def __init__(self, type=None, channel=DEFAULT_CHANNEL, text=None,
00715                  input=None, start=None, stop=None, oldToken=None):
00716         Token.__init__(self)
00717         
00718         if oldToken is not None:
00719             self.type = oldToken.type
00720             self.line = oldToken.line
00721             self.charPositionInLine = oldToken.charPositionInLine
00722             self.channel = oldToken.channel
00723             self.index = oldToken.index
00724             self._text = oldToken._text
00725             if isinstance(oldToken, CommonToken):
00726                 self.input = oldToken.input
00727                 self.start = oldToken.start
00728                 self.stop = oldToken.stop
00729             
00730         else:
00731             self.type = type
00732             self.input = input
00733             self.charPositionInLine = -1 # set to invalid position
00734             self.line = 0
00735             self.channel = channel
00736             
00737             # What token number is this from 0..n-1 tokens; < 0 implies invalid index
00738             self.index = -1
00739             
00740             # We need to be able to change the text once in a while.  If
00741             # this is non-null, then getText should return this.  Note that
00742             # start/stop are not affected by changing this.
00743             self._text = text
00744 
00745             # The char position into the input buffer where this token starts
00746             self.start = start
00747 
00748             # The char position into the input buffer where this token stops
00749             # This is the index of the last char, *not* the index after it!
00750             self.stop = stop
00751 
00752 
00753     def getText(self):
00754         if self._text is not None:
00755             return self._text
00756 
00757         if self.input is None:
00758             return None
00759         
00760         return self.input.substring(self.start, self.stop)
00761 
00762 
00763     ##
00764     # 
00765     #         Override the text for this token.  getText() will return this text
00766     #         rather than pulling from the buffer.  Note that this does not mean
00767     #         that start/stop indexes are not valid.  It means that the input
00768     #         was converted to a new string in the token object.
00769     #   
00770     def setText(self, text):
00771         self._text = text
00772 
00773     text = property(getText, setText)
00774 
00775 
00776     def getType(self):
00777         return self.type 
00778 
00779     def setType(self, ttype):
00780         self.type = ttype
00781 
00782     
00783     def getLine(self):
00784         return self.line
00785     
00786     def setLine(self, line):
00787         self.line = line
00788 
00789 
00790     def getCharPositionInLine(self):
00791         return self.charPositionInLine
00792     
00793     def setCharPositionInLine(self, pos):
00794         self.charPositionInLine = pos
00795 
00796 
00797     def getChannel(self):
00798         return self.channel
00799     
00800     def setChannel(self, channel):
00801         self.channel = channel
00802     
00803 
00804     def getTokenIndex(self):
00805         return self.index
00806     
00807     def setTokenIndex(self, index):
00808         self.index = index
00809 
00810 
00811     def getInputStream(self):
00812         return self.input
00813 
00814     def setInputStream(self, input):
00815         self.input = input
00816 
00817 
00818     def __str__(self):
00819         if self.type == EOF:
00820             return "<EOF>"
00821 
00822         channelStr = ""
00823         if self.channel > 0:
00824             channelStr = ",channel=" + str(self.channel)
00825 
00826         txt = self.text
00827         if txt is not None:
00828             txt = txt.replace("\n","\\\\n")
00829             txt = txt.replace("\r","\\\\r")
00830             txt = txt.replace("\t","\\\\t")
00831         else:
00832             txt = "<no text>"
00833 
00834         return "[@%d,%d:%d=%r,<%d>%s,%d:%d]" % (
00835             self.index,
00836             self.start, self.stop,
00837             txt,
00838             self.type, channelStr,
00839             self.line, self.charPositionInLine
00840             )
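# A minimal sketch of the start/stop scheme used by CommonToken above: the
# token keeps indexes into the input and slices its text on demand, with stop
# being the index of the last character. StringStream and LazyToken are
# hypothetical stand-ins for illustration, not classes of the antlr3 runtime.

```python
class StringStream(object):
    """Hypothetical character stream with an inclusive substring()."""

    def __init__(self, data):
        self.data = data

    def substring(self, start, stop):
        # stop is the index of the last character, hence the + 1.
        return self.data[start:stop + 1]


class LazyToken(object):
    """Hypothetical token that stores indexes instead of a copy of its text."""

    def __init__(self, input, start, stop):
        self.input = input
        self.start = start
        self.stop = stop
        self._text = None

    def getText(self):
        # An explicitly set text wins over the slice of the input.
        if self._text is not None:
            return self._text
        return self.input.substring(self.start, self.stop)

    def setText(self, text):
        self._text = text

    text = property(getText, setText)


stream = StringStream("width = 42")
token = LazyToken(stream, start=0, stop=4)

original = token.text      # sliced from the stream on demand
token.text = "WIDTH"       # override; start/stop stay untouched

print(original, token.text, token.start, token.stop)
```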
00841     
00842 
00843 ##
00844 # @brief Alternative token implementation.
00845 #     
00846 #     A Token object like we'd use in ANTLR 2.x; has an actual string created
00847 #     and associated with this object.  These objects are needed for imaginary
00848 #     tree nodes that have payload objects.  We need to create a Token object
00849 #     that has a string; the tree node will point at this token.  CommonToken
00850 #     has indexes into a char stream and hence cannot be used to introduce
00851 #     new strings.
00852 #     
00853 class ClassicToken(Token):
00854 
00855     def __init__(self, type=None, text=None, channel=DEFAULT_CHANNEL,
00856                  oldToken=None
00857                  ):
00858         Token.__init__(self)
00859         
00860         if oldToken is not None:
00861             self.text = oldToken.text
00862             self.type = oldToken.type
00863             self.line = oldToken.line
00864             self.charPositionInLine = oldToken.charPositionInLine
00865             self.channel = oldToken.channel
00866         else:
00867             self.text = text
00868             self.type = type
00869             self.line = None
00870             self.charPositionInLine = None
00871             self.channel = channel
00872         self.index = None
00873 
00874 
00875     def getText(self):
00876         return self.text
00877 
00878     def setText(self, text):
00879         self.text = text
00880 
00881 
00882     def getType(self):
00883         return self.type 
00884 
00885     def setType(self, ttype):
00886         self.type = ttype
00887 
00888     
00889     def getLine(self):
00890         return self.line
00891     
00892     def setLine(self, line):
00893         self.line = line
00894 
00895 
00896     def getCharPositionInLine(self):
00897         return self.charPositionInLine
00898     
00899     def setCharPositionInLine(self, pos):
00900         self.charPositionInLine = pos
00901 
00902 
00903     def getChannel(self):
00904         return self.channel
00905     
00906     def setChannel(self, channel):
00907         self.channel = channel
00908     
00909 
00910     def getTokenIndex(self):
00911         return self.index
00912     
00913     def setTokenIndex(self, index):
00914         self.index = index
00915 
00916 
00917     def getInputStream(self):
00918         return None
00919 
00920     def setInputStream(self, input):
00921         pass
00922 
00923 
00924     def toString(self):
00925         channelStr = ""
00926         if self.channel > 0:
00927             channelStr = ",channel=" + str(self.channel)
00928             
00929         txt = self.text
00930         if txt is None:
00931             txt = "<no text>"
00932 
00933         return "[@%r,%r,<%r>%s,%r:%r]" % (self.index,
00934                                           txt,
00935                                           self.type,
00936                                           channelStr,
00937                                           self.line,
00938                                           self.charPositionInLine
00939                                           )
00940     
00941 
00942     __str__ = toString
00943     __repr__ = toString
00944 
00945 
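# A minimal, self-contained sketch of a token that owns its text, in the
# spirit of ClassicToken above. The class name TextToken, the old_token
# parameter and the default channel of 0 are assumptions made for this
# illustration and are not part of the antlr3 package. The sketch also assumes
# that values copied from an existing token should win over the defaults.

```python
class TextToken(object):
    """Hypothetical stand-in for a token that stores its own text."""

    def __init__(self, type=None, text=None, channel=0, old_token=None):
        # Start from the explicit arguments ...
        self.type = type
        self.text = text
        self.channel = channel  # 0 is assumed to be the default channel
        self.line = None
        self.char_position_in_line = None
        self.index = None
        # ... then let a supplied token override them (assumed intent).
        if old_token is not None:
            self.type = old_token.type
            self.text = old_token.text
            self.channel = old_token.channel
            self.line = old_token.line
            self.char_position_in_line = old_token.char_position_in_line

    def __str__(self):
        # Only non-default channels are shown, as in toString() above.
        channel_str = ",channel=%d" % self.channel if self.channel > 0 else ""
        text = "<no text>" if self.text is None else self.text
        return "[@%r,%r,<%r>%s,%r:%r]" % (
            self.index, text, self.type, channel_str,
            self.line, self.char_position_in_line)


# An imaginary node payload: the text is invented, not read from an input stream.
block = TextToken(type=42, text="BLOCK")
copy = TextToken(old_token=block)
print(block)
print("%s %s" % (copy.text, copy.type))
```

# Because the token carries its own string, it needs no char stream behind it,
# which is the property the class comment above asks for.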
00946 
00947 EOF_TOKEN = CommonToken(type=EOF)
00948         
00949 INVALID_TOKEN = CommonToken(type=INVALID_TOKEN_TYPE)
00950 
00951 # In an action, a lexer rule can set token to this SKIP_TOKEN and ANTLR
00952 # will avoid creating a token for this symbol and try to fetch another.
00953 SKIP_TOKEN = CommonToken(type=INVALID_TOKEN_TYPE)
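
# A minimal sketch of how a token-fetching loop might honor a skip sentinel
# such as SKIP_TOKEN. The names SKIP, emit_tokens and drop_blank are
# hypothetical and not part of the antlr3 API. SKIP_TOKEN and INVALID_TOKEN
# above are built with the same token type, so this sketch assumes the
# sentinel has to be recognized by object identity rather than by type.

```python
# Hypothetical sentinel; stands in for SKIP_TOKEN in this sketch.
SKIP = object()


def emit_tokens(words, rule):
    """Yield rule(word) for each word, dropping results that are SKIP."""
    for word in words:
        token = rule(word)
        if token is SKIP:   # identity check, not a type comparison
            continue        # no token for this symbol; fetch another
        yield token


def drop_blank(word):
    """Toy 'lexer rule': whitespace-only input is skipped."""
    return SKIP if word.strip() == "" else word.upper()


print(list(emit_tokens(["a", " ", "b"], drop_blank)))
```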
00954 
00955 
00956 """ANTLR3 runtime package"""
00957 
00958 
00959 import codecs
00960 from StringIO import StringIO
00961 
00962 from antlr3.constants import DEFAULT_CHANNEL, EOF
00963 from antlr3.tokens import Token, EOF_TOKEN
00964 
00965 
00966 ############################################################################
00967 #
00968 # basic interfaces
00969 #   IntStream
00970 #    +- CharStream
00971 #    \- TokenStream
00972 #
00973 # subclasses must implement all methods
00974 #
00975 ############################################################################
00976 
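# The IntStream interface that follows leaves every method abstract. As a
# rough illustration of the documented contract, here is a self-contained,
# list-backed sketch. The class ListIntStream and its attributes are
# hypothetical and not part of the antlr3 package. It assumes markers are
# stack depths (so they are not plain input indexes) and omits size().

```python
class ListIntStream(object):
    """Hypothetical list-backed stream following the IntStream contract."""

    EOF = -1

    def __init__(self, data):
        self.data = list(data)
        self.p = 0          # index of the symbol about to be read
        self.markers = []   # saved positions; marker n lives at markers[n-1]

    def consume(self):
        if self.p < len(self.data):
            self.p += 1

    def LA(self, i):
        if i == 0:
            return 0        # not defined by the contract; assumed placeholder
        if i < 0:
            i += 1          # LA(-1) is the symbol just consumed
        pos = self.p + i - 1
        if pos < 0 or pos >= len(self.data):
            return self.EOF
        return self.data[pos]

    def index(self):
        return self.p

    def seek(self, index):
        self.p = index

    def mark(self):
        self.markers.append(self.p)
        return len(self.markers)

    def release(self, marker=None):
        if marker is None:
            marker = len(self.markers)
        del self.markers[marker - 1:]   # drop this marker and all newer ones

    def rewind(self, marker=None):
        if marker is None:
            # Seek to the last marker's position without popping it.
            self.seek(self.markers[-1])
        else:
            self.seek(self.markers[marker - 1])
            self.release(marker)


s = ListIntStream([10, 20, 30])
m = s.mark()
s.consume()
s.consume()
print((s.LA(1), s.LA(-1), s.index()))
s.rewind(m)
print((s.LA(1), s.LA(-1), s.index()))
```

# In this sketch rewind(marker) is a seek() followed by release(), while
# rewind() without an argument only seeks, so a later rewind(marker) still
# balances the original mark().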
00977 ##
00978 # 
00979 #     @brief Base interface for streams of integer values.
00980 # 
00981 #     A simple stream of integers used when all I care about is the char
00982 #     or token type sequence (such as interpretation).
00983 #     
00984 class IntStream(object):
00985 
00986     def consume(self):
00987         raise NotImplementedError
00988     
00989 
00990     ##
00991     # Get the int at the current input pointer + i ahead, where i=1 is the next int.
00992     # 
00993     #         Negative indexes are allowed.  LA(-1) is the previous token (the token
00994     #         just matched).  LA(-i), where i reaches back before the first token,
00995     #         should yield -1 (invalid char / EOF).
00996     #   
00997     def LA(self, i):
00998         
00999         raise NotImplementedError
01000         
01001 
01002     ##
01003     # 
01004     #         Tell the stream to start buffering if it hasn't already.  Return
01005     #         current input position, index(), or some other marker so that
01006     #         when passed to rewind() you get back to the same spot.
01007     #         rewind(mark()) should not affect the input cursor.  The Lexer
01008     #         tracks line/col info as well as input index so its markers are
01009     #         not pure input indexes.  Same for tree node streams.
01010     #         
01011     def mark(self):
01012 
01013         raise NotImplementedError
01014 
01015 
01016     ##
01017     # 
01018     #         Return the current input symbol index 0..n, where n indicates that the
01019     #         last symbol has been read.  The index is that of the symbol about to
01020     #         be read, not the most recently read symbol.
01021     #         
01022     def index(self):
01023 
01024         raise NotImplementedError
01025 
01026 
01027     ##
01028     # 
01029     #         Reset the stream so that the next call to index() would return marker.
01030     #         The marker will usually be index() but it doesn't have to be.  It's
01031     #         just a marker to indicate what state the stream was in.  This is
01032     #         essentially calling release() and seek().  If there are markers
01033     #         created after this marker argument, this routine must unroll them
01034     #         like a stack.  Assume the state the stream was in when this marker
01035     #         was created.
01036     # 
01037     #         If marker is None:
01038     #         Rewind to the input position of the last marker.
01039     #         Used currently only after a cyclic DFA and just
01040     #         before starting a sem/syn predicate to get the
01041     #         input position back to the start of the decision.
01042     #         Do not "pop" the marker off the state.  mark(i)
01043     #         and rewind(i) should still balance.  It is
01044     #         like invoking rewind(last marker) but it should not "pop"
01045     #         the marker off.  It's like seek(last marker's input position).       
01046     #   
01047     def rewind(self, marker=None):
01048 
01049         raise NotImplementedError
01050 
01051 
01052     ##
01053     # 
01054     #         You may want to commit to a backtrack but don't want to force the
01055     #         stream to keep bookkeeping objects around for a marker that is
01056     #         no longer necessary.  This will have the same behavior as
01057     #         rewind() except it releases resources without the backward seek.
01058     #         This must throw away resources for all markers back to the marker
01059     #         argument.  So if you're nested 5 levels deep in mark() and then call
01060     #         release(2), you have to release resources for depths 2..5.
01061     #   
01062     def release(self, marker=None):
01063 
01064         raise NotImplementedError
01065 
01066 
01067     ##
01068     # 
01069     #         Set the input cursor to the position indicated by index.  This is
01070     #         normally used to seek ahead in the input stream.  No buffering is
01071     #         required to do this unless you know your stream will use seek to
01072     #         move backwards such as when backtracking.
01073     # 
01074     #         This is different from rewind in its multi-directional
01075     #         requirement and in that its argument is strictly an input cursor
01076     #         (index).
01077     # 
01078     #         For char streams, seeking forward must update the stream state such
01079     #         as line number.  When seeking backwards, you will presumably be
01080     #         backtracking using the mark/rewind mechanism that restores state and
01081     #         so this method does not need to update state when seeking backwards.
01082     # 
01083     #         Currently, this method is only used for efficient backtracking using
01084     #         memoization, but in the future it may be used for incremental parsing.
01085     # 
01086     #         The index is 0..n-1.  A seek to position i means that LA(1) will
01087     #         return the ith symbol.  So, seeking to 0 means LA(1) will return the
01088     #         first element in the stream. 
01089     #         
01090     def seek(self, index):
01091 
01092         raise NotImplementedError
01093 
01094 
01095     ##
01096     # 
01097     #         Probably only makes sense for streams that buffer everything up, but
01098     #         might be useful to display the entire stream or for testing.  This
01099     #         value includes a single EOF.
01100     #   
01101     def size(self):
01102 
01103         raise