Building a PL/SQL Code Parser (using PL/SQL), Part 2

From the Database-Geek.

Continuing with the parser, begun last week in Building a PL/SQL Code Parser Part 1, today I am going to modify the code to allow for comments. Rather than dive back into code already covered, I will discuss what I have added and then provide the full code below.

I changed my sample code and test proc to a new format:

declare

  v_string varchar2(32000) :=
  'CREATE OR REPLACE PROCEDURE yada' || chr(10)  ||
      ' AS ' || chr(10) ||
      ' /* This is a comment */  ' || chr(10) ||
     ' BEGIN' || chr(10) ||
     '   DBMS_OUTPUT.PUT_LINE(''string with a space'' ||' || chr(10) ||
     '   ''string with emb''''edd''''''''ed quotes'');' || chr(10) ||
     '   -- second_func(''test'');' || chr(10) ||
     ' END;' ;

begin

  v_string := lrc_plsql_parser.parse_line(v_string);

END;

Notice that in addition to the comments, I have added line feeds. I think I’m ready to modify the code to handle comments.

I already had a type called a_keys and a variable declared using it called v_quotes. To these, I add some new arrays:

TYPE a_keys IS TABLE OF VARCHAR2(32000);

  -- Strings identified as End Of Line
  v_EOL a_keys := a_keys(chr(10), chr(13));

  -- Strings identified as pl/sql quotes
  v_quotes a_keys := a_keys('''');

  -- Strings identified as multi-line comments
  v_MLC a_keys := a_keys('/*');

  -- Strings identified as end of multi-line comments
  v_EMLC a_keys := a_keys('*/');

  -- Strings identified as to-EOL comments
  v_EOLC a_keys := a_keys('--');

  -- All starting comments
  v_start_comment a_keys;

I will use these as lookups just as I will eventually have a look up array for functions, procedures and keywords. The variable v_start_comment is populated in the package body executable section:

BEGIN
  v_start_comment := v_mlc
                 MULTISET UNION DISTINCT
                 v_eolc;

END lrc_plsql_parser;

The code above combines two of my arrays into a new single array with all of the values of the first two.

Next, we jump down to the is_a_break function and add a few more tests (for our new arrays). Like the test for quotes, it is a simple lookup loop followed by a return. I am just showing a single one below to save space. The entire function is in the code at the bottom of this post.

    WHEN p_break_type = 'MLCOMMENT' THEN
      FOR i          IN v_mlc.FIRST..v_mlc.LAST
       LOOP
        IF v_mlc(i) = p_string THEN
          RETURN TRUE;
        END IF;
      END LOOP;

Now we jump back up to the main function, parse_line, and add the code to detect, track and print comments. There is no reason to touch the quotes code. That code is working and the comment code will work without changing existing code. Eventually, I plan to move each of these types into their own helper function. For now, the code will just be sort of ugly.

Obviously, the new code requires some new variables. I am not going to post those here. They are in the full listing below.

The interesting code begins within the main body loop. Remember that loop is walking through a string, one line at a time. The new comment code works much like the quotes code. Check for a string that matches the beginning of a comment, look at the peek variable to see if the next character is one we are expecting, if it is, print out the value and start over.

          CASE
      -- When the current character is a COMMENT
            WHEN is_a_break(v_char || v_char_peek, 'COMMENT') AND
           NOT l_comment
                THEN
        l_comment := TRUE;
        DBMS_OUTPUT.PUT_LINE( 'Starting comment: ' || v_char || v_char_peek );
        IF is_a_break(v_char || v_char_peek, 'MLCOMMENT')
        THEN
          l_mlc := TRUE;
        END IF;
        v_working_data := v_working_data || v_char;

            WHEN is_a_break(v_char || v_char_peek, 'EMLCOMMENT') AND
           l_comment AND
           l_mlc
                THEN
        v_working_data := v_working_data || v_char;
        l_bookem_dano := TRUE;

     WHEN l_bookem_dano
     THEN
        v_working_data := v_working_data || v_char;
        DBMS_OUTPUT.PUT_LINE('Comment: ' || v_working_data );
        l_comment := FALSE;
        l_mlc := FALSE;
        l_bookem_dano := FALSE;
        v_working_data := NULL;

      -- When the current character is EOL and a comment and not multi line
            WHEN is_a_break(v_char, 'EOL') AND
           l_comment AND
           NOT l_mlc
                THEN
        DBMS_OUTPUT.PUT_LINE('Comment: ' || v_working_data );
        l_comment := FALSE;
        v_working_data := NULL;

The code that we are adding here will look strange when printing. We are not resetting after printing keywords and functions so we have extra text at the beginning of the comments. That is expected and will be fixed in a future update. Right now, we are just concerned that comments are printing and we are breaking at the end of comments.

When run, the output looks like:

Input: CREATE OR REPLACE PROCEDURE YADA
 AS
 /* THIS IS A COMMENT */
 BEGIN
   DBMS_OUTPUT.PUT_LINE('STRING WITH A SPACE' ||
   'STRING WITH EMB''EDD''''ED QUOTES');
   -- SECOND_FUNC('TEST');
 END;
Starting comment: /*
Comment: CREATE OR REPLACE PROCEDURE YADA
 AS
 /* THIS IS A COMMENT */
Data: 'STRING WITH A SPACE''
Data: 'STRING WITH EMB''''EDD''''''''ED QUOTES''
Starting comment: --
Comment: );
   -- SECOND_FUNC('TEST');

As I said, this doesn’t look very good as of yet due to the extra text in the output. It is the expected result though. In the next update, I will add keyword breaks and it should start looking a lot better.

The full listing is below.

create or replace PACKAGE BODY lrc_plsql_parser
AS

  TYPE a_keys IS TABLE OF VARCHAR2(32000);

  -- Strings identified as End Of Line
  v_EOL a_keys := a_keys(chr(10), chr(13));

  -- Strings identified as pl/sql quotes
  v_quotes a_keys := a_keys('''');

  -- Strings identified as multi-line comments
  v_MLC a_keys := a_keys('/*');

  -- Strings identified as end of multi-line comments
  v_EMLC a_keys := a_keys('*/');

  -- Strings identified as to-EOL comments
  v_EOLC a_keys := a_keys('--');

  -- All starting comments
  v_start_comment a_keys;

  -- Function to test for break conditions
  FUNCTION is_a_break
  (
    p_string     IN VARCHAR2,
    p_break_type IN VARCHAR2 )
  RETURN BOOLEAN;

  -- Main line parers
  FUNCTION parse_line
  (
    p_string IN VARCHAR2 )
  RETURN VARCHAR2
  IS
    -- Dummy return value, not currently used
    v_string VARCHAR2(32000);
    -- Working string, holds intermediate results
    v_working_data VARCHAR2(32000);
    -- Easier to work on upper case
    v_upper_p      VARCHAR2(32000) := UPPER(p_string) || ' ';
    -- Current character in buffer
    v_char         VARCHAR2(1);
    -- Next character in buffer
    v_char_peek    VARCHAR2(1);
    -- Are we currently working on a string
    l_quotes    BOOLEAN := FALSE;
    -- Are we currently working on a string
    l_comment    BOOLEAN := FALSE;
    -- Are we currently working on a string
    l_mlc    BOOLEAN := FALSE;
    -- How many quote chars have been found
    l_quote_count INTEGER := 0;
    l_bookem_dano BOOLEAN := FALSE;
  BEGIN

    -- Display the output
    DBMS_OUTPUT.PUT_LINE( 'Input: ' || v_upper_p );

    -- Loop through the string line by line
    FOR i IN 1..length(v_upper_p)
    LOOP
      v_char      := SUBSTR(v_upper_p, i, 1);
      v_char_peek := SUBSTR(v_upper_p, i+1, 1);

          CASE
      -- When the current character is a COMMENT
            WHEN is_a_break(v_char || v_char_peek, 'COMMENT') AND
           NOT l_comment
                THEN
        l_comment := TRUE;
        DBMS_OUTPUT.PUT_LINE( 'Starting comment: ' || v_char || v_char_peek );
        IF is_a_break(v_char || v_char_peek, 'MLCOMMENT')
        THEN
          l_mlc := TRUE;
        END IF;
        v_working_data := v_working_data || v_char;

            WHEN is_a_break(v_char || v_char_peek, 'EMLCOMMENT') AND
           l_comment AND
           l_mlc
                THEN
        v_working_data := v_working_data || v_char;
        l_bookem_dano := TRUE;

     WHEN l_bookem_dano
     THEN
        v_working_data := v_working_data || v_char;
        DBMS_OUTPUT.PUT_LINE('Comment: ' || v_working_data );
        l_comment := FALSE;
        l_mlc := FALSE;
        l_bookem_dano := FALSE;
        v_working_data := NULL;

      -- When the current character is EOL and a comment and not multi line
            WHEN is_a_break(v_char, 'EOL') AND
           l_comment AND
           NOT l_mlc
                THEN
        DBMS_OUTPUT.PUT_LINE('Comment: ' || v_working_data );
        l_comment := FALSE;
        v_working_data := NULL;

      -- When the current character is a quote
            WHEN is_a_break(v_char, 'QUOTE') AND
           NOT l_comment
                THEN
         v_working_data := v_working_data || v_char;

        CASE
        -- If we are in the midst of a string and the next char is also a quote
              WHEN l_quotes AND
             is_a_break(v_char_peek, 'QUOTE')
        THEN
          -- Continue getting data
          v_working_data := v_working_data || v_char;
          -- Undo the nested quote count
          l_quote_count := l_quote_count - 1;

        -- If we are in the midst of a string and the next char is NOT a quote
        -- AND the number of quotes so far is not even (this one would make it even)
            WHEN l_quotes AND
           NOT is_a_break(v_char_peek, 'QUOTE')
           AND mod(l_quote_count,2) != 0
        THEN

          -- Continue getting data
          v_working_data := v_working_data || v_char;
          -- Write out a string
          DBMS_OUTPUT.PUT_LINE('Data: ' || v_working_data );
          -- Clear the working buffer
          v_working_data := NULL;
          -- Turn off quote mode
          l_quotes := FALSE;
          -- Reset quote count
          l_quote_count := 0;

        -- If we are in the midst of a string and the next char is NOT a quote
        -- AND the number of quotes so far is even (this one would make it odd)
              WHEN l_quotes AND
           NOT is_a_break(v_char_peek, 'QUOTE')
           AND mod(l_quote_count,2) = 0
        THEN

          -- Undo the nested quote count
          l_quote_count := l_quote_count - 1;
          -- Continue getting data
          v_working_data := v_working_data || v_char;
        WHEN NOT l_quotes AND
             NOT l_comment
        THEN

          -- Increment quote count
          l_quote_count := l_quote_count + 1;
          -- Start up quote mode
          l_quotes := TRUE;
          -- Start the working string
          v_working_data := v_char;

        ELSE

          -- Continue getting data
          v_working_data := v_working_data || v_char;
        END CASE;

      ELSE  -- Default case, most visited piece of code
        -- Continue getting data
        v_working_data := v_working_data || v_char;

      END CASE;

    END LOOP;

    RETURN v_string;

  END;

  FUNCTION is_a_break
  (
    p_string     IN VARCHAR2,
    p_break_type IN VARCHAR2 )
  RETURN BOOLEAN
  AS
  BEGIN
   CASE
    WHEN p_break_type = 'COMMENT' THEN
      FOR i          IN v_start_comment.FIRST..v_start_comment.LAST
       LOOP

       --DBMS_OUTPUT.put_line(v_start_comment(i));
        IF v_start_comment(i) = p_string THEN
          RETURN TRUE;
        END IF;
      END LOOP;
    WHEN p_break_type = 'MLCOMMENT' THEN
      FOR i          IN v_mlc.FIRST..v_mlc.LAST
       LOOP
        IF v_mlc(i) = p_string THEN
          RETURN TRUE;
        END IF;
      END LOOP;
    WHEN p_break_type = 'EMLCOMMENT' THEN
      FOR i          IN v_emlc.FIRST..v_emlc.LAST
       LOOP
        IF v_emlc(i) = p_string THEN
          RETURN TRUE;
        END IF;
      END LOOP;
    WHEN p_break_type = 'QUOTE' THEN
      FOR i          IN v_quotes.FIRST..v_quotes.LAST
       LOOP
        IF v_quotes(i) = p_string THEN
          RETURN TRUE;
        END IF;
      END LOOP;
    WHEN p_break_type = 'EOL' THEN
      FOR i          IN v_EOL.FIRST..v_EOL.LAST
       LOOP
        IF v_EOL(i) = p_string THEN
          RETURN TRUE;
        END IF;
      END LOOP;
    ELSE
      NULL;
    END CASE;
    RETURN FALSE;
  END is_a_break;

BEGIN
  v_start_comment := v_mlc
                 MULTISET UNION DISTINCT
                 v_eolc;

END lrc_plsql_parser;

Thanks,

LewisC

Technorati : , , , ,

You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

Comments are closed.