Building a PL/SQL Code Parser (using PL/SQL), Part 2
Continuing with the parser, begun last week in Building a PL/SQL Code Parser Part 1, today I am going to modify the code to allow for comments. Rather than dive back into code already covered, I will discuss what I have added and then provide the full code below.
I changed my sample code and test proc to a new format:
declare
v_string varchar2(32000) :=
'CREATE OR REPLACE PROCEDURE yada' || chr(10) ||
' AS ' || chr(10) ||
' /* This is a comment */ ' || chr(10) ||
' BEGIN' || chr(10) ||
' DBMS_OUTPUT.PUT_LINE(''string with a space'' ||' || chr(10) ||
' ''string with emb''''edd''''''''ed quotes'');' || chr(10) ||
' -- second_func(''test'');' || chr(10) ||
' END;' ;
begin
v_string := lrc_plsql_parser.parse_line(v_string);
END;
Notice that in addition to the comments, I have added line feeds. I think I’m ready to modify the code to handle comments.
I already had a type called a_keys and a variable declared using it called v_quotes. To these, I add some new arrays:
TYPE a_keys IS TABLE OF VARCHAR2(32000);
-- Strings identified as End Of Line
v_EOL a_keys := a_keys(chr(10), chr(13));
-- Strings identified as pl/sql quotes
v_quotes a_keys := a_keys('''');
-- Strings identified as multi-line comments
v_MLC a_keys := a_keys('/*');
-- Strings identified as end of multi-line comments
v_EMLC a_keys := a_keys('*/');
-- Strings identified as to-EOL comments
v_EOLC a_keys := a_keys('--');
-- All starting comments
v_start_comment a_keys;
I will use these as lookups just as I will eventually have a look up array for functions, procedures and keywords. The variable v_start_comment is populated in the package body executable section:
BEGIN
v_start_comment := v_mlc
MULTISET UNION DISTINCT
v_eolc;
END lrc_plsql_parser;
The code above combines two of my arrays into a new single array with all of the values of the first two.
Next, we jump down to the is_a_break function and add a few more tests (for our new arrays). Like the test for quotes, it is a simple lookup loop followed by a return. I am just showing a single one below to save space. The entire function is in the code at the bottom of this post.
WHEN p_break_type = 'MLCOMMENT' THEN
FOR i IN v_mlc.FIRST..v_mlc.LAST
LOOP
IF v_mlc(i) = p_string THEN
RETURN TRUE;
END IF;
END LOOP;
Now we jump back up to the main function, parse_line, and add the code to detect, track and print comments. There is no reason to touch the quotes code. That code is working and the comment code will work without changing existing code. Eventually, I plan to move each of these types into their own helper function. For now, the code will just be sort of ugly.
Obviously, the new code requires some new variables. I am not going to post those here. They are in the full listing below.
The interesting code begins within the main body loop. Remember that loop is walking through a string, one line at a time. The new comment code works much like the quotes code. Check for a string that matches the beginning of a comment, look at the peek variable to see if the next character is one we are expecting, if it is, print out the value and start over.
CASE
-- When the current character is a COMMENT
WHEN is_a_break(v_char || v_char_peek, 'COMMENT') AND
NOT l_comment
THEN
l_comment := TRUE;
DBMS_OUTPUT.PUT_LINE( 'Starting comment: ' || v_char || v_char_peek );
IF is_a_break(v_char || v_char_peek, 'MLCOMMENT')
THEN
l_mlc := TRUE;
END IF;
v_working_data := v_working_data || v_char;
WHEN is_a_break(v_char || v_char_peek, 'EMLCOMMENT') AND
l_comment AND
l_mlc
THEN
v_working_data := v_working_data || v_char;
l_bookem_dano := TRUE;
WHEN l_bookem_dano
THEN
v_working_data := v_working_data || v_char;
DBMS_OUTPUT.PUT_LINE('Comment: ' || v_working_data );
l_comment := FALSE;
l_mlc := FALSE;
l_bookem_dano := FALSE;
v_working_data := NULL;
-- When the current character is EOL and a comment and not multi line
WHEN is_a_break(v_char, 'EOL') AND
l_comment AND
NOT l_mlc
THEN
DBMS_OUTPUT.PUT_LINE('Comment: ' || v_working_data );
l_comment := FALSE;
v_working_data := NULL;
The code that we are adding here will look strange when printing. We are not resetting after printing keywords and functions so we have extra text at the beginning of the comments. That is expected and will be fixed in a future update. Right now, we are just concerned that comments are printing and we are breaking at the end of comments.
When run, the output looks like:
Input: CREATE OR REPLACE PROCEDURE YADA
AS
/* THIS IS A COMMENT */
BEGIN
DBMS_OUTPUT.PUT_LINE('STRING WITH A SPACE' ||
'STRING WITH EMB''EDD''''ED QUOTES');
-- SECOND_FUNC('TEST');
END;
Starting comment: /*
Comment: CREATE OR REPLACE PROCEDURE YADA
AS
/* THIS IS A COMMENT */
Data: 'STRING WITH A SPACE''
Data: 'STRING WITH EMB''''EDD''''''''ED QUOTES''
Starting comment: --
Comment: );
-- SECOND_FUNC('TEST');
As I said, this doesn’t look very good as of yet due to the extra text in the output. It is the expected result though. In the next update, I will add keyword breaks and it should start looking a lot better.
The full listing is below.
create or replace PACKAGE BODY lrc_plsql_parser
AS
TYPE a_keys IS TABLE OF VARCHAR2(32000);
-- Strings identified as End Of Line
v_EOL a_keys := a_keys(chr(10), chr(13));
-- Strings identified as pl/sql quotes
v_quotes a_keys := a_keys('''');
-- Strings identified as multi-line comments
v_MLC a_keys := a_keys('/*');
-- Strings identified as end of multi-line comments
v_EMLC a_keys := a_keys('*/');
-- Strings identified as to-EOL comments
v_EOLC a_keys := a_keys('--');
-- All starting comments
v_start_comment a_keys;
-- Function to test for break conditions
FUNCTION is_a_break
(
p_string IN VARCHAR2,
p_break_type IN VARCHAR2 )
RETURN BOOLEAN;
-- Main line parers
FUNCTION parse_line
(
p_string IN VARCHAR2 )
RETURN VARCHAR2
IS
-- Dummy return value, not currently used
v_string VARCHAR2(32000);
-- Working string, holds intermediate results
v_working_data VARCHAR2(32000);
-- Easier to work on upper case
v_upper_p VARCHAR2(32000) := UPPER(p_string) || ' ';
-- Current character in buffer
v_char VARCHAR2(1);
-- Next character in buffer
v_char_peek VARCHAR2(1);
-- Are we currently working on a string
l_quotes BOOLEAN := FALSE;
-- Are we currently working on a string
l_comment BOOLEAN := FALSE;
-- Are we currently working on a string
l_mlc BOOLEAN := FALSE;
-- How many quote chars have been found
l_quote_count INTEGER := 0;
l_bookem_dano BOOLEAN := FALSE;
BEGIN
-- Display the output
DBMS_OUTPUT.PUT_LINE( 'Input: ' || v_upper_p );
-- Loop through the string line by line
FOR i IN 1..length(v_upper_p)
LOOP
v_char := SUBSTR(v_upper_p, i, 1);
v_char_peek := SUBSTR(v_upper_p, i+1, 1);
CASE
-- When the current character is a COMMENT
WHEN is_a_break(v_char || v_char_peek, 'COMMENT') AND
NOT l_comment
THEN
l_comment := TRUE;
DBMS_OUTPUT.PUT_LINE( 'Starting comment: ' || v_char || v_char_peek );
IF is_a_break(v_char || v_char_peek, 'MLCOMMENT')
THEN
l_mlc := TRUE;
END IF;
v_working_data := v_working_data || v_char;
WHEN is_a_break(v_char || v_char_peek, 'EMLCOMMENT') AND
l_comment AND
l_mlc
THEN
v_working_data := v_working_data || v_char;
l_bookem_dano := TRUE;
WHEN l_bookem_dano
THEN
v_working_data := v_working_data || v_char;
DBMS_OUTPUT.PUT_LINE('Comment: ' || v_working_data );
l_comment := FALSE;
l_mlc := FALSE;
l_bookem_dano := FALSE;
v_working_data := NULL;
-- When the current character is EOL and a comment and not multi line
WHEN is_a_break(v_char, 'EOL') AND
l_comment AND
NOT l_mlc
THEN
DBMS_OUTPUT.PUT_LINE('Comment: ' || v_working_data );
l_comment := FALSE;
v_working_data := NULL;
-- When the current character is a quote
WHEN is_a_break(v_char, 'QUOTE') AND
NOT l_comment
THEN
v_working_data := v_working_data || v_char;
CASE
-- If we are in the midst of a string and the next char is also a quote
WHEN l_quotes AND
is_a_break(v_char_peek, 'QUOTE')
THEN
-- Continue getting data
v_working_data := v_working_data || v_char;
-- Undo the nested quote count
l_quote_count := l_quote_count - 1;
-- If we are in the midst of a string and the next char is NOT a quote
-- AND the number of quotes so far is not even (this one would make it even)
WHEN l_quotes AND
NOT is_a_break(v_char_peek, 'QUOTE')
AND mod(l_quote_count,2) != 0
THEN
-- Continue getting data
v_working_data := v_working_data || v_char;
-- Write out a string
DBMS_OUTPUT.PUT_LINE('Data: ' || v_working_data );
-- Clear the working buffer
v_working_data := NULL;
-- Turn off quote mode
l_quotes := FALSE;
-- Reset quote count
l_quote_count := 0;
-- If we are in the midst of a string and the next char is NOT a quote
-- AND the number of quotes so far is even (this one would make it odd)
WHEN l_quotes AND
NOT is_a_break(v_char_peek, 'QUOTE')
AND mod(l_quote_count,2) = 0
THEN
-- Undo the nested quote count
l_quote_count := l_quote_count - 1;
-- Continue getting data
v_working_data := v_working_data || v_char;
WHEN NOT l_quotes AND
NOT l_comment
THEN
-- Increment quote count
l_quote_count := l_quote_count + 1;
-- Start up quote mode
l_quotes := TRUE;
-- Start the working string
v_working_data := v_char;
ELSE
-- Continue getting data
v_working_data := v_working_data || v_char;
END CASE;
ELSE -- Default case, most visited piece of code
-- Continue getting data
v_working_data := v_working_data || v_char;
END CASE;
END LOOP;
RETURN v_string;
END;
FUNCTION is_a_break
(
p_string IN VARCHAR2,
p_break_type IN VARCHAR2 )
RETURN BOOLEAN
AS
BEGIN
CASE
WHEN p_break_type = 'COMMENT' THEN
FOR i IN v_start_comment.FIRST..v_start_comment.LAST
LOOP
--DBMS_OUTPUT.put_line(v_start_comment(i));
IF v_start_comment(i) = p_string THEN
RETURN TRUE;
END IF;
END LOOP;
WHEN p_break_type = 'MLCOMMENT' THEN
FOR i IN v_mlc.FIRST..v_mlc.LAST
LOOP
IF v_mlc(i) = p_string THEN
RETURN TRUE;
END IF;
END LOOP;
WHEN p_break_type = 'EMLCOMMENT' THEN
FOR i IN v_emlc.FIRST..v_emlc.LAST
LOOP
IF v_emlc(i) = p_string THEN
RETURN TRUE;
END IF;
END LOOP;
WHEN p_break_type = 'QUOTE' THEN
FOR i IN v_quotes.FIRST..v_quotes.LAST
LOOP
IF v_quotes(i) = p_string THEN
RETURN TRUE;
END IF;
END LOOP;
WHEN p_break_type = 'EOL' THEN
FOR i IN v_EOL.FIRST..v_EOL.LAST
LOOP
IF v_EOL(i) = p_string THEN
RETURN TRUE;
END IF;
END LOOP;
ELSE
NULL;
END CASE;
RETURN FALSE;
END is_a_break;
BEGIN
v_start_comment := v_mlc
MULTISET UNION DISTINCT
v_eolc;
END lrc_plsql_parser;
Thanks,
LewisC
You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.



