Building a PL/SQL Code Parser (using PL/SQL), Part 1
LewisC’s An Expert’s Guide to Oracle Technology
Is it possible to build a PL/SQL parser using nothing but PL/SQL? To answer that question, I guess I need to define “PL/SQL parser”. What would be the intention of this parser?
I would like a way to parse a PL/SQL code block and let me determine some statistics: number of lines, number of keywords, number of DML statements. It would need to recognize comments. I also want to be able to format the code for pretty printing. I want to be able to apply user defined checks for adherence to coding standards. Finally, I want to be able to pull the code from the data dictionary.
Possible? I think so. There are languages that would make this an easier task but I want to use PL/SQL. In addition to being a useful tool, I think a parser like this would make a fun project. So that’s what I’m going to do.
Some Assumptions
PL/SQL is a HUGE language. The number of keywords and all of the possible combinations are daunting. Throw in SQL (which is huge in it’s own right) and a parser is just short of nightmarish. Fun, huh!
Like any problem, if we think about how hard it is, we’ll never get started. So, I am going to reduce the task a bit and give myself a starting point. I’m going to give myself some base rules to begin with. I can add to these rules later.
PL/SQL code is made up of four components Keywords, Operators, Comments and Data. Each component is separated by a delimiter. Data may be literals and variables but data may also be non-keyword function and procedure calls. Keywords are any command or instruction that is part of the language.
To make it easy to get started, we are going to say that anything surrounded by single quotes is user data. There are other sting delimiters but I am going to ignore those for now. Also, inside of a quoted string, all keywords, comments and operators are ignored. So, ‘lalala BEGIN lalala’ = DATA. ‘lalala’ BEGIN ‘lalala’ = DATA KEYWORD DATA.
Comments begin with — or /*. // ends with the EOL. /* ends with */. Everything inside of a comment is ignored.
We assume that the code is valid and compiled. We will pull the code from the data dictionary rather than from a source file.
We assume that the only keywords that we need to worry about are these: ‘BEGIN’, ‘PROCEDURE’, ‘FUNCTION’, ‘DECLARE’, ‘SELECT’, ‘FROM’, ‘TYPE’, ‘IS’, ‘TABLE’, ‘OF’, ‘VARCHAR2′, ‘AS’, ‘PACKAGE’, ‘BODY’, ‘LOOP’, ‘END, ‘RETURN’, ‘IN’, ‘FOR’, ‘..’, ‘%’, ‘END’, ‘EXCEPTION’, ‘WHEN’, ‘IF’, ‘THEN’, ‘%TYPE’.
The code:
Now I begin my coding. I’m going to use multiple procedures and functions so I will create a package. I’ll start with a shell package and a simple line parser. To start with, I am just going to write a parser that will extract quoted strings. That will prove the concept and give me a starting place for my package.
Amazon uses a now famous approach to developing software, start at the customer and work backwards. I really like that approach. I will start with a sample procedure and write a parser that will parse that.
The sample code:
CREATE OR REPLACE PROCEDURE yada
AS
BEGIN
DBMS_OUTPUT.PUT_LINE('string with a space' || 'string with emb''edd''''ed quotes');
END;
Very simple. It is a dbms_output of two strings concatenated together.
So, my first pass of the parser will just extract the two strings that exist in the procedure. I plan to populate a record type eventually so I will make my line parser return a value. Since I have not defined my record type, I will just return a varchar2 for now. The function will accept a string and parse it into its components.
Eventually, this code will pull a procedure from the data dictionary. For now, I will just manually pass in the code I want parsed.
Here is the package spec:
create or replace PACKAGE lrc_plsql_parser
AS
FUNCTION parse_line
(
p_string IN VARCHAR2 )
RETURN VARCHAR2;
END lrc_plsql_parser;
And here is the body:
create or replace PACKAGE BODY lrc_plsql_parser
AS
TYPE a_keys IS TABLE OF VARCHAR2(32000);
-- Strings identified as pl/sql quotes
v_quotes a_keys := a_keys('''');
-- Function to test for break conditions
FUNCTION is_a_break
(
p_string IN VARCHAR2,
p_break_type IN VARCHAR2 )
RETURN BOOLEAN;
-- Main line parers
FUNCTION parse_line
(
p_string IN VARCHAR2 )
RETURN VARCHAR2
IS
-- Dummy return value, not currently used
v_string VARCHAR2(32000);
-- Working string, holds intermediate results
v_working_data VARCHAR2(32000);
-- Easier to work on upper case
v_upper_p VARCHAR2(32000) := UPPER(p_string) || ' ';
-- Current character in buffer
v_char VARCHAR2(1);
-- Next character in buffer
v_char_peek VARCHAR2(1);
-- Are we currently working on a string
l_quotes BOOLEAN := FALSE;
-- How many quote chars have been found
l_quote_count INTEGER := 0;
BEGIN
-- Display the output
DBMS_OUTPUT.PUT_LINE( 'Input: ' || v_upper_p );
-- Loop through the string line by line
FOR i IN 1..length(v_upper_p)
LOOP
v_char := SUBSTR(v_upper_p, i, 1);
v_char_peek := SUBSTR(v_upper_p, i+1, 1);
CASE
-- When the current character is a quote
WHEN is_a_break(v_char, 'QUOTE')
THEN
CASE
-- If we are in the midst of a string and the next char is also a quote
WHEN l_quotes AND
is_a_break(v_char_peek, 'QUOTE')
THEN
-- Continue getting data
v_working_data := v_working_data || v_char;
-- Undo the nested quote count
l_quote_count := l_quote_count - 1;
-- If we are in the midst of a string and the next char is NOT a quote
-- AND the number of quotes so far is not even (this one would make it even)
WHEN l_quotes AND
NOT is_a_break(v_char_peek, 'QUOTE')
AND mod(l_quote_count,2) != 0
THEN
-- Continue getting data
v_working_data := v_working_data || v_char;
-- Write out a string
DBMS_OUTPUT.PUT_LINE('Data: ' || v_working_data );
-- Clear the working buffer
v_working_data := NULL;
-- Turn off quote mode
l_quotes := FALSE;
-- Reset quote count
l_quote_count := 0;
-- If we are in the midst of a string and the next char is NOT a quote
-- AND the number of quotes so far is even (this one would make it odd)
WHEN l_quotes AND
NOT is_a_break(v_char_peek, 'QUOTE')
AND mod(l_quote_count,2) = 0
THEN
-- Undo the nested quote count
l_quote_count := l_quote_count - 1;
-- Continue getting data
v_working_data := v_working_data || v_char;
WHEN NOT l_quotes
THEN
-- Increment quote count
l_quote_count := l_quote_count + 1;
-- Start up quote mode
l_quotes := TRUE;
-- Start the working string
v_working_data := v_char;
ELSE
-- Continue getting data
v_working_data := v_working_data || v_char;
END CASE;
ELSE -- Default case, most visited piece of code
-- Continue getting data
v_working_data := v_working_data || v_char;
END CASE;
END LOOP;
RETURN v_string;
END;
FUNCTION is_a_break
(
p_string IN VARCHAR2,
p_break_type IN VARCHAR2 )
RETURN BOOLEAN
AS
BEGIN
CASE
WHEN p_break_type = 'QUOTE' THEN
FOR i IN v_quotes.FIRST..v_quotes.LAST
LOOP
IF v_quotes(i) = p_string THEN
RETURN TRUE;
END IF;
END LOOP;
ELSE
NULL;
END CASE;
RETURN FALSE;
END is_a_break;
END lrc_plsql_parser;
The code is not that complex (yet). Remember this is just to extract the strings.
I have a simple anonymous block to test the procedure (quotes are doubled on purpose):
SET SERVEROUTPUT ON SIZE UNLIMITED
DECLARE
v_string VARCHAR2(32000);
BEGIN
v_string :=
lrc_plsql_parser.parse_line(
'CREATE PROCEDURE yada AS BEGIN DBMS_OUTPUT.PUT_LINE(' ||
' ''string with a space''|| ''string with ' ||
' emb''''edd''''''''ed quotes''); ' ||
'END;');
END;
The results look like this:
Input: CREATE PROCEDURE YADA AS BEGIN DBMS_OUTPUT.PUT_LINE('STRING WITH A
SPACE'|| 'STRING WITH EMB''EDD''''ED QUOTES'); END;
Data: 'STRING WITH A SPACE'
Data: 'STRING WITH EMB''EDD''''ED QUOTES'
PL/SQL procedure successfully completed.
SQL>
Not bad for a couple of hours on a saturday. Next will be adding comment extraction and rearranging the code to make it a bit cleaner.
LewisC
You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.




[...] with the parser, begun last week in Building a PL/SQL Code Parser Part 1, today I am going to modify the code to allow for comments. Rather than dive back into code already [...]
[...] with the parser, begun week in PL/SQL Parser Part 1 and PL/SQL Parser Part 2, today I am going to modify the code to account for keywords, operators [...]