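Pascal Syntax Analyzer: A Simple Implementation
Contents
I. What Is Implemented
II. Preparation
1. Lexical analyzer
2. Top-down parsing (theory)
III. Code Excerpts
IV. Test Screenshots
Preface
This post is just my own study notes, nothing particularly advanced, and some time has passed since I finished the project (I forgot to write it up, oops). Suggestions and corrections are very welcome.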
I. What Is Implemented
Design and implementation of a lexical analyzer for a subset of PASCAL (PL/0)
BNF description of PL/0 (extended Backus-Naur form)
<prog> → program <id>;<block>
<block> → [<condecl>][<vardecl>][<proc>]<body>
<condecl> → const <const>{,<const>};
<const> → <id>:=<integer>
<vardecl> → var <id>{,<id>};
<proc> → procedure <id>([<id>{,<id>}]);<block>{;<proc>}
<body> → begin <statement>{;<statement>}end
<statement> → <id> := <exp>
|if <lexp> then <statement>[else <statement>]
|while <lexp> do <statement>
|call <id>([<exp>{,<exp>}])
|<body>
|read (<id>{,<id>})
|write (<exp>{,<exp>})
<lexp> → <exp> <lop> <exp>|odd <exp>
<exp> → [+|-]<term>{<aop><term>}
<term> → <factor>{<mop><factor>}
<factor> → <id>|<integer>|(<exp>)
<lop> → =|<>|<|<=|>|>=
<aop> → +|-
<mop> → *|/
<id> → l{l|d} (note: l denotes a letter, d a digit)
<integer> → d{d}
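Notes:
<prog>: program; <block>: block (program body); <condecl>: constant declaration; <const>: constant;
<vardecl>: variable declaration; <proc>: sub-procedure; <body>: compound statement; <statement>: statement;
<exp>: expression; <lexp>: condition; <term>: term; <factor>: factor; <aop>: additive operator;
<mop>: multiplicative operator; <lop>: relational operator
odd: tests the parity of an expression.
Requirements:
Implement a lexical analyzer for PL/0 using the loop-and-branch method. The analyzer must read a source program written in PL/0, write the sequence of word symbols and their attributes to an intermediate file, and provide some error handling by reporting lexical errors
(each error must be reported with its line and column)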
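II. Preparation
1. Lexical analyzer
A syntax analyzer needs a lexical analyzer first, to split a block of source code into word symbols, each with its own category code.
Lexical Analyzer: A Simple Implementation
2. Top-down parsing (theory)
Because the assignment requires a loop-and-branch implementation, some knowledge of top-down parsing is needed (if the optimizations feel like too much trouble, they can be skipped).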
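As an illustration of the top-down idea, here is a minimal sketch of recursive descent for the three expression rules of the grammar (<exp>, <term>, <factor>). Everything in it is hypothetical and not taken from the original program: the ExpParser struct, Peek(), and ParsesAsExp() are names made up for this example. It also assumes the tokens were already produced by a lexer, so an identifier is recognized only by its first character, and it counts errors instead of printing them.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mini parser for <exp>, <term> and <factor> only.
struct ExpParser {
    std::vector<std::string> words;  // token stream from a lexer
    std::size_t wp = 0;              // index of the current token
    int errors = 0;                  // number of syntax errors seen

    // Current token, or "" once the input is exhausted.
    const std::string& Peek() const {
        static const std::string kEnd;
        return wp < words.size() ? words[wp] : kEnd;
    }
    // Assumption: tokens are well formed, so the first character decides.
    bool IsId() const {
        const std::string& w = Peek();
        return !w.empty() && std::isalpha(static_cast<unsigned char>(w[0]));
    }
    bool IsInteger() const {
        const std::string& w = Peek();
        return !w.empty() && std::isdigit(static_cast<unsigned char>(w[0]));
    }

    // <factor> -> <id> | <integer> | ( <exp> )
    void Factor() {
        if (IsId() || IsInteger()) {
            ++wp;
        } else if (Peek() == "(") {
            ++wp;
            Exp();
            if (Peek() == ")") ++wp; else ++errors;  // missing ")"
        } else {
            ++errors;  // no token that can start a factor
        }
    }
    // <term> -> <factor> { <mop> <factor> }
    void Term() {
        Factor();
        while (Peek() == "*" || Peek() == "/") { ++wp; Factor(); }
    }
    // <exp> -> [+|-] <term> { <aop> <term> }
    void Exp() {
        if (Peek() == "+" || Peek() == "-") ++wp;
        Term();
        while (Peek() == "+" || Peek() == "-") { ++wp; Term(); }
    }
};

// Returns true when the whole token list is one well-formed <exp>.
bool ParsesAsExp(const std::vector<std::string>& tokens) {
    ExpParser p;
    p.words = tokens;
    p.Exp();
    return p.errors == 0 && p.wp == p.words.size();
}
```

Each grammar rule becomes one function: optional parts [ ] turn into an if, repeated parts { } turn into a while, and a nonterminal on the right-hand side turns into a call.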
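III. Code Excerpts
- "Ana_XXX" is the parsing subroutine for the nonterminal XXX in the problem statement
- "IsId()" is the identifier function: it recognizes user-defined variable names
- "Reverse()" is the keyword function: it checks whether a word symbol is a keyword (while, do, ...)
- "GetWordArray()" splits the code in the txt file into word symbols: it calls Analysis() (the code is in the lexical analyzer post; slightly modified here, but the algorithm is the same) to skip blanks and recognize each word symbol, then stores it in the WordArray array
- fp is the text pointer (a figure of speech, not a pointer type): it points at the char currently being read
- codefile is a char array holding the code from the txt file; filelen is the number of characters in codefile
- ReadFile is the file-reading function; it produces codefile and filelen
- WordNum is just a growth-increment constant (#define WordNum 100)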
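The bodies of IsId() and Reverse() are not shown in this post, so the following is only a sketch of what such checks might look like. The names IsKeyword and IsIdentifier are hypothetical, they take the word as a parameter instead of reading the global WordArray[wp], and the keyword list is simply the set of reserved words that appear in the grammar above.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical keyword test, in the spirit of Reverse().
// The list is collected from the terminals of the PL/0 grammar.
bool IsKeyword(const std::string& word) {
    static const std::unordered_set<std::string> kKeywords = {
        "program", "const", "var", "procedure", "begin", "end", "if", "then",
        "else", "while", "do", "call", "read", "write", "odd"};
    return kKeywords.count(word) > 0;
}

// Hypothetical identifier test, in the spirit of IsId():
// <id> -> l{l|d}, and the word must not be a keyword.
bool IsIdentifier(const std::string& word) {
    if (word.empty() || !std::isalpha(static_cast<unsigned char>(word[0])))
        return false;
    for (unsigned char c : word)
        if (!std::isalnum(c)) return false;
    return !IsKeyword(word);
}
```

Rejecting keywords inside the identifier test matters for the parser: otherwise a word such as "begin" would be accepted wherever an <id> is expected.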
// Append the current character to the word being built.
void Concat()
{
    strToken += ch;
}
// Step back one character and reload ch.
void Retract()
{
    fp--;
    ch = codefile[fp];
}
// Read the next word symbol into strToken.
void Analysis()
{
    strToken = "";
    GetChar();
    GetBC();
    if (IsLetter())
    {
        // identifier or keyword: l{l|d}
        while (IsLetter() || IsDigit())
        {
            Concat();
            GetChar();
        }
        Retract();
    }
    else if (IsDigit())
    {
        // integer: d{d}
        while (IsDigit())
        {
            Concat();
            GetChar();
        }
        Retract();
    }
    else if (ch == ':')
    {
        // ":" or ":="
        Concat();
        GetChar();
        if (ch == '=')
            Concat();
        else
            Retract();
    }
    else if (ch == '<')
    {
        // "<", "<=" or "<>": ch holds one char, so the second one needs a lookahead
        Concat();
        GetChar();
        if (ch == '=' || ch == '>')
            Concat();
        else
            Retract();
    }
    else if (ch == '>')
    {
        // ">" or ">="
        Concat();
        GetChar();
        if (ch == '=')
            Concat();
        else
            Retract();
    }
    else if (ch == ';' || ch == ',' || ch == '=' || ch == '+' || ch == '-' ||
             ch == '*' || ch == '/' || ch == '(' || ch == ')')
        Concat();
    else if (ch == '\n')
    {
        line++;
        Concat();
    }
}
// Split the source text into word symbols and record the line of each one.
void GetWordArray()
{
    ReadFile();
    while (codefile[fp])
    {
        if (wp >= WAlen)
        {
            // WordArray is full: grow it by WordNum slots.
            // lines is indexed by wp too and is assumed to be large enough.
            string *temp = new string[WAlen + WordNum];
            for (int i = 0; i < WAlen; i++)
                temp[i] = WordArray[i];
            delete[] WordArray;
            WordArray = temp;
            WAlen += WordNum;
        }
        Analysis();
        if (IsUseStrToken())
        {
            lines[wp] = line;
            WordArray[wp++] = strToken;
        }
    }
    WAlen = wp;  // number of word symbols actually stored
    wp = 0;      // rewind for the parser
}
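GetWordArray() above grows its array by hand. As a point of comparison, here is a small self-contained sketch that stores the word symbols in a std::vector, which grows automatically and keeps each word together with its line number. This is a swapped-in technique, not the original code: Token and Tokenize are hypothetical names, the scanning loop is a simplified stand-in for Analysis(), and any unrecognized character is simply kept as a one-character token.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: the word plus the line it was found on.
struct Token {
    std::string text;
    int line;
};

// Simplified stand-in for GetWordArray() + Analysis().
std::vector<Token> Tokenize(const std::string& code) {
    std::vector<Token> out;
    int line = 1;
    std::size_t i = 0;
    while (i < code.size()) {
        unsigned char c = static_cast<unsigned char>(code[i]);
        if (c == '\n') { ++line; ++i; continue; }  // count lines
        if (std::isspace(c)) { ++i; continue; }    // skip blanks
        std::size_t start = i;
        if (std::isalpha(c)) {                     // l{l|d}
            while (i < code.size() &&
                   std::isalnum(static_cast<unsigned char>(code[i]))) ++i;
        } else if (std::isdigit(c)) {              // d{d}
            while (i < code.size() &&
                   std::isdigit(static_cast<unsigned char>(code[i]))) ++i;
        } else {
            ++i;                                   // one-character symbol...
            if (i < code.size()) {                 // ...or ":=", "<=", "<>", ">="
                char n = code[i];
                if ((c == ':' && n == '=') ||
                    (c == '<' && (n == '=' || n == '>')) ||
                    (c == '>' && n == '='))
                    ++i;
            }
        }
        out.push_back({code.substr(start, i - start), line});
    }
    return out;
}
```

Because push_back handles reallocation, there is no separate length counter to keep in sync, and the line number can never run past the end of a second array.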
// <statement>: parse one statement starting at WordArray[wp].
void Ana_statement()
{
    void Ana_body();  // forward declaration: <body> and <statement> call each other
    if (IsId() || WordArray[wp] == ":=")
    {
        // <id> := <exp>
        if (WordArray[wp] == ":=")
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing id" << endl;
        else
            wp++;
        if (WordArray[wp] == ":=")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing ':='" << endl;
        Ana_exp();
    }
    else if (WordArray[wp] == "if")
    {
        // if <lexp> then <statement> [else <statement>]
        wp++;
        Ana_lexp();
        if (WordArray[wp] == "then")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing keyword 'then'" << endl;
        Ana_statement();
        if (WordArray[wp] == "else")
        {
            wp++;
            Ana_statement();
        }
    }
    else if (WordArray[wp] == "while")
    {
        // while <lexp> do <statement>
        wp++;
        Ana_lexp();
        if (WordArray[wp] == "do")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing keyword 'do'" << endl;
        Ana_statement();
    }
    else if (WordArray[wp] == "call")
    {
        // call <id>([<exp>{,<exp>}])
        wp++;
        if (IsId())
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing id" << endl;
        if (WordArray[wp] == "(")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing '('" << endl;
        // The argument list is optional: parse it only if the next word can start an <exp>.
        if ((WordArray[wp] == "+") || (WordArray[wp] == "-") || (WordArray[wp] == "(") || IsId() || IsInteger())
        {
            Ana_exp();
            while (WordArray[wp] == ",")
            {
                wp++;
                Ana_exp();
            }
        }
        if (WordArray[wp] == ")")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing ')'" << endl;
    }
    else if (WordArray[wp] == "begin")
        Ana_body();
    else if (WordArray[wp] == "read")
    {
        // read (<id>{,<id>})
        wp++;
        if (WordArray[wp] == "(")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing '('" << endl;
        if (IsId())
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing id" << endl;
        while (WordArray[wp] == ",")
        {
            wp++;
            if (IsId())
                wp++;
            else
                cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing id" << endl;
        }
        if (WordArray[wp] == ")")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing ')'" << endl;
    }
    else if (WordArray[wp] == "write")
    {
        // write (<exp>{,<exp>})
        wp++;
        if (WordArray[wp] == "(")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing '('" << endl;
        Ana_exp();
        while (WordArray[wp] == ",")
        {
            wp++;
            Ana_exp();
        }
        if (WordArray[wp] == ")")
            wp++;
        else
            cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": missing ')'" << endl;
    }
    else {
        cout << lines[wp - 1] << ": " << "statement syntax error at word " << wp + 1 << ": no valid start symbol found" << endl;
    }
}
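Ana_statement() repeats the same "check the word, advance or report" pattern many times and indexes WordArray[wp] directly, which can run past the last word when the input ends early (as the test program below does). One possible way to fold that pattern into a helper is sketched here. It is not part of the original program: TokenStream, Peek, Expect and the exact message layout are assumptions made for this example, and errors are collected in a vector instead of being printed.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper around the word list, its line numbers and wp.
struct TokenStream {
    std::vector<std::string> words;
    std::vector<int> lines;           // lines[i] is the line of words[i]
    std::size_t wp = 0;               // index of the current word
    std::vector<std::string> errors;  // collected error messages

    // Current word, or "" once the input is exhausted (never out of range).
    const std::string& Peek() const {
        static const std::string kEnd;
        return wp < words.size() ? words[wp] : kEnd;
    }
    // Line of the word before wp, clamped to the valid range.
    int LineOfPrevious() const {
        if (lines.empty()) return 0;
        std::size_t i = (wp == 0) ? 0 : wp - 1;
        if (i >= lines.size()) i = lines.size() - 1;
        return lines[i];
    }
    // Consume `expected` if it is the current word; otherwise record an
    // error for grammar rule `rule` and leave wp where it is.
    bool Expect(const std::string& expected, const std::string& rule) {
        if (Peek() == expected) {
            ++wp;
            return true;
        }
        std::ostringstream msg;
        msg << LineOfPrevious() << ": " << rule << " syntax error at word "
            << (wp + 1) << ": missing \"" << expected << "\"";
        errors.push_back(msg.str());
        return false;
    }
};
```

With such a helper, a branch like the one for "while" could shrink to a call to the condition parser, one Expect("do", "statement"), and a call to the statement parser. Not advancing on a mismatch mirrors what the original code does after printing an error.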
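I had not yet developed the habit of commenting my code back then, so if anything is unclear, ask below, though I can't promise I still remember the details.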
IV. Test Screenshots
program ;
const m:=24 ,:=81;
var x,y,z,q,r
procedure multiply(x,y;
var a,b;
begin
a:=x; b:=y; z;
while b>0
begin
if odd b z:=z+a;
a:=2*b*c; b:=b/2
end
end
x:=m y:=n;
call multiply);
write(a+b-1,d*c*b*a);
read (a,b
The above is the test code, written in Pascal.
The run results follow, in the format "line: XXX syntax error at word n: reason"