mirror of https://github.com/danbulant/Cosmos
synced 2026-05-21 13:28:41 +00:00

Documentation added to the X# compiler. Several comments in source code as well as an XSharp.htm document in the Docs folder that clarify language syntax.

parent 783eaee16d
commit 5cd8fba8a1
6 changed files with 489 additions and 62 deletions

@ -69,6 +69,7 @@
<Content Include="Docs\index.html" />
<Content Include="Docs\Old.html" />
<Content Include="Docs\ToDo.html" />
<Content Include="Docs\XSharp.htm" />
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.

366
source2/Compiler/Cosmos.XSharp/Docs/XSharp.htm
Normal file
@ -0,0 +1,366 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XSharp explained</title>
</head>
<body>
<h1>INTRODUCTION</h1>
<p>X# (pronounced X-Sharp) is a high-level assembly language that targets the x86 architecture and is
expected to be flexible enough to later target other kinds of processors.</p>
<p>The language is line based, which means an instruction never spans several lines. This makes the
language easier to parse. Parsing is also performed in a single pass. This implies that some semantic checks
are not performed by the parser, which may lead to assembly failures when NASM is invoked later.</p>
<p>X# aims for a close to 1:1 mapping between X# statements and the generated assembly instructions, so that
debugging does not suffer from a disconnect between source and output. For the same reason X# avoids
large compound statements.</p>

<h1>SYNTAX</h1>
<h2>Comments</h2>
<p>A comment must appear on its own line: you can't mix code and comments on a single line. A comment line
is one that starts with two consecutive slashes. Whitespace may precede the slashes. For example:<br />
<code>// This is a comment.<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Another comment prefixed with whitespace.<br />
</code></p>

<h2>Literal values</h2>
<h3>String literals</h3>
<p>A string literal is enclosed in single quotes. Should your string contain a single quote, you must
escape it with a backslash character. For example:<br/>
<code>'Waiting for \'debugger\' connection...'</code></p>
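The quoting rule can be illustrated with a small Python sketch. It is not the X# compiler's code (which is C#): `unquote_xsharp_string` is a hypothetical helper, and it assumes `\'` is the only escape sequence, since no other escape is documented here.

```python
def unquote_xsharp_string(literal: str) -> str:
    """Strip the surrounding single quotes and resolve escaped quotes.

    Hypothetical helper: assumes backslash-quote is the only escape sequence.
    """
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted string literal")
    # Replace every backslash-quote pair with a plain quote.
    return literal[1:-1].replace("\\'", "'")


print(unquote_xsharp_string(r"'Waiting for \'debugger\' connection...'"))
# prints: Waiting for 'debugger' connection...
```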

<h3>Integer literals</h3>
<p>You can write integer literal values either in decimal or in hexadecimal. For hexadecimal values, prefix
the value with a dollar sign:<br />
<code>// These two constant values are equal<br />
const decimal = 255<br />
const hexadecimal = $FF</code></p>
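A minimal Python sketch of the two literal forms, assuming `$` is the only hexadecimal marker as described above. `parse_xsharp_int` is hypothetical; note that the `IntValue` helper in the `TokenPatterns` diff of this commit (commented out there as unused) tests for a `0x` prefix instead.

```python
def parse_xsharp_int(text: str) -> int:
    """Convert an X# integer literal to its value.

    Hypothetical helper: a leading '$' marks hexadecimal, anything
    else is read as decimal.
    """
    if text.startswith("$"):
        return int(text[1:], 16)
    return int(text, 10)


# Both constants from the example above denote the same value.
assert parse_xsharp_int("255") == parse_xsharp_int("$FF") == 255
```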

<h2><a name="namespace">Namespaces</a></h2>
<p>A namespace is a naming scope that lets you organize your code to avoid naming collisions. You
declare a namespace by using the <code>namespace</code> keyword followed by its name. For example:<br />
<code>namespace TEST</code><br /></p>
<p>The namespace name is automatically used as a prefix for each named item that appears in that namespace
(function names, labels, variables ...). A namespace extends from the source code line where it is declared
until either another namespace declaration appears or the end of the source code file is reached.
Consequently there is no namespace hierarchy and you cannot nest a namespace inside another one.</p>
<p><b>WARNING : Code inside a namespace has no way to reference or use code or data from another namespace.</b><br />
Nothing prevents you from reopening a namespace, even within a single source code file. For example, the
following source code compiles without error.<br />
<code>namespace FIRST<br />
// Everything here will be prefixed with FIRST. Hence the "true" full name of the below variable<br />
// is FIRST_aVar<br />
var aVar<br />
namespace SECOND<br />
// Not a problem to name another variable aVar. Its true name is SECOND_aVar<br />
var aVar<br />
namespace FIRST<br />
// And here we get back to the FIRST namespace<br />
</code></p>
<p><b>Every program artefact MUST appear inside a namespace.</b> It is hence strongly recommended to declare
a namespace at the very beginning of any X# source file.</p>
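The prefixing rule can be sketched in Python as a pass over source lines that remembers the most recent `namespace` line. `qualify_names` is hypothetical and only handles `namespace` and `var` lines; the `_` separator follows the `FIRST_aVar` example above.

```python
def qualify_names(lines):
    """Yield (declared name, prefixed name) for every `var` line.

    Hypothetical sketch: only `namespace` and `var` lines are handled.
    """
    namespace = None
    for raw in lines:
        words = raw.split()
        if not words or words[0].startswith("//"):
            continue  # skip blank and comment lines
        if words[0] == "namespace":
            namespace = words[1]  # active until the next namespace line
        elif words[0] == "var":
            if namespace is None:
                raise SyntaxError("every artefact must appear inside a namespace")
            yield words[1], f"{namespace}_{words[1]}"


source = """namespace FIRST
var aVar
namespace SECOND
var aVar""".splitlines()
print(list(qualify_names(source)))
# [('aVar', 'FIRST_aVar'), ('aVar', 'SECOND_aVar')]
```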

<h2><a name="datatypes">Datatypes</a></h2>
<p>X# targets 32-bit assembler code generation. It supports the following datatypes:</p>

<ul>
<li>8-bit values, denoted by the <code>byte</code> keyword.</li>
<li>16-bit values, denoted by the <code>word</code> keyword.</li>
<li>32-bit values, denoted by the <code>dword</code> keyword.</li>
</ul>

<p>The signedness of a datatype is undefined: your X# code must itself handle the various
control flags (carry, sign and overflow) according to the context. Also notice that X# has
no floating point datatypes.</p>

<h2>Constants</h2>
<p>Constants are symbolic names associated with a numeric literal value. A constant definition
is introduced by the <code>const</code> keyword, followed by the constant name, an equal sign and a
constant numeric value. Constants are always considered to be of doubleword type. For example:<br />
<code>namespace TEST<br />
const twoHundred = 200</code><br /></p>
<p>The constant name itself is built differently than for other items. The above constant
declaration is actually named <code>TEST_Const_twoHundred</code>. Consequently you can
define another (non-const) item with the same name without fear of a name collision. However
this is bad programming practice and is strongly discouraged.</p>
<p><b>WARNING : Whenever you want to reference one of your constants in your source code, you MUST
prefix its name with a hash sign (<code>#</code>).</b> For example, the following code initializes the EAX register
with the value of the twoHundred constant:<br />
<code>EAX = #twoHundred</code></p>
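A Python sketch of the constant naming rule and of what the `#` prefix selects. Both helpers are hypothetical; only the `TEST_Const_twoHundred` form comes from the text above, and operands without `#` are simply left untouched here.

```python
def constant_label(namespace: str, name: str) -> str:
    # Constants get an extra "Const" segment, e.g. TEST_Const_twoHundred.
    return f"{namespace}_Const_{name}"


def resolve_operand(namespace: str, operand: str) -> str:
    """Map a right-hand operand to the symbol it designates.

    Hypothetical sketch: '#name' refers to a constant; any other
    operand (register, literal) is returned unchanged.
    """
    if operand.startswith("#"):
        return constant_label(namespace, operand[1:])
    return operand


assert resolve_operand("TEST", "#twoHundred") == "TEST_Const_twoHundred"
```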

<h2>Variables</h2>
<p>You can define atomic variables of either doubleword or text type, as well as one-dimensional arrays
of any of the available <a href="#datatypes">datatypes</a>. You declare a variable by giving it
a name and optionally a value. For example, the code below declares two variables:<br />
<code>var myNumVar = 876<br />
var myTextVar = 'A message'</code><br />
If you omit the value, the variable is assumed to be a doubleword and is
initialized with a default value of 0.<br /> The X# compiler silently appends a null byte at the
end of a text initialization value.</p>

<p>To define a one-dimensional array of one of the available <a href="#datatypes">datatypes</a>,
follow the variable name with the datatype and the array size in square brackets. You must provide the
size at declaration time, and all array members are initialized to 0.
For example, declaring an array of 256 bytes is written:<br />
<code>var myArray byte[256]</code></p>
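The initial memory image implied by these rules can be sketched in Python. `initial_bytes` is hypothetical; it assumes text is ASCII and that numeric values use the x86 little-endian byte order.

```python
import struct

# Size in bytes of each X# datatype listed in the Datatypes section.
SIZES = {"byte": 1, "word": 2, "dword": 4}


def initial_bytes(value=None, array=None):
    """Return the initial memory image of an X# variable declaration.

    Hypothetical sketch: `value` is an int or str (None means the
    default doubleword 0); `array` is a (datatype, count) pair.
    """
    if array is not None:
        datatype, count = array
        return bytes(SIZES[datatype] * count)  # all members start at 0
    if value is None:
        value = 0  # default: doubleword initialized to 0
    if isinstance(value, str):
        return value.encode("ascii") + b"\x00"  # null byte appended
    return struct.pack("<I", value)  # 32-bit little-endian


assert initial_bytes(876) == b"\x6c\x03\x00\x00"
assert initial_bytes("A message") == b"A message\x00"
assert len(initial_bytes(array=("byte", 256))) == 256
```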

<h2><a name="registers">Registers</a></h2>
<p>X# supports all four general purpose registers of the x86 architecture. These registers are
available as byte sized: <code>AH AL BH BL CH CL DH DL</code>, as word sized:
<code>AX BX CX DX</code> and as doubleword sized: <code>EAX EBX ECX EDX</code>. The four index and pointer
registers are also available as doubleword sized: <code>ESI EDI ESP EBP</code></p>
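The byte, word and doubleword names refer to overlapping parts of the same physical register. This Python sketch (not part of X#) shows how the narrower names map onto a 32-bit EAX value.

```python
def sub_registers(eax: int) -> dict:
    """Return the AX, AH and AL views of a 32-bit EAX value.

    On x86, AX is the low 16 bits of EAX, AL the low 8 bits and AH
    bits 8 to 15; the same layout applies to EBX, ECX and EDX.
    """
    return {
        "AX": eax & 0xFFFF,
        "AH": (eax >> 8) & 0xFF,
        "AL": eax & 0xFF,
    }


assert sub_registers(0x12345678) == {"AX": 0x5678, "AH": 0x56, "AL": 0x78}
```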

<h2>Labels</h2>
<p>Labels are a way to give a name to memory addresses. This lets you
reference these addresses at coding time without having to know their actual value at runtime. The X#
compiler automatically creates several labels. For example, each time you define a variable, a
label is created with the variable name, referencing the memory address of the variable.
This is useful to read and write the variable content.<br />
When you create a function, a label is also defined at the address of the beginning of the
function. This label is used when you call the function.<br />These automatically created
labels are largely transparent to you. On the other hand, you may want to explicitly define labels
to denote some particular position in your code. This is the case, for example, when you want to
perform a test and jump to a specific line of code depending on the result of the test. You then
create a label at the code location you want to jump to.<br />A label is nothing more than
a name suffixed with <code>:</code><br />
<code>// This is a useless label because the variable already got one.<br />
MyUselessLabel:<br />
var myVar</code></p>

<h2>Functions</h2>
<p>Functions are declared using the <code>function</code> keyword. A function name must follow the
keyword and be followed by an opening curly brace. Be careful to keep the opening curly brace on
the same line as the <code>function</code> keyword. Unlike high level languages, X# function
declarations don't support parameter declarations. You must handle parameter passing yourself,
using the stack and/or well known registers. For example:<br />
<code>function MyFirstFunction {<br />
// Your code here<br />
// Do not forget the closing curly brace.<br />
}</code></p>

<h3>Returning from a function</h3>
<p>When the X# compiler encounters the closing curly brace that signals the end of the function source
code, it automatically adds a <code>ret</code> instruction. The recommended way to return
from a function is to use the <code>return</code> keyword. Internally the X# compiler translates
it into an unconditional jump to a special label, local to the function, named <code>Exit</code>.
The X# compiler tracks the use of this label and adds such a label at the end of the
function code if you don't define it yourself.</p>
<p>Sometimes you will want to return from your function explicitly, without going through the cleanup code that
may be defined at and below the function <code>Exit</code> label. You can do so by using the <code>ret</code>
keyword.<br />
<code>// This instruction will directly exit the function without jumping to the Exit label.<br />
ret</code></p>
<p><b>WARNING : The X# compiler doesn't monitor stack content. It is the responsibility of your code to
make sure that the return address is on top of the stack when the <code>ret</code> instruction
is executed, including the one that is automatically added by the compiler at the end of the function
body.</b></p>
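The lowering of `return` described above can be sketched in Python. This is not the compiler's actual output: `lower_function`, the NASM-like `jmp` line and the `<name>_Exit` label spelling are all assumptions made for illustration, and ordinary X# lines are passed through untranslated.

```python
def lower_function(name, body):
    """Sketch of how `return` and the closing brace might be lowered.

    Hypothetical: output lines are NASM-like and the exit label is
    spelled '<name>_Exit' for illustration only.
    """
    exit_label = f"{name}_Exit"
    out = [f"{name}:"]
    used_exit = False
    has_exit = False
    for line in body:
        stripped = line.strip()
        if stripped == "return":
            used_exit = True
            out.append(f"jmp {exit_label}")  # return = jump to the Exit label
        elif stripped == "Exit:":
            has_exit = True
            out.append(f"{exit_label}:")
        else:
            out.append(stripped)  # other lines passed through as-is
    if used_exit and not has_exit:
        out.append(f"{exit_label}:")  # added when the user did not define it
    out.append("ret")  # emitted for the closing curly brace
    return out


print(lower_function("MyFunc", ["EAX = 1", "return"]))
# ['MyFunc:', 'EAX = 1', 'jmp MyFunc_Exit', 'MyFunc_Exit:', 'ret']
```

Note that an explicit `ret` in the body is kept as-is, so control never reaches the cleanup code under `Exit`.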

<h3>Invoking a function</h3>
<p>You invoke a function by using the <code>call</code> keyword followed by the function name.<br />
<code>Call myFunction</code><br />
Because X# doesn't support function parameters, you must make sure you properly set up the stack and/or
the registers that are expected by the invoked function.</p>

<h2>Interrupt handlers</h2>
<p>Interrupt handlers are a special kind of function used to handle an interrupt. These functions
do not support parameters and are declared using the <code>interrupt</code> keyword. An interrupt
handler name must follow the keyword and be followed by an opening curly brace. Be careful to keep
the opening curly brace on the same line as the <code>interrupt</code> keyword. For example:<br />
<code>interrupt DivideByZero {<br />
// Your code here<br />
// Do not forget the closing curly brace.<br />
}</code></p>

<p>Interrupt handlers are executed in a specific processor context that is different from the
normal control flow within functions. So there must be a way for the processor to know when
interrupt processing is done and normal operations should resume. This requires a specific
instruction, namely <code>iret</code> on the x86 architecture. Normally you do not
have to take care of this because the X# compiler knows you're defining an interrupt handler
and silently inserts the <code>iret</code> instruction at the end of the interrupt handler
code. However, you can directly insert the <code>iret</code> instruction in your X# code,
including in a normal function.</p>
<p><b>WARNING : You must be very careful not to use this instruction when your code is not
handling an interrupt, otherwise the processor will most likely trigger an exception. The X# compiler
doesn't perform any check when you hardcode this instruction.</b></p>

<h2>Assigning value</h2>
<p>You can assign a value to a <a href="#registers">register</a> or to a variable. You do it using
the <code>=</code> operator. The left side is the register or variable name while the right side
is the value to be assigned. For example:<br />
<code>// Assign the immediate value 123 to the EAX register (32 bits).<br />
EAX = 123</code><br /></p>
<p>On the right side of the assignment operator you can use either an immediate value, a constant
(whose name must be prefixed with a hash sign), or a register name.<br />
When the left side of the assignment operator is a variable name and the right side is an immediate
value, you can additionally define the size of the right operand explicitly using an <code>as</code>
clause followed by the <a href="#datatypes">datatype</a>. For example:<br />
<code>// Assign the immediate value 200 as a word (16 bits) to the myVar variable.<br />
myVar = 200 as word</code></p>

<h3>Address indirection</h3>
<p>Sometimes a register contains the in-memory address of another element, most likely a variable.
In this case you do not want to assign a value to the register itself but instead want to store
the value at the memory address held in the register. This is called address indirection and is
denoted by the register name followed by a number, known as an offset, enclosed in square
brackets. Address indirection may be used on either the right side or
the left side of the <code>=</code> assignment operator, but not on both sides at
the same time. Let's take an example:<br />
<code>EAX[10] = EBX</code><br />
The behavior is as follows: take the content of the EAX register, add the offset value (10
in our example) to it and consider the result to be a memory address. Then store the content of the EBX register
at this memory address.<br />
The offset value must be a literal number; it may be 0 or even negative.</p>
<p>So how does a register come to hold a memory address? We do this with the special
<code>@</code> operator, used as a prefix to a label name. Since the X# compiler automatically creates a label
each time you declare a variable, we get the following syntax:<br />
<code>// Declare a variable<br />
var myVar<br />
// Read variable content into EAX register by using the variable name.<br />
EAX = myVar<br />
// Load EAX register with the in memory address of the myVar variable.<br />
EAX = @myVar<br />
// So now we can store the content of EBX register into myVar variable.<br />
EAX[0] = EBX<br />
// And read back the content of the myVar variable into ECX register.<br />
ECX = EAX[0]</code></p>
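The example above can be replayed on a tiny Python model of memory and registers. The `Machine` class is hypothetical, the address 16 chosen for `myVar` is arbitrary, and values are stored as 32-bit little-endian doublewords.

```python
import struct


class Machine:
    """Tiny model of memory plus registers for the indirection examples.

    Hypothetical sketch: 32-bit little-endian values in a flat bytearray.
    """

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.regs = {"EAX": 0, "EBX": 0, "ECX": 0}

    def store(self, base, offset, src):
        # base[offset] = src  ->  write src at address base + offset
        addr = self.regs[base] + offset
        struct.pack_into("<I", self.mem, addr, self.regs[src])

    def load(self, dst, base, offset):
        # dst = base[offset]  ->  read 4 bytes at address base + offset
        addr = self.regs[base] + offset
        self.regs[dst] = struct.unpack_from("<I", self.mem, addr)[0]


m = Machine()
MY_VAR = 16                 # pretend the label myVar sits at address 16
m.regs["EAX"] = MY_VAR      # EAX = @myVar
m.regs["EBX"] = 0xCAFE
m.store("EAX", 0, "EBX")    # EAX[0] = EBX
m.load("ECX", "EAX", 0)     # ECX = EAX[0]
assert m.regs["ECX"] == 0xCAFE
```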

<h2>Register arithmetic</h2>
<p>X# supports additive and subtractive register arithmetic with the <code>+</code> and <code>-</code>
operators, using a shortcut syntax for incrementing and decrementing a <a href="#registers">register</a>.
This syntax is not supported for variables. When incrementing or decrementing a register, you must omit the
assignment part of the instruction: the target register is the one on the left side of the operator. For
example, the following instruction increments the EAX register by 2:<br />
<code>EAX + 2</code><br />
In the above example you can replace the literal value with a register name but not with a variable
name. In the following example the value of the EAX register is decremented by the value of the EBX
register:<br />
<code>EAX - EBX</code></p>
<p>Finally, there is an even shorter version when you want to increment or decrement a register by 1.
This is performed with the <code>++</code> and <code>--</code> operators. They must be applied to a
register only: incrementing and decrementing a variable this way is not supported. Additionally, the
operator must be used as a register suffix with no whitespace between the register name and the operator.
For example:<br />
<code>// Increment EAX register<br />
EAX++<br />
// Decrement ECX register<br />
ECX--</code></p>
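Register arithmetic on a 32-bit register wraps around modulo 2<sup>32</sup>. This Python sketch only shows the resulting values; which x86 instructions X# actually emits (ADD/SUB or INC/DEC) is not stated here, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    # EAX + b: the result wraps around modulo 2**32
    return (a + b) & MASK32


def sub32(a, b):
    # EAX - b: the result wraps around modulo 2**32
    return (a - b) & MASK32


assert add32(5, 2) == 7                # EAX + 2
assert sub32(0, 1) == 0xFFFFFFFF       # EAX - EBX with EAX = 0, EBX = 1
assert add32(0xFFFFFFFF, 1) == 0       # EAX++ on the largest value
```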

<h2>Register shifting and rolling</h2>
<p>Shifting a register to the right or to the left is performed with the <code>&gt;&gt;</code> and
<code>&lt;&lt;</code> operators respectively. Following the operator you must provide a literal
number that defines how many bits to shift. For example:<br />
<code>// Shift EAX to the right by 8 bits.<br />
EAX &gt;&gt; 8</code></p>
<p>Rolling (rotating) a register to the right or to the left is performed with the <code>~&gt;</code> and
<code>&lt;~</code> operators respectively. Following the operator you must provide a literal
number that defines how many bits to roll. For example:<br />
<code>// Roll EAX to the left by 12 bits.<br />
EAX &lt;~ 12</code></p>
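The difference between shifting and rolling is easiest to see on 32-bit values. This Python sketch assumes `>>` is a logical shift (x86 SHR) rather than an arithmetic one (SAR), which the text does not state; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shr32(value, count):
    # EAX >> count: logical shift right, vacated high bits become 0
    return (value & MASK32) >> count


def shl32(value, count):
    # EAX << count: shift left, bits pushed past bit 31 are lost
    return (value << count) & MASK32


def rol32(value, count):
    # EAX <~ count: rotate left, bits leaving bit 31 re-enter at bit 0
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32


def ror32(value, count):
    # EAX ~> count: rotate right, bits leaving bit 0 re-enter at bit 31
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32


assert shr32(0x12345678, 8) == 0x00123456
assert shl32(0x12345678, 8) == 0x34567800
assert rol32(0x12345678, 12) == 0x45678123
assert ror32(0x12345678, 12) == 0x67812345
```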

<h2>Comparison</h2>
<p>The classical comparison operators are supported:<br />
<code>&lt; &gt; = &lt;= &gt;= !=</code>.</p>

<p>The exact set of conditions accepted in <code>if</code> statements is defined by the
<code>mCompareOps</code> and <code>mCompares</code> collections of the compiler's <code>TokenPatterns</code>
class (<code>mCompareOps</code> additionally contains the <code>0</code> and <code>!0</code> operators).
The <code>while</code> statement only supports the <code>mCompares</code> style.</p>

<h3>Pure comparison</h3>
<p>Sometimes you want to compare a register content with a literal number, a variable
content or a constant. You can do this with the <code>?=</code> operator. The left side of the
operator is the register name while the right side is the value to compare with. Such an operation
sets the processor flags (sign, overflow, zero and carry) according to the comparison result.<br />
<code>// Compare EAX register content with literal value 812.<br />
EAX ?= 812</code></p>
<p>You may also wish to test some specific bits of the register value rather than the
register value as a whole. This is where you use the <code>?&amp;</code> operator. Once again the processor
flags are updated with the result of the bitwise AND of the register value and the
compared value.<br />
<code>// Test whether the fourth least significant bit of EAX register is set.<br />
EAX ?&amp; $08</code></p>

<h2>Control flow instructions</h2>

<h3>Branching</h3>
<p>The <code>goto</code> keyword lets you perform unconditional branching. Following the keyword
you must name the target label. For example:<br />
<code>// Assuming a somewhereElse label is defined.<br />
goto somewhereElse</code><br /></p>

<p>The <code>if</code> keyword lets you perform conditional branching. Following the keyword and
on the same line, you must provide a condition followed by either a <code>goto</code> statement,
a <code>return</code> statement, or an opening curly brace that begins a code block.<br />
The condition itself is usually a simple comparison as described above. It can also be a test
involving just a comparison operator and nothing else. This special syntax is used to directly
test the main flags updated by the processor on almost any instruction (sign, zero,
overflow and carry). This syntax is not recommended unless you know very well how the processor
behaves. Most of the time you can achieve the same result with the standard syntax, although the
special syntax can occasionally save a line or two of code. For example:<br />
<code>// A simple test with standard syntax:<br />
if EAX &gt; 10 return<br />
// This is equivalent to the following, using the special syntax:<br />
EAX ?= 10<br />
if &gt; return</code><br /></p>
<p>Notice that unlike higher level languages, there is no "else" construct available.</p>

<h3>Looping</h3>
<p>The <code>while</code> keyword only supports the standard comparison syntax. The special syntax available with
the <code>if</code> statement can't be used with the <code>while</code> statement.</p>
<p>A <code>while</code> statement defines a loop on a simple condition. Example:<br />
<code>while eax &lt; 0 {<br />
eax = 1<br />
}</code></p>

<h2>Playing with the stack</h2>
<p>The x86 architecture supports a stack concept that is backed by the <code>ESP</code> processor
register. Pushing value(s) onto the stack is denoted with the <code>+</code> sign while popping
value(s) from the stack is denoted by the <code>-</code> sign. You can push or pop a single
register at a time by prefixing its name with the appropriate operation sign. There must not be
any whitespace character between the sign and the register name. For example:<br />
<code>// Pop the EAX register from the stack.<br />
-EAX</code><br />
The datatype of the pushed/popped value is implied by the register name.</p>
<p>You can also directly push (but obviously not pop) an immediate numeric value onto the
stack. Should the value be defined as a constant with the <code>const</code> keyword, do not forget
the hash sign that must appear between the operation sign and the constant name. For example:<br />
<code>// Push the immediate value 200 onto the stack.<br />
+200<br />
// Push the value of the twoHundred constant onto the stack.<br />
+#twoHundred</code><br />
The default datatype for a pushed immediate value is doubleword. You can also explicitly state the
<a href="#datatypes">datatype</a> of the pushed value. You do this by appending an
<code>as</code> clause at the end of the instruction, such as:<br />
<code>// Push the immediate value 200 onto the stack as a word (2 bytes).<br />
+200 as word<br />
// Push the twoHundred constant onto the stack as a single byte.<br />
+#twoHundred as byte</code></p>
<p>Finally, there is also a convenient <code>All</code> instruction that lets you push or pop all general
purpose registers at once. Once again you must prefix this keyword with the appropriate
operation sign, for example <code>+All</code> or <code>-All</code>.</p>
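The effect of pushes and pops on `ESP` can be sketched in Python. The `Stack` class is hypothetical and models only doubleword and word pushes: on x86 a push always moves ESP by at least two bytes, and how X# encodes `as byte` is not specified here.

```python
class Stack:
    """Model of an x86 stack growing downward from ESP.

    Hypothetical sketch: only dword (4-byte) and word (2-byte) pushes
    are modelled; values are stored little-endian in a bytearray.
    """

    SIZES = {"dword": 4, "word": 2}

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size  # empty stack: ESP points just past the top

    def push(self, value, datatype="dword"):
        n = self.SIZES[datatype]
        self.esp -= n  # ESP is decremented before the write
        self.mem[self.esp:self.esp + n] = value.to_bytes(n, "little")

    def pop(self, datatype="dword"):
        n = self.SIZES[datatype]
        value = int.from_bytes(self.mem[self.esp:self.esp + n], "little")
        self.esp += n  # ESP is incremented after the read
        return value


s = Stack()
s.push(200)            # +200
s.push(200, "word")    # +200 as word
assert s.esp == 64 - 6
assert s.pop("word") == 200 and s.pop() == 200
assert s.esp == 64
```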

<h2>Working with I/O ports</h2>
<p>Reading and writing I/O ports is performed with the <code>Port</code> keyword. The port number must
be set in the DX register. You can read or write a byte, a word or a doubleword at a time. The input
or output data will be in the AL, AX or EAX register respectively. To read a byte, use the following syntax:<br />
<code>AL = Port[DX]</code><br />
To write a doubleword, use the following syntax:<br />
<code>Port[DX] = EAX</code></p>

<h2>Debugging helper</h2>
<p>The <code>checkpoint</code> instruction lets you write a simple text to the console by directly
copying text content to the video buffer. The text must follow the keyword and be enclosed in single
quotes. Should it contain quotes, they must be escaped with a backslash.<br />
<code>checkpoint 'This is a \'debugging\' message'</code></p>

<h2>Literal assembler code</h2>
<p>Despite our efforts, you may find it necessary to write assembler code directly in your X# source code. Any
source code line whose first non-whitespace character is an exclamation point will be copied verbatim
into the target assembler source. This may be useful for some rarely used instructions. For example:<br />
<code>// Hope our Execution state block in System Management RAM is valid otherwise crash-boom<br />
! RSM</code><br />
The most likely reason to emit literal assembler code is for floating point operations, which
are not supported by the X# compiler. However, these kinds of operations are rarely encountered at the
OS kernel level.</p>
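How a line is recognized as a comment or as literal assembler can be sketched in Python. `classify_line` is hypothetical and only loosely modelled on the comment and literal-assembler branch of `Parser.NewToken` in this commit (the check applies to the first non-whitespace character); unlike that code, it trims the returned text on both sides.

```python
def classify_line(line: str):
    """Classify one X# source line as a comment, literal assembler or code.

    Hypothetical sketch: the marker characters are stripped and the
    remaining text is trimmed.
    """
    stripped = line.strip()
    if stripped.startswith("//"):
        return "comment", stripped[2:].strip()
    if stripped.startswith("!"):
        return "literal_asm", stripped[1:].strip()  # copied to the output
    return "code", stripped


assert classify_line("  // a comment") == ("comment", "a comment")
assert classify_line("! RSM") == ("literal_asm", "RSM")
assert classify_line("EAX = 123") == ("code", "EAX = 123")
```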
</body>
</html>

@ -9,18 +9,25 @@ namespace Cosmos.Compiler.XSharp {
protected int mStart = 0;
/// <summary>Initial text provided as a constructor parameter.</summary>
protected string mData;
/// <summary>true if whitespace tokens should be kept and propagated to the next parsing
/// stage.</summary>
protected bool mIncludeWhiteSpace;
/// <summary>true while every token encountered so far by this parser is a whitespace
/// token.</summary>
protected bool mAllWhitespace;
/// <summary>true if the parser supports patterns recognition.</summary>
protected bool mAllowPatterns;

/// <summary>Tokens retrieved so far by the parser.</summary>
protected TokenList mTokens;

/// <summary>Get a list of tokens that has been built at class instantiation.</summary>
public TokenList Tokens {
get { return mTokens; }
}

protected static readonly char[] mComma = ",".ToCharArray();
protected static readonly char[] mSpace = " ".ToCharArray();
protected static readonly char[] mComma = new char[] { ',' };
protected static readonly char[] mSpace = new char[] { ' ' };
public static string[] mKeywords = (
"As,All"
+ ",BYTE"

@ -65,6 +72,12 @@ namespace Cosmos.Compiler.XSharp {
RegistersAddr = xRegistersAddr.ToArray();
}

/// <summary>Parse the next token from the currently parsed line, starting at the given position, and
/// add the retrieved token at the end of the given token list.</summary>
/// <param name="aList">The token list where to add the newly recognized token.</param>
/// <param name="rPos">The index in current source code line of the first not yet consumed
/// character. On return this parameter will be updated to account for characters that would
/// have been consumed.</param>
protected void NewToken(TokenList aList, ref int rPos) {
#region Pattern Notes
// All patterns start with _, this makes them reserved. User can use too, but at own risk of conflict.

@ -98,6 +111,7 @@ namespace Cosmos.Compiler.XSharp {
char xChar1 = mData[mStart];
var xToken = new Token();

// Recognize comments and literal assembler code.
if (mAllWhitespace && "/!".Contains(xChar1)) {
rPos = mData.Length; // This will account for the dummy whitespace at the end.
xString = mData.Substring(mStart + 1, rPos - mStart - 1).Trim();

@ -110,6 +124,7 @@ namespace Cosmos.Compiler.XSharp {
xString = xString.Substring(1);
xToken.Type = TokenType.Comment;
} else if (xChar1 == '!') {
// Literal assembler code.
xToken.Type = TokenType.LiteralAsm;
}
} else {

@ -133,6 +148,8 @@ namespace Cosmos.Compiler.XSharp {
} else if (IsAlphaNum(xChar1)) { // This must be after check for ValueInt
string xUpper = xString.ToUpper();

// Special parsing when in pattern mode. We recognize some special strings
// which would otherwise be considered as simple AlphaNum tokens.
if (mAllowPatterns) {
if (RegisterPatterns.Contains(xUpper)) {
xToken.Type = TokenType.Register;

@ -166,12 +183,12 @@ namespace Cosmos.Compiler.XSharp {
xToken.Value = xString;
xToken.SrcPosStart = mStart;
xToken.SrcPosEnd = rPos - 1;
if (mAllWhitespace && xToken.Type != TokenType.WhiteSpace) {
if (mAllWhitespace && (xToken.Type != TokenType.WhiteSpace)) {
mAllWhitespace = false;
}
mStart = rPos;

if (mIncludeWhiteSpace || xToken.Type != TokenType.WhiteSpace) {
if (mIncludeWhiteSpace || (xToken.Type != TokenType.WhiteSpace)) {
aList.Add(xToken);
}
}

@ -190,7 +207,6 @@ namespace Cosmos.Compiler.XSharp {
//var xRegex = new Regex(@"(\W)");

var xResult = new TokenList();
char xLastChar = ' ';
CharType xLastCharType = CharType.WhiteSpace;
char xChar;
CharType xCharType = CharType.WhiteSpace;

@ -237,11 +253,10 @@ namespace Cosmos.Compiler.XSharp {

// i > 0 - Never do NewToken on first char. i = 0 is just a pass to get char and set lastchar.
// But it's faster as the second short circuit rather than a separate if.
if (xCharType != xLastCharType && i > 0) {
if ((xCharType != xLastCharType) && (0 < i)) {
NewToken(xResult, ref i);
}

xLastChar = xChar;
xLastCharType = xCharType;
}

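The loop above starts a new token each time the character type differs from the previous character's type, skipping index 0. A Python sketch of that idea follows; the character classes are hypothetical, since the `CharType` values other than `WhiteSpace` are not visible in this diff, and the real parser treats integer values and patterns specially.

```python
def char_type(ch: str) -> str:
    # Hypothetical classes; the real CharType enum is not shown in this diff.
    if ch.isspace():
        return "whitespace"
    if ch.isalnum() or ch == "_":
        return "alphanum"
    return "symbol"


def split_on_type_change(line: str, include_whitespace: bool = False):
    """Cut a line into tokens wherever the character class changes."""
    tokens = []
    start = 0
    for i in range(1, len(line) + 1):
        # i == len(line) flushes the last token; otherwise cut on a type change.
        if i == len(line) or char_type(line[i]) != char_type(line[i - 1]):
            token = line[start:i]
            if include_whitespace or not token.isspace():
                tokens.append(token)
            start = i
    return tokens


assert split_on_type_change("EAX = 123") == ["EAX", "=", "123"]
```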
@ -255,9 +270,11 @@ namespace Cosmos.Compiler.XSharp {

/// <summary>Create a new Parser instance and immediately consume the given <paramref name="aData"/>
/// string. On return the <seealso cref="Tokens"/> property is available for enumeration.</summary>
/// <param name="aData">The text to be parsed.</param>
/// <param name="aData">The text to be parsed. WARNING : This is expected to be a single full line
/// of text. The parser can be created with a special "pattern recognition" mode.</param>
/// <param name="aIncludeWhiteSpace"></param>
/// <param name="aAllowPatterns"></param>
/// <param name="aAllowPatterns">True if <paramref name="aData"/> is a pattern and thus the parsing
/// should be performed specifically.</param>
/// <exception cref="Exception">At least one unrecognized token has been parsed.</exception>
public Parser(string aData, bool aIncludeWhiteSpace, bool aAllowPatterns) {
mData = aData;

@ -27,7 +27,6 @@ namespace Cosmos.Compiler.XSharp {
return Value;
}


static public implicit operator string(Token aToken) {
return aToken.Value;
}

@ -55,10 +55,12 @@ namespace Cosmos.Compiler.XSharp {
return true;
}

public bool PatternMatches(string aPattern) {
var xParser = new Parser(aPattern, false, true);
return PatternMatches(xParser.Tokens);
}
// BlueSkeye : Seems to be unused. Commented out.
//public bool PatternMatches(string aPattern) {
// var xParser = new Parser(aPattern, false, true);
// return PatternMatches(xParser.Tokens);
//}

public bool PatternMatches(TokenList aObj) {
// Don't compare TokenHashCodes, they take just as long to calculate
// as a full comparison. Besides this function is often called after

@@ -101,14 +103,15 @@ namespace Cosmos.Compiler.XSharp {
return true;
}

public int IndexOf(string aValue) {
for (int i = 0; i < Count; i++) {
if (this[i].Value == aValue) {
return i;
}
}
return -1;
}
// BlueSkeye : Seems to be unused. Commented out.
//public int IndexOf(string aValue) {
// for (int i = 0; i < Count; i++) {
// if (this[i].Value == aValue) {
// return i;
// }
// }
// return -1;
//}

// We could use values to further differentiate, however
// with types alone it still provides a decent and fast hash.

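The comment above says that hashing a token list by token types alone is cheap and still discriminating enough. A minimal sketch of such a type-only hash follows. SketchToken, SketchTokenType and the multiply-by-31 combination are assumptions for illustration only; the actual hash implementation of TokenList is not part of this diff.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the compiler's Token and TokenType; only the
// members needed for the hash are modelled.
public enum SketchTokenType { AlphaNum, Register, ValueInt, Operator, Delimiter }

public sealed class SketchToken {
  public SketchTokenType Type { get; }
  public string Value { get; }

  public SketchToken(SketchTokenType aType, string aValue) {
    Type = aType;
    Value = aValue;
  }
}

public static class TypeOnlyHashSketch {
  // Folds the token types into a single int. Token values are ignored, so
  // "EAX = 1" and "EBX = 2" share a hash; equal hashes must still be
  // confirmed by a full comparison.
  public static int Compute(IList<SketchToken> aTokens) {
    unchecked {
      int xHash = 17;
      foreach (var xToken in aTokens) {
        xHash = xHash * 31 + (int)xToken.Type;
      }
      return xHash;
    }
  }
}
```

Two lines with the same shape (register, operator, integer) land in the same bucket, while swapping the integer for a register changes the hash. This is what makes a type-only hash useful as a fast pre-filter before the full comparison.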
@@ -85,6 +85,9 @@ namespace Cosmos.Compiler.XSharp {
public TokenPatterns() {
mCompareOps = "< > = != <= >= 0 !0".Split(" ".ToCharArray());
var xSizes = "byte , word , dword ".Split(",".ToCharArray()).ToList();
// We must add this empty size so that we allow constructs where the size is not
// explicitly defined in source code, for example: while eax < 0
// Otherwise we would have to write: while dword eax < 0
xSizes.Add("");
foreach (var xSize in xSizes) {
foreach (var xComparison in mCompareOps) {

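The constructor above crosses every operand size with every comparison operator, and the extra empty size is what lets "while eax < 0" match without an explicit "dword". The sketch below reproduces only that cross product. The Build method and its format-string template are hypothetical, because the pattern strings the real constructor emits fall outside this hunk. The sketch also splits an unpadded "byte,word,dword" list, so no whitespace handling is needed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ComparisonPatternSketch {
  // Builds one pattern string per (size, comparison) pair. aTemplate is a
  // hypothetical format string: {0} receives the size prefix, {1} the operator.
  public static List<string> Build(string aTemplate) {
    string[] xCompareOps = "< > = != <= >= 0 !0".Split(' ');
    List<string> xSizes = "byte,word,dword".Split(',').ToList();
    // The empty size allows "while eax < 0" in addition to "while dword eax < 0".
    xSizes.Add("");

    var xResult = new List<string>();
    foreach (string xSize in xSizes) {
      // Only non-empty sizes are followed by a separating space.
      string xPrefix = xSize.Length == 0 ? "" : xSize + " ";
      foreach (string xComparison in xCompareOps) {
        xResult.Add(string.Format(aTemplate, xPrefix, xComparison));
      }
    }
    return xResult;
  }
}
```

With the template "while {0}eax {1} 0" this yields 32 strings (4 sizes times 8 operators), including both "while eax < 0" and "while dword eax < 0".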
@@ -114,41 +117,62 @@ namespace Cosmos.Compiler.XSharp {
AddPatterns();
}

protected string Quoted(string aString) {
return "\"" + aString + "\"";
}
// BlueSkeye : Seems to be unused. Commented out.
//protected string Quoted(string aString) {
// return "\"" + aString + "\"";
//}

protected int IntValue(Token aToken) {
if (aToken.Value.StartsWith("0x")) {
return int.Parse(aToken.Value.Substring(2), NumberStyles.AllowHexSpecifier);
} else {
return int.Parse(aToken.Value);
}
}
// BlueSkeye : Seems to be unused. Commented out.
//protected int IntValue(Token aToken)
//{
// if (aToken.Value.StartsWith("0x")) {
// return int.Parse(aToken.Value.Substring(2), NumberStyles.AllowHexSpecifier);
// } else {
// return int.Parse(aToken.Value);
// }
//}

/// <summary>Builds a label that is suitable to denote a constant whose name is given by the
/// token.</summary>
/// <param name="aToken">Token holding the constant name.</param>
/// <returns>The constant label.</returns>
protected string ConstLabel(Token aToken) {
return GroupLabel("Const_" + aToken);
}

/// <summary>Builds a label at namespace level having the given name.</summary>
/// <param name="aLabel">Local label name at namespace level.</param>
/// <returns>The label name</returns>
protected string GroupLabel(string aLabel) {
return GetNamespace() + "_" + aLabel;
}

protected string FuncLabel(string aLabel) {
/// <summary>Builds a label at function level having the given name.</summary>
/// <param name="aLabel">Local label name at function level.</param>
/// <returns>The label name</returns>
protected string FuncLabel(string aLabel)
{
return GetNamespace() + "_" + mFuncName + "_" + aLabel;
}

/// <summary>Builds a label having the given name at current function block level.</summary>
/// <param name="aLabel">Local label name at function block level.</param>
/// <returns>The label name.</returns>
protected string BlockLabel(string aLabel) {
return FuncLabel("Block" + mBlocks.Current().LabelID + "_" + aLabel);
}

/// <summary>Builds a label name for the given token. This method enforces the rules for the .
/// and .. prefixes and builds the label at the appropriate level.</summary>
/// <param name="aToken">Token holding the label name.</param>
/// <returns>The label name.</returns>
protected string GetLabel(Token aToken) {
if (aToken.Type != TokenType.AlphaNum && !aToken.Matches("exit")) {
if ((aToken.Type != TokenType.AlphaNum) && !aToken.Matches("exit")) {
throw new Exception("Label must be AlphaNum.");
}

string xValue = aToken;
if (mFuncName == null) {
if (!InFunctionBody) {
if (xValue.StartsWith(".")) {
return xValue.Substring(1);
}

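The label helpers above compose names by plain concatenation: namespace, then function, then block. A minimal stand-alone sketch of that composition follows. LabelBuilderSketch and its properties are hypothetical; in TokenPatterns the values come from GetNamespace(), mFuncName and mBlocks.Current().LabelID, and treating LabelID as an integer is an assumption.

```csharp
// Hypothetical stand-in for the label helpers in TokenPatterns. The namespace,
// current function name and block id are plain properties here.
public sealed class LabelBuilderSketch {
  public string Namespace { get; set; }
  public string FuncName { get; set; }
  public int BlockId { get; set; }   // assumption: LabelID rendered as an integer

  // Namespace level: Namespace_Label
  public string GroupLabel(string aLabel) => Namespace + "_" + aLabel;

  // Constants live at namespace level with a "Const_" marker: Namespace_Const_Name
  public string ConstLabel(string aName) => GroupLabel("Const_" + aName);

  // Function level: Namespace_Function_Label
  public string FuncLabel(string aLabel) => Namespace + "_" + FuncName + "_" + aLabel;

  // Block level nests inside the function: Namespace_Function_BlockN_Label
  public string BlockLabel(string aLabel) => FuncLabel("Block" + BlockId + "_" + aLabel);
}
```

For example, with Namespace "MyGroup", FuncName "MyFunc" and BlockId 2, BlockLabel("End") gives "MyGroup_MyFunc_Block2_End". Since each level only prepends its own scope, labels from different functions or blocks cannot collide as long as the names at each level are unique.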
@@ -207,35 +231,39 @@ namespace Cosmos.Compiler.XSharp {
mFuncName = null;
}

protected string GetDestRegister(TokenList aTokens, int aIdx) {
return GetRegister("Destination", aTokens, aIdx);
}
// BlueSkeye : Seems to be unused. Commented out.
//protected string GetDestRegister(TokenList aTokens, int aIdx) {
// return GetRegister("Destination", aTokens, aIdx);
//}

protected string GetSrcRegister(TokenList aTokens, int aIdx) {
return GetRegister("Source", aTokens, aIdx);
}
// BlueSkeye : Seems to be unused. Commented out.
//protected string GetSrcRegister(TokenList aTokens, int aIdx) {
// return GetRegister("Source", aTokens, aIdx);
//}

protected string GetRegister(string aPrefix, TokenList aTokens, int aIdx) {
var xToken = aTokens[aIdx].Type;
Token xNext = null;
if (aIdx + 1 < aTokens.Count) {
xNext = aTokens[aIdx + 1];
}
// BlueSkeye : Seems to be unused. Commented out.
//protected string GetRegister(string aPrefix, TokenList aTokens, int aIdx)
//{
// var xToken = aTokens[aIdx].Type;
// Token xNext = null;
// if (aIdx + 1 < aTokens.Count) {
// xNext = aTokens[aIdx + 1];
// }

string xResult = aPrefix + "Reg = RegistersEnum." + aTokens[aIdx].Value;
if (xNext != null) {
if (xNext.Value == "[") {
string xDisplacement;
if (aTokens[aIdx + 2].Value == "-") {
xDisplacement = "-" + aTokens[aIdx + 2].Value;
} else {
xDisplacement = aTokens[aIdx + 2].Value;
}
xResult = xResult + ", " + aPrefix + "IsIndirect = true, " + aPrefix + "Displacement = " + xDisplacement;
}
}
return xResult;
}
// string xResult = aPrefix + "Reg = RegistersEnum." + aTokens[aIdx].Value;
// if (xNext != null) {
// if (xNext.Value == "[") {
// string xDisplacement;
// if (aTokens[aIdx + 2].Value == "-") {
// xDisplacement = "-" + aTokens[aIdx + 2].Value;
// } else {
// xDisplacement = aTokens[aIdx + 2].Value;
// }
// xResult = xResult + ", " + aPrefix + "IsIndirect = true, " + aPrefix + "Displacement = " + xDisplacement;
// }
// }
// return xResult;
//}

protected string GetRef(TokenList aTokens, ref int rIdx) {
var xToken1 = aTokens[rIdx];

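GetRegister turns a register token, optionally followed by a bracketed displacement, into a fragment of initializer text such as "SourceReg = RegistersEnum.EBP, SourceIsIndirect = true, SourceDisplacement = -8". As listed, the minus branch reads aTokens[aIdx + 2], which is the "-" token itself, and prefixes another "-", so it would produce "--". The sketch below assumes that the number follows the minus sign and therefore reads aTokens[aIdx + 3]. It works on plain token strings instead of the compiler's TokenList, and RegisterOperandSketch is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class RegisterOperandSketch {
  // aTokens holds token values, e.g. { "EBP", "[", "-", "8", "]" }.
  // Assumes a well-formed operand: a "[" is always followed by a displacement.
  public static string GetRegister(string aPrefix, IList<string> aTokens, int aIdx) {
    string xResult = aPrefix + "Reg = RegistersEnum." + aTokens[aIdx];
    bool xHasNext = aIdx + 1 < aTokens.Count;
    if (xHasNext && aTokens[aIdx + 1] == "[") {
      string xDisplacement;
      if (aTokens[aIdx + 2] == "-") {
        // The number follows the minus sign, so read one token further.
        xDisplacement = "-" + aTokens[aIdx + 3];
      } else {
        xDisplacement = aTokens[aIdx + 2];
      }
      xResult += ", " + aPrefix + "IsIndirect = true, " + aPrefix + "Displacement = " + xDisplacement;
    }
    return xResult;
  }
}
```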
@@ -375,10 +403,14 @@ namespace Cosmos.Compiler.XSharp {
// ..Name: - Global level. Emitted exactly as is.
// .Name: - Group level. Group_Name
// Name: - Function level. Group_ProcName_Name

// The Exit label is a special one that is used as a target for the return instruction.
// It deserves special handling.
AddPattern("Exit:", delegate(TokenList aTokens, Assembler aAsm) {
aAsm += GetLabel(aTokens[0]) + ":";
mFuncExitFound = true;
});
// Regular label recognition.
AddPattern("_ABC:", delegate(TokenList aTokens, Assembler aAsm) {
aAsm += GetLabel(aTokens[0]) + ":";
});

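The three comment lines above define how a label's prefix selects its scope. A minimal sketch of those rules as a single resolver follows. LabelScopeSketch is hypothetical, and two details are assumptions not confirmed by this diff: the ".." prefix is stripped entirely for global labels, and a bare (function-level) label used outside a function is rejected.

```csharp
using System;

public static class LabelScopeSketch {
  // aGroup: current namespace/group name; aFunc: current function name or null.
  public static string Resolve(string aName, string aGroup, string aFunc) {
    if (aName.StartsWith("..", StringComparison.Ordinal)) {
      // Global level: no group or function decoration.
      // Assumption: both dots are stripped before emitting.
      return aName.Substring(2);
    }
    if (aName.StartsWith(".", StringComparison.Ordinal)) {
      // Group level: Group_Name
      return aGroup + "_" + aName.Substring(1);
    }
    if (aFunc == null) {
      // Assumption: a bare label is only meaningful inside a function.
      throw new InvalidOperationException("Function-level label outside a function: " + aName);
    }
    // Function level: Group_ProcName_Name
    return aGroup + "_" + aFunc + "_" + aName;
  }
}
```

The ".." check has to come before the "." check, since every global label also starts with a single dot.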
@@ -391,22 +423,31 @@ namespace Cosmos.Compiler.XSharp {
aAsm += "Jmp " + GetLabel(aTokens[1]);
});

// Defines a constant having the given name and initial value.
AddPattern("const _ABC = 123", delegate(TokenList aTokens, Assembler aAsm) {
aAsm += ConstLabel(aTokens[1]) + " equ " + aTokens[3];
});

// Declares a doubleword variable having the given name, initialized to 0. The
// variable is declared at namespace level.
AddPattern("var _ABC", delegate(TokenList aTokens, Assembler aAsm) {
aAsm.Data.Add(GetLabel(aTokens[1]) + " dd 0");
});
// Declares a doubleword variable having the given name and an explicit initial value. The
// variable is declared at namespace level.
AddPattern("var _ABC = 123", delegate(TokenList aTokens, Assembler aAsm) {
aAsm.Data.Add(GetLabel(aTokens[1]) + " dd " + aTokens[3].Value);
});
// Declares a textual variable having the given name and value. The variable is defined at
// namespace level and a null terminating byte is automatically added after the textual
// value.
AddPattern("var _ABC = 'Text'", delegate(TokenList aTokens, Assembler aAsm) {
// The trailing ", 0" appends the null terminator to our strings.
// Fix issue #15660 by delimiting the string with backquotes and escaping embedded
// backquotes.
aAsm.Data.Add(GetLabel(aTokens[1]) + " db `" + EscapeBackQuotes(aTokens[3].Value) + "`, 0");
});
// Declares a one-dimensional array of bytes, words or doublewords. All members are initialized to 0.
// _ABC is the array name and 123 is the total number of items in the array.
AddPattern(new string[] {
"var _ABC byte[123]",
"var _ABC word[123]",
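The const and var patterns above each emit one line of NASM source: "equ" for constants, "dd" for doublewords, and a backquoted "db" string with a trailing ", 0" for text. The body of EscapeBackQuotes is not part of this diff. The sketch below uses a hypothetical EscapeForBackquotes that follows NASM's rule that backquoted strings interpret C-style backslash escapes, so both "\" and "`" must be escaped. DataDeclarationSketch and its method names are illustrative only.

```csharp
using System.Text;

public static class DataDeclarationSketch {
  // Hypothetical counterpart of EscapeBackQuotes: inside a NASM backquoted
  // string a backslash starts an escape, so '\' and '`' are both escaped.
  public static string EscapeForBackquotes(string aText) {
    var xSb = new StringBuilder(aText.Length);
    foreach (char xChar in aText) {
      if (xChar == '\\' || xChar == '`') {
        xSb.Append('\\');
      }
      xSb.Append(xChar);
    }
    return xSb.ToString();
  }

  // "var _ABC" and "var _ABC = 123": one doubleword, default value 0.
  public static string DeclareDword(string aLabel, string aValue = "0") {
    return aLabel + " dd " + aValue;
  }

  // "var _ABC = 'Text'": bytes of the text followed by a null terminator.
  public static string DeclareText(string aLabel, string aText) {
    return aLabel + " db `" + EscapeForBackquotes(aText) + "`, 0";
  }

  // "const _ABC = 123": an assemble-time constant, no storage reserved.
  public static string DeclareConst(string aLabel, string aValue) {
    return aLabel + " equ " + aValue;
  }
}
```

For instance, DeclareText("MyGroup_Msg", "it`s") produces the NASM line MyGroup_Msg db `it\`s`, 0, which assembles to the bytes of "it`s" followed by a zero byte.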