<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>XSharp explained</title>
</head>
<body>
<h1>INTRODUCTION</h1>
<p>X#, pronounced X-Sharp, is a High Level Assembly language that targets the x86 architecture and is
expected to be flexible enough to later target other kinds of processors.</p>
<p>The language is line based, which means an instruction cannot span several lines. This makes the
language easier to parse. Parsing is also performed in a single pass. This implies that some semantic checks
are not performed by the parser, which may lead to assembly failures when NASM is invoked later.</p>
<p>X# statements map close to 1:1 onto assembly instructions, so there is no disconnect when debugging.
There are no large compound statements.</p>

<h1>SYNTAX</h1>
<h2>Comments</h2>
<p>A comment must appear on its own line. You can&#39;t mix code and comments on a single line. A comment line
is one that starts with two consecutive slashes. Whitespace may appear before the slashes. For example:<br />
<code>// This is a comment.<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Another comment prefixed with whitespaces.<br />
</code></p>

<h2>Literal values</h2>
<h3>String literals</h3>
<p>A string literal is surrounded with single quotes. Should your string contain a single quote you must
escape it with a backslash character. For example :<br/>
<code>&#39;Waiting for \&#39;debugger\&#39; connection...&#39;</code></p>

<h3>Integer literals</h3>
<p>You can write integer literal values either in decimal or hexadecimal. For hexadecimal values prefix
the value with a dollar sign:<br />
<code>// These two constants are actually equal<br />
const decimal = 255<br />
const hexadecimal = $FF</code></p>

<h2><a name="namespace">Namespaces</a></h2>
<p>A namespace is a naming scope that lets you organize your code to avoid naming collisions. You
declare a namespace by using the <code>namespace</code> keyword and giving it a name. For example:<br />
<code>namespace TEST</code><br /></p>
<p>The namespace name is automatically used as a prefix for each named item that appears in that namespace
(function names, labels, variables ...). The namespace extends from the source code line where it is declared
until either another namespace declaration appears or the end of the source code file is reached.
Consequently there is no namespace hierarchy and you cannot "embed" a namespace into another one.</p>
<p><b>WARNING : Code inside a namespace has no way to reference or use code or data from another namespace.</b><br />
Nothing prevents you from reusing a namespace, including within a single source code file. For example the
following source code will compile without error.<br />
<code>namespace FIRST<br />
// Everything here will be prefixed with FIRST. Hence the "true" full name of the below variable<br />
// is FIRST_aVar<br />
var aVar<br />
namespace SECOND<br />
// Not a problem to name another variable aVar. Its true name is SECOND_aVar<br />
var aVar<br />
namespace FIRST<br />
// And here we get back to the FIRST namespace<br />
</code></p>
<p><b>Every program artefact MUST appear inside a namespace.</b> It is hence strongly recommended to define
a namespace at the very beginning of any X# source file.</p>

<h2><a name="datatypes">Datatypes</a></h2>
<p>X# targets 32-bit assembler code generation. It supports the following datatypes:</p>

<ul>
<li>8-bit value, declared with the <code>byte</code> keyword.</li>
<li>16-bit value, declared with the <code>word</code> keyword.</li>
<li>32-bit value, declared with the <code>dword</code> keyword.</li>
</ul>

<p>The signedness of the datatypes is undefined. Your X# code must itself handle the relevant
status flags (carry, sign and overflow) according to the context. Also note that X#
lacks floating point datatypes.</p>

<h2>Constants</h2>
<p>Constants are symbolic names associated with a numeric literal value. A constant definition
is introduced by the <code>const</code> keyword, followed by the constant name, an equal sign and a
constant numeric value. Constants are always considered to be of doubleword type. For example:<br />
<code>namespace TEST<br />
const twoHundred = 200</code><br /></p>
<p>The constant name itself is built differently than for other items. The above constant
declaration is actually named <code>TEST_Const_twoHundred</code>. Consequently you can
define another (non const) item with the same name without fearing name collision. However
this is bad programming practice and is strongly discouraged.</p>
<p><b>WARNING: Whenever you want to reference one of your constants in your source code, you MUST
prefix its name with a hash sign (<code>#</code>).</b> For example the following code initializes the EAX register
with the value of the twoHundred constant:<br />
<code>EAX = #twoHundred</code></p>

<h2>Variables</h2>
<p>You can define either scalar variables of doubleword or text type, or one-dimensional arrays
of any of the available <a href="#datatypes">datatypes</a>. You declare a variable with the <code>var</code>
keyword, giving it a name and optionally a value. For example the code below declares two variables:<br />
<code>var myNumVar = 876<br />
var myTextVar = &#39;A message&#39;</code><br />
If you omit the value, the variable is assumed to be a doubleword and is
initialized with a default value of 0.<br /> The X# compiler silently appends a null byte at the
end of a textual initialization value.</p>

<p>You can also define a one-dimensional array of one of the available <a href="#datatypes">datatypes</a>.
All array members are initialized to 0. You must provide the array size at declaration time.
For example, declaring an array of 256 bytes is written:<br />
<code>var myArray byte[256]</code></p>

<h2><a name="registers">Registers</a></h2>
<p>X# supports the four general purpose registers of the x86 architecture. These registers are
available byte sized: <code>AH AL BH BL CH CL DH DL</code>, word sized:
<code>AX BX CX DX</code> and doubleword sized: <code>EAX EBX ECX EDX</code>. The four index and pointer
registers are also available doubleword sized: <code>ESI EDI ESP EBP</code>.</p>

<h2>Labels</h2>
<p>Labels are a way to give a name to memory addresses. They let you
reference these addresses at coding time without having to know their value at runtime. The X#
compiler automatically creates several labels. For example each time you define a variable, a
label is created that has the variable name and references the memory address of the variable.
This is useful for reading and writing the variable content.<br />
When you create a function, a label is also defined as the address of the beginning of the
function. This label is used when you call the function.<br />These automatically created
labels are largely transparent to you. On the other hand you may want to explicitly define labels
to denote a particular position in your code. This is the case for example when you want to
perform a test and jump to a specific line of code depending on the result of the test. You
create a label at the code location you want to jump to.<br />A label is nothing more than
a name suffixed with <code>:</code><br />
<code>// This is a useless label because the variable already got one.<br />
MyUselessLabel:<br />
var myVar</code></p>

<h2>Functions</h2>
<p>Functions are declared using the <code>function</code> keyword. A function name must follow the
keyword and be followed by an opening curly brace. Be careful to keep the opening curly brace on
the same line as the <code>function</code> keyword. Unlike high level languages, an X# function
declaration doesn&#39;t support parameter declarations. You must handle parameter passing yourself,
using the stack and/or well known registers. For example:<br />
<code>function MyFirstFunction {<br />
// Your code here<br />
// Do not forget the closing curly brace.<br />
}</code></p>

<h3>Returning from a function</h3>
<p>When the X# compiler encounters the closing curly brace that signals the end of the function source
code, it automatically adds a <code>ret</code> instruction. The recommended way to return
from a function is to use the <code>return</code> keyword. Internally the X# compiler translates
it into an unconditional jump to a special label, local to the function, named <code>Exit</code>.
The X# compiler tracks the use of this label and adds it at the end of the
function code if you don&#39;t define it yourself.</p>
<p>Sometimes you will want to explicitly return from your function without going to the cleanup code that
may be defined at and below the function <code>Exit</code> label. You can do so by using the <code>ret</code>
keyword.<br />
<code>// This instruction will directly exit the function without jumping to the Exit label.<br />
ret</code></p>
<p><b>WARNING : The X# compiler doesn&#39;t monitor stack content. It is the responsibility of your code to
make sure that the return address is immediately on top of the stack before the <code>ret</code> instruction
is executed, including for the one that is automatically added by the compiler at the end of the function
body.</b></p>
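<p>As an illustration of the warning above, here is a minimal sketch of a function that saves a register on
the stack and restores it below an explicit <code>Exit</code> label, so that every <code>return</code> goes
through the cleanup code. The function name <code>PreserveEBX</code> and the use of EAX as input are
assumptions made for this example only:<br />
<code>function PreserveEBX {<br />
// Save EBX. From now on the return address is no longer on top of the stack.<br />
+EBX<br />
// Leave early when EAX is zero. This jumps to the Exit label, not out of the function.<br />
if EAX = 0 return<br />
EBX = EAX<br />
EBX + 1<br />
Exit:<br />
// Restore EBX so that the return address is on top of the stack again<br />
// before the ret instruction added by the compiler is executed.<br />
-EBX<br />
}</code><br />
Using <code>ret</code> instead of <code>return</code> in the middle of this function would pop the saved
EBX value as if it were the return address.</p>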

<h3>Invoking a function</h3>
<p>You invoke a function by using the <code>call</code> keyword followed by the function name.<br />
<code>Call myFunction</code><br />
Because X# doesn&#39;t support function parameters you must make sure you properly set up the stack and/or
the registers that are expected by the invoked function.</p>
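<p>The sketch below shows one possible convention, passing a single argument and the result in EAX. The
names <code>AddTen</code> and <code>Caller</code> and the choice of EAX are assumptions of this example,
not something the compiler mandates:<br />
<code>function AddTen {<br />
// Expects its argument in EAX and leaves the result in EAX.<br />
EAX + 10<br />
}<br />
function Caller {<br />
// Set up the register expected by AddTen, then invoke it.<br />
EAX = 5<br />
Call AddTen<br />
// EAX now holds 15.<br />
}</code></p>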

<h2>Interrupt handlers</h2>
<p>Interrupt handlers are a special kind of function used to handle an interrupt. These functions
do not support parameters and are declared using the <code>interrupt</code> keyword. An interrupt
function name must follow the keyword and be followed by an opening curly brace. Be careful to keep
the opening curly brace on the same line as the <code>interrupt</code> keyword. For example:<br />
<code>interrupt DivideByZero {<br />
// Your code here<br />
// Do not forget the closing curly brace.<br />
}</code></p>

<p>Interrupt handlers are executed in a specific processor context that is different from the
normal control flow within functions. So there must be a way for the processor to know when
interrupt processing is done and normal operations should resume. This requires a specific
instruction, namely <code>iret</code> on the x86 architecture. Normally you do not
have to take care of this because the X# compiler knows you&#39;re defining an interrupt handler
and silently inserts the <code>iret</code> instruction at the end of the interrupt handler
code. However you can directly insert the <code>iret</code> instruction in your X# code,
including in a normal function.</p>
<p><b>WARNING: You must be very careful not to use this instruction when your code is not
handling an interrupt, otherwise the processor will trigger an exception. The X# compiler
doesn&#39;t perform any check when you hardcode this instruction.</b></p>

<h2>Assigning value</h2>
<p>You can assign a value to a <a href="#registers">register</a> or to a variable. You do it using
the <code>=</code> operator. The left side is the register or variable name while the right side
is the value to be assigned. For example :<br />
<code>// Assign the immediate value 123 to the EAX register (32 bits).<br />
EAX = 123</code><br /></p>
<p>On the right side of the assignment operator you can use either an immediate value, a constant
(whose name must be prefixed with a hash sign, <code>#</code>), or a register name.<br />
When the left side of the assignment operator is a variable name and the right side is an immediate
value you can additionally explicitly define the size of the right operand using an <code>as</code>
clause associated with the <a href="#datatypes">datatype</a>. For example:<br />
<code>// Assign the immediate value 200 as a word (16 bits) to the myVar variable.<br />
myVar = 200 as word</code></p>

<h3>Address indirection</h3>
<p>Sometimes a register contains the in-memory address of another element, most likely a variable.
In this case you do not want to assign a value to the register itself but instead want to store
the value at the memory address held in the register. This is called address indirection and is
denoted by the register name followed by a number enclosed in square brackets,
known as an offset (explained below). Address indirection may be used on either the right side or
the left side of the <code>=</code> assignment operator. However you can&#39;t use it on both sides at
the same time. Let&#39;s take an example:<br />
<code>EAX[10] = EBX</code><br />
The behavior is as follows: take the content of the EAX register, add the offset value to it (10
in our example) and consider the result to be a memory address. Now store the content of the EBX register
at this memory address.<br />
The offset value must be a literal number; it may be 0 or even negative.</p>
<p>So how does a register come to hold a memory address? We do this with the special
<code>@</code> operator, which is used as a prefix to a label name. Since the X# compiler
automatically creates a label each time you declare a variable, we have
the following syntax:<br />
<code>// Declare a variable<br />
var myVar<br />
// Read variable content into EAX register by using the variable name.<br />
EAX = myVar<br />
// Load EAX register with the in memory address of the myVar variable.<br />
EAX = @myVar<br />
// So now we can store the content of EBX register into myVar variable.<br />
EAX[0] = EBX<br />
// And read back the content of the myVar variable into ECX register.<br />
ECX = EAX[0]</code></p>
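<p>Combining the <code>@</code> operator with a non-zero offset gives access to array elements. The
following is a sketch only, assuming a hypothetical <code>myTable</code> array of doublewords. Since the
offset is added to the address as is, it is expressed in bytes, and the second doubleword sits at offset 4:<br />
<code>var myTable dword[4]<br />
// Load EBX with the address of the first array element.<br />
EBX = @myTable<br />
// Store EAX into the second element (4 bytes past the start).<br />
EBX[4] = EAX<br />
// Read the same element back into ECX.<br />
ECX = EBX[4]</code></p>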

<h2>Register arithmetic</h2>
<p>X# supports additive and subtractive register arithmetic with the <code>+</code> and <code>-</code>
operators, using a shortcut syntax for incrementing and decrementing a <a href="#registers">register</a>.
This syntax is not supported for variables. When incrementing or decrementing a register you must omit the
assignment part of the instruction. The target register is the one on the left side of the operator. For
example the following instruction increments the EAX register by 2:<br />
<code>EAX + 2</code><br />
In the above example you can replace the literal value with a register name but not with a variable
name. In the following example the value of the EAX register is decremented by the value of the EBX
register:<br />
<code>EAX - EBX</code></p>
<p>Finally there is an even shorter form when you want to increment or decrement a register by 1.
This is performed with the <code>++</code> and <code>--</code> operators. They can be applied to a
register only. Incrementing and decrementing a variable this way is not supported. Additionally the
operator must be used as a register suffix with no space between the register name and the operator.
For example:<br />
<code>// Increment EAX register<br />
EAX++<br />
// Decrement ECX register<br />
ECX--</code></p>

<h2>Register shifting and rolling</h2>
<p>Shifting a register to the right or to the left is performed with the <code>&gt;&gt;</code> and
<code>&lt;&lt;</code> keywords respectively. Following the keyword you must provide a literal
number that defines how many bits to shift. For example:<br />
<code>// Shift EAX to the right by 8 bits.<br />
EAX &gt;&gt; 8</code></p>
<p>Rolling (rotating) a register to the right or to the left is performed with the <code>~&gt;</code> and
<code>&lt;~</code> keywords respectively. Following the keyword you must provide a literal
number that defines how many bits to roll. For example:<br />
<code>// Roll EAX to the left by 12 bits.<br />
EAX &lt;~ 12</code></p>

<h2>Comparison</h2>
<p>The classical comparison operators are supported:<br />
<code>&lt; &gt; = &lt;= &gt;= !=</code></p>

<p>See the compiler&#39;s two collections, <code>mCompareOps</code> and <code>mCompares</code>, for what is
supported in <code>if</code> statements. The <code>while</code> statement only supports the
<code>mCompares</code> style.</p>

<h3>Pure comparison</h3>
<p>Sometimes you want to compare a register&#39;s content with a literal number, a variable&#39;s
content or a constant. You can do this with the <code>?=</code> operator. The left side of the
operator is the register name while the right side is the value to be compared with. The result
of such an operation is that the processor status flags (sign, overflow, zero and carry) are
set according to the comparison result.<br />
<code>// Compare EAX register content with literal value 812.<br />
EAX ?= 812</code></p>
<p>You may also wish to test some specific bits of the register value rather than the register
value as a whole. This is where you use the <code>?&amp;</code> operator. Once again the processor
flags are updated with the result of the bitwise AND of the register value and the
compared value.<br />
<code>// Test whether the fourth least significant bit of EAX register is set.<br />
EAX ?&amp; $08</code></p>

<h2>Control flow instructions</h2>

<h3>Branching</h3>
<p>The <code>goto</code> keyword lets you perform unconditional branching. Following the keyword
you must name the target label. For example :<br />
<code>// Assuming a somewhereElse label is defined.<br />
goto somewhereElse</code><br /></p>

<p>The <code>if</code> keyword lets you perform conditional branching. Following the keyword and
on the same line you must provide a condition followed by either a <code>goto</code> statement,
a <code>return</code> statement, or an opening curly brace that begins a code block.<br />
The condition itself is usually a simple comparison as described above. It can also be a test
consisting of just a comparison operator and nothing else. This special syntax is used to directly
test one of the three main flags updated by the processor on almost any instruction (sign,
overflow and carry). This syntax is not recommended unless you know very well how the processor
behaves. Most of the time you can use the standard syntax to achieve the same result, sometimes with
a couple fewer lines of code. For example:<br />
<code>// A simple test with standard syntax:<br />
if EAX &gt; 10 return<br />
// This is equivalent to the following one with special syntax:<br />
EAX ?= 10<br />
if &gt; return</code><br /></p>
<p>Notice that unlike higher level languages there is no "else" construct available.</p>
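<p>An "else" branch can nevertheless be emulated with labels and <code>goto</code>. The following sketch
assumes two hypothetical labels, <code>IsZero</code> and <code>Done</code>:<br />
<code>if EAX = 0 goto IsZero<br />
// "else" part: executed when EAX is not zero.<br />
EBX = 2<br />
goto Done<br />
IsZero:<br />
// "then" part: executed when EAX is zero.<br />
EBX = 1<br />
Done:</code></p>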

<h3>Looping</h3>
<p>The <code>while</code> keyword lets you define a loop on a simple condition. It only supports
standard comparisons; the special syntax available with the <code>if</code>
statement can&#39;t be used with the <code>while</code> statement. For example:<br />
<code>while eax &lt; 0 {<br />
eax = 1<br />
}</code></p>
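<p>As a further sketch, a counting loop could look as follows. It assumes ECX is free to be used as the
counter and that the loop body leaves it untouched:<br />
<code>ECX = 0<br />
while ECX &lt; 10 {<br />
// Loop body goes here.<br />
ECX++<br />
}</code></p>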

<h2>Playing with the stack</h2>
<p>The x86 architecture supports a stack concept that is backed by the <code>ESP</code> processor
register. Pushing value(s) onto the stack is denoted with the <code>+</code> sign while popping
value(s) from the stack is denoted by the <code>-</code> sign. You can push or pop a single
register at a time by prefixing its name with the appropriate operation sign. There must not be
any whitespace character between the sign and the register name. For example:<br />
<code>// Pop the EAX register from the stack.<br />
-EAX</code><br />
The datatype of the pushed/popped value is implied by the register name.</p>
<p>You can also directly push (but obviously not pop) an immediate numeric value onto the
stack. Should the value be defined as a constant with the <code>const</code> keyword, do not forget
the hash sign (<code>#</code>) that must appear between the operation sign and the constant name. For example:<br />
<code>// Push the immediate value 200 onto the stack.<br />
+200<br />
// Push the value for the twoHundred constant onto the stack.<br />
+#twoHundred</code><br />
The default datatype for a pushed immediate value is doubleword. You can also explicitly state the
<a href="#datatypes">datatype</a> of the pushed value. You do this by appending an
<code>as</code> clause at the end of the instruction such as:<br />
<code>// Push the immediate value 200 onto the stack as a word (2 bytes).<br />
+200 as word<br />
// Push the twoHundred constant onto the stack as a single byte.<br />
+#twoHundred as byte</code></p>
<p>Finally, there is also a convenient instruction that lets you push or pop all general purpose registers
at once: the <code>All</code> instruction. Once again you must prefix this keyword with the appropriate
operation sign.</p>
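<p>For instance, a function could save and restore every general purpose register around its body. This is
a sketch; the function name <code>DoWork</code> is hypothetical:<br />
<code>function DoWork {<br />
// Push all general purpose registers.<br />
+All<br />
// Code that freely modifies registers goes here.<br />
// Pop them back before the ret instruction added by the compiler.<br />
-All<br />
}</code></p>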

<h2>Working with I/O ports</h2>
<p>Reading and writing I/O ports is performed with the <code>Port</code> keyword. The port number must
be set in the DX register. You can read or write a byte, a word or a doubleword at a time. The input
or output data will be in AL, AX or EAX register respectively. To read a byte use the following syntax :<br />
<code>AL = Port[DX]</code><br />
To write a double word use the following syntax :<br />
<code>Port[DX] = EAX</code></p>
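<p>A complete read has to load DX first. The sketch below assumes port <code>$60</code>, the data port of
the PC keyboard controller, purely as an illustration:<br />
<code>// Select the port to read from.<br />
DX = $60<br />
// Read one byte from that port into AL.<br />
AL = Port[DX]</code></p>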

<h2>Debugging helper</h2>
<p>The <code>checkpoint</code> instruction lets you write a simple text to the console by directly
copying the text content to the video buffer. The text must follow the keyword and be surrounded with single
quotes. Should it contain single quotes, they must be escaped with a backslash.<br />
<code>checkpoint &#39;This is a \&#39;debugging\&#39; message&#39;</code></p>

<h2>Literal assembler code</h2>
<p>Despite our efforts you may find it necessary to write assembler code directly in your X# source code. Any
source code line whose first non-whitespace character is an exclamation point is copied verbatim
into the target assembler source. This may be useful for some rarely used instructions. For example:<br />
<code>// Hope our Execution state block in System Management RAM is valid otherwise crash-boom<br />
! RSM</code><br />
The most likely reason to emit literal assembler code is for floating point operations, which
are not supported by the X# compiler. However this kind of operation is rarely encountered at the
OS kernel level.</p>
</body>
</html>