mirror of
https://github.com/danbulant/Cosmos
synced 2026-05-19 20:39:01 +00:00
366 lines
22 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XSharp explained</title>
</head>
<body>
<h1>INTRODUCTION</h1>
<p>X#, pronounced X-Sharp, is a high-level assembly language that targets the x86 architecture and is
expected to be flexible enough to target other kinds of processors later.</p>
<p>The language is line based, which means an instruction cannot span several lines. This makes the
language easier to parse. Parsing is also performed in a single pass. This implies that some semantic checks
are not performed by the parser, which may lead to assembly failures when NASM is invoked later.</p>
<p>X# statements map close to 1:1 onto assembly instructions, which avoids any disconnect when debugging.
There are no large compound statements.</p>
<h1>SYNTAX</h1>
<h2>Comments</h2>
<p>A comment must appear on its own line. You can't mix code and comments on a single line. A comment line
is one that starts with two consecutive slashes. Whitespace may precede the slashes. For example:<br />
<code>// This is a comment.<br />
&nbsp;&nbsp;// Another comment prefixed with whitespace.<br />
</code></p>
<h2>Literal values</h2>
<h3>String literals</h3>
<p>A string literal is enclosed in single quotes. If the string itself contains a single quote, you must
escape it with a backslash character. For example:<br />
<code>'Waiting for \'debugger\' connection...'</code></p>
<h3>Integer literals</h3>
<p>You can write integer literal values in either decimal or hexadecimal. Prefix hexadecimal values
with a dollar sign:<br />
<code>// These two constants are actually equal<br />
const decimal = 255<br />
const hexadecimal = $FF</code></p>
<h2><a name="namespace">Namespaces</a></h2>
<p>A namespace is a naming scope that lets you organize your code to avoid naming collisions. You
declare a namespace by using the <code>namespace</code> keyword and giving it a name. For example:<br />
<code>namespace TEST</code><br /></p>
<p>The namespace name is automatically used as a prefix for each named item that appears in that namespace
(function names, labels, variables ...). The namespace extends from the source code line where it is declared
until either another namespace declaration appears or the end of the source code file is reached.
Consequently there is no namespace hierarchy and you cannot "embed" a namespace into another one.</p>
<p><b>WARNING: Code inside a namespace has no way to reference or use code or data from another namespace.</b><br />
Nothing prevents you from reusing a namespace, including inside a single source code file. For example the
following source code will compile without error.<br />
<code>namespace FIRST<br />
// Everything here will be prefixed with FIRST. Hence the "true" full name of the variable below<br />
// is FIRST_aVar<br />
var aVar<br />
namespace SECOND<br />
// Not a problem to name another variable aVar. Its true name is SECOND_aVar<br />
var aVar<br />
namespace FIRST<br />
// And here we get back to the FIRST namespace<br />
</code></p>
<p><b>Every program artefact MUST appear inside a namespace.</b> It is hence strongly recommended to define
a namespace at the very beginning of any X# source file.</p>
<h2><a name="datatypes">Datatypes</a></h2>
<p>X# targets 32-bit assembler code generation. It supports the following datatypes:</p>
<ul>
<li>8-bit values, declared with the <code>byte</code> keyword.</li>
<li>16-bit values, declared with the <code>word</code> keyword.</li>
<li>32-bit values, declared with the <code>dword</code> keyword.</li>
</ul>
<p>The signedness of a datatype is undefined. X# code must itself handle the relevant
status flags (carry, sign and overflow) according to the context. Also notice that X#
lacks floating-point datatypes.</p>
<h2>Constants</h2>
<p>Constants are symbolic names associated with a numeric literal value. A constant definition
is introduced by the <code>const</code> keyword, followed by the constant name, an equals sign and a
constant numeric value. Constants are always considered to be of doubleword type. For example:<br />
<code>namespace TEST<br />
const twoHundred = 200</code><br /></p>
<p>The constant name itself is built differently from that of other items. The constant declared
above is actually named <code>TEST_Const_twoHundred</code>. Consequently you can
define another (non-const) item with the same name without fearing a name collision. However
this is bad programming practice and is strongly discouraged.</p>
<p><b>WARNING: Whenever you want to reference one of your constants in your source code, you MUST
prefix its name with a hash sign (<code>#</code>).</b> For example the following code initializes the EAX register
with the value of the twoHundred constant:<br />
<code>EAX = #twoHundred</code></p>
<h2>Variables</h2>
<p>You can define either scalar (non-array) variables of doubleword or text type, or one-dimensional arrays
of any of the available <a href="#datatypes">datatypes</a>. You declare a variable by giving it
a name and optionally a value. For example the code below declares two variables:<br />
<code>var myNumVar = 876<br />
var myTextVar = 'A message'</code><br />
If you omit the value, the variable is assumed to be a doubleword and is
initialized with a default value of 0.<br /> The X# compiler silently appends a null byte at the
end of a textual initialization value.</p>
<p>You can also define a one-dimensional array of one of the available <a href="#datatypes">datatypes</a>.
All array members are initialized to 0. You must provide the array size at declaration time.
For example, an array of 256 bytes is declared as follows:<br />
<code>var myArray byte[256]</code></p>
<h2><a name="registers">Registers</a></h2>
<p>X# supports the four general-purpose registers of the x86 architecture. These registers are
available byte sized: <code>AH AL BH BL CH CL DH DL</code>, word sized:
<code>AX BX CX DX</code> and doubleword sized: <code>EAX EBX ECX EDX</code>. The four specific
registers are also available, doubleword sized: <code>ESI EDI ESP EBP</code></p>
<h2>Labels</h2>
<p>Labels are a way to give a name to a memory address. This is a convenient way to
reference such addresses at coding time without having to know their value at runtime. The X#
compiler automatically creates several labels. For example each time you define a variable, a
label is created that has the variable name and references the memory address of the variable.
This is useful for reading and writing the variable content.<br />
When you create a function, a label is also defined at the address of the beginning of the
function. This label is used when you call the function.<br />These automatically created
labels are largely transparent to you. On the other hand you may want to explicitly define labels
to denote some particular position in your code. This is the case for example when you want to
perform a test and jump to a specific line of code depending on the result of the test. You
create a label at the code location you want to jump to.<br />A label is nothing more than
a name suffixed with <code>:</code><br />
<code>// This is a useless label because the variable already got one.<br />
MyUselessLabel:<br />
var myVar</code></p>
<h2>Functions</h2>
<p>Functions are declared using the <code>function</code> keyword. A function name must follow the
keyword and be followed by an opening curly brace. Be careful to keep the opening curly brace on
the same line as the <code>function</code> keyword. Unlike in high-level languages, an X# function
declaration doesn't support parameter declarations. You must handle parameter passing yourself,
using the stack and/or well-known registers. For example:<br />
<code>function MyFirstFunction {<br />
// Your code here<br />
// Do not forget the closing curly brace.<br />
}</code></p>
<h3>Returning from a function</h3>
<p>When the X# compiler encounters the closing curly brace that signals the end of the function source
code, it automatically adds a <code>ret</code> instruction. The recommended way to return
from a function is to use the <code>return</code> keyword. Internally the X# compiler translates
it to an unconditional jump to a special label, local to the function, which is named <code>Exit</code>.
The X# compiler tracks the use of this label and adds it at the end of the
function code if you don't define it yourself.</p>
<p>Sometimes you will want to explicitly return from your function without going through the cleanup code that
may be defined at and below the function's <code>Exit</code> label. You can do so by using the <code>ret</code>
keyword.<br />
<code>// This instruction will directly exit the function without jumping to the Exit label.<br />
ret</code></p>
<p><b>WARNING: The X# compiler doesn't monitor stack content. It is the responsibility of your code to
make sure that the return address is on top of the stack immediately before any <code>ret</code> instruction
is executed, including the one that is automatically added by the compiler at the end of the function
body.</b></p>
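<p>As an illustration of the rules above, here is a minimal sketch of a function that saves a register,
returns early through <code>return</code> and restores the register in cleanup code placed at an explicit
<code>Exit</code> label. The namespace and function names are hypothetical, and the sketch assumes that an
explicit <code>Exit</code> label is written like any other label:<br />
<code>namespace DEMO<br />
function ClampToTen {<br />
// Save EBX. It must be popped again before the final ret.<br />
+EBX<br />
EBX = 10<br />
// Nothing to do if EAX is already at most 10 : jump to Exit.<br />
if EAX &lt;= 10 return<br />
EAX = EBX<br />
Exit:<br />
// Cleanup code : restore EBX so that the return address is back on top of the stack.<br />
-EBX<br />
}</code><br />
Using <code>ret</code> instead of <code>return</code> in this sketch would skip the cleanup code and leave the
saved EBX value on top of the stack in place of the return address.</p>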
<h3>Invoking a function</h3>
<p>You invoke a function by using the <code>call</code> keyword followed by the function name.<br />
<code>Call myFunction</code><br />
Because X# doesn't support function parameters, you must make sure you properly set up the stack and/or
the registers that are expected by the invoked function.</p>
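<p>A minimal sketch of one possible convention, where the argument is passed in EAX and the result is returned
in EAX. The namespace, the function names and the convention itself are assumptions chosen for illustration.
X# does not impose them:<br />
<code>namespace DEMO<br />
function AddTen {<br />
// Expects its argument in EAX and leaves the result in EAX.<br />
EAX + 10<br />
}<br />
function Caller {<br />
// Set up the register expected by AddTen, then invoke it.<br />
EAX = 5<br />
Call AddTen<br />
// EAX now holds 15.<br />
}</code></p>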
<h2>Interrupt handlers</h2>
<p>Interrupt handlers are a special kind of function used to handle an interrupt. These functions
do not support parameters and are declared using the <code>interrupt</code> keyword. An interrupt
function name must follow the keyword and be followed by an opening curly brace. Be careful to keep
the opening curly brace on the same line as the <code>interrupt</code> keyword. For example:<br />
<code>interrupt DivideByZero {<br />
// Your code here<br />
// Do not forget the closing curly brace.<br />
}</code></p>
<p>Interrupt handlers are executed in a specific processor context that is different from the
normal control flow within functions. So there must be a way for the processor to know when
interrupt processing is done and normal operations should resume. This requires a specific
instruction, namely <code>iret</code> in the x86 processor architecture. Normally you do not
have to take care of this because the X# compiler knows you're defining an interrupt handler
and silently inserts the <code>iret</code> instruction at the end of the interrupt handler
code. However you can directly insert the <code>iret</code> instruction in your X# code,
including in a normal function.</p>
<p><b>WARNING: You must be very careful not to use this instruction when your code is not
handling an interrupt, otherwise the processor will trigger an exception. The X# compiler
doesn't perform any check when you hardcode this instruction.</b></p>
<h2>Assigning value</h2>
<p>You can assign a value to a <a href="#registers">register</a> or to a variable. You do it using
the <code>=</code> operator. The left side is the register or variable name while the right side
is the value to be assigned. For example:<br />
<code>// Assign the immediate value 123 to the EAX register (32 bits).<br />
EAX = 123</code><br /></p>
<p>On the right side of the assignment operator you can use an immediate value, a constant
(whose name must be prefixed with a hash sign), or a register name.<br />
When the left side of the assignment operator is a variable name and the right side is an immediate
value, you can additionally define the size of the right operand explicitly, using an <code>as</code>
clause associated with the <a href="#datatypes">datatype</a>. For example:<br />
<code>// Assign the immediate value 200 as a word (16 bits) to the myVar variable.<br />
myVar = 200 as word</code></p>
<h3>Address indirection</h3>
<p>Sometimes a register contains the in-memory address of another element, most likely a variable.
In this case you do not want to assign a value to the register itself and want instead to store
the value at the memory address held in the register. This is called address indirection and is
denoted by the register name being followed by a number enclosed in square brackets and
known as an offset (more on this later). Address indirection may be used on either the right side or
the left side of the <code>=</code> assignment operator. However you can't use it on both sides at
the same time. Let's take an example:<br />
<code>EAX[10] = EBX</code><br />
The behavior is as follows: take the content of the EAX register, add the offset value to it (10
in our example) and consider the result to be a memory address. Now store the content of the EBX register
at this memory address.<br />
The offset value must be a literal number. It may be 0 or even negative.</p>
<p>So how does a register come to hold a memory address? This is done with the special
<code>@</code> operator, which is used as a prefix to a label name. Since the X# compiler automatically
creates a label for each variable you declare, we get
the following syntax:<br />
<code>// Declare a variable<br />
var myVar<br />
// Read variable content into EAX register by using the variable name.<br />
EAX = myVar<br />
// Load EAX register with the in-memory address of the myVar variable.<br />
EAX = @myVar<br />
// So now we can store the content of EBX register into myVar variable.<br />
EAX[0] = EBX<br />
// And read back the content of the myVar variable into ECX register.<br />
ECX = EAX[0]</code></p>
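<p>The same mechanism can be used to reach the elements of an array. The sketch below is only illustrative:
the variable name is hypothetical, ESI is assumed to be usable for indirection in the same way as EAX, and the
offset is assumed to be expressed in bytes since it is added directly to the address held in the register:<br />
<code>// Declare an array of four doublewords.<br />
var table dword[4]<br />
// Load ESI with the in-memory address of the array.<br />
ESI = @table<br />
// Store EAX into the second element (offset 4, each doubleword being 4 bytes long).<br />
ESI[4] = EAX<br />
// Read the first element back into EBX.<br />
EBX = ESI[0]</code></p>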
<h2>Register arithmetic</h2>
<p>X# supports additive and subtractive register arithmetic with the <code>+</code> and <code>-</code>
operators. X# supports a shortcut syntax for incrementing and decrementing a <a href="#registers">register</a>.
This syntax is not supported for variables. When incrementing or decrementing a register you must omit the
assignment part of the instruction. The target register is the one on the left side of the operator. For
example the following instruction increments the EAX register by 2:<br />
<code>EAX + 2</code><br />
In the above example you can replace the literal value with a register name but not with a variable
name. In the following example the value of the EAX register is decremented by the value of the EBX
register:<br />
<code>EAX - EBX</code></p>
<p>Finally there is an even shorter version when you want to increment or decrement a register by 1.
This is performed with the <code>++</code> and <code>--</code> operators. They must be applied to a
register only. Incrementing and decrementing a variable this way is not supported. Additionally the
operator must be used as a register suffix with no space between the register name and the operator.
For example:<br />
<code>// Increment EAX register<br />
EAX++<br />
// Decrement ECX register<br />
ECX--</code></p>
<h2>Register shifting and rolling</h2>
<p>Shifting a register to the right or to the left is performed with the <code>&gt;&gt;</code> and
<code>&lt;&lt;</code> keywords respectively. Following the keyword you must provide a literal
number that defines how many bits to shift. For example:<br />
<code>// Shift EAX to the right by 8 bits.<br />
EAX &gt;&gt; 8</code></p>
<p>Rolling (rotating) a register to the right or to the left is performed with the <code>~&gt;</code> and
<code>&lt;~</code> keywords respectively. Following the keyword you must provide a literal
number that defines how many bits to roll. For example:<br />
<code>// Roll EAX to the left by 12 bits.<br />
EAX &lt;~ 12</code></p>
<h2>Comparison</h2>
<p>The classical comparison operators are supported:<br />
<code>&lt; &gt; = &lt;= &gt;= !=</code></p>
<p>See the two collections iterated below for what is supported in <code>if</code> statements:<br />
<code>foreach (var xComparison in mCompareOps)<br />
foreach (var xCompare in mCompares)</code><br />
The <code>while</code> statement only supports the <code>mCompares</code> style.</p>
<h3>Pure comparison</h3>
<p>Sometimes you want to compare a register's content with a literal number, a variable's
content or a constant. You can do this with the <code>?=</code> operator. The left side of the
operator is the register name while the right side is the value to be compared with. The result
of such an operation is that the processor flags (sign, overflow, zero and carry) are
set according to the comparison result.<br />
<code>// Compare EAX register content with literal value 812.<br />
EAX ?= 812</code></p>
<p>You may also wish to test some specific bits of the register value rather than the register
value as a whole. This is where you use the <code>?&amp;</code> operator. Once again the processor
flags are updated with the result of the bitwise AND of the register value and the
compared value.<br />
<code>// Test whether the fourth least significant bit of EAX register is set.<br />
EAX ?&amp; $08</code></p>
<h2>Control flow instructions</h2>
<h3>Branching</h3>
<p>The <code>goto</code> keyword lets you perform unconditional branching. Following the keyword
you must name the target label. For example:<br />
<code>// Assuming a somewhereElse label is defined.<br />
goto somewhereElse</code><br /></p>
<p>The <code>if</code> keyword lets you perform conditional branching. Following the keyword and
on the same line you must provide a condition followed by either a <code>goto</code> statement,
a <code>return</code> statement or an opening curly brace that begins a code block.<br />
The condition itself is usually a simple comparison as described above. It can also be a test
involving just a comparison operator and nothing else. This special syntax is used to directly
test one of the three main flags updated by the processor on almost any instruction (sign,
overflow and carry). This syntax is not recommended unless you know very well how the processor
behaves. Most of the time you can use the standard syntax to achieve the same result, sometimes
with a couple fewer lines of code. For example:<br />
<code>// A simple test with standard syntax :<br />
if EAX &gt; 10 return<br />
// This is equivalent to this one with special syntax : <br />
EAX ?= 10<br />
if &gt; return</code><br /></p>
<p>Notice that unlike higher level languages there is no "else" construct available.</p>
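<p>An "else" branch can nevertheless be emulated with labels and <code>goto</code>. The sketch below uses
hypothetical label names and only the constructs described above:<br />
<code>// Set EBX to 1 if EAX is greater than 10, otherwise set it to 0.<br />
if EAX &gt; 10 goto IsGreater<br />
// "else" part<br />
EBX = 0<br />
goto Done<br />
IsGreater:<br />
EBX = 1<br />
Done:</code></p>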
<h3>Looping</h3>
<p>The <code>while</code> keyword only supports standard comparisons. The special syntax available with the <code>if</code>
statement can't be used with the <code>while</code> statement.</p>
<p>It defines a loop on a simple condition. For example:<br />
<code>while eax &lt; 0 {<br />
eax = 1<br />
}</code></p>
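<p>A slightly longer sketch of a counting loop, assuming the condition is evaluated before each iteration as in
higher level languages:<br />
<code>// Count from 0 up to 10 in ECX.<br />
ECX = 0<br />
while ECX &lt; 10 {<br />
ECX++<br />
}<br />
// The loop ends once ECX reaches 10.</code></p>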
<h2>Playing with the stack</h2>
<p>The x86 architecture supports a stack concept that is backed by the <code>ESP</code> processor
register. Pushing value(s) onto the stack is denoted with the <code>+</code> sign while popping
value(s) from the stack is denoted by the <code>-</code> sign. You can push or pop a single
register at a time by prefixing its name with the appropriate operation sign. There must not be
any whitespace character between the sign and the register name. For example:<br />
<code>// Pop the EAX register from the stack.<br />
-EAX</code><br />
The datatype of the pushed/popped value is implied by the register name.</p>
<p>You can also directly push (but obviously not pop) an immediate numeric value onto the
stack. If the value is defined as a constant with the <code>const</code> keyword, do not forget
the hash sign that must appear between the operation sign and the constant name. For example:<br />
<code>// Push the immediate value 200 onto the stack.<br />
+200<br />
// Push the value for the twoHundred constant onto the stack.<br />
+#twoHundred</code><br />
The default datatype for a pushed immediate value is doubleword. You can also explicitly state the
<a href="#datatypes">datatype</a> of the pushed value. You do this by appending an
<code>as</code> clause at the end of the instruction such as:<br />
<code>// Push the immediate value 200 onto the stack as a word (2 bytes).<br />
+200 as word<br />
// Push the twoHundred constant onto the stack as a single byte.<br />
+#twoHundred as byte</code></p>
<p>Finally, the <code>All</code> instruction conveniently lets you push or pop all general-purpose
registers at once. Once again you must prefix this keyword with the appropriate
operation sign.</p>
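<p>For example, an interrupt handler could save and restore every general-purpose register around its body.
This is a sketch with a hypothetical handler name. It assumes, following the rule above, that the sign is
written directly before the keyword, as it is for a register:<br />
<code>interrupt TimerTick {<br />
// Push all general-purpose registers.<br />
+All<br />
// Your code here<br />
// Pop them back in a single instruction.<br />
-All<br />
}</code></p>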
<h2>Working with I/O ports</h2>
<p>Reading and writing I/O ports is performed with the <code>Port</code> keyword. The port number must
be set in the DX register. You can read or write a byte, a word or a doubleword at a time. The input
or output data will be in the AL, AX or EAX register respectively. To read a byte use the following syntax:<br />
<code>AL = Port[DX]</code><br />
To write a doubleword use the following syntax:<br />
<code>Port[DX] = EAX</code></p>
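<p>A short sketch of a complete byte write, including the setup of the port number. The port value is only an
example ($3F8 is the conventional base port of the first serial port on PC hardware), and the sketch assumes
that the byte sized write mirrors the doubleword form shown above:<br />
<code>// Select the I/O port.<br />
DX = $3F8<br />
// Byte to send ('A' in ASCII).<br />
AL = $41<br />
// Write the byte held in AL to the port selected by DX.<br />
Port[DX] = AL</code></p>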
<h2>Debugging helper</h2>
<p>The <code>checkpoint</code> instruction lets you write a simple text to the console by directly
copying the text content to the video buffer. The text must follow the keyword and be enclosed in single
quotes. If it contains quotes, they must be escaped with a backslash.<br />
<code>checkpoint 'This is a \'debugging\' message'</code></p>
<h2>Literal assembler code</h2>
<p>Despite our efforts you may find it necessary to write assembler code directly in your X# source code. Any
source code line whose first non-whitespace character is an exclamation point will be copied verbatim
into the target assembler source. This may be useful for some rarely used instructions. For example:<br />
<code>// Hope our Execution state block in System Management RAM is valid otherwise crash-boom<br />
! RSM</code><br />
The most likely reason to emit literal assembler code is for floating-point operations, which
are not supported by the X# compiler. However this kind of operation is rarely encountered at
OS kernel level.</p>
</body>
</html>