Monday, July 30, 2007

Amazing script

Script: Amazing Draggable Layer

Functions: This script implements a draggable layer that
can be used much like a popup window... but without
the usual focus problems that popups often
imply. Also included are simple controls to show
or hide the draggable layer. Compatible with
NS4-7 & IE.

Comments: The script is in two parts: a JavaScript
in the head section of your page, and the layer code
in the body of your page (see Step 2).

There are no setups or changes required.







==============================================================



STEP 2.
Inserting The Layer Code In Your Page

Insert the following code in the body of your page. It may be
placed either immediately after the <body> tag or
immediately before the </body> tag.

This is essentially a couple of nested tables inside a <div>.


Colors and spacing are set with the usual table features
(bgcolor, cellpadding).

The width, height, left and top position are set in the style
in the <div> tag.

Likewise, if you want the layer to be initially invisible,
change visibility:visible to visibility:hidden in the style.

(To show or hide the layer from a JavaScript link function
call see the following Step 3.)

Your content goes in the commented area, as shown. It can be
most any html code or text, though additional div or table
tags within the designated content area should be done with
care, and checked in all browser versions.

To change the titlebar text, find the words Layer Title and
replace them with your title.





















Layer Title



X


This is where your content goes.

It can be any html code or text.

Remember to feed the reindeer.

Avoid chewable giblet curtains.








==============================================================



STEP 3. (Optional)
Using Show And Hide Controls

The layer can be shown or hidden via simple function calls.

To show the layer:

show

To hide the layer:

hide
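
The original show and hide links did not survive in this copy of the post. As a rough sketch of what such controls could do, assuming the layer is a positioned element whose style.visibility is toggled (the function name setLayerVisible and the stand-in object below are hypothetical, not the script's own API):

```javascript
// Hypothetical helper: toggles the CSS visibility of a layer element.
// `layer` is any object with a `style` property, such as a DOM element.
function setLayerVisible(layer, visible) {
    layer.style.visibility = visible ? "visible" : "hidden";
    return layer.style.visibility;
}

// Stand-in for a real element so the sketch runs outside a browser.
var demoLayer = {style: {visibility: "visible"}};

setLayerVisible(demoLayer, false);    // hide the layer
setLayerVisible(demoLayer, true);     // show it again
```

In a page, the same call could be wired to a link's click handler, passing the layer's own element instead of the stand-in.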



============================[end]=============================

Friday, July 27, 2007

VBScript Coding Conventions

What Are Coding Conventions?
Coding conventions are suggestions that may help you write code using Microsoft Visual Basic Scripting Edition. Coding conventions can include the following:
Naming conventions for objects, variables, and procedures
Commenting conventions
Text formatting and indenting guidelines
The main reason for using a consistent set of coding conventions is to standardize the structure and coding style of a script or set of scripts so that you and others can easily read and understand the code. Using good coding conventions results in precise, readable, and unambiguous source code that is consistent with other language conventions and as intuitive as possible.
Constant Naming Conventions
Earlier versions of VBScript had no mechanism for creating user-defined constants. Constants, if used, were implemented as variables and distinguished from other variables using all uppercase characters. Multiple words were separated using the underscore (_) character. For example:
USER_LIST_MAX
NEW_LINE

While this is still an acceptable way to identify your constants, you may want to use an alternative naming scheme, now that you can create true constants using the Const statement. This convention uses a mixed-case format in which constant names have a "con" prefix. For example:
conYourOwnConstant

Variable Naming Conventions
For purposes of readability and consistency, use the following prefixes with descriptive names for variables in your VBScript code.
Subtype       Prefix   Example
Boolean       bln      blnFound
Byte          byt      bytRasterData
Date (Time)   dtm      dtmStart
Double        dbl      dblTolerance
Error         err      errOrderNum
Integer       int      intQuantity
Long          lng      lngDistance
Object        obj      objCurrent
Single        sng      sngAverage
String        str      strFirstName


Variable Scope
Variables should always be defined with the smallest scope possible. VBScript variables can have the following scope.
Scope             Where Variable Is Declared                             Visibility
Procedure-level   Event, Function, or Sub procedure                      Visible in the procedure in which it is declared
Script-level      HEAD section of an HTML page, outside any procedure    Visible in every procedure in the script


Variable Scope Prefixes
As script size grows, so does the value of being able to quickly differentiate the scope of variables. A one-letter scope prefix preceding the type prefix provides this, without unduly increasing the size of variable names.
Scope             Prefix   Example
Procedure-level   None     dblVelocity
Script-level      s        sblnCalcInProgress


Descriptive Variable and Procedure Names
The body of a variable or procedure name should use mixed case and should be as complete as necessary to describe its purpose. In addition, procedure names should begin with a verb, such as InitNameArray or CloseDialog.
For frequently used or long terms, standard abbreviations are recommended to help keep name length reasonable. In general, variable names greater than 32 characters can be difficult to read. When using abbreviations, make sure they are consistent throughout the entire script. For example, randomly switching between Cnt and Count within a script or set of scripts may lead to confusion.

Object Naming Conventions
The following table lists recommended conventions for objects you may encounter while programming VBScript.
Object type                      Prefix   Example
3D Panel                         pnl      pnlGroup
Animated button                  ani      aniMailBox
Check box                        chk      chkReadOnly
Combo box, drop-down list box    cbo      cboEnglish
Command button                   cmd      cmdExit
Common dialog                    dlg      dlgFileOpen
Frame                            fra      fraLanguage
Horizontal scroll bar            hsb      hsbVolume
Image                            img      imgIcon
Label                            lbl      lblHelpMessage
Line                             lin      linVertical
List Box                         lst      lstPolicyCodes
Spin                             spn      spnPages
Text box                         txt      txtLastName
Vertical scroll bar              vsb      vsbRate
Slider                           sld      sldScale


Code Commenting Conventions
All procedures should begin with a brief comment describing what they do. This description should not describe the implementation details (how it does it) because these often change over time, resulting in unnecessary comment maintenance work, or worse, erroneous comments. The code itself and any necessary inline comments describe the implementation.
Arguments passed to a procedure should be described when their purpose is not obvious and when the procedure expects the arguments to be in a specific range. Return values for functions and variables that are changed by a procedure, especially through reference arguments, should also be described at the beginning of each procedure.

Procedure header comments should include the following section headings. For examples, see the "Formatting Your Code" section that follows.

Section Heading   Comment Contents
Purpose           What the procedure does (not how).
Assumptions       List of any external variable, control, or other element whose state affects this procedure.
Effects           List of the procedure's effect on each external variable, control, or other element.
Inputs            Explanation of each argument that isn't obvious. Each argument should be on a separate line with inline comments.
Return Values     Explanation of the value returned.
Remember the following points:

Every important variable declaration should include an inline comment describing the use of the variable being declared.
Variables, controls, and procedures should be named clearly enough that inline comments are only needed for complex implementation details.
At the beginning of your script, you should include an overview that describes the script, enumerating objects, procedures, algorithms, dialog boxes, and other system dependencies. Sometimes a piece of pseudocode describing the algorithm can be helpful.
Formatting Your Code
Screen space should be conserved as much as possible, while still allowing code formatting to reflect logic structure and nesting. Here are a few pointers:
Standard nested blocks should be indented four spaces.
The overview comments of a procedure should be indented one space.
The highest level statements that follow the overview comments should be indented four spaces, with each nested block indented an additional four spaces. For example:

'*********************************************************
' Purpose: Locates the first occurrence of a specified user
'          in the UserList array.
' Inputs: strUserList(): the list of users to be searched.
'         strTargetUser: the name of the user to search for.
' Returns: The index of the first occurrence of the strTargetUser
'          in the strUserList array.
'          If the target user is not found, return -1.
'*********************************************************

Function intFindUser (strUserList(), strTargetUser)
    Dim i ' Loop counter.
    Dim blnFound ' Target found flag
    intFindUser = -1
    i = 0 ' Initialize loop counter
    Do While i <= Ubound(strUserList) and Not blnFound
        If strUserList(i) = strTargetUser Then
            blnFound = True ' Set flag to True
            intFindUser = i ' Set return value to loop count
        End If
        i = i + 1 ' Increment loop counter
    Loop
End Function


--------------------------------------------------------------------------------

Tuesday, July 24, 2007

Code Conventions for the JavaScript Programming Language

This is a set of coding conventions and rules for use in JavaScript programming. It is inspired by the Sun document Code Conventions for the Java Programming Language. It is heavily modified of course because JavaScript is not Java.

The long-term value of software to an organization is in direct proportion to the quality of the codebase. Over its lifetime, a program will be handled by many pairs of hands and eyes. If a program is able to clearly communicate its structure and characteristics, it is less likely that it will break when modified in the never-too-distant future.

Code conventions can help in reducing the brittleness of programs.

JavaScript Files
JavaScript programs should be stored in and delivered as .js files.

JavaScript code should not be embedded in HTML files unless the code is specific to a single session. Code in HTML adds significantly to pageweight with no opportunity for mitigation by caching and compression.

<script src=filename.js> tags should be placed as late in the body as possible. This reduces the effects of delays imposed by script loading on other page components. There is no need to use the language or type attributes. It is the server, not the script tag, that determines the MIME type.

Indentation
The unit of indentation is four spaces. Use of tabs should be avoided because (as of this writing in the 21st Century) there still is not a standard for the placement of tabstops. The use of spaces can produce a larger filesize, but the size is not significant over local networks, and the difference is eliminated by minification.

Line Length
Avoid lines longer than 80 characters. When a statement will not fit on a single line, it may be necessary to break it. Place the break after an operator, ideally after a comma. A break after an operator decreases the likelihood that a copy-paste error will be masked by semicolon insertion. The next line should be indented 8 spaces.

Comments
Be generous with comments. It is useful to leave information that will be read at a later time by people (possibly yourself) who will need to understand what you have done. The comments should be well-written and clear, just like the code they are annotating. An occasional nugget of humor might be appreciated. Frustrations and resentments will not.

It is important that comments be kept up-to-date. Erroneous comments can make programs even harder to read and understand.

Make comments meaningful. Focus on what is not immediately visible. Don't waste the reader's time with stuff like

i = 0; // Set i to zero.

Generally use line comments. Save block comments for formal documentation and for commenting out.

Variable Declarations
All variables should be declared before used. JavaScript does not require this, but doing so makes the program easier to read and makes it easier to detect undeclared variables that may become implied globals.

The var statements should be the first statements in the function body.

It is preferred that each variable be given its own line and comment. They should be listed in alphabetical order.

var currentEntry; // currently selected table entry
var level; // indentation level
var size; // size of table

JavaScript does not have block scope, so defining variables in blocks can confuse programmers who are experienced with other C family languages. Define all variables at the top of the function.
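
A minimal sketch of the point about block scope, for variables declared with var. The function countMatches is made up for illustration, and it deliberately breaks the rule above by declaring found inside a block to show that the declaration is still visible throughout the function:

```javascript
function countMatches(list, target) {
    var count = 0;
    var i;

    for (i = 0; i < list.length; i += 1) {
        if (list[i] === target) {
            var found = true;    // looks block-local, but is not
            count += 1;
        }
    }

    // `found` is visible here because var is scoped to the function.
    // It is undefined when the if block never ran.
    return {count: count, found: found === true};
}

var hit = countMatches(["a", "b", "a"], "a");
var miss = countMatches(["a", "b"], "z");
```

Moving the declaration of found to the top of the function, next to count and i, would say the same thing without misleading the reader.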

Use of global variables should be minimized. Implied global variables should never be used.

Function Declarations
All functions should be declared before they are used. Inner functions should follow the var statement. This helps make it clear what variables are included in its scope.

There should be no space between the name of a function and the ( (left parenthesis) of its parameter list. There should be one space between the ) (right parenthesis) and the { (left curly brace) that begins the statement body. The body itself is indented four spaces. The } (right curly brace) is aligned with the line containing the beginning of the declaration of the function.

function outer(c, d) {
    var e = c * d;

    function inner(a, b) {
        return (e * a) + b;
    }

    return inner(0, 1);
}

This convention works well with JavaScript because in JavaScript, functions and object literals can be placed anywhere that an expression is allowed. It provides the best readability with inline functions and complex structures.

function getElementsByClassName(className) {
    var results = [];
    walkTheDOM(document.body, function (node) {
        var a;                  // array of class names
        var c = node.className; // the node's classname
        var i;                  // loop counter

// If the node has a class name, then split it into a list of simple names.
// If any of them match the requested name, then append the node to the set of results.

        if (c) {
            a = c.split(' ');
            for (i = 0; i < a.length; i += 1) {
                if (a[i] === className) {
                    results.push(node);
                    break;
                }
            }
        }
    });
    return results;
}

If a function literal is anonymous, there should be one space between the word function and the ( (left parenthesis). If the space is omitted, then it can appear that the function's name is function, which is an incorrect reading.

div.onclick = function (e) {
    return false;
};

that = {
    method: function () {
        return this.datum;
    },
    datum: 0
};

Use of global functions should be minimized.

Names
Names should be formed from the 26 upper and lower case letters (A .. Z, a .. z), the 10 digits (0 .. 9), and _ (underbar). Avoid use of international characters because they may not read well or be understood everywhere. Do not use $ (dollar sign) or \ (backslash) in names.

Do not use _ (underbar) as the first character of a name. It is sometimes used to indicate privacy, but it does not actually provide privacy. If privacy is important, use the forms that provide private members. Avoid conventions that demonstrate a lack of competence.

Most variables and functions should start with a lower case letter.

Constructor functions which must be used with the new prefix should start with a capital letter. JavaScript issues neither a compile-time warning nor a run-time warning if a required new is omitted. Bad things can happen if new is not used, so the capitalization convention is the only defense we have.

Global variables should be in all caps. (JavaScript does not have macros or constants, so there isn't much point in using all caps to signify features that JavaScript doesn't have.)

Statements
Simple Statements
Each line should contain at most one statement. Put a ; (semicolon) at the end of every simple statement. Note that an assignment statement which is assigning a function literal or object literal is still an assignment statement and must end with a semicolon.

JavaScript allows any expression to be used as a statement. This can mask some errors, particularly in the presence of semicolon insertion. The only expressions that should be used as statements are assignments and invocations.

Compound Statements
Compound statements are statements that contain lists of statements enclosed in { } (curly braces).

The enclosed statements should be indented four more spaces.
The { (left curly brace) should be at the end of the line that begins the compound statement.
The } (right curly brace) should begin a line and be indented to align with the beginning of the line containing the matching { (left curly brace).
Braces should be used around all statements, even single statements, when they are part of a control structure, such as an if or for statement. This makes it easier to add statements without accidentally introducing bugs.
Labels
Statement labels are optional. Only these statements should be labeled: while, do, for, switch.

return Statement
A return statement with a value should not use ( ) (parentheses) around the value. The return value expression must start on the same line as the return keyword in order to avoid semicolon insertion.
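
A short sketch of why the value must start on the same line as return. The function names are invented for illustration:

```javascript
function brokenTotal(a, b) {
    return            // a semicolon is inserted here
        a + b;        // never evaluated
}

function total(a, b) {
    return a + b;
}

var wrong = brokenTotal(1, 2);    // undefined
var right = total(1, 2);          // 3
```

No error or warning is raised for brokenTotal; it simply returns undefined.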

if Statement
The if class of statements should have the following form:

if (condition) {
    statements;
}

if (condition) {
    statements;
} else {
    statements;
}

if (condition) {
    statements;
} else if (condition) {
    statements;
} else {
    statements;
}

for Statement
A for class of statements should have the following form:

for (initialization; condition; update) {
    statements;
}

for (variable in object) {
    statements;
}

The first form should be used with arrays.

The second form should be used with objects. Be aware that members that are added to the prototype of the object will be included in the enumeration. It is wise to program defensively by using the hasOwnProperty method to distinguish the true members of the object:

for (variable in object) {
    if (object.hasOwnProperty(variable)) {
        statements;
    }
}
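
To make the difference concrete, here is a small sketch with a made-up Account constructor and an ownKeys helper (both names are assumptions for illustration). The unfiltered loop also picks up the member added to the prototype:

```javascript
// Collects only the members that belong to the object itself.
function ownKeys(object) {
    var name;
    var names = [];

    for (name in object) {
        if (object.hasOwnProperty(name)) {
            names.push(name);
        }
    }
    return names;
}

function Account() {
    this.owner = "sam";
}
Account.prototype.kind = "savings";    // inherited by every Account

var account = new Account();
var all = [];
var key;

for (key in account) {
    all.push(key);                     // no filtering
}

var own = ownKeys(account);
// all holds "owner" and "kind"; own holds only "owner"
```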

while Statement
A while statement should have the following form:

while (condition) {
    statements;
}

do Statement
A do statement should have the following form:

do {
    statements;
} while (condition);

Unlike the other compound statements, the do statement always ends with a ; (semicolon).

switch Statement
A switch statement should have the following form:

switch (expression) {
case expression:
    statements;
default:
    statements;
}


Each case is aligned with the switch. This avoids over-indentation.

Each group of statements (except the default) should end with break, return, or throw. Do not fall through.
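
A filled-in sketch of that form, with every group except the default ending in break. The function describeStatus and its cases are invented for illustration:

```javascript
function describeStatus(code) {
    var text;

    switch (code) {
    case 200:
        text = "ok";
        break;
    case 404:
        text = "not found";
        break;
    default:
        text = "unknown";
    }
    return text;
}

var known = describeStatus(404);      // "not found"
var other = describeStatus(500);      // "unknown"
```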

try Statement
The try class of statements should have the following form:

try {
    statements;
} catch (variable) {
    statements;
}

try {
    statements;
} catch (variable) {
    statements;
} finally {
    statements;
}

continue Statement
Avoid use of the continue statement. It tends to obscure the control flow of the function.

with Statement
The with statement should not be used.

Whitespace
Blank lines improve readability by setting off sections of code that are logically related.


Blank spaces should be used in the following circumstances:

A keyword followed by ( (left parenthesis) should be separated by a space.
while (true) {

A blank space should not be used between a function value and its ( (left parenthesis). This helps to distinguish between keywords and function invocations.
All binary operators except . (period) and ( (left parenthesis) and [ (left bracket) should be separated from their operands by a space.
No space should separate a unary operator and its operand except when the operator is a word such as typeof.
Each ; (semicolon) in the control part of a for statement should be followed with a space.
Whitespace should follow every , (comma).
Bonus Suggestions
{} and []
Use {} instead of new Object(). Use [] instead of new Array().

Use arrays when the member names would be sequential integers. Use objects when the member names are arbitrary strings or names.
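
As a brief illustration, with made-up data, the two literal forms and the kind of member names each one suits:

```javascript
var queue = ["first", "second", "third"];    // sequential integer names
var ages = {alice: 31, bob: 27};             // arbitrary string names

queue.push("fourth");    // becomes queue[3]
ages.carol = 45;         // adds a new named member
```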

, (comma) Operator
Avoid the use of the comma operator except for very disciplined use in the control part of for statements. (This does not apply to the comma separator, which is used in object literals, array literals, var statements, and parameter lists.)

Block Scope
In JavaScript blocks do not have scope. Only functions have scope. Do not use blocks except as required by the compound statements.

Assignment Expressions
Avoid doing assignments in the condition part of if and while statements.

Is

if (a = b) {

a correct statement? Or was

if (a == b) {

intended? Avoid constructs that cannot easily be determined to be correct.

=== and !== Operators.
It is almost always better to use the === and !== operators. The == and != operators do type coercion. In particular, do not use == to compare against falsy values.
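
A few comparisons against falsy values show how the coercion can surprise. The variable names are only for illustration:

```javascript
var loose = [
    0 == "",              // true: the string is coerced to a number
    null == undefined,    // true
    "" == "0"             // false: both are strings, compared as strings
];

var strict = [
    0 === "",             // false
    null === undefined,   // false
    "" === "0"            // false
];
```

With ===, values of different types are never equal, so the results do not depend on coercion rules.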

Confusing Pluses and Minuses
Be careful to not follow a + with + or ++. This pattern can be confusing. Insert parens between them to make your intention clear.

total = subtotal + +myInput.value;

is better written as

total = subtotal + (+myInput.value);

so that the + + is not misread as ++.
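
A runnable version of the same expression, using a plain object as a stand-in for the form field (an assumption so the sketch works outside a browser):

```javascript
var subtotal = 10;
var myInput = {value: "5"};    // stand-in for a form field

var total = subtotal + (+myInput.value);    // 15: unary + converts to a number
var joined = subtotal + myInput.value;      // "105": string concatenation
```

The second line shows what the unary + is guarding against: without it, the string value turns the addition into concatenation.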

eval is Evil
The eval function is the most misused feature of JavaScript. Avoid it.

eval has aliases. Do not use the Function constructor. Do not pass strings to setTimeout or setInterval.

Friday, July 20, 2007

Comparison: Oracle & SQL

SQL Factoids
Factoid Numbers to Remember
MS-SQL7 Oracle 8i mySQL
Max. size of SQL Procedure - 64KB
Tables per database 2 billion
Columns per table 1,024
Max characters in Column Name ? 30
max # of columns in composite index 16
max nonclustered indexes per table 249
max key bytes 900
page extent size 64 KB
page size 8 KB (8,192)
max bytes per row 8,060
max char bytes 8,000 255
max int value 2.147 billion (2^31)
max smallint value 32,767
max smallmoney value $214,748
money decimal precision 4 (.4444)

To invoke MS-SQL automatically, the SQL service Manager is placed in the Startup folder with a Target of:


E:\MSSQL7\Binn\sqlmangr.exe /n
Execution Environments - Oracle vs. Microsoft

This table cross-references the jargon for the same concept from both products.

Action Oracle SQL Microsoft SQL
Tool for interactive input, store, and running of stored procedures C:>SQLPLUSW userid/password@what.world Run SQLW
Run isqlw for SQL Query Analyzer

To control output set serveroutput on
set serverout off -
Send output to a file spool myoutput.out -
Stop sending output spool off -
Send output file to printer (for lpr) spool out -
To invoke SQL statements stored in file named my.sql start my.sql -
Return to the OS exit -
Language Name PL/SQL - Procedural Language Transact-SQL
Read product documentation - For Books online: Within SQLW, Help toolbar -> Building SQL Server Applications -> Transact-SQL Reference -> System Stored Procedures (T-SQL)
View properties about an object select * from all_views sp_help 'object_name'
or
sp_helptext 'object_name'
for unencrypted comments in the syscomments system catalog table.
Text source
of all stored objects belonging to the user user_source -
Tool to trap activity
between client app and SQL to a flat file - SQL Server Profiler
Comment operator -- double hyphen or between /* and */ same
To execute a procedure in the buffer start
or
@ (at sign) ?
To execute SQL script in the buffer . dot
run;
or
/ (slash) GO
or
F5 key or Ctrl-E
To concatenate and print DBMS_OUTPUT.PUT_LINE ( 'hello' || v_name ); -
Ending a block period ends each PL/SQL block ; semicolon ends each SQL block.


Every time an Oracle user invokes SQL*Plus, two scripts are also automatically executed:

the Site Profile glogin.sql for all users defines column formats. By default, environment variable SQLPATH points to this file at $ORACLE_HOME/sqlplus/admin
the login.sql for the user.




Commands
Oracle classifies and separates SQL commands into two distinct groups:


DML (Data Manipulation Language) commands change data in tables under transaction control (can be undone).
DDL (Data Definition Language) commands change the metadata of tables containing data. These commands, such as TRUNCATE, commit implicitly and cannot be undone. PL/SQL does not support ANSI-SQL's data definition commands directly.


Oracle SQL commands:
ALTER
ANALYZE
AUDIT
CONNECT
CREATE
DELETE
DROP
GRANT
INSERT
LOCK
NOAUDIT
RENAME
REVOKE
SELECT
SET ROLE
SET TRANSACTION
TRUNCATE
UPDATE



To define a bind variable in Oracle's SQL*Plus:

PROMPT
ACCEPT
VARIABLE x NUMBER
To define an Oracle PL/SQL procedural block using an explicit cursor:


DECLARE --optional PL/SQL user variables, cursors, local subprograms, exceptions:
    v1 NUMBER(3);
    v_empno employee.empno%TYPE;
    v_sal employee.sal%TYPE; -- receives the SELECT ... INTO result below
    CURSOR emptyp_cur IS
        SELECT emptyp.type_desc
        FROM employee_type emptyp
        WHERE type_code = :emp.type_code;
BEGIN --mandatory data manipulation statement(s):
    v1 := 3;
    DBMS_OUTPUT.PUT_LINE('v1= ' || v1);
    IF NOT emptyp_cur%ISOPEN THEN
        OPEN emptyp_cur;
    END IF;
    FETCH emptyp_cur INTO :emp.type_desc;
    CLOSE emptyp_cur;
    INSERT INTO employee VALUES (6, 'TOM LEE', 10000);
    UPDATE employee SET sal = sal + 5000 WHERE empno = 6;
    SELECT sal INTO v_sal FROM employee WHERE empno = 6;
    COMMIT;
    EXECUTE procedure

EXCEPTION --optional error handling code
    WHEN exception
    THEN null; -- passes control to the next statement.
END; --mandatory
.
To print a bind variable in SQL*Plus:


PRINT :x; --Note the colon to designate an Oracle bind variable
Clauses
Column
clause Table
clause
SELECT * FROM SchemaOwner.TableName tuple ;




Loading Data Into Databases
$99 DBLoad shareware provides a GUI to load data from text delimited files and to transfer data among databases of different vendors (MySQL to Oracle, etc.). The package can be set up to transfer on a set schedule.


# Manually load a text delimited file named "ImportFile.csv" on a PC into MySQL table1:
LOAD DATA LOCAL INFILE "./ImportFile.csv"
INTO TABLE table1
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY """"
LINES TERMINATED BY "\r\n"
(field1, field2, field3);


The order of fields in the flat file should match fields in the database.
If your file is delimited by tabs, use instead: FIELDS TERMINATED BY "\t"

This assumes that there is a valid .my.cnf file with entries such as:


[client]
user = DBUSERNAME
password = DBPASSWORD
host = DBSERVER
[mysql]
database = DBNAME


This also assumes that permissions have been changed on the file so that only its owner can read and write it, with a shell command such as:


chmod 600 ~/.my.cnf

The Scriptella ETL (Extract-Transform-Load) tool is an Apache-licensed open-source Java program that uses XML syntax to load CSV data into databases.





Formatting and Transforms
The lack of a value is called "NULL": the state of a column before it is populated with an actual value. Comparing a value to the reserved word NULL returns NULL rather than TRUE, so rows holding nulls are not matched by ordinary comparisons. To transform nulls into a recognizable value, use a special built-in (datatype-sensitive) function:


Oracle SQL                          Microsoft SQL
NVL( column_name, value_if_null )   ISNULL( column_name, value_if_null )
NVL( num_field, 0 )                 ISNULL( price, $0.00 )
NVL( txt_field, 'NULL' )            ISNULL( SSN, 'NNN-NN-NNNN' )

To convert a small number of fixed values in Oracle:


decode( weekday, 1,'Sun', 2,'Mon', 3,'Tue', 4,'Wed', 5,'Thu', 6,'Fri', 7,'Sat', '???')
To calculate numbers or display text not from any table, use the special dummy table owned by Oracle user sys:


select 1+1 FROM dual



Variables (temporary holding areas)
SQL*Plus prompts for a value for each variable preceded with an ampersand, in a statement ending with a semicolon, such as WHERE empid = &empid;

To override Oracle's default prompt text, precede the SELECT statement with

ACCEPT var PROMPT 'Enter ...'
SQL*Plus preserves values for variables preceded with a double ampersand.

Type a slash and press Enter to rerun statements. SQL*Plus stores and reuses the most recently executed SQL statement in file afiedt.buf.




Data Definitions
Action Oracle SQL Microsoft SQL
Datatypes and Value Assignment scalar char, number, long, date, and varchar2 as in
price := qty*(cost*2.5);
plus
v_valid_order boolean not null := true;
and
tables -
Set date format just for current session alter session set nls_date_format = 'YYYY-MON-DD-HH24:MI:ss' SET DATEFORMAT
Create Custom Datatype - EXEC sp_addtype @typename=typeSSN, @phystype='CHAR(11)', @nulltype='NOT NULL'
Create a Table with a Check Constraint
using a custom data type typeSSN - CREATE TABLE SSNTable
( EmployeeID INT PRIMARY KEY,
EmployeeSSN typeSSN,
CONSTRAINT SSNCheck CHECK
( EmployeeSSN LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]' )
)




Data Integrity Constraints
The preferred method for maintaining data integrity is to add constraints associated with tables. Rules and default objects are, as my tween daughter says, “so last season”!
Type of Constraint Oracle MS-SQL
list constraints select constraint_name from user_constraints; ?
Domain integrity
by limiting acceptable values for a column entry. CHECK
PRIMARY KEY
A column (or a composite index of up to 16 columns) used to uniquely identify a row in a table. A table can only have one. CONSTRAINT tab1_pk PRIMARY KEY (col1, col2)
If this is not specified, the MS-SQL default is to create a clustered index. Tables created without a clustered index are called heaps.

UNIQUE Entity integrity
A non-primary key column is unique among all rows. By default, a nonclustered index is created for each such column. Each combination of columns that could uniquely identify rows in a table is called a candidate key.
NOT NULL Special processing for null values degrades performance.
FOREIGN KEY
A foreign key in one table points to a primary key in another table. ALTER TABLE table1
ADD col_a INT NULL
CONSTRAINT tab1_fk1
REFERENCES table2(tab1_col_pk)
IDENTITY property
(starting IdentitySeed, IdentityIncrement) ALTER TABLE table1
ADD Identity_column INT IDENTITY(1,1)
GO
Whether a table has an identity column can be determined using the OBJECTPROPERTY function. This can be selected using the IDENTITYCOL keyword.




Default Objects
Constraints replace default and rule objects (bound with sp_bindefault and sp_bindrule), which SQL7 now recognizes only for backward compatibility.

USE table1
GO
CREATE DEFAULT default_par AS 72
GO
sp_bindefault default_par, 'scores.par'
By default, table names specified in the FROM clause have an implicit schema associated with the user's login. A schema is a logical grouping of database objects based on the user/owner. Tables in another schema can be specified if the other schema is prepended to the table:


FROM other_schema.other_table
Oracle supplies the built-in package UTL_FILE, which is used to read or write operating system text files from within PL/SQL.




Indexes
Action Oracle SQL Microsoft SQL
Create Index - Syntax:
CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED]
INDEX index_name ON table (column [,...n])
[WITH [PAD_INDEX]
[[,] FILLFACTOR = fillfactor]
[[,] IGNORE_DUP_KEY]
[[,] DROP_EXISTING]
[[,] STATISTICS_NONRECOMPUTE]
[ON filegroup]

MS-SQL7 stores as many data rows as can fit in 8KB pages.
To support sequential retrieval of data, each data page contains a header record containing a row locator which points to the previous and next data page.
To arrange for fast sequential access, data in a table may be arranged physically to a key (such as a sequentially assigned employee number or the date/timestamp of entry). Such a table is usually physically pre-sorted according to its clustered index key. This takes 120% more space than the space of data alone. If a new row is inserted in the middle, data rows are physically moved to make room for it. So it's not a good idea to cluster index a table by a column which is not sequentially input. This allows queries to be answered without the time-consuming task of sorting.
In MS-SQL7, key values from several fields (such as last name, first name, middle initial, etc.) are stored in composite index spanning up to 16 columns.
To speed up the random retrieval of data pages, clustered indexes are created and stored on index pages apart from the data pages so that sorting is performed on pointers to the data instead of the entire data pages.
Clustered indexes are organized using a technique called B-trees, which are depicted visually as a triangular upside-down tree (like a pyramid).
Pages containing data are called leaf nodes at the bottom layer.
The top node of the B-tree pyramid is the sysindexes.root node.
A lookup traverses the intermediate nodes in the middle.
To avoid the need to shuffle index pages, index pages can be built only partly full. FILLFACTOR=100 fills every index leaf page completely, which limits the number of pages read to resolve queries. This reduces space usage at the expense of time spent on page splits during inserts and updates.
The PAD_INDEX option applies the fill factor to the intermediate pages as well.
Index fields should be created as 'NOT NULL'.
When IGNORE_DUP_KEY is specified, SQL Server rejects the individual duplicate rows but does not fail the entire request.
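
The space side of the FILLFACTOR trade-off is simple arithmetic. The sketch below is a deliberately simplified model, not SQL Server's actual page layout: it ignores page headers and per-row overhead, and the function name and the figure of 100 rows per full page are assumptions chosen only to make the numbers easy to follow.

```python
import math


def leaf_pages_needed(row_count, rows_per_full_page, fillfactor):
    """Estimate how many leaf pages an index build uses at a given fill factor.

    Simplified model: a page filled to `fillfactor` percent holds that
    percentage of the rows a completely full page would hold.
    """
    if not 1 <= fillfactor <= 100:
        raise ValueError("fillfactor must be between 1 and 100")
    rows_per_page = max(1, (rows_per_full_page * fillfactor) // 100)
    return math.ceil(row_count / rows_per_page)


if __name__ == "__main__":
    for ff in (100, 80, 50):
        print(ff, leaf_pages_needed(10000, 100, ff))
```

Under this model a lower fill factor costs pages (and therefore reads) up front, in exchange for free room on each page that can absorb later inserts before a page split is needed.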



SQL Indexing Architecture
Microsoft's Index Tuning Wizard uses a workload file generated by the SQL Server Profiler during a trace run. Its results are presented in an Index Usage Report, which documents the percentage of queries resolved by the recommended indexes.



Index Work
Action | Oracle | MS-SQL
Create statistics on every column supporting statistics in the current database | | sp_createstats
Create statistics data from the table. | | CREATE STATISTICS
When were statistics last updated? | | STATS_DATE
Determines whether a particular statistic will be automatically updated. | | sp_autostats
The density of an existing statistic set. | | DBCC SHOW_STATISTICS
Update statistics data from the table. | | UPDATE STATISTICS
Remove statistics data from the table. | | DROP STATISTICS

Thursday, July 19, 2007

Coding with EJB

Courtesy of codeczar

Design Pattern
To simplify the process of interacting with Entity Beans, a standard pattern will be applied to every entity bean.

Class | Purpose
Entity Bean | The abstract entity bean class. Contains no logic, only concerned with persisting state.
Session Bean | Responsible for performing business logic. Relies upon entity beans to persist any data.
Entity Facade | EJBObject for a Stateless Session Bean providing methods to create, find, update and remove the Entity Bean.
Entity Facade Util | Utility to look up the Entity Facade home interface.
Entity Value | Value Object (synonym Transfer Object) encapsulating data contained within the Entity Bean.
Session Bean Remote | Remote Interface to Session Bean
Session Bean Local | Local Interface to Session Bean
Session Bean Util | Utility to access the session bean's home and local home interfaces.

The EntityBean and SessionBean are provided by the Bean Provider. The Facade, FacadeUtil, EntityValue, Session, SessionLocal and SessionUtil classes are used by the Application Developer. Many other support classes are required by the EJB framework; these are not listed here, as they are not used by either the Bean Provider or the Application Developer.

Naming Conventions
Specific Class naming conventions are given against each ejb class type.

Package naming should follow this pattern:

com.codeczar.[application].[entitygroup].ejb
e.g.
com.codeczar.ri.artist.ejb

Formatting Conventions
Documentation Conventions
Structural Conventions
All codeczar projects adopt the Maven project structure for ejb sources:

|
|-src all sources
| |-java java (bean) sources
|
|-target
| |-xdoclet all generated xdoclet files
| | |-ejb ejb-jar descriptor files
| | |-ejbdoclet generated java sources - interfaces, facades, utils etc.
| |-classes compiled java classes
|

Entity Bean
The abstract entity bean class. Contains no logic, only concerned with persisting state.
Naming Convention: FooBarEntityBean
Example: ArtistEntityBean
Written By: Bean Developer
Used By: Entity Facade Classes, Session Beans
Entity Beans are only to be used locally; as such, no Remote or Remote Home interfaces are generated.

Session Bean
Responsible for performing business logic. Relies upon other Session Beans for logic and Entity Facades for persistence.
Naming Convention: FooBarSessionBean
Example: ArtistSessionBean
Written By: Bean Developer
Used By: Entity Facade Classes, Session Beans


Entity Facade
EJBObject for a Stateless Session Bean providing methods to create, find, update and remove the Entity Bean.
Naming Convention: FooBarEntityFacade
Example: ArtistEntityFacade
Written By: xdoclet generated
Used By: Session Beans, Application Code


Entity Facade Util
Utility to look up the Entity Facade home interface.
Naming Convention: FooBarEntityFacadeUtil
Example: ArtistEntityFacadeUtil
Written By: xdoclet generated
Used By: Session Beans, Application Code


Entity Value
Value Object (synonym Transfer Object) encapsulating data contained within an Entity Bean.
Naming Convention: FooBarEntityValue
Example: ArtistEntityValue
Written By: xdoclet generated
Used By: Session Beans, Application Code
All communication in and out of the EJB layer should use Entity Value Objects, or the individual fields needed to locate or operate on an entity.

Session Bean Remote
Remote Interface to Session Bean.
Naming Convention: FooBarSession
Example: ArtistSession
Written By: xdoclet generated
Used By: Session Beans, Application Code
All external communication with the EJB layer should be through stateless session bean remote methods.

Session Bean Local
Local Interface to Session Bean.
Naming Convention: FooBarSessionLocal
Example: ArtistSessionLocal
Written By: xdoclet generated
Used By: Session Beans
Whenever a session bean calls another session bean, it should be through this local interface.

Session Bean Util
Utility to look up the home and local home instances for a Session Bean.
Naming Convention: FooBarSessionUtil
Example: ArtistSessionUtil
Written By: xdoclet generated
Used By: Session Beans, Application Code


Finder Methods
Finder Methods for a specific Entity Bean are made available via the related Entity Facade.

Single Object finder methods should return null if ObjectNotFoundException is thrown. Single Object finder methods are likely to be called after some previous mechanism has retrieved the identities, so a missing object shouldn't really happen. Returning null, which could result in a null pointer exception in the caller, is therefore probably the correct thing to do.

Multi Object finder methods should never throw ObjectNotFoundException but should rather return an empty collection.

FinderExceptions (other than ObjectNotFoundException) indicate that something went wrong, and so should be wrapped in a runtime exception and rethrown.
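
The three finder rules can be summarised in a short, language-neutral sketch. It is written in Python rather than Java only to keep it self-contained; every name in it (ArtistFacade, FinderError, ObjectNotFoundError, find_by_id, find_by_name_prefix) is hypothetical and is not part of the xdoclet-generated facade. The in-memory dictionary stands in for the entity bean's storage.

```python
class FinderError(Exception):
    """Stand-in for a generic finder failure."""


class ObjectNotFoundError(FinderError):
    """Stand-in for 'no single object matched'."""


class ArtistFacade:
    """Hypothetical facade over an in-memory id -> name mapping."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def _load(self, artist_id):
        # Simulated low-level finder: may raise either error type.
        if not isinstance(artist_id, int):
            raise FinderError("malformed key: %r" % (artist_id,))
        if artist_id not in self._rows:
            raise ObjectNotFoundError(artist_id)
        return self._rows[artist_id]

    def find_by_id(self, artist_id):
        """Single-object finder: None when nothing matches."""
        try:
            return self._load(artist_id)
        except ObjectNotFoundError:
            return None
        except FinderError as exc:
            # Anything else means something went wrong: wrap and rethrow.
            raise RuntimeError("finder failed") from exc

    def find_by_name_prefix(self, prefix):
        """Multi-object finder: an empty list when nothing matches."""
        return [name for name in self._rows.values() if name.startswith(prefix)]


if __name__ == "__main__":
    facade = ArtistFacade({1: "Miles Davis"})
    print(facade.find_by_id(1), facade.find_by_id(2), facade.find_by_name_prefix("Z"))
```

Note that the more specific "not found" case has to be caught before the general finder error, since the former is a subtype of the latter.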

Foreign Key Attributes
It is not necessary to have an attribute for foreign key columns when a CMR relationship exists. The container will maintain the DB fields. The field, if needed, is available via the CMR relationship. This prevents redundant code.

Primary Keys
Every entity will have the same primary key field, 'id', of type java.lang.Integer. This is defined by the entity tags.

Class level tags
/**
 * @ejb.bean
 *   primkey-field="id"
 * @ejb.pk
 *   class="java.lang.Integer"
 *   generate="false"
 */

Method level tags
/**
 * @ejb:pk-field
 * @ejb:persistence
 * @jboss:persistence
 *   auto-increment="true"
 */
public abstract Integer getId();
Having a standard primary key mechanism makes life easier when it comes to creating foreign keys and auto-incrementing indexes. If there are existing unique identifiers, add these and associated finder methods.

Tuesday, July 17, 2007

On-Access ("resident") virus scanning with clamav + dazuko

Copyright of the Ubuntu Forums

I managed to bring ClamAV's on-access virus scanning feature to life (so it detects virus-infected files at once when they are accessed or opened). Here's how:

NOTE: viruses don't really threaten Linux boxes. Virus scanners on Linux actually scan for W32 and MS Office viruses - this can be handy if you multi-boot Ubuntu with Windows or don't want to infect your friends. The HOWTO may seem a bit complicated at first - but it really is not (well, a bit...). But I bet you'll enjoy it if you accomplish it!

1. Install clamav and clamav-daemon with apt:

Code:
sudo apt-get install clamav clamav-daemon

2. Install dazuko. NOTE: the Ubuntu/Debian package is quite old, so use the latest one from dazuko.org, like this:

Code:
wget http://www.dazuko.org/files/dazuko-source_2.0.6-1_all.deb
sudo dpkg -i dazuko-source_2.0.6-1_all.deb

3. Configuration of Dazuko is much easier with module-assistant, this way:

Code:
sudo apt-get install module-assistant
sudo m-a a-i dazuko

4. After you have the Dazuko kernel module, you can't modprobe it yet, because the capability kernel module conflicts with it. The solution is to load dazuko BEFORE capability (this way both will work). You can achieve it this way:

Code:
sudo gedit /etc/modules

and add a line containing dazuko to this file (I put it before ide-cd, but I don't think it matters).

5. Reboot the computer! You can test if dazuko loaded properly by issuing the "lsmod | grep dazuko" command.

6. Now the problem part: unfortunately, Ubuntu's official clamav package was built without clamuko support (clamuko is clamav's interface for interacting with dazuko - tricky, eh?). No problem, the result will be worth the effort - believe me. We will rebuild Ubuntu's clamav to suit our needs:

Code:
sudo apt-get install fakeroot build-essential
sudo apt-get build-dep clamav
sudo apt-get source clamav

7. These commands will set up the required build environment and source files for the package rebuild. You should have a clamav-0.83/ subdir by now, so:

Code:
cd clamav-0.83/
gedit debian/rules

and replace

Code:
./configure --host=$(DEB_HOST_GNU_TYPE) --build=$(DEB_BUILD_GNU_TYPE) --prefix=/usr --mandir=\$${prefix}/share/man --infodir=\$${prefix}/share/info --disable-clamav --with-dbdir=/var/lib/clamav/ --sysconfdir=/etc/clamav --enable-milter --with-tcpwrappers --disable-clamuko --with-gnu-ld --with-libcurl --with-dns

with

Code:
./configure --host=$(DEB_HOST_GNU_TYPE) --build=$(DEB_BUILD_GNU_TYPE) --prefix=/usr --mandir=\$${prefix}/share/man --infodir=\$${prefix}/share/info --disable-clamav --with-dbdir=/var/lib/clamav/ --sysconfdir=/etc/clamav --enable-milter --with-tcpwrappers --with-gnu-ld --with-libcurl --with-dns

(Don't let these scare you: the only difference between them is that we removed "--disable-clamuko" from the ./configure line.)
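
If you would rather not edit the long ./configure line by hand, the same change can be expressed as a tiny text transformation. This is only a sketch of the idea in Python: the function name is made up, and it works on a single line of text (without its trailing newline) that you pass in. It does not read or write debian/rules for you.

```python
def drop_configure_flag(line, flag="--disable-clamuko"):
    """Return `line` with every standalone occurrence of `flag` removed.

    Only whole, space-separated arguments are matched, so a similar flag
    such as --disable-clamav is left untouched.
    """
    return " ".join(token for token in line.split(" ") if token != flag)


if __name__ == "__main__":
    before = "./configure --with-tcpwrappers --disable-clamuko --with-gnu-ld"
    print(drop_configure_flag(before))
```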

8. Now rebuild the clamav packages with this command:

Code:
dpkg-buildpackage -rfakeroot -uc -us

9. You'll find the new packages one level up in the directory hierarchy, so do a

Code:
cd ..
sudo dpkg -i *.deb

10. So now we have the clamuko-enabled packages installed. The last step is to configure clamav-daemon by editing /etc/clamav/clamd.conf:

Code:
sudo gedit /etc/clamav/clamd.conf

Insert the following:

Code:
ClamukoScanOnAccess
ClamukoScanOnOpen
ClamukoScanArchive
ClamukoIncludePath /home

Also replace "User clamav" with "User root" to let things work (* this will need a proper fix - your ideas are welcome *)

11. Restart clamav-daemon to make changes take effect:

Code:
sudo invoke-rc.d clamav-daemon restart

12. You'd like to see how it's working?

You can grab the "eicar" test virus (no malicious code, just for testing) from here:
http://www.eicar.org/anti_virus_test_file.htm

You can also try it with real-life (!) viruses from here:
http://vx.netlux.org/vl.php

You won't be able to open virus-infected files/archives - clamav+dazuko immediately blocks access to them (deletion is possible though). Also, "tail /var/log/clamav/clamav.log" tells you about the virus. Try man clamd.conf for more options. I suggest looking into the "VirusEvent" option. The possibilities from here are limitless.

NOTE: Ubuntu's official clamav package and the dazuko-source package (I bet it is not in main/restricted) are outdated. Backports has a more current clamav package, but unfortunately both repos have the "clamuko" feature disabled, and that's why on-access scanning doesn't work out of the box. I hope this will change with time. I hope you liked my howto and clamav/dazuko. Clamav is, by the way, an excellent virus scanner and is updated multiple times a day.

Sunday, July 15, 2007

RESIDENT VIRUS ONE

──────────
INTERRUPTS
──────────
DOS kindly provides us with a powerful method of enhancing itself, namely
memory resident programs. Memory resident programs allow for the extension
and alteration of the normal functioning of DOS. To understand how memory
resident programs work, it is necessary to delve into the intricacies of
the interrupt table. The interrupt table is located from memory location
0000:0000 to 0000:0400h (or 0040:0000), just below the BIOS information
area. It consists of 256 double words, each representing a segment:offset
pair. When an interrupt call is issued via an INT instruction, two things
occur, in this order:

1) The flags are pushed onto the stack.
2) A far call is issued to the segment:offset located in the interrupt
table.

To return from an interrupt, an iret instruction is used. The iret
instruction reverses the order of the int call. It performs a retf
followed by a popf. This call/return procedure has an interesting
side effect when considering interrupt handlers which return values in the
flags register. Such handlers must directly manipulate the flags register
saved in the stack rather than simply directly manipulating the register.

The processor searches the interrupt table for the location to call. For
example, when an interrupt 21h is called, the processor searches the
interrupt table to find the address of the interrupt 21h handler. The
segment of this pointer is 0000h and the offset is 21h*4, or 84h. In other
words, the interrupt table is simply a consecutive chain of 256 pointers to
interrupts, ranging from interrupt 0 to interrupt 255. To find a specific
interrupt handler, load in a double word segment:offset pair from segment
0, offset (interrupt number)*4. The interrupt table is stored in standard
Intel reverse double word format, i.e. the offset is stored first, followed
by the segment.
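
The address arithmetic in the paragraph above can be checked with a few lines of Python. This is only an illustrative sketch of the table layout; the function names are made up here, and the sample bytes are an arbitrary example rather than a real vector.

```python
import struct


def ivt_entry_offset(int_no):
    """Offset within segment 0000h of the table entry for interrupt int_no."""
    if not 0 <= int_no <= 0xFF:
        raise ValueError("interrupt number must be 0..255")
    return int_no * 4  # each entry is one double word


def decode_vector(raw):
    """Decode a 4-byte table entry into (segment, offset).

    The entry is stored offset first, then segment, each as a
    little-endian 16-bit word.
    """
    offset, segment = struct.unpack("<HH", raw)
    return segment, offset


def linear_address(segment, offset):
    """Real-mode segment:offset to a 20-bit linear address."""
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(hex(ivt_entry_offset(0x21)))
    seg, off = decode_vector(bytes([0x34, 0x12, 0x00, 0xF0]))
    print(hex(seg), hex(off), hex(linear_address(seg, off)))
```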

For a program to "capture" an interrupt, that is, redirect the interrupt,
it must change the data in the interrupt table. This can be accomplished
either by direct manipulation of the table or by a call to the appropriate
DOS function. If the program manipulates the table directly, it should put
this code between a CLI/STI pair, as issuing an interrupt by the processor
while the table is half-altered could have dire consequences. Generally,
direct manipulation is the preferable alternative, since some primitive
programs such as FluShot+ trap the interrupt 21h call used to set the
interrupt and will warn the user if any "unauthorised" programs try to
change the handler.

An interrupt handler is a piece of code which is executed when an interrupt
is requested. The interrupt may either be requested by a program or may be
requested by the processor. Interrupt 21h is an example of the former,
while interrupt 8h is an example of the latter. The system BIOS supplies a
portion of the interrupt handlers, with DOS and other programs supplying
the rest. Generally, BIOS interrupts range from 0h to 1Fh, DOS interrupts
range from 20h to 2Fh, and the rest is available for use by programs.

When a program wishes to install its own code, it must consider several
factors. First of all, is it supplanting or overlaying existing code, that
is to say, is there already an interrupt handler present? Secondly, does
the program wish to preserve the functioning of the old interrupt handler?
For example, a program which "hooks" into the BIOS clock tick interrupt
would definitely wish to preserve the old interrupt handler. Ignoring the
presence of the old interrupt handler could lead to disastrous results,
especially if previously-loaded resident programs captured the interrupt.

A technique used in many interrupt handlers is called "chaining." With
chaining, both the new and the old interrupt handlers are executed. There
are two primary methods for chaining: preexecution and postexecution. With
preexecution chaining, the old interrupt handler is called before the new
one. This is accomplished via a pseudo-INT call consisting of a pushf
followed by a call far ptr. The new interrupt handler is passed control
when the old one terminates. Preexecution chaining is used when the new
interrupt handler wishes to use the results of the old interrupt handler in
deciding the appropriate action to take. Postexecution chaining is more
straightforward, simply consisting of a jmp far ptr instruction. This
method doesn't even require an iret instruction to be located in the new
interrupt handler! When the jmp is executed, the new interrupt handler has
completed its actions and control is passed to the old interrupt handler.
This method is used primarily when a program wishes to intercept the
interrupt call before DOS or BIOS gets a chance to process it.

────────────────────────────────────────
AN INTRODUCTION TO DOS MEMORY ALLOCATION
────────────────────────────────────────
Memory allocation is perhaps one of the most difficult concepts, certainly
the hardest to implement, in DOS. The problem lies in the lack of official
documentation by both Microsoft and IBM. Unfortunately, knowledge of the
DOS memory manager is crucial in writing memory-resident virii.

When a program asks DOS for more memory, the operating system carves out a
chunk of memory from the pool of unallocated memory. Although this concept
is simple enough to understand, it is necessary to delve deeper in order to
have sufficient knowledge to write effective memory-resident virii. DOS
creates memory control blocks (MCBs) to help itself keep track of these
chunks of memory. MCBs are paragraph-sized areas of memory which are each
devoted to keeping track of one particular area of allocated memory. When
a program requests memory, one paragraph for the MCB is allocated in
addition to the memory requested by the program. The MCB lies just in
front of the memory it controls. Visually, a MCB and its memory looks
like:

┌───────┬─────────────────────────────────────┐
│ MCB 1 │ Chunk o' memory controlled by MCB 1 │
└───────┴─────────────────────────────────────┘

When a second section of memory is requested, another MCB is created just
above the memory last allocated. Visually:

┌───────┬─────────┬───────┬─────────┐
│ MCB 1 │ Chunk 1 │ MCB 2 │ Chunk 2 │
└───────┴─────────┴───────┴─────────┘

In other words, the MCBs are "stacked" one on top of the other. It is
wasteful to deallocate MCB 1 before MCB 2, as holes in memory develop. The
structure for the MCB is as follows:

Offset  Size     Meaning
──────  ───────  ───────
0       BYTE     'M' or 'Z'
1       WORD     Process ID (PSP of block's owner)
3       WORD     Size in paragraphs
5       3 BYTES  Reserved (Unused)
8       8 BYTES  DOS 4+ uses this. Yay.

If the byte at offset 0 is 'M', then the MCB is not the end of the chain.
The 'Z' denotes the end of the MCB chain. There can be more than one MCB
chain present in memory at once and this "feature" is used by virii to go
resident in high memory. The word at offset 1 is normally equal to the PSP
of the MCB's owner. If it is 0, it means that the block is free and is
available for use by programs. A value of 0008h in this field denotes DOS
as the owner of the block. The value at offset 3 does NOT include the
paragraph allocated for the MCB. It reflects the value passed to the DOS
allocation functions. All fields located after the block size are pretty
useless so you might as well ignore them.

When a COM file is loaded, all available memory is allocated to it by DOS.
When an EXE file is loaded, the amount of memory specified in the EXE
header is allocated. There is both a minimum and maximum value in the
header. Usually, the linker will set the maximum value to FFFFh
paragraphs. If the program wishes to allocate memory, it must first shrink
the main chunk of memory owned by the program to the minimum required.
Otherwise, the pathetic attempt at memory allocation will fail miserably.

Since programs normally are not supposed to manipulate MCBs directly, the
DOS memory manager calls (48h - 4Ah) all return and accept values of the
first program-usable memory paragraph, that is, the paragraph of memory
immediately after the MCB. It is important to keep this in mind when
writing MCB-manipulating code.

─────────────────────────
METHODS OF GOING RESIDENT
─────────────────────────
There are a variety of memory resident strategies. The first is the use of
the traditional DOS interrupt TSR routines, either INT 27h or INT
21h/Function 31h. These routines are undesirable when writing virii,
because they do not return control back to the program after execution.
Additionally, they show up on "memory walkers" such as PMAP and MAPMEM.
Even a doorknob can spot such a blatant viral presence.

The traditional viral alternative to using the standard DOS interrupt is,
of course, writing a new residency routine. Almost every modern virus uses
a routine to "load high," that is, to load itself into the highest possible
memory location. For example, in a 640K system, the virus would load
itself just under the 640K but above the area reserved by DOS for program
use. Although this is technically not the high memory area, it shall be
referred to as such in the remainder of this file in order to add confusion
and general chaos into this otherwise well-behaved file. Loading high can
be easily accomplished through a series of interrupt calls for reallocation
and allocation. The general method is:

1. Find the memory size
2. Shrink the program's memory to the total memory size - virus size
3. Allocate memory for the virus (this will be in the high memory area)
4. Change the program's MCB to the end of the chain (Mark it with 'Z')
5. Copy the virus to high memory
6. Save the old interrupt vectors if the virus wishes to chain vectors
7. Set the interrupt vectors to the appropriate locations in high memory

When calculating memory sizes, remember that all sizes are in paragraphs.
The MCB must also be considered, as it takes up one paragraph of memory.
The advantage of this method is that it does not, as a rule, show up on
memory walkers. However, the total system memory as shown by such programs
as CHKDSK will decrease.

A third alternative is no allocation at all. Some virii copy themselves to
the memory just under 640K, but fail to allocate the memory. This can have
disastrous consequences, as any program loaded by DOS can possibly use this
memory. If it is corrupted, unpredictable results can occur. Although no
memory loss is shown by CHKDSK, the possible chaos resulting from this
method is clearly unacceptable. Some virii use memory known to be free.
For example, the top of the interrupt table or parts of video memory all
may be used with some assurance that the memory will not be corrupted.
Once again, this technique is undesirable as it is extremely unstable.

These techniques are by no means the only methods of residency. I have
seen such bizarre methods as going resident in the DOS internal disk
buffers. Where there's memory, there's a way.

It is often desirable to know if the virus is already resident. The
simplest method of doing this is to write a checking function in the
interrupt handler code. For example, a call to interrupt 21h with the ax
register set to 7823h might return a 4323h value in ax, signifying
residency. When using this check, it is important to ensure that no
possible conflicts with either other programs or DOS itself will occur.
Another method, albeit a costly process in terms of both time and code
length, is to check each segment in memory for the code indicating the
presence of the virus. This method is, of course, undesirable, since it is
far, far simpler to code a simple check via the interrupt handler. By
using any type of check, the virus need not fear going resident twice,
which would simply be a waste of memory.

─────────────
WHY RESIDENT?
─────────────
Memory resident virii have several distinct advantages over runtime virii.
o Size
Memory resident virii are often smaller than their runtime brethren as
they do not need to include code to search for files to infect.
o Effectiveness
They are often more virulent, since even the DIR command can be
"infected." Generally, the standard technique is to infect each file
that is executed while the virus is resident.
o Speed
Runtime virii infect before a file is executed. A poorly written or
large runtime virus will cause a noticeable delay before execution
easily spotted by users. Additionally, it causes inordinate disk
activity which is detrimental to the lifespan of the virus.
o Stealth
The manipulation of interrupts allows for the implementation of
stealth techniques, such as the hiding of changes in file lengths in
directory listings and on-the-fly disinfection. Thus it is harder for
the average user to detect the virus. Additionally, the crafty virus
may even hide from CRC checks, thereby obliterating yet another anti-
virus detection technique.

───────────────────────────────
STRUCTURE OF THE RESIDENT VIRUS
───────────────────────────────
With the preliminary information out of the way, the discussion can now
shift to more virus-related, certainly more interesting topics. The
structure of the memory resident virus is radically different from that of
the runtime virus. It simply consists of a short stub used to determine if
the virus is already resident. If it is not already in memory, the stub
loads it into memory through whichever method. Finally, the stub restores
control to the host program. The rest of the code of the resident virus
consists of interrupt handlers where the bulk of the work is done.

The stub is the only portion of the virus which needs to have delta offset
calculations. The interrupt handler ideally will exist at a location which
will not require such mundane fixups. Once loaded, there should be no
further use of the delta offset, as the location of the variables is
preset. Since the resident virus code should originate at offset 0 of the
memory block, originate the source code at offset 0. Do not include a jmp
to the virus code in the original carrier file. When moving the virus to
memory, simply move starting from [bp+startvirus] and the offsets should
work out as they are in the source file. This simplifies (and shortens)
the coding of the interrupt handlers.

Several things must be considered in writing the interrupt handlers for a
virus. First, the virus must preserve the registers. If the virus uses
preexecution chaining, it must save the registers after the call to the
original handler. If the virus uses postexecution chaining, it must
restore the original registers of the interrupt call before the call to the
original handler. Second, it is more difficult, though not impossible, to
implement encryption with memory resident virii. The problem is that if
the interrupt handler is encrypted, that interrupt handler cannot be called
before the decryption function. This can be a major pain in the ass. The
cheesy way out is to simply not include encryption. I prefer the cheesy
way. The noncheesy readers out there might wish to have the memory
simultaneously hold two copies of the virus, encrypt the unused copy, and
use the encrypted copy as the write buffer. Of course, the virus would
then take twice the amount of memory it would normally require. The use of
encryption is a matter of personal choice and cheesiness. A sidebar to
preservation of interrupt handlers: As noted earlier, the flags register is
restored from the stack. It is important in preexecution chaining to save
the new flags register onto the stack where the old flags register was
stored.

Another important factor to consider when writing interrupt handlers,
especially those of BIOS interrupts, is DOS's lack of reentrance. This
means that DOS functions cannot be executed while DOS is in the midst of
processing an interrupt request. This is because DOS sets up the same
stack pointer each time it is called, and calling the second DOS interrupt
will cause the processing of one to overwrite the stack of the other,
causing unpredictable, but often terminal, results. This applies
regardless of which DOS interrupts are called, but it is especially true
for interrupt 21h, since it is often tempting to use it from within an
interrupt handler. Unless it is certain that DOS is not processing a
previous request, do NOT use a DOS function in the interrupt handler. It
is possible to use the "lower" interrupt 21h functions without fear of
corrupting the stack, but they are basically the useless ones, performing
functions easily handled by BIOS calls or direct hardware access. This
entire discussion only applies to hooking non-DOS interrupts. With hooking
DOS interrupts comes the assurance that DOS is not executing elsewhere,
since it would then be corrupting its own stack, which would be a most
unfortunate occurrence indeed.

The most common interrupt to hook is, naturally, interrupt 21h. Interrupt
21h is called by just about every DOS program. The usual strategy is for a
virus to find potential files to infect by intercepting certain DOS calls.
The primary functions to hook include the find first, find next, open, and
execute commands. By cleverly using pre and postexecution chaining, a
virus can easily find the file which was found, opened, or executed and
infect it. The trick is simply finding the appropriate method to isolate
the filename. Once that is done, the rest is essentially identical to the
runtime virus.

When calling interrupts hooked by the virus from the virus interrupt code,
make sure that the virus does not trap this particular call, lest an
infinite loop result. For example, if the execute function is trapped and
the virus wishes, for some reason, to execute a particular file using this
function, it should NOT use a simple "int 21h" to do the job. In cases
such as this where the problem is unavoidable, simply simulate the
interrupt call with a pushf/call combination.

The basic structure of the interrupt handler is quite simple. The handler
first screens the registers for either an identification call or for a
trapped function such as execute. If it is not one of the above, the
handler throws control back to the original interrupt handler. If it is an
identification request, the handler simply sets the appropriate registers
and returns to the calling program. Otherwise, the virus must decide if
the request calls for pre or postexecution chaining. Regardless of which
it uses, the virus must find the filename and use that information to
infect. The filename may be found either through the use of registers as
pointers or by searching through certain data structures, such as FCBs.
The infection routine is the same as that of nonresident virii, with the
exception of the guidelines outlined in the previous few paragraphs.

──────────────
WHAT'S TO COME
──────────────
I apologise for the somewhat cryptic sentences used in the guide, but I'm a
programmer, not a writer. My only suggestion is to read everything over
until it makes sense. I decided to pack this issue of the guide with
theory rather than code. In the next installment, I will present all the
code necessary to write a memory-resident virus, along with some techniques
which may be used. However, all the information needed to write a resident
virus has been included in this installment; it is merely a matter of
implementation. Have buckets o' fun!

Friday, July 13, 2007

Linux kernel coding style



Linux kernel coding style

This is a short document describing the preferred coding style for the
linux kernel. Coding style is very personal, and I won't _force_ my
views on anybody, but this is what goes for anything that I have to be
able to maintain, and I'd prefer it for most other things too. Please
at least consider the points made here.

First off, I'd suggest printing out a copy of the GNU coding standards,
and NOT read it. Burn them, it's a great symbolic gesture.

Anyway, here goes:


Chapter 1: Indentation

Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.

Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends. Especially when you've been looking
at your screen for 20 straight hours, you'll find it a lot easier to see
how the indentation works if you have large indentations.

Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on an
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.

In short, 8-char indents make things easier to read, and have the added
benefit of warning you when you're nesting your functions too deep.
Heed that warning.
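
As a rough illustration of that warning, here is a minimal sketch of one
way to get rid of a nesting level: the per-item test moves into a small
helper, so the loop in the caller never goes deeper than three levels.
The names "struct item", "item_is_wanted" and "count_wanted" are invented
for this example and do not come from the kernel.

```c
#include <stddef.h>

/* Hypothetical record type, invented for this example. */
struct item {
	int enabled;
	int value;
};

/* Helper: keeps the per-item test out of the loop body. */
static int item_is_wanted(const struct item *it, int min)
{
	if (!it->enabled)
		return 0;
	return it->value >= min;
}

/* Count the enabled items whose value is at least min. */
size_t count_wanted(const struct item *items, size_t n, int min)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (item_is_wanted(&items[i], min))
			count++;
	}
	return count;
}
```

Without the helper, the two tests would sit inside each other inside the
loop, and the increment would end up four tab stops deep.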


Chapter 2: Placing Braces

The other issue that always comes up in C styling is the placement of
braces. Unlike the indent size, there are few technical reasons to
choose one placement strategy over the other, but the preferred way, as
shown to us by the prophets Kernighan and Ritchie, is to put the opening
brace last on the line, and put the closing brace first, thusly:

	if (x is true) {
		we do y
	}

However, there is one special case, namely functions: they have the
opening brace at the beginning of the next line, thus:

	int function(int x)
	{
		body of function
	}

Heretic people all over the world have claimed that this inconsistency
is ... well ... inconsistent, but all right-thinking people know that
(a) K&R are _right_ and (b) K&R are right. Besides, functions are
special anyway (you can't nest them in C).

Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
i.e. a "while" in a do-statement or an "else" in an if-statement, like
this:

	do {
		body of do-loop
	} while (condition);

and

	if (x == y) {
		..
	} else if (x > y) {
		...
	} else {
		....
	}

Rationale: K&R.

Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability. Thus, as the
supply of new-lines on your screen is not a renewable resource (think
25-line terminal screens here), you have more empty lines to put
comments on.
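
The fragments above are pseudocode. For something a compiler will accept,
here is a small sketch that puts the three placements together: function
brace on its own line, "else" sharing a line with the closing brace, and
"while" closing a do-statement. The function "sum_digits" is invented for
this example.

```c
/*
 * Return the sum of the decimal digits of n, ignoring the sign.
 * Assumes n is greater than INT_MIN, so that -n does not overflow.
 */
int sum_digits(int n)
{
	int sum = 0;

	if (n < 0) {
		n = -n;
	} else if (n == 0) {
		return 0;
	}

	do {
		sum += n % 10;
		n /= 10;
	} while (n > 0);

	return sum;
}
```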


Chapter 3: Naming

C is a Spartan language, and so should your naming be. Unlike Modula-2
and Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter. A C programmer would call that
variable "tmp", which is much easier to write, and not the least more
difficult to understand.

HOWEVER, while mixed-case names are frowned upon, descriptive names for
global variables are a must. To call a global function "foo" is a
shooting offense.

GLOBAL variables (to be used only if you _really_ need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should _not_ call it "cntusr()".

Encoding the type of a function into the name (so-called Hungarian
notation) is brain damaged - the compiler knows the types anyway and can
check those, and it only confuses the programmer. No wonder MicroSoft
makes buggy programs.

LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood. Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.

If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See next chapter.
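
A minimal sketch of those naming rules, assuming a made-up "struct user"
with an "active" flag (only the name "count_active_users" comes from the
text above): the global function gets the descriptive name, and the locals
stay short.

```c
#include <stddef.h>

/* Hypothetical user record, invented for this example. */
struct user {
	int active;
};

/* Global function: descriptive name, no mixed case, no type prefix. */
size_t count_active_users(const struct user *users, size_t n)
{
	size_t i, count = 0;	/* short local names are enough here */

	for (i = 0; i < n; i++) {
		if (users[i].active)
			count++;
	}
	return count;
}
```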


Chapter 4: Functions

Functions should be short and sweet, and do just one thing. They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function. So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.

However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely. Use helper functions with
descriptive names (you can ask the compiler to in-line them if you think
it's performance-critical, and it will probably do a better job of it
than you would have done).

Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.
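
As a sketch of the helper-function advice (all three function names are
invented for this example), a tiny test is given a descriptive name and
marked "static inline", which asks the compiler to consider in-lining it.
Each function does one thing and needs at most two local variables.

```c
#include <stddef.h>

/* Small helper with a descriptive name; the compiler may in-line it. */
static inline int is_blank(char c)
{
	return c == ' ' || c == '\t';
}

/* Return a pointer to the first non-blank character of s. */
const char *skip_leading_blanks(const char *s)
{
	while (is_blank(*s))
		s++;
	return s;
}

/* Return the length of s without its trailing blanks. */
size_t trimmed_length(const char *s)
{
	size_t len = 0, i;

	for (i = 0; s[i] != '\0'; i++) {
		if (!is_blank(s[i]))
			len = i + 1;
	}
	return len;
}
```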


Chapter 5: Commenting

Comments are good, but there is also a danger of over-commenting. NEVER
try to explain HOW your code works in a comment: it's much better to
write the code so that the _working_ is obvious, and it's a waste of
time to explain badly written code.

Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 4 for a while. You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess. Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.
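
One way this could look in practice, as a sketch only: the comment sits at
the head of the function and says WHAT it returns and WHY the fallback
exists, and the body is plain enough to need no comments of its own. The
function "max_or_default" and its rationale are invented for this example.

```c
#include <stddef.h>

/*
 * Return the largest value in vals[0..n-1], or fallback when n is 0.
 *
 * Why the fallback: callers in this sketch may pass an empty set and
 * want a defined result rather than a separate error path.
 */
int max_or_default(const int *vals, size_t n, int fallback)
{
	size_t i;
	int max;

	if (n == 0)
		return fallback;

	max = vals[0];
	for (i = 1; i < n; i++) {
		if (vals[i] > max)
			max = vals[i];
	}
	return max;
}
```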


Chapter 6: You've made a mess of it

That's OK, we all do. You've probably been told by your long-time Unix
user helper that "GNU emacs" automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).

So, you can either get rid of GNU emacs, or change it to use saner
values. To do the latter, you can stick the following in your .emacs file:

(defun linux-c-mode ()
  "C mode with adjusted defaults for use with the Linux kernel."
  (interactive)
  (c-mode)
  (c-set-style "K&R")
  (setq c-basic-offset 8))

This will define the M-x linux-c-mode command. When hacking on a
module, if you put the string -*- linux-c -*- somewhere on the first
two lines, this mode will be automatically invoked. Also, you may want
to add

(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.[ch]$" . linux-c-mode)
                       auto-mode-alist))

to your .emacs file if you want to have linux-c-mode switched on
automagically when you edit source files under /usr/src/linux.

But even if you fail in getting emacs to do sane formatting, not
everything is lost: use "indent".

Now, again, GNU indent has the same brain dead settings that GNU emacs
has, which is why you need to give it a few command line options.
However, that's not too bad, because even the makers of GNU indent
recognize the authority of K&R (the GNU people aren't evil, they are
just severely misguided in this matter), so you just give indent the
options "-kr -i8" (stands for "K&R, 8 character indents").

"indent" has a lot of options, and especially when it comes to comment
re-formatting you may want to take a look at the manual page. But
remember: "indent" is not a fix for bad programming.

Wednesday, July 11, 2007

Coding a virus replicator

-=-=-=-=-=-=-=-
THE REPLICATOR
-=-=-=-=-=-=-=-
The job of the replicator is to spread the virus throughout the system of
the clod who has caught the virus. How does it do this without destroying
the file it infects? The easiest type of replicator infects COM files. It
first saves the first few bytes of the infected file. It then copies a
small portion of its code to the beginning of the file, and the rest to the
end.

+----------------+ +------------+
| P1 | P2 | | V1 | V2 |
+----------------+ +------------+
The uninfected file The virus code

In the diagram, P1 is part 1 of the file, P2 is part 2 of the file, and V1
and V2 are parts 1 and 2 of the virus. Note that the size of P1 should be
the same as the size of V1, but the size of P2 doesn't necessarily have to
be the same size as V2. The virus first saves P1 and copies it to the
either 1) the end of the file or 2) inside the code of the virus. Let's
assume it copies the code to the end of the file. The file now looks like:

+---------------------+
| P1 | P2 | P1 |
+---------------------+

Then, the virus copies the first part of itself to the beginning of the
file.

+---------------------+
| V1 | P2 | P1 |
+---------------------+

Finally, the virus copies the second part of itself to the end of the file.
The final, infected file looks like this:

+-----------------------------+
| V1 | P2 | P1 | V2 |
+-----------------------------+

The question is: What the fuck do V1 and V2 do? V1 transfers control of
the program to V2. The code to do this is simple.

JMP FAR PTR Duh ; Takes four bytes
Duh DW V2_Start ; Takes two bytes

Duh is a far pointer (Segment:Offset) pointing to the first instruction of
V2. Note that the value of Duh must be changed to reflect the length of
the file that is infected. For example, if the original size of the
program is 79 bytes, Duh must be changed so that the instruction at
CS:[155h] is executed. The value of Duh is obtained by adding the length
of V1, the original size of the infected file, and 256 (to account for the
PSP). In this case, V1 = 6 and P1 + P2 = 79, so 6 + 79 + 256 = 341 decimal
(155 hex).

An alternate, albeit more difficult to understand, method follows:

DB 1101001b ; Code for JMP (2 byte-displacement)
Duh DW V2_Start - OFFSET Duh ; 2 byte displacement

This inserts the jump offset directly into the code following the jump
instruction. You could also replace the second line with

DW V2_Start - $

which accomplishes the same task.

V2 contains the rest of the code, i.e. the stuff that does everything else.
The last part of V2 copies P1 over V1 (in memory, not on disk) and then
transfers control to the beginning of the file (in memory). The original
program will then run happily as if nothing happened. The code to do this
is also very simple.

MOV SI, V2_START ; V2_START is a LABEL marking where V2 starts
SUB SI, V1_LENGTH ; Go back to where P1 is stored
MOV DI, 0100h ; All COM files are loaded @ CS:[100h] in memory
MOV CX, V1_LENGTH ; Move CX bytes
REP MOVSB ; DS:[SI] -> ES:[DI]

MOV DI, 0100h
JMP DI

This code assumes that P1 is located just before V2, as in:

P1_Stored_Here:
.
.
.
V2_Start:

It also assumes ES equals CS. If these assumptions are false, change the
code accordingly. Here is an example:

PUSH CS ; Store CS
POP ES ; and move it to ES
; Note MOV ES, CS is not a valid instruction
MOV SI, P1_START ; Move from whereever P1 is stored
MOV DI, 0100h ; to CS:[100h]
MOV CX, V1_LENGTH
REP MOVSB

MOV DI, 0100h
JMP DI

This code first moves CS into ES and then sets the source pointer of MOVSB
to where P1 is located. Remember that this is all taking place in memory,
so you need the OFFSET of P1, not just the physical location in the file.
The offset of P1 is 100h higher than the physical file location, as COM
files are loaded starting from CS:[100h].

So here's a summary of the parts of the virus and location labels:

V1_Start:
JMP FAR PTR Duh
Duh DW V2_Start
V1_End:

P2_Start:
P2_End:

P1_Start:
; First part of the program stored here for future use
P1_End:

V2_Start:
; Real Stuff
V2_End:

V1_Length EQU V1_End - V1_Start

Alternatively, you could store P1 in V2 as follows:

V2_Start:

P1_Start:
P1_End:

V2_End:

That's all there is to infecting a COM file without destroying it! Simple,
no? EXE files, however, are a little tougher to infect without rendering
them inexecutable - I will cover this topic in a later file.

Now let us turn our attention back to the replicator portion of the virus.
The steps are outlined below:

1) Find a file to infect
2) Check if it is already infected
3) If so, go back to 1
4) Infect it
5) If infected enough, quit
6) Otherwise, go back to 1

Finding a file to infect is a simple matter of writing a directory
traversal procedure and issuing FINDFIRST and FINDNEXT calls to find
possible files to infect. Once you find the file, open it and read the
first few bytes. If they are the same as the first few bytes of V1, then
the file is already infected. If the first bytes of V1 are not unique to
your virus, change it so that they are. It is *extremely* important that
your virus doesn't reinfect the same files, since that was how Jerusalem
was first detected. If the file wasn't already infected, then infect it!
Infection should take the following steps:

1) Change the file attributes to nothing.
2) Save the file date/time stamps.
3) Close the file.
4) Open it again in read/write mode.
5) Save P1 and append it to the end of the file.
6) Copy V1 to the beginning, but change the offset which it JMPs to so
it transfers control correctly. See the previous part on infection.
7) Append V2 to the end of the file.
8) Restore file attributes/date/time.

You should keep a counter of the number of files infected during this run.
If the number exceeds, say three, then stop. It is better to infect slowly
then to give yourself away by infecting the entire drive at once.

You must be sure to cover your tracks when you infect a file. Save the
file's original date/time/attributes and restore them when you are
finished. THIS IS VERY IMPORTANT! It takes about 50 to 75 bytes of code,
probably less, to do these few simple things which can do wonders for the
concealment of your program.

I will include code for the directory traversal function, as well as other
parts of the replicator in the next installment of my phunky guide.

-=-=-=-=-
CONCEALER
-=-=-=-=-
This is the part which conceals the program from notice by the everyday
user and virus scanner. The simplest form of concealment is the encryptor.
The code for a simple XOR encryption system follows:

encrypt_val db ?

decrypt:
encrypt:
mov ah, encrypt_val

mov cx, part_to_encrypt_end - part_to_encrypt_start
mov si, part_to_encrypt_start
mov di, si

xor_loop:
lodsb ; DS:[SI] -> AL
xor al, ah
stosb ; AL -> ES:[DI]
loop xor_loop
ret

Note the encryption and decryption procedures are the same. This is due to
the weird nature of XOR. You can CALL these procedures from anywhere in
the program, but make sure you do not call it from a place within the area
to be encrypted, as the program will crash. When writing the virus, set
the encryption value to 0. part_to_encrypt_start and part_to_encrypt_end
sandwich the area you wish to encrypt. Use a CALL decrypt in the beginning
of V2 to unencrypt the file so your program can run. When infecting a
file, first change the encrypt_val, then CALL encrypt, then write V2 to the
end of the file, and CALL decrypt. MAKE SURE THIS PART DOES NOT LIE IN THE
AREA TO BE ENCRYPTED!!!

This is how V2 would look with the concealer:

V2_Start:

Concealer_Start:
.
.
.
Concealer_End:

Replicator_Start:
.
.
.
Replicator_End:

Part_To_Encrypt_Start:
.
.
.
Part_To_Encrypt_End:
V2_End:

Alternatively, you could move parts of the unencrypted stuff between
Part_To_Encrypt_End and V2_End.

The value of encryption is readily apparent. Encryption makes it harder
for virus scanners to locate your virus. It also hides some text strings
located in your program. It is the easiest and shortest way to hide your
virus.

Encryption is only one form of concealment. At least one other virus hooks
into the DOS interrupts and alters the output of DIR so the file sizes
appear normal. Another concealment scheme (for TSR virii) alters DOS so
memory utilities do not detect the virus. Loading the virus in certain
parts of memory allow it to survive warm reboots. There are many stealth
techniques, limited only by the virus writer's imagination.

-=-=-=-=-
THE BOMB
-=-=-=-=-
So now all the boring stuff is over. The nastiness is contained here. The
bomb part of the virus does all the deletion/slowdown/etc which make virii
so annoying. Set some activation conditions of the virus. This can be
anything, ranging from when it's your birthday to when the virus has
infected 100 files. When these conditions are met, then your virus does
the good stuff. Some suggestions of possible bombs:

1) System slowdown - easily handled by trapping an interrupt and
causing a delay when it activates.
2) File deletion - Delete all ZIP files on the drive.
3) Message display - Display a nice message saying something to the
effect of "You are fucked."
4) Killing/Replacing the Partition Table/Boot Sector/FAT of the hard
drive - This is very nasty, as most dimwits cannot fix this.

This is, of course, the fun part of writing a virus, so be original!

-=-=-=-=-=-=-=-
OFFSET PROBLEMS
-=-=-=-=-=-=-=-
There is one caveat regarding calculation of offsets. After you infect a
file, the locations of variables change. You MUST account for this. All
relative offsets can stay the same, but you must add the file size to the
absolute offsets or your program will not work. This is the most tricky
part of writing virii and taking these into account can often greatly
increase the size of a virus. THIS IS VERY IMPORTANT AND YOU SHOULD BE
SURE TO UNDERSTAND THIS BEFORE ATTEMPTING TO WRITE A NONOVERWRITING VIRUS!
If you don't, you'll get fucked over and your virus WILL NOT WORK! One
entire part of the guide will be devoted to this subject.

-=-=-=-
TESTING
-=-=-=-
Testing virii is a dangerous yet essential part of the virus creation
process. This is to make certain that people *will* be hit by the virus
and, hopefully, wiped out. Test thoroughly and make sure it activates
under the conditions. It would be great if everyone had a second computer
to test their virii out, but, of course, this is not the case. So it is
ESSENTIAL that you keep BACKUPS of your files, partition, boot record, and
FAT. Norton is handy in this doing this. Do NOT disregard this advice
(even though I know that you will anyway) because you WILL be hit by your
own virii. When I wrote my first virus, my system was taken down for two
days because I didn't have good backups. Luckily, the virus was not overly
destructive. BACKUPS MAKE SENSE! LEECH A BACKUP PROGRAM FROM YOUR LOCAL
PIRATE BOARD! I find a RamDrive is often helpful in testing virii, as the
damage is not permanent. RamDrives are also useful for testing trojans,
but that is the topic of another file...

-=-=-=-=-=-=-
DISTRIBUTION
-=-=-=-=-=-=-
This is another fun part of virus writing. It involves sending your
brilliantly-written program through the phone lines to your local,
unsuspecting bulletin boards. What you should do is infect a file that
actually does something (leech a useful utility from another board), infect
it, and upload it to a place where it will be downloaded by users all over.
The best thing is that it won't be detected by puny scanner-wanna-bes by
McAffee, since it is new! Oh yeah, make sure you are using a false account
(duh). Better yet, make a false account with the name/phone number of
someone you don't like and upload the infected file under the his name.
You can call back from time to time and use a door such as ZDoor to check
the spread of the virus. The more who download, the more who share in the
experience of your virus!

-=-=-=-=-=-=-=-=-
OVERWRITING VIRII
-=-=-=-=-=-=-=-=-
All these virii do is spread throughout the system. They render the
infected files inexecutable, so they are easily detected. It is simple to
write one:

+-------------+ +-----+ +-------------+
| Program | + |Virus| = |Virus|am |
+-------------+ +-----+ +-------------+

These virii are simple little hacks, but pretty worthless because of their
easy detectability.

Tuesday, July 10, 2007

Building GUIs with JFC/Swing APIs Code

The Swing package is part of the Java Foundation Classes (JFC), which encompasses a group of features to build GUIs. Another API library for creating GUIs is the Abstract Window Toolkit (AWT) provided with the JDK 1.0 and 1.1 platforms. Although the Java 2 Platform still supports the AWT components, you are strongly encouraged to use Swing components instead.


The Swing Connection is an online magazine with an archive of articles that contain code samples. Topic categories include About Swing, Working with Swing Components, Putting it all Together, The Swing Text Package, and The PLAF Papers - About the Pluggable Look & Feel.


Accessibility
Accessibility technologies make the graphical components of a program's user interface available to screen readers, pointing devices, and other assistive technologies used by people with disabilities.
AccessApplet.java is an accessible applet. For supporting information see What's New with Accessibility.
AccessSimpleButton.java sets an accessible name and description on a button component. For supporting information see What's New with Accessibility.
AccessSimpleButtonRel.java displays a label and buttons, and specifies their accessible relationships. For supporting information see What's New with Accessibility.
ClipBoardExample.java shows how to implement a copy and paste operation. For supporting information see New Data Transfer Capabilities.
DragOne.java shows how to implement drag and drop. For supporting information see New Data Transfer Capabilities.
DragTwo.java implements one draggable text label, one draggable image label, one droppable image label, and a text field for dropping the text. For supporting information see New Data Transfer Capabilities.
ImageCopy.java shows how to copy images. For supporting information see New Data Transfer Capabilities.
ImageSelection.java shows how to transfer images. For supporting information see New Data Transfer Capabilities.
Ferret.java follows the mouse pointer as it moves across a user interface. When the mouse pointer points at a component in the interface, Ferret gets the accessible information for the component. This task is also important for assistive technologies such as hands-free pointing devices. For supporting information see What's New with Accessibility.
SimpleIconButton.java shows how to specify an accessible description for an icon, retrieve the icon's description, and retrieve information about the icon's height and width. For supporting information see What's New with Accessibility.
SimpleTable.java makes information about a table's data and changes to that data accessible. For supporting information see What's New with Accessibility.

Using Swing Components
CustomList.java creates a custom list using a cell renderer. For supporting information see Backstage at the JDC:Customizing JList Rendering.
ExitableJFrame.java shows how to shut down and close a Swing application. For supporting information see New to Java Programming Supplement, Jan 2002.
FrameExample.java creates a simple frame that contains a button and a panel. For supporting information see New to Java Programming Supplement, Jan 2002.
Getting Started with Swing from the Java Series book The Swing Tutorial.
Laying out components from the Java Series book The Swing Tutorial.
ListItem.java is a complete program that creates custom lists that display different color backgrounds. For supporting information see Backstage at the JDC:Customizing JList Rendering.
LayoutExampleSwing.java shows how to use the Abstract Window Toolkit (AWT) predefined layout managers, which can also be used with Swing components. For supporting information see Exploring the AWT Layout Managers.
MyCellRenderer.java shows how to implement the ListCellRenderer interface. For supporting information see Backstage at the JDC:Customizing JList Rendering.
PlainLIst.java shows how to create a list. For supporting information see Backstage at the JDC:Customizing JList Rendering.
Quick Start Guide to Swing from the Java Series book The Swing Tutorial.
Swing Features and Concepts from the Java Series book The Swing Tutorial.
TabbedPaneApp.java demonstrates how to create a JTabbedPane object with three tabs. For supporting information see New to Java Programming Supplement, April 2002.
Using Swing Components from the Java Series book The Swing Tutorial.
Using Swing Features Two from the Java Series book The Swing Tutorial.
VehicleTest.java instantiates FrameExample. For supporting information see New to Java Programming Supplement, Jan 2002.
Working with Graphics from the Java Series book The Swing Tutorial.
Writing Event Listeners from the Java Series book The Swing Tutorial.

Monday, July 9, 2007

Coding PHP 3 part 1

PHP is a tool that lets you create dynamic web pages. PHP-enabled web pages are treated just like regular HTML pages and you can create and edit them the same way you normally create regular HTML pages.

PHP stands for PHP: Hypertext Preprocessor.

PHP costs $0. You can download it from the PHP site at www.php.net . The license is the GNU General Public License (GPL), the same license under which popular software like Linux, Emacs etc. are released. You have complete access to the source code and, if you want to, you can add your own features to PHP.

PHP is available for most Unix platforms, GNU/Linux and Microsoft Windows(TM). How to install PHP in your Windows(TM) PC or Unix machine is well documented at the PHP site. It is quite easy to install.

PHP is also Year2000 compliant, if your machine is!

1.1 History
Three years ago, Rasmus Lerdorf created PHP as Personal Home Page Tools to manage his online resume. It was a very simple language. People noticed it and started making suggestions to extend it. Many people contributed and, with the source code freely available, it became a very feature-rich language. And it continues to grow.

PHP, though easy to learn, was slower than mod_perl (Perl embedded into the webserver) scripts. Now there is a new engine called Zend, which is almost as fast as mod_perl, and PHP4 will use the Zend engine completely. PHP4 is still in beta. Andi Gutmans and Zeev Suraski are the primary authors of Zend. Visit the Zend site at www.zend.com .

From a personal project, PHP usage has shot up quickly. Netcraft reports that in October 1999, 931122 Domains and 321128 IP Addresses deliver pages created by PHP.



1.2 Advantages of PHP
There are lots of advantages to using PHP. Some disadvantages that come to mind are that, being an open source project, it has no commercial support available so far, and that PHP executes relatively slowly (prior to version 4). However, the mailing lists for PHP are extremely useful, and unless you are running a site as popular as Yahoo! or Amazon.com, you probably will not notice the difference in speed at all. I haven't! Let us see the advantages now.


Learning Curve
Personally, I love PHP's very easy learning curve. Unlike Java or Perl, you don't need to sink into hundreds of pages of documentation to write a program that does something useful. With just a few basic syntax and language features, you can be productive. Then, when you need to do something more specific, check the relevant documentation.

PHP's syntax is very similar to C, Perl, ASP or JSP. For people who know any of these languages, PHP is very easy. Conversely, if you know PHP, what you have learned carries over easily to other languages you want to learn.

To learn the core features, you just need 30 minutes. You already might know HTML very well, or you really know how to design good looking web sites, either using site builder tools or using plain HTML coding. Since PHP code can be added in a non-interfering way, after you design and implement your site, you can add PHP code to make it more dynamic.


Database Integration
PHP can be compiled with functions to interact with a lot of databases. PHP with MySQL is a very popular combination. You can also write your own wrapper functions to call the database functions indirectly. This way, you can easily change your code when you want to change your database. PHPLIB is a set of libraries written in PHP that provides the most commonly required routines.


Extensibility
As mentioned before, PHP has been evolving at a rapid pace. For a non-programmer it might be difficult to extend PHP to support additional functions, but for programmers it should not be very difficult.


Object Oriented Programming
PHP provides for Classes and Objects. Its support for object oriented programming is sufficient for most programming tasks related to the web. PHP supports constructors, derived classes etc.


Rich Features
PHP is just a scripting wrap-around for many popular libraries, with a nice chocolate layer around it to make it easy to use for the web. You can use PHP to connect to many databases including Oracle, MS-Access, MySQL. You can create graphics on the fly. You can write programs to download and display e-mail. You can even do network related functions. And best of all, you can decide which capabilities your PHP installation should have. To quote Nissan's Xterra, your PHP can be made to have "Everything you need, nothing you don't!"


Scalability
Traditionally, interactivity in web pages is achieved using CGI programs. CGI programs do not scale well because each run of a program occurs as a separate process. The solution is to compile the interpreters for the languages used to write CGI programs into the webserver (mod_perl, JSP...). PHP can also be installed like this, though occasionally people might want to use the CGI version of PHP too. Embedded PHP installations scale well.


1.3 Competition : ASP, mod_perl, JSP
ASP is quite popular and I've written some applications in ASP since January 2000. Microsoft's arguments for ASP include availability of tools like Visual Interdev, language independence and an easy learning curve. I have used VIM to code ASP, so I can't say much about the tools. Language independence is also not that much of an issue, just because in its default configuration it supports VBScript and JScript. You can also get extensions that make it possible to write ASP in PerlScript, making it a good choice for folks who are used to mod_perl and Apache. At least for a simple directory listing code, PHP3.0 on a P133 with 32MB RAM ran circles around ASP code on an NT4 machine with dual P500 with 600MB RAM! I tend to prefer PHP3. Also, VBScript's string functions are a pain to use (IMHO) - though it has REGEX capability now.

JSP has the unique advantage that you need to know only Java. From what I've seen, most people would prefer to work in one language and use that for as many purposes as possible. This, and the training costs involved, will make Java an attractive option for companies. Additionally, there are already a number of tools out there for Java. Architecture-wise JSP+Servlets are comparable to VBScript+COM. For quick web-site building JSP may not be worth the trouble. For businesses, this would make more sense, especially since a number of Java based application servers are already in the market. From my personal experience, both Java and ASP need elite hardware, whereas PHP3 with Apache on Linux will perform adequately with your old P133 :-)

mod_perl is as powerful as Perl is. It is also quite fast. PHP4 with Zend engine will probably be as fast as mod_perl. However, Perl code can get unmaintainable quickly - this is just my opinion and goes well with Perl's idea of being a language for the lazy.

2. Tutorial
The on-line manual at the PHP site is extremely good. There are some additional links provided there to different tutorials. This section, however, is just meant to get you familiar with PHP quickly. It is in no way exhaustive; it just helps you get started writing PHP scripts.



2.1 Pre-requisites
You must have a working webserver with PHP support. Let us assume that all of your files with PHP code have an extension .php3.


2.2 PHP setup
Create a file called test.php3 with the contents as below:

<? phpinfo(); ?>

Then point your browser to this file. Study the page and you'll know what options your PHP installation has.

2.3 Syntax
As mentioned before, you can put both PHP and HTML code in your file, so you need a way to distinguish between them. There are several ways to do this. Pick one that you like and stick to it!


Escaping from HTML
Here are the possible ways to escape your PHP code from HTML.

<? ... ?>
<?php ... ?>
<script language="php"> ... </script>
<% ... %>

Statements
Statements are terminated as in Perl or C: with a semicolon (;). The closing tag of a PHP block also implies an end-of-statement.


Comments
PHP supports C, C++ and Unix style comments.

/* C,C++ style multi-line comment */
// C++ style single line comment
# Unix style single line comment

Hello World!
With this basic info, let us create a PHP program which will just print the standard message!


--------------------------------------------------------------------------------

<?
echo "Hello World!";
?>

First PHP page

<?
// Single line C++ style comment
/*
printing the message
*/
echo "Hello World!";
# Unix style single line comment
?>




--------------------------------------------------------------------------------


2.4 Data Types
PHP supports integers, floats, strings, arrays and objects. The type of a variable is usually not set by the programmer, but determined by PHP at run-time (good riddance!). However, types can be changed explicitly if necessary, using a cast or the settype() function.


Numbers
Numbers can be integers or floating point numbers. You can specify a number using any of the following syntaxes.

$a = 1234; # decimal number
$a = -123; # a negative number
$a = 0123; # octal number (equivalent to 83 decimal)
$a = 0x12; # hexadecimal number (equivalent to 18 decimal)
$a = 1.234; # floating point number. "double"
$a = 1.2e3; # exponential format for double

Strings
Strings can be defined using single quotes or double quotes. Whatever is inside single quotes is taken literally, while variables inside double quotes are expanded. Backslash (\) can be used to escape special characters.

$first = 'Hello';
$second = "World";
$full1 = "$first $second"; # yields Hello World
$full2 = '$first $second'; # yields $first $second
You can mix strings and numbers using arithmetic operators. Strings are then converted to numbers using their initial portion. The PHP manual has examples.

Arrays & Hashes
Arrays and hashes are implemented in the same way. How you use them determines what you get. You create arrays using the array() function, or by explicitly setting values. Array indexes start from 0. Though not explained here, you can have multidimensional arrays too.


$a[0] = "first"; // a two element array
$a[1] = "second";
$a[] = "third"; // easy way to append to the array.
// Now, $a[2] has "third"
echo count($a); // prints 3, number of elements in the array
// create a hash in one shot
$myphonebook = array (
"sbabu" => "5348",
"keith" => "4829",
"carole" => "4533"
);
// oops, forgot dean! add dean
$myphonebook["dean"] = "5397";
// yeoh! carole's number is not that. correct it!
$myphonebook["carole"] = "4522";
// didn't we tell that hashes and arrays are
// implemented alike? both keep their elements in order,
// so the array pointer functions work on a hash too
echo key($myphonebook); // sbabu
echo current($myphonebook); // 5348
Some functions useful in this context are sort(), next(), prev(), each().


Objects
To create an object, use the new statement.

class foo {
function do_foo () {
echo "Doing foo.";
}
}
$bar = new foo;
$bar->do_foo();

Changing Types
From the PHP manual - "PHP does not require (or support) explicit type definition in variable declaration; a variable's type is determined by the context in which that variable is used. That is to say, if you assign a string value to variable var, var becomes a string. If you then assign an integer value to var, it becomes an integer."

$foo = "0"; // $foo is string (ASCII 48)
$foo++; // $foo is the string "1" (ASCII 49)
$foo += 1; // $foo is now an integer (2)
$foo = $foo + 1.3; // $foo is now a double (3.3)
$foo = 5 + "10 Little Piggies"; // $foo is integer (15)
$foo = 5 + "10 Small Pigs"; // $foo is integer (15)
Type casting works just like in C. You can also use the settype() function.

2.5 Variables & Constants
As you might've noticed, variables in PHP are prefixed with the dollar sign ($). All variables have local scope. To make a variable defined outside any function visible inside a function, you have to declare it there using global. To retain values between calls of a function, you can use static variables.

$g_var = 1 ; // global scope
function test() {
global $g_var; // that's how you specify global variable
}
Slightly more advanced are variable variables. See the manual. These can be extremely useful at times.
PHP defines some constants internally. You can define your own constants using the define function, like define("CONSTANT","value").



2.6 Operators
PHP has all the common operators found in C, C++ or Java. Precedence rules are also the same. The assignment operator is =.


Arithmetic & String
There is only one string operator: concatenation (.).

$a + $b : addition
$a - $b : subtraction
$a * $b : multiplication
$a / $b : division
$a % $b : modulus (remainder)
$a . $b : string concatenation

Logical & Comparison
Logical operators are :

$a || $b : Or
$a or $b : Or
$a && $b : And
$a and $b : And
$a xor $b : EXOR. (True if either $a or $b is true, but not both)
! $a : Not
Comparison operators are :
$a == $b : Equality
$a != $b : Not Equal
$a < $b : Less Than
$a <= $b : Less Than or equal to
$a > $b : Greater Than
$a >= $b : Greater Than or equal to
Like in C, PHP has the ternary operator (?:) too. Bit operators are also available.

Precedence
Just like in C or Java!


2.7 Control Structures
PHP has standard control structures you find in C. We will briefly cover these.


if, else, elseif, if(): endif

if ( expr1 ) {
...
} elseif (expr2) {
} else {
}

// or like in Python
if (expr1) :
...
...
elseif (expr2) :
...
else :
...
endif;


Loops. while, do..while, for

while (expr) { ... }
do {...} while (expr);
for (expr1; expr2; expr3) {...}
// or like in Python
while (expr) :
...
endwhile;

switch
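Switch is an elegant replacement for a multiple if-elseif-else structure. End each case with break; otherwise execution falls through into the next case.

switch ($i) {
  case 0:
    print "i equals 0";
    break;
  case 1:
    print "i equals 1";
    break;
  case 2:
    print "i equals 2";
    break;
}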

break, continue
break breaks out of the current looping control structure.

continue is used within looping structures to skip the rest of the current loop iteration and continue execution at the beginning of the next iteration.


require, include
require is just like C's #include pre-processor directive. The file you specify with require is substituted in place. This means that you can't put require within if statements. To conditionally include code, use the include() function. It is good practice to split PHP code into multiple files and include only the necessary ones.


2.8 Functions
You can define your own functions as shown below. Return values can be any of the data types.

function foo ($arg_1, $arg_2, ..., $arg_n) {
echo "Example function.\n";
return $retval;
}
Any valid PHP code may appear inside a function, even other functions and class definitions. Functions must be defined before they are referenced.

2.9 Classes
Use the class construct to create classes. See the manual for a good explanation of classes.

class Employee {
var $empno; // employee number
var $empnm; // name

function add_employee($in_num, $in_name){
$this->empno = $in_num;
$this->empnm = $in_name;
}

function show() {
echo "$this->empno, $this->empnm";
return;
}

function changenm($in_name){
$this->empnm = $in_name;
}
}

$sbabu = new Employee;
$sbabu->add_employee(10,"sbabu");
$sbabu->changenm("babu");

$sbabu->show();

Wednesday, July 4, 2007

Introduce C++

Instructions for use
To whom is this tutorial directed?
This tutorial is for those people who want to learn programming in C++ and do not necessarily have any previous knowledge of other programming languages. Of course any knowledge of other programming languages or any general computer skill can be useful to better understand this tutorial, although it is not essential.
It is also suitable for those who need a little update on the new features the language has acquired from the latest standards.
If you are familiar with the C language, you can take the first 3 parts of this tutorial as a review of concepts, since they mainly explain the C part of C++. There are slight differences in the C++ syntax for some C features, so I recommend reading them anyway.
The 4th part describes object-oriented programming.
The 5th part mostly describes the new features introduced by ANSI-C++ standard.
Structure of this tutorial
The tutorial is divided into 6 parts, and each part is in turn divided into sections that each cover one topic.
Many sections include examples that show how to use the knowledge newly acquired in the chapter. It is recommended to read these examples and to understand each of their code lines before passing to the next chapter.
A good way to gain experience with a programming language is to modify and add new functionality of your own to example programs that you fully understand. Don't be scared to modify the examples provided with this tutorial; that's the way to learn!
Compatibility Notes
The acceptance of ANSI-C++ as an international standard is relatively recent. It was approved in November 1997, published in 1998, and revised in 2003. Nevertheless, the C++ language has existed for much longer (since the 1980s). Therefore there are many compilers which do not support all the new capabilities included in ANSI-C++, especially those released prior to the publication of the standard.
This tutorial is meant to be followed with modern compilers that support, at least to some degree, the ANSI-C++ specifications. I encourage you to get one if yours does not. There are many options, both commercial and free.
An outdated version of this tutorial, compatible with older compilers, is available at http://legacy.cplusplus.com/doc/tutorial/.
Compilers
The examples included in this tutorial are all console programs. That means they use text to communicate with the user and to show their results.
All C++ compilers support the compilation of console programs. Check the user's manual of your compiler for more info on how to compile them.

Structure of a program

Probably the best way to start learning a programming language is by writing a program. Therefore, here is our first program:
// my first program in C++

#include <iostream>
using namespace std;

int main ()
{
  cout << "Hello World!";
  return 0;
}

Hello World!

The listing above shows the source code for our first program, and the line after it shows the result of the program once compiled and executed. The way to edit and compile a program depends on the compiler you are using, on whether it has a development interface, and on its version. Consult the compilers section and the manual or help included with your compiler if you have doubts about how to compile a C++ console program.
The previous program is the typical program that programmer apprentices write for the first time, and its result is the printing on screen of the "Hello World!" sentence. It is one of the simplest programs that can be written in C++, but it already contains the fundamental components that every C++ program has. We are going to look line by line at the code we have just written:
// my first program in C++
This is a comment line. All lines beginning with two slash signs (//) are considered comments and do not have any effect on the behavior of the program. The programmer can use them to include short explanations or observations within the source code itself. In this case, the line is a brief description of what our program is.
#include <iostream>
Lines beginning with a pound sign (#) are directives for the preprocessor. They are not regular code lines with expressions but indications for the compiler's preprocessor. In this case the directive #include <iostream> tells the preprocessor to include the iostream standard file. This specific file (iostream) includes the declarations of the basic standard input-output library in C++, and it is included because its functionality is going to be used later in the program.
using namespace std;
All the elements of the standard C++ library are declared within what is called a namespace, the namespace with the name std. So in order to access its functionality we declare with this expression that we will be using these entities. This line is very frequent in C++ programs that use the standard library, and in fact it will be included in most of the source codes included in these tutorials.
int main ()
This line corresponds to the beginning of the definition of the main function. The main function is the point where all C++ programs start their execution, independently of its location within the source code. It does not matter whether there are other functions with other names defined before or after it: the instructions contained within this function's definition will always be the first ones to be executed in any C++ program. For that same reason, it is essential that all C++ programs have a main function.
The word main is followed in the code by a pair of parentheses (()). That is because it is a function declaration: In C++, what differentiates a function declaration from other types of expressions are these parentheses that follow its name. Optionally, these parentheses may enclose a list of parameters within them.
Right after these parentheses we can find the body of the main function enclosed in braces ({}). What is contained within these braces is what the function does when it is executed.
cout << "Hello World!";
This line is a C++ statement. A statement is a simple or compound expression that can actually produce some effect. In fact, this statement performs the only action that generates a visible effect in our first program.
cout represents the standard output stream in C++, and the meaning of the entire statement is to insert a sequence of characters (in this case the Hello World! sequence of characters) into the standard output stream (which usually is the screen).
cout is declared in the iostream standard file within the std namespace, so that's why we needed to include that specific file and to declare that we were going to use this specific namespace earlier in our code.
Notice that the statement ends with a semicolon character (;). This character is used to mark the end of the statement and in fact it must be included at the end of all expression statements in all C++ programs (one of the most common syntax errors is indeed to forget to include some semicolon after a statement).
return 0;
The return statement causes the main function to finish. return may be followed by a return code (in our example it is followed by the return code 0). A return code of 0 for the main function is generally interpreted as meaning that the program worked as expected without any errors during its execution. This is the most usual way to end a C++ program.
You may have noticed that not all the lines of this program perform actions when the code is executed. There were lines containing only comments (those beginning with //). There were lines with directives for the compiler's preprocessor (those beginning with #). Then there were lines that began the declaration of a function (in this case, the main function) and, finally, lines with statements (like the insertion into cout), which were all included within the block delimited by the braces ({}) of the main function.
The program has been structured in different lines in order to be more readable, but in C++ we do not have strict rules on how to separate instructions into different lines. For example, instead of
int main ()
{
  cout << "Hello World!";
  return 0;
}
we could have written:
int main () { cout << "Hello World!"; return 0; }
all in just one line, and this would have had exactly the same meaning as the previous code.
In C++, the separation between statements is specified with an ending semicolon (;) at the end of each one, so the separation in different code lines does not matter at all for this purpose. We can write many statements per line or write a single statement that takes many code lines. The division of code in different lines serves only to make it more legible and schematic for the humans that may read it.
Let us add an additional instruction to our first program:
// my second program in C++

#include <iostream>

using namespace std;

int main ()
{
  cout << "Hello World! ";
  cout << "I'm a C++ program";
  return 0;
}

Hello World! I'm a C++ program

In this case, we performed two insertions into cout in two different statements. Once again, the separation into different lines of code has been done just to give greater readability to the program, since main could just as validly have been defined this way:
int main () { cout << "Hello World! "; cout << "I'm a C++ program"; return 0; }
We were also free to divide the code into more lines if we considered it more convenient:
int main ()
{
  cout <<
    "Hello World! ";
  cout
    << "I'm a C++ program";
  return 0;
}
And the result would again have been exactly the same as in the previous examples.
Preprocessor directives (those that begin with #) are outside this general rule since they are not statements. They are lines read and processed by the preprocessor and do not produce any code by themselves. Preprocessor directives must be specified on their own line and do not have to end with a semicolon (;).
Comments
Comments are parts of the source code disregarded by the compiler. They simply do nothing. Their purpose is only to allow the programmer to insert notes or descriptions embedded within the source code.
C++ supports two ways to insert comments:
// line comment
/* block comment */
The first of them, known as line comment, discards everything from where the pair of slash signs (//) is found up to the end of that same line. The second one, known as block comment, discards everything between the /* characters and the first appearance of the */ characters, with the possibility of including more than one line.
We are going to add comments to our second program:

/* my second program in C++
   with more comments */

#include <iostream>
using namespace std;

int main ()
{
  cout << "Hello World! ";     // prints Hello World!
  cout << "I'm a C++ program"; // prints I'm a C++ program
  return 0;
}

Hello World! I'm a C++ program

If you include comments within the source code of your programs without using the comment characters combinations //, /* or */, the compiler will take them as if they were C++ expressions, most likely causing one or several error messages when you compile it.
Variables & Data Types.
The usefulness of the "Hello World" programs shown in the previous section is quite questionable. We had to write several lines of code, compile them, and then execute the resulting program just to obtain a simple sentence written on the screen as result. It certainly would have been much faster to type the output sentence by ourselves. However, programming is not limited only to printing simple texts on the screen. In order to go a little further on and to become able to write programs that perform useful tasks that really save us work we need to introduce the concept of variable.
Let us suppose that I ask you to retain the number 5 in your mental memory, and then I ask you to memorize also the number 2 at the same time. You have just stored two different values in your memory. Now, if I ask you to add 1 to the first number I said, you should be retaining the numbers 6 (that is 5+1) and 2 in your memory. We could now, for example, subtract these values and obtain 4 as the result.
The whole process that you have just done with your mental memory is a simile of what a computer can do with two variables. The same process can be expressed in C++ with the following instruction set:
a = 5;
b = 2;
a = a + 1;
result = a - b;
Obviously, this is a very simple example since we have only used two small integer values, but consider that your computer can store millions of numbers like these at the same time and conduct sophisticated mathematical operations with them.
Therefore, we can define a variable as a portion of memory to store a determined value.
Each variable needs an identifier that distinguishes it from the others, for example, in the previous code the variable identifiers were a, b and result, but we could have called the variables any names we wanted to invent, as long as they were valid identifiers.
Identifiers
A valid identifier is a sequence of one or more letters, digits or underline characters (_). Neither spaces nor punctuation marks or symbols can be part of an identifier. Only letters, digits and underline characters are valid. In addition, variable identifiers should begin with a letter. They can also begin with an underline character (_), but such names are usually reserved for compiler-specific keywords or external identifiers. In no case can they begin with a digit.
Another rule that you have to consider when inventing your own identifiers is that they cannot match any keyword of the C++ language or your compiler's specific ones since they could be confused with these. The standard reserved keywords are:
asm, auto, bool, break, case, catch, char, class, const, const_cast, continue, default, delete, do, double, dynamic_cast, else, enum, explicit, export, extern, false, float, for, friend, goto, if, inline, int, long, mutable, namespace, new, operator, private, protected, public, register, reinterpret_cast, return, short, signed, sizeof, static, static_cast, struct, switch, template, this, throw, true, try, typedef, typeid, typename, union, unsigned, using, virtual, void, volatile, wchar_t, while
Additionally, alternative representations for some operators cannot be used as identifiers since they are reserved words under some circumstances:
and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq, xor, xor_eq
Your compiler may also include some additional specific reserved keywords.

Very important: The C++ language is a "case sensitive" language. That means that an identifier written in capital letters is not equivalent to another one with the same name but written in small letters. Thus, for example, the RESULT variable is not the same as the result variable or the Result variable. These are three different variable identifiers.
Fundamental data types
When programming, we store the variables in our computer's memory, but the computer has to know what we want to store in them, since storing a simple number does not occupy the same amount of memory as storing a single letter or a large number, and they are not going to be interpreted the same way.
The memory in our computers is organized in bytes. A byte is the minimum amount of memory that we can manage in C++. A byte can store a relatively small amount of data: one single character or a small integer (generally an integer between 0 and 255). In addition, the computer can manipulate more complex data types that come from grouping several bytes, such as long numbers or non-integer numbers.
Next you have a summary of the basic fundamental data types in C++, as well as the range of values that can be represented with each one:
Name                Description                                   Size*    Range*
char                Character or small integer.                   1 byte   signed: -128 to 127; unsigned: 0 to 255
short int (short)   Short integer.                                2 bytes  signed: -32768 to 32767; unsigned: 0 to 65535
int                 Integer.                                      4 bytes  signed: -2147483648 to 2147483647; unsigned: 0 to 4294967295
long int (long)     Long integer.                                 4 bytes  signed: -2147483648 to 2147483647; unsigned: 0 to 4294967295
bool                Boolean value (true or false).                1 byte   true or false
float               Floating point number.                        4 bytes  3.4e +/- 38 (7 digits)
double              Double precision floating point number.       8 bytes  1.7e +/- 308 (15 digits)
long double         Long double precision floating point number.  8 bytes  1.7e +/- 308 (15 digits)
wchar_t             Wide character.                               2 bytes  1 wide character
* The values of the columns Size and Range depend on the architecture of the system where the program is compiled and executed. The values shown above are those found on most 32-bit systems. For other systems, the general specification is that int has the natural size suggested by the system architecture (one word) and that each of the four integer types char, short, int and long must be at least as large as the one preceding it. The same applies to the floating point types float, double and long double, where each one must provide at least as much precision as the preceding one.
Declaration of variables
In order to use a variable in C++, we must first declare it specifying which data type we want it to be. The syntax to declare a new variable is to write the specifier of the desired data type (like int, bool, float...) followed by a valid variable identifier. For example:
int a;
float mynumber;
These are two valid declarations of variables. The first one declares a variable of type int with the identifier a. The second one declares a variable of type float with the identifier mynumber. Once declared, the variables a and mynumber can be used within the rest of their scope in the program.
If you are going to declare more than one variable of the same type, you can declare all of them in a single statement by separating their identifiers with commas. For example:
int a, b, c;
This declares three variables (a, b and c), all of them of type int, and has exactly the same meaning as:
int a;
int b;
int c;
The integer data types char, short, long and int can be either signed or unsigned depending on the range of numbers needed to be represented. Signed types can represent both positive and negative values, whereas unsigned types can only represent positive values (and zero). This can be specified by using either the specifier signed or the specifier unsigned before the type name. For example:
unsigned short int NumberOfSisters;
signed int MyAccountBalance;
By default, if we specify neither signed nor unsigned, the type is assumed to be signed. Therefore, instead of the second declaration above we could have written:
int MyAccountBalance;
with exactly the same meaning (with or without the keyword signed).
An exception to this general rule is the char type, which exists by itself and is considered a different fundamental data type from signed char and unsigned char. It is intended to store characters. You should use either signed or unsigned if you intend to store numerical values in a char-sized variable.
short and long can be used alone as type specifiers. In this case, they refer to their respective integer fundamental types: short is equivalent to short int and long is equivalent to long int. The following two variable declarations are equivalent:
short Year;
short int Year;
Finally, signed and unsigned may also be used as standalone type specifiers, meaning the same as signed int and unsigned int respectively. The following two declarations are equivalent:
unsigned NextYear;
unsigned int NextYear;
To see what variable declarations look like in action within a program, we are going to see the C++ code of the example about your mental memory proposed at the beginning of this section:
// operating with variables

#include <iostream>
using namespace std;

int main ()
{
  // declaring variables:
  int a, b;
  int result;

  // process:
  a = 5;
  b = 2;
  a = a + 1;
  result = a - b;

  // print out the result:
  cout << result;

  // terminate the program:
  return 0;
}

4

Do not worry if anything other than the variable declarations themselves looks a bit strange to you. You will see the rest in detail in coming sections.
Scope of variables
All the variables that we intend to use in a program must have been declared with their type specifier at an earlier point in the code, as we did in the previous code at the beginning of the body of the function main when we declared that a, b, and result were of type int.
A variable can be either of global or local scope. A global variable is a variable declared in the main body of the source code, outside all functions, while a local variable is one declared within the body of a function or a block.

Global variables can be referred to from anywhere in the code, even inside functions, as long as the reference comes after their declaration.
The scope of local variables is limited to the block enclosed in braces ({}) where they are declared. For example, if they are declared at the beginning of the body of a function (like in function main) their scope is between their declaration point and the end of that function. In the example above, this means that if another function existed in addition to main, the local variables declared in main could not be accessed from the other function and vice versa.
Initialization of variables
When declaring a regular local variable, its value is by default undetermined. But you may want a variable to store a concrete value at the same moment that it is declared. In order to do that, you can initialize the variable. There are two ways to do this in C++:
The first one, known as c-like, is done by appending an equal sign followed by the value to which the variable will be initialized:
type identifier = initial_value ;
For example, if we want to declare an int variable called a initialized with a value of 0 at the moment in which it is declared, we could write:
int a = 0;
The other way to initialize variables, known as constructor initialization, is done by enclosing the initial value between parentheses (()):
type identifier (initial_value) ;
For example:
int a (0);
Both ways of initializing variables are valid and equivalent in C++.

// initialization of variables

#include <iostream>
using namespace std;

int main ()
{
  int a=5;      // initial value = 5
  int b(2);     // initial value = 2
  int result;   // initial value undetermined

  a = a + 3;
  result = a - b;
  cout << result;

  return 0;
}

6

Introduction to strings
Variables that can store non-numerical values that are longer than one single character are known as strings. The C++ language library provides support for strings through the standard string class. This is not a fundamental type, but it behaves in a similar way as fundamental types do in its most basic usage.

A first difference with fundamental data types is that in order to declare and use objects (variables) of this type we need to include an additional header file in our source code, <string>, and have access to the std namespace (which we already had in all our previous programs thanks to the using namespace statement).
// my first string
#include <iostream>
#include <string>
using namespace std;

int main ()
{
  string mystring = "This is a string";
  cout << mystring;
  return 0;
}

This is a string

As you may see in the previous example, strings can be initialized with any valid string literal just like numerical type variables can be initialized to any valid numerical literal. Both initialization formats are valid with strings:
string mystring = "This is a string";
string mystring ("This is a string");
Strings can also perform all the other basic operations that fundamental data types can, like being declared without an initial value and being assigned values during execution:


// my first string
#include <iostream>
#include <string>
using namespace std;

int main ()
{
  string mystring;
  mystring = "This is the initial string content";
  cout << mystring << endl;
  mystring = "This is a different string content";
  cout << mystring << endl;
  return 0;
}

Output:
This is the initial string content
This is a different string content

Constants
Constants are expressions with a fixed value.
Literals
Literals are used to express particular values within the source code of a program. We have already used these previously to give concrete values to variables or to express messages we wanted our programs to print out, for example, when we wrote:
a = 5;
the 5 in this piece of code was a literal constant.
Literal constants can be divided in Integer Numerals, Floating-Point Numerals, Characters, Strings and Boolean Values.
Integer Numerals
1776
707
-273
They are numerical constants that identify integer decimal values. Notice that to express a numerical constant we do not have to write quotes (") nor any special character. There is no doubt that it is a constant: whenever we write 1776 in a program, we will be referring to the value 1776.
In addition to decimal numbers (the ones we all use every day), C++ allows octal numbers (base 8) and hexadecimal numbers (base 16) to be used as literal constants. To express an octal number we precede it with a 0 (zero character), and to express a hexadecimal number we precede it with the characters 0x (zero, x). For example, the following literal constants are all equivalent to each other:
75 // decimal
0113 // octal
0x4b // hexadecimal
All of these represent the same number: 75 (seventy-five) expressed as a base-10 numeral, octal numeral and hexadecimal numeral, respectively.
Literal constants, like variables, are considered to have a specific data type. By default, integer literals are of type int. However, we can force a literal to be unsigned by appending the u character, or long by appending l:
75 // int
75u // unsigned int
75l // long
75ul // unsigned long
In all cases, the suffix can be written using either uppercase or lowercase letters.
Floating Point Numbers
They express numbers with decimals and/or exponents. They can include either a decimal point, an e character (which means "times ten raised to the Xth power", where X is an integer value that follows the e character), or both a decimal point and an e character:
3.14159 // 3.14159
6.02e23 // 6.02 x 10^23
1.6e-19 // 1.6 x 10^-19
3.0 // 3.0
These are four valid numbers with decimals expressed in C++. The first number is pi, the second is Avogadro's number, the third is the magnitude of the electric charge of an electron in coulombs (an extremely small number), all of them approximated, and the last one is the number three expressed as a floating-point numeric literal.
The default type for floating point literals is double. If you explicitly want to express a float or long double numerical literal, you can use the f or l suffixes respectively:
3.14159L // long double
6.02e23f // float
Any of the letters that can be part of a floating-point numerical constant (e, f, l) can be written using either lowercase or uppercase letters without any difference in their meanings.
Character and string literals
There also exist non-numerical constants, like:
'z'
'p'
"Hello world"
"How do you do?"
The first two expressions represent single character constants, and the following two represent string literals composed of several characters. Notice that to represent a single character we enclose it between single quotes (') and to express a string (which generally consists of more than one character) we enclose it between double quotes (").
When writing both single character and string literals, it is necessary to put the quotation marks surrounding them to distinguish them from possible variable identifiers or reserved keywords. Notice the difference between these two expressions:
x
'x'
x alone would refer to a variable whose identifier is x, whereas 'x' (enclosed within single quotation marks) would refer to the character constant 'x'.
Character and string literals have certain peculiarities, like escape codes. These represent special characters that are difficult or impossible to express otherwise in the source code of a program, like newline (\n) or tab (\t). All of them are preceded by a backslash (\). Here is a list of some of these escape codes:
\n newline
\r carriage return
\t tab
\v vertical tab
\b backspace
\f form feed (page feed)
\a alert (beep)
\' single quote (')
\" double quote (")
\? question mark (?)
\\ backslash (\)
For example:
'\n'
'\t'
"Left \t Right"
"one\ntwo\nthree"
Additionally, you can express any character by its numerical ASCII code by writing a backslash character (\) followed by the ASCII code expressed as an octal (base-8) or hexadecimal (base-16) number. In the first case (octal) the digits must immediately follow the backslash (for example \23 or \40), in the second case (hexadecimal), an x character must be written before the digits themselves (for example \x20 or \x4A).
String literals can extend to more than a single line of code by putting a backslash sign (\) at the end of each unfinished line.
"string expressed in \
two lines"
You can also concatenate several string constants by separating them with one or more blank spaces, tabs, newlines or any other valid blank character:
"this forms" "a single" "string" "of characters"
Finally, if we want the string literal to be explicitly made of wide characters (wchar_t), instead of narrow characters (char), we can precede the constant with the L prefix:
L"This is a wide character string"
Wide characters are used mainly to represent character sets that do not fit in a single narrow character, such as those of many non-English languages.
Boolean literals
There are only two valid Boolean values: true and false. These can be expressed in C++ as values of type bool by using the Boolean literals true and false.
Defined constants (#define)
You can define your own names for constants that you use very often, without having to declare variables for them, simply by using the #define preprocessor directive. Its format is:
#define identifier value
For example:
#define PI 3.14159265
#define NEWLINE '\n'
This defines two new constants: PI and NEWLINE. Once they are defined, you can use them in the rest of the code as if they were any other regular constant, for example:
// defined constants: calculate circumference

#include <iostream>
using namespace std;

#define PI 3.14159
#define NEWLINE '\n'

int main ()
{
  double r=5.0;   // radius
  double circle;

  circle = 2 * PI * r;
  cout << circle;
  cout << NEWLINE;

  return 0;
}

Output:
31.4159
In fact, the only thing that the preprocessor does when it encounters #define directives is to literally replace any occurrence of their identifier (in the previous example, these were PI and NEWLINE) by the code to which they have been defined (3.14159 and '\n' respectively).
The #define directive is not a C++ statement but a directive for the preprocessor; therefore it assumes the entire line as the directive and does not require a semicolon (;) at its end. If you append a semicolon character (;) at the end, it will also be appended in all occurrences within the body of the program that the preprocessor replaces.
Declared constants (const)
With the const prefix you can declare constants with a specific type in the same way as you would do with a variable:
const int pathwidth = 100;
const char tabulator = '\t';
const int zipcode = 12440;
The type must always be written explicitly. Omitting it (as in const zipcode = 12440;) relied on an old rule under which the compiler assumed int; standard C++ does not allow this.


Operators

Once we know of the existence of variables and constants, we can begin to operate with them. For that purpose, C++ integrates operators. Unlike other languages whose operators are mainly keywords, operators in C++ are mostly made of signs that are not part of the alphabet but are available on all keyboards. This makes C++ code shorter and more international, since it relies less on English words, but requires a little learning effort in the beginning.
You do not have to memorize all the content of this page. Most details are only provided to serve as a later reference in case you need it.
Assignation (=)
The assignation operator assigns a value to a variable.
a = 5;
This statement assigns the integer value 5 to the variable a. The part at the left of the assignation operator (=) is known as the lvalue (left value) and the right one as the rvalue (right value). The lvalue has to be a variable whereas the rvalue can be either a constant, a variable, the result of an operation or any combination of these.
The most important rule of assignation is the right-to-left rule: The assignation operation always takes place from right to left, and never the other way:
a = b;
This statement assigns to variable a (the lvalue) the value contained in variable b (the rvalue). The value that was stored until this moment in a is not considered at all in this operation, and in fact that value is lost. Consider also that we are only assigning the value of b to a at the moment of the assignation. Therefore a later change of b will not affect the new value of a.
For example, let us have a look at the following code - I have included the evolution of the content stored in the variables as comments:
// assignation operator

#include <iostream>
using namespace std;

int main ()
{
  int a, b;   // a:?, b:?
  a = 10;     // a:10, b:?
  b = 4;      // a:10, b:4
  a = b;      // a:4, b:4
  b = 7;      // a:4, b:7

  cout << "a:";
  cout << a;
  cout << " b:";
  cout << b;

  return 0;
}

Output:
a:4 b:7
This code will give us as result that the value contained in a is 4 and the one contained in b is 7. Notice how a was not affected by the final modification of b, even though we declared a = b earlier (that is because of the right-to-left rule).
A property that distinguishes C++ from many other programming languages is that an assignation can be used as the rvalue (or part of an rvalue) of another assignation. For example:
a = 2 + (b = 5);
is equivalent to:
b = 5;
a = 2 + b;
that means: first assign 5 to variable b and then assign to a the value 2 plus the result of the previous assignation of b (i.e. 5), leaving a with a final value of 7.
The following expression is also valid in C++:
a = b = c = 5;
It assigns 5 to all three variables: a, b and c.
Arithmetic operators ( +, -, *, /, % )
The five arithmetical operations supported by the C++ language are:
+ addition
- subtraction
* multiplication
/ division
% modulo
Operations of addition, subtraction, multiplication and division correspond literally to their respective mathematical operators. The only one that you might not be used to seeing is modulo, whose operator is the percentage sign (%). Modulo is the operation that gives the remainder of a division of two values. For example, if we write:
a = 11 % 3;
the variable a will contain the value 2, since 2 is the remainder from dividing 11 by 3.
Compound assignation (+=, -=, *=, /=, %=, >>=, <<=, &=, ^=, |=)
When we want to modify the value of a variable by performing an operation on the value currently stored in that variable we can use compound assignation operators:
expression               is equivalent to
value += increase;       value = value + increase;
a -= 5;                  a = a - 5;
a /= b;                  a = a / b;
price *= units + 1;      price = price * (units + 1);
and the same for all other operators. For example:

// compound assignation

#include <iostream>
using namespace std;

int main ()
{
  int a, b=3;
  a = b;
  a+=2;   // equivalent to a=a+2
  cout << a;
  return 0;
}

Output:
5
Increase and decrease (++, --)
Shortening some expressions even more, the increase operator (++) and the decrease operator (--) increase or reduce by one the value stored in a variable. They are equivalent to +=1 and to -=1, respectively. Thus:
c++;
c+=1;
c=c+1;
are all equivalent in their functionality: the three of them increase the value of c by one.
In the early C compilers, the three previous expressions probably produced different executable code depending on which one was used. Nowadays, this type of code optimization is generally done automatically by the compiler, thus the three expressions should produce exactly the same executable code.
A characteristic of this operator is that it can be used both as a prefix and as a suffix. That means that it can be written either before the variable identifier (++a) or after it (a++). In simple expressions like a++ or ++a both have exactly the same meaning, but in expressions where the result of the increase or decrease operation is itself used as a value, the two forms differ. When the increase operator is used as a prefix (++a), the value is increased before the result of the expression is evaluated, so the outer expression sees the increased value. When it is used as a suffix (a++), the value stored in a is increased after being evaluated, so the outer expression sees the value stored before the increase. Notice the difference:
Example 1                          Example 2
B=3;                               B=3;
A=++B;                             A=B++;
// A contains 4, B contains 4      // A contains 3, B contains 4
In Example 1, B is increased before its value is copied to A, while in Example 2 the value of B is copied to A and then B is increased.
Relational and equality operators ( ==, !=, >, <, >=, <= )
In order to evaluate a comparison between two expressions we can use the relational and equality operators. The result of a relational operation is a Boolean value that can only be true or false. We may want to compare two expressions, for example, to know if they are equal or if one is greater than the other. Here is a list of the relational and equality operators that can be used in C++:
== Equal to
!= Not equal to
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
Here are some examples:
(7 == 5) // evaluates to false.
(5 > 4) // evaluates to true.
(3 != 2) // evaluates to true.
(6 >= 6) // evaluates to true.
(5 < 5) // evaluates to false.
Of course, instead of using only numeric constants, we can use any valid expression, including variables. Suppose that a=2, b=3 and c=6,
(a == 5) // evaluates to false since a is not equal to 5.
(a*b >= c) // evaluates to true since (2*3 >= 6) is true.
(b+4 > a*c) // evaluates to false since (3+4 > 2*6) is false.
((b=2) == a) // evaluates to true.
Be careful! The operator = (one equal sign) is not the same as the operator == (two equal signs). The first one is an assignation operator (it assigns the value at its right to the variable at its left), whereas the second one (==) is the equality operator, which compares whether the expressions on its two sides are equal to each other. Thus, in the last expression ((b=2) == a), we first assigned the value 2 to b and then compared it to a, which also stores the value 2, so the result of the operation is true.
Logical operators ( !, &&, || )
The operator ! is the C++ operator that performs the Boolean operation NOT. It has only one operand, located at its right, and the only thing it does is invert its value, producing false if its operand is true and true if its operand is false. For example:
!(5 == 5) // evaluates to false because the expression at its right (5 == 5) is true.
!(6 <= 4) // evaluates to true because (6 <= 4) would be false.
!true // evaluates to false
!false // evaluates to true.
The logical operators && and || are used when evaluating two expressions to obtain a single relational result. The operator && corresponds to the Boolean logical operation AND. This operation yields true if both its operands are true, and false otherwise. The following panel shows the result of operator && evaluating the expression a && b:
&& OPERATOR
a b a && b
true true true
true false false
false true false
false false false
The operator || corresponds to the Boolean logical operation OR. This operation yields true if either one of its two operands is true, and is false only when both operands are false. Here are the possible results of a || b:
|| OPERATOR
a b a || b
true true true
true false true
false true true
false false false
For example:
( (5 == 5) && (3 > 6) ) // evaluates to false ( true && false ).
( (5 == 5) || (3 > 6) ) // evaluates to true ( true || false ).
Conditional operator ( ? )
The conditional operator evaluates an expression returning a value if that expression is true and a different one if the expression is evaluated as false. Its format is:
condition ? result1 : result2
If condition is true the expression will return result1, if it is not it will return result2.
7==5 ? 4 : 3 // returns 3, since 7 is not equal to 5.
7==5+2 ? 4 : 3 // returns 4, since 7 is equal to 5+2.
5>3 ? a : b // returns the value of a, since 5 is greater than 3.
a>b ? a : b // returns whichever is greater, a or b.
// conditional operator

#include <iostream>
using namespace std;

int main ()
{
  int a,b,c;

  a=2;
  b=7;
  c = (a>b) ? a : b;

  cout << c;

  return 0;
}

Output:
7
In this example a was 2 and b was 7, so the expression being evaluated (a>b) was not true, thus the first value specified after the question mark was discarded in favor of the second value (the one after the colon) which was b, with a value of 7.
Comma operator ( , )
The comma operator (,) is used to separate two or more expressions that are included where only one expression is expected. When the set of expressions has to be evaluated for a value, only the rightmost expression is considered.
For example, the following code:
a = (b=3, b+2);
Would first assign the value 3 to b, and then assign b+2 to variable a. So, at the end, variable a would contain the value 5 while variable b would contain value 3.
Bitwise Operators ( &, |, ^, ~, <<, >> )
Bitwise operators modify variables considering the bit patterns that represent the values they store.
operator asm equivalent description
& AND Bitwise AND
| OR Bitwise Inclusive OR
^ XOR Bitwise Exclusive OR
~ NOT Unary complement (bit inversion)
<< SHL Shift Left
>> SHR Shift Right
Explicit type casting operator
Type casting operators allow you to convert a datum of a given type to another. There are several ways to do this in C++. The simplest one, which has been inherited from the C language, is to precede the expression to be converted by the new type enclosed between parentheses (()):
int i;
float f = 3.14;
i = (int) f;
The previous code converts the float number 3.14 to an integer value (3); the fractional part is lost. Here, the typecasting operator was (int). Another way to do the same thing in C++ is using the functional notation: preceding the expression to be converted by the type and enclosing the expression between parentheses:
i = int ( f );
Both ways of type casting are valid in C++.
sizeof()
This operator accepts one parameter, which can be either a type or a variable itself and returns the size in bytes of that type or object:
a = sizeof (char);
This will assign the value 1 to a because the size of char is always one byte.
The value returned by sizeof is a constant, so it is always determined before program execution.
Other operators
Later in these tutorials, we will see a few more operators, like the ones referring to pointers or the specifics for object-oriented programming. Each one is treated in its respective section.
Precedence of operators
When writing complex expressions with several operators, we may have some doubts about which operation is evaluated first and which later. For example, in this expression:
a = 5 + 7 % 2
we may doubt if it really means:
a = 5 + (7 % 2) // with a result of 6, or
a = (5 + 7) % 2 // with a result of 0
The correct answer is the first of the two expressions, with a result of 6. There is an established order of priority for every operator that can appear in C++, not only the arithmetic ones (whose precedence comes from mathematics). From greatest to lowest priority, the order is as follows:
Level Operator Description Grouping
1 :: scope Left-to-right
2 () [] . -> ++ -- dynamic_cast static_cast reinterpret_cast const_cast typeid postfix Left-to-right
3 ++ -- ~ ! sizeof new delete unary (prefix) Right-to-left
* & indirection and reference (pointers)
+ - unary sign operator
4 (type) type casting Right-to-left
5 .* ->* pointer-to-member Left-to-right
6 * / % multiplicative Left-to-right
7 + - additive Left-to-right
8 << >> shift Left-to-right
9 < > <= >= relational Left-to-right
10 == != equality Left-to-right
11 & bitwise AND Left-to-right
12 ^ bitwise XOR Left-to-right
13 | bitwise OR Left-to-right
14 && logical AND Left-to-right
15 || logical OR Left-to-right
16 ?: conditional Right-to-left
17 = *= /= %= += -= >>= <<= &= ^= |= assignment Right-to-left
18 , comma Left-to-right
Grouping defines the precedence order in which operators are evaluated in the case that there are several operators of the same level in an expression.
These precedence levels can be overridden, or simply made more legible, by using parentheses ( and ) to remove possible ambiguities, as in this example:
a = 5 + 7 % 2;
might be written either as:
a = 5 + (7 % 2);
or
a = (5 + 7) % 2;
depending on the operation that we want to perform.
So if you want to write complicated expressions and you are not completely sure of the precedence levels, always include parentheses. It will also make your code easier to read.

Control Structures
A program is usually not limited to a linear sequence of instructions. During its execution it may branch, repeat code or make decisions. For that purpose, C++ provides control structures that specify what our program has to do, when, and under which circumstances.
With the introduction of control structures we are going to have to introduce a new concept: the compound-statement or block. A block is a group of statements which are separated by semicolons (;) like all C++ statements, but grouped together in a block enclosed in braces: { }:
{ statement1; statement2; statement3; }
Most of the control structures that we will see in this section require a generic statement as part of their syntax. A statement can be either a simple statement (a single instruction ending with a semicolon) or a compound statement (several instructions grouped in a block), like the one just described. If we want the statement to be a simple statement, we do not need to enclose it in braces ({}). But if we want the statement to be a compound statement, it must be enclosed between braces ({}), forming a block.
Conditional structure: if and else
The if keyword is used to execute a statement or block only if a condition is fulfilled. Its form is:
if (condition) statement
Where condition is the expression that is being evaluated. If this condition is true, statement is executed. If it is false, statement is ignored (not executed) and the program continues right after this conditional structure.
For example, the following code fragment prints x is 100 only if the value stored in the x variable is indeed 100:
if (x == 100)
cout << "x is 100";
If we want more than a single statement to be executed in case that the condition is true we can specify a block using braces { }:
if (x == 100)
{
cout << "x is ";
cout << x;
}
We can additionally specify what we want to happen if the condition is not fulfilled by using the keyword else. Its form used in conjunction with if is:
if (condition) statement1 else statement2
For example:
if (x == 100)
cout << "x is 100";
else
cout << "x is not 100";
prints x is 100 on the screen if x has a value of 100; if it does not, and only then, it prints x is not 100.
The if + else structures can be concatenated with the intention of verifying a range of values. The following example shows its use telling if the value currently stored in x is positive, negative or none of them (i.e. zero):
if (x > 0)
cout << "x is positive";
else if (x < 0)
cout << "x is negative";
else
cout << "x is 0";
Remember that in case that we want more than a single statement to be executed, we must group them in a block by enclosing them in braces { }.
Iteration structures (loops)
The purpose of loops is to repeat a statement a certain number of times or while a condition is fulfilled.
The while loop
Its format is: while (expression) statement
and its functionality is simply to repeat statement while the condition set in expression is true.
For example, we are going to make a program to count down using a while-loop:
// custom countdown using while

#include <iostream>
using namespace std;

int main ()
{
  int n;
  cout << "Enter the starting number > ";
  cin >> n;

  while (n>0) {
    cout << n << ", ";
    --n;
  }

  cout << "FIRE!";
  return 0;
}

Output (the user enters 8):
Enter the starting number > 8
8, 7, 6, 5, 4, 3, 2, 1, FIRE!
When the program starts the user is prompted to insert a starting number for the countdown. Then the while loop begins: if the value entered by the user fulfills the condition n>0 (that n is greater than zero), the block that follows the condition will be executed, and repeated while the condition (n>0) remains true.
The whole process of the previous program can be interpreted according to the following script (beginning in main):
1. User assigns a value to n
2. The while condition is checked (n>0). At this point there are two possibilities:
* condition is true: statement is executed (to step 3)
* condition is false: ignore statement and continue after it (to step 5)
3. Execute statement:
cout << n << ", ";
--n;
(prints the value of n on the screen and decreases n by 1)
4. End of block. Return automatically to step 2
5. Continue the program right after the block: print FIRE! and end program.
When creating a while-loop, we must always consider that it has to end at some point. Therefore we must provide within the block some method to force the condition to become false at some point; otherwise the loop will continue looping forever. In this case we have included --n; which decreases the value of the variable that is being evaluated in the condition (n) by one. This will eventually make the condition (n>0) become false after a certain number of loop iterations: to be more specific, when n becomes 0, which is where our while-loop and our countdown end.
Of course this is such a simple action for our computer that the whole countdown is performed instantly without any practical delay between numbers.
The do-while loop
Its format is:
do statement while (condition);
Its functionality is exactly the same as the while loop, except that condition in the do-while loop is evaluated after the execution of statement instead of before, guaranteeing at least one execution of statement even if condition is never fulfilled. For example, the following program echoes any number you enter until you enter 0.






// number echoer

#include <iostream>
using namespace std;

int main ()
{
  unsigned long n;
  do {
    cout << "Enter number (0 to end): ";
    cin >> n;
    cout << "You entered: " << n << "\n";
  } while (n != 0);
  return 0;
}

Output (the user enters 12345, 160277 and 0):
Enter number (0 to end): 12345
You entered: 12345
Enter number (0 to end): 160277
You entered: 160277
Enter number (0 to end): 0
You entered: 0
The do-while loop is usually used when the condition that has to determine the end of the loop is determined within the loop statement itself, like in the previous case, where the user input within the block is what is used to determine if the loop has to end. In fact, if you never enter the value 0 in the previous example, you will be prompted for more numbers forever.
The for loop
Its format is:
for (initialization; condition; increase) statement;
and its main function is to repeat statement while condition remains true, like the while loop. But in addition, the for loop provides specific locations to contain an initialization statement and an increase statement. So this loop is specially designed to perform a repetitive action with a counter which is initialized and increased on each iteration.
It works in the following way:
1. initialization is executed. Generally it is an initial value setting for a counter variable. This is executed only once.
2. condition is checked. If it is true the loop continues, otherwise the loop ends and statement is skipped (not executed).
3. statement is executed. As usual, it can be either a single statement or a block enclosed in braces { }.
4. finally, whatever is specified in the increase field is executed and the loop gets back to step 2.
Here is an example of countdown using a for loop:
// countdown using a for loop
#include <iostream>
using namespace std;
int main ()
{
  for (int n=10; n>0; n--) {
    cout << n << ", ";
  }
  cout << "FIRE!";
  return 0;
}

Output:
10, 9, 8, 7, 6, 5, 4, 3, 2, 1, FIRE!
The initialization and increase fields are optional. They can remain empty, but in all cases the semicolon signs between them must be written. For example we could write: for (;n<10;) if we wanted to specify no initialization and no increase; or for (;n<10;n++) if we wanted to include an increase field but no initialization (maybe because the variable was already initialized before).
Optionally, using the comma operator (,) we can specify more than one expression in any of the fields included in a for loop, like in initialization, for example. The comma operator (,) is an expression separator, it serves to separate more than one expression where only one is generally expected. For example, suppose that we wanted to initialize more than one variable in our loop:
for ( n=0, i=100 ; n!=i ; n++, i-- )
{
// whatever here...
}
This loop will execute 50 times if neither n nor i is modified within the loop: n starts with a value of 0, and i with 100, and the condition is n!=i (that n is not equal to i). Because n is increased by one and i decreased by one on each iteration, the loop's condition will become false after the 50th iteration, when both n and i are equal to 50.

Tuesday, July 3, 2007

Algorithms and Data Structures

Algorithm: a process or set of rules used for calculation or problem-solving, esp. with a computer.
Program: a series of coded instructions to control the operation of a computer or other machine. [-concise OED '91]

Example
Problem: Find the greatest common divisor (GCD) of two integers, m and n.
Euclid's Algorithm:

while m is greater than zero:
If n is greater than m, swap m and n.
Subtract n from m.
n is the GCD

Program (in C):
int gcd(int m, int n)
/* precondition: m>0 and n>0. Let g=gcd(m,n). */
{ while( m > 0 )
{ /* invariant: gcd(m,n)=g */
if( n > m )
{ int t = m; m = n; n = t; } /* swap */
/* m >= n > 0 */
m -= n;
}
return n;
}


© L. Allison

Correctness
Why do we believe that this algorithm, devised thousands of years ago, is correct?

Given m>0 and n>0, let g = gcd(m,n).
(i) If m=n then m=n=gcd(m,n) and the algorithm sets m to zero and returns n, obviously correct.
(ii) Otherwise, suppose m>n. Then m=p×g and n=q×g where p and q are coprime, from the definition of greatest common divisor. We claim that gcd(m-n,n)=g. Now m-n=p×g-q×g=(p-q)×g, so we must show that (p-q) and q are coprime. If not, then p-q=a×c and q=b×c for some integers a,b>=1 and c>1. But then p=q+a×c=b×c+a×c=(a+b)×c and, because q=b×c, p and q would not have been coprime ... contradiction. Hence (p-q) and q are coprime, and gcd(m-n,n)=gcd(m,n).
(iii) If m<n, the algorithm swaps them so that m>n, and that case has been covered.

So the algorithm is correct, provided that it terminates.

Termination
At the start of each iteration of the loop, either n>m or m>=n.
(i) If m>=n, then m is replaced by m-n which is smaller than the previous value of m, and still non-negative.
(ii) If n>m, m and n are exchanged so that m>n, and case (i) then applies within the same iteration.
So at each iteration max(m,n) decreases, except in the final iteration, when m=n: there max(m,n) is unchanged, but m becomes zero and the loop ends.
This cannot go on for ever because m and n are integers (this fact is important), and eventually a lower limit is reached, when m=0 and n=g.

So the algorithm does terminate.
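
One way to make the termination argument executable is to assert a quantity that must shrink on every pass. The sketch below uses the sum m+n as that quantity; this choice is an assumption made here for convenience (the text above argues with max(m,n)), and gcd_checked is a hypothetical name.

```c
#include <assert.h>

/* Same loop as gcd above, with the termination argument
   turned into assertions. gcd_checked is a hypothetical name. */
static int gcd_checked(int m, int n)
{
    assert(m > 0 && n > 0);              /* precondition */
    while (m > 0) {
        int before = m + n;              /* value of the measure on entry */
        if (n > m) { int t = m; m = n; n = t; }
        m -= n;
        assert(m >= 0 && n > 0);         /* m stays non-negative, n positive */
        assert(m + n < before);          /* the sum strictly decreases */
    }
    return n;
}
```

Because the sum is a non-negative integer that drops by at least 1 on each pass, the loop can run at most m+n times for the starting values of m and n.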

Testing
Having proved the algorithm to be correct, one might argue that there is no need to test it. But there might be an error in the proof or maybe the program has been coded wrongly. Good test values would include:

special cases where m or n equals 1, or
m, or n, or both equal small primes 2, 3, 5, ..., or
products of two small primes such as p1×p2 and p3×p2,
some larger values, but ones where you know the answers,
swapped values, (x,y) and (y,x), because gcd(m,n)=gcd(n,m).
The objective in testing is to "exercise" all paths through the code, in different combinations.
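
The test values suggested above could be collected in a small table, for example as sketched below. The particular numbers were picked by hand for this illustration, and struct gcd_case and run_gcd_tests are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Subtraction-based Euclid, as in the program above. */
static int gcd(int m, int n)
{
    assert(m > 0 && n > 0);              /* precondition */
    while (m > 0) {
        if (n > m) { int t = m; m = n; n = t; }
        m -= n;
    }
    return n;
}

struct gcd_case { int m, n, expected; };

/* Hand-picked cases; the expected answers were worked out by hand. */
static const struct gcd_case cases[] = {
    { 1, 7, 1 },        /* m equals 1 */
    { 7, 1, 1 },        /* n equals 1 */
    { 2, 3, 1 },        /* two different small primes */
    { 5, 5, 5 },        /* the same small prime twice */
    { 6, 15, 3 },       /* 2x3 and 5x3 share the prime 3 */
    { 1071, 462, 21 },  /* larger values with a known answer */
};

/* Returns the number of failed checks; every case is also
   tried with its arguments swapped. */
static int run_gcd_tests(void)
{
    int failures = 0;
    size_t count = sizeof cases / sizeof cases[0];
    for (size_t i = 0; i < count; i++) {
        if (gcd(cases[i].m, cases[i].n) != cases[i].expected) failures++;
        if (gcd(cases[i].n, cases[i].m) != cases[i].expected) failures++;
    }
    return failures;
}
```

A return value of 0 from run_gcd_tests() means every case passed in both argument orders.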

Debugging code can be inserted to print the values of m and n at the end of each iteration to confirm that they behave as expected.
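
A minimal sketch of such debugging code follows. The name gcd_traced is hypothetical, and writing to stderr is just one reasonable choice.

```c
#include <assert.h>
#include <stdio.h>

/* Same loop as gcd above, printing m and n at the end of
   each iteration. gcd_traced is a hypothetical name. */
static int gcd_traced(int m, int n)
{
    assert(m > 0 && n > 0);              /* precondition */
    while (m > 0) {
        if (n > m) { int t = m; m = n; n = t; }
        m -= n;
        fprintf(stderr, "m=%d n=%d\n", m, n);   /* debugging output */
    }
    return n;
}
```

For example, gcd_traced(12, 18) prints "m=6 n=12", "m=6 n=6" and "m=0 n=6" on successive lines, then returns 6.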

Complexity
We are interested in how much time and space (computer memory) a computer algorithm uses; i.e. how efficient it is. This is called time- and space- complexity. Typically the complexity is a function of the values of the inputs and we would like to know what function. We can also consider the best-, average-, and worst-cases.

Time
The time to execute one iteration of the loop depends on whether n>m or not, but both cases take constant time: one test, a subtraction and 4 assignments (when a swap is needed) versus one test, a subtraction and one assignment. So the time taken for one iteration of the loop is bounded by a constant. The real question then is, how many iterations take place? The answer depends on m and n.

If m=n, there is just one iteration; this is the best-case.
If n=1, there are m iterations; this is the worst-case (equivalently, if m=1 there are n iterations).
The average-case time-complexity of this algorithm is difficult to analyse.
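
The best and worst cases can be observed directly by counting passes through the loop. In the sketch below, gcd_iterations is a hypothetical name; it returns the iteration count rather than the GCD.

```c
#include <assert.h>

/* Counts how many times the loop body of the subtraction-based
   algorithm runs for the given inputs. */
static int gcd_iterations(int m, int n)
{
    assert(m > 0 && n > 0);              /* precondition */
    int count = 0;
    while (m > 0) {
        if (n > m) { int t = m; m = n; n = t; }
        m -= n;
        count++;
    }
    return count;
}
```

Here gcd_iterations(7, 7) is 1 (the best case), while gcd_iterations(9, 1) and gcd_iterations(1, 9) are both 9 (the worst case for values up to 9).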

Space
The space-complexity of Euclid's algorithm is a constant, just space for three integers: m, n and t. We shall see later that this is `O(1)'.

Exercises
Devise a quicker version of Euclid's algorithm that does not sit in the loop subtracting individual copies of n from m when m>>n.