INTRO To C++

T. Weldon

Intro

Below is a summary of some of the basics of C++, along with a few pointers and tips on UNIX and Make. This serves only as a brief introduction, and you should rely heavily on a C++ programming textbook. One such textbook is suggested on the course outline. Because most students are already familiar with C++, it is not appropriate to require a textbook in the course. Feel free to choose any C++ textbook that you feel comfortable with (from the campus bookstore, Barnes and Noble, Amazon.com, etc.). If you are uncertain, then get the book that is recommended for the course.

UNIX

When compiling and building programs, the UNIX command line is often used rather than a mouse-based interface. The following is a summary of handy commands:

"pwd" - show current directory

"cd" - change directory

"mkdir" - create directory

"ls" - list files in directory

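"ls -l" - long listing; in a permission string such as drwxr-xr-x, the leading d marks a directory and the three rwx (read/write/execute) groups that follow apply to user, group, and world

"chmod 755 filename" - set file protections (755 gives rwx to the user and r-x to group and world)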

"cp xx yy" - copy file xx to file yy

"cp xx/* yy" - copy all files in directory xx to directory yy (if yy exists)

"getfacl filename" - get afs file protections

"rmdir dirname" - delete an (empty) directory

"rm filename" - delete a file

"env" - list environment variables

"man commandname" - to get on-line manual page

"Control-Z" - suspend the foreground job

"bg %" - resume the suspended job in the background

"jobs" - list background and suspended jobs

"!!" - repeat last command (bang-bang)

"!-3" - repeat 3 commands ago

"^aws^des" - replace "aws" in last command with "des" (fixes errors!)

"which commandname" - tells which command will be executed

AFS file protection access control lists http://www.coe.uncc.edu/project_mosaic/notice/ACL.html

UNIX commands, http://www.coe.uncc.edu/project_mosaic/information/unix_commands.html

MOSAIC help, http://www.coe.uncc.edu/project_mosaic/

Make

Make is the primary means for managing the compilation of large C++ projects on UNIX. Make is a program that keeps track of which files have been edited/modified, and rebuilds any executable files that are affected by the changes. To build the program, all you need to do is type "make" in the directory containing the Makefile! When you do this, you should see some output like the following as the program is built (assuming the program is named vecdemo, and the class file is Vector.cc):

        % make
        CC -g  -DMAIN -DSVR4 -c  Vector.cc -o Vector.o 
        CC -g  -DMAIN -DSVR4 -c  vecdemo.cc -o vecdemo.o 
        CC -g  -DMAIN -DSVR4  vecdemo.o Vector.o -o vecdemo  -lm

(Without the Makefile, you would have to type these commands by hand each time you edit your program.)

The operation of the command "make" is governed by a file called "Makefile" in the current directory. The details of the makefile's structure can be pretty complicated, so I have provided a simplified Makefile for your use. The Makefile supplied in the projects includes the line:

PROGRAM=f1

This line establishes the name of the program. So, all you have to do is modify that line (unless you need to add new libraries). The file presumes that your main source-code file will be named f1.cc, and that you will be using the same C++ class filename that I provided (Myclass.cc, or Iclass.cc). If all goes well, this is the only modification that you will need.

As far as libraries (and header include files) go, type "man log" and notice the sample compile line and header include statement in the manual page. Just modify the compile line to include the needed "-lwhatever" to add library "whatever." In the case of log, "-lm" is indicated. (Also note the format of the include statement, "#include <math.h>".) If you wanted to add library "whatever," the command line in the Makefile would become:

$(CCC) $(PROGRAM).o Iclass.o -o $(PROGRAM) -lm -lwhatever

Next, a brief note on the meaning of the lines in the Makefile. The two lines:

       Vector.o:Vector.cc Vector.h
                $(CCC) -c  Vector.cc -o Vector.o

say that if either Vector.cc or Vector.h has a more recent modification date/time than Vector.o (or if Vector.o does not exist yet), make executes the second line. In the second line, $(CCC) is replaced by the earlier definition of CCC. (The command line must begin with a tab character, not spaces.) The basic idea of these statements is to automatically update the appropriate files when the source files or headers are edited/changed.

You may wish to try out the GNU compiler (g++), too. The Makefile lines for it are commented out using pound signs (#) at the beginning of the lines.

Finally, if you type "make clean", the ".o" files and the executable are deleted. This is occasionally needed to "start over" in complex builds.

C++ Summary

First, C is (very nearly) a subset of C++: almost any function that you can use in C can be used in C++. This means C programmers can quickly adapt to C++. It is even better if you don't have any C background, since you will not be tempted to do things the "wrong way." In particular, C programmers should be careful to declare "prototypes" for their functions (C++ requires a function to be declared before it is called) and should resist all temptation to use "malloc" (instead use "new").

The main features of C++ are that it introduces the notion of "classes" and permits "Object-oriented Coding." Classes allow the programmer to create new types of data structures, AND the class member functions are permitted to operate on the class data structures. Further, "firewalls" can be built around the data to prevent inadvertent data destruction. In general, it is good style to only access data through these class functions - NEVER directly access the data. But before we describe classes, we describe a few commands in C++.

Some C++ Commands

Let's start with the world's simplest C++ program, hello.cc:

	#include <iostream.h>

	int main ()				// this is the main executable
	{ 
		cout << "hello\n"; 
	}

Above, "int main() { ... }" defines the main routine. This is what will run when you compile hello.cc and type "hello" at the command line to execute the program. The "#include <iostream.h>" statement includes the declarations of the input/output functions from the system's C++ library; in particular, it provides the definition of "cout <<." If file IO were used, "#include <fstream.h>" would also appear at the top of the file. Other commonly included files are "#include <stdlib.h>," "#include <new.h>," "#include <math.h>," and "#include <string.h>."

"cout <<" is C++'s handy method of printing to the terminal, and "\n" is the newline character (it ends the current line). Just as in C, curly braces mark the beginning and end of functions and other blocks of code, and a semicolon ends each statement.

Finally, "//" is used to start comments in C++. (C programmers can still use /* */, but "//" is much easier.)

OK, let's make it a bit more complicated:

	#include <iostream.h>
	#include <string.h>
 
	int main ()				// this is the main executable
	{ 
		int x=2;
		float y=3;
		char s[20];

		strcpy(s,"one");
		strcat(s," two");
		
		cout << "hello x=" << x <<" x=" << (float) x << " y=" << y << "\n";
		cout <<"hello2:" << s <<"\n";
	}

An integer and a floating-point variable are created and initialized. A character array of length 20 is also created, loaded with "one", and then concatenated with " two". (Note the added include file string.h; some systems call this strings.h.) The "(float)" in the printout is used to cast the integer variable to a floating-point value. Casts like this are sometimes useful for types that the printing statements would otherwise show in an unexpected form (for example, a char variable is printed as a character rather than as a number). After building and running this, the output will be:

hello x=2 x=2 y=3
hello2:one two

Finally, the usual arithmetic operators +, -, *, and / are available (plus % for the remainder of integer division), and = is used for assignment.

Some pointers on pointers and arrays

Pointers are not unique to C++, but are so frequently used that they are mentioned here. In the computer, the data is stored in memory locations. Each memory location has a unique address. Thus, there is a distinction between the address of the data (or location of the data) and the data itself (or contents of the location).

In C and C++, pointers are used to indicate the address of data - or to "point to" data. To define a pointer to an integer data item, one would write:

	int * xp;

As written above, xp is a pointer to some integer data. Declaring xp creates space only for the pointer itself (enough to hold an address), not for an integer. Before xp is used, it must be loaded with the address of an integer that actually exists, for example one allocated with "new" (described below). Once that is done, printing xp would print the address of the data rather than the data itself.

Given a pointer xp as defined above, it is necessary to provide some mechanism for accessing the data -- since we usually aren't interested in the address/location of data. To access the data, the "*" is used again. For example,

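	int y;
	int * xp;

	xp = new int;	// make xp point at newly allocated space for one integer
	* xp = 4;
	y = * xp;
	delete xp;	// free that space when it is no longer needed

Here, once xp has been set to point at space for an integer ("new" is described below), the command "* xp = 4;" stores the value 4 in the location pointed to by xp, and "y = * xp;" loads the variable y with the contents of that location. The use of "*" to access the data is called "dereferencing the pointer." (Dereferencing a pointer that has not been set to point at valid storage is a serious error that can crash the program.)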

Accessing a single integer through a pointer is not very useful by itself. The real power of pointers is in accessing arrays of data. Suppose a pointer x points to the first element of an array. Then,

	*(x + 11)

would access the 12th element of the array (offset 0 is the first element). (Note: *x + 11 would instead add 11 to the contents of the location pointed to by x.)

Finally, arrays can be dimensioned using the "[]" notation.

	int x[100];

The above creates an array of 100 integers. Since the name of an array can be used as a pointer to its first element, the 8th item could be accessed using pointer notation as "*(x+7)". Note that array indices start with zero. The 8th element can also be accessed as "x[7]", which means exactly the same thing as "*(x+7)".

Classes

Again, the main feature of C++ is its class definitions. The overall format of the class definition is:

class classname			//the class name
{
	private:		//what follows is not accessible to outside world
		int x;		// generally your data goes here
		float y;
	
	public:			// what follows is accessible to the outside world
				// generally the functions that access and 
				//manipulate the data follow this point
};

In the above example, data is placed in a private area denoted by the keyword private. This data can only be accessed directly by "member functions." Member functions are functions that are declared as part of the class. These generally would appear after the "public:" keyword above. In addition, the class definition above would typically contain only "prototypes" of the functions, not the source code implementing the functions. (An exception is made for extremely short/trivial functions that can be done "in-line.") As a point of style, you are to put the class definitions in header files (e.g., Classname.h) with the source code in another file (e.g., Classname.cc). This makes it simple to use the class in other programs just by including the line #include "Classname.h". (Note: double quotes are used to include local files, and angle brackets are used to include global system files such as #include <new.h>.)

A larger example class

Let's now consider a more complex class, one designed for doing vector math. The class is called Vector, and its class definition is in the header file Vector.h.

	// Vector.h 
	#ifndef _Vector_h               // this prevents multiple inclusion 
                                        // of the header
	#define _Vector_h 1
	#include <stdlib.h>		// standard library
	#include <iostream.h>	// basic I/O library
	#include <new.h>		// library for new/delete
	#include <fstream.h>	// library for file I/O
	#include <string.h>

	class Vector			//our new class 
	{

	private:			// (private) data in our class:
	  char *Message;		// a character string
	  int x;			// an integer variable
	  int y;			// another

	public:				// (public) functions in our class:
	  Vector();			// default constructor
	  Vector(char *, int, int);	// "full" constructor
	  ~Vector();			// destructor
	  void print()			// function to print out the contents
					//      implemented "in-line" below:
		{cout << Message << " x=" << x << " y=" << y <<"\n";}

	  Vector &operator=(Vector&);	//"overloaded" equal
	  Vector inner(Vector&);	//inner product, x1*x2 + y1*y2
	};
	#endif

The main features to point out in this example are:

All source code to implement the functions is in Vector.cc with the exception of the in-line function print()

All data is private: int x and y, and character pointer Message

Only room for the pointer Message is allocated inside each Vector object; the constructors and destructor must manage the free-store memory that Message points to (allocating it and releasing it)

"Vector()" is the default constructor function. It is used to create and initialize a class object when no arguments are given to the constructor (an object is an instantiation of a class). This function would be called if a statement such as "Vector z;" appeared in a program. The source code in file Vector.cc for "Vector()" might be:

	Vector::Vector()				//default constructor
	{
	Message = new  char[strlen("unnamed")+1];
	strcpy(Message,"unnamed");
	x =1;
	y=1;
	}

"Vector::" is used to prefix and denote member functions in the source code

"new" allocates memory from the "free store." This differs from allocating memory beforehand as in "char Message[20]": the amount of storage can be chosen while the program runs. Memory is freed with "delete" (see below in the destructor)

Constructors do not specify a return value - they have no leading variable type (int, float, Vector) or leading "void"

"Vector(char *, int, int)" is a constructor that takes arguments to initialize all three of the data items in Vector. It would be invoked in a program by a statement "Vector z("vector z", 1, 2);" that would load the string "vector z" into the area pointed to by Message, and load 1 into x and 2 into y. The source code for "Vector(char *, int, int)" might be:

	Vector::Vector(char *m, int xx, int yy)		//full constructor
	{
	Message = new  char[strlen(m)+1];
	strcpy(Message,m);	
	x=xx;
	y=yy;
	}

In Vector::Vector(...) above, the C++ operator "new" is used to allocate memory for the string; one extra char is allocated for the terminating '\0' that strcpy copies. See below for more information on "new"

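C++ determines which function is called based on:

The name of the function

The number of arguments

The types of the arguments

The order of the types (e.g., (float, int, int, float) differs from (int, float, float, int))

The return type is not used to tell overloaded functions apart.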

~Vector() is the destructor. It is used to clean up when a class object is no longer needed: it is called automatically when an object goes out of scope, when a temporary object is destroyed, or when an object allocated with "new" is deleted. It is critical to release any memory that was allocated with "new". The source code in file Vector.cc for ~Vector() might be:

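	Vector::~Vector()				//destructor
	{
	delete [] Message;	//free up array memory
	}

Note: memory obtained with the array form of new (such as "new char[...]") must be released with the array form "delete []". This applies to arrays of built-in types such as char, too; using plain "delete" on such an array is undefined behavior, even if it sometimes appears to work.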

The constructors and destructor do not have associated return types; the first function with a return type is "void print()", whose return type is void

The function print() is an "in-line" function, since its source code is included directly after the function prototype. Note that no semicolon follows "void print()"; the braces containing the function body take its place. IN-LINE FUNCTIONS SHOULD BE SMALL; 1-LINE!

"Vector &operator=(Vector&)" is an overloaded equal (assignment) operator. It copies the data of a second Vector object (indicated by the argument in parentheses) into the Vector object that invokes it (the invoking object is implied, since all the functions in the class operate on class objects), and returns a reference to the invoking object (as indicated by the leading "Vector &")

The function "Vector &operator=(Vector&)" could be implemented as:

	Vector & Vector::operator=(Vector& v)		//Overloaded equal
	{
	x=v.x;
	y=v.y;
	return *this;
	}

This is "overloaded" since new functionality is added to the "=" operator - over and above the definitions for "built-in" variable types such as float, int, etc.

Overloading allows the programmer to define what operators such as =, +, *, and - mean for newly defined classes.

The constructor previously discussed is also overloaded, since one of the two constructors is called depending on the number and types of the arguments. In general, calls to overloaded functions are resolved by the number and types of the arguments involved.

The above is a very simple overloaded equal: it copies only x and y and does not copy the Message string at all. A complete version would also have to copy Message, which can require freeing the old string and allocating new memory when the two strings have different sizes

If the function was invoked by a statement such as "a=b;", then inside the function "b" would correspond to the argument "Vector& v", and "a" would be the invoking object, so the unqualified names "x" and "y" refer to the data of "a"

The unqualified "x" and "y" can also be written "(*this).x" and "(*this).y" (or "this->x" and "this->y"); the parentheses are required, because "*this.x" would be read as "*(this.x)". See below for more on "*this"

The suffix "&" following the variable type declaration is used to denote a "reference." By default, C++ (like C) passes copies of arguments to functions rather than the variables themselves. When a reference is declared, the actual variable is passed rather than a copy. Thus, any change a function makes to a parameter passed as a reference is made to the caller's original variable, and it remains after the function returns.

As a point of confusion, there is another use of "&"; "& variablename" uses the "&" to indicate "address of variablename" -- but this use is outside of a variable-type declaration. Steer clear of this use.

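Since the function is defined to return a reference to a Vector object, the routine ends with the statement "return *this;", which returns the invoking object itself. (This is also what allows chained assignments such as "a = b = c;".)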

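AS A RULE: do not return references in functions. Overloaded equal is the only exception. The danger of returning references is that local variables created inside a function are destroyed when the function returns, so a reference to one of them would refer to an object that no longer exists. (Overloaded equal is safe because it returns *this, the invoking object, which still exists after the function returns.)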

One could also define an overloaded equal "Vector& Vector::operator=(float z)." How?

The function "Vector inner(Vector&)" computes the vector inner product and returns a Vector whose x component holds the result. It would be invoked by a statement such as "z=b.inner(c);" which would take the inner product of b and c. The function "Vector inner(Vector&)" could be implemented as:

	Vector Vector::inner(Vector& v)		//inner product
	{
	Vector a;
	delete [] a.Message;	// free the default "unnamed" string
	a.Message =  new  char[strlen( (*this).Message)+
		strlen(" * ") + strlen(v.Message)+1];
	strcpy(a.Message,(*this).Message);
	strcat(a.Message," * ");	
	strcat(a.Message,v.Message);
	a.x= (*this).x * v.x + (*this).y * v.y;
	a.y = 0;
	return a;
	}

Above, a new vector "a" is created for storing results

Notice the freeing of old memory and allocating new space for the combined message string

Suppose the subroutine was invoked by the command "z=b.inner(c);" At the computation of the inner product, (*this).x and (*this).y would then be the x and y components of "b," and v.x and v.y would be the x and y components of "c."

The result is put in the x component of the internally created vector "a" (its y component is set to 0)

The internally created vector "a" containing the results is returned, since the function is defined to return a vector

this and *this

A somewhat mysterious element of C++ is "this." Recall that the method of invoking functions is generally something like:

	a = b.myfunction(c,4,3,...);

In the above, function "myfunction" is invoked by class object "b" and passes arguments "c,4,3,..." to the function. Once inside the function, the arguments in the parentheses are identified by the argument variables in the function definition, such as "Myclass::myfunction(Myclass & x, int y, float z)." However, the invoking object "b" is not an argument, so some mechanism is required to get access to the invoking object. This mechanism is the pointer "this", which points to the invoking object. Thus, the invoking object itself is "*this", using "*" to dereference the pointer.

It is not always necessary to use "this" to access the invoking object, although it can help prevent confusion. The foregoing example class "Vector" contained data "int x", "int y" and "char *Message". This data, for the invoking object, could be accessed by (*this).x, (*this).y, and (*this).Message (or, equivalently, this->x, this->y, and this->Message). However, within a member function, x, y, and Message alone (without the *this) are understood by default to refer to the corresponding data items of the invoking object.

new and delete

The "new" and "delete" operators in C++ are used to allocate and deallocate memory for data. There are two ways to allocate memory: 1) allocate the memory in some fixed quantity at the time the program is compiled, or 2) allocate memory dynamically as the program is running. In the first case, the size of the memory area is fixed and must be determined by the programmer beforehand. One example would be a fixed-size array declaration such as:

	int z[1000];

The declaration above creates a fixed-size array with size 1000. A second method is to allocate memory from the computer while the program is running. This is useful for handling unknown data sizes, such as when reading in a file. In this case, the "new" operator is used to allocate memory from the "free store". For example:

	int * xp;
	xp =  new int[s];

where "s" is a variable containing the size of the array to be created. The above creates an array of size s and sets the pointer xp to point at the first element in the array. Thus, "*(xp+4)=55;" (or, equivalently, "xp[4]=55;") would set the contents of the 5th element of the array (index 4) to 55. When the array is no longer needed, its memory should be released with "delete [] xp;".

file IO

Use the functions provided in the projects to access files.