Tuesday, August 31, 2010

Declare variables close to their use

It's good programming practice to declare a variable as close as possible to where it is used.
E.g. in C++, this is better code (n being the loop bound):
for (int i = 0; i < n; ++i) { ... }

rather than
int i;
...
...
for (i = 0; i < n; ++i) { ... }
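A small sketch of why this matters, assuming a hypothetical helper sum_first (not from the original post): because i is declared in the for statement, it only exists inside the loop and cannot be misused afterwards.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the first n elements of v
// (or all of them if v has fewer than n elements).
int sum_first(const std::vector<int>& v, std::size_t n)
{
    int total = 0;
    // i is declared where it is used; its scope ends with the loop,
    // so it cannot be reused by accident further down.
    for (std::size_t i = 0; i < n && i < v.size(); ++i)
        total += v[i];
    // return i;   // would not compile: i no longer exists here
    return total;
}
```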

How to bounds-check arrays in C / C++

Bounds checking in C/C++ is a headache. Consider this classic implementation of strcpy():
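char *strcpy(char *dest, const char *src)   // the classic library implementation
{
   char *save = dest;
   while ((*dest++ = *src++))   // copy characters up to and including the '\0'
      ;
   return save;
}

// in main():
const char *src = "hello to c programming language";
char dest[12];

strcpy(dest, src); // calling the function: this overflows dest!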

Nothing here checks the size of dest or src. The call itself compiles without complaint, but dest is only 12 bytes long while src needs 32 bytes (31 characters plus the terminating '\0'), so strcpy() writes past the end of dest.

If the programmer is lucky, the program crashes immediately with "Segmentation fault" (and perhaps dumps core). In the worse case nothing visible happens at all: the overflow silently overwrites neighbouring memory, and the effect may only show up much later, far from the real cause. Formally, the behavior is undefined.

What's the solution?
We can't change this library function to check bounds, for example by passing the sizes of src and dest along with the pointers, because millions of existing programs use it as it is, and changing its signature would break them. So it is the programmer's responsibility to make sure enough space has been provided.
Note: the built-in [] operator does no bounds checking either.
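The usual fix is to pass the destination size along with the pointers and never write more than that. A minimal sketch, assuming a hypothetical helper copy_bounded (it is not a standard library function); in C++, std::string avoids the problem altogether because it grows as needed.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the standard library): copies src into
// dest without ever writing more than dest_size bytes. The result is always
// '\0'-terminated when dest_size > 0. Returns true only if all of src fit.
bool copy_bounded(char *dest, std::size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return false;                      // no room even for the '\0'
    std::size_t len = std::strlen(src);
    std::size_t n = (len < dest_size - 1) ? len : dest_size - 1;
    std::memcpy(dest, src, n);             // copy at most dest_size - 1 chars
    dest[n] = '\0';
    return n == len;                       // false means src was truncated
}
```

With the strings from the example, copy_bounded(dest, sizeof dest, src) stores the first 11 characters plus '\0' and returns false instead of overflowing dest.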

Vectors
A std::vector does bounds checking if you use the at() member function, for example:
std::vector<int> v(5);
v.at(3) = 10;
v.at(5) = 20; // throws an exception, std::out_of_range
However, operator[] does no bounds checking, and accessing a non-existent element through it is undefined behavior.
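A runnable sketch of the at() behavior described above, wrapped in a hypothetical helper try_set that catches std::out_of_range instead of letting it propagate:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: stores value at index i using at(), and reports
// whether the index was valid instead of letting the exception escape.
bool try_set(std::vector<int>& v, std::size_t i, int value)
{
    try {
        v.at(i) = value;             // at() checks i against v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;                // i >= v.size(): nothing was written
    }
}
```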

File I/O in C++, part 1

Introduction
This tutorial starts with the very basics of file I/O (input/output) in C++. After that, I will look into more advanced aspects, showing you some tricks and describing useful functions.
You need a good understanding of C++; otherwise this tutorial will be hard to follow!
Your Very First Program
I will first write the code, and after that, I will explain it line by line.
The first program creates a file and puts some text into it.
#include <fstream>
using namespace std;

int main()
{
    ofstream SaveFile("cpp-home.txt");
    SaveFile << "Hello World, from www.cpp-home.com!";
    SaveFile.close();
    return 0;
}
Only that? Yes! This program will create the file cpp-home.txt in the directory from where you are executing it, and will put “Hello World, from www.cpp-home.com!” into it.
Here is what every line means:
#include <fstream> - You need to include this header in order to use C++'s file I/O classes.
It declares several classes, including ifstream, ofstream and fstream. ifstream is derived from istream, ofstream from ostream, and fstream from iostream (which itself derives from both istream and ostream).
ofstream SaveFile("cpp-home.txt");
1) ofstream means “output file stream”. It creates a handle for a stream to write to a file.
2) SaveFile - that's the name of the handle. You can pick whatever you want!
3) ("cpp-home.txt"); - opens the file cpp-home.txt in the directory from which you execute the program. If such a file does not exist, it will be created for you, so you don't need to worry about that! If it does exist, its previous contents are discarded.
Now, let's look a bit deeper. First, I'd like to mention that ofstream is a class. So, ofstream SaveFile("cpp-home.txt"); creates an object of this class. What we pass in the parentheses is actually passed to the constructor: the name of the file. So, to summarize: we create an object of class ofstream, and we pass the name of the file we want to create as an argument to the class's constructor. There are other things we can pass, too, but I will look into that later.
SaveFile << "Hello World, from www.cpp-home.com!"; - Does "<<" look familiar? Yes, you've seen it in cout <<. It is the stream insertion operator, which the stream classes overload. What this line does is put the text above into the file. As mentioned before, SaveFile is a handle to the opened file stream. So, we write the handle name, << and after it the text in inverted commas. If we want to write variables instead of text in inverted commas, we just use them as we would with cout <<, this way:
SaveFile << variablename;
That’s it!
SaveFile.close(); - As we have opened the stream, we should close it when we have finished using it. SaveFile is an object of class ofstream, and this class has a member function that closes the stream: close(). So, we just write the name of the handle, a dot and close() to close the file stream. (If you forget, the ofstream destructor closes the file automatically when SaveFile goes out of scope.)
Notice: Once you have closed the file, you can’t access it anymore, until you open it again.
That's the simplest program that writes to a file. It's really easy! But as you will see later in this tutorial, there is more to learn!
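As a slightly bigger sketch (the helper save_record and its parameters are hypothetical), the same << syntax writes values of any type that cout can print, and checking the stream tells you whether opening and writing worked:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes a few values of different types to a file,
// formatted exactly as cout << would print them. Returns false if the file
// could not be opened or a write failed.
bool save_record(const std::string& filename,
                 const std::string& name, int age, double height)
{
    std::ofstream SaveFile(filename);   // creates the file, or empties it
    if (!SaveFile)
        return false;
    SaveFile << name << ' ' << age << ' ' << height << '\n';
    // No explicit close() needed: the destructor closes the file when
    // SaveFile goes out of scope at the end of this function.
    return static_cast<bool>(SaveFile);
}
```

For example, save_record("record.txt", "Ann", 30, 1.5) leaves the line Ann 30 1.5 in record.txt.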
Reading A File
You saw how to write to a file. Now that we have cpp-home.txt, we will read it and display it on the screen.
First, I'd like to mention that there are several ways to read a file. I will tell you about all of them (all I know) later. For now, I will show you the best way (in my opinion).
As usual, I will first write the code, and after that, I will comment on it in detail.
#include <fstream>
#include <iostream>
using namespace std;

int main() //the program starts here
{
    ifstream OpenFile("cpp-home.txt");
    char ch;
    while(!OpenFile.eof())
    {
        OpenFile.get(ch);
        cout << ch;
    }
    OpenFile.close();
    return 0;
}

You should already know what #include does. So, let me explain the rest.
ifstream OpenFile("cpp-home.txt") - I suppose this already looks familiar! ifstream means “input file stream”. In the previous program it was ofstream, which means “output file stream”. The previous program wrote to a file, that's why it was “output”; this program reads from a file, that's why it is “input”. The rest of the line should be familiar to you: OpenFile is the object of class ifstream that will handle the input file stream, and in the inverted commas is the name of the file to open.
Notice that there is no check whether the file exists! I will show you how to check that later!
char ch; - Declares a variable of type char. Just to remind you: such a variable holds exactly one character.
while(!OpenFile.eof()) - The function eof() returns true (nonzero) once a read has tried to go past the end of the file. So we make a while loop that keeps going until that happens; this way we get through the whole file.
OpenFile.get(ch); - OpenFile is an object of class ifstream. This class provides a member function called get(), so we can call it on our object. get() is overloaded: the version used here, get(char&), takes one parameter, the variable in which to store the character it reads. So after calling OpenFile.get(ch), one character has been read from the stream OpenFile and put into ch. (This version returns the stream itself; the version with no parameters, get(), returns the character instead.)
Notice: If you call this function a second time, it will read the next character, not the same one! You will learn why this happens later.
That's why we loop until we reach the end of the file: every time we go through the loop, we read one character and put it into ch.
cout << ch; - Display ch, which has the read character.
OpenFile.close(); - As we have opened the file stream, we need to close it. Use the close() function, just as in the previous program!
Notice: Once you have closed the file, you can’t access it anymore, until you open it again.
That’s all! I hope you understood my comments! When you compile and run this program, it should output:
“Hello World, from www.cpp-home.com!”
Managing I/O streams
In this chapter, I will describe some useful functions. I will also show you how to open a file for reading and writing at the same time, other ways to open a file, and how to check whether opening was successful. So, read on!
So far, I have shown you just one way to open a file, either for reading or for writing. But it can be opened another way, too! So far, you should be aware of this method:
ifstream OpenFile("cpp-home.txt");
Well, this is not the only way! As mentioned before, the above code creates an object of class ifstream and passes the name of the file to be opened to its constructor. In fact, there are several overloaded constructors, which can take more than one parameter. There is also a member function, open(), that does the same job. Here is the above code again, using the open() function:
ifstream OpenFile;
OpenFile.open("cpp-home.txt");

What is the difference, you ask? In effect, there is none: the constructor opens the file just as open() would. open() is useful when you want to create the file handle first and specify the file name later, or when you open a file, close it, and then use the same file handle to open another file.
Consider the following code example:
#include <fstream>
#include <iostream>
using namespace std;

void read(ifstream &T) //pass the file stream to the function
{
    //the method to read a file that I showed you before
    char ch;
    while(!T.eof())
    {
        T.get(ch);
        cout << ch;
    }
    cout << endl << "--------" << endl;
}


int main()
{
    ifstream T("file1.txt");
    read(T);
    T.close();
    T.clear(); //reset the end-of-file flag (compilers before C++11 keep it across open())
    T.open("file2.txt");
    read(T);
    T.close();
    return 0;
}

So, as long as file1.txt and file2.txt exist and contain some text, you will see the contents of both!
Now, it’s time to show you that the file name is not the only parameter that you can pass to the open() function or the constructor (it’s the same). Here is a prototype:
ifstream OpenFile(const char *filename, ios::openmode open_mode);
You should know that filename is the name of the file (a string). What's new here is the open mode: the value of open_mode defines how the file is opened. Here is a table of the open modes:
Name            Description
ios::in         Open the file for reading.
ios::out        Open the file for writing.
ios::app        All the data you write is put at the end of the file. Implies ios::out.
ios::ate        Moves to the end of the file right after opening. Unlike ios::app, it does not imply ios::out, and combined only with ios::out it still empties the file.
ios::trunc      Deletes all previous content in the file (empties the file).
ios::nocreate   If the file does not exist, opening it fails. (Pre-standard only; not available in standard C++.)
ios::noreplace  If the file exists, opening it fails. (Pre-standard; standard C++ only gained an ios::noreplace in C++23.)
ios::binary     Opens the file in binary mode.
In fact, all these values are constants of the bitmask type ios::openmode. But to make your life easier, you can simply use them as you see them in the table.
Here is an example on how to use the open modes:
#include <fstream>
using namespace std;

int main()
{
    ofstream SaveFile("file1.txt", ios::app);
    SaveFile << "That's new!\n";
    SaveFile.close();
    return 0;
}

As you see in the table, using ios::app will write at the end of the file. If I didn't use it, the file would be overwritten, but with it, I just add text to it. (ios::ate would not do the job here: an ofstream opened with just ios::ate still empties the file first.) So, if file1.txt has this text:
Hi! This is test from www.cpp-home.com!
Running the above code will add “That's new!” to it, so it will look this way:
Hi! This is test from www.cpp-home.com!That's new!
If you want to set more than one open mode, just combine them with the bitwise OR operator |, this way:
ios::ate | ios::binary
I hope you now understand what open modes are!
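A small sketch of the difference between the default mode and ios::app, assuming a hypothetical helper write_text; it also shows building a mode out of several flags with |:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes text to a file, either replacing the old
// contents (the default for ofstream) or appending with ios::app.
bool write_text(const std::string& filename, const std::string& text, bool append)
{
    std::ios::openmode mode = std::ios::out;
    if (append)
        mode |= std::ios::app;        // every write goes to the end of the file
    std::ofstream SaveFile(filename, mode);
    SaveFile << text;
    return static_cast<bool>(SaveFile);  // false if opening or writing failed
}
```

Calling write_text(name, "Hi!", false) and then write_text(name, " More", true) leaves "Hi! More" in the file; a further call with append set to false replaces everything.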
Now, it's time to show you something really useful! You might not know that you can create a single file stream handle that can both read and write the same file. Here is how it works:
fstream File("cpp-home.txt",ios::in | ios::out);
In fact, that is only the declaration. I will show you a code example just a few lines below, but first I want to mention some things you should know.
The code line above creates a file stream handle named “File”, an object of class fstream. With ios::in | ios::out as the open mode, you can read from the file and write to it without creating a second file handle. Note that this mode does not create the file: it must already exist, or opening fails. Of course, if you only need to read or only need to write, you could pass just ios::in or just ios::out, but then why not simply use ifstream or ofstream?
Here is the code example:
#include <fstream>
#include <iostream>
using namespace std;

int main()
{
    fstream File("test.txt", ios::in | ios::out);
    File << "Hi!"; //put "Hi!" in the file
    static char str[10]; //when using static, the array is automatically
                         //initialized, and every element set to '\0'
    File.seekg(0, ios::beg); //get back to the beginning of the file
                             //this function is explained a bit later
    File >> str;
    cout << str << endl;
    File.close();
    return 0;
}

Okay, there are some new things here, so I will explain line by line:
fstream File("test.txt", ios::in | ios::out); - This line creates an object of class fstream. When it runs, the program opens the file test.txt in read/write mode. This means that you can both read from the file and put data into it.
File << "Hi!"; - I bet you know what this is!
static char str[10]; - This makes a char array with 10 elements. The static keyword may be unfamiliar to you; if so, don't worry about it. Here it just means the array is zero-initialized when it is created.
File.seekg(0, ios::beg); - Okay, I want you to understand what this really does, so I will start with something a bit off-topic, but important.
Remember this?
while(!OpenFile.eof())
{
OpenFile.get(ch);
cout << ch;
}

Did you ever wonder what really happens there? Whether you did or not, I will explain. This is a while loop that runs until you reach the end of the file. But how does the loop know whether the end of the file has been reached? Well, when you read a file, there is something like an inside-pointer that shows how far you have got with the reading (and writing, too). It is like the cursor in Notepad. Every time you call OpenFile.get(ch), it stores the current character in ch and moves the inside-pointer one character forward, so that the next call returns the next character. This repeats until you reach the end of the file.
So, let's get back to the code line. The function seekg() moves the inside-pointer (for reading) to a place you specify. It is overloaded:
With one argument, it takes an absolute position counted from the beginning of the file. File.seekg(0); goes back to the start, and File.seekg(40); jumps to offset 40 (the 41st character). A negative value such as File.seekg(-5); is not a valid position, so that seek fails.
With two arguments, it takes an offset relative to a starting point, which can be:
ios::beg - the beginning of the file
ios::cur - the current position
ios::end - the end of the file
For example, if you want to go 5 characters back from the current position, write:
File.seekg(-5, ios::cur);
And to go to 5 characters before the end of the file:
File.seekg(-5, ios::end);
After this call, you can read the last 5 characters of the file, because:
1) You go to the end (ios::end)
2) You go 5 characters back from there (-5)
If the file ends with a newline, that newline is one of those 5 characters, so you will see only 4 visible ones.
(You may also see File.seekg(ios::beg); written with a single argument. That only works because ios::beg happens to have the value 0 on common implementations; File.seekg(0, ios::beg) says what is meant.)
You may now be wondering why I used this function. Well, after I put “Hi!” in the file, the inside-pointer was left after it, at the end of the file. As I want to read the file, and there is nothing to read after the end, I have to put the inside-pointer back at the beginning. That is exactly what this function does.
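A minimal sketch of seekg() with an offset relative to ios::end, assuming a hypothetical helper last_chars. It opens the file with ios::in | ios::out | ios::trunc, so unlike plain ios::in | ios::out the file is created if it does not exist (and any old contents are discarded):

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes text to a file, then uses seekg() with an
// offset relative to ios::end to read back the last n characters.
std::string last_chars(const std::string& filename, const std::string& text, int n)
{
    // in | out | trunc: read and write, creating the file if it does not
    // exist and discarding any old contents.
    std::fstream File(filename, std::ios::in | std::ios::out | std::ios::trunc);
    File << text;                    // the inside-pointer is now at the end
    File.seekg(-n, std::ios::end);   // move n characters back from the end
    std::string tail;
    File >> tail;                    // reads up to the next whitespace
    return tail;
}
```

For instance, last_chars("seek.txt", "Hello World", 5) returns "World".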
File >> str; - That's new, too! Well, I believe this line reminds you of cin >>. In fact, it works the same way: it reads one word from the file and puts it into the specified array.
For example, if the file has this text:
Hi! Do you know me?
Using File >> str will put just “Hi!” into the str array. As you may have noticed, it reads until it meets whitespace.
And as what I put in the file was just “Hi!”, I don't need a while loop, which would take more code. That's why I did it this way. By the way, the reading loop I have used so far reads the file char by char. But you can also read it word by word, this way:

char str[30]; //a word can be at most 29 characters long (plus the terminating '\0')
while(!OpenFile.eof())
{
OpenFile >> str;
cout << str;
}

You can also read it line by line, this way:

char line[100]; //a whole line will be stored here
while(!OpenFile.eof())
{
OpenFile.getline(line,100); //where 100 is the size of the array
cout << line << endl;
}

You may now be wondering which way to use. I'd recommend the line-by-line one, or the first one I mentioned, which reads char by char. Reading word by word is not a good idea, because >> skips all whitespace, including newlines: words from different lines get run together instead of being displayed on separate lines. getline() and get() show you the file just as it is!
Now, I will show you how to check whether opening the file was successful. There are a few good ways to check this, and I will mention them. In the examples, X stands for “i”, “o” or nothing (giving ifstream, ofstream or fstream).
Example 1: The most usual way

Xfstream File("cpp-home.txt");
if (!File)
{
cout << "Error opening the file! Aborting…\n";
exit(1);
}

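Example 2: If the file does not exist, return an error
Old, pre-standard compilers offered ios::nocreate for this. Standard C++ does not have it, but opening with ios::in | ios::out has the same effect, because that mode never creates a file.
ofstream File("unexisting.txt", ios::in | ios::out);
if(!File)
{
cout << "Error opening the file! Aborting…\n";
exit(1);
}
Example 3: Using the fail() function
ofstream File("filer.txt", ios::in | ios::out);
if(File.fail())
{
cout << "Error opening the file! Aborting…\n";
exit(1);
}
The new thing in Example 3 is the fail() function. It returns true if an operation has failed (failbit or badbit is set); merely reaching the end of the file does not make it true.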
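A small sketch that puts these checks together, assuming a hypothetical helper can_open. After a failed open, !File, File.fail() and !File.is_open() all report the failure:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reports whether a file can be opened for reading.
// A failed open sets failbit, so !File, File.fail() and !File.is_open()
// would all be true here; fail() is used below.
bool can_open(const std::string& filename)
{
    std::ifstream File(filename);
    return !File.fail();
}
```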
I would also like to mention something that I find very useful. Suppose you have created a file stream, but you haven't opened a file yet, like this:
ifstream File; //it could also be ofstream
This way, we have a handle, but we still have not opened a file. If you want to open it later, you can do so with the open() function, which I already covered in this tutorial. But if anywhere in your program you need to know whether a file is currently open, you can check it with the function is_open(). It returns false (printed as 0) if no file is open, and true (printed as 1) if one is. For example:
ofstream File1;
File1.open("file1.txt");
cout << File1.is_open() << endl;

The code above prints 1, as we open a file (on the second line). But the code below prints 0, because we don't open a file, we just create a file stream handle:
ofstream File1;
cout << File1.is_open() << endl;

Okay, enough on this topic.

Monday, August 30, 2010

Using the == operator more safely in C++

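In C++, it is easy to write

i=5

when you mean

i==5

The first one assigns 5 to i. Inside a condition such as
if(i=5)
the value of the assignment is 5, which is nonzero, so the condition is always true.

So it is better to write
5==i
because == is symmetric: 5==i means the same as i==5.
If someone writes
5=i
by mistake, the compiler reports an error, because you can't assign a value to a literal.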
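A minimal sketch of both versions (the function names are hypothetical). Modern compilers help here too: GCC and Clang with -Wall warn about an assignment used as a condition (via -Wparentheses), unless it is wrapped in an extra pair of parentheses, as done below to keep the demonstration compiling cleanly.

```cpp
// Hypothetical functions contrasting the typo with the safer style.
bool is_five_typo(int i)
{
    if ((i = 5))         // assigns 5 to i; the condition is 5, which is true
        return true;
    return false;
}

bool is_five(int i)
{
    if (5 == i)          // writing 5 = i here would be a compile error
        return true;
    return false;
}
```

is_five_typo(3) returns true even though 3 is not 5, while is_five(3) correctly returns false.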

Friday, August 27, 2010

Regular expression

1. Introduction

Although there are plenty of perl hackers and other regular expression users, the number of decent tutorials and guides on regular expressions on the 'net remains exceptionally low. Because I still find relatively many questions about regular expressions, and see how others struggle with them, I decided to write this tutorial. Bear in mind that this is still a work in progress.

1.1. Purpose

The purpose of this tutorial is to help the reader on his or her way in the world of regular expressions. The basic concepts are explained and the largest pitfalls are covered (no pun intended. Well, maybe just a little).

1.2. Notation

All regular expressions in this tutorial are presented in a monospace font and on a lightgray background with a darkgray outline. Because it would be very difficult to clearly show spaces in a regular expression, and particularly on a web page, every space in every regex shown is represented by the -symbol. The end result then looks like regularexpression.

1.3. Examples and Exercises

Most of the examples and all of the exercises were made using GNU egrep. Windows users can get a Win32 port of GNU egrep, and its documentation is available online. If you don't have GNU egrep on your UN*X box, you might try using your native implementation; it should be largely compatible.
If the directory where egrep is installed (like /usr/bin, or C:\windows\command) is in your environment's $PATH variable (%PATH% in Windows), you should be able to invoke egrep simply by typing
$ egrep
  Usage: egrep [OPTION]... PATTERN [FILE]...
  Try `egrep --help' for more information.
Here, the dollar-sign represents the shell's prompt (similar to C:\> in Windows), and should not be typed. All command-line invocation examples show this prompt. The text following it is what you should actually type; the remaining lines contain the command's output.
Basically you give egrep a regular expression and the name of a file. egrep then tries to match the regex against each line of the file. A line is only printed if it matches the regex.
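The same idea can be sketched in C++ with std::regex, whose std::regex::extended option selects the POSIX extended grammar that egrep is based on (GNU egrep's own extensions are not included). The function grep_lines and its invert flag, which mimics egrep's -v option, are hypothetical:

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical egrep-like helper: applies the pattern to each line and keeps
// the lines that contain a match anywhere. invert mimics egrep -v.
std::vector<std::string> grep_lines(std::istream& in, const std::string& pattern,
                                    bool invert = false)
{
    const std::regex re(pattern, std::regex::extended);  // POSIX ERE grammar
    std::vector<std::string> matches;
    std::string line;
    while (std::getline(in, line)) {
        bool found = std::regex_search(line, re);  // a match anywhere in the line
        if (found != invert)
            matches.push_back(line);
    }
    return matches;
}
```

With fruits.txt opened as a std::ifstream named file, grep_lines(file, "ea") returns pear and peach.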

1.4. Copyright and Distribution

This regular expression tutorial is Copyright © 2003 by Kars Meyboom.
This tutorial may be freely reprinted in any medium provided that its content is not altered, presented in its entirety, and this copyright notice remains intact. All code examples in this tutorial are hereby released to the public domain.
Contact <kars@kde.nl> for more information.

2. What are they?

A regular expression, usually abbreviated to "regex" or "regexp", describes text patterns. Assume you're looking for a piece of text that starts with either two, three or four letters 'A', followed by exactly three letters 'C'. This pattern can be described with the regex A{2,4}C{3}.
From the above regex one can determine that not all characters are interpreted literally. The accolades (or curly braces, take your pick) clearly have a special meaning. Characters with such a special meaning are called meta characters. So, regular expressions have their own particular syntax, and so you could speak of a regex language.
As with most human languages the regex language has many dialects; regexes written for perl aren't automatically suited for sed, awk or grep, to name just a few standard UNIX tools.
I've chosen to write all the regexes in this tutorial in the POSIX dialect. This is because POSIX is slowly gaining ground in the world of regexes, and because a fair number of dialects are similar to it (well, actually it's the other way around). But this doesn't mean I'll be covering all the features of the POSIX 1003.2 regular expression standard. Another reason for using the POSIX dialect as opposed to the Perl dialect is because the Perl documentation does a much better job of explaining the Perl dialect than I ever will. Also, this way you won't be locked into any particular tool's regex extensions. In a way, the POSIX dialect can be considered the greatest common denominator.

2.1. Usage

A regex by itself does very little. Only by applying such a description of a text pattern to a piece of text does anything happen. The actual applying is done by a piece of software called a regex engine. The text is searched from the start until a piece of text is found that matches the pattern description (the regex), or until it runs out of text. Such a match is called a pattern match.
There are basically two ways of using regular expressions. One is by using special-purpose tools that were built specifically to apply regexes to text, like grep, egrep and sed. The other way is by using the regex capabilities built into a programming or scripting language. These days, most languages, like C, C++, Javascript, Python and PHP for example, provide functions or methods that can apply a regex to a piece of text. The code that actually applies regular expressions to text is called a regular expression engine.
awk and particularly perl don't quite fit either way. Once you get the hang of perl, you'll notice how tightly the concept of applying a regex to data is integrated into the whole design of perl.

3. Meta Characters

To be able to discuss meta characters, we first have to determine what "ordinary" characters mean to a regex. The regex cat does indeed find the "cat" in the text The neighbour's cat pees on my lawn, but also the "cat" in the winter catalog. So, regular expressions work purely on text, and don't look at the semantics. It's important to realise that the above regex doesn't mean anything more to the regex engine than a 'c', followed by an 'a', followed by a 't', wherever it may be in the text to which the regex is applied.
To get you started, here's a simple example. fruits.txt contains a list of types of fruit, eight total, one per line. Once you've downloaded the example and saved it, open a console (or DOS box or whatever) and move to the directory where you saved the file. Once there, type the following:
$ egrep pear fruits.txt
  pear
It might not be the most exciting demonstration of regular expressions at work, but if you get the same output, namely pear, it means you've successfully applied your first regex. Assuming this is your first time, that is.
Another example would be:
$ egrep ea fruits.txt
  pear
  peach
Slightly more interesting, the regex ea catches every line that contains ea, re-emphasising that regular expressions have no regard for semantics.
Another instructional example is:
$ egrep a fruits.txt
  apple
  orange
  pear
  peach
  grape
  banana
You might wonder what's so special about this example. The lesson lies in the last line of the result, banana. Recalling that egrep prints only those lines that match the pattern, you might wonder how egrep handles banana when applying a; perhaps you think it matches the line three times, which is more than once, and so the line is printed. The point is that egrep stops applying the regex as soon as it finds a match. Once it finds the first 'a', it stops searching, prints the line and moves on to the next.
This particular example illustrates that egrep doesn't care what it matches, or how often, but only whether it matches or not. Later we'll see examples that do care what or how often is matched. Obviously, these examples won't use egrep.
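In C++ terms, egrep's yes/no question corresponds to std::regex_search, while counting every match is a separate job. A minimal sketch using std::sregex_iterator, with a hypothetical helper count_matches:

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of pattern in text
// (POSIX extended grammar), unlike egrep, which stops at the first one.
std::size_t count_matches(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern, std::regex::extended);
    std::sregex_iterator first(text.begin(), text.end(), re);
    std::sregex_iterator last;       // default-constructed: end of sequence
    return static_cast<std::size_t>(std::distance(first, last));
}
```

count_matches("banana", "a") is 3, although egrep only needs the first of those matches to print the line.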
The last example for this section demonstrates a feature of egrep:
$ egrep -v a fruits.txt
  blueberry
  plum
The -v option tells egrep to invert the sense of the match. Now only lines that don't match the pattern are printed. And indeed, neither blueberry nor plum contains an 'a'.

3.1. Anchors

Using ^ and $ you can force a regex to match only at the start or end of a line, respectively. So ^cat matches only those lines that start with cat, and cat$ only matches lines ending with cat.
A hands-on example that uses the same fruits.txt as in the previous section is the regex ^p:
$ egrep ^p fruits.txt
  pear
  peach
  plum
As you can see, this regex fails to match both apple and grape, since neither starts with a 'p'. The fact that they contain a 'p' elsewhere is irrelevant. Similarly, the regex e$ only matches apple, orange and grape:
$ egrep 'e$' fruits.txt
  apple
  orange
  grape
Mind the quotes though! In most shells, the dollar-sign has a special meaning. By putting the regex in single-quotes (not double-quotes or back-quotes), the regex will be protected from the shell, so to speak. It's generally a good idea to single-quote your regexes, so that's what I'll do in the examples from now on.
The Windows shell is an exception, mind you. You'll be better off using double quotes in that case.
Moving on, ^cat$ only matches lines that contain exactly cat. You can find empty lines in a similar way with ^$. If you're having trouble understanding that last one, just apply the definitions. The regex basically says: "Match a start-of-line, followed by an end-of-line". That's pretty much what an empty line means, right?
Mind you, a regex with only a start-of-line anchor ^ always matches, since every line has a start. The same obviously goes for the end-of-line anchor. If you don't believe me, just try it out on the fruit list:
$ egrep '^' fruits.txt
  apple
  orange
  pear
  peach
  grape
  banana
  blueberry
  plum
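The anchor examples can be checked with a small C++ sketch (line_matches is a hypothetical helper). Each string passed in plays the role of one line, so ^ and $ match at its start and end:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if pattern matches somewhere in line, using the
// POSIX extended grammar; ^ and $ anchor to the start and end of the string.
bool line_matches(const std::string& line, const std::string& pattern)
{
    return std::regex_search(line, std::regex(pattern, std::regex::extended));
}
```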
A lot of regex implementations offer the ability to use word anchors. As you saw, a regex like cat not only finds the word cat, but also all those cases where cat is "hidden" in other, longer words. In such cases you can use the start-of-word and end-of-word anchors, \< and \>, respectively. These meta characters don't match on characters, but between them.
So if you were looking only for occurrences of the word cat, you could use the regex \<cat\>.
For the next hands-on example you'll need the cats.txt file, which contains several words that contain cat. First, try the following:
$ egrep '\<cat' cats.txt
  cat
  cattle
  catalog
  scrawny cat
From this example it becomes clear that start-of-word boundaries not only work between words, but also catch words at the beginning of a line.
These word boundary anchors aren't supported by all regex implementations, though. A number of implementations (including perl's) offer is-word-boundary and not-a-word-boundary anchors (\b and \B) instead, in which case the regex \<cat\> would have to be replaced with \bcat\b.
In this context, the term "word" should be taken lightly; every combination of letters, upper and lower case, the underscore ( _ ) and digits counts as a word when dealing with word boundary anchors.
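std::regex has no equivalent of the GNU \< and \> anchors, but its default ECMAScript grammar offers the perl-style \b. A minimal sketch with hypothetical helper names; the raw string literals R"(...)" keep the backslashes as written:

```cpp
#include <regex>
#include <string>

// Hypothetical helpers using the default ECMAScript grammar of std::regex.
bool has_word_cat(const std::string& line)
{
    static const std::regex whole_word(R"(\bcat\b)");  // like \<cat\>
    return std::regex_search(line, whole_word);
}

bool has_word_starting_cat(const std::string& line)
{
    static const std::regex word_start(R"(\bcat)");    // like egrep '\<cat'
    return std::regex_search(line, word_start);
}
```

has_word_cat("cat_food") is false: the underscore counts as a word character, so there is no boundary after the t.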

3.2. Character Classes

With the […] construct you can indicate that on a certain position within the pattern one of several characters may appear. Suppose for instance that you're trying to find both cake and coke. In that case you can use the regex c[ao]ke.
Another example, to recognise hexadecimal digits, is [0123456789abcdefABCDEF]. This quickly becomes impractical though. Fortunately you can use a hyphen to specify a range: [0-9]. More than one range in a character class is also allowed: [0-9a-fA-F].
Just make sure you don't write [A-z] when you mean [A-Za-z]. Though it might look convenient, the first regex also catches the six characters between 'Z' and 'a' (if you're using the ASCII character set, that is).
You can also specify a negated character class by placing a caret (^) directly after the opening bracket: [^…]. This inverts the sense of the character class: [^0-9] matches any character but digits.
Fine, but what if you want those brackets, hyphen and caret to appear as characters inside a character class? In many dialects (perl's, for example) you can escape them with a backslash: [\^\]]. This does not work in POSIX, where a backslash inside a character class is just a literal backslash. The portable way is to put them in places in the character class where they can't have their special meaning; the regex engine will then treat the character as a literal. So, place the dash first or last within the character class, the caret in any but the first place, and the closing bracket right after the opening bracket: []^[-] is a valid character class containing four characters.
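The character classes above can be tried with std::regex_match, which (unlike egrep) requires the whole string to match. A minimal sketch; full_match is a hypothetical helper:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if pattern matches all of text,
// using the POSIX extended grammar.
bool full_match(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern, std::regex::extended));
}
```

full_match("_", "[A-z]") is true, which is exactly the [A-z] pitfall described above, while full_match("_", "[A-Za-z]") is false.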

3.3. The Dot

The dot, ., can be considered a special case of a character class, in that it matches any character. th.s for instance matches both this and thus, but also thqs, th#s, etc.
This means that a regex to find an IP address, for example 209.204.146.22, will not work reliably: each unescaped dot matches any character, so the regex also matches text such as 209x204y146z22. All three dots need to be escaped: 209\.204\.146\.22 does work. Well, almost: there could be digits preceding or following the IP address, as in 1209.204.146.223. That can be solved by using word boundary anchors: \<209\.204\.146\.22\>.
Used inside a character class, the dot loses its meaning though. A character class to search for some punctuation characters, for example, might look like this: [.,:;].
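A sketch of the difference an escaped dot makes (contains_match is a hypothetical helper using the POSIX extended grammar):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if pattern matches anywhere in text.
bool contains_match(const std::string& text, const std::string& pattern)
{
    return std::regex_search(text, std::regex(pattern, std::regex::extended));
}
```

contains_match("209x204y146z22", "209.204.146.22") is true because each dot matches any character; with the dots escaped it is false, but the escaped regex still matches inside 1209.204.146.223.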

3.4. Quantifiers

Using quantifiers, you can specify how often a character, character class or group may or must be repeated in sequence. The general form is {min,max}.
An example is the regex bo{1,2}t, which matches both bot and boot. To match any sequence of three to five vowels, you can use [aeiou]{3,5}. Or you can use a quantifier to make something optional: finds{0,1} matches find and finds. This case occurs often enough to justify an abbreviation: the regex finds? is effectively identical to the previous.
Important to notice at this point is that a quantifier only applies to the item that precedes it. The question mark in the above regex only applies to s, not to the entire finds.
If you want to match something a certain number of times, you can set the minimum equal to the maximum: ^-{80,80}$ matches lines that consist of exactly eighty dashes. Some regex implementations allow this form to be shortened to ...{num}. With this, the previous regex can be shortened to ^-{80}$.
It's also allowed to leave out the upper bound: a{5,} will match any run of at least five letters 'a'. The case of "one or more" (e.g. a{1,}) occurs much more frequently, though. That's why this form has an abbreviation, the +: a+ and a{1,} are effectively equivalent.
The case "zero or more" also has a short form: *. For instance, e* will match any number of letters 'e' in sequence, including zero. But be careful: a regex will always match as "early" as possible. So if you expect this regex, when applied to beer, to match the ee in the middle, you're wrong! That's because there's a sequence of e's at the very beginning of the text, before the b. The fact that the sequence is zero characters long makes no difference to the regex. In such a case, e+ might be more appropriate.
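The quantifier examples above can be checked with a short C++ sketch (std::regex, ECMAScript grammar). whole_match and first_match are hypothetical helpers; first_match reports where the leftmost match starts and how long it is, which makes the "e* matches nothing at the start of beer" surprise visible.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: true if the whole string s matches pattern.
bool whole_match(const std::string& s, const std::string& pattern) {
    return std::regex_match(s, std::regex(pattern));
}

// Hypothetical helper: {position, length} of the leftmost match, or {-1, 0}.
std::pair<long, long> first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return {-1, 0};
    return {static_cast<long>(m.position(0)), static_cast<long>(m.length(0))};
}

// whole_match("boot", "bo{1,2}t") is true, whole_match("booot", "bo{1,2}t") is false
// whole_match("find", "finds?") and whole_match("finds", "finds?") are both true
// first_match("beer", "e*") is {0, 0}: an empty match before the 'b'
// first_match("beer", "e+") is {1, 2}: the "ee"
```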
Important to know is that quantifiers are greedy. That means that if you apply the regex 1* to the text 11111, it will consume all the 1's. Only when a quantifier's greediness would cause a pattern mismatch will the quantifier release some of the text it consumed.
Take the regex [0-9]*25 for example, which matches numbers ending in 25. If you apply it to the text 3425, the quantifier will at first consume the entire text, because each of the characters can be matched by the character class [0-9]. But that prevents 25 from matching, causing the entire regex to fail.
In such cases the quantifier will release one character at a time. First the 5 is released, leaving the quantifier matching only 342. When it turns out that that still isn't enough, the 2 is released as well, allowing the rest of the regex, 25, to match.
This means that a regex that contains a lot of quantifiers will have many combinations to try before failing. So if the text it's applied to causes many near-matches, it might all of a sudden take a very long time to process the data.
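To watch greediness and backtracking from the outside, this C++ sketch (hypothetical helper names, std::regex) captures what [0-9]* is left holding once the whole regex has matched. On 3425 the group ends up with 34, because the quantifier had to hand back the 2 and the 5.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: match the whole text against ([0-9]*)25 and return
// what the greedy group kept after backtracking, or "<no match>".
std::string digits_before_25(const std::string& text) {
    static const std::regex re("([0-9]*)25");
    std::smatch m;
    if (std::regex_match(text, m, re)) return m[1].str();
    return "<no match>";
}

// Hypothetical helper: length of the leftmost match of 1*. Because * is
// greedy, it takes every leading '1'; if the text does not start with a
// '1', the leftmost match is the empty string at position 0.
long greedy_ones(const std::string& text) {
    static const std::regex re("1*");
    std::smatch m;
    std::regex_search(text, m, re);   // 1* always matches, possibly with length 0
    return static_cast<long>(m.length(0));
}
```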

3.5. Alternation

With the | meta character, the "or", you can merge several regexes into a single regex. With this you supply the regex engine with alternatives. Jack and Jill are two separate regexes, whereas Jack|Jill is one that will match either.
Further back I mentioned the regex c[ao]ke. Using alternation you can write it (less efficiently) as c(a|o)ke, where the parentheses (which are therefore meta characters too, more on this later) are used to limit the effect of the alternation.
Another, almost classic example is the regex ^(From|Subject|Date):, which can filter an email message's headers. In this particular example the parentheses are by no means optional; the regex ^From|Subject|Date: matches something else entirely. By pulling it apart you get three separate regexes ^From, Subject and Date:, which clarifies (I hope) why the regex is wrong (as in, not fit for filtering email headers).
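The difference between the two header regexes is easy to see by filtering a few sample lines. grep_lines is a hypothetical helper and the sample lines are made up for this example. With std::regex, ^ anchors at the start of each string passed in, which here is one line.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keep only the lines in which pattern matches somewhere.
std::vector<std::string> grep_lines(const std::vector<std::string>& lines,
                                    const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (const auto& line : lines)
        if (std::regex_search(line, re)) out.push_back(line);
    return out;
}
```

For example, with the lines "From: alice@example.com", "Subject: lunch", "Date: Tue", "See the Subject above" and "From now on, no more meetings", ^(From|Subject|Date): keeps only the three header lines, while ^From|Subject|Date: keeps all five.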

3.6. Grouping

In addition to the function of limiting the effect of alternation, parentheses (…) have another function, which is grouping for quantifiers. Everything about quantifiers that applies to characters and character classes also applies to groups.
An example is (hurrah ){2,3} (note the space inside the parentheses), which matches "hurrah hurrah " as well as "hurrah hurrah hurrah ".
A more complex example combines alternation and grouping with a quantifier: (hurrah |yahoo ){2,3}. That gives twelve possible combinations (4 sequences of two words plus 8 of three words), including for example "hurrah yahoo " and "yahoo hurrah yahoo ".
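A quick C++ check of the grouping examples, written with the space inside the group as the sample strings suggest (whole_match is a hypothetical helper):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole string s matches pattern.
bool whole_match(const std::string& s, const std::string& pattern) {
    return std::regex_match(s, std::regex(pattern));
}

// The quantifier {2,3} applies to the whole group, not just its last character:
//   whole_match("hurrah hurrah ", "(hurrah ){2,3}")               is true
//   whole_match("hurrah ", "(hurrah ){2,3}")                      is false (only one)
//   whole_match("yahoo hurrah yahoo ", "(hurrah |yahoo ){2,3}")   is true
```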

3.7. Backreferences

The use of grouping has a very useful side-effect. That's because certain regex implementations "remember" the matched text in a grouping, and make this available both during and after the application of a regex.
Assume you have a piece of text you wish to search for double words, such as "…when when…". Now, you could try to build a separate regex for every word you can think of, but wouldn't it be convenient if you could say, "find something that matches this pattern, then match it again"?
You can. Provided your regex implementation supports it, parentheses "remember" what they match. In that case you could search for double words with the regex ([a-zA-Z]+) \1 (with a single blank between the closing parenthesis and the backslash). The meta character \1 is called a backreference.
Using this regex also catches cases such as "when whenever", though, so in this case ([a-zA-Z]+) \1\> might be a better regex.
In this example the meta character \1 refers to the first opening parenthesis. You can of course have several groups in a regex, but the maximum number of backreferences is limited to nine (\1 ... \9) in most regex implementations.
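Here's the double-word search as a C++ sketch. Assumptions: std::regex uses the ECMAScript grammar, which spells the word-boundary anchor \b instead of \>. I've also put a \b in front of the group so that, for instance, the "the the" hiding inside "bathe the" is not reported. first_double_word is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first doubled word in text, or "" if none.
// ([a-zA-Z]+) captures a word, " \1" requires the same text again after one
// blank, and the \b anchors (standing in for \< and \>) require whole words.
std::string first_double_word(const std::string& text) {
    static const std::regex re(R"(\b([a-zA-Z]+) \1\b)");
    std::smatch m;
    return std::regex_search(text, m, re) ? m[1].str() : "";
}
```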
To determine which backreference corresponds to which group, you need to count the number of opening parentheses from the left. In the example above, we only had one group, so that's easy. But the next example is a bit more complicated.
((the|a) (big( red)?|small( yellow)?) (car|bike)) contains six groups. The example file, car.txt, contains five lines, four of which can be matched by the regex:
$ egrep '((the|a) (big( red)?|small( yellow)?) (car|bike))' car.txt
  the big red car
  a small bike
  the small yellow car
  a big red bike
To clarify which backreference corresponds to which group, I wrote a small perl-script. This gives the following output:
$ perl -n refs.pl car.txt
  "the big red car"
  \1 => the big red car
  \2 => the
  \3 => big red
  \4 =>  red
  \5 => (null)
  \6 => car

  "a small bike"
  \1 => a small bike
  \2 => a
  \3 => small
  \4 => (null)
  \5 => (null)
  \6 => bike

  "the small yellow car"
  \1 => the small yellow car
  \2 => the
  \3 => small yellow
  \4 => (null)
  \5 =>  yellow
  \6 => car

  "a big red bike"
  \1 => a big red bike
  \2 => a
  \3 => big red
  \4 =>  red
  \5 => (null)
  \6 => bike

  
The script applies the regex to every line of the example file, and prints the backreferences if it matches. In the output, (null) indicates that the group to which the backreference corresponds is not part of the match.
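The Perl script itself isn't reproduced here. As a stand-in, this C++ sketch (hypothetical function name car_groups, std::regex) collects \0 through \6 for one line and writes "(null)" for a group that took no part in the match, the same convention as the output above. std::sub_match::matched is what tells the two cases apart.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns groups 0..6 for one line, using "(null)" for
// groups that did not participate in the match. Empty vector means no match.
std::vector<std::string> car_groups(const std::string& line) {
    static const std::regex re("((the|a) (big( red)?|small( yellow)?) (car|bike))");
    std::smatch m;
    std::vector<std::string> out;
    if (!std::regex_search(line, m, re)) return out;
    for (std::size_t i = 0; i < m.size(); ++i)
        out.push_back(m[i].matched ? m[i].str() : "(null)");
    return out;
}
```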
So, you can use multiple groups in a regex, but the maximum number of backreferences is, in most regex implementations, limited to nine (\1 ... \9).
A slightly larger example is the task of untangling the query string in a URL, for example http://www.foobar.com/search?query=regex&view=detailed.
Assume we want to extract the name and value of the query variable from this URL. This can be done with the regex \?([a-zA-Z]+)=([^&]+). With this regex, we use \? to line up the regex with the query part, which starts after the question mark. Then we match the name of a variable using [a-zA-Z]+, and surround it with parentheses to save it for later processing, ([a-zA-Z]+). This should be followed by an equal sign, so we append = to the regex. Finally we need to capture the variable's value. This can be done with [^&]+, since the string that makes up the value goes on until the next &, which acts as a name=value delimiter. This also works if the value is not followed by an ampersand, in which case the variable's value takes up the rest of the URL. The value regex needs to be enclosed in parentheses since we want to save it for later, so we get ([^&]+).
Although there are two sets of parentheses in the regex, neither is used in the regex by a backreference. Then how do we get to the data? Well, this strongly depends on the tool in which the regex is used. Following are a few examples.
With perl, the content of both backreferences is available after the match in the variables $1 and $2. The following snippet of code shows how this can be used.
    my $url = 'http://www.foobar.com/search?query=regex&view=detailed';
    if ($url =~ /\?([a-zA-Z]+)=([^&]+)/) {
        print "$1 = $2\n";    # prints: query = regex
    }
In PHP you can use the preg functions (see the manual); the older POSIX ereg() functions were deprecated in PHP 5.3 and removed in PHP 7. For example:
    $url = 'http://www.foobar.com/search?query=regex&view=detailed';
    if (preg_match('/\?([a-zA-Z]+)=([^&]+)/', $url, $refs)) {
        echo "$refs[1] = $refs[2]\n";
    }
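For comparison, the same extraction with C++'s std::regex might look like this. first_query_pair is a hypothetical name; like the Perl and PHP snippets, it only picks up the first name=value pair after the ?.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: {name, value} of the first query variable in url,
// or {"", ""} if the URL has no query part that fits the pattern.
std::pair<std::string, std::string> first_query_pair(const std::string& url) {
    // \? lines up with the query part; group 1 is the name, group 2 the value
    // (everything up to the next '&' or the end of the string).
    static const std::regex re(R"(\?([a-zA-Z]+)=([^&]+))");
    std::smatch m;
    if (std::regex_search(url, m, re)) return {m[1].str(), m[2].str()};
    return {"", ""};
}
```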

4. Pitfalls

Misconceptions or lack of understanding of quantifiers are the main cause of errors, although even the most hardened regex hackers make these mistakes every once in a while. Take, for example, the text "Hey you", he said, "did you say something?". We'll try to match the first piece of quoted text, including the quotes. So we use the regex ".*", because we want to match a double-quote, followed by text, being an arbitrary character (.) matched an arbitrary number of times (*), followed by another double-quote.
But what we appear to match is not "Hey you", but "Hey you", he said, "did you say something?"!
Whoops. Slight mistake. But where? The point is that quantifiers are so greedy, they don't even look at what the rest of the regex might want to match. .* devours everything from the first 'H' after the first quote to the end of the line, and is then coerced to release the last character to match the final double-quote in the regex.
Apparently, we need to be more precise about what we mean: we want a double-quote, followed by everything but a double-quote, followed by a double-quote. Or rather: "[^"]*".
This regex does a much better job, but you need to realise that escaped quotes will ruin the fun: "When he yelled, \"Come here!\", I left", she said.. In this case we appear to match "When he yelled, \". A solution for this problem is less trivial than might appear at first glance, and falls outside the scope of this tutorial.
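The greedy-quote pitfall, and the escaped-quote limitation of the fix, can be reproduced with this C++ sketch (first_match_text is a hypothetical helper; std::regex, ECMAScript grammar):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the text of the leftmost match of pattern, or "".
std::string first_match_text(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) return m[0].str();
    return "";
}

// With text = "Hey you", he said, "did you say something?"
//   ".*"      runs from the first quote to the last one (the whole line)
//   "[^"]*"   stops at the next quote and returns just "Hey you"
// With an escaped quote inside, "[^"]*" still stops at the \" and returns
// only "When he yelled, \" as described above.
```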

5. More Information

Most of my knowledge of regular expressions comes from the book Mastering Regular Expressions, written by Jeffrey Friedl and published by O'Reilly. For more application-oriented information about regular expressions you could try O'Reilly's books on sed & awk or perl.
If you're a PHP programmer, be sure to read the manual page entries for the ereg (POSIX) and preg (Perl-compatible) regular expression set of functions.
Perl programmers can either check the manual entry for regular expressions at Perldoc.com, or you could try typing man perlre at the shell prompt (if you're running a UN*X-like OS, that is).

Thursday, August 26, 2010

Linux/Unix : Working with files

Unix files
We have discussed unix files here.

File names


Displaying file contents -Overview

There are multiple ways to display file contents. So what is the equivalent of Notepad here?
  • cat - display the content of a file
  • more - display the content one screen at a time
  • head and tail - display the beginning or the end of a file
  • grep - search files
  • sort - sort the lines of a file
  • diff - compare files
  • cut - cut out parts (fields or columns) of each line
  • uniq - remove (adjacent) duplicate lines
  • wc - count the lines, words and characters
  • lpr, lpq and lprm - print files, view the print queue and remove print jobs

Displaying the contents of file

    Displaying parts of files



    Searching files

    Sorting files


    Comparing files

    Comparing files using diff

    Cutting fields
    cut

    Pasting file contents
     
    Duplicate lines

    Non ascii files

    Printing files

    date and time and days

    cal
    prints the calendar for the current month

    cal n
    prints calendar for year n, example n = 2007

    cal m n
    prints calendar for month m and year n

    cat in linux


    cat

    1. Read/scan the man page for cat with the command:
      
           man cat 
           
    2. Use this command to display the contents of a file. What happens?
      
           cat  filename 
           
    3. Now try this command and notice the difference. How many lines are in the file?
      
           cat -n  filename 
           
    4. The cat command is more often used for purposes other than just displaying a file. Try these commands to "concatenate" two files into a new, third file:
      
           cat file1                  - first, show file1 
           cat file2                  - then, show file2 
           cat file1 file2 > newfile  - now do the actual concatenate 
           cat newfile                - finally, show the result 
           
    OPTIONS:
         

    -A Show all; equivalent to -vET.
    -b Number only the non-blank output lines (overrides -n).
    -e Equivalent to -vE: show non-printing characters and print a $ at the end of each line.
    -E Display a $ (dollar sign) at the end of each line.
    -n Number all output lines.
    -s Squeeze multiple adjacent empty lines into a single empty line.
    -T Display TAB characters as ^I.
    -v Non-printing characters (with the exception of tabs, new-lines and form-feeds) are printed visibly. 

    EXAMPLE:

    1. To Create a new file:
      cat > file1.txt
      This command creates a new file file1.txt. After typing into the file, press Control and D together (^D) to end the file.
    2. To Append data into the file:
      cat >> file1.txt
      To append data into the same file use append operator >> to write into the file, else the file will be overwritten (i.e., all of its contents will be erased).
    3. To display a file:
      cat file1.txt
      This command displays the data in the file.
    4. To concatenate several files and display:
      cat file1.txt file2.txt
      The above cat command will concatenate the two files (file1.txt and file2.txt) and display the output on the screen. Sometimes the output may not fit on the screen. In such a situation you can redirect the output to a new file, or pipe it to the less command:
      cat file1.txt file2.txt | less
    5. To concatenate several files and to transfer the output to another file.
      cat file1.txt file2.txt > file3.txt
      In the above example the output is redirected to new file file3.txt. The cat command will create new file file3.txt and store the concatenated output into file3.txt.
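For readers coming from the C++ side, here's a minimal sketch of what cat file1.txt file2.txt > file3.txt boils down to: copy each input file's bytes, in order, into one output file. cat_stream and cat_files are hypothetical names, and unlike the real cat, this version gives up at the first file it cannot open instead of printing an error and carrying on.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Copy every byte of in to out. istreambuf_iterator is used rather than
// "out << in.rdbuf()", because inserting an empty streambuf sets failbit on out.
void cat_stream(std::istream& in, std::ostream& out) {
    std::copy(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(out));
}

// Roughly "cat name1 name2 ... > dest". Returns false if any input cannot be
// opened or writing the output fails.
bool cat_files(const std::vector<std::string>& names, const std::string& dest) {
    std::ofstream out(dest, std::ios::binary);   // truncates dest, like '>'
    if (!out) return false;
    for (const auto& name : names) {
        std::ifstream in(name, std::ios::binary);
        if (!in) return false;
        cat_stream(in, out);
    }
    return static_cast<bool>(out);
}
```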

    ls in unix / linux

    1. Use ls without any arguments to display your current directory contents. How many files do you see?
    2. Now use ls with the -a option. How many files do you see this time? Notice that the "new" files all begin with a "dot", which indicates they are "hidden" files.
      
           ls -a
           
    3. This command is useful for distinguishing between directories, ordinary files, and executable files. Notice how its output differs from ls without arguments.
      
           ls -F
           
    4. Use the command ls -l to obtain a "long" listing of your files. Sample output from this command and an explanation of the information it provides appears below.
      
      -rwxr-xr-x   1 jsmith   staff         43 Mar 23 18:14 prog1
      -rw-r--r--   1 jsmith   staff      10030 Mar 22 20:41 sample.f
      drwxr-sr-x   2 jsmith   staff        512 Mar 23 18:07 subdir1
      drwxr-sr-x   2 jsmith   staff        512 Mar 23 18:06 subdir2
      drwxr-sr-x   2 jsmith   staff        512 Mar 23 18:06 subdir3
          1        2   3        4           5       6          7
      
      1 = access modes/permissions
      2 = number of links
      3 = owner
      4 = group
      5 = size (in bytes)
      6 = date/time of last modification
      7 = name of file  
           
    5. Recursive listings can be very useful. Try both of the commands below. What does the output tell you?
      
           ls -R
           ls -Rl
           
    6. Try three options together:
      
           ls -lFa
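The first column of the ls -l listing is mostly the file's permission bits written out. As an illustrative sketch (perm_string is a hypothetical name), here's how the nine rwx characters after the file-type letter can be built from an octal mode such as 0755. It ignores the setuid/setgid/sticky bits, so the s in drwxr-sr-x above would come out as a plain x here.

```cpp
#include <string>

// Hypothetical helper: the 9-character owner/group/other rwx string for the
// low nine permission bits of mode. Special bits (setuid, setgid, sticky,
// which ls shows as s/S/t/T) are deliberately ignored in this sketch.
std::string perm_string(unsigned mode) {
    const char flags[] = "rwxrwxrwx";
    std::string out(9, '-');
    for (int i = 0; i < 9; ++i)
        if (mode & (0400u >> i))   // 0400 = owner read ... 0001 = others execute
            out[i] = flags[i];
    return out;
}

// perm_string(0755) gives "rwxr-xr-x" (like prog1 above)
// perm_string(0644) gives "rw-r--r--" (like sample.f above)
```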
           
    More options
    -a, --all
    do not hide entries starting with .
    -A, --almost-all
    do not list implied . and ..
    --author
    print the author of each file
    -b, --escape
    print octal escapes for nongraphic characters
    --block-size=SIZE
    use SIZE-byte blocks
    -B, --ignore-backups
    do not list implied entries ending with ~
    -c
    with -lt: sort by, and show, ctime (time of last modification of file status information); with -l: show ctime and sort by name; otherwise: sort by ctime
    -C
    list entries by columns
    --color[=WHEN]
    control whether color is used to distinguish file types. WHEN may be `never', `always', or `auto'
    -d, --directory
    list directory entries instead of contents
    -D, --dired
    generate output designed for Emacs' dired mode
    -f
    do not sort, enable -aU, disable -lst
    -F, --classify
    append indicator (one of */=@|) to entries
    --format=WORD
    across -x, commas -m, horizontal -x, long -l, single-column -1, verbose -l, vertical -C
    --full-time
    like -l --time-style=full-iso
    -g
    like -l, but do not list owner
    -G, --no-group
    inhibit display of group information
    -h, --human-readable
    print sizes in human readable format (e.g., 1K 234M 2G)
    --si
    likewise, but use powers of 1000 not 1024
    -H, --dereference-command-line
    follow symbolic links on the command line
    --indicator-style=WORD append indicator with style WORD to entry names:
    none (default), classify (-F), file-type (-p)
    -i, --inode
    print index number of each file
    -I, --ignore=PATTERN
    do not list implied entries matching shell PATTERN
    -k
    like --block-size=1K
    -l
    use a long listing format (shows permissions, owner, size and last-modified time)
    -L, --dereference
    when showing file information for a symbolic link, show information for the file the link references rather than for the link itself
    -m
    fill width with a comma separated list of entries
    -n, --numeric-uid-gid
    like -l, but list numeric UIDs and GIDs
    -N, --literal
    print raw entry names (don't treat e.g. control characters specially)
    -o
    like -l, but do not list group information
    -p, --file-type
    append indicator (one of /=@|) to entries
    -q, --hide-control-chars
    print ? instead of non graphic characters
    --show-control-chars
    show non graphic characters as-is (default unless program is `ls' and output is a terminal)
    -Q, --quote-name
    enclose entry names in double quotes
    --quoting-style=WORD
    use quoting style WORD for entry names: literal, locale, shell, shell-always, c, escape
    -r, --reverse
    reverse order while sorting
    -R, --recursive
    list subdirectories recursively
    -s, --size
    print size of each file, in blocks
    -S
    sort by file size
    --sort=WORD
    extension -X, none -U, size -S, time -t, version -v
    status -c, time -t, atime -u, access -u, use -u
    --time=WORD
    show time as WORD instead of modification time: atime, access, use, ctime or status; use specified time as sort key if --sort=time
    --time-style=STYLE
    show times using style STYLE: full-iso, long-iso, iso, locale, +FORMAT
    FORMAT is interpreted like `date'; if FORMAT is FORMAT1FORMAT2, FORMAT1 applies to non-recent files and FORMAT2 to recent files; if STYLE is prefixed with `posix-', STYLE takes effect only outside the POSIX locale
    -t
    sort by modification time
    -T, --tabsize=COLS
    assume tab stops at each COLS instead of 8
    -u
    with -lt: sort by, and show, access time; with -l: show access time and sort by name; otherwise: sort by access time
    -U
    do not sort; list entries in directory order
    -v
    sort by version
    -w, --width=COLS
    assume screen width instead of current value
    -x
    list entries by lines instead of by columns
    -X
    sort alphabetically by entry extension
    -1
    list one file per line
    --help
    display this help and exit
    --version
    output version information and exit

    ps in linux

    ps r: Shows only running processes.
    ps f: Shows children descended from their parents in an ASCII art tree. I find this very useful when looking at problem processes. Use with the S option to see CPU information from children summed up with parents.

    ps e: Shows the command environment for each process. This is useful in a situation where a program works for one user but not for another, or on one machine but not on another.
    ps -t pts/3: Shows processes associated with the specified tty. I've found this useful when trying to work out who's doing what on a remote machine, and for how long.
    ps u username: Generates much more readable and human friendly output.
    ps -l -u username: Long listing of that user's processes.


    Own output format
    If you are bored by the regular output, you could simply change the format. To do so use the formatting characters which are supported by the ps command.
    If you execute the ps command with the 'o' parameter you can tell the ps command what you want to see:
    e.g.
    $ ps -o "%u : %U : %p : %a"
    RUSER    : USER     :   PID : COMMAND
    kinshukc : heyne    :  3363 : bash
    kinshukc : heyne    :  3367 : ps -o %u : %U : %p : %a

    for more do
    $man ps

    Wednesday, August 25, 2010

    Grep examples

    To display lines starting with the string "root":
    grep ^root /etc/passwd
    root:x:0:0:root:/root:/bin/bash


    Pattern             What does it match?
    bag                 The string bag.
    ^bag                bag at beginning of line.
    bag$                bag at end of line.
    ^bag$               bag as the only word on line.
    [Bb]ag              Bag or bag.
    b[aeiou]g           Second letter is a vowel.
    b[^aeiou]g          Second letter is a consonant (or uppercase or symbol).
    b.g                 Second letter is any character.
    ^...$               Any line containing exactly three characters.
    ^\.                 Any line that begins with a dot.
    ^\.[a-z][a-z]       Same, followed by two lowercase letters (e.g., troff requests).
    ^\.[a-z]\{2\}       Same as previous, grep or sed only.
    ^[^.]               Any line that doesn't begin with a dot.
    bugs*               bug, bugs, bugss, etc.
    "word"              A word in quotes.
    "*word"*            A word, with or without quotes.
    [A-Z][A-Z]*         One or more uppercase letters.
    [A-Z]+              Same, egrep or awk only.
    [A-Z].*             An uppercase letter, followed by zero or more characters.
    [A-Z]*              Zero or more uppercase letters.
    [a-zA-Z]            Any letter.
    [^0-9A-Za-z]        Any symbol (not a letter or a number).
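To try the table's patterns from C++, std::regex can be switched to POSIX extended syntax, which is close to what egrep accepts. This is a sketch with a hypothetical helper name; note that the grep-only \{2\} row is written as plain {2} in extended syntax, and std::regex is not guaranteed to reproduce every grep quirk.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does line contain a match for pattern, using
// POSIX extended (egrep-style) syntax?
bool egrep_match(const std::string& line, const std::string& pattern) {
    return std::regex_search(line, std::regex(pattern, std::regex::extended));
}

// egrep_match("bag", "^bag$")            is true
// egrep_match("a bag here", "^bag$")     is false
// egrep_match(".sp 2", "^\\.[a-z]{2}")   is true  (a troff-style request)
// egrep_match("big", "b[^aeiou]g")       is false (i is a vowel)
```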

    Friday, August 13, 2010

    Orthogonality

    Literally orthogonality means at right angles, hence independent or irrelevant to.

    In programming languages, orthogonality means designing things so that changes in one thing don't affect another. A common example is a user interface and a database: you should be able to swap the database without changing the interface, or make changes to the interface without affecting the database.

    When this term is used to describe computer instruction sets, an orthogonal instruction set can use any register for any purpose, while in a non-orthogonal set (such as the Intel Pentium's), each register has special properties, e.g. only CX can be used for counting string loops.

    So non-orthogonality means exceptions to the general language rules, which make it harder to learn. It means that you cannot combine language features in all possible ways. Excessive orthogonality makes it possible to say silly things in the language that complicate the compilers. 

    Here are some examples of non-orthogonality in C:

    1. C has two kinds of built-in data structures, arrays and records (structs). Records can be returned from functions, but arrays cannot.
    2. A member of a struct can have any type except void or a structure of the same type.
    3. An array element can be any data type except void or a function.
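    4. Parameters are passed by value, unless they are arrays, in which case a pointer to the first element is passed, so the function can modify the caller's array (effectively pass by reference).
    5. a+b usually means that a and b are added, unless a is a pointer, in which case b is first multiplied by the size of the element a points to before the addition takes place.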
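A few of the points above can be demonstrated in code. This sketch is written in C++ to match the rest of the blog, though the struct, array-parameter and pointer-arithmetic behavior it shows is the same in C. Triple, make_triple, zero_first and byte_offset are hypothetical names made up for the illustration.

```cpp
#include <cstddef>

// Point 1: a function cannot return a plain array, but it can return a
// struct that wraps one; the whole array is copied out with the struct.
struct Triple {
    int v[3];
};

Triple make_triple(int a, int b, int c) {
    Triple t = {{a, b, c}};
    return t;
}

// Point 4: an array parameter is adjusted to a pointer to its first element,
// so writing through it changes the caller's array.
void zero_first(int a[]) {
    a[0] = 0;
}

// Point 5: p + n moves n *elements*, i.e. n * sizeof(int) bytes here.
// (p + n must stay within the same array for this to be well defined.)
std::ptrdiff_t byte_offset(const int* p, std::ptrdiff_t n) {
    return reinterpret_cast<const char*>(p + n) - reinterpret_cast<const char*>(p);
}
```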

    Orthogonality is one of the most important properties that can help make even complex designs compact. In a purely orthogonal design, operations do not have side effects; each action (whether it's an API call, a macro invocation, or a language operation) changes just one thing without affecting others. There is one and only one way to change each property of whatever system you are controlling.

    For example, a computer monitor has orthogonal controls. Its brightness can be changed independently of the contrast level, and (if the monitor has one) the color balance control will be independent of both. Imagine how difficult it would be to adjust a monitor on which the brightness knob affected the color balance: you'd have to compensate by tweaking the color balance every time after you changed the brightness.




    Orthogonality reduces test and development time, because it's easier to verify code that neither causes side effects nor depends on side effects from other code — there are fewer combinations to test. If it breaks, orthogonal code is more easily replaced without disturbance to the rest of the system. Finally, orthogonal code is easier to document and reuse.

    The concept of refactoring, which first emerged as an explicit idea from the ‘Extreme Programming’ school, is closely related to orthogonality. To refactor code is to change its structure and organization without changing its observable behavior.

    Java and orthogonality
    Java is orthogonal in various cases where C or C++ is not. Here are some examples:
    1. C has two kinds of built-in data structures, arrays and records (structs, plus classes in C++). Records can be returned from functions, but arrays cannot. In Java, arrays can be returned as well.
    2. Consider whether keywords/constructs that could affect each other interfere when they are used together on an identifier. For example, when public and static are applied to a method, they do not interfere with each other, so these two are orthogonal.

    Unix and orthogonality
    Unix is praised for its design being orthogonal.

    For example, a file can be opened for write access without exclusively locking it for writing; this is not the case with every operating system. Old-style (System III) signals were non-orthogonal, because receiving a signal had the side effect of resetting the signal handler to the default die-on-receipt. There are large non-orthogonal patches like the BSD sockets API and very large ones like the X windowing system's drawing libraries.

    But on the whole the Unix API is a good example; otherwise it not only would not, but could not, be so widely imitated by C libraries on other operating systems. This is also a reason the Unix API repays study even if you are not a Unix programmer: it has lessons about orthogonality to teach.

    Longest common substring revisited

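    To calculate the longest common substring we can use the following algorithms:
    1. Dynamic programming
    2. Hirschberg's algorithm (a linear-space technique, usually presented for the closely related longest common subsequence problem)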
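A minimal sketch of option 1, the dynamic-programming approach, in C++. It assumes "substring" means a contiguous run of characters and uses the usual recurrence: if a[i-1] == b[j-1], the longest common suffix of the prefixes a[0..i) and b[0..j) is one longer than that of a[0..i-1) and b[0..j-1); otherwise it is 0. The answer is the largest such value. Only two rows of the table are kept, so memory is O(|b|) while time is O(|a|·|b|). The function name is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: longest contiguous substring shared by a and b.
std::string longest_common_substring(const std::string& a, const std::string& b) {
    // prev[j] / cur[j]: length of the longest common suffix of a[0..i-1) / a[0..i)
    // and b[0..j). Column 0 stays 0 (empty prefix of b).
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    std::size_t best_len = 0, best_end = 0;   // best_end: one past the match in a
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1]) {
                cur[j] = prev[j - 1] + 1;
                if (cur[j] > best_len) {
                    best_len = cur[j];
                    best_end = i;
                }
            } else {
                cur[j] = 0;
            }
        }
        std::swap(prev, cur);   // this row becomes the previous row
    }
    return a.substr(best_end - best_len, best_len);
}
```

Ties are resolved in favor of the first maximum found while scanning a from left to right.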

    Troubleshooting DNS servers

    Broadly, there are two problems we may face when dealing with a DNS server:
    • The DNS server is not responding to clients.
    • The DNS server does not resolve names correctly.
    Let's deal with them one by one.

    The DNS server is not responding to clients

    Cause 1: Network failure

    Solution: Check that the hardware is working, i.e. that the network adapters are properly plugged in. Then check network connectivity by pinging other computers or routers (such as the default gateway) that are in use and available on the same network as the affected DNS servers.


    Cause 2: The network is OK, but the server does not respond to clients' queries

    Solution: If the DNS client can ping the DNS server, verify that the DNS server is running and is able to listen for client requests. Try using the nslookup command to test whether the server can respond to DNS clients. On Ubuntu, if nslookup is not available, install the dnsutils package, which provides clients such as nslookup, host and other tools. These client programs are derived from the source tree of BIND (Berkeley Internet Name Domain), which implements an Internet domain name server. On Windows, run nslookup from the command prompt.


    Cause: The DNS server has been configured to limit service to a specific list of its configured IP addresses. The IP address originally used in testing its responsiveness is not included in this list.

    Solution: If the server was previously configured to restrict the IP addresses for which it responds to queries, it is possible that the IP address being used by clients to contact it is not in the list of restricted IP addresses permitted to provide service to clients.

    Try testing the server for a response again, but specify a different IP address known to be in the restricted interfaces list for the server. If the DNS server responds for that address, add the missing server IP address to the list.


    Cause: The DNS server has been configured to disable the use of its automatically created default reverse lookup zones.

    Solution: Verify that automatically created reverse lookup zones have been created for the server or that advanced configuration changes have not been previously made to the server.

    By default, DNS servers automatically create the following three standard reverse lookup zones based on Request for Comments (RFC) recommendations:
    • 0.in-addr.arpa
    • 127.in-addr.arpa
    • 255.in-addr.arpa

    These zones are created with common IP addresses covered by these zones that are not useful in a reverse lookup search (0.0.0.0, 127.0.0.1, and 255.255.255.255). By being authoritative for the zones corresponding to these addresses, the DNS service avoids unnecessary recursion to root servers in order to perform reverse lookups on these types of IP addresses.

    It is possible, although unlikely, that these automatic zones are not created. This is because disabling the creation of these zones involves advanced manual configuration of the server registry by a user.


    Cause: The DNS server is configured to use a non-default service port, such as in an advanced security or firewall configuration.

    Solution: Verify that the DNS server is not using a non-standard configuration.

    This is a rare but possible cause. By default, the nslookup command sends queries to targeted DNS servers using User Datagram Protocol (UDP) port 53. If the DNS server is located on another network only reachable through an intermediate host (such as a packet-filtering router or proxy server), the DNS server might use a non-standard port to listen for and receive client requests.

    If this situation applies, determine whether any intermediate firewall or proxy server configuration is intentionally used to block traffic on well-known service ports used for DNS. If not, you might be able to add such a packet filter onto these configurations to permit traffic to standard DNS ports.

    Also, check the DNS server event log to see if Event ID 414 or other critical service-related events have occurred which might indicate why the DNS server is not responding.


    The DNS server does not resolve names correctly

    Cause: The DNS server provides incorrect data for queries it successfully answers.

    Solution: Determine the cause of the incorrect data for the DNS server.

    Some of the most likely causes include the following:

    • Resource records (RRs) were not dynamically updated in a zone.
    • An error was made when manually adding or modifying static resource records in the zone.
    • Stale resource records in the DNS server database, left from cached lookups or zone records not updated with current information or removed when they are no longer needed.

    To help prevent the most common types of problems, be sure to first review best practices for tips and suggestions on deploying and managing your DNS servers. Also, follow and use the checklists appropriate for installing and configuring DNS servers and clients based on your deployment needs.

    If you are deploying DNS for Active Directory, note the new directory integration features. When the DNS database is directory-integrated, some DNS server defaults can differ from those used with traditional file-based storage.

    Many DNS server problems start with failed queries at a client, so it is often good to start there and troubleshoot the DNS client first.


    Cause: The DNS server does not resolve names for computers or services outside of your immediate network, such as those located on external networks or the Internet.

    Solution: The server might have a problem performing recursion correctly. Recursion is used in most DNS configurations to resolve names that are not located within the configured DNS domain name used by the DNS servers and clients.

    If a DNS server fails to resolve a name for which it is not authoritative, the cause is usually a failed recursive query. Recursive queries are used frequently by DNS servers to resolve remote names delegated to other DNS zones and servers.

    For recursion to work successfully, all DNS servers used in the path of a recursive query must be able to respond to and forward correct data. If not, a recursive query can fail for any of the following reasons:
    • The recursive query times out before it can be completed.
    • A remote DNS server fails to respond.
    • A remote DNS server provides incorrect data.
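    The dependence on every server in the path can be illustrated with a toy model. Each hypothetical Hop records whether a server answered correctly, failed to answer, or returned bad data, and how long it took. The query succeeds only if every hop succeeds within the overall time budget. The names Hop, Outcome, Result and resolve are invented for illustration. A real resolver sends network queries and follows referrals rather than walking a fixed list.

```cpp
#include <string>
#include <vector>

// Hypothetical outcome of asking one server along the recursion path.
enum class Outcome { Ok, NoResponse, BadData };

struct Hop {
    std::string server;  // illustrative label such as "root" or "com"
    Outcome outcome;     // how this server behaved
    int latency_ms;      // simulated time spent waiting on this server
};

enum class Result { Resolved, TimedOut, NoResponse, BadData };

// Walk the path in order. Any single failure, or running past the
// time-out, fails the whole recursive query.
Result resolve(const std::vector<Hop>& path, int timeout_ms) {
    int elapsed = 0;
    for (const Hop& hop : path) {
        elapsed += hop.latency_ms;
        if (elapsed > timeout_ms) return Result::TimedOut;
        if (hop.outcome == Outcome::NoResponse) return Result::NoResponse;
        if (hop.outcome == Outcome::BadData) return Result::BadData;
    }
    return Result::Resolved;
}
```

    The three failure results correspond to the three reasons listed above: the query times out, a remote server fails to respond, or a remote server provides incorrect data.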

    If a server fails a recursive query for a remote name, review the following possible causes to troubleshoot the problem. If you do not understand recursion or the DNS query process, review conceptual topics in Help to better understand the issues involved.



    Cause: The DNS server is not configured to use other DNS servers to assist it in resolving queries.

    Solution: Check whether the DNS server can use both forwarders and recursion.

    By default, all DNS servers are enabled to use recursion, although recursion can be disabled in the DNS console by modifying the advanced server options. Recursion might also be disabled if the server is configured to use forwarders and recursion has been specifically disabled for that configuration.


    Cause: Current root hints for the DNS server are not valid.

    Solution: Check whether server root hints are valid.

    If configured and used correctly, root hints should always point to DNS servers authoritative for the zone containing the domain root and top-level domains.

    By default, DNS servers are configured to use root hints appropriate to your deployment, based on the following available choices when using the DNS console to configure a server:

    1. If the DNS server is installed as the first DNS server for your network, it is configured as a root server.

    For this configuration, root hints are disabled at the server because the server is authoritative for the root zone.

    2. If the installed server is an additional DNS server for your network, you can direct the Configure DNS Server Wizard to update its root hints from an existing DNS server on the network.

    3. If you do not have other DNS servers on your network but still need to resolve Internet DNS names, you can use the default root hints file which includes a list of Internet root servers authoritative for the Internet DNS namespace.

    Cause: The DNS server does not have network connectivity to the root servers.

    Solution: Test for connectivity to the root servers.

    If root hints appear to be configured correctly, verify that the DNS server used in a failed query can ping its root servers by IP address.

    If a ping attempt to one root server fails, it might indicate that an IP address for that root server has changed. Reconfiguration of root servers, however, is uncommon.

    A more likely cause is a complete loss of network connectivity or, in some cases, poor network performance on the intermediate network links between the DNS server and its configured root servers. Follow basic TCP/IP network troubleshooting steps to diagnose connections and determine whether this is the problem.

    By default, the DNS service uses a recursive time-out of 15 seconds before failing a recursive query. Under normal network conditions, this time-out does not need to be changed. If performance warrants it, however, you can increase this value.
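    The idea of a time-out can be sketched on the client side with a plain POSIX UDP socket, where the SO_RCVTIMEO option bounds how long recv() waits for a reply. This is an illustration only, not how the Windows DNS service implements its recursion time-out. The function names are hypothetical, and on Windows SO_RCVTIMEO takes a DWORD in milliseconds instead of a timeval.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Open a UDP socket bound to a kernel-chosen port on the loopback interface.
// Returns the descriptor, or -1 on failure.
int open_local_udp_socket() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // 0 lets the kernel pick a free port
    if (bind(sock, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) {
        close(sock);
        return -1;
    }
    return sock;
}

// Wait up to timeout_ms for one datagram on the socket.
// Returns true if a reply arrived, false if the wait timed out or failed.
bool wait_for_reply(int sock, int timeout_ms) {
    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) != 0) {
        return false;
    }
    unsigned char buf[512];  // classic maximum size of a DNS message over UDP
    ssize_t n = recv(sock, buf, sizeof buf, 0);
    // On time-out, recv() returns -1 with errno set to EAGAIN or EWOULDBLOCK.
    return n >= 0;
}
```

    With the 15-second default described above, the call would be wait_for_reply(sock, 15000). Raising the time-out lets slow links finish the query, at the cost of clients waiting longer before a failure is reported.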

    To review additional performance-related information on DNS queries, you can enable and use the DNS server debug log file, Dns.log, which can provide extensive information about some types of service-related events.


    Cause: Other problems exist with updating DNS server data, such as an issue related to zones or dynamic updates.

    Solution: Determine whether the problem is related to zones. As needed, troubleshoot any issues in this area, such as a possible failure of zone transfer.

    Cloud Computing and Virtualization

    Cloud Computing is defined as a pool of virtualized computer resources. Building on this virtualization, the Cloud Computing paradigm allows workloads to be deployed and scaled out quickly through the rapid provisioning of virtual or physical machines. A Cloud Computing platform supports redundant, self-recovering, highly scalable programming models that allow workloads to recover from many inevitable hardware and software failures.

    A Cloud Computing platform is more than a collection of computer resources because it provides a mechanism to manage those resources. With Cloud Computing platforms, software is migrating from the desktop into the "clouds" of the Internet, promising users anytime, anywhere access to their programs and data. The concept of cloud computing, and the way virtualization enables it, offers so many innovative opportunities that it is not surprising there are new announcements every day. The innovation will continue, and massive value will be created for customers over the coming years.

    Use of Virtualization in cloud computing

    Virtualization is one of the elements that make up cloud computing, though cloud computing does not center on virtualization or any one technology.


    Cloud computing can happen without virtualization. Certain hardware, operating system and even application clusters can deliver cloud services. But these technologies can be complicated and costly, often requiring a lot of work to provide a limited set of features.

    The more likely scenario is that a private cloud computing environment is built on a virtual infrastructure. Many organizations have deployed virtualization by creating virtual servers on top of their existing networking, storage and security stacks. But with private cloud computing, you need to think about and design these technologies in conjunction with one another.

    In other words, you built previous virtual infrastructures on these stacks, but you need to build a private cloud with these stacks.


    Cloud computing is as much a methodology as it is a technology. You cannot plan any single element without considering the effect on the others. You also have to add in practices and policies that govern chargeback, monitoring, procurement and many other facets of your IT infrastructure.

    For example, the ability to rapidly provision virtual machines does no good if it still takes six weeks to order and install a host server. Furthermore, procurement will always be a problem if chargeback is not recovering costs, and that requires resource and utilization monitoring. If your storage and compute resources have different provisioning schedules, they'll have to be documented and reconciled to properly forecast demand. I could go on, but your business requirements ultimately drive everything.

    Private cloud computing does not center on virtualization or any one technology. It uses a set of technologies that have been aligned to be highly flexible and provide a wide range of services. This approach does not require virtualization, but virtualization lends itself well to the core concepts of cloud computing.

    Virtualization and cloud computing are also so closely connected because the major hypervisor vendors -- VMware, Microsoft and Citrix Systems -- are putting a lot of emphasis on the cloud. They have closely aligned their products with tools and complementary technologies that promote the adoption of private cloud computing.

    Cloud computing is a rapidly evolving discipline, and one that will reshape org charts as fast as it will change data center layouts. It closely aligns with virtualization, but it takes many technologies to be successful.

    Thursday, August 12, 2010

    Virtualization

    Virtualization is a term used in various contexts, such as hardware, memory management, storage, desktops, data and networks, and it has different functions and meanings in each.
    Here are some cases of virtualization:
    Hardware virtualization - Execution of software in an environment separated from the underlying hardware resources

    Memory virtualization -  Aggregating RAM resources from networked systems into a single memory pool

    There are other cases as well - network virtualization, storage virtualization and many more.

    Definition
    Virtualization is a framework or methodology of dividing the resources of a computer into multiple execution environments, by applying one or more concepts or technologies such as hardware and software partitioning, time-sharing, partial or complete machine simulation, emulation, quality of service, and many others.

    Note that this definition is rather loose, and includes concepts such as quality of service, which, even though it is a separate field of study, is often used alongside virtualization. Often, such technologies come together in intricate ways to form interesting systems, one of whose properties is virtualization. In other words, the concept of virtualization is related to, or more appropriately synergistic with, various paradigms. Consider the multi-programming paradigm: applications on modern systems run within a virtual machine model of some kind.

    Generally speaking, virtualization abstracts things away: it hides the details of the underlying resources behind a logical view of them.

    Benefits of virtualization
    1. Adaptability is becoming an increasing focus for the management of modern business. With new opportunities and threats always lurking on the horizon, businesses must be able to quickly, efficiently, and effectively react to their dynamic environment. With regard to IT infrastructure, virtualization is perhaps the most effective tool for facilitating this adaptability. In virtualized systems, the expansion and reduction of technical resources can be performed seamlessly. Because physical devices and applications are logically represented in a virtual environment, administrators can manipulate them with more flexibility and fewer detrimental effects than in the physical environment. Through the use of virtualization tools (which vary from one case to another), server workloads can be dynamically provisioned to servers, storage device usage can be manipulated, and should a problem occur, administrators can easily roll back to working configurations. Generally, the addition (or removal) of hardware can be easily managed with virtualization tools.
      Increased demand for data or database capabilities can be easily met with data and database virtualization, by managing new DBMSs or physical infrastructure with virtualization tools. This is, in effect, storage virtualization. All of these examples illustrate the adaptable nature of the virtual enterprise.

    2. In addition to adaptability, implementing virtualization within an organization's infrastructure brings lower operating costs. There is much inherent efficiency that comes with implementing this type of system, because much of its focus is on optimizing the use of resources, thus reducing maintenance overhead. Any element of the current infrastructure can be leveraged more fully with virtualization. Switching costs for new operating systems or applications are lowered because they can be installed and implemented more flexibly. The consolidation of servers and storage space increases the return on investment for this hardware by maximizing its utilization.

    3. Lowering costs will allow an organization to reallocate the IT budget towards initiatives that are not related to the maintenance of current systems, such as research and development, partnerships, and the alignment of IT with business strategy.

    4. IT managers will be able to increase the productivity of employees across the entire organization through a properly implemented virtualization system. For businesses that rely on in-house application development, an increase in productivity and increased ease of implementation can be seen. Developers within a platform-virtualized environment can program in languages they are most proficient with. Debugging and testing applications becomes second nature with the ability to create contained virtual environments. In this instance, application and systems testing can be performed on a single workstation that employs a variety of virtual machines without the need to transfer and debug code to external computers. Enterprise-wide testing can be performed in isolated virtual machines, which do not interact with or compromise the resources actually being used on the network. Users in a virtualized environment do not know or care how their use of IT resources is being optimized. They are able to access needed information and perform work effectively and simply, without regard to the complexities that exist behind the scenes.

    Risks
    With any benefit, there is always associated risk. This is also the case for the practice of virtualization.
    The following are the risks involved with virtualization:
    1. The first problem that IT managers must be aware of occurs in the planning and implementation of virtualization. The organization must decide whether virtualization is, in fact, right for it. The short-term costs of an ambitious virtualization project can be high, with the need for new infrastructure and reconfiguration of current hardware. In businesses where cost reduction and IT flexibility are not currently aligned with the business strategy, other initiatives will be better suited. That is not to say that virtualization is wrong for any particular environment; almost any organization can reap the benefits of a properly planned virtualization initiative. It is the timing and scope of such initiatives that must be scrutinized.
    2. Another risk associated with virtualization can occur in businesses that do not have sufficient redundancy in their systems. Because resources are often consolidated in virtualized environments, especially with server virtualization, the physical failure of one piece of hardware will impact all the virtual elements it hosts. It is therefore necessary to ensure that backup systems are in place to deal with such problems. Fortunately, because of the isolation inherent in virtualized systems, backup processes can be greatly simplified.
    3. A final problem that can occur in virtualized systems is increased overhead. The software layers inserted between applications and resources can chew up processor cycles, sometimes consuming a double-digit percentage of them.
    Conclusion
    With the positives far outweighing the negatives, virtualization is a technology that will soon be a universal practice. In time, virtualization will become just a standard layer of the infrastructure stack. As the cost of virtualization technology declines, and more hardware manufacturers such as Intel and AMD build virtualization functionality into their products, it will become increasingly difficult to justify not using virtualization in an IT system. The unmatched effectiveness of virtualization in providing adaptability and reducing costs for the enterprise will empower IT managers and position their organizations for growth. Because virtualization technology will inevitably become part of the standard architecture stack, organizations of all types should begin sketching the path to their future in virtualization.