Tuesday, January 22, 2008

LINQ to Objects Sample from a systems administrator's point-of-view

One way to test whether the LINQ concepts work is to apply them to my daily tasks as an administrator. As documented, any time you are working with a collection, such as an array or a list, you can use LINQ; in this case, it's LINQ to Objects. In this example, I'll work with one of the most common tasks I do: checking the running processes on a server. I'm using the System.Diagnostics.Process class to get all the active processes on my machine. If you were writing similar code in .NET 2.0, this is what it would look like:

System.Diagnostics.Process[] myProcesses;
myProcesses = System.Diagnostics.Process.GetProcesses();
foreach (System.Diagnostics.Process instance in myProcesses)
{
    // check for processes whose memory usage is greater than 10 MB
    if (instance.WorkingSet64 > 10 * 1024 * 1024)
    {
        Console.WriteLine("Name={0}, Mem Usage={1} bytes", instance.ProcessName, instance.WorkingSet64);
    }
}
// prompt once, after the loop, instead of once per process
Console.WriteLine("Press any key when your keyboard is attached . . . ");
Console.ReadLine();


This code retrieves all the processes on the local machine and displays those with memory usage greater than 10 MB. You might think this is easy enough with a few lines of code. But what if I need to sort the results in descending order, or add operations like grouping and aggregation? Without LINQ, you would have to write or call sorting code (by memory usage, for example) and build the groups by hand. What about counting the number of processes? Then you would need the incremental counter that we've always used. This is where LINQ to Objects makes things quite a bit easier. Let's look at the code for the same requirement, this time using LINQ.

var procQuery = from p in System.Diagnostics.Process.GetProcesses()
                // check for processes whose memory usage is greater than 10 MB
                where p.WorkingSet64 > 10 * 1024 * 1024
                select p;

foreach (var process in procQuery)
    Console.WriteLine("Name = {0}, Mem Usage = {1} bytes", process.ProcessName, process.WorkingSet64);

Console.WriteLine("Press any key when your keyboard is attached . . . ");
Console.ReadLine();


Notice that there isn't much difference in the number of lines between the two versions. But if we add sorting, grouping or aggregate functions, with LINQ we just add a clause or call a method instead of writing a sorting algorithm or something else. An orderby clause sorts the results, a Count() method counts the number of instances instead of the all-too-common incremental counter, and so on. Compare that with doing the same without LINQ (do I hear an "eww"?).
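To make the sorting, counting and grouping point concrete, here is a minimal sketch of how the same process query might be extended. The class name ProcessReport and the variable names are my own for illustration; it assumes the account running it is allowed to read the process information, just like the earlier snippets.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class ProcessReport
{
    static void Main()
    {
        const long threshold = 10L * 1024 * 1024; // 10 MB in bytes

        // ToList() runs the query once, so the count and the loop below
        // work on the same snapshot of processes.
        var bigProcesses = (from p in Process.GetProcesses()
                            where p.WorkingSet64 > threshold
                            orderby p.WorkingSet64 descending
                            select new { p.ProcessName, p.WorkingSet64 }).ToList();

        // No incremental counter needed.
        Console.WriteLine("Processes over 10 MB: {0}", bigProcesses.Count);

        foreach (var p in bigProcesses)
            Console.WriteLine("Name = {0}, Mem Usage = {1} bytes", p.ProcessName, p.WorkingSet64);

        // Group by process name and aggregate within each group.
        var byName = from p in bigProcesses
                     group p by p.ProcessName into g
                     orderby g.Count() descending
                     select new
                     {
                         Name = g.Key,
                         Instances = g.Count(),
                         TotalBytes = g.Sum(x => x.WorkingSet64)
                     };

        foreach (var g in byName)
            Console.WriteLine("{0}: {1} instance(s), {2} bytes in total", g.Name, g.Instances, g.TotalBytes);
    }
}
```

The sort, the count and the grouping are each one clause or one method call on top of the original query, which is the whole appeal compared with the hand-written version.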

I'll probably extend this into something similar to Task Manager, with all the fancy UI stuff and other functions, but I'll save that for a later post. The original idea I had was to do something like a LINQ to LDAP provider, which happens to be a very long process (and I just wish Microsoft would come up with an API for this as well in the long run, for Active Directory-related stuff).

Monday, January 21, 2008

LINQ for Beginners

Someone asked me about what was on my MSN Messenger status, and I explained to him what I was doing. Apparently, I was doing LINQ and had it as my MSN status. LINQ, which stands for Language INtegrated Query, is the codename of a project for a set of extensions to the .NET Framework that encompasses language-integrated query, set and transform operations. It extends C# and VB with native language syntax for queries and provides class libraries to take advantage of these capabilities. It is available starting with .NET Framework 3.5 (this simply means that if you want to write LINQ queries, your project has to target the correct framework). Now, what does that mean to developers? Queries are usually expressed in a specialized query language for each data source, which means developers have to learn a different query language for every data source or data format they must access. This is the problem LINQ addresses. It simplifies data access by providing a consistent model for working with data across various kinds of sources and formats. In LINQ, data is translated into objects, something that developers are more comfortable working with. Understanding LINQ will give us an idea of its capabilities and its benefits (I'll save the cons from an enterprise DBA's perspective for a future post).

To understand LINQ, we need to know the basic parts of a query operation: obtaining the data source, creating the query and executing the query. These steps are generic; any access to a data source has to go through them.

using System;
using System.Linq;

class LINQBasics
{
    static void Main()
    {
        // Obtain the data source
        string[] names = { "Charlie", "Joe", "Yia Wei", "Bob", "Mike" };

        // Create the query
        // query is an IEnumerable<string>; nothing is executed yet
        var query = from name in names
                    where name.Contains("i")
                    orderby name
                    select name;

        // Execute the query
        foreach (string name in query)
        {
            // WriteLine puts each name on its own line
            Console.WriteLine(name);
        }
    }
}


Looking at the code above, the first thing that we need is a data source. In this case, it's an array of strings, which supports the generic IEnumerable(T) interface. This makes it available for LINQ to query. A queryable type does not require special modification to serve as a LINQ data source as long as it is already loaded in memory. If it is not, as with data sources like XML files, you first have to load it into memory so LINQ can query the objects.

Next is the query. A query specifies what information to retrieve from the data source. If you are familiar with SQL, you know what this looks like: the kind which includes select, from, where and the like. Looking at the code above, you'll notice that it's not like your typical SQL statement, as the from clause appears before the select clause. There are a couple of reasons for this. One, it adheres to the programming concept of declaring a variable before using it. Two, from the point of view of Visual Studio, it makes it easy to provide IntelliSense using the dot (.) notation: the variable has already been declared and the compiler has already inferred its type, so the editor can offer the appropriate properties and methods, making it easy for developers to write their code.

Let's look at how the query was constructed. The from clause specifies the data source, in this case the names collection. The where clause applies the filter, in this case keeping only the elements in the collection that contain the letter "i". The select clause specifies the type of the returned elements. This means that you can return the elements themselves or create new objects from them; an example would be creating an object with fewer attributes than the original. The query variable, query, just stores the information required to produce the results when the query is executed, maybe at a later point. Simply defining the query variable does not return any data or take any action.
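As a small illustration of the select clause creating an object with fewer (or different) attributes, here is a sketch that projects each matching name into an anonymous type. The class name ProjectionSketch and the property names Name and Length are my own choices for the example.

```csharp
using System;
using System.Linq;

class ProjectionSketch
{
    static void Main()
    {
        string[] names = { "Charlie", "Joe", "Yia Wei", "Bob", "Mike" };

        // select builds a new, smaller object for each matching element
        // instead of returning the original string.
        var query = from name in names
                    where name.Contains("i")
                    orderby name
                    select new { Name = name, Length = name.Length };

        // Nothing has run yet; the loop below is what executes the query.
        foreach (var item in query)
            Console.WriteLine("{0} ({1} characters)", item.Name, item.Length);
    }
}
```

Because the element type is anonymous, the loop variable has to be declared with var; the compiler still knows its properties, so IntelliSense keeps working on item.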
The third component of the LINQBasics example is query execution. Like I said, the query variable does not contain any data; it contains only the query commands. The query is actually executed when we iterate over the query variable. There are a couple of ways to do this, one of which is shown in the example: the foreach statement iterates through the query variable and executes it at the same time. This concept is called deferred query execution. It is very important when dealing with data sources such as highly transactional database systems, as you avoid connecting to the database until it is necessary (database connections consume additional resources on the database server as well). You can opt to execute the query immediately by calling methods that return a single value, such as Count, Max, Average and First, or by calling the ToList() or ToArray() methods. Another way is to bind the query to a data-bound control in a web or Windows form, similar to how we did it in previous versions: set the DataSource property of the control to the query variable and, for a web control, call the DataBind() method.
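Deferred execution is easier to see when the data source changes after the query has been defined. The sketch below uses a List(T) instead of an array so that an element can be added; the class name DeferredExecutionSketch and the extra name "Vince" are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredExecutionSketch
{
    static void Main()
    {
        var names = new List<string> { "Charlie", "Joe", "Yia Wei", "Bob", "Mike" };

        // Defining the query does not run it.
        var query = from name in names
                    where name.Contains("i")
                    select name;

        // ToList() forces immediate execution and copies the current results.
        List<string> snapshot = query.ToList();

        // Change the data source after the query was defined.
        names.Add("Vince");

        // The snapshot was taken before the change: Charlie, Yia Wei, Mike.
        Console.WriteLine("Snapshot count: {0}", snapshot.Count);   // 3

        // Count() runs the query again and sees the new element.
        Console.WriteLine("Live count: {0}", query.Count());         // 4

        // So does every new foreach over the query variable.
        foreach (string name in query)
            Console.WriteLine(name);
    }
}
```

The same query variable gives different results depending on when it is executed, which is why ToList() or ToArray() is the usual choice when you want the results fixed at a point in time.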

One other thing to highlight is the use of the keyword var, which was introduced in C# 3.0. What it does is look at the expression assigned to the variable and set the variable's type accordingly. This concept is called type inference. In the LINQBasics example, the query variable, query, is assigned a query over an array of strings, so the compiler infers that it is a sequence of strings, that is, a type that implements IEnumerable(T) of string. The inference happens at compile time, not at runtime; it is helpful when the type name is long or awkward to write out, and it is required when the query returns an anonymous type. But this does not mean that any type can be assigned to the variable after the initial assignment, as with a dynamic type, since C# is a statically typed language. Once the compiler has inferred the type, the variable keeps it. Let's say you assign the value 12 to the query variable, query. The code will not compile: the compiler reports a type conversion error, because the variable was inferred to be a collection of strings and an int cannot be converted to that.
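Here is a short sketch of what type inference does and does not allow. The class name TypeInferenceSketch is my own, and the offending assignment is left commented out so the program still compiles; with the compilers I have used, uncommenting it produces error CS0029 (cannot implicitly convert type).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TypeInferenceSketch
{
    static void Main()
    {
        string[] names = { "Charlie", "Joe", "Yia Wei", "Bob", "Mike" };

        // The compiler infers the type of query from the right-hand side.
        var query = from name in names
                    where name.Contains("i")
                    orderby name
                    select name;

        // The inferred type implements IEnumerable<string>,
        // so this assignment needs no cast.
        IEnumerable<string> sameQuery = query;

        // Prints the concrete runtime type chosen by LINQ to Objects.
        Console.WriteLine(query.GetType().Name);

        // query = 12;   // does not compile: an int is not a sequence of strings

        Console.WriteLine(string.Join(", ", sameQuery.ToArray()));
    }
}
```

The point is that var saves typing but gives up nothing in type safety: the mistake is caught when you build, not when the program runs.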

This is just the tip of the iceberg for LINQ. There are a lot of resources out there for LINQ to SQL, LINQ to XML, LINQ to Objects, LINQ to Entities and LINQ to DataSet. I'll try to post more examples to make programming in LINQ a bit more appealing to developers.

This article is also posted on the MSSQLTips.com site