Tuesday, January 16, 2007

LINQ: the January CTP is out.

The January CTP for the next version of Visual Studio, code-named "Orcas", is now available. CTP stands for Community Technology Preview - it is essentially a lightweight beta, a vehicle for shipping early bits of a future product. By shipping this preview, Microsoft is trying to solicit feedback on the feature set ("I really like having that new dial to adjust my frimble."), on the realization of those features ("I like having that lever, but when I flip it this other thingy comes loose."), and on the prioritization of features ("The new dial makes my life easier, but I don't find much use for that new lever you've added."). The main highlights of this CTP are the new ADO.NET stuff (including the Entity Data Model) and the latest LINQ bits.

You might ask, just what is this LINQ thing? LINQ, the name, derives from "Language Integrated Query." LINQ is a codename for a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for queries and provides class libraries to take advantage of these capabilities.

I took that description right from the main LINQ page on MSDN. Let me decompose that a little, because it's sort of dense. It makes sense if you already know what LINQ is, but if you come into it without any prior knowledge, I'm not so sure it clarifies anything.

First, LINQ is a set of extensions to a couple of .NET languages, as well as to the base framework (or base class library). It's important to realize that LINQ is not just a new class library. LINQ also implies language syntax changes. You might be a skeptic and say, languages schmanguages - they're all Turing-complete, right? But no one who looks at Perl, T-SQL, and C# could avoid the conclusion that different languages have different balance points - they are optimized for different things. What LINQ does is extend the existing language syntax that we know and love in C# and VB today, to add some new stuff.

New stuff to do what? Ahh, to do query, set and transform operations. A query, like "Give me all the customers who ordered $10000 worth of product last month." Today, querying data is done differently, depending on where you get your data. If your data is in a relational database, then you use the DBMS query engine to do the queries. You write some SQL code as a string and send it to the database, and it gives you rows of data back. (If you are an EJB programmer, then you write an EJBQL string, and a findByXxx method, and a bunch of other stuff.) On the other hand, if you have your data in an XML document, then you can perform XPath or XQuery on it. Finally, if you have your data in a plain old in-memory collection of objects, you may have to manually iterate through the collection to extract the data items that match your constraints. foreach, anyone?

What LINQ attempts to do is, first, to standardize the way these queries are performed, regardless of the data source, and second, to add query as a language feature, so that developers get all the benefits that come with that (IntelliSense, and so on).
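To make that concrete, here is a minimal sketch of the in-memory case: the same "big spenders" question answered first with a hand-written foreach, then with a language-integrated query. The Customer type, its fields, the method names, and the sample data are all made up for illustration, and the query syntax is shown as I understand it; details in any particular CTP may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type, invented for this illustration only.
class Customer
{
    public string Name;
    public decimal OrderedLastMonth;

    public Customer(string name, decimal orderedLastMonth)
    {
        Name = name;
        OrderedLastMonth = orderedLastMonth;
    }
}

static class Demo
{
    // The manual way: iterate and test each item yourself.
    public static List<string> BigSpendersLoop(IEnumerable<Customer> customers)
    {
        List<string> result = new List<string>();
        foreach (Customer c in customers)
        {
            if (c.OrderedLastMonth >= 10000m)
            {
                result.Add(c.Name);
            }
        }
        return result;
    }

    // The same question expressed as a query in the language itself.
    public static List<string> BigSpendersQuery(IEnumerable<Customer> customers)
    {
        var query = from c in customers
                    where c.OrderedLastMonth >= 10000m
                    select c.Name;
        return query.ToList();
    }

    static void Main()
    {
        List<Customer> customers = new List<Customer>();
        customers.Add(new Customer("Alice", 12500m));
        customers.Add(new Customer("Bob", 300m));
        customers.Add(new Customer("Carol", 10000m));

        // Both lines print: Alice, Carol
        Console.WriteLine(string.Join(", ", BigSpendersLoop(customers).ToArray()));
        Console.WriteLine(string.Join(", ", BigSpendersQuery(customers).ToArray()));
    }
}
```

The two methods return the same names. The difference is that the second states what is wanted rather than how to walk the collection, and the compiler and editor can see inside the query because it is code, not a string.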

For years, different teams and projects in the software industry have grappled with the problem of object-relational mapping. The whole point was to save time in development. We all realize that much (most?) of the world's transactional data is stored in relational stores, and most new app development is done in object-oriented languages. So the challenge we've been grappling with has been how to map the relational data to an in-memory object state, in effect making it simpler and easier for an OO programmer to fiddle with data. This effort has seen some small successes, but also some large failures. Also, as more and more data moves into XML, mapping between relational stores and in-memory object representations seems to be missing half the point. We need something different!

Microsoft is taking a different tack on the question of how best to improve developer productivity around data handling, given the two constraints above - that data is relational (but more and more XML) and code is OO. With LINQ, Microsoft is saying, we think the opportunity for improvement is in the ability to query data more simply. It's not so much in the mapping - there are solutions here that get people "most of the way there". The challenge now is, once you have mapped your DBMS schema to your object schema, once you have filled a collection of objects from your database, then what? Can you filter that collection? ("Give me all the customers that ordered over $10000 in the last month. Ok, now give me the ones whose orders included this particular product. Wait, now let me see the ones that ordered this other product.")

Or think a little more generally and consider an object that aggregates data from multiple stores - maybe some of them relational and some not. And you'd like to do "join" queries across all of that data. Wouldn't it be nice if that were possible without requiring the developer to speak three different dialects of query language - T-SQL, XPath and something else?
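As a rough sketch of what such a cross-store join could look like, the example below joins an in-memory collection of customers against orders held in an XML fragment, in a single query. The data, element names, and attribute names are invented for illustration, and the XML types used (XElement and friends in System.Xml.Linq) are assumed to be available under those names; the bits in a given preview may be named differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class JoinDemo
{
    public static List<string> CustomersWithOrders()
    {
        // Store 1: plain in-memory objects (made-up sample data).
        var customers = new[]
        {
            new { Id = 1, Name = "Alice" },
            new { Id = 2, Name = "Bob" }
        };

        // Store 2: an XML document (made-up element and attribute names).
        XElement orders = XElement.Parse(
            "<orders>" +
            "<order customerId='1' product='Widget' />" +
            "<order customerId='2' product='Gadget' />" +
            "<order customerId='1' product='Gadget' />" +
            "</orders>");

        // One query, one syntax, two kinds of data source.
        var query = from c in customers
                    join o in orders.Elements("order")
                        on c.Id equals (int)o.Attribute("customerId")
                    select c.Name + " ordered " + (string)o.Attribute("product");

        return query.ToList();
    }

    static void Main()
    {
        // Prints:
        //   Alice ordered Widget
        //   Alice ordered Gadget
        //   Bob ordered Gadget
        foreach (string line in CustomersWithOrders())
        {
            Console.WriteLine(line);
        }
    }
}
```

No XPath string and no hand-written nested loops appear here; the join condition is ordinary C# that the compiler checks.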

So that's what LINQ represents - Microsoft's integration of this query capability into the programming language itself. It is worth checking out; it is one of the things Microsoft believes will distinguish the next version of .NET from other alternatives, like Java. A big innovation. It seems that at least some people outside of Microsoft tend to agree. And with the January CTP of Visual Studio "Orcas", now you can check it out.

What about Interop, you say? Well, that brings me to my point. Currently the design of LINQ allows for extensions, so that one could integrate this in-language query capability with any backing store. Microsoft will do the work to integrate SQL Server, as well as XML content, with this capability. We're hoping that, given enough interest, we'll see a clear justification for doing integration with other relational stores. If you have strong opinions on this, please let me know.

Cheers,
-Dino
